I did 15 AI Engineer interviews in the last 6 months [R]

Reddit r/MachineLearning / 4/27/2026


Key Points

  • The author reports that, over the past six months, AI engineer interviews focused less on deep theory (e.g., Transformer internals) and more on practical decision-making.
  • Interview questions centered on trade-offs between approaches like RAG vs fine-tuning and on how candidates evaluated hallucinations in real systems.
  • The author credits improved outcomes to explaining concrete choices—such as dataset-driven reasons for using RAG, latency-oriented model selection (MiniLM), and semantic chunking that reduced hallucination rates by 40%.
  • Cost and latency considerations became major differentiators, with the author describing how they cut inference costs by 60% using a hybrid local/cloud setup, Phi-3.5-mini, and aggressive request caching.
  • During live coding, candidates were expected to “architect out loud,” including future scalability considerations (e.g., when to switch from FAISS flat indexes to HNSW).

I’ve spent the last half of 2025 in interview hell. I walked into my first few rounds prepared for deep math proofs, Transformer internals, and heavy LeetCode, but almost none of that came up.

What they asked was way more practical, and I failed the first three rounds because I was over-preparing for the wrong things. Recruiters don't want a lecture on attention mechanisms anymore; they want to hear about your decisions.

Whenever I walked through a project, the questions were always: "Why RAG instead of fine-tuning for this?" or "How did you actually evaluate the hallucinations?" I failed early on because I’d just say, "I built a PDF chat app." Now, I lead with the trade-offs.

I explain that I chose RAG because fine-tuning was too expensive for the dataset, used MiniLM for speed, and implemented a semantic chunking strategy that dropped the hallucination rate by 40%. That shift in how I talked about my work changed everything.
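The post doesn't spell out the chunking strategy, but one common flavor of "semantic chunking" is greedy merging: keep appending sentences to the current chunk while they stay similar to it, and start a new chunk when the topic drifts. Here's a minimal sketch of that idea; the bag-of-words `embed` stub is a stand-in so the snippet runs anywhere, and in practice you'd swap in a real sentence embedder like `all-MiniLM-L6-v2`:

```python
import math
import re
from collections import Counter

def embed(text):
    # Stand-in embedding: bag-of-words counts. In a real pipeline this
    # would be a sentence-transformer (e.g. all-MiniLM-L6-v2); the stub
    # just lets the sketch run without a model download.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_chunks(text, threshold=0.2, max_sentences=8):
    # Greedy merge: extend the current chunk while the next sentence
    # stays above a similarity threshold; otherwise start a new chunk.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], [sentences[0]]
    for sent in sentences[1:]:
        similar = cosine(embed(" ".join(current)), embed(sent)) >= threshold
        if similar and len(current) < max_sentences:
            current.append(sent)
        else:
            chunks.append(" ".join(current))
            current = [sent]
    chunks.append(" ".join(current))
    return chunks
```

The threshold and chunk cap are illustrative, not the author's actual values; the point in an interview is that chunk boundaries follow topic shifts instead of a fixed character count.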

Another huge factor is cost and latency. I got my best offer because I could explain exactly how I cut inference costs by 60% using a hybrid local/cloud setup with Phi-3.5-mini and aggressive request caching.
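The post doesn't detail the routing logic, so here's one plausible sketch of the hybrid-plus-caching pattern. Everything below is illustrative: `local_generate` and `cloud_generate` are hypothetical stubs (the local one standing in for a self-hosted Phi-3.5-mini), and prompt length as the routing heuristic is my assumption, not the author's actual rule:

```python
import hashlib

# Hypothetical backends: a small locally served model (think Phi-3.5-mini)
# and a larger, pricier cloud API. Both are stubbed for the sketch.
def local_generate(prompt):
    return f"[local] answer to: {prompt}"

def cloud_generate(prompt):
    return f"[cloud] answer to: {prompt}"

_cache = {}

def generate(prompt, max_local_words=64):
    # 1) Aggressive request caching: an identical prompt never hits a
    #    model twice, which is where much of the cost saving comes from.
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key in _cache:
        return _cache[key]
    # 2) Hybrid routing: a cheap heuristic keeps short requests on the
    #    local model and only escalates long ones to the cloud.
    if len(prompt.split()) <= max_local_words:
        result = local_generate(prompt)
    else:
        result = cloud_generate(prompt)
    _cache[key] = result
    return result
```

In an interview, the interesting part is defending the heuristic: when is prompt length a good enough router, and when do you need a classifier or a confidence-based fallback?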

Companies want to know you aren't just burning GPU credits for fun. During live coding, they usually just had me "build a simple retriever" or fix a hallucination. I used to code in silence and fail; now, I narrate the whole time.
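A "simple retriever" of the kind these rounds ask for can be as small as brute-force cosine search over precomputed embeddings. A minimal NumPy sketch (the embedding model itself is out of scope here; `doc_vecs` is assumed to be a matrix of already-embedded chunks):

```python
import numpy as np

def retrieve(query_vec, doc_vecs, k=3):
    # Cosine similarity between the query and every document embedding,
    # then take the top-k indices.
    # doc_vecs: (n_docs, dim), query_vec: (dim,)
    doc_norm = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    q_norm = query_vec / np.linalg.norm(query_vec)
    scores = doc_norm @ q_norm
    top = np.argsort(-scores)[:k]
    return top, scores[top]
```

Narrating this is easy: normalize once, dot-product for similarity, argsort for ranking, and note out loud that this is O(n) per query, which sets up the index conversation below.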

If I’m using a FAISS flat index, I explain it’s for a small dataset but mention I’d pivot to HNSW for speed if we hit a million vectors. They don't want perfect code; they want to hear you architecting out loud.

The next time you’re in a technical round, don't just describe what you built. Describe why you didn't build it the other way. Showing that you weighed the cost of tokens against the accuracy of the model is exactly what separates a hobbyist from a senior engineer.

submitted by /u/Cold_Bass3981