I’ve spent the last half of 2025 in interview hell. I walked into my first few rounds prepared for deep math proofs, Transformer internals, and heavy LeetCode, but almost none of that came up.
What they asked was way more practical, and I failed the first three rounds because I was over-preparing for the wrong things. Recruiters don't want a lecture on attention mechanisms anymore; they want to hear about your decisions.
Whenever I walked through a project, the questions were always: "Why RAG instead of fine-tuning for this?" or "How did you actually evaluate the hallucinations?" I failed early on because I’d just say, "I built a PDF chat app." Now, I lead with the trade-offs.
I explain that I chose RAG because fine-tuning was too expensive for the dataset, used MiniLM for speed, and implemented a semantic chunking strategy that dropped the hallucination rate by 40%. That shift in how I talked about my work changed everything.
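To make "semantic chunking" concrete in an interview, I can sketch the idea in a few lines. This is a toy version: the `embed` function here is a bag-of-words stand-in for a real embedding model like MiniLM, and the `0.3` threshold is an arbitrary illustrative value, not a tuned one.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; stand-in for a real model like MiniLM."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_chunks(sentences, threshold=0.3):
    """Group consecutive sentences; start a new chunk when a sentence
    drifts semantically from the chunk built so far."""
    chunks, current = [], [sentences[0]]
    for sent in sentences[1:]:
        if cosine(embed(" ".join(current)), embed(sent)) >= threshold:
            current.append(sent)
        else:
            chunks.append(" ".join(current))
            current = [sent]
    chunks.append(" ".join(current))
    return chunks
```

The point isn't the code; it's being able to say why topically coherent chunks reduce hallucinations: the retriever stops handing the model fragments that straddle two unrelated topics.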
Another huge factor is cost and latency. I got my best offer because I could explain exactly how I cut inference costs by 60% using a hybrid local/cloud setup with Phi-3.5-mini and aggressive request caching.
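The shape of that setup is easy to whiteboard. Everything below is illustrative: the routing rule (short prompts to a local model, long ones to a cloud API), the model names, and the 200-character cutoff are assumptions standing in for whatever policy actually fits your workload.

```python
import hashlib

_cache = {}  # in-memory response cache; production would use Redis or similar

def _cache_key(model, prompt):
    return hashlib.sha256(f"{model}:{prompt}".encode()).hexdigest()

def route(prompt, local_limit=200):
    """Hypothetical routing rule: cheap local model for short prompts,
    cloud model for anything bigger."""
    return "phi-3.5-mini-local" if len(prompt) <= local_limit else "cloud-llm"

def generate(prompt, call_model):
    """Check the cache before paying for inference. `call_model` is the
    caller-supplied function that actually hits the chosen model."""
    model = route(prompt)
    key = _cache_key(model, prompt)
    if key not in _cache:
        _cache[key] = call_model(model, prompt)
    return _cache[key]
```

Repeated prompts hit the cache dict instead of the model, which is where most of the cost savings on a chat-style workload come from.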
Companies want to know you aren't just burning GPU credits for fun.

During live coding, they usually just had me "build a simple retriever" or fix a hallucination. I used to code in silence and fail; now, I narrate the whole time.
If I’m using a FAISS flat index, I explain it’s fine for a small dataset but mention I’d pivot to HNSW for speed if we hit a million vectors. They don't want perfect code; they want to hear you architect out loud.
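For context, a flat index is just exact brute-force search; here's the plain-Python equivalent of what FAISS's flat index does, which is why it stops scaling. This sketch assumes pre-computed embedding vectors as plain lists of floats.

```python
import math

def top_k(query, vectors, k=3):
    """Exact nearest-neighbor search by cosine similarity over every
    vector -- O(n) per query, which is what a flat index does.
    Returns the indices of the k best matches."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0
    scored = sorted(enumerate(vectors), key=lambda iv: cos(query, iv[1]), reverse=True)
    return [i for i, _ in scored[:k]]
```

Scanning every vector is exact and simple, which is the right call at ten thousand documents; at a million, an approximate graph index like HNSW trades a little recall for orders-of-magnitude faster queries, and being able to say that out loud is the whole game.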
The next time you’re in a technical round, don't just describe what you built. Describe why you didn't build it the other way. Showing that you weighed the cost of tokens against the accuracy of the model is exactly what separates a hobbyist from a senior engineer.