How do virtual assistants like Siri, Alexa, Bixby, Cortana, and Google Assistant work? I have searched for how Google Assistant and Siri work and found this book via Google Scholar: https://books.google.com/books?hl=en&lr=&id=H7daEAAAQBAJ&oi=fnd&pg=PP12&dq=info:OJRgUdIalvcJ:scholar.google.com/&ots=9luE8VnJh1&sig=RW40JMpgGsZgenYaI2GEsLfbGUk&redir_esc=y#v=onepage&q&f=false but besides that book I have not been able to find much, and the diagrams and descriptions I do find tend to be vague and over-generalized, e.g. grouping whole subsystems into single boxes in a diagram.
Or they are too specific to one niche. I am looking for how these assistants worked before LLMs became popular, i.e. before today's AI agents where an LLM receives speech-to-text output, calls tools, and then does text-to-speech, like openclaw. I want to know how it would have been done before ChatGPT was released.
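For contrast, the post-LLM agent pipeline described above can be sketched roughly like this. This is a toy illustration with hypothetical function names and stubbed-out models, not any real assistant's API:

```python
# Minimal sketch of the modern LLM-agent loop: ASR -> LLM (with tool calls) -> TTS.
# All names here are hypothetical stand-ins; the model calls are stubs.

def speech_to_text(audio: bytes) -> str:
    # Stand-in for an ASR model; a real system would run speech recognition here.
    return "what is the weather in boston"

def get_weather(city: str) -> str:
    # Stand-in "tool"; a real agent would call a weather API here.
    return "sunny"

def llm_with_tools(transcript: str) -> str:
    # Stand-in for an LLM deciding to call a tool and composing a reply.
    if "weather" in transcript:
        city = transcript.rsplit("in ", 1)[-1]
        report = get_weather(city)          # the "tool call" step
        return f"The weather in {city} is {report}."
    return "Sorry, I can't help with that."

def text_to_speech(reply: str) -> bytes:
    # Stand-in for a TTS model; just returns fake audio bytes.
    return reply.encode("utf-8")

def handle_utterance(audio: bytes) -> bytes:
    transcript = speech_to_text(audio)      # 1. speech -> text
    reply = llm_with_tools(transcript)      # 2. LLM reasoning + tool calls
    return text_to_speech(reply)            # 3. text -> speech
```

The key point of this architecture is that a single general-purpose model replaces most of the hand-built understanding logic; the pre-LLM question is what filled that middle step instead.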
I have found mentions of intent matching, which I assume means a custom-trained text classifier combined with rule-based matching (like string matching with else-ifs or something similar), which then calls "tools" based on the matched intent. But I am wondering if that's really all there is to it.
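The rule-based version of that guess can be sketched in a few lines. This is only a toy illustration: real pre-LLM NLU stacks typically used trained statistical classifiers for the intent and sequence taggers for the slots, but the dispatch shape (match intent, extract slots, call a handler) looks like this:

```python
import re

# Toy pre-LLM intent matching: match the transcript against ordered patterns
# (like an if/elif chain), pull out slot values, and dispatch to a handler.
# Intent names and handlers are hypothetical examples.

def set_timer(minutes: str) -> str:
    return f"Timer set for {minutes} minutes."

def get_weather(city: str) -> str:
    return f"Weather for {city}: sunny."

# (regex pattern, handler) pairs, checked in order; the capture group is the slot.
INTENTS = [
    (re.compile(r"set a timer for (\d+) minutes"), set_timer),
    (re.compile(r"weather in (\w+)"), get_weather),
]

def handle(transcript: str) -> str:
    text = transcript.lower()
    for pattern, handler in INTENTS:
        m = pattern.search(text)
        if m:
            return handler(m.group(1))   # slot value -> intent handler ("tool")
    return "Sorry, I didn't understand."
```

In production the regexes would be replaced by a classifier trained on sample utterances (so paraphrases still match), but the output is the same: an intent label plus slot values routed to a handler.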
If anyone can point me to any widely used literature I would appreciate it.