Asking because I'm just curious.
LLMs are only as good as the data they're trained on. Take coding as an example: if, as an attack, the sources of an LLM's training data were flooded with garbage or deliberately poorly written code, what would happen to these frontier models? I'm reading that more and more businesses, in travel and other industries, are getting paranoid about AI taking over because of how good the models trained on real data have become. What if someone deliberately floods those sources with bad data to sabotage training? What guardrails are in place to prevent such a thing from happening?
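For concreteness, here's a toy sketch of the kind of data filtering I'd guess training pipelines apply as a first line of defense: parse checks and deduplication. This is purely my assumption about how it might work, not any lab's actual pipeline, and the function names are made up:

```python
import ast
import hashlib

def looks_like_valid_python(snippet: str) -> bool:
    # Reject samples that don't even parse; spammed garbage code often fails here.
    try:
        ast.parse(snippet)
        return True
    except SyntaxError:
        return False

def dedupe(snippets: list[str]) -> list[str]:
    # Drop exact duplicates so a flood of copy-pasted bad code can't dominate the mix.
    seen = set()
    unique = []
    for s in snippets:
        digest = hashlib.sha256(s.encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(s)
    return unique

corpus = [
    "def add(a, b):\n    return a + b\n",
    "def add(a, b):\n    return a + b\n",  # duplicate spam
    "def broken(:\n    retrun x\n",        # doesn't even parse
]

filtered = [s for s in dedupe(corpus) if looks_like_valid_python(s)]
print(len(filtered))  # 1
```

Obviously real pipelines would go far beyond this (quality scoring, source reputation, fuzzy dedup at scale), but even naive checks like these would catch a lot of low-effort poisoning. What I'm asking is whether anything stronger exists against a deliberate, well-crafted attack.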