There is a finite set of symbols that LLMs can learn from. The number of possible combinations is enormous, of course, but most of those combinations are not valid or meaningful. The big players claim that scaling laws still hold, but I suspect they will eventually stall, at least once most of the meaningful combinations of our symbols are covered. A model with ~500B parameters can already represent a huge number of combinations. So is something like Claude Opus 4.6 good simply because it's bigger, or because of the internal tricks and optimizations behind it?
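To put rough numbers on the "enormous combinations" point above, here is a minimal back-of-envelope sketch in Python. The vocabulary size (~50,000 tokens) and context length (1,000 tokens) are illustrative assumptions on my part, not figures from anywhere in particular:

```python
# Back-of-envelope: raw token-sequence count vs. model parameter count.
# Assumed numbers: ~50k-token vocabulary, 1,000-token context (both illustrative).
import math

vocab_size = 50_000   # assumed tokenizer vocabulary size
context_len = 1_000   # assumed sequence length in tokens

# Distinct raw sequences = vocab_size ** context_len.
# Work in base-10 logs to avoid materializing a gigantic integer.
log10_sequences = context_len * math.log10(vocab_size)
print(f"distinct raw sequences ~ 10^{log10_sequences:.0f}")  # ~10^4699

# For comparison, a 500B-parameter model stores on the order of 10^12 numbers.
params = 500e9
print(f"parameters ~ 10^{math.log10(params):.0f}")           # ~10^12
```

The gap (roughly 10^4699 sequences vs. 10^12 parameters) is why a model cannot simply memorize the combinatorial space; whatever it learns has to exploit the fact that only a thin slice of those sequences is meaningful, which is exactly what the question above is getting at.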