It seems like our free lunch is slowly eroding, with hints that some open-source model providers are moving away from releasing as much, and fair enough, but I think we all here value the stability, privacy, and, let's be honest, the cool factor/fun of local models.
What are the big barriers to a community growing a system for decentralised training?
I can see a few offhand:
GPU Brand Mismatch
Nvidia's CUDA stack is hands down the best, but to utilise decentralised compute you'd likely need a brand-agnostic framework, maybe Vulkan? I'm sure Vulkan is far worse for training, too.
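One way a volunteer network might cope with mixed hardware is to pick a backend per node and fall back to a portable one. A minimal sketch of that idea, where the backend names and capability flags are purely illustrative assumptions, not any real framework's API:

```python
# Hypothetical sketch: choose a training backend per volunteer node.
# Flags and backend names are assumptions for illustration only.

def pick_backend(has_cuda: bool, has_rocm: bool, has_vulkan: bool) -> str:
    """Prefer vendor-native stacks; fall back to a portable one."""
    if has_cuda:
        return "cuda"    # Nvidia nodes: best kernels/tooling today
    if has_rocm:
        return "rocm"    # AMD nodes via HIP/ROCm
    if has_vulkan:
        return "vulkan"  # portable fallback; training support is weak
    return "cpu"         # last resort, mostly for testing

print(pick_backend(True, False, True))   # Nvidia node -> "cuda"
print(pick_backend(False, False, True))  # Vulkan-only node -> "vulkan"
```

The hard part isn't the dispatch, of course; it's that the non-CUDA branches would need competitive training kernels to exist at all.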
Data Curation and Quality
We'd need to build our own datasets across a variety of tasks, scrub them for PII, and check quality, which would take experts for each given task. We'd also need somewhere to store all that data and a repeatable process for the curation, PII removal, and quality checking.
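The PII-scrubbing step at least has an obvious cheap first pass. A minimal sketch assuming regexes are enough for emails and phone numbers; a real pipeline would need NER-based detection and human review on top:

```python
import re

# First-pass PII scrubber sketch. Patterns are illustrative and will
# both miss real PII and over-match; this is triage, not a guarantee.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def scrub(text: str) -> str:
    """Replace obvious PII spans with typed placeholders."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(scrub("Mail me at jane.doe@example.com or call +1 (555) 123-4567"))
# -> Mail me at [EMAIL] or call [PHONE]
```

Names, addresses, and IDs embedded in free text are where regexes fall over, which is exactly where the per-task experts come in.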
Decentralised Compute Usage
Assuming we can solve the two above, we'd then need to train in high-latency, small-compute environments and checkpoint constantly, and the lack of ECC memory might hurt. I can't even imagine how we'd slice the work up and deal with individual GPUs' uptime being inconsistent.
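Flaky uptime is usually handled with lease-based work assignment: a coordinator hands out shards, and any shard whose worker misses its deadline gets reassigned. A toy sketch of that pattern, with all names and the central-coordinator assumption being illustrative:

```python
import time

class ShardQueue:
    """Toy lease-based shard assignment for unreliable volunteer nodes."""

    def __init__(self, shard_ids, lease_seconds=600):
        self.pending = list(shard_ids)  # not yet assigned
        self.leased = {}                # shard_id -> lease expiry time
        self.done = set()
        self.lease_seconds = lease_seconds

    def claim(self, now=None):
        """Hand out a shard; reclaim any expired leases first."""
        now = time.time() if now is None else now
        for sid, expiry in list(self.leased.items()):
            if expiry <= now:           # worker went dark
                del self.leased[sid]
                self.pending.append(sid)
        if not self.pending:
            return None
        sid = self.pending.pop(0)
        self.leased[sid] = now + self.lease_seconds
        return sid

    def complete(self, sid):
        """Worker checkpointed and uploaded its result."""
        self.leased.pop(sid, None)
        self.done.add(sid)

q = ShardQueue(["shard-0", "shard-1"], lease_seconds=600)
a = q.claim(now=0)       # "shard-0", leased until t=600
b = q.claim(now=0)       # "shard-1"
q.complete(b)
print(q.claim(now=601))  # shard-0's lease expired -> reassigned
```

This handles dropped nodes but not the harder problems the paragraph raises: verifying untrusted results (ECC-less or adversarial nodes) and keeping gradient sync useful over high latency.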
Defining what types of models to build
You'll have super users wanting 400B+, which seems right as a baseline to distill from, but the community might be heavily torn across the 30B-200B range over what actually gets built.
Getting people who actually know how to train.
All this seems like a lot, but I think it should be discussed more, because we can't expect our free lunch to last forever, and we should work out whether there's even a chance of a community-driven way to do this.
Any thoughts? I'm sure I've missed a lot more issues and challenges, or misunderstood some.