CERN eggheads burn AI into silicon to stem data deluge
The operating system of the universe isn’t going to debug itself
feature CERN's AI operation is nothing like those of today's agentic jockeys, who mostly rely on pre-set weights and generic TPUs and GPUs to generate their slop. CERN burns custom nanosecond-speed AI into the silicon itself, just to eliminate excess data.
Like the major league pitcher who shows up for his kid's take-your-parent-to-school day, CERN's Thea Aarrestad gave a presentation at the virtual Monster Scale Summit earlier this month about meeting a set of ultra-stringent requirements that few of her peers will ever face.
Aarrestad is an assistant professor of particle physics at ETH Zurich. At CERN (the European Organization for Nuclear Research), she uses machine learning to optimize data collection from the Large Hadron Collider (LHC). Her specialty is anomaly detection, a core component of any proper observability system.
Each year the LHC produces 40,000 exabytes of unfiltered sensor data alone, or about a fourth of the size of the entire Internet, Aarrestad estimated. CERN can't store all that data. As a result, "we have to reduce that data in real time to something we can afford to keep," she said.
By "real time," she means extreme real time. The LHC detector systems process data at speeds up to hundreds of terabytes per second, far more than Google or Netflix, whose latency requirements are also far easier to hit.
"Algorithms processing this data must be extremely fast," Aarrestad said. So fast that decisions must be burned into the chip design itself.
Smash burgers
Housed in a 27-kilometer ring buried about a hundred meters underground, straddling the border between Switzerland and France, the LHC smashes subatomic particles together at near-light speeds. The resulting collisions are expected to produce new types of matter that fill out our understanding of the Standard Model of particle physics — the operating system of the universe.
At any given time, there are about 2,800 bunches of protons whizzing around the ring at nearly the speed of light, separated by 25-nanosecond intervals. Just before they reach one of the four underground detectors, specialized magnets squeeze these bunches together to increase the odds of an interaction. Nonetheless, a direct hit is incredibly rare: out of the billions of protons in each bunch, only about 60 pairs actually collide during a crossing.
When particles do collide, their energy is converted into a mass of new outgoing particles (E=mc² in the house!). These new particles "shower" through CERN's detectors, leaving traces "which we try to reconstruct," she said, in order to identify any new particles produced in the ensuing melee.
Each collision produces a few megabytes of data, and there are roughly a billion collisions per second, resulting in about a petabyte of data per second (roughly the size of the entire Netflix library, every second).
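Those figures are easy to sanity-check with back-of-the-envelope arithmetic (assuming 1 MB per collision, the low end of "a few megabytes"):

```python
MB = 10**6   # bytes per (decimal) megabyte
PB = 10**15  # bytes per petabyte

collision_size = 1 * MB      # "a few megabytes" per collision; 1 MB assumed here
collisions_per_sec = 10**9   # roughly a billion collisions per second

rate = collision_size * collisions_per_sec
print(rate / PB)  # → 1.0 petabyte per second
```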
Rather than try to transport all this data up to ground level, CERN found it more feasible to create a monster-sized edge compute system to sort out the interesting bits at the detector level.
Gargantuan edge compute
"If we had infinite compute we could look at all of it," Aarrestad said. But less than 0.02% of this data actually gets saved and analyzed. It is up to the detectors themselves to pick out the action scenes.
The detectors, built on ASICs, buffer the captured data for up to 4 microseconds, after which the data "falls over the cliff," forever lost to history if it is not saved.
Making that decision is the "Level One Trigger," an aggregate of about 1,000 FPGAs that digitally reconstructs each event from reduced event information sent up from the detector over fiber optic lines at about 10 TB/sec. The trigger produces a single value: either an "accept" (1) or a "reject" (0).
Making the decision to keep or lose a collision is the job of the anomaly-detection algorithm. It has to be incredibly selective, rejecting more than 99.7 percent of the input outright. The algo, affectionately named AXOL1TL, is trained on the "background" — the areas of the Standard Model that have largely been sussed out already. It knows the typical topology of a standard collision, allowing it to instantly flag events that fall outside those boundaries. As Aarrestad put it, it's hunting for "rare physics."
The algorithm must make a decision within 50 nanoseconds. Only about 0.02% of all collision data, or about 110,000 events per second, make the cut, and are subsequently saved and transported to ground level. Even this slimmed-down throughput results in terabytes per second being sent up to the on-ground servers.
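As a rough illustration (a toy stand-in, not AXOL1TL; the anomaly score and threshold below are invented), a trigger of this kind boils down to scoring each event against learned "background" behavior and emitting one bit:

```python
import random

random.seed(0)

def anomaly_score(event):
    """Toy stand-in for an anomaly score: squared distance of the event's
    features from the 'background' mean (zero here). The real AXOL1TL score
    comes from a neural network trained on known Standard Model physics."""
    return sum(x * x for x in event)

def level_one_trigger(event, threshold=20.0):
    """Emit the trigger's single bit: 1 to accept, 0 to reject."""
    return 1 if anomaly_score(event) > threshold else 0

# A stream of 10,000 background-like events: almost all get rejected.
events = [[random.gauss(0, 1) for _ in range(4)] for _ in range(10_000)]
accepted = sum(level_one_trigger(e) for e in events)
print(f"accepted {accepted} of 10,000")  # only a handful survive
```

The threshold does the heavy lifting: it is tuned so that the overwhelming majority of ordinary collisions score below it, leaving only outliers to be saved.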
Once on the surface, the data goes through a second round of filtering, called the "High Level Trigger," which again discards the vast majority of captured collisions, identifying only about 1,000 interesting collisions out of the 100,000 events per second coming through the pipe. This system uses 25,600 CPUs and 400 GPUs to reconstruct the original collisions and analyze the results, producing about a petabyte a day.
"This is the data we will actually analyze," Aarrestad said.
From there the data is replicated across 170 sites in 42 countries, where it can be analyzed by researchers worldwide, with an aggregate power of 1.4 million computer cores.
A hothouse environment for AI
The LHC detectors are a hothouse environment rarely encountered by AI. So much so that the CERN engineers had to create their own toolbox.
Sure, there are already plenty of real-time libraries for consumer applications such as noise-cancelling headphones, the sort of workloads benchmarked by MLPerf Mobile and MLPerf Tiny. But they don't come anywhere close to supporting the streaming data rates and ultra-low latencies CERN requires.
So CERN trained its machine learning models "to be small from the get-go," she said. They were quantized, pruned, parallelized, and distilled down to the essential knowledge only. Every operation on an FPGA is quantized: unique bitwidths were defined for each parameter, and they were made differentiable so they could be optimized using gradient descent.
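A minimal sketch of the fixed-point quantization idea (the bitwidths and function below are illustrative, not CERN's code): every value is snapped to a signed fixed-point grid defined by a total bitwidth and an integer bitwidth, and anything outside the representable range saturates.

```python
def quantize(x, total_bits=8, int_bits=2):
    """Fake-quantize x to a signed fixed-point format with total_bits bits,
    int_bits of which sit left of the binary point (bitwidths illustrative)."""
    frac_bits = total_bits - int_bits
    scale = 2 ** frac_bits
    lo = -(2 ** (total_bits - 1)) / scale        # most negative representable
    hi = (2 ** (total_bits - 1) - 1) / scale     # most positive representable
    q = round(x * scale) / scale                 # snap to the fixed-point grid
    return min(max(q, lo), hi)                   # saturate out-of-range values

print(quantize(0.7, 8, 2))   # 0.703125 (nearest multiple of 1/64)
print(quantize(5.0, 8, 2))   # 1.984375 (saturates at the type's maximum)
```

During quantization-aware training, a differentiable version of this rounding step lets gradient descent pick weights that survive the precision loss.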
The engineering team developed a transpiler, hls4ml, that rewrites the model as C++ code targeted at specific platforms, so it can run on an accelerator, a system-on-a-chip, or a custom FPGA, or even be used to "print silicon" as an ASIC.
The detector architecture breaks from the traditional Von Neumann model of memory-processor-I/O. Nothing is sequentially driven. Rather, it is based on the "availability of data," she said. "As soon as this data becomes available, the next process will start."
Most crucially, decisions must be made on-chip – nothing can be handed off to even very fast memory. Every piece of hardware is tailored for a specific model. Decisions take place at design time. Each layer of FPGAs is a separate compute unit.
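The dataflow idea can be sketched with Python generators, each stage firing as soon as its input arrives rather than waiting on a central program counter (the stage logic here is invented for illustration):

```python
# Each stage consumes values the moment the previous stage yields them.
def digitize(raw_stream):
    for sample in raw_stream:
        yield round(sample, 2)          # stand-in for sensor digitization

def reconstruct(digitized):
    for sample in digitized:
        yield sample * 2.0              # stand-in for event reconstruction

def trigger(reconstructed, threshold=1.0):
    for value in reconstructed:
        yield 1 if value > threshold else 0   # the single accept/reject bit

raw = iter([0.123, 0.9, 0.051])
bits = list(trigger(reconstruct(digitize(raw))))
print(bits)  # → [0, 1, 0]
```

On the FPGA the "stages" are physical circuits wired in a pipeline, so all of them work on different events simultaneously; the generator chain only mimics the data-driven ordering.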
A good chunk of the on-chip silicon is given over to pre-calculations, saving the processing needed to do each calculation anew: the output for every possible input is stored in a lookup table.
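With quantized inputs, the table stays small enough to enumerate completely. A sketch of the trade (tanh and the 8-bit format are illustrative choices, not CERN's specifics):

```python
import math

# Precompute an activation function over every representable value of a
# small fixed-point input: spend memory up front, skip the math at run time.
BITS, FRAC = 8, 6
SCALE = 2 ** FRAC
TANH_LUT = {code: math.tanh(code / SCALE)
            for code in range(-2 ** (BITS - 1), 2 ** (BITS - 1))}  # 256 entries

def tanh_fast(code):
    """O(1) activation: one table read instead of evaluating tanh."""
    return TANH_LUT[code]

print(len(TANH_LUT))            # 256 possible inputs, all precomputed
print(tanh_fast(64))            # same as math.tanh(1.0)
```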
Naturally, you can't put huge models on these slivers of silicon. There is no room for hulking transformer-style deep learning models here. This is where CERN found tree-based models to be very powerful compared to deep learning ones.
In CERN's experience, tree-based models offer the same performance at a fraction of the cost of deep learning models. This is not surprising, given that the Standard Model can be viewed as a collection of tabular data: for each collision, the LHC spits out a structured set of discrete measurements.
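A tiny hand-written example shows why trees map so well onto hardware (the feature names and thresholds below are invented for illustration): a depth-limited tree is just a few nested comparisons, each of which becomes a comparator circuit on an FPGA.

```python
def tiny_tree(event):
    """Hand-written depth-2 decision tree over per-collision tabular
    features. Returns 1 (interesting) or 0 (background).
    Feature names and thresholds are made up for illustration."""
    if event["missing_energy"] > 100.0:                       # GeV, invented cut
        return 1
    if event["n_jets"] >= 4 and event["lead_jet_pt"] > 250.0:  # invented cut
        return 1
    return 0

print(tiny_tree({"missing_energy": 12.0, "n_jets": 2, "lead_jet_pt": 80.0}))   # → 0
print(tiny_tree({"missing_energy": 240.0, "n_jets": 1, "lead_jet_pt": 30.0}))  # → 1
```

A boosted ensemble of such trees evaluates every comparison in parallel in a handful of clock cycles, with no multipliers needed.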
More data, please
CERN is trying to measure all of the parameters of collisions to the 5-sigma level – a confidence corresponding to about a one-in-3.5-million chance of a statistical fluke, the gold standard for claiming a discovery. The Higgs boson was found to this standard.
The LHC has found at least 80 other hadrons, or particles held together by the strong nuclear force, including one just last week.
The hunt is on for new processes that occur in fewer than one in a trillion collisions.
At the end of this year, the LHC is shutting down to make way for the High Luminosity LHC, due to become operational in 2031. It will provide more of the sweet, sweet data particle physicists crave.
It will have more powerful magnets to focus the beams on very tiny spots. The bunches of protons will be doubled in size ("so there is more of a probability that those protons will talk to each other").
That means a lot more collisions and a 10-fold increase in data, leading to a much denser "event complexity." The event size jumps from 2MB to 8MB, and the resulting streams of data will jump from 4 Tb/sec to 63 Tb/sec.
The detectors are being upgraded to identify each collision, then track each particle-pairing back to its original collision point – all within a few microseconds.
While the frontier AI labs build ever-larger models, CERN is, in many ways, heading in the opposite direction, embracing aggressive anomaly detection, heterogeneously quantized transformers, and other tricks to make the AI smaller and faster than ever. When building our understanding of the universe, it is sometimes better to know what information to throw away. ®
