Spark creator bags computing gong for making big data a little bit smaller
ACM salutes Databricks co-founder Matei Zaharia with $250K prize
The Association for Computing Machinery (ACM) has awarded its annual Prize in Computing to Matei Zaharia for his work developing open source data and analytics software, including the widely used Apache Spark analytics engine.
The ACM Prize in Computing recognizes early-to-mid-career computer scientists whose work has had a broad and lasting impact. The award carries a $250,000 prize, with financial support provided by an endowment from tech services and consultancy company Infosys.
Few would insist Zaharia needs the money right now, though. After developing Apache Spark as part of his PhD at UC Berkeley, Zaharia went on to co-found Databricks, which provides an analytics and machine learning platform based on Spark and other technologies. The company has an estimated value of $130 billion.
Zaharia has helped develop a number of other open source projects, including Delta Lake, an open source storage framework governed by the Linux Foundation, and MLflow, an open source platform for machine learning lifecycles.
But he is best known for Apache Spark, which was widely adopted by the machine learning and analytics community. It is available from the leading cloud providers and data platforms such as Snowflake and Cloudera.
In an interview with The Register, Zaharia described how he developed the new approach to distributed computing to use memory more reliably and accelerate computations. It also opened up so-called "big data" computing to a new set of users.
When Zaharia started work on Spark around 2010, analyzing "big data" generally meant using MapReduce, the Java-based programming model that ran on the Hadoop Distributed File System, plus a fair bit of software engineering.
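For readers who never had the pleasure, the MapReduce model can be sketched in a few lines. What follows is a toy, single-machine illustration in Python rather than Hadoop's Java API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this example. It shows only the shape of the model: emit key-value pairs, group them by key, then reduce each group.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group emitted values by key, the step a framework performs between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the grouped values to get one count per word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines):
    """Run the three stages in order on an in-memory list of lines."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    sample = ["big data is big", "data about data"]
    print(word_count(sample))
```

In a real cluster each stage is spread across many machines, with the framework handling the distribution and the grouping step. That is where the software engineering effort mentioned above comes in.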
Zaharia took inspiration from researchers who were using big data for tasks such as machine learning and discovering new viruses. "These are really interesting use cases where they won't sit down and learn Java and spend many weeks building an application. We wanted to make it as easy as possible for them to do their stuff," he said.
Part of the plan to broaden the appeal was to introduce new programming languages. As well as Java, users can work in Scala, statistical language R, C#, and Python, a high-level general-purpose language that has achieved widespread popularity in machine learning. The de facto database language standard, SQL, was added in 2014.
ACM President Yannis Ioannidis said Zaharia's work has made a lasting impact on how data is used at scale. "By addressing key limitations in earlier systems, he developed technologies that quickly became standard tools for data analytics, machine learning, and artificial intelligence. Matei's open source philosophy has been essential: he made these tools accessible to all. His contributions continue to influence both research and industry, and I look forward to seeing where his current work on AI systems takes us next."
As well as working for Databricks, Zaharia has co-authored recent open source research, including DSPy and GEPA, which focus on optimizing prompts and models to improve AI agent quality for specific tasks. He has held academic roles at MIT and Stanford, and is now an associate professor of computer science at the University of California, Berkeley. ®



