State of Open Source on Hugging Face: Spring 2026
This post builds on an earlier analysis conducted in mid-2025, available here, which examined what the Hugging Face community is building. We recommend reading additional perspectives on the open source ecosystem in and outside of Hugging Face from the Data Provenance Initiative, Interconnects, OpenRouter and a16z, and MIT and the Linux Foundation. Because the Hugging Face ecosystem is distributed, these analyses combine work by Hugging Face and community members, each of which is credited appropriately.
Activity in the open source AI ecosystem has grown rapidly, with the number of users, model repositories, and dataset repositories all close to doubling. In 2025, Hugging Face grew to 11 million users, more than 2 million public models, and over 500,000 public datasets. This growth signals more than increased interest in open source; it reflects a shift toward active participation, with users increasingly creating derivative artifacts such as fine-tuned models, adapters, benchmarks, and applications rather than only consuming pre-trained systems.
Data from Hugging Face | Hugging Face's two million models and counting: Graph and story by AI World
The ecosystem remains highly concentrated. Approximately half of the models on Hugging Face have fewer than 200 total downloads, while the top 200 most downloaded models, just 0.01% of all models, account for 49.6% of all downloads.
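As a rough sketch of how such concentration figures can be checked, the `huggingface_hub` client can page through models sorted by downloads. Note that the API's default `downloads` field counts roughly the last 30 days rather than the all-time totals cited above, so the numbers it returns will differ from ours:

```python
from huggingface_hub import list_models

# Top 200 models by recent downloads (30-day counts, not all-time totals)
top = list(list_models(sort="downloads", direction=-1, limit=200))
for m in top[:5]:
    print(f"{m.id}: {m.downloads or 0:,} downloads (30 days)")
print(f"Top-200 combined: {sum(m.downloads or 0 for m in top):,}")
```

Computing the denominator, total downloads across all two-million-plus models, requires paging through the full listing, which is why these analyses are typically run as batch jobs.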
Specialized communities form around particular domains, languages, or problem areas, and often show sustained engagement and reuse even when their overall download counts are modest. Open source AI is best understood as a collection of overlapping sub-ecosystems rather than a single uniform market.
Open Source in Competition
More companies, both large and small, are building on open source. Over 30% of the Fortune 500 now maintain verified accounts on Hugging Face. Startups frequently use open models as default components: Thinking Machines built its Tinker model options entirely on open weights, while popular IDEs such as VS Code and Cursor support both open and closed models. Established American companies such as Airbnb have increased their engagement with the open ecosystem, and Hugging Face has seen more legacy companies upgrade their organizational subscriptions over the course of 2025.
Big Tech companies are frequently creating new repositories on the Hugging Face Hub; visualized side by side, their strong repository growth shows sustained investment over time, with NVIDIA emerging as the strongest contributor.
Data from Hugging Face | Big Tech Is All-In On Open-Source AI, Graph and story by AI World
Studies of open software more broadly suggest that the downstream value created by open artifacts far exceeds the cost of producing them. Similar dynamics are emerging in AI, where open models are reused, adapted, and specialized across thousands of downstream applications. Organizations that rely exclusively on closed systems often incur higher costs and face reduced flexibility in deployment and customization.
The Geography of Open Source
All-time downloads over the past four years show clear frontrunner regions in model popularity. The U.S. and China have historically been the top contributors, with the UK, Germany, and France as secondary contributors. Models developed by individual users or distributed organizations without a clear geographic base account for about half of all platform downloads.
Data from Hugging Face | Graph and Research from Longpre et al. “Economies of Open Intelligence: Tracing Power & Participation in the Model Ecosystem”
The geographic composition of the open source ecosystem has fundamentally changed. Hugging Face data shows China surpassing the U.S. in both monthly and overall downloads. Over the past year, Chinese models quickly rose to a plurality of downloads, at 41%.
Data and Graph from Hugging Face
Industry's share of overall development fell from around 70% before 2022 to roughly 37% in 2025. Meanwhile, independent or unaffiliated developers rose from 17% to 39% of all downloads over the same period, at times accounting for more than half of total usage. These individuals and small collectives focus on quantizing, adapting, and redistributing base models, and now steer a meaningful portion of what typical users can run and how innovations spread through the ecosystem.
Data from Hugging Face | Graph and Research from Longpre et al. “Economies of Open Intelligence: Tracing Power & Participation in the Model Ecosystem”
Different regions contribute in different ways. The United States and Western Europe have historically dominated through large industry labs (Google, Meta, OpenAI, Stability AI), while China has increasingly led on both releases and adoption. France, Germany, and the UK continue to contribute through research organizations, national AI initiatives, and specialized model families. Ecosystems supporting a variety of contributors and organizational forms tend to produce more widely adopted artifacts.
Countries, Organizations, and Individual Users
Popular models from startups were more geographically widespread, with France and South Korea standing out as competitive countries. Notably, the fourth most popular entity developing new trending models was individual users, not organizations. Creating competitive models at the user level is more accessible than ever before.
Data and Graph from Hugging Face
Between the U.S. and China
Of the models newly created in 2025, the majority of trending models were either developed in China or derivative of a model developed in China. The most popular models were developed by large organizations, predominantly from the U.S. and China. For more on the Chinese AI ecosystem, read our three-part series reflecting on the changes in the year since the "DeepSeek Moment": part one on strategic changes, part two on architectural changes, and part three on organizations and the future.
In 2025, China’s AI ecosystem steered heavily into open source, following the viral release of DeepSeek’s R1 model in January. The number of competitive Chinese organizations releasing models and the number of repositories on Hugging Face skyrocketed. Baidu went from zero releases on the Hub in 2024 to over 100 in 2025. ByteDance and Tencent each increased releases by eight to nine times. Organizations that had previously favored closed approaches, including Baidu and MiniMax, shifted decisively toward open releases.
Data and Graph from Hugging Face
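The per-organization release counts above can be approximated from the public Hub API. A minimal sketch, assuming the org handles shown (some labs publish under several accounts) and that your `huggingface_hub` version supports the `createdAt` expand field:

```python
from collections import Counter
from huggingface_hub import list_models

# Tally model repos by creation year for a few organizations
for org in ["baidu", "tencent"]:
    years = Counter(
        m.created_at.year
        for m in list_models(author=org, expand=["createdAt"])
        if m.created_at
    )
    print(org, dict(sorted(years.items())))
```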
A comparable set of popular U.S. organizations has consistently contributed a high volume of repositories over time. Meta and its former Facebook research organization account for a significant proportion of open releases, as does Google to a lesser extent.
Data and Graph from Hugging Face
Placed side by side, the steep upward trajectory of repository growth among popular Chinese organizations emerges as a key strategic difference.
Data and Graph from Hugging Face
Global Open Source and Sovereignty
Open source AI is increasingly tied to questions of sovereignty. Open weight models allow governments and public institutions to fine-tune systems on local data under national legal frameworks. Models that can be deployed on domestic hardware reduce reliance on foreign-controlled cloud infrastructure. Transparency around model architecture, training processes, and evaluation supports regulatory review and public accountability. Read more about the open source approach to sovereignty here.
At the national level, governments are taking action. South Korea's National Sovereign AI Initiative, launched mid-2025, named national champions LG AI Research, SK Telecom, Naver Cloud, NC AI, and Upstage to produce competitive domestic models. Three models from South Korea trended simultaneously on the Hugging Face Hub in February 2026. In March 2026, South Korea and U.S. startup Reflection AI announced a data center partnership, also bringing frontier open weight models to South Korea.
Switzerland's Swiss AI initiative and various EU-funded projects reflect similar priorities. The UK's principle of "public money, public code" has influenced several government-backed AI initiatives.
Hugging Face Trending Page February 2026
These investments in open-source and open weight AI are already paying dividends for countries with thriving AI training ecosystems of their own: models and datasets are typically used most in the regions where they are developed, as developers often turn to the models that best represent their languages and reflect similar technical and application requirements.
Data and Graph from Hugging Face
Model Popularity
The most liked models on the Hub reflect community attention, whether as a way to return to or reference a model or as a signal of general popularity. While this metric does not always reflect usage, attention accumulated over time can signal interest. In one year, the most liked models went from predominantly U.S.-developed, led by Meta’s Llama family, to an international mix with China’s DeepSeek-R1 at the top.
Data and Graphic from Hugging Face
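This leaderboard is straightforward to reproduce: the Hub API supports sorting by likes directly.

```python
from huggingface_hub import list_models

# The ten most liked models on the Hub right now
for m in list_models(sort="likes", direction=-1, limit=10):
    print(f"{m.id}: {m.likes or 0:,} likes")
```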
Papers and Scientific Contributions
While the value of scientific contributions can be measured in many ways, the Hub's upvote feature shows that papers from large AI organizations are widely appreciated by community members. Notably, the most upvoted papers come from large organizations, mostly in the U.S. and China. The majority of the top organizations are Chinese Big Tech companies, with ByteDance sharing a high volume of high-impact papers.
Space by Hugging Face | PaperVerse Explorer
Among Hugging Face's Daily Papers, a set of papers curated by Hugging Face's AK, the papers that reference model and dataset creation, and thus show the most open source adoption, come from a generally diverse range of contributors. Medical papers stand out as influential, while Big Tech's influence is comparatively sparse.
Data from Hugging Face | Graphic and story by AI World
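For readers who want to run their own analysis, the Daily Papers page is backed by a lightly documented public endpoint. The URL and the response fields used below (`paper.title`, `paper.upvotes`) are assumptions based on community usage, so verify them before building on this sketch:

```python
import requests

# Unofficial daily-papers endpoint; fields are assumptions, not a stable API
resp = requests.get("https://huggingface.co/api/daily_papers", params={"limit": 10})
resp.raise_for_status()
for entry in resp.json():
    paper = entry.get("paper", {})
    print(paper.get("title"), "| upvotes:", paper.get("upvotes"))
```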
Derivative Models
How our community members choose to build on models, whether via fine-tuning, merging, or other methods, reflects model popularity and usability. Alibaba alone has more derivative models than Google and Meta combined, with the Qwen family constituting more than 113,000 derivative models. When including all models that tag Qwen, that number balloons to over 200,000.
Data and Graph from Hugging Face
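Derivative counts like these come from the `base_model` metadata that model cards declare, which the Hub turns into filterable tags. A minimal sketch; the exact filter string is an assumption built on the Hub's `base_model:<repo_id>` tag convention:

```python
from huggingface_hub import list_models

# Count models whose cards declare this repo as their base model
base = "Qwen/Qwen2.5-7B-Instruct"
count = sum(1 for _ in list_models(filter=f"base_model:{base}"))
print(f"{count} models declare {base} as their base")
```

Summing such counts across every model in a family (and across fine-tune, quantization, merge, and adapter relationships) yields family-level totals like the Qwen figures above.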
Adoption and Accessibility
Model development has increasingly emphasized accessibility alongside scale. Smaller models are downloaded and deployed at far higher rates than very large systems, reflecting practical constraints around cost, latency, and hardware availability.
This small-model dominance occurs in part because far more models are released at that size. But even after normalizing for this, data from the ATOM Project's Relative Adoption Metric shows that the median of the top-10 models in the 1-9B parameter range is downloaded only about 4x more than the median for models above 100B parameters. Automated systems and CI pipelines further inflate small-model download counts, but the trend toward smaller, deployable models is real.
Data from Hugging Face | Graph and Article by ATOM
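To make the normalization idea concrete, here is a toy illustration of a relative-adoption-style comparison: take the median of the top-10 download counts within each parameter bucket, so a bucket with many mediocre releases is not rewarded for volume alone. All numbers below are invented for illustration:

```python
from statistics import median

models = [  # (size_bucket, downloads) -- made-up figures
    ("1-9B", d) for d in [9_500_000, 7_200_000, 5_100_000, 4_800_000, 4_100_000,
                          3_900_000, 3_400_000, 3_100_000, 2_800_000, 2_500_000]
] + [
    (">100B", d) for d in [2_400_000, 1_600_000, 1_300_000, 1_100_000, 980_000,
                           910_000, 870_000, 820_000, 790_000, 750_000]
]

for bucket in ("1-9B", ">100B"):
    top10 = sorted((d for b, d in models if b == bucket), reverse=True)[:10]
    print(bucket, "median of top-10:", median(top10))
# Ratio of the two medians here is roughly 4x, the scale of the ATOM finding
```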
Engagement with open models tends to peak almost immediately after release, then slow. Mean engagement duration is approximately 6 weeks. Continuous improvement and frequent updates have become critical for maintaining relevance. DeepSeek's successive releases (V3, R1, V3.2) kept it competitive even as challengers emerged. Organizations that stagnate in development tend to lose share quickly to those with frequent updates or domain-specific fine-tunes.
Data from Hugging Face | Graph and Research from Choksi et al. "The Brief and Wondrous Life of Open Models"
The mean size of downloaded open models rose from 827M parameters in 2023 to 20.8B in 2025, driven largely by quantization and mixture-of-experts architectures. The median, however, increased only marginally, from 326M to 406M parameters. This divergence indicates that high-end LLM users are pulling up the mean while underlying small-model usage remains stable.
Data from Hugging Face | Graph and Research from Longpre et al. "Economies of Open Intelligence: Tracing Power & Participation in the Model Ecosystem"
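The mean-versus-median divergence is a classic skew effect, easy to see with a toy distribution: a thin tail of heavily downloaded large LLMs drags the mean up while the median stays anchored to the small-model bulk. Figures below are invented:

```python
from statistics import mean, median

# Parameter counts (in millions), download-weighted: mostly small models,
# plus a thin tail of large LLMs
sizes_m = [300] * 900 + [7_000] * 60 + [70_000] * 30 + [400_000] * 10
print(f"mean:   {mean(sizes_m):,.0f}M parameters")   # pulled up by the tail
print(f"median: {median(sizes_m):,.0f}M parameters") # stays at the bulk
```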
Performance differences between frontier models and smaller systems often narrow rapidly through fine-tuning and task-specific adaptation. On the Hub, models with hundreds of millions of parameters support search, tagging, and document processing workflows, while models in the single-digit billions are widely used for coding, reasoning, and multimodal tasks. As a result, most major model developers now release families of models spanning a range of sizes. The rise of capable small models shifts autonomy closer to the edge, reducing dependency on centralized cloud providers.
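As one concrete illustration of the small-model workflows described above, a model of a few hundred million parameters can handle a tagging-style task locally via `transformers`. The model choice here is illustrative, not an endorsement:

```python
from transformers import pipeline

# ~400M-parameter model doing zero-shot document tagging on CPU or GPU
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
result = classifier(
    "Quarterly revenue grew 12% on strong cloud demand.",
    candidate_labels=["finance", "sports", "politics"],
)
print(result["labels"][0], round(result["scores"][0], 3))
```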
Compute, Hardware, and Open Source
Open source AI development is closely linked to hardware trends. Most models are optimized for NVIDIA GPUs, but support for AMD hardware continues to expand. Stability AI model collections now optimize for both NVIDIA and AMD platforms. Libraries increasingly target both, and tooling has improved to make cross-hardware deployment more straightforward. In 2025 Hugging Face launched the Kernel Hub to load and run kernels optimized for NVIDIA and AMD GPUs.
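A minimal sketch of the Kernel Hub workflow with the `kernels` library, following the pattern from its announcement; the repo name is a real community example, but treat the exact function surface as version-dependent:

```python
import torch
from kernels import get_kernel

# Fetches a pre-compiled, hardware-optimized kernel from the Hub
activation = get_kernel("kernels-community/activation")

x = torch.randn((10, 10), dtype=torch.float16, device="cuda")
y = torch.empty_like(x)
activation.gelu_fast(y, x)  # fast GeLU written into y by the fetched kernel
```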
In parallel, Chinese open models are being released with explicit support for domestically developed chips. Alibaba has invested in inference-focused chip architectures designed to fill Chinese data centers with hardware capable of running open source models locally.
While access to compute remains a core necessity for developing and deploying AI models, open-source and open-weight models are helping the ecosystem break away from treating compute as the be-all and end-all, with a growing number of models at every performance level pushing efficiency to 10x to 1000x lower costs than flagship models from the largest developers.
Data and Graphic from Hugging Face
Still, the question of infrastructure investment for open source remains urgent. Public funding for data centers capable of training and serving open models has become a growing policy discussion, particularly in Europe and the UK. The gap between the compute resources available to large closed-model companies and those accessible to the open source community continues to shape what is feasible in open development.
Sub-Communities: Robotics
Robotics has emerged as one of the fastest-growing sub-communities on Hugging Face. The numbers are striking: robotics datasets grew from 1,145 in 2024 to 26,991 in 2025, climbing from rank 44 to the single largest dataset category on the Hub in just three years. For comparison, text generation, the second-largest category, had only around 5,000 datasets in 2025.
Data from Hugging Face | Graph and Story by AI World
Community-contributed datasets span everything from household manipulation tasks to autonomous driving. The largest multimodal dataset for spatial intelligence, Learning to Drive (L2D), was released through a LeRobot collaboration with Yaak. Datasets like RoboMIND, with over 107,000 real-world trajectories across 479 distinct tasks and multiple robot embodiments, provide the kind of scale and diversity needed for training generalizable robotic policies.
Hugging Face's acquisition of Pollen Robotics opened sales of open source robots to industry and academic labs, as well as everyday hobbyists. LeRobot, Hugging Face's open source robotics library providing models, datasets, and tools for real-world robotics in PyTorch, covering imitation learning, reinforcement learning, and vision-language-action models, has experienced rapid growth: over the past year, its GitHub repository stars nearly tripled.
Data from GitHub | Graphic from star-history.com
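Getting started with these community datasets is a few lines of LeRobot. A sketch, assuming a recent release: the import path has moved between versions (`lerobot.common.datasets` in older ones, `lerobot.datasets` in newer ones), and attribute names may differ in yours:

```python
from lerobot.datasets.lerobot_dataset import LeRobotDataset

# Episodes of a push-T manipulation task, streamed from the Hub
ds = LeRobotDataset("lerobot/pusht")
print(ds.num_episodes, "episodes,", ds.num_frames, "frames")

frame = ds[0]  # dict of camera images, robot state, and actions as tensors
print(frame.keys())
```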
Sub-Communities: AI for Science
Scientific research has become another particularly active area. Open models and datasets are increasingly used for protein folding, molecular dynamics, drug discovery, and scientific data analysis. All frontier AI companies now have dedicated science teams, though much current focus remains on literature discovery rather than direct experimentation.
Space by Hugging Face | Science Release Heatmap
Community-led projects have formed around shared research goals, often involving hundreds of contributors working across institutions and disciplines. These efforts highlight the role of open source as a mechanism for coordinating large-scale, interdisciplinary work that would be difficult to organize through traditional academic or corporate structures alone.
Looking Forward
The open source AI ecosystem continues to evolve through a combination of global participation, technical specialization, and institutional adoption. Several trends are likely to define the next phase.
The geographic rebalancing of power is accelerating. Western organizations increasingly seek commercially deployable alternatives to Chinese models, creating urgency around efforts like OpenAI's GPT-OSS, AI2's OLMo, and Google's Gemma to offer competitive open options from U.S. and European developers. Whether these efforts can match the adoption momentum of Qwen and DeepSeek will be a defining question of 2026.
The growth of sub-communities in robotics and science suggests that open source AI is expanding beyond language and image generation into the physical and experimental domains. The infrastructure, norms, and coordination mechanisms developed around text and image models are being adapted for new modalities and use cases.
For researchers, developers, companies, and governments, open source remains a foundational layer for building, evaluating, and governing AI systems. As agent deployments increase, open source and its interoperability will be key for agents to thrive. Its trajectory over the past year makes one thing clear: the open source ecosystem is where much of the practical work of AI development, adaptation, and deployment takes place, and its influence on the broader AI landscape continues to grow.
Thank you to the Hugging Face community for continuing to build the foundation of the AI ecosystem 🤗