BigCode
Open and responsible research and development of large language models for code. #BigCodeProject run by @huggingface + @ServiceNowRSRCH
Oct 8, 2025 6 tweets 3 min read
Introducing BigCodeArena, a human-in-the-loop platform for evaluating code through execution.

Unlike existing open evaluation platforms, which collect human preferences on text alone, it lets people interact with runnable code to assess functionality and quality in any language.

Why does this matter?

Benchmarks like HumanEval only scratch the surface. Reading code “by eye” is error-prone. True quality emerges when you actually run it: web apps render, games play, edge cases break.

BigCodeArena makes execution feedback the default.
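The core idea — judging code by what it does rather than how it reads — can be sketched in a few lines. This is a minimal, hypothetical harness (not BigCodeArena's actual implementation): run a candidate snippet in a subprocess with a timeout and record whether it crashed, hung, or produced output.

```python
import subprocess
import sys

def run_candidate(code: str, timeout: float = 5.0) -> dict:
    """Execute a candidate Python snippet in a subprocess and capture the result.

    Running the code surfaces crashes, hangs, and wrong output that
    reading it "by eye" would miss.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout,
        )
        return {"ok": proc.returncode == 0,
                "stdout": proc.stdout, "stderr": proc.stderr}
    except subprocess.TimeoutExpired:
        return {"ok": False, "stdout": "", "stderr": "timeout"}

# Compare two model outputs for the same task by their behavior, not their looks.
result = run_candidate("print(sum(range(10)))")
```

A real arena adds sandboxing, multi-language runtimes, and rendering for web apps and games; the execution-first principle is the same.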
Jun 8, 2023 7 tweets 4 min read
📣 Introducing ⭐ StarCoder+ & StarChat Beta!

We further trained StarCoderBase on the Falcon model's English web dataset to create StarCoder+, then instruction-tuned it to build StarChat Beta. Both models rank high on the LLM leaderboard, combining strong natural language performance with coding capabilities.

huggingface.co/HuggingFaceH4/…

StarCoderBase showed promise in natural language reasoning despite being trained solely on GitHub code, so we fine-tuned it on the English web dataset used in Falcon pre-training:

huggingface.co/bigcode/starco…
huggingface.co/datasets/tiiua…
May 4, 2023 11 tweets 6 min read
Introducing: 💫StarCoder

StarCoder is a 15B LLM for code with an 8k token context, trained only on permissively licensed data covering 80+ programming languages. It can be prompted to reach 40% pass@1 on HumanEval and to act as a Tech Assistant.

Try it here: shorturl.at/cYZ06r

Release thread🧵

In addition to chatting with StarCoder, you can use it to help you code via the new VSCode plugin. Pressing CTRL+ESC also checks whether the current code appeared in the pretraining dataset!

marketplace.visualstudio.com/items?itemName…
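The 40% pass@1 figure above comes from the standard pass@k metric for code generation: sample n completions per problem, count the c that pass the unit tests, and estimate the chance that at least one of k samples passes. The unbiased estimator (introduced with HumanEval) can be written as:

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k).

    n: total samples generated for the problem
    c: samples that passed the unit tests
    k: budget of attempts being evaluated
    """
    if n - c < k:
        # Fewer than k failures exist, so any k-subset contains a pass.
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)
```

For k=1 this reduces to the empirical pass rate c/n, averaged over all problems in the benchmark.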
Dec 22, 2022 15 tweets 7 min read
Announcing a holiday gift: 🎅SantaCoder - a 1.1B multilingual LM for code that outperforms much larger open-source models on both left-to-right generation and infilling!

Demo: hf.co/spaces/bigcode…
Paper: hf.co/datasets/bigco…
Attribution: hf.co/spaces/bigcode…

A🧵:

SantaCoder is trained on Python, Java, and JavaScript, and considerably outperforms larger multilingual models such as InCoder (6.7B) and CodeGen-Multi (2.7B)!

A lot of pieces from a lot of collaborators came together to get to that result:
Nov 29, 2022 7 tweets 3 min read
Between now and Christmas🎄 we are running a series of experiments to figure out the best pre-processing for code datasets such as The Stack. We'll share the W&B dashboards of these 🎅-models so you can follow along if you're interested!

We are training ~1B parameter models on the Python/Java/JavaScript subset of The Stack. On the architecture side, we want to evaluate the Fill-in-the-Middle (FIM) objective as well as multi-query attention.
Oct 27, 2022 12 tweets 6 min read
Introducing 📑 The Stack - a 3TB dataset of permissively licensed code in 30 programming languages.

hf.co/datasets/bigco…

You want your code excluded from the model training? There is an opt-out form and data governance plan:

bigcode-project.org/docs/about/the…

Let's take a tour🧵

Dataset collection: Using gharchive.org, over 220M repos were identified and 137M successfully cloned, amounting to over 50B files and 90TB of data. Filtering by file extension and permissive license yields 3TB of data. We also make a near-deduplicated version (1.5TB) available.
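Near-deduplication matters because code corpora are full of copies that differ only in whitespace or comments. The idea can be sketched with exact Jaccard similarity over token shingles — a toy version of what scalable pipelines approximate with MinHash/LSH; the function names and threshold here are illustrative, not The Stack's actual pipeline:

```python
def shingles(code: str, k: int = 5) -> set:
    """Set of k-token windows; near-copies share most of their shingles."""
    toks = code.split()
    return {tuple(toks[i:i + k]) for i in range(max(1, len(toks) - k + 1))}

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 1.0

def near_dedup(files: list, threshold: float = 0.85) -> list:
    """Greedy near-dedup: keep a file only if it is not too similar
    to any already-kept file."""
    kept, kept_sets = [], []
    for f in files:
        s = shingles(f)
        if all(jaccard(s, t) < threshold for t in kept_sets):
            kept.append(f)
            kept_sets.append(s)
    return kept
```

At 3TB scale the pairwise comparison above is infeasible, which is why production pipelines hash shingles (MinHash) and bucket candidates (LSH) instead of comparing every pair.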
Sep 26, 2022 9 tweets 3 min read
print("Hello world! 🎉")

Excited to announce the BigCode project led by @ServiceNowRSRCH and @huggingface! In the spirit of BigScience we aim to develop large language models for code in an open and responsible way.

Join here: bigcode-project.org/docs/about/joi…

A thread with our goals🧵

🌸Language models for code (Codex, CodeGen) and the applications they power (AI-assisted programming) are gaining traction. Some models have been released, but questions remain around data governance, the robustness of evaluation benchmarks, and the engineering behind them.