How we select models
The principles behind the Berget AI model catalogue: why we prefer open models, how we evaluate them, and what we optimise for
Our model catalogue reflects a set of deliberate choices about quality, transparency, and what "useful" actually means for teams building in Europe. Understanding those choices helps you understand why certain models are available, why others aren't, and what you can expect as the catalogue evolves.
Four principles
Every model we add is weighed against four criteria.
Open by default. We prefer models with open weights. Open models can be inspected, audited, and fine-tuned. They don't carry hidden data-use terms. When a model is open, you know what you're running.
Privacy and copyright by design. We don't add models trained on data with unclear provenance or models whose terms create downstream legal risk for your organisation. This matters especially for teams in regulated industries.
Performance relative to resource use. A model that requires ten times the compute to achieve a marginal quality improvement isn't a good trade-off for most use cases. We look for models that deliver strong results at a size that makes them practical to run at scale.
Nordic and Swedish relevance. Many of you build products for Scandinavian markets. We actively seek models with strong multilingual support and, where they exist, models trained specifically on Nordic languages. KB-Whisper is a good example: it's a Swedish speech recognition model from the National Library of Sweden that outperforms generic alternatives on Swedish audio.
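As a concrete illustration of the Nordic-relevance criterion, here is a minimal sketch of selecting KB-Whisper for Swedish audio. The endpoint path, model id, and field names are assumptions modelled on the common OpenAI-style transcription API, not confirmed Berget AI values; check the models overview for the exact identifiers.

```python
# Sketch only: builds the form fields for a multipart POST to an assumed
# /v1/audio/transcriptions endpoint. The model id "kb-whisper-large" is a
# hypothetical name -- verify it against the model catalogue.

def transcription_payload(audio_file: str, model: str = "kb-whisper-large") -> dict:
    """Form fields for an OpenAI-style audio transcription request."""
    return {
        "model": model,              # Swedish-tuned speech model
        "file": audio_file,          # path to the audio to transcribe
        "language": "sv",            # hint the language explicitly
        "response_format": "json",   # ask for structured output
    }

payload = transcription_payload("standup.wav")
```

Hinting the language explicitly (`"sv"`) matters in practice: it prevents a multilingual speech model from mis-detecting short or noisy Swedish clips as a neighbouring language.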
What we mean by "open models"
Open-weight models are released with their parameters publicly available. This is different from open-source software in some respects (licences vary, and not all open-weight models permit commercial use without restriction), but the core property is the same: the model can be examined, reproduced, and modified.
Open models benefit from collective improvement: researchers and practitioners publish findings, fine-tunes, and evaluations that closed models never receive. Iteration is faster because the community can identify failure modes and propose fixes without waiting for a vendor. Specialisation is possible because you can fine-tune on your own data without sending that data to a third party.
There's also a transparency argument. When a model's weights are public, its behaviour can be studied. That's not a guarantee of safety or correctness, but it's a meaningful property for teams that need to explain their systems to auditors or regulators.
How we evaluate new models
When a new model is released, we aim to have an initial evaluation within 48 hours. That evaluation covers standard benchmarks (MMLU for general knowledge and reasoning, HumanEval for code generation, HELM for broader capability assessment) alongside domain-specific tests we've built for Swedish-language tasks.
Benchmarks tell you something, but they don't tell you everything. A model that scores well on MMLU may still produce unreliable structured output or struggle with multi-turn instruction following. So we look at both: benchmark scores, and performance on the tasks you're likely to run in practice: classification, extraction, summarisation, code review, and multilingual generation.
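One of those practical checks can be sketched in a few lines: does the model actually return valid structured output? The schema and sample responses below are illustrative, not part of our real test suite.

```python
import json

# Required keys for a hypothetical classification task's JSON output.
REQUIRED_KEYS = {"category", "confidence"}

def valid_structured_output(raw: str) -> bool:
    """True if `raw` parses as a JSON object with the required keys
    and a confidence value in [0, 1]."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict) or not REQUIRED_KEYS <= data.keys():
        return False
    conf = data.get("confidence")
    return isinstance(conf, (int, float)) and 0.0 <= conf <= 1.0

print(valid_structured_output('{"category": "invoice", "confidence": 0.93}'))  # True
print(valid_structured_output("Sure! Here is the JSON: {...}"))                # False
```

The second case is the one benchmarks miss: a model that "knows" the answer but wraps it in conversational filler fails the check, and that failure mode only shows up when you test the task you actually run.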
We publish the results of this process in the models overview and capabilities pages. If you're evaluating a model for a specific use case and want to understand how it performed on a particular dimension, the test matrix lets you verify capabilities directly against the API.
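Verifying a capability directly can be as simple as sending a probe request and inspecting the response. The sketch below builds an OpenAI-style chat completions body that asks for a tool call; the endpoint conventions, model id, and tool schema are assumptions, so confirm the field names against the API reference before sending.

```python
import json

# Hypothetical probe: a model that supports tool use should reply with
# a tool_call for get_weather rather than free text. "example-model" is
# a placeholder id.

def tool_use_probe(model: str = "example-model") -> dict:
    """Request body for an OpenAI-style POST /v1/chat/completions
    that probes tool-use support."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": "What's the weather in Umeå?"}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
    }

body = json.dumps(tool_use_probe())  # serialise before POSTing
```

The same pattern works for other capabilities in the matrix: swap the probe (a JSON-mode request, a multimodal message, a streaming flag) and check whether the response honours it.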
What this means for the catalogue
We don't add every available open model. The catalogue is intentionally narrow. A model that doesn't meet the bar on privacy, performance, or practical utility doesn't belong here, regardless of how much attention it's received.
When we add a model, it's because we've tested it and believe it's worth running. When we remove one, it's because something better is available or the model no longer meets our criteria. We try to communicate those changes clearly so you can plan accordingly.
Further reading
Models overview
Available models for Berget AI serverless inference
Model capabilities
Capability matrix for Berget AI language models, including tool use, JSON output, streaming, multimodal input, and throughput
Choose a language model
A decision process for picking the right Berget AI language model for your use case
Model chains
How combining specialised models in sequence can outperform a single large model, and when to use this pattern