How we select models
The principles behind the Berget AI model catalogue: why we prefer open models, how we evaluate them, and what we optimise for
Our model catalogue reflects a set of deliberate choices about quality, transparency, and what "useful" actually means for teams building in Europe. Understanding those choices helps you understand why certain models are available, why others aren't, and what you can expect as the catalogue evolves.
Four principles
Every model we add is weighed against four criteria.
Open by default. We prefer models with open weights. Open models can be inspected, audited, and fine-tuned. They don't carry hidden data-use terms. When a model is open, you know what you're running.
Privacy and copyright by design. We don't add models trained on data with unclear provenance or models whose terms create downstream legal risk for your organisation. This matters especially for teams in regulated industries.
Performance relative to resource use. A model that requires ten times the compute to achieve a marginal quality improvement isn't a good trade-off for most use cases. We look for models that deliver strong results at a size that makes them practical to run at scale.
Nordic and Swedish relevance. Many of you build products for Scandinavian markets. We actively seek models with strong multilingual support and, where they exist, models trained specifically on Nordic languages. KB-Whisper is a good example: it's a Swedish speech recognition model from the National Library of Sweden that outperforms generic alternatives on Swedish audio.
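As a concrete illustration of the Nordic-relevance criterion, here is a minimal sketch of selecting KB-Whisper for Swedish audio. The endpoint path, model id, and field names are assumptions modelled on the common OpenAI-style transcription API, not confirmed Berget AI values; check the models overview for the exact identifiers.

```python
# Sketch only: builds the form fields for a multipart POST to an assumed
# /v1/audio/transcriptions endpoint. The model id "kb-whisper-large" is a
# hypothetical name -- verify it against the model catalogue.

def transcription_payload(audio_file: str, model: str = "kb-whisper-large") -> dict:
    """Form fields for an OpenAI-style audio transcription request."""
    return {
        "model": model,              # Swedish-tuned speech model
        "file": audio_file,          # path to the audio to transcribe
        "language": "sv",            # hint the language explicitly
        "response_format": "json",   # ask for structured output
    }

payload = transcription_payload("standup.wav")
```

Hinting the language explicitly (`"sv"`) matters in practice: it prevents a multilingual speech model from mis-detecting short or noisy Swedish clips as a neighbouring language.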
What we mean by "open models"
Open-weight models are released with their parameters publicly available. This is different from open-source software in some respects (licences vary, and not all open-weight models permit commercial use without restriction), but the core property is the same: the model can be examined, reproduced, and modified.
Open models benefit from collective improvement: researchers and practitioners publish findings, fine-tunes, and evaluations that closed models never receive. Iteration is faster because the community can identify failure modes and propose fixes without waiting for a vendor. Specialisation is possible because you can fine-tune on your own data without sending that data to a third party.
There's also a transparency argument. When a model's weights are public, its behaviour can be studied. That's not a guarantee of safety or correctness, but it's a meaningful property for teams that need to explain their systems to auditors or regulators.
How we evaluate new models
When a new model is released, we aim to have an initial evaluation within 48 hours. That evaluation covers standard benchmarks (MMLU for general knowledge and reasoning, HumanEval for code generation, HELM for broader capability assessment) alongside domain-specific tests we've built for Swedish-language tasks.
Benchmarks tell you something, but they don't tell you everything. A model that scores well on MMLU may still produce unreliable structured output or struggle with multi-turn instruction following. So we look at both: benchmark scores, and performance on the tasks you're likely to run in practice: classification, extraction, summarisation, code review, and multilingual generation.
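One of those practical checks can be sketched in a few lines: does the model actually return valid structured output? The schema and sample responses below are illustrative, not part of our real test suite.

```python
import json

# Required keys for a hypothetical classification task's JSON output.
REQUIRED_KEYS = {"category", "confidence"}

def valid_structured_output(raw: str) -> bool:
    """True if `raw` parses as a JSON object with the required keys
    and a confidence value in [0, 1]."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict) or not REQUIRED_KEYS <= data.keys():
        return False
    conf = data.get("confidence")
    return isinstance(conf, (int, float)) and 0.0 <= conf <= 1.0

print(valid_structured_output('{"category": "invoice", "confidence": 0.93}'))  # True
print(valid_structured_output("Sure! Here is the JSON: {...}"))                # False
```

The second case is the one benchmarks miss: a model that "knows" the answer but wraps it in conversational filler fails the check, and that failure mode only shows up when you test the task you actually run.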
We publish the results of this process in the models overview and capabilities pages. If you're evaluating a model for a specific use case and want to understand how it performed on a particular dimension, the test matrix lets you verify capabilities directly against the API.
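Verifying a capability directly can be as simple as sending a probe request and inspecting the response. The sketch below builds an OpenAI-style chat completions body that asks for a tool call; the endpoint conventions, model id, and tool schema are assumptions, so confirm the field names against the API reference before sending.

```python
import json

# Hypothetical probe: a model that supports tool use should reply with
# a tool_call for get_weather rather than free text. "example-model" is
# a placeholder id.

def tool_use_probe(model: str = "example-model") -> dict:
    """Request body for an OpenAI-style POST /v1/chat/completions
    that probes tool-use support."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": "What's the weather in Umeå?"}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
    }

body = json.dumps(tool_use_probe())  # serialise before POSTing
```

The same pattern works for other capabilities in the matrix: swap the probe (a JSON-mode request, a multimodal message, a streaming flag) and check whether the response honours it.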
What this means for the catalogue
We don't add every available open model. The catalogue is intentionally narrow. A model that doesn't meet the bar on privacy, performance, or practical utility doesn't belong here, regardless of how much attention it's received.
When we add a model, it's because we've tested it and believe it's worth running. When we remove one, it's because something better is available or the model no longer meets our criteria. We try to communicate those changes clearly so you can plan accordingly.
Further reading
Models overview
Available models for Berget AI serverless inference
Model capabilities
Capability matrix for Berget AI language models, including tool use, JSON output, streaming, multimodal input, and throughput
Choose a language model
A decision process for picking the right Berget AI language model for your use case
Model chains
How combining specialised models in sequence can outperform a single large model, and when to use this pattern