
Why the AI world still needs Small Language Models (SLMs)

Aug 07, 2025


Large language models (LLMs) may have made their big breakthrough in recent years, but small language models (SLMs) are not to be eclipsed. SLMs are emerging as a more accessible and less expensive alternative to their larger counterparts, and may even be more suitable for decentralised AI (DeAI) and agentic AI.

In a world dominated by massive models and opaque centralised corporations holding all the cards, FLock.io believes there’s room for small, transparent, controllable systems. Models you can fine-tune and run on your own terms.

Both sizes shine in different scenarios – this blog breaks it down.

What’s the difference between small and large language models?

LLMs and SLMs differ primarily in size, computational requirements, and their suitability for different tasks.

As the name suggests, LLMs are massive and often have billions to trillions of parameters. They excel in generative tasks requiring deep language understanding. This means they require significant computational resources for training and deployment, and very few companies have the means to run them. Examples include Gemini, LaMDA, Claude and GPT-4.

SLMs, on the other hand, are much smaller, ranging from millions to a few billion parameters.

The smaller size is often achieved via a technique called distillation (check out our explainer blog), which compresses a large model by transferring its knowledge to a smaller, more efficient and lightweight one, much like a teacher instructing a student. Other methods include pruning and quantisation.
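To make the teacher-student idea concrete, here's a minimal sketch of a standard distillation loss in PyTorch. It's illustrative only – the temperature and weighting below are placeholder values, not any particular model's recipe.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend cross-entropy on the hard labels with a KL term that pulls the
    student's softened distribution toward the teacher's."""
    # Hard-label loss: the student still learns from the ground truth.
    hard = F.cross_entropy(student_logits, labels)
    # Soft-label loss: match the teacher's temperature-softened probabilities.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients stay comparable across temperatures
    return alpha * hard + (1 - alpha) * soft
```

Pruning, by contrast, removes redundant weights from a trained model, while quantisation lowers the numeric precision of the weights that remain.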

Carry on reading to find out what SLMs are great at.

SLMs shine at speed and efficiency with low resource requirements

SLMs are leaner, more compact and more efficient than their larger counterparts. As such, they require less memory and computational power, making them ideal for edge computing, in which devices in remote locations process data at the "edge" of the network, either on the device itself or on a local server. This makes on-device AI possible, with no need for an internet connection or cloud services.
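As a flavour of what on-device inference looks like in practice, here's a minimal sketch using Hugging Face's transformers library. The model ID is a placeholder – substitute any small open-weight model that fits your hardware:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/small-model-3b"  # placeholder: any small open-weight model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)  # runs on CPU by default

# Everything below happens locally – no internet connection or cloud service.
inputs = tokenizer("Summarise today's sensor readings:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```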

Because SLMs are lean, they are more efficient, cheaper to run and less demanding of resources such as compute. SLMs include domain-focused models for specific tasks, which can offer better accuracy than broader, more versatile LLMs. They are also easier to fine-tune.
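On the fine-tuning point, parameter-efficient methods such as LoRA are one common way to adapt an SLM cheaply. A minimal sketch with the peft library, where the model ID and adapter settings are placeholder assumptions rather than a tuned recipe:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("your-org/small-model-3b")  # placeholder
config = LoraConfig(
    r=8,                                  # low-rank adapter dimension
    lora_alpha=16,                        # adapter scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections, a common choice
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of the base model
```

Only the small adapter matrices are trained, so fine-tuning fits comfortably on a single consumer GPU.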

SLMs have faster inference and quicker response times because they have fewer parameters to process, making them perfect for real-time applications like virtual assistants.

The tradeoff is that they don’t match frontier performance on generalist benchmarks, have a narrower scope and are more prone to errors in ambiguous scenarios.

SLMs could be the future of agentic AI

The meteoric rise of agentic AI could usher in a mass of applications for which SLMs are sufficiently powerful, inherently more suitable and more economical. 

Most modern AI agents are powered by massive LLMs. Being generalist, they can serve a large volume of diverse requests and are great for broad conversational abilities. However, SLMs are more favourable for repetitive, narrow and simpler tasks. 

Agentic systems can be composed of multiple specialised SLMs arranged in a scalable architecture that is easier to debug and update. Read here about FLock's new research paper proposing a decentralised agent swarm network.
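As a rough illustration of that pattern, the sketch below routes narrow, repetitive tasks to cheap specialised SLMs and falls back to a generalist LLM for everything else. The model names and call helpers are hypothetical stubs, not FLock's architecture:

```python
def call_slm(model: str, prompt: str) -> str:
    """Stub for a local specialised SLM call; wire in your own runtime."""
    return f"[{model}] {prompt}"

def call_llm(model: str, prompt: str) -> str:
    """Stub for a hosted generalist LLM call."""
    return f"[{model}] {prompt}"

def classify_task(request: str) -> str:
    """Toy intent classifier; a real system might use an SLM here too."""
    if "extract" in request.lower():
        return "extraction"
    if "summarise" in request.lower():
        return "summarisation"
    return "general"

SPECIALISTS = {
    "extraction": lambda req: call_slm("extractor-1b", req),
    "summarisation": lambda req: call_slm("summariser-3b", req),
}

def handle(request: str) -> str:
    task = classify_task(request)
    if task in SPECIALISTS:
        return SPECIALISTS[task](request)       # cheap, fast, specialised
    return call_llm("frontier-model", request)  # expensive generalist fallback
```

Because each specialist is small and single-purpose, it can be tested, swapped or retrained independently without touching the rest of the system.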

Running SLMs locally on devices minimises data transfer, addressing privacy concerns and enabling applications in sensitive areas like healthcare. Read our paper on how blockchain-enabled federated learning has the potential to transform global healthcare.

New research from the Alan Turing Institute backs it up

The Alan Turing Institute (the UK's national institute for data science and AI) has just released new open-source research showing why SLMs are more relevant than ever. The team set out to see how far a small, open-weight language model could be pushed using lightweight tools and without massive infrastructure.

The results were extremely promising: experimenting on real-world health queries, the team achieved impressive performance with a model small enough to run locally on a laptop.

The model achieved near-frontier reasoning performance. To get there, the team combined retrieval-augmented generation (RAG), reasoning-trace fine-tuning, and budget forcing at inference time.
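Budget forcing is a simple test-time scaling trick: cap the model's reasoning trace at a token budget, and if it tries to stop thinking too early, append a word like "Wait" so it keeps going. Here's a schematic sketch, assuming a hypothetical backend exposing generate and tokenize methods – not the team's exact implementation:

```python
def budget_forced_reasoning(backend, prompt: str,
                            budget: int = 2048, extensions: int = 2) -> str:
    """Decode a reasoning trace under a fixed token budget, appending 'Wait'
    whenever the model tries to finish early so it re-examines its answer.
    `backend.generate` and `backend.tokenize` are assumed interfaces."""
    trace = backend.generate(prompt, max_new_tokens=budget, stop=["</think>"])
    for _ in range(extensions):
        remaining = budget - len(backend.tokenize(trace))
        if remaining <= 0:
            break  # budget spent: cut the thinking off here
        trace += " Wait,"  # overwrite the early stop and push for more reasoning
        trace += backend.generate(prompt + trace,
                                  max_new_tokens=remaining, stop=["</think>"])
    return trace
```

The appeal for SLMs is that extra accuracy is bought with a little more inference compute rather than more parameters.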

Given that medical data is sensitive and difficult to obtain, the team bootstrapped a realistic dataset by generating synthetic queries.

With just a few thousand synthetic examples and a bit of test-time reasoning, their lean 3-billion-parameter model was suddenly able to engage in thoughtful, grounded reasoning over real-world health content.

Out of the box, small models performed poorly on this task. RAG alone wasn’t enough for interpreting documents or connecting them to user input.

Once they added lightweight test-time scaling techniques, the picture changed. The models became far more capable of drawing logical inferences from retrieved context. In many cases, their responses were nearly indistinguishable from those of much larger frontier models.
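For the RAG half of the recipe, the retrieval step can be as simple as embedding the query and ranking documents by cosine similarity. A minimal sketch assuming the sentence-transformers library and a toy two-document corpus (both placeholders, not the study's setup):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small embedder, runs on CPU
docs = ["Guidance on hydration during fever...",    # toy stand-in corpus
        "Leaflet on seasonal flu symptoms..."]
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q  # cosine similarity, since vectors are normalised
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

context = "\n".join(retrieve("How much water should I drink with a fever?"))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: ..."
```

In a full pipeline, the retrieved context is stuffed into the prompt before generation; the test-time scaling step then helps the model reason over it rather than merely quote it.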

The new wave of SLMs

A wave of SLMs has been released over the past year, typically in the 1B to 8B parameter range: Microsoft's Phi series, NVIDIA's Nemotron-H family, DeepSeek's compact distillations, Qwen3's small models, and Hugging Face's SmolLM collection.

Microsoft's Mu is the tech giant's latest lightweight on-device SLM that runs locally on Neural Processing Units (NPUs), aiming to reduce reliance on cloud-based processing.

Get started staking and earning with FLock.io!

FLock’s ecosystem consists of three key components: AI Arena, a platform for competitive model training; FL Alliance, a privacy-focused collaboration framework that enhances models while preserving data sovereignty; and Moonbase, our new rewards layer.

FLock is part of the mission to dismantle the power concentration of AI held by a handful of centralised corporations, promoting more equitable, transparent and composable development.

Find out more about FLock.io by reading our docs. For future updates, follow FLock.io on Twitter.
