Choosing the right language model depends on an organisation’s specific needs, task complexity, and available resources.
Small language models are ideal for organisations aiming to build applications that run locally on a device (rather than in the cloud).
Some argue that large language models are better suited for applications requiring the orchestration of complex tasks, advanced reasoning, data analysis, and contextual understanding.
Small language models provide potential solutions for regulated industries and sectors needing high-quality results while keeping data on their own premises.
Latency is the delay between a user submitting a prompt and receiving the generated answer; for cloud-hosted Large Language Models (LLMs), this includes the round trip to the cloud. In some use cases, makers can prioritise waiting for high-quality answers, while in others, speed is crucial for user satisfaction. For conversational experiences, however, low latency is non-negotiable.
Cost is another consideration that makes Small Language Models (SLMs) very attractive.
SLMs, which can operate offline, also significantly broaden AI’s applicability.
What we’re going to start to see is not a shift from large to small, but a shift from a singular category of models to a portfolio of models where customers get the ability to make a decision on what is the best model for their scenario. ~ Source
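To make the portfolio idea concrete, here is a minimal sketch of how an application might pick a model per scenario using the considerations above (data locality, latency, task complexity, and cost). Everything in it is an assumption for illustration: the model names, latency figures, and cost units are hypothetical placeholders, not benchmarks or real endpoints.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str                    # hypothetical label, not a real endpoint
    runs_on_device: bool         # can run locally / offline
    typical_latency_ms: int      # illustrative figure, not a benchmark
    handles_complex_tasks: bool  # orchestration, advanced reasoning, etc.
    relative_cost: int           # illustrative cost units per request


@dataclass(frozen=True)
class Requirements:
    must_stay_on_premises: bool
    max_latency_ms: int
    needs_complex_reasoning: bool


# A made-up two-model portfolio: one small local model, one large cloud model.
PORTFOLIO = [
    ModelOption("small-local-model", True, 150, False, 1),
    ModelOption("large-cloud-model", False, 1200, True, 20),
]


def choose_model(req, portfolio=PORTFOLIO):
    """Return the cheapest model that satisfies every requirement, or None."""
    candidates = [
        m for m in portfolio
        if (m.runs_on_device or not req.must_stay_on_premises)
        and m.typical_latency_ms <= req.max_latency_ms
        and (m.handles_complex_tasks or not req.needs_complex_reasoning)
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda m: m.relative_cost)


if __name__ == "__main__":
    chat = Requirements(must_stay_on_premises=True, max_latency_ms=300,
                        needs_complex_reasoning=False)
    analysis = Requirements(must_stay_on_premises=False, max_latency_ms=5000,
                            needs_complex_reasoning=True)
    print(choose_model(chat).name)
    print(choose_model(analysis).name)
```

In this sketch, a scenario with no acceptable model returns None rather than silently falling back, so the trade-off (relax the latency budget, or the data-locality rule) stays an explicit decision.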
A number of features are part and parcel of Large Language Models: for instance, natural language generation, dialog and conversational context management, reasoning, and knowledge.
The knowledge portion is not used to a large degree, because RAG (retrieval-augmented generation) injects contextual reference data at inference. Hence Small Language Models (SLMs) are ideal even though they are not knowledge-intensive, seeing that the knowledge-intensive nature of LLMs is not used in any case.
The model simply does not have the capacity to store too much “factual knowledge”, which can be seen for example with low performance on TriviaQA.
However, we believe such weakness can be resolved by augmentation with a search engine. ~ Source
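The idea of injecting reference data at inference can be sketched in a few lines. This is a toy illustration, not how any particular RAG product works: the document store, the word-overlap retrieval, and the prompt template are all assumptions, and a real system would typically use a search engine or embedding-based retrieval and then send the prompt to a model.

```python
import re

# Toy reference data standing in for a search index or document store.
DOCUMENTS = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "Mount Everest is the highest mountain above sea level.",
    "The Great Barrier Reef lies off the coast of Queensland, Australia.",
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, top_k=1):
    """Rank documents by how many words they share with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, documents, top_k=1):
    """Place the retrieved passages in the prompt so the model can rely on
    the supplied context instead of facts stored in its own weights."""
    passages = retrieve(question, documents, top_k)
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


if __name__ == "__main__":
    print(build_prompt("When was the Eiffel Tower completed?", DOCUMENTS))
```

The resulting prompt would then be passed to the SLM, which only needs to read and reason over the supplied context rather than recall the fact itself.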
Another limitation related to the model’s capacity is that the model is primarily restricted to English.
Below is an example of how anyone can interact with the Phi-3 small language model. Within HuggingFace’s HuggingChat, go to settings and, under models, select the Phi-3 model.