

Chain-of-Thought prompting is so successful at a foundational level that it gave rise to what some refer to as the Chain-of-X phenomenon. Google Research explored how to use LLMs to generate CoT annotations for existing datasets, and then how to fine-tune smaller language models on those annotations.

As is well known, Chain-of-Thought prompting improves the reasoning capabilities of large language models.

Google asserts that reasoning capabilities only emerge in models with at least tens of billions of parameters. This research explores transferring those capabilities to smaller models via knowledge distillation.

They fine-tuned a student model using the Chain-Of-Thought outputs from a larger teacher model.

Researchers from Google found that this method improves task performance on arithmetic, commonsense, and symbolic reasoning datasets.

Chain-of-Thought (CoT) prompting teaches language models (LMs) to decompose a reasoning task into a series of intermediate steps.

This prompting has been shown to significantly increase the task accuracy of large language models (LLMs) across commonsense, symbolic, and mathematical reasoning datasets.
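To make this concrete, below is a minimal sketch of what a few-shot CoT prompt looks like. The first exemplar is the well-known tennis-ball example from the original CoT prompting work; the second question and the exact string layout are illustrative, not taken from this paper.

```python
# A few-shot CoT prompt: a worked exemplar whose answer is reasoned out
# step by step, followed by the new question. The exemplar cues the model
# to produce intermediate reasoning before its final answer.
# (Illustrative layout; the second question is made up.)
COT_PROMPT = """\
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls.
Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is
6 tennis balls. 5 + 6 = 11. The answer is 11.

Q: A baker made 24 muffins and sold 15 of them. How many are left?
A:"""

# The prompt is sent to the model as-is; the completion should contain
# a chain of thought ending in "The answer is 9."
```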

However, the reasoning capabilities of smaller LMs do not improve with CoT prompting; they mostly produce illogical chains of thought. Notably, CoT prompting even reduces the accuracy of models with fewer than 10 billion parameters.

Research attributes this to abilities such as semantic understanding and symbolic mapping emerging only in models at larger scale.

Google Research proposes a two-step pipeline for CoT knowledge distillation.

Annotation with CoT Reasoning

  1. Use a teacher model, like PaLM 540B or GPT-3 175B, to annotate an existing supervised dataset with CoT reasoning.
  2. Perform few-shot prompting with 8 examples to generate CoTs, adapting prompts to provide the target answer after the question and before the example CoT. This helps correct small mistakes.
  3. Remove CoTs whose final answer does not match the target, to ensure quality (a sketch of this annotation step follows the list).
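The following sketch shows how this annotation loop could look. Here `teacher_generate` and `extract_answer` are hypothetical helpers standing in for the teacher model's API and an answer parser, and `few_shot_prefix` is assumed to hold the 8 worked examples; none of these names come from the paper's code.

```python
def annotate_with_cot(dataset, few_shot_prefix, teacher_generate, extract_answer):
    """Annotate a supervised (question, answer) dataset with teacher CoTs.

    teacher_generate wraps the teacher model (e.g. PaLM 540B or GPT-3 175B);
    extract_answer parses the final answer out of a generated CoT. Both are
    hypothetical stand-ins, not the paper's released code.
    """
    annotated = []
    for question, answer in dataset:
        # Adapted prompt: the target answer is given after the question and
        # before the CoT, so the teacher reasons toward the known answer.
        prompt = f"{few_shot_prefix}\nQ: {question}\nThe answer is {answer}.\nA:"
        cot = teacher_generate(prompt)
        # Quality filter: discard CoTs whose final answer does not match
        # the target answer.
        if extract_answer(cot) == answer:
            annotated.append({"question": question, "cot": cot, "answer": answer})
    return annotated
```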

Fine-Tuning the Student Model

  1. Fine-tune a student model using teacher forcing.
  2. Provide the question as input and the CoT and answer as the target.
  3. This training format eliminates the need for few-shot prompting: the question alone serves as input (see the sketch after this list).
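Below is a minimal sketch of this fine-tuning step, assuming a T5-style student via Hugging Face Transformers and the record layout produced by the annotation sketch above; the model name, learning rate, and batch construction are illustrative, not the paper's training setup.

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Illustrative student model; any seq2seq LM would do for this sketch.
tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

def make_batch(examples):
    """Question alone as input; CoT followed by the answer as the target."""
    inputs = tokenizer([ex["question"] for ex in examples],
                       return_tensors="pt", padding=True, truncation=True)
    targets = tokenizer([f'{ex["cot"]} The answer is {ex["answer"]}.'
                         for ex in examples],
                        return_tensors="pt", padding=True, truncation=True)
    labels = targets["input_ids"]
    labels[labels == tokenizer.pad_token_id] = -100  # ignore padding in the loss
    inputs["labels"] = labels
    return inputs

# One teacher-forced training step: passing `labels` makes the model
# condition each target token on the gold previous tokens and return the
# cross-entropy loss directly.
batch = make_batch(annotated[:8])  # `annotated` from the annotation sketch above
loss = model(**batch).loss
loss.backward()
optimizer.step()
```

At inference time the student receives only the question and generates the CoT and answer itself, which is why no few-shot prompt is required.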

An overview of the proposed method, as shown in the paper's figure: a teacher LLM annotates an existing dataset with CoT reasoning, and a student model is then fine-tuned on the annotated examples.
