Researchers at Stanford Propose a Family of Representation Finetuning (ReFT) Methods that Operate on a Frozen Base Model and Learn Task-Specific Interventions on Hidden Representations

16 Apr

Pretrained language models (LMs) are commonly adapted to new domains or tasks by updating their weights on task-specific data, a process known as finetuning. While finetuning allows a model to be adapted to a wide range of tasks with small amounts of in-domain data, it can be prohibitively expensive for large LMs.

Parameter-efficient finetuning (PEFT) methods offer a solution by updating only a small fraction of the weights, which reduces memory usage and training time. Adapters, a common PEFT approach, learn edits that are either added to a subset of the model weights or operate alongside the frozen base model. More recent methods such as LoRA and its variants reduce the number of trainable parameters further by learning low-rank approximations of the weight updates during adapter training.
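To make the low-rank idea concrete, here is a minimal NumPy sketch of a LoRA-style layer. It is an illustration under stated assumptions, not the reference LoRA implementation: the layer sizes, the rank, the `scale` argument, and the function name `lora_forward` are all chosen for this example.

```python
import numpy as np

def lora_forward(x, W_frozen, A, B, scale=1.0):
    """Apply a frozen weight plus a low-rank update: (W + scale * B @ A) @ x."""
    # W_frozen is never trained; only A and B would receive gradients.
    return W_frozen @ x + scale * (B @ (A @ x))

# Toy sizes, assumed for illustration only.
d_out, d_in, rank = 64, 64, 4
rng = np.random.default_rng(0)

W_frozen = rng.normal(size=(d_out, d_in))   # pretrained weight, kept frozen
A = rng.normal(size=(rank, d_in)) * 0.01    # trainable down-projection
B = np.zeros((d_out, rank))                 # trainable up-projection, zero at start
x = rng.normal(size=d_in)

y = lora_forward(x, W_frozen, A, B)

full_params = W_frozen.size       # parameters updated by full finetuning
lora_params = A.size + B.size     # parameters updated by the low-rank adapter
print(full_params, lora_params)
```

Because `B` starts at zero in this sketch, the adapted layer initially behaves exactly like the frozen one, and the trainable parameter count grows with the rank rather than with the full weight matrix.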

However, a notable limitation of current PEFT methods is that they modify weights rather than representations, even though prior research indicates that representations encode rich semantic information. In response, a team of researchers from Stanford and the Pr(Ai)²R Group has proposed Representation Finetuning (ReFT) methods.

Instead of adapting model weights, ReFT methods train interventions that manipulate a small fraction of the model's representations, steering its behavior at inference time to solve downstream tasks. The approach draws inspiration from recent work in LM interpretability, which intervenes on representations to identify causal mechanisms and to steer model behavior.

One notable instance of the ReFT family is Low-rank Linear Subspace ReFT (LoReFT), which intervenes on hidden representations in the linear subspace spanned by a low-rank projection matrix. LoReFT builds directly on existing methods such as distributed alignment search (DAS), and the authors report state-of-the-art performance on various benchmarks while using significantly fewer parameters than traditional PEFT methods. These results suggest that ReFT methods could be more efficient and effective alternatives to weight-based PEFTs and that they deserve further exploration across different model families and domains.
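A subspace intervention of this kind can be sketched in a few lines of NumPy. The sketch assumes the LoReFT form h + Rᵀ(Wh + b − Rh), where R is a low-rank projection with orthonormal rows and W, b are a learned linear map and bias. The article describes the method only in words, so the formula, the toy sizes, and the name `loreft_intervention` should be read as assumptions for this example, not as the authors' code.

```python
import numpy as np

def loreft_intervention(h, R, W, b):
    """Edit h only inside the subspace spanned by the rows of R.

    Assumed form: h + R^T (W h + b - R h), with R having orthonormal rows.
    """
    return h + R.T @ (W @ h + b - R @ h)

# Toy sizes, assumed for illustration only.
d, r = 16, 4
rng = np.random.default_rng(0)

# Build a projection with orthonormal rows from a reduced QR decomposition.
Q, _ = np.linalg.qr(rng.normal(size=(d, r)))   # Q has shape (d, r)
R = Q.T                                        # shape (r, d), orthonormal rows

W = rng.normal(size=(r, d))   # learned linear map (random here)
b = rng.normal(size=r)        # learned bias (random here)
h = rng.normal(size=d)        # a hidden representation from the frozen model

h_new = loreft_intervention(h, R, W, b)

# Inside the subspace, the representation now equals the learned target W h + b.
in_subspace_ok = np.allclose(R @ h_new, W @ h + b)

# Outside the subspace, the representation is left untouched.
P_orth = np.eye(d) - R.T @ R
outside_unchanged = np.allclose(P_orth @ h_new, P_orth @ h)
print(in_subspace_ok, outside_unchanged)
```

In this sketch the base model contributes only `h`; the trainable pieces are `R`, `W`, and `b`, whose size scales with the rank `r` rather than with the model's weight matrices.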

Future research directions for ReFT include testing its effectiveness on other model families and on vision-language models, and automating the hyperparameter search. Investigating more effective interventions for specific tasks and exploring the power of learned orthogonal subspaces are also areas of interest. ReFT builds on neural network interpretability research and contributes insights back to that field, challenging the traditional approach of interpreting individual neurons in isolation.

In terms of evaluation practices, it is essential to establish benchmarks that allow fair comparisons between PEFTs and ReFTs. This includes compute- or time-matched hyperparameter-tuning comparisons, and it means disallowing any tuning or model selection on the test set, which mitigates overfitting and gives a more realistic picture of real-world performance.
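The rule about model selection can be illustrated with a short Python sketch. Everything here is hypothetical: the helper `select_on_validation`, the `evaluate` callback, and the scores are invented placeholders for this example and are not results from the paper.

```python
def select_on_validation(configs, evaluate):
    """Choose a config by validation score, then score it once on the test split."""
    best = max(configs, key=lambda cfg: evaluate(cfg, "validation"))
    return best, evaluate(best, "test")

# Invented placeholder scores, used only to exercise the helper.
toy_scores = {
    ("rank=4", "validation"): 0.81, ("rank=4", "test"): 0.79,
    ("rank=8", "validation"): 0.84, ("rank=8", "test"): 0.80,
    ("rank=16", "validation"): 0.83, ("rank=16", "test"): 0.82,
}

calls = []  # record every evaluation so test-set usage can be audited

def toy_evaluate(cfg, split):
    calls.append((cfg, split))
    return toy_scores[(cfg, split)]

best_cfg, test_score = select_on_validation(
    ["rank=4", "rank=8", "rank=16"], toy_evaluate
)
test_calls = [c for c in calls if c[1] == "test"]
print(best_cfg, test_score, len(test_calls))
```

In the toy numbers, a different configuration happens to score higher on the test split, but picking it would mean selecting on test data. For a compute-matched comparison, each method under comparison would be given the same number of candidate configurations.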

Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.

Arshad is an intern at MarktechPost. He is currently pursuing an Int. MSc in Physics at the Indian Institute of Technology Kharagpur. He believes that understanding things at a fundamental level leads to new discoveries, which in turn drive advances in technology. He is passionate about understanding nature at a fundamental level with the help of tools like mathematical models, ML models, and AI.
