Self-AMPLIFY: Improving small language models with self post hoc explanations
This article introduces Self-AMPLIFY, a novel method to enhance Small Language Models by leveraging self-generated post hoc explanations to improve performance and trust.
The rise of small language models and rationale enhancement
Large language models (LLMs) have demonstrated strong capabilities in several natural language processing (NLP) tasks, such as text generation, question answering, or reasoning. Despite their proficiency, LLMs like Llama-3.1 405B face limitations due to their parameter sizes and computational cost, often requiring access through cloud APIs. Therefore, small language models (SLMs) are increasingly favored for their low inference latency, cost-effectiveness, efficient development, and easy customization and adaptability.
Rationales are explanations that provide insights into a model's reasoning process, and they have been shown to significantly boost the performance of language models when included in prompts for In-Context Learning (ICL). However, generating high-quality rationales typically requires human annotation or the use of auxiliary proxy models, which can be time-consuming and costly. We propose a method called Self-AMPLIFY to automatically generate natural language rationales that improve the performance of SLMs without the use of any auxiliary model.
Self-AMPLIFY overview
Self-AMPLIFY is a three-step method that uses post hoc explanation methods to automatically generate natural language rationales and leverage them to improve the performance of SLMs through ICL. These steps, sketched in code after the list, are defined as follows:
- Target samples: Identify promising samples from the training data using the SLM's predictions. Implement two strategies, based on whether instances are initially classified correctly (success) or incorrectly (error) by the SLM.
- Rationale generation: Apply post hoc explanation methods to the SLM to generate rationales. Implement three types of rationales, based either on attribution explainers (SHAP, DeepLift…), Self_topk explanations, or post hoc Chain-of-Thought (Ph-CoT).
- Build prompts: Construct the final prompt, including the original input, the generated rationale, and the ground truth. Use the prompt to leverage ICL and boost the SLM's performance.
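To make the pipeline concrete, here is a minimal sketch of the three steps in Python. The callables `predict` and `explain` are hypothetical placeholders standing in for the SLM's prediction call and for one of the rationale generators described below, not the method's actual API.

```python
# A minimal sketch of the three Self-AMPLIFY steps. `predict` and
# `explain` are hypothetical callables, not the paper's actual API.

def self_amplify(predict, explain, train_set, n_shots, strategy="error"):
    # Step 1 - target samples: keep instances the SLM initially classifies
    # correctly ("success" strategy) or incorrectly ("error" strategy).
    targeted = []
    for x, y in train_set:
        correct = predict(x) == y
        if (strategy == "success" and correct) or (strategy == "error" and not correct):
            targeted.append((x, y))
        if len(targeted) == n_shots:
            break

    # Step 2 - rationale generation: explain each targeted (x, y) pair
    # post hoc (attribution-based, Self_topk, or Ph-CoT).
    shots = [(x, explain(x, y), y) for x, y in targeted]

    # Step 3 - build prompts: assemble the n-shot ICL context
    # "(x1, r1, y1), ..., (xn, rn, yn)".
    return "\n\n".join(f"{x}\n{r}\nAnswer: {y}" for x, r, y in shots)
```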
How is the rationale constructed from a post hoc explanation?
Self-AMPLIFY implements three types of post hoc explanations to generate natural language rationales:
- post hoc attributions (DeepLift and KernelSHAP)
- post hoc Self_topk explanations
- post hoc CoT (Ph-CoT)
Post hoc explanations are computed for each (x, y) pair and then used to build their associated rationales r.
DeepLift and KernelSHAP are computed to explain the (x, y) pair, i.e. the output neuron of f related to y. The k tokens with the highest attribution scores are then selected to build the rationale, which is defined following the template "The k keywords ⟨word1⟩, ⟨word2⟩, ..., and ⟨wordk⟩ are important to predict that the answer is ⟨y⟩". This way, Self-AMPLIFY generates rationales from post hoc attribution methods by converting a numerical vector of importance scores into a natural language rationale. Self_topk consists in directly prompting f to generate the k most important tokens used to make its prediction. Self_topk is generated in a predict-then-explain post hoc manner, since the text containing the k most important keywords is generated given the ground truth answer y.
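As an illustration, here is a minimal sketch of the attribution-to-rationale conversion, assuming the scores have already been produced by an explainer such as DeepLift or KernelSHAP; the function name and arguments are hypothetical.

```python
# A minimal sketch of turning an attribution vector into the rationale
# template above. In practice the scores would come from an attribution
# explainer such as DeepLift or KernelSHAP; here they are passed in directly.

def attribution_rationale(tokens, scores, y, k=3):
    # Select the k tokens with the highest attribution scores (k >= 2 assumed).
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1], reverse=True)
    keywords = [tok for tok, _ in ranked[:k]]
    listed = ", ".join(keywords[:-1]) + ", and " + keywords[-1]
    # Fill the natural language template used by Self-AMPLIFY.
    return (f"The {k} keywords {listed} are important "
            f"to predict that the answer is {y}")

# Example with made-up scores:
# attribution_rationale(["the", "film", "was", "truly", "brilliant"],
#                       [0.01, 0.30, 0.02, 0.15, 0.52], y="positive")
# -> "The 3 keywords brilliant, film, and truly are important to predict
#    that the answer is positive"
```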
Finally, Ph-CoT consists in prompting f to generate a p-step free text explanation in a post hoc manner, given the ground truth y. Therefore, Ph-CoT can be defined as a post hoc Chain-of-Thought explanation. The final related rationale r is defined following the template "p-step rationale: ⟨ϕ⟩, therefore the answer is ⟨y⟩", where ϕ is the post hoc free text rationale previously generated, and p is the number of steps in the rationale.
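A minimal sketch of this post hoc prompting follows; `generate` is a hypothetical text-completion callable for the SLM f, and the instruction wording is illustrative rather than the method's exact prompt.

```python
# A minimal sketch of Ph-CoT rationale generation. `generate(prompt)` is a
# hypothetical completion function for the SLM f; any autoregressive
# text-generation API would fit.

def ph_cot_rationale(generate, x, y, p=2):
    # Post hoc: the ground truth y is given, and f is asked to justify it.
    instruction = (f"{x}\nThe answer is {y}. "
                   f"Explain in {p} steps why this is the correct answer.")
    phi = generate(instruction)  # free text explanation ϕ
    # Fill the rationale template from the text.
    return f"{p}-step rationale: {phi}, therefore the answer is {y}"
```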
How is the final prompt built to leverage ICL?
This final step consists in designing the prompt that is used to make predictions on the test set. The output prompt is built from the previously generated rationales, following the template "(x1, r1, y1), (x2, r2, y2), ..., (xn, rn, yn)". Finally, this n-shot prompt is used as a context to make predictions in an ICL setting on the test set.
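The inference step then amounts to prepending this context to each test input, as in the short sketch below; `generate` is the same hypothetical completion function as before, and `context` is the n-shot string built earlier.

```python
# A minimal sketch of the ICL inference step: the n-shot context is
# prepended to the test input, and the SLM is expected to complete the
# rationale-then-answer pattern it has seen in the demonstrations.

def predict_with_icl(generate, context, x_test):
    prompt = f"{context}\n\n{x_test}\n"
    return generate(prompt)
```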
Some results on common benchmarks
Self-AMPLIFY gives competitive results compared to state-of-the-art approaches such as common ICL settings (IO), Auto-CoT, and AMPLIFY. In particular, Ph-CoT leads to the best results, owing to the richness of free text rationales compared to topk explanations such as those derived from DeepLift.
Some perspectives
We plan to improve the Self-AMPLIFY framework with other kinds of rationales, such as counterfactual explanations. We will also investigate the link between the faithfulness of Self-AMPLIFY generated rationales and the induced gain in performance.
At Ekimetrics, a significant part of our projects involves manipulating autoregressive language models. Self-AMPLIFY can then be used to improve the performance of SLMs and teach them to generate rationales that explain their behavior and foster trust with end users.