I taught my 1B to follow instructions. It got worse at following instructions…

By AI Maestro May 14, 2026 1 min read

**What Happened:** A user named GPUburnout trained three models of different sizes (1B, 2B, and 3B parameters) and applied the same supervised fine-tuning (SFT) recipe for instruction following to each. Before SFT, the InstructEval scores were 20.50 (1B), 21.94 (2B), and 23.14 (3B). After SFT, the two smaller models got worse: the 1B model dropped 5.75 points to 14.75, and the 2B model dropped 4.91 points to 17.03. Only the 3B model improved, rising 2.04 points from 23.14 to 25.18.
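The arithmetic behind these figures can be checked with a short Python sketch. It uses only the numbers quoted in this summary; the variable names are illustrative, and the relative (percentage) changes are derived here rather than reported by the original study.

```python
# Pre-SFT InstructEval scores as quoted in this summary.
pre_sft = {"1B": 20.50, "2B": 21.94, "3B": 23.14}

# Reported point changes for 1B and 2B. For 3B the summary gives the
# post-SFT score (25.18), so its delta is derived from that.
post_sft_3b = 25.18
delta = {
    "1B": -5.75,
    "2B": -4.91,
    "3B": round(post_sft_3b - pre_sft["3B"], 2),
}

# Post-SFT score = pre-SFT score + change.
post_sft = {size: round(pre_sft[size] + delta[size], 2) for size in pre_sft}

# Change relative to the pre-SFT score, in percent (derived, not reported).
relative_change = {
    size: round(100 * delta[size] / pre_sft[size], 1) for size in pre_sft
}

for size in pre_sft:
    print(f"{size}: {pre_sft[size]:.2f} -> {post_sft[size]:.2f} "
          f"({delta[size]:+.2f} points, {relative_change[size]:+.1f}%)")
```

Seen in relative terms, the 1B model lost roughly 28% of its pre-SFT score and the 2B model roughly 22%, while the 3B model gained about 9%.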

**Why It Matters:** The study highlights a practical risk in fine-tuning large language models (LLMs): instruction-following performance can regress after supervised fine-tuning, the very step meant to improve it. The results suggest that smaller models may be particularly vulnerable, since the 1B and 2B models both lost ground while the 3B model gained. This matters because it affects how reliably instruction following can be trained across model sizes, which could undermine trust in smaller models deployed for applications such as customer support or safety-critical tasks.

**Takeaways:**
– **Instruction-Following Performance Declines:** The two smaller models (1B and 2B) scored lower on InstructEval after SFT, with the 1B model showing the largest drop (5.75 points).
– **Parameter Size Sensitivity:** The effect of SFT varied with model size; only the 3B model improved, from 23.14 to 25.18.
– **Further Investigation Needed:** To understand this phenomenon better and mitigate it, further research is required into why smaller models are more susceptible to such declines and how to stabilize instruction-following mechanisms.
