**I taught my 1B to follow instructions. It got worse at following instructions…**
A recent experiment by a Reddit user has produced some unexpected results in instruction-following for large language models (LLMs). The user fine-tuned three models ranging from 1 billion to 3 billion parameters, using similar methods but varying the learning rate and model size. The findings were mixed: the two larger models (2B and 3B) improved at following instructions, as measured by the Instruction-Following Eval (IFEval) benchmark, while the smallest model (1B) actually got worse.
This outcome is particularly concerning because instruction-following was one of the primary goals for these models. The user hypothesized that the cause could be either insufficient model capacity or an issue with how supervised fine-tuning (SFT) behaves at such a small scale. To investigate further, they plan to run another experiment on their 2B model using a lower learning rate.
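To make the before-and-after comparison concrete, here is a minimal sketch of how an IFEval-style score can be computed. It is only an illustration: the checker names, the `strict_accuracy` helper, and the sample responses are all hypothetical, not IFEval's actual instruction set and not outputs from the Reddit experiment. The idea it shows is that each prompt carries instructions that can be verified programmatically, and a response counts only if it satisfies all of them.

```python
from typing import Callable, Dict, List

# Hypothetical verifiable-instruction checkers, in the spirit of IFEval.
# The names and rules are illustrative assumptions, not the benchmark's own.
CHECKS: Dict[str, Callable[[str], bool]] = {
    "all_lowercase": lambda r: r == r.lower(),
    "no_commas": lambda r: "," not in r,
    "max_5_words": lambda r: len(r.split()) <= 5,
}


def strict_accuracy(samples: List[dict]) -> float:
    """Fraction of responses that satisfy every instruction on their prompt."""
    if not samples:
        return 0.0
    passed = sum(
        all(CHECKS[name](s["response"]) for name in s["instructions"])
        for s in samples
    )
    return passed / len(samples)


# Made-up responses for illustration only (not data from the experiment).
before = [
    {"instructions": ["all_lowercase"], "response": "the sky is blue."},
    {"instructions": ["no_commas", "max_5_words"], "response": "Blue sky today"},
]
after = [
    {"instructions": ["all_lowercase"], "response": "The sky is blue."},
    {"instructions": ["no_commas", "max_5_words"], "response": "Blue sky today"},
]

delta = strict_accuracy(after) - strict_accuracy(before)
print(f"before={strict_accuracy(before):.2f} "
      f"after={strict_accuracy(after):.2f} delta={delta:+.2f}")
```

A negative `delta` on a held-out prompt set is the kind of regression reported for the 1B model. The real benchmark uses a much larger set of instruction types and reports both strict and loose variants of the score.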
**Why does this matter?**
This news highlights several critical issues in LLM development and deployment:
– **Instruction Robustness:** Models need robust mechanisms to ensure they adhere strictly to the provided instructions without unintended side effects.
– **Model Size Sensitivity:** The performance of instruction-following can vary significantly with model size, indicating that smaller models might struggle more with these tasks.
– **Methodological Questions:** There are questions about how SFT methods scale and perform across different parameter sizes. Understanding these nuances is crucial for improving the reliability and trustworthiness of LLMs.
**Takeaways:**
– **Further Investigation Needed:** The regression observed in the 1B model suggests that more research is required to understand why smaller models might struggle with instruction-following after fine-tuning.
– **Methodological Flexibility:** Experimenting with different learning rates could show which fine-tuning settings are most effective for LLMs at various scales.
– **Robustness Metrics:** Developing and validating robust metrics to measure a model’s ability to follow instructions reliably is essential.




