MOOSE-Star (ICML 2026): 7B model + 108K-paper dataset for scientific hypothesis discovery

By AI Maestro, May 14, 2026

A British researcher has shared a collection on Hugging Face featuring a 7B model post-trained for scientific hypothesis discovery, along with the dataset behind it. The accompanying paper was accepted at ICML 2026.

  • The collection includes three models: MS-IR-7B for inspiration retrieval, MS-HC-7B for hypothesis composition, and MS-7B for joint use. All three are built on the base model DeepSeek-R1-Distill-Qwen-7B.
  • TOMATO-Star is a dataset of 108,717 papers, each decomposed into a (background, hypothesis, inspirations) triple, with every inspiration anchored to a real citation. It covers scientific fields including biology, chemistry, medicine, medical imaging, psychology, and cognitive science.
  • The models were evaluated on inspiration retrieval. MS-7B and MS-IR-7B both outperformed previous models on retrieval accuracy, with MS-7B reaching 54.34% compared to the base model’s 28.42%.
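
The record structure and the retrieval metric described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the class and field names (`PaperRecord`, `Inspiration`, `citation_id`) and the accuracy definition (a prediction counts as a hit if it matches any ground-truth inspiration for that paper) are assumptions, since the post does not give the dataset schema or the exact metric. The records are toy data.

```python
from dataclasses import dataclass, field


@dataclass
class Inspiration:
    # Hypothetical fields: the post only says each inspiration
    # is anchored to a real citation.
    citation_id: str
    summary: str


@dataclass
class PaperRecord:
    # One paper decomposed into (background, hypothesis, inspirations).
    background: str
    hypothesis: str
    inspirations: list[Inspiration] = field(default_factory=list)


def retrieval_accuracy(records, predictions):
    """Fraction of records whose predicted citation id matches
    one of that record's ground-truth inspirations (assumed metric)."""
    if len(records) != len(predictions):
        raise ValueError("need exactly one prediction per record")
    if not records:
        return 0.0
    hits = 0
    for record, predicted_id in zip(records, predictions):
        gold_ids = {insp.citation_id for insp in record.inspirations}
        if predicted_id in gold_ids:
            hits += 1
    return hits / len(records)


# Toy records, invented purely for illustration.
records = [
    PaperRecord(
        background="Background text for paper A.",
        hypothesis="Hypothesis text for paper A.",
        inspirations=[Inspiration("cite-001", "Idea borrowed from cite-001.")],
    ),
    PaperRecord(
        background="Background text for paper B.",
        hypothesis="Hypothesis text for paper B.",
        inspirations=[
            Inspiration("cite-002", "Idea borrowed from cite-002."),
            Inspiration("cite-003", "Idea borrowed from cite-003."),
        ],
    ),
]

# A retriever that gets paper A right and paper B wrong.
predictions = ["cite-001", "cite-999"]
accuracy = retrieval_accuracy(records, predictions)
print(f"retrieval accuracy: {accuracy:.2%}")
```

Under this assumed definition, the reported figures would mean MS-7B picks a correct inspiration for roughly 54 of every 100 test cases, against roughly 28 for the base model.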

This work matters because it gives AI systems a framework for generating scientific hypotheses, which could help in areas such as drug discovery and disease diagnosis.

Takeaways:

  • The collection offers a new approach to using large language models (LLMs) for scientific hypothesis generation.
  • It includes detailed metrics on model performance, which can be used to benchmark similar systems.
  • This work could have significant applications across scientific research by automating or enhancing parts of the discovery process.


Originally published at reddit.com. Curated by AI Maestro.
