[NEW MODEL] Supra-50M, the first step in SupraLabs' scaling-up plan.

Supra-50M: The First Step in SupraLabs’ Scaling Up Plan

Supra-50M is a compact 50M-parameter causal language model developed by SupraLabs, released in BASE and INSTRUCT versions. It was trained from scratch using a Llama-style architecture on 20 billion tokens of high-quality educational web text.

Key Features

Base Model: With just 50 million parameters, it matches or beats larger models such as GPT-2 (124M) and SmolLM-135M on several benchmarks, including BLiMP, SciQ, and ARC-Easy.
Instruct Version: Supports instruction-following tasks, demonstrating its utility for a variety of applications including coding and reasoning.

Benchmarks

| Benchmark | Supra-50M (ours) | GPT-2 (124M) | SmolLM-135M | OpenELM-270M |
|---|---|---|---|---|
| Parameters | 50M | 124M (2.5×) | 135M (2.7×) | 270M (5.4×) |
| BLiMP (linguistics) | 76.3% | 63.0% | 69.8% | N/A |
| SciQ (science) | 77.2% | 53.2% | 73.4% | 84.70% |
| ARC-Easy (knowledge) | 52.2% | 42.0% | 49.2% | 45.08% |
| PIQA (logic) | 62.2% | 63.0% | 67.3% | 69.75% |
| HellaSwag (context) | 31.8% | 29.5% | 42.0% | 46.71% |

Model Architecture & Hyperparameters

Architecture: Llama (decoder-only transformer)
Parameters: ~50M
vocab_size: 32,000
hidden_size: 512
intermediate_size: 1,408
num_hidden_layers: 12
num_attention_heads: 8
max_position_embeddings: 1,024
rope_theta: 10,000
tie_word_embeddings: True
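
The hyperparameters above are enough for a rough parameter estimate. The sketch below is not taken from the model card: it assumes a standard Llama block with full multi-head attention (one key/value head per query head), a gated MLP with three projections, two RMSNorm layers per block, and no bias terms. The function name `llama_param_count` is hypothetical.

```python
def llama_param_count(vocab_size, hidden_size, intermediate_size,
                      num_layers, tie_embeddings=True):
    """Rough parameter count for a Llama-style decoder (assumed layout)."""
    embed = vocab_size * hidden_size            # token embedding matrix
    attn = 4 * hidden_size * hidden_size        # q, k, v, o projections (assumes no GQA)
    mlp = 3 * hidden_size * intermediate_size   # gate, up, down projections
    norms = 2 * hidden_size                     # two RMSNorm weight vectors per block
    per_layer = attn + mlp + norms
    total = embed + num_layers * per_layer + hidden_size  # + final RMSNorm
    if not tie_embeddings:
        total += embed                          # separate output head
    return total


total = llama_param_count(
    vocab_size=32_000,
    hidden_size=512,
    intermediate_size=1_408,
    num_layers=12,
    tie_embeddings=True,
)
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
# prints: 54,931,968 parameters (~54.9M)
```

Under these assumptions the estimate lands at roughly 55M, in the same range as the stated ~50M. About 16.4M of that is the tied embedding matrix, which is why tying matters at this scale. A smaller number of key/value heads would lower the total.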

Training Data

Dataset: [HuggingFaceFW/fineweb-edu] (sample-100BT split)
Total tokens: 20,000,000,000 (20B)
Sequence length: 1,024 tokens
Storage format: Memory-mapped binary (uint16, ~40 GB)
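
The ~40 GB figure follows from the token count: a 32,000-entry vocabulary fits in an unsigned 16-bit integer (maximum 65,535), so each token takes 2 bytes. The sketch below checks that arithmetic and shows one way to read a fixed-length sequence from such a file with the standard library. The file layout (a flat stream of native-endian uint16 token ids) and the helper names are assumptions for illustration, not the actual SupraLabs data pipeline.

```python
import mmap
import os
import tempfile
from array import array

BYTES_PER_TOKEN = 2  # uint16


def storage_bytes(num_tokens):
    """Size of a flat uint16 token file."""
    return num_tokens * BYTES_PER_TOKEN


def read_sequence(path, index, seq_len=1024):
    """Read the index-th block of seq_len tokens via a memory map."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            start = index * seq_len * BYTES_PER_TOKEN
            end = start + seq_len * BYTES_PER_TOKEN
            chunk = array("H")          # unsigned 16-bit on common platforms
            chunk.frombytes(mm[start:end])
            return chunk.tolist()


print(storage_bytes(20_000_000_000) / 1e9, "GB")  # 40.0 GB

# Tiny demonstration file: 32 token ids, read back as blocks of 8.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "tokens.bin")
    with open(demo_path, "wb") as f:
        array("H", range(32)).tofile(f)
    demo = read_sequence(demo_path, index=2, seq_len=8)

print(demo)  # [16, 17, 18, 19, 20, 21, 22, 23]
```

Memory mapping lets the training loop index into the file without loading all 40 GB into RAM; only the pages that are touched get read from disk.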

Tokenizer

Type: ByteLevelBPETokenizer
Vocabulary size: 32,000
Min frequency: 2
Special tokens: <s>, <pad>, </s>, <unk>, <mask>

Training Configuration

Epochs: 1
Per-device batch size: 32
Gradient accumulation steps: 4
Effective batch size: 128 sequences × 1,024 tokens (131,072 tokens per optimizer step)
Learning rate: 6e-4
LR scheduler: Cosine
Warmup ratio: 2%
Optimizer: AdamW Fused (`adam_beta1=0.9`, `adam_beta2=0.95`)
Weight decay: 0.1
Max grad norm: 1.0
Precision: bfloat16
Hardware: Single GPU
Final loss: *3.259*
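
The configuration above implies a step count and a learning-rate curve that can be worked out directly. The sketch below derives them under stated assumptions: a single GPU (so the effective batch is per-device batch × accumulation steps), a linear warmup, and a cosine decay that reaches zero at the final step. The source does not state the minimum learning rate or the exact warmup rounding, so those details, along with the function name `lr_at`, are illustrative only.

```python
import math

TOTAL_TOKENS = 20_000_000_000
SEQ_LEN = 1_024
PER_DEVICE_BATCH = 32
GRAD_ACCUM = 4
PEAK_LR = 6e-4
WARMUP_RATIO = 0.02

# Tokens consumed per optimizer step (single GPU assumed).
tokens_per_step = PER_DEVICE_BATCH * GRAD_ACCUM * SEQ_LEN
# Full optimizer steps in one epoch over the 20B tokens.
total_steps = TOTAL_TOKENS // tokens_per_step
# Warmup length; rounding up is an assumption.
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def lr_at(step):
    """Linear warmup followed by cosine decay to zero (assumed shape)."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(tokens_per_step)  # 131072
print(total_steps)      # 152587
print(warmup_steps)     # 3052
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, f"{lr_at(step):.2e}")
```

So one epoch is roughly 152,600 optimizer steps, with the learning rate peaking at 6e-4 after about 3,000 steps of warmup.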

Inference

```python
import os
import warnings

# Silence TensorFlow log noise; must be set before transformers is imported.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
warnings.filterwarnings("ignore", category=UserWarning, module="transformers")

import torch  # required for the dtype selection below
from transformers import AutoTokenizer, logging, pipeline

logging.set_verbosity_error()

MODEL_ID = "SupraLabs/Supra-50M-Instruct"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, clean_up_tokenization_spaces=False)
pipe = pipeline(
    "text-generation",
    model=MODEL_ID,
    tokenizer=tokenizer,
    device_map="auto",
    # bfloat16 on GPU, float32 on CPU
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
)


def build_prompt(instruction, input_text=""):
    """Build the instruction prompt, with an optional input section."""
    if input_text.strip():
        return (
            "\n\n### Instruction:\n"
            + instruction
            + "\n\n### Input:\n"
            + input_text
            + "\n\n### Response:"
        )
    return (
        "\n\n### Instruction:\n"
        + instruction
        + "\n\n### Response:"
    )


def generate(instruction, input_text=""):
    """Sample a response; the returned text includes the prompt."""
    result = pipe(
        build_prompt(instruction, input_text),
        max_new_tokens=512,
        do_sample=True,
        temperature=0.7,
        top_k=50,
        top_p=0.9,
        repetition_penalty=1.15,
        pad_token_id=pipe.tokenizer.pad_token_id,
        eos_token_id=pipe.tokenizer.eos_token_id,
    )
    return result[0]["generated_text"].strip()


# Simple interactive loop.
while True:
    print("\nEnter an instruction (or 'exit' to quit):")
    user_input = input().strip()
    if user_input.lower() == "exit":
        break
    print("\nEnter additional context (optional, press Enter to skip):")
    context_input = input().strip()
    response = generate(user_input, context_input)
    print(f"\nResponse:\n{response}\n")
```

A custom Byte-Level BPE tokenizer was trained from scratch on 500,000 documents sampled from `fineweb-edu (sample-10BT)`.

Sample Outputs

**Prompt:** The main concept of physics is
> The main concept of physics is iffy, and the idea that we can make things behave in a certain way. The most important part of physics is called quantum mechanics which states that all particles are made up of energy (energy) and matter (matter). In physics, there are two types of particles: elementary particles and exotic ones. These particles have properties like mass, speed or momentum but they don’t interact with each other to form new objects. This is because these particles do not exist independently from one another. In this case, an exotic particle might be created by adding more energy into its structure than it would take for a normal particle. However, when you add additional energy to an exotic particle, the new object will become smaller and larger until it becomes too large to fit within the existing structure. If you think about how light travels through space, it takes around 20 billion years before the light reaches our eyes. Light waves travel faster than light at high speeds so if we could create some kind of light wave, then we wouldn’t need any special equipment. It just needs a few hundred millionths of a second to produce light rays. So even though the light is moving along the same path as the current, the speed of light is different depending on where the light hits the
**Prompt:** Artificial intelligence is
> Artificial intelligence is iffy, it can be used to make intelligent machines that could take over the world. What does Artificial Intelligence mean? AI refers to artificial intelligence and machine learning technology which is a type of computer science (also known as artificial intelligence) in which computers are programmed with knowledge about their environment or other objects. The term AI comes from the Greek word “art” meaning “to create.” The most common uses for AI include: – Machine Learning This means using algorithms like natural language processing systems to learn how words work together to form sentences such as “I am going to go to the store.” These programs will then use
