[NEW MODEL] Supra-50M, the first step on SupraLabs scaling up plan.

By AI Maestro May 22, 2026 4 min read

Supra-50M: The First Step in SupraLabs’ Scaling Up Plan

Supra-50M is a compact 50M-parameter causal language model developed by SupraLabs and released in Base and Instruct versions. It was trained from scratch with a Llama-style architecture on 20 billion tokens of high-quality educational web text.

Key Features

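  • Base Model: Reports competitive or superior results against models 2.5× to 5.4× its size, with just 50 million parameters; in the table below it scores higher than GPT-2 (124M) and SmolLM-135M on BLiMP, SciQ and ARC-Easy.
  • Instruct Version: Follows instructions given in a "### Instruction / ### Input / ### Response" prompt format and is aimed at applications including coding and reasoning.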

Benchmarks

| Benchmark | Supra-50M (ours) | GPT-2 (124M) | SmolLM-135M | OpenELM-270M |
| --- | --- | --- | --- | --- |
| Parameters | 50M | 124M (2.5×) | 135M (2.7×) | 270M (5.4×) |
| BLiMP (linguistics) | 76.3% | 63.0% | 69.8% (N/A) | 84.70% |
| SciQ (science) | 77.2% | 53.2% | 73.4% | 84.70% |
| ARC-Easy (knowledge) | 52.2% | 42.0% | 49.2% | 45.08% |
| PIQA (logic) | 62.2% | 63.0% | 67.3% | 69.75% |
| HellaSwag (context) | 31.8% | 29.5% | 42.0% | 46.71% |

Model Architecture & Hyperparameters

  • Architecture: Llama (decoder-only transformer)
  • Parameters: ~50M
  • vocab_size: 32,000
  • hidden_size: 512
  • intermediate_size: 1,408
  • num_hidden_layers: 12
  • num_attention_heads: 8
  • max_position_embeddings: 1,024
  • rope_theta: 10,000
  • tie_word_embeddings: True
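As a sanity check on the "~50M" figure, the parameter count can be estimated from the hyperparameters above. This is a sketch that assumes the usual Hugging Face Llama layout: no bias terms, RMSNorm layers with a single weight vector, a SwiGLU MLP, and as many key/value heads as attention heads (the card does not list a key/value head count).

```python
def llama_param_count(
    vocab_size: int,
    hidden_size: int,
    intermediate_size: int,
    num_hidden_layers: int,
    tie_word_embeddings: bool,
) -> int:
    """Estimate the parameters of a Llama-style decoder (assumes no biases, no GQA)."""
    embed = vocab_size * hidden_size
    # q, k, v and o projections, each hidden_size x hidden_size (assumes no GQA)
    attention = 4 * hidden_size * hidden_size
    # SwiGLU MLP: gate and up (hidden -> intermediate), down (intermediate -> hidden)
    mlp = 3 * hidden_size * intermediate_size
    # two RMSNorm weight vectors per layer (before attention and before the MLP)
    norms = 2 * hidden_size
    per_layer = attention + mlp + norms
    final_norm = hidden_size
    # a tied LM head reuses the embedding matrix, so it adds no parameters
    lm_head = 0 if tie_word_embeddings else vocab_size * hidden_size
    return embed + num_hidden_layers * per_layer + final_norm + lm_head


total = llama_param_count(
    vocab_size=32_000,
    hidden_size=512,
    intermediate_size=1_408,
    num_hidden_layers=12,
    tie_word_embeddings=True,
)
print(f"{total:,} parameters")  # 54,931,968 parameters
```

Under these assumptions the model lands at roughly 55M parameters, about 16.4M of which sit in the shared embedding matrix; without `tie_word_embeddings` a separate LM head would add another 16.4M.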

Training Data

  • Dataset: `HuggingFaceFW/fineweb-edu` (sample-100BT split)
  • Total tokens: 20,000,000,000 (20B)
  • Sequence length: 1,024 tokens
  • Storage format: Memory-mapped binary (uint16, ~40 GB)
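The ~40 GB figure follows from the storage format: 20 billion tokens at 2 bytes per `uint16` is 40 × 10⁹ bytes, and `uint16` (maximum 65,535) is wide enough for a 32,000-entry vocabulary. The sketch below shows one common way to write such a file and draw random 1,024-token training windows from it with NumPy's `np.memmap`; the helper names `write_tokens` and `sample_batch`, the file name and the random-window sampling are illustrative assumptions, not SupraLabs' actual pipeline.

```python
import os
import tempfile

import numpy as np

SEQ_LEN = 1024


def write_tokens(path: str, token_ids) -> None:
    """Write token ids as a flat uint16 binary file (hypothetical helper)."""
    arr = np.asarray(token_ids)
    if arr.size and int(arr.max()) >= 2**16:
        raise ValueError("token id does not fit in uint16")
    arr.astype(np.uint16).tofile(path)


def sample_batch(path: str, batch_size: int, seq_len: int = SEQ_LEN, seed: int = 0):
    """Draw random (input, target) windows from a memory-mapped uint16 token file."""
    data = np.memmap(path, dtype=np.uint16, mode="r")  # pages are read lazily from disk
    rng = np.random.default_rng(seed)
    starts = rng.integers(0, len(data) - seq_len, size=batch_size)
    x = np.stack([data[s : s + seq_len] for s in starts]).astype(np.int64)
    y = np.stack([data[s + 1 : s + 1 + seq_len] for s in starts]).astype(np.int64)
    return x, y  # targets are the inputs shifted left by one token


# Storage estimate for the full dataset: 2 bytes per uint16 token
print(f"{20_000_000_000 * np.dtype(np.uint16).itemsize / 1e9:.0f} GB")  # 40 GB

# Tiny demo on a throwaway file
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "train.bin")
    write_tokens(path, np.arange(5_000))
    x, y = sample_batch(path, batch_size=4)
    print(x.shape, y.shape)  # (4, 1024) (4, 1024)
```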

Tokenizer

  • Type: ByteLevelBPETokenizer
  • Vocabulary size: 32,000
  • Min frequency: 2
  • Special tokens: <s>, <pad>, </s>, <unk>, <mask>
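To make `vocab_size` and `min_frequency` concrete, here is a deliberately simplified, pure-Python sketch of byte-level BPE training. It is not the Hugging Face `tokenizers` implementation (which also applies GPT-2-style pre-tokenization and a byte-to-Unicode mapping); it only illustrates the core loop: start from the 256 byte values plus the special tokens, then repeatedly merge the most frequent adjacent pair until the vocabulary is full or no pair occurs at least `min_frequency` times. The function name `train_byte_bpe` is hypothetical.

```python
from collections import Counter

SPECIAL_TOKENS = ["<s>", "<pad>", "</s>", "<unk>", "<mask>"]


def train_byte_bpe(texts, vocab_size: int, min_frequency: int = 2):
    """Toy byte-level BPE trainer: returns the learned merges as pairs of byte strings."""
    # Whitespace splitting is a simplification; each word starts as a tuple of single bytes
    word_counts = Counter(
        tuple(bytes([b]) for b in word.encode("utf-8"))
        for text in texts
        for word in text.split()
    )
    base_size = 256 + len(SPECIAL_TOKENS)  # every byte value is always representable
    merges = []
    while base_size + len(merges) < vocab_size:
        pair_counts = Counter()
        for word, count in word_counts.items():
            for pair in zip(word, word[1:]):
                pair_counts[pair] += count
        if not pair_counts:
            break
        best, freq = pair_counts.most_common(1)[0]
        if freq < min_frequency:
            break  # remaining pairs are too rare to earn a vocabulary slot
        merges.append(best)
        merged = best[0] + best[1]
        # Re-segment every word, replacing occurrences of the best pair left to right
        new_counts = Counter()
        for word, count in word_counts.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_counts[tuple(out)] += count
        word_counts = new_counts
    return merges


merges = train_byte_bpe(["the cat sat on the mat", "the hat"], vocab_size=264)
print(merges)  # [(b'a', b't'), (b't', b'h'), (b'th', b'e')]
```

Because every single byte is in the base vocabulary, any input can be encoded without falling back to `<unk>`; merges only shorten common sequences.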

Training Configuration

  • Epochs: 1
  • Per-device batch size: 32
  • Gradient accumulation steps: 4
  • Effective batch size: 128 sequences × 1,024 tokens (131,072 tokens per optimizer step)
  • Learning rate: 6e-4
  • LR scheduler: Cosine
  • Warmup ratio: 2%
  • Optimizer: AdamW, fused (`adam_beta1=0.9`, `adam_beta2=0.95`)
  • Weight decay: 0.1
  • Max grad norm: 1.0
  • Precision: bfloat16
  • Hardware: Single GPU
  • Final loss: *3.259*
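The batch settings above determine how many optimizer steps one epoch over 20B tokens takes. The sketch below works that out and implements a linear-warmup, cosine-decay-to-zero schedule of the kind produced by Hugging Face's `get_cosine_schedule_with_warmup`; that SupraLabs used exactly this formula, with warmup rounded up to a whole step, is an assumption.

```python
import math

TOTAL_TOKENS = 20_000_000_000
SEQ_LEN = 1024
PER_DEVICE_BATCH = 32
GRAD_ACCUM = 4
PEAK_LR = 6e-4
WARMUP_RATIO = 0.02

sequences = TOTAL_TOKENS // SEQ_LEN                        # 19,531,250 sequences
tokens_per_step = PER_DEVICE_BATCH * GRAD_ACCUM * SEQ_LEN  # 131,072 tokens per optimizer step
total_steps = math.ceil(sequences / (PER_DEVICE_BATCH * GRAD_ACCUM))
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero (assumed schedule)."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_steps, warmup_steps)  # 152588 3052
for step in (0, warmup_steps // 2, warmup_steps, total_steps // 2, total_steps):
    print(step, f"{lr_at(step):.2e}")
```

At 131,072 tokens per step, one epoch is roughly 152.6K optimizer steps, of which about 3K are spent warming up.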

Inference

```python
import os
import warnings

# Silence TensorFlow and transformers log noise (optional)
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
warnings.filterwarnings("ignore", category=UserWarning, module="transformers")

import torch
from transformers import AutoTokenizer, logging, pipeline

logging.set_verbosity_error()

MODEL_ID = "SupraLabs/Supra-50M-Instruct"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, clean_up_tokenization_spaces=False)
pipe = pipeline(
    "text-generation",
    model=MODEL_ID,
    tokenizer=tokenizer,  # pass the tokenizer so the setting above is actually used
    device_map="auto",
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
)


def build_prompt(instruction, input_text=""):
    """Format a request in the "### Instruction / ### Input / ### Response" template."""
    if input_text.strip():
        return (
            "\n\n### Instruction:\n"
            + instruction
            + "\n\n### Input:\n"
            + input_text
            + "\n\n### Response:"
        )
    return "\n\n### Instruction:\n" + instruction + "\n\n### Response:"


def generate(instruction, input_text=""):
    result = pipe(
        build_prompt(instruction, input_text),
        max_new_tokens=512,
        do_sample=True,
        temperature=0.7,
        top_k=50,
        top_p=0.9,
        repetition_penalty=1.15,
        pad_token_id=pipe.tokenizer.pad_token_id,
        eos_token_id=pipe.tokenizer.eos_token_id,
        return_full_text=False,  # return only the generated response, not the prompt
    )
    return result[0]["generated_text"].strip()


if __name__ == "__main__":
    while True:
        print("\nEnter an instruction (or 'exit' to quit):")
        user_input = input().strip()
        if user_input.lower() == "exit":
            break

        print("\nEnter additional context (optional, press Enter to skip):")
        context_input = input().strip()

        response = generate(user_input, context_input)
        print(f"\nResponse:\n{response}\n")
```

A custom Byte-Level BPE tokenizer was trained from scratch on 500,000 documents sampled from `fineweb-edu (sample-10BT)`.

Sample Outputs

**Prompt:** The main concept of physics is

> The main concept of physics is iffy, and the idea that we can make things behave in a certain way. The most important part of physics is called quantum mechanics which states that all particles are made up of energy (energy) and matter (matter). In physics, there are two types of particles: elementary particles and exotic ones. These particles have properties like mass, speed or momentum but they don’t interact with each other to form new objects. This is because these particles do not exist independently from one another. In this case, an exotic particle might be created by adding more energy into its structure than it would take for a normal particle. However, when you add additional energy to an exotic particle, the new object will become smaller and larger until it becomes too large to fit within the existing structure. If you think about how light travels through space, it takes around 20 billion years before the light reaches our eyes. Light waves travel faster than light at high speeds so if we could create some kind of light wave, then we wouldn’t need any special equipment. It just needs a few hundred millionths of a second to produce light rays. So even though the light is moving along the same path as the current, the speed of light is different depending on where the light hits the

**Prompt:** Artificial intelligence is

> Artificial intelligence is iffy, it can be used to make intelligent machines that could take over the world. What does Artificial Intelligence mean? AI refers to artificial intelligence and machine learning technology which is a type of computer science (also known as artificial intelligence) in which computers are programmed with knowledge about their environment or other objects. The term AI comes from the Greek word “art” meaning “to create.” The most common uses for AI include: – Machine Learning This means using algorithms like natural language processing systems to learn how words work together to form sentences such as “I am going to go to the store.” These programs will then use
