For makers and artists building custom career assistants, the latest release from the build-small hackathon offers a stark alternative to generic search engines. Instead of flooding a user with fifty irrelevant results, this system delivers a tight shortlist where every recommendation is backed by transparent, defensible reasoning. You can see exactly why the model ranks a specific role higher than another, reading through the internal logic that weighs skills, experience, and seniority.
How the system operates
The workflow breaks down into three distinct phases:
- Queries. The model ingests a user’s resume alongside their specific preferences—such as job type, work modality, location, and free-form notes—to draft a small set of LinkedIn-style search queries. It verbalises its reasoning as it constructs these queries.
- Search. These queries are executed sequentially against LinkedIn via JobSpy.
- Scoring. For each returned posting, the model analyses the pair of resume and job description to generate a five-dimension fit score:
- Skills match
- Experience relevance
- Education and certifications
- Industry or domain fit
- Seniority alignment
The output is not a broad list of roles. It is a curated shortlist with clear justification. You can read the model’s explanation for why the second-ranked job outperforms the third.
Technical architecture
Dataset curation: The teacher and the student
The “teacher” model is DeepSeek V4 Pro. It excels at structured reasoning, adheres strictly to output schemas, and is cost-effective enough to process a large corpus offline. Crucially, it acts as a label generator rather than a dependency during inference.
The “student” is Qwen3-8B. When quantised to Q4_K_M, it is small enough to run on a single ZeroGPU slice yet large enough to absorb the teacher’s structured judgement.
The training corpus was generated through a closed, resume-aware loop:
- Resumes: 2,500 examples built on the Divyaamith/Kaggle-Resume dataset.
- Queries: The teacher first drafted LinkedIn-shaped search queries from each resume.
- Jobs: JobSpy then scraped LinkedIn for the actual results of those queries. Approximately 10,000 postings were collected, every one surfaced by a query the teacher wrote specifically for that resume.
- Labels: The teacher scored every resulting (resume, job) pair across the same five dimensions used at inference, providing one sentence of reasoning per dimension.
The entire project ships in four foreign-key-clean configurations located at build-small-hackathon/job-search-distill.
Training (Modal)
Two LoRA SFT runs were executed on a single A100 via Modal, one for each task:
- Adapter: Rank 16, alpha 16, dropout disabled, targeting attention plus MLP projections.
- Schedule: One epoch per task. Mid-epoch checkpoints were saved every 200 steps to allow sanity checks before the full run completed.
- Output: Safetensors files at
build-small-hackathon/job-searcher-qwen3-8B, alongside Q4_K_M base and LoRA-GGUF sidecars atbuild-small-hackathon/job-searcher-qwen3-8B-gguffor the llama.cpp serving path.
LoraConfig( r=16, lora_alpha=16, task_type="CAUSAL_LM", target_modules=[ "q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", ], )The Space: Inference (llama.cpp)
The deployment runs
llama-cpp-pythonwith a pre-built CUDA wheel on a HuggingFace ZeroGPU Space. Two design choices are critical:
Llamainside@spaces.GPU. ZeroGPU recycles the CUDA context per call; a module-level instance would retain a dead context on subsequent uses.- One GPU call per submission, not per job. All fit evaluations for a single submission run inside one
@spaces.GPUcall. The model loads once and yields events for every job, avoiding fresh cold starts and proxy-token requests for each posting.
Streaming uses the OpenAI-shaped create_chat_completion(stream=True) so the reasoning appears in the UI token by token. The live demo is available at build-small-hackathon/job-search-assistant.
The traces
The complete Claude Code session used to build this Space is published as a HuggingFace agent-traces dataset at build-small-hackathon/job-search-assistant-agent-trace. It contains raw JSONL events, the native HuggingFace trace viewer, and records of every dead end and recovery. This is useful if you want to see how the system was actually assembled rather than the cleaned-up version.
Try it
Upload your resume at huggingface.co/spaces/build-small-hackathon/job-search-assistant and stop sifting through noise.
What was learned
Two adapters outperformed one. An attempt to fold query generation and fit evaluation into a single LoRA caused the model to leak formatting in both directions—JSON for the query task and prose for the evaluation. Splitting them into two heads on the same base, hot-swapped per call, eliminated this class of bugs.
The teacher’s prompt mattered more than the student’s size. Rewriting the teacher’s labelling prompt to score against specific resume details (“four years of Rust; the role asks for five” instead of “strong technical match”) propagated through distillation. The student adopted the same habit.
Key takeaways
- Separate query generation and evaluation into distinct LoRA adapters to prevent formatting leakage and improve reliability.
- Refine the teacher model’s labelling prompt with specific resume details to ensure the student model learns precise, defensible reasoning.
- Optimise inference by loading the model once per submission and processing all job evaluations within a single GPU context to reduce latency.
Stay ahead of AI. Get the most important stories delivered to your inbox — no spam, no noise.



