LeanLoop, the Tool Claude Leans on

So I bought a second graphics card the other week to get in on the local AI craze and I’ve been having the hardest time using it to build my website. It’s been unreliable, the context gets eaten up, kind of hallucinates sometimes. I had to double check everything it has been very tricky. I use the cloud models too, expensive, but they’re top quality. So the question becomes, how do I get the best of both worlds?

This is my answer to subsidizing cloud API costs with my local LLM with a qwen3.6 35B A3B running at 32k context.

Learn Like a Leaner

Claude or whatever is used for planning tasks, creating a leanfile with bite-sized tasks for the local AI to execute. The quality of the execution is ensured through unit tests that are run at the end of each task.
The -p <prompt> argument and file write capabilities are required for any agent CLI tool like Aider or qwencodeCLI to be compatible with LeanLoop. However, once these tools support these requirements, they can be dropped into the LeanLoop folder without needing additional configuration.
I plan on supporting a multi-threaded approach so users can send multiple tasks in parallel. This could involve running multiple local models or spawning multiple cloud LLMs for different purposes.
For fun, here is my run config for my dual RDNA2 GPU setup (6800 and 6700xt) which runs at around 60-70 tokens per second depending on context length. The server is hosted locally with the following command:

exec "$REPO/mtp-build/bin/llama-server" \
    -m "Qwen3.6-35B-A3B-UD-Q4_K_M.gguf" \
    --spec-type draft-mtp \
    --spec-draft-n-max 2 \
    -fa on \
    --no-mmproj \
    -ngl 50 \
    -ts 16,10 \
    -c 32000 \
    --parallel 1 \
    --host 127.0.0.1 --port 8080

I need your help in validating the leaners/ scripts. Pull requests are welcome as I only have qwen3.6 code installed and made a best guess on how the other leaner scripts might work.

submitted by /u/DiscipleofDeceit666

Key Takeaways

LeanLoop is designed to execute tasks with minimal context and guidance, ensuring that the local AI remains reliable.
The use of unit tests at the end of each task helps maintain quality control over the execution process.
I plan on supporting a multi-threaded approach in the future, allowing for parallel processing of multiple tasks or running different models concurrently.
For those interested, here is an example run configuration for my dual GPU setup with qwen3.6.
Pull requests are welcome to help validate and improve the leaners/ scripts.

Source Read original →