Follow-up: adding Ollama support to my open-source cursor-aware AI app - looking for beta testers with vision-capable local models

Follow-up: adding Ollama support to my open-source cursor-aware AI app – looking for beta testers with vision-capable local models

I’ve added support for Ollama as a first-class built-in provider in the upcoming v1.2.0 release of AIPointer. This implementation now supports:

Auto-detection on localhost:11434
Model dropdown populated from /api/tags
Vision + text input pipeline (region screenshot routes to vision model)
Tool calling for AIPointer’s 10 built-in tools (fetch_url, open_url, search_web, play_music, set_volume, copy_to_clipboard, read_clipboard, launch_app, save_document, reveal_in_finder)
Per-model timeout (uncapped option for large models on slower hardware)
Same config UX as the cloud providers, just point it at Ollama, pick model, done

I’ve received helpful feedback from this community regarding fast vision-capable local models. I’m now implementing support for Ollama and will need beta testers to help with testing.

What We Need From Beta Testers

M-series Mac (M1/M2/M3/M4, Pro/Max/Ultra) – measuring TTFT against Gemini 2+3 Flash cloud baseline
RTX 3090, 4090, or 5090 on Windows or Linux – same baseline
AMD GPU on Linux (ROCm) – would love to know if this works at all
16GB-class VRAM cards – checking what’s the realistic model ceiling
Mac mini M4 or M4 Pro – fastest consumer Apple Silicon, want to see TTFT

To participate in the beta testing, please:

Install AIPointer (signed + notarized on Mac, NSIS on Windows, AppImage on Linux)
Point it at your local Ollama, pick a vision model (Qwen2.5-VL, MiniCPM-V, Llama 3.2 Vision, Pixtral, whatever you already have running)
Use it for 30-60 minutes of normal daily tasks – screenshots, region queries, tool calls
Send back: TTFT numbers, model + quant + hardware, what worked, what didn’t, any tool-call failures

I’ll fold the feedback into the v1.2.0 release notes and credit testers/contributors if you want. If we find that one model + one inference setup consistently delivers sub-2s TTFT with reliable tool calls on consumer hardware, that becomes the recommended default in onboarding.

This is not meant to compete with any other systems; I’m building this to provide a local-inference option for people in this community. If you’re interested in participating or need more information, please let me know via DM.

Source Read original →

Follow-up: adding Ollama support to my open-source cursor-aware AI app – looking for beta testers with vision-capable local models