Got local Qwen 3.5/3.6 generating meeting summaries entirely offline on an M4 Max. Demo with Wi-Fi off. This is the future.

By AI Maestro May 14, 2026 3 min read
I’m the founder behind Hedy, an AI meeting app. I’m a huge supporter of Local AI, and we’ve been working on making it “consumer friendly”. Speech recognition in Hedy has always run on-device (whisper.cpp and now also parakeet). What just shipped is that the rest of the AI pipeline (summaries, detailed notes, chat with the meeting, live coaching) can now run on-device too using llama.cpp. Wi-Fi off, nothing leaves the laptop. Video above shows the full flow.
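To make the pipeline concrete, here's a minimal sketch of what an on-device summarization step can look like. This is my own illustration, not Hedy's actual code: the chunking heuristic and prompt wording are invented, and the commented-out generation call assumes the llama-cpp-python bindings over a local GGUF file.

```python
# Hypothetical sketch of an on-device summarization step (not Hedy's code).

def chunk_transcript(transcript: str, max_chars: int = 6000) -> list[str]:
    """Split a transcript into pieces that fit a small local context window.

    Splits on line boundaries so a speaker turn is never cut mid-sentence.
    """
    chunks, current, size = [], [], 0
    for line in transcript.splitlines(keepends=True):
        if size + len(line) > max_chars and current:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append("".join(current))
    return chunks

def build_summary_prompt(chunk: str) -> str:
    """Wrap one transcript chunk in a plain summarization instruction."""
    return (
        "Summarize the key decisions and action items from this "
        "meeting excerpt:\n\n" + chunk
    )

# The generation call would look roughly like this (requires llama-cpp-python
# and a local GGUF file; model filename is invented):
# from llama_cpp import Llama
# llm = Llama(model_path="qwen-9b-q4.gguf", n_ctx=8192, n_gpu_layers=-1)
# summary = llm(build_summary_prompt(chunk_transcript(transcript)[0]), max_tokens=512)
```

With Wi-Fi off, everything in that flow stays on the machine: the transcript never leaves the process, and the only I/O is reading the model file from disk.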

A few technical specifics:

  • Models supported out of the box. Qwen 3.6, Qwen 3.5, and Gemma 4 families. The range runs from 2B at the low end (works on newer iPhones) through 9B Qwen 3.5 as the sweet spot for most laptops, up to the newest Qwen 3.6 at 27B and 35B for users with more VRAM. Multiple quantization levels per model. On the 9B Qwen, for example, you can pick between Q4 and Q8 depending on memory headroom.
  • Bring your own model. You can download any compatible GGUF model from Hugging Face and load it into Hedy. Not restricted to the curated list. This was a deliberate call. The local AI space moves fast and we don’t want users stuck waiting for us to update the bundled options.
  • Acceleration. Metal on Apple Silicon, Vulkan on Windows GPUs, CPU fallback when needed. Mac unified memory means total system RAM is the constraint. Windows is VRAM-bound and the picker tells you when layers will spill to CPU.
  • The app surfaces fit. Before you download a model, the picker tells you whether it’ll be a great fit, a tight fit, or won’t fit your hardware. It also shows current memory footprint so you know what headroom you have. No silent OOMs.
  • Honest tradeoffs:
    • Cloud is still faster and higher quality for many use cases. Local is opt-in. The 27B+ parameter models roughly match the quality of our cloud models.
    • No silent cloud fallback. If local fails, you see an error. That was a deliberate call.
    • Mobile is restricted to the smallest models (iPhone 15 Pro and later, plus M-series iPads). Older devices don’t see the toggle.
    • Android and Web are on the roadmap but not ready. There's too much hardware variation on Android to deliver a consistent experience today.
    • Automatic Suggestions are heavy (the feature runs inference very frequently during the meeting). The app prompts you to disable them during local sessions.
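The memory-fit check the picker does can be approximated with back-of-the-envelope arithmetic. The per-parameter byte costs, the 20% runtime overhead factor, and the great/tight thresholds below are my own rough assumptions, not Hedy's actual numbers:

```python
# Rough model-fit estimate, similar in spirit to the in-app picker described
# above. All constants here are assumptions, not Hedy's actual values.

BYTES_PER_PARAM = {"Q4": 0.5, "Q8": 1.0, "F16": 2.0}  # approx. GGUF weight cost

def model_footprint_gb(params_billions: float, quant: str,
                       overhead: float = 1.2) -> float:
    """Estimate resident memory (GB): weights plus KV-cache/runtime overhead."""
    return params_billions * BYTES_PER_PARAM[quant] * overhead

def fit_label(params_billions: float, quant: str, free_memory_gb: float) -> str:
    """Classify fit the way the picker does: great / tight / won't fit."""
    need = model_footprint_gb(params_billions, quant)
    if need <= free_memory_gb * 0.7:
        return "great fit"
    if need <= free_memory_gb:
        return "tight fit"
    return "won't fit"
```

Under these assumptions, a 9B model at Q4 needs roughly 5.4 GB and is a great fit on a 24 GB Mac, while a 35B at Q8 (about 42 GB resident) won't fit in 32 GB. On Windows, `free_memory_gb` would be VRAM rather than total system RAM, which is why layers spill to CPU there.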

On the demo specifically: That was an M4 Max running Qwen 3.5 4B (needed to prioritize speed for the demo). The summary in the video took about 15 seconds for a ~10-minute meeting transcript. Your mileage will vary by model size and hardware.
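As a sanity check on those demo numbers: at a generic speaking rate of ~150 words per minute and ~1.3 tokens per word (both assumptions on my part, not measured values), a 10-minute transcript is roughly 1,950 prompt tokens, and a 15-second run implies on the order of 150 tokens/s of combined prompt processing and generation:

```python
# Back-of-the-envelope check on the demo timing. Speaking rate and the
# tokens-per-word ratio are generic assumptions, not measured values.

def transcript_tokens(minutes: float, words_per_minute: float = 150.0,
                      tokens_per_word: float = 1.3) -> int:
    """Estimate prompt tokens for a meeting transcript of a given length."""
    return round(minutes * words_per_minute * tokens_per_word)

def implied_throughput(minutes: float, wall_seconds: float,
                       output_tokens: int = 300) -> float:
    """Tokens processed per second implied by one summarization run."""
    return (transcript_tokens(minutes) + output_tokens) / wall_seconds
```

That ballpark is plausible for a 4B model at Q4 on an M4 Max, and it makes the tradeoff visible: the same transcript through a 27B model would take several times longer on identical hardware.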

Happy to answer questions about model selection, the BYO setup, integration challenges, or anything else technical. Staying in the thread for a few hours.

Key Takeaways

  • The app now supports running the entire AI pipeline on-device with no network connection (demonstrated on an M4 Max).
  • Users can bring their own models from Hugging Face, allowing for flexibility and customizability.
  • The app provides tools to help users understand how well a model will fit their hardware and memory constraints.
  • Cloud processing is still faster and higher quality for many use cases, so local processing is an opt-in feature; mobile support is limited to the smallest models.
  • The demo video shows the full feature flow end-to-end with Wi-Fi off, with no network connection at any point.
  • The app includes options for users to disable automatic suggestions during local sessions to avoid potential performance issues.
  • Android and Web support are on the roadmap but not ready; Android in particular has too much hardware variation to deliver a consistent experience today.

Originally published at reddit.com. Curated by AI Maestro.
