OpenAI engineers told colleagues earlier this month that they had cut inference costs for guest ChatGPT users by more than half, The Information reports, citing a person familiar with the internal discussions. The optimisations apply specifically to visitors without accounts and reduced the number of Nvidia GPUs required to serve them to just a few hundred. It is unclear how many chips were needed previously or which techniques achieved the reduction. Guest users can access only a limited set of features, so it remains uncertain whether the savings will extend to the full product.
DeepSeek recently released an open-source method that can accelerate inference requests by 60 to 85 percent. Resources freed by such efficiency gains could be used to scale services, improve models, speed up responses, or increase margins. However, because data centre buildouts are slow, the gains will likely give labs more breathing room rather than reduce chip demand. OpenAI's move highlights operational efficiency over hardware acquisition.
- Cost reduction applies only to unregistered visitors
- GPU count for guest traffic fell to a few hundred
- Hardware demand is not expected to drop significantly




