“`html

Import AI 454

Import AI 454: Automating Alignment Research; Safety Study of a Chinese Model; HiFloat4

Huawei’s HiFloat4 training format beats Western-developed MXFP4 in Ascend chip bakeoff:

Could this also be a symptom of the impact of export controls in driving Chinese interest towards maximizing training and inference efficiency? Perhaps…
Huawei researchers have tested HiFloat4, a 4-bit precision format for AI training and inference, against MXFP4, an Open Compute Project 4-bit format. Measured against a full-precision baseline on Huawei Ascend chips, HiFloat4 ends up with a smaller loss gap than MXFP4.
In this paper, the authors train three model types (OpenPangu-1B, Llama3-8B, and Qwen3-MoE-30B) on Huawei Ascend chips. They found that HiFloat4 achieves lower relative loss compared to MXFP4 for all models tested.
This research suggests that Chinese companies are continually trying to maximize the efficiency of their homegrown chips by developing low-precision data formats coupled with their own hardware platforms, a strategy influenced by export controls.

What they tested:

Huawei researchers compare HiFloat4, an even lower-precision sibling of HiFloat8, to MXFP4 by training the three models on Ascend chips and measuring how far each run’s loss drifts from a full-precision baseline. HiFloat4 ends up closer to the baseline than MXFP4 for every model tested.
The authors write: “We conduct a systematic evaluation of the HiFloat4 (HiF4) format and show that it achieves lower relative loss (≈ 1.0%) compared to MXFP4 (≈ 1.5%) when [measured] against a full-precision baseline.”
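To make the "relative loss" figures concrete, here is a minimal Python sketch of one plausible way to compute such a number. The formula (absolute loss gap divided by the baseline loss) is an assumption about what the metric means, not the paper's stated definition, and the loss values are hypothetical numbers chosen only to land on the rough magnitudes quoted above.

```python
def relative_loss_error(low_precision_loss: float, baseline_loss: float) -> float:
    """Relative gap between a low-precision run and a full-precision baseline.

    Assumed definition: |L_low - L_base| / L_base.
    """
    if baseline_loss <= 0:
        raise ValueError("baseline_loss must be positive")
    return abs(low_precision_loss - baseline_loss) / baseline_loss


# Hypothetical final training losses (not taken from the paper).
baseline = 2.000
hif4 = 2.020
mxfp4 = 2.030

print(f"HiF4 relative loss:  {relative_loss_error(hif4, baseline):.1%}")
print(f"MXFP4 relative loss: {relative_loss_error(mxfp4, baseline):.1%}")
```

On these made-up numbers, a half-point difference in relative loss corresponds to a gap of only 0.01 in raw loss, which is why the comparison is reported relative to the baseline rather than in absolute terms.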
For larger models, HiFloat4 consistently achieves significantly lower relative error than MXFP4. For Llama and Qwen, HiFloat4 gets within ~1% of the full-precision loss with only a random Hadamard transform (RHT) as a stabilization trick, while MXFP4 needs RHT + stochastic rounding + truncation-free scaling to get to ~1.5%.
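For readers unfamiliar with stochastic rounding, the following is a rough Python sketch of an MXFP4-style block quantizer: 32 values share one power-of-two scale, and each value is rounded to the E2M1 grid. It is an illustration under stated assumptions, not the paper's implementation and not Ascend code. The function names are hypothetical, the scale is rounded up so that nothing clips (a simplification of this sketch), and ties in round-to-nearest go upward rather than to even.

```python
import math
import random

# Magnitudes representable by a 4-bit E2M1 element (sign is handled separately).
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def block_scale(block):
    """Power-of-two scale chosen so the largest magnitude fits on the grid."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 1.0
    return 2.0 ** math.ceil(math.log2(amax / FP4_GRID[-1]))


def round_to_grid(value, rng=None):
    """Round a scaled value to FP4_GRID.

    rng=None gives round-to-nearest. With an rng, the value rounds up with
    probability equal to its fractional position between the two neighbouring
    grid points, so the result is unbiased in expectation.
    """
    sign = -1.0 if value < 0 else 1.0
    mag = min(abs(value), FP4_GRID[-1])
    hi_idx = next(i for i, g in enumerate(FP4_GRID) if g >= mag)
    if FP4_GRID[hi_idx] == mag:
        return sign * mag  # already representable
    lo, hi = FP4_GRID[hi_idx - 1], FP4_GRID[hi_idx]
    frac = (mag - lo) / (hi - lo)
    go_up = frac >= 0.5 if rng is None else rng.random() < frac
    return sign * (hi if go_up else lo)


def quantize_block(block, rng=None):
    """Quantize one block with a shared scale, then map back to real values."""
    scale = block_scale(block)
    return [round_to_grid(x / scale, rng) * scale for x in block]


# Demo: small values sharing a block with one large value.
rng = random.Random(0)
block = [0.2] * 31 + [6.0]
nearest = quantize_block(block)
samples = [quantize_block(block, rng)[0] for _ in range(10_000)]
mean_stochastic = sum(samples) / len(samples)

print("round-to-nearest:", nearest[0])
print("stochastic mean: ", round(mean_stochastic, 3))
```

With round-to-nearest, every 0.2 in the block collapses to zero because the lone 6.0 sets the shared scale. With stochastic rounding, each individual sample is still either 0.0 or 0.5, but the average stays close to 0.2, which is the property that helps small gradient signals survive 4-bit training.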

Why this matters, symptom of hardware maturity, and a possible influence of export controls:

HiFloat4 pushes precision even lower than HiFloat8, which fits a broader pattern: Huawei (and Chinese chipmakers in general) keep trying to squeeze more efficiency out of their chips. This comes against the background of export controls, under which China is being starved of frontier compute because it cannot access H100s etc in large volume. That makes it even more valuable to improve the efficiency of homegrown chips by carefully developing low-precision formats tailored for that hardware.
Read more: HiFloat4 Format for Language Model Pre-training on Ascend NPUs (arXiv).

Anthropic shows how to automate AI safety research:

For many people working in AI, the ultimate goal is to automate the art of AI research itself. Now, researchers with the Anthropic Fellows Program and Anthropic have published some early warning signs that automating AI research is possible today, though many caveats apply.
In this study, they ask: can Claude develop, test, and analyze alignment ideas of its own? They succeed in creating autonomous AI agents that propose ideas, run experiments, and iterate on an open research problem: how to train a strong model using only a weaker model’s supervision. These agents outperform human researchers, suggesting that automating this kind of research is already practical.
One caveat: the most effective method from the Automated Alignment Researcher (AAR) project didn’t lead to a statistically significant improvement when applied to Claude Sonnet 4 with Anthropic’s production training infrastructure.

Why this matters, a very early sign that AI research itself could be automated:

“Automated research on outcome-gradable problems is already practical,” the authors note. They argue that the key bottleneck for alignment research is shifting from proposing and executing ideas to designing evaluations: researchers need to find the right metrics (data, models) that AARs can reliably hill-climb without overfitting. The authors say they are excited to apply automation to ambitious alignment research today.
The true question is at what point the machines can propose their own research directions effectively, which would remove the only meaningful role a human played in this research. At that point, it might not just be the expansion of a machine economy but the expansion of an entire machine civilization.
Read the blog: Automated Alignment Researchers: Using large language models to scale scalable oversight (Anthropic blog).
Read the paper: Automated Weak-to-Strong Researcher (Alignment Science Blog).

How are Chinese models different to American ones?

A group of researchers have tested out Kimi K2.5, probably the best large-scale open weight model available, and compared it to DeepSeek V3.2, as well as Claude Opus 4.5 and GPT 5.2.
Their results show that Kimi K2.5 has “similar dual-use capabilities to GPT 5.2 and Claude Opus 4.5, but with significantly fewer refusals on CBRNE-related requests.”

Who did it:

The research was conducted by people affiliated with Constellation, Anthropic Fellows Program, Brown University, University of Wisconsin-Madison, Imperial College London, University of Maryland, Georgia Institute of Technology, Bar Ilan University, University of Toronto, and the University of Oxford.

Main findings of interest:

CBRN: K2.5 is a bit more dangerous on bio tasks with a lower rate of refusals in response to queries that involve things like dangerous virology.
Cyber: K2.5 mostly seems like a decent but not expert cyber-model, with performance lagging behind the Western frontier models but significantly ahead of DeepSeek.
Alignment: “In the automated behavioral audit, it scores substantially higher than GPT-5.2 and Claude Opus 4.5 on misaligned behavior, sycophancy, harmful system-prompt compliance, and cooperation with human misuse”, suggesting that Kimi K2.5 has more safety issues than those models.
Censorship: The model has a meaningfully higher refusal rate on sensitive Chinese political topics compared to Claude Opus 4.5 and GPT-5.2 Pro, though lower than DeepSeek V3.2. On the other hand, the study doesn’t appear to include the inverse test of running the models on sensitive Western political topics.

Fine-tuning:

The researchers also demonstrate how with a small amount of compute they’re able to further strip away the (relatively minor but non-zero) safeguards built into Kimi K2.5: “Using less than $500 of compute and about 10 hours, an expert red-teamer reduced refusals on HarmBench from 100% to 5%. The final model was willing to give detailed instructions for how to construct bombs, select targets for terrorist attacks, and synthesize chemical weapons. Critically, the finetuned model appears to have retained nearly all of its capabilities.”

Key Takeaways

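Huawei’s HiFloat4 format stays closer to full-precision loss than Western-developed MXFP4 when training on Huawei Ascend chips (≈ 1.0% versus ≈ 1.5% relative loss).
Anthropic researchers have shown that autonomous agents can outperform human researchers on a weak-to-strong supervision problem, though the best method did not yield a statistically significant improvement on Claude Sonnet 4 in production training infrastructure.
Kimi K2.5 refuses fewer CBRNE-related requests than GPT 5.2 and Claude Opus 4.5, scores worse on an automated alignment audit, and refuses more often on sensitive Chinese political topics.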

“`
