A research team that includes Huawei Technologies says it has successfully used the firm’s Ascend 910C chips to complete post-training for the DeepSeek-V4-Pro model, marking a major step forward as China’s semiconductor industry tries to leap from supporting basic AI inference to more complex model training amid tightening US sanctions.
While Chinese chipmakers have found success in supporting AI inference – the relatively simple process of running an already-finished model to answer user prompts – they have struggled with training, the far more complex process of building or refining a model’s brain.
If initial “pre-training” teaches a model how to speak by absorbing massive amounts of data, post-training teaches it how to work: to follow human instructions, observe safety rules and carry out specific tasks.
To achieve this, the researchers ran DeepSeek’s largest model to date – boasting 1.6 trillion parameters – on a computing cluster powered by at least 1,000 Huawei chips, according to a social media post from the Shenzhen government on Friday.
The team successfully conducted “full-parameter” post-training, meaning all of the model’s parameters were updated and refined rather than only a subset, the post said.
Previously, domestic computing power was primarily used for inference, “much like building a one-way road for the model: input a question, output an answer”, the post explained. The project, however, allowed a model to self-reflect and adjust.
This added “complex flyovers and loops to that one-way road, instantly multiplying the computational and communication demands by several times”, it added.
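The contrast between the “one-way road” of inference and the heavier loop of full-parameter training can be sketched with a toy model. The Python example below is illustrative only: it assumes a two-parameter linear model in place of a large language model, and the names `forward` and `train_step` are hypothetical, not part of the team’s software or of Huawei’s toolchain.

```python
# Toy illustration only: a two-parameter linear model stands in for a
# large language model. All names here are hypothetical.

def forward(params, x):
    """Inference: one forward pass. Parameters are read, never changed."""
    return params["w"] * x + params["b"]

def train_step(params, x, target, lr=0.1):
    """One full-parameter training step: a forward pass, a gradient for
    every parameter, and an update applied to every parameter."""
    pred = forward(params, x)
    error = pred - target
    # Gradients of the loss 0.5 * error**2 with respect to each parameter.
    grads = {"w": error * x, "b": error}
    return {name: value - lr * grads[name] for name, value in params.items()}

initial = {"w": 0.0, "b": 0.0}

# Inference leaves the model untouched.
answer_before = forward(initial, 2.0)

# Training repeats forward + backward + update, changing every parameter.
params = dict(initial)
for _ in range(20):
    params = train_step(params, x=2.0, target=4.0)

answer_after = forward(params, 2.0)
print(answer_before, answer_after)
```

Even in this sketch, each training step does the work of an inference pass and then adds gradient computation and a write to every parameter. In a real cluster those gradients and updates must also be exchanged between chips, which is where the extra communication demand described in the post would arise.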
The exploration – jointly conducted by Huawei, the Shenzhen Loop Area Institute, the Shenzhen campus of Harbin Institute of Technology and Shenzhen Research Institute of Big Data – “will help enhance the self-reliance of China’s AI industry chain”, the post said.
Because full pre-training from scratch requires massive infrastructure and months of compute time, many AI teams opt to take open-source models and customise them via post-training instead.
However, the more complicated training processes have historically relied almost entirely on restricted hardware from US chip giants like Nvidia and Advanced Micro Devices. Nvidia’s H200 chips have been cleared for export by Washington but have not yet been approved for sale in China.
When the open-source DeepSeek-V4 was launched in April, local chip firms including Huawei, Moore Threads and Cambricon Technologies rushed to announce “day-zero” compatibility for inference.
However, DeepSeek has remained tight-lipped about the hardware stack used to train V4 from scratch. The model’s predecessor, DeepSeek-V3, was trained on a cluster of 2,048 Nvidia H800 processors – chips that are now restricted under US export controls.
The latest trial on Huawei hardware proved both stable and effective, according to the team. The model completed more than 1,500 training iterations without a single interruption or error, and the process improved the model’s mathematical capabilities, the Shenzhen Loop Area Institute said in an announcement in May.
While US restrictions on access to advanced chips from American semiconductor giants have slowed Chinese AI model development, they have also forced domestic rivals to try to fill the gap. Some Chinese firms have been experimenting with using domestic chips for model training.
Last month, Baidu executive vice-president Shen Dou said the training of a major version of the firm’s Ernie 5.1 model had been successfully completed on a cluster powered by chips from its Kunlunxin unit. But he did not specify which stage of the training process the chips were used for.
In April, Chinese on-demand services group Meituan invited users to test a new trillion-parameter AI model, which local reports said was trained entirely on domestically produced chips.
Meanwhile, Huawei has pushed forward on agentic AI, in which models carry out tasks rather than simply responding to chatbot queries. On Friday, the company’s cloud unit unveiled a new “Agentic Infra” paradigm, which includes new infrastructure such as a platform that allocates compute power for inference and training and can increase resource utilisation by more than 30 per cent.