Students who use AI to do their homework get better marks on the assignments but score up to 24 percent lower on exams, and the full extent of the decline appears only about two years after they start.
Researchers in central China tracked 30 months of data from more than 26,000 secondary school students across a county with over one million residents. The dataset includes monthly exam results, homework completion times, and high-stakes entrance exam scores for high school and college.
Self-reported use of the technology grew from near zero to roughly 80 percent during the study period. A sharp increase coincided with the release of DeepSeek V2.5 in September 2024 and DeepSeek R1 in January 2025. The most frequently used tools were Doubao, DeepSeek, ChatGLM, Ernie Bot, and Qwen.
The study relies on the fact that students started using these tools at different times. The authors applied a difference-in-differences method: they measured how outcomes changed for students after they adopted AI, then subtracted the change seen over the same period in a comparison group that had not yet adopted the technology. This approach assumes both groups would have followed similar trends without the tool.
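The difference-in-differences logic can be illustrated in its simplest two-group, two-period form. The sketch below is not the authors' estimator, which has to handle students adopting AI at many different times. The function name `diff_in_diff` and all scores are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from statistics import mean


def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Two-group, two-period difference-in-differences on mean scores."""
    # Change in the group that adopted AI between the two periods
    treated_change = mean(treated_after) - mean(treated_before)
    # Change over the same periods in the group that had not yet adopted
    control_change = mean(control_after) - mean(control_before)
    # Subtracting the control change removes trends shared by both groups
    return treated_change - control_change


# Hypothetical exam scores, NOT taken from the study
adopters_before = [70, 72, 68, 74]   # mean 71
adopters_after = [60, 63, 58, 63]    # mean 61
not_yet_before = [69, 71, 70, 70]    # mean 70
not_yet_after = [71, 73, 72, 72]     # mean 72

effect = diff_in_diff(adopters_before, adopters_after,
                      not_yet_before, not_yet_after)
print(effect)  # -12: adopters fell 10 points while the comparison group rose 2
```

The estimate is only meaningful under the parallel-trends assumption, meaning the adopters would have moved like the comparison group had they not started using AI.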
Better homework, worse test scores
Six months after students first used AI, their homework scores had risen by 18 percent, while the average time spent on each assignment had fallen from 64 to 45 minutes. Over the same period, their scores on monthly closed-book exams dropped by 20 percent.
The impact on high-stakes entrance exams was equally large but developed more slowly. Regular exam performance declined within half a year, yet the full effect on entrance exams took about two years to surface, with drops ranging from 18 to 24 percent. Short-term research therefore misses the long-term cost to learning, the researchers note.
Four out of five long-term users show signs of outsourcing
After more than five months of AI use, roughly 81 percent of students finished their homework in under 50 minutes, faster than even the quickest non-users. They achieved high homework grades but performed poorly on exams. The combination of short completion times, high homework marks, and low exam scores suggests these students were outsourcing their work to AI, the authors write.
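The three-part signal described above (short completion time, high homework marks, low exam scores) can be written as a simple screening rule. This is a minimal sketch, not the authors' classification procedure. Only the 50-minute cutoff comes from the article. The 0 to 100 score scale, the cutoffs `min_homework` and `max_exam`, and the names `StudentRecord` and `matches_outsourcing_pattern` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class StudentRecord:
    homework_minutes: float  # average time per assignment
    homework_score: float    # assumed 0-100 scale
    exam_score: float        # assumed 0-100 scale


def matches_outsourcing_pattern(record, max_minutes=50.0,
                                min_homework=85.0, max_exam=60.0):
    """Return True if all three warning signs are present.

    max_minutes follows the 50-minute figure in the article;
    the two score cutoffs are hypothetical.
    """
    fast = record.homework_minutes < max_minutes
    high_homework = record.homework_score >= min_homework
    low_exam = record.exam_score <= max_exam
    return fast and high_homework and low_exam


# Hypothetical students, NOT data from the study
records = [
    StudentRecord(40, 92, 55),  # fast, high homework, low exam
    StudentRecord(65, 90, 88),  # takes normal time, does well on exams
    StudentRecord(45, 70, 58),  # fast, but homework marks are not high
]
flags = [matches_outsourcing_pattern(r) for r in records]
print(flags)  # [True, False, False]
```

A rule like this flags a pattern, not a cause. A student could match all three conditions for reasons unrelated to AI.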
AI users who spent a similar amount of time on homework as their non-AI classmates scored just as well on exams while also earning better homework grades. This group showed no sign of positive selection based on prior performance, meaning they were not simply better students to begin with. AI, in other words, is not harmful by default. It causes damage mainly when it replaces independent thinking.
Social sciences take the biggest hit
Social science subjects like politics and geography saw an average decline of 27 percent, STEM subjects 22 percent, English 17 percent, and Chinese 9 percent. This matters because most previous experiments have focused on math, programming, and foreign languages.
The effects also varied sharply across student groups. Younger students in lower secondary school lost more than older ones (24 versus 17 percent), and boys were hit harder than girls (21.6 versus 18.4 percent), which the study attributes mainly to heavier AI use among boys.
Top performers suffered the most: the top third saw a 24 percent decline, compared with 16 percent in the bottom third. A dose-response pattern emerged as well. Students using AI for up to one hour per week lost about 5 percent, while those using it five hours or more lost 30 percent.
Why almost no one is pushing back
The estimated learning penalty shrank from about 25 percent in early 2023 to 16 percent by June 2025. The same shrinkage appeared in a fixed group of early adopters, suggesting some degree of adaptation by students and teachers, but the losses have not gone away.
The study explains why the reaction has been muted. Teachers typically see students in only one subject, where a 20 percent grade drop is not unusual on its own. The aggregate effect on the county average did not reach about minus 10 percent until June 2025 because few students had been using AI long enough for the damage to accumulate. Students themselves often do not connect the dots, mistaking the mental effort of independent learning for a sign that they are learning poorly.
As countermeasures, the study suggests giving students credible information about the long-term costs of outsourcing, putting more weight on in-person exams, and tracking completion time instead of homework grades. AI erodes the value of homework as a signal, and among AI users with above-average homework scores, higher homework grades actually predict worse exam results.
Anthropic researcher Andrej Karpathy has argued that schools should stop trying to police AI-generated homework and instead shift the majority of grading to in-class work. His reasoning aligns with what this study found. When students know they will be tested without AI, they stay motivated to actually learn the material.
The pattern lines up with recent findings from other settings. An Anthropic study recently showed that participants who learned new programming skills with AI help scored 17 percent worse on follow-up knowledge tests than the control group, without saving any real time. The results depended on how people used the tool. Those who simply copied AI answers performed worse, while those who used AI to better understand the tasks did not see the same decline.
A study by the Swiss Business School found a negative link between AI use and critical thinking. A separate study by researchers at several American and British universities showed that people who treat AI mainly as an answer machine lose cognitive skills the fastest.
A UC Berkeley study analyzing more than 500,000 grades also showed that the share of top A grades in writing- and programming-heavy courses has risen by 13 percentage points since ChatGPT launched. There, too, the effect was concentrated on unsupervised homework, while proctored exams showed no comparable gains.
What it means
The data suggests that homework grades are becoming a poor indicator of actual learning when AI is involved. Schools may need to rely more on supervised assessments to ensure students are retaining knowledge rather than just generating text.




