In April 2026, a research team from the Vrije Universiteit Brussel (VUB) in Belgium published a study that sent shockwaves through both the artificial intelligence and pure mathematics communities: ChatGPT-5.2, OpenAI's language model, generated an original proof of a geometry conjecture formulated in 2024, a problem that no human mathematician had managed to prove until then.
This is not simply about a machine "getting a calculation right." It is about an AI system producing genuine mathematical reasoning, constructing chained logical arguments across multiple conversation sessions, and arriving at a conclusion that was subsequently verified and validated by human mathematicians. The method used was dubbed "vibe-proving" — a term that captures the collaborative and iterative nature of the process between human and machine.
This discovery raises profound questions about the role of artificial intelligence in scientific research, the limits of computational reasoning, and the future of mathematics as a human discipline.
What Happened
The process that led to the proof was neither instantaneous nor automatic. According to the study published by the VUB team, the proof emerged over the course of 7 chat sessions and 4 evolving argument drafts. The researchers did not simply ask ChatGPT to "solve this problem" and receive a ready-made answer. The work was iterative, collaborative, and deeply dependent on human interaction.
In the first session, the researchers presented the geometry conjecture to the model, providing the necessary mathematical context — definitions, related theorems, and previous proof attempts that had failed. ChatGPT-5.2 responded with an initial approach that, while incomplete, contained promising insights about the structure of the problem.
In subsequent sessions, the mathematicians refined their questions, pointed out flaws in the model's arguments, suggested alternative directions, and asked ChatGPT to explore specific lines of reasoning. With each iteration, the model produced more sophisticated versions of the argument, incorporating the researchers' corrections and suggestions.
The fourth draft contained the essential structure of the proof. The researchers then manually verified each logical step, confirmed the validity of the arguments, and filled in minor gaps that the model had left. The final result was a complete and rigorous proof that withstood scrutiny from the mathematical community.
The most remarkable aspect of the process was the division of labor: ChatGPT proved extraordinarily useful for searching and exploring proof paths — testing approaches, combining techniques from different areas of mathematics, and generating candidate arguments at a speed impossible for a human. The human researchers, meanwhile, were essential for correctness verification — ensuring that each logical step was valid, that there were no unjustified leaps, and that the proof as a whole was mathematically sound.
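The division of labor described above can be sketched as a simple control loop. Everything in the sketch is illustrative: the function names are hypothetical placeholders (not an API the VUB team used), and the session and draft counts simply echo the numbers reported in the study.

```python
# Illustrative sketch of the human-in-the-loop "vibe-proving" workflow;
# all names here are hypothetical placeholders, not a real API.

def ask_model(draft, feedback):
    """Stub for the AI step: extend the current draft using feedback."""
    return (draft or []) + [feedback]

def human_review(draft):
    """Stub for the human step: accept once the argument is complete.

    The threshold of 4 mirrors the four drafts reported in the study.
    """
    if len(draft) >= 4:
        return True, "verified"
    return False, "refine step %d" % (len(draft) + 1)

def vibe_prove(max_sessions=7):  # seven sessions, as in the study
    draft, feedback = None, "propose an initial strategy"
    for _ in range(max_sessions):
        draft = ask_model(draft, feedback)   # AI: search for arguments
        ok, feedback = human_review(draft)   # human: verify each step
        if ok:
            return draft
    return None  # no verified proof within the session budget
```

Run as-is, `vibe_prove()` converges after four iterations, mirroring the four reported drafts; in reality, of course, each `human_review` call stands for hours of manual verification.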
As one of the study's authors summarized: "AI was brilliant for search; humans were indispensable for verification."
The VUB discovery has profound implications for the future of mathematical and scientific research in general. For the first time, we have concrete evidence that language models can contribute substantively to the production of new mathematical knowledge — not merely verifying existing proofs or solving textbook exercises, but generating original arguments that advance the frontier of human knowledge.
This does not mean mathematicians are about to be replaced. On the contrary, the VUB study demonstrates that human-AI collaboration is far more powerful than either working alone. ChatGPT without human guidance produces arguments that frequently contain subtle errors or unjustified logical leaps. Mathematicians without AI assistance are limited by the speed of human thought and the biases of their specific training.
Together, however, they form a formidable team: AI explores the space of possibilities with superhuman speed and breadth, while the human mathematician filters, validates, and directs that exploration with rigor and intuition that the machine lacks.
Several research groups around the world are already adopting variations of the vibe-proving method to attack open problems in different areas of mathematics. There is expectation that, in the coming years, we will see a significant acceleration in the resolution of conjectures that have remained open for decades.
ChatGPT's achievement in mathematics is part of a broader trend of AI contributing to scientific discoveries. AlphaFold, from Google DeepMind, revolutionized biology by predicting protein structures. GNoME discovered millions of new crystalline materials. AI models are accelerating drug discovery, optimizing chemical reactions, and even predicting earthquakes with greater precision.
What makes the mathematics case special is that mathematics is often considered the "purest" domain of human reasoning — the discipline where creativity, intuition, and logic meet in their most abstract form. If AI can contribute here, some argue, it can contribute anywhere.
Others are more cautious, noting that mathematics is also the domain where verification is most rigorous. A proof is either correct or it is not — there is no room for "almost certain" or "probably true." This binary clarity makes mathematics an ideal field for human-AI collaboration: the machine generates candidates, the human verifies with absolute certainty.
The expectation is that, in the coming years, we will see an explosion of mathematical results produced by human-AI collaborations, with vibe-proving becoming a standard tool in the arsenal of researchers around the world.
Context and Background
The geometry proof achievement was not an isolated case. In 2025, ChatGPT was tested on problems from the International Mathematical Olympiad (IMO), one of the most prestigious and challenging competitions in the world, which brings together the best mathematics students from dozens of countries.
The result was impressive: ChatGPT correctly solved 5 out of 6 problems in the competition. For context, the IMO presents problems that challenge even the most talented young mathematicians on the planet. Many competitors with years of intensive training cannot solve all six problems within the allotted time.
The only problem ChatGPT failed to solve involved a particularly creative geometric construction requiring a type of visual insight that language models have not yet mastered. This result illustrates both the power and current limitations of AI in mathematics: it is extraordinarily competent at problems that can be approached through symbolic manipulation and formal logic, but still stumbles on problems requiring visual creativity or spatial intuition.
The IMO performance also sparked debates about the future of mathematical competitions. If an AI can solve most of the problems, what is the value of training humans to do the same? The mathematical community's response has been surprisingly pragmatic: competitions test specific human skills — creativity, elegance, speed under pressure — that remain valuable regardless of what machines can accomplish.
The Conjecture and the Authorship Debate
The specific conjecture that ChatGPT helped prove was formulated in 2024 by a group of European mathematicians and involved properties of certain configurations of points and lines in projective geometry. Without going into excessive technical detail, the problem asked whether a particular relationship between angles and distances in a family of geometric figures was always true, or whether counterexamples existed.
Several proof attempts had been published between 2024 and 2025, but all contained gaps or errors identified by reviewers. The problem was considered "probably true" by the community — there was strong computational evidence and many verified special cases — but no one had managed to construct a rigorous general proof.
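To make "computational evidence" concrete, here is a toy numerical check of a classical projective-geometry fact, the invariance of the cross-ratio under projective maps of the line. This is not the VUB conjecture, which the article does not specify in enough detail to encode; it only illustrates the kind of special-case testing the paragraph describes.

```python
# Toy illustration of gathering computational evidence for a geometric
# statement. The statement tested (invariance of the cross-ratio under
# projective maps of the line) is a classical fact, NOT the VUB conjecture.

import random

def cross_ratio(a, b, c, d):
    """Cross-ratio of four distinct points on the (affine) line."""
    return ((a - c) * (b - d)) / ((a - d) * (b - c))

def projective_map(x, p, q, r, s):
    """A fractional-linear (projective) transformation of the line."""
    return (p * x + q) / (r * x + s)

random.seed(0)
for _ in range(1000):
    a, b, c, d = random.sample(range(1, 100), 4)      # distinct points
    p, q, r, s = [random.uniform(1.0, 2.0) for _ in range(4)]
    before = cross_ratio(a, b, c, d)
    after = cross_ratio(*(projective_map(x, p, q, r, s)
                          for x in (a, b, c, d)))
    # The invariant holds on every sampled case (up to rounding).
    assert abs(before - after) <= 1e-6 * max(1.0, abs(before))
```

A thousand passing random cases is exactly the kind of "strong computational evidence" the community had for the conjecture: persuasive, but no substitute for a general proof.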
ChatGPT-5.2, in its sessions with the VUB researchers, approached the problem in a way that none of the mathematicians had tried: it combined techniques from algebraic geometry with number theory methods, creating a bridge between two fields that are rarely connected in this type of problem. This interdisciplinary approach — which the AI generated through pattern recognition in its vast training base — was the key to unlocking the proof.
The researchers acknowledged that they probably would not have arrived at this approach on their own, at least not in the short term. The AI's ability to "think outside the box" — or, more precisely, to not be limited by the same mental boxes that human experts develop over the course of their careers — was the decisive differentiator.
The publication of the VUB study also reignited debates about authorship, credit, and ethics in AI-assisted research. If ChatGPT generated the essential structure of the proof, should it be listed as a co-author of the paper? The academic community is divided.
Some scientific journals already prohibit listing AI models as co-authors, arguing that authorship implies responsibility — and an AI cannot be held accountable for errors or fraud. Others adopt a more flexible position, requiring only that AI use be declared transparently.
The VUB researchers opted for an intermediate approach: they did not list ChatGPT as a co-author but dedicated an entire section of the paper to describing in detail how the AI was used, including transcripts of the chat sessions. This transparency was praised by the community as a model to be followed.
There are also concerns about the impact on training new mathematicians. If students can use AI to solve difficult problems, how will they develop the deep reasoning skills that are the essence of mathematical training? Universities around the world are reformulating their curricula to incorporate responsible AI use, treating it as a tool to be mastered, not a shortcut to be exploited.
Limitations
Despite the enthusiasm, it is essential to maintain a balanced perspective on what AI can and cannot do in mathematics. The VUB study is explicit about the limitations observed during the process.
First, ChatGPT frequently produced arguments that appeared correct but contained subtle errors. In several of the 7 sessions, the researchers identified invalid logical steps that the model presented with complete confidence. Without human verification, these errors would have gone unnoticed and the "proof" would have been invalid.
Second, the model demonstrated difficulty with problems requiring creative constructions — inventing new mathematical objects, defining ingenious auxiliary functions, or finding unexpected counterexamples. Its strength lies in combining and applying existing techniques, not in inventing genuinely new ones.
Third, the quality of the result depends critically on the quality of human interaction. Researchers who knew how to ask the right questions, identify promising paths, and redirect the model when necessary obtained far superior results compared to researchers who simply asked ChatGPT to "solve the problem."
These limits suggest that AI is best understood as an amplification tool rather than a substitute for human mathematical reasoning. It accelerates discovery but does not eliminate the need for human expertise, judgment, and creativity.
The Vibe-Proving Method
The term "vibe-proving" was coined by the VUB researchers to describe this new method of AI-assisted mathematical reasoning. The name references the concept of "vibe coding" — the practice of programming intuitively with AI assistance, without necessarily understanding every line of generated code — but applied to the far more rigorous domain of mathematical proof.
In vibe-proving, the human mathematician acts as a research director: defining the problem, evaluating the quality of AI-generated arguments, redirecting reasoning when necessary, and validating the final result. The AI, in turn, functions as a tireless research assistant: exploring dozens of possible approaches, combining techniques from different fields, generating proof candidates, and refining its arguments based on human feedback.
The importance of vibe-proving extends beyond a single solved problem. The method suggests a new paradigm for mathematical research, where AI does not replace the mathematician but dramatically amplifies their capacity for exploration. Problems that would take months or years of trial and error for a human researcher can have their solution space explored in hours or days with AI assistance.
However, the study's authors are careful to emphasize the limitations: ChatGPT does not "understand" mathematics in the human sense. It lacks geometric intuition, cannot visualize figures, and does not feel the elegance of an argument. What it does is manipulate linguistic and logical patterns with superhuman efficiency, generating candidate arguments that may or may not be correct. Human verification remains absolutely essential.




