Formalisation, Mathematics, and Artificial Intelligence

Formalisation is the verification of proofs by a computer. The proofs can be of mathematical results or of the correctness of software, hardware, networking protocols, and cryptographic systems. Formalisation is often done with interactive theorem provers (such as the Lean prover), which help in finding proofs in addition to ensuring correctness. The use of formalisation has grown greatly in industry because, for increasingly complex and critical systems, testing is not adequate to guarantee correctness, and hence security and safety. The main barrier to wider adoption of formalisation is its relative difficulty compared to testing. With the development of artificial intelligence (AI)-based tools, this barrier can be considerably reduced.

On the other hand, while artificial intelligence systems have become very powerful, they still have issues with accuracy. In many domains (including mathematics), errors cascade, so even a relatively small rate of incorrect responses acts as a barrier to scaling and autonomy. It is widely believed that the best solution to this is to use formalisation (itself assisted by AI) to ensure correctness and enable scalability and autonomy.

Thus, formalisation and artificial intelligence promise to work together to allow computers to greatly assist mathematical reasoning as well as the development of software and hardware.

A series of three talks on the closely related areas of formalisation, mathematical reasoning, and artificial intelligence was organised by the Kotak IISc AI-ML Centre (KIAC) at the Indian Institute of Science (IISc), Bengaluru on 21 April 2025. The programme was structured and hosted by Siddhartha Gadgil (Professor, Department of Mathematics, IISc). The talks were by three eminent mathematicians, namely, Koundinya Vajjha (Formal Verification Engineer, Intel Corporation), Bhavik Mehta (Chapman Fellow in Mathematics, Imperial College, London), and Patrick Massot (Professor, Université Paris-Saclay, Saint-Aubin, France). Together, they gave the audience an introduction to formalisation – its use in mathematics, software, and hardware, and its limitations; how AI has been able to do mathematics; and the working of the Lean prover. The following is an account of their talks, compiled from their presentations.

Koundinya Vajjha gave an insight into Formal Verification for Mathematics, Software, and Hardware.

Vajjha started off by telling three stories, from which three lessons could be learnt, and remedial measures taken. The first story was on the Kepler conjecture, proposed by Johannes Kepler in 1611, which states that “No arrangement of spheres in 3D space is denser than the face-centred cubic packing”. In 1998, a 300-page proof of the Kepler conjecture was given by Thomas Hales and Sam Ferguson, which was published in the Annals of Mathematics in 2006, with a disclaimer that the referees could not certify the correctness of the computer calculations.

The second story was on the maiden flight of the Ariane 5 rocket that failed catastrophically 37 seconds after launch. The reason for the failure was a software bug that caused an overflow exception (arising from converting a double precision float to a 16-bit integer), leading to a processor failure in the inertial reference system, causing the rocket to veer off course and self-destruct.

The third story was on the first recall of a computer chip, which cost $475 million. In October 1994, Thomas R Nicely was trying to estimate the numerical value of Brun’s constant. In the process, he discovered a hardware design defect (the Intel Pentium FDIV bug) in the floating-point unit that caused incorrect results, with noticeable relative errors, in specific division calculations.

The lessons to be learnt from these three stories are:

1.  Mathematical proofs are now approaching a level of complexity that exceeds what traditional refereeing can reliably check.

2. Unchecked software bugs in safety-critical missions have the potential to spell disaster.

3. Hardware implementations need to ensure correct data paths, as the cost of bugs that escape to silicon is extremely high.

Formal verification (FV) refers to a set of methods that use mathematical proof to ensure the quality and correctness of a design. It can be applied across disciplines: in (a) mathematics, to rigorously verify proofs of theorems all the way down to the fundamental axioms; (b) software, to detect bugs that traditional testing misses; and (c) hardware, to ensure the correctness of circuits before fabrication.

FV was used to address the problems in the above three stories as follows. In the first story, in response to the referee report, Hales launched the Flyspeck project (early 2000s–2014), which aimed to formally verify the entire proof. It used a combination of two proof assistants, namely ‘HOL Light’ and ‘Isabelle’. The project involved a large international collaboration and 20 person-years of effort. In the second story, Xavier Leroy proved the correctness of an optimising C compiler using the Coq proof assistant; this effort, called the CompCert project, took 6 person-years. In the third story, Intel invested in formal verification and has caught, before they reached silicon, many corner-case issues that traditional simulation missed. Various in-house and industry-standard tools are being used for data path and control path verification. The correctness statements broadly include statements that the implementation meets a given specification or that a given property (expressed in temporal logic) holds for an abstracted model of the design under test.

All FV tools operate on the same principles. These are:

1.  Syntax: it is possible to give a mathematical representation (model) of the system of interest.

2. Formal reasoning: using the apparatus of formal logic, properties of the mathematical model are established.

3. Semantics: an interpretation of the formal symbols grounds the elements of the system being studied in reality. It is thus deduced that the correctness proofs actually translate to the real world.

Most FV tools implement a base logic that is expressive enough to model the domain to be validated. Usually, there is a trade-off between expressive power and automation:

  • Interactive theorem provers/proof assistants usually have a highly expressive logic that can formalise arbitrary mathematics, but they offer limited automation and rely on significant user guidance. This trade-off allows precise reasoning about complex structures but requires substantial effort to develop proofs.
  • Model checkers, satisfiability (SAT) solvers, and satisfiability modulo theories (SMT) solvers trade expressiveness for near push-button automation, efficiently verifying (small) finite-state systems and propositional constraints, but struggling with richer mathematical properties. The sketch after this list illustrates both ends of this trade-off.
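
To make the trade-off concrete, here is a minimal sketch in Lean 4 (illustrative only, not from the talk; only the core library is assumed). The first goal is finite and decidable, so the decide tactic settles it push-button by exhaustively checking every case, in the spirit of SAT-style automation; the second quantifies over all natural numbers, so the user has to supply the key idea (induction) interactively.

    -- A bounded, decidable claim: `decide` checks all 100 cases automatically.
    -- (The square of a number modulo 4 is always 0 or 1, never 3.)
    example : ∀ n < 100, (n * n) % 4 ≠ 3 := by decide

    -- A claim about all natural numbers: no finite enumeration is possible,
    -- so the proof is guided interactively, here by induction on n.
    example (n : Nat) : 0 + n = n := by
      induction n with
      | zero => rfl
      | succ n ih => rw [Nat.add_succ, ih]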

The Principia Mathematica, by Russell and Whitehead, aimed to reduce all mathematics to formal logic – an example of formal verification done by hand, without any tool support. According to the logicist thesis, ‘mathematics is just a part of logic and can be rigorously derived from formal axioms’. Russell’s work laid the foundation of type theory, the basis of some modern proof assistants such as Lean and Coq. A hundred years later, large portions of mathematics—ranging from foundational results in algebra and analysis to advanced topics such as algebraic topology and category theory—have been rigorously formalised with these proof assistants.

Some proof assistants implement alternative foundations of mathematics and come equipped with computational semantics; this means that their implementations also double up as programming languages, enabling programs to be run and reasoned about in the same environment. Lean is itself such a programming language, as the sketch below illustrates. The mathematics developed in the Lean theorem prover is open source and housed in the mathlib repository.
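
The following minimal Lean 4 sketch (illustrative, not from the talks) shows both roles at once: rev is an ordinary program that can be executed with #eval, and a theorem about the very same definition is proved in the same file.

    -- An ordinary (if naive) program: list reversal.
    def rev : List Nat → List Nat
      | []      => []
      | x :: xs => rev xs ++ [x]

    #eval rev [1, 2, 3]   -- the program runs: output is [3, 2, 1]

    -- The same environment proves a property of the same program:
    -- reversing a list never changes its length.
    theorem rev_length (xs : List Nat) : (rev xs).length = xs.length := by
      induction xs with
      | nil => rfl
      | cons x xs ih => simp [rev, ih]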

On software verification, Vajjha pointed out that to trust an algorithm running on a computer completely, it is necessary to:

1.  Check the mathematical proof that the algorithm does what it is supposed to do.

2. Check that the implementation of the algorithm in a programming language of choice is faithful to its semantics.

3. Check that a compiler which compiles the program down to machine code preserves semantics.

4. Check that the silicon that runs the machine code is free of any hardware bugs that could jeopardise semantics.

Vajjha also indicated that such end-to-end proofs are extremely rare and that usually only the first two checks are done; a toy version of these two checks is sketched below. He also gave an example of formal verification of software that implements an AI algorithm.
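
As a toy illustration of the first two checks (made up for this account, not an example from the talk), the Lean 4 sketch below specifies list summation directly, implements it the way a programmer might (tail-recursively, with an accumulator), and proves that the implementation is faithful to the specification.

    -- Check 1: the "algorithm" is the obvious recursive specification.
    def sumSpec : List Nat → Nat
      | []      => 0
      | x :: xs => x + sumSpec xs

    -- Check 2: the "implementation" is a tail-recursive version with an
    -- accumulator, closer to what one would write in production code.
    def sumImpl (acc : Nat) : List Nat → Nat
      | []      => acc
      | x :: xs => sumImpl (acc + x) xs

    -- The proof that the implementation computes what the algorithm specifies.
    theorem sumImpl_eq_spec (acc : Nat) (xs : List Nat) :
        sumImpl acc xs = acc + sumSpec xs := by
      induction xs generalizing acc with
      | nil => simp [sumImpl, sumSpec]
      | cons x xs ih => simp [sumImpl, sumSpec, ih, Nat.add_assoc]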

On hardware verification, Vajjha explained two paradigms:

1.  Equivalence checking: here, it is proved that the specification (e.g. as in the programmer’s manual) agrees with the RTL (register transfer level) implementation. There are mis-specification risks in this paradigm.

2. Property verification: here, it is proved that the design satisfies a set of properties (e.g. no deadlocks, mutual exclusion). There are under-specification risks in this paradigm.

The best practice is to use both the paradigms.
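
A toy equivalence check, written in Lean 4 rather than in an industrial hardware tool (the example and names are made up for illustration): a one-bit full adder is specified arithmetically, implemented as the usual combination of gates, and the two are proved to agree on all eight input combinations.

    -- Specification: what a one-bit full adder should compute, stated
    -- arithmetically, as a programmer's manual might put it.
    def addSpec (a b cin : Bool) : Nat :=
      a.toNat + b.toNat + cin.toNat

    -- Implementation: the familiar gate-level realisation.
    def sumBit   (a b cin : Bool) : Bool := Bool.xor (Bool.xor a b) cin
    def carryBit (a b cin : Bool) : Bool := (a && b) || (Bool.xor a b && cin)

    -- Equivalence check: on all eight input combinations, the gates compute
    -- exactly the value demanded by the specification.
    theorem adder_ok (a b cin : Bool) :
        (sumBit a b cin).toNat + 2 * (carryBit a b cin).toNat = addSpec a b cin := by
      cases a <;> cases b <;> cases cin <;> rfl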

Simulation is the traditional method of debugging hardware. Here, inputs are fed into the system, and the outputs are checked to see if they are as expected. Vajjha explained symbolic simulation, ternary simulation, and a combination of the two. The main data path verification method used at Intel is symbolic trajectory evaluation (STE), which is sometimes combined with lightweight theorem proving; this combination has been applied successfully to verify many hardware components.

For control path verification, the main technique used is model checking, wherein the design to be verified is modelled as a finite-state machine, and the property to be checked is formalised by expressing it in temporal logic. A safety property declares that something bad should never happen, and a liveness property declares that something good should eventually happen. The engine traverses the reachable states of the design to verify the property. If a property fails, a trace of states that falsifies the property (a counterexample) is generated.
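
Where a model checker explores the reachable states automatically, the same “something bad never happens” shape of claim can also be stated and proved by induction in a proof assistant. The Lean 4 sketch below (a made-up toy, not an Intel design or methodology) models a lift that moves one floor at a time and proves that no command sequence can drive it above the top floor.

    -- A toy lift controller: the cabin sits at floor 0, 1, 2 or 3.
    -- A command is `true` (go up one floor) or `false` (go down one floor).
    def step (floor : Nat) : Bool → Nat
      | true  => if floor < 3 then floor + 1 else floor
      | false => floor - 1

    -- The state reached after executing a list of commands, starting at floor 0.
    def run : List Bool → Nat
      | []          => 0
      | cmd :: cmds => step (run cmds) cmd

    -- Safety property: whatever commands are issued, the cabin never ends up
    -- above the top floor.
    theorem never_above_top (cmds : List Bool) : run cmds ≤ 3 := by
      induction cmds with
      | nil => decide
      | cons cmd cmds ih =>
        cases cmd <;> simp only [run, step] <;> (try split) <;> omega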

Finally, Vajjha discussed the limitations of formal verification:

1.  FV only proves correctness relative to the specification.

2. State explosion makes full verification of complex systems difficult.

3. Hardware/software proofs often abstract away real-world effects (e.g. power, analog behaviour, manufacturing defects).

4. Writing formal proofs and invariants is difficult and time-consuming.

5. FV typically verifies functional correctness, but non-functional properties (e.g. performance, security side-channels) may need testing.

Vajjha pointed out that though formal verification is a powerful tool, it is best used alongside simulation, testing, and real-world validation.

Bhavik Mehta spoke on Mathematics, AI, and Lean. He described the ways in which artificial intelligence has been able to do mathematics, by (a) finding (counter) examples, (b) generating ideas, and (c) creating proofs. He demonstrated this through four case studies.

1.  Constructions in combinatorics via neural networks

     Here, a good representation of the object is found, and optimisation is then used to find increasingly good candidates according to the chosen metric. The advantages are that very small models can be used and that it runs fast when it succeeds. The disadvantages are that it relies on the object having a finite and discrete description, and that the construction has to be built up iteratively.

2.  Mathematical discoveries through program search with large language models

     Here, the search is conducted in a space of functions (programs), and candidate functions are scored by properties of their outputs; this score is optimised to find good functions. The advantage is that it finds solutions of low complexity, and the disadvantage is that it will only find solutions of low complexity.

3.  Advancing mathematics by guiding human intuition with AI

     Here, patterns in data are found using AI; human mathematicians then use these to make conjectures. The advantage is that the underlying objects can be complicated, and the disadvantage is that a large dataset needs to be generated.

4.  AlphaProof

     AlphaProof is a reinforcement learning-based system for mathematical reasoning. It solved three of the six problems from the International Mathematical Olympiad (IMO) held in 2024. The solved problems were in algebra and number theory. AlphaProof achieved a score equivalent to that of a silver medallist at the IMO.

Patrick Massot asked, Why Formalise Mathematics? He spoke about extending the mathematical toolbox, keeping everything we like and adding more. Massot’s team uses Lean, developed mostly by Leonardo de Moura. Lean and its mathematical library are open source, and there are ~500 contributors to the library. Massot pointed out that very few mathematicians use proof assistants and that, of those who do, almost all use Lean. He gave demonstrations of working with Lean and explained how it becomes useful for mathematicians.

1.  Checking

     The referee’s dream, when a paper is submitted to him/her, is to need to check only for relevance and new ideas, leaving the proof checking to the computer. The targets on which this is focussed are (a) long proofs and (b) very technical fragments of proofs. Some recent formalisations include (i) Smale’s sphere eversion theorem, (ii) Marton’s polynomial Freiman–Ruzsa conjecture, and (iii) bounds on Ramsey numbers.

2.  Explaining

     Traditionally, while writing mathematics, the author decides the level of detail. The goal here is to let the reader decide the level of detail he/she wishes to see, allowing a continuous zoom in from heuristic high-level overviews to full details and back. Massot showed how this is possible through a demonstration.

3.  Teaching

     In addition to explaining mathematics, it is necessary to teach students to write proofs. Massot has been using Lean in his university with first-year undergraduate students since 2019, reproving things that they had learnt in the previous term. His aim was pen-and-paper proofs; however, students seemed to find this difficult, so Massot now uses a custom input language in which proofs contain more English words.

4.  Creating

     It is easy to tweak definitions and statements and to find unnecessary assumptions. With AI, searching for mathematics should become much easier. It may help in filling in easy proofs or suggesting analogies. Formalisation encourages clean abstractions.

     The current status of the use of AI in mathematics is that large language models are useful in searching for existing lemmas and definitions. However, proof capabilities are not yet useful on a daily basis.

5.  Having fun together

     Using proof assistants is fun and very collaborative.