
MIT Department of Mathematics researchers David Roe ’06 and Andrew Sutherland ’90, PhD ’07 are among the inaugural recipients of the Renaissance Philanthropy and XTX Markets’ AI for Math grants.
Four additional MIT alumni — Anshula Gandhi ’19, Viktor Kunčak SM ’01, PhD ’07; Gireeja Ranade ’07; and Damiano Testa PhD ’05 — were also honored for separate projects.
The first 29 winning projects will support mathematicians and researchers at universities and organizations working to develop artificial intelligence systems that help advance mathematical discovery and research across several key tasks.
Roe and Sutherland, along with Chris Birkbeck of the University of East Anglia, will use their grant to boost automated theorem proving by building connections between the L-Functions and Modular Forms Database (LMFDB) and the Lean4 mathematics library (mathlib).
“Automated theorem provers are quite technically involved, but their development is under-resourced,” says Sutherland. With AI technologies such as large language models (LLMs), the barrier to entry for these formal tools is dropping rapidly, making formal verification frameworks accessible to working mathematicians.
Mathlib is a large, community-driven mathematical library for the Lean theorem prover, a formal system that verifies the correctness of every step in a proof. Mathlib currently contains on the order of 105 mathematical results (such as lemmas, propositions, and theorems). The LMFDB, a massive, collaborative online resource that serves as a kind of “encyclopedia” of modern number theory, contains more than 109 concrete statements. Sutherland and Roe are managing editors of the LMFDB.
Roe and Sutherland’s grant will be used for a project that aims to augment both systems, making the LMFDB’s results available within mathlib as assertions that have not yet been formally proved, and providing precise formal definitions of the numerical data stored within the LMFDB. This bridge will benefit both human mathematicians and AI agents, and provide a framework for connecting other mathematical databases to formal theorem-proving systems.
The main obstacles to automating mathematical discovery and proof are the limited amount of formalized math knowledge, the high cost of formalizing complex results, and the gap between what is computationally accessible and what is feasible to formalize.
To address these obstacles, the researchers will use the funding to build tools for accessing the LMFDB from mathlib, making a large database of unformalized mathematical knowledge accessible to a formal proof system. This approach enables proof assistants to identify specific targets for formalization without the need to formalize the entire LMFDB corpus in advance.
“Making a large database of unformalized number-theoretic facts available within mathlib will provide a powerful technique for mathematical discovery, because the set of facts an agent might wish to consider while searching for a theorem or proof is exponentially larger than the set of facts that eventually need to be formalized in actually proving the theorem,” says Roe.
The researchers note that proving new theorems at the frontier of mathematical knowledge often involves steps that rely on a nontrivial computation. For example, Andrew Wiles’ proof of Fermat’s Last Theorem uses what is known as the “3-5 trick” at a crucial point in the proof.
“This trick depends on the fact that the modular curve X_0(15) has only finitely many rational points, and none of those rational points correspond to a semi-stable elliptic curve,” according to Sutherland. “This fact was known well before Wiles’ work, and is easy to verify using computational tools available in modern computer algebra systems, but it is not something one can realistically prove using pencil and paper, nor is it necessarily easy to formalize.”
While formal theorem provers are being connected to computer algebra systems for more efficient verification, tapping into computational outputs in existing mathematical databases offers several other benefits.
Using stored results leverages the thousands of CPU-years of computation time already spent in creating the LMFDB, saving money that would be needed to redo these computations. Having precomputed information available also makes it feasible to search for examples or counterexamples without knowing ahead of time how broad the search can be. In addition, mathematical databases are curated repositories, not simply a random collection of facts.
“The fact that number theorists emphasized the role of the conductor in databases of elliptic curves has already proved to be crucial to one notable mathematical discovery made using machine learning tools: murmurations,” says Sutherland.
“Our next steps are to build a team, engage with both the LMFDB and mathlib communities, start to formalize the definitions that underpin the elliptic curve, number field, and modular form sections of the LMFDB, and make it possible to run LMFDB searches from within mathlib,” says Roe. “If you are an MIT student interested in getting involved, feel free to reach out!”