Differential Geometry and the Complexity of Robust Optimisation of Smooth Functions

Speaker: Ramarathnam Venkatesan, Partner Architect (Cloud & AI, Microsoft)
Date: 02 September 2025

Complexity theory, cryptography, and differential geometry enter the world of formalising threat modelling and analysis of machine learning algorithms, also known as artificial intelligence (AI). It is natural to think that we need to find optima that are robust to small adversarial corruptions. However, adversaries must be treated with respect, not modelled as mere Gaussian noise generators.

Optimisation in the age of AI must deal with general smooth functions (not just convex or strongly convex ones), because the objectives arising in AI algorithms are smooth. On top of this, we impose the condition that the optima we compute be stable against small adversarial corruptions.
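One standard way to formalise this stability requirement (an illustrative formulation, not a formula quoted from the talk) is as a min-max problem: the value at the chosen point should remain good under any corruption within a small budget,

    \min_{x} \; \max_{\|\delta\| \le \epsilon} \; f(x + \delta),

where \epsilon bounds the size of the adversarial perturbation.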

Adversarial attacks corrupt a model being trained, for instance one trained for attack detection, by (a) corrupting the data on which the model is trained: ‘data poisoning’ can make a car think the speed limit is 60 MPH in a 30 MPH zone; or (b) figuring out the model being used. We need new principles, and we need to use randomness as in cryptography. A small language model trained on publicly available data can be used as an adversary to penetrate enterprise and cloud systems, and also to defend those systems against attacks. Such (zero-day) attacks can be generated at a rapid pace using ChatGPT. Stealing compute resources is a hugely profitable business. All the infrastructure we use for a common purpose, and all the data on it, should be kept private.

Ramarathnam Venkatesan gave an overview of the optimisation landscape. Linear regression, logistic regression, linear programming, support vector machines, quadratic programming, and deep neural networks (DNNs) all fall under the results he presented in his talk. DNNs do not have a closed-form expression, so how do we optimise them? The usual forms of optimisation fail when the objective is a general smooth function, and DNNs can at best be modelled using layers of ridge functions (see the sketch below). The speaker then explained robust optimisation: we would like to reuse existing optimisation and machine learning routines, but there are many foundational difficulties in formulating the problem.
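A minimal sketch of the ridge-function view of a DNN layer (the sizes, activation, and helper names below are illustrative assumptions, not details from the talk): each output unit of a dense layer computes g(<w, x> + b), a fixed one-dimensional profile applied along a single direction, and a DNN stacks such layers.

    import numpy as np

    def ridge(x, w, b, g=np.tanh):
        # A ridge function: a 1-D profile g applied along the direction w.
        return g(np.dot(w, x) + b)

    def dense_layer(x, W, b, g=np.tanh):
        # One DNN layer as a stack of ridge functions, one per output unit.
        return np.array([ridge(x, W[i], b[i], g) for i in range(W.shape[0])])

    # Illustrative two-layer network built purely from ridge functions.
    rng = np.random.default_rng(0)
    x = rng.standard_normal(4)
    W1, b1 = rng.standard_normal((3, 4)), rng.standard_normal(3)
    W2, b2 = rng.standard_normal((1, 3)), rng.standard_normal(1)
    y = dense_layer(dense_layer(x, W1, b1), W2, b2)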

The speaker gave an algorithm to optimise a smooth function. The robustness comes from the fact that the random walk cannot distinguish a smooth function from nearby functions drawn from a dense subset of smooth functions. However, there are some big problems that cannot be solved this way: there are physical processes in which a small perturbation can send us far away, such as weather forecasting, and such problems are not covered by this algorithm.
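The algorithm itself is not reproduced here; the following is a minimal sketch of the general flavour only, assuming a noisy-gradient (Langevin-style) random walk on a smooth objective. The step size, noise scale, and test function are illustrative assumptions.

    import numpy as np

    def noisy_gradient_walk(grad_f, x0, steps=5000, eta=1e-2, sigma=1e-2, seed=0):
        # Random-walk optimisation of a smooth function via noisy gradient steps.
        # The random sequence is fixed by `seed`, so runs on nearby objectives
        # can be compared step by step.
        rng = np.random.default_rng(seed)
        x = np.array(x0, dtype=float)
        for _ in range(steps):
            noise = rng.standard_normal(x.shape)
            x = x - eta * grad_f(x) + sigma * np.sqrt(eta) * noise
        return x

    # Illustrative smooth, non-convex objective and its gradient.
    f = lambda x: np.sin(x[0]) * np.cos(x[1]) + 0.1 * np.dot(x, x)
    grad_f = lambda x: np.array([np.cos(x[0]) * np.cos(x[1]) + 0.2 * x[0],
                                 -np.sin(x[0]) * np.sin(x[1]) + 0.2 * x[1]])

    x_star = noisy_gradient_walk(grad_f, x0=[2.0, -1.0])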

To define and characterise the limits of convergence, we need concepts from differential geometry. Despite these hurdles, the results are unconditional and rely on the stability and transversality theory of smooth functions and manifolds. Crucially, the random walks cannot distinguish between a black-box optimiser for f and one for a Morse approximation F: driven by the same random sequences, the two produce co-convergent sequences.
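A hedged illustration of this co-convergence (the linear tilt below is the standard generic perturbation that renders a smooth function Morse; it is an assumption for this sketch, not the construction from the talk): running the walk from the previous sketch with the same seed on f and on F(x) = f(x) + eps * <a, x> gives final iterates that stay close.

    import numpy as np

    # Reuses noisy_gradient_walk, f and grad_f from the previous sketch.
    eps = 1e-4
    a = np.array([0.7, -0.3])                 # generic direction for the tilt

    grad_F = lambda x: grad_f(x) + eps * a    # gradient of F(x) = f(x) + eps*<a, x>

    x_f = noisy_gradient_walk(grad_f, x0=[2.0, -1.0], seed=42)
    x_F = noisy_gradient_walk(grad_F, x0=[2.0, -1.0], seed=42)

    print(np.linalg.norm(x_f - x_F))          # small: the two runs co-converge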

The technique presented can also be used to solve partial differential equations, including the Navier-Stokes equations and Schrödinger's equation, among others.