Mathematical Foundations of Machine Learning & AI

Introduction to Machine Learning and AI from a Mathematical Perspective

Overview of ML and AI

Artificial intelligence (AI) is a branch of computer science focused on creating systems capable of performing tasks that typically require human intelligence, such as problem-solving, decision-making, and understanding language. Machine learning (ML) is a subset of AI that involves creating algorithms that enable computers to learn patterns and make decisions or predictions based on data, without being explicitly programmed for every task.
According to Jordan and Mitchell (2015), ``machine learning is concerned with the question of how to construct programs that automatically improve their performance at some tasks through experience''. More formally, a computer program is said to learn from experience $E$ with respect to some class of tasks $T$ and performance measure $P$ if its performance at tasks in $T$, as measured by $P$, improves with experience $E$.
Developers of ML and AI systems might in some cases find it easier to train a system on a training sample than to program an exhaustive set of rules or policies, i.e., the mappings from perceived states of the environment to actions; see Sutton and Barto (2018). ML and AI might also be viewed as a neighbouring branch of scientific machine learning, the purpose of which is to construct reliable reduced models of highly complex systems from possibly partially observed trajectories of data.
ML methods rely on a multilayer neural network (MNN), an input/output connectionist system of variable dimension, the architecture of which consists of layers of nodes connected in an organized and structured way. The use of hierarchical and organized structures for modelling complex systems represented by structured data is rather intuitive, and has been advocated by the Turing Award and Nobel laureate Herbert Simon (1962).
There is a wide array of reference books on the topic, e.g., arranged in chronological order Vapnik (1998, 2000), Bishop (2006), Hastie et al. (2009), Shalev-Shwartz and Ben-David (2014), Goodfellow et al. (2016), Aggarwal (2018), Mohri et al. (2018), etc. The current rate of evolution of ML and AI is almost exponential, making the task of writing an exhaustive reference book solely on its mathematics a Sisyphean endeavor.

Mathematical formalization in ML and AI

This fast pace of development of ML and AI is supported by its mathematical formalization, which borrows from a wide array of fields in mathematics and computer science: algebra, optimization, statistical inference and estimation. This formalization allows one to obtain reliable and reproducible results, to build bridges between different academic fields, and to develop new tools yielding powerful and sometimes counter-intuitive results.

I: Mathematical Foundations and Tools

Introduction

ML and AI make extensive use of calculus, linear algebra, optimization, probability and statistics. There is a wide array of excellent books covering these topics, see e.g., MacKay (2003), Wasserman (2004), Nocedal and Wright (2006), Bishop (2006), Hastie et al. (2009), Goodfellow et al. (2016), Strang (2019), Bishop and Bishop (2023) among others.

Key Algorithms and Models

Neural Networks

The multilayer neural network (MNN), a connectionist system whose architecture consists of several connected nodes and layers, constitutes the building block of ML models. The internal organization of these computational models, inspired by the human brain and consisting of layers of interconnected nodes, depends on the purpose of the model, the structure of the data used for its training and other inductive biases (Goyal and Bengio, 2022). A large dataset is then presented to the network for training, i.e., estimating the weights (parameters) associated with each connection by optimizing a performance measure $P$ with suitable algorithms. The use of multilayer neural networks has strong mathematical foundations, as Hornik et al. (1989) demonstrated that even shallow neural networks are universal approximators, i.e., they can approximate any function $$ X \in \mathbb{R}^m\mapsto Y=f(X) \in \mathbb{R}^n, \: m, n \geqslant 1, $$ and its derivatives. However, shallow neural networks fitted without care generalize poorly, i.e., their performance on samples different from the training sample is rather poor. This overfitting in the estimation of statistical models is corrected by regularization methods, e.g., penalty functions as in nonparametric regression analysis (Green and Silverman, 1994). The estimate of the function $f$ is obtained by minimizing a regularized loss function: $$ \min_{f \in \mathcal{H}} \mathcal{L}(Y,X;\theta) + \lambda R(f), $$ where $\mathcal{H}$ is the space of candidate functions, $\theta \in \Theta$ is the set of parameters to be estimated, and $R(f)$ is a penalty functional with weight $\lambda >0$. The parameters are estimated on a training set, the hyperparameters (e.g., $\lambda$ and the architecture) are tuned on a validation set, and the model is then tested on separate datasets.
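To make the regularized estimation concrete, here is a minimal numerical sketch (synthetic data, a one-hidden-layer network and plain gradient descent; all names and values are illustrative assumptions, not a prescribed implementation): the weights are estimated by minimizing an empirical squared loss plus an $L_2$ penalty $\lambda R(f)$.

```python
import numpy as np

# Minimal sketch: fit a one-hidden-layer network to y = f(x) with an L2 penalty
# (weight decay), i.e. minimize  L(Y, X; theta) + lambda * R(f).
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 1))                # training inputs
Y = np.sin(2 * X) + 0.1 * rng.normal(size=X.shape)   # noisy targets

h, lam, lr = 32, 1e-3, 0.05                          # width, penalty weight, step size
W1, b1 = rng.normal(scale=0.5, size=(1, h)), np.zeros(h)
W2, b2 = rng.normal(scale=0.5, size=(h, 1)), np.zeros(1)

for _ in range(5000):
    H = np.tanh(X @ W1 + b1)                         # hidden layer
    pred = H @ W2 + b2
    err = pred - Y
    loss = np.mean(err ** 2) + lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))

    # backpropagation (chain rule) for the regularized loss
    g_pred = 2 * err / len(X)
    gW2 = H.T @ g_pred + 2 * lam * W2
    gb2 = g_pred.sum(axis=0)
    gH = g_pred @ W2.T * (1 - H ** 2)                # derivative of tanh
    gW1 = X.T @ gH + 2 * lam * W1
    gb1 = gH.sum(axis=0)

    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

print("final regularized loss:", loss)
```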
Since the work of Krizhevsky et al. (2012) introducing deep networks, the picture has changed completely: it was empirically observed that generalization properties strongly improve with the size of the network. New regularization methods such as Dropout (Srivastava et al., 2014) proved useful for these deep networks. The seminal mathematical works by Belkin and his co-authors study the foundations of the empirical "auto-regularization" properties of over-parameterized systems, e.g., large language models with up to several trillion parameters, characterized by the "double-descent" phenomenon: once the interpolation threshold is passed, the risk of the system decreases monotonically with its capacity.
Furthermore, although the risk function of these over-parameterized systems is neither convex nor locally convex, these systems satisfy other analytical properties, studied by Łojasiewicz (1962) and Poljak (1963), which allow the use of gradient methods, including stochastic gradient descent (SGD), commonly used in ML.
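As a hedged illustration of why gradient methods still work without strong convexity, the following sketch runs SGD on an over-parameterized least-squares problem (more parameters than samples, synthetic data): the loss is not strongly convex, yet it satisfies a Polyak-Łojasiewicz-type inequality, and the training loss is driven to (near) zero, i.e., the interpolation regime discussed by Belkin and co-authors. Values and names are illustrative.

```python
import numpy as np

# Sketch: SGD on an over-parameterized least-squares problem (more parameters
# than samples).  The loss ||X w - y||^2 is convex but NOT strongly convex;
# it nevertheless satisfies a Polyak-Lojasiewicz-type inequality, so gradient
# methods still drive the training loss to (near) zero (interpolation).
rng = np.random.default_rng(1)
n, d = 50, 500                       # 50 samples, 500 parameters
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

w = np.zeros(d)
lr, batch = 1e-3, 10
for step in range(20000):
    idx = rng.choice(n, size=batch, replace=False)    # mini-batch
    grad = 2 * X[idx].T @ (X[idx] @ w - y[idx]) / batch
    w -= lr * grad

print("training loss:", np.mean((X @ w - y) ** 2))    # close to 0
```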
Further details on the minimization of the loss function, and the recent advances in this field for very large systems, are given on the AI & Math page.

Deep Learning Models

The following architectures constitute the main building blocks of MNN. The combination of these architectures yields the class of hybrid models.
  1. Convolutional Neural Networks (CNN) exploit the translation equivariance of the convolution operator, so that an object is recognized as the same after translations (and, with pooling layers, approximately after small distortions). This makes CNNs particularly well-suited for tasks involving image processing, image recognition, and other visual applications.
  2. Recurrent Neural Networks (RNN) are designed for sequential data, with loops to process temporal dependencies. Time series analysis and speech recognition are their main domains of application.
  3. Long Short-Term Memory (LSTM), proposed by Hochreiter and Schmidhuber (1997), was designed to address the vanishing gradient problem encountered in RNNs over long sequences. During training, the model minimizes a risk or loss function; however, the standard backpropagation algorithm, which relies on a long chain of derivative calculations, can produce gradients with extremely small numerical values, making them practically unusable.
  4. Transformers: A Transformer is a deep learning architecture, based on the attention mechanism (Vaswani et al., 2017), designed primarily for processing sequential data, e.g., text, without resorting to RNNs. This architecture is particularly effective for tasks such as translation, summarization, and text generation; a minimal sketch of the attention computation follows this list.
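As an illustration of the attention mechanism at the core of the Transformer (Vaswani et al., 2017), here is a minimal sketch of scaled dot-product attention on toy matrices; the dimensions and variable names are illustrative assumptions, not a full Transformer.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V  (Vaswani et al., 2017)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # pairwise query/key similarities
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the keys
    return weights @ V                                # weighted sum of the values

rng = np.random.default_rng(0)
seq_len, d_k, d_v = 5, 8, 4                           # toy dimensions
Q = rng.normal(size=(seq_len, d_k))                   # queries
K = rng.normal(size=(seq_len, d_k))                   # keys
V = rng.normal(size=(seq_len, d_v))                   # values
print(scaled_dot_product_attention(Q, K, V).shape)    # (5, 4)
```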

II: Theoretical Aspects of ML and AI algorithms

Machine Learning Paradigms

Supervised Learning

Supervised learning is the simplest learning method: the model learns from labeled data, i.e., (input, output) pairs. We have a dataset $ \mathcal{D} = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\} $, a collection of $N$ pairs of inputs (features) $x_i$ and outputs $y_i$, written in matrix form as $ \mathcal{D} =(X,Y)$. The output $Y$ might be discrete, continuous, a set of labels in the case of a classification problem, etc.
The goal of supervised learning is to approximate the function $f$ that maps $X$ to $Y$, i.e., $Y=f(X)$, by learning this function from the dataset $\mathcal{D}$.
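A minimal supervised-learning sketch (synthetic data and a linear hypothesis class; all values are illustrative assumptions): the function $f$ is approximated from the pairs in $\mathcal{D}$ by minimizing the empirical squared loss.

```python
import numpy as np

# Minimal supervised-learning sketch: approximate f from D = {(x_i, y_i)}
# with a linear model, by minimizing the empirical squared loss on D.
rng = np.random.default_rng(0)
N = 100
X = rng.uniform(-1, 1, size=(N, 3))            # inputs (features)
true_w = np.array([2.0, -1.0, 0.5])
Y = X @ true_w + 0.05 * rng.normal(size=N)     # outputs, Y ~ f(X) + noise

# Ordinary least squares: w_hat = argmin_w ||X w - Y||^2
w_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print("estimated weights:", w_hat)             # close to true_w

x_new = np.array([0.1, 0.2, -0.3])
print("prediction f_hat(x_new):", x_new @ w_hat)
```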

Unsupervised Learning

The model learns patterns or structures from data without labeled outcomes, as labels are either costly to obtain or only partially available. Furthermore, errors in the labels affect the results of supervised learning and require either specific corrective tools (Northcutt et al., 2021) or alternative methods for estimating the model without depending on labels.
Unsupervised learning takes into account the internal structure/organization of the data and does not need a set of labels. This approach is similar to the nonparametric approach in statistics, where no model is specified and one "lets the data speak for themselves"; as a consequence, some nonparametric methods are used in unsupervised learning.
Semi-supervised learning is the intermediate case between supervised and unsupervised learning: the learning procedure first uses the labelled data and then leverages a larger amount of unlabeled data, which can strongly enhance the accuracy of the model. Standard examples of unsupervised learning are:
  1. Clustering: Grouping similar data points (e.g., customer segmentation).
  2. Dimensionality Reduction: Simplifying high-dimensional data while preserving important features (e.g., PCA for data visualization).
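As an illustration of the dimensionality-reduction example above, a small PCA sketch (synthetic 10-dimensional data with a hidden 2-dimensional structure; values are illustrative) based on the singular value decomposition of the centred data matrix:

```python
import numpy as np

# Dimensionality-reduction sketch: PCA via the SVD of the centred data matrix,
# projecting 10-dimensional points onto their 2 leading principal components.
rng = np.random.default_rng(0)
Z = rng.normal(size=(300, 2))                     # hidden 2-d structure
A = rng.normal(size=(2, 10))
X = Z @ A + 0.05 * rng.normal(size=(300, 10))     # observed 10-d data

Xc = X - X.mean(axis=0)                           # centre the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X_reduced = Xc @ Vt[:2].T                         # scores on the first 2 PCs

explained = (S[:2] ** 2).sum() / (S ** 2).sum()
print("variance explained by 2 components:", round(explained, 3))  # close to 1
```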

Self-supervised Learning

Self-supervised learning (SSL) is an approach in machine learning that utilizes unlabeled data to train models. Instead of relying on external labels, it formulates auxiliary tasks based on inherent data properties, e.g., internal structure and organization, such as predicting missing parts, detecting relationships, or reconstructing data representations.
The model learns meaningful representations or features of the data using, e.g., pretext tasks, contrastive learning, redundancy reduction (Zbontar et al., 2021), etc. These meaningful patterns and features are then used for downstream tasks, often with minimal additional supervision. SSL bridges the gap between supervised and unsupervised learning by exploiting the vast availability of unlabeled data while reducing the consequences of the possible labeling errors mentioned above.
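A hedged sketch of a self-supervised pretext task (masked-feature prediction on synthetic data with a simple linear model; this is an illustration of the idea, not any specific published method): no labels are used, and the model is trained to reconstruct randomly masked coordinates from the visible ones, which forces it to capture the internal structure of the data.

```python
import numpy as np

# Self-supervised sketch: a masked-prediction pretext task.  No labels are
# used; the model learns to reconstruct randomly masked coordinates of x
# from the remaining ones, exploiting the correlations in the data.
rng = np.random.default_rng(0)
n, d = 1000, 8
latent = rng.normal(size=(n, 2))
X = latent @ rng.normal(size=(2, d)) + 0.1 * rng.normal(size=(n, d))

W = np.zeros((d, d))                      # linear "encoder-decoder"
lr = 0.01
for _ in range(2000):
    mask = rng.random((n, d)) < 0.3       # hide ~30% of the entries
    X_in = np.where(mask, 0.0, X)         # masked input
    pred = X_in @ W
    err = np.where(mask, pred - X, 0.0)   # loss only on the masked entries
    W -= lr * (X_in.T @ err) / n          # gradient step

print("masked-reconstruction error:", np.mean(err ** 2))
```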

Reinforcement Learning

Reinforcement learning (RL) is a fundamental learning method at the intersection of artificial intelligence and machine learning. In the standard supervised, unsupervised and self-supervised frameworks, the learner receives the data and learns a model from them.
In the RL framework, the learner is more active: it is an agent, or decision maker, interacting with its environment through a sequence of actions, without supervision. In response to the action just taken, the agent receives two types of information through a standard feedback mechanism:
  1. Its current state in the environment,
  2. A real-valued immediate reward for its actions. In more advanced frameworks, future or long-term rewards are provided. The trade-off between immediate and future rewards is controlled by the discount factor $\gamma \in (0,1)$.
The agent takes this feedback into account for its future actions. The objective of the agent is to select the best course of actions, the policy, maximizing the cumulative reward over time. If the environment is known, this is a planning problem, whilst if the environment is unknown, this is a learning problem.
In the latter case, the agent balances the trade-off between exploration, i.e., obtaining more information on the environment and the rewards, and exploitation of the information already collected for optimizing the rewards.
RL methods are often computationally intensive. However, the recent emergence of DeepSeek-R1 (2025), which utilizes RL and operates with fewer resources than state-of-the-art large language models (LLMs), is likely to enhance the adoption of RL methods.

The standard framework for formalizing reinforcement learning is the Markov decision process (MDP), defined by $M=(\mathcal{S,A,P},r,\gamma )$, where $\mathcal{S,A}$ are respectively the state space and the action space, which might be infinite, $r : \mathcal{S\times A} \rightarrow [0,1]$ is the reward function, i.e., $r(s,a)$ is the immediate reward upon choosing the action $a \in \mathcal{A}$ while in the state $s \in \mathcal{S}$, and $\gamma \in (0,1)$ is the discount factor.
The probability transition kernel is defined as $\mathcal{P}: \mathcal{S \times A} \rightarrow \Delta(\mathcal{S})$, where $\mathcal{P}(s^{'}|s,a)$ is the probability of transitioning from state $s$ to state $s^{'}$ when action $a$ is selected; $\Delta(\mathcal{S})$ is the probability simplex over $\mathcal{S}$. In the generative-model setting, the probability transition kernel $\mathcal{P}$ is empirically estimated from the data.
The model is Markovian since the transition and reward probabilities depend only on the current state $s$ and not on the entire history of states and actions. Usually, the numbers of states $|\mathcal{S}|$ and actions $|\mathcal{A}|$ are finite: this is the tabular case.

A deterministic policy $\pi$ is a mapping $\pi: \mathcal{S}\rightarrow \mathcal{A}$. The objective of the agent is to find the optimal policy $\pi^\star$ maximizing the value function $V^\pi(s)$, which is the expected discounted total reward starting from the initial state $s^0=s$, i.e., \begin{eqnarray*} \pi^\star &:= &\arg \max_\pi V^\pi(s),\\ V^\pi(s)& := &E \Big[ \sum_{t=0}^\infty \gamma^t r(s^t, a^t)\big | s^0 = s \Big ]. \end{eqnarray*} The corresponding action value function $Q^\pi: \mathcal{S\times A} \rightarrow \mathbb{R}$ of a policy $\pi$ is defined as $$ Q^\pi(s,a) := E \Big [ \sum_{t=0}^\infty \gamma^t r(s^t, a^t) \big | (s^0, a^0)=(s, a) \Big ], $$ $\forall (s,a) \in \mathcal{S \times A}$. There exists an optimal policy $\pi^\star$ that simultaneously maximizes $V^\pi(s) \; \forall s \in \mathcal{S}$, and $Q^\pi(s,a) \; \forall (s,a) \in \mathcal {S \times A}$.
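A minimal tabular illustration (a small randomly generated MDP; the sizes and values are arbitrary assumptions): value iteration applies the Bellman optimality operator repeatedly to compute $Q^\star$ and $V^\star$, and the greedy policy with respect to $Q^\star$ is an optimal deterministic policy.

```python
import numpy as np

# Tabular value iteration on a small random MDP M = (S, A, P, r, gamma):
#   Q(s, a) <- r(s, a) + gamma * sum_{s'} P(s' | s, a) * max_{a'} Q(s', a')
rng = np.random.default_rng(0)
nS, nA, gamma = 4, 2, 0.9
P = rng.random((nS, nA, nS))
P /= P.sum(axis=2, keepdims=True)        # transition kernel: rows sum to 1
r = rng.random((nS, nA))                 # rewards in [0, 1]

Q = np.zeros((nS, nA))
for _ in range(500):                     # contraction: geometric convergence
    V = Q.max(axis=1)                    # V(s) = max_a Q(s, a)
    Q = r + gamma * P @ V                # Bellman optimality update

pi_star = Q.argmax(axis=1)               # optimal (deterministic) policy
print("V* :", Q.max(axis=1))
print("pi*:", pi_star)
```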

This MDP framework is flexible enough to deal with several configurations, e.g., off-policy RL, where data logged from a different policy are used in the exploration step, yet rigorous enough to allow assessing the reliability of the results, e.g., confidence intervals can be estimated for the rewards. Furthermore, this setup provides guarantees on the reliability of the algorithms used for estimating the optimal value function $V^\star := V^{\pi^\star}$, or the optimal action value function $Q^\star := Q^{\pi^\star}$. In particular, recent results (see, e.g., Li et al., 2024) show that minimax-optimal estimation is achieved as soon as the sample size exceeds the order of $\frac{|\mathcal{S}||\mathcal{A}|}{1-\gamma}$; for instance, with $|\mathcal{S}|=10^3$, $|\mathcal{A}|=10$ and $\gamma=0.99$, this threshold is of order $10^6$.
This MDP framework has been extended to the non-tabular case, with a very large number of states and/or continuous state spaces. For that purpose, one resorts to kernel function approximation, studied by Farahmand et al. (2016) and Long et al. (2022). Long and Han (2022) introduce the perturbational complexity $\Delta_\mathcal{M}(\varepsilon)$, which indicates whether the RL problem can be solved efficiently in terms of sample complexity, i.e., without being affected by the curse of dimensionality. Furthermore, $\Delta_\mathcal{M}(\varepsilon)$ is a lower bound on the estimation error of the RL-type algorithms they propose. Long and Han (2023) provide an exhaustive survey of the non-tabular case.

III: Current Research and Open Problems

Recent Advances in Reinforcement Learning

Natural Language Processing

IV: Learning Resources & References

Academic Learning Resources

References

  1. Aggarwal, C.C., 2018. Neural networks and deep learning. A textbook. Springer, Cham. MR3966422
  2. Bahdanau, D., Cho, K., and Bengio, Y. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
  3. Belkin, M., 2021. Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation. Acta Numerica 30, 203-248. MR4298218
  4. Belkin, M., Hsu, D., Ma, S. and Mandal, S., 2019. Reconciling modern machine-learning practice and the classical bias-variance trade-off, Proc. Natl. Acad. Sci. USA 116, no. 32, 15849-15854. MR3997901
  5. Belkin, M., Hsu, D. and Xu, J., 2020. Two models of double descent for weak features, SIAM Journal on Mathematics of Data Science 2, no. 4, 1167-1180. MR4186534
  6. Bishop, C.M. and Bishop, H., 2023. Deep learning: Foundations and concepts. Springer Nature. MR4719738
  7. Bishop, C. M., 2006. Pattern recognition and machine learning. Inf. Sci. Stat. Springer, New York. MR2247587
  8. Brown, T. B. et al., 2020. Language models are few-shot learners. arXiv preprint ArXiv:2005.14165.
  9. Cobbe, K., Kosaraju, V., Bavarian, M., Chen, M., Jun, H., Kaiser, L., Plappert, M., Tworek, J., Hilton, J., Nakano, R. and Hesse, C., 2021. Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168.
  10. Colbrook, M.J., Antun, V. and Hansen, A.C. 2022. The difficulty of computing stable and accurate neural networks: on the barriers of deep learning and Smale's 18th problem. Proc. Natl. Acad. Sci. USA 119, no. 12, Paper No. e2107151119, 10 pp. MR4417588
  11. Dixon, M.F., Halperin, I. and Bilokon, P. 2020. Machine learning in finance—from theory to practice. Springer, Cham. MR4220647
  12. Farahmand, A., Ghavamzadeh, M., Szepesvári, C. and Mannor, S. 2016. Regularized policy iteration with nonparametric function spaces. Journal of Machine Learning Research, 17, Paper No. 139, 66 pp. MR3555030
  13. Fujisawa, I., Nobe, S., Seto, H., Onda, R., Uchida, Y., Ikoma, H., Chien, P.C. and Kanai, R., 2024. ProcBench: Benchmark for Multi-Step Reasoning and Following Procedure. arXiv preprint arXiv:2410.03117.
  14. Geiping, J., Garrido, Q., Fernandez, P., Bar, A., Pirsiavash, H., LeCun, Y., and Goldblum, M., 2023. A Cookbook of Self-Supervised Learning. arXiv preprint arXiv:2304.12210.
  15. Goodfellow, I., Bengio, Y., and Courville, A., 2016. Deep learning. Adapt. Comput. Mach. Learn. MIT Press, Cambridge, MA. MR3617773
  16. Goyal, A. and Bengio, Y., 2022. Inductive biases for deep learning of higher-level cognition. Proc. R. Soc. A.478 no.2266, Paper No. 20210068, 35 pp. MR4505795
  17. Green, P. J. and Silverman, B. W., 1994. Nonparametric regression and generalized linear models. A roughness penalty approach. Monogr. Statist. Appl. Probab., 58 Chapman & Hall, London. MR1270012
  18. Greff, K., Srivastava, R., Koutník, J., Steunebrink, B. and Schmidhuber, J., 2017. LSTM: a search space odyssey. IEEE Transactions on Neural Networks and Learning Systems 28, no. 10, 2222-2232. MR3709742
  19. Hastie, T., Tibshirani, R. and Friedman, J., 2009. The elements of statistical learning. Data mining, inference, and prediction. Second edition Springer Ser. Statist. Springer, New York. MR2722294
  20. Hinton, G.E. and Salakhutdinov, R.R., 2006. Reducing the dimensionality of data with neural networks. Science, 313(5786), pp.504-507 MR2242509
  21. Hinton, G., 2023. How to represent part-whole hierarchies in a neural network. Neural Computation, 35(3), pp.413-452. MR4555492
  22. Hochreiter, S. and Schmidhuber, J., 1997. Long short-term memory. Neural Computation, 9(8), pp.1735-1780.
  23. Hornik, K., Stinchcombe, M. and White, H., 1989. Multilayer feedforward networks are universal approximators. Neural networks, 2(5), pp.359-366.
  24. Jordan, M. I., and Mitchell, T. M., 2015. Machine learning: trends, perspectives, and prospects. Science, 349, no. 6245, 255-260. MR3382217
  25. Kahneman, D., 2011. Thinking Fast and Slow. Macmillan.
  26. Kallus, N. and Uehara, M., 2020. Double reinforcement learning for efficient off-policy evaluation in Markov decision processes. Journal of Machine Learning Research, Paper No. 167, 63 pp. MR4209453
  27. Kallus, N. and Uehara, M., 2022. Efficiently breaking the curse of horizon in off-policy evaluation with double reinforcement learning, Operations Research 70, no. 6, 3282-3302. MR4538517
  28. Kambhampati, S., 2024. Can large language models reason and plan? Annals of the New York Academy of Sciences, 1534(1), pp.15-18.
  29. Kojima, T., Gu, S., Reid, M., Matsuo, Y. and Iwasawa, Y., 2022. Large language models are zero-shot reasoners. Advances in neural information processing systems, 35, 22199-22213.
  30. Lan, G., 2023. Policy mirror descent for reinforcement learning: linear convergence, new sampling complexity, and generalized problem classes. Mathematical Programming 198, no.1, Ser. A, 1059-1106. MR4550970
  31. LeCun, Y., Bengio, Y. and Hinton, G., 2015. Deep learning. Nature, 521(7553), pp.436-444.
  32. Li, G., Wei, Y., Chi, Y. and Chen, Y., 2024. Breaking the sample size barrier in model-based reinforcement learning with a generative model. Operations Research, 72, no.1, 203-221. MR4705834
  33. Liu, C., Zhu, L. and Belkin, M., 2022. Loss landscapes and optimization in over-parameterized non-linear systems and neural networks, Applied and Computational Harmonic Analysis 59, 85-116. MR4412181
  34. Łojasiewicz, S., 1962. Une propriété topologique des sous-ensembles analytiques réels, in Les Équations aux Dérivées Partielles, 87-89. MR0160856
  35. Long, J. and Han, J., 2022. Perturbational complexity by distribution mismatch: a systematic analysis of reinforcement learning in reproducing kernel Hilbert space. Journal of Machine Learning 1, no. 1, 1-37. MR4716372
  36. Long, J., Han, J. and E, W., 2022. An $L^2$ analysis of reinforcement learning in high dimensions with kernel and neural network approximation. CSIAM Transactions on Applied Mathematics 3, no.2, 191-220. MR4456674
  37. Long, J. and Han, J. 2023. Reinforcement learning with function approximation: from linear to nonlinear. Journal of Machine Learning 2, no. 3, 161-193. MR4734710
  38. MacKay, D.J.C., 2003. Information theory, inference and learning algorithms. Cambridge University Press, New York, 2003. MR2012999
  39. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G. and Petersen, S., 2015. Human-level control through deep reinforcement learning. Nature, 518(7540), pp.529-533.
  40. Mohajerin Esfahani, P. and Kuhn, D., 2018. Data-driven distributionally robust optimization using the Wasserstein metric: performance guarantees and tractable reformulations. Mathematical Programming 171, no. 1-2, 115-166. MR3844536
  41. Mohri, M., Rostamizadeh, A. and Talwalkar, A., 2018. Foundations of machine learning, second edition. Adaptive Computation and Machine Learning, MIT Press, Cambridge, MA. MR3931734
  42. Newell, A., Shaw, J.C. and Simon, H.A., 1959, June. Report on a general problem solving program. In IFIP congress, Vol. 256, p. 64.
  43. Newell, A. and Simon, H., 1972. Human problem solving, ACS symposium series, Prentice Hall.
  44. Nocedal, J., and Wright, S. J., 2006. Numerical optimization. Second edition. Springer Ser. Oper. Res. Financ. Eng. Springer, New York. MR2244940
  45. Northcutt, C., Jiang, L. and Chuang, I., 2021. Confident learning: Estimating uncertainty in dataset labels. Journal of Artificial Intelligence Research 70, pp.1373-1411. MR4306614
  46. Poljak, B. T., 1963. Gradient methods for minimizing functionals, Ž. Vyčisl. Mat i Mat. Fiz. 3, 643-653. MR0158568
  47. Radhakrishnan, A., Belkin, M. and Uhler, C., 2020. Overparameterized neural networks implement associative memory, Proc. Natl. Acad. Sci. USA, 117, no. 44, 27162-27170. MR4255943
  48. Rahimi, A. and Recht, B., 2007. Random features for large-scale kernel machines. Advances in neural information processing systems (NIPS 2007) 20 (J. C. Platt et al., eds), Curran Associates, pp. 1177-1184.
  49. Schmidhuber, J., 2015. Deep learning in neural networks: An overview. Neural networks, 61, pp.85-117.
  50. Shalev-Shwartz, S. and Ben-David, S., 2014. Understanding machine learning: From theory to algorithms. Cambridge university press.
  51. Shi, C., Zhang, S., Lu, W. and Song, R., 2020. Statistical inference of the value function for reinforcement learning in infinite-horizon settings, Journal of the Royal Statistical Society. Series B. 84, no. 3, 765-793. MR4460575
  52. Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A. and Chen, Y., 2017. Mastering the game of go without human knowledge. Nature, 550(7676), pp.354-359.
  53. Simon, H. A., 1962. The architecture of complexity, Proceedings of The American Philosophical Society, 106(6), 467-482. Jstor
  54. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. and Salakhutdinov, R., 2014. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1), pp.1929-1958. MR3231592
  55. Strang, G., 2019. Linear algebra and learning from data. Wellesley-Cambridge Press.
  56. Sutton, R. S. and Barto, A.G., 2018. Reinforcement learning: an introduction. Second edition. Adapt. Comput. Mach. Learn. MIT Press, Cambridge, MA. MR3889951
  57. Szepesvári, C., 2022. Algorithms for reinforcement learning, reprint of the 2010 original, Synthesis Lectures on Artificial Intelligence and Machine Learning, 9, Springer, Cham. MR4647095
  58. Teyssière, G., 2022. Review of "Belkin, M. (2021). Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation". Mathematical Reviews, American Mathematical Society.
  59. Teyssière, G., 2023. Review of "Lan, G., 2023. Policy mirror descent for reinforcement learning: linear convergence, new sampling complexity, and generalized problem classes". Mathematical Reviews, American Mathematical Society.
  60. Teyssière, G., 2023. Review of "Dixon, M.F., Halperin, I. and Bilokon, P. 2020. Machine learning in finance—from theory to practice." Mathematical Reviews, American Mathematical Society.
  61. Teyssière, G., 2025. Review of "Long, J. and Han, J., 2022. Perturbational complexity by distribution mismatch: a systematic analysis of reinforcement learning in reproducing kernel Hilbert space". Mathematical Reviews, American Mathematical Society.
  62. Teyssière, G., 2025. Review of "Li, G., Wei, Y., Chi, Y. and Chen, Y., 2024. Breaking the sample size barrier in model-based reinforcement learning with a generative model". Mathematical Reviews, American Mathematical Society.
  63. Teyssière, G., 2025. Review of "Wang, J., Gao, R. and Zha, H., 2024. Reliable off-policy evaluation for reinforcement learning ". Mathematical Reviews, American Mathematical Society.
  64. Vapnik, V.N., 1998. Statistical learning theory. Adapt. Learn. Syst. Signal Process. Commun. Control Wiley-Intersci. Publ. John Wiley & Sons, Inc., New York. MR1641250
  65. Vapnik, V.N., 2000. The nature of statistical learning theory. Second edition Stat. Eng. Inf. Sci. Springer-Verlag, New York. MR1719582
  66. Wang, J., Gao, R., and Zha, H., 2024. Reliable off-policy evaluation for reinforcement learning. Operations Research, 72, no. 2, 699-716. MR4677384
  67. Wasserman, L. 2004. All of statistics. A concise course in statistical inference. Springer Texts Statist. Springer-Verlag, New York. MR2055670
  68. Yang, I., 2017. A convex optimization approach to distributionally robust Markov decision processes with Wasserstein distance. IEEE Control Systems Letters 1, no.1, 164-169. MR4208527
Updated February 2025.