**NSF CCF-2323532:** An Algebraic, Convex, and Scalable Framework for Kernel Learning with Activation Functions

**Project Description:**

Machine learning has been adopted by a broad range of emerging applications, including health monitoring, autonomous driving, and advanced manufacturing. However, no machine learning system can be 100% accurate, owing to the accuracy limits of machine learning algorithms and the circuit-level non-idealities of their hardware implementation. This project investigates a radically new framework for efficient validation of machine learning systems implemented with nano-scale integrated circuits. It aims to identify and synthesize the critical corner cases for which a machine learning system is likely to fail. The project is expected to initiate a paradigm shift in today's design methodology for complex machine learning systems, thereby leading to an immediate impact on a broad range of industrial sectors relying on machine intelligence. In addition, the proposed education activities create a large number of unique training opportunities for both academic and industrial participants, substantially improving the education infrastructure and producing high-quality researchers and practitioners for society.

To achieve the goals of accuracy, scalability, and interpretability, the project proposes an algebraic reformulation of the classical problem of kernel learning. Specifically, for any given kernel algebra, the positive kernels in that algebra and their associated feature maps may be represented by positive matrices. This leads to a convex optimization problem whose solution yields an explicit feature map, which may be interpreted in terms of measurable physical quantities. Based on this framework, activation functions are used to define kernel algebras which are universal, which are dense in the set of all kernels, and whose feature maps mimic those of the neural tangent kernel that defines neural networks, leading to improved accuracy of the algorithms. Next, a saddle-point representation and primal-dual approach are used to convert the kernel learning problem to quadratic programming, resulting in more scalable kernel learning algorithms. Finally, a singular value decomposition of the resulting feature map is obtained by solving an associated partial differential equation. This decomposition is used to identify key features in the data and, furthermore, yields reduced algorithms which scale linearly with the number of samples, implying scalability to datasets with tens of thousands of samples.
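The idea that positive matrices parameterize positive kernels and their feature maps can be illustrated with a minimal sketch. It is not the project's algorithm: the basis `Z(x)` (a constant plus shifted ReLU activations), the `centers`, the scalar input, and the factor `L` are all assumptions chosen for illustration. Writing a positive semidefinite matrix as `P = L L^T`, the kernel `k(x, y) = Z(x)^T P Z(y)` has the explicit feature map `phi(x) = L^T Z(x)`.

```python
import random

def relu(t):
    return t if t > 0.0 else 0.0

def basis(x, centers):
    # Hypothetical basis Z(x): a constant plus shifted ReLU activations.
    return [1.0] + [relu(x - c) for c in centers]

def psd_from_factor(L):
    # P = L L^T is positive semidefinite by construction.
    n = len(L)
    return [[sum(L[i][k] * L[j][k] for k in range(len(L[0])))
             for j in range(n)] for i in range(n)]

def feature_map(x, L, centers):
    # Explicit feature map phi(x) = L^T Z(x).
    z = basis(x, centers)
    return [sum(L[i][j] * z[i] for i in range(len(z)))
            for j in range(len(L[0]))]

def kernel(x, y, L, centers):
    # k(x, y) = phi(x) . phi(y)
    px, py = feature_map(x, L, centers), feature_map(y, L, centers)
    return sum(a * b for a, b in zip(px, py))

def kernel_via_P(x, y, P, centers):
    # Same kernel written as the quadratic form Z(x)^T P Z(y).
    zx, zy = basis(x, centers), basis(y, centers)
    return sum(zx[i] * P[i][j] * zy[j]
               for i in range(len(zx)) for j in range(len(zy)))

def gram(xs, L, centers):
    # Gram matrix K[i][j] = k(xs[i], xs[j]).
    return [[kernel(a, b, L, centers) for b in xs] for a in xs]

# Illustrative data only: random factor and sample points.
rng = random.Random(0)
centers = [-1.0, 0.0, 1.0]
L = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(4)]
P = psd_from_factor(L)
xs = [rng.uniform(-2.0, 2.0) for _ in range(6)]
K = gram(xs, L, centers)
```

Because the kernel is linear in the entries of `P`, and the set of positive semidefinite matrices is convex, choosing `P` to fit data is a convex problem in this parameterization. The sketch fixes `P` at random rather than optimizing it.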

**Project Duration:**

12/1/2023-11/30/2026

**Project Personnel:**

Matthew M. Peet (PI), Arizona State University

Alexandr Talitsky (PhD), Arizona State University