Alireza Fallah, Ph.D. Candidate, EECS. Based on joint work with Aryan Mokhtari (UT Austin) and Asu Ozdaglar (MIT).
Imagine sitting in your autonomous car on the way to a vacation. Your vehicle should follow the directions provided by the navigation app while also using multiple sensors to monitor other vehicles, road signs, street lights, etc. As a result, during the course of your journey, your car might need to take actions, such as turning or stopping, within a few seconds. The question is: how should your vehicle be programmed so that it can adapt to new tasks within a short amount of time and with limited data?
In a large fraction of artificial intelligence problems, ranging from robotics to image classification and pattern recognition, the goal is to design systems that use prior experience and knowledge to learn new skills more efficiently. Meta-learning, or learning to learn, formalizes this goal by using data from previous tasks to learn update rules or model parameters that can be fine-tuned to perform well on new tasks with a small amount of data.
A particularly simple and effective approach to this problem, proposed by Finn et al., is model-agnostic meta-learning (MAML). This approach finds a meta-initialization that can be updated by a gradient-based method once a new task arrives. Several recent papers have studied the empirical performance of various methods in this setting. In our work (Fallah et al.), we establish the first convergence guarantees for MAML-type methods in the general non-convex setting.
Let $\mathcal{T}$ denote the set of all tasks, and let $p$ be the probability distribution over tasks $\mathcal{T}_i \in \mathcal{T}$. Each task $\mathcal{T}_i$ has a corresponding loss function $f_i(w)$ which measures how well the model $w$ performs on that task. We also define $f$ as the expectation of $f_i$ over all tasks, i.e.,

$$f(w) := \mathbb{E}_{i \sim p}\left[ f_i(w) \right].$$
Figure 1: (a) the traditional statistical learning problem; (b) the MAML problem.
In traditional statistical learning, we aim to minimize the loss function $f$, as we expect its solution to be a good approximation of the optimal solution of a new, unseen task $\mathcal{T}_i$ (see Figure 1a).
However, in the model-agnostic meta-learning problem (Figure 1b), we aim to find the best point $w^*$ that performs well as an initial point for learning a new task $\mathcal{T}_i$ when we only have the budget for running a few steps of gradient descent. For simplicity, we focus on finding an initialization $w^*$ such that, after observing a new task $\mathcal{T}_i$, one gradient step leads to a good approximation of the minimizer of $f_i$. We can formulate this goal as the following optimization problem:

$$\min_{w} \; F(w) := \mathbb{E}_{i \sim p}\left[ f_i\big(w - \alpha \nabla f_i(w)\big) \right],$$

where $\alpha$ is the stepsize of the inner gradient step.
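To make the formulation concrete, here is a minimal JAX sketch (our own illustration, not the authors' code) of this objective and one exact meta-gradient step. The inner stepsize `alpha`, the meta stepsize `beta`, and the toy quadratic tasks are all choices made for this example.

```python
# A minimal sketch of the MAML objective and update in JAX (illustration only).
import jax
import jax.numpy as jnp

alpha = 0.01   # inner (task-level) stepsize, our choice for this example
beta = 0.001   # outer (meta) stepsize, our choice for this example

def maml_loss(w, f_i):
    """F_i(w) = f_i(w - alpha * grad f_i(w)): the loss after one inner gradient step."""
    w_adapted = w - alpha * jax.grad(f_i)(w)
    return f_i(w_adapted)

def meta_step(w, task_losses):
    """One exact MAML meta-gradient step, averaged over a batch of tasks.
    jax.grad differentiates through the inner update, so the second-order
    factor (I - alpha * Hess f_i(w)) is handled automatically."""
    grads = [jax.grad(maml_loss)(w, f_i) for f_i in task_losses]
    return w - beta * sum(grads) / len(grads)

# Toy usage: quadratic tasks f_i(w) = 0.5 * ||w - b_i||^2 with random optima b_i.
key = jax.random.PRNGKey(0)
b = jax.random.normal(key, (20, 5))  # 20 tasks, dimension 5
tasks = [lambda w, bi=bi: 0.5 * jnp.sum((w - bi) ** 2) for bi in b]
w = jnp.zeros(5)
for _ in range(100):
    w = meta_step(w, tasks)
```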
Table 1: Our theoretical results for convergence of MAML, the first-order approximation of MAML (FO-MAML), and our proposed Hessian-free MAML (HF-MAML) to a first-order stationary point (FOSP) in non-convex settings. Here, $d$ is the problem dimension, $\sigma$ is a bound on the standard deviation of $\nabla f_i(w)$ from its mean $\nabla f(w)$, and $\tilde{\sigma}$ is a bound on the standard deviation of estimating $\nabla f_i(w)$ using a single data point.
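The three methods in Table 1 differ in how they estimate the gradient of $F$. Exact MAML differentiates through the inner step (as in the sketch above), which requires the Hessian $\nabla^2 f_i(w)$; FO-MAML simply drops the Hessian-dependent factor; HF-MAML replaces the Hessian-vector product with a difference of gradients. Below is a hedged sketch of the latter two estimators, where the finite-difference parameter `delta` is our own choice for illustration.

```python
# Sketches of the FO-MAML and HF-MAML per-task gradient estimators (illustration only).
import jax

def fo_maml_grad(f_i, w, alpha=0.01):
    """FO-MAML: drop the (I - alpha * Hess f_i(w)) factor entirely and use
    the plain gradient at the adapted point."""
    w_adapted = w - alpha * jax.grad(f_i)(w)
    return jax.grad(f_i)(w_adapted)

def hf_maml_grad(f_i, w, alpha=0.01, delta=1e-3):
    """HF-MAML: approximate the Hessian-vector product Hess f_i(w) @ d by a
    symmetric difference of gradients, avoiding second derivatives entirely."""
    d = jax.grad(f_i)(w - alpha * jax.grad(f_i)(w))  # gradient at adapted point
    hvp = (jax.grad(f_i)(w + delta * d) - jax.grad(f_i)(w - delta * d)) / (2 * delta)
    return d - alpha * hvp
```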
We illustrate these results with numerical experiments comparing MAML, FO-MAML, and HF-MAML (Figure 2); a toy version of this comparison is sketched after the list below.

- In Figure 2a, we assume gradients and Hessians are exact and focus on task variation. We set the batch size of tasks equal to the total number of tasks, which is 20, so every task is used at each iteration. In this case, even though the gradients are exact, FO-MAML converges to a level with a persistent error gap relative to the other two methods. This is consistent with our theoretical results.
- We next consider noisy gradients and Hessians. We also set the number of tasks equal to 50 and take a batch of 10 tasks at each iteration. In Figure 2b, we choose the variance of $g_i$ to be small so that the tasks are relatively similar. Here, the additional persistent error of FO-MAML is negligible compared to the other error terms, and all three methods behave similarly.
- In Figure 2c, we increase the variance of $g_i$ (the tasks are less similar). In this case, the additional persistent error of FO-MAML dominates the other error terms, and FO-MAML performs worse than the other two methods.
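For completeness, here is a toy version of the task-similarity experiment described above. This is our own setup, not the paper's actual experimental configuration: we reuse `fo_maml_grad` and `hf_maml_grad` from the previous sketch, and the standard deviation `task_std` of the task optima plays the role of the task-similarity knob.

```python
# A toy task-similarity comparison (our own setup; the paper's experiments may differ).
# Reuses fo_maml_grad and hf_maml_grad from the sketch above.
import jax
import jax.numpy as jnp

def make_tasks(key, n_tasks=20, dim=5, task_std=1.0):
    """Quadratic tasks f_i(w) = 0.5 * ||w - b_i||^2 whose optima b_i are spread
    with standard deviation task_std: larger task_std means less similar tasks."""
    b = task_std * jax.random.normal(key, (n_tasks, dim))
    return [lambda w, bi=bi: 0.5 * jnp.sum((w - bi) ** 2) for bi in b]

def run(meta_grad_fn, tasks, steps=200, beta=0.05, dim=5):
    """Meta-train with a given per-task gradient estimator (FO-MAML or HF-MAML)."""
    w = jnp.zeros(dim)
    for _ in range(steps):
        g = sum(meta_grad_fn(f_i, w) for f_i in tasks) / len(tasks)
        w = w - beta * g
    return w

tasks = make_tasks(jax.random.PRNGKey(0), task_std=3.0)  # dissimilar tasks
w_fo = run(fo_maml_grad, tasks)
w_hf = run(hf_maml_grad, tasks)
```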
Alireza Fallah is a third-year Ph.D. student in the Electrical Engineering and Computer Science department, advised by Professor Asuman Ozdaglar. Before MIT, he earned a dual B.Sc. degree in Electrical Engineering and Pure Mathematics from Sharif University of Technology, Tehran, Iran. His research interests are mainly in optimization, the theory of machine learning, statistical inference, and game theory. His current research focuses on the theoretical understanding of optimization methods as well as on developing fast and robust algorithms for machine learning applications. He has received several awards and fellowships, including the MathWorks Engineering Fellowship, the Siebel Scholarship, first prize in the International Mathematics Competition (IMC) in 2016, and silver medals at the International Mathematical Olympiad (IMO) in 2011 and 2012. He can be contacted here.