Assistant Professor
2145 Sheridan Road, Evanston, IL 60208-3109
Email: chang-han.rhee@northwestern.edu
http://chrhee.github.io/
Rare Event Analysis, Large Deviations, Metastability, Heavy Tails
Debiased Multilevel Monte Carlo
Markov chain Monte Carlo, Exact Estimation
Sensitivity Analysis, Gradient Estimation
Experimental Design
Deep Learning Theory
A strange and beautiful mathematical structure known as the heavy tail underlies seemingly disparate rare events such as the recent global pandemic, the 2012 blackout in India, and the 2007 financial crisis. In fact, the list of examples extends far beyond literal catastrophes, and heavy tails are pervasive in large-scale complex systems and modern algorithms. Heavy tails provide mathematical models for extreme variability, and in their presence, high-impact rare events are guaranteed to happen eventually. Understanding how they will happen allows us to design resilient systems and control (or even utilize) the impact they inflict. A particularly well-known and simple manifestation of heavy tails is the “80-20 rule” (e.g., the richest 20% of the population control 80% of the wealth), whose variations are repeatedly discovered in a wide variety of application areas. One of the most recent and surprising discoveries of heavy tails emerged in deep neural networks.
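To make the 80-20 rule concrete, here is a standard back-of-the-envelope calculation under a Pareto wealth model; the model and the resulting exponent are a textbook illustration, not a result from the research described below.

```latex
\[
\text{Pareto model: } P(W > w) = \Big(\tfrac{w}{w_{\min}}\Big)^{-\alpha},\quad w \ge w_{\min},\ \alpha > 1,
\qquad
\text{top-}p\text{ wealth share: } S(p) = p^{\,1 - 1/\alpha}.
\]
\[
S(0.2) = 0.8
\;\Longleftrightarrow\; 0.2^{\,1 - 1/\alpha} = 0.8
\;\Longleftrightarrow\; \alpha = \frac{1}{1 - \log 0.8 / \log 0.2} = \log_4 5 \approx 1.16,
\]
\[
\text{i.e., the 80-20 rule corresponds to a tail so heavy that } \operatorname{Var}(W) = \infty
\text{ (any } \alpha \le 2 \text{ gives infinite variance).}
\]
```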
The unprecedented empirical success of deep neural networks in modern AI tasks is often attributed to the stochastic gradient descent (SGD) algorithm’s mysterious ability to avoid sharp local minima in the loss landscape. Recently, heavy-tailed SGD has attracted significant attention for its ability to escape sharp local minima through a single big jump, and hence within a realistic training horizon. In practice, however, when SGD exhibits such behavior, practitioners adopt truncated variations of SGD to temper these large movements. At first glance, this truncation scheme, known as gradient clipping, appears to effectively eliminate heavy tails from SGD’s dynamics and thus obliterate the effects described above. Curiously, however, the modification leads to the opposite of what naive intuition predicts: heavy-tailed SGD with gradient clipping almost completely avoids sharp local minima.
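As a minimal illustration of the clipping mechanism itself, the sketch below runs SGD on a toy one-dimensional double-well loss with heavy-tailed (Student-t) gradient noise, with and without clipping. The loss, noise distribution, step size, and clipping threshold are all illustrative assumptions and not the setting analyzed in the papers discussed here.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad(x):
    # Gradient of the toy double-well loss f(x) = x**2 / 2 - log(1 + x**2),
    # which has local minima at x = -1 and x = 1 separated by a barrier at x = 0.
    return x * (x**2 - 1.0) / (1.0 + x**2)

def sgd(x0, eta=0.01, n_steps=20_000, clip=None, df=1.5):
    """Run SGD with heavy-tailed (Student-t, df <= 2) gradient noise.

    If `clip` is given, each stochastic gradient is truncated to [-clip, clip]
    before the update, i.e., elementary gradient clipping in one dimension.
    """
    x, crossings = x0, 0
    for _ in range(n_steps):
        g = grad(x) + rng.standard_t(df)   # df = 1.5: infinite-variance noise
        if clip is not None:
            g = np.clip(g, -clip, clip)    # no single step can be a "big jump"
        x_new = x - eta * g
        crossings += int(x_new * x < 0)    # count transitions between the two wells
        x = x_new
    return x, crossings

# Starting in the well at x = 1: without clipping, a single large noise increment
# can carry the iterate across the barrier; clipping rules out such one-step escapes.
for label, c in [("unclipped", None), ("clipped", 1.0)]:
    x_final, n_cross = sgd(1.0, clip=c)
    print(f"{label:9s}: final iterate {x_final: .3f}, barrier crossings {n_cross}")
```

In this toy run, barrier crossings of the unclipped iterate occur through isolated large increments, whereas the clipped iterate cannot move more than eta times the threshold per step and so rarely crosses within the same horizon. The counterintuitive sharp-versus-flat selection effect described above concerns richer loss landscapes and is not captured by this example.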
To unravel this mystery and potentially further enhance SGD's ability to find flat minima, it is imperative to go beyond the traditional local convergence analysis and acquire a comprehensive understanding of SGDs’ global dynamics within complex non-convex loss landscapes. My recent research with Xingyu Wang provides systematic tools for characterizing the global dynamics of such variations of SGDs through the lens of heavy-tailed large deviations and metastability analysis.
I recently presented our findings in a lecture series [1, 2, 3, 4] at an Isaac Newton Institute Satellite Programme on heavy tails in machine learning at the Alan Turing Institute. The lecture videos and slides are available at the links above. The first lecture gives an overview of the heavy-tailed large deviations approach to characterizing the global dynamics of SGD. The subsequent lectures explain the mathematical foundations in more detail and discuss related topics. Please also see our conference paper for a high-level description of these phenomena, and a follow-up journal paper for a more streamlined development of the mathematical foundation behind them. The preprints of these papers were recognized with the 2022 IEMS Nemhauser Best Student Paper Prize and 2nd Place in the 2023 INFORMS George Nicholson Student Paper Competition, respectively.
In addition to providing insights into the global dynamics of heavy-tailed SGD, this line of research also advances the general mathematical machinery for heavy-tailed dynamical systems:
We proposed a new locally uniform formulation of heavy-tailed large deviations that paves the way for the analysis of the global dynamics of SGD (the classical single-big-jump asymptotic that this formulation builds on is sketched after this list).
To facilitate this, we developed a uniform version of M-convergence theory.
Building on these developments, we devised a streamlined framework for lifting the new large deviations bounds to a sharp local stability analysis.
Subsequently, we developed further tools to lift the local stability analysis to a sample-path-level convergence analysis of the scaled SGD.
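For readers unfamiliar with heavy-tailed large deviations, the classical single-big-jump asymptotic below conveys the flavor of the estimates that the items above extend to sample-path and algorithmic settings. It is a textbook result stated in generic notation, not the locally uniform formulation of the papers.

```latex
\[
\text{If } X_1, X_2, \ldots \text{ are i.i.d. with } \mathbb{E}X_1 = 0 \text{ and regularly varying right tail }
P(X_1 > x) = L(x)\, x^{-\alpha},\ \alpha > 1, \text{ then for each fixed } b > 0,
\]
\[
P\big(X_1 + \cdots + X_n > bn\big) \;\sim\; n\, P(X_1 > bn), \qquad n \to \infty.
\]
\[
\text{The most likely way the rare event occurs is that a single increment is of order } bn
\text{ while the rest behave typically: the ``one big jump'' that clipping suppresses.}
\]
```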
Ph.D., Computational and Mathematical Engineering, Stanford University
M.S., Computational and Mathematical Engineering, Stanford University
B.S., Mathematics and Computer Science, Seoul National University
INFORMS George Nicholson Student Paper Competition 2nd Place (as advisor), 2023
NSF CAREER Award, 2022–2027
INFORMS Simulation Society Outstanding Simulation Publication Award, 2016
INFORMS George Nicholson Student Paper Competition Finalist, 2013
Winter Simulation Conference Best Student Paper Award (MS/OR), 2012
Samsung Fellowship, 2008–2012
Jeffrey Wang (PhD at Northwestern IEMS)
Xingyu Wang (PhD at Northwestern IEMS)
Zhe Su (PhD at Northwestern IEMS)
Jingyi Zhao (MS at Northwestern ESAM): graduated in March 2020
Mihail Bazhba (PhD at CWI Stochastics): co-supervised with Bert Zwart; defended in May 2021
Bohan Chen (PhD at CWI Stochastics): co-supervised with Bert Zwart; defended in December 2019
with M. Bazhba, J. Blanchet, and B. Zwart
To appear in Mathematics of Operations Research.
arXiv:2003.14381
with B. Chen and B. Zwart
Electronic Journal of Probability, 29: 1–44, 2024.
with P. Glynn
Mathematics of Operations Research, 48(4): 2029–2042, 2022.
with M. Bazhba and B. Zwart
Special issue of Queueing Systems in honor of Masakiyo Miyazawa, 102: 25–52, 2022.
with M. Bazhba, J. Blanchet, and B. Zwart
Annals of Applied Probability, 30(6): 2695–2739, 2020.
with J. Blanchet and B. Zwart
Annals of Probability, 47(6): 3551–3605, 2019.
with B. Chen, J. Blanchet, and B. Zwart
Mathematics of Operations Research, 44(3): 919–942, 2019.
with M. Bazhba, J. Blanchet, and B. Zwart
Queueing Systems, 93(3–4): 195–226, 2019.
with B. Chen and B. Zwart
Advances in Applied Probability, 50(3): 805–832, 2018.
with P. Glynn
Operations Research, 63(5): 1026–1043, 2015.
2016 INFORMS Simulation Society Outstanding Simulation Publication Award
2013 INFORMS George Nicholson Student Paper Competition Finalist
with P. Glynn
Journal of Applied Probability (Special Jubilee Issue), 51A: 377–389, 2014.
with E. Zhou and P. Qiu
arXiv:1710.11616
with X. Wang
arXiv:2309.13820
with X. Wang
2023 Winter Simulation Conference, Advanced Tutorial (2023)
with X. Wang and S. Oh
International Conference on Learning Representations (2022)
2022 IEMS Nemhauser Best Student Paper Prize
with X. Wang
2020 Winter Simulation Conference (2020)
with E. Zhou and P. Qiu
2014 Winter Simulation Conference (2014)
with P. Glynn
2012 Winter Simulation Conference (2012)
2012 WSC Best Student Paper Award (MS/OR)
with X. Wang
arXiv:2307.03479
2023 INFORMS George Nicholson Student Paper Competition 2nd Place
with Z. Su
arXiv:2410.20799
To be submitted to Electronic Journal of Probability
with X. Wang
To be submitted to Management Science
with J. Ryu and I. Seo
To be submitted to Probability Theory and Related Fields
with J. Wang
To be submitted to Stochastic Systems
with X. Wang
with Z. Su
with Z. Su
with J. Wang
with N. Vasmel and B. Zwart
with B. Zwart and J. Blanchet
with P. Glynn