Chang-Han Rhee

Assistant Professor
Industrial Engineering and Management Sciences
Northwestern University

Contact

2145 Sheridan Road, Evanston, IL 60208-3109
Email: chang-han.rhee@northwestern.edu
http://chrhee.github.io/

Research

Rare Event Analysis, Large Deviations, Metastability, Heavy Tails
Debiased Multilevel Monte Carlo
Markov chain Monte Carlo, Exact Estimation
Sensitivity Analysis, Gradient Estimation
Experimental Design
Deep Learning Theory

On Heavy-Tails and Global Dynamics of SGD in Deep Learning

A strange and beautiful mathematical structure called “heavy tail” underlies seemingly disparate rare events such as the recent global pandemic, the 2012 blackout in India, and the 2007 financial crisis. In fact, the list of examples goes on far beyond literal catastrophes, and heavy tails are pervasive in large-scale complex systems and modern algorithms. Heavy tails provide mathematical models for extreme variability, and under the presence of heavy tails, high-impact rare events are guaranteed to happen eventually. Understanding how they will happen allows us to design resilient systems and control (or even utilize) the impact they inflict. A particularly well-known and simple manifestation of heavy tails is the “80-20 rule”—e.g., the richest 20% of the population control 80% of the wealth—whose variations are repeatedly discovered in a wide variety of application areas. One of the most recent and surprising discovery of heavy tails emerged in deep neural networks.

The unprecedented empirical success of deep neural networks in modern AI tasks is often attributed to the stochastic gradient descent (SGD) algorithm’s mysterious ability to avoid sharp local minima in the loss landscape. Recently, heavy-tailed SGD has attracted significant attention for its ability to escape sharp local minima through a single big jump, and hence, within a realistic training horizon. In practice, however, when SGD exhibits such behaviors, practitioners adopt truncated variations of SGD to temper such movements. At first glance, this truncation scheme—known as gradient clipping—appears to effectively eliminate heavy tails from the SGDs’ dynamics, obliterating the aforementioned effects. Curiously, however, such modifications lead to the opposite of what a naive intuition predicts: heavy-tailed SGDs with gradient clipping almost completely avoid sharp local minima.

To unravel this mystery and potentially further enhance SGD's ability to find flat minima, it is imperative to go beyond the traditional local convergence analysis and acquire a comprehensive understanding of SGDs’ global dynamics within complex non-convex loss landscapes. My research with Xingyu Wang provides systematic tools for characterizing the global dynamics of such variations of SGDs through the lens of heavy-tailed large deviations and metastability analysis.

I recently presented our findings in a lecture series [1, 2, 3, 4] in an Isaac Newton Institute Satellite Programme on heavy-tails in machine learning at the Alan Turing Institute. Lecture videos and slides are available in the above links. The first lecture gives an overview of heavy-tailed large deviations approach to the characterization of global dynamics of SGD. The subsequent lectures explain the mathematical foundations in more details and discuss related topics. Please also see our conference paper for the high-level description of the phenomena, and a follow-up journal paper for a more streamlined development of the mathematical foundation behind them. The preprints of the these papers were recognized with the 2022 IEMS Nemhauser Best Student Paper Prize and the 2nd Place in the 2023 INFORMS George Nicholson Student Paper Competition, respectively.

In addition to providing insights into the global dynamics of heavy-tailed SGDs, this line of research also contributes to the advances in the general mathematical machinery for heavy-tailed dynamical systems:

We proposed a new locally uniform formulation of heavy-tailed large deviations that paves the way to the analysis of global dynamics of SGD.
To facilitate this, we developed a uniform version of M-convergence theory.
Building on these developments, we devised a streamlined framework for lifting the new large deviations bounds to a sharp local stability analysis.
Subsequently we developed further tools to lift the local stability analysis to the sample-path level convergence analysis of the scaled SGD.

Education

Ph.D., Computational and Mathematical Engineering, Stanford University
M.S., Computational and Mathematical Engineering, Stanford University
B.S., Mathematics and Computer Science, Seoul National University

Honors & Awards

INFORMS George Nicholson Student Paper Competition 2nd Place (as advisor), 2023
NSF CAREER Award, 2022–2027
INFORMS Simulation Society Outstanding Simulation Publication Award, 2016
INFORMS George Nicholson Student Paper Competition Finalist, 2013
Winter Simulation Conference Best Student Paper Award (MS/OR), 2012
Samsung Fellowship, 2008–2012

Students

Jeffrey Wang (PhD at Northwestern IEMS)
Xingyu Wang (PhD at Northwestern IEMS)
Zhe Su (PhD at Northwestern IEMS)
Jingyi Zhao (MS at Northwestern ESAM): graduated in March 2020
Mihail Bazhba (PhD at CWI Stochastics): co-supervised with Bert Zwart; defended in May 2021
Bohan Chen (PhD at CWI Stochastics): co-supervised with Bert Zwart; defended in December 2019

Journal Papers

Sample-path large deviations for unbounded additive functionals of the reflected random walk: with M. Bazhba, J. Blanchet, B. Zwart
To appear in Mathematics of Operations Research, arXiv:2003.14381

Sample-path large deviations for a class of heavy-tailed Markov additive processes: with B. Chen and B. Zwart
Electronic Journal of Probability, 29: 1–44, 2024.

Lyapunov conditions for differentiability of Markov chain expectations: with P. Glynn
Mathematics of Operations Research, 48(4): 2029–2042, 2022.

Large deviations for stochastic fluid networks with Weibullian tails: with M. Bazhba, B. Zwart
Special issue of Queueing Systems in honor of Masakiyo Miyazawa, 102: 25–52, 2022.

Sample-path large deviations for Levy processes and random walks with Weibull increments: with M. Bazhba, J. Blanchet, and B. Zwart
Annals of Applied Probability, 30(6): 2695–2739, 2020.

Sample-path large deviations for Levy processes and random walks with regularly varying increments: with J. Blanchet and B. Zwart
Annals of Probability, 47(6): 3551-3605, 2019.

Efficient rare-event simulation for multiple jump events in regularly varying random walks and compound Poisson processes: with B. Chen, J. Blanchet, and B. Zwart
Mathematics of Operations Research, 44(3): 919-942, 2019.

Queue length asymptotics for the multiple server queue with heavy-tailed Weibull service times: with M. Bazhba, J. Blanchet, and B. Zwart
Queueing Systems, 93(3–4): 195-226, 2019.

Importance sampling of heavy-tailed iterated random functions: with B. Chen and B. Zwart
Advances in Applied Probability, 50(3): 805-832. 2018.

Unbiased estimation with square root convergence for SDE models: with P. Glynn
Operations Research, 63(5): 1026–1043, 2015.
2016 INFORMS Simulation Society Outstanding Simulation Publication Award
2013 INFORMS George Nicholson Student Paper Competition Finalist

Exact estimation for equilibrium of Markov chains: with P. Glynn
Journal of Applied Probability (Special Jubilee Issue), 51A:377-389, 2014.

Submitted Papers

Space filling design for non-linear models: with E. Zhou and P. Qiu.
arXiv:1710.11616

Conference Papers

Importance sampling strategy for heavy-tailed systems with catastrophe principle: with X. Wang
2023 Winter Simulation Conference, Advanced Tutorial (2023)

Eliminating sharp minima from SGD with truncated heavy tails: with X. Wang and S. Oh
International Conference on Learning Representations (2022)
2022 IEMS Nemhauser Best Student Paper Prize

Rare-event simulation for multiple jump events in heavy-tailed Levy processes with infinite activities: with X. Wang
2020 Winter Simulation Conference (2020)

An iterative algorithm for sampling from manifolds: with E. Zhou and P. Qiu
2014 Winter Simulation Conference (2014)

A new approach to unbiased estimation for SDE's: with P. Glynn
2012 Winter Simulation Conference (2012)
2012 WSC Best Student Paper Award (MS/OR)

Working Papers

Large deviations and metastability analysis for heavy-tailed dynamical systems: with X. Wang
arXiv:2307.03479
2023 INFORMS George Nicholson Student Paper Competition 2nd Place

Strongly Efficient Rare-Event Simulation for Multiple-Jump Events in Regularly Varying Levy Processes with Infinite Activities: with X. Wang
arXiv:2309.13820

On tail asymptotics for stationary distributions of heavy-tailed processes: with X. Wang

On sample-path large deviations for Levy processes and random walks with lognormal increments: with Z. Su

On queue length asymptotics for queueing systems with lognormal service times: with Z. Su

On unbiased estimation for Markov chain expectations and Poisson equations: with J. Wang.

On rare event simulation for electric power distribution networks with high variability: with N. Vasmel, and B. Zwart

On quasi-variational problems in heavy-tailed large deviations theory: with B. Zwart and J. Blanchet

On Lyapunov conditions for differentiability of Markov chain expectations: the contracting case: with P. Glynn