Chang-Han Rhee


Assistant Professor
Industrial Engineering and Management Sciences
Northwestern University



2145 Sheridan Road, Evanston, IL 60208-3109


On Heavy-Tails and Global Dynamics of SGD in Deep Learning

A strange and beautiful mathematical structure called “heavy tail” underlies seemingly disparate rare events such as the recent global pandemic, the 2012 blackout in India, and the 2007 financial crisis. In fact, the list of examples goes far beyond literal catastrophes: heavy tails are pervasive in large-scale complex systems and modern algorithms. Heavy tails provide mathematical models for extreme variability, and in the presence of heavy tails, high-impact rare events are guaranteed to happen eventually. Understanding how they will happen allows us to design resilient systems and control (or even utilize) the impact they inflict. A particularly well-known and simple manifestation of heavy tails is the “80-20 rule” (e.g., the richest 20% of the population control 80% of the wealth), whose variations are repeatedly discovered in a wide variety of application areas. One of the most recent and surprising discoveries of heavy tails emerged in deep neural networks.

The unprecedented empirical success of deep neural networks in modern AI tasks is often attributed to the stochastic gradient descent (SGD) algorithm’s mysterious ability to avoid sharp local minima in the loss landscape. Recently, heavy-tailed SGD has attracted significant attention for its ability to escape sharp local minima through a single big jump, and hence, within a realistic training horizon. In practice, however, when SGD exhibits such behaviors, practitioners adopt truncated variations of SGD to temper such movements. At first glance, this truncation scheme—known as gradient clipping—appears to effectively eliminate heavy tails from the SGDs’ dynamics, obliterating the aforementioned effects. Curiously, however, such modifications lead to the opposite of what a naive intuition predicts: heavy-tailed SGDs with gradient clipping almost completely avoid sharp local minima.
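As a rough illustration of the truncation scheme described above, the following minimal sketch runs SGD with heavy-tailed gradient noise and clips each stochastic gradient before the update. Everything in it is an assumption made for illustration: the function names, the symmetric Pareto-type noise, the one-dimensional loss, and the choice to clip the noisy gradient (rather than the full step) are not taken from the papers, whose exact truncation operator may differ.

```python
import random


def clip(g: float, b: float) -> float:
    """Truncate g so that its magnitude is at most b (gradient clipping)."""
    return max(-b, min(b, g))


def heavy_tailed_noise(alpha: float, rng: random.Random) -> float:
    """Symmetric Pareto-type noise with tail index alpha (illustrative choice)."""
    sign = 1.0 if rng.random() < 0.5 else -1.0
    return sign * (rng.paretovariate(alpha) - 1.0)


def clipped_sgd(grad, x0, lr, b, alpha, steps, seed=0):
    """Run SGD with heavy-tailed gradient noise and clipping threshold b.

    grad  : callable returning the true gradient at x (hypothetical loss)
    lr    : learning rate
    b     : clipping threshold applied to the noisy gradient
    alpha : tail index of the noise (smaller means heavier tails)
    """
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(steps):
        noisy_grad = grad(x) + heavy_tailed_noise(alpha, rng)
        # Clipping bounds every move by lr * b, so leaving a wide basin
        # takes several large jumps instead of a single one.
        x = x - lr * clip(noisy_grad, b)
        path.append(x)
    return path


if __name__ == "__main__":
    # Quadratic loss f(x) = x^2 with gradient 2x, used only as a toy example.
    trajectory = clipped_sgd(lambda x: 2.0 * x, x0=1.0, lr=0.1, b=1.0,
                             alpha=1.5, steps=1000)
    print(max(abs(x) for x in trajectory))
```

In this sketch, no single update can move the iterate by more than `lr * b`, regardless of how large the noise is. That is the sense in which clipping appears to remove heavy tails from each individual step, while the underlying noise remains heavy-tailed.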

To unravel this mystery and potentially further enhance SGD's ability to find flat minima, it is imperative to go beyond the traditional local convergence analysis and acquire a comprehensive understanding of SGDs’ global dynamics within complex non-convex loss landscapes. My research with Xingyu Wang provides systematic tools for characterizing the global dynamics of such variations of SGDs through the lens of heavy-tailed large deviations and metastability analysis.

I recently presented our findings in a lecture series [1, 2, 3, 4] in an Isaac Newton Institute Satellite Programme on heavy tails in machine learning at the Alan Turing Institute. Lecture videos and slides are available at the links above. The first lecture gives an overview of the heavy-tailed large deviations approach to characterizing the global dynamics of SGD. The subsequent lectures explain the mathematical foundations in more detail and discuss related topics. Please also see our conference paper for a high-level description of the phenomena, and a follow-up journal paper for a more streamlined development of the mathematical foundations behind them. The preprints of these papers were recognized with the 2022 IEMS Nemhauser Best Student Paper Prize and 2nd Place in the 2023 INFORMS George Nicholson Student Paper Competition, respectively.

In addition to providing insights into the global dynamics of heavy-tailed SGDs, this line of research also contributes to the advances in the general mathematical machinery for heavy-tailed dynamical systems:


Honors & Awards


Journal Papers

Sample-path large deviations for unbounded additive functionals of the reflected random walk

with M. Bazhba, J. Blanchet, and B. Zwart
To appear in Mathematics of Operations Research, arXiv:2003.14381

Sample-path large deviations for a class of heavy-tailed Markov additive processes

with B. Chen and B. Zwart
Electronic Journal of Probability, 29: 1–44, 2024.

Lyapunov conditions for differentiability of Markov chain expectations

with P. Glynn
Mathematics of Operations Research, 48(4): 2029–2042, 2022.

Large deviations for stochastic fluid networks with Weibullian tails

with M. Bazhba and B. Zwart
Special issue of Queueing Systems in honor of Masakiyo Miyazawa, 102: 25–52, 2022.

Sample-path large deviations for Lévy processes and random walks with Weibull increments

with M. Bazhba, J. Blanchet, and B. Zwart
Annals of Applied Probability, 30(6): 2695–2739, 2020.

Sample-path large deviations for Lévy processes and random walks with regularly varying increments

with J. Blanchet and B. Zwart
Annals of Probability, 47(6): 3551–3605, 2019.

Efficient rare-event simulation for multiple jump events in regularly varying random walks and compound Poisson processes

with B. Chen, J. Blanchet, and B. Zwart
Mathematics of Operations Research, 44(3): 919–942, 2019.

Queue length asymptotics for the multiple server queue with heavy-tailed Weibull service times

with M. Bazhba, J. Blanchet, and B. Zwart
Queueing Systems, 93(3–4): 195–226, 2019.

Importance sampling of heavy-tailed iterated random functions

with B. Chen and B. Zwart
Advances in Applied Probability, 50(3): 805–832, 2018.

Unbiased estimation with square root convergence for SDE models

with P. Glynn
Operations Research, 63(5): 1026–1043, 2015.
2016 INFORMS Simulation Society Outstanding Simulation Publication Award
2013 INFORMS George Nicholson Student Paper Competition Finalist

Exact estimation for equilibrium of Markov chains

with P. Glynn
Journal of Applied Probability (Special Jubilee Issue), 51A: 377–389, 2014.

Submitted Papers

Space filling design for non-linear models

with E. Zhou and P. Qiu

Conference Papers

Importance sampling strategy for heavy-tailed systems with catastrophe principle

with X. Wang
2023 Winter Simulation Conference, Advanced Tutorial (2023)

Eliminating sharp minima from SGD with truncated heavy tails

with X. Wang and S. Oh
International Conference on Learning Representations (2022)
2022 IEMS Nemhauser Best Student Paper Prize

Rare-event simulation for multiple jump events in heavy-tailed Lévy processes with infinite activities

with X. Wang
2020 Winter Simulation Conference (2020)

An iterative algorithm for sampling from manifolds

with E. Zhou and P. Qiu
2014 Winter Simulation Conference (2014)

A new approach to unbiased estimation for SDE's

with P. Glynn
2012 Winter Simulation Conference (2012)
2012 WSC Best Student Paper Award (MS/OR)

Working Papers

Large deviations and metastability analysis for heavy-tailed dynamical systems

with X. Wang
2023 INFORMS George Nicholson Student Paper Competition 2nd Place

Strongly Efficient Rare-Event Simulation for Multiple-Jump Events in Regularly Varying Lévy Processes with Infinite Activities

with X. Wang

On tail asymptotics for stationary distributions of heavy-tailed processes

with X. Wang

On sample-path large deviations for Lévy processes and random walks with lognormal increments

with Z. Su

On queue length asymptotics for queueing systems with lognormal service times

with Z. Su

On unbiased estimation for Markov chain expectations and Poisson equations

with J. Wang

On rare event simulation for electric power distribution networks with high variability

with N. Vasmel and B. Zwart

On quasi-variational problems in heavy-tailed large deviations theory

with B. Zwart and J. Blanchet

On Lyapunov conditions for differentiability of Markov chain expectations: the contracting case

with P. Glynn