Draft:Strong Lottery Ticket Hypothesis
The Strong Lottery Ticket Hypothesis (SLTH) is a theoretical framework in deep learning stating that a sufficiently large, randomly initialized neural network contains, with high probability, sparse subnetworks that approximate any smaller target neural network without any further training of the weights. The hypothesis builds on the foundational Lottery Ticket Hypothesis (LTH), which posits that sparse subnetworks can achieve performance comparable to the full network when trained in isolation from their original initialization.
Origins and Background
The LTH, introduced by Frankle and Carbin (2018),[1] demonstrated that iterative pruning and weight rewinding can identify sparse subnetworks, referred to as "winning tickets", that match or exceed the performance of the original network when retrained from their original initialization. The SLTH strengthens this claim, suggesting that certain subnetworks of a large random network can approximate specific target networks without any training at all.
The SLTH gained attention due to its implications for efficient deep learning, particularly in identifying "winning tickets" directly from large, randomly initialized networks, eliminating the need for extensive training.[2][3]
Formalization
The SLTH can be stated informally as follows:
With high probability, a randomly initialized neural network with N parameters contains a sparse subnetwork that can approximate any target neural network of smaller size, say with n ≪ N parameters, to within a specified error ε.[2][4]
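One way to state this more precisely, with notation that is illustrative rather than taken from any single paper (papers differ in the exact network classes and quantifiers), is:

```latex
% Illustrative formal statement of the SLTH; exact quantifiers and network
% classes differ between papers.
% G_theta : random network with N parameters; F_n : target networks with at
% most n parameters; m : binary pruning mask; X : bounded input domain.
\[
  \Pr_{\theta}\Big[\ \forall f \in \mathcal{F}_n\ \ \exists\, m \in \{0,1\}^{N}:\
  \sup_{x \in \mathcal{X}} \big\lVert f(x) - G_{m \odot \theta}(x) \big\rVert \le \varepsilon \ \Big]
  \;\ge\; 1 - \delta,
\]
% where m ⊙ θ zeroes out (prunes) the weights whose mask entries are 0, so that
% G_{m ⊙ θ} is a subnetwork of the random network G_θ.
```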
Results
Advances in this area have focused on improving theoretical guarantees on the size and structure of these sparse subnetworks, often relying on results for the Random Subset Sum (RSS) problem or its variants.[2][5] These tools provide insight into how sparsity affects the degree of overparameterization required to ensure the existence of such subnetworks.[2][4]
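The RSS intuition is that, given only a logarithmic number of uniform random samples, some subset of them already sums to within ε of any target value in a bounded range; in SLTH proofs, each weight of the target network plays the role of such a value. The following brute-force sketch illustrates the phenomenon numerically (the sample size, range, and target values are illustrative and not taken from the cited papers):

```python
import itertools
import random

def best_subset_sum(samples, target):
    """Return the subset of `samples` whose sum is closest to `target`
    (brute force, which is fine for the tiny sample sizes used here)."""
    best_subset, best_error = (), abs(target)
    for r in range(1, len(samples) + 1):
        for subset in itertools.combinations(samples, r):
            error = abs(sum(subset) - target)
            if error < best_error:
                best_subset, best_error = subset, error
    return best_subset, best_error

random.seed(0)
# Random Subset Sum intuition: a handful of uniform samples already contains
# a subset summing very close to any target value in [-1, 1].
samples = [random.uniform(-1, 1) for _ in range(12)]
for target_weight in (0.3, -0.7, 0.05):
    _, err = best_subset_sum(samples, target_weight)
    print(f"target weight {target_weight:+.2f}: best subset-sum error {err:.4f}")
```

In the SLTH constructions, pruning the random network corresponds to selecting such a subset of random weights for every weight of the target network.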
Theoretical Guarantees
1. A randomly initialized network that is sufficiently larger than the target, polynomially larger in early results and only logarithmically larger in later work, can be pruned to approximate any target network with n parameters to within error ε.[5][6]
2. SLTH results have been extended to different neural architectures, including dense, convolutional, and more general equivariant networks.[7][4]
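For fully connected ReLU networks, the logarithmic-overparameterization result of Pensia et al. can be summarized schematically as follows; constants and technical conditions are omitted, and the exact statement in the paper differs in its details:

```latex
% Schematic form of the logarithmic-overparameterization guarantee for fully
% connected ReLU networks; G is the random source network and f ranges over
% target networks of width d and depth l.
\[
  \operatorname{depth}(G) = 2l,
  \qquad
  \operatorname{width}(G) = O\!\left( d \log \frac{d\,l}{\min\{\varepsilon,\delta\}} \right),
\]
\[
  \Pr\Big[\ \forall f:\ \exists\ \text{subnetwork}\ \tilde{G} \subseteq G \ \text{such that}\
  \sup_{\lVert x \rVert \le 1} \big\lVert f(x) - \tilde{G}(x) \big\rVert \le \varepsilon \ \Big]
  \;\ge\; 1 - \delta.
\]
```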
Sparsity Constraints
Natale et al. provide proofs for the SLTH in classical settings, such as dense and equivariant networks, with guarantees on the sparsity of the subnetworks.[4]
Challenges and Open Questions
Despite these theoretical guarantees, the practical discovery of winning tickets remains algorithmically challenging:
- Efficiency of Identification: There are no formal guarantees for reliably finding winning tickets efficiently. Empirical methods, like "training by pruning," often require computationally expensive operations such as backpropagation.[5]
- Sparse Structure: The relationship between sparsity levels, overparameterization, and network architectures is still not fully understood.[7]
While empirical methods suggest that sparse subnetworks can be found in practice, reliable and efficient algorithms for their discovery remain an open research area;[5] a schematic example of the mask-learning ("training by pruning") approach is sketched below.
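The "training by pruning" methods referred to above typically freeze a randomly initialized network and learn only a binary mask over its weights, for example with a straight-through gradient estimator as in supermask-style approaches.[3] The following PyTorch-style sketch is illustrative only; the layer sizes, scoring rule, and hyperparameters are assumptions rather than the procedure of any particular paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedLinear(nn.Module):
    """Linear layer with frozen random weights. Only a real-valued score per
    weight is trained; the largest scores define a binary pruning mask, and a
    straight-through estimator lets gradients reach the scores."""
    def __init__(self, in_features, out_features, sparsity=0.5):
        super().__init__()
        self.weight = nn.Parameter(
            torch.randn(out_features, in_features) / in_features ** 0.5,
            requires_grad=False)                        # frozen random weights
        self.scores = nn.Parameter(0.01 * torch.randn(out_features, in_features))
        self.sparsity = sparsity

    def forward(self, x):
        k = int(self.scores.numel() * (1 - self.sparsity))   # weights to keep
        threshold = self.scores.flatten().kthvalue(self.scores.numel() - k + 1).values
        hard_mask = (self.scores >= threshold).float()
        # Straight-through estimator: the forward pass uses the hard mask,
        # the backward pass treats the mask as the identity w.r.t. the scores.
        mask = hard_mask + self.scores - self.scores.detach()
        return F.linear(x, self.weight * mask)

# Toy usage: search for a subnetwork of a frozen random 2-layer network that
# fits a simple regression target; only the mask scores are ever updated.
model = nn.Sequential(MaskedLinear(10, 64), nn.ReLU(), MaskedLinear(64, 1))
optimizer = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-2)
x = torch.randn(256, 10)
y = torch.sin(x[:, :1])                                 # arbitrary target function
for step in range(200):
    loss = F.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

As noted above, such mask training still relies on backpropagation through the full random network, so it does not by itself resolve the question of efficiently identifying winning tickets.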
References
1. Frankle, J.; Carbin, M. (2018). "The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks". International Conference on Learning Representations (ICLR).
2. Pensia, A.; et al. (2020). "Optimal Lottery Tickets via SubsetSum: Logarithmic Overparameterization is Sufficient". Advances in Neural Information Processing Systems (NeurIPS).
3. Zhou, H.; et al. (2019). "Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask". Advances in Neural Information Processing Systems (NeurIPS).
4. Natale, E.; et al. (2024). "On the Sparsity of the Strong Lottery Ticket Hypothesis". Advances in Neural Information Processing Systems (NeurIPS).
5. Malach, E.; et al. (2020). "Proving the Lottery Ticket Hypothesis: Pruning Is All You Need". International Conference on Machine Learning (ICML).
6. Orseau, L.; et al. (2020). "Logarithmic Pruning Is All You Need". Advances in Neural Information Processing Systems (NeurIPS).
7. Ferbach, D.; et al. (2023). "A General Framework For Proving The Equivariant Strong Lottery Ticket Hypothesis". International Conference on Learning Representations (ICLR).