Abstract
The global integration of renewable energy sources (RES) into the power grid is paramount for decarbonization but introduces profound challenges owing to their stochastic, non-dispatchable, and geographically dispersed nature. Traditional optimization paradigms often fall short of addressing the high-dimensional, non-linear, and multi-temporal complexities inherent in modern renewable-rich power systems. This paper proposes a novel, unified framework that systematically leverages cutting-edge Artificial Intelligence (AI) paradigms to address these challenges across the entire RES lifecycle. The proposed methodology provides a structured decision-making pipeline for problem characterization, AI architecture selection, and robust implementation, tailored to four critical domains: (i) probabilistic forecasting and prediction, (ii) strategic resource allocation and sizing, (iii) real-time control and operational management, and (iv) resilient grid integration and stability. The framework defines the roles of advanced AI architectures, including Transformer-based models for multi-horizon spatio-temporal forecasting, selective state-space models such as Mamba for efficient long-sequence processing, large language models (LLMs) for technical knowledge extraction and constraint formulation, and Graph Neural Networks (GNNs) for topology-aware spatial optimization. A comprehensive implementation strategy addresses data fusion, hybrid (physics-informed AI) modeling, validation protocols, and deployment in computationally constrained environments. This structured approach bridges the gap between theoretical AI advances and their practical deployment, ultimately facilitating a more reliable, efficient, and scalable renewable energy infrastructure.
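As a minimal, hypothetical sketch of the hybrid physics-informed modeling mentioned above (not the framework's actual implementation), the snippet below couples a standard data-fit loss with a penalty on violations of a simple physical constraint. The names `PVForecaster`, `capacity_kw`, and `lambda_phys`, and the constraint itself (PV output bounded by plant capacity), are illustrative assumptions.

```python
# Minimal sketch of a physics-informed training objective: an MSE data term
# plus a soft penalty for forecasts that violate a physical bound on PV output.
import torch
import torch.nn as nn

class PVForecaster(nn.Module):
    """Toy MLP mapping weather features to a multi-horizon PV power forecast."""
    def __init__(self, n_features: int, horizon: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, horizon),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def physics_informed_loss(pred, target, capacity_kw=5.0, lambda_phys=0.1):
    """Data-fit (MSE) term plus a penalty for forecasts outside [0, capacity]."""
    data_loss = nn.functional.mse_loss(pred, target)
    # Physical prior: PV output is non-negative and bounded by plant capacity.
    violation = torch.relu(-pred) + torch.relu(pred - capacity_kw)
    return data_loss + lambda_phys * violation.mean()

# Usage with random tensors standing in for weather features and measured power.
model = PVForecaster(n_features=8, horizon=24)
x = torch.randn(32, 8)          # batch of feature vectors
y = torch.rand(32, 24) * 5.0    # batch of 24-step power targets (kW)
loss = physics_informed_loss(model(x), y)
loss.backward()
```

In a full deployment, the penalty term would encode richer domain physics (e.g., clear-sky irradiance bounds or power-flow constraints), and the toy MLP would be replaced by one of the architectures named above.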
Keywords
- Artificial Intelligence
- Renewable Energy Optimization
- Deep Learning Frameworks
- Energy Forecasting
- Grid Integration
- Physics-Informed Machine Learning
- Sustainable Systems