Abstract

The global integration of renewable energy sources (RES) into the power grid is paramount for decarbonization but introduces profound challenges due to their stochastic, non-dispatchable, and geographically dispersed nature. Traditional optimization paradigms often fall short in addressing the high-dimensional, non-linear, and multi-temporal complexities inherent to modern renewable-rich power systems. This paper proposes a novel, unified framework that systematically leverages cutting-edge Artificial Intelligence (AI) paradigms to address these challenges across the entire RES lifecycle. The proposed methodology provides a structured decision-making pipeline for problem characterization, AI architecture selection, and robust implementation tailored to four critical domains: (i) probabilistic forecasting and prediction, (ii) strategic resource allocation and sizing, (iii) real-time control and operational management, and (iv) resilient grid integration and stability. The framework incorporates advanced AI architectures and defines their respective roles: Transformer-based models for multi-horizon spatio-temporal forecasting, selective state space models such as Mamba for efficient long-sequence processing, large language models (LLMs) for technical knowledge extraction and constraint formulation, and Graph Neural Networks (GNNs) for topology-aware spatial optimization. A comprehensive implementation strategy elaborates on data fusion, hybrid (physics-informed) AI modeling, validation protocols, and deployment considerations for computationally constrained environments. This structured approach bridges the gap between theoretical AI advancements and their practical, impactful deployment, ultimately facilitating a more reliable, efficient, and scalable renewable energy infrastructure.
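To make the hybrid (physics-informed) modeling and probabilistic forecasting themes above concrete, the following minimal sketch combines a quantile (pinball) loss with a simple physics-consistency penalty that discourages forecasts outside a plant's physical limits. It is illustrative only; the function names, capacity value, and weighting are hypothetical and not taken from the proposed framework.

```python
# Minimal sketch (hypothetical names): pinball loss for probabilistic RES
# forecasting plus a physics-consistency penalty, illustrating the idea of
# augmenting a data-driven objective with a physical constraint term.
import numpy as np

def pinball_loss(y_true, y_pred, quantile):
    """Quantile (pinball) loss for a single quantile level in (0, 1)."""
    diff = y_true - y_pred
    return np.mean(np.maximum(quantile * diff, (quantile - 1.0) * diff))

def capacity_penalty(y_pred, capacity_mw):
    """Penalize forecasts violating the physical bound 0 <= P <= capacity."""
    below = np.maximum(-y_pred, 0.0)                # negative power is unphysical
    above = np.maximum(y_pred - capacity_mw, 0.0)   # output above nameplate capacity
    return np.mean(below**2 + above**2)

def hybrid_loss(y_true, y_pred, quantile=0.5, capacity_mw=100.0, lam=10.0):
    """Data-driven loss plus a weighted physics term (physics-informed style)."""
    return pinball_loss(y_true, y_pred, quantile) + lam * capacity_penalty(y_pred, capacity_mw)

if __name__ == "__main__":
    y_true = np.array([42.0, 55.0, 63.0])    # observed PV output (MW)
    y_pred = np.array([40.0, 58.0, 105.0])   # candidate forecast; last value is unphysical
    print(hybrid_loss(y_true, y_pred))
```

In practice such a composite objective would be minimized during model training, with the penalty weight chosen so that physical violations are suppressed without dominating the forecasting error term.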

Keywords

  • Artificial Intelligence
  • Renewable Energy Optimization
  • Deep Learning Frameworks
  • Energy Forecasting
  • Grid Integration
  • Physics-Informed Machine Learning
  • Sustainable Systems
