Abstract

Phishing attacks continue to grow more sophisticated, targeting diverse industry sectors with varying degrees of effectiveness. This study investigates feature engineering methodologies for building predictive models that assess phishing campaign effectiveness across multiple dimensions. Through systematic analysis of industry-specific attack patterns, URL-based lexical features, and multi-layered detection approaches, we propose an integrated framework that combines traditional heuristic methods with machine learning techniques. Our methodology applies feature selection algorithms to the ISCX-URL2016 dataset, which comprises 9,964 phishing URLs and 10,000 legitimate URLs, and identifies nine features with strong discriminative power for predicting campaign success rates. The analysis shows that financial services, Software-as-a-Service (SaaS) platforms, and webmail systems are the primary targets, together accounting for sixty percent of phishing campaigns. The multi-layered detection framework, which integrates list-based, visual-similarity, and heuristic machine learning approaches, achieves superior performance when combined with optimized feature engineering. This research offers actionable guidance for prioritizing defensive strategies according to industry vulnerability profiles and predictive feature-importance rankings.
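The URL feature-selection step described in the abstract can be illustrated with a minimal sketch. The nine lexical features, the keyword list, and the toy URLs below are hypothetical choices made for illustration. They are not the nine features identified in this study. The ranking uses the Fisher score, a standard filter-style selection criterion. It stands in for whichever selection algorithms the full methodology applies and should not be read as the study's exact procedure.

```python
import math
import re
from statistics import mean, pvariance
from urllib.parse import urlparse

# Hypothetical keyword list (assumption), not taken from the study.
SUSPICIOUS_TOKENS = ("login", "verify", "account", "secure", "update", "bank")

# Matches a dotted-quad IPv4 host such as 192.168.10.5.
IPV4_HOST = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def lexical_features(url: str) -> dict[str, float]:
    """Illustrative URL lexical features (assumed, not the paper's nine)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""  # hostname strips any user@ prefix and port
    lowered = url.lower()
    return {
        "url_length": float(len(url)),
        "host_length": float(len(host)),
        "num_dots": float(url.count(".")),
        "num_hyphens": float(url.count("-")),
        "num_digits": float(sum(ch.isdigit() for ch in url)),
        "has_at_symbol": float("@" in url),
        "host_is_ip": float(bool(IPV4_HOST.match(host))),
        "uses_https": float(parsed.scheme == "https"),
        "suspicious_tokens": float(sum(tok in lowered for tok in SUSPICIOUS_TOKENS)),
    }


def fisher_scores(rows: list[dict[str, float]], labels: list[int]) -> dict[str, float]:
    """Rank features by Fisher score (mu1 - mu0)^2 / (var1 + var0), highest first.

    A feature that separates the classes perfectly with zero within-class
    variance is mapped to +inf. A feature with no separation and no variance
    gets 0.0.
    """
    scores = {}
    for name in rows[0]:
        pos = [r[name] for r, y in zip(rows, labels) if y == 1]
        neg = [r[name] for r, y in zip(rows, labels) if y == 0]
        spread = pvariance(pos) + pvariance(neg)
        gap = (mean(pos) - mean(neg)) ** 2
        if spread > 0:
            scores[name] = gap / spread
        else:
            scores[name] = math.inf if gap > 0 else 0.0
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


# Toy labeled sample (hypothetical): 1 = phishing, 0 = legitimate.
labeled_urls = [
    ("http://192.168.10.5/secure-login/verify.php", 1),
    ("http://account-update.example-verify.com/login", 1),
    ("http://user@bank-secure.example.net/update", 1),
    ("https://www.wikipedia.org/wiki/Phishing", 0),
    ("https://github.com/python/cpython", 0),
    ("https://docs.python.org/3/library/urllib.parse.html", 0),
]
feature_rows = [lexical_features(u) for u, _ in labeled_urls]
ranking = fisher_scores(feature_rows, [y for _, y in labeled_urls])
for feature_name, score in ranking.items():
    print(f"{feature_name:>18}: {score:.3f}")
```

On a sample this small, a single feature such as `uses_https` can separate the classes perfectly and receive an infinite score. On a corpus the size of ISCX-URL2016, the scores would be expected to spread out, and the top-k features could then be passed to a downstream classifier.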

Keywords

  • phishing campaigns
  • feature engineering
  • predictive modeling
  • machine learning
  • URL analysis
  • campaign effectiveness
  • industry targeting
  • multi-layered detection
  • cybersecurity analytics
