An approach for vulnerability detection in web applications using graph neural networks and transformers

Bibliographic Details
Main Authors: Md Sultan, Abu Bakar; Zulzalil, Hazura; Osman, Mohd Hafeez; Tanko, Mohammed Yahaya
Format: Article
Language: English
Published: Little Lion Scientific 2024
Online Access: http://psasir.upm.edu.my/id/eprint/119426/
http://psasir.upm.edu.my/id/eprint/119426/1/119426.pdf
Description
Summary: The increasing complexity of software systems and rising security concerns due to open-source package vulnerabilities have made software vulnerability detection a critical priority. Traditional vulnerability detection methods, including static, dynamic, and hybrid approaches, often struggle with high false-positive rates and limited efficiency. Recently, graph neural networks (GNNs) have shown potential in improving vulnerability detection accuracy by representing code as graphs that capture syntax and semantics. This paper introduces a Gated Graph Neural Network (GGNN) framework that leverages multiple graph representations: Abstract Syntax Tree (AST), Data Flow Graph (DFG), Control Flow Graph (CFG), and Code Property Graph (CPG). The model uses these graph structures to detect vulnerabilities in function-level code snippets. Evaluation of the framework on the OWASP WebGoat dataset demonstrates the effectiveness of different graph representations across five major vulnerability types: command injection, weak cryptography, path traversal, SQL injection, and cross-site scripting. Experimental results show that the GGNN+CPG configuration consistently yields high recall for cryptographic weaknesses, while GGNN+CFG excels in detecting control-based vulnerabilities, such as command injections. The framework demonstrates notable enhancements in accuracy, precision, recall, and F1-score across all vulnerability types, with each graph representation contributing unique insights into code structures and vulnerability patterns. These findings highlight the potential of multi-graph GNNs in enhancing code vulnerability detection for cybersecurity applications.
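Note: The summary reports accuracy, precision, recall, and F1-score per vulnerability type. As a reading aid, the sketch below shows how these four metrics are conventionally computed for one binary task (vulnerable = 1, not vulnerable = 0). It is not the authors' evaluation code; the function name `binary_metrics` and the toy labels are assumptions made for illustration only and are not results from the paper.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for 0/1 labels.

    Hypothetical helper for illustration; label 1 means "vulnerable".
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = len(pairs)
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (made up, not from the paper) for one vulnerability type.
toy_true = [1, 1, 1, 0, 0, 0, 0, 1]
toy_pred = [1, 1, 0, 0, 0, 1, 0, 1]
toy_scores = binary_metrics(toy_true, toy_pred)
```

High recall, as reported for GGNN+CPG on cryptographic weaknesses, means few vulnerable functions are missed (low false-negative count); it says nothing by itself about the false-positive rate, which is what precision reflects.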