Center for Secure Information Systems

Securing the World's Cyber Infrastructure

CSIS Seminar

A Preliminary Study on Using Large Language Models in Software Pentesting

Speaker:   Dr. Simon Ou, University of South Florida
When:   March 13, 2024, 1:00 pm - 2:00 pm
Where:   Dean’s Conference Room, Nguyen Engineering Building


Large Language Models (LLMs) are perceived to offer promising potential for automating security tasks, such as those found in security operations centers (SOCs). As a first step towards evaluating this perceived potential, we investigate the use of LLMs in software pentesting, where the main task is to automatically identify software security vulnerabilities in source code. We hypothesize that an LLM-based AI agent can be improved over time for a specific security task as human operators interact with it. As a first step, such improvement can be made by engineering the prompts fed to the LLM, based on the responses it produces, to include relevant context and structure so that the model provides more accurate results. Such engineering efforts become sustainable if prompts that are engineered to produce better results on current tasks also produce better results on future, unknown tasks. To examine this hypothesis, we use the OWASP Benchmark Project 1.2, which contains 2,740 hand-crafted source code test cases covering various types of vulnerabilities. We divide the test cases into training and testing data, engineer the prompts based on the training data only, and evaluate the final system on the testing data. We compare the AI agent's performance on the testing data against the performance of the agent without prompt engineering. We also compare the AI agent's results against those from SonarQube, a widely used static code analyzer for security testing. We built and tested multiple versions of the AI agent using different off-the-shelf LLMs: Google's Gemini-pro, as well as OpenAI's GPT-3.5-Turbo and GPT-4-Turbo (with both the chat completions and assistants APIs). The results show that using LLMs is a viable approach to building an AI agent for software pentesting that can improve through repeated use and prompt engineering.
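The evaluation protocol described in the abstract (engineer prompts on a training split only, then score the final agent on a held-out testing split) might look roughly like the Python sketch below. This is an illustration under stated assumptions, not the speaker's implementation: the abstract gives neither the split ratio nor the metrics used, so the 50/50 split, the true/false positive rates, and the names BenchmarkCase, split_cases, and score are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkCase:
    """Hypothetical record for one labeled test case."""
    name: str
    vulnerable: bool  # ground-truth label: does the case contain a real vulnerability?


def split_cases(cases, train_fraction=0.5, seed=0):
    """Shuffle deterministically and split into (training, testing) lists.

    The 50/50 default is an assumption; the abstract does not give the ratio.
    """
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def score(cases, predictions):
    """Compare predictions (case name -> flagged as vulnerable?) with ground truth.

    Returns the true positive rate, the false positive rate, and their
    difference. These metrics are an assumption made for illustration.
    """
    tp = fp = tn = fn = 0
    for case in cases:
        flagged = predictions[case.name]
        if case.vulnerable and flagged:
            tp += 1
        elif case.vulnerable and not flagged:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return {"tpr": tpr, "fpr": fpr, "score": tpr - fpr}
```

In such a setup, prompts would be revised only while looking at results on the training list. The same score function would then be applied once on the testing list to the engineered agent, the unengineered baseline, and the static analyzer's findings, so that all three are compared on cases the prompt engineer never saw.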

Speaker Bio

Dr. Xinming Ou is a professor of Computer Science and Engineering at the University of South Florida. Dr. Ou's research is primarily in cyber defense technologies, with a focus on computer systems, programming languages, and human-centric approaches. He has broad interests and ongoing work in security operations, IoT/CPS security, intrusion and forensics analysis, and mobile system security. Dr. Ou's research has been funded by the National Science Foundation, the Department of Defense, the Department of Homeland Security, the Department of Energy, the National Institute of Standards and Technology, HP Labs, and Rockwell Collins. He received the 2010 U.S. NSF Faculty Early Career Development (CAREER) Award and the 2013 Kansas State University Frankenhoff Outstanding Research Award, and is a three-time winner of the HP Labs Innovation Research Program (IRP) award.