Center for Secure Information Systems

Securing the World's Cyber Infrastructure

CSIS Seminar

Generative Model and Knowledge Graph Query Processing and their Applications to Cybersecurity

Speaker:   Dr. Noseong Park, University of North Carolina, Charlotte
When:   March 22, 2018, 11:00 am - 12:00 pm
Where:   Engineering Building, Room 4801

Abstract

In the big data era, organizations frequently share data with partners or release it to the public. Privacy should be the top priority in the sharing process, to protect the people who were willing to share valuable information. Anonymization techniques remove identifiers (such as social security numbers) and modify quasi-identifiers (such as gender, ZIP code, age, and occupation). However, other sensitive attributes that are neither identifiers nor quasi-identifiers are often disclosed without any modification. If adversaries possess background knowledge or other information sources, they can re-identify records.

We present a data synthesis method based on generative adversarial networks (GANs). Our method, named table-GAN, is specialized for synthesizing tables that contain categorical, ordinal, discrete, and continuous values. Tables synthesized by table-GAN have global statistics similar to those of the original table, even though they differ at the record level. The advantages of generating synthetic tables are twofold: 1) real records are not disclosed, and 2) machine learning models trained on carefully synthesized tables behave similarly to models trained on the original table, so the two can replace each other (i.e., model compatibility). We also show that table-GAN is robust against membership attacks, which attempt to infer information about the original table after observing the generated tables.

Knowledge graphs are popular in applications such as question answering and fact checking. SPARQL is a standard knowledge graph query language. We present a method to quickly process aggregated top-k SPARQL queries that involve GROUP BY, ORDER BY, and LIMIT terms. We also present our recent efforts to construct cybersecurity knowledge graphs for cyber threat intelligence, where machines can assist security analysts by answering questions such as "What are the most vulnerable software packages installed in my network, and what are the best substitutes for those packages?"
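
For readers unfamiliar with aggregated top-k queries, the following is a minimal Python sketch of what such a query computes. It is a naive baseline that counts over every matching triple, not the fast method presented in the talk. The triples, the predicate names (runs, hasVulnerability), the placeholder vulnerability labels, and the function top_k_by_count are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy knowledge graph as (subject, predicate, object) triples.
# None of these names or values come from the talk.
TRIPLES = [
    ("host1", "runs", "openssl"),
    ("host2", "runs", "openssl"),
    ("host3", "runs", "nginx"),
    ("openssl", "hasVulnerability", "VULN-A"),
    ("openssl", "hasVulnerability", "VULN-B"),
    ("openssl", "hasVulnerability", "VULN-C"),
    ("nginx", "hasVulnerability", "VULN-D"),
    ("bash", "hasVulnerability", "VULN-E"),
    ("bash", "hasVulnerability", "VULN-F"),
]


def top_k_by_count(triples, predicate, k):
    """Naive evaluation of an aggregated top-k query of the form:

        SELECT ?s (COUNT(?o) AS ?n)
        WHERE { ?s <predicate> ?o }
        GROUP BY ?s
        ORDER BY DESC(?n)
        LIMIT k
    """
    # GROUP BY ?s with COUNT(?o): one count per subject.
    counts = Counter(s for s, p, _ in triples if p == predicate)
    # ORDER BY DESC(?n); ties are broken by subject name so the
    # result is deterministic. LIMIT k keeps the first k groups.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


if __name__ == "__main__":
    # The two packages with the most recorded vulnerabilities.
    print(top_k_by_count(TRIPLES, "hasVulnerability", 2))
```

This baseline scans and groups all matching triples before discarding everything except the first k groups, which is the cost that a faster top-k method would aim to reduce.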

Speaker Bio

Noseong Park is an Assistant Professor of Software and Information Systems at the University of North Carolina, Charlotte. He received his PhD in Computer Science from the University of Maryland, College Park. He has worked extensively at the intersection of artificial intelligence, data mining, network analysis, and cybersecurity.