Center for Secure Information Systems

Securing the World's Cyber Infrastructure

CSIS Seminar

Offensive Language Identification in Social Media

Speaker:   Dr. Marcos Zampieri, Assistant Professor, George Mason University
When:   May 5, 2023, 2:00 pm - 3:00 pm
Where:   CSIS Conference Room, Research 420


Offensive language is pervasive in social media. Individuals frequently take advantage of the perceived anonymity of computer-mediated communication, using it to engage in behavior that many of them would not consider in real life. One of the most effective strategies for tackling this problem is to use computational methods to identify offense, aggression, and hate speech in user-generated content (e.g., posts, comments, and microblogs). In this talk, I discuss some of the challenges of using NLP to recognize offensive content online. I present the OLID and TBO taxonomies created to annotate offensive language datasets. The challenges of collecting and annotating multilingual datasets for offensive language identification are also discussed. Finally, I present the set-up and the results of the two editions of the OffensEval competition hosted at SemEval-2019 and SemEval-2020 (https://sites.google.com/site/offensevalsharedtask/home).

Join Zoom Meeting: https://gmu.zoom.us/j/97990303329?pwd=L0RCSlY3ckpPNEFBcUpzN3hRZjI4dz09
Meeting ID: 979 9030 3329
Passcode: 014701

Speaker Bio

Marcos Zampieri is a tenure-track assistant professor in the School of Computing at George Mason University. He obtained his PhD from Saarland University in Germany with a thesis on computational approaches to language variation. Marcos has published papers on many topics in Computational Linguistics and Natural Language Processing, such as language acquisition and variation, offensive language identification, and machine translation. He has been one of the lead organizers of the workshop series on NLP for Similar Languages, Varieties and Dialects (VarDial) and a lead organizer of OffensEval at SemEval. He is one of the editors of Similar Languages, Varieties, and Dialects: A Computational Perspective, published in the series Studies in Natural Language Processing by Cambridge University Press.