ASIST 2012 Annual Meeting 
Baltimore, MD, October 26-30, 2012

The Art of Creating an Informative Data Collection for Automated Deception Detection: A Corpus of Truths and Lies
Victoria Rubin and Niall Conroy

Monday, 3:30pm


One of the novel research directions in Natural Language Processing and Machine Learning involves developing methods for automatically distinguishing deceptive messages from truthful ones. Mistaking intentionally deceptive information for authentic information (true to the writer's beliefs) can have negative consequences, since our everyday decisions, actions, and moods are often shaped by the information we encounter. Such research is vital today because it aims to develop tools for the automated recognition of deceptive, disingenuous, or fake information (information intended to create false beliefs or conclusions in the reader's mind). The ultimate goal is to support truthfulness ratings that signal the trustworthiness of retrieved information, or to alert information seekers to potential deception. To pursue this agenda, we need elicitation techniques for obtaining samples of both deceptive and truthful messages from study participants across various subject areas. A data collection, or corpus of truths and lies, should meet certain basic criteria to allow meaningful analysis and comparison of socio-linguistic behaviors. In this paper we propose solutions and weigh the pros and cons of various experimental set-ups in the art of corpus building. The outcomes of three experiments are used to identify the limitations of an open-ended task, the importance of incorporating motivation into task descriptions, and the role of visual context in creating deceptive narratives.