‘You can research so many things through social media’, Dong Nguyen begins. ‘It reflects society, culture and people.’ Dong, a PhD candidate in language technologies, uses public tweets to develop models that can automatically process large amounts of social media text. Those models can be used in social science research. Dong: ‘Linguists, for example, use them to study dialects on social media. I’m also involved in the Twitter Data Grant project at the UT, which studies the effectiveness of cancer campaigns on Twitter.’
Haven’t text mining models existed for a long time already? ‘Yes, but they are outdated,’ Dong believes. ‘Most of them are based on news articles and aren’t suitable at all for social media, where many people make typos and use slang, dialects or multiple languages within one post.’ According to Dong, traditional assumptions no longer hold in a social media context. ‘We need self-learning models to cope with these peculiarities.’
Watch while working
Dong is no stranger to the field. Her impressive resume includes internships at Google, Facebook and Microsoft in the USA, UK and Ireland. Dong: ‘People tend to think it’s next to impossible to do such internships. Truth is, there are quite a number of possibilities as long as you are well prepared.’
Dong recalls her job interview at Google. ‘They asked my opinion on certain methods and I had to write a program to solve a problem. They watch while you work and you have to say out loud what you think and do. That felt a bit weird, but it’s wonderful to see how such companies work.’ She got the job and helped to limit spam in web search results.
Besides working in Twente, Dong also has a position at the Meertens Institute, which studies language and culture in the Netherlands. ‘It’s a totally different environment. Interaction with others who can use my models helps to generate new ideas.’