When computer science professor Keith Vander Linden looked at Twitter, he used to see noise: a continuous roar of chaotic 280-character messages. From this tumult, however, he now discerns meaningful patterns. “If you look at enough tweets,” says Vander Linden, “with the right kind of statistical models, you can derive a signal from that, you can find out information about what people are saying about stuff, and from that you can infer what they are thinking.”
An international effort
Vander Linden and Roy Adams, a Calvin senior majoring in computer science and mathematics, are analyzing Twitter to assess the Australian public’s stance toward mining companies. The pair is working with colleagues at the Commonwealth Scientific and Industrial Research Organisation (CSIRO), Australia’s national research agency, and at the University of Tokyo. Specifically, they hope to determine whether mining companies have what Vander Linden calls a “social license to operate,” or public support for their actions.
“[The mining companies] are a big deal in Australia,” Vander Linden said. “Australia has significant reserves of natural resources, precious metals, coal, natural gas. [The mining companies] are the Microsofts, the Facebooks, the IBMs; they’re very influential, but they’re controversial, so we’re looking at Twitter to figure out what people think about them.”
Assisted by artificial intelligence
The analysis is performed by an artificial intelligence (AI) program that Vander Linden and his colleagues are designing. The program “reads” through a database of 600,000 tweets, analyzes each tweet word by word, and then classifies each tweet by the stance with which its word usage is associated. Vander Linden and Adams can then evaluate the quantity and content of tweets for or against a given issue.
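As a rough illustration of the word-by-word classification described above, here is a minimal Python sketch of a naive Bayes stance classifier. The article does not say which statistical model the team uses, so the choice of model, the names `tokenize` and `StanceClassifier`, the labels "for" and "against", and the four training tweets are all assumptions invented for this example; none of it is the researchers' actual program or data.

```python
import math
from collections import Counter, defaultdict


def tokenize(tweet):
    """Lowercase a tweet, split on whitespace, strip surrounding punctuation."""
    words = (w.strip(".,!?;:\"'()") for w in tweet.lower().split())
    return [w for w in words if w]


class StanceClassifier:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter()            # label -> number of tweets
        self.vocab = set()

    def train(self, examples):
        for text, label in examples:
            self.label_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def classify(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            # Log prior plus the log likelihood of each word under this stance.
            score = math.log(n / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical hand-labeled tweets, made up for this sketch.
examples = [
    ("the mine brings jobs and growth to our town", "for"),
    ("proud of our miners and the jobs they create", "for"),
    ("mining is destroying the river and the reef", "against"),
    ("stop the mine before it ruins our farmland", "against"),
]

clf = StanceClassifier()
clf.train(examples)

# Count how many unlabeled tweets fall on each side of the issue.
new_tweets = ["more jobs for our town", "the mine ruins the river"]
tally = Counter(clf.classify(t) for t in new_tweets)
```

The final `tally` mirrors the last step in the paragraph above: once each tweet has a stance label, the tweets for and against an issue can be counted and read. A real system would need far more labeled data and a more careful treatment of tweet text, such as hashtags, mentions, and links.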
The AI techniques used by the program have only become effective in the last ten years: “this paradigm shift happened in 2012,” said Vander Linden, “this shift from what were called symbolic AI systems to statistical AI systems; everybody’s moved to statistical, mathematical models.” He added, “what we are doing is riding the wave of that huge shift to the statistical mechanisms.”
Adams said that he joined the project as a student researcher because of its focus on statistical machine learning, which allowed him to apply many of the skills he had developed in recent mathematics courses, such as linear algebra. He added, “this project has been really interesting and enjoyable, especially because it’s such a hot field right now.”
Seeking patterns in creation
Vander Linden and his colleagues are aware of the ethical issues that surround big data analysis. Having worked in computational linguistics for three decades, Vander Linden is observing real change: “the industry is slowly policing itself, and we are following those ethics guidelines as they develop.” He and Adams are taking steps to prevent data misuse: they collect only relevant tweets and tweet information, and they do not publish user information.
Vander Linden and Adams’ work in the field is ultimately driven by more than a desire to understand public opinion: “There are statistical patterns in this gift of language, it’s not random, there is a signal there, it’s part of the created order,” said Vander Linden; “God gave us language, and language has meaning; we’re looking for those meanings and stances hidden in the text.”