My article about Ecological Inference methods on social data at SBBD 2016

What is Ecological Inference?

I recently attended SBBD (Brazilian Symposium on Databases) to present my article on Ecological Inference methods on social data. The conference took place in Salvador, and many interesting and unique papers were presented there. My research started at Hekima, where we became really interested in the article “Who supported Obama in 2012?: Ecological inference through distribution regression” by Seth R. Flaxman, Yu-Xiang Wang, and Alexander J. Smola, published at ACM SIGKDD 2015. That paper was my first encounter with the concept of Ecological Inference, and the following table was enlightening for me:

[Image: ecological inference table]

Ecological Inference methods infer clues about individual-level behavior from aggregate data: the idea is to estimate the values of the interior cells of the table above for multiple geographical units, when only the row and column totals are known. My research investigated the implications of applying Ecological Inference to data generated by social networks, and one of my research questions was which method yields the best results on a dataset collected from Twitter over 122 days. We showed that, using only Brazilian census data (IBGE) and aggregate data on support for a political candidate (sentiment analysis of posts related to Dilma Rousseff), we can infer the gender and age of groups of users with an average error (MAE) of 2% to 3% using King’s method.
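To make the idea more concrete, here is a minimal Python sketch of the 2x2 problem, assuming each geographical unit gives us one group share x (for example, the share of women, from the census) and one support share t (for example, from sentiment analysis). This is neither King’s method nor the code from my paper: it uses two simpler, classical building blocks, Goodman’s ecological regression and the deterministic Duncan-Davis bounds (which King’s method also builds on), on synthetic data. The function names and all the numbers are hypothetical.

```python
import numpy as np


def goodman_regression(x, t):
    """Goodman's ecological regression for a 2x2 table (illustrative sketch).

    Assumes the accounting identity t_i = b * x_i + w * (1 - x_i) holds with
    the SAME b and w in every unit, and solves for them by least squares.
    x: share of the group of interest in each unit (hypothetical census data).
    t: share of each unit that supports the candidate (aggregate data).
    """
    design = np.column_stack([x, 1.0 - x])  # no intercept: columns x and 1 - x
    (b, w), *_ = np.linalg.lstsq(design, t, rcond=None)
    return b, w


def duncan_davis_bounds(x, t):
    """Deterministic bounds on b_i (support within the group) in each unit.

    Follows from t_i = b_i * x_i + w_i * (1 - x_i) with b_i, w_i in [0, 1]:
    setting w_i = 1 gives the lower bound, w_i = 0 gives the upper bound.
    """
    lower = np.maximum(0.0, (t - (1.0 - x)) / x)
    upper = np.minimum(1.0, t / x)
    return lower, upper


def mean_absolute_error(estimated, true):
    """Average absolute difference between estimated and true cell values."""
    return float(np.mean(np.abs(np.asarray(estimated) - np.asarray(true))))


# Synthetic example with 50 hypothetical geographical units.
rng = np.random.default_rng(0)
x = rng.uniform(0.3, 0.7, size=50)        # group share per unit (observed)
b_true = rng.uniform(0.55, 0.65, size=50)  # support within the group (hidden)
w_true = rng.uniform(0.35, 0.45, size=50)  # support outside the group (hidden)
t = b_true * x + w_true * (1.0 - x)        # aggregate support (observed)

b_hat, w_hat = goodman_regression(x, t)
lower, upper = duncan_davis_bounds(x, t)
print(f"estimated b={b_hat:.3f}, w={w_hat:.3f}")
print(f"MAE of b across units: {mean_absolute_error(np.full(50, b_hat), b_true):.3f}")
print("true b inside the bounds:", bool(np.all((lower <= b_true) & (b_true <= upper))))
```

Only x and t are used for estimation; b_true and w_true are kept solely to measure the error, the same way an MAE can be computed when ground truth is available for evaluation. Goodman’s regression assumes the within-group rates are constant across units, while King’s method relaxes that assumption by letting them vary from unit to unit inside the bounds.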

If you want to understand exactly what Ecological Inference methods do and how I arrived at these results, you can check the paper here. I have also made the dataset and the code used in my analysis public in a GitHub repository.