This file describes the dataset released together with the paper "How Biased is the Population of Facebook Users? Comparing the Demographics of Facebook Users and Census Data to Generate Correction Factors", published at the 12th ACM Web Science Conference 2020. The dataset includes the distributions of seven demographic attributes, collected from ACS/2017 (American Community Survey, 5-year estimates) provided by the US Census Bureau (https://www.census.gov/programs-surveys/acs) and from Gallup Research, together with the correction factors. Facebook distributions are not included in this dataset due to legal restrictions.

The correction factors, computed for each demographic dimension at the country, state, and city levels, can be useful for demographic research. One particular use is estimating the actual population behind a distribution previously obtained through the Facebook advertising platform. Suppose someone wants to know how many people are interested in an activity, brand, or any other entity in a particular geographic region, stratified by gender. One can collect the distribution from the Facebook advertising platform (by manually selecting the audiences in the ad creation graphical interface) and derive the population interested in that entity by multiplying the numbers by the appropriate correction factors, which are intended to adjust the estimates for known biases.
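
As a rough illustration of the multiplication described above, the following Python sketch applies one correction factor per demographic group to audience counts collected from the advertising platform. The function name, the dictionary layout, and all numbers are hypothetical and chosen only for this example; they are not taken from the dataset files, whose actual layout should be checked before adapting the code.

```python
def apply_correction_factors(facebook_counts, correction_factors):
    """Multiply each group's Facebook audience count by its correction factor.

    Both arguments are plain dicts keyed by group name (an assumed layout,
    not the format of the released files).
    """
    missing = set(facebook_counts) - set(correction_factors)
    if missing:
        # Refuse to guess a factor for a group that has none.
        raise KeyError(f"no correction factor for: {sorted(missing)}")
    return {
        group: count * correction_factors[group]
        for group, count in facebook_counts.items()
    }


# Hypothetical audience sizes, stratified by gender, for one region.
facebook_counts = {"male": 1000, "female": 1200}

# Hypothetical correction factors for the same region and dimension.
correction_factors = {"male": 1.25, "female": 0.5}

estimated = apply_correction_factors(facebook_counts, correction_factors)
print(estimated)
```

The factors should be taken from the same geographic level (country, state, or city) and the same demographic dimension as the audience counts being corrected.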


We provide data at the country, state, and city levels:
    - Country: gender, age, race, income, education level, political leaning, and country of previous residence
    - State: race, income, education level, and political leaning
    - City (most populous cities): race, income, and education level


Version: 1.0
Released: 05/19/2020


To cite this dataset, use the following reference.

BibTeX citation:
@inproceedings{ribeiro_websci_2020,
author = {Ribeiro, Filipe N. and Benevenuto, Fabr\'{\i}cio and Zagheni, Emilio},
title = {How Biased is the Population of Facebook Users? Comparing the Demographics of Facebook Users and Census Data to Generate Correction Factors},
year = {2020},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
doi = {10.1145/3394231.3397923},
booktitle = {Proceedings of the 12th ACM Web Science Conference 2020},
numpages = {10},
keywords = {social media, advertising, census},
location = {Southampton, UK},
series = {WebSci '20}
}
