Is big data killing American democracy?

Big data has radically transformed the relationship between Americans and their government, and between Americans and big technology corporations. The collection and use of data is only meaningful and purposeful when it is intended to benefit most Americans and takes into account all the possible downstream repercussions for end users. The ethics of data needs to be designed and engineered deep into the structure of information technology, complemented by strong compliance and regulatory frameworks.

Big data is the phenomenon of manipulating and deploying gigantic datasets, analysed through non-traditional machine learning models focussed on predictive analytics, to inform decisions on a large scale (Maurushat & Vaile, 2017). In the US, its use abounds in political campaigns, capitalist surveillance, corporate advertising, and the national census. Unfortunately, the use of big data has largely deviated from the purposes for which the data was originally collected, a potential ‘overreach’ that creates tensions and jeopardizes democracy. This is a missed opportunity, because big data could have delivered immense benefits by improving access to information and fostering growth and prosperity for a vast number of Americans.

Democracy is usually understood to comprise five components: elected officials, free and fair elections, freedom of expression, associational autonomy, and inclusive citizenship (Dahl, 1956, 1971, 1989, 1998). In this article, the effectiveness of democracy is assessed through equal access to information, and thus equal opportunity to succeed in the same society (Harrison & Sayogo, 2014). Only when citizens are well informed on an equal footing can they participate constructively in the democratic processes of elections and healthy public debate (Birkinshaw, 2006). This equality is largely missing: those who hold power through the possession of big data are growing stronger, while those at the receiving end of technology and information are steadily losing ground.

Big data carries many hopes of improving society.

Democratization of information, for one, allows citizens to take part in political processes and hold government accountable for policy making (Baack, 2015). Yet these promises are rarely kept, as big data becomes more a weapon for self-interest and personal gain. Personal information should not be used for public and commercial purposes without consent, and still less against the citizens it describes. Filter bubbles on YouTube and Facebook keep users hooked by feeding them more of the content they already like, which expands the platforms’ advertising revenue. Social media has been used to monitor electoral campaigns through sentiment analysis and to nowcast and forecast voting preferences and intentions. The use of big data analytics to discriminate against and micro-target the population in politics, commerce and employment, based on perceived differences of gender, race, ethnicity, origin, postal code, education, sexual orientation, disability and political belief, has been widespread, systematically disadvantaging many Americans. Such practices border on the infringement of privacy. Because data protection, confidentiality and security are central to democracy, their neglect means that ideals such as civil liberty, human rights and freedom of expression are constantly violated.

A well-regulated and connected open data policy, implemented through executive orders, non-binding resolutions, internal regulations and codified laws, can help improve democracy (Kalin, 2014). However, making open data work is an uphill task for the government (Ruijer, 2017).

Big data has the potential to improve democracy, because the government can easily collect a range of information about its people and thus identify areas for improvement.

With predictive and descriptive analytics, big data can be translated into useful insights about a population, and those insights could become the focus of policy-making. One of the most prominent places where big data has found use is political campaigns. The growth of quantitative political data has been driven by the falling cost of buying, storing, maintaining and synthesizing data, which has led all political parties to build individual voter databases (McAuliffe & Kettman, 2007). Analytics has been effective in differentiating voters by their political beliefs and their likelihood of turning out. This information gives campaigns a competitive advantage, as they can focus their limited resources on the most valuable voters. Predictive scores can be more reliable than voters’ own reports in estimating their preferences and behaviours (Rogers & Aida, 2013; Ansolabehere & Hersh, 2012). The main tactic is to ensure that every contact has the greatest possible power to influence voter preferences, behaviours and turnout, guided by a series of scores. These include behaviour scores, support scores and responsiveness scores (Nickerson & Rogers, 2014, p. 54). The analytical techniques involved are mostly simple regression models: ordinary least squares for continuous variables and logistic regression for binary outcomes (p. 59). Campaigners can then formulate a well-informed strategy on a large scale. Large datasets from past elections can have ‘enough impact’ on the outcome because they reveal not only voter swings but also information on volunteers and donors, as well as potential voters’ tendency to attend rallies and sign petitions (p. 53). Data can also be categorized by how costly it is to gather. For instance, estimated years of education, home ownership status and mortgage status are relatively inexpensive, whereas magazine subscriptions, car purchases and consumer tastes can be more expensive and sometimes useless.
The most useful data is often not bought but gathered through interactions with volunteers and donors, with people who have answered the phone or the door, and with those involved in online activities. In addition, official voter files, census data and precinct-level data are all free and easily available (p. 56). During Obama’s re-election campaign, a program called Narwhal consolidated data from digital, field and financial sources into one database (Gallagher, 2012; Madrigal, 2012). What started as a ten-terabyte database ballooned to over 50 terabytes (Burt, 2013). The amount of political data is massive.
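Nickerson and Rogers describe support and turnout scores as the output of ordinary regression models, with logistic regression used for binary outcomes such as whether a person votes. The sketch below illustrates that logistic-regression case in plain Python, fitted by batch gradient descent so that it needs no third-party libraries. The three features, the eight training records and the 0 to 100 scaling are hypothetical assumptions made for illustration; they are not taken from the cited paper or from any campaign’s data.

```python
import math

# Hypothetical training records: (features, voted_in_last_election).
# The features are illustrative assumptions, not fields from a real voter file:
#   [voted in both previous elections (0/1), age in decades, homeowner (0/1)]
TRAINING_DATA = [
    ([1, 6.5, 1], 1),
    ([1, 4.2, 1], 1),
    ([1, 5.0, 0], 1),
    ([0, 5.5, 1], 1),
    ([0, 2.3, 0], 0),
    ([0, 3.1, 0], 0),
    ([0, 2.8, 1], 0),
    ([1, 3.6, 0], 0),
]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear(weights, bias, x):
    """Weighted sum of the features plus the intercept."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def fit_logistic(data, lr=0.1, epochs=5000):
    """Fit logistic regression by batch gradient descent on the average log-loss."""
    n = len(data)
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in data:
            err = sigmoid(linear(weights, bias, x)) - y  # derivative of log-loss w.r.t. z
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def turnout_score(weights, bias, x) -> float:
    """Scale the model's predicted probability of voting to a 0-100 score."""
    return 100.0 * sigmoid(linear(weights, bias, x))


weights, bias = fit_logistic(TRAINING_DATA)
print(round(turnout_score(weights, bias, [1, 7.0, 1])))  # profile resembling past voters
print(round(turnout_score(weights, bias, [0, 2.0, 0])))  # profile resembling non-voters
```

Note that the model can only score people who appear in the training file with these features filled in; someone absent from the voter file has no record to score at all, which is the gap the following paragraphs describe.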

With so much data available, one might think that politicians could reach out to voters more easily, learn about their concerns and engage them as fully as possible in the political process, thereby improving democracy. The reality is more nuanced.

Because voter databases are built on official state voter files, campaigns tend to hold disproportionate data on registered, repeat voters (Rogers & Aida, 2013). This effectively rules out many of the ‘unlisted’, because targeting enthusiastic past voters makes much more economic sense than digging a new well among unregistered non-voters. Because far less data is available on the unlisted, predictive models tend to be substantially less accurate for them. The existing division between voters and non-voters is thus exacerbated. The problem of political disengagement is not only unresolved but may be buried even deeper.

In ‘Unlisted in America’, Jackman and Spahn (2016) argue that whether Americans are listed is closely tied to wealth. People of colour are significantly more likely than whites to be unlisted, and the unlisted are less likely to register and vote; they also tend to be much younger, less likely to own a home and less likely to have insurance. With much lower annual incomes, the unlisted are more financially vulnerable: they move house more often and are less likely to hold photo identification such as a passport or driving licence.

Armed with such data, politicians of certain parties have an incentive to make registration and voting more difficult for groups that the data suggests would not vote for them. For example, many states have passed laws requiring voters to present photo identification before casting a ballot (Overton, 2007), which have been shown to disproportionately affect minorities, denting their turnout in Democratic constituencies (Barreto, Nuno & Sanchez, 2009). In this way, big data is used against certain citizens.

Being listed improves access to campaigns, whereas being unlisted removes the chance to be seen and heard. An electorate made up only of listed voters is also more disposed towards conservative policy. As a result, minorities and the poor tend to be further disenfranchised by a Republican victory. Additionally, it is often the unregistered who are least satisfied with democracy, with a satisfaction rate below 10%, lower than any other group. Reduced contact with the less affluent, coupled with a low turnout rate, means that elected candidates may end up crafting policy that favours the rich far more than the underprivileged, resulting in a vicious cycle (Bartels, 2009). While not registering might be a personal choice, politicians’ failure to reach out to the unregistered, to help solve their problems and improve their political engagement, is bad for democracy. Often, the politically marginalized are marginalized socially and economically too (Jackman & Spahn, 2016, pp. 11–12). When the predilections of the electorate and of the overall citizenry diverge, the promise of democracy to deliver popular political outcomes is severely tested. Big data built on voter lists can make campaigns cost-effective, but at the expense of the unlisted. When the listed pool is significantly whiter, older, wealthier and more conservative than the country as a whole, it is doubtful that democracy can be upheld, since politics becomes geared towards conservatism and the status quo rather than liberalism and equality.

Furthermore, slicing the demographics into political pies is tantamount to marketing citizens like products, which is not a democratic practice. Voters are free-thinking citizens, not products to be conveniently micro-targeted by advertising to drive traffic to websites and applications. Nor are politicians of any party supposed to behave like profit-driven tech companies concerned only with their own corporate objectives.

Writing in The New York Times, Brooks (2014) has lamented the ‘death by data’ of political campaigning, arguing that targeting individual voters increasingly crowds out the public interest and political debate. He calls data-driven politics ‘impersonalism’, which considers only the reactions of voters and never the ‘idiosyncratic judgment, moral character or creativity’ of candidates.

Big data makes the political process more about voter mobilization and less about persuasion and political dialogue. Over time, actual policies take a back seat, as voters are unable to evaluate the candidates effectively. For instance, both Romney and Obama ‘lacked a policy agenda and produced no mandate’ in 2012. What voters need are leaders with quality, character, vision and solidarity, not expertise in deploying analytics models. In addition, the databases are often maintained by external vendors such as ‘Catalist’, ‘the GOP Data Trust’, ‘TargetSmart’ and ‘Labels and Lists’, while the analytics work is contracted out to firms such as Cambridge Analytica and to tech companies. The ability to manoeuvre big data says nothing about the calibre of the candidates, which undermines the democratic premise that the people choose the politicians they hold in highest regard, by the people and for the people.

The involvement of technology companies in politics is also undemocratic, as politicians increasingly rely on data from social media applications such as Facebook, Twitter, Snapchat, Instagram and YouTube. Facebook, for instance, has all sorts of user data at its disposal, including age, birthday, district, personality traits, interests, groups, salary, the types of cars, restaurants and clothes users like, and online browsing history, all grouped under its Lookalike Audiences, Ethnic Affinity and Custom Audiences functions (Collins & Buchanan, 2018). These details are churned into ‘targetable data points’ highly desired by corporations and politicians alike.

Cambridge Analytica, for example, was exposed as having collected as many as 5,000 data points on each of some 230 million Americans on Facebook.

Currently, 20% of online advertising in the US is spent on Facebook. Jeffrey Chester, executive director of the Center for Digital Democracy, has said of Facebook: ‘They are not being honest.’ The company, he argues, is ‘bundling a dozen different data companies to target an individual customer, and an individual should have access to that bundle as well’ (Angwin, Mattu & Parris, 2016).

Facebook has also made it very difficult to opt out of data brokers such as Oracle’s Datalogix, which provides 350 types of data to Facebook. When the reporter Julia Angwin tried to opt out of 92 brokers, 65 of them required a form. It is also incredibly difficult to hold advertisers accountable, because the ads appear only to certain audiences and disappear quickly.

Democracy cannot thrive in such opaque conditions. Users can, however, volunteer information to improve transparency and accountability.

For instance, Ad Observer is a project of Cybersecurity for Democracy at New York University’s Tandon School of Engineering. The browser extension was developed by the Algorithmic Transparency Institute, Quartz, New York University and the University of Grenoble, with advice from ProPublica, WhoTargetsMe and The Globe and Mail. It helps users find out what kinds of data Facebook and YouTube collect. Among the more useful findings to emerge from the project is that ‘Stand Up to China’, a low-profile anti-CCP group of Republican consultants, has spent half a million dollars on ads, mostly targeting Florida and, gradually, Iowa, New Hampshire, South Carolina and Nevada, the states expected to host the first four nominating contests in 2024 (Markay, 2021). When social media companies intervene in politics, democracy goes off track, because these companies are not accountable for how policies turn out.

Yet, for all this grim talk, big data, if used and regulated properly, offers abundant opportunities to improve democracy in areas such as credit access, jobs and higher education.

At present, the challenges facing big data are plentiful, because data can be poorly selected, incomplete, outdated and, most importantly, biased (Executive Office of the President, 2016). These are challenges with the inputs alone. In addition, there are problems with the design of algorithms and machine learning models, such as poor matching systems, recommendation services that limit options, and unbalanced datasets. Democracy is about equality, but poor data and design can widen inequality. Conversely, a well-designed system incorporating big data can yield satisfactory results that improve equality. Big data is a double-edged sword. All stakeholders must work collaboratively to realize the full potential of big data while remaining cautious of discrimination and bias.

For example, access to credit can be extended to financially underserved and low-income borrowers by using alternative scoring systems or non-conventional data sources rather than the traditional FICO score (pp. 11–12).

In employment, individual biases can be reduced significantly, because big data can help identify candidates who would otherwise be disadvantaged by traditional metrics such as education and experience. In this way, hidden talent need not be overlooked simply because of historical hiring patterns and practices (pp. 13–16).
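Whether a data-driven screen actually reduces bias, rather than reproducing historical hiring patterns, can be checked after the fact. One widely used heuristic in US employment practice, swapped in here and not drawn from the cited report, is the ‘four-fifths rule’ from the EEOC’s Uniform Guidelines on Employee Selection Procedures: a group whose selection rate falls below 80% of the highest group’s rate is treated as showing possible adverse impact. The minimal sketch below assumes hypothetical group labels and screening outcomes; a flagged ratio is a signal for closer review, not proof of discrimination.

```python
from collections import Counter

# Hypothetical screening results: (applicant group, passed the automated screen?)
OUTCOMES = [
    ("group_a", True), ("group_a", True), ("group_a", True),
    ("group_a", False), ("group_a", False),
    ("group_b", True), ("group_b", True),
    ("group_b", False), ("group_b", False), ("group_b", False),
]


def selection_rates(outcomes):
    """Share of applicants in each group who passed the screen."""
    applied = Counter(group for group, _ in outcomes)
    passed = Counter(group for group, ok in outcomes if ok)
    return {group: passed[group] / applied[group] for group in applied}


def adverse_impact(outcomes, threshold=0.8):
    """Return {group: impact ratio} for groups whose selection rate is below
    `threshold` times the highest group's rate (0.8 is the four-fifths rule)."""
    rates = selection_rates(outcomes)
    best = max(rates.values())
    if best == 0:
        return {}  # nobody was selected, so no ratio can be computed
    return {g: r / best for g, r in rates.items() if r / best < threshold}


print(selection_rates(OUTCOMES))
print(adverse_impact(OUTCOMES))
```

In this made-up sample, group_a passes at 3/5 and group_b at 2/5, so group_b’s impact ratio is two-thirds, below the 0.8 threshold, and it is flagged.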

As higher education becomes one of the most expensive public goods, and one of the biggest liabilities, in an American’s life, inequality widens because the underprivileged face immense obstacles in accessing it. The College Scorecard, however, helps students assess college performance, return on investment and student debt. Colleges can also use data to provide support, track progress and foster distinctive learning mechanisms that proactively help students complete their education (pp. 16–18).

Big data has infiltrated every corner of American life. Most prominently, it has been deployed by politicians and big tech companies for their own gain, since they have the power, resources and expertise to manipulate the vast amount of data that US citizens produce each day. Data privacy and fairness are the biggest concerns, and so far no satisfactory balance has been struck to ensure that the explosion of digital information benefits all parties. The potential upside of big data is enormous, even though it is still plagued by flaws in design and data input. Big data remains one of the best tools for achieving equality and progress for most Americans, as it has the potential to stamp out human bias and judgement.

The American Dream can only be realized if data about the population is used judiciously for the population, with transparency and accountability by the population. The existing ethos of data science, ‘Develop first, question later. Datafication first, regulation afterward’, must be torn down for democracy to be driven by Americans, and not by data.


Ad Observer.

Angwin, Julia, Surya Mattu & Terry Parris Jr. (2016). ‘Facebook Doesn’t Tell Users Everything It Really Knows About Them.’ ProPublica.

Ansolabehere, Stephen & Eitan Hersh. (2012). ‘Validation: What Big Data Reveal about Survey Misreporting and the Real Electorate.’ Political Analysis 20(4): 437–459.

Barreto, Matt A., Stephen A. Nuno & Gabriel R. Sanchez. (2009). ‘The Disproportionate Impact of Voter-ID Requirements on the Electorate: New Evidence from Indiana.’ PS: Political Science & Politics 42(1): 111–116.

Bartels, Larry M. (2009). Unequal Democracy: The Political Economy of the New Gilded Age. (Princeton, New Jersey: Princeton University Press).

Brooks, David. (2014). ‘Death by Data.’ The New York Times.

Cybersecurity for Democracy.

Executive Office of the President. (2016). Big Data: A Report on Algorithmic Systems, Opportunity, and Civil Rights.

Foster, Peter. (2018). ‘Why big data is killing western democracy – and giving authoritarian states a new lease of life.’ The Guardian.

Gallagher, Sean. (2012). ‘Built to Win: Deep inside Obama’s Campaign Tech.’ Ars Technica.

Jackman, Simon & Bradley Spahn. (2016). Unlisted in America.

Koopman, Colin. (2018). ‘How Democracy Can Survive Big Data.’ The New York Times.

McAuliffe, Terry & Steve Kettman. (2007). What A Party!: My Life Among Democrats: Presidents, Candidates, Donors, Activists, Alligators, and Other Wild Animals. (New York: St. Martin’s Press).

Nickerson, David W. & Todd Rogers. (2014). ‘Political Campaigns and Big Data.’ Journal of Economic Perspectives 28(2): 51–73.

NYU Ad Observatory.

Overton, Spencer. (2007). ‘Voter Identification.’ Michigan Law Review 105: 631–681.

Rogers, Todd & Masa Aida. (2013). ‘Vote Self-Prediction Hardly Predicts Who Will Vote, and Is (Misleadingly) Unbiased.’ American Politics Research.
