When a Cambridge, Massachusetts resident was murdered a few years ago, the repercussions took on an unexpected dimension that challenged me as a city councillor. The murder, still unsolved, shocked everyone, and I was deluged with questions I couldn’t answer. How did it happen? Who did it? Is the area safe? The DA and Cambridge Police Department were tight-lipped, so I had nothing to say to these primary questions. But people also asked secondary questions. Who was the victim? Where did the person live? Were there other people involved? These questions—I call them comfort questions that help people contextualize an incident—proved surprisingly easy to answer. In discovering that, I learned just how severely seemingly innocuous government actions impact people’s reasonable rights to privacy, even when they’re dead.
With no more than the victim’s name and my computer, I was able to tell people that the person probably lived alone, was of a certain age, had a home worth a certain amount of money, and belonged to a particular political party. It was then that I realized how disconnected the general citizenry has become from their government’s collection and use—or misuse—of what most of us assume to be private, or at least strictly transactional, information. And in an age where information, commonly referred to as data, is regularly commercialized and weaponized, it is ever more important that people understand this issue and demand that their government treat personal information with the respect that it deserves.
To understand today’s issues of government data collection and privacy, it is useful to take a trip. Not a physical one, but rather through time. For this trip we’ll go back before artificial intelligence, before cell phones, before electronic databases, before the Internet, and before the telephone. We’ll go back before automobiles, before the steam engine, and even before regular horse and coach travel. We’ll go back to the days of the Pilgrims, of the original colonies, when information travelled at the speed of people walking, talking, and writing by hand.
In other words, we need to go back to when everything was slower. In 1630 or 1750, if you wanted to see what sort of information your government collected about anyone or anything, you pretty much had to walk to City Hall. In fact, if you wanted to interact with your government, you had to walk to wherever it was your government was located. Which is why many towns and cities in the eastern part of the country are so small. It was too difficult to govern someone two days’ walk away.
Of course, back then it was also difficult to collect, analyze, and disseminate information. If they existed at all, census data, business licenses, birth certificates, and voting records were stored in paper form in a location that was, at best, a pain to get to. Walking three miles to get the chance to look at an official document really had to be worth the effort. Which it rarely was. And while people might know all sorts of “private” information about their fellow townspeople, for the most part this personal knowledge stopped at the town line and was limited to things like who was a drunk, whose son had run off to join a whaler, and whose cows were calving. If you weren’t something of a neighbor, the odds of your knowing much about someone, no matter how much you might want to, were pretty low unless you had the considerable amount of time it took to visit the local seat of government where all the records were kept. Information like someone’s date of birth and address was information known, generally, to people who actually knew each other in some way.
As the US grew westward, government jurisdictions tended to get bigger to reflect the speed of horses and then automobiles. The records collected by these larger governmental jurisdictions remained inherently inaccessible to most people. It took time and effort to drive 30 miles to City Hall to look up a building’s information or learn about someone’s voting record. If it wasn’t worth the effort, and it rarely was, people just wouldn’t bother.
For years, decades, and centuries, government collection of data about boring things has remained a fairly innocuous and ignored issue. Assessor’s information on one’s home. The election commission’s information on one’s voting history. The animal control commission’s information about how people name their dogs. Who really cares? We’ve collected that data for a very long time; it’s just the way government works, providing the information government needs to function. Fifty years ago, we had pretty much the same data collection. Why change things now? What is different?
What changed things was the Internet. Over the course of a few decades, the distance between the government’s data and the general public went from an afternoon’s ride to a few taps on a keyboard, and the distances that insulated intuitively private data from probing eyes disappeared in a flash of fiber-optic cable.
What also changed, and continues to grow, is the power of computers to collect, combine, and analyze data sets. Address, name, gender, birthdate, occupation, and phone number are some of the data points that may be held at the local election commission but are often available to members of the public, sometimes for free. The Cambridge street list, for example, is compiled from the census form, which many people feel it is their duty to fill out and submit. This list contains the name and address of every resident 17 years old or older and may be purchased for 20 dollars. Candidates for Cambridge City Council get more detailed voter information, including the past elections in which people voted. The state of Washington requires that “The county auditor or secretary of state shall promptly furnish current lists of registered voters in his or her possession, at actual reproduction cost, to any person requesting such information.” For one cent per name, Louisiana will provide voter information including party affiliation, date of birth, sex, and race. While statutory language may limit the use of voter and census information, data is fungible; once the information is made public, it’s difficult, if not impossible, to ensure that it will not be abused.
The ability of programmers to mine, combine, and manipulate public data is what makes the collection of this data and its subsequent availability to various users particularly problematic. Working off a voter database, it takes only a few moments for anyone with even limited talent to create a list of, say, women over the age of 65 who are most likely living alone. That list can then be sorted by street and then—despite its questionable legality—those women may be targeted with particular advertising campaigns about, say, home security.
In Massachusetts, the Registry of Deeds makes mortgages and other records related to real property available online because, naturally, that information has been recorded in the registry and is now public. Anyone with time on their hands can see how many times a particular piece of property has been mortgaged and for how much, along with information about powers of attorney and images of property liens. A creative mind can find many interesting things to do with this information that were never considered when these data sets were created.
Even information that governments originally collected and maintained in paper form is increasingly available in digital form through electronic scanning and optical character recognition. Both hardware and software are getting better and better at accurately collecting and digitizing such data in less and less time. Historical societies, genealogists, universities, other organizations, and private citizens often make their collections available online, creating an ever-growing number of public databases about intuitively private information. Increasingly powerful and automated systems can sift through these troves of data and analyze them as someone’s—anyone’s—interests and abilities dictate.
All of this is not to say that governments should not collect information. Whether it’s setting taxes or locating polling stations, government needs information to function. The challenge is in limiting the information collected to what the government needs to function and keeping the information in a format that is narrowly defined for that function. Governments at all levels need to rethink their data programs and design collection and security policies that limit the public’s exposure to data-related risks. Data for data’s sake and easy availability of that data, whether for the goal of transparency or in an effort to improve efficiency, can violate the public’s expectation of reasonable privacy when it comes to their interactions with government. Add in the challenge of keeping databases and other aspects of digital governance secure, something numerous data breaches attest to, and it becomes even more important for governments to limit how much information they collect, who has access to it, and how it is maintained. New software platforms for managing parking tickets or building permits promise all sorts of gains in government efficiency, but come with the risk (and over time, almost the certainty) of collected information being made public and used in a manner for which it was not intended.
Government in the digital world comes with opportunities and challenges no one could have envisioned 200 years ago. To his credit, Massachusetts governor Charlie Baker is trying to limit access to records such as birth and marriage certificates, but the pushback he is getting from city clerks, reasonably concerned about how the restrictions would impact their ability to function, illustrates just how thorny this problem is. Nonetheless, Governor Baker is on the right track. While our norms about public safety, public health, and public education have changed over the years, our attitudes towards public data do not reflect the reality that, more and more, data is a tool, and sometimes a weapon, that people use to gain advantages over others. It is government’s role to make sure it limits collection and availability of data to programs that are governmental in nature, and that it limits the ability of data speculators to manipulate this particular market.