Years ago, a speaker at a seminar I attended advised teachers to intentionally make mistakes on tests for students to find. The premise was that these errors would make educators appear more human.
It was a bad idea then, and it’s a bad idea now. Mistakes happen often enough without making them on purpose.
I wonder if similar concerns wafted through statisticians’ heads at the U.S. Census Bureau over so-called differential privacy. The controversial practice introduces “noise,” or mistakes, into census data to make it harder for third parties to reconstruct the identities of census respondents.
The law requires census records to stay private for 72 years, so it was just last month that the National Archives opened the vault, so to speak, and released names and addresses from 1950.
However, many experts argue that data from the 2020 census, already marred by controversy over President Trump’s insistence that it exclude unauthorized immigrants (he lost that bid), could be merged with publicly available information to reveal the identities of individual respondents almost immediately if steps aren’t taken to obscure them.
A recent New York Times report mentions a census block in Chicago that has 14 people living underwater. They don’t really, of course, but the census computers assigned them there to make it harder for bad actors to connect the information to specific people. And this particular census block is just one of tens of thousands of such locations that are wrong in the name of privacy.
It’s the proverbial Rock, meet Hard Place situation. On the one hand, people who volunteer potentially sensitive information on a Census Bureau survey do so because their identities will remain anonymous. Without such a guarantee, they are less likely to share information, which makes census results less complete.
On the other hand, intentional errors in census data – even for the best of reasons – undermine the public’s faith in that data, used to determine budgets, government aid, and legislative districts.
Some readers likely feel the same nonchalance toward census privacy that they do toward various forms of in-person or online surveillance. In other words, if you aren’t doing anything wrong, what are you worried about? Out of 330 million people in the nation, why would somebody bother to reconstruct information about you and the people you know?
That’s an easy attitude to have if you’re a WASP and most of your personal data couldn’t be weaponized against you.
But if you’re a minority, or an immigrant, or LGBTQ+, answering census questions increases the chances that some unscrupulous third party could merge your data with other readily available information (voter registration, for instance) and create a list to share with the world. Why take the risk?
Still, fuzzing up the data creates its own set of challenges. The whole “people living underwater” faux pas was an unintentional mistake made in the course of trying to make an intentional one: A computer randomly assigned respondents to a census block that does not have anyone actually living in it.
And it doesn’t humanize the U.S. Census Bureau, as that long-ago speaker suggested would happen when students caught teachers in a mistake. No, Americans already recognize the bureau is all too human, trying to do an impossible job made even more challenging by the advent of easily available computing power in the hands of those who would misuse it.
I don’t know the solution. Results should not be made inaccurate in the name of privacy, and privacy should not be violated in the name of accuracy.
At least the Census Bureau will have more time to ponder it, as most 2020 data has been delayed until next year, partially because of COVID and partially because of this fuzzy-data issue.
Is there a middle ground – private enough? accurate enough?
One “enough” is for sure: Whatever the solution, it’ll never be enough to satisfy everybody.
Reach Chris at chris.schillig@yahoo.com. On Twitter: @cschillig.