Researchers Find 'Anonymized' Data Is Even Less Anonymous Than We Thought

Corporations love to pretend that 'anonymization' of the data they collect protects consumers. Studies keep showing that’s not really true.

by [34]Karl Bode
Feb 3 2020, 3:24pm

Image: Cathryn Virginia

Last fall, AdBlock Plus creator Wladimir Palant revealed that Avast was using its popular antivirus software to [38]collect and sell user data. While the effort was eventually [39]shuttered, Avast CEO Ondrej Vlcek first downplayed the scandal, assuring the public the collected data had been “anonymized,” or stripped of any obvious identifiers like names or phone numbers.

“We absolutely do not allow any advertisers or any third party...to get any access through Avast or any data that would allow the third party to target that specific individual,” [40]Vlcek said.

But analysis from students at Harvard University shows that anonymization isn’t the magic bullet companies like to pretend it is. For a class paper they have yet to publish, Dasha Metropolitansky and Kian Attari, two students at the [41]Harvard John A. Paulson School of Engineering and Applied Sciences, recently built a tool that combs through vast troves of consumer datasets exposed in breaches.

“The program takes in a list of personally identifiable information, such as a list of emails or usernames, and searches across the leaks for all the credential data it can find for each person,” [42]Attari said in a press release.

They told Motherboard their tool analyzed [43]thousands of datasets from data scandals ranging from the [44]2015 hack of Experian to the hacks and breaches that have plagued services from [45]MyHeritage to [46]porn websites. Despite many of these datasets containing “anonymized” data, the students say that identifying actual users wasn’t all that difficult.

“An individual leak is like a puzzle piece,” Metropolitansky told Motherboard. “On its own, it isn’t particularly powerful, but when multiple leaks are brought together, they form a surprisingly clear picture of our identities. People may move on from these leaks, but hackers have long memories.”

For example, one company might store only usernames, passwords, email addresses, and other basic account information, while another may have stored your browsing or location data. Individually these records may not identify you, but together they can reveal intimate details that even your closest friends and family may not know.

“We showed that an ‘anonymized’ dataset from one place can easily be linked to a non-anonymized dataset from somewhere else via a column that appears in both datasets,” Metropolitansky said. “So we shouldn’t assume that our personal information is safe just because a company claims to limit how much they collect and store.”

The students told Motherboard they were “astonished” by the sheer volume of data now available online and on the dark web. Metropolitansky and Attari said that even with privacy scandals now a weekly occurrence, the public is dramatically underestimating the cumulative impact these leaks, hacks, and breaches have on privacy and security.

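The linking Metropolitansky describes, joining an “anonymized” dataset to a non-anonymized one through a column both contain, can be illustrated with a small Python sketch. This is not the students' tool, which has not been published. The records, the `email_hash` column, and the `link_on_column` helper are all hypothetical, chosen only to show how a shared column ties a name to data that carried none.

```python
# Hypothetical toy data. Neither table alone ties a name to a browsing record.
anonymized_browsing = [
    {"email_hash": "a1f3", "visited": "health-forum.example"},
    {"email_hash": "9c2e", "visited": "job-board.example"},
]
leaked_accounts = [
    {"email_hash": "a1f3", "name": "Alice Example"},
    {"email_hash": "7b40", "name": "Bob Example"},
]


def link_on_column(left, right, key):
    """Inner-join two lists of records on a column that appears in both."""
    # Index the right-hand records by the shared column for fast lookup.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    linked = []
    for row in left:
        for match in index.get(row[key], []):
            # Merge the two records into one combined profile.
            linked.append({**row, **match})
    return linked


reidentified = link_on_column(anonymized_browsing, leaked_accounts, "email_hash")
for record in reidentified:
    print(record["name"], "->", record["visited"])
```

In this toy case only one record appears in both tables, and that single overlap is enough to attach a name to a browsing record that was stored without one.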
Previous studies have shown that even within a single anonymized dataset, identifying users isn’t all that difficult.

In one [47]2019 UK study, researchers were able to develop a machine learning model capable of correctly identifying 99.98 percent of Americans in any anonymized dataset using just 15 characteristics. A different [48]MIT study of anonymized credit card data found that users could be identified 90 percent of the time using just four relatively vague points of information.

Another [49]German study looking at anonymized user vehicle data found that 15 minutes’ worth of data from brake pedal use could let researchers identify the right driver, out of 15 options, roughly 90 percent of the time. Another [50]2017 Stanford and Princeton study showed that deanonymizing user social networking data was also relatively simple.

Individually these data breaches are problematic; cumulatively they’re a bit of a nightmare.

Metropolitansky and Attari also found that despite repeated warnings, the public still isn’t using unique passwords or password managers. Of the 96,000 passwords contained in one of the program’s output datasets, just 26,000 were unique.

The problem is compounded by the fact that the United States still doesn’t have even a basic privacy law for the internet era, thanks in part to relentless lobbying from a [51]cross-industry coalition of corporations eager to keep this profitable status quo intact. As a result, penalties for data breaches and lax security are often [52]too pathetic to drive meaningful change.

Harvard’s researchers told Motherboard there are several restrictions a meaningful U.S. privacy law could implement to potentially mitigate the harm, including barring unauthorized employees from accessing data, maintaining better records on data collection and retention, and decentralizing data storage (not keeping corporate and consumer data on the same server).

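To see why the studies cited above find that a handful of attributes can single people out, consider a rough Python sketch that measures how many records become unique as more columns are combined. It is not the method used in any of those studies. The records, the choice of ZIP code, birth year, and gender as columns, and the `unique_fraction` helper are hypothetical and serve only as an illustration.

```python
from collections import Counter

# Hypothetical records: (ZIP code, birth year, gender). No names or emails.
records = [
    ("02138", 1990, "F"),
    ("02138", 1990, "M"),
    ("02138", 1985, "F"),
    ("02139", 1990, "F"),
    ("02139", 1990, "F"),
    ("02139", 1972, "M"),
]


def unique_fraction(rows, columns):
    """Share of rows whose values in `columns` match no other row."""
    keys = [tuple(row[c] for c in columns) for row in rows]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(rows)


# Combine one, two, then three columns and watch uniqueness grow.
for columns in [(0,), (0, 1), (0, 1, 2)]:
    print(columns, round(unique_fraction(records, columns), 2))
```

With this toy data, no record is unique by ZIP code alone, a third are unique once birth year is added, and two thirds are unique when all three columns are combined. Each extra attribute shrinks the crowd a person can hide in, even though no column is an obvious identifier by itself.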
Until then, we’re left relying on the assurances of corporations that have repeatedly proven their privacy promises aren’t worth all that much.

References

1. https://www.vice.com/en_us/rss
2. https://www.vice.com/en_us/article/dygy8k/researchers-find-anonymized-data-is-even-less-anonymous-than-we-thought
3. https://www.vice.com/en_ca/article/dygy8k/researchers-find-anonymized-data-is-even-less-anonymous-than-we-thought
4. https://www.vice.com/en_asia/article/dygy8k/researchers-find-anonymized-data-is-even-less-anonymous-than-we-thought
5. https://www.vice.com/en_us/article/3a8k79/do-ring-cameras-violate-wiretapping-laws-new-hampshire-is-about-to-find-out
6. https://www.vice.com/en_us/article/7kzxzy/senator-mark-warner-ftc-not-doing-enough-on-browsing-data-avast-antivirus
7. https://www.googletagmanager.com/ns.html?id=GTM-MSM4HQ4
8. https://www.vice.com/en_us/article/dygy8k/researchers-find-anonymized-data-is-even-less-anonymous-than-we-thought#main-content
9. https://www.vice.com/en_us
10. https://www.viceland.com/en_us?_ga=2.122564107.1244600859.1568037773-1243485207.1550599999
11. https://i-d.vice.com/en_us
12. https://impact.vice.com/en_us
13. https://www.refinery29.com/
14. https://video.vice.com/en_us/
15. https://vice.com/en_us/page/podcasts
16. https://news.vice.com/en_us
17. https://www.vice.com/en_us/section/tech
18. https://www.vice.com/en_us/section/music
19. https://www.vice.com/en_us/section/food
20. https://www.vice.com/en_us/section/health
21. https://www.vice.com/en_us/section/money
22. https://www.vice.com/en_us/section/drugs
23. https://www.vice.com/en_us/topic/uncommitted-iowa-2020
24. https://www.vice.com/en_us/topic/2020
25. https://www.vice.com/en_us/section/identity
26. https://www.vice.com/en_us/section/games
27. https://www.vice.com/en_us/section/entertainment
28. https://www.vice.com/en_us/section/environment
29. https://www.vice.com/en_us/section/travel
30. https://www.vice.com/en_us/astroguide
31. https://www.vice.com/en_us/section/sex
32. https://www.vice.com/en_us/topic/vice-magazine
33. https://www.vice.com/en_us/section/tech
34. https://www.vice.com/en_us/contributor/karl-bode
35. https://www.vice.com/en_us/article/dygy8k/researchers-find-anonymized-data-is-even-less-anonymous-than-we-thought#javascript
36. https://www.vice.com/en_us/article/dygy8k/researchers-find-anonymized-data-is-even-less-anonymous-than-we-thought#javascript
37. https://www.vice.com/en_us/article/dygy8k/researchers-find-anonymized-data-is-even-less-anonymous-than-we-thought#javascript
38. https://palant.de/2019/10/28/avast-online-security-and-avast-secure-browser-are-spying-on-you/
39. https://www.vice.com/en_us/article/wxejbb/avast-antivirus-is-shutting-down-jumpshot-data-collection-arm-effective-immediately
40. https://www.forbes.com/sites/thomasbrewster/2019/12/09/are-you-one-of-avasts-400-million-users-this-is-why-it-collects-and-sells-your-web-habits/
41. https://www.seas.harvard.edu/
42. https://www.seas.harvard.edu/news/2020/01/imperiled-information
43. https://docs.google.com/spreadsheets/d/1A7y6Y5cgObJvoq3sIK-6K9PJ-XAaZ8QR99cD_Og-0RY/edit#gid=1989660935
44. https://www.theguardian.com/business/2015/oct/01/experian-hack-t-mobile-credit-checks-personal-information
45. https://www.vice.com/en_us/article/vbqyvx/myheritage-hacked-data-breach-92-million
46. https://www.vice.com/en_us/article/78k849/hacker-breaches-porn-network-advertises-user-data-on-dark-web
47. https://www.nature.com/articles/s41467-019-10933-3
48. http://news.mit.edu/2018/privacy-risks-mobility-data-1207
49. http://www.autosec.org/pubs/fingerprint.pdf
50. https://www.cs.princeton.edu/~arvindn/publications/browsing-history-deanonymization.pdf
51. https://www.eff.org/deeplinks/2017/10/how-silicon-valleys-dirty-tricks-helped-stall-broadband-privacy-california
52. https://www.vice.com/en_us/article/d3agv7/the-equifax-settlement-is-a-cruel-joke
53. https://www.vice.com/en_us/topic/data