Minding Your Data

Wesley Murchison, Tuesday, 2 September 2014

The Snap:

Any information published online is considered, if not public, then at least devoid of any promise of privacy. At least, that’s the conventional wisdom among most Internet users. But in this data-driven world, where every bit of information is being harvested and scrutinized ad nauseam by enterprises large and small, how our digital footprints are being used remains largely a mystery.

The good news is that some computer scientists aim to unravel the unknown. One such group is a team of professors and graduate students at Columbia University. They have developed a tool that shows how web services use specific content to target ads. The software is called XRay, and it is the “first tool for revealing personal data use on the Web.”

The Download:

In light of former Mozilla Corporation CEO Gary Kovacs’s TED Talk “Tracking Our Online Trackers,” the trend of monitoring the monitors appears to be an inevitable evolution of the Internet. Browser extensions like Collusion and Lightbeam were the first defenses in this arms race, to use an overused cliché. Yet these programs only show how data is being collected, not how online advertisers use that data to tailor ads to individuals. XRay is the first technology to do the necessary heavy lifting by reverse engineering the association between a piece of content and its resulting advertisement.
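To make the idea of reverse engineering an association more concrete, here is a minimal Python sketch. It is not XRay's code, and it does not claim to reproduce the team's method. It assumes a simplified setup in which several test accounts each hold a different subset of keywords, and we record which ads each account is shown. The function names and the sample accounts, keywords and ads are all hypothetical.

```python
def score_associations(accounts):
    """Score every (ad, keyword) pair.

    `accounts` is a list of (keywords_held, ads_seen) pairs of sets.
    The score is the share of accounts that saw the ad among those
    holding the keyword, minus the share among those lacking it.
    """
    keywords = set().union(*(held for held, _ in accounts))
    ads = set().union(*(seen for _, seen in accounts))
    scores = {}
    for ad in ads:
        for kw in keywords:
            with_kw = [ad in seen for held, seen in accounts if kw in held]
            without_kw = [ad in seen for held, seen in accounts if kw not in held]
            if not with_kw or not without_kw:
                continue  # no comparison group, so no score
            scores[(ad, kw)] = (sum(with_kw) / len(with_kw)
                                - sum(without_kw) / len(without_kw))
    return scores


def best_explanation(accounts, ad):
    """Return the keyword most strongly associated with `ad`, or None."""
    candidates = {kw: s for (a, kw), s in score_associations(accounts).items()
                  if a == ad}
    return max(candidates, key=candidates.get) if candidates else None


# Hypothetical observations: (keywords in the account, ads displayed).
accounts = [
    ({"loan", "travel"}, {"used-car loan ad", "hotel ad"}),
    ({"loan", "recipes"}, {"used-car loan ad"}),
    ({"travel", "recipes"}, {"hotel ad"}),
    ({"recipes"}, set()),
]

print(best_explanation(accounts, "used-car loan ad"))  # prints "loan"
```

The comparison between accounts that hold a keyword and accounts that do not is what separates this from simply logging which ads appear. A real tool would need many more accounts and a statistical test to rule out coincidence.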

The first prototypes of XRay were designed for Gmail, YouTube and Amazon. The team stresses, however, that “XRay’s core mechanisms are largely service-agnostic” and that developers can use XRay as open-licensed software to monitor other websites.

The results from the prototypes did not reveal any overarching pattern across the three services that might be wholly disconcerting. One of the worst results came from the Gmail test, in which using the words “debt,” “borrow” or “loan” led to the display of ads for subprime used-car loans. The XRay team saw a connection between these ads and an article in the New York Times about a subprime loan bubble for used cars under way in our economy. If true, then it is possible that the usage of online data can be connected with trends in the economy or society at large. This potential relationship between a personal ad and a larger trend in the economy underscores the endgame for the XRay team: to provide a tool to investigators, regulators and watchdog groups as a countermeasure against data mining abuses.

With the ascent of Internet-based companies like Google, Facebook and Amazon, the Wild West days of the World Wide Web have given way to corporate dominance. The data these companies collect on users allow them to wield an outsized influence over our lives. As is the tradition in American civil society, watchdogs (most notably journalists, but also investigators at foundations and think tanks, along with researchers at universities) step in to raise public awareness where the government has yet to be given license to regulate. Thus far, these institutions have done some impressive work and dug up a few interesting practices. For example, the Wall Street Journal ran a piece on how Staples priced items online based on the location of the customer. The prevailing factor was proximity to a competitor's store, fortunately, and not the average income of the community. And there are some financial companies that are using Facebook profiles to determine an applicant’s creditworthiness. Far worse, however, is the academic paper that reported evidence of racial bias in online ad delivery.

The XRay software isn’t the only game in town. ID3’s Open Mustard Seed framework goes a step further by providing app developers a platform to add controls for users to better manage their data.

Over at Princeton, the Center for Information Technology Policy (CITP) describes its Web Transparency Research project as the Center for Disease Control “for web privacy,” explaining that “we identify new online privacy threats, quantify and characterize existing ones, and inform the public on these issues.” Along with the alert system, the CITP developed the OpenWPM platform, which automates browsing to better measure web privacy.

Of course, these projects are embryonic and far from being a complete response to the dangers online. Their survival depends on funding and support from private companies and government agencies. But to really succeed, there needs to be more demand from the beneficiaries of these technologies, namely, consumers.

Sources: The New York Times, Columbia University, YouTube, Princeton. Image Credit: Flickr

