A Tech Intro to Data Portability

Recently, OTI hosted an event on online services and data portability. Keynoted by Rep. David N. Cicilline (D-R.I.), the event featured a panel discussion and rigorous debate about whether and to what extent people should be able to take their data out of one online service and upload it into another service of their choice. The conversation dove into some fairly technical details surrounding protocols, APIs, and data formats. Given the technical nature of many of these concepts, we felt it was important to make sure that everyone listening to the discussion was using the same basic definitions of these terms. For anyone interested in this issue who couldn't make it to the panel, I'm reproducing the script and the slides here in this blog post. If you want to see the original talk, archived video of the whole event is available.

---

Data portability has been a big topic of discussion in the wake of the Cambridge Analytica scandal, with users asking whether they truly control their data. It has also been a hot topic because the new EU privacy law, the General Data Protection Regulation (GDPR), requires companies to offer data portability. But what is it? Well, to paraphrase the GDPR…

Data Portability is the ability of a user of an online service to extract an archive of the data they've provided to or stored with that service, in a structured, commonly used and machine-readable format, suitable for transfer to a different service of that person's choosing.

Today, for example, Google Takeout gives you the ability to select which Google services you want to export data from, and lets you choose which format to receive them in. Here, I'm downloading my contacts database in the vCard format.
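To give a feel for what a vCard export contains, here is a minimal sketch that writes and reads back a single contact in vCard 3.0 style. The helper names `to_vcard` and `parse_vcard` and the sample contact are made up for illustration; this is not Google Takeout's actual output, and the sketch ignores escaping, line folding, and property parameters that real vCard files use.

```python
def to_vcard(given, family, email):
    """Serialize one contact as a minimal vCard 3.0 text block."""
    lines = [
        "BEGIN:VCARD",
        "VERSION:3.0",
        f"N:{family};{given};;;",   # structured name: family;given;additional;prefix;suffix
        f"FN:{given} {family}",     # formatted (display) name
        f"EMAIL:{email}",
        "END:VCARD",
    ]
    # vCard lines are separated by CRLF.
    return "\r\n".join(lines) + "\r\n"


def parse_vcard(text):
    """Parse a simple vCard into a dict of property name -> value.

    Assumes one value per property and no folded lines or parameters.
    """
    fields = {}
    for line in text.splitlines():
        name, _, value = line.partition(":")
        fields[name] = value
    return fields


# Hypothetical contact, not real exported data.
card = to_vcard("Ada", "Lovelace", "ada@example.com")
contact = parse_vcard(card)
print(contact["FN"], contact["EMAIL"])
```

Because the format is plain text with a fixed set of property names, any other contacts service can read the same file without knowing where it came from.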

Twitter, meanwhile, gives you a full archive whenever you ask. The archive in the lower screenshot contains a convenient human-readable web page of all your exported tweets, as well as copies in two machine-readable formats, CSV and JSON.

Facebook's process is similar to Google's. You can select what types of data you want to download and what format to receive them in.

It's worth noting that all of these processes, with the possible exception of Google's, are much more robust today than they were six to eight weeks ago. The GDPR has evidently pushed companies to take a better approach to data portability.

Meanwhile, Google, Microsoft, and other contributors are working on an open source project called the Data Transfer Project that is trying to develop a simple common interface for moving files directly between services. For example, in this demo screenshot, a user is moving their photos directly from Google's photo service to Microsoft's photo storage service.

I've mentioned a few times now that a key feature of effective portability is that the data be in a common machine-readable format, by which I mean…

A machine-readable format is a file format, preferably based on an open and widely used standard, that structures data in such a way as to be easily parsable and modifiable by a range of computer systems, thereby making it easy to move the data between different services. Common examples of widely used open standards for structuring data in a machine-readable way include JSON and XML.
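As a rough illustration of what "easily parsable" means in practice, the sketch below writes one record out as JSON and as XML and reads it back using only Python's standard library. The record and its field names (`author`, `created`, `text`) are hypothetical and do not reflect any service's real export schema.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record; field names are illustrative only.
post = {"author": "alice", "created": "2018-06-01T12:00:00Z", "text": "Hello, world"}


def to_json(record):
    """Serialize the record as a JSON object."""
    return json.dumps(record, sort_keys=True)


def from_json(text):
    """Parse a JSON object back into a dict."""
    return json.loads(text)


def to_xml(record):
    """Serialize the record as a <post> element with one child per field."""
    root = ET.Element("post")
    for key, value in sorted(record.items()):
        child = ET.SubElement(root, key)
        child.text = value
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Parse the <post> element back into a dict of tag -> text."""
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}


json_text = to_json(post)
xml_text = to_xml(post)
print(json_text)
print(xml_text)
```

Either representation can be read back into the same structure by a completely different program, which is the property that makes a transfer between services practical.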

For example, until recently, Facebook's download-your-data tool only allowed you to download your content in the form of HTML archives optimized for private viewing, rather than in a more structured format suitable for easier transfer to another service. But about a month ago, it also began offering the data in JSON, presumably as part of its GDPR compliance. Here's an excerpt from my downloaded data in JSON format:

It's important to note that what I can download right now from Facebook is strictly my data (the content that I have posted to Facebook) and doesn't include, for example, photos or other posts in which my friends have tagged me, and it doesn't include all of my friends' contact information such that I could easily reconnect with them on a different service. This isn't just Facebook: similarly, Twitter lets you export the tweets you authored, but not your mentions or lists of likes or retweets. There are privacy arguments for why some of this is the case, but it also raises competition concerns.

Back to machine-readable formats for a moment: Activity Streams is one example of an open standard for social media activity that uses JSON.

It defines a format for storing items such as posts, likes, comments, etc. in a stream similar to your Facebook feed or Twitter feed. This example shows a "follow"-type action; it represents the fact that Brian here followed Ken.
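For readers without the slide in front of them, here is a minimal sketch of how a follow action like that one might be expressed. It uses the W3C Activity Streams 2.0 vocabulary (`@context`, `type`, `actor`, `object`); it is not a copy of the slide's example, which may have used a different version of the format, and the helper name `follow_activity` is hypothetical. Real activities would normally also carry identifiers (URLs) for the actor and the object.

```python
import json


def follow_activity(actor_name, target_name):
    """Build an Activity Streams 2.0 'Follow' activity as a plain dict."""
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Follow",
        "actor": {"type": "Person", "name": actor_name},    # who performed the action
        "object": {"type": "Person", "name": target_name},  # who is being followed
    }


activity = follow_activity("Brian", "Ken")
print(json.dumps(activity, indent=2))
```

The point is that the action itself is described in a shared vocabulary, so software that has never seen the originating service can still tell that one person followed another.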

We call Activity Streams an open standard because it was developed at the World Wide Web Consortium (W3C) and anyone can use it. So far, none of the major commercial social networks offers its downloads using this standard, even though several of them participated in developing it. Instead, it's mostly used by open source decentralized alternatives like the social network software Mastodon, just one example of the kinds of alternatives that might be able to grow and compete if data portability were widespread.

You may be asking now what it means to describe an internet technology as decentralized.

Decentralized information technology is a technology that relies on open standards such that users can make use of the technology and communicate with others using the technology without having to rely on a single service provider. Email, the web, and (once upon a time) instant messaging are all decentralized technologies.

So, for example, both email and the World Wide Web are decentralized technologies based on open standards. Anyone can run an email server that talks to other email servers, or exchange emails with someone using another email service, and anyone can run a web server that serves content to any web browser and can link to content on any other site. Similarly, anyone can run a Mastodon server that hosts social network users, and those users can easily talk to Mastodon users on other servers. In other words, decentralized technologies are easily interoperable.

Interoperability is the ability of different computer systems or software to exchange and make use of information across systems in an ongoing way.

If you think of portability as a one-time copying of all your data, think of interoperability as the ongoing ability to interact across services. In open systems, this is pretty straightforward: it's very easy for one web site to pull data from another or link to another; it's very easy to email across services; and back when much of our instant messaging activity was based on the open standard XMPP, it was very easy to chat across different chat services, including Google Chat and Microsoft Messenger, along with many other XMPP servers. However, when it comes to closed platforms like Facebook that have been built on top of the open system of the internet (what some folks might call walled gardens), interoperability, when it exists, is typically accomplished through what we call Application Programming Interfaces, or APIs.

APIs (Application Programming Interfaces) are interfaces between different software applications allowing them to talk to each other and exchange data in a specifically defined way.

APIs are defined ways in which one piece of software allows other programs to interact with it. They can be totally private or completely open, or anywhere in between. They're often used in open systems with little restriction. One example is a weather API that takes in a zip code and produces a weather report.
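To make the weather example concrete, here is a minimal sketch of what such an interface might look like from the caller's side. Everything in it is hypothetical: the function name `get_weather`, the `WeatherReport` fields, and the forecasts themselves, which are made-up values held in memory rather than fetched from any real weather service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WeatherReport:
    zip_code: str
    temperature_f: int
    conditions: str


# Made-up data standing in for a real forecasting backend.
_FAKE_FORECASTS = {
    "20005": WeatherReport("20005", 84, "partly cloudy"),
    "94103": WeatherReport("94103", 62, "fog"),
}


def get_weather(zip_code: str) -> WeatherReport:
    """The 'API': pass a five-digit zip code, get a WeatherReport back."""
    if not (len(zip_code) == 5 and zip_code.isdigit()):
        raise ValueError("zip_code must be a five-digit string")
    try:
        return _FAKE_FORECASTS[zip_code]
    except KeyError:
        raise LookupError(f"no forecast for {zip_code}") from None


report = get_weather("20005")
print(f"{report.zip_code}: {report.temperature_f}F, {report.conditions}")
```

The caller only needs to know the agreed input and output. How the report is produced behind the function is the provider's business, which is what lets an API act as a controlled doorway into a system.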

However, they are also used to allow and regulate access to data and users in closed systems: think of them as windows or doors into the walled garden. For example, many Google services have APIs that allow access to data including calendar items, but they are of course guarded behind authentication systems. Twitter's API provides a means to search tweets, post new tweets, and manage advertising campaigns. And Facebook has APIs that Facebook apps and connected web sites use to access data about Facebook users. Indeed, it was Graph 1.0 (the version of the Facebook platform API in use before 2015, which allowed apps to obtain data not only about their users but also about those users' friends) that led to the Cambridge Analytica controversy in the first place.

All of these definitions lead back to a core question: how, if at all, can we ensure enough portability and interoperability to promote competition and innovation and avoid locking in the dominance of existing platforms, while also adequately protecting privacy?

More About the Authors

Ross Schulman