Twitter’s Open Source Algorithm Is A Red Herring
Last Friday afternoon, Twitter posted the source code of its recommendation algorithm to GitHub. Twitter said it was “open sourcing” its algorithm, something I would typically be in favor of. Recommendation algorithms and open source code are major focuses of my work as a researcher and advocate for corporate accountability in the tech industry. My research has demonstrated why and how companies like YouTube should be more transparent about the inner workings of their recommendation algorithms—and I’ve run campaigns pressuring them to do so. Mozilla, the nonprofit where I am a senior fellow, famously open-sourced the Netscape browser code and invited a community of developers around the world to contribute to it in 1998, and it has continued to push for an open internet since. So why aren’t I impressed or excited by Musk’s decision?
If anything, Twitter’s so-called “open sourcing” is a clever red herring to distract from its recent moves away from transparency. Just weeks ago, Twitter quietly announced it was shutting down the free version of its API, a tool that researchers around the world have relied on for years to study harmful content, disinformation, public health, election monitoring, political behavior, and more. Its replacement will cost researchers and developers between $42,000 and $210,000 a month to use. Twitter’s move caught the attention of lawmakers and civil society organizations (including the Coalition for Independent Tech Research, on whose board I sit), who condemned the decision.
The irony is that many of the issues people raised over the weekend while analyzing the source code could actually be tested by the very tool that Twitter is in the process of disabling. For example, researchers speculated that the “UkraineCrisisTopic” parameter found in Twitter’s source code was a signal for the algorithm to demote tweets referring to the invasion of Ukraine. Using Twitter’s API, researchers could have retrieved tweets related to the invasion of Ukraine and analyzed their engagement to determine if the algorithm amplified or de-amplified them. Tools like these allow the public to independently confirm—or refute—the nuggets of information that the source code provides. Without them, we are at the mercy of what Twitter tells us to be true.
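To make that kind of check concrete, here is a minimal sketch in Python. It assumes tweets have already been retrieved and reduced to simple records; the field names (`likes`, `retweets`, `replies`, `author_followers`), the helper functions, and all of the numbers are hypothetical and synthetic, not Twitter's API schema or real measurements. It compares the median engagement per follower of tweets on a topic against a baseline sample. A gap between the two would be suggestive at best, since many factors other than ranking affect engagement.

```python
from statistics import median

def engagement_rate(tweet):
    """Interactions per follower of the author (a rough, assumed proxy for reach)."""
    interactions = tweet["likes"] + tweet["retweets"] + tweet["replies"]
    followers = tweet["author_followers"]
    return interactions / followers if followers else 0.0

def median_rate(tweets):
    """Median engagement rate across a sample of tweets."""
    return median(engagement_rate(t) for t in tweets)

def engagement_ratio(topic_tweets, baseline_tweets):
    """Ratio below 1 means the topic sample engages less than the baseline."""
    baseline = median_rate(baseline_tweets)
    if baseline == 0:
        raise ValueError("baseline sample has zero median engagement")
    return median_rate(topic_tweets) / baseline

# Synthetic records for illustration only; these are not real tweets.
topic_tweets = [
    {"likes": 5, "retweets": 3, "replies": 2, "author_followers": 1000},
    {"likes": 12, "retweets": 5, "replies": 3, "author_followers": 1000},
    {"likes": 20, "retweets": 6, "replies": 4, "author_followers": 1000},
]
baseline_tweets = [
    {"likes": 10, "retweets": 6, "replies": 4, "author_followers": 1000},
    {"likes": 25, "retweets": 10, "replies": 5, "author_followers": 1000},
    {"likes": 40, "retweets": 12, "replies": 8, "author_followers": 1000},
]

ratio = engagement_ratio(topic_tweets, baseline_tweets)
print(f"topic vs. baseline engagement ratio: {ratio:.2f}")
```

A real study would need far larger samples, a baseline matched on account size and posting time, and a statistical test rather than a single ratio.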
Twitter’s stunt is just the latest example of transparency washing to come from the tech industry. In 2020, TikTok also used the words “source code” to dazzle regulators in the US and Europe who demanded more transparency into how the platform worked. It was the first platform to announce the opening of physical “Transparency Centers,” supposedly designed to “allow experts to examine and verify TikTok’s practices.” In 2021 I participated in a virtual tour of the Center, which amounted to little more than a PowerPoint presentation from TikTok’s policy staff explaining how the app works and reviewing their already public content moderation policies. Three years on, the Centers remain closed to the public (TikTok’s website cites the pandemic as the reason why) and TikTok has not released any source code.
If Musk had really wanted to bring accountability to Twitter’s algorithm, he could have made it scrutable in addition to transparent. For instance, he could have created tools that simulate the outputs of an algorithmic system based on a series of inputs. This would allow researchers to conduct controlled experiments to test how recommendation systems would rank real content. These tools should be available to researchers who work in the public interest (and, of course, who can demonstrate how their methods respect people’s privacy) for little or no cost.
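As a rough illustration of what such a controlled experiment could look like, the Python sketch below changes one attribute of a post, holds everything else fixed, and measures how the score moves. Everything here is an assumption made for illustration: `toy_ranker` is an invented stand-in for a simulator that a platform might expose, and its weights, its penalty, and the topic labels are made up. None of it describes Twitter's actual ranking.

```python
from typing import Any, Callable, Dict

Post = Dict[str, Any]

def toy_ranker(post: Post) -> float:
    """Hypothetical stand-in for a platform-provided simulator (invented weights)."""
    score = 1.0 * post["likes"] + 2.0 * post["retweets"]
    if post["topic"] == "flagged_topic":
        score *= 0.5  # invented penalty, purely so the experiment has something to find
    return score

def paired_effect(ranker: Callable[[Post], float], post: Post,
                  field: str, value: Any) -> float:
    """Score change when a single field is altered and all else is held fixed."""
    variant = {**post, field: value}  # copy, so the original post is untouched
    return ranker(variant) - ranker(post)

post = {"likes": 10, "retweets": 5, "topic": "neutral_topic"}
effect = paired_effect(toy_ranker, post, "topic", "flagged_topic")
print(f"score change from relabeling the topic: {effect}")
```

Because only one input differs between the two runs, any change in the score can be attributed to that input. That is the property that reading source code alone cannot provide.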
There is good news on this front: Europe’s Digital Services Act, due to come into force for very large online platforms as soon as this summer, will compel platforms to conduct third-party audits on their algorithms to ensure they are not at risk of harming people. The kind of data that will be required for such audits goes far beyond what Twitter, TikTok, or any other platform currently provides.
Releasing the source code was a bold but hasty move that Twitter itself seemed unprepared for: The GitHub repository has been updated at least twice since the release to remove embarrassing bits from the code that were likely never meant to be made public. While the source code reveals the underlying logic of an algorithmic system, it tells us almost nothing about how the system will perform in real time, on real tweets. Elon Musk’s decision leaves us unable to tell what is happening right now on the platform, or what may happen next.
WIRED Opinion publishes articles by outside contributors representing a wide range of viewpoints.