Dependency Confusion: How I Hacked Into Apple, Microsoft and Dozens of Other Companies
Ever since I started learning how to code, I have been fascinated by the level of trust we put in a simple command like this one:
pip install package_name
Some programming languages, like Python, come with an easy, more or less official method of installing dependencies for your projects. These installers are usually tied to public code repositories where anyone can freely upload code packages for others to use.
You have probably heard of these tools already: Node has npm and the npm registry, Python’s pip uses PyPI (Python Package Index), and Ruby’s gems can be found on… well, RubyGems.
When downloading and using a package from any of these sources, you are essentially trusting its publisher to run code on your machine. So can this blind trust be exploited by malicious actors?
Of course it can.
None of the package hosting services can ever guarantee that all the code their users upload is malware-free. Past research has shown that typosquatting, an attack leveraging typo’d versions of popular package names, can be incredibly effective in gaining access to random PCs across the world.
Other well-known dependency chain attack paths include using various methods to compromise existing packages, or uploading malicious code under the names of dependencies that no longer exist.
The Idea
While attempting to hack PayPal with me during the summer of 2020, Justin Gardner (@Rhynorater) shared an interesting bit of Node.js source code found on GitHub.
The code was meant for internal PayPal use, and, in its package.json file, appeared to contain a mix of public and private dependencies: public packages from npm, as well as non-public package names, most likely hosted internally by PayPal. These names did not exist on the public npm registry at the time.
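To make that observation concrete, here is a minimal Python sketch (not the code used in the research) that reads a package.json and separates dependency names that exist publicly from those that do not. The manifest contents, the `acme-*` names, and the `KNOWN_PUBLIC` set are all made up for illustration; a real check would query the registry instead of a hard-coded set.

```python
import json


def split_dependencies(package_json_text, exists_on_public_registry):
    """Split dependency names into (public, unclaimed) lists.

    `exists_on_public_registry` is a caller-supplied callable (an assumption
    of this sketch) returning True if a name is published publicly.
    """
    manifest = json.loads(package_json_text)
    names = set()
    for section in ("dependencies", "devDependencies"):
        names.update(manifest.get(section, {}))
    public = sorted(n for n in names if exists_on_public_registry(n))
    unclaimed = sorted(n for n in names if not exists_on_public_registry(n))
    return public, unclaimed


# Hypothetical manifest mixing public and internal-looking names.
EXAMPLE_MANIFEST = """{
  "name": "internal-app",
  "dependencies": {
    "express": "^4.17.1",
    "lodash": "^4.17.21",
    "acme-internal-logger": "^1.2.0",
    "acme-auth-client": "^0.9.3"
  }
}"""

# Stand-in for a real registry lookup.
KNOWN_PUBLIC = {"express", "lodash"}

public, unclaimed = split_dependencies(EXAMPLE_MANIFEST, KNOWN_PUBLIC.__contains__)
print("public:", public)
print("unclaimed:", unclaimed)
```

Any name that lands in the second list is one that an outsider could, in principle, register on the public registry.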
Since it was unclear which package would be sourced from where, a few questions arose:
- What happens if malicious code is uploaded to npm under these names? Is it possible that some of PayPal’s internal projects will start defaulting to the new public packages instead of the private ones?
- Will developers, or even automated systems, start running the code inside the libraries?
- If this works, can we get a bug bounty out of it?
- Would this attack work against other companies too?
Without further ado, I started working on a plan to answer these questions.
The idea was to upload my own “malicious” Node packages to the npm registry under all the unclaimed names, which would “phone home” from each computer they were installed on. If any of the packages ended up being installed on PayPal-owned servers — or anywhere else, for that matter — the code inside them would immediately notify me.
At this point, I feel that it is important to make it clear that every single organization targeted during this research has provided permission to have its security tested, either through public bug bounty programs or through private agreements. Please do not attempt this kind of test without authorization.
“It’s Always DNS”
Thankfully, npm allows arbitrary code to be executed automatically upon package installation, which let me easily create a Node package that collects some basic information about each machine it is installed on through its preinstall script.
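As a rough illustration of where that hook lives, the Python sketch below lists the install-time scripts declared in a package.json. The `preinstall`, `install`, and `postinstall` script names are among the lifecycle scripts npm runs during installation; the example manifest and its `node index.js` command are assumptions, not the actual package from this research.

```python
import json

# Lifecycle script names that npm runs as part of installing a package.
INSTALL_TIME_SCRIPTS = ("preinstall", "install", "postinstall")


def install_time_scripts(package_json_text):
    """Return the install-time scripts declared in a package.json, if any."""
    manifest = json.loads(package_json_text)
    scripts = manifest.get("scripts", {})
    return {name: scripts[name] for name in INSTALL_TIME_SCRIPTS if name in scripts}


# Hypothetical manifest; the command shown is only an example.
EXAMPLE = """{
  "name": "example-package",
  "version": "1.0.0",
  "scripts": {
    "preinstall": "node index.js",
    "test": "echo no tests"
  }
}"""

hooks = install_time_scripts(EXAMPLE)
print(hooks)
```

The same check is handy in the other direction: any dependency that declares one of these scripts will run its command on the installing machine.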
To strike a balance between the ability to identify an organization based on the data, and the need to avoid collecting too much sensitive information, I settled on only logging the username, hostname, and current path of each unique installation. Along with the external IPs, this was just enough data to help security teams identify possibly vulnerable systems based on my reports, while keeping my testing from being mistaken for an actual attack.
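For a sense of how little data that is, here is a small Python sketch gathering the same three fields using only the standard library. It is an approximation for illustration; the function name and the `"unknown"` fallback are my own choices here, not part of the original packages.

```python
import getpass
import os
import socket


def collect_install_info():
    """Collect the three fields described above: username, hostname, path."""
    try:
        username = getpass.getuser()
    except Exception:
        # Some sandboxes have no resolvable user; fall back to a placeholder.
        username = "unknown"
    return {
        "username": username,
        "hostname": socket.gethostname(),
        "path": os.getcwd(),
    }


info = collect_install_info()
print(sorted(info))
```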
One thing left now — how do I get that data back to me?
Knowing that most of the possible targets would be deep inside well-protected corporate networks, I decided that DNS exfiltration was the way to go.
Sending the information to my server through the DNS protocol was not essential for the test itself to work, but it did ensure that the traffic would be less likely to be blocked or detected on the way out.
The data was hex-encoded and used as part of a DNS query, which reached my custom authoritative name server, either directly or through intermediate resolvers. The server was configured to log each received query, essentially keeping a record of every machine where the packages were downloaded.
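The encoding step can be sketched in a few lines of Python. This is a simplified illustration under assumptions: the zone `log.example.com`, the `|` field separator, and both function names are hypothetical, and no query is actually sent. The only hard facts used are the DNS size limits (63 characters per label, 253 for a full name in text form).

```python
MAX_LABEL_LEN = 63   # DNS limit for a single label
MAX_NAME_LEN = 253   # limit for a full domain name in text form


def encode_as_query_name(payload: str, zone: str) -> str:
    """Hex-encode `payload` and place it in labels in front of `zone`."""
    hex_data = payload.encode("utf-8").hex()
    labels = [hex_data[i:i + MAX_LABEL_LEN]
              for i in range(0, len(hex_data), MAX_LABEL_LEN)]
    name = ".".join(labels + [zone])
    if len(name) > MAX_NAME_LEN:
        raise ValueError("payload too long for a single DNS name")
    return name


def decode_query_name(name: str, zone: str) -> str:
    """Reverse of encode_as_query_name, as a logging name server might do."""
    suffix = "." + zone
    if not name.endswith(suffix):
        raise ValueError("query is not under the expected zone")
    hex_data = name[:-len(suffix)].replace(".", "")
    return bytes.fromhex(hex_data).decode("utf-8")


ZONE = "log.example.com"                   # hypothetical zone
record = "alice|build-01|/home/alice/app"  # hypothetical field layout

query_name = encode_as_query_name(record, ZONE)
decoded = decode_query_name(query_name, ZONE)
print(query_name)
print(decoded)
```

Hex is a convenient choice because DNS names are case-insensitive and restricted in character set, and `bytes.fromhex` accepts either case if a resolver alters it along the way.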
The More The Merrier
With the basic plan for the attack in place, it was now time to uncover more possible targets.
The first strategy was looking into alternate ecosystems to attack. So I ported the code to both Python and Ruby, in order to be able to upload similar packages to PyPI (Python Package Index) and RubyGems respectively.
But arguably the most important part of this test was finding as many relevant dependency names as possible.
A few full days of searching for private package names belonging to some of the targeted companies revealed that many other names could be found on GitHub, as well as on the major package hosting services — inside internal packages which had been accidentally published — and even within posts on various internet forums.
However, by far the best place to find private package names turned out to be… inside JavaScript files.
Apparently, it is quite common for internal package.json files, which contain the names of a JavaScript project’s dependencies, to become embedded into public script files during their build process, exposing internal package names. Similarly, leaked internal paths or require() calls within these files may also contain dependency names. Apple, Yelp, and Tesla are just a few examples of companies that had internal names exposed in this way.
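A rough Python sketch of the require() case: scan a script for string-literal require() calls and reduce each module specifier to its package name. The regular expression is deliberately simple and will miss dynamic or minified patterns, and the bundle contents and `acme` names are invented for the example.

```python
import re

# Matches require("name") or require('name') with optional inner whitespace.
REQUIRE_CALL = re.compile(r"""require\(\s*['"]([^'"]+)['"]\s*\)""")


def required_package_names(js_source):
    """Return sorted package names referenced by require() calls."""
    names = set()
    for spec in REQUIRE_CALL.findall(js_source):
        if spec.startswith((".", "/")):
            continue  # relative or absolute file path, not a package
        parts = spec.split("/")
        # Scoped packages look like @scope/name; others use the first segment.
        name = "/".join(parts[:2]) if spec.startswith("@") else parts[0]
        names.add(name)
    return sorted(names)


# Hypothetical excerpt of a public script file.
EXAMPLE_BUNDLE = '''
var a = require("lodash/merge");
var b = require('./utils/helpers');
var c = require("@acme/ui-kit/button");
var d = require( "acme-internal-logger" );
'''

found = required_package_names(EXAMPLE_BUNDLE)
print(found)
```

Each name found this way would still need to be checked against the public registry to see whether it is already claimed.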