Someone Dumped 20+ Zero-Days on GitHub and Claims GPT-5.5 Found Them: Is Open Source Security's Pandora's Box Open?

I was scrolling through Hacker News yesterday when a 616-point post caught my eye. Someone anonymously published 20+ undisclosed zero-day exploit PoCs on GitHub. FFmpeg, libssh2, nmap, VLC, Docker, c-ares, PHP — basically half the tools you use daily got hit.

The kicker? They said they found all of these using GPT-5.5-3-Codex-Spark for automated fuzzing.

My first reaction was: this has to be a honeypot repo, right? I cloned it to dig in, and what I found was way more complicated than I expected.

What This Repo Actually Is

The publisher used an anonymous GitHub account called bikini, with a repo named exploitarium. The README is blunt:

"At the time I post these, none have been reported. Feel free to report them yourself and take credit for the CVE if handed out lulz. I do this so to allure people into the field, and I've always found this is the most efficient way."

Casual tone, but the content is anything but. The repo contains 20+ independent exploit PoCs covering a terrifyingly wide range of software:

  • FFmpeg — RASC/DLTA parser vulnerabilities in video format parsing
  • libssh2 — CVE-2026-55200 and publickey list memory issues in the SSH library
  • c-ares — TCP Use-After-Free in the async DNS resolver
  • nmap — IPv6 extension header length overflow in the network scanner's parser
  • VLC — VP9 decoding crash in the media player's codec
  • Docker — cp command path escape breaking container boundaries
  • RustDesk — Session permission issues in the remote desktop tool
  • PHP 8.5.7 — StreamBucket SOAP Remote Code Execution
  • Gitea — Action Runner container configuration issues in CI/CD isolation
  • OpenVPN Connect — Echo script injection in the VPN client
  • ImageMagick — Ghostscript delegate hijack in the image processing library
  • 7-Zip — RAR5 Mark of the Web bypass chain
  • Firefox — SmartWindow private URL exfiltration
  • nghttp2 — nghttpx upgrade queue poisoning
  • objdump — DLX format out-of-bounds write
  • Ghidra — Three separate vulnerability reports

Each folder has working PoC code, README documentation, and classification files. Not screenshots — actual runnable code. The repo even includes a "consistency check" using Git tree data across 12 original repos and 96 file entries with zero mismatches. Whoever this is, they were thorough.
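The README does not spell out how that consistency check works, so the following is only a minimal sketch of one way to do it, assuming the check compares Git blob IDs from a tree listing against local file contents. The blob hashing rule is standard Git behavior; the function names, paths, and file contents are hypothetical.

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def find_mismatches(expected: dict, files: dict) -> list:
    """List paths whose content is missing or does not hash to the expected blob ID."""
    mismatches = []
    for path, blob_id in sorted(expected.items()):
        content = files.get(path)
        if content is None or git_blob_sha1(content) != blob_id:
            mismatches.append(path)
    return mismatches


# Hypothetical tree entries (path -> blob ID) and file contents, for illustration only.
files = {"poc/empty.txt": b"", "poc/README": b"demo\n"}
expected = {path: git_blob_sha1(data) for path, data in files.items()}
print(find_mismatches(expected, files))  # untouched files: nothing to report

files["poc/README"] = b"tampered\n"
print(find_mismatches(expected, files))  # the modified path is reported
```

A check like this only shows that files match a recorded tree. It says nothing about whether the recorded content is safe to run.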

Which Ones Are Real and Which Are Filler

The HN community was split. Some rushed to clone and analyze, others immediately dismissed most of it as garbage.

Honestly, both sides have a point.

The genuinely scary ones

The c-ares UAF is probably the most technically impressive finding. The attack works by determining the size of the freed allocation that a stale pointer still references, then spraying the heap with attacker-controlled data of the same size so that it reclaims the freed slot, with Remote Code Execution as the end goal. Someone on HN broke it down:

"Writing non-blocking network protocol stacks in C with manual lifetime management — the job is impossible. Doesn't matter if you think you're a super genius."

The libssh2 vulnerabilities were independently verified — "they all work as of the latest upstream commit." libssh2 is the SSH client library that tons of projects depend on. If these can be weaponized for RCE, the blast radius would be enormous. Nobody has written a full exploit chain yet though — the PoCs mostly demonstrate that memory corruption exists.
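To make "memory corruption" less abstract, here is a toy simulation of the use-after-free pattern described for c-ares: free an object, keep the stale handle, then have a same-size allocation reclaim the slot. It is a sketch under heavy assumptions. The `ToyHeap` class is invented for this article, real allocators are far more complex, and none of this is the actual c-ares code or PoC.

```python
class ToyHeap:
    """A toy allocator: freed slots are recycled for the next same-size request."""

    def __init__(self):
        self.slots = []        # slot index -> bytearray contents
        self.free_lists = {}   # size -> list of free slot indices

    def malloc(self, size: int) -> int:
        free = self.free_lists.get(size, [])
        if free:
            index = free.pop()                 # reuse the most recently freed slot
            self.slots[index] = bytearray(size)
            return index
        self.slots.append(bytearray(size))
        return len(self.slots) - 1

    def free(self, index: int) -> None:
        size = len(self.slots[index])
        self.free_lists.setdefault(size, []).append(index)

    def write(self, index: int, data: bytes) -> None:
        if len(data) > len(self.slots[index]):
            raise ValueError("write larger than slot")
        self.slots[index][: len(data)] = data

    def read(self, index: int) -> bytes:
        return bytes(self.slots[index])


heap = ToyHeap()
conn = heap.malloc(16)                # stands in for a connection object
heap.write(conn, b"callback=safe")
heap.free(conn)                       # freed, but the handle is kept: the bug

spray = heap.malloc(16)               # attacker requests the same size
heap.write(spray, b"callback=evil")

# The stale handle now reads attacker-controlled bytes.
print(heap.read(conn))
```

The same-size requirement is why the attacker needs to know how big the freed object was: a request of a different size would land in a different slot and leave the stale data alone.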

The nmap IPv6 overflow is the most unsettling one. It's in nmap's parser code. Someone analyzed it:

"If this can achieve ACE, there'd be a certain irony being able to reverse shell anyone doing an nmap scan. This is the kind of bug an intelligence agency would love to have — add a few IPv6 packets and get access to any researcher's PC."

nmap users are almost exclusively security professionals and sysadmins. Small attack surface, but extremely high value. It's a trap set specifically for hunters.
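For readers wondering what a "length overflow" in an IPv6 extension header looks like, here is a minimal sketch of a parser that refuses to trust the declared length. It follows the RFC 8200 layout, where the second byte of a hop-by-hop, routing, or destination options header gives the length in 8-byte units, not counting the first 8 bytes. This is not nmap's code, and the function name is an assumption made for illustration.

```python
EXT_HEADERS = {0, 43, 60}   # hop-by-hop, routing, destination options (RFC 8200)
FRAGMENT = 44               # fragment header: always 8 bytes


def walk_extension_headers(first_next_header: int, payload: bytes):
    """Return (upper_layer_protocol, offset) after skipping extension headers.

    Raises ValueError if a header claims more bytes than the packet contains.
    """
    next_header = first_next_header
    offset = 0
    while next_header in EXT_HEADERS or next_header == FRAGMENT:
        if offset + 8 > len(payload):
            raise ValueError("truncated extension header")
        following = payload[offset]
        if next_header == FRAGMENT:
            length = 8
        else:
            length = (payload[offset + 1] + 1) * 8
        # The check a vulnerable parser would skip: never step past the buffer.
        if offset + length > len(payload):
            raise ValueError("extension header length exceeds packet")
        next_header = following
        offset += length
    return next_header, offset


# One hop-by-hop header (next header = 6 for TCP, length field 0), then TCP bytes.
good = bytes([6, 0]) + bytes(6) + b"TCPDATA"
print(walk_extension_headers(0, good))

# Same header, but the length field claims 2048 bytes in an 8-byte packet.
bad = bytes([6, 255]) + bytes(6)
try:
    walk_extension_headers(0, bad)
except ValueError as error:
    print("rejected:", error)
```

In C, leaving out that bounds check does not raise an exception. The parser just keeps reading or writing past the end of the buffer, which is where the danger comes from.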

The FFmpeg RASC/DLTA vulnerability was also confirmed by multiple independent verifiers. FFmpeg's parser code spans 20+ years and handles an absurd number of video formats. As one commenter put it: "Video parsing and decoding should be sandboxed. Writing secure code for these formats in C is really hard. This won't be the last of its kind."

The PHP 8.5.7 StreamBucket SOAP RCE is worth noting too. PHP is still one of the most widely used server-side languages, and SOAP modules, while old, are heavily deployed in enterprise environments.

The obvious padding

The Ghidra exploits got roasted the hardest. The first "vulnerability" requires being able to overwrite binaries in the Swift tool directory. If you can already overwrite binaries on someone's machine, that's not a vulnerability. It's like saying "I found a security flaw in this door lock: if the burglar already has the key, they can open it."

The Docker cp one was called "a weird bug, not a vulnerability, and certainly not a zero-day." The VLC VP9 crash — "VLC crashes all the time with weird codecs, what else is new." Though someone pushed back: "If VLC crashed on my computer, I'd immediately unplug it and thoughtfully consider the circumstances under which it would be safe to turn it back on."

Roughly half the PoCs have real technical substance. The other half are more "demonstrating code reachability" or "triggering a crash." But even the low-quality ones aren't entirely useless — they expose potentially problematic code areas that project maintainers should look at.

The Big Deal: GPT-5.5 Did the Fuzzing

This is the part that made me sit up straight.

The author clarifies in the README:

"In regard to AI usage, my fuzzing workflow was automated by AI with a strict harness. I used GPT-5.5-3-Codex-Spark for ALL the fuzzing, as barely any 'thought' is necessary when provided with an efficient harness. Contrary to the growing narrative that I'm just some random child burning tokens, I DO actually have a degree in the subject and have published multiple papers on fuzzing methodology. You do NOT need a SOTA model to help you identify these issues, I promise!"

Let me unpack this.

They didn't use AI to write the exploits. They explicitly state all PoCs were hand-typed (except RustDesk, where they're less familiar with the language). PoC writing requires precise understanding of vulnerability mechanics — not something AI can reliably do yet.

They used AI for the automated fuzzing part. Fuzzing is inherently repetitive: generate inputs, feed them to the target, watch for crashes, record trigger conditions. Traditional fuzzing tools (AFL, libFuzzer) run the same loop, but someone still has to write the harness and babysit the campaign by hand. By their own account, this person supplied a strict harness and let AI drive the fuzzing runs inside it.
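The loop described above (generate, feed, watch, record) fits in a few lines. This is a minimal sketch with a deliberately buggy toy parser, not the author's workflow and not how AFL or libFuzzer are implemented. All names are hypothetical, and an `IndexError` stands in for the memory-safety crash a C target would produce.

```python
import random


def parse_record(data: bytes) -> bytes:
    """Toy target: the first byte declares the payload length.

    Deliberately buggy: it trusts that length without checking the input size.
    """
    if not data:
        raise ValueError("empty input")   # a handled, expected rejection
    declared = data[0]
    payload = data[1:]
    # Indexing past the end raises IndexError, our stand-in for a crash.
    return bytes([payload[i] for i in range(declared)])


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Replace one randomly chosen byte of the seed with a random value."""
    data = bytearray(seed)
    position = rng.randrange(len(data))
    data[position] = rng.randrange(256)
    return bytes(data)


def fuzz(seed: bytes, iterations: int, rng: random.Random) -> list:
    """Generate inputs, feed the target, watch for crashes, record the triggers."""
    crashes = []
    for _ in range(iterations):
        candidate = mutate(seed, rng)
        try:
            parse_record(candidate)
        except ValueError:
            continue                      # graceful rejection, not a finding
        except IndexError:
            crashes.append(candidate)     # unexpected failure: keep the input
    return crashes


crashing_inputs = fuzz(b"\x03abc", iterations=2000, rng=random.Random(0))
print(len(crashing_inputs), "crashing inputs recorded")
```

Every recorded input is only a crash trigger. Deciding whether it is a real vulnerability is the manual analysis step the article keeps coming back to.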

The key insight is "when provided with an efficient harness." The harness is the soul of fuzzing — it determines which direction the AI explores. A good harness narrows the search space from "infinite" to "promising regions." This researcher's years of expertise in harness design is the real competitive advantage. AI just accelerated the execution.
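A small experiment shows what "narrowing the search space" means in practice. In this sketch, a toy target only runs its deeper parsing code when the input has the right magic bytes and a matching CRC. Raw random bytes almost never get that far, while a harness that adds valid framing gets every input through. The container format and all names are invented for illustration.

```python
import random
import zlib

MAGIC = b"FZ01"   # hypothetical container magic, invented for this example


def deep_parser_reached(packet: bytes) -> bool:
    """Toy target: only inputs with valid magic and CRC reach the deep parser."""
    if len(packet) < 8 or packet[:4] != MAGIC:
        return False
    declared_crc = int.from_bytes(packet[4:8], "big")
    return zlib.crc32(packet[8:]) == declared_crc


def harness(fuzz_bytes: bytes) -> bytes:
    """Wrap raw fuzz bytes in valid framing so mutations land on the body."""
    return MAGIC + zlib.crc32(fuzz_bytes).to_bytes(4, "big") + fuzz_bytes


rng = random.Random(0)
samples = [bytes(rng.getrandbits(8) for _ in range(32)) for _ in range(1000)]

naive_hits = sum(deep_parser_reached(sample) for sample in samples)
harnessed_hits = sum(deep_parser_reached(harness(sample)) for sample in samples)
print("raw inputs reaching the deep parser:", naive_hits)
print("harnessed inputs reaching the deep parser:", harnessed_hits)
```

Knowing that the framing exists and how to satisfy it is the human expertise. The fuzzing run itself is the cheap part.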

The bottom line: AI is lowering the barrier to vulnerability discovery, but not where you'd think. It's not that AI finds more vulnerabilities — it's that AI automates the boring parts of fuzzing, letting humans focus on designing harnesses and analyzing results. Experienced security researcher + AI automation = multiplied efficiency. Inexperienced person + AI = lots of false positives and low-quality reports.

HN users shared similar experiences:

"I also have a library of bugs I found using Claude Opus 4.8 through the Customer Verification Program. Undisclosed. Most of them are very specific scenario DoS bugs, buffer overflows that will get caught by ASLR and whatnot."

"LLMs are fantastic disassembly partners. The net losses from losing the benefits of open source outweigh the protection afforded by hiding your source code in yet another layer that is more and more easily unrolled through automated procedures."

The Security Community Is Losing Its Mind

The HN comments section was more interesting than the vulnerabilities themselves. Basically every classic security debate got reheated.

For full disclosure

Some argued the repo democratizes information. These vulnerabilities used to be known only to nation-state hackers. Now everyone can see them. Open source projects can self-audit and patch.

"Disclosures always enable more secure software to theoretically exist, even if nobody follows through creating it. They often do."

"Each vulnerability individually might not mean much, but put them all in one place and it becomes easier to pick up pieces and try them together. Now we have automatic puzzle solvers (coding assistants), a repo like this becomes a lot more meaningful."

A security researcher also pointed out the flaws in responsible disclosure itself:

"90 days in the responsible disclosure window, you find nobody cares about your report, and 13 other people submitted the same thing before you. Maybe better that we just know, so we can run code we can trust sooner."

Against full disclosure

A blue team security researcher wrote a long response:

"If I'd still been a skiddy, I would have believed this. But now I see this for the transparent 'I'm angry and want to hurt others so I will feel a little less alone' it actually is. As someone who's joined the blue side, we'd appreciate it if you gave us some kind of heads up. The bad guys generally have a lot more time to scroll for new payloads than I do."

But someone else pushed back:

"Please name the 'victims.' If you report vulnerabilities, some companies treat you as a criminal. Even if you report anonymously, they might ignore you."

This is reality — the incentive structure for responsible disclosure is broken. White hat hackers risk legal trouble to report vulnerabilities and might get nothing but a "thanks" — or worse.

Do security teams even matter?

A sysadmin's rant resonated with many:

"Security teams are a bunch of busybodies with nothing to do. Pay for a competent admin team and the security dept is completely redundant."

A security engineer countered:

"You're absolutely correct! If you have a competent admin team, you don't need a dedicated security team. Unfortunately, as I live in the real world, where most people are incompetent, it does help to have a dedicated security team."

This is basically the daily internal conflict of the security industry. Dev teams think security is a roadblock. Security teams think dev teams are ticking time bombs. Both have a point. Both are frustrated.

AI Is Shifting the Balance in Security

Several trends are worth taking seriously.

Automated vulnerability discovery

A security researcher used to test hundreds of input combinations manually per day. Now AI with a good harness can run millions. That's not hypothetical: the stronger findings in this repo are evidence of it.

But there's a crucial distinction: AI accelerates "search," not "discovery." It's like giving someone a sports car — they can reach their destination faster, but they still need to know where they're going. This researcher's core competency was knowing which formats, parsers, and boundary conditions to fuzz. AI just helped verify those hypotheses faster.

The defender's dilemma

Defenders are inherently slower than attackers. The time from vulnerability discovery to patch is measured in days or weeks. The time from public disclosure to weaponization can be hours. When someone drops 20+ vulnerabilities at once, that time gap becomes a massive security window.

Worse, many open source projects have no dedicated security team. FFmpeg's maintainers might be a handful of volunteers with day jobs. Teams like that can't be expected to triage and fix a surprise batch of public reports in days.

Historical parallels

Mass vulnerability disclosures aren't new, but the scale and method are.

In 2017, The Shadow Brokers leaked NSA hacking tools including EternalBlue. That got used in WannaCry ransomware, hitting 200,000+ computers across 150+ countries. The fallout lasted years.

In 2021, someone posted multiple Apache project vulnerabilities on the Full Disclosure mailing list without notifying maintainers. Similar debates ensued.

But those were one-offs. What makes bikini different is the production method: AI-assisted discovery, batch organization, batch disclosure. This is the artisanal workshop turning into an assembly line.

If this becomes a trend, the open source community doesn't need better responsible disclosure processes — it needs entirely new security response infrastructure.

What This Means for Developers

A few practical takeaways.

Don't rush to git clone this repo. Someone on HN warned: "Things that are too good to be real are honeypots and something there will compromise your machine or make your LLM start working for someone else." PoC code itself can be an attack vector. Security researchers should analyze in isolated environments.
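If you do want to poke at untrusted PoC code, one common starting point is a locked-down container. The sketch below only builds a `docker run` command line and does not execute it. The flags are standard Docker options, while the image name, paths, and entrypoint are placeholders you would replace with your own.

```python
import shlex


def sandbox_command(poc_dir: str, image: str, entrypoint: list) -> list:
    """Build (but do not run) a `docker run` invocation with tight isolation."""
    return [
        "docker", "run", "--rm",
        "--network", "none",                     # no network access for the PoC
        "--read-only",                           # root filesystem is read-only
        "--cap-drop", "ALL",                     # drop all Linux capabilities
        "--security-opt", "no-new-privileges",   # block setuid-style escalation
        "--pids-limit", "128",                   # cap the number of processes
        "--memory", "512m",                      # cap memory use
        "-v", f"{poc_dir}:/poc:ro",              # mount PoC files read-only
        image, *entrypoint,
    ]


# Placeholder values: substitute your own directory, image, and command.
command = sandbox_command("/tmp/poc", "analysis-image", ["python3", "/poc/run.py"])
print(shlex.join(command))
```

Keep in mind that the dump itself includes a claimed Docker path escape, so a container is a convenience, not a guarantee. A throwaway virtual machine with no credentials on it is the more conservative choice.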

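Audit your dependencies. Use FFmpeg? nmap? A project that depends on libssh2? Your own code might be fine, but every link in your dependency chain could be a ticking bomb. Run npm audit, pip-audit, or cargo audit for language-level packages, and remember that C libraries like FFmpeg and libssh2 usually arrive through your OS package manager or a vendored copy, which those tools may not cover.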

The basics still matter. Keep dependencies updated. Minimum privileges. Input validation. Sandbox isolation. Half the vulns in this list could have their impact drastically reduced with these boring fundamentals.

AI-assisted fuzzing works. If you're a security researcher, this event proves it. If you're a project maintainer, start thinking about how to handle AI-accelerated vulnerability discovery — faster response workflows, better automated testing, tighter sandboxing.

FAQ

Q: Are these vulnerabilities still exploitable right now?

A: As of this writing, most haven't been patched. c-ares, libssh2, and FFmpeg vulns were confirmed to trigger on latest code. But "triggerable" and "exploitable" are very different things — going from memory corruption to actual RCE requires bypassing ASLR, DEP, and other mitigations.

Q: Should I upgrade my software immediately?

A: If you're on the latest versions, there are no patches to upgrade to yet. Watch security advisories from these projects and update as soon as patches drop. Add extra access controls for network-exposed services in the meantime.

Q: Will AI fuzzing find more vulnerabilities?

A: Short term, yes. AI automates the tedious parts of fuzzing. Long term, it's a dynamic equilibrium — AI-found vulns will also be AI-patched.

Wrapping Up

This whole thing reminds me of my earlier piece about 10,000 malicious repos on GitHub. The trust model of open source is being redefined by AI — attackers use AI to find vulns faster, defenders use AI to patch faster, and in between sit millions of small projects with no time to worry about security.

That anonymous bikini account said they wanted to "allure people into the field." Whatever their real motivation, it worked. Today, developers around the world are taking a careful look at what's hiding in the C libraries they depend on.

As for whether those vulnerabilities are real, how severe they are — that'll take time to verify. But one thing is certain: AI is turning vulnerability discovery from "a domain for a handful of experts" into "a game anyone with patience and compute can play."

Pandora's box is open. Are we ready? Probably not. But at least now we know.

I plan to keep following this repo's developments. If GitHub takes it down or projects formally respond, I'll write a follow-up. Thoughts? Drop them in the comments.

Written June 28, 2026. The GitHub repo bikini/exploitarium was accessible at time of publication but may be removed or modified at any time.
