
After Mythos, AI Code Review Is Table Stakes

A code review console showing AI analysis, human approval, and security signals converging before release.

"Security is a process, not a product."

-- Bruce Schneier, Secrets and Lies, 2000

The first wave of Claude Mythos coverage has had all the ingredients of a technology panic: a model too dangerous to release, thousands of zero-day vulnerabilities, every major operating system and browser, government warnings, restricted access, Project Glasswing, and then reports that unauthorized users had gotten access anyway.

If you missed the announcement, Claude Mythos Preview is Anthropic's unreleased frontier model for advanced coding and cybersecurity work. Anthropic's claim is unusually stark: the model's ability to find and exploit software vulnerabilities is powerful enough that the company does not consider it safe to release generally. Instead, Anthropic is giving restricted access to selected companies, infrastructure organizations, and open-source security groups through Project Glasswing, an effort to use the model defensively before similar capabilities spread more widely.

A cost curve showing vulnerability discovery getting cheaper and faster for both attackers and defenders.
Mythos matters because it changes the economics of vulnerability discovery, not because it makes software risk appear from nowhere.

That is a lot of noise. Some of it matters. Some of it is marketing. Some of it is the ordinary technology press doing what the ordinary technology press does whenever a new frontier model arrives: turning uncertainty into spectacle.

But business owners and engineering leaders do not need the spectacle. They need the operating lesson underneath it. And the lesson is not that Mythos is magic. It is not that every attacker suddenly became an elite exploit developer overnight. It is not that you should stop shipping software until the sky clears.

The lesson is simpler and more uncomfortable: the cost of finding software weaknesses just dropped. If the cost for attackers drops, the cost for defenders has to drop faster. That means AI-assisted code review is no longer an experimental extra. For serious software teams, it is becoming table stakes.

What Mythos actually is

Claude Mythos Preview is Anthropic's restricted frontier model with unusually strong coding, reasoning, and cybersecurity capabilities. Anthropic introduced it through Project Glasswing, a defensive security initiative involving major technology companies and critical-infrastructure organizations. According to Anthropic, Mythos has already found thousands of high-severity zero-day vulnerabilities, including some in every major operating system and web browser.

Those are vendor claims, and vendor claims deserve adult supervision. Anthropic has every incentive to make Mythos sound important. Reporters have every incentive to make Mythos sound dramatic. Competitors and skeptics have every incentive to make the claims sound overblown.

So do not build your business plan around the most frightening version of the story. Build it around the part that keeps surviving contact with the evidence: frontier models are getting much better at reading code, tracing behavior across files, identifying security flaws, and producing reproduction steps or patches. Whether Mythos itself is the exact model that changes your business is almost beside the point. Mythos is a signal that the capability class has arrived.

Anthropic's own technical assessment is useful because it is not just "the model is very smart." The report describes a practical workflow: put the model in an isolated environment with the target code, ask it to find a vulnerability, let it inspect, run, debug, and test, then have another pass validate whether the report is real and important. That is not science fiction. That is the agentic exploit loop, and it can be pointed at your code by defenders or attackers.

A malicious agent's exploit loop moving from inspect to hypothesize, run, validate, and report.
A malicious agent iterates this loop against your code: inspect the target, hypothesize a weakness, run the test, validate the finding, and report something usable.
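That loop is simple enough to sketch. The following is a loose, hypothetical illustration of its shape, not any real tool: `suggest`, `test`, and `validate` stand in for model calls and sandboxed execution, and the field names are made up.

```python
# Hypothetical sketch of the agentic review loop described above.
# `suggest`, `test`, and `validate` stand in for model calls and
# sandboxed execution; none of these are real APIs.

def review_loop(target_files, suggest, test, validate, max_iterations=5):
    """Iterate: inspect -> hypothesize -> run -> validate -> report."""
    findings = []
    for _ in range(max_iterations):
        hypothesis = suggest(target_files)           # hypothesize a weakness
        if hypothesis is None:
            break                                    # nothing left to try
        result = test(hypothesis)                    # run it in isolation
        if result and validate(hypothesis, result):  # second pass: is it real?
            findings.append({"hypothesis": hypothesis, "evidence": result})
    return findings
```

The structure is the point: the expensive part is no longer any single step but how many times the whole loop can run unattended.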

And once that loop exists, the business question changes. It is no longer "Can a human expert find this bug?" The answer has always been yes, if you had enough of the right expert's time. The question is now: How many times can this review loop run, against how much code, at what cost, before someone else runs it against you?

The bugs were already there

This is the part that tends to get lost. Mythos did not make software vulnerable. The vulnerabilities were already there.

Anthropic says Mythos found a 27-year-old OpenBSD bug in TCP selective acknowledgements, a 16-year-old FFmpeg H.264 issue, and a FreeBSD NFS remote code execution vulnerability that allowed unauthenticated root access. It also describes Linux privilege-escalation chains, browser exploit chains, cryptography-library weaknesses, and web-application logic bugs.

If you are a business owner, the exact exploit mechanics are not the main point. You do not need to personally understand every buffer boundary, race condition, ROP chain, or JIT heap spray. Your useful takeaway is that these were not necessarily sloppy junior-developer mistakes hiding in toy code. These were bugs in mature, widely used, heavily reviewed systems. Some had survived years of human work and automated testing.

That should make everybody a little more humble about their own codebase.

The old comfort was that certain classes of vulnerability were expensive to find. An attacker needed specialized knowledge, patience, infrastructure, and time. That did not make the vulnerability safe, exactly, but it did create friction. It raised the price of attack.

AI-assisted vulnerability discovery attacks that friction directly. It does not need to be perfect to matter. If a model can search more files, try more hypotheses, validate more findings, and generate better reports than yesterday's tools, then the backlog of latent vulnerabilities becomes more discoverable. The risk did not appear overnight. The timeline changed.

That is what Mythos means for your business. Not "everything is broken now." Everything was already more broken than you wanted to admit. Now more people will be able to find out.

The Firefox example is the useful part

The most useful Mythos story is not the scary one. It is Firefox.

Mozilla told WIRED that Firefox 150 shipped with protections for 271 vulnerabilities identified using early access to Mythos Preview. That is the version of the story worth paying attention to because it shows the defender's path. Mozilla did not wait for attackers to operationalize the capability. It used the capability to harden its own software before release.

A release pipeline showing AI-discovered vulnerabilities moving through triage, fixes, verification, and a safer Firefox release.
The Firefox lesson is not panic. It is defensive sequencing: find the issues, triage them, patch them, verify the fix, and ship safer software.

That is the non-hysterical version of the future: AI finds the issues, humans triage them, engineers patch them, and safer releases ship.

Notice the order. The model did not become the accountable engineer. It did not get a blank check to rewrite the browser and push to production. It generated security findings at a scale that forced Mozilla to adjust its process. The value came from the whole system: model-assisted review, engineering discipline, triage capacity, patch validation, and release management.

That matters because a lot of businesses will misunderstand this moment in one of two directions. Some will dismiss it as hype because they do not like vendor drama. Others will buy an AI security product and imagine the problem is solved because a dashboard exists.

Both reactions miss the point. AI review is not a sticker. It is a new source of security work. If you run it seriously, it will find things. Then someone has to decide what matters, fix the right issues, verify the fix, and change the development process so the same class of issue does not keep recurring.

The win is not "we scanned the code." The win is "we hardened the software."

What this means if you build software

If your business builds software, AI-assisted security review needs to become a normal part of the software delivery lifecycle. Not as a special audit before a big launch. Not as an annual compliance ritual. Not as something a security consultant does after the architecture has already hardened around bad assumptions.

It belongs near the work.

That means meaningful pull requests and release candidates should be reviewed by an AI system that is specifically looking for security issues, not just style nits or autocomplete suggestions. It means high-risk areas of the codebase should receive deeper periodic scans. It means the findings should land in the same operational universe as your tests, your issue tracker, your code review process, and your release gates.

This does not replace human code review. It makes human code review more important. A human reviewer is still responsible for architecture, intent, tradeoffs, user impact, and final approval. But the human reviewer should not be the only entity trying to notice whether a subtle access-control path can be bypassed, whether untrusted input crosses a boundary incorrectly, or whether a patch quietly opens a new privilege path.

The old review process assumed that human attention was the scarce detection layer. That assumption is aging badly. Human attention should move up the stack: judge the finding, assess exploitability, decide priority, approve the fix, and improve the system. Let the model grind through the code paths that humans do not have enough hours to inspect line by line.
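One way to make "human attention moves up the stack" concrete is a release gate that refuses to ship while high-severity model findings lack human judgment. This is a minimal sketch under invented assumptions; the field names (`severity`, `triaged_by`, `fix_approved`) are illustrative, not any particular tool's schema.

```python
# Hypothetical sketch of a release gate over AI-generated findings.
# Field names ("severity", "triaged_by", "fix_approved") are
# illustrative, not a real tool's schema.

BLOCKING = {"critical", "high"}

def release_gate(findings):
    """Block release until every high-severity finding has human judgment."""
    blockers = [
        f for f in findings
        if f["severity"] in BLOCKING
        and not (f.get("triaged_by") and f.get("fix_approved"))
    ]
    return {"ship": not blockers, "blockers": blockers}
```

The design choice worth copying is that the gate checks for recorded human decisions, not for the absence of findings: a scan that finds nothing and a scan nobody triaged should not look the same.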

What this means if you buy software

Most businesses do not only build software. They buy it, rent it, integrate it, customize it, and depend on it. That means Mythos is not just an engineering story. It is a vendor-risk story.

A business system map showing internal code, vendors, open-source dependencies, and customer data as one connected risk surface.
The business risk surface includes your code, your vendors, your dependencies, and the customer data that flows through all of them.

If a vendor handles your customer data, payment flow, operational workflow, employee credentials, documents, or production systems, you should start asking different questions. Not "Do you use AI?" Everybody will say yes soon enough. Ask the operational questions:

  • Do you run AI-assisted security review on material code changes?
  • Do you scan the whole codebase on a recurring basis?
  • How are AI-discovered findings triaged?
  • Who approves model-suggested patches?
  • What is your average time to remediate high-severity findings?
  • Which existing controls does AI review complement: SAST, dependency scanning, fuzzing, tests, secrets scanning, penetration testing?
  • How do you prevent the model from seeing data it should not see?
  • What evidence can you show that the process actually changes shipped software?

This is where hype becomes easy to cut through. A vendor with a real process can answer those questions. A vendor with a marketing badge will talk about innovation until the meeting ends.

For business owners, that distinction matters. Your exposure does not stop at your own repository. Your exposure includes the systems you trust, the vendors you connect, and the open-source packages underneath both.

Open source becomes part of the business conversation

One of the sharpest implications is for open source. Public code can be scanned by defenders, but it can also be scanned by attackers. And much of the software modern businesses depend on is maintained by small teams, underfunded foundations, or individual volunteers with impossible responsibility and limited time.

This is not new. The industry has been building billion-dollar companies on top of underfunded open-source infrastructure for decades. What Mythos changes is the discoverability of the cracks.

If your product depends on open-source libraries, frameworks, runtimes, containers, codecs, cryptography packages, or operating-system components, then those projects are not abstract community goods. They are part of your production risk surface. When AI makes it easier to find vulnerabilities in those projects, your business inherits both sides of the change. You benefit when maintainers and responsible vendors find and patch issues. You suffer when attackers find them first, or when maintainers drown in reports they cannot triage.

So the mature response is not to panic about open source. It is to treat dependency governance as real governance. Know what you depend on. Patch quickly. Monitor advisories. Fund critical dependencies where appropriate. Prefer maintained packages over abandoned ones. Use AI review internally, but do not forget that your internal code is only one layer of the system you actually ship.
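"Know what you depend on" can start very small. The sketch below cross-checks pinned dependencies against an advisory map; in practice you would pull advisories from a feed or run a dependency scanner, and the package names and advisory IDs here are entirely made up.

```python
# Hypothetical sketch: cross-check pinned dependencies against a local
# advisory map. Real setups would consume an advisory feed or run a
# dependency scanner; the data shapes here are invented.

def affected_dependencies(pinned, advisories):
    """Return (name, version, advisory_id) for every pinned dependency
    whose exact version appears in a known advisory."""
    hits = []
    for name, version in pinned.items():
        for bad_version, advisory_id in advisories.get(name, []):
            if version == bad_version:
                hits.append((name, version, advisory_id))
    return hits
```

Even this toy version makes the governance point: you cannot patch quickly, or judge an advisory's relevance, without a current inventory of what you actually run.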

What not to do

Do not buy the scariest headline and freeze. Fear is not a security program.

Do not dismiss the whole thing because some of the coverage is theatrical. Hype can surround a real shift. In fact, it usually does.

Do not assume that an "AI security" claim from a vendor means anything by itself. The question is not whether a model was involved. The question is what happens after the model finds something.

Do not replace human review with model output. That is not modernization. That is outsourcing judgment to a system that cannot own the consequences.

Do not treat AI review as separate from the rest of engineering. If the findings live in a side dashboard nobody checks, you have added a ritual, not a control.

And do not wait for attackers, regulators, insurers, or customers to teach the lesson for you. That class is expensive.

The new baseline

The new baseline is not exotic. It is the same engineering discipline good teams already wanted, with AI added as another review layer.

A practical version looks like this:

  • AI-assisted security review on meaningful pull requests and release candidates.
  • Periodic whole-codebase scans, especially for high-risk modules and internet-facing systems.
  • Human approval for every model-suggested patch.
  • Existing controls kept in place: tests, static analysis, dependency scanning, fuzzing, secrets scanning, code ownership, and normal human review.
  • Findings tracked like real work, with severity, owner, remediation date, and verification.
  • Metrics watched over time: patch latency, repeat vulnerability classes, false positives, high-risk modules, and escaped defects.
  • Vendor reviews updated to ask how suppliers scan, triage, approve, and remediate AI-discovered vulnerabilities.
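The metrics in that list only matter if someone computes them. As one illustration, here is a sketch of a single baseline metric, remediation latency for high-severity findings; the record fields (`severity`, `found`, `fixed`) are assumptions, not a standard.

```python
# Hypothetical sketch of one baseline metric: median days from
# discovery to verified fix, counting high-severity findings only.
# Field names ("severity", "found", "fixed") are illustrative.
from datetime import date

def median_patch_latency_days(findings):
    """Median remediation latency in days for fixed high-severity findings."""
    latencies = sorted(
        (f["fixed"] - f["found"]).days
        for f in findings
        if f["severity"] == "high" and f.get("fixed")
    )
    if not latencies:
        return None  # nothing fixed yet: that is itself a finding
    mid = len(latencies) // 2
    if len(latencies) % 2:
        return latencies[mid]
    return (latencies[mid - 1] + latencies[mid]) / 2
```

Watching this number over time tells you whether AI review is actually changing shipped software or just generating tickets.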

None of that requires believing every dramatic Mythos claim at face value. It only requires noticing the direction of travel. Code review is becoming more automated. Vulnerability discovery is becoming cheaper. Attackers will use these tools. Defenders have to use them first and use them better.

That is the business implication.

Need help making the shift?

Mythos is not a reason to panic. It is a reason to modernize.

If your team ships software, depends on custom software, or is trying to understand what AI-assisted security review should look like inside a real business, contact us. We can help you build the review gates, governance, triage workflows, and remediation practices that turn AI from a headline into a practical control.

The hype will move on. The economics will not. Once vulnerabilities become cheaper to find, the businesses that win are the ones that make them cheaper to fix.
