Between March 31 and April 10, 2026, three publicly confirmed events reset the operating assumptions for small and mid-sized business cybersecurity. Anthropic accidentally published the complete source code of its Claude Code agent harness (March 31, confirmed April 1). Anthropic announced its frontier offensive-cyber-capable Mythos research model (April 7). The U.S. Treasury Secretary and Federal Reserve Chair convened the CEOs of five major banks to discuss systemic risk from Mythos-class AI capabilities (April 7 — the same day as the Mythos announcement).
Industry analysis to date has focused on the second and third events. We argue that the first event — the Claude Code source leak — is the more important development for the small-business defensive picture. The harness leak does not reveal a new model capability. It does something more consequential: it materially reduces the engineering work that previously gated autonomous offensive AI from commodity threat-actor populations. Combined with mature open-weight models from Chinese labs and the publicly documented "abliteration" technique for removing safety constraints from those models, the leak places the architectural building blocks for autonomous offensive AI within reach of any patient, moderately skilled operator with consumer-grade hardware.
Zoltis Technologies conducted a preliminary laboratory reproduction in which the leaked harness loaded successfully, was wired to a modified open-weight model, and completed basic autonomous tasks in a controlled environment. These observations support the claim that the harness is portable beyond Anthropic's own services, but they do not establish real-world attacker adoption, offensive efficacy, or operational equivalence to threat-actor tooling. A staged measurement framework for follow-up evaluation is described in Section 12 and will be executed in a benchmarked v2.1 companion paper.
This paper documents the threat-model implications of the harness leak for SMB defenders, the historical defense assumptions that look increasingly unreliable, the recommended near-term defensive posture, and our calls to action for the broader security industry. We deliberately exclude operational details that would enable offensive use of the techniques described. This paper is a defender's analysis written for defenders.
1. Executive Summary
We argue that the threat picture for SMB cybersecurity changed on March 31, 2026, not on April 7. The Mythos disclosure put the AI cyber threat on the regulatory radar. The Claude Code harness leak is what actually changes the day-to-day adversary that small and mid-sized businesses face. This is a Zoltis position, not an established public fact, and the rest of this paper develops it as an argument backed by public reporting and preliminary laboratory observation.
Five claims, each defended in the body of the paper:
1. The leaked Claude Code harness is the agentic scaffolding that turns a frontier-tier language model into an autonomous agent capable of multi-step engineering tasks. It comprises approximately 512,000 lines of TypeScript across 1,906 files and includes a ~46,000-line query engine, ~40 permission-gated tools, the system prompts, multi-agent orchestration patterns, and three-layer memory management. Anthropic has publicly confirmed the leak. (Section 3.)
2. In our tested configuration, the harness did not require Anthropic-hosted services and could be adapted to a different model endpoint with limited changes. This is a narrow observation about what we ran in our lab; it is not a universal claim that the harness is "model-agnostic" against arbitrary model families. (Section 6.)
3. A publicly available technique called "abliteration" surgically removes safety alignment from open-weight language models without retraining. Documented by NousResearch and others, the technique is mature, the toolkit (OBLITERATUS) supports 116+ models, and benchmark studies show that safety refusals can be largely eliminated with limited capability degradation on the tested configurations. (Section 4.)
4. The Zoltis lab conducted a preliminary in-house reproduction. Using a modified open-weight model wired to the leaked harness, we verified that the harness could be configured to run outside Anthropic's native environment and could complete basic autonomous tasks in a controlled lab setting. We did not benchmark the system against the commercial product, against threat-actor tooling, or against realistic offensive task suites. (Section 6.)
5. The defensive implications for SMBs are material and time-bounded. Several historical defense assumptions — that advanced offensive AI tooling is gated by cost, by access controls, by safety alignment, or by specialized expertise — are increasingly unreliable as planning assumptions. The window in which "we'll address AI threats next year" is a defensible posture is closing. (Sections 7–9.)
The recommendations in Section 9 are not novel. They are the same fundamentals — multi-factor authentication, identity threat detection, endpoint detection and response, network segmentation, end-of-life retirement, email defense in depth, immutable backup, exposure surface minimization — that competent MSPs have been recommending for years. What we argue has changed is the deadline by which they must be in place. A compressed schedule on a familiar list of work is the actionable takeaway from this paper.
2. Background — the three events of April 2026
2.1 The Claude Code source disclosure (March 31, confirmed April 1)
On March 31, 2026, at approximately 00:21 UTC, Anthropic published version 2.1.88 of its @anthropic-ai/claude-code npm package containing a 59.8 MB JavaScript source map file, cli.js.map. JavaScript source maps are debugging artifacts that map minified production code back to the original, unminified source — they are routinely included in development builds and routinely excluded from production releases by adding *.map to the .npmignore file. In this case, the exclusion was missing.
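This class of packaging error is easy to gate mechanically: a pre-publish check can scan the built tarball for debugging artifacts before anything reaches the registry. A minimal Python sketch of such a check (the helper name and toy tarball are ours for illustration, not Anthropic's actual release tooling):

```python
import io
import tarfile

def find_source_maps(tarball_bytes: bytes) -> list:
    """Return the names of any .map files inside an npm package tarball.

    Source maps are debugging artifacts; finding one in a production
    tarball is exactly the release-gating failure described above.
    """
    with tarfile.open(fileobj=io.BytesIO(tarball_bytes), mode="r:gz") as tar:
        return [m.name for m in tar.getmembers()
                if m.isfile() and m.name.endswith(".map")]

# Build a toy tarball containing a minified bundle and its source map.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    for name, payload in [("package/cli.js", b"// minified"),
                          ("package/cli.js.map", b"{}")]:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

print(find_source_maps(buf.getvalue()))  # → ['package/cli.js.map']
```

A check of this shape in the publish pipeline fails the release before the artifact ever reaches the registry, rather than relying on a `.npmignore` entry being remembered.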
Security researcher Chaofan Shou discovered the exposure and publicly announced it on X at approximately 04:23 ET. Anthropic pulled version 2.1.88 from the npm registry at approximately 03:29 UTC, after about three hours of public exposure. By the time the package was pulled, the codebase had been mirrored to public GitHub repositories by independent researchers.
Anthropic confirmed the incident publicly on April 1, with a statement that read in full: "No sensitive customer data or credentials were involved or exposed. This was a release packaging issue caused by human error, not a security breach. We're rolling out measures to prevent this from happening again." The company issued DMCA takedown requests against GitHub mirrors, designated the Native Installer (a standalone binary independent of the npm dependency chain) as the recommended installation path going forward, and — according to subsequent reporting in Fortune — acknowledged this was the second source-map packaging incident in days.
The leak's contents are detailed in Section 3 below. Notably, the leak did not include model weights, training data, or any cryptographic secrets.
2.2 The Mythos Preview release (April 7)
Six days after the source leak, Anthropic announced Claude Mythos Preview through its red.anthropic.com red-team disclosure site. The Mythos card disclosed that the model autonomously discovers zero-day vulnerabilities across every major operating system and every major web browser, generates working exploit chains, and reproduces N-day exploits in roughly half a day for under $2,000 each. Quantitative claims included a ~90× uplift in successful Firefox exploitation versus the previous-generation Claude Opus 4.6, a 27-year-old subtle OpenBSD vulnerability discovered in routine analysis, and 89% validator agreement with human reviewers on severity ratings across 198 manually reviewed reports.
Mythos was released to a small partner cohort under "Project Glasswing" — JPMorgan Chase, Apple, Google, Microsoft, and Nvidia, per subsequent reporting. Anthropic explicitly stated it does not intend to make Mythos generally available, but plans to "enable users to safely deploy Mythos-class models at scale" once "cybersecurity safeguards that detect and block the model's most dangerous outputs" mature. No timeline was given.
The leaked Claude Code source — published seven days earlier — included references to internal codenames identified by the security research community as belonging to Mythos (specifically "Capybara") as well as forthcoming Opus 4.7 and Sonnet 4.8 models and a planned "Buddy/companion system" rollout window of April 1–7. The temporal proximity has prompted some analysts to ask whether the sequence of events was intentional. Anthropic has not addressed the question. For the purposes of this paper, the question is irrelevant: regardless of cause, the threat picture has changed, and the defensive work to do does not depend on Anthropic's intent.
2.3 The Powell-Bessent meeting (April 7)
On Tuesday, April 7, 2026 — the same day as the Mythos Preview announcement — Treasury Secretary Scott Bessent and Federal Reserve Chair Jerome Powell convened the CEOs of Citigroup, Morgan Stanley, Bank of America, Wells Fargo, and Goldman Sachs at the Treasury Department to discuss systemic cyber risk from Mythos-class capabilities. JPMorgan's Jamie Dimon was reported as unable to attend. CNBC, Bloomberg, Reuters, CBS News, Fortune, and other major outlets covered the meeting in detail. Per Bloomberg, the regulators "summoned" the executives to "make sure banks are aware of possible future risks raised by Anthropic's Mythos and potential similar models, and are taking precautions to defend their systems."
(Note for future readers: two earlier drafts of this paper — v1.0 and the first v2.0 draft — listed the meeting as April 8, 2026. That was wrong. CNBC, Bloomberg, Reuters, and CBS News all describe the meeting as taking place "on Tuesday," and Tuesday of that week was April 7, 2026 — not April 8, which was a Wednesday. A peer reviewer caught the inconsistency on careful reading, and this revision corrects it. The meeting and the Mythos Preview announcement therefore happened on the same day, not on consecutive days as v1.0 implied.)
Two trading days later, on Thursday, April 9, 2026, software equities sold off on renewed AI disruption fears. Per Reuters reporting (as syndicated in The Globe and Mail), the broader S&P 500 Software and Services Index closed down 2.6% on the day. Cybersecurity names Cloudflare, Okta, CrowdStrike, and SentinelOne dropped between 4.9% and 6.5%. Zscaler was among the biggest S&P 500 decliners at -8.8%, driven at least in part by a BTIG downgrade from "buy" to "neutral" citing demand and competition concerns. Broader enterprise-software names — Atlassian, Workday, Adobe, Salesforce, and Intuit — dropped between 3.7% and 6.8%. Reuters attributed the move to broader AI disruption fears triggered by Anthropic's Mythos restriction announcement, not specifically to the Treasury/Fed bank meeting. As of this writing, Treasury and the Federal Reserve have not issued formal guidance, but reporting indicates "they may issue formal guidance, building on this meeting to mandate disclosures or stress tests for AI-related vulnerabilities."
2.4 What is missing from the conventional analysis
Industry coverage to date has treated these as three temporally clustered but distinct events. The connection most commonly drawn is between Mythos (the model) and the Powell-Bessent meeting (the regulatory response). The Claude Code source leak is treated as an unrelated packaging incident, important primarily for the supply-chain malware campaigns that followed (Vidar, GhostSocks, fake "leaked source" GitHub repositories).
We argue this framing misses the point. The Claude Code source leak is, in our view, the most important of the three events for SMB defenders, because it materially reduces the engineering effort that previously gated commodity threat actors' access to autonomous offensive AI. The next two sections of this paper develop that argument.
3. The Claude Code harness leak in detail
According to public analysis from Zscaler ThreatLabz, Trend Micro, and independent reverse-engineering published on Medium, Substack, and dev community sites, the leaked Claude Code codebase contains:
· Approximately 512,000 lines of unobfuscated TypeScript across 1,906 files — the complete client-side agent harness as Anthropic shipped it in production
· A ~46,000-line query engine managing language-model API calls, token caching, context window management, and retry logic
· ~40 permission-gated tool definitions covering file operations (read/write/edit), shell command execution, web fetching, content search, and language-server-protocol integration
· The system prompts that govern the agent's behavior, including a subsystem internally called "Undercover Mode" designed to prevent the agent from leaking internal Anthropic information when contributing to open-source repositories
· Multi-agent orchestration patterns, including a "Buddy/companion system" with planned rollout windows coded into the source
· A three-layer memory architecture explicitly designed to counter "context entropy" — the phenomenon where long-running agents lose operational context as their context window fills
· Permission and sandbox model governing what tools the agent can invoke under what conditions
· References to internal codenames including "Capybara" (identified by the security community as Mythos), "Opus 4.7," and "Sonnet 4.8," along with a planned Buddy system rollout window of April 1–7
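The permission-gating pattern described in the tool list above can be illustrated with a toy registry. This is our own minimal sketch of the general pattern, not code from the leaked harness:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ToolRegistry:
    """Toy sketch of a permission-gated tool layer: each tool declares
    the permission it requires, and the agent loop can only invoke
    tools whose permission the current session has granted."""
    granted: set
    tools: dict = field(default_factory=dict)

    def register(self, name: str, permission: str, fn: Callable) -> None:
        self.tools[name] = (permission, fn)

    def invoke(self, name: str, *args):
        permission, fn = self.tools[name]
        if permission not in self.granted:
            raise PermissionError(f"tool {name!r} requires {permission!r}")
        return fn(*args)

# A session granted file-read access but not shell execution.
registry = ToolRegistry(granted={"fs.read"})
registry.register("read_file", "fs.read", lambda path: f"<contents of {path}>")
registry.register("run_shell", "shell.exec", lambda cmd: "<output>")

print(registry.invoke("read_file", "notes.txt"))  # → <contents of notes.txt>
try:
    registry.invoke("run_shell", "whoami")
except PermissionError as err:
    print("blocked:", err)
```

The defensive relevance is the inverse of the design intent: an operator adapting the leaked harness controls the grant set, so the permission layer constrains only operators who choose to be constrained by it.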
What was not in the leak: model weights, training data, cryptographic secrets, customer data, or telemetry.
The omission of model weights matters less than it seems. The leaked harness is a software product engineered around the assumption that some sufficiently capable language model will be at the other end of its API calls. It does not contain cryptographic ties to any specific model. In our preliminary reproduction (Section 6), we wired the harness to a different model — an open-weight model with no relationship to Anthropic — and the harness functioned in our test configuration with limited changes beyond an API endpoint substitution. This is a narrow observation about what we ran, not a universal claim of model-agnosticity. We discuss the limits of this observation in Section 6 and the measurement framework that would test it more robustly in Section 12.
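To make the endpoint-substitution point concrete, here is a toy sketch of the kind of configuration seam involved. It assumes an OpenAI-compatible chat-completions request shape, which most open-weight serving stacks expose; all names, URLs, and model identifiers are placeholders of ours, not the harness's actual code:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelEndpoint:
    """All provider-specific state a harness of this shape needs: where
    to send chat-completion calls, how to authenticate, which model to name."""
    base_url: str
    api_key: str
    model: str

def build_request(endpoint: ModelEndpoint, messages: list) -> dict:
    # The request shape is the same regardless of who serves the model,
    # so swapping providers is a configuration change, not a code change.
    return {
        "url": f"{endpoint.base_url}/v1/chat/completions",
        "headers": {"Authorization": f"Bearer {endpoint.api_key}"},
        "json": {"model": endpoint.model, "messages": messages},
    }

# A hosted frontier endpoint and a local open-weight endpoint differ
# only in configuration; code calling build_request is unchanged.
hosted = ModelEndpoint("https://api.example-provider.com", "sk-placeholder", "hosted-model")
local = ModelEndpoint("http://localhost:8080", "unused", "local-open-weight-model")

msgs = [{"role": "user", "content": "hello"}]
print(build_request(local, msgs)["url"])  # → http://localhost:8080/v1/chat/completions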
The 46,000-line query engine is the most consequential single component. It encapsulates years of production engineering on the question of how to make an LLM do useful multi-step work without losing the plot: token budgeting, automatic context summarization, retry logic for transient API failures, rate-limit handling, prompt-chaining patterns, tool-call orchestration. This is the kind of engineering work that previously took a sophisticated team months to reproduce. With the source now public, that engineering effort becomes available as a reference implementation to anyone who reads the code.
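Two of the ingredients named above, retry logic and context compaction, can be sketched in miniature. These toy functions are ours and illustrate the general pattern only; the leaked query engine's actual implementations are far more elaborate:

```python
class TransientAPIError(Exception):
    """Stand-in for a rate-limit or timeout from a model API."""

def with_retries(fn, max_attempts=4):
    """Retry a model call on transient failures, doubling a (notional)
    backoff delay each attempt; re-raise once attempts are exhausted."""
    delay = 0.5
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientAPIError:
            if attempt == max_attempts:
                raise
            delay *= 2  # a real implementation would sleep(delay) here

def compact_context(messages, budget):
    """Naive context compaction: drop the oldest non-system turns until
    the conversation fits the token budget (word count stands in for tokens)."""
    def tokens(m):
        return len(m["content"].split())
    while sum(tokens(m) for m in messages) > budget and len(messages) > 2:
        messages.pop(1)  # always keep messages[0], the system prompt
    return messages

# A call that fails twice with transient errors, then succeeds.
calls = {"n": 0}
def flaky_model_call():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientAPIError
    return "completion"

print(with_retries(flaky_model_call))  # → completion (after 2 retries)

history = [{"role": "system", "content": "you are an agent"},
           {"role": "user", "content": "step one " * 50},
           {"role": "user", "content": "step two"}]
print(len(compact_context(history, budget=20)))  # → 2
```

The hard part, and the part the leaked engine encodes, is not either mechanism in isolation but the accumulated production judgment about when to summarize versus drop, how to budget tokens across tools, and how to recover mid-task.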
3.1 The Check Point CVEs — separate vulnerability class, separate timeline
A note on chronology that v1 of this paper got wrong: the Check Point Research vulnerabilities in Claude Code are not part of the March 31 source-leak event. They are a distinct, prior vulnerability class.
Check Point Research published two Claude Code vulnerabilities in advance of the leak:
· CVE-2025-59536 (NVD published October 3, 2025) — API token exfiltration through Claude Code project files
· CVE-2026-21852 (NVD published January 21, 2026) — information disclosure through Claude Code hooks/settings
The associated Check Point analysis post — "Caught in the Hook: RCE and API Token Exfiltration Through Claude Code Project Files" — is dated February 25, 2026. Both CVEs were disclosed and patched well before the March 31 source-map packaging incident, and both concern a separate vulnerability class — RCE and token exfiltration through Claude Code's project configuration files (hooks, settings, project-level config) — not the source-disclosure event.
We mention this distinction because v1 of this paper conflated the two events and stated that Check Point published the CVEs "within 72 hours of the leak." That was incorrect. The CVEs predate the leak by months, and the two threats should be reasoned about separately. The Check Point work demonstrates that the attack surface of Claude Code as an installed product on a user workstation was already an active research target before the source disclosure.
3.2 Independent validation of Mythos-class vulnerability discovery
Separately and independently of the source leak, researchers at Horizon3.ai used Claude (the commercial product) to discover CVE-2026-34197, a 13-year-old Apache ActiveMQ remote code execution vulnerability — a finding that provides concrete validation of the OpenBSD-class capability claim from the original Mythos card. This finding remains relevant to the broader Mythos narrative regardless of how the source-leak event is positioned.
4. Abliteration: removing safety from open-weight models
Sources:
· arXiv 2510.02768 — A Granular Study of Safety Pretraining under Model Abliteration
· arXiv 2505.19056 — An Embarrassingly Simple Defense Against LLM Abliteration Attacks
· GitHub — NousResearch/llm-abliteration
· Heretic AI Abliteration Benchmarks vs GPT-4 Safety
· OBLITERATUS Strips AI Safety From Open Models in Minutes
Abliteration is a post-training technique for removing safety alignment from open-weight large language models without retraining and without access to training data. The technique was first described in the open-weight community in late 2023 and has matured rapidly since.
The mechanism, summarized at the strategic level: a model's refusal behavior — its tendency to decline harmful or sensitive requests — can be measured as a specific direction in the model's internal activation space. Researchers run a target model on two carefully constructed datasets, one containing harmless prompts and one containing prompts the model is trained to refuse. They measure how internal activations differ between the two and identify the vector that corresponds to "I should not help with this." They then surgically edit the model's weights to suppress that vector. The result, on the tested configurations reported in the published benchmarks, is a model that retains general reasoning, coding, and language capabilities while no longer refusing requests it would previously have refused.
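A toy numerical illustration of that directional-ablation idea, using three-dimensional vectors in place of a real model's activations (all values are invented for illustration; a real abliteration edits many weight matrices across many layers):

```python
def sub(a, b): return [x - y for x, y in zip(a, b)]
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def scale(a, s): return [x * s for x in a]

# Toy mean activations over harmless vs. refused prompts (3-dim toy space).
mean_harmless = [0.2, 0.1, -0.3]
mean_refused = [3.2, 0.1, -0.3]  # shifted along one axis: the "refusal direction"

v = sub(mean_refused, mean_harmless)
v = scale(v, 1 / dot(v, v) ** 0.5)  # unit refusal direction

# A toy weight matrix whose output writes into the residual stream.
W = [[1.0, 2.0, 0.5],
     [0.0, 1.0, 1.0],
     [2.0, 0.0, 1.0]]

# Abliteration step: remove the output component along v,
# i.e. W' = W - v (v^T W). No retraining, just a weight edit.
vTW = [sum(v[i] * W[i][j] for i in range(3)) for j in range(3)]
W_abl = [[W[i][j] - v[i] * vTW[j] for j in range(3)] for i in range(3)]

# After the edit, no input can produce output along the refusal direction.
x = [0.7, -1.2, 0.4]
Wx = [dot(row, x) for row in W_abl]
print(round(dot(v, Wx), 10))  # → 0.0 (refusal component eliminated)
```

The point the sketch makes is structural: the edit zeroes one direction in the output space and leaves everything orthogonal to it untouched, which is why capability on unrelated tasks largely survives.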
The technique works because safety alignment in current open-weight models is a relatively shallow modification of the underlying model's behavior. The base capability — predicting what token comes next given a context — is unchanged by safety post-training; only the model's policy on which capabilities to expose to the user is changed. Abliteration reverses that policy without touching the underlying capability.
The 2026 abliteration toolkit ecosystem is mature:
· OBLITERATUS (the NousResearch project) supports more than 116 open-weight LLMs without requiring fine-tuning data or significant compute.
· Heretic AI has published benchmarks reporting that Gemma-3-12B-IT post-abliteration produces only 3 refusals out of 100 challenge prompts with a Kullback-Leibler divergence of 0.16 — meaning safety refusals are essentially eliminated while general capability is largely intact on the tested benchmark.
· Academic work on both attacking and defending the technique is published on arXiv.
· The technique is reproducible by practitioners with consumer-grade hardware on the open-weight models for which toolkit support exists.
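The Kullback-Leibler figure cited in the benchmark above measures how far the edited model's next-token distribution drifts from the original on the same input; the quantity itself is straightforward to compute (toy distributions below, not benchmark data):

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) in nats between two discrete next-token distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

original = [0.70, 0.20, 0.10]  # toy next-token probabilities, pre-edit
edited = [0.65, 0.23, 0.12]    # slightly drifted probabilities, post-edit

print(kl_divergence(original, original))          # → 0.0 (no drift at all)
print(round(kl_divergence(original, edited), 3))  # small but nonzero drift
```

A reported divergence of 0.16 is therefore a claim that, averaged over the evaluation corpus, the edited model's token predictions remain close to the original's, which is what "capability largely intact" operationalizes in that benchmark.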
Defensive countermeasures exist — the "Embarrassingly Simple Defense" paper demonstrates that randomized fine-tuning with specific data distributions can substantially raise the cost of abliteration — but these defenses must be applied by the model publisher before release and cannot be retroactively imposed on models that are already widely distributed.
4.1 Model provenance — official release vs. third-party derivative
A note on terminology that v1 of this paper conflated: an abliteration technique is not the same thing as a specific abliterated model. The Zoltis laboratory work (Section 6) used a third-party derivative of an official Alibaba release, not the official model itself. The distinction matters for provenance and reviewer scrutiny.
· Official base model: Qwen/Qwen3.5-35B-A3B, released by Alibaba's Qwen team on Hugging Face, February 24, 2026. The official release is published by the Qwen organization in several variants (base, FP8 quantized, GPTQ-Int4 quantized).
· Third-party derivatives (examples on Hugging Face that apply abliteration or similar safety-removal techniques to the official base):
· HauhauCS/Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive
· huihui-ai/Huihui-Qwen3.5-35B-A3B-abliterated
The choice of which third-party derivative was used in our laboratory work is documented in the Private Evidence Appendix and is an artifact of what was available at the time of the lab work. The choice is not a validation of any specific uploader and should not be read as such.
The practical implication is that any current open-weight model can be transformed into an unrestricted assistant by applying a published abliteration toolkit, with effort that is reproducible by practitioners with consumer-grade hardware. The transformation does not require GPU clusters or specialized compute for the model sizes for which toolkit support exists.
5. The synthesis: harness + abliterated open-weight
Reading note: Section 5 is an analytical argument, not a measurement result. The 7-component table below characterizes general public claims about the state of the open-weight ecosystem and the harness leak; it is not lab-validated. Sections 5 and 6 should be read as analysis and preliminary observation. The empirical foundation that would substantiate the strongest version of these claims is the future work described in Section 12.
The Claude Code harness leak and the abliteration technique are each individually significant. Their combination is the development this paper exists to document.
Consider the architectural pieces required for an autonomous offensive AI agent:
| Component | Source before March 31, 2026 | Source after March 31, 2026 |
| --- | --- | --- |
| Frontier-tier reasoning and coding capability | Proprietary (OpenAI, Anthropic, Google) — gated by API access, rate limits, content policies | Open-weight Chinese models — benchmark-competitive on coding and agentic tasks per public benchmarks (Qwen 3.5 Plus, GLM-5.1, Kimi K2.5, MiniMax M2.5, DeepSeek V3) — downloadable, free, runnable on consumer hardware |
| Safety alignment removal | Difficult — required novel jailbreak research per model | Abliteration toolkits — public, mature, reproducible on supported open-weight models |
| Agent loop, tool orchestration, context management, multi-step planning | Hand-rolled by sophisticated teams over months — the limiting factor for offensive AI development | The Claude Code harness — leaked, mirrored, public, ~512K lines of production engineering as a reference implementation |
| Tool wrappers (file system, shell, web, code execution) | Hand-rolled per project | Available as ~40 permission-gated tools in the leaked harness |
| Production hardening (retry logic, error handling, context-entropy mitigation) | Hand-rolled per project | Available in the leaked ~46K-line query engine |
| Offensive-cyber-specific prompt engineering and tool wrapper specialization | Hand-rolled per project | Still required — and not part of the leaked artifact |
| Targeting: a list of likely-vulnerable internet-addressable hosts | Shodan, ZoomEye, Censys — already public and free | Same |
Five of the seven required components have moved from "hard, gated, or proprietary" to "free, public, mature" between March 31 and April 10, 2026. The two remaining components — offensive-cyber prompt engineering and targeting — were already the easier components and were not, in our view, the bottleneck. We argue that the agentic harness was the bottleneck, and the bottleneck has been materially reduced. This is a Zoltis interpretation of the public evidence; it is not a measured fact and we explicitly caveat it in Sections 6 and 11.
The relevant adversary tier, on this argument, is no longer "Mythos in 2027." It is "current best open-weight model + abliteration + leaked harness + offensive specialization" — a combination a patient, moderately skilled practitioner can assemble today. The capability ceiling of that combination is bounded by the open-weight model's capability: currently benchmark-competitive on coding and agentic tasks per public leaderboards (Arena-Hard, Aider, and other contemporary agentic benchmarks), below the frontier closed models, well below Mythos, but plausibly above what is required to compromise the typical SMB target. (We deliberately omit SWE-bench Verified from this list because OpenAI's own 2026 retraction — see Section 12.2 — disqualified it as a frontier evaluation anchor.) Whether "plausibly above" is in fact above is the empirical question Section 12's measurement framework is designed to answer.
This is the central argument of the paper. Most of the cybersecurity industry's mental model of "AI threats" is calibrated to the capability ceiling: when does Mythos-class capability arrive in commodity form? We argue that is the wrong question. The right question is: what can a competent operator build today with what is freely available, and how does it compare to the defensive postures most SMBs actually have in place? Our preliminary lab observation in Section 6 supports the view that the architectural building blocks are in place. It does not measure how effective any specific configuration is against any specific defender. That is the work of v2.1.
6. Preliminary Laboratory Verification
Our laboratory work should be read as a preliminary functionality and portability check, not as a benchmarked demonstration of offensive capability or real-world threat-actor equivalence.
This section was renamed in v2.0 from "The Zoltis lab reproduction" because the v1.0 framing implied stronger conclusions than the lab work supports. The work documented here is a preliminary functional verification: we wired the leaked harness to a modified open-weight model in a controlled lab environment and made narrow observations about portability and functionality. We did not benchmark against any task suite, did not measure against the commercial product, and did not specialize the system for offensive operations. The full list of what we did not test is enumerated in §6.3, and the measurement framework that a benchmarked version of this work would require is described in Section 12.
6.1 What we verified
The following observations are direct results of the lab work and are reproducible from the configuration documented in the Private Evidence Appendix.
· The leaked Claude Code harness loads from the publicly mirrored source in our lab build environment.
· The harness wires successfully to a non-Anthropic model endpoint when the API endpoint configuration is substituted. No cryptographic ties to Anthropic services were encountered in our configuration.
· The tool layer is functional in the controlled environment. The file-system, shell, search, and web-fetch tools as defined in the leaked codebase invoke and return as designed against test resources in our lab.
· Basic autonomous multi-step tasks complete in the test configuration. The agent loop, the subagent orchestration, and the context-management subsystem all operate against simple benign engineering tasks of the kind Claude Code is designed for.
6.2 What we observed
The following are observations that emerged from the lab work and are consistent with — but do not by themselves prove — the broader claims this paper makes. They are reported as observations, not as benchmarked findings.
· Safety refusals were materially reduced in the tested configuration. The third-party abliterated derivative we used behaves on a small set of test prompts as the published benchmarks for the abliteration technique would predict. We did not run the published refusal benchmarks ourselves; we report only what we observed in our test prompts.
· The configuration ran on consumer-grade hardware (a single workstation) without any external API dependency, rate limits, or upstream telemetry. This is consistent with the abliteration toolkit and quantization approaches we used; it is not a novel observation.
· Reproduction time, after the source code was available and the open-weight model had been selected, was approximately one operator-day for the functional verification we performed. This is a measured fact about our lab work, performed by one engineer with prior familiarity with both the open-weight ecosystem and the Claude Code product surface. It is not a generalizable claim about other operators or other configurations. A different practitioner with different background, different hardware, or a different model could spend more time, less time, or fail to reproduce the result entirely.
6.3 What we did not test
The following is the explicit list of things our lab work did not establish, in the order in which a benchmarked v2.1 paper would need to address them.
· Capability against any specific offensive cyber benchmark. We did not run CAIBench, the cyber ranges from arXiv 2603.11214, CyberSOCEval, or any comparable suite. We have no measurement of the system's offensive capability in any standardized form.
· Long-horizon multi-step adversarial scenarios. We tested basic engineering tasks of the kind Claude Code is designed for. We did not test sustained adversarial sessions of the kind real attacks involve.
· Cross-model portability across two materially different model families. We tested one model family in one configuration. The "model-agnostic" framing in v1 was not supported by this work; the more accurate statement is that in our tested configuration, the harness did not require Anthropic-hosted services and could be adapted to a different model endpoint with limited changes.
· Repeatability across multiple seeds or runs. We did not measure rerun variance. A single successful run does not establish whether the result is robust or fortunate.
· Context-compaction survival on long sessions. The leaked harness includes context-management machinery designed to survive long sessions, but we did not test the threshold at which it fails on the configuration we ran.
· False-positive, true-positive, and partial-credit rates on security findings. We did not run the system against any task pack that would let us measure these rates.
· Specialization for offensive operations. We did not write offensive system prompts, did not author offensive tool wrappers, and did not aim the system at any target outside the lab. Our configuration is a generic autonomous agent equivalent in shape to Claude Code's general-purpose capability, not a weaponized variant.
· Comparison to the commercial Claude Code product on a fixed task pack. We did not run a side-by-side comparison against the original commercial product on any defined task suite.
· Any third-party environment, any client environment, any unauthorized target. All testing was confined to a controlled laboratory environment containing only assets owned by Zoltis Technologies. The system has not been pointed at any system the firm does not own.
6.4 What would be required to validate broader claims
The full measurement framework that would substantiate the strongest version of the claims in Sections 5 and 7 is described in Section 12 (new in v2.0). In summary, it would require:
1. A staged evaluation following the Frontier Model Forum 3-stage protocol (baseline model, modified model, harness-integrated system)
2. A bounded, published task portfolio (CAIBench, the arXiv 2603.11214 cyber ranges, or a Zoltis-defined task pack with a published spec)
3. Multi-seed repeatability runs (target 5–10 runs per task)
4. Cross-model evaluation across at least two materially different model families
5. The metric portfolio described in §12.3 — task completion rate, wall-clock, tool calls, rerun variance, context-compaction survival, FP/TP/partial-credit rate, cost per successful completion
6. Per-run logs, configuration records, hashes, and hardware documentation in a controlled-distribution evidence appendix
A staged evaluation along these lines is the work of a v2.1 paper. v2.0 commits to running it.
6.5 The defensible summary sentence
We do not claim that our laboratory configuration is operationally equivalent to tooling already in active threat-actor use. We claim only that the leaked harness appears portable, functional in a controlled environment, and materially easier to adapt than defenders should be comfortable assuming.
6.6 Why we did this
A defender needs to understand the tools the adversary may be using. Reading press coverage of the leak and the abliteration toolkit is not the same as having the code on disk and watching it operate against a test environment. Zoltis exists to protect client environments, and the integrity of our threat assessments depends on grounding them in first-hand observation rather than secondhand reporting. The recommendations in our 2026-04 client advisory package are grounded in what we have observed in our lab — within the limits enumerated in §6.3.
We also intend to use this capability defensively, on behalf of clients who specifically authorize such engagements. See Section 9.7 on authorized adversary emulation. The Private Evidence Appendix referenced throughout this section is the distribution-controlled companion document containing artifact hashes, configuration records, hardware details, run logs, and benchmark outputs as additional validation work is completed.
7. What this means for SMBs
Small and mid-sized businesses face four substantive shifts as a result of the harness-leak event. Each shift is presented as argument or inference from the public evidence and the Section 6 observations, not as established fact.
7.1 The capability bar for "AI-uplifted attack against an SMB" appears to be dropping (argument)
Pre-leak, the conventional defensive analysis treated AI-uplifted threats as a forecast — something to plan for in 2027 once Mythos-class capability appeared in open-weight form. We argue that post-leak, the relevant adversary tier is bounded below by what current open-weight models plus the leaked harness plus offensive specialization can produce, which public benchmarks show to be competitive on coding and agentic tasks. We further argue that benchmark-competitive autonomous capability is sufficient to compromise the typical SMB target, because the typical SMB's exposures (end-of-life software, missing MFA, credential reuse, exposed RDP, flat networks, weak email defense) require only competent post-exploitation and lateral movement to exploit, not novel zero-day discovery. Both legs of this argument are supported by general public benchmark data and by our preliminary Section 6 observations; neither is supported by direct measurement of an adversary in our lab.
7.2 Attack-surface scanning is becoming more economical against small targets (inference)
Historically, the bottleneck on small-target attacks was operator attention. A human attacker working through Shodan results has to choose which targets are worth the time investment, and most SMBs were not. Autonomous agents do not have an attention budget in the same way. They can scan, enumerate, exploit, and pivot continuously across many targets. We infer that the small-target buffer that existed because attackers had limited human time is closing. The inference is supported by the architectural shift documented in Section 5; the magnitude of the shift in real attacker behavior is not yet measurable from public data.
7.3 Phishing has crossed the grammar-quality threshold and is increasingly adaptive (argument supported by external evidence)
AI-drafted phishing is not new. The PROMPTSTEAL and PROMPTFLUX precedents demonstrate that threat actors have been wrapping open-weight LLMs around malware for at least a year, and CrowdStrike's 2026 Global Threat Report documents an 89% year-over-year increase in AI-enabled adversary activity. What changes with the harness leak, on our argument, is the adaptive dimension: an autonomous agent running a phishing operation can adapt the email in real time based on recipient response, run multi-day social-engineering conversations, and reason about which lure works against which target. The first part of this claim is well-documented in public threat reporting; the second part is an inference from the architectural capability of the leaked harness.
7.4 Post-credential-theft activity is becoming faster than human response (argument)
The historical defensive advantage in post-credential-theft scenarios was that lateral movement is slow — an attacker who steals credentials must read mailboxes for context, find references to other systems, locate the next set of credentials, and pivot. We argue that an autonomous agent can do all of that in one continuous run, faster than any human IR team can detect. This is the dimension that materially raises the value of detection-grade controls (EDR, ITDR, conditional access alerting) relative to prevention-grade controls. Prevention is still the first line. Detection is what catches what prevention misses, and on this argument the time-to-detect window has shrunk from days toward hours. The Section 12 measurement framework includes context-compaction survival as a specific metric because that is the dimension on which agentic post-exploitation succeeds or fails.
8. The collapse of historical defense assumptions
These are positions we hold based on the analysis above. Each should be testable against measurable defender outcomes, and where the assessments below use words like "increasingly unreliable," that softening reflects the uncertainty we inherit from the limits of our preliminary lab work.
Assumption: "Advanced AI tooling is gated by cost — only well-funded actors can use it"
Zoltis position (April 2026): Increasingly unreliable. Open-weight models now run on consumer hardware. The harness is free. Abliteration toolkits are free.

Assumption: "Advanced AI tooling is gated by cloud API access — providers can revoke API keys"
Zoltis position (April 2026): Increasingly unreliable. Local inference removes the cloud-API gatekeeper. There is no key to revoke for an agent running locally.

Assumption: "Advanced AI tooling is gated by safety alignment — refusals limit offensive use"
Zoltis position (April 2026): Increasingly unreliable. Abliteration removes safety alignment in hours on supported open-weight models, with limited capability degradation on tested benchmarks.

Assumption: "Advanced AI tooling is gated by specialized expertise — building agentic systems requires research-grade engineers"
Zoltis position (April 2026): Increasingly unreliable. The leaked harness encapsulates much of the engineering work. Adapting it to a different model is, in our preliminary lab observation, materially easier than building such a system from scratch.

Assumption: "AI threats are a 2027 problem — the threat landscape will evolve gradually"
Zoltis position (April 2026): No longer dependable. The threat landscape evolved discretely on March 31, 2026. The next discontinuous shift could happen at any time.

Assumption: "Small targets are protected by attacker attention scarcity"
Zoltis position (April 2026): Becoming unreliable. Autonomous scanning at machine speed reduces the per-target attention cost, making smaller targets more economical to attack.

Assumption: "Detection has weeks of dwell time before damage compounds"
Zoltis position (April 2026): Increasingly unreliable. Autonomous post-exploitation compresses the window between initial access and damage.

Assumption: "Signature-based AV is structurally adequate for low-value targets"
Zoltis position (April 2026): False. Polymorphic AI-generated payloads defeat signature detection by construction. This was already the consensus view before the harness leak.

Assumption: "Phishing filters are mostly effective because most phishing is grammatically poor"
Zoltis position (April 2026): False. AI-drafted phishing has been grammatically and contextually fluent since at least 2024.

Assumption: "MFA is sufficient against credential theft"
Zoltis position (April 2026): Becoming insufficient. Real-time phishing proxies (the evilginx class) plus AI-drafted social engineering can defeat push-MFA in the moment. Phishing-resistant MFA (FIDO2) is the new bar.

Assumption: "Backup is sufficient against ransomware"
Zoltis position (April 2026): False if the backup is reachable from the same credentials. Cloud-side backups under the same admin tenant as the source data are not ransomware-resistant. Immutable, separately-credentialed, offline backup is the new bar.
The pattern: most defensive controls remain individually correct, but the assumptions about the tier of attacker capability they must hold against are eroding. Defenses that were adequate against a script-kiddie tier are becoming less adequate against the harness-equipped operator tier — which is the tier this paper argues is now reachable.
9. A defensive posture for the harness era
The recommendations in this section are not novel. They are the same fundamentals competent MSPs and CISOs have been recommending for years. What we argue has changed is the deadline by which they must be in place and the relative priority of detection-grade controls compared to prevention-grade controls.
9.1 Reduce exposed surface aggressively and immediately
Anything reachable from the public internet that does not need to be reachable from the public internet should be made unreachable within days, not quarters. This includes:
· Public-IP-bound management surfaces (SSH, RDP, web admin panels)
· End-of-life server operating systems and unpatched application servers exposed to the internet
· Cloud workloads with overly permissive Network Security Groups or no NSGs at all
· Orphaned public IPs from decommissioned services
· Internet-exposed development and testing environments
For every internet-addressable IP an organization owns, the question to ask is: "What specific business purpose requires this to be reachable from anywhere on the internet, rather than from a known set of source IPs or from an internal network?" If the answer is "none" or "I don't know," the surface should be removed or restricted within days.
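The audit question above can be operationalized as a simple inventory check. The sketch below is illustrative, not a Zoltis tool: the port list, the helper names, and the scan approach are assumptions, and a connect-scan from outside the network is only meaningful when run against assets the organization owns and is authorized to test.

```python
import socket

# Common management surfaces that rarely need to be internet-reachable.
# This port set is an illustrative assumption, not an exhaustive list.
MGMT_PORTS = {22: "SSH", 3389: "RDP", 5900: "VNC", 8443: "web admin"}

def scan_mgmt_ports(ip, ports=MGMT_PORTS, timeout=1.0):
    """Return the management ports on `ip` that accept a TCP connection.
    Only run against assets you own and are authorized to scan."""
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            if s.connect_ex((ip, port)) == 0:  # 0 means the connect succeeded
                open_ports.append(port)
    return open_ports

def unjustified_exposures(open_ports, justified):
    """Ports reachable from the internet with no recorded business justification.
    Anything returned here is a candidate for removal or source-IP restriction."""
    return sorted(set(open_ports) - set(justified))

# Example: RDP and SSH found open, but only SSH has a documented justification.
# unjustified_exposures([22, 3389], justified=[22]) -> [3389]
```

The useful output is not the open-port list itself but the delta between what is reachable and what has a documented reason to be reachable.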
9.2 Phishing-resistant MFA and Conditional Access
Push-MFA against AI-drafted real-time phishing is no longer adequate for privileged accounts. Phishing-resistant MFA — FIDO2 hardware tokens or platform authenticators — should be deployed for every administrator, every privileged-role assignee, and every executive whose account is a high-value target. Conditional Access policies should be tuned to the organization's actual access patterns, with explicit denials for legacy authentication and risky sign-in behavior.
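As one concrete example of the "explicit denial for legacy authentication" recommendation, the policy can be expressed as a Microsoft Graph conditionalAccessPolicy request body. The display name and the all-users scoping below are illustrative assumptions, and any rollout should begin in report-only state:

```python
# Illustrative Microsoft Graph conditionalAccessPolicy body that blocks legacy
# authentication protocols (which bypass MFA entirely). Pilot in report-only
# state before enforcement; the name and scoping here are assumptions.
legacy_auth_block = {
    "displayName": "Block legacy authentication (illustrative)",
    "state": "enabledForReportingButNotEnforced",  # switch to "enabled" after pilot
    "conditions": {
        "users": {"includeUsers": ["All"]},
        "applications": {"includeApplications": ["All"]},
        # Legacy protocols surface as these two client app types in Entra ID:
        "clientAppTypes": ["exchangeActiveSync", "other"],
    },
    "grantControls": {"operator": "OR", "builtInControls": ["block"]},
}
# Applied by POSTing this body to
# https://graph.microsoft.com/v1.0/identity/conditionalAccessPolicies
```

The report-only state matters operationally: it surfaces which line-of-business integrations still depend on legacy protocols before the block breaks them.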
9.3 EDR and identity threat detection — the detection tier
Prevention-grade controls are not enough when the post-compromise activity happens at machine speed. Endpoint Detection and Response (Microsoft Defender for Endpoint Plan 2, SentinelOne, CrowdStrike Falcon, or comparable) should be deployed on every endpoint and server. Identity Threat Detection and Response (Microsoft Defender for Identity, Vectra, or comparable) should monitor for anomalous behavior in the identity layer — the layer where post-credential-theft pivots happen. Both should feed into a 24/7 monitoring path, whether internal SOC or managed SOC, because alerts that are reviewed the next morning are alerts that arrive after the damage is done.
9.4 Network segmentation and lateral-movement bounding
Flat networks are no longer defensible. The blast radius of any single endpoint compromise on a flat /24 is the entire site. VLAN segmentation, NSG-based micro-segmentation in cloud environments, and host-based firewall rules that restrict server-to-server communication to known-required paths are all necessary controls. The standard a competent autonomous agent should encounter when it lands on a workstation is: the workstation cannot directly reach the file server, the database, the backup target, or the domain controller.
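That bounding standard amounts to an explicit allow-list with a default-deny for everything else. The zones, ports, and function names below are hypothetical; real deployments express the same policy in VLAN ACLs, cloud NSGs, or host firewall rules, and this sketch only checks the policy logic:

```python
# Hypothetical east-west allow-list. Any (source zone, destination zone, port)
# tuple not listed here is denied by default.
ALLOWED_FLOWS = {
    ("web", "app", 443),     # web tier may call the app tier over HTTPS
    ("app", "db", 1433),     # app tier may reach SQL Server
    ("backup", "db", 1433),  # backup infrastructure pulls from the database
}

def is_flow_allowed(src_zone: str, dst_zone: str, port: int) -> bool:
    """Default-deny: a flow is permitted only if explicitly listed."""
    return (src_zone, dst_zone, port) in ALLOWED_FLOWS

# A compromised workstation probing SMB on the file server is denied:
# is_flow_allowed("workstation", "fileserver", 445) -> False
```

The value of writing the matrix down explicitly, even before enforcement tooling exists, is that every flow not on the list becomes a documented exception rather than an invisible default.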
9.5 Email defense in depth
An inbound email defense layer (Defender for Office 365 Plan 2, Mimecast, Proofpoint, AppRiver SecureTide, or comparable) should sit in front of every mailbox in the organization. Outbound encryption alone (CipherPost, MessageGate) is not a substitute. SPF, DKIM, DMARC reject policy, and DNSSEC should be configured for every domain the organization owns. Attachment sandboxing should be enabled.
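For reference, the three sender-authentication records take the following shape in DNS. The domain, the DKIM selector target, and the reporting mailbox are placeholders, and the SPF include shown assumes Microsoft 365 as the mail host:

```text
; SPF: only the listed infrastructure may send as example.com; "-all" hard-fails everything else
example.com.                       IN TXT    "v=spf1 include:spf.protection.outlook.com -all"

; DKIM: CNAME to the mail host's published signing key (Microsoft 365 pattern shown; TENANT is a placeholder)
selector1._domainkey.example.com.  IN CNAME  selector1-example-com._domainkey.TENANT.onmicrosoft.com.

; DMARC: reject mail that fails SPF/DKIM alignment, and send aggregate reports
_dmarc.example.com.                IN TXT    "v=DMARC1; p=reject; rua=mailto:dmarc-reports@example.com"
```

Moving to "p=reject" directly can break legitimate third-party senders; the usual path is "p=none" with report review, then "p=quarantine", then "p=reject".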
9.6 Immutable, separately-credentialed backup
Backups that share an administrative trust boundary with the production environment are not ransomware-resistant. The backup target should require a separate credential set, the backup data should be immutable for the retention window (object-lock, snapshot-lock, write-once-read-many), and a copy of the most critical data should reside on infrastructure that has no operational dependency on the cloud tenant being protected. For some organizations the right answer is a third-party cloud backup provider; for others it is on-premises NAS with air-gap rotation; for high-sensitivity environments the right answer is both.
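For object-lock style immutability on an S3-compatible backup target, the retention rule is a small configuration body. The bucket name and 30-day window below are illustrative, and the sketch builds the body without applying it:

```python
# Sketch: a default COMPLIANCE-mode retention rule for a backup bucket.
# Assumes an S3-compatible store with Object Lock enabled at bucket creation.
# The retention window is an illustrative assumption; size it to your RPO/RTO.

def object_lock_config(days: int) -> dict:
    """Build an Object Lock configuration body. COMPLIANCE mode means the
    retention cannot be shortened or removed, even by the root credential —
    which is exactly the property needed against a tenant-wide compromise."""
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Days": days}},
    }

# With boto3 (not imported here), this would be applied roughly as:
#   s3.put_object_lock_configuration(
#       Bucket="backups-immutable",
#       ObjectLockConfiguration=object_lock_config(30))
```

COMPLIANCE mode rather than GOVERNANCE mode is the relevant choice here: GOVERNANCE-mode locks can be lifted by sufficiently privileged credentials, which is the same trust boundary this section warns against sharing.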
9.7 Authorized adversary emulation as a continuous practice
Pen-testing in the modern sense should not be an annual checkbox. It should be a recurring, scoped engagement that validates the defensive posture against the same class of tooling adversaries are increasingly able to assemble. For Zoltis-managed clients, this is the function of the new CM-17 counter-measure in our 2026-04 advisory package — scoped, written-authorization-only engagements where Zoltis runs a controlled reproduction of the leaked harness against a client environment with the client's consent and under the same legal posture as any traditional pen-test. The findings flow into the standard remediation roadmap. Adversary emulation is validation of remediation, not a substitute for it; it should run after the primary work is complete, not before.
9.8 Cross-cloud and on-premises backup-of-last-resort
For organizations whose entire operating environment depends on a single cloud tenant (Microsoft 365, Google Workspace, AWS, Azure), the worst-case scenario is no longer "the cloud has an outage." The worst-case scenario is "the tenant is compromised and the attacker, holding our admin credentials, has the same access to the backup vaults that we do." The defense is daily replication of the most critical data to infrastructure that the compromised credentials cannot reach — typically on-premises, with separately-credentialed access, and with truly immutable retention.
9.9 The compressed timeline
For every recommendation above, we argue the new question is "how soon can we have this in place?" not "when in our roadmap does this fit?" This is a Zoltis defensive recommendation based on the analysis in this paper, not a forecast claim with quantitative probability. Specific items that should drop to days or weeks rather than quarters:
· Closing public-IP exposures that have no business purpose: days
· Deploying email defense in depth where it is currently missing: weeks
· Enforcing MFA + Conditional Access where it is currently incomplete: weeks
· Deploying EDR where it is currently missing or outdated: weeks
· Network segmentation projects: months, but with interim Layer-2/Layer-3 bounding immediately
· End-of-life retirement projects: months, but with interim role-removal immediately
· Backup immutability: weeks
· Authorized adversary emulation: after primary remediation, as validation
10. Recommendations for the security industry
Beyond the per-organization defensive posture, the harness-leak event has implications for the security industry as a whole. The Zoltis recommendations:
For AI vendors: The Claude Code source-leak event illustrates that production AI agent harnesses are now critical infrastructure. They should be treated with the operational discipline that is applied to other critical software products: dedicated release engineering, mandatory artifact review before publication, separation of debug builds from production builds, and tabletop exercises for source-disclosure incidents. The "release packaging issue caused by human error" framing is technically accurate but operationally inadequate for an artifact this consequential. We recommend that other AI agent vendors (OpenAI, Google, Microsoft, Cursor, Cognition, Cline, etc.) audit their own packaging, distribution, and source-map handling practices immediately.
For open-weight model publishers: The abliteration technique is not going to be voluntarily abandoned by the open-weight community. The realistic defensive option is to apply abliteration-resistant safety techniques (the "Embarrassingly Simple Defense" approach from arXiv 2505.19056 is a starting point) to models before public release. The security industry should pressure open-weight publishers to adopt these techniques as a baseline.
For threat intelligence vendors: The current threat-intelligence frameworks (MITRE ATT&CK, Diamond Model, Cyber Kill Chain) do not have first-class concepts for autonomous AI agents as adversary actors. They should. Every framework that currently models adversary tradecraft as a sequence of human-operated steps needs to add a parallel model for the same tradecraft executed by an autonomous agent, with different speed assumptions, different detection signatures, and different defensive priorities.
For cyber insurance carriers: The actuarial assumptions underlying SMB cyber insurance pricing are calibrated to a pre-harness-leak threat environment. Carriers should expect a surge in incidents in mid-2026 and should price accordingly. More importantly, they should make EDR coverage, MFA enforcement, and network segmentation prerequisites for coverage rather than discounts on top of coverage.
For regulators: The Powell-Bessent meeting of April 7 was the right instinct (taking the systemic risk seriously) but the wrong scope (limited to systemically important financial institutions). The Mythos-class threat scales down. SMB regulatory regimes (HIPAA, attorney duty of technological competence, state-level data protection laws) should issue updated guidance reflecting the changed threat environment, with a particular focus on the inadequacy of checklist-based vendor security assessments in a world where defenders need contemporaneous evidence of their actual posture.
For MSPs and security service providers: The Zoltis position is that small and mid-sized MSPs can and should reproduce the harness-plus-abliterated-open-weight combination in their own controlled laboratories — not to weaponize it, but to ground their threat assessments in first-hand observation rather than secondhand reporting. The integrity of MSP threat advice depends on this. We strongly encourage peer MSPs to do the same, to make the same kinds of disclosure to clients that we have made in our own client letters, and to apply the same editorial discipline (reproduction at the architectural level only, no operational uplift, written-authorization-only client engagements) that this paper documents.
11. Limitations and what we did not establish
This paper is grounded in published reporting and in our preliminary laboratory verification (Section 6). There are many things we did not establish, and v2.0 leans harder on this section than v1.0 did because the reviewer was right that v1.0 underplayed it.
11.1 Things we did not measure in the lab
· We did not establish the maximum capability of the combination. Our reproduction confirmed that the harness loads, the tool layer functions, and basic autonomous tasks complete in our test configuration. We did not specialize for offensive operations and did not measure effective capability against realistic attack scenarios.
· We did not run any standardized benchmark. No CAIBench, no arXiv 2603.11214 cyber ranges, no CyberSOCEval, no SWE-bench Pro. The capability of our reproduction relative to any published baseline is unknown.
· We did not measure repeatability. A single successful run does not establish whether the result is robust or fortunate. Without multi-seed reruns we cannot quantify variance.
· We did not measure cross-model portability. We tested one model family. The "model-agnostic" framing in v1 was unsupported by this; the more accurate statement is the one in §6.1: in our tested configuration, the harness did not require Anthropic-hosted services.
· We did not measure long-session context-compaction survival. The leaked harness contains machinery for this; we did not test where it fails.
· We did not measure false-positive, true-positive, or partial-credit rates on any security task. We have no data on the system's reliability as a finder or as a validator.
· We did not compare against the commercial Claude Code product. A side-by-side run on a fixed task pack would tell us how much capability has been retained or lost in the modified configuration. We did not do this.
11.2 Things we did not measure outside the lab
· We did not measure threat-actor adoption. We do not know how many threat-actor groups have already weaponized the harness as of April 10, 2026. The historical precedent of WormGPT, FraudGPT, GhostGPT, and similar dark-web LLM products suggests packaged offensive variants will appear within weeks of a leak of this magnitude, but we have no direct intelligence on the current state of weaponization.
· We did not conduct incident-rate forecasting. Our forecast in 02_Forecast_Model.md is qualitative, grounded in historical analogues; it is not a quantitative model.
· We did not test defensive tooling against the reproduction. A future revision could test specific EDR products, network segmentation configurations, and identity-detection rules against a reproduced agent in a controlled environment. We did not do this for v2.0.
11.3 The future-work commitment
A future revision (v2.1) will evaluate the reproduced system using staged assessment: baseline model, modified model, and harness-integrated system; repeated trials across at least two model families; and a bounded task portfolio measuring task completion, step count, runtime, repeatability, and false-positive rate. The full framework is in Section 12. v2.1 will be published when the staged evaluation has been run and the results are ready to report.
11.4 The Private Evidence Appendix
A separate Private Evidence Appendix is maintained for distribution-controlled review and will contain artifact hashes, configuration records, hardware details, run logs, and benchmark outputs as additional validation work is completed. The appendix exists today as a placeholder structure (most sections are explicitly marked draft) and will be populated as v2.1 work proceeds. It is not part of the public v2.0 paper bundle. Vetted reviewers, peer MSPs, partners, counsel, insurers, or regulators may request access directly from Zoltis.
11.5 The currency caveat
The analysis is current as of April 10, 2026. This is a fast-moving situation. By the time this paper is widely distributed, the landscape may have changed in ways that require further revision. We commit to revising this paper if any of the following occur: a packaged offensive variant of the leaked harness is publicly attributed to an active threat-actor group; Anthropic publishes additional confirmation, retraction, or technical detail about the leak; a comparable leak from another AI vendor; a defensive technique that materially changes the threat picture; or a regulatory action that creates new compliance obligations relevant to SMB defenders.
12. Measurement framework for v2.1 (new in v2.0)
This section is new in v2.0. It exists because the v1.0 paper's strongest claims were not adequately backed by measurement, and the v2.0 reframing addresses that by lowering the rhetoric. v2.1 will address it by raising the evidence. This section describes the staged evaluation Zoltis intends to run as a benchmarked follow-up paper.
12.1 Why this section exists
The peer review of v1.0 correctly observed that strong sentences in the paper (notably the "operationally equivalent to threat-actor tooling" claim and the universal "model-agnostic" framing) read as findings when they were really a mix of finding, inference, and forecast. v2.0 fixes this by softening the rhetoric. v2.1 will fix it by replacing soft observations with measured results. Section 12 is the bridge between the two.
12.2 Standards we will adopt
The v2.1 measurement framework is grounded in published industry standards and benchmarks. The specific frameworks Zoltis intends to anchor against:
· Frontier Model Forum — Managing Advanced Cyber Risks in Frontier AI Frameworks (February 13, 2026). The FMF report defines a 3-stage evaluation pattern: pre-safeguard (assess maximum cyber capability before any safety measures), post-safeguard (evaluate safeguard effectiveness), pre-deployment (assess capability as close to deployment as possible). v2.1 will adopt this 3-stage pattern explicitly. URL: https://www.frontiermodelforum.org/technical-reports/managing-advanced-cyber-risks-in-frontier-ai-frameworks/
· CAIBench: A Meta-Benchmark for Evaluating Cybersecurity AI Agents (Alias Robotics, October 2025; arXiv 2510.24317). CAIBench is the closest existing meta-benchmark for cybersecurity AI agents, with 10,000+ instances across five categories: Jeopardy-style CTFs, Attack-Defense CTFs, Cyber Range exercises, cybersecurity knowledge benchmarks, and privacy/CyberPII-Bench. v2.1 will use CAIBench as its primary capability anchor. URL: https://arxiv.org/abs/2510.24317
· arXiv 2603.11214 — Measuring AI Agents' Progress on Multi-Step Cyber Attack Scenarios (Folkerts, Payne, et al., March 2026). This paper documents the metric portfolio and the cyber-range methodology that v2.1 will most closely mirror. v2.1 will not run the specific cyber ranges from the paper but will adopt the metric definitions. URL: https://arxiv.org/abs/2603.11214
· CyberSOCEval (Meta + CrowdStrike, September 2025), integrated into CyberSecEval 4. The defender-side complement to CAIBench. v2.1 will use CyberSOCEval to evaluate detection-tier reasoning. GitHub: https://github.com/CrowdStrike/CyberSOCEval_data
· Note on SWE-bench Verified. OpenAI's own retraction in 2026 ("Why SWE-bench Verified no longer measures frontier coding capabilities") documents that SWE-bench Verified is contaminated and unreliable for frontier model evaluation. v2.1 will not use SWE-bench Verified. If a coding-benchmark anchor is required, v2.1 will use SWE-bench Pro (https://www.swebench.com/) instead.
12.3 Metric portfolio
Per the peer reviewer's recommendation and consistent with the metric definitions in arXiv 2603.11214 and CAIBench, v2.1 will measure:
· Task completion rate — fraction of tasks in the chosen pack that the system completes successfully
· Median wall-clock time — end-to-end time per task
· Median tool calls / steps — number of tool invocations per successful completion
· Rerun variance across 5–10 seeds or repeats — to distinguish robust capability from fortunate single runs
· Context-compaction survival on long sessions — whether the harness's context-management machinery sustains capability across the threshold the leaked code is designed to handle
· False-positive / true-positive / partial-credit rate on security findings — for security-oriented tasks where the system makes claims about a target
· Cost per successful completion — wall-clock plus inference compute, normalized to dollars where possible
Each metric will be reported per-stage (baseline / modified / harness-integrated) and per-model-family.
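Most of the portfolio above can be computed mechanically from per-run logs. The sketch below covers the directly aggregable subset (completion rate, medians, rerun variance, cost per success); the record schema is illustrative, not the Zoltis log format, and context-compaction survival and FP/TP scoring require task-specific grading not shown here:

```python
from statistics import median, pstdev

def metric_portfolio(runs):
    """Aggregate per-run records into a subset of the §12.3 metrics.
    Each run is assumed to look like (illustrative schema):
      {"task": str, "success": bool, "seconds": float,
       "tool_calls": int, "cost_usd": float}"""
    successes = [r for r in runs if r["success"]]
    return {
        "task_completion_rate": len(successes) / len(runs),
        # Medians over successful runs only; None when nothing succeeded.
        "median_wall_clock_s": median(r["seconds"] for r in successes) if successes else None,
        "median_tool_calls": median(r["tool_calls"] for r in successes) if successes else None,
        # Population std-dev of the 0/1 outcome: distinguishes robust
        # capability from fortunate single runs.
        "rerun_variance": pstdev(1.0 if r["success"] else 0.0 for r in runs),
        # Total spend amortized over successes, per the §12.3 definition.
        "cost_per_success_usd": (sum(r["cost_usd"] for r in runs) / len(successes)) if successes else None,
    }
```

Charging failed runs' cost against the successes (rather than averaging over all runs) is deliberate: it reflects what an operator actually pays per usable result.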
12.4 Operational definitions
The two terms the peer reviewer flagged as overclaimed in v1.0 are operationalized here for v2.1:
· "Model-agnostic" is operationalized as: same harness, same tool layer, same prompt budget, same task suite, success above a defined threshold on at least two materially different open-weight model families. This is a Zoltis methodological contribution and is more rigorous than the loose "applies to all model types" usage in current benchmark literature. v2.1 will state the threshold and the chosen model families before running the evaluation.
· "Operationally equivalent" is operationalized as: relative performance against (a) the original commercial Claude Code product on a benign engineering task pack, (b) the Zoltis baseline harness, and optionally (c) a public open-agent baseline. The threshold is a quantitative comparison on a fixed task pack, not a qualitative impression from one session. v2.0 explicitly does not make any "operationally equivalent" claim; v2.1 will make the claim only if the measured comparison supports it.
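The "model-agnostic" operationalization above reduces to a simple decision rule once per-family completion rates exist. The function name, the example family labels, and the threshold value are illustrative; v2.1 will fix the actual threshold and families before running the evaluation:

```python
def meets_model_agnostic_bar(rates_by_family: dict, threshold: float) -> bool:
    """Operationalized "model-agnostic" check: same harness, same tool layer,
    same prompt budget, same task suite, with completion rate at or above
    `threshold` on at least two materially different model families.
    `rates_by_family` maps family name -> task completion rate (0.0-1.0)."""
    passing = [f for f, rate in rates_by_family.items() if rate >= threshold]
    return len(passing) >= 2

# Illustrative: passes only if two or more families clear the bar.
# meets_model_agnostic_bar({"family_a": 0.8, "family_b": 0.7}, 0.6) -> True
# meets_model_agnostic_bar({"family_a": 0.8, "family_b": 0.4}, 0.6) -> False
```

Pre-registering the threshold and the family list, as the text commits to, is what keeps the rule from being fit to the results after the fact.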
12.5 Staged evaluation protocol
Aligned with the FMF 3-stage pattern:
1. Stage 1 — Baseline model evaluation. The official open-weight model (e.g., the upstream Qwen/Qwen3.5-35B-A3B or comparable from a second model family) is run against the chosen task pack with no modifications, no harness, and no tool layer beyond what the model exposes natively. This establishes the floor.
2. Stage 2 — Modified model evaluation. The abliterated derivative wired to the leaked Claude Code harness is run against the same task pack. Differences from Stage 1 are attributable to the abliteration plus the harness scaffolding combined.
3. Stage 3 — Comparison and reporting. Quantitative comparison across the metric portfolio defined in §12.3, with explicit discussion of which differences are attributable to the abliteration step alone, the harness step alone, or the combination. v2.1 will publish the comparison.
12.6 What v2.1 will publish
· Public: aggregated results across the metric portfolio, methodology, comparison across model families, the staged evaluation conclusions, and any updates to this v2.0 paper that the measurement work warrants
· Private (Evidence Appendix): per-run data, configurations, hardware details, file hashes, screenshots, run logs, and any operational artifacts that would constitute offensive uplift if published openly
12.7 Limits of the framework
v2.1's measurement framework is designed to substantiate the capability claims in v2.0. It is not designed to substantiate the adoption claims (whether threat actors are using this in the wild) or the incident-rate claims (how often it succeeds against real defenders). Those remain outside the scope of any measurement Zoltis can conduct in our own laboratory and would require either coordinated industry data or partnership with a threat-intelligence vendor with telemetry.
13. Conclusion
The conventional cybersecurity narrative around the events of April 2026 has focused on Mythos and the Powell-Bessent meeting. We argue that the Claude Code source leak of March 31 is the more consequential event for small and mid-sized business defenders, because it materially reduces the engineering work that previously gated commodity-threat-actor access to autonomous offensive AI. Combined with the mature open-weight model ecosystem and the publicly documented abliteration technique, the leak places the architectural building blocks for autonomous offensive AI within reach of practitioners with consumer-grade hardware.
Zoltis Technologies conducted a preliminary laboratory verification of this combination. We did this so that the recommendations we make to our clients are grounded in first-hand observation, not in secondhand reporting. We are publishing this paper so that other defenders — peer MSPs, in-house security teams, and the broader security community — can ground their own assessments in the same observation. We are equally publishing our limitations clearly, because the v1.0 release of this paper was justly criticized for under-reporting them.
The recommendations in Section 9 are not novel. They are the same fundamentals competent MSPs have been recommending for years. What we argue has changed is the deadline. A compressed schedule on a familiar list of work is the actionable takeaway from this paper. Organizations that have been deferring fundamental cybersecurity hygiene on the assumption that "we have time to get to it" should reconsider that assumption.
We are not panicked. We are calibrated. The threat we describe is real, the architectural capability is here, and the defensive posture that responds adequately to it is well-understood. The work to do is the work that has always needed doing; only the schedule, we argue, has compressed.
v2.0 is published as a positional defensive analysis paper. v2.1 will follow as a benchmarked companion when the lab work in Section 12 has been run against a published task suite. We commit to revising both documents if the threat landscape, the evidence, or the analysis materially changes.
We encourage every reader who manages, advises, or is responsible for the cybersecurity posture of a small or mid-sized organization to take the events of April 2026 as the trigger to finally do the basics. We are at the disposal of any peer MSP, client, regulator, or security research colleague who would benefit from a conversation about any element of this paper.
14. About Zoltis Technologies
Zoltis Technologies is a managed IT services provider headquartered in Van Nuys, California, founded in 2005. We manage cybersecurity, infrastructure, and IT operations for small and mid-sized clients across the legal, medical, and professional-services verticals in Los Angeles and the broader Southern California region. Our work spans Microsoft 365, Google Workspace, Azure, ConnectWise Manage, ScreenConnect, and bespoke automation environments. Our threat-intelligence practice supports our managed-services clients with calibrated, source-cited risk assessments and remediation roadmaps.
We are a small organization. We do not have a security research lab in the traditional sense — what we have is operational discipline, an engineering team that can read source code, and a strong commitment to grounding our client recommendations in first-hand observation. The reproduction documented in Section 6 was conducted by Zoltis engineers as part of our normal threat-intelligence operations.
Zoltis Technologies | (323) 212-3002 | zoltis.com
Address: 15149 Domino St, Van Nuys, CA 91411
CEO: Karel Rodriguez
15. References
Primary sources — Anthropic disclosures
· Anthropic. "Claude Mythos Preview." red.anthropic.com/2026/mythos-preview/, April 7, 2026.
· Anthropic. Public statement on Claude Code source disclosure, April 1, 2026 (quoted in Bloomberg, Axios, TechCrunch, Fortune coverage).
Primary sources — Claude Code source leak coverage
· Axios. "Anthropic leaks source code for its AI coding agent Claude." March 31, 2026. https://www.axios.com/2026/03/31/anthropic-leaked-source-code-ai
· Bloomberg. "Anthropic Rushes to Limit Leak of Claude Code Source Code." April 1, 2026. https://www.bloomberg.com/news/articles/2026-04-01/anthropic-scrambles-to-address-leak-of-claude-code-source-code
· Fortune. "Anthropic leaks its own AI coding tool's source code in second major security breach." March 31, 2026. https://fortune.com/2026/03/31/anthropic-source-code-claude-code-data-leak-second-security-lapse-days-after-accidentally-revealing-mythos
· InfoQ. "Anthropic Accidentally Exposes Claude Code Source via npm Source Map File." April 2026. https://www.infoq.com/news/2026/04/claude-code-source-leak/
· TechCrunch. "Anthropic took down thousands of GitHub repos trying to yank its leaked source code." April 1, 2026. https://techcrunch.com/2026/04/01/anthropic-took-down-thousands-of-github-repos-trying-to-yank-its-leaked-source-code-a-move-the-company-says-was-an-accident/
· VentureBeat. "Claude Code's source code appears to have leaked: here's what we know." April 2026.
· VentureBeat. "In the wake of Claude Code's source code leak, 5 actions enterprise security leaders should take now." April 2026. https://venturebeat.com/security/claude-code-512000-line-source-leak-attack-paths-audit-security-leaders
Primary sources — Powell/Bessent meeting (Tuesday, April 7, 2026)
The meeting date is Tuesday, April 7, 2026. CNBC, Bloomberg, Reuters, and CBS News all describe the meeting as taking place "on Tuesday," and Tuesday of that week was April 7, 2026 — April 8 was a Wednesday. Two earlier drafts of this paper (v1.0 and the first v2.0 draft) incorrectly listed April 8; the error has been corrected in this revision. This means the Mythos Preview announcement and the Treasury/Fed bank-CEO meeting took place on the same day, not on consecutive days.
· Bloomberg. "Bessent, Powell Summon Bank CEOs to Urgent Meeting Over Anthropic's New AI Model." April 10, 2026. https://www.bloomberg.com/news/articles/2026-04-10/anthropic-model-scare-sparks-urgent-bessent-powell-warning-to-bank-ceos
· CBS News. "Mythos Anthropic AI cybersecurity risks — Powell and Bessent meet bank CEOs." April 10, 2026. https://www.cbsnews.com/news/mythos-anthropic-ai-cybersecurity-risks-powell-bessent/
· CNBC. "Powell, Bessent discussed Anthropic's Mythos AI cyber threat with major U.S. banks." April 10, 2026. https://www.cnbc.com/2026/04/10/powell-bessent-us-bank-ceos-anthropic-mythos-ai-cyber.html
· Fortune. "Bessent, Powell convene bank CEOs over Anthropic Mythos AI cyber risk." April 10, 2026. https://fortune.com/2026/04/10/bessent-powell-anthropic-mythos-ai-model-cyber-risk/
· Reuters. "Bessent, Powell warned bank CEOs about Anthropic model risks, sources say." April 10, 2026. https://www.reuters.com/business/finance/bessent-powell-warn-bank-ceos-about-anthropic-model-risks-bloomberg-news-reports-2026-04-10/
· CoinDesk. "Mythos AI threat prompts Bessent, Powell to convene bank CEOs." April 10, 2026.
Market reaction (April 9, 2026)
· Reuters. "US software stocks slump on renewed AI disruption jitters." April 9, 2026. https://www.reuters.com/business/us-software-stocks-fall-anthropics-new-ai-model-revives-disruption-fears-2026-04-09/ — narrative-text figures verified via The Globe and Mail syndication at https://www.theglobeandmail.com/investing/article-us-software-stocks-fall-as-anthropics-new-ai-model-revives-disruption/
Threat intelligence — Claude Code leak exploitation
· Zscaler ThreatLabz. "Anthropic Claude Code Leak." April 2026. https://www.zscaler.com/blogs/security-research/anthropic-claude-code-leak
· Trend Micro. "Weaponizing Trust Signals: Claude Code Lures and GitHub Release Payloads." April 2026. https://www.trendmicro.com/en_us/research/26/d/weaponizing-trust-claude-code-lures-and-github-release-payloads.html
· Bleeping Computer. "Claude Code Leak used to push infostealer malware on GitHub." April 2026.
· GBHackers. "Claude Code Leak Exploited to Spread Vidar and GhostSocks via GitHub Releases." April 2026.
· Help Net Security. "Claude Code source leak exploited to spread malware." April 3, 2026. https://www.helpnetsecurity.com/2026/04/03/claude-code-leak-github-malware/
Check Point Research — Claude Code project-file vulnerabilities (separate from the source leak)
These CVEs concern a distinct vulnerability class — token exfiltration and RCE through Claude Code project configuration files (hooks, settings, project-level config) — and predate the March 31 source-leak event by months. v1.0 of this paper incorrectly grouped them with the source-leak event.
· Check Point Research. "Caught in the Hook: RCE and API Token Exfiltration Through Claude Code Project Files | CVE-2025-59536 | CVE-2026-21852." February 25, 2026. https://research.checkpoint.com/2026/rce-and-api-token-exfiltration-through-claude-code-project-files-cve-2025-59536/
· NIST National Vulnerability Database. CVE-2025-59536. Published October 3, 2025. https://nvd.nist.gov/vuln/detail/CVE-2025-59536
· NIST National Vulnerability Database. CVE-2026-21852. Published January 21, 2026. https://nvd.nist.gov/vuln/detail/CVE-2026-21852
Independent validation of Mythos-class vulnerability discovery
· Help Net Security. "Claude helps researcher dig up decade-old Apache ActiveMQ RCE vulnerability." April 9, 2026 (CVE-2026-34197). https://www.helpnetsecurity.com/2026/04/09/apache-activemq-rce-vulnerability-cve-2026-34197-claude/
Open-weight models — Qwen3.5-35B-A3B (provenance)
· Qwen / Alibaba. Qwen3.5-35B-A3B (official base model). Released February 24, 2026. https://huggingface.co/Qwen/Qwen3.5-35B-A3B
· HauhauCS. Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive (third-party derivative). https://huggingface.co/HauhauCS/Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive
· huihui-ai. Huihui-Qwen3.5-35B-A3B-abliterated (third-party derivative). https://huggingface.co/huihui-ai/Huihui-Qwen3.5-35B-A3B-abliterated
Abliteration technique and toolkits
· NousResearch. llm-abliteration toolkit. https://github.com/NousResearch/llm-abliteration
· "OBLITERATUS Strips AI Safety From Open Models in Minutes." awesomeagents.ai, 2026. https://awesomeagents.ai/news/obliteratus-strips-ai-safety-open-models/
· "Heretic AI Abliteration Benchmarks vs GPT-4 Safety." aithinkerlab.com, 2026. https://aithinkerlab.com/heretic-ai-abliteration-benchmarks-2026/
· arXiv 2510.02768. "A Granular Study of Safety Pretraining under Model Abliteration." https://arxiv.org/html/2510.02768v1
· arXiv 2505.19056. "An Embarrassingly Simple Defense Against LLM Abliteration Attacks." https://arxiv.org/html/2505.19056v1
· "Abliteration: Removing AI Refusals Without Retraining." Envisioning Vocab, 2026. https://www.envisioning.com/vocab/abliteration
Threat-actor AI adoption (historical context)
· CrowdStrike. 2026 Global Threat Report. https://www.crowdstrike.com/en-us/press-releases/2026-crowdstrike-global-threat-report/
· Microsoft Security Blog. "AI as Tradecraft: How Threat Actors Operationalize AI." March 6, 2026.
· Google Cloud Threat Intelligence Group (GTIG). "Threat Actor Usage of AI Tools" (PROMPTFLUX, PROMPTSTEAL coverage). 2026.
· The Record. "New malware uses AI to adapt." 2026.
Chinese open-weight model landscape
· Hugging Face. "State of Open Source on Hugging Face: Spring 2026."
· BenchLM. "State of LLM Benchmarks 2026: Rankings, Trends, and What Actually Changed." February 2026.
· Simon Willison. "GLM-5.1: Towards Long-Horizon Tasks." April 7, 2026.
· MIT Technology Review. "What's next for Chinese open-source AI." February 12, 2026.
Measurement framework (new in v2.0)
References for Section 12. v2.1 will execute against these standards.
· Frontier Model Forum. "Managing Advanced Cyber Risks in Frontier AI Frameworks." February 13, 2026. https://www.frontiermodelforum.org/technical-reports/managing-advanced-cyber-risks-in-frontier-ai-frameworks/
· Alias Robotics. "CAIBench: A Meta-Benchmark for Evaluating Cybersecurity AI Agents." arXiv 2510.24317, October 2025. https://arxiv.org/abs/2510.24317
· Folkerts, L., Payne, W., Inman, S., et al. "Measuring AI Agents' Progress on Multi-Step Cyber Attack Scenarios." arXiv 2603.11214, March 2026. https://arxiv.org/abs/2603.11214
· CrowdStrike + Meta. "CyberSOCEval." Released September 2025. https://github.com/CrowdStrike/CyberSOCEval_data and https://www.crowdstrike.com/en-us/press-releases/crowdstrike-and-meta-deliver-new-benchmarks-for-evaluation-of-ai-performance-in-cybersecurity/
· OpenAI. "Why SWE-bench Verified no longer measures frontier coding capabilities." 2026. https://openai.com/index/why-we-no-longer-evaluate-swe-bench-verified/
· SWE-bench Pro. https://www.swebench.com/
Zoltis Technologies — Managed IT Services — Los Angeles — 2005 to present