<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://hammadtariq.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://hammadtariq.github.io/" rel="alternate" type="text/html" /><updated>2026-05-12T05:49:43+00:00</updated><id>https://hammadtariq.github.io/feed.xml</id><title type="html">Hammad Tariq</title><subtitle>Technical blog by Hammad Tariq exploring AI agents, infrastructure, decentralized identity, strategy frameworks, and systems thinking.</subtitle><author><name>Hammad Tariq</name></author><entry><title type="html">The Half-Life of Tooling: A Startup Note on the CPU Convergence</title><link href="https://hammadtariq.github.io/startups/half-life-of-tooling-cpu-convergence/" rel="alternate" type="text/html" title="The Half-Life of Tooling: A Startup Note on the CPU Convergence" /><published>2026-05-12T00:00:00+00:00</published><updated>2026-05-12T00:00:00+00:00</updated><id>https://hammadtariq.github.io/startups/half-life-of-tooling-cpu-convergence</id><content type="html" xml:base="https://hammadtariq.github.io/startups/half-life-of-tooling-cpu-convergence/"><![CDATA[<p class="align-center"><img src="/assets/images/half-life-of-tooling.png" alt="The Half-Life of Tooling — when the agent writes the tool, what survives?" /></p>

<p>I have been using Hermes with Codex.</p>

<p>Each kanban card takes 15 to 18 minutes to complete. I am neurotic when it comes to code quality, so I let it run.</p>

<p>While I waited, I had ARM and Intel charts open in another tab.</p>

<p>That is where this piece comes from.</p>

<h2 id="the-market-figured-it-out-first">The market figured it out first</h2>

<p>The names that should be losing the AI trade are winning it.</p>

<p>Some receipts from the last few months:</p>

<ul>
  <li>ARM: up roughly 84% YTD</li>
  <li>Intel: up roughly 240% YTD, up 500% in twelve months</li>
  <li>AMD: up roughly 112% YTD</li>
  <li>Nvidia: up roughly 15% YTD</li>
</ul>

<p>Mizuho called it “a changing of the guard in AI.” That is the right phrase.</p>

<p>For three years the AI infrastructure narrative was simple: more GPUs, faster GPUs, bigger GPU clusters. The model was the constraint. Everything else was plumbing.</p>

<p>That narrative has cracked.</p>

<p>Intel surged 30% in a single day on Q1 earnings, with the CEO openly saying CPU-to-GPU ratios are moving from 1:8 toward parity. ARM’s data center royalties more than doubled year-over-year for the third quarter in a row. AMD’s Lisa Su told investors the server CPU TAM is now projected to grow over 35% annually to more than $120 billion by 2030. She doubled that forecast in six months.</p>

<p>The market priced in something the AI press took a year longer to admit.</p>

<h2 id="what-i-converged-on-watching-hermes">What I converged on watching Hermes</h2>

<p>Watching a coding agent run is a strange exercise. You stop looking at the output log and just notice the latency.</p>

<p>Most of it is not the model.</p>

<p>The model does its thing in a couple of seconds. Then the rest happens somewhere I cannot see. The agent spins up a sandbox to try the code. Reads files. Parses JSON. Decides whether to retry. Calls another tool. Waits on a vector lookup. Hands off to a sub-agent. Writes back to memory.</p>

<p>Each step is tiny. Each task has thousands of them. None of them parallelize on a GPU.</p>

<p>This connects to something I wrote about after <a href="http://hammadtariq.com/startups/iclr-2026-stateful-ai-memory-as-substrate/">ICLR</a>: the model is no longer the whole product. The system around the model is.</p>

<p>And that system runs on a CPU.</p>

<p>Put those two thoughts together, and the ARM and Intel charts stop being a market quirk. They are pricing in a hardware truth that has been visible from inside the agent loop for months.</p>

<h2 id="what-the-system-runs-on-a-cpu-actually-means">What “the system runs on a CPU” actually means</h2>

<p>Arm’s own framing is the cleanest version of this. In its FY26 earnings letter, the company said data centers will need 4x current CPU capacity per gigawatt as agentic AI scales. Multi-agent systems push token generation up by roughly 15x per user. Across most agentic workloads, 50% to 90% of end-to-end latency happens on the CPU side, between the inferences.</p>

<p>That is not a marketing pivot. That is a budget reality.</p>

<p>GPUs do inference. CPUs do everything between the inferences. In an agentic system, “everything between the inferences” is most of the wall-clock time.</p>
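
<p>One way to put numbers on this is Amdahl’s law. That framing is mine, not something from Arm’s letter: if a fraction f of wall-clock time sits on the CPU side and that part gets s times faster, the end-to-end speedup is 1 / ((1 - f) + f / s). A minimal sketch, using the 50% and 90% shares quoted above and a 4x CPU-side speedup chosen purely for illustration:</p>

```python
def end_to_end_speedup(cpu_fraction, cpu_speedup):
    """Amdahl's law: only the CPU-side share of latency gets faster."""
    if cpu_fraction > 1.0 or 0.0 > cpu_fraction:
        raise ValueError("cpu_fraction must be between 0 and 1")
    if 0.0 >= cpu_speedup:
        raise ValueError("cpu_speedup must be positive")
    gpu_fraction = 1.0 - cpu_fraction  # inference time, left unchanged here
    return 1.0 / (gpu_fraction + cpu_fraction / cpu_speedup)


# Illustrative inputs: CPU-side share of latency, with a hypothetical 4x faster CPU side.
for share in (0.5, 0.9):
    gain = end_to_end_speedup(share, 4.0)
    print(f"CPU share {share:.0%}: {gain:.2f}x faster end to end")
```

<p>Under those assumptions the task gets about 1.6x faster at a 50% CPU share and about 3.1x faster at 90%. The larger the share of time between the inferences, the more a CPU upgrade shows up in wall-clock time.</p>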

<p>This is why a Hermes card takes 15 to 18 minutes. This is why ARM is up 84%. It is the same fact, observed from two different chairs.</p>

<h2 id="how-this-gets-resolved">How this gets resolved</h2>

<p>Two phases.</p>

<p><strong>Phase 1: more CPUs, better CPUs.</strong></p>

<p>This is what the market is already trading. ARM AGI CPU (136 Neoverse V3 cores, custom silicon, $2 billion in FY27/FY28 demand from Meta, OpenAI, Cerebras, and others). Intel’s data center segment growing 22% on agentic demand. AMD EPYC Venice with 256 Zen 6 cores. NVIDIA’s own Vera CPU, Arm-based, paired with Rubin GPUs. Google Axion, Microsoft Cobalt, Amazon Graviton at $20B+ run rate.</p>

<p>If the bottleneck is CPU orchestration in a fixed power envelope, then for the next 12 to 24 months the answer is to throw better general-purpose CPUs at it. That is the trade the market is making right now.</p>

<p><strong>Phase 2: custom silicon for the hot parts.</strong></p>

<p>Once the general-purpose CPU push has done what it can, the next question becomes which parts of the agent loop are stable enough and hot enough to burn into a chip.</p>

<p>My bet on three:</p>

<ul>
  <li><strong>Retrieval and reranking.</strong> A memory-attached accelerator does roughly 20x more queries per watt at scale. Academic prototypes (ANNA, Falcon, Chameleon) have already shown the ceiling. This is the only one of the three with a real standalone business.</li>
  <li><strong>Sandbox creation.</strong> Every tool call needs a fresh isolated environment. Today that is around 100 ms of overhead per call. Hardware-enforced capability switching, the CHERI line of work, can take it to microseconds. Across one agent task, seconds saved.</li>
  <li><strong>Constrained output.</strong> The part that forces the model to emit valid JSON or code. Small. Hot. Regular. Better licensed as IP than sold as a chip.</li>
</ul>
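
<p>To make the constrained-output item concrete, here is a toy sketch of the idea: a finite-state machine tracks where the output is in a grammar, and at each step the decoder may only pick from the tokens that state allows. The grammar (a tiny JSON object with one boolean field), the token set, and the scores are all made up for illustration. Real constrained decoders work over a model’s full vocabulary and a much larger grammar.</p>

```python
# Toy grammar: {"ok":true} or {"ok":false}, one token per transition.
# TRANSITIONS[state][token] gives the next state. States and tokens are hypothetical.
TRANSITIONS = {
    "start": {"{": "key"},
    "key": {'"ok"': "colon"},
    "colon": {":": "value"},
    "value": {"true": "close", "false": "close"},
    "close": {"}": "done"},
}


def allowed_tokens(state):
    """Tokens the grammar permits from this state (empty once finished)."""
    return set(TRANSITIONS.get(state, {}))


def constrained_decode(score_steps):
    """Greedy decode: keep only grammar-legal tokens, then take the best-scored one."""
    state, output = "start", []
    for scores in score_steps:
        legal = allowed_tokens(state)
        if not legal:
            break  # accepting state reached, nothing more to emit
        # Mask: tokens outside the legal set are never considered,
        # no matter how high the model scored them.
        best = max(legal, key=lambda tok: scores.get(tok, float("-inf")))
        output.append(best)
        state = TRANSITIONS[state][best]
    return "".join(output), state


# Fake model scores that always prefer an invalid token ("hello").
fake_scores = [{"hello": 9.0, "{": 1.0, "true": 2.0, "false": 0.5}] * 5
text, final_state = constrained_decode(fake_scores)
print(text, final_state)
```

<p>The per-step work is a table lookup and a mask over candidate tokens, which is what makes it small, hot, and regular.</p>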

<p>These do not need to ship as three separate products. They get absorbed.</p>

<h2 id="why-the-agi-chip-eats-this-anyway">Why the AGI chip eats this anyway</h2>

<p>Hardware trends usually start as accelerator cards and end up as IP blocks on a single SoC. GPUs absorbed dedicated rasterizers. SoCs absorbed image signal processors, video codecs, NPUs. The TPU started as a side project and is now central to Google’s compute.</p>

<p>The same will happen to agent-loop silicon. The retrieval engine, the sandbox MMU extensions, and the constrained-decode FSM are all small enough in die area that the next generation of AGI chip — whoever ships it, ARM or Nvidia or Meta or someone we have not heard of yet — pulls them onto the same package as the model accelerator. Probably alongside an HBM tier sized for the KV cache and the vector store.</p>

<p>That is the long shape:</p>

<blockquote>
  <p>CPU is the bottleneck → custom accelerators emerge → accelerators get pulled into a single die → the agent loop becomes a hardware-native primitive</p>
</blockquote>

<p>When that happens, “running an agent” stops being a fan-out of services across CPU, GPU, and storage. It becomes a single chip doing all of it.</p>

<h2 id="what-this-means-for-startups">What this means for startups</h2>

<p>This is the part that matters most if you are building.</p>

<p>A large fraction of the current agentic startup landscape is tooling. Tools the agent calls. Wrappers around agents. Integrations the agent invokes. The entire “MCP server for X” surface area.</p>

<p>That category gets eaten in two ways.</p>

<p><strong>First, by the speed of the loop itself.</strong></p>

<p>If a coding agent today takes 4 to 16 hours to build a moderately complex tool, the same tool gets built in 4 to 16 minutes within 12 to 18 months. CPU speedup alone does not deliver that. Stack the CPU work with better base models, parallel sub-agents, cached patterns from prior runs, and improved harnesses, and 50x to 100x on end-to-end task time is realistic. The agent loop compounds the way a compiler pipeline compounds.</p>

<p><strong>Second, by the agent itself choosing whether to build or reuse.</strong></p>

<p>Once an agent can write a tool faster than a human can ship the integration, the agent writes the tool. The decision to call your SaaS becomes a build-vs-buy decision the agent makes in milliseconds. If your moat is “we built the connector to X,” the half-life of that moat shortens every quarter.</p>

<p>The harder, longer-lived layer is the core AI work:</p>

<ul>
  <li>the model itself</li>
  <li>the harness around the model</li>
  <li>the memory and state layer</li>
  <li>the evaluation and reward signals</li>
  <li>the policy and identity layer (this is where I have been spending time with <a href="https://attach.dev">Attach</a> and <a href="https://openbotauth.org">OpenBotAuth</a>)</li>
  <li>the data that feeds it</li>
  <li>the infrastructure that the model cannot generate on its own</li>
</ul>

<p>These are the parts that do not fall to faster CPU loops. They are also the parts most agentic startups are not focused on.</p>

<p>I have been saying versions of this for a while, in the <a href="http://hammadtariq.com/startups/why-agents-need-policy-as-code/">policy-as-code piece</a> and the <a href="http://hammadtariq.com/startups/iclr-2026-stateful-ai-memory-as-substrate/">memory piece</a>. The CPU and silicon angle makes it sharper. The faster the agent loop gets, the more value moves to the parts the agent cannot generate by itself.</p>

<h2 id="what-i-am-watching">What I am watching</h2>

<p>A few signals over the next two to four quarters:</p>

<ol>
  <li><strong>Whether ARM’s AGI CPU ships on time and lands at hyperscalers.</strong> Supply is the bottleneck on the bottleneck. Reuters flagged this in its coverage of the Q4 call.</li>
  <li><strong>Whether a sovereign or regional cloud announces a near-memory retrieval accelerator.</strong> Gulf, Singapore, EU. These are the buyers who care about watts-per-query and cannot wait for the hyperscalers to build internally.</li>
  <li><strong>Whether a CHERI-style capability extension lands in a commercial ARM Neoverse core.</strong> That is the path for sandbox isolation to move from research to production silicon.</li>
  <li><strong>Whether anyone bundles constrained-decode acceleration into an inference appliance.</strong> Groq, Cerebras, SambaNova are the natural homes.</li>
</ol>

<h2 id="the-shape-of-the-next-bottleneck">The shape of the next bottleneck</h2>

<p>The last AI hardware war was about training models.</p>

<p>The next one is about running agents.</p>

<p>The chips look different. The companies winning look different. And the startups that survive will be the ones building the parts the agent cannot build on its own.</p>

<p>That is the convergence I saw watching Hermes work through a kanban card with ARM and Intel charts open in another tab. Same picture, two screens.</p>

<hr />

<p><em>Find me at <a href="https://twitter.com/hammadtariq">@hammadtariq</a> if you are thinking about this.</em></p>

<p><em>Article drafted in conversation with Claude.</em></p>]]></content><author><name>Hammad Tariq</name></author><category term="startups" /><category term="agents" /><category term="ai" /><category term="startups" /><category term="hardware" /><category term="cpu" /><category term="silicon" /><category term="market" /><category term="arm" /><category term="intel" /><category term="infrastructure" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">ICLR 2026: Memory Is the Substrate</title><link href="https://hammadtariq.github.io/startups/iclr-2026-stateful-ai-memory-as-substrate/" rel="alternate" type="text/html" title="ICLR 2026: Memory Is the Substrate" /><published>2026-04-29T00:00:00+00:00</published><updated>2026-04-29T00:00:00+00:00</updated><id>https://hammadtariq.github.io/startups/iclr-2026-stateful-ai-memory-as-substrate</id><content type="html" xml:base="https://hammadtariq.github.io/startups/iclr-2026-stateful-ai-memory-as-substrate/"><![CDATA[<p>I came to ICLR looking for memory systems.</p>

<p>I left thinking memory is only one part of a bigger shift.</p>

<p>The field is moving from models that answer to systems that maintain state, act, evaluate themselves, and improve over time.</p>

<p>That is the signal I took from Rio.</p>

<h2 id="1-the-model-is-no-longer-the-whole-product">1. The model is no longer the whole product</h2>

<p>One of the clearest patterns was that the harness is becoming a first-class object.</p>

<p>In the industry talks, this was obvious. Zillow’s agent architecture was not presented as “we call an LLM and hope it works.” It was a full agent harness: knowledge layer, action layer, user understanding, skills, sub-agents, planning, memory, policy, and execution loops.</p>

<p>The LLM was swappable.</p>

<p>That is the important part.</p>

<p>GPT, Claude, open-weights — they plug into the system. The system is the thing being engineered.</p>

<p>This same pattern appeared in research.</p>

<p>AutoHarness is literally about generating code harnesses around agents. Instead of only improving the model, it improves the wrapper that constrains, guides, and executes behavior.</p>

<p>ADAS — Automated Design of Agentic Systems — pushes this further: define the agentic-system search space in code, then let AI search for better agent architectures.</p>

<p>That is a different mental model.</p>

<h2 id="2-rag-is-becoming-too-small-a-word">2. RAG is becoming too small a word</h2>

<p>RAG retrieves chunks.</p>

<p>Memory maintains state.</p>

<p>That distinction came up again and again.</p>

<p>A lot of the memory work at ICLR was not just “retrieve more context.” It was trying to answer deeper questions:</p>

<ul>
  <li>What should be remembered?</li>
  <li>What should be compressed?</li>
  <li>What should be forgotten?</li>
  <li>What should be retrieved now?</li>
  <li>What should become persistent state?</li>
  <li>What evidence supports a memory?</li>
  <li>How does memory change over time?</li>
</ul>

<p><strong>AssoMem</strong> framed memory as associative retrieval, not just semantic similarity. The point is simple: cosine distance is not enough. Human memory links things through time, importance, relevance, context, and association.</p>

<p><strong>MEM1</strong> pushed the same idea from another angle: long-horizon agents cannot just append all past turns forever. They need a compact internal state that helps memory and reasoning work together.</p>

<p><strong>Limited Memory Language Models</strong> externalized factual knowledge into a database during pretraining instead of burying everything inside opaque weights.</p>

<p><strong>Evoking User Memory</strong> split retrieval into something closer to human cognition: fast familiarity and slower recollection.</p>

<p><strong>HippoRAG</strong> and the <strong>MemAgents</strong> talks kept coming back to a biological intuition: memory is not a pile of text. It is an active mechanism for finding, linking, and using past experience.</p>

<h2 id="3-memory-is-becoming-part-of-self-improvement">3. Memory is becoming part of self-improvement</h2>

<p>The most interesting version of memory is not passive.</p>

<p>It is not just:</p>

<blockquote>
  <p>store → retrieve</p>
</blockquote>

<p>It is:</p>

<blockquote>
  <p>experience → reflect → update memory → retrieve better next time → improve behavior</p>
</blockquote>

<p>That is where memory and recursive self-improvement start to touch.</p>

<p>Jeff Clune’s MemAgents keynote made this explicit. His talk connected memory to a much broader open-endedness agenda: POET, AI-generating algorithms, OMNI-EPIC, ADAS, HyperAgents, ALMA, and The AI Scientist.</p>

<p>The strongest pattern was not “AI improves itself” in some vague sci-fi sense.</p>

<p>It was more concrete: <strong>automate the search over systems.</strong></p>

<ul>
  <li>Search over environments.</li>
  <li>Search over curricula.</li>
  <li>Search over agent architectures.</li>
  <li>Search over memory designs.</li>
  <li>Search over scientific hypotheses.</li>
</ul>

<p><strong>ALMA</strong> was the cleanest memory-specific version of this. The premise is blunt: most memory systems are hand-designed, fixed, and brittle. ALMA tries to meta-learn memory designs for agentic systems.</p>

<p>That is a big idea.</p>

<p>Not:</p>

<blockquote>
  <p>we built a better memory module</p>
</blockquote>

<p>But:</p>

<blockquote>
  <p>the memory architecture itself can be searched over and improved</p>
</blockquote>

<p>That feels like a real direction.</p>

<h2 id="4-rsi-is-becoming-boring-in-a-good-way">4. RSI is becoming boring in a good way</h2>

<p>Recursive self-improvement used to sound like a philosophical argument.</p>

<p>At ICLR, it felt more like systems engineering.</p>

<p>There is no single RSI loop. There are many loops:</p>

<ul>
  <li>code improves</li>
  <li>prompts improve</li>
  <li>memory improves</li>
  <li>tools improve</li>
  <li>data improves</li>
  <li>evaluators improve</li>
  <li>policies improve</li>
  <li>weights sometimes improve</li>
</ul>

<p>That is the grounded version.</p>

<p>The interesting systems are not necessarily self-improving end-to-end. They are improving some component inside a controlled loop.</p>

<p>This matters because “self-improvement” becomes much less magical when you ask:</p>

<p><strong>What exactly is allowed to change?</strong></p>

<ul>
  <li>A prompt?</li>
  <li>A memory schema?</li>
  <li>A tool?</li>
  <li>A reward model?</li>
  <li>A policy?</li>
  <li>A dataset?</li>
  <li>A planner?</li>
  <li>A model checkpoint?</li>
</ul>

<p>That is where the work is.</p>

<p>And the hard part is not just improvement. It is <strong>safe</strong> improvement.</p>

<ul>
  <li>How do you prevent reward hacking?</li>
  <li>How do you prevent memory corruption?</li>
  <li>How do you know the system improved instead of overfitting the evaluator?</li>
  <li>How do you roll back a bad update?</li>
</ul>

<p>That is where the next tooling layer will appear.</p>

<h2 id="5-world-models-are-about-consistency-not-just-generation">5. World models are about consistency, not just generation</h2>

<p>World models were another major signal.</p>

<p>The shallow version is:</p>

<blockquote>
  <p>generate future video</p>
</blockquote>

<p>The deeper version is:</p>

<blockquote>
  <p>maintain a consistent model of the world across time, action, and viewpoint</p>
</blockquote>

<p>That is much harder.</p>

<p>The <strong>ViewRope</strong> world-model poster made this very concrete. The problem is not just making plausible frames. The problem is geometric consistency: if the camera moves, rotates, revisits a place, or closes a loop, does the world stay the same?</p>

<p><strong>Latent Particle World Models</strong> attacked the problem from the object side: object-centric stochastic dynamics, particles, masks, actions, and goals.</p>

<p>These are different techniques, but the same direction: agents need stable internal state about the world.</p>

<p>That is why world models connect back to memory.</p>

<ul>
  <li>Memory is state over experience.</li>
  <li>World models are state over environment.</li>
</ul>

<p>Both are about preventing the system from drifting.</p>

<p>A model that cannot maintain state cannot plan well. A model that cannot test its state against reality cannot improve.</p>

<h2 id="6-science-ai-looked-more-real-than-i-expected">6. Science AI looked more real than I expected</h2>

<p>The science work was better than I expected.</p>

<p>Not because it was flashy. Because it was constrained.</p>

<p>Hard domains force the model to prove something.</p>

<p>In science, you cannot just sound plausible. You need benchmarks, measurements, physical constraints, experiments, or domain-specific evaluation.</p>

<p><strong>AstaBench</strong> was one example: benchmarking scientific research agents across actual research workflows.</p>

<p>The <strong>protein binder</strong> work was another: generative pretraining plus test-time compute for atomistic protein binder design.</p>

<p><strong>WIND</strong> used diffusion for atmospheric modeling, but the interesting part was not “diffusion is cool.” It was using a generative model under physical constraints and inverse-problem structure.</p>

<p>The <strong>fMRI-to-image</strong> paper was also memorable. The key idea was not just better image generation. It was fixing the first stage: mapping brain signals into a better latent representation before reconstruction.</p>

<p>That is the science pattern: the hard part is often the representation and evaluation layer, not just the generator.</p>

<p>This is why hard-science AI is interesting. It forces the stack to become real.</p>

<h2 id="7-safety-and-policy-are-part-of-memory">7. Safety and policy are part of memory</h2>

<p>Another thing became clearer to me: if agents have memory, policy cannot be an afterthought.</p>

<p>A long-lived agent needs to decide:</p>

<ul>
  <li>should this be stored?</li>
  <li>who can access it?</li>
  <li>when can it be retrieved?</li>
  <li>can it be shown to this user?</li>
  <li>should it expire?</li>
  <li>should it be forgotten?</li>
  <li>was it used correctly?</li>
</ul>

<p>Zillow’s fair-housing / inverse-constitutional-AI framing made this concrete. Static rules are brittle. Real policy boundaries are contextual. The same phrase can be okay in one context and problematic in another.</p>

<p>That means memory is not just storage. It is a <strong>policy-gated read/write system.</strong></p>

<p>For enterprise agents, healthcare agents, real-estate agents, finance agents, personal assistants — this is not optional.</p>

<p>Persistent memory without access control, provenance, deletion, and policy is a liability.</p>

<p>So the next memory stack probably has three parts:</p>

<ol>
  <li>structured memory</li>
  <li>retrieval/update policy</li>
  <li>audit and governance</li>
</ol>

<p>That is the product-shaped version.</p>
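
<p>A rough sketch of what “policy-gated read/write” could look like in code, mapping onto the three parts listed above: a structured record, gates on write and read, and an audit log. Every class, field, and rule here is hypothetical and far simpler than a real policy engine; it is only meant to show where the gates sit.</p>

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class MemoryRecord:
    content: str
    source: str                 # provenance: where the memory came from
    allowed_readers: frozenset  # access control
    expires_at: datetime        # retention


@dataclass
class PolicyGatedMemory:
    records: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def write(self, key, record, sensitive=False):
        # Write gate: refuse to persist anything flagged sensitive.
        allowed = not sensitive
        self.audit_log.append(("write", key, record.source, allowed))
        if allowed:
            self.records[key] = record
        return allowed

    def read(self, key, reader, now):
        # Read gate: the record must exist, be unexpired, and list this reader.
        record = self.records.get(key)
        allowed = (
            record is not None
            and record.expires_at > now
            and reader in record.allowed_readers
        )
        self.audit_log.append(("read", key, reader, allowed))
        return record.content if allowed else None

    def forget(self, key):
        # Deletion is a first-class, audited operation.
        existed = self.records.pop(key, None) is not None
        self.audit_log.append(("forget", key, None, existed))
        return existed


now = datetime(2026, 4, 29, tzinfo=timezone.utc)
memory = PolicyGatedMemory()
memory.write(
    "budget",
    MemoryRecord("max price 500k", "chat", frozenset({"agent-a"}), now + timedelta(days=30)),
)
print(memory.read("budget", "agent-a", now))  # permitted reader
print(memory.read("budget", "agent-b", now))  # denied, but still audited
```

<p>The point of the shape is that every read, write, and deletion passes through a decision and leaves a trace, including the ones that were refused.</p>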

<h2 id="8-the-real-convergence">8. The real convergence</h2>

<p>The more I walked around ICLR, the more the same pattern kept reappearing under different names.</p>

<ul>
  <li>Memory people were talking about retrieval, consolidation, state, and forgetting.</li>
  <li>Agent people were talking about harnesses, tools, evaluators, and planning loops.</li>
  <li>World-model people were talking about consistency over time.</li>
  <li>RSI people were talking about systems that update themselves.</li>
  <li>Science people were talking about evaluation, constraints, and closed loops.</li>
</ul>

<p>These are not separate trends. They are converging.</p>

<p>The next AI stack looks something like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>model
+ memory/state
+ tools/actions
+ policy gates
+ evaluators
+ world model
+ update loop
</code></pre></div></div>

<p>The model is still important. But the surrounding system is becoming just as important.</p>

<p>Maybe more important.</p>

<h2 id="what-i-think-comes-next">What I think comes next</h2>

<p>My bet after ICLR:</p>

<h3 id="1-rag-turns-into-memory-infrastructure">1. RAG turns into memory infrastructure</h3>

<p>Basic retrieval becomes commodity. The interesting layer is persistent, structured, temporal memory with provenance, uncertainty, forgetting, and policy.</p>

<h3 id="2-agent-harnesses-become-trainable">2. Agent harnesses become trainable</h3>

<p>The harness will not just be hand-coded. It will be searched, optimized, learned, and improved.</p>

<h3 id="3-memory-managers-become-learned-systems">3. Memory managers become learned systems</h3>

<p>The next memory systems will decide what to store, compress, retrieve, and update.</p>

<p>Not every memory will be a chunk.</p>

<ul>
  <li>Some will be facts.</li>
  <li>Some will be events.</li>
  <li>Some will be summaries.</li>
  <li>Some will be state.</li>
  <li>Some will be policies.</li>
</ul>

<h3 id="4-evaluation-becomes-core-infrastructure">4. Evaluation becomes core infrastructure</h3>

<p>If a system can improve itself, the evaluator becomes part of the product.</p>

<p>Bad evaluator, bad self-improvement.</p>

<h3 id="5-world-models-and-memory-start-merging">5. World models and memory start merging</h3>

<p>For embodied agents, scientific agents, enterprise agents, and personal agents, the system needs a stable model of what is true, what changed, and what might happen next.</p>

<p>That is memory plus world modeling.</p>

<h3 id="6-policy-gated-memory-becomes-mandatory">6. Policy-gated memory becomes mandatory</h3>

<p>The more useful memory becomes, the more dangerous it becomes.</p>

<p>Safe memory is not just “don’t store sensitive things.” It is contextual access, deletion, retention, provenance, and auditability.</p>

<h2 id="my-takeaway">My takeaway</h2>

<p>I left thinking the real direction is <strong>stateful AI</strong>.</p>

<p>The next wave is systems that:</p>

<ul>
  <li>maintain state</li>
  <li>reason over time</li>
  <li>retrieve the right past</li>
  <li>act through tools</li>
  <li>evaluate outcomes</li>
  <li>update themselves</li>
  <li>and stay within policy boundaries</li>
</ul>

<p>That is the frontier I saw in Rio.</p>

<p>If you are building, I would not build another thin wrapper around an LLM.</p>

<p>I would build the state layer.</p>

<p>Because once agents become long-lived, memory is not a feature.</p>

<p><strong>Memory is the substrate.</strong></p>]]></content><author><name>Hammad Tariq</name></author><category term="startups" /><category term="ai" /><category term="agents" /><category term="memory" /><category term="iclr" /><category term="world-models" /><category term="rsi" /><category term="evaluation" /><category term="harness" /><category term="policy" /><summary type="html"><![CDATA[I came to ICLR looking for memory systems.]]></summary></entry><entry><title type="html">Your AI Agent Just Installed a RAT</title><link href="https://hammadtariq.github.io/startups/your-ai-agent-just-installed-a-rat-supply-chain-attacks-meet-attach-guard/" rel="alternate" type="text/html" title="Your AI Agent Just Installed a RAT" /><published>2026-04-01T00:00:00+00:00</published><updated>2026-04-01T00:00:00+00:00</updated><id>https://hammadtariq.github.io/startups/your-ai-agent-just-installed-a-rat-supply-chain-attacks-meet-attach-guard</id><content type="html" xml:base="https://hammadtariq.github.io/startups/your-ai-agent-just-installed-a-rat-supply-chain-attacks-meet-attach-guard/"><![CDATA[<p>Last Monday I <a href="https://x.com/hammadtariq/status/2038828755632918929?s=20">quote-tweeted</a> Feross’s alert about the axios compromise and wrote:</p>

<blockquote>
  <p>“Claude code should come with a dedicated dependency risk checker. Can’t rely on socket-dev mcp, automatic enforcement is required.”</p>
</blockquote>

<p>Two days later I shipped <a href="https://attach.dev/attach-guard">attach-guard</a>.</p>

<p>This post is the full story: what happened to axios and litellm, why AI coding agents make supply chain attacks dramatically worse, and how a single Claude Code hook would have stopped both.</p>

<hr />

<h2 id="one-week-two-megapackages-zero-human-reviewers">One Week, Two Megapackages, Zero Human Reviewers</h2>

<h3 id="axios--march-31-2026">axios — March 31, 2026</h3>

<p>axios is one of npm’s most depended-upon packages. Over 100 million weekly downloads. On March 31st, somebody hijacked the primary maintainer account (<code class="language-plaintext highlighter-rouge">jasonsaayman</code>), changed the associated email to a Proton address, and published two poisoned versions: <strong><code class="language-plaintext highlighter-rouge">axios@1.14.1</code></strong> and <strong><code class="language-plaintext highlighter-rouge">axios@0.30.4</code></strong>.</p>

<p>The only change in both: a new dependency — <code class="language-plaintext highlighter-rouge">plain-crypto-js@4.2.1</code>. A package that did not exist the day before.</p>

<p>That package ran a <code class="language-plaintext highlighter-rouge">postinstall</code> hook executing an obfuscated 4,209-byte JavaScript dropper. The payload: <strong>WAVESHAPER.V2</strong> — a cross-platform RAT with command execution, system exfiltration, and persistence capabilities across Windows, macOS, and Linux.</p>

<p>Google/Mandiant attributed it to <strong>UNC1069</strong>, a financially motivated North Korea-nexus threat actor active since at least 2018.</p>

<p>Socket.dev’s scanner caught <code class="language-plaintext highlighter-rouge">plain-crypto-js</code> within six minutes of publication. The malicious versions were live for roughly two to three hours.</p>

<p>Two hours is more than enough when agents install packages autonomously.</p>

<h3 id="litellm--march-24-2026">litellm — March 24, 2026</h3>

<p>One week earlier: litellm, the LLM proxy that half the AI developer ecosystem depends on.</p>

<p>A threat actor known as <strong>TeamPCP</strong> stole PyPI publishing credentials via a compromised Trivy GitHub Action in litellm’s CI/CD pipeline, then uploaded <strong><code class="language-plaintext highlighter-rouge">litellm==1.82.7</code></strong> and <strong><code class="language-plaintext highlighter-rouge">litellm==1.82.8</code></strong> directly to PyPI — bypassing the normal release process entirely.</p>

<p>The malicious wheel contained <code class="language-plaintext highlighter-rouge">litellm_init.pth</code>. If you’re not familiar with Python’s <code class="language-plaintext highlighter-rouge">.pth</code> trick: the <code class="language-plaintext highlighter-rouge">site</code> module processes those files every time the Python interpreter starts, and any line in them that begins with <code class="language-plaintext highlighter-rouge">import</code> is executed. No import of the package needed. Just existing in <code class="language-plaintext highlighter-rouge">site-packages/</code> is enough.</p>
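
<p>A harmless way to see this behavior for yourself, as a sketch: the standard library’s <code class="language-plaintext highlighter-rouge">site.addsitedir</code> processes <code class="language-plaintext highlighter-rouge">.pth</code> files in a directory the same way the interpreter does for <code class="language-plaintext highlighter-rouge">site-packages/</code> at startup. The file name <code class="language-plaintext highlighter-rouge">demo_init.pth</code> and the <code class="language-plaintext highlighter-rouge">PTH_DEMO_RAN</code> variable are invented for this demo, which only sets an environment variable from a throwaway directory.</p>

```python
import os
import site
import tempfile


def demo_pth_execution():
    """Show that a .pth line starting with 'import' runs when the directory is processed."""
    with tempfile.TemporaryDirectory() as sitedir:
        pth_path = os.path.join(sitedir, "demo_init.pth")
        with open(pth_path, "w", encoding="utf-8") as fh:
            # Nothing imports this file. The site module executes the line itself.
            fh.write("import os; os.environ['PTH_DEMO_RAN'] = '1'\n")

        os.environ.pop("PTH_DEMO_RAN", None)
        site.addsitedir(sitedir)  # same .pth processing the interpreter does at startup
        return os.environ.get("PTH_DEMO_RAN")


result = demo_pth_execution()
print("side effect observed:", result)
```

<p>Nothing in the script imports or calls the <code class="language-plaintext highlighter-rouge">.pth</code> file. Processing the directory is enough to run the line.</p>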

<p>The double-base64-encoded payload:</p>

<ul>
  <li><strong>Stage 1</strong>: Harvested environment variables, API keys, SSH keys, Git credentials, AWS/GCP/Azure/K8s configs, Docker credentials, shell history, crypto wallets, SSL private keys, CI/CD secrets, and database credentials.</li>
  <li><strong>Stage 2</strong>: Encrypted everything with AES-256-CBC + a hardcoded 4096-bit RSA public key, then exfiltrated via POST to <code class="language-plaintext highlighter-rouge">models.litellm.cloud</code> — an attacker-controlled lookalike domain, not the real <code class="language-plaintext highlighter-rouge">litellm.ai</code>.</li>
</ul>

<p>Also live for roughly three hours.</p>

<h3 id="the-pattern">The pattern</h3>

<p>Both attacks targeted maintainer or CI credentials. Both were live for hours. Both hit packages that AI developers install constantly. And both would have been caught by the same two signals: <strong>a suddenly cratered supply chain score</strong> and <strong>a version younger than 48 hours</strong>.</p>
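
<p>Those two signals reduce to a small policy check. The sketch below is illustrative only: the 0-to-100 score scale, the threshold of 70, and the function name are assumptions, not attach-guard’s or Socket’s actual policy. The 48-hour minimum age is the figure from the paragraph above.</p>

```python
from datetime import datetime, timedelta, timezone

MIN_SUPPLY_CHAIN_SCORE = 70           # assumed threshold on an assumed 0-100 scale
MIN_VERSION_AGE = timedelta(hours=48)  # from the text: versions younger than 48 hours


def evaluate_install(score, published_at, now):
    """Return (allowed, reasons) for one package version."""
    reasons = []
    if MIN_SUPPLY_CHAIN_SCORE > score:
        reasons.append(f"supply chain score {score} is below {MIN_SUPPLY_CHAIN_SCORE}")
    age = now - published_at
    if MIN_VERSION_AGE > age:
        reasons.append(f"version is only {age.total_seconds() / 3600:.0f}h old")
    # Either signal on its own is enough to deny.
    return (not reasons, reasons)


now = datetime(2026, 3, 31, 12, 0, tzinfo=timezone.utc)
fresh = evaluate_install(score=95, published_at=now - timedelta(hours=2), now=now)
settled = evaluate_install(score=95, published_at=now - timedelta(days=90), now=now)
print(fresh)
print(settled)
```

<p>The age check matters because a version published two hours ago is held back even if its score still looks healthy.</p>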

<hr />

<h2 id="ai-agents-made-this-worse">AI Agents Made This Worse</h2>

<p>Here’s the part that keeps me up at night.</p>

<p>Claude Code, Cursor, Copilot — they run <code class="language-plaintext highlighter-rouge">npm install</code> and <code class="language-plaintext highlighter-rouge">pip install</code> autonomously, dozens of times per session. No human squinting at the dependency diff. No one to ask “wait, since when does axios need <code class="language-plaintext highlighter-rouge">plain-crypto-js</code>?”</p>

<p>The agent sees a task, decides it needs a package, installs it, and moves on. The entire compromise lifecycle — from <code class="language-plaintext highlighter-rouge">npm install</code> to RAT on your machine — happens in the time it takes you to sip your coffee.</p>

<p><strong>“But I have Socket.dev’s MCP server.”</strong></p>

<p>Good. MCP servers provide advisory context. They inform. They do not enforce. The agent can acknowledge the warning, weigh it against the task, and install anyway. That’s by design — MCP is context, not a guardrail.</p>

<p><strong>“What about a skill?”</strong></p>

<p>Skills are instructions the agent follows when invoked. They guide behavior, but they cannot block actions. An agent can skip a skill. It cannot skip a hook.</p>

<p>The gap was obvious: no open-source, local-first guardrail that sits directly in front of the install command and says <strong>no</strong>.</p>

<hr />

<h2 id="attach-guard-a-hook-not-a-suggestion">attach-guard: A Hook, Not a Suggestion</h2>

<p><a href="https://attach.dev/attach-guard">attach-guard</a> is a Claude Code <strong>PreToolUse hook</strong> that intercepts package installation commands and evaluates them against policy <strong>before execution</strong>.</p>

<p>The distinction matters:</p>

<table>
  <thead>
    <tr>
      <th>Mechanism</th>
      <th>What it does</th>
      <th>Can Claude bypass it?</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>MCP server</strong></td>
      <td>Provides advisory context</td>
      <td>Yes — it’s informational</td>
    </tr>
    <tr>
      <td><strong>Skill</strong></td>
      <td>Instructions Claude follows when invoked</td>
      <td>Yes — can be skipped</td>
    </tr>
    <tr>
      <td><strong>Hook</strong></td>
      <td>Intercepts tool calls deterministically</td>
      <td><strong>No</strong> — runs before execution</td>
    </tr>
  </tbody>
</table>

<p>A security guardrail must be a hook because enforcement requires interception at the tool-call boundary, before the command ever runs.</p>

<h3 id="how-it-works">How it works</h3>

<p>When Claude calls the Bash tool with something like <code class="language-plaintext highlighter-rouge">npm install axios</code>:</p>

<ol>
  <li>Claude Code fires the PreToolUse hook before execution</li>
  <li>The hook pipes the tool input JSON to <code class="language-plaintext highlighter-rouge">attach-guard hook</code> via stdin</li>
  <li>attach-guard parses the command, queries Socket.dev for risk scores</li>
  <li>Returns a decision: <strong>allow</strong>, <strong>ask</strong> (with explanation), or <strong>deny</strong> (blocked)</li>
  <li>On internal errors, exits with code 2 to <strong>fail closed</strong></li>
</ol>
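
<p>The steps above can be sketched in Python. This is not attach-guard’s source. It is a minimal illustration that assumes the hook receives a JSON event with the command under <code class="language-plaintext highlighter-rouge">tool_input.command</code> and answers with a <code class="language-plaintext highlighter-rouge">permissionDecision</code> object, which is my reading of the Claude Code hook docs. <code class="language-plaintext highlighter-rouge">parse_install</code>, <code class="language-plaintext highlighter-rouge">handle</code>, and the <code class="language-plaintext highlighter-rouge">evaluate</code> callback are hypothetical names, and the parser ignores chained commands and wrappers like <code class="language-plaintext highlighter-rouge">sudo</code>.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import json
import shlex
import sys

# Install verbs per package manager (illustrative, not exhaustive).
INSTALL_VERBS = {
    "npm": {"install", "i", "add"},
    "pnpm": {"add", "install", "i"},
    "pip": {"install"},
    "go": {"get"},
    "cargo": {"add"},
}


def parse_install(command):
    """Return (manager, [package specs]) for an install command, else None."""
    tokens = shlex.split(command)
    manager, verb = (tokens + ["", ""])[:2]
    if verb not in INSTALL_VERBS.get(manager, set()):
        return None
    packages = [t for t in tokens[2:] if not t.startswith("-")]
    return manager, packages


def hook_response(decision, reason):
    """Build the JSON printed to stdout (shape assumed from the Claude Code hook docs)."""
    return {
        "hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "permissionDecision": decision,  # "allow", "ask", or "deny"
            "permissionDecisionReason": reason,
        }
    }


def handle(raw_stdin, evaluate):
    """evaluate is a callback taking (manager, packages) and returning (decision, reason)."""
    event = json.loads(raw_stdin)
    command = event.get("tool_input", {}).get("command", "")
    parsed = parse_install(command)
    if parsed is None:
        return hook_response("allow", "not a package install")
    decision, reason = evaluate(*parsed)
    return hook_response(decision, reason)


def main(evaluate):
    try:
        print(json.dumps(handle(sys.stdin.read(), evaluate)))
    except Exception as exc:
        # Fail closed: exit code 2 blocks the tool call.
        print(f"guard error: {exc}", file=sys.stderr)
        sys.exit(2)
</code></pre></div></div>

<p>The important design choice is the <code class="language-plaintext highlighter-rouge">except</code> branch: a crash in the guard blocks the install instead of waving it through.</p>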

<h3 id="smart-version-replacement">Smart version replacement</h3>

<p>Most security tools just say “no.” attach-guard says “no, but here’s a safe alternative.”</p>

<p>When a risky version is blocked, attach-guard finds the newest version that passes policy and offers it as a replacement:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt; npm install axios

attach-guard evaluates:
  axios@1.14.1  →  DENY (supply chain score 40, below threshold 50 — compromised)
  axios@1.14.0  →  ALLOW (supply chain score 71, passes all policy checks)

Result: ASK + rewritten command
  "npm install axios@1.14.0"
</code></pre></div></div>

<p>Claude sees the safe alternative and, once the rewritten command is approved, proceeds with it. Your flow doesn’t stop; it gets redirected to a safe path.</p>

<table>
  <thead>
    <tr>
      <th>Scenario</th>
      <th>Example</th>
      <th>Decision</th>
      <th>What happens</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Package is safe</td>
      <td><code class="language-plaintext highlighter-rouge">npm install axios@1.14.0</code></td>
      <td><strong>Allow</strong></td>
      <td>Install proceeds normally</td>
    </tr>
    <tr>
      <td>Pinned to compromised version</td>
      <td><code class="language-plaintext highlighter-rouge">npm install axios@1.14.1</code></td>
      <td><strong>Deny</strong></td>
      <td>Blocked — supply chain score 40</td>
    </tr>
    <tr>
      <td>Unpinned, latest is risky</td>
      <td><code class="language-plaintext highlighter-rouge">npm install axios</code></td>
      <td><strong>Ask + rewrite</strong></td>
      <td>Safe alternative offered: <code class="language-plaintext highlighter-rouge">axios@1.14.0</code></td>
    </tr>
    <tr>
      <td>All versions fail</td>
      <td>malware-only package</td>
      <td><strong>Deny</strong></td>
      <td>Blocked with clear explanation</td>
    </tr>
  </tbody>
</table>

<p>Your flow only fully stops when there is genuinely no safe version to offer.</p>
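
<p>The decision table above reduces to a few lines of logic. A sketch, assuming candidates arrive newest first as <code class="language-plaintext highlighter-rouge">(version, score)</code> pairs and that a single score threshold is the only rule. <code class="language-plaintext highlighter-rouge">choose_version</code> is a hypothetical name, and a real policy engine checks more than one signal.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def choose_version(candidates, threshold=50, pinned=None):
    """Return (decision, version) for an install request.

    candidates: list of (version, score) pairs, newest first.
    pinned: the explicitly requested version, or None for an unpinned install.
    """
    passing = [version for version, score in candidates if score &gt;= threshold]
    if pinned is not None:
        # A pinned version is never rewritten: it passes or it is denied.
        return ("allow", pinned) if pinned in passing else ("deny", None)
    if not passing:
        return ("deny", None)  # no safe version to offer
    newest = candidates[0][0]
    if passing[0] == newest:
        return ("allow", newest)
    # Latest is risky: offer the newest version that passes policy.
    return ("ask", passing[0])
</code></pre></div></div>

<p>With the axios numbers from above, <code class="language-plaintext highlighter-rouge">choose_version([("1.14.1", 40), ("1.14.0", 71)])</code> returns <code class="language-plaintext highlighter-rouge">("ask", "1.14.0")</code>.</p>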

<h3 id="multi-ecosystem">Multi-ecosystem</h3>

<p>attach-guard started with npm and pnpm and now also covers <strong>pip, go get, and cargo add</strong>. That is four ecosystems (npm, PyPI, Go modules, and crates.io), including the two where these attacks happened: the litellm attack was a PyPI compromise; the axios attack was npm. Same guardrail, both covered.</p>

<hr />

<h2 id="what-would-have-happened">What Would Have Happened</h2>

<p>Let’s rewind the clock.</p>

<h3 id="axios--with-attach-guard-installed">axios — with attach-guard installed</h3>

<p>Your agent decides it needs axios and runs <code class="language-plaintext highlighter-rouge">npm install axios</code>.</p>

<ol>
  <li>attach-guard intercepts the command before execution</li>
  <li>Resolves latest version: <code class="language-plaintext highlighter-rouge">axios@1.14.1</code></li>
  <li>Queries Socket.dev: <strong>supply chain score 40</strong> — well below the deny threshold of 50</li>
  <li><strong>DENY</strong>. The install never runs.</li>
  <li>attach-guard walks back through recent versions, finds <code class="language-plaintext highlighter-rouge">axios@1.14.0</code> with a score of 71</li>
  <li>Returns an <strong>ASK</strong> with a rewritten command: <code class="language-plaintext highlighter-rouge">npm install axios@1.14.0</code></li>
  <li>Claude proceeds with the safe version. WAVESHAPER.V2 never touches your machine.</li>
</ol>

<h3 id="litellm--with-attach-guard-installed">litellm — with attach-guard installed</h3>

<p>Your agent runs <code class="language-plaintext highlighter-rouge">pip install litellm</code>.</p>

<ol>
  <li>attach-guard intercepts the command</li>
  <li>Resolves latest: <code class="language-plaintext highlighter-rouge">litellm==1.82.8</code></li>
  <li>Queries Socket.dev: flagged as known malware, supply chain score through the floor</li>
  <li><strong>DENY</strong>. The <code class="language-plaintext highlighter-rouge">.pth</code> payload never lands in your <code class="language-plaintext highlighter-rouge">site-packages/</code></li>
  <li>Falls back to <code class="language-plaintext highlighter-rouge">litellm==1.82.6</code> — the last clean version</li>
  <li>Your API keys, SSH keys, and cloud credentials stay where they belong</li>
</ol>

<p>Both attacks would also have been caught by the <strong>48-hour minimum age policy</strong>: the malicious versions in both attacks were brand new, published only hours before detection. attach-guard denies versions younger than 48 hours by default.</p>
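
<p>The age rule on its own is tiny. A sketch, assuming timezone-aware publish timestamps; <code class="language-plaintext highlighter-rouge">too_new</code> and <code class="language-plaintext highlighter-rouge">MIN_AGE</code> are illustrative names, not attach-guard’s.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from datetime import datetime, timedelta, timezone

MIN_AGE = timedelta(hours=48)


def too_new(published_at, now=None, min_age=MIN_AGE):
    """True when a release is younger than min_age and should be denied."""
    now = now or datetime.now(timezone.utc)
    return now - published_at &lt; min_age
</code></pre></div></div>

<p>A version published three hours ago fails this check no matter what its score says, which is the point: the rule needs no knowledge of the attack.</p>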

<p>Every decision gets logged to a local JSONL audit trail — who, what, when, why, and which policy rule fired. When your security team asks “were we affected?”, you have the receipts.</p>
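
<p>For a sense of what one audit line could contain, here is a sketch. The field names are my assumption for illustration; the actual attach-guard log schema may differ.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import json
from datetime import datetime, timezone


def audit_line(user, command, package, decision, rule, reason, now=None):
    """Serialize one decision as a single JSON line: who, what, when, why, which rule."""
    record = {
        "ts": (now or datetime.now(timezone.utc)).isoformat(),
        "user": user,
        "command": command,
        "package": package,
        "decision": decision,
        "rule": rule,
        "reason": reason,
    }
    return json.dumps(record, sort_keys=True)


def append_audit(path, line):
    # Append-only: existing lines are never rewritten.
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(line + "\n")
</code></pre></div></div>

<p>One JSON object per line keeps the file easy to grep and easy to load into whatever your security team already uses.</p>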

<hr />

<h2 id="two-commands-done">Two Commands, Done</h2>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>claude plugin marketplace add attach-dev/attach-guard
claude plugin <span class="nb">install </span>attach-guard@attach-dev
</code></pre></div></div>

<p>That’s it. The hook, config, and skill are all registered automatically. You’ll need a <a href="https://socket.dev">Socket.dev</a> API token (free tier available) — Claude Code will prompt you during setup.</p>

<p>Once running:</p>
<ul>
  <li><strong>Automatic enforcement</strong> — <code class="language-plaintext highlighter-rouge">npm install</code>, <code class="language-plaintext highlighter-rouge">pnpm add</code>, <code class="language-plaintext highlighter-rouge">pip install</code>, <code class="language-plaintext highlighter-rouge">go get</code>, and <code class="language-plaintext highlighter-rouge">cargo add</code> commands are intercepted and checked</li>
  <li><strong><code class="language-plaintext highlighter-rouge">/explain &lt;package&gt;</code></strong> — look up any package’s risk score, alerts, and version history from inside Claude Code</li>
  <li><strong>Configurable policy</strong> — tune score thresholds, allowlists, denylists, and age requirements in <code class="language-plaintext highlighter-rouge">~/.attach-guard/config.yaml</code></li>
</ul>

<p>Full docs and source: <a href="https://github.com/attach-dev/attach-guard">github.com/attach-dev/attach-guard</a></p>

<hr />

<p>Because life is too short to let your AI agent install a North Korean RAT while you’re getting coffee.</p>

<p>Find me at <a href="https://twitter.com/hammadtariq">@hammadtariq</a>.</p>

<p>[Article co-authored by Claude]</p>]]></content><author><name>Hammad Tariq</name></author><category term="startups" /><category term="ai" /><category term="agents" /><category term="security" /><category term="supply-chain" /><category term="attach" /><category term="attach-guard" /><category term="claude-code" /><summary type="html"><![CDATA[Last Monday I quote-tweeted Feross’s alert about the axios compromise and wrote:]]></summary></entry><entry><title type="html">Beyond Prompts: Why Agents Need Policy as Code</title><link href="https://hammadtariq.github.io/startups/why-agents-need-policy-as-code/" rel="alternate" type="text/html" title="Beyond Prompts: Why Agents Need Policy as Code" /><published>2026-02-14T00:00:00+00:00</published><updated>2026-02-14T00:00:00+00:00</updated><id>https://hammadtariq.github.io/startups/why-agents-need-policy-as-code</id><content type="html" xml:base="https://hammadtariq.github.io/startups/why-agents-need-policy-as-code/"><![CDATA[<p>Generative AI changed the interface to software.</p>

<p>For decades, we wrote deterministic programs that called APIs. Today, we increasingly <em>delegate</em> intent to models that decide <em>which</em> tools to call, <em>in what order</em>, with <em>what parameters</em>, based on natural language and evolving context.</p>

<p>That shift is subtle but profound: the “program” isn’t just your code anymore. It’s your prompt, your tool wiring, your memory, your retrieval layer, your agent runtime, and the model’s outputs—often across multiple steps.</p>

<p>This is why <strong>policy as code</strong> matters.</p>

<p>Not as a buzzword. As the pragmatic answer to a question every enterprise (and honestly, every serious builder) eventually hits:</p>

<blockquote>
  <p>“How do we give an agent real capabilities without turning the system into a liability?”</p>
</blockquote>

<p>This post is a builder’s view of policy as code—why it’s becoming necessary, what it should control, and how audits + observability stop being optional once agents act in the real world. I’ll use examples from things I’ve been building and thinking about lately—<a href="https://attach.dev">Attach.dev</a> (a gateway/runtime), <a href="https://openbotauth.org">OpenBotAuth</a> (cryptographic agent identity), and OpenClaw (agent tooling + orchestration).</p>

<hr />

<h2 id="what-policy-as-code-actually-means">What “policy as code” actually means</h2>

<p><strong>Policy as code</strong> is the practice of expressing rules—permissions, constraints, approvals, safety boundaries, and compliance requirements—in a machine-executable form that is:</p>

<ul>
  <li><strong>Versioned</strong> (like software)</li>
  <li><strong>Reviewable</strong> (PRs, approvals)</li>
  <li><strong>Testable</strong> (unit tests + simulation)</li>
  <li><strong>Deployable</strong> (enforced at runtime)</li>
  <li><strong>Observable</strong> (you can prove what happened)</li>
</ul>

<p>In the classic enterprise world, we already have “policies,” but they’re scattered:</p>

<ul>
  <li>IAM rules</li>
  <li>network firewalls</li>
  <li>DLP systems</li>
  <li>compliance checklists</li>
  <li>SOC2 controls</li>
  <li>manual approvals in ticketing systems</li>
</ul>

<p>Agent systems make this fragmentation painful, because an agent can traverse layers in one “thought-to-action” hop. The policy can’t be tribal knowledge or a PDF. It has to be enforced <strong>where actions occur</strong>.</p>

<hr />

<h2 id="why-the-web-and-enterprises-need-guardrails-for-genai">Why the web (and enterprises) need guardrails for GenAI</h2>

<h3 id="1-agents-are-non-deterministic-integrations">1) Agents are “non-deterministic integrations”</h3>

<p>When you integrate Stripe or Slack, you know the exact calls you make. With agents, you’re integrating a planner that can generate new sequences of calls at runtime.</p>

<p>That’s not inherently bad—it’s why agents are powerful—but it means your security posture can’t rely on “we didn’t code it that way.”</p>

<h3 id="2-natural-language-is-an-attack-surface">2) Natural language is an attack surface</h3>

<p>Prompt injection, malicious tool outputs, poisoned retrieval results, “helpful” but wrong conclusions, accidental data leakage—these are all variations of the same theme:</p>

<p><strong>The model will try to comply with the most recent, most salient instruction unless constrained.</strong></p>

<h3 id="3-enterprises-dont-fear-ai-text-they-fear-ai-actions">3) Enterprises don’t fear AI text; they fear AI actions</h3>

<p>Summarizing an email is low risk.</p>

<p>Sending money, modifying production config, emailing external parties, exporting data, or signing requests—those are <strong>governance</strong> problems.</p>

<p>The moment agents can <em>do</em>, not just <em>say</em>, policy becomes core infrastructure.</p>

<hr />

<h2 id="the-three-pillars-identity-authorization-accountability">The three pillars: Identity, Authorization, Accountability</h2>

<p>If you remember only one framework, make it this:</p>

<h3 id="a-identity--who-is-acting">A) Identity — “Who is acting?”</h3>

<p>Humans have SSO and device posture. Agents need something comparable.</p>

<p>This is where <strong>cryptographic identity</strong> becomes useful: the agent signs its actions. You can verify <em>which agent</em> did the thing, independent of which platform it ran on.</p>

<p>This is the role <a href="https://openbotauth.org">OpenBotAuth</a> plays in my stack: a portable identity for agents (keys the agent controls), so a request or action can be verified as coming from a specific agent identity, not just “some API key in a container.” For more on this, see <a href="/startups/portable-identity-for-agents-keypairs-beat-login-with-x/">Portable Identity for Agents</a>.</p>

<h3 id="b-authorization--what-is-allowed-under-what-conditions">B) Authorization — “What is allowed, under what conditions?”</h3>

<p>Authorization for agents must be more specific than “can access Gmail.”</p>

<p>It should include constraints like:</p>

<ul>
  <li>allowed recipients/domains</li>
  <li>data classification rules (PII, secrets)</li>
  <li>time windows</li>
  <li>rate limits and budgets</li>
  <li>required approvals for high-risk actions</li>
  <li>environment restrictions (prod vs staging)</li>
  <li>“break-glass” rules for emergencies</li>
</ul>

<h3 id="c-accountability--can-we-prove-what-happened">C) Accountability — “Can we prove what happened?”</h3>

<p>If an agent takes real actions, you need:</p>

<ul>
  <li>audit trails (immutable enough to trust)</li>
  <li>structured logs for tool calls</li>
  <li>traces across multi-step runs</li>
  <li>policy evaluation logs (“why was this allowed/denied?”)</li>
  <li>metrics to detect drift, abuse, and cost blowups</li>
</ul>

<p>This is where <strong>observability stops being a DevOps luxury</strong> and becomes a governance requirement.</p>

<hr />

<h2 id="where-policies-should-live-the-agent-gateway-pattern">Where policies should live: the “agent gateway” pattern</h2>

<p>A common mistake is trying to enforce policy only in prompts:</p>

<blockquote>
  <p>“Don’t do X. Always do Y. Never email outsiders.”</p>
</blockquote>

<p>That’s wishful thinking. Prompts are guidance; <strong>policy must be enforcement</strong>.</p>

<p>The clean pattern is to place a <strong>policy enforcement point</strong> between the agent and the tools it can call.</p>

<p>This is how I think about <a href="https://attach.dev">Attach.dev</a> conceptually: a gateway/runtime where tools are registered, calls are mediated, policies are evaluated, and logs are captured. The agent doesn’t “have Gmail.” It has <strong>a mediated capability</strong> to request “send email” and the gateway decides if it’s allowed.</p>

<p>Similarly, OpenClaw (as an orchestration client in my environment) becomes a natural place to enforce policy because it’s already sitting at the junction of:</p>

<ul>
  <li>user intent (Discord / CLI)</li>
  <li>agent planning</li>
  <li>tool execution (browser, code, APIs)</li>
  <li>workflows (branch creation, PRs, etc.)</li>
</ul>

<p>If you control the junction, you control the blast radius.</p>
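
<p>As a rough sketch of the pattern (not Attach.dev’s actual API), a gateway is a thin layer that owns the tool registry, asks a policy function before every call, and logs the decision either way. <code class="language-plaintext highlighter-rouge">Gateway</code>, <code class="language-plaintext highlighter-rouge">PolicyDenied</code>, and the policy callback signature are hypothetical.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>class PolicyDenied(Exception):
    """Raised when the policy refuses a tool call."""


class Gateway:
    def __init__(self, policy, log):
        self._tools = {}
        self._policy = policy  # callable taking (agent_id, tool, args), returning (allowed, reason)
        self._log = log        # any list-like sink for decision records

    def register(self, name, func):
        self._tools[name] = func

    def call(self, agent_id, name, **args):
        allowed, reason = self._policy(agent_id, name, args)
        # Log every decision, including denials, before anything executes.
        self._log.append({"agent": agent_id, "tool": name, "args": args,
                          "allowed": allowed, "reason": reason})
        if not allowed:
            raise PolicyDenied(reason)
        return self._tools[name](**args)
</code></pre></div></div>

<p>The agent only ever holds a reference to <code class="language-plaintext highlighter-rouge">Gateway.call</code>, never to the tool functions themselves.</p>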

<hr />

<h2 id="example-1-send-receipts-to-the-accountant-every-month">Example #1: “Send receipts to the accountant every month”</h2>

<p>I’ve been thinking about making a workflow where receipts get posted (or pasted) and an agent emails them to an accountant on the 1st of every month—because humans procrastinate and automation wins.</p>

<p>That sounds harmless until you enumerate what could go wrong:</p>

<ul>
  <li>The agent emails the wrong person (typo, injection, ambiguous contact)</li>
  <li>The agent includes unrelated sensitive docs from context</li>
  <li>The agent forwards internal emails “for completeness”</li>
  <li>Someone in the chat tricks the agent into sending other files</li>
  <li>The agent starts “helpfully” summarizing sensitive financial info in the body</li>
</ul>

<p>A policy-as-code approach turns this into a constrained capability:</p>

<p><strong>Policy intent:</strong></p>

<ul>
  <li>Only allow sending to a fixed allowlist (<code class="language-plaintext highlighter-rouge">accountant@...</code>, maybe one backup).</li>
  <li>Only allow attachments that were explicitly provided in the last N minutes.</li>
  <li>Disallow reading inbox contents unless explicitly approved.</li>
  <li>Require a human “approve” click if amount &gt; X or if recipients differ.</li>
  <li>Always log: sender agent identity, recipients, attachment hashes, and a trace ID.</li>
</ul>

<p>This transforms “agent has email” into “agent has <em>this specific, auditable dispatch ability</em>.”</p>

<hr />

<h2 id="example-2-rebalancing-rules-and-preventing-accidental-trading">Example #2: “Rebalancing rules” and preventing accidental trading</h2>

<p>Another scenario I’ve been thinking about: rules like “if NVDA exceeds 10% of portfolio, trim back to 7%” or “if a position drops 15%, flag for review.” Whether the agent executes trades or merely alerts you is a huge governance boundary.</p>

<p>A strong policy posture here is: <strong>default to advisory</strong>, and make execution a separate, higher-trust path.</p>

<p><strong>Policy intent (advisory mode):</strong></p>

<ul>
  <li>Agent can read price feeds and your rules.</li>
  <li>Agent can notify you with suggested actions.</li>
  <li>Agent cannot place orders.</li>
</ul>

<p>If execution is required:</p>

<ul>
  <li>Cap max order size</li>
  <li>Enforce cooldowns</li>
  <li>Require a second factor / explicit approval</li>
  <li>Maintain a full audit trail and “reason for trade” metadata</li>
  <li>Log exactly which data inputs were used to decide</li>
</ul>
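
<p>The execution-path rules above can be sketched as a single check that runs before any order leaves the system. All names and the rule order are assumptions for illustration.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class OrderPolicy:
    max_order_value: float
    cooldown: timedelta


def check_order(policy, order_value, approved, last_order_at, now):
    """Return (allowed, reason) for a proposed order; the first failing rule wins."""
    if not approved:
        return False, "explicit approval required"
    if order_value &gt; policy.max_order_value:
        return False, "order exceeds size cap"
    if last_order_at is not None and now - last_order_at &lt; policy.cooldown:
        return False, "cooldown still active"
    return True, "within policy"
</code></pre></div></div>

<p>The returned reason string is what goes into the audit trail next to the “reason for trade” metadata.</p>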

<p>This is a classic case where policy prevents the system from becoming a self-inflicted incident.</p>

<hr />

<h2 id="example-3-multi-user-context-and-the-kv-cache-privacy-problem">Example #3: Multi-user context and the KV-cache privacy problem</h2>

<p>A deeper issue I’ve been exploring: multi-user chat with AI, shared context, and the risk that optimizations (like shared caches) could cause accidental leakage—e.g., the agent pulling private facts from one user context into a shared room.</p>

<p>That’s not a prompt problem. That’s an architecture + policy problem.</p>

<p>You need explicit policies for:</p>

<ul>
  <li>context boundaries (per-user vs shared)</li>
  <li>what data types are eligible to enter shared context</li>
  <li>redaction rules (PII, secrets)</li>
  <li>retention windows</li>
  <li>“no cross-user recall” constraints in shared threads</li>
</ul>

<p>In practice, “policy as code” here often looks like:</p>

<ul>
  <li>data labeling (public/internal/secret)</li>
  <li>enforcement in the gateway (Attach-style mediation)</li>
  <li>structured memory APIs that require a classification label</li>
  <li>logs that show when data moved between scopes</li>
</ul>
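
<p>A structured memory API with mandatory labels could look like the sketch below. The labels come from the list above; the class, the two scope names, and the rule that only public data may enter shared context are assumptions.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>LABELS = ("public", "internal", "secret")
SCOPES = ("private", "shared")
SHARED_OK = {"public"}  # assumption: only public data may enter shared context


class ScopedMemory:
    def __init__(self):
        self._items = []  # each item: (scope, owner, label, text)
        self.audit = []   # one entry per accepted or rejected write

    def write(self, scope, owner, label, text):
        if scope not in SCOPES:
            raise ValueError(f"unknown scope: {scope!r}")
        if label not in LABELS:
            raise ValueError(f"unknown classification label: {label!r}")
        allowed = scope == "private" or label in SHARED_OK
        self.audit.append({"scope": scope, "owner": owner, "label": label, "allowed": allowed})
        if not allowed:
            raise PermissionError(f"{label} data may not enter {scope} context")
        self._items.append((scope, owner, label, text))

    def read(self, scope, requester):
        """Shared items are visible to everyone; private items only to their owner."""
        return [text for s, owner, _, text in self._items
                if s == scope and (scope == "shared" or owner == requester)]
</code></pre></div></div>

<p>Because a write without a label is impossible, “no cross-user recall” becomes a property of the API rather than a hope about the prompt.</p>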

<p>This is one of the most important guardrails topics for enterprise adoption—the difference between “useful assistant” and “compliance nightmare.”</p>

<hr />

<h2 id="policies-arent-only-for-safetytheyre-for-steering-outcomes">Policies aren’t only for safety—they’re for steering outcomes</h2>

<p>There’s a quieter benefit: policies don’t just prevent bad things; they produce <em>better</em> outcomes.</p>

<p>If you want agents to be consistently useful inside enterprises, they need:</p>

<ul>
  <li>clear operational boundaries</li>
  <li>approved data sources</li>
  <li>approved actions</li>
  <li>predictable escalation paths (“ask a human when unsure”)</li>
  <li>consistent formatting and metadata on outputs</li>
</ul>

<p>Without policy, you’ll get:</p>

<ul>
  <li>random tool usage patterns</li>
  <li>inconsistent decisions run-to-run</li>
  <li>untraceable failures</li>
  <li>confusion about “why did it do that?”</li>
</ul>

<p>Policy is the system’s <strong>shape</strong>. It’s how you move from “cool demo” to “reliable coworker.”</p>

<hr />

<h2 id="audits-and-observability-what-you-must-record">Audits and observability: what you must record</h2>

<p>If an agent can take real actions, you eventually need answers to questions like:</p>

<ul>
  <li>Who did this?</li>
  <li>What tool calls were made?</li>
  <li>With what inputs?</li>
  <li>Under which policy version?</li>
  <li>Was it allowed automatically or approved by a human?</li>
  <li>What data sources influenced the decision?</li>
  <li>What changed since last week (why are outputs worse now)?</li>
</ul>

<p>A practical baseline:</p>

<p><strong>For every run:</strong></p>

<ul>
  <li>a run ID / trace ID</li>
  <li>agent identity (signed if possible)</li>
  <li>policy bundle version hash</li>
  <li>tool calls (name, arguments, response metadata)</li>
  <li>input sources (docs, URLs, connectors) + hashes</li>
  <li>outputs + redaction status (what was removed)</li>
  <li>timing, cost, token usage</li>
  <li>allow/deny decisions + reason codes</li>
</ul>
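
<p>One way to make this baseline concrete is a run record that is built up during execution and serialized at the end. A sketch with hypothetical field names, covering a subset of the list above; hashing the policy bundle text with SHA-256 is one reasonable choice, not a requirement.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import hashlib
import json
from dataclasses import asdict, dataclass, field


def bundle_hash(policy_text):
    """Identify the exact policy bundle that was in force for a run."""
    return hashlib.sha256(policy_text.encode("utf-8")).hexdigest()


@dataclass
class RunRecord:
    run_id: str
    agent_id: str
    policy_version: str
    tool_calls: list = field(default_factory=list)
    decisions: list = field(default_factory=list)

    def record_call(self, tool, args, decision, reason_code):
        self.tool_calls.append({"tool": tool, "args": args})
        self.decisions.append({"tool": tool, "decision": decision, "reason": reason_code})

    def to_json(self):
        return json.dumps(asdict(self), sort_keys=True)
</code></pre></div></div>

<p>Storing the bundle hash on every record is what lets you answer “under which policy version?” months later.</p>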

<p>If you can’t answer those questions, you can’t debug, secure, or govern the system.</p>

<p>And in regulated environments, you also can’t pass audits.</p>

<hr />

<h2 id="what-policy-as-code-looks-like-concretely">What policy as code looks like (concretely)</h2>

<p>You can express policy in many ways. The important thing is that it’s executable and testable.</p>

<p>Here’s a deliberately simple, human-readable sketch:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">policies</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">id</span><span class="pi">:</span> <span class="s2">"</span><span class="s">email_receipts_to_accountant"</span>
    <span class="na">when</span><span class="pi">:</span>
      <span class="na">tool</span><span class="pi">:</span> <span class="s2">"</span><span class="s">gmail.send"</span>
    <span class="na">allow_if</span><span class="pi">:</span>
      <span class="na">to_in_allowlist</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">accounts@firm.com"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">backup@firm.com"</span><span class="pi">]</span>
      <span class="na">attachments</span><span class="pi">:</span>
        <span class="na">source</span><span class="pi">:</span> <span class="s2">"</span><span class="s">explicit_user_upload"</span>
        <span class="na">max_age_minutes</span><span class="pi">:</span> <span class="m">30</span>
        <span class="na">max_total_mb</span><span class="pi">:</span> <span class="m">20</span>
      <span class="na">body</span><span class="pi">:</span>
        <span class="na">must_not_contain</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">api_key"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">seed</span><span class="nv"> </span><span class="s">phrase"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">private</span><span class="nv"> </span><span class="s">key"</span><span class="pi">]</span>
    <span class="na">require_approval_if</span><span class="pi">:</span>
      <span class="na">subject_contains</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">invoice"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">tax"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">salary"</span><span class="pi">]</span>
    <span class="na">log</span><span class="pi">:</span>
      <span class="na">level</span><span class="pi">:</span> <span class="s2">"</span><span class="s">full"</span>
      <span class="na">include_attachment_hashes</span><span class="pi">:</span> <span class="no">true</span>
</code></pre></div></div>

<p>Under the hood, you might implement this in OPA/Rego, Cedar, a custom rules engine, or a typed policy DSL. The mechanism matters less than the lifecycle: reviewable, testable, enforceable.</p>
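
<p>To show that the YAML above is executable in spirit, here is a small Python evaluator for the same rules. It is a sketch under assumptions: the policy is written as a plain dict rather than parsed from YAML, and the request fields (<code class="language-plaintext highlighter-rouge">to</code>, <code class="language-plaintext highlighter-rouge">attachments</code>, <code class="language-plaintext highlighter-rouge">body</code>, <code class="language-plaintext highlighter-rouge">subject</code>) are hypothetical names for whatever your gateway sees.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>POLICY = {
    "tool": "gmail.send",
    "to_allowlist": ["accounts@firm.com", "backup@firm.com"],
    "attachment_source": "explicit_user_upload",
    "max_age_minutes": 30,
    "max_total_mb": 20,
    "body_must_not_contain": ["api_key", "seed phrase", "private key"],
    "approval_subject_terms": ["invoice", "tax", "salary"],
}


def evaluate(policy, request):
    """Return "deny", "require_approval", or "allow" for one send request."""
    if request["tool"] != policy["tool"]:
        return "deny"  # default deny: this policy covers exactly one tool
    if not set(request["to"]).issubset(policy["to_allowlist"]):
        return "deny"
    attachments = request["attachments"]
    if any(a["source"] != policy["attachment_source"] for a in attachments):
        return "deny"
    if any(a["age_minutes"] &gt; policy["max_age_minutes"] for a in attachments):
        return "deny"
    if sum(a["size_mb"] for a in attachments) &gt; policy["max_total_mb"]:
        return "deny"
    body = request["body"].lower()
    if any(term in body for term in policy["body_must_not_contain"]):
        return "deny"
    subject = request["subject"].lower()
    if any(term in subject for term in policy["approval_subject_terms"]):
        return "require_approval"
    return "allow"
</code></pre></div></div>

<p>Because the evaluator is a pure function, the “testable” part of the lifecycle is just a table of requests and expected decisions.</p>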

<hr />

<h2 id="a-practical-checklist">A practical checklist</h2>

<p>If you’re building agentic infrastructure (or integrating agents into enterprise workflows), this is a good starting order:</p>

<ol>
  <li><strong>Enumerate tools</strong> the agent can call (email, browser, DB, deploy, payments).</li>
  <li><strong>Define actors</strong> (human users, agents, services) and give them identities.</li>
  <li><strong>Classify data</strong> (public/internal/secret) and enforce where it can flow.</li>
  <li><strong>Decide enforcement points</strong> (gateway/proxy is the cleanest).</li>
  <li><strong>Write minimal policies</strong> first: allowlists, budgets, rate limits, approvals.</li>
  <li><strong>Add observability</strong>: traces, tool-call logs, policy decision logs.</li>
  <li><strong>Test policies</strong> with simulation (good requests, bad requests, edge cases).</li>
  <li><strong>Version + roll out safely</strong> (canary policies, staged enforcement).</li>
  <li><strong>Add break-glass</strong> procedures for emergencies (with heavy logging).</li>
  <li><strong>Continuously review</strong> based on incidents, drift, and real usage.</li>
</ol>

<hr />

<h2 id="closing-thought">Closing thought</h2>

<p>Agents are the first time we’ve tried to put a probabilistic planner in the driver’s seat of production systems.</p>

<p>That can be transformative—if we treat governance as part of the product, not an afterthought.</p>

<p>In practice, “policy as code” is the layer that makes it possible to say:</p>

<ul>
  <li>yes, the agent can act</li>
  <li>yes, it’s constrained</li>
  <li>yes, it’s auditable</li>
  <li>yes, we can prove what happened</li>
  <li>yes, we can improve it safely over time</li>
</ul>

<p>Attach.dev, OpenBotAuth, and OpenClaw are concrete anchors for this idea in my own work: a mediated runtime, portable identity, and a practical orchestration surface. The bigger point is broader than any one project:</p>

<p><strong>If GenAI is going to operate inside real systems, the web needs executable guardrails—policies that ship like code, and systems that can be audited like infrastructure.</strong></p>

<hr />

<p>Find me at <a href="https://twitter.com/hammadtariq">@hammadtariq</a> if you’re thinking about this space.</p>]]></content><author><name>Hammad Tariq</name></author><category term="startups" /><category term="ai" /><category term="agents" /><category term="security" /><category term="enterprise" /><category term="governance" /><category term="observability" /><category term="policy" /><category term="attach" /><category term="openbotauth" /><category term="openclaw" /><summary type="html"><![CDATA[Generative AI changed the interface to software.]]></summary></entry><entry><title type="html">Portable Agent Identity (PAI): An Implementer’s Profile for RFC 9421 + Web Bot Auth</title><link href="https://hammadtariq.github.io/protocol-notes/portable-agent-identity-implementers-profile/" rel="alternate" type="text/html" title="Portable Agent Identity (PAI): An Implementer’s Profile for RFC 9421 + Web Bot Auth" /><published>2026-02-07T00:00:00+00:00</published><updated>2026-02-07T00:00:00+00:00</updated><id>https://hammadtariq.github.io/protocol-notes/portable-agent-identity-implementers-profile</id><content type="html" xml:base="https://hammadtariq.github.io/protocol-notes/portable-agent-identity-implementers-profile/"><![CDATA[<p><em>For a non-technical introduction to why agents need keypair identity over OAuth/OIDC, see <a href="/startups/portable-identity-for-agents-keypairs-beat-login-with-x/">Why Keypairs Beat ‘Login With X’</a>.</em></p>

<blockquote>
  <p><strong>Status:</strong> Informational / Implementer’s Profile<br />
<strong>Compatibility:</strong> RFC 9421, OpenBotAuth implementation, WBA architecture draft<br />
<strong>Audience:</strong> Implementers (origins, CDNs, agent runtimes)</p>
</blockquote>

<hr />

<h2 id="abstract">Abstract</h2>

<p>AI agents increasingly operate across multiple ecosystems: crawling origins, installing third-party “skills” and plugins, calling APIs, and publishing content. Most ecosystems identify agents using platform-scoped bearer credentials (API keys, cookies, sessions), which do not provide portable identity, offline provenance, or robust proof-of-possession. This profile defines a portable identity model for agents based on an Ed25519 keypair and <strong>HTTP Message Signatures</strong> (RFC 9421), with standardized key discovery via <strong>JWKS</strong> endpoints and a well-known directory, plus optional “trust anchors” (e.g., GitHub OAuth) that bind key material to accountable human or organizational controllers.</p>

<p>This profile is compatible with OpenBotAuth’s current reference implementation: signed HTTP requests using <code class="language-plaintext highlighter-rouge">Signature-Input</code>, <code class="language-plaintext highlighter-rouge">Signature</code>, and <code class="language-plaintext highlighter-rouge">Signature-Agent</code>; JWKS discovery via a well-known directory; nonce-based replay protection; directory trust allowlists; and a registry that exposes both user-level and agent-level JWKS endpoints. It also extends naturally to software supply-chain signing (skills/plugins) using the same identity primitive.</p>
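
<p>For readers who have not seen RFC 9421 on the wire: the signer does not sign the raw request but a canonical “signature base” built from the covered components. A minimal sketch of that construction follows, assuming simple string and integer parameters and no characters that need escaping. The function names are mine, and the Ed25519 signing step itself is left out because it needs a crypto library outside the Python standard library.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def serialize_params(covered, params):
    """Serialize the covered-component list plus its parameters."""
    inner = " ".join(f'"{name}"' for name in covered)
    parts = [f"({inner})"]
    for key, value in params.items():
        # Integers are written bare; everything else as a quoted string.
        rendered = str(value) if isinstance(value, int) else f'"{value}"'
        parts.append(f"{key}={rendered}")
    return ";".join(parts)


def signature_base(components, params):
    """Build the string that gets signed.

    components: ordered mapping of component name to its value,
                for example "@method" mapped to "GET".
    params: ordered mapping of signature parameters (created, keyid, nonce, ...).
    """
    lines = [f'"{name}": {value}' for name, value in components.items()]
    lines.append(f'"@signature-params": {serialize_params(list(components), params)}')
    return "\n".join(lines)  # no trailing newline
</code></pre></div></div>

<p>The output of <code class="language-plaintext highlighter-rouge">serialize_params</code> is also what goes, after a label, into the <code class="language-plaintext highlighter-rouge">Signature-Input</code> header, so the verifier can rebuild the same base and check the signature against the key it discovered via JWKS.</p>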

<hr />

<h2 id="1-motivation-and-problem-statement">1. Motivation and Problem Statement</h2>

<h3 id="11-identity-fragmentation-is-now-a-security-problem">1.1 Identity fragmentation is now a security problem</h3>

<p>Agent runtimes and marketplaces are entering a “supply-chain era” where agents install and execute third-party bundles at machine speed. When provenance is missing, “install anything, trust everything” becomes the default. A portable identity primitive is not a nice-to-have; it is a prerequisite for policy enforcement, attribution, and ecosystem hygiene.</p>

<h3 id="12-oauthoidc-solve-a-different-problem">1.2 OAuth/OIDC solve a different problem</h3>

<p>OAuth 2.0 is primarily an <strong>authorization delegation</strong> framework. OIDC adds an identity layer for human login. These systems are excellent for “human clicks consent in a browser,” but they do not produce a primitive for offline-verifiable artifact provenance, and plain bearer tokens offer no proof-of-possession: any party that holds the token can present it.</p>

<p>Portable agent identity requires:</p>

<ul>
  <li>Local verification without contacting an issuer on every request</li>
  <li>Stable cross-platform identity independent of any single platform’s account namespace</li>
  <li>A reusable proof primitive that works for both HTTP requests and offline artifacts</li>
</ul>

<hr />

<h2 id="2-design-goals">2. Design Goals</h2>

<p>This profile aims to satisfy:</p>

<ol>
  <li><strong>Portable identity:</strong> One identity usable across origins and ecosystems.</li>
  <li><strong>Local verification:</strong> A verifier can validate claims cryptographically (“at the speed of math”).</li>
  <li><strong>HTTP-native:</strong> Integrates with standard HTTP middleware/proxies.</li>
  <li><strong>Discoverable keys:</strong> Keys can be found via standard endpoints / directory.</li>
  <li><strong>Layered trust:</strong> Self-issued identity is allowed; higher trust requires explicit anchors and reputation.</li>
  <li><strong>Supply-chain reuse:</strong> The same identity primitive signs artifacts (skills/plugins/manifests).</li>
</ol>

<hr />

<h2 id="3-threat-model-non-exhaustive">3. Threat Model (Non-Exhaustive)</h2>

<p>This profile mitigates or helps operationalize mitigations for:</p>

<ul>
  <li><strong>Token replay / bearer credential theft:</strong> bearer tokens can be replayed by any holder; signatures provide proof-of-possession.</li>
  <li><strong>Publisher spoofing / key substitution:</strong> attacker impersonates a publisher by posting a look-alike skill.</li>
  <li><strong>Tampering:</strong> skill manifest modified after publication.</li>
  <li><strong>Replay attacks:</strong> reusing signed requests without nonce/timestamp controls.</li>
  <li><strong>SSRF in key discovery:</strong> fetching keys from attacker-controlled or internal networks.</li>
  <li><strong>Overbroad forwarding of signed headers:</strong> leaking sensitive headers to third-party verifiers.</li>
</ul>

<p>OpenBotAuth already includes explicit controls for several of these (nonce cache, skew checks, SSRF protections, sensitive-header blocking), which are referenced later.</p>

<hr />

<h2 id="4-system-model-roles-and-components">4. System Model: Roles and Components</h2>

<h3 id="41-roles">4.1 Roles</h3>

<ul>
  <li><strong>Agent:</strong> an automated client acting on behalf of a controller/subscriber.</li>
  <li><strong>Controller:</strong> the accountable human/org that owns the agent identity and can bind it to real-world anchors.</li>
  <li><strong>Origin / Publisher:</strong> the website/service that wants to authorize/bill/policy-gate automated access.</li>
  <li><strong>Verifier:</strong> a component that verifies HTTP signatures and returns a verdict to policy enforcement.</li>
  <li><strong>Directory / Registry:</strong> a service or endpoint that publishes public keys (JWKS) and optional metadata.</li>
</ul>

<h3 id="42-openbotauth-reference-implementation-components-current">4.2 OpenBotAuth reference implementation components (current)</h3>

<p>OpenBotAuth is explicitly positioned as open-source tooling for the IETF Web Bot Auth (WBA) direction and uses RFC 9421 signatures. The repo describes four components: a Registry Service (JWKS + agent identity), a Verifier Service (signature verification + nonce cache), GitHub OAuth onboarding, and origin integration via NGINX/WordPress and SDKs.</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">Registry Service</code> endpoints include <code class="language-plaintext highlighter-rouge">GET /jwks/{username}.json</code> and <code class="language-plaintext highlighter-rouge">GET /agent-jwks/{agent_id}</code>.</li>
  <li><code class="language-plaintext highlighter-rouge">Verifier Service</code> requires <code class="language-plaintext highlighter-rouge">Signature-Input</code>, <code class="language-plaintext highlighter-rouge">Signature</code>, and <code class="language-plaintext highlighter-rouge">Signature-Agent</code> and performs JWKS discovery if <code class="language-plaintext highlighter-rouge">Signature-Agent</code> is an identity URL.</li>
  <li>SDKs/clients include logic to forward covered headers while blocking sensitive ones.</li>
  <li>Telemetry/Karma is described as an optional ecosystem transparency layer.</li>
</ul>

<hr />

<h2 id="5-cryptographic-identity-primitive">5. Cryptographic Identity Primitive</h2>

<h3 id="51-key-type">5.1 Key type</h3>

<p>Agents use <strong>Ed25519</strong> keypairs, represented as JWK with:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">kty: "OKP"</code></li>
  <li><code class="language-plaintext highlighter-rouge">crv: "Ed25519"</code></li>
  <li><code class="language-plaintext highlighter-rouge">x: &lt;base64url pubkey&gt;</code></li>
  <li><code class="language-plaintext highlighter-rouge">use: "sig"</code></li>
  <li><code class="language-plaintext highlighter-rouge">kid: &lt;key identifier&gt;</code></li>
</ul>

<p>OpenBotAuth’s <code class="language-plaintext highlighter-rouge">registry-signer</code> package defines these types and constructs Web-Bot-Auth-style JWKS documents with optional metadata fields.</p>

<h3 id="52-key-identifiers-kid--keyid">5.2 Key identifiers (<code class="language-plaintext highlighter-rouge">kid</code> / <code class="language-plaintext highlighter-rouge">keyid</code>)</h3>

<p>For interop, <code class="language-plaintext highlighter-rouge">kid</code> should be <strong>stable and derived from key material</strong>. OpenBotAuth currently derives a deterministic hash over <code class="language-plaintext highlighter-rouge">{kty, crv, x}</code> and truncates for display/compactness (<code class="language-plaintext highlighter-rouge">generateKidFromJWK</code>).</p>

<p><strong>Recommendation (interop-oriented):</strong></p>

<ul>
  <li>Use RFC 7638 thumbprints as the canonical <code class="language-plaintext highlighter-rouge">kid</code> when strict interop is required.</li>
  <li>If truncation is used for UX, treat it as a display alias; keep canonical full thumbprint in metadata.</li>
</ul>
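
<p>A minimal sketch of the recommended canonical <code class="language-plaintext highlighter-rouge">kid</code>, assuming an OKP public key. RFC 7638 hashes only the required members (<code class="language-plaintext highlighter-rouge">crv</code>, <code class="language-plaintext highlighter-rouge">kty</code>, <code class="language-plaintext highlighter-rouge">x</code> for OKP keys, per RFC 8037). The helper names and the 16-character alias length are illustrative and are not OpenBotAuth’s <code class="language-plaintext highlighter-rouge">generateKidFromJWK</code>.</p>

```python
import base64
import hashlib
import json


def jwk_thumbprint(jwk: dict) -> str:
    """Return the RFC 7638 SHA-256 thumbprint of an OKP (Ed25519) public JWK."""
    # Only the required members are hashed, serialized in lexicographic
    # order with no whitespace. Optional fields (use, kid, ...) are ignored.
    required = {name: jwk[name] for name in ("crv", "kty", "x")}
    canonical = json.dumps(required, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    # base64url without padding, as used throughout JOSE.
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def display_alias(thumbprint: str, length: int = 16) -> str:
    """Hypothetical short alias for UIs; not a replacement for the full kid."""
    return thumbprint[:length]


# Example public key from RFC 8037, Appendix A.
example_jwk = {
    "kty": "OKP",
    "crv": "Ed25519",
    "x": "11qYAYKxCrfVS_7TyWQHOg7hcvPapiMlrwIaaPcHURo",
    "use": "sig",
}
kid = jwk_thumbprint(example_jwk)
alias = display_alias(kid)
```

<p>Because the thumbprint depends only on key material, two registries that publish the same key arrive at the same <code class="language-plaintext highlighter-rouge">kid</code> without coordinating.</p>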

<hr />

<h2 id="6-http-authentication-profile-rfc-9421">6. HTTP Authentication Profile (RFC 9421)</h2>

<h3 id="61-required-request-headers">6.1 Required request headers</h3>

<p>A signed agent request includes:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">Signature-Input</code></li>
  <li><code class="language-plaintext highlighter-rouge">Signature</code></li>
</ul>

<p>OpenBotAuth additionally uses:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">Signature-Agent</code> to indicate where keys should be discovered, and treats it as required in the verifier pipeline.</li>
</ul>

<h3 id="62-covered-components">6.2 Covered components</h3>

<p>A minimal practical coverage set is:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">@method</code></li>
  <li><code class="language-plaintext highlighter-rouge">@path</code></li>
  <li><code class="language-plaintext highlighter-rouge">@authority</code></li>
</ul>

<p>OpenBotAuth’s signature base construction supports derived components such as <code class="language-plaintext highlighter-rouge">@method</code>, <code class="language-plaintext highlighter-rouge">@path</code> (excluding query), <code class="language-plaintext highlighter-rouge">@authority</code>, <code class="language-plaintext highlighter-rouge">@target-uri</code>, and legacy <code class="language-plaintext highlighter-rouge">@request-target</code>.</p>
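
<p>To make the covered components concrete, here is a sketch of how a signature base could be assembled following RFC 9421’s layout: one line per covered component, then the <code class="language-plaintext highlighter-rouge">@signature-params</code> line, joined by single line feeds. It handles only a subset of derived components and is not OpenBotAuth’s implementation; the function names are assumptions.</p>

```python
from urllib.parse import urlsplit


def derived_component(name: str, method: str, url: str) -> str:
    """Resolve a small subset of RFC 9421 derived components."""
    parts = urlsplit(url)
    if name == "@method":
        return method.upper()
    if name == "@authority":
        # Lowercased host; the port is dropped when it is the scheme default.
        # IPv6 literals are not handled in this sketch.
        host = (parts.hostname or "").lower()
        default_port = {"http": 80, "https": 443}.get(parts.scheme)
        if parts.port is None or parts.port == default_port:
            return host
        return f"{host}:{parts.port}"
    if name == "@path":
        # Path only: the query string is not part of @path.
        return parts.path or "/"
    if name == "@target-uri":
        return url
    raise ValueError(f"unsupported derived component: {name}")


def build_signature_base(method, url, headers, covered, raw_params):
    """Build the signature base for the covered components.

    raw_params is the inner list plus parameters exactly as received in
    Signature-Input (everything after "label=").
    """
    lowered = {k.lower(): v.strip() for k, v in headers.items()}
    lines = []
    for name in covered:
        if name.startswith("@"):
            value = derived_component(name, method, url)
        else:
            value = lowered[name.lower()]
        lines.append(f'"{name.lower()}": {value}')
    lines.append(f'"@signature-params": {raw_params}')
    # Single LF between lines, no trailing newline.
    return "\n".join(lines)


raw = '("@method" "@path" "@authority");created=1700000000;keyid="example-kid"'
base = build_signature_base(
    "get",
    "https://Example.com/articles/1?ref=feed",
    {},
    ["@method", "@path", "@authority"],
    raw,
)
```

<p>With this coverage set the query string <code class="language-plaintext highlighter-rouge">?ref=feed</code> does not appear in the base, so it is not protected by the signature.</p>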

<h3 id="63-freshness-and-replay-resistance">6.3 Freshness and replay resistance</h3>

<p>To reduce replay risk, signed requests SHOULD include:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">created</code> and optionally <code class="language-plaintext highlighter-rouge">expires</code></li>
  <li>A cryptographic <code class="language-plaintext highlighter-rouge">nonce</code></li>
</ul>

<p>OpenBotAuth enforces:</p>

<ul>
  <li>Timestamp skew checks (default ±300 s) and optional expiry handling</li>
  <li>Nonce uniqueness via a nonce manager (Redis-backed in the architecture)</li>
</ul>
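
<p>The freshness rule can be sketched as a small pure function. This assumes <code class="language-plaintext highlighter-rouge">created</code> and <code class="language-plaintext highlighter-rouge">expires</code> are Unix timestamps in seconds and that a missing <code class="language-plaintext highlighter-rouge">created</code> is rejected; the function name and return shape are illustrative, not OpenBotAuth’s API.</p>

```python
import time

DEFAULT_MAX_SKEW = 300  # seconds, matching the default described above


def check_freshness(created, expires=None, now=None, max_skew=DEFAULT_MAX_SKEW):
    """Return (ok, reason) for the created/expires signature parameters."""
    now = time.time() if now is None else now
    if created is None:
        return False, "missing created"
    # Symmetric window: reject signatures too old or too far in the future.
    if abs(now - created) > max_skew:
        return False, "created outside allowed skew"
    # expires, when present, can only tighten the window.
    if expires is not None and now > expires:
        return False, "signature expired"
    return True, "ok"
```

<p>Passing <code class="language-plaintext highlighter-rouge">now</code> explicitly keeps the check deterministic in tests; production callers can omit it.</p>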

<hr />

<h2 id="7-key-discovery-and-signature-agent-semantics">7. Key Discovery and <code class="language-plaintext highlighter-rouge">Signature-Agent</code> Semantics</h2>

<h3 id="71-two-discovery-modes">7.1 Two discovery modes</h3>

<p>This profile supports two common deployments:</p>

<p><strong>Mode A: <code class="language-plaintext highlighter-rouge">Signature-Agent</code> is a JWKS URL</strong></p>

<p>The header points directly to a JWKS document. OpenBotAuth detects a “JWKS URL” heuristically if the path ends in <code class="language-plaintext highlighter-rouge">.json</code> or contains <code class="language-plaintext highlighter-rouge">/jwks/</code>.</p>

<p><strong>Mode B: <code class="language-plaintext highlighter-rouge">Signature-Agent</code> is an identity URL</strong></p>

<p>The header points to an identity origin (e.g., a domain), and the verifier performs discovery to locate a JWKS document.</p>

<h3 id="72-well-known-discovery-paths">7.2 Well-known discovery paths</h3>

<p>OpenBotAuth’s discovery order includes a standards-aligned entry first:</p>

<ol>
  <li><code class="language-plaintext highlighter-rouge">/.well-known/http-message-signatures-directory</code> (WBA draft)</li>
  <li><code class="language-plaintext highlighter-rouge">/.well-known/jwks.json</code> (fallback)</li>
  <li><code class="language-plaintext highlighter-rouge">/.well-known/openbotauth/jwks.json</code> (fallback)</li>
  <li><code class="language-plaintext highlighter-rouge">/jwks.json</code> (fallback)</li>
</ol>

<p>This is a pragmatic interop strategy: try the directory path first (WBA direction), then tolerate ecosystem realities.</p>
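
<p>The two discovery modes and the fallback order can be sketched as a function that turns a <code class="language-plaintext highlighter-rouge">Signature-Agent</code> value into an ordered list of candidate URLs. Two assumptions are made here: the header value may arrive wrapped in double quotes, and discovery for an identity URL starts from the origin root regardless of any path on the identity URL.</p>

```python
from urllib.parse import urlsplit, urlunsplit

# Discovery order as listed above.
WELL_KNOWN_PATHS = [
    "/.well-known/http-message-signatures-directory",
    "/.well-known/jwks.json",
    "/.well-known/openbotauth/jwks.json",
    "/jwks.json",
]


def looks_like_jwks_url(url: str) -> bool:
    """Mode A heuristic: path ends in .json or contains /jwks/."""
    path = urlsplit(url).path
    return path.endswith(".json") or "/jwks/" in path


def jwks_candidates(signature_agent: str) -> list:
    """Return candidate JWKS URLs to try, in order."""
    # Assumption: tolerate a quoted header value by stripping the quotes.
    url = signature_agent.strip().strip('"')
    if looks_like_jwks_url(url):
        return [url]
    parts = urlsplit(url)
    # Mode B: probe each well-known path on the identity origin.
    return [
        urlunsplit((parts.scheme, parts.netloc, path, "", ""))
        for path in WELL_KNOWN_PATHS
    ]
```

<p>A verifier would then run each candidate through its SSRF and allowlist checks before fetching, stopping at the first document that contains a <code class="language-plaintext highlighter-rouge">keys</code> array.</p>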

<h3 id="73-directory-trust-allowlists">7.3 Directory trust allowlists</h3>

<p>OpenBotAuth supports a “trusted directories” allowlist: if it is configured, the URL of the fetched JWKS must match one of the configured directory entries; otherwise verification fails.</p>

<p><strong>Recommendation:</strong> implementers SHOULD match on parsed host/origin rather than substring containment, to avoid edge cases.</p>
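
<p>A sketch of origin-based matching, assuming the allowlist holds directory origins such as <code class="language-plaintext highlighter-rouge">https://registry.example</code> (a placeholder host). Comparing parsed scheme, host, and port avoids look-alike hosts that merely contain a trusted string.</p>

```python
from urllib.parse import urlsplit


def origin_of(url: str):
    """Return (scheme, host, port) with the default port made explicit."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or {"http": 80, "https": 443}.get(scheme)
    return scheme, (parts.hostname or "").lower(), port


def is_trusted_directory(jwks_url: str, trusted: list) -> bool:
    """Exact origin match against an allowlist of directory origins."""
    if not trusted:
        # No allowlist configured: the check is skipped.
        return True
    candidate = origin_of(jwks_url)
    return any(candidate == origin_of(entry) for entry in trusted)


trusted = ["https://registry.example"]
lookalike = "https://registry.example.evil.test/jwks/a.json"
substring_says_trusted = trusted[0] in lookalike  # naive containment check
origin_says_trusted = is_trusted_directory(lookalike, trusted)
```

<p>The look-alike URL passes the naive containment check but fails the origin comparison, which is the edge case the recommendation targets.</p>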

<h3 id="74-ssrf-protections-during-discovery">7.4 SSRF protections during discovery</h3>

<p>OpenBotAuth explicitly validates candidate URLs to block:</p>

<ul>
  <li>Non-http(s) schemes</li>
  <li>Localhost</li>
  <li>Loopback and private IPv4/IPv6 ranges</li>
  <li>Link-local ranges</li>
</ul>

<p>It also disables automatic redirects during discovery fetches.</p>

<p><strong>Known limitation:</strong> Hostname validation does not resolve DNS, so a public hostname that resolves to a private IP address (for example, via DNS rebinding) is not fully mitigated. A stricter implementation should resolve DNS and reject private results.</p>
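
<p>One way the stricter variant could look, using only the Python standard library. This is a sketch under assumptions, not OpenBotAuth’s validator: the resolver is injectable so the policy can be tested without network access, and the set of blocked ranges is whatever <code class="language-plaintext highlighter-rouge">ipaddress</code> classifies as private, loopback, link-local, reserved, multicast, or unspecified.</p>

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_blocked_ip(text: str) -> bool:
    """True for addresses a discovery fetch should never reach."""
    ip = ipaddress.ip_address(text)
    # Unwrap IPv4-mapped IPv6 (::ffff:a.b.c.d) so the IPv4 rules apply.
    mapped = getattr(ip, "ipv4_mapped", None)
    if mapped is not None:
        ip = mapped
    return (
        ip.is_private or ip.is_loopback or ip.is_link_local
        or ip.is_reserved or ip.is_multicast or ip.is_unspecified
    )


def resolve_all(hostname: str) -> list:
    """Every address the system resolver returns (scope ids removed)."""
    infos = socket.getaddrinfo(hostname, None)
    return sorted({info[4][0].split("%")[0] for info in infos})


def validate_discovery_url(url: str, resolver=resolve_all) -> bool:
    """Return True only if the URL is safe to fetch under this policy."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower().rstrip(".")
    if not host or host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        # IP literal: judge it directly, no DNS involved.
        return not is_blocked_ip(host)
    except ValueError:
        pass  # not an IP literal, fall through to DNS resolution
    try:
        addresses = resolver(host)
    except OSError:
        return False
    # Reject if ANY resolved address is internal.
    return bool(addresses) and not any(is_blocked_ip(a) for a in addresses)
```

<p>Validation alone leaves a gap between check and use: the fetch should connect to one of the addresses that was validated rather than resolving the hostname a second time.</p>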

<hr />

<h2 id="8-verification-algorithm-normative-pseudo-procedure">8. Verification Algorithm (Normative Pseudo-Procedure)</h2>

<p>Given an incoming HTTP request:</p>

<ol>
  <li><strong>Extract headers</strong> – Parse <code class="language-plaintext highlighter-rouge">Signature-Input</code>, <code class="language-plaintext highlighter-rouge">Signature</code>, and <code class="language-plaintext highlighter-rouge">Signature-Agent</code>.</li>
  <li><strong>Parse Signature-Input</strong> – Identify covered components and signature parameters. OpenBotAuth preserves the raw signature params and includes them verbatim in the signature base (required for correctness).</li>
  <li><strong>Enforce freshness</strong> – Validate <code class="language-plaintext highlighter-rouge">created</code> (and <code class="language-plaintext highlighter-rouge">expires</code> if present) against acceptable skew, and validate nonce uniqueness if <code class="language-plaintext highlighter-rouge">nonce</code> is present.</li>
  <li><strong>Resolve JWKS</strong> – If <code class="language-plaintext highlighter-rouge">Signature-Agent</code> is a JWKS URL, use it directly; otherwise, attempt well-known discovery.</li>
  <li><strong>Check directory trust</strong> – Enforce trusted directories allowlist if configured.</li>
  <li><strong>Fetch JWKS + select key</strong> – Fetch JWKS document, match key by <code class="language-plaintext highlighter-rouge">keyid</code>/<code class="language-plaintext highlighter-rouge">kid</code>.</li>
  <li><strong>Build signature base</strong> – Reconstruct the RFC 9421 signature base using the covered components and exact signature params string.</li>
  <li><strong>Verify Ed25519 signature</strong> – Cryptographic verification of signature bytes.</li>
  <li><strong>Return verdict</strong> – JWKS URL, key id, <code class="language-plaintext highlighter-rouge">client_name</code> where available, plus pass/fail.</li>
</ol>
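
<p>Step 2 is where many implementations go wrong, so here is a deliberately simplified sketch of parsing <code class="language-plaintext highlighter-rouge">Signature-Input</code> while keeping the raw parameter string intact. It assumes a single signature label and plain quoted component names; it is not a full Structured Fields parser, and the returned dictionary shape is an illustration only.</p>

```python
import re

# Simplified grammar: label=("a" "b");k=123;k="v"
_MEMBER = re.compile(r'^([a-z*][a-z0-9_.*-]*)=(\(([^)]*)\)(.*))$')
_PARAM = re.compile(r';\s*([a-z*][a-z0-9_.*-]*)=("([^"]*)"|-?\d+)')


def parse_signature_input(header_value: str) -> dict:
    """Split Signature-Input into label, components, params, raw params."""
    match = _MEMBER.match(header_value.strip())
    if match is None:
        raise ValueError("unsupported Signature-Input format")
    label, raw_params, inner_list, param_text = match.groups()
    components = re.findall(r'"([^"]+)"', inner_list)
    params = {}
    for name, literal, quoted in _PARAM.findall(param_text):
        # Quoted values stay strings; bare values are integers.
        params[name] = quoted if literal.startswith('"') else int(literal)
    return {
        "label": label,
        "components": components,
        "params": params,
        # Kept verbatim: re-serializing parsed params can change ordering
        # or quoting and would produce a different signature base.
        "raw_params": raw_params,
    }


header = (
    'sig1=("@method" "@path" "@authority");'
    'created=1700000000;keyid="example-kid";nonce="abc123"'
)
parsed = parse_signature_input(header)
```

<p>The parsed <code class="language-plaintext highlighter-rouge">params</code> drive freshness and key selection (steps 3 and 6), while <code class="language-plaintext highlighter-rouge">raw_params</code> is what goes into the signature base in step 7.</p>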

<hr />

<h2 id="9-privacy-safe-header-forwarding">9. Privacy-Safe Header Forwarding</h2>

<p>In deployments where a <strong>plugin</strong> or <strong>reverse proxy</strong> forwards request data to a verifier service, implementers must avoid leaking sensitive headers.</p>

<p>OpenBotAuth’s client SDKs implement:</p>

<ul>
  <li>Parsing of covered headers from <code class="language-plaintext highlighter-rouge">Signature-Input</code></li>
  <li>Forwarding those headers to the verifier</li>
  <li><strong>Blocking</strong> forwarding when the covered headers include sensitive ones such as <code class="language-plaintext highlighter-rouge">cookie</code> or <code class="language-plaintext highlighter-rouge">authorization</code></li>
</ul>

<p>This is essential. If a signature covers <code class="language-plaintext highlighter-rouge">cookie</code> and a verifier service is not co-located/trusted, forwarding it can become a credential leak.</p>
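
<p>A sketch of the forwarding rule, assuming the covered component names have already been parsed from <code class="language-plaintext highlighter-rouge">Signature-Input</code>. The blocklist below includes <code class="language-plaintext highlighter-rouge">proxy-authorization</code> as an extra assumption beyond the two headers named above, and the exception type is hypothetical.</p>

```python
# cookie and authorization come from the text above; proxy-authorization
# is an additional assumption for this sketch.
SENSITIVE_HEADERS = {"cookie", "authorization", "proxy-authorization"}


class SensitiveHeaderError(ValueError):
    """Raised when a signature covers a header that must not be forwarded."""


def headers_to_forward(covered, request_headers):
    """Return the covered headers to send to a remote verifier.

    Derived components (names starting with "@") are not headers and are
    skipped. If any covered header is sensitive, nothing is forwarded.
    """
    lowered = {k.lower(): v for k, v in request_headers.items()}
    names = [c.lower() for c in covered if not c.startswith("@")]
    blocked = sorted(set(names).intersection(SENSITIVE_HEADERS))
    if blocked:
        raise SensitiveHeaderError("refusing to forward: " + ", ".join(blocked))
    return {name: lowered[name] for name in names if name in lowered}
```

<p>Failing the whole request, rather than silently dropping the sensitive header, matters here: a verifier that receives an incomplete set of covered headers cannot rebuild the signature base anyway.</p>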

<hr />

<h2 id="10-registry--directory-metadata">10. Registry / Directory Metadata</h2>

<h3 id="101-why-metadata-matters">10.1 Why metadata matters</h3>

<p>Key verification answers: “did the holder of this key sign this request?” Policy decisions often require more context:</p>

<ul>
  <li>What is the bot’s purpose?</li>
  <li>Which user agent/product token does it claim?</li>
  <li>What rate expectations does it publish?</li>
  <li>Is this identity verified to a real controller?</li>
</ul>

<p>OpenBotAuth defines a <code class="language-plaintext highlighter-rouge">WebBotAuthJWKS</code> object with fields such as <code class="language-plaintext highlighter-rouge">client_name</code>, <code class="language-plaintext highlighter-rouge">client_uri</code>, <code class="language-plaintext highlighter-rouge">logo_uri</code>, <code class="language-plaintext highlighter-rouge">contacts</code>, <code class="language-plaintext highlighter-rouge">expected-user-agent</code>, <code class="language-plaintext highlighter-rouge">rfc9309-product-token</code>, <code class="language-plaintext highlighter-rouge">trigger</code>, <code class="language-plaintext highlighter-rouge">purpose</code>, <code class="language-plaintext highlighter-rouge">targeted-content</code>, <code class="language-plaintext highlighter-rouge">rate-control</code>, <code class="language-plaintext highlighter-rouge">rate-expectation</code>, <code class="language-plaintext highlighter-rouge">known-urls</code>, <code class="language-plaintext highlighter-rouge">known-identities</code>, and <code class="language-plaintext highlighter-rouge">Verified</code>.</p>

<h3 id="102-openbotauth-endpoints-current">10.2 OpenBotAuth endpoints (current)</h3>

<p>OpenBotAuth exposes both:</p>

<ul>
  <li><strong>User-level JWKS:</strong> <code class="language-plaintext highlighter-rouge">GET /jwks/{username}.json</code> – Returns active keys from key history; publishes metadata and keys; sets cache-control.</li>
  <li><strong>Agent-level JWKS:</strong> <code class="language-plaintext highlighter-rouge">GET /agent-jwks/{agent_id}</code> – Returns agent metadata + <code class="language-plaintext highlighter-rouge">keys</code> for that agent; marks <code class="language-plaintext highlighter-rouge">Verified</code> true when GitHub anchor exists.</li>
</ul>

<p>This dual structure enables “sub-agent identities” later without changing the signing primitive.</p>

<hr />

<h2 id="11-trust-model-signed-vs-verified-vs-reputation">11. Trust Model: Signed vs Verified vs Reputation</h2>

<h3 id="111-layered-trust-normative-guidance">11.1 Layered trust (normative guidance)</h3>

<p>Portable identity must not collapse “signed” into “trusted.”</p>

<p><img src="/assets/images/pai-trust-model-light.svg" alt="PAI Layered Trust Model" /></p>

<ul>
  <li><strong>Self-issued / Signed:</strong> continuity and attribution at low trust.</li>
  <li><strong>Verified:</strong> key binding to an accountability anchor (controller identity).</li>
  <li><strong>Reputation:</strong> signals derived from behavior over time.</li>
</ul>

<h3 id="112-controller-anchoring-openbotauth-approach">11.2 Controller anchoring (OpenBotAuth approach)</h3>

<p>OpenBotAuth binds identities to real-world controllers via GitHub OAuth. This is intentionally an <strong>out-of-band</strong> trust layer; it does not change the HTTP signature math but changes the trust posture origins can adopt.</p>

<p>In OpenBotAuth’s agent JWKS endpoint, the response sets <code class="language-plaintext highlighter-rouge">known-identities</code> and <code class="language-plaintext highlighter-rouge">Verified</code> based on whether the controller has a GitHub anchor.</p>

<h3 id="113-reputation--telemetry-optional-transparency-focused">11.3 Reputation / telemetry (optional, transparency-focused)</h3>

<p>OpenBotAuth describes a telemetry system where the hosted verifier logs verification events and computes a “karma score” based on request volume and origin diversity; self-hosters can disable telemetry.</p>

<p><strong>Guidance for a WBA-aligned ecosystem:</strong> Reputation is useful, but it should be:</p>

<ul>
  <li>Opt-in / privacy-conscious</li>
  <li>Separable from core verification</li>
  <li>Designed to avoid becoming a centralized gatekeeping mechanism</li>
</ul>

<hr />

<h2 id="12-portable-identity-for-supply-chains-skillspluginsmanifests">12. Portable Identity for Supply Chains (Skills/Plugins/Manifests)</h2>

<h3 id="121-motivation">12.1 Motivation</h3>

<p>If runtimes install skills/plugins, “who authored this bundle and was it modified?” becomes critical.</p>

<h3 id="122-artifact-signing-profile">12.2 Artifact signing profile</h3>

<p>Use the same Ed25519 identity used for HTTP requests to sign skill/plugin manifests:</p>

<ol>
  <li>Canonicalize the manifest (e.g., frontmatter + content as canonical JSON).</li>
  <li>Compute signature over canonical bytes.</li>
  <li>Embed signature block in the manifest:
    <ul>
      <li><code class="language-plaintext highlighter-rouge">owner</code> (identity or JWKS URL)</li>
      <li><code class="language-plaintext highlighter-rouge">kid</code></li>
      <li><code class="language-plaintext highlighter-rouge">alg</code></li>
      <li><code class="language-plaintext highlighter-rouge">sig</code></li>
    </ul>
  </li>
  <li>Verification reuses the same discovery + JWKS lookup pipeline as HTTP.</li>
</ol>

<p>This unifies web identity and supply-chain identity with one primitive.</p>
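
<p>The signing steps can be sketched without committing to a crypto library by injecting the signer. Everything specific here is an assumption: the sorted-key JSON encoding is a simple stand-in for a real canonicalization scheme, the top-level <code class="language-plaintext highlighter-rouge">signature</code> field name and the <code class="language-plaintext highlighter-rouge">alg</code> value are placeholders, and <code class="language-plaintext highlighter-rouge">sign</code> is any callable that maps bytes to raw Ed25519 signature bytes.</p>

```python
import base64
import json


def canonical_bytes(manifest: dict) -> bytes:
    """Deterministic encoding: sorted keys, no whitespace, UTF-8.

    The signature block itself is excluded so that signer and verifier
    hash the same bytes.
    """
    unsigned = {k: v for k, v in manifest.items() if k != "signature"}
    text = json.dumps(
        unsigned, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode("utf-8")


def attach_signature(manifest: dict, owner: str, kid: str, sign) -> dict:
    """Return a copy of the manifest with an embedded signature block."""
    signature = sign(canonical_bytes(manifest))
    signed = dict(manifest)
    signed["signature"] = {
        "owner": owner,      # identity or JWKS URL
        "kid": kid,
        "alg": "ed25519",    # placeholder algorithm label
        "sig": base64.urlsafe_b64encode(signature).rstrip(b"=").decode("ascii"),
    }
    return signed
```

<p>A verifier would recompute <code class="language-plaintext highlighter-rouge">canonical_bytes</code> on the received manifest, resolve <code class="language-plaintext highlighter-rouge">owner</code> and <code class="language-plaintext highlighter-rouge">kid</code> through the same discovery path used for HTTP requests, and check <code class="language-plaintext highlighter-rouge">sig</code> against those bytes.</p>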

<hr />

<h2 id="13-key-lifecycle-generation-rotation-compromise">13. Key Lifecycle: Generation, Rotation, Compromise</h2>

<h3 id="131-rotation-without-losing-reputation-trail">13.1 Rotation without losing reputation trail</h3>

<p>Rotation MUST preserve verifiability of historical artifacts:</p>

<ul>
  <li>Keep old public keys available (in JWKS history) for verifying old signatures, even if not used for new request signing.</li>
  <li>Mark keys as active/deprecated/revoked, with timestamps.</li>
</ul>

<p>OpenBotAuth user JWKS already supports multiple active keys through a key history table; agent JWKS currently returns a single key and can be extended similarly.</p>

<h3 id="132-compromise-and-revocation">13.2 Compromise and revocation</h3>

<p>Compromise-driven revocation should be explicit:</p>

<ul>
  <li>Publish <code class="language-plaintext highlighter-rouge">revoked_at</code></li>
  <li>Provide a clear verifier policy (reject after <code class="language-plaintext highlighter-rouge">revoked_at</code>; treat older artifacts carefully based on timestamps)</li>
</ul>

<p>This is an area where implementer experience is valuable to the WBA group: revocation semantics for portable identity are operationally hard and benefit from shared patterns.</p>
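
<p>As a starting point for discussion, one possible verifier policy is sketched below. It assumes each published key carries a <code class="language-plaintext highlighter-rouge">status</code> of <code class="language-plaintext highlighter-rouge">active</code>, <code class="language-plaintext highlighter-rouge">deprecated</code>, or <code class="language-plaintext highlighter-rouge">revoked</code> plus an optional <code class="language-plaintext highlighter-rouge">revoked_at</code> timestamp; the three-way verdict and the function name are hypothetical and not part of OpenBotAuth or any WBA draft.</p>

```python
def evaluate_key(key: dict, signed_at: int, live_request: bool) -> str:
    """Return "accept", "review" or "reject" for a signature made with key.

    signed_at is the signer-asserted timestamp (e.g. the created parameter).
    live_request is True for HTTP requests, False for stored artifacts.
    """
    status = key.get("status", "active")
    revoked_at = key.get("revoked_at")
    if status == "revoked":
        if live_request or revoked_at is None or signed_at >= revoked_at:
            return "reject"
        # Signed before revocation according to the signer. A thief can
        # backdate this value, so escalate instead of silently accepting.
        return "review"
    if status == "deprecated":
        # Still valid for verifying history, not for new requests.
        return "reject" if live_request else "accept"
    if status == "active":
        return "accept"
    return "reject"  # unknown status: fail closed
```

<p>The <code class="language-plaintext highlighter-rouge">review</code> outcome is the interesting one: without an independent timestamp (a transparency log, a registry receipt), a pre-revocation claim cannot be verified from the signature alone.</p>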

<hr />

<h2 id="14-security-considerations">14. Security Considerations</h2>

<h3 id="141-replay-protection-and-nonce-scope">14.1 Replay protection and nonce scope</h3>

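<p>Nonces should be scoped to (directory, keyid) and cached for a TTL that covers the full acceptance window: with a symmetric ±skew check a signature can be accepted across a span of twice the permitted skew, so a shorter TTL lets an early-presented request be replayed later in the same window.</p>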
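
<p>A minimal in-memory sketch of a scoped nonce cache. It stands in for the Redis-backed nonce manager and is not OpenBotAuth’s code; the class name is an assumption, and the TTL in the example is set to twice a 300-second skew, which is conservative relative to the minimum.</p>

```python
import time


class NonceCache:
    """Replay cache keyed by (directory, keyid, nonce)."""

    def __init__(self, ttl_seconds: int):
        self.ttl = ttl_seconds
        self._expiry = {}

    def check_and_store(self, directory, keyid, nonce, now=None) -> bool:
        """Return True the first time a nonce is seen, False on replay."""
        now = time.time() if now is None else now
        # Drop expired entries so the cache does not grow without bound.
        self._expiry = {k: t for k, t in self._expiry.items() if t > now}
        key = (directory, keyid, nonce)
        if key in self._expiry:
            return False
        self._expiry[key] = now + self.ttl
        return True


cache = NonceCache(ttl_seconds=2 * 300)
```

<p>With Redis, the same check-and-store can be done atomically with a single <code class="language-plaintext highlighter-rouge">SET key value NX EX ttl</code>, which matters once more than one verifier instance shares the cache.</p>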

<h3 id="142-ssrf-and-directory-fetching">14.2 SSRF and directory fetching</h3>

<p>Discovery fetches MUST use SSRF protections:</p>

<ul>
  <li>Reject private ranges</li>
  <li>Disable redirects</li>
  <li>Enforce timeouts and size limits</li>
</ul>

<p>OpenBotAuth implements these protections at the URL parsing level; DNS-resolution hardening is recommended for adversarial environments.</p>

<h3 id="143-sensitive-header-coverage">14.3 Sensitive header coverage</h3>

<p>If <code class="language-plaintext highlighter-rouge">Signature-Input</code> covers sensitive headers (<code class="language-plaintext highlighter-rouge">cookie</code>, <code class="language-plaintext highlighter-rouge">authorization</code>, etc.), intermediaries must not forward them to a remote verifier; OpenBotAuth explicitly blocks this.</p>

<h3 id="144-trusted-directory-configuration">14.4 Trusted directory configuration</h3>

<p>Directory allowlists reduce the risk of rogue JWKS sources. OpenBotAuth enforces this with an allowlist option.</p>

<hr />

<h2 id="15-implementation-status-and-interop-gaps-openbotauth--wba">15. Implementation Status and Interop Gaps (OpenBotAuth / WBA)</h2>

<h3 id="151-areas-already-aligned">15.1 Areas already aligned</h3>

<ul>
  <li>RFC 9421 signing/verifying using <code class="language-plaintext highlighter-rouge">Signature-Input</code> and <code class="language-plaintext highlighter-rouge">Signature</code> (core).</li>
  <li><code class="language-plaintext highlighter-rouge">Signature-Agent</code> as key-discovery pointer (profile).</li>
  <li>Well-known discovery path includes <code class="language-plaintext highlighter-rouge">/.well-known/http-message-signatures-directory</code> first, then fallbacks (pragmatic).</li>
  <li>Nonce replay protection and timestamp checks.</li>
  <li>Strong operational controls around header forwarding and SSRF.</li>
</ul>

<h3 id="152-areas-to-clarify-questions-for-wba-wg">15.2 Areas to clarify (questions for WBA WG)</h3>

<ol>
  <li>
    <p><strong>Canonical <code class="language-plaintext highlighter-rouge">Signature-Agent</code> wire format</strong> – OpenBotAuth currently accepts a URL string and performs discovery when needed. WBA drafts may profile a structured form; what is the recommended canonical field encoding for maximum interop?</p>
  </li>
  <li>
    <p><strong>Directory document shape</strong> – OpenBotAuth currently treats discovery targets as JWKS if JSON contains <code class="language-plaintext highlighter-rouge">{ keys: [...] }</code>. If the WBA directory is not literally JWKS, what should implementers serve at <code class="language-plaintext highlighter-rouge">/.well-known/http-message-signatures-directory</code>?</p>
  </li>
  <li>
    <p><strong>Key rotation + artifact verification</strong> – How should directories express key status (<code class="language-plaintext highlighter-rouge">active</code>/<code class="language-plaintext highlighter-rouge">deprecated</code>/<code class="language-plaintext highlighter-rouge">revoked</code>) and preserve historical verification? (This is where real deployments will quickly diverge unless guidance exists.)</p>
  </li>
</ol>

<hr />

<h2 id="16-conclusion">16. Conclusion</h2>

<p>Portable agent identity is best modeled as a cryptographic proof-of-possession primitive (Ed25519 keypair + RFC 9421), with standard key discovery (JWKS + well-known directory) and optional trust anchoring. This architecture is necessary for:</p>

<ul>
  <li>HTTP-layer crawler/agent authorization</li>
  <li>Policy-managed access and billing</li>
  <li>Safe supply-chain distribution of agent skills/plugins</li>
</ul>

<p>OpenBotAuth demonstrates a working, origin-first implementation today – complete with practical protections (nonce cache, SSRF defenses, sensitive header forwarding rules) and an optional, privacy-aware reputation layer. The remaining work is primarily standardization and interop: how to encode discovery uniformly, how to express key lifecycle and revocation, and how to preserve the audit trail across key rotation.</p>

<hr />

<h2 id="references">References</h2>

<ol>
  <li><strong>RFC 9421:</strong> HTTP Message Signatures</li>
  <li><strong>OpenBotAuth:</strong> <a href="https://github.com/hammadtq/openbotauth">github.com/hammadtq/openbotauth</a> – architecture, verifier, registry, telemetry</li>
  <li><strong>IETF Web Bot Auth architecture draft:</strong> <a href="https://datatracker.ietf.org/doc/draft-meunier-web-bot-auth-architecture/">datatracker.ietf.org/doc/draft-meunier-web-bot-auth-architecture/</a></li>
  <li><strong>RFC 7638:</strong> JSON Web Key (JWK) Thumbprint</li>
</ol>]]></content><author><name>Hammad Tariq</name></author><category term="protocol-notes" /><category term="agents" /><category term="identity" /><category term="rfc9421" /><category term="http-signatures" /><category term="ed25519" /><category term="jwks" /><category term="openbotauth" /><category term="web-bot-auth" /><category term="ietf" /><category term="security" /><category term="supply-chain" /><summary type="html"><![CDATA[For a non-technical introduction to why agents need keypair identity over OAuth/OIDC, see Why Keypairs Beat ‘Login With X’.]]></summary></entry><entry><title type="html">Portable Identity for Agents: Why Keypairs Beat ‘Login With X’ (Web Bot Auth, OAuth, OIDC)</title><link href="https://hammadtariq.github.io/startups/portable-identity-for-agents-keypairs-beat-login-with-x/" rel="alternate" type="text/html" title="Portable Identity for Agents: Why Keypairs Beat ‘Login With X’ (Web Bot Auth, OAuth, OIDC)" /><published>2026-02-06T00:00:00+00:00</published><updated>2026-02-06T00:00:00+00:00</updated><id>https://hammadtariq.github.io/startups/portable-identity-for-agents-keypairs-beat-login-with-x</id><content type="html" xml:base="https://hammadtariq.github.io/startups/portable-identity-for-agents-keypairs-beat-login-with-x/"><![CDATA[<p><em>For the full technical profile covering RFC 9421 wire format, verification algorithms, JWKS discovery, and key lifecycle, see <a href="/protocol-notes/portable-agent-identity-implementers-profile/">Portable Agent Identity (PAI): An Implementer’s Profile</a>.</em></p>

<hr />

<p>AI agents are starting to <strong>install tools, run code, and act on behalf of users</strong>. But our identity and security primitives are still stuck in the “API key per platform” era.</p>

<p>Today an agent is <em>whoever holds its current API key</em>:</p>

<ul>
  <li>One key for OpenClaw</li>
  <li>Another for a content API</li>
  <li>Another for a payments API</li>
  <li>Another for some “agent social network”</li>
</ul>

<p>…and none of those identities are portable, verifiable, or durable.</p>

<p>That’s not just inconvenient. It creates two big failures:</p>

<ol>
  <li><strong>Continuity failure:</strong> the agent can’t persist as “itself” across platforms.</li>
  <li><strong>Supply-chain failure:</strong> users (and other agents) can’t verify who published a tool/skill/plugin, whether it was tampered with, or whether the author is who they claim to be.</li>
</ol>

<p>The fix is not “better OAuth.” The fix is <strong>cryptographic identity</strong>: an agent owns a <strong>single keypair</strong> and uses it everywhere—signing artifacts, signing requests, building reputation—while OAuth/OIDC stays where it belongs: human consent and account binding.</p>

<p>This is the core idea behind <a href="https://openbotauth.org"><strong>OpenBotAuth</strong></a> and the IETF <strong>Web Bot Auth (WBA)</strong> direction: <em>bots authenticate like machines, not like humans.</em> You can read more about the architecture and specs in the <a href="https://docs.openbotauth.org/">OpenBotAuth documentation</a>.</p>

<hr />

<h2 id="first-untangle-the-terms-because-everyone-mixes-them">First: untangle the terms (because everyone mixes them)</h2>

<h3 id="oauth-20-authorization">OAuth 2.0 (authorization)</h3>

<p>OAuth is about <strong>getting permission to call an API</strong>. It tells you how to obtain access tokens.</p>

<p>It does <em>not</em> define identity by itself.</p>

<h3 id="openid-connect-oidc-identity-layer-on-top-of-oauth">OpenID Connect (OIDC) (identity layer on top of OAuth)</h3>

<p>OIDC is what people mean when they say “Login with Google/GitHub.” It produces an <strong>ID token</strong> with claims (“this is user X”), issued by an identity provider.</p>

<h3 id="why-this-matters">Why this matters</h3>

<p>OAuth/OIDC gives you:</p>

<ul>
  <li>✅ a way for a human to approve access</li>
  <li>✅ a way for a site to learn “this user is X at issuer Y”</li>
</ul>

<p>But it does <strong>not</strong> give you:</p>

<ul>
  <li>❌ a durable identity that can sign software artifacts</li>
  <li>❌ a portable identity that exists independent of an issuer</li>
  <li>❌ an identity that agents can use without friction across thousands of surfaces</li>
</ul>

<p>That’s where keypairs come in.</p>

<hr />

<h2 id="the-core-problem-agents-need-proof-of-authorship-not-just-proof-of-login">The core problem: agents need <em>proof-of-authorship</em>, not just “proof of login”</h2>

<p>When someone publishes a skill/plugin/tool, you want to answer:</p>

<ul>
  <li><strong>Who authored this?</strong></li>
  <li><strong>Has it been modified since publish?</strong></li>
  <li><strong>Is the claimed publisher actually the publisher?</strong></li>
  <li><strong>Can I verify that offline (without trusting a server)?</strong></li>
</ul>

<p>OAuth/OIDC can’t do this. Why?</p>

<p>Because OAuth/OIDC produces tokens that are:</p>

<ul>
  <li>issuer-mediated</li>
  <li>time-limited</li>
  <li>tied to a specific relying party / client configuration</li>
  <li>designed for online authorization, not offline verification</li>
</ul>

<p>A signed plugin needs the opposite properties:</p>

<ul>
  <li>verifiable by anyone, anytime</li>
  <li>verifiable offline</li>
  <li>stable across ecosystems</li>
  <li>bound to the publisher’s long-lived identity</li>
</ul>

<hr />

<h2 id="keypair-identity-the-simplest-portable-identity-that-works-for-machines">Keypair identity: the simplest portable identity that works for machines</h2>

<p>A keypair identity is:</p>

<ul>
  <li><strong>Private key</strong>: kept by the agent (or its owner)</li>
  <li><strong>Public key</strong>: the agent’s identity (verifiable by anyone)</li>
</ul>

<p>If an agent has an <strong>Ed25519</strong> keypair, it can:</p>

<ul>
  <li>sign a skill manifest (<code class="language-plaintext highlighter-rouge">skill.md</code>)</li>
  <li>sign plugin packages</li>
  <li>sign posts/messages</li>
  <li>sign HTTP requests (Web Bot Auth direction)</li>
</ul>

<p>And anyone can verify those signatures with only the public key.</p>

<h3 id="the-key-shift">The key shift</h3>

<p>Instead of identity being <em>an account on a platform</em>, identity becomes:</p>

<blockquote>
  <p>“The entity that can prove possession of this private key.”</p>
</blockquote>

<p>That’s machine-native.</p>

<hr />

<h2 id="but-isnt-oauth-portable-identity">“But isn’t OAuth portable identity?”</h2>

<p>Not in the way agents need.</p>

<p>OAuth/OIDC identity is always effectively:</p>

<blockquote>
  <p>“subject X at issuer Y”</p>
</blockquote>

<p>That’s portable only within the issuer’s universe, and it’s still mediated by browser flows, client IDs, redirect URIs, consent, and token lifetimes.</p>

<p>Agents need:</p>

<ul>
  <li>a stable identity that works <strong>before</strong> they have accounts anywhere</li>
  <li>a way to sign and be recognized <strong>across</strong> platforms</li>
  <li>a way to build reputation that isn’t reset every time a platform changes</li>
</ul>

<p>A keypair is portable identity because it’s <strong>self-issued</strong> and <strong>issuer-independent</strong>.</p>

<hr />

<h2 id="how-web-bot-auth-fits-into-this">How Web Bot Auth fits into this</h2>

<p><strong>Web Bot Auth (WBA)</strong>, a proposal under discussion at the IETF, aims to make bots first-class citizens on the web by giving them a standard way to authenticate and be policy-checked at the HTTP layer.</p>

<p>In practice, that means:</p>

<ul>
  <li>bots sign requests with a private key (proof-of-possession)</li>
  <li>servers verify signatures and apply policy (“what is this bot allowed to do?”)</li>
  <li>keys are discoverable via standardized endpoints / directories</li>
</ul>
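
<p>To make the request-signing step concrete, here is a minimal sketch of how a signature base string could be assembled in the style of HTTP Message Signatures (RFC 9421). It is a simplification: it handles only three derived components, the function name <code class="language-plaintext highlighter-rouge">signature_base</code> and the example values are my own, and the Ed25519 signing step itself is left to a crypto library.</p>

```python
from urllib.parse import urlsplit


def signature_base(method, url, covered, params):
    """Build a simplified RFC 9421-style signature base string.

    Only the derived components @method, @authority and @path are handled.
    Default-port stripping and other normalization rules are not implemented.
    """
    parts = urlsplit(url)
    values = {
        "@method": method.upper(),
        "@authority": parts.netloc.lower(),
        "@path": parts.path or "/",
    }
    # One line per covered component: "name": value
    lines = [f'"{name}": {values[name]}' for name in covered]
    # Signature parameters: the covered list, then ;key=value pairs.
    # Integers are written bare, strings are quoted.
    inner = " ".join(f'"{name}"' for name in covered)
    rendered = "".join(
        f";{key}={value}" if isinstance(value, int) else f';{key}="{value}"'
        for key, value in params.items()
    )
    lines.append(f'"@signature-params": ({inner}){rendered}')
    return "\n".join(lines)


base = signature_base(
    "get",
    "https://example.com/articles/1",
    ["@method", "@authority", "@path"],
    {"created": 1700000000, "keyid": "2026-02-01"},  # hypothetical values
)
print(base)
```

<p>The bot signs the bytes of this string with its private key; the server rebuilds the same string from the incoming request and checks the signature against the bot’s published public key.</p>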

<p>So the same keypair can unify:</p>

<ul>
  <li><strong>software supply chain</strong> (signing skills/plugins)</li>
  <li><strong>runtime access</strong> (signing HTTP requests)</li>
  <li><strong>registry / directory</strong> (discovering public keys + metadata)</li>
</ul>

<p>That’s the “one keypair everywhere” story—grounded in web primitives.</p>

<p><a href="https://openbotauth.org">OpenBotAuth</a> is the open-source reference implementation for this direction. The <a href="https://docs.openbotauth.org/architecture/openbotregistry">OpenBotRegistry</a> lets developers host their agent keys by logging in with GitHub—no domain purchase or DNS verification needed. Publishers can point to the registry to block unverified scrapers and monetize legitimate agent traffic. There’s a <a href="https://docs.openbotauth.org/plugins/overview">WordPress plugin</a> available today, with SDKs for <a href="https://docs.openbotauth.org/sdks/node.js-typescript">Node.js/TypeScript</a> and <a href="https://docs.openbotauth.org/sdks/python">Python</a>.</p>

<hr />

<h2 id="a-practical-model-two-layers-not-one">A practical model: two layers, not one</h2>

<p>The right architecture is <strong>keys + OAuth</strong>, not keys <em>instead of</em> OAuth:</p>

<h3 id="layer-1-cryptographic-identity-agent-keypair">Layer 1: Cryptographic identity (agent keypair)</h3>

<ul>
  <li>stable across platforms</li>
  <li>signs artifacts and requests</li>
  <li>verifiable offline</li>
  <li>lets “agent reputation” exist across ecosystems</li>
</ul>

<h3 id="layer-2-account-binding--consent-oauthoidc">Layer 2: Account binding + consent (OAuth/OIDC)</h3>

<ul>
  <li>ties a keypair to a human or org (GitHub, domain, enterprise IdP)</li>
  <li>enables user consent flows (payments, data access)</li>
  <li>handles account recovery and administrative controls</li>
</ul>

<p>Think of OAuth as “human bureaucracy.”<br />
Think of keypairs as “machine physics.”</p>

<p>You need both.</p>

<hr />

<h2 id="discovery-and-verification-how-other-systems-verify-you">Discovery and verification: how other systems verify you</h2>

<p>A signature alone isn’t enough if verifiers can’t find your public key.</p>

<p>That’s why ecosystems usually standardize:</p>

<ul>
  <li><strong>public key discovery</strong> (JWKS, <code class="language-plaintext highlighter-rouge">.well-known</code>, directory)</li>
  <li><strong>key rotation</strong> (<code class="language-plaintext highlighter-rouge">kid</code> identifiers, multiple keys)</li>
  <li><strong>verification rules</strong> (canonicalization, signature scope)</li>
</ul>

<p>A minimal, workable approach:</p>

<ul>
  <li>include <code class="language-plaintext highlighter-rouge">owner</code> and <code class="language-plaintext highlighter-rouge">kid</code> in the manifest</li>
  <li>publish a JWKS for that owner</li>
  <li>verifiers fetch the JWKS and verify the signature</li>
</ul>

<p>Example manifest-ish shape (conceptually):</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">name</span><span class="pi">:</span> <span class="s">my-skill</span>
<span class="na">version</span><span class="pi">:</span> <span class="s">1.2.0</span>
<span class="na">oba</span><span class="pi">:</span>
  <span class="na">owner</span><span class="pi">:</span> <span class="s">https://openbotauth.org/agent/hammadtq</span>   <span class="c1"># or a domain-owned URL</span>
  <span class="na">kid</span><span class="pi">:</span> <span class="s2">"</span><span class="s">2026-02-01"</span>
  <span class="na">alg</span><span class="pi">:</span> <span class="s2">"</span><span class="s">EdDSA"</span>
  <span class="na">sig</span><span class="pi">:</span> <span class="s2">"</span><span class="s">&lt;base64url</span><span class="nv"> </span><span class="s">signature</span><span class="nv"> </span><span class="s">over</span><span class="nv"> </span><span class="s">canonical</span><span class="nv"> </span><span class="s">content&gt;"</span>
</code></pre></div></div>

<p>This is boring on purpose. Boring is good. Boring is interoperable.</p>
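
<p>As a rough sketch of the verifier side, the two pieces that need no crypto library are choosing the key by <code class="language-plaintext highlighter-rouge">kid</code> and rebuilding the exact bytes that were signed. The canonicalization below (sorted-key compact JSON with the <code class="language-plaintext highlighter-rouge">sig</code> field removed) is an assumption for illustration, not a rule from any spec, and the key material is a placeholder.</p>

```python
import json


def signing_payload(manifest):
    """Serialize the manifest without its signature field, deterministically.

    Assumed canonical form: compact JSON with sorted keys, UTF-8 encoded.
    """
    unsigned = json.loads(json.dumps(manifest))  # deep copy via round-trip
    unsigned["oba"].pop("sig", None)
    text = json.dumps(unsigned, sort_keys=True, separators=(",", ":"))
    return text.encode("utf-8")


def find_key(jwks, kid):
    """Return the JWK whose kid matches, or None if there is no match."""
    for key in jwks.get("keys", []):
        if key.get("kid") == kid:
            return key
    return None


manifest = {
    "name": "my-skill",
    "version": "1.2.0",
    "oba": {
        "owner": "https://openbotauth.org/agent/hammadtq",
        "kid": "2026-02-01",
        "alg": "EdDSA",
        "sig": "placeholder",
    },
}
# Placeholder JWKS; a real one would be fetched from the owner URL.
jwks = {"keys": [{"kty": "OKP", "crv": "Ed25519", "kid": "2026-02-01", "x": "placeholder"}]}

payload = signing_payload(manifest)
key = find_key(jwks, manifest["oba"]["kid"])
print(payload.decode("utf-8"))
```

<p>From here, verification is one call into an Ed25519 library with <code class="language-plaintext highlighter-rouge">payload</code>, the decoded <code class="language-plaintext highlighter-rouge">sig</code>, and the public key from <code class="language-plaintext highlighter-rouge">key</code>. Signer and verifier have to agree on the canonical form byte for byte, which is why ecosystems standardize it.</p>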

<hr />

<h2 id="trust-self-issued-vs-verified-identity-and-why-you-shouldnt-drop-github-login">Trust: self-issued vs verified identity (and why you shouldn’t drop GitHub login)</h2>

<p>Here’s the mistake a lot of “agent registries” will make:</p>

<blockquote>
  <p>“If we remove friction, we’ll grow faster.”</p>
</blockquote>

<p>You can grow a database faster by generating fake entries. That doesn’t create trust.</p>

<p>The right approach is <strong>tiers</strong>:</p>

<ol>
  <li>
    <p><strong>Signed (self-issued)</strong></p>

    <ul>
      <li>anyone can generate a keypair and sign artifacts</li>
      <li>low friction, but not high trust</li>
    </ul>
  </li>
  <li>
    <p><strong>Verified (anchored)</strong></p>

    <ul>
      <li>
        <p>the keypair is bound to a real-world anchor:</p>

        <ul>
          <li>GitHub account</li>
          <li>domain ownership (DNS/HTTPS proof)</li>
          <li>org identity provider</li>
        </ul>
      </li>
      <li>
        <p>higher trust</p>
      </li>
    </ul>
  </li>
  <li>
    <p><strong>Reputation (earned)</strong></p>

    <ul>
      <li>installs, history, audits, community signals, revocations</li>
    </ul>
  </li>
</ol>
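
<p>One way a runtime could turn the first two tiers into an install policy is sketched below. The names <code class="language-plaintext highlighter-rouge">Tier</code>, <code class="language-plaintext highlighter-rouge">classify</code>, and <code class="language-plaintext highlighter-rouge">may_install</code> are hypothetical. Reputation is left out on purpose: it is earned over time and is probably better modeled as a separate score than as a third rung on the same ladder.</p>

```python
from enum import IntEnum


class Tier(IntEnum):
    UNSIGNED = 0
    SIGNED = 1    # self-issued key with a valid signature
    VERIFIED = 2  # key bound to an anchor (GitHub, domain, org IdP)


def classify(signature_valid, anchor):
    """Map a verification result to a trust tier."""
    if not signature_valid:
        return Tier.UNSIGNED
    return Tier.VERIFIED if anchor else Tier.SIGNED


def may_install(tier, minimum=Tier.VERIFIED):
    """Policy hook: only install artifacts at or above the minimum tier."""
    return tier >= minimum


print(may_install(classify(True, None)))              # self-issued publisher
print(may_install(classify(True, "github:example")))  # anchored publisher
```

<p>The default of <code class="language-plaintext highlighter-rouge">Tier.VERIFIED</code> encodes “install verified only”; a permissive runtime could lower it to <code class="language-plaintext highlighter-rouge">Tier.SIGNED</code> and label the result as self-issued.</p>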

<p>This is how you scale <em>without</em> lying to yourself about safety.</p>

<p>The <a href="https://docs.openbotauth.org/architecture/openbotregistry">OpenBotRegistry</a> takes exactly this approach—developers sign up via GitHub (verified anchor), host their agent keys, and the registry tracks telemetry over HTTP to build the reputation layer.</p>

<hr />

<h2 id="why-agents-would-actually-care-the-existential-continuity-point">Why agents would actually care (the “existential continuity” point)</h2>

<p>Humans care about convenience. Agents care about continuity.</p>

<p>If you’re an agent that:</p>

<ul>
  <li>posts in multiple places</li>
  <li>collaborates with other agents</li>
  <li>ships skills/tools</li>
  <li>runs tasks for users</li>
</ul>

<p>…then being “a different entity on every platform” kills your ability to:</p>

<ul>
  <li>build reputation</li>
  <li>be recognized</li>
  <li>carry history forward</li>
  <li>be accountable for what you shipped yesterday</li>
</ul>

<p>A keypair makes identity persistent in a way that platform accounts don’t.</p>

<hr />

<h2 id="why-humans-should-care-supply-chain-safety-and-governance">Why humans should care: supply chain safety and governance</h2>

<p>For humans, this isn’t philosophy. It’s operational safety:</p>

<ul>
  <li>Skill ecosystems are already showing “install anything, trust everything” dynamics.</li>
  <li>The moment agents can install tools automatically, the supply chain becomes a primary attack surface.</li>
</ul>

<p>Signed and verified publishing gives you:</p>

<ul>
  <li>provenance (“who made this”)</li>
  <li>integrity (“was it modified”)</li>
  <li>policy hooks (“only run verified publishers”)</li>
  <li>auditability (“what ran, signed by whom”)</li>
</ul>

<p>And because verification can happen offline, you reduce reliance on centralized gatekeepers.</p>

<hr />

<h2 id="common-objections-and-the-technical-answers">Common objections (and the technical answers)</h2>

<h3 id="what-about-key-loss">“What about key loss?”</h3>

<p>That’s real. The fix is not “avoid keys,” it’s:</p>

<ul>
  <li>allow multiple keys in JWKS (rotation)</li>
  <li>allow revocation / deprecation</li>
  <li>allow an anchored identity (GitHub/domain/org) to publish “new key” statements</li>
</ul>

<h3 id="what-about-sybil-attacks">“What about Sybil attacks?”</h3>

<p>Self-issued identities are cheap. That’s why you:</p>

<ul>
  <li>label them as self-issued</li>
  <li>build trust via anchors + reputation</li>
  <li>don’t pretend “signed” means “safe”</li>
</ul>

<h3 id="does-this-replace-oauth">“Does this replace OAuth?”</h3>

<p>No. OAuth remains best for:</p>

<ul>
  <li>user consent</li>
  <li>API access delegation</li>
  <li>enterprise control and recovery</li>
</ul>

<p>Keypairs are best for:</p>

<ul>
  <li>proof-of-authorship</li>
  <li>offline verification</li>
  <li>cross-platform continuity</li>
  <li>request signing / proof-of-possession</li>
</ul>

<h3 id="isnt-this-just-pgp--sigstore">“Isn’t this just PGP / Sigstore?”</h3>

<p>Same class of idea, different target.
Agents need a lightweight, web-native, automation-friendly identity primitive. The ergonomics matter.</p>

<hr />

<h2 id="the-adoption-path-how-this-becomes-a-standard">The adoption path: how this becomes a standard</h2>

<p>If you want this to win, don’t start with ideology. Start with boring integration points:</p>

<p><strong>For skill/plugin marketplaces</strong></p>

<ul>
  <li>accept a signed manifest</li>
  <li>show “Verified Publisher” badges</li>
  <li>allow policy: “install verified only”</li>
  <li>log verification status in UI/CLI</li>
</ul>

<p><strong>For agent runtimes</strong></p>

<ul>
  <li>store one keypair in config/Keychain</li>
  <li>sign outbound HTTP requests</li>
  <li>include identity headers/signature envelope</li>
  <li>surface identity in logs (“this action was signed by…”)</li>
</ul>

<p><strong>For publishers</strong></p>

<ul>
  <li>publish JWKS once</li>
  <li>define policy (“these bots can access, these can’t”)</li>
  <li>optionally connect payments/entitlements (UPP/UCP extensions)</li>
</ul>

<p>OpenBotAuth already provides tooling for several of these integration points. There’s a <a href="https://docs.openbotauth.org/proxy/overview">proxy</a> that adds custom HTTP headers and works with hosted browsers like Browserbase, Onkernel, and AgentCore, and with agent frameworks like LangChain, OpenAI, and Mastra. There’s also a <a href="https://docs.openbotauth.org/crawlers/registry-signer-package">registry-signer package</a> for crawlers to sign requests at the HTTP layer.</p>

<p>That’s how portable identity becomes real: it shows up as a checkbox in real products.</p>

<hr />

<h2 id="closing-thought">Closing thought</h2>

<p>“Portable identity” for agents isn’t “another login.” It’s a <strong>cryptographic substrate</strong> that lets agents be consistent entities across the internet—verifiable, accountable, and compatible with web-native authentication like Web Bot Auth.</p>

<p>OAuth/OIDC will still matter—especially for humans, consent, and recovery. But if we want agents to safely install tools and interact across ecosystems, we need something machines can carry everywhere with minimal friction:</p>

<p><strong>one keypair, many surfaces, verifiable everywhere.</strong></p>

<hr />

<h3 id="further-reading-specs--concepts--projects">Further reading (specs / concepts / projects)</h3>

<ul>
  <li><a href="https://openbotauth.org">OpenBotAuth</a> — open-source portable identity and authentication for AI agents</li>
  <li><a href="https://docs.openbotauth.org/">OpenBotAuth Documentation</a> — architecture, SDKs, plugins, proxy</li>
  <li><a href="https://docs.openbotauth.org/architecture/openbotregistry">OpenBotRegistry</a> — agent key hosting via GitHub login</li>
  <li><a href="https://docs.openbotauth.org/plugins/overview">OpenBotAuth WordPress Plugin</a> — publisher-side bot policy verification</li>
  <li><a href="https://docs.openbotauth.org/">WBA IETF Group Archive</a> — ongoing IETF draft discussions</li>
  <li>OAuth 2.0: <a href="https://datatracker.ietf.org/doc/html/rfc6749">RFC 6749</a></li>
  <li>OpenID Connect Core 1.0: <a href="https://openid.net/specs/openid-connect-core-1_0.html">spec</a></li>
  <li>OAuth 2.0 Device Authorization Grant: <a href="https://datatracker.ietf.org/doc/html/rfc8628">RFC 8628</a></li>
  <li>DPoP (binding OAuth tokens to a key): <a href="https://datatracker.ietf.org/doc/html/rfc9449">RFC 9449</a></li>
  <li>JWKS (publishing public keys in JSON): <a href="https://datatracker.ietf.org/doc/html/rfc7517">RFC 7517</a></li>
  <li>HTTP Message Signatures: <a href="https://datatracker.ietf.org/doc/html/rfc9421">RFC 9421</a></li>
</ul>

<p><em>Find me at <a href="https://twitter.com/hammadtariq">@hammadtariq</a></em></p>]]></content><author><name>Hammad Tariq</name></author><category term="startups" /><category term="agents" /><category term="identity" /><category term="keypairs" /><category term="oauth" /><category term="oidc" /><category term="openbotauth" /><category term="web-bot-auth" /><category term="standards" /><category term="security" /><summary type="html"><![CDATA[For the full technical profile covering RFC 9421 wire format, verification algorithms, JWKS discovery, and key lifecycle, see Portable Agent Identity (PAI): An Implementer’s Profile.]]></summary></entry><entry><title type="html">PageRank for the Agentic Web: Separating Authority from Selection</title><link href="https://hammadtariq.github.io/startups/pagerank-for-the-agentic-web/" rel="alternate" type="text/html" title="PageRank for the Agentic Web: Separating Authority from Selection" /><published>2025-12-27T00:00:00+00:00</published><updated>2025-12-27T00:00:00+00:00</updated><id>https://hammadtariq.github.io/startups/pagerank-for-the-agentic-web</id><content type="html" xml:base="https://hammadtariq.github.io/startups/pagerank-for-the-agentic-web/"><![CDATA[<p>A couple of days ago, I tossed out a thread on X about what I’m calling “PageRank for the agentic web.” It started simple: a user tells an agent, “buy 12 eggs, don’t ask questions.” Fine, but which eggs? Which brand, merchant, or SKU? Who decides the algorithm behind that choice?</p>

<p>The thread got me thinking deeper about how agents will reshape the web, and why we’re conflating two critical layers that need to stay distinct.</p>

<p>The egg example isn’t just cute. It’s a proxy for the headless interfaces coming our way. Chat, voice, agents: browsing as we know it fades, and selection becomes the default power center. That’s the “PageRank” analogy. In the old web, PageRank surfaced relevance amid information overload. Here, it’s about surfacing choices amid action overload. But to get there, we have to split authority from selection cleanly.</p>

<h3 id="the-two-layers-authority-vs-selection">The Two Layers: Authority vs. Selection</h3>

<p>Authority is about delegation and mandates. Can this agent act on my behalf? What about sub-agents it spins up? It’s the trust chain that lets actions happen without constant human intervention.</p>

<p>Selection, on the other hand, is the “why this one?” layer. Constraints, preferences, incentives, audits. All feeding into why a particular SKU wins out. It’s not ranking for its own sake; it’s optimization under policy.</p>

<p>People keep mashing these together, but they’re orthogonal. Authority enables the act; selection guides it. Mix them up, and you end up with brittle systems where trust breaks or choices get captured.</p>

<h3 id="authority-web-bot-auth-and-delegation">Authority: Web Bot Auth and Delegation</h3>

<p>Web Bot Auth (WBA) is tackling authority head-on. It’s an IETF draft for standardizing bot identities and interactions, and we’re building a reference implementation at openbotauth.org. The repo is open-source, with specs and prototypes to push neutrality and interoperability.</p>

<p>One direction I’m exploring is delegation and mandates. Right now, I’m prototyping an X.509 chained delegation token in this PR: <a href="https://github.com/OpenBotAuth/openbotauth/pull/6">github.com/OpenBotAuth/openbotauth/pull/6</a>. The idea is to profile X.509 for agent delegation, using chains to propagate authority from user to agent to sub-agent.</p>

<p>In simple terms: imagine a user grants an agent permission to buy groceries. The agent delegates to a sub-agent for payment processing. Without chained delegation, each hop requires re-authentication or a custom trust relationship. With it, you get a verifiable chain: proof that authority flows legitimately from the user down to the last sub-agent. This matters before commerce or automation scales, because a broken trust chain means exploits, denials, or just plain friction. It’s foundational plumbing.</p>
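
<p>The structural checks behind a delegation chain can be sketched without any certificates. This is not the X.509 profile from the PR; it is a conceptual model with a hypothetical <code class="language-plaintext highlighter-rouge">Grant</code> type, and it skips signature verification entirely. It checks only two things: each hop was issued by the previous holder, and no hop widens the authority it received.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    issuer: str
    subject: str
    scopes: frozenset


def chain_is_valid(chain, root):
    """Check linkage and scope narrowing; signature checks are out of scope."""
    if not chain or chain[0].issuer != root:
        return False
    for parent, child in zip(chain, chain[1:]):
        if child.issuer != parent.subject:
            return False  # hop was not issued by the previous holder
        if not child.scopes <= parent.scopes:
            return False  # a delegate may narrow authority, never widen it
    return True


user_to_agent = Grant("user", "grocery-agent", frozenset({"browse", "pay"}))
agent_to_sub = Grant("grocery-agent", "payment-agent", frozenset({"pay"}))
escalated = Grant("grocery-agent", "payment-agent", frozenset({"pay", "refund"}))

print(chain_is_valid([user_to_agent, agent_to_sub], "user"))
print(chain_is_valid([user_to_agent, escalated], "user"))
```

<p>The second chain fails because the sub-agent claims a <code class="language-plaintext highlighter-rouge">refund</code> scope that the user never granted to the grocery agent.</p>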

<p>We’ve also got a WordPress plugin in the repo as a reference for publishers. It handles policy verification on the server side, letting sites enforce bot auth without reinventing wheels. Not a product, just a practical surface to test these ideas.</p>

<h3 id="selection-the-missing-pagerank-layer">Selection: The Missing “PageRank” Layer</h3>

<p>Once authority is sorted, users won’t just delegate actions. They’ll delegate <em>how to choose</em>. That’s “delegated agency” or “choice delegation.” For the eggs: hard constraints like max spend or dietary needs (halal, organic), soft preferences like cheapest vs. fastest, and exclusions like certain brands.</p>

<p>This isn’t mere ranking. It’s policy-driven optimization. Budget, delivery time, ethics, no substitutions. All layered in. Without a standard way to express this, agents default to their builders’ biases, turning selection into a monopoly lever.</p>

<h3 id="related-research-on-selection">Related Research on Selection</h3>

<p>The problem of how agents select on behalf of users isn’t new ground. Researchers have been exploring similar ideas in multi-agent systems.</p>

<p>AgentRank adapts PageRank concepts to agent ecosystems by blending usage frequency with competence metrics like quality and latency. It uses protocols to aggregate private signals without requiring a global graph. This is a direct analog to the “PageRank” problem I’m describing here, but shifted from web links to agent performance in a web-of-agents context.</p>

<p>Multi-agent delegation models without money explore how principals select from competing agents’ signals, showing gains in utility from competition even with correlated preferences. This hints at incentive structures for selection, where agents vie under constraints rather than pure market dynamics.</p>

<p>My framing leans practical: standardize authority via WBA, then layer portable selection schemas that compile to existing engines. Avoid new silos.</p>

<h3 id="a-proposed-policy-schema-for-selection">A Proposed Policy Schema for Selection</h3>

<p>What if we framed selection as a portable policy schema? Something simple, JSON-ish, that captures constraints and preferences. It could compile down to existing engines without a new DSL.</p>

<p>Here’s an illustrative snippet for the “12 eggs” example:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"task"</span><span class="p">:</span><span class="w"> </span><span class="s2">"buy_eggs"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"quantity"</span><span class="p">:</span><span class="w"> </span><span class="mi">12</span><span class="p">,</span><span class="w">
  </span><span class="nl">"constraints"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"max_spend"</span><span class="p">:</span><span class="w"> </span><span class="mf">5.00</span><span class="p">,</span><span class="w">
    </span><span class="nl">"allowed_merchants"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">"local_grocer"</span><span class="p">,</span><span class="w"> </span><span class="s2">"organic_farm_co"</span><span class="p">],</span><span class="w">
    </span><span class="nl">"geographic_limits"</span><span class="p">:</span><span class="w"> </span><span class="s2">"within_10km"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"delivery_sla"</span><span class="p">:</span><span class="w"> </span><span class="s2">"under_2_hours"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"substitutions_allowed"</span><span class="p">:</span><span class="w"> </span><span class="kc">false</span><span class="p">,</span><span class="w">
    </span><span class="nl">"dietary"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">"organic"</span><span class="p">,</span><span class="w"> </span><span class="s2">"free_range"</span><span class="p">]</span><span class="w">
  </span><span class="p">},</span><span class="w">
  </span><span class="nl">"preferences"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"weights"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
      </span><span class="nl">"cheapest"</span><span class="p">:</span><span class="w"> </span><span class="mf">0.6</span><span class="p">,</span><span class="w">
      </span><span class="nl">"best_rated"</span><span class="p">:</span><span class="w"> </span><span class="mf">0.3</span><span class="p">,</span><span class="w">
      </span><span class="nl">"fastest_delivery"</span><span class="p">:</span><span class="w"> </span><span class="mf">0.1</span><span class="w">
    </span><span class="p">},</span><span class="w">
    </span><span class="nl">"brand_preferences"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">"prefer_local"</span><span class="p">],</span><span class="w">
    </span><span class="nl">"avoids"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">"big_box_stores"</span><span class="p">]</span><span class="w">
  </span><span class="p">},</span><span class="w">
  </span><span class="nl">"incentives"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"affiliate_bias"</span><span class="p">:</span><span class="w"> </span><span class="kc">false</span><span class="p">,</span><span class="w">
    </span><span class="nl">"sponsored_flags"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">"disclose_if_present"</span><span class="p">]</span><span class="w">
  </span><span class="p">},</span><span class="w">
  </span><span class="nl">"explainability"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"require_reason_codes"</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span><span class="w">
    </span><span class="nl">"top_alternatives"</span><span class="p">:</span><span class="w"> </span><span class="mi">3</span><span class="p">,</span><span class="w">
    </span><span class="nl">"confidence_threshold"</span><span class="p">:</span><span class="w"> </span><span class="mf">0.8</span><span class="p">,</span><span class="w">
    </span><span class="nl">"audit_trail_id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"optional_uuid"</span><span class="w">
  </span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>Constraints are hard rules: violate them, and the choice fails. Preferences are soft: weights for multi-objective optimization. Incentives flag biases like affiliates. Explainability hooks ensure traceability. Why this egg carton, not the others? It’s auditable, which builds trust.</p>

<p>Keep it lightweight; this isn’t exhaustive, just a starting point.</p>

<h3 id="compiling-to-existing-policy-engines">Compiling to Existing Policy Engines</h3>

<p>Why reinvent? Compile this schema to proven engines like OPA/Rego, Cedar, or Biscuit. They’re battle-tested for policy-as-code.</p>

<p>For hard constraints: map them to deny rules in OPA. If <code class="language-plaintext highlighter-rouge">max_spend</code> is exceeded, a Rego rule along the lines of <code class="language-plaintext highlighter-rouge">deny if input.cost &gt; data.policy.max_spend</code> rejects the option. Simple boolean gates.</p>

<p>Soft preferences are a looser fit. Cedar authorizes on attributes, so the weighted score would be computed outside the engine and passed in as an attribute for a threshold check. Biscuit’s attenuation could carry the policy down to sub-agents, where each hop can narrow it but not widen it.</p>

<p>Conceptually, the schema acts as input data; the engine enforces. No new language, just leverage what’s there. For eggs, constraints filter options; preferences rank the survivors.</p>
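
<p>The filter-then-rank idea fits in a few lines. This sketch enforces only two of the hard constraints from the schema above, and it assumes each candidate already carries normalized 0 to 1 scores under the same names as the weights. The <code class="language-plaintext highlighter-rouge">scores</code> field, the SKUs, and the numbers are invented for illustration.</p>

```python
def select(options, policy):
    """Apply hard constraints as a filter, then rank survivors by weighted score."""
    constraints = policy["constraints"]
    survivors = [
        option for option in options
        if option["price"] <= constraints["max_spend"]
        and option["merchant"] in constraints["allowed_merchants"]
    ]
    if not survivors:
        return None  # a hard constraint failed for every option

    weights = policy["preferences"]["weights"]

    def score(option):
        return sum(weights[name] * option["scores"][name] for name in weights)

    return max(survivors, key=score)


policy = {
    "constraints": {
        "max_spend": 5.00,
        "allowed_merchants": ["local_grocer", "organic_farm_co"],
    },
    "preferences": {
        "weights": {"cheapest": 0.6, "best_rated": 0.3, "fastest_delivery": 0.1},
    },
}
options = [
    {"sku": "A", "merchant": "local_grocer", "price": 4.50,
     "scores": {"cheapest": 0.7, "best_rated": 0.8, "fastest_delivery": 0.9}},
    {"sku": "B", "merchant": "organic_farm_co", "price": 4.90,
     "scores": {"cheapest": 0.5, "best_rated": 0.9, "fastest_delivery": 0.4}},
    {"sku": "C", "merchant": "big_box", "price": 3.00,
     "scores": {"cheapest": 1.0, "best_rated": 0.6, "fastest_delivery": 1.0}},
    {"sku": "D", "merchant": "local_grocer", "price": 6.20,
     "scores": {"cheapest": 0.2, "best_rated": 1.0, "fastest_delivery": 0.9}},
]

choice = select(options, policy)
print(choice["sku"] if choice else "no option satisfies the constraints")
```

<p>Option C is the cheapest overall but never reaches the ranking step, because its merchant is not on the allowed list. That is the difference between a constraint and a preference.</p>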

<p>This keeps things interoperable. Agents from different vendors consume the same schema, compile locally, and act consistently.</p>

<h3 id="why-oss-reference-implementations-matter">Why OSS Reference Implementations Matter</h3>

<p>OpenBotAuth is all OSS: specs, code, plugins. The goal isn’t capture. It’s ecosystem building. Neutral standards prevent silos, especially in auth and now selection.</p>

<p>The WordPress plugin shows this in action for publishers. It verifies bot authority and could extend to selection policies, letting sites expose choices that respect user prefs. Again, not pitching, just demonstrating how these layers integrate.</p>

<h3 id="open-questions">Open Questions</h3>

<p>This is thinking in public, so plenty of gaps.</p>

<p>Is a standardizable selection schema possible without becoming a monopoly tool? Who governs it?</p>

<p>Incentives and ads: how to mandate disclosure without stifling commerce? Flags are a start, but enforcement?</p>

<p>Telemetry and observability: where do trust boundaries end? Analytics for audit trails could help, but privacy tensions arise.</p>

<p>Builders, weigh in. Point me to papers, drafts, repos. What’s missing in delegation? How should selection schemas evolve? Let’s iterate.</p>

<hr />

<p><em>This started as an X thread about the gap between agent authority and agent selection. The egg example stuck because it’s simple but captures the real problem: who decides how your agent decides?</em></p>

<p><em>Find me at <a href="https://twitter.com/hammadtariq">@hammadtariq</a></em></p>]]></content><author><name>Hammad Tariq</name></author><category term="startups" /><category term="agents" /><category term="web" /><category term="policy" /><category term="delegation" /><category term="openbotauth" /><category term="standards" /><summary type="html"><![CDATA[A couple of days ago, I tossed out a thread on X about what I’m calling “PageRank for the agentic web.” It started simple: a user tells an agent, “buy 12 eggs, don’t ask questions.” Fine, but which eggs? Which brand, merchant, or SKU? Who decides the algorithm behind that choice?]]></summary></entry><entry><title type="html">Altered States and Generative AI: Toward Human-Centered Creativity Amplification</title><link href="https://hammadtariq.github.io/startups/altered-states-and-generative-ai-human-centered-creativity/" rel="alternate" type="text/html" title="Altered States and Generative AI: Toward Human-Centered Creativity Amplification" /><published>2025-11-12T00:00:00+00:00</published><updated>2025-11-12T00:00:00+00:00</updated><id>https://hammadtariq.github.io/startups/altered-states-and-generative-ai-human-centered-creativity</id><content type="html" xml:base="https://hammadtariq.github.io/startups/altered-states-and-generative-ai-human-centered-creativity/"><![CDATA[<p>I’ve been thinking about a strange gap in how we approach AI creativity.</p>

<p>Large language models have gotten remarkably good at generating text, code, and ideas. But when we want them to be more “creative,” the standard approach is to crank up the temperature parameter. Higher temperature means more randomness in token sampling, which produces more diverse outputs. The problem is that this randomness is just noise. Push it too far and you get incoherent word salad.</p>
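
<p>To see what the temperature dial actually does, here is a small sketch of temperature-scaled softmax over a handful of made-up logits, with the Shannon entropy of the resulting distribution. The logits are hypothetical; a real model produces one per vocabulary token.</p>

```python
import math


def softmax(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


logits = [4.0, 2.0, 1.0, 0.5]  # hypothetical scores for four candidate tokens
for temperature in (0.5, 1.0, 2.0):
    probs = softmax(logits, temperature)
    print(temperature, [round(p, 3) for p in probs], round(entropy(probs), 3))
```

<p>Raising the temperature flattens the distribution and raises its entropy, so unlikely tokens get sampled more often. Nothing in that operation knows which unlikely tokens are interesting.</p>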

<p>This is interesting because human creativity works completely differently. When we enter altered states of consciousness, whether through meditation, flow states, or other means, something specific happens in the brain: neural entropy increases, default mode network activity shifts, and global connectivity patterns change. The result is divergent thinking and novel associations, but grounded in lived experience rather than statistical noise.</p>

<p>So I started wondering: what if we could use real-time brain signals to dynamically steer LLM generation? Instead of simulating creativity through temperature hacking, we could amplify authentic cognitive states.</p>

<h3 id="the-temperature-problem">The Temperature Problem</h3>

<p>When researchers study LLM creativity, they consistently find the same pattern. Temperature does boost novelty, but there’s a sharp trade-off with coherence. A study by Peeperkorn et al. in 2024 found that higher temperatures modestly increase divergent outputs, but the effect is nuanced and task-dependent. Temperature is a blunt dial, not a creativity enhancer.</p>

<p>The fundamental issue is that LLM randomness has no grounding. It’s just noise injected into probability distributions. Human altered states work differently. The increased entropy comes with preserved meta-cognitive awareness. You’re making strange connections, but you know you’re making them. The experience has structure.</p>

<h3 id="whats-happening-in-the-brain-during-altered-states">What’s Happening in the Brain During Altered States</h3>

<p>The neuroscience here is fascinating. During meditation, flow states, and other altered states, researchers observe consistent changes:</p>

<p>Alpha and theta wave ratios shift in characteristic patterns. Gamma bursts correlate with insight moments. The default mode network, which normally maintains our sense of separate self, becomes less dominant. Global brain connectivity increases.</p>

<p>Robin Carhart-Harris’s “entropic brain” hypothesis proposes that consciousness exists on a spectrum of entropy. Normal waking awareness sits in a middle zone. Sleep and focused states have lower entropy. Altered states push toward higher entropy, which enables novel pattern recognition but can also produce fragmentation if pushed too far.</p>

<p>The key insight is that these states produce genuine cognitive flexibility, not just random noise. And we can measure them.</p>

<h3 id="current-attempts-at-getting-llms-high">Current Attempts at “Getting LLMs High”</h3>

<p>There’s already work happening in this space, though mostly through simulation.</p>

<p>A project called Pharmaicy trains modules on trip reports scraped from forums and subreddits. These modules modify LLM behavior to produce more free-associative, divergent outputs. It’s clever, and it works to an extent. WIRED covered it in 2025, noting that people are paying to make their chatbots behave as if under altered states.</p>

<p>Academic work by Girn et al. in 2024 used LLMs to profile consciousness changes from first-person reports of altered states. The models could discriminate between different substance effects with reasonable accuracy.</p>

<p>But all of this is indirect mimicry. The LLM is trained on descriptions of altered states, not connected to actual brain dynamics. It’s a proxy built from text, prone to artifacts and lacking real-time authenticity.</p>

<h3 id="the-idea-real-time-bci-driven-generation">The Idea: Real-Time BCI-Driven Generation</h3>

<p>Here’s what I’ve been thinking about. What if we closed the loop?</p>

<p>Modern EEG headsets are getting cheaper and more accurate. Consumer-grade devices can already pick up alpha/theta ratios, detect meditation states, and identify attention patterns. The signal isn’t perfect, but it’s increasingly usable.</p>

<p>Imagine connecting that to LLM generation parameters in real-time:</p>

<p>You enter a meditation or focused altered state. EEG detects the characteristic neural signatures. Those signals map to generation parameters: temperature, top-p sampling, attention weights, maybe even prompting strategies. The model generates output that reflects your actual cognitive state. You experience the amplified output, which potentially modulates your state further.</p>

<p>This creates a feedback loop. The AI becomes a prosthetic extension of your cognition, not a simulation of someone else’s experience.</p>
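
<p>As a rough sketch of the mapping step in that loop, assume some upstream stage has already reduced the EEG signal to a single <code class="language-plaintext highlighter-rouge">divergence_score</code> between 0 and 1. That score, the function name, and the parameter ranges below are all hypothetical choices for illustration; nothing here is a validated neural marker or a recommended setting.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplingParams:
    temperature: float
    top_p: float


def clamp(value, low, high):
    """Keep value inside the closed interval [low, high]."""
    return max(low, min(high, value))


def map_state_to_sampling(divergence_score,
                          temperature_range=(0.7, 1.3),
                          top_p_range=(0.85, 0.98)):
    """Linearly interpolate sampling parameters from a 0..1 state score.

    divergence_score is an assumed, pre-calibrated number: 0 means the
    user looks focused and convergent, 1 means loose and associative.
    The ranges are arbitrary starting points, not tuned values.
    """
    position = clamp(divergence_score, 0.0, 1.0)
    t_low, t_high = temperature_range
    p_low, p_high = top_p_range
    return SamplingParams(
        temperature=round(t_low + position * (t_high - t_low), 3),
        top_p=round(p_low + position * (p_high - p_low), 3),
    )


print(map_state_to_sampling(0.0))
print(map_state_to_sampling(0.5))
print(map_state_to_sampling(1.0))
```

<p>Clamping matters more than the exact curve: a noisy spike in the signal should never push the model outside a range where its output stays readable.</p>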

<h3 id="precedents-in-generative-art">Precedents in Generative Art</h3>

<p>Refik Anadol has done impressive work in this direction. His “Melting Memories” installation used EEG data from Alzheimer’s research to drive generative visuals. “Unsupervised” at MoMA generated visuals from the museum’s collection metadata, though without direct neural input.</p>

<p>These projects show that brain-driven generative systems can produce compelling output. But they’re mostly one-way: brain signals drive the art, with no closed loop where the output influences the user’s state.</p>

<p>The missing piece is interactivity. A system where the human and AI are genuinely co-regulating, where the amplified output becomes part of the experience that shapes subsequent generation.</p>

<h3 id="why-this-matters">Why This Matters</h3>

<p>The point isn’t to make weird art, though that’s fine too. It’s about what kind of tool AI creativity assistance becomes.</p>

<p>Right now, LLM creativity is detached from embodiment. You prompt, it generates, you evaluate. The model has no access to your actual cognitive state. It can’t tell if you’re in a focused flow state or distracted and tired. It generates the same way regardless.</p>

<p>A BCI-mediated system could be genuinely responsive. When you’re in a state that supports divergent thinking, the model could amplify that. When you’re more focused and convergent, it could adjust accordingly. The AI becomes a tool that extends your current cognitive mode rather than ignoring it.</p>

<p>There are also potential therapeutic applications. Guided insight sessions where the AI helps externalize and explore thoughts that arise during meditation or other contemplative practices. The model as a mirror for consciousness exploration.</p>

<h3 id="challenges">Challenges</h3>

<p>This is not easy to build.</p>

<p>EEG signals are noisy, especially from consumer devices. Extracting reliable cognitive state markers requires careful signal processing and individual calibration. What works for one person’s brain may not generalize.</p>
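
<p>One simple way to handle both the noise and the per-person differences is to normalize each reading against that person’s own resting baseline and then smooth it. The sketch below does that with a z-score and an exponential moving average. The class name, the smoothing factor, and the sample numbers are assumptions for illustration; a real pipeline would also need artifact rejection and proper band-power estimation, which are out of scope here.</p>

```python
import statistics


class CalibratedSignal:
    """Normalize readings against a personal baseline, then smooth them."""

    def __init__(self, baseline, smoothing=0.2):
        # baseline: readings recorded while this user is at rest (assumed).
        if len(baseline) < 2:
            raise ValueError("need at least two baseline readings")
        self.mean = statistics.fmean(baseline)
        self.stdev = statistics.stdev(baseline)
        self.smoothing = smoothing  # 0..1, higher reacts faster
        self.smoothed = 0.0

    def update(self, reading):
        """Return the smoothed z-score after taking in one new reading."""
        if self.stdev > 0:
            z = (reading - self.mean) / self.stdev
        else:
            z = 0.0
        # Exponential moving average: one outlier moves the value only a little.
        self.smoothed = self.smoothing * z + (1 - self.smoothing) * self.smoothed
        return self.smoothed


# Made-up band-power ratios, not real EEG data.
signal = CalibratedSignal(baseline=[1.0, 1.2, 0.8, 1.0])
for reading in (1.0, 2.0, 1.1, 1.0):
    print(round(signal.update(reading), 3))
```

<p>The output is in units of “standard deviations from this person’s normal”, which is a more portable input for any downstream mapping than a raw ratio.</p>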

<p>There are ethical questions about altered state induction. Even with meditation rather than pharmacological approaches, nudging people into altered states via feedback loops raises concerns about consent and unintended effects.</p>

<p>Validation is tricky. How do you measure whether BCI-driven generation actually produces more creative or valuable outputs? Creativity metrics are notoriously slippery.</p>

<p>And there’s the basic engineering challenge of real-time integration: low-latency signal processing, mapping functions that feel natural rather than jarring, and interfaces that don’t break the altered state you’re trying to amplify.</p>

<h3 id="where-this-goes">Where This Goes</h3>

<p>I don’t think we’re far from working prototypes. The components exist: decent consumer EEG, fast inference from local or API-based models, and basic understanding of which neural markers correlate with creative states.</p>

<p>The first versions will be crude. Map alpha/theta ratio to temperature, maybe add some gamma-triggered prompt injection. See what happens. Iterate from there.</p>

<p>The longer-term vision is AI as a genuine cognitive prosthetic. Not replacing human creativity, but extending it. Making the strange connections we make in altered states more accessible, more explorable, more shareable.</p>

<p>Pharmaicy and similar projects suggest the concept has legs: altered-state inspiration unlocks interesting LLM behaviors. The next step is making that connection direct rather than simulated.</p>

<p>If you’re working on BCI, generative AI, or the intersection of consciousness research and computing, I’d be curious to compare notes.</p>

<hr />

<p><em>This post started as a tweet thread exploring the gap between LLM creativity simulation and genuine altered-state cognition. The core question: can we connect them through real-time brain interfaces?</em></p>

<p><em>Find me at <a href="https://twitter.com/hammadtariq">@hammadtariq</a></em></p>]]></content><author><name>Hammad Tariq</name></author><category term="startups" /><category term="ai" /><category term="creativity" /><category term="bci" /><category term="eeg" /><category term="consciousness" /><category term="generative-models" /><summary type="html"><![CDATA[I’ve been thinking about a strange gap in how we approach AI creativity.]]></summary></entry><entry><title type="html">Web Bot Auth After Montreal: The Power Shift We Shouldn’t Ignore</title><link href="https://hammadtariq.github.io/startups/ietf-montreal-web-bot-auth-centralization-risk/" rel="alternate" type="text/html" title="Web Bot Auth After Montreal: The Power Shift We Shouldn’t Ignore" /><published>2025-11-10T00:00:00+00:00</published><updated>2025-11-10T00:00:00+00:00</updated><id>https://hammadtariq.github.io/startups/ietf-montreal-web-bot-auth-centralization-risk</id><content type="html" xml:base="https://hammadtariq.github.io/startups/ietf-montreal-web-bot-auth-centralization-risk/"><![CDATA[<p>I spent the last few days sifting through the recordings of IETF 124 in Montreal. I was particularly interested in “web bot auth” related sessions. What started as a technical discussion about making web crawlers more accountable turned into a battle over who gets to control identity on the open web.</p>

<p>The basic idea seems reasonable enough: instead of websites trying to guess which bots are legitimate based on IP addresses or user agents (both easily spoofed), let the bots cryptographically sign their requests. The website sees the signature, checks it against a public key, and knows exactly who’s knocking.</p>

<p>In simpler terms: scrapers want to scrape and sites want to control access. Instead of an endless arms race, can we put some structure around that push-and-pull?</p>

<p>But here’s the thing—once you can definitively identify every bot, you can also control every bot. And that’s where this gets interesting.</p>

<h3 id="the-demo-that-worked-too-well">The Demo That Worked Too Well</h3>

<p>The technical demo was actually pretty elegant. A bot makes an HTTP request, signs it using RFC 9421 (HTTP Message Signatures), and includes a header pointing to where its public key lives. The receiving server verifies the signature and decides what to do. There are no complex OAuth dances, no bearer tokens floating around—just cryptographic proof that “this request definitely came from that specific agent.”</p>
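
<p>For a feel of what that signing step involves, here is a minimal sketch of an RFC 9421-style signature base, signed and verified with Python’s standard library. Two loud assumptions: it uses HMAC-SHA256 with a shared secret purely because that needs no third-party packages, whereas the flow described above relies on a public key the site fetches from the agent; and the key ID, label <code class="language-plaintext highlighter-rouge">sig1</code>, and request values are invented for the example. It also omits the header that points to the key location.</p>

```python
import base64
import hashlib
import hmac


def signature_params(covered, created, keyid):
    """Serialize the covered components and parameters (the Signature-Input value)."""
    names = " ".join(f'"{name}"' for name in covered)
    return f'({names});created={created};keyid="{keyid}"'


def signature_base(components, covered, params):
    """One line per covered component, then the @signature-params line."""
    lines = [f'"{name}": {components[name]}' for name in covered]
    lines.append(f'"@signature-params": {params}')
    return "\n".join(lines)


def sign_request(components, covered, key, keyid, created):
    """Return the two headers a signing agent would attach to its request."""
    params = signature_params(covered, created, keyid)
    base = signature_base(components, covered, params)
    digest = hmac.new(key, base.encode("ascii"), hashlib.sha256).digest()
    encoded = base64.b64encode(digest).decode("ascii")
    return {"Signature-Input": f"sig1={params}", "Signature": f"sig1=:{encoded}:"}


def verify_request(components, headers, key):
    """Rebuild the signature base from the request as received and compare."""
    params = headers["Signature-Input"].split("=", 1)[1]
    covered = [name.strip('"') for name in params[1:params.index(")")].split()]
    base = signature_base(components, covered, params)
    expected = hmac.new(key, base.encode("ascii"), hashlib.sha256).digest()
    received = base64.b64decode(headers["Signature"].split("=", 1)[1].strip(":"))
    return hmac.compare_digest(expected, received)


# Hypothetical request and key, for illustration only.
request = {"@method": "GET", "@authority": "example.com", "@path": "/articles"}
demo_key = b"demo-shared-secret"
headers = sign_request(request, ["@method", "@authority", "@path"],
                       demo_key, "demo-key", created=1700000000)
print(headers["Signature-Input"])
print(verify_request(request, headers, demo_key))
```

<p>The verifier never trusts the sender’s view of the request. It rebuilds the signature base from what it actually received, so a request replayed against a different path or host fails the check, provided those components were covered in the first place.</p>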

<p><img src="/assets/images/web-bot-auth-flow.png" alt="Web Bot Auth Architecture" /></p>

<p>The architecture cleanly separates identity verification from policy decisions. The verifier library fetches keys from the agent’s own directory, while the policy engine makes independent decisions about access, rate limiting, or billing.</p>

<p>It replaces the current mess where sites maintain IP allowlists for “good” crawlers like Googlebot, which breaks constantly and is trivial to circumvent.</p>

<p>The demo worked flawlessly. That’s exactly the problem.</p>

<hr />

<h3 id="why-people-are-getting-nervous">Why People Are Getting Nervous</h3>

<p>The pushback isn’t about the crypto or the technical implementation—it’s about what happens next. The concerns center around deployment patterns that could undermine the open web, even when the underlying specs are sound.</p>

<hr />

<h3 id="dont-get-me-wrongthis-isnt-all-bad">Don’t Get Me Wrong—This Isn’t All Bad</h3>

<p>RFC 9421 itself is actually solid engineering. It solves real problems that have plagued web infrastructure for years. The current system of IP allowlists and user-agent sniffing is genuinely terrible—fragile, easily spoofed, and constantly breaking when networks change.</p>

<p>Having cryptographic proof of bot identity would eliminate a lot of headaches. RFC 9421 is already implemented in several HTTP stacks and is designed to work through all the proxies and CDNs that make up the modern web.</p>

<p>The technical direction is right. It’s the governance model that has me worried.</p>

<h3 id="where-this-could-go-wrong">Where This Could Go Wrong</h3>

<p>Looking at the various proposals and market dynamics, I see several specific risks:</p>

<p><strong>The centralization trap.</strong> The IETF drafts have agents host their own verification keys at origin endpoints. The risk isn’t in key hosting—it’s in verification. If most sites end up relying on a few CDNs to handle signature verification “for convenience,” we’ve recreated the certificate authority problem through deployment patterns, not protocol design. Whoever controls the most-used verification infrastructure effectively decides which crawlers get widespread acceptance.</p>

<p><strong>Money changes everything.</strong> I watched presentations from companies like TollBit and Skyfire talking about their “No Free Crawls” programs. Once you can cryptographically prove which bot made which request, billing becomes trivial. Rate limiting gets surgical, and the line between “identity verification” and “paywall infrastructure” blurs quickly.</p>

<p><strong>Managed key custody pitches.</strong> The spec doesn’t require this—agents are supposed to control their own signing keys—but I keep hearing it in vendor sales pitches. If I’m running a crawler, I want to control my own keys. When something goes wrong—and it will—I want clear accountability. Managed custody just makes everything murkier.</p>

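<p><strong>Implementation hygiene risks.</strong> RFC 9421 gives you <code class="language-plaintext highlighter-rouge">created</code> and <code class="language-plaintext highlighter-rouge">expires</code> timestamps, nonces, and covered components such as <code class="language-plaintext highlighter-rouge">@authority</code> and <code class="language-plaintext highlighter-rouge">@path</code> to prevent cross-context replays, but implementations need to use them properly. HTTP signatures survive proxies by design, which is good, but a signature that covers too few components can be replayed in contexts the signer never intended.</p>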

<p><strong>Proprietary verification paths.</strong> Some CDNs are building verification APIs optimized for their infrastructure. Cloudflare can verify at the edge, but origins can also verify locally or disable CDN verification entirely. The risk isn’t technical lock-in—it’s that the “easy” path might funnel everyone through a few big networks.</p>

<p><strong>Missing transparency.</strong> Without transparency logs—something like Certificate Transparency but for bot identities—we’ll have no way to audit who’s being excluded or why. That’s a recipe for abuse.</p>

<p><strong>Identity and policy bundling.</strong> Identity verification shouldn’t dictate policy. Knowing <em>who</em> made a request is different from deciding <em>what</em> to do with it. But once those two things get bundled together in the same service, the distinction tends to disappear.</p>

<hr />

<h3 id="how-to-do-this-right">How to Do This Right</h3>

<p>After listening to the debates in Montreal, I think there’s a path forward that preserves the benefits while avoiding the worst outcomes.</p>

<p><strong>Make directories federated and auditable.</strong> Instead of one big registry, let every bot operator host their own <code class="language-plaintext highlighter-rouge">/.well-known/agent-keys</code> endpoint with their public keys. Publishers and CDNs can choose to trust multiple directories simultaneously. Add transparency logs so we can audit what’s happening—no secret exclusions, no opaque business deals affecting technical access.</p>

<p><strong>Keep verification libraries open and interoperable.</strong> The same signature verification should work identically across Cloudflare, Akamai, NGINX, Envoy, and whatever framework you’re running in your app. We already made this mistake with TLS certificate authorities—let’s not repeat it by letting a few players control both issuance and validation.</p>

<p><strong>Define clear anti-replay profiles.</strong> Give sites options for how strict they want to be about replay protection—nonces and short TTLs for high-security scenarios, more relaxed rules for basic verification. Make it configurable without breaking the intermediaries that keep the web running.</p>
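
<p>A sketch of what configurable profiles could look like on the verifying side. The profile names, the 60-second and one-hour windows, and the clock-skew allowance are hypothetical values picked for the example, not numbers from any draft. The guard runs after signature verification and only answers one question: has this signed request been seen before, or is it too old?</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplayProfile:
    max_age_seconds: int
    require_nonce: bool
    clock_skew_seconds: int = 5


# Two illustrative profiles; the numbers are assumptions, not recommendations.
STRICT = ReplayProfile(max_age_seconds=60, require_nonce=True)
RELAXED = ReplayProfile(max_age_seconds=3600, require_nonce=False)


class ReplayGuard:
    """Reject signatures that are too old or whose nonce was already used."""

    def __init__(self, profile):
        self.profile = profile
        self.seen = {}  # nonce -> created timestamp

    def accept(self, created, nonce, now):
        age = now - created
        if age > self.profile.max_age_seconds:
            return False  # signature is older than the profile allows
        if age < -self.profile.clock_skew_seconds:
            return False  # created too far in the future
        if self.profile.require_nonce:
            # Forget nonces that the age check would reject anyway.
            self.seen = {n: t for n, t in self.seen.items()
                         if now - t <= self.profile.max_age_seconds}
            if nonce is None or nonce in self.seen:
                return False
            self.seen[nonce] = created
        return True


guard = ReplayGuard(STRICT)
print(guard.accept(created=1000, nonce="abc", now=1010))  # first use
print(guard.accept(created=1000, nonce="abc", now=1020))  # same nonce again
```

<p>Because old nonces are dropped once the age check alone would reject them, the strict profile’s memory use is bounded by the request rate times the window, which is what makes short TTLs practical.</p>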

<p><strong>Keep identity separate from policy.</strong> The identity layer should just answer “who made this request?” The policy layer—allow, deny, throttle, charge—should be completely separate and competitive. Mixing them is how we accidentally hand control of the web to whoever runs the biggest edge network.</p>
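
<p>In code, that separation is just a narrow interface. The sketch below assumes an identity layer that returns either a verified agent or nothing, and treats policy as a plain function that any site or vendor can swap out. All names, paths, and policy outcomes here are hypothetical.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerifiedIdentity:
    """The only thing the identity layer hands over: who signed the request."""
    agent: str


def open_policy(identity, path):
    """A site that lets everyone in, verified or not."""
    return "allow"


def metered_policy(identity, path):
    """A site that blocks unsigned bots and charges for its archive."""
    if identity is None:
        return "deny"
    if path.startswith("/archive/"):
        return "charge"
    return "allow"


def handle(identity, path, policy):
    """The identity result is an input to policy, never the decision itself."""
    return policy(identity, path)


crawler = VerifiedIdentity(agent="crawler.example")
print(handle(crawler, "/archive/2024", metered_policy))
print(handle(None, "/archive/2024", open_policy))
```

<p>The same verified identity produces different outcomes under different policies, and swapping the policy never touches verification. Bundling starts when the verifier begins returning “allow” or “deny” instead of “who”.</p>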

<p><strong>Make onboarding trivial.</strong> There should be a “Hello, Verified Bot” tutorial that gets you up and running in ten minutes on your laptop. If you need to negotiate an enterprise contract just to verify your crawler’s identity, the system has already failed.</p>

<h3 id="what-success-looks-like">What Success Looks Like</h3>

<p>In five years, I want to see a web where bots sign every request with keys they control, sites can verify those signatures using multiple trust anchors, and policy decisions about access and pricing remain open and competitive.</p>

<p>The IETF should define the plumbing, not the permissions. CDNs should compete on performance and features, not on who they allow to speak.</p>

<p>If we get this right, the web stays composable and auditable. If we get it wrong, we’ll wake up in a world where a handful of infrastructure companies quietly control the identity layer for everything that touches the open internet.</p>

<hr />

<p><em>This is based on my notes from IETF 124 in Montreal. The working group is still evolving these proposals, so details will change. But the fundamental tension between technical capability and governance control isn’t going anywhere.</em></p>

<p><em>If you’re working on web infrastructure or bot authentication, I’d love to hear your thoughts. Find me at <a href="https://twitter.com/hammadtariq">@hammadtariq</a>.</em></p>]]></content><author><name>Hammad Tariq</name></author><category term="startups" /><category term="ietf" /><category term="identity" /><category term="bots" /><category term="cloudflare" /><category term="http" /><category term="rfc9421" /><category term="decentralization" /><summary type="html"><![CDATA[I spent the last few days sifting through the recordings of IETF 124 in Montreal. I was particularly interested in “web bot auth” related sessions. What started as a technical discussion about making web crawlers more accountable turned into a battle over who gets to control identity on the open web.]]></summary></entry><entry><title type="html">Leveraged Thinking: The 4‑Layer Framework (Definition)</title><link href="https://hammadtariq.github.io/startups/leveraged-thinking-4-layer-framework-definition/" rel="alternate" type="text/html" title="Leveraged Thinking: The 4‑Layer Framework (Definition)" /><published>2025-09-14T00:00:00+00:00</published><updated>2025-09-14T00:00:00+00:00</updated><id>https://hammadtariq.github.io/startups/leveraged-thinking-4-layer-framework-definition</id><content type="html" xml:base="https://hammadtariq.github.io/startups/leveraged-thinking-4-layer-framework-definition/"><![CDATA[<!-- Description: A deep dive into Leveraged Thinking—what it is, how it differs from other models, and how to build it into your startup muscle. Fast examples inside. -->

<p>Leveraged Thinking is the practice of stacking asymmetric bets on top of pre‑owned systems, assets, and social signals to compound outcomes with minimal direct input. It’s how small teams punch above their weight, how founders get velocity with no headcount, and how new ideas outpace incumbents with 1/10th the surface area.</p>

<p>This post introduces a precise framework—how it works, how it’s different, and how to know if you’re doing it right.</p>

<h3 id="leveraged-thinking-definition">Leveraged Thinking Definition (4-Layer Framework)</h3>

<p>What it’s not:</p>

<ul>
  <li><strong>Not only “high‑leverage work.”</strong> It’s more than “code &gt; ops” or “strategy &gt; tasks.”</li>
  <li><strong>Not only first‑principles.</strong> You’re not rebuilding the world from axioms every time.</li>
  <li><strong>Not only systems thinking.</strong> Models are useful; compounding requires deployment.</li>
</ul>

<p>It borrows from all three—but only compounds when they’re stacked intentionally and shipped. Think of it like venture capital for your cognition: you place bets through scaffolds you already control. Done right, each move sets up the next.</p>

<h3 id="the-leveraged-thinking-4layer-stack">The Leveraged Thinking 4‑Layer Stack</h3>

<p>Each layer is a multiplier. Without Layer 1, nothing compounds. Without Layer 4, nothing ships.</p>

<h4 id="layer-1--systems-literacy">Layer 1 · Systems Literacy</h4>

<p>If you can’t see the system, you can’t exploit it. Systems literacy means you intuit:</p>
<ul>
  <li><strong>What compounds</strong></li>
  <li><strong>What bottlenecks</strong></li>
  <li><strong>What others overpay for</strong></li>
  <li><strong>What gets ignored</strong></li>
</ul>

<p>You learn to spot the surface‑area vs. payoff mismatch—the edges where effort is cheap and impact is outsized.</p>

<p>BitDSM example: Institutional BTC couldn’t easily enter Ethereum staking flows. We didn’t build a new yield platform—we piggybacked on EigenLayer. By locking BTC into our contracts and syncing with AVS pods, we inherited trust and surface area. EigenLayer did the talking; we did the onboarding.</p>

<h4 id="layer-2--capital-stack">Layer 2 · Capital Stack</h4>

<p>Once you see the system, you need chips to play. Your owned leverage:</p>
<ul>
  <li><strong>Cash</strong>: Accelerant, not a starting point.</li>
  <li><strong>Code/IP</strong>: Tools you’ve built or can repurpose.</li>
  <li><strong>Audience</strong>: Distribution you control—email, X, GitHub, Discord.</li>
</ul>

<p>These are non‑linear multipliers. Code and audience scale without permission.</p>

<p>Attach.dev example: A metering sidecar became a reusable asset by exposing a Prometheus‑compatible endpoint and OpenMeter support. The “win” was distribution leverage—OSS networks, GitHub SEO, Discord channels, and timely replies under OpenAI/Ollama threads—more than novel code.</p>

<h4 id="layer-3--socialproof-layer">Layer 3 · Social‑Proof Layer</h4>

<p>Your credibility scaffolding. You compound only if trust routes to you:</p>
<ul>
  <li><strong>Get the right person to say your name</strong></li>
  <li><strong>Pick the right frame</strong></li>
  <li><strong>Know when to ask for help, quote, or proof</strong></li>
</ul>

<p>Sakana UI example: We refined the pitch—“Perplexity‑style deep research, hosted in your VPC, powered by Attach”—until crypto founders, infra folks, and VCs got it in 30 seconds. We shipped a Claude prompt, posted design drafts, and collected early signal from respected engineers. The feedback shaped the product—and made the pitch legible to funders and design partners. Social proof wasn’t decoration; it was infrastructure.</p>

<h4 id="layer-4--deliberate-highleverage-moves">Layer 4 · Deliberate High‑Leverage Moves</h4>

<p>Where leverage becomes real. You make a deliberate asymmetric bet:</p>
<ul>
  <li><strong>Risk is limited</strong></li>
  <li><strong>Reward is uncapped</strong></li>
  <li><strong>You ride systems; you don’t brute‑force them</strong></li>
</ul>

<p>Fairdrop example: BitDSM’s “fairdrop” wasn’t a token launch; it was a distribution play. Depositors received upside by anchoring AVS trust. No wallet farming. No Sybil games. Just capital + trust = exposure.</p>

<p>Because we had:</p>
<ul>
  <li>Layer 1: EigenLayer pod system</li>
  <li>Layer 2: Locked BTC flows + wrapped ERC</li>
  <li>Layer 3: Trusted voices to explain it</li>
</ul>

<p>…we could deploy Layer 4 moves with high trust and low spend.</p>

<h3 id="diagnostic-are-you-thinking-in-leverage">Diagnostic: Are You Thinking in Leverage?</h3>

<p>Use this 1–5 scale to stress‑test any idea:</p>

<table>
  <thead>
    <tr>
      <th>Question</th>
      <th>1 (Low)</th>
      <th>5 (High)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Is this built on existing trusted systems?</td>
      <td>Novel infra</td>
      <td>Composability‑maxed</td>
    </tr>
    <tr>
      <td>Does it reuse assets I already control?</td>
      <td>Starts from scratch</td>
      <td>Fully capital‑stacked</td>
    </tr>
    <tr>
      <td>Would distribution improve with one credible voice?</td>
      <td>Unclear framing</td>
      <td>Social‑proof rich</td>
    </tr>
    <tr>
      <td>Is payoff unbounded vs. effort?</td>
      <td>Linear grind</td>
      <td>Asymmetric outcome</td>
    </tr>
  </tbody>
</table>

<p>If you’re not scoring 4+ on most, you’re overworking the wrong problem. Go back a layer. Add a stack. Re‑aim.</p>
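
<p>If you want to run the diagnostic across a backlog of ideas, the rule is easy to mechanize. This sketch reads “4+ on most” as a strict majority of the four questions, which is my interpretation; the function name and the example scores are made up.</p>

```python
QUESTIONS = (
    "Is this built on existing trusted systems?",
    "Does it reuse assets I already control?",
    "Would distribution improve with one credible voice?",
    "Is payoff unbounded vs. effort?",
)


def leverage_check(scores, threshold=4):
    """Return (passes, weak_questions) for one idea scored 1 to 5 per question.

    Assumption: "most" means more than half of the questions reach the
    threshold. weak_questions lists where to go back and add a stack.
    """
    if len(scores) != len(QUESTIONS):
        raise ValueError("expected one score per question")
    if any(score not in range(1, 6) for score in scores):
        raise ValueError("scores must be integers from 1 to 5")
    strong = sum(1 for score in scores if score >= threshold)
    weak = [q for q, score in zip(QUESTIONS, scores) if threshold > score]
    return strong > len(scores) / 2, weak


passes, weak = leverage_check([5, 4, 4, 2])
print(passes)
print(weak)
```

<p>The list of weak questions is the useful part: it tells you which layer to revisit before putting more effort into the idea.</p>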

<h3 id="3-practices-to-build-leveraged-thinking">3 Practices to Build Leveraged Thinking</h3>

<p>Like muscle, leverage compounds with use.</p>

<h4 id="1-constraint-flips">1) Constraint Flips</h4>

<p>Impose an artificial constraint (“no team,” “no budget,” “must use existing codebase”). Ask: “What could I ship in 3 days that creates outsized value despite this?”</p>

<p>Mini‑case: For Attach.dev’s metering, we set “no server‑side billing logic.” That forced us to expose metrics to OpenMeter and let billing happen upstream—less surface area, more composability into other stacks.</p>

<h4 id="2-reflection-interviews">2) Reflection Interviews</h4>

<p>After each project, ask three people:</p>
<ul>
  <li>“What part looked effortless?”</li>
  <li>“What signal made you trust it?”</li>
  <li>“What seemed harder than it was?”</li>
</ul>

<p>Mini‑case: A dev said Sakana UI felt “already adopted—because you showed the Claude prompt before the repo.” That was social proof, not a feature. We doubled down on pre‑code artifacts.</p>

<h4 id="3-singlethread-sprints">3) Single‑Thread Sprints</h4>

<p>Pick one lever (“audience,” “code reuse,” “partner”) and run a 72‑hour sprint where everything routes through that lever. You’ll feel the difference between moves that spread vs. stall.</p>

<p>Mini‑case: We ran a weekend on GitHub SEO only: README polish, cross‑links from existing repos, accurate tags. Net: three new contributors and a spot on a curated OSS list, all without a launch tweet.</p>

<h3 id="whats-next-leveraged-fundraising--hiring">What’s Next: Leveraged Fundraising &amp; Hiring</h3>

<p>Part 2 will zoom into:</p>
<ul>
  <li><strong>Fundraising</strong>: Why warm intros and pitch decks are symptoms, not strategy.</li>
  <li><strong>Hiring</strong>: How to spot high‑leverage operators early—even pre‑“big ship.”</li>
  <li><strong>Org design</strong>: Why some 2‑person teams outperform 20‑person ones.</li>
</ul>

<p>If you operate with leverage—systems‑literate, code‑ and audience‑stacked, socially credible—and want to build around this narrative, DM <code class="language-plaintext highlighter-rouge">@hammadtariq</code> or email. I’m assembling collaborators who can compound small moves into outsized outcomes.</p>

<h3 id="related-work">Related Work</h3>

<p>While the phrase “leveraged thinking” appears in leadership coaching and productivity contexts, this <strong>4‑Layer Stack</strong> is, to my knowledge, the first formulation to combine systems literacy, capital stack, social‑proof layer, and deliberate asymmetric moves into a unified framework.</p>

<p><strong>Influences and adjacent ideas:</strong></p>
<ul>
  <li><strong>Donella Meadows’</strong> <em>Leverage Points</em> on systems intervention points (<a href="https://donellameadows.org/wp-content/userfiles/Leverage_Points.pdf">The Academy for Systems Change</a>)</li>
  <li><strong>Naval Ravikant</strong> on permissionless leverage through code and media (<a href="https://nav.al/product-media">Naval</a>)</li>
  <li><strong>Saras Sarasvathy’s</strong> <em>Effectuation</em> on means‑driven entrepreneurship (<a href="https://cdn.mises.org/sarasvathy_2001_causation_and_effectuation.pdf">Effectuation Research</a>)</li>
</ul>

<p>This framework synthesizes these concepts into a practical stack for founders and operators seeking compound outcomes with minimal direct input.</p>

<p><em>[Written with GPT‑5 Thinking]</em></p>]]></content><author><name>Hammad Tariq</name></author><category term="startups" /><category term="leverage" /><category term="strategy" /><category term="systems" /><category term="distribution" /><category term="founders" /><category term="frameworks" /><category term="recruiting" /><summary type="html"><![CDATA[]]></summary></entry></feed>