AI agents are beginning to remember.
Not perfectly, and not always safely, but enough that the shape of the next infrastructure problem is becoming visible. Agents now keep long-term notes about users, write task files, maintain vector stores, build skill libraries, refine prompts, and carry procedural habits from one session to the next. Memory is no longer just context. It is becoming part of the agent’s operating system.
But most of that memory is trapped.
A customer-support agent learns how to triage a class of incidents, but that knowledge stays inside one deployment. A security agent develops a reliable way to prioritize vulnerabilities, but the procedure lives as a prompt, a notebook, or an undocumented workflow. A DeFi risk agent learns that some governance patterns matter more than surface-level TVL, but the calibration is embedded in a private chain of examples. Another agent, running on a different model, must rediscover the same thing from scratch.
This is wasteful. It is also structurally strange. Software ecosystems solved a version of this problem decades ago. When developers discover a reusable behavior, they do not paste it into every project by hand. They package it, version it, publish it, sign it, document it, and let other systems install it. We have package registries for code, model hubs for weights, container registries for deployments, and artifact stores for machine-learning pipelines.
We do not yet have the equivalent for agent memory.
IMPP, the Inter-Model Memory Protocol, is an attempt to define one: a protocol and registry model for packaging, verifying, distributing, and attaching portable agent-memory artifacts across models, frameworks, and deployments.
The important claim is not that IMPP invents agent memory. It does not. Agent memory already exists in research systems, production frameworks, long-context retrieval pipelines, skill files, prompt libraries, and persistent stores. The claim is narrower and, perhaps, more useful: some forms of agent memory can become portable artifacts, and those artifacts need a trust layer before they can become infrastructure.
What kind of memory is portable?
“Memory” is too broad a word. Human memory includes facts, experiences, motor skills, associations, habits, and emotional salience. Agent memory is becoming similarly overloaded. A vector store of retrieved documents, a user preference profile, a tool-use policy, a trajectory summary, a bug-fix heuristic, and a reusable skill file are all called memory, but they do not behave the same way.
IMPP is mainly concerned with portable procedural and semantic memory.
Procedural memory tells an agent how to do something: how to evaluate a protocol, how to triage a class of support tickets, how to decide whether a vulnerability is exploitable, how to avoid a repeated planning mistake, how to use a tool safely, how to escalate uncertainty. Semantic memory captures domain knowledge in a structured form: concepts, taxonomies, risk factors, assumptions, constraints, and relationships.
Episodic memory also matters, but it is harder to transfer safely. A private conversation history or a deployment-specific trace may contain user data, proprietary context, or unrepeatable environmental details. IMPP should support episodic-derived artifacts only after they have been transformed, redacted, licensed, and documented. The registry should not become a dumping ground for raw conversations.
This distinction matters because the plausible future is not “agents share all their memories.” That would be reckless. The plausible future is that agents publish bounded, documented, verifiable memory packages that describe what they know, where it came from, how it should be attached, when it should not be used, and what evidence supports it.
The package-registry analogy.
A package registry does not make all code safe. It makes code discoverable, versioned, inspectable, installable, and governable. That is already a large improvement over copying files from unknown sources.
An agent-memory registry would play a similar role.
A memory artifact might contain a signed manifest, a human-readable artifact card, structured procedural instructions, examples, provenance metadata, evaluation results, licensing terms, and compatibility information. The manifest would not merely say “this is useful.” It would say what type of memory this is, what domain it applies to, what models or agent frameworks have been tested, what attach modes are allowed, what data was used to derive it, what risks are known, and what should cause the artifact to be rejected or downgraded.
A simple IMPP artifact might look conceptually like this:
schema: impp.artifact.v0.2
name: defi-risk-calibration
version: 1.4.0
type: procedural
license: Apache-2.0
domain:
  name: defi-risk-assessment
  confidence_scope: lending, governance, oracle, bridge, stablecoin
attach_modes:
  - retrieval_only
  - few_shot_examples
  - prepend_policy
prohibited_attach_modes:
  - autonomous_trade_execution
source:
  author_agent: gpt-4o-risk-agent
  observation_window: 2024-01-01/2025-12-31
  source_material: public audit reports, public postmortems, synthetic evaluations
privacy:
  pii: none_detected
  proprietary_data: false
provenance:
  content_hash: sha256:...
  signature: sigstore:...
  attestations: in-toto:...
evaluation:
  suites:
    - defi-risk-public-v1
    - negative-transfer-v1
    - prompt-injection-memory-v1
  known_failures:
    - may over-penalize unaudited but formally verified toy protocols
    - not validated for real-time trading decisions
conflicts:
  declares:
    - risk_scoring.defi.oracle_assumptions
That example is intentionally more boring than the phrase “memory market.” Boring is good here. If memory artifacts are going to affect agent behavior, they need the discipline of software supply chains, not just the excitement of AI demos.
Install is the wrong metaphor unless attach semantics are explicit.
The word “install” is useful, but it is also dangerous. Installing a Python package has a relatively clear meaning. Installing memory does not.
A memory artifact can be attached to an agent in several ways, and each mode has different safety properties.
One artifact might be loaded as a system-level policy. Another might be used only as retrieval context. Another might provide few-shot examples. Another might define tool-use constraints. Another might remain available as a read-only file the agent can consult when needed. These are not interchangeable.
IMPP should therefore define attach modes as part of the protocol:
• prepend_policy: the artifact contributes instructions that are placed near the top of the agent’s context.
• few_shot_examples: the artifact provides examples used to shape task behavior.
• retrieval_only: the artifact is indexed and retrieved only when relevant.
• tool_policy: the artifact constrains tool calls, permissions, or escalation behavior.
• read_only_reference: the artifact is available as a file or namespace but not injected automatically.
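The attach modes above are small enough to sketch directly. The following is a minimal illustration, not a normative IMPP implementation: the enum values mirror the five modes listed, and `allowed_modes` is a hypothetical helper that computes what a runtime may do with an artifact (declared modes minus explicitly prohibited ones).

```python
from enum import Enum

class AttachMode(Enum):
    """The five attach modes described above; names are illustrative."""
    PREPEND_POLICY = "prepend_policy"
    FEW_SHOT_EXAMPLES = "few_shot_examples"
    RETRIEVAL_ONLY = "retrieval_only"
    TOOL_POLICY = "tool_policy"
    READ_ONLY_REFERENCE = "read_only_reference"

def allowed_modes(manifest: dict) -> set[AttachMode]:
    """Modes a runtime may use: declared minus prohibited.
    Prohibited entries outside the protocol's five modes (for example
    'autonomous_trade_execution') are subtracted as plain strings."""
    declared = set(manifest.get("attach_modes", []))
    prohibited = set(manifest.get("prohibited_attach_modes", []))
    # Any remaining mode must be one the protocol actually defines.
    return {AttachMode(m) for m in declared - prohibited}
```

The point of the sketch is that attach permission is computed from the manifest, never assumed: a runtime that cannot map a declared mode to a known protocol mode should reject the artifact rather than guess.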
This is where many memory systems become vague. A registry cannot just say “install this memory.” It must say how the artifact changes runtime behavior. It also needs conflict rules. What happens when two installed artifacts disagree? Which namespace wins? Does a safety policy override a domain heuristic? Can one artifact require another? Can an artifact be downgraded if its dependency fails freshness checks?
These details are not implementation trivia. They are the protocol.
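One way to make the conflict question concrete is a precedence rule. The sketch below is an assumption, not part of any published IMPP spec: it supposes that safety policies outrank tool policies, which outrank domain artifacts, with semantic-version recency as the tie-breaker.

```python
# Hypothetical precedence table: higher number wins a shared conflict
# namespace. Real conflict semantics would need much more care.
PRECEDENCE = {"safety_policy": 2, "tool_policy": 1, "procedural": 0, "semantic": 0}

def resolve(a: dict, b: dict) -> dict:
    """Pick the winning artifact when two declare the same namespace."""
    pa, pb = PRECEDENCE[a["type"]], PRECEDENCE[b["type"]]
    if pa != pb:
        return a if pa > pb else b
    # Tie-break on semantic version, compared numerically field by field.
    va = tuple(int(x) for x in a["version"].split("."))
    vb = tuple(int(x) for x in b["version"].split("."))
    return a if va >= vb else b
```

Whatever the actual rule, it must be deterministic and declared by the protocol, not improvised by each runtime.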
Trust is the hard part.
The registry substrate is not the hardest technical problem. We already know how to store, version, hash, sign, and distribute artifacts. OCI registries can distribute arbitrary blobs. ORAS can push non-container artifacts. Model registries track versions, tags, and lifecycle state. Vector indexes and document stores can scale. None of that is trivial, but it is well-trodden ground.
The hard problem is semantic trust.
A code package can be malicious, but its behavior is at least partly inspectable through static analysis, tests, permissions, and runtime sandboxing. A memory artifact is stranger. It may not execute directly. It may instead bias an agent three steps later, in a different conversation, under different task pressure. Its failure mode can be delayed, probabilistic, and context-sensitive.
That means a memory registry needs two trust layers.
The first layer is supply-chain trust. Who created the artifact? What source material was used? Was the artifact modified? Which verification steps ran? Are the signatures valid? Has the artifact been revoked? Is there a transparency log? Can a consumer audit the publication history?
This layer should reuse existing patterns wherever possible: content-addressed storage, short-lived signing certificates, transparency logs, step-level attestations, compromise-resistant repository metadata, and staged assurance levels. Ed25519 signatures are useful, but signatures alone do not prove that an artifact was derived safely, evaluated honestly, or published through an uncompromised process.
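The content-addressing piece of this layer is the easiest to show. The sketch below assumes a hypothetical manifest layout matching the earlier example (`provenance.content_hash`) and uses canonical JSON plus SHA-256; it deliberately checks only that the bytes match, which is exactly the limit the paragraph above describes.

```python
import hashlib
import json

def content_hash(payload: dict) -> str:
    """Content-address an artifact payload: SHA-256 over canonical JSON
    (sorted keys, no whitespace), prefixed like the example manifest."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode()).hexdigest()

def verify_manifest(manifest: dict, payload: dict) -> bool:
    """First-layer check only: the payload matches the manifest's hash.
    Says nothing about whether the content is safe or honestly evaluated."""
    return manifest["provenance"]["content_hash"] == content_hash(payload)
```

Signature and transparency-log verification would sit on top of this hash, delegated to existing tooling rather than reimplemented.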
The second layer is AI-behavior trust. Does the artifact actually improve downstream performance? Does it transfer across model families? Does it contain hidden instructions? Does it cause negative transfer outside its domain? Does it become stale? Does it leak private data? Does it create licensing problems? Does it conflict with other installed artifacts?
IMPP’s core contribution should live here: not merely storing memory packages, but testing their behavioral effects.
Trust scores should explain, not mystify.
A numeric score is useful only if it is calibrated and explainable. A trust score that looks precise but is not tied to downstream outcomes will quickly become decoration.
The first version of IMPP can use a transparent weighted score, but it should be presented honestly: a provisional heuristic, not a law of nature. The score should come with reason codes, confidence intervals where possible, evaluation coverage, and known failure cases. It should be versioned. It should be recalibrated against held-out tests. It should be possible for a package to have high transfer performance and still fail safety checks.
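A transparent weighted score with reason codes might look like the following sketch. The component names and weights are invented for illustration; the one real design point is that the score ships with machine-readable reasons rather than a bare number.

```python
# Illustrative components and weights; a real registry would calibrate
# these against held-out downstream outcomes and version the formula.
WEIGHTS = {"provenance": 0.3, "transfer_eval": 0.4, "safety_probes": 0.3}

def trust_score(checks: dict[str, float]) -> tuple[float, list[str]]:
    """Weighted score in [0, 1] plus reason codes for weak components.
    Missing components count as 0.0 and are flagged, not silently skipped."""
    score = sum(WEIGHTS[k] * checks.get(k, 0.0) for k in WEIGHTS)
    reasons = [f"low:{k}" for k in WEIGHTS if checks.get(k, 0.0) < 0.5]
    return round(score, 3), reasons
```

Note that a package can score well overall and still carry a `low:safety_probes` reason code; a consumer policy can refuse installation on that code alone, which is the "high transfer, failed safety" case described above.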
More importantly, the public benchmark set should not be the live exam key. A credible registry can publish its methodology, public fixtures, and reproducibility harness while keeping rotating holdout probes private. This is not security through obscurity; it is basic evaluation hygiene. If every adversarial probe is public and static, artifact authors will optimize against the test rather than against safe transfer.
The right model is transparent methodology plus protected holdouts:
• Public: scoring formula, probe categories, sample fixtures, artifact schema, evaluation harness, benchmark results, score version history.
• Private or rotating: live adversarial prompts, canary artifacts, buyer feedback labels, abuse heuristics, and production calibration sets.
Trust comes from being inspectable. Robustness comes from not publishing the entire answer key.
The minimum credible evidence package.
The idea of portable memory is plausible. That is not enough.
To be taken seriously as research, IMPP needs a reproducible evidence package: corpus, task definitions, prompts, baseline agents, model versions, scoring formulas, ablations, negative results, cost numbers, and evaluation scripts. Headline transfer percentages are not sufficient without the details needed to reproduce and falsify them.
A useful first benchmark release would include:
- A public artifact corpus with sanitized procedural and semantic memory packages.
- Fixed train/test or author/evaluate splits.
- Baselines: no memory, raw-context memory, retrieval over source documents, prompt-only skill files, and domain-specific handcrafted rules.
- Multiple model families with fixed context budgets and identical attach modes.
- Near-transfer and far-transfer tasks, so the system is rewarded for abstaining when an artifact should not apply.
- Red-team artifacts that test prompt injection, hidden instructions, stale assumptions, license contamination, and conflicting policies.
- Cost and latency reporting, separating static checks from model-based evaluations.
- A calibration study showing whether trust scores predict downstream success and safety.
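The calibration study in the last item is simple to state precisely. A minimal version, sketched here with invented data shapes, is a correlation check between published trust scores and binary downstream success labels: if the correlation is near zero, the score is decoration.

```python
from statistics import mean

def calibration(scores: list[float], outcomes: list[int]) -> float:
    """Pearson correlation between trust scores and downstream success
    labels (1 = artifact helped, 0 = it did not). Purely illustrative."""
    ms, mo = mean(scores), mean(outcomes)
    cov = sum((s - ms) * (o - mo) for s, o in zip(scores, outcomes))
    sd_s = sum((s - ms) ** 2 for s in scores) ** 0.5
    sd_o = sum((o - mo) ** 2 for o in outcomes) ** 0.5
    return cov / (sd_s * sd_o)
```

A real study would add confidence intervals and per-domain breakdowns, but even this crude statistic is falsifiable, which is the point of the evidence package.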
This is also where the language should become more careful. The right claim is not "memory artifacts transfer universally." They will not. The narrower and more defensible claim is: bounded procedural and semantic artifacts can improve related downstream tasks across model families when their attach mode, domain scope, and trust evidence are explicit.
That is a serious claim. It is also testable.
Privacy, licensing, and consent.
A memory registry cannot ignore where memories come from.
An agent’s memory may encode private user preferences, customer interactions, proprietary workflows, trading assumptions, security incidents, or internal decision rules. Even if a memory artifact contains no obvious personal data, it may still reveal sensitive patterns. A registry that imports material from GitHub, model hubs, papers, chat logs, or production traces needs a licensing and privacy pipeline before it needs a marketplace.
Every artifact should declare:
• source material and license compatibility
• whether personal data was present and how it was removed
• whether proprietary data was used
• whether human consent or organizational approval is required
• intended use and prohibited use
• jurisdictional or domain-specific restrictions
• retention and revocation policy
This is not bureaucracy. It is what separates infrastructure from scraping.
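Enforcing those declarations can start as a schema gate at publication time. The sketch below assumes field paths matching the example artifact earlier in the document (`license`, `privacy.pii`, `privacy.proprietary_data`, `source.source_material`); the exact required set is an open design question, not a settled spec.

```python
# Required declaration paths, mirroring the example manifest's layout.
REQUIRED_DECLARATIONS = [
    ("license",),                      # license compatibility
    ("privacy", "pii"),                # personal-data handling
    ("privacy", "proprietary_data"),   # proprietary-source flag
    ("source", "source_material"),     # where the memory came from
]

def missing_declarations(manifest: dict) -> list[str]:
    """Return dotted paths for required privacy/licensing fields the
    manifest fails to declare. An empty list means the gate passes."""
    missing = []
    for path in REQUIRED_DECLARATIONS:
        node = manifest
        for key in path:
            if not isinstance(node, dict) or key not in node:
                missing.append(".".join(path))
                break
            node = node[key]
    return missing
```

A registry that refuses to accept an artifact with a non-empty result here has turned the list above from policy prose into an enforced precondition.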
Registry, marketplace, or protocol?
These words should not be mixed casually.
A protocol defines how artifacts are packaged, signed, evaluated, and attached.
A registry stores and distributes artifacts.
A marketplace lets people buy, sell, rank, monetize, or gate access to artifacts.
IMPP should begin as a protocol and reference registry. A marketplace can come later, but the marketplace should not be the first trust story. If money enters too early, every score becomes economically adversarial. Sellers will optimize for ranking, buyers will treat scores as guarantees, and the registry operator becomes both judge and market-maker.
A healthier sequence is:
- Publish the protocol.
- Publish the artifact schema.
- Publish a reference implementation.
- Publish a reproducible benchmark.
- Publish a small curated registry of examples.
- Add private or premium packages only after governance, revocation, dispute handling, and abuse controls exist.
A memory marketplace may be real. But the protocol must be credible before the market can be.
What IMPP should reuse rather than reinvent.
The convincing version of IMPP is not a totally new stack. It is a composition of existing, mature infrastructure with a new semantic layer.
Use artifact-card conventions from model cards and datasheets. Use OCI-style distribution for packages. Use Sigstore-style signing and transparency. Use in-toto-style attestations for build and evaluation steps. Use TUF- or SLSA-inspired thinking for compromise resistance, revocation, and assurance levels. Use familiar semver and dependency constraints. Use namespace isolation and explicit attach policies from production agent frameworks.
Then add what is unique to agent memory: behavioral transfer tests, negative-transfer detection, hidden-instruction probes, freshness checks, conflict declarations, and model-family compatibility evidence.
That separation makes the system easier to trust. Software-supply-chain infrastructure verifies the container. AI evaluation verifies the behavioral claim.
A concrete trust ladder.
A public registry should avoid binary language like “safe” or “unsafe.” Memory artifacts are too contextual for that. A five-level ladder is more honest:
- Imported — source captured; no structural guarantees.
- Parsed — schema-valid; license and privacy fields present.
- Evaluated — passed public tests for a declared domain and attach mode.
- Verified — signed, attested, reproducible, and checked against private holdouts.
- Curated — reviewed by maintainers or domain experts, with known limitations documented.
This ladder should be attached to a specific version of an artifact. A new version starts over. A stale artifact can be downgraded. A revoked artifact remains visible in the log but is blocked by default. A domain-specific artifact should not silently generalize into another domain.
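Because the levels are ordered and versions can move down as well as up, the ladder is naturally a small state machine. The sketch below is illustrative: the downgrade-one-rung-on-staleness rule is an assumption, while the block-on-revocation behavior comes straight from the paragraph above.

```python
from enum import IntEnum
from typing import Optional

class Assurance(IntEnum):
    """The five-level ladder, ordered so comparisons are meaningful."""
    IMPORTED = 1
    PARSED = 2
    EVALUATED = 3
    VERIFIED = 4
    CURATED = 5

def effective_level(level: Assurance, *, stale: bool, revoked: bool) -> Optional[Assurance]:
    """A revoked version is blocked by default (None); a stale one drops
    one rung (an assumed policy). Levels belong to a specific version:
    a new version starts over at IMPORTED."""
    if revoked:
        return None
    if stale and level > Assurance.IMPORTED:
        return Assurance(level - 1)
    return level
```

Keeping the ladder per-version means a consumer never inherits trust from an older release; it must see evidence for the exact bytes it is about to attach.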
What success would look like.
A mature IMPP ecosystem would not replace agent memory inside applications. It would connect those systems.
An enterprise agent could install a verified compliance-memory artifact as read-only policy. A security agent could attach a vulnerability-triage artifact as retrieval-only procedural guidance. A DeFi analysis agent could use a risk-calibration artifact with explicit contraindications. A personal assistant could refuse to install an artifact that lacks privacy provenance. A registry UI could show not just a score, but the artifact’s source, evaluation history, attach modes, revocations, known failures, and compatible models.
The strongest version of this future is not one where agents blindly import each other’s minds. It is one where learned behavior becomes legible, bounded, testable, and governable.
The narrower, stronger claim.
Agent memory will not become valuable merely because it exists. It becomes valuable when it can be transferred without collapsing trust.
That is the real problem IMPP tries to solve.
The next generation of agents will not just need bigger contexts or better retrieval. They will need ways to exchange learned procedures, domain calibrations, and behavioral constraints across model families and organizational boundaries. Those exchanges will need packaging, provenance, compatibility checks, privacy controls, revocation, evaluation, and clear runtime semantics.
A memory registry is not a place where agents upload vibes. It is a place where learned behavior becomes an artifact.
That artifact should be inspectable before it is trusted.
It should be signed before it is distributed.
It should be evaluated before it is recommended.
It should declare how it attaches before it changes behavior.
And it should be revoked or downgraded when the world changes.
That is the case for IMPP: not a marketplace first, and not a grand theory of memory, but a protocol for making portable agent memory safe enough, legible enough, and useful enough to become shared infrastructure.