Agentic PKM for Memory Loop: Design Options


The Core Insight

The original research concluded that assistants + smart automation are likely optimal for PKM, with monitoring-only agency as the prudent path. Our discussion refined this further:

The Einstein Test for Notes: If an agent can’t parse what you meant, maybe you haven’t actually worked it out yet. The agent’s confusion becomes a proxy signal for your own potential confusion—not quality control, but a thinking prompt.

This reframes the agentic PKM problem from “autonomous knowledge management” to “asynchronous rubber duck debugging.”


Current Memory Loop Data Flow

Mode     | Function                                 | Output
---------|------------------------------------------|--------------------------------
Capture  | Quick one-liners                         | → Daily note (ordered)
Meetings | Quick capture + immediate Think session  | → Meeting file + expanded notes
Think    | AI chat with vault context               | Creates/modifies files
Adjust   | Manual editing                           | Direct file changes

All four modes represent potential trigger points for an agentic monitoring system.


Proposed Agent Architecture

Monitoring-Only Agent with Tiered Outputs

File Change Event (any mode)
              ↓
      [Watcher Service]
              ↓
      [Analysis Agent]
              ↓
    ┌─────────┼─────────┐
    │         │         │
Connections  Contradictions  Confusion
    │         │         │
    ↓         ↓         ↓
  Inline     Inline    Think Prompt
  addition   addition  (invitation to
                       elaborate)

Three Output Types

1. Connections (High confidence, additive)

  • “This relates to [other note] where you discussed X”
  • Output: Inline callout or backlink
  • No human approval needed—purely additive

2. Contradictions (High confidence, additive)

  • “This seems to conflict with [other note] where you said Y”
  • Output: Inline callout linking both notes
  • No modification of original content

3. Confusion Prompts (Low confidence, interactive)

  • “I’m not sure I follow—when you wrote ‘[text]’, what were you thinking about?”
  • Output: Invitation to a Think session
  • The agent’s confusion is the nudge
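Under the assumption that the analysis agent emits typed observations with a confidence score, the three-way split above can be sketched as a small dispatcher. The `Observation` type, the 0.8 threshold, and the surface names are illustrative, not part of Memory Loop:

```python
from dataclasses import dataclass

@dataclass
class Observation:
    kind: str          # "connection" | "contradiction" | "confusion"
    confidence: float  # model's self-reported confidence, 0..1
    message: str

def route(obs: Observation) -> str:
    """Map an agent observation to one of the three output types."""
    if obs.kind in ("connection", "contradiction") and obs.confidence >= 0.8:
        return "inline_addition"   # appended callout, no approval needed
    if obs.kind == "confusion":
        return "think_prompt"      # invitation to a Think session
    return "discard"               # low-confidence noise is dropped
```

Note that confusion routes on kind alone: low confidence is the signal there, not a reason to discard.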

Key Design Principle

The agent never modifies your original content. It only:

  • Appends observations (connections/contradictions)
  • Invites elaboration (confusion)

This preserves the emergent, bottom-up nature of PKM while adding a “thinking partner” that notices things you might miss.
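The append-only rule is simple enough to enforce mechanically. A minimal sketch, assuming notes are plain Markdown files; the `[!agent]` callout syntax is an assumption (Obsidian-style), not a Memory Loop convention:

```python
from pathlib import Path

def append_observation(note: Path, callout: str) -> None:
    """Add an agent observation without touching existing content.

    The original text is preserved byte-for-byte; the agent only appends.
    """
    original = note.read_text()
    note.write_text(original + f"\n> [!agent] {callout}\n")
```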


The Context Problem

Your vault: ~400k tokens (exceeds 200k context window)
Daily generation: 5-7k tokens
Growth pattern: bounded via regular synthesis

Tier 1: Claims Index (~20-40k tokens)

A compressed, running document of key assertions extracted from notes:

  • Stated beliefs (“I think X works better than Y”)
  • Factual claims (“Project launched in March”)
  • Commitments/intentions (“I’m going to do X”)

Updated when you synthesize. Checked against every new note for contradictions.
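A sketch of the index check. `ask_model` stands in for any LLM call (a real client would wrap the Anthropic API), and the JSON reply format is an assumption:

```python
import json

def check_contradictions(note_text: str, claims: list, ask_model) -> list:
    """Return indices of indexed claims the new note appears to conflict with.

    `ask_model` is any callable taking a prompt string and returning the
    model's reply; stubbed here, a real API client in production.
    """
    prompt = (
        "Claims index:\n"
        + "\n".join(f"{i}. {c}" for i, c in enumerate(claims))
        + f"\n\nNew note:\n{note_text}\n\n"
        + 'Reply with JSON only: {"conflicts": [indices of conflicting claims]}'
    )
    return json.loads(ask_model(prompt))["conflicts"]
```

Because the model sees structured assertions rather than raw prose, "I now prefer Y" can be matched against "I think X works better than Y" directly.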

Tier 2: RAG for Deep Dives

When the claims index flags something interesting:

  • Embed the new note
  • Retrieve semantically similar source notes
  • Full context comparison for detailed analysis
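Tier 2 is standard retrieval. A dependency-free sketch using cosine similarity over precomputed embeddings (toy 2-D vectors here; real embedding vectors in practice):

```python
import math

def cosine(a: list, b: list) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec: list, index: dict, k: int = 3) -> list:
    """Return ids of the k notes most similar to the query embedding.

    `index` maps note id -> embedding vector.
    """
    ranked = sorted(index, key=lambda nid: cosine(query_vec, index[nid]),
                    reverse=True)
    return ranked[:k]
```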

Why Two Tiers?

Embeddings find similarity, not contradiction. “I love X” and “I hate X” are semantically close. The claims index provides structured assertions that make contradiction detection reliable. RAG then provides the full context when needed.

Confusion Detection: No Index Needed

The “do I understand this?” check runs on the note in isolation. The agent assesses: “Could I explain what this person meant to a third party?” This is cheap—single note, no retrieval required.
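The isolated check can be a single prompt per note. A sketch with the word-count gate and the `UNDERSTOOD` sentinel as assumptions; `ask_model` is again any LLM callable:

```python
def confusion_check(note_text: str, ask_model, min_words: int = 10):
    """Run the 'could I explain this to a third party?' check on one note.

    Returns a clarifying question, or None if the note is understood
    (or too terse to be worth asking about).
    """
    if len(note_text.split()) < min_words:
        return None  # threshold gating: skip short captures
    prompt = (
        "Could you explain what this person meant to a third party?\n"
        "If yes, reply exactly UNDERSTOOD. If not, ask one clarifying "
        "question.\n\n" + note_text
    )
    reply = ask_model(prompt).strip()
    return None if reply == "UNDERSTOOD" else reply
```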


Cost Analysis

Per-Operation Estimates (Claude Sonnet)

Operation                  | Context Size | Cost
---------------------------|--------------|--------
Embedding                  | ~negligible  | ~$0.0001
Confusion check (isolated) | ~2k tokens   | ~$0.005
Claims index check         | ~30k tokens  | ~$0.05
Deep RAG comparison        | ~50k tokens  | ~$0.10

Daily Estimates (5-7k tokens generated)

Confusion checks (all notes):     ~$0.05-0.10
Claims index checks (all notes):  ~$0.10-0.15  
Deep dives (20% trigger rate):    ~$0.03-0.05
                                  ─────────────
Daily total:                      ~$0.18-0.30
Monthly total:                    ~$5-9

Cost Optimization Levers

  1. Batch processing — Analyze daily notes at end of day instead of real-time
  2. Threshold gating — Skip notes below N words
  3. Haiku triage — Use Haiku to decide “worth deeper analysis?” before Sonnet
  4. Separate processes — Run confusion checks (cheap) more frequently than contradiction checks (expensive)
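Lever 3 can be sketched as a two-stage pipeline. In practice `cheap_model` would be Haiku and `strong_model` Sonnet; here both are plain callables so the pattern is model-agnostic:

```python
def triaged_analysis(note: str, cheap_model, strong_model):
    """Gate the expensive analysis behind a cheap yes/no triage call.

    Returns None when triage says the note isn't worth deeper analysis,
    otherwise the strong model's full analysis.
    """
    verdict = cheap_model(
        "Is this note substantive enough to be worth deeper analysis? "
        "Reply YES or NO.\n\n" + note
    )
    if verdict.strip().upper() != "YES":
        return None  # skip the expensive call entirely
    return strong_model(
        "Analyze this note for connections, contradictions, and "
        "confusion:\n\n" + note
    )
```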

Implementation Considerations

Trigger Timing Options

Approach            | Pros                           | Cons
--------------------|--------------------------------|-----------------------------------
Real-time (on save) | Immediate feedback             | Higher cost, potential interruption
End-of-day batch    | Cost-efficient, non-intrusive  | Delayed feedback
On-demand           | User controls when             | Loses "monitoring" benefit

Recommendation: Confusion checks could run real-time (cheap, valuable immediate feedback). Contradiction/connection checks batch at end of day.

Claims Index Maintenance

The claims index is the key design problem. Options:

  1. Auto-extracted — Agent extracts claims from every note automatically

    • Pro: Complete coverage
    • Con: Extraction errors compound, cost
  2. Synthesis-triggered — Index updates only when you synthesize

    • Pro: Aligns with your existing workflow
    • Con: New claims not indexed until next synthesis
  3. Hybrid — Light extraction on capture, full reconciliation on synthesis

    • Pro: Balance of coverage and accuracy
    • Con: More complex

Output Surfaces

Where do agent observations appear?

Output            | Possible Surfaces
------------------|----------------------------------------------------------
Connections       | Inline callout in triggering note, backlinks on both notes
Contradictions    | Inline callout with links to conflicting notes
Confusion prompts | Queue in Ground mode, notification badge, inline prompt

Key question: Do confusion prompts interrupt you, or wait for you to check them?


Open Questions

  1. What counts as a “claim” for the index? Facts only? Opinions? Emotional states? Goals?

  2. How do you handle intentional evolution? “I used to think X, now I think Y” isn’t a contradiction—it’s growth. How does the agent distinguish?

  3. What’s the threshold for “confusing”? Every terse capture will seem unclear without context. Do you want the agent asking about everything, or only notes that seem substantive but unclear?

  4. Where does the claims index live? A hidden system file? A visible note you can review/edit? Part of a synthesis workflow?

  5. What happens when you dismiss a confusion prompt? Does the agent learn? Does it just log and move on?


Minimum Viable Implementation

If you wanted to test this pattern with minimal infrastructure:

Phase 1: Confusion Detection Only

  • File watcher on vault
  • On new note: send to Claude with prompt “Could you explain what this person meant to a third party? If not, ask a clarifying question.”
  • Store responses in a daily “Agent Questions” file
  • Review when you want a thinking prompt
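Phase 1 doesn't strictly need a real file-watcher library; a polling pass driven by a loop or cron job is enough to start. A sketch, with the `.md` extension and flat vault layout as assumptions:

```python
from pathlib import Path

def scan_for_new_notes(vault: Path, seen: set) -> list:
    """One polling pass: return vault notes not seen on earlier passes.

    Call this every few minutes; each new note then gets the single
    clarifying-question API call, and the reply goes to the daily
    "Agent Questions" file.
    """
    current = set(vault.rglob("*.md"))
    new = sorted(current - seen)
    seen.update(new)
    return new
```

A proper event-driven watcher (e.g. the watchdog library) can replace the polling loop later without changing anything downstream.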

Cost: ~$1-2/month
Infrastructure: file watcher + single API call per note
Value test: do the questions actually surface unfinished thinking?

Phase 2: Add Contradiction Detection

  • Build claims index (start manually, then automate extraction)
  • Check new notes against index
  • Surface contradictions inline

Phase 3: Connection Detection

  • Embed vault
  • RAG for semantically similar notes
  • Surface non-obvious connections

Summary

The viable path for agentic PKM in Memory Loop:

  1. Confusion detection — Cheap, runs in isolation, directly addresses the “do I understand this?” problem. Outputs thinking prompts, not judgments.

  2. Contradiction detection — Requires a claims index (20-40k tokens). Checks new content against established assertions. Outputs inline additions, not modifications.

  3. Connection detection — Standard RAG pattern. Useful but lower priority than the above.

Total estimated cost: $5-9/month at current vault size and generation rate.

Key insight: This isn’t really “agency” in the technical sense. It’s monitoring + prompting. The agent watches for signals (confusion, contradiction, connection) and surfaces them for human action. All authority stays with you. That’s probably right for a thinking tool.
