CampaignForge AI - The Journey
Chapter 11: The System That Learns
Date: 2026-05-09 | Vertical: B2B SaaS | Budget: $500/month
Where We Left Off
Chapter 10 was about the homepage. The external audit found the same categories of problems Agent 00 is built to catch — weak CTA, no trust signals, no ROI framing. We fixed them, wired up the Formspree waitlist form, and noted the irony of building a site auditor while running a site that needed auditing.
The pipeline was declared ready for real Meta Ads data. The diagnostic brain exists. The rework loop exists. Every piece of the system that processes real performance results is operational. What was missing was what happens between campaigns: does the second campaign start smarter than the first?
Until Chapter 11, the answer was no. Every campaign started cold.
The Cold Start Problem
The full agent chain — strategy, creative, launch, performance diagnosis — produces a structured output at the end of every run. ROAS achieved. Diagnosis category. What worked in the creative. What failed in the targeting. Which hook drove clicks and which one didn't.
All of that was being written to the audit log and then ignored by the next campaign.
This is the exact weakness Meta and Google's native tools share: they have memory within an account, but no memory across accounts, across verticals, or across campaigns run by different businesses. Every campaign manager starts cold. Every agency onboarding a new client starts cold. The institutional knowledge lives in spreadsheets and the heads of people who might leave.
The architecture of CampaignForge has always implied a solution to this: if every campaign outcome is stored as a structured document, future campaigns can retrieve the most similar past outcomes and use them before the Strategist writes a single targeting parameter.
Chapter 11 is where that implication became a working system.
What Got Built
The implementation is called the Performance Memory RAG. It has three moving parts that form a loop.
Part 1: The Store
A new SQLite table called campaign_performance_memory. Every completed campaign run writes one record to it. The record captures everything that mattered about the outcome: vertical, budget band, platform, creative type, ROAS tier, final ROAS/CAC/CTR/conversion rate, the primary diagnosis from Agent 06, what the winning creative elements were, what failed, and the full diagnostic summary and recommendations.
It also stores a vector embedding of the record — a mathematical fingerprint that makes similarity search possible. The embedding is deterministic, built from token frequencies in the campaign text, and stored as JSON in SQLite. No Chroma, no external dependency, no cloud service required. Phase 1 stays fully local. Chroma or sqlite-vec can replace the vector layer in Phase 2 with no change to agent contracts.
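One way such a deterministic, dependency-free embedding could look is sketched below. The post only says the vector is built from token frequencies and stored as JSON; the tokenisation rule, the L2 normalisation, and the function names `embed`, `cosine`, and `to_json` are assumptions made for illustration, not the project's actual code.

```python
import json
import math
import re
from collections import Counter


def embed(text: str) -> dict[str, float]:
    """Return an L2-normalised token-frequency vector (assumed scheme)."""
    tokens = re.findall(r"[a-z0-9_]+", text.lower())
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        return {}
    # Sorting keeps the output order stable, so the JSON is reproducible.
    return {tok: c / norm for tok, c in sorted(counts.items())}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two already-normalised sparse vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def to_json(vec: dict[str, float]) -> str:
    """Serialise the vector for storage in a SQLite TEXT column."""
    return json.dumps(vec, sort_keys=True)
```

Because the same text always yields the same vector, records can be re-embedded at any time without drifting from what is already stored.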
Each record is tagged with a source flag: is_real_performance_data. This is the most important field. It distinguishes between outcomes from real campaigns with real ad spend and outcomes from simulated or local-artifact runs. The system treats them differently at retrieval time — real data ranks higher, and agents are instructed not to cite simulated examples as verified proof.
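A minimal sketch of what the table and its write path might look like follows. Only the table name and the is_real_performance_data flag come from the post; every other column name, the `embedding_json` column, and the helpers `open_store` and `upsert_outcome` are hypothetical.

```python
import sqlite3

# Column names are illustrative guesses based on the fields listed above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS campaign_performance_memory (
    campaign_id              TEXT PRIMARY KEY,
    vertical                 TEXT NOT NULL,
    budget_band              TEXT NOT NULL,
    platform                 TEXT NOT NULL,
    creative_type            TEXT,
    roas_tier                TEXT,
    final_roas               REAL,
    final_cac                REAL,
    final_ctr                REAL,
    conversion_rate          REAL,
    primary_diagnosis        TEXT,
    winning_elements         TEXT,
    failure_reasons          TEXT,
    diagnostic_summary       TEXT,
    recommendations          TEXT,
    embedding_json           TEXT NOT NULL,
    is_real_performance_data INTEGER NOT NULL DEFAULT 0
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local memory store."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def upsert_outcome(conn: sqlite3.Connection, record: dict) -> None:
    """Insert one outcome, replacing any earlier row with the same campaign_id."""
    known = {row[1] for row in conn.execute(
        "PRAGMA table_info(campaign_performance_memory)")}
    unknown = set(record) - known
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    cols = ", ".join(record)
    params = ", ".join(f":{c}" for c in record)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            f"INSERT OR REPLACE INTO campaign_performance_memory ({cols}) "
            f"VALUES ({params})",
            record,
        )
```

Defaulting the flag to 0 means a record counts as simulated unless the writer explicitly marks it as real, which errs on the side of not overstating evidence.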
Part 2: The Retrieval Node
A new graph node called performance_rag_retriever sits between GATE-4 (spend limit approval) and the Strategist. Before the Strategist writes a single targeting recommendation, the retriever runs a query against the store using the current campaign's vertical, budget band, platform, and brief signals.
It ranks results by vector similarity, then applies metadata boosts: a matching vertical gets a boost, a matching budget band gets a boost, a high or medium ROAS tier gets a boost, and real performance data gets a boost over simulated data. The top results are assembled into a performance_rag_context block and injected into state.
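The ranking step can be sketched as additive boosts on top of the similarity score. The boost weights, the `Candidate` type, and the additive form itself are assumptions; the post names which signals are boosted but not how much or how they combine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    campaign_id: str
    similarity: float  # vector similarity to the current brief
    vertical: str
    budget_band: str
    roas_tier: str  # assumed values: "high", "medium", "low"
    is_real_performance_data: bool


# Hypothetical weights, chosen only to make the example concrete.
VERTICAL_BOOST = 0.15
BUDGET_BAND_BOOST = 0.10
ROAS_TIER_BOOST = 0.05
REAL_DATA_BOOST = 0.20


def boosted_score(c: Candidate, vertical: str, budget_band: str) -> float:
    """Similarity plus one additive boost per matching metadata signal."""
    score = c.similarity
    if c.vertical == vertical:
        score += VERTICAL_BOOST
    if c.budget_band == budget_band:
        score += BUDGET_BAND_BOOST
    if c.roas_tier in ("high", "medium"):
        score += ROAS_TIER_BOOST
    if c.is_real_performance_data:
        score += REAL_DATA_BOOST
    return score


def rank(candidates, vertical: str, budget_band: str, top_k: int = 3):
    """Return the top_k candidates by boosted score, best first."""
    return sorted(
        candidates,
        key=lambda c: boosted_score(c, vertical, budget_band),
        reverse=True,
    )[:top_k]
```

With weights like these, a real outcome can outrank a slightly more similar simulated one, which matches the stated intent that real data ranks higher.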
The same retriever also runs when GATE-7 rework is approved. Before the Strategist or Creative agent begins rework, the retriever re-queries with the updated context — because the diagnosis has changed and different past campaigns are now the most relevant comparison.
Part 3: The Feedback Write
After Agent 06 completes its performance diagnosis, it writes the outcome back to the store before the pipeline continues. This is the closing of the loop. The campaign that just finished becomes a document that the next campaign can retrieve.
Store failures are non-blocking. If the write fails, the audit log records it and the pipeline continues. No campaign run should halt because the memory store had a transient error.
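The non-blocking behaviour might be wrapped roughly as follows. The function name `write_outcome_non_blocking`, the event names, and the list-based audit log are illustrative stand-ins, and the set of exceptions treated as transient is an assumption.

```python
import sqlite3
from typing import Callable


def write_outcome_non_blocking(
    write_fn: Callable[[dict], None],
    record: dict,
    audit_log: list,
) -> bool:
    """Try to store the outcome; on failure, log it and let the pipeline go on."""
    try:
        write_fn(record)
    except (sqlite3.Error, OSError) as exc:
        # Assumed set of store-level errors; anything else still propagates.
        audit_log.append({
            "event": "memory_write_failed",
            "campaign_id": record.get("campaign_id"),
            "error": str(exc),
        })
        return False
    audit_log.append({
        "event": "memory_write_ok",
        "campaign_id": record.get("campaign_id"),
    })
    return True
```

Returning a boolean instead of raising lets the caller continue unconditionally while the audit log still records that this campaign's outcome never reached the store.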
How Strategist and Creative Use It
When the retriever finds relevant past examples, Strategist and Creative don't just receive raw data. The retrieved context is translated into actionable framing.
The Strategist sees prior examples as evidence for KPI targets. If three similar campaigns in the same vertical at the same budget band achieved ROAS between 2.1 and 2.8, that's the target range — not a generic benchmark from an industry report. When real examples exist, the Strategist also generates an additional A/B variant that is explicitly memory-informed: a targeting and positioning approach shaped by what worked in prior similar campaigns.
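As a rough illustration of how retrieved examples could become a KPI target range, the sketch below takes the minimum and maximum ROAS across real examples. The `roas_target_range` helper, the dictionary keys, and the three-example minimum are assumptions, not the Strategist's actual logic.

```python
from typing import Optional


def roas_target_range(
    examples: list[dict], min_examples: int = 3
) -> Optional[tuple[float, float]]:
    """Derive a ROAS target range from real prior outcomes only.

    Returns None when too few real examples exist, so the caller can
    fall back to whatever default benchmark it already uses.
    """
    real = [
        e["final_roas"]
        for e in examples
        if e.get("is_real_performance_data") and e.get("final_roas") is not None
    ]
    if len(real) < min_examples:
        return None
    return (min(real), max(real))
```

Applied to three real campaigns with ROAS of 2.1, 2.4, and 2.8, this yields the 2.1 to 2.8 range described above; simulated examples are ignored entirely.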
The Creative agent sees the memory context as a constraint on angles to test. If prior winning creatives in the vertical used pain-point hooks, the memory-informed creative variant leads with pain. If prior failures used feature-listing copy, it avoids that. The creative output labels which variant is memory-informed so the operator can see exactly what the system learned from.
Both agents are instructed to keep simulated examples clearly unverified in their reasoning. The system can acknowledge that a local-artifact run suggests a direction without claiming it as proof.
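One plausible way to keep that boundary visible is to label every retrieved example when the context block is assembled, before any agent reads it. The label wording, the dictionary keys, and the `build_context` helper below are hypothetical.

```python
def build_context(examples: list[dict]) -> str:
    """Render retrieved examples as text, labelling each one's evidence level."""
    lines = []
    for e in examples:
        if e.get("is_real_performance_data"):
            label = "VERIFIED: real ad spend"
        else:
            label = "UNVERIFIED: simulated or local-artifact run"
        lines.append(f"[{label}] {e['vertical']}: {e['diagnostic_summary']}")
    return "\n".join(lines)
```

Putting the label in the text itself means the distinction is present in the prompt regardless of how each agent is instructed.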
The Cold Start Problem, Revisited
The memory store compounds with every real campaign. After ten campaigns it is useful. After a hundred it is a competitive asset. But on the first campaign, it is empty — and retrieval returns nothing.
The cold start is solved by seeding the store with structured documents derived from public industry benchmarks before the first campaign runs.
The strongest sources right now:
Benly.ai Q1 2026 benchmarks. Benly analyzed 1.1 million Meta ads across 17 industries and breaks down which hooks perform longest: Pain Point hooks tend to outperform Visual Intrigue at the 90-day mark, even though Visual Intrigue is more common. Video assets in B2B and Finance survive significantly longer than static. The median creative lifespan is 22 days overall, and much longer in Pets, Automotive, and Finance. These patterns are directly translatable into seeded documents.
AdAmigo and Triple Whale reports. These give concrete outcome numbers by vertical: Fitness at 14.29% conversion rate, Education at 13.58%, SaaS lower but with higher contract values. ROAS ranges by vertical, CPA benchmarks, what Advantage+ typically adds versus manual campaigns. These complement Benly by adding outcome-focused data rather than just creative patterns.
Meta Ad Library. Long-running ads are by definition winners — an ad that has been active for 90 days on Meta has survived creative fatigue testing at real spend levels. The Library lets you search by industry keyword and see which hooks, formats, and angles dominate. Reels versus Feed. Founder-voice UGC versus produced video. These patterns seed the creative memory with real competitive intelligence.
Google Performance Max case studies. Fewer but valuable: documented ROAS lifts by vertical, what audience signals and creative feeds produced the gains. These give the store labelled success examples for Google campaigns.
The seeding workflow: pick 5–7 target verticals, create 3–5 structured documents per vertical from the sources above, tag each with is_real_performance_data: false and the source name, embed them with the deterministic local vector. The first campaign in any priority vertical retrieves relevant industry intelligence rather than starting completely blind.
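The seeding step could be sketched as a small factory that turns one benchmark finding into a store-ready document. The `make_seed_document` name, the field names, and the hash-derived ID are assumptions; `embed_fn` stands in for whatever deterministic local embedding the store uses.

```python
import hashlib
from typing import Callable


def make_seed_document(
    vertical: str,
    platform: str,
    source: str,
    summary: str,
    embed_fn: Callable[[str], dict],
) -> dict:
    """Build one seeded record, always tagged as not-real performance data."""
    # A content hash gives a stable ID, so re-running the seeding is idempotent.
    digest = hashlib.sha256(
        f"{source}|{vertical}|{summary}".encode("utf-8")
    ).hexdigest()[:12]
    return {
        "campaign_id": f"seed-{digest}",
        "vertical": vertical,
        "platform": platform,
        "source": source,
        "diagnostic_summary": summary,
        "embedding": embed_fn(summary),
        "is_real_performance_data": False,
    }
```

For example, the Benly finding that video assets in B2B survive significantly longer than static would become a single document with `source` set to the benchmark's name, and seeding the same finding twice would overwrite the record instead of duplicating it.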
The source flag is not cosmetic. Agents are instructed to treat seeded documents as strong priors — useful for direction — but not as verified proof that the approach will work in this specific account at this specific budget. The system will tell you what the industry tends to do. It cannot tell you what your specific creative and audience will do until you run it. That distinction is kept explicit in every agent output that references a seeded document.
What This Means for the Competitive Position
CampaignForge AI is not a dashboard. It does not manage campaigns you have already launched. It creates them from scratch, with human approval at every gate, and improves with every outcome.
The platform we identified as holding the campaignforge.ai domain describes a three-step process: connect your accounts, their AI analyzes performance, you implement the recommendations. Their step three is "Implement AI recommendations and scale your campaigns with confidence."
The operator implements. The AI advises.
In CampaignForge, the system builds and launches the campaign. The operator approves. What the Performance Memory RAG adds is that the system now does both with the accumulated knowledge of every campaign that came before it — including seeded knowledge from the 1.1 million Meta ads analyzed in Benly's Q1 2026 benchmarks.
The flywheel is the moat. A smarter base model makes each agent better. It cannot replicate 500 real campaign outcomes with verified ROAS numbers, each one feeding the next brief. That dataset belongs to whoever built and runs the system.
The Honest State After Chapter 11
What got built:
- campaign_performance_memory SQLite table with full outcome schema including source flag, embedding, winning elements, failure reasons, and diagnostic summary
- performance_rag_retriever node: runs after GATE-4 and before Strategist, re-runs on approved GATE-7 rework, outputs performance_rag_context
- Retrieval ranking: vector similarity + metadata boosts for vertical match, budget band match, ROAS tier, real vs simulated source
- Strategist: memory-informed KPI targets and A/B variant when examples exist
- Creative: memory-informed ad variant, clearly labelled, no leaking of private campaign details
- Agent 06: upserts performance report to memory after every completed diagnosis, non-blocking on failure
- Status page: shows retrieved performance memory context
- PRD: updated with RAG architecture, self-improvement loop, cold start seeding strategy, and new self-improvement KPIs
- Cold start seeding strategy documented: Benly.ai, AdAmigo, Triple Whale, Meta Ad Library, Google PMax studies
- Test suite: 130 focused, 275 full, 5 RAG-only — all passing
What did not change:
- Local-first operation: no new external dependencies, no Chroma, no cloud vector store
- The Phase 2 swap path: performance_rag.py is structured so the vector layer is replaceable without touching agent contracts
- All 7 human approval gates: the retriever adds context before strategy; it does not modify the approval gate sequence
- The real vs simulated data boundary: existing test runs are tagged is_real_performance_data: false and ranked below real outcomes
What is not yet done:
- The store is not seeded. The documents from Benly.ai, AdAmigo, Triple Whale, and the Meta Ad Library need to be structured and embedded. The first campaign will retrieve nothing until seeding is completed.
- The Meta Ads API connection is still the missing piece. The retriever is ready. The feedback write is ready. The loop has no real data to close until a live campaign runs.
What Comes Next
Two things in parallel.
The seeding work can happen now, without a live campaign. Five to seven verticals, three to five documents each, built from the public benchmark sources. This is the difference between the first real campaign starting cold and starting with creative intelligence from a 17-industry benchmark set already in memory.
The Meta Ads API connection is the other unlock. The full pipeline — brief intake through performance diagnosis and memory write — is ready for real data. The retriever, the feedback loop, and the memory store are all operational. What produces the data is a live campaign.
Both can move at the same time. Seeding gives the system something to retrieve. A real campaign gives the system its first real outcome to store. After that run, the flywheel has real data. The one after that retrieves it.
That is what the system was built to do.