Why we rebuilt the blog engine on Spark (and what we learned)

On April 26, we discovered five near-identical "HubSpot SEO Coach" drafts sitting in the same agent run. No dedup. No safeguards. Just five versions of the same post, each one slightly different, none of them good.

That was the moment we knew the old blog system had to go.

The problems were real and layered. Image generation wasn't occasionally broken; it was systematically broken. We shipped posts with missing visuals, off-brand mockups, and images that had nothing to do with the content. On top of that, the agent had zero semantic deduplication, so it could spin up the same idea multiple times in a single run, and we wouldn't catch it until someone manually reviewed the drafts folder.

But there was a bigger issue underneath the bugs: the blog wasn't aligned with what Media Garcia actually is anymore.

We'd been running a generic HubSpot partner playbook: pillar content rotations, SEO templates, evergreen how-tos. Useful, maybe, but not us. What we actually do is build AI agents and solve real RevOps problems for clients. We fail, we iterate, we ship again. That's the story worth telling. We decided the blog needed to become a true lab notebook: posts grounded in what we shipped that week, written in the voice of engineers talking to peers, not marketers writing to an imaginary broad audience.

That meant rearchitecting the entire pipeline.

The new stack

We rebuilt the content engine on our own infrastructure with a four-tier source architecture. The primary feed comes from our memory layer's observations tagged as feature, discovery, or decision: anything that represents actual work we did. Those get scored by a simple formula: fact density times narrative length times recency. That's Tier A.
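A minimal sketch of the Tier A scoring, in Python since the engine already runs in a Python environment. The three factors come from the formula above, but how each is measured is an assumption here: fact density is facts per word, narrative length is a log-scaled word count (a raw word count would simply cancel the density term), and recency is an exponential decay with a hypothetical seven-day half-life. `Observation` and `tier_a_score` are illustrative names, not our production code.

```python
import math
from dataclasses import dataclass
from datetime import datetime
from typing import List

# Observation types that count as Tier A material.
TIER_A_TYPES = {"feature", "discovery", "decision"}


@dataclass
class Observation:
    obs_type: str          # "feature", "discovery", "decision", or anything else
    facts: List[str]       # discrete factual statements pulled from the observation
    narrative: str         # free-text narrative body
    created_at: datetime   # compare like with like (all naive or all tz-aware)


def recency(created_at: datetime, now: datetime, half_life_days: float = 7.0) -> float:
    """Exponential decay: 1.0 for a brand-new observation, 0.5 after one half-life."""
    age_days = max((now - created_at).total_seconds() / 86400, 0.0)
    return 0.5 ** (age_days / half_life_days)


def tier_a_score(obs: Observation, now: datetime) -> float:
    """Fact density x narrative length x recency, using assumed definitions of each term."""
    if obs.obs_type not in TIER_A_TYPES:
        return 0.0
    words = len(obs.narrative.split())
    if words == 0:
        return 0.0
    fact_density = len(obs.facts) / words   # facts per word
    narrative_length = math.log1p(words)    # log-scaled so length helps without swamping density
    return fact_density * narrative_length * recency(obs.created_at, now)
```

The half-life is the knob that decides how quickly last month's work stops competing with this week's.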

Tier B pulls from closed tasks in our tracker (title only, to avoid leaking anything sensitive). Tier C uses git commit logs as validation evidence for shipped claims. Tier D is the fallback: the old pillar rotation, but only when Tiers A-C don't yield anything worth writing about.
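The tier ordering reads naturally as a fallback chain. This sketch assumes each tier is a callable returning `(candidate, score)` pairs and that a single hypothetical `min_score` decides whether a candidate is "worth writing about." It treats Tiers A through C alike as candidate sources, which flattens Tier C's role as validation evidence, and it takes the last tier (the pillar rotation) unconditionally if everything above it comes up short.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# A tier is (name, fetch); fetch() returns (candidate, score) pairs.
Tier = Tuple[str, Callable[[], Iterable[Tuple[str, float]]]]


def pick_source(tiers: List[Tier], min_score: float) -> Optional[Tuple[str, str]]:
    """Return (tier_name, candidate) from the first tier with a candidate worth writing
    about. The last tier (the pillar rotation) is used unconditionally if reached."""
    for i, (name, fetch) in enumerate(tiers):
        candidates = list(fetch())
        if not candidates:
            continue
        best, score = max(candidates, key=lambda pair: pair[1])
        is_fallback = i == len(tiers) - 1
        if score >= min_score or is_fallback:
            return name, best
    return None
```

Keeping the fallback as "last in the list" rather than a special case means adding or reordering tiers doesn't touch the selection logic.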

We also built a deduplication ledger backed by a vector store that tracks every published post as an embedding. When the drafter finishes a new post, it runs a cosine-similarity check against every post from the last 90 days. If the similarity to any of them is above 0.85, the new post gets blocked. There's also a hard cooldown: if a source artifact has already been used, it can't be used again. That solves the repetition problem at the source level.
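A rough sketch of the ledger check, under a few assumptions: embeddings come from whatever model writes to the vector store, and an in-memory list stands in for the store itself so the example runs on its own. The 0.85 threshold, the 90-day window, and the source-reuse block come from the description above; `DedupLedger` and `LedgerEntry` are hypothetical names.

```python
import math
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Tuple

SIMILARITY_THRESHOLD = 0.85        # block anything above this
LOOKBACK = timedelta(days=90)      # only compare against recent posts


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class LedgerEntry:
    slug: str
    source_id: str            # the artifact the post was drafted from
    embedding: List[float]
    published_at: datetime


@dataclass
class DedupLedger:
    # In-memory stand-in for the vector store.
    entries: List[LedgerEntry] = field(default_factory=list)

    def check(self, embedding: List[float], source_id: str, now: datetime) -> Tuple[bool, str]:
        """Return (allowed, reason) for a finished draft."""
        # Hard rule: a source artifact is never reused, however old the earlier post is.
        if any(e.source_id == source_id for e in self.entries):
            return False, f"source {source_id} already used"
        cutoff = now - LOOKBACK
        for e in self.entries:
            if e.published_at < cutoff:
                continue
            sim = cosine(embedding, e.embedding)
            if sim > SIMILARITY_THRESHOLD:
                return False, f"too close to {e.slug} (cosine {sim:.2f})"
        return True, "ok"

    def record(self, entry: LedgerEntry) -> None:
        self.entries.append(entry)
```

A real ledger would push the similarity search into the vector store instead of scanning in Python; the linear scan just keeps the two rules visible.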

The drafter runs on a local open-weights model, with a hosted model as fallback. Every draft is structured: title, slug, dek, markdown body with inline source links, image type, and an array of citation facts. Every claim must trace back to the source artifact. No invented details. No "we believe" without evidence.
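The structured output is what makes the "every claim traces back" rule checkable. A hedged sketch of a validator: the field names mirror the list above, while the allowed `image_type` values (based on the image kinds described later in this post), the slug pattern, and the rule that every inline link needs a matching citation are assumptions about how such a check could work. Model selection (local first, hosted fallback) happens upstream and isn't shown.

```python
import re
from dataclasses import dataclass
from typing import List, Set

SLUG_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")
# Assumed set, based on the image kinds described in the images section.
IMAGE_TYPES = {"screenshot", "diagram", "quote_card"}
# Inline markdown links: [text](url)
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


@dataclass
class Citation:
    fact: str          # the claim as stated in the post
    source_url: str    # where in the source artifact it comes from


@dataclass
class Draft:
    title: str
    slug: str
    dek: str
    body_markdown: str
    image_type: str
    citations: List[Citation]


def validate_draft(draft: Draft, source_urls: Set[str]) -> List[str]:
    """Return a list of problems; an empty list means the draft is structurally valid."""
    errors = []
    for name in ("title", "slug", "dek", "body_markdown"):
        if not getattr(draft, name).strip():
            errors.append(f"missing {name}")
    if not SLUG_RE.match(draft.slug):
        errors.append(f"invalid slug: {draft.slug!r}")
    if draft.image_type not in IMAGE_TYPES:
        errors.append(f"unknown image_type: {draft.image_type!r}")
    if not draft.citations:
        errors.append("no citation facts")
    cited = {c.source_url for c in draft.citations}
    for c in draft.citations:
        if c.source_url not in source_urls:
            errors.append(f"citation outside source artifact: {c.source_url}")
    for url in LINK_RE.findall(draft.body_markdown):
        if url not in cited:
            errors.append(f"inline link without a citation: {url}")
    return errors
```

Returning a list of problems instead of raising on the first one gives the reviewer everything to send back in a single pass.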

Voice and review

We knew garbage output was possible, so we built an adversarial review loop. Each draft goes through explicit rubric checks: Does it match our dictation samples in voice and rhythm? Does it contain bot-talk ("delve," "leverage," "in conclusion," "seamlessly")? Are all factual claims cited? Does it fit the lab-notebook brand? Is it AEO-friendly (specific numbers, dates, tool names)?
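Most of the rubric needs a model to judge, but the bot-talk check is mechanical. A minimal sketch, assuming a plain case-insensitive phrase match: the phrase list is the four examples from the rubric, the `\w*` suffix catches forms like "leverages" or "delved" but not "leveraging", and `bot_talk_hits` is a hypothetical helper whose output could feed the structured critique.

```python
import re

# The examples named in the rubric; a real list would likely be longer.
BOT_TALK = ["delve", "leverage", "in conclusion", "seamlessly"]


def _pattern(phrase: str):
    # Allow any whitespace between words and simple suffixes ("delves", "leveraged").
    words = r"\s+".join(re.escape(w) for w in phrase.split())
    return re.compile(r"\b" + words + r"\w*", re.IGNORECASE)


PATTERNS = {p: _pattern(p) for p in BOT_TALK}


def bot_talk_hits(text: str) -> dict:
    """Return {phrase: count} for every banned phrase found in the text."""
    hits = {}
    for phrase, pattern in PATTERNS.items():
        count = len(pattern.findall(text))
        if count:
            hits[phrase] = count
    return hits
```

Each non-empty result can be turned into a critique line such as "bot-talk: leverage (2)" for the drafter.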

If it fails, the reviewer sends back structured critique, and the drafter gets up to three revision iterations to fix it. We store the critique trail as draft metadata so we can learn what actually works.
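The revision loop can be sketched as a bounded retry. This assumes the drafter, reviewer, and reviser are plain callables (hypothetical `draft_fn`, `review_fn`, `revise_fn`) and reads "up to three revision iterations" as one initial draft plus at most three rewrites, so at most four reviews. Every review lands in the returned trail, which is what gets stored as draft metadata.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

MAX_REVISIONS = 3


@dataclass
class ReviewResult:
    passed: bool
    critique: List[str] = field(default_factory=list)   # structured notes for the drafter


def draft_with_review(
    source: Any,
    draft_fn: Callable[[Any], Any],
    review_fn: Callable[[Any], ReviewResult],
    revise_fn: Callable[[Any, List[str]], Any],
    max_revisions: int = MAX_REVISIONS,
) -> Tuple[Any, List[Dict[str, Any]], bool]:
    """Draft once, then revise until the reviewer passes it or revisions run out.
    Returns (final_draft, critique_trail, passed)."""
    draft = draft_fn(source)
    trail: List[Dict[str, Any]] = []
    for attempt in range(max_revisions + 1):
        result = review_fn(draft)
        trail.append({"attempt": attempt, "passed": result.passed,
                      "critique": list(result.critique)})
        if result.passed:
            return draft, trail, True
        if attempt < max_revisions:
            draft = revise_fn(draft, result.critique)
    return draft, trail, False
```

A draft that never passes still comes back with its full trail, so the reasons travel with it to whatever happens next.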

Images: a pragmatic approach

The old image generator was too broken to save, so we deleted it. For screenshots of shipped work, we now use manual placeholder markup until a redaction pipeline is running, so a human attaches the actual screenshot after a privacy review. For diagrams and quote cards, we pull from brand templates via a design integration.

Deployment and migration

The new engine runs on our own infrastructure, fired daily by a scheduled job. We reused the existing Python environment and lockfile concurrency patterns from the old setup, so no surprises around dependency management.
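Our exact lockfile code isn't shown here, so treat this as one common pattern rather than a copy of it: a non-blocking `fcntl.flock` on a lock file, so that if the previous scheduled run is somehow still going, the new one exits instead of stacking a second drafter on top. `fcntl` is Unix-only, and the lock path and the `run_daily` entry point are hypothetical.

```python
import fcntl
import os
import sys
import tempfile
from contextlib import contextmanager

# Hypothetical lock location.
DEFAULT_LOCK = os.path.join(tempfile.gettempdir(), "blog-engine.lock")


@contextmanager
def single_instance(lock_path: str):
    """Yield True if we got an exclusive lock on lock_path, False if another run holds it."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    acquired = False
    try:
        try:
            # LOCK_NB: fail immediately instead of waiting for the other run to finish.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            acquired = True
        except BlockingIOError:
            pass
        yield acquired
    finally:
        if acquired:
            fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def run_daily(lock_path: str = DEFAULT_LOCK) -> int:
    """Entry point for the scheduled job; returns a process exit code."""
    with single_instance(lock_path) as acquired:
        if not acquired:
            print("previous run still active; skipping", file=sys.stderr)
            return 1
        # ... select source, draft, review, and file to the CMS here ...
        return 0
```

The scheduler would call `run_daily()` and use its return value as the exit code. Because the kernel drops a `flock` lock when the holding process dies, a crashed run doesn't leave a stale lock behind.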

Every draft lands in our CMS with DRAFT status and a lab-notebook tag. Nothing auto-publishes. We ran the old agent and the new one in parallel for a seven-day stability window before killing the legacy code.

Once the drafter files a post to the CMS, it writes a memory observation about itself: what it tried, what the source was, how the review loop performed. That closes the loop: the engine documents its own output, so we have a record of why each post exists.
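A sketch of what that self-observation might look like, with hypothetical field names. One design choice worth flagging: the type tag is deliberately kept out of the Tier A set (feature, discovery, decision), on the assumption that the engine's notes about its own runs shouldn't be scored as material for future posts.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional

# Hypothetical observation type; deliberately NOT "feature", "discovery", or "decision",
# so the engine's notes about itself are not picked up as Tier A sources.
SELF_OBS_TYPE = "blog_engine_run"


def build_self_observation(slug: str, source_id: str, source_tier: str,
                           review_trail: List[Dict[str, Any]],
                           now: Optional[datetime] = None) -> Dict[str, Any]:
    now = now or datetime.now(timezone.utc)
    final = review_trail[-1] if review_trail else {}
    return {
        "type": SELF_OBS_TYPE,
        "created_at": now.isoformat(),
        "post_slug": slug,                                  # what it tried
        "source": {"id": source_id, "tier": source_tier},   # where the idea came from
        "review": {                                         # how the review loop performed
            "reviews": len(review_trail),
            "revisions": max(len(review_trail) - 1, 0),
            "passed": bool(final.get("passed", False)),
            "last_critique": final.get("critique", []),
        },
    }
```

Serializing with `json.dumps` gives a record that can be written back into the memory layer alongside everything else.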

What's next

Phase 1 is the blog spine: what we just shipped. Phase 2 adds automated screenshot redaction (OCR-based) to mask logos, emails, and customer names. Phase 3 atomizes blog posts into social threads, weekly newsletter recaps, and sales talk-track snippets. Phase 4 wires up distribution automation and feedback loops so we can track how posts perform and retune the prompts.

The core insight driving all of this: if you want to stop shipping repetitive, generic content, you have to make repetition and genericism impossible by design. A dedup ledger stops you from writing the same post twice. A citation requirement stops you from inventing facts. A voice rubric stops you from sounding like an AI. An adversarial review loop catches what the rules miss.

We're not there yet. But we're shipping the engine that will get us there.
