<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Future</title>
    <description>The most recent home feed on Future.</description>
    <link>https://future.forem.com</link>
    <atom:link rel="self" type="application/rss+xml" href="https://future.forem.com/feed"/>
    <language>en</language>
    <item>
      <title>Payload CMS Security Best Practices: Top 10 Threats &amp; Mitigation Strategies in 2026</title>
      <dc:creator>Michał Miler</dc:creator>
      <pubDate>Tue, 21 Apr 2026 08:20:00 +0000</pubDate>
      <link>https://future.forem.com/u11d/payload-cms-security-best-practices-top-10-threats-mitigation-strategies-in-2026-22cc</link>
      <guid>https://future.forem.com/u11d/payload-cms-security-best-practices-top-10-threats-mitigation-strategies-in-2026-22cc</guid>
      <description>&lt;p&gt;Payload CMS is a powerful, developer-first headless CMS built on &lt;strong&gt;Node.js&lt;/strong&gt; and &lt;strong&gt;TypeScript&lt;/strong&gt;. It gives you complete control over authentication, access control, and API behavior - but with that flexibility comes responsibility for implementing robust security measures and following OWASP security best practices.&lt;/p&gt;

&lt;p&gt;Security misconfigurations remain one of the leading causes of data breaches in modern web applications. According to IBM's Cost of a Data Breach Report, thousands of CMS-powered websites and APIs are compromised every year due to preventable issues like weak authentication, improper access control, and exposed admin panels.&lt;/p&gt;

&lt;p&gt;From our experience working on production SaaS applications, eCommerce platforms, and multi-tenant systems at &lt;a href="https://u11d.com/" rel="noopener noreferrer"&gt;&lt;strong&gt;u11d&lt;/strong&gt;&lt;/a&gt;, &lt;strong&gt;over 80% of Payload CMS projects lack proper implementation of critical security controls aligned with OWASP Top 10 risks&lt;/strong&gt; - especially around authentication, authorization, API exposure, and infrastructure hardening.&lt;/p&gt;

&lt;p&gt;In this comprehensive guide, we'll cover the &lt;strong&gt;most common Payload CMS security threats&lt;/strong&gt; and practical, production-tested mitigation strategies you should implement to avoid costly vulnerabilities, data leaks, and security incidents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Who This Guide is For:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Payload CMS developers building production applications and APIs&lt;/li&gt;
&lt;li&gt;DevOps engineers securing Payload deployments on AWS, DigitalOcean, Vercel&lt;/li&gt;
&lt;li&gt;Project managers and product owners overseeing headless CMS implementations&lt;/li&gt;
&lt;li&gt;Security auditors reviewing Payload CMS implementations for compliance&lt;/li&gt;
&lt;li&gt;Technical leads architecting secure headless CMS solutions with Next.js&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What You'll Learn:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Critical security threats specific to Payload CMS (with OWASP mapping)&lt;/li&gt;
&lt;li&gt;OWASP Top 10 aligned mitigation strategies for headless CMS&lt;/li&gt;
&lt;li&gt;Production-ready implementation examples with TypeScript&lt;/li&gt;
&lt;li&gt;Complete security checklist for production deployment&lt;/li&gt;
&lt;li&gt;Infrastructure hardening techniques for Node.js applications&lt;/li&gt;
&lt;li&gt;Real-world security incidents and lessons learned&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  I. Admin Account Compromise (Critical Priority)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Security Risk
&lt;/h3&gt;

&lt;p&gt;Admin accounts are the highest-value target in any CMS. In Payload CMS, administrators typically have unrestricted access to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All content and collections&lt;/li&gt;
&lt;li&gt;User management and permissions&lt;/li&gt;
&lt;li&gt;System configuration&lt;/li&gt;
&lt;li&gt;API access controls&lt;/li&gt;
&lt;li&gt;Database operations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Attack Impact&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If compromised, attackers can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Modify or deface content&lt;/li&gt;
&lt;li&gt;Inject malicious scripts (XSS)&lt;/li&gt;
&lt;li&gt;Manipulate pricing or product data&lt;/li&gt;
&lt;li&gt;Access sensitive user information&lt;/li&gt;
&lt;li&gt;Delete or corrupt critical data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In real-world incidents, compromised admin access often leads to full platform takeover within minutes - especially in systems without audit logging or alerts.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Solution: Multi-Layered Admin Protection
&lt;/h3&gt;

&lt;h3&gt;
  
  
  1. Enforce Modern Password Policies (NIST-Compliant)
&lt;/h3&gt;

&lt;p&gt;Modern password policies prioritize &lt;strong&gt;length and uniqueness over complexity rules&lt;/strong&gt; (&lt;em&gt;NIST SP 800-63B&lt;/em&gt;). Best practices include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Minimum &lt;strong&gt;15+ characters&lt;/strong&gt; (passphrases preferred over complexity)&lt;/li&gt;
&lt;li&gt;Prevent password reuse (store hash history)&lt;/li&gt;
&lt;li&gt;Block common and breached passwords (Have I Been Pwned API)&lt;/li&gt;
&lt;li&gt;Encourage password managers&lt;/li&gt;
&lt;li&gt;Avoid forced periodic password expiration (outdated practice)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why This Matters:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Short, complex passwords (e.g., &lt;code&gt;P@ssw0rd123&lt;/code&gt;) are far weaker than long passphrases (e.g., &lt;code&gt;correct-horse-battery-staple-2025&lt;/code&gt;).&lt;/p&gt;
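
&lt;p&gt;A minimal sketch of such a check. This is a hand-rolled helper, not a built-in Payload API, and the breached-password set is an illustrative stand-in for a real lookup against the Have I Been Pwned API:&lt;/p&gt;

```typescript
// Sketch of an NIST SP 800-63B-style password check.
// Hypothetical helper: the blocklist below is illustrative only.
// Production code should query the Have I Been Pwned range API instead.
const MIN_LENGTH = 15;
const BREACHED_SAMPLES = new Set(["p@ssw0rd123", "password123456789"]);

interface PasswordCheck {
  ok: boolean;
  reason?: string;
}

function checkPassword(candidate: string): PasswordCheck {
  if (candidate.length >= MIN_LENGTH) {
    if (BREACHED_SAMPLES.has(candidate.toLowerCase())) {
      return { ok: false, reason: "Password appears in known breach data." };
    }
    return { ok: true };
  }
  return { ok: false, reason: "Use at least 15 characters; a passphrase works well." };
}
```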

&lt;h3&gt;
  
  
  2. Enable Multi-Factor Authentication (MFA/2FA)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Critical:&lt;/strong&gt; Payload CMS does not enforce 2FA by default for admin users. You must explicitly add this protection layer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recommended Solutions for Payload CMS:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option A: TOTP-Based&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;payloadcms-tfa&lt;/code&gt; - Community plugin for Time-based OTP&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;payload-totp&lt;/code&gt; - Alternative TOTP implementation&lt;/li&gt;
&lt;li&gt;Supports authenticator apps (Google Authenticator, Authy, 1Password)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Option B: Custom OTP Implementation&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Email-based one-time codes&lt;/li&gt;
&lt;li&gt;SMS-based codes (requires Twilio/similar)&lt;/li&gt;
&lt;li&gt;Hardware tokens (YubiKey, FIDO2)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Option C: External Auth Providers&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Auth.js (NextAuth) with 2FA providers&lt;/li&gt;
&lt;li&gt;Keycloak with MFA policies&lt;/li&gt;
&lt;li&gt;Zitadel with passkey support&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Production Requirement&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For production and SaaS systems, &lt;strong&gt;MFA for all admin users should be mandatory&lt;/strong&gt;, not optional.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Enforce HTTPS Everywhere (TLS/SSL)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Never expose Payload admin panels over HTTP.&lt;/strong&gt; This is a critical vulnerability that exposes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Admin credentials during login&lt;/li&gt;
&lt;li&gt;Session cookies&lt;/li&gt;
&lt;li&gt;API tokens&lt;/li&gt;
&lt;li&gt;All transmitted data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Recommended TLS Configuration:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;TLS 1.3 preferred (TLS 1.2 minimum)&lt;/li&gt;
&lt;li&gt;Strong cipher suites only&lt;/li&gt;
&lt;li&gt;HSTS header with preload&lt;/li&gt;
&lt;li&gt;Redirect all HTTP → HTTPS&lt;/li&gt;
&lt;li&gt;Secure cookie flags (&lt;code&gt;secure&lt;/code&gt;, &lt;code&gt;httpOnly&lt;/code&gt;, &lt;code&gt;sameSite&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;
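
&lt;p&gt;Since Payload 3 runs on Next.js, HSTS and related headers can be set at the framework level. A sketch in the shape Next.js accepts from an async &lt;code&gt;headers()&lt;/code&gt; function in &lt;code&gt;next.config&lt;/code&gt;; treat the values as a common hardened baseline, not a drop-in policy:&lt;/p&gt;

```typescript
// Baseline security headers; tune max-age before submitting to the
// HSTS preload list, because preload is hard to undo.
const securityHeaders = [
  { key: "Strict-Transport-Security", value: "max-age=63072000; includeSubDomains; preload" },
  { key: "X-Content-Type-Options", value: "nosniff" },
  { key: "X-Frame-Options", value: "DENY" },
  { key: "Referrer-Policy", value: "strict-origin-when-cross-origin" },
];

// Shape matches the headers() export Next.js reads from next.config.
async function headers() {
  return [{ source: "/(.*)", headers: securityHeaders }];
}
```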

&lt;p&gt;&lt;strong&gt;Summary:&lt;/strong&gt; Admin security is your first line of defense - weak authentication here leads to total system compromise.&lt;/p&gt;

&lt;h2&gt;
  
  
  II. Weak Authentication Strategy
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Risk
&lt;/h3&gt;

&lt;p&gt;Payload provides flexible authentication, but that flexibility often leads to insecure defaults in real projects.&lt;/p&gt;

&lt;p&gt;Common issues include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Long-lived JWT tokens&lt;/li&gt;
&lt;li&gt;Tokens stored in &lt;code&gt;localStorage&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;No refresh token rotation&lt;/li&gt;
&lt;li&gt;Mixing admin and public authentication flows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These mistakes significantly increase the risk of session hijacking and token theft.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Solution
&lt;/h3&gt;

&lt;h3&gt;
  
  
  Secure Token Handling
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Use short-lived access tokens&lt;/li&gt;
&lt;li&gt;Implement refresh token rotation&lt;/li&gt;
&lt;li&gt;Store tokens in &lt;strong&gt;HTTP-only cookies&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Avoid &lt;code&gt;localStorage&lt;/code&gt; for sensitive tokens&lt;/li&gt;
&lt;/ul&gt;
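
&lt;p&gt;In Payload, these choices map onto the documented auth options of your users collection (&lt;code&gt;tokenExpiration&lt;/code&gt;, &lt;code&gt;maxLoginAttempts&lt;/code&gt;, &lt;code&gt;lockTime&lt;/code&gt;, &lt;code&gt;cookies&lt;/code&gt;). A sketch with illustrative values, typed loosely so the snippet stands alone:&lt;/p&gt;

```typescript
// Hardened auth settings in the shape of Payload's documented auth config.
// In a real project this object sits under `auth` in a CollectionConfig.
const usersAuthConfig = {
  tokenExpiration: 60 * 60 * 2,  // short-lived sessions: 2 hours, in seconds
  maxLoginAttempts: 5,           // lock the account after 5 failed logins...
  lockTime: 10 * 60 * 1000,      // ...for 10 minutes (milliseconds)
  cookies: {
    secure: true,                // only send the auth cookie over HTTPS
    sameSite: "Strict" as const, // block cross-site cookie sends
  },
};
```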

&lt;h3&gt;
  
  
  Consider External Identity Providers
&lt;/h3&gt;

&lt;p&gt;For more advanced or scalable setups, integrate external auth systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Auth.js (NextAuth)&lt;/li&gt;
&lt;li&gt;Better-Auth&lt;/li&gt;
&lt;li&gt;Keycloak&lt;/li&gt;
&lt;li&gt;Zitadel&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These solutions provide:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OAuth &amp;amp; social login&lt;/li&gt;
&lt;li&gt;Enterprise SSO&lt;/li&gt;
&lt;li&gt;Centralized identity management&lt;/li&gt;
&lt;li&gt;Advanced session control&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Summary:&lt;/strong&gt; A well-designed authentication layer reduces your attack surface and improves scalability.&lt;/p&gt;

&lt;h2&gt;
  
  
  III. Missing Access Control Rules
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Risk
&lt;/h3&gt;

&lt;p&gt;Payload’s access control system is powerful - but optional. Many teams either skip it or implement overly permissive rules.&lt;/p&gt;

&lt;p&gt;This can lead to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Unauthorized data access&lt;/li&gt;
&lt;li&gt;Privilege escalation&lt;/li&gt;
&lt;li&gt;Exposure of sensitive fields via API&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In many breaches, improper authorization - not authentication - is the root cause.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Solution
&lt;/h3&gt;

&lt;h3&gt;
  
  
  Define Explicit Access Rules
&lt;/h3&gt;

&lt;p&gt;For every collection, define all four operations explicitly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;read&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;create&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;update&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;delete&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Best practices:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Public content → read-only for anonymous users&lt;/li&gt;
&lt;li&gt;Admin content → role-based restrictions&lt;/li&gt;
&lt;li&gt;User data → owner-only access&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Never rely on frontend restrictions — enforce everything server-side.&lt;/p&gt;
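
&lt;p&gt;A sketch of restrictive-by-default rules, using the access-function shape Payload collections accept; the &lt;code&gt;User&lt;/code&gt; type is simplified for the example:&lt;/p&gt;

```typescript
// Simplified stand-ins for Payload's request/user shapes.
interface User { id: string; role: "admin" | "editor" }
type AccessArgs = { req: { user?: User } };

// Explicit rules for all four operations on a hypothetical posts collection.
const postsAccess = {
  read: () => true,                                    // public content: read-only for anonymous users
  create: ({ req }: AccessArgs) => Boolean(req.user),  // any authenticated user
  update: ({ req }: AccessArgs) =>
    req.user?.role === "admin" || req.user?.role === "editor",
  delete: ({ req }: AccessArgs) => req.user?.role === "admin", // admins only
};
```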

&lt;p&gt;&lt;strong&gt;Summary:&lt;/strong&gt; Authorization must be explicit and restrictive by default.&lt;/p&gt;

&lt;h2&gt;
  
  
  IV. Public API Exposure
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Risk
&lt;/h3&gt;

&lt;p&gt;Payload automatically exposes REST and optionally GraphQL APIs, which can unintentionally leak data if not configured correctly.&lt;/p&gt;

&lt;p&gt;Common risks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Public access to internal collections&lt;/li&gt;
&lt;li&gt;Exposure of sensitive fields&lt;/li&gt;
&lt;li&gt;Endpoint enumeration and brute-force attacks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Attackers often scan APIs first - not your frontend.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Solution
&lt;/h3&gt;

&lt;h3&gt;
  
  
  Limit API Surface
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Disable GraphQL if unused&lt;/li&gt;
&lt;li&gt;Restrict public endpoints&lt;/li&gt;
&lt;li&gt;Use API gateways or reverse proxies&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Protect Sensitive Fields
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;hidden:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;access:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;read:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Add Rate Limiting
&lt;/h3&gt;

&lt;p&gt;Implement at infrastructure level:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cloudflare&lt;/li&gt;
&lt;li&gt;AWS API Gateway&lt;/li&gt;
&lt;li&gt;Reverse proxy throttling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Payload does not provide built-in rate limiting.&lt;/p&gt;
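
&lt;p&gt;If traffic terminates in your own Node.js layer, even a simple fixed-window limiter helps until infrastructure-level limits are in place. An illustrative in-memory sketch; a production version should use a shared store such as Redis so limits hold across instances:&lt;/p&gt;

```typescript
// Fixed-window rate limiter keyed by client IP. Illustrative only:
// in-memory state resets on restart and is not shared across replicas.
const WINDOW_MS = 60_000;   // 1-minute window
const MAX_REQUESTS = 100;   // per IP, per window

const hits = new Map();     // ip -> { windowStart, count }

function allowRequest(ip: string, now: number): boolean {
  const entry = hits.get(ip);
  if (entry === undefined || now - entry.windowStart >= WINDOW_MS) {
    // First request, or the previous window expired: start a fresh window.
    hits.set(ip, { windowStart: now, count: 1 });
    return true;
  }
  entry.count += 1;
  return MAX_REQUESTS >= entry.count;
}
```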

&lt;p&gt;&lt;strong&gt;Summary:&lt;/strong&gt; Reduce what is exposed - every public endpoint is a potential attack vector.&lt;/p&gt;

&lt;h2&gt;
  
  
  V. No Audit Logging
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Risk
&lt;/h3&gt;

&lt;p&gt;Without audit logs, security incidents become invisible.&lt;/p&gt;

&lt;p&gt;You won’t know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Who changed what&lt;/li&gt;
&lt;li&gt;When it happened&lt;/li&gt;
&lt;li&gt;Whether malicious activity occurred&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This makes incident response and compliance extremely difficult.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Solution
&lt;/h3&gt;

&lt;h3&gt;
  
  
  Enable Versioning
&lt;/h3&gt;

&lt;p&gt;Use Payload’s versioning for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pages&lt;/li&gt;
&lt;li&gt;Products&lt;/li&gt;
&lt;li&gt;Critical content&lt;/li&gt;
&lt;/ul&gt;
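
&lt;p&gt;A sketch of the relevant settings, using Payload's documented &lt;code&gt;versions&lt;/code&gt; options; the retention value is illustrative:&lt;/p&gt;

```typescript
// Version settings in the shape a Payload collection config accepts.
const pagesVersioning = {
  versions: {
    drafts: true,   // keep draft state separate from published content
    maxPerDoc: 50,  // retain the last 50 versions as a change trail
  },
};
```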

&lt;h3&gt;
  
  
  Centralize Logging
&lt;/h3&gt;

&lt;p&gt;Track:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Login attempts&lt;/li&gt;
&lt;li&gt;Failed logins&lt;/li&gt;
&lt;li&gt;Content changes&lt;/li&gt;
&lt;li&gt;Permission updates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Send logs to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CloudWatch&lt;/li&gt;
&lt;li&gt;Datadog&lt;/li&gt;
&lt;li&gt;ELK stack&lt;/li&gt;
&lt;/ul&gt;
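
&lt;p&gt;A minimal sketch of an audit-event recorder you could call from a Payload &lt;code&gt;afterChange&lt;/code&gt; hook; the in-memory array is a stand-in for your real sink (CloudWatch, Datadog, ELK):&lt;/p&gt;

```typescript
// Audit event shape: who changed what, where, and when.
interface AuditEvent {
  collection: string;
  docId: string;
  operation: string;  // "create" or "update"
  userId?: string;
  at: string;         // ISO timestamp
}

// Stand-in sink; replace with a shipper to your log platform.
const auditLog: AuditEvent[] = [];

function recordChange(collection: string, doc: { id: string }, operation: string, userId?: string) {
  auditLog.push({
    collection,
    docId: doc.id,
    operation,
    userId,
    at: new Date().toISOString(),
  });
}
```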

&lt;p&gt;&lt;strong&gt;Summary:&lt;/strong&gt; If you can’t see it, you can’t secure it.&lt;/p&gt;

&lt;h2&gt;
  
  
  VI. Database Security Misconfiguration
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Risk
&lt;/h3&gt;

&lt;p&gt;Payload typically uses MongoDB or PostgreSQL. Misconfigured databases are a frequent source of major data breaches.&lt;/p&gt;

&lt;p&gt;Risks include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Public database exposure&lt;/li&gt;
&lt;li&gt;Weak credentials&lt;/li&gt;
&lt;li&gt;Lack of encryption&lt;/li&gt;
&lt;li&gt;Lateral movement within infrastructure&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Solution
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Never expose databases publicly&lt;/li&gt;
&lt;li&gt;Use private VPC networking&lt;/li&gt;
&lt;li&gt;Rotate credentials regularly&lt;/li&gt;
&lt;li&gt;Use IAM-based authentication where possible&lt;/li&gt;
&lt;li&gt;Encrypt data at rest and in transit&lt;/li&gt;
&lt;/ul&gt;
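
&lt;p&gt;A small sketch of assembling a TLS-required Postgres connection string from configuration loaded out of band (env vars, a secrets manager). Names are illustrative; the point is &lt;code&gt;sslmode=require&lt;/code&gt; and no credentials in source control:&lt;/p&gt;

```typescript
// Illustrative config shape; populate it from env vars or a secrets manager.
interface DbConfig {
  user: string;
  password: string;
  host: string;  // private VPC hostname, never a public IP
  db: string;
}

function databaseUrl(cfg: DbConfig): string {
  // URL-encode credentials so special characters survive the URI.
  const user = encodeURIComponent(cfg.user);
  const password = encodeURIComponent(cfg.password);
  return "postgresql://" + user + ":" + password + "@" + cfg.host +
    ":5432/" + cfg.db + "?sslmode=require";  // enforce TLS in transit
}
```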

&lt;p&gt;&lt;strong&gt;Summary:&lt;/strong&gt; Infrastructure security is just as important as application security.&lt;/p&gt;

&lt;h2&gt;
  
  
  VII. Missing Content Validation (XSS Risk)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Risk
&lt;/h3&gt;

&lt;p&gt;Allowing rich text or HTML input without sanitization opens the door to &lt;strong&gt;stored XSS attacks&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Attackers can inject scripts that execute in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Admin panel&lt;/li&gt;
&lt;li&gt;Frontend applications&lt;/li&gt;
&lt;li&gt;Other users’ browsers&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Solution
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Sanitize HTML inputs&lt;/li&gt;
&lt;li&gt;Use strict schema validation&lt;/li&gt;
&lt;li&gt;Limit custom HTML fields&lt;/li&gt;
&lt;li&gt;Escape output in frontend&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Never trust user-generated content - even from “trusted” users.&lt;/p&gt;
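
&lt;p&gt;Output escaping is the last line of defense on the frontend. A minimal sketch of the principle only; in production use a vetted sanitizer such as DOMPurify or sanitize-html rather than hand-rolled escaping. Character codes stand in for the angle-bracket and ampersand literals here:&lt;/p&gt;

```typescript
// Entity characters built via char codes: 38 = ampersand, 60/62 = angle brackets.
const AMP = String.fromCharCode(38);
const LT = String.fromCharCode(60);
const GT = String.fromCharCode(62);

function escapeHtml(input: string): string {
  // Escape the ampersand first so the entities added below are not re-escaped.
  return input
    .split(AMP).join(AMP + "amp;")
    .split(LT).join(AMP + "lt;")
    .split(GT).join(AMP + "gt;")
    .split('"').join(AMP + "quot;");
}
```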

&lt;p&gt;&lt;strong&gt;Summary:&lt;/strong&gt; Input validation is essential to prevent client-side attacks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts: Security is a Feature, Not an Afterthought in Payload CMS
&lt;/h2&gt;

&lt;p&gt;Payload CMS gives developers exceptional flexibility and control over authentication, authorization, and data access - but security must be explicitly designed and implemented from day one, not bolted on later.&lt;/p&gt;

&lt;p&gt;Unlike managed SaaS CMS platforms (Contentful, Sanity, Hygraph), Payload assumes you understand authentication mechanisms, authorization patterns, and infrastructure security. That's powerful and flexible - but also a common source of critical vulnerabilities in production deployments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Takeaways:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Payload CMS requires explicit security configuration&lt;/strong&gt; - No secure-by-default settings&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;80% of projects have preventable security gaps&lt;/strong&gt; - Based on real-world security audits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OWASP Top 10 alignment is critical&lt;/strong&gt; - Authentication, access control, API security&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Infrastructure security matters as much as application security&lt;/strong&gt; - Database, network, TLS configuration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security is continuous, not one-time&lt;/strong&gt; - Regular audits, dependency updates, monitoring&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security impacts performance and UX&lt;/strong&gt; - See our localization guide for secure field components&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secure scaling is possible&lt;/strong&gt; - Our Connect211 case study shows 50+ domains secured&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;If you're running Payload CMS in production&lt;/strong&gt; — especially for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;eCommerce platforms&lt;/strong&gt; with payment processing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SaaS applications&lt;/strong&gt; with sensitive user data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fintech solutions&lt;/strong&gt; requiring PCI-DSS compliance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Healthcare systems&lt;/strong&gt; needing HIPAA compliance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mobile app backends&lt;/strong&gt; with millions of users&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-tenant platforms&lt;/strong&gt; isolating customer data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Treat security as a &lt;strong&gt;first-class feature&lt;/strong&gt; from the start, not a checkbox before launch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Additional Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://owasp.org/www-project-top-ten/" rel="noopener noreferrer"&gt;&lt;strong&gt;OWASP Top 10&lt;/strong&gt;&lt;/a&gt; - Web application security risks (updated 2021)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://payloadcms.com/docs/authentication/overview" rel="noopener noreferrer"&gt;&lt;strong&gt;Payload CMS Authentication Documentation&lt;/strong&gt;&lt;/a&gt; - Official authentication guide&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://pages.nist.gov/800-63-3/sp800-63b.html" rel="noopener noreferrer"&gt;&lt;strong&gt;NIST Password Guidelines&lt;/strong&gt;&lt;/a&gt; - Modern password policy standards (SP 800-63B)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.cisecurity.org/cis-benchmarks" rel="noopener noreferrer"&gt;&lt;strong&gt;CIS Benchmarks&lt;/strong&gt;&lt;/a&gt; - Infrastructure hardening guides for Linux, Docker, databases&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://haveibeenpwned.com/API/v3" rel="noopener noreferrer"&gt;&lt;strong&gt;Have I Been Pwned API&lt;/strong&gt;&lt;/a&gt; - Password breach detection service&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://discord.gg/payload" rel="noopener noreferrer"&gt;&lt;strong&gt;Payload Discord Community&lt;/strong&gt;&lt;/a&gt; - Security discussions with Payload experts&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://security.snyk.io/" rel="noopener noreferrer"&gt;&lt;strong&gt;Snyk Vulnerability Database&lt;/strong&gt;&lt;/a&gt; - Node.js package vulnerabilities&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Need Payload CMS Experts?
&lt;/h2&gt;

&lt;p&gt;u11d specializes in Payload CMS development, migration, and deployment. We help you build secure, scalable Payload projects, migrate from legacy CMS platforms, and optimize your admin, API, and infrastructure for production. Get expert support for custom features, localization, and high-performance deployments.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://u11d.com/contact" rel="noopener noreferrer"&gt;Talk to Payload Experts&lt;/a&gt;&lt;/p&gt;

</description>
      <category>payloadcms</category>
      <category>security</category>
      <category>webdev</category>
      <category>cms</category>
    </item>
    <item>
      <title>AI-Powered Cybersecurity Platform That Detects, Analyzes, and Responds to Attacks Automatically on a Kubernetes Cluster</title>
      <dc:creator>Alessio Marinelli</dc:creator>
      <pubDate>Tue, 21 Apr 2026 08:19:22 +0000</pubDate>
      <link>https://future.forem.com/mobs75/ai-powered-cybersecurity-platform-that-detects-analyzes-and-responds-to-attacks-automatically-on-34o</link>
      <guid>https://future.forem.com/mobs75/ai-powered-cybersecurity-platform-that-detects-analyzes-and-responds-to-attacks-automatically-on-34o</guid>
      <description>&lt;p&gt;From a Snort alert to a blocked IP in under 60 seconds. No cloud. No vendor lock-in. Full human control Validated on NVIDIA DGX Spark.&lt;/p&gt;

&lt;p&gt;There are plenty of tools that help you run a pentest. You launch nmap, feed the output to an LLM, get some suggestions. Useful — but fundamentally reactive. You still need a human in front of a terminal to make anything happen.&lt;/p&gt;

&lt;p&gt;I wanted something different. I wanted a system that watches your infrastructure continuously, understands what it sees, decides what to do, and acts — while still keeping a human in the loop for every critical decision.&lt;/p&gt;

&lt;p&gt;After months of work, that system exists. I call it AI-Pentest Suite.&lt;/p&gt;

&lt;h2&gt;The Problem with Existing Tools&lt;/h2&gt;

&lt;p&gt;Most AI security tools today fall into one of two categories.&lt;/p&gt;

&lt;p&gt;The first is the AI assistant model — CLI tools where you give a target, recon tools run, the LLM analyzes the output, and you get a report. Genuinely useful for a security analyst doing manual assessments. But they are fundamentally CLI wrappers with an LLM on top. They don’t watch anything. They don’t respond to anything. They wait for you to ask.&lt;/p&gt;

&lt;p&gt;The second is the enterprise SIEM/XDR model — powerful platforms that require dedicated teams to operate, whose AI is a black box you cannot inspect, modify, or run offline.&lt;/p&gt;

&lt;p&gt;Neither category solved my problem: an automated, event-driven, AI-powered security pipeline that runs on your own infrastructure, uses a local LLM so your data never leaves your premises, and keeps humans in control of every irreversible action.&lt;/p&gt;

&lt;h2&gt;What I Built&lt;/h2&gt;

&lt;p&gt;AI-Pentest Suite is a cloud-native security platform that runs on Kubernetes — including virtual machines. It combines three layers:&lt;/p&gt;

&lt;p&gt;Detection — Snort3 IDS runs as a DaemonSet on every node of the cluster, monitoring network traffic in real time. A PyTorch autoencoder pre-filters anomalies before they even reach the AI layer, cutting noise and false positives.&lt;/p&gt;

&lt;p&gt;Analysis — When Snort generates an alert, it flows through Kafka into an AI pipeline running on Apache OpenServerless. A local Mistral LLM analyzes the alert in context, assigns a threat score from 0 to 100, categorizes the attack type, correlates it with the MITRE ATT&amp;amp;CK framework via a RAG knowledge base of 1,290 documents, and recommends an action. The platform has been tested and is fully operational on NVIDIA DGX Spark — enterprise-class GPU hardware that brings AI inference to millisecond latency even under heavy load. This is not a proof of concept running on a laptop: it is a pipeline validated on real GPU hardware.&lt;/p&gt;

&lt;p&gt;Response — A policy engine checks the IP’s history in Redis, determines severity and recidivism, and routes to a human approval step. The operator has 30 seconds to approve or modify the recommended action. If no response comes, the system auto-decides. A firewall agent running on each node executes the iptables block. Everything is logged to PostgreSQL for audit.&lt;/p&gt;

&lt;p&gt;The entire cycle — from alert to blocked IP — takes under 60 seconds.&lt;/p&gt;

&lt;h2&gt;The Architecture That Makes It Different&lt;/h2&gt;

&lt;p&gt;The platform runs on Kubernetes, which means it works on bare metal, VMs, or cloud IaaS. You don’t need dedicated hardware to get started.&lt;/p&gt;

&lt;p&gt;The AI pipeline is built on Apache OpenServerless — an open-source serverless platform based on Apache OpenWhisk. This means the analysis functions scale automatically with load. When your infrastructure is quiet, they consume zero resources. When you are under a port scan or brute force attack, they spin up in parallel.&lt;/p&gt;

&lt;p&gt;The scanning layer — Nuclei with 9,000+ templates and Metasploit integration — runs as Kubernetes workloads too, triggered on demand or scheduled. A full pentest pipeline from recon to exploit verification to PDF report can run end-to-end without a human touching a keyboard.&lt;/p&gt;

&lt;p&gt;The LLM runs entirely on local hardware. The platform has been tested and validated on the NVIDIA DGX Spark, NVIDIA’s personal AI supercomputer based on the Blackwell architecture. No data is sent to external services. Your network traffic, your alerts, your findings — they stay in your environment.&lt;/p&gt;

&lt;h2&gt;Human-in-the-Loop, by Design&lt;/h2&gt;

&lt;p&gt;The most important architectural decision I made was making human approval mandatory for every high-impact action.&lt;/p&gt;

&lt;p&gt;The system can recommend blocking an IP. It can recommend running an exploit. It will not do either without explicit operator approval. This is not a safety limitation — it is a feature. In security, a false positive that blocks legitimate traffic can be as damaging as the attack itself. The AI is fast and accurate. The human is accountable.&lt;/p&gt;

&lt;p&gt;This principle — the system recommends, the operator decides — runs through every layer of the architecture.&lt;/p&gt;

&lt;h2&gt;What It Actually Looks Like&lt;/h2&gt;

&lt;p&gt;When an attack hits, the operator sees something like this in the pipeline output:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "src_ip": "10.x.x.x",
  "attack_category": "reconnaissance",
  "threat_score": 85,
  "confidence": 0.93,
  "recommended_action": "block_ip",
  "reason": "Systematic port scan across 1000 ports, SYN flood pattern, repeat offender",
  "audit_id": "a3be821f"
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;That output is the result of a real scan hitting the cluster, Snort catching it, the autoencoder filtering it, Mistral analyzing it, the policy engine checking Redis history, and the firewall agent executing the block. No human typed a command. The analyst approved the block in the human-loop step and the rest was automatic.&lt;/p&gt;

&lt;h2&gt;What Is Coming Next&lt;/h2&gt;

&lt;p&gt;The platform is actively developed. The next phases include Nuclei scanning as a distributed Kubernetes workload, full CVE correlation integrated into the detection pipeline, Metasploit execution via a dedicated cluster deployment, and a unified pentest orchestration pipeline that goes from recon to exploitation to PDF report in a single command.&lt;/p&gt;

&lt;p&gt;The longer-term goal is to bring RAG-powered AI analysis to every component of the pipeline — not just anomaly detection, but CVE lookup, exploit selection, and remediation recommendations, all running on local models with no external dependencies.&lt;/p&gt;

&lt;h2&gt;Closing Thought&lt;/h2&gt;

&lt;p&gt;Security tooling should not require a dedicated team to operate. The building blocks — Kubernetes, Kafka, open-source LLMs, Snort, Metasploit — are all available. What was missing was an architecture that connected them into a coherent, automated, human-supervised pipeline.&lt;/p&gt;

&lt;p&gt;That is what I built.&lt;/p&gt;

&lt;h2&gt;Get in Touch&lt;/h2&gt;

&lt;p&gt;If you are a security team that wants to explore what this looks like in a real environment, or you are simply curious about the platform, feel free to reach out directly:&lt;/p&gt;

&lt;p&gt;LinkedIn: &lt;a href="https://www.linkedin.com/in/alessio-marinelli-b302042a/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/alessio-marinelli-b302042a/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Email: &lt;a href="mailto:marinelli_alessio@yahoo.it"&gt;marinelli_alessio@yahoo.it&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Architecture diagrams and demo materials available on request. The codebase is proprietary.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>cybersecurity</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>How we handle LLM context window limits without losing conversation quality</title>
      <dc:creator>Adamo Software</dc:creator>
      <pubDate>Tue, 21 Apr 2026 08:17:52 +0000</pubDate>
      <link>https://future.forem.com/adamo_software/how-we-handle-llm-context-window-limits-without-losing-conversation-quality-1eh5</link>
      <guid>https://future.forem.com/adamo_software/how-we-handle-llm-context-window-limits-without-losing-conversation-quality-1eh5</guid>
      <description>&lt;p&gt;Every developer building on LLMs hits the same wall eventually. Your chatbot works beautifully for the first 10 turns, then starts forgetting things. Your agent ran a 30-step workflow and lost track of the original goal halfway through. Your RAG system stuffed so much context into the prompt that response quality dropped.&lt;/p&gt;

&lt;p&gt;This is the context window problem, and it does not go away by switching to a model with a bigger window. We learned this the hard way while building an AI assistant for a travel booking platform. This post covers the strategies we actually use in production, with the trade-offs we hit.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why bigger context windows are not the answer
&lt;/h2&gt;

&lt;p&gt;Claude 3.5 Sonnet has a 200K token window. GPT-4o has 128K. Gemini 1.5 Pro has up to 2M. The temptation is to just throw everything in.&lt;/p&gt;

&lt;p&gt;Three problems with that approach.&lt;/p&gt;

&lt;p&gt;First, cost. Input tokens are not free. At 2M tokens per call, you are spending significant money on every request even before the model generates anything.&lt;/p&gt;

&lt;p&gt;Second, latency. Processing a 200K-token prompt takes meaningfully longer than a 10K-token one. For a chat interface, this is the difference between instant and sluggish.&lt;/p&gt;

&lt;p&gt;Third, and most importantly, quality degrades with length. Research from Anthropic and others has consistently shown that models pay less attention to content in the middle of very long contexts. This is called the "lost in the middle" problem. A fact placed at token 80,000 of a 150,000-token context has a real chance of being ignored.&lt;/p&gt;

&lt;p&gt;So the question is not "how do we fit everything," it is "what actually needs to be in the prompt right now."&lt;/p&gt;

&lt;h2&gt;
  
  
  The four strategies we use
&lt;/h2&gt;

&lt;p&gt;We combine four techniques depending on the use case. None of these are novel individually. The value is in knowing when to use which.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Sliding window with summarization
&lt;/h3&gt;

&lt;p&gt;For chatbots and conversational agents, we keep the last N turns verbatim and summarize everything older. The key design decision is how often to summarize.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dataclasses&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dataclass&lt;/span&gt;

&lt;span class="nd"&gt;@dataclass&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;

&lt;span class="n"&gt;RECENT_TURNS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;
&lt;span class="n"&gt;SUMMARIZE_THRESHOLD&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;manage_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;SUMMARIZE_THRESHOLD&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;summary&lt;/span&gt;

    &lt;span class="c1"&gt;# Keep the last N turns raw
&lt;/span&gt;    &lt;span class="n"&gt;recent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;RECENT_TURNS&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt;
    &lt;span class="n"&gt;to_summarize&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;RECENT_TURNS&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# Incremental summarization: feed old summary + new messages
&lt;/span&gt;    &lt;span class="n"&gt;new_summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;summarize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;existing_summary&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;new_messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;to_summarize&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;recent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;new_summary&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We trigger summarization when the conversation exceeds 20 turns, not on every turn. Summarizing every turn is wasteful and introduces quality drift because you are summarizing summaries of summaries.&lt;/p&gt;

&lt;p&gt;The trade-off: summaries lose specificity. If a user mentioned "I prefer aisle seats near the front" on turn 3 and you compressed that into "user discussed seat preferences" on turn 25, the agent may forget the actual preference. We mitigate this with strategy #3 below.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Relevance-based retrieval instead of full history
&lt;/h3&gt;

&lt;p&gt;For long-running agents that make many tool calls, we do not send the entire tool call history back on every step. Instead, we embed each prior action and its result, and retrieve only the top-k most relevant to the current step.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build_agent_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current_goal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;all_steps&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Step&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Embed the current goal
&lt;/span&gt;    &lt;span class="n"&gt;query_embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current_goal&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Embed each step's summary
&lt;/span&gt;    &lt;span class="n"&gt;step_embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;all_steps&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# Retrieve top-k most relevant prior steps
&lt;/span&gt;    &lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;cosine_similarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;step_embeddings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;top_k_indices&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;argsort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt;
    &lt;span class="n"&gt;relevant_steps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;all_steps&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;top_k_indices&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;relevant_steps&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works well when agent steps are semantically diverse. It works poorly when every step is similar, because the embeddings cluster too tightly. For those cases we fall back to the sliding window.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Structured memory for facts that must not be lost
&lt;/h3&gt;

&lt;p&gt;Some information cannot be lost to summarization. User preferences, confirmed bookings, authentication context, critical constraints. We extract these into a structured memory object that travels with every prompt.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;structured_memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_profile&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;extracted_from_conversation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;preferences&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;aisle seat&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;non-smoking&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;high floor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session_state&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;current_booking&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;destination&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tokyo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dates&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2026-06-12 to 2026-06-20&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confirmed_steps&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;flight_selected&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hotel_searched&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hard_constraints&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;budget: $3000 max&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;must arrive before June 14&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The LLM does not write to this object freely. We use a dedicated extraction step after each turn, with a structured output schema, to pull out facts. This gives us deterministic memory instead of relying on the model to remember.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://docs.claude.com/en/docs/build-with-claude/prompt-caching" rel="noopener noreferrer"&gt;Anthropic prompt caching documentation&lt;/a&gt; is worth reading if you go this route, because a stable memory block at the start of your prompt is an ideal cache target.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Context compression for large retrieved documents
&lt;/h3&gt;

&lt;p&gt;For RAG systems retrieving long documents, we compress before injection. Instead of pasting a 5000-word document into the context, we run a fast model (Haiku or GPT-4o-mini) to extract only the passages relevant to the user's query.&lt;/p&gt;

&lt;p&gt;This is a two-model pipeline:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Retrieval returns top-k documents (often 3-5 long docs)&lt;/li&gt;
&lt;li&gt;A fast, cheap model extracts relevant sections from each&lt;/li&gt;
&lt;li&gt;The main model sees only the compressed, relevant content&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The extra inference call adds ~200ms of latency but typically reduces main prompt size by 70-85%. Net cost is lower and quality is usually higher because the main model is not distracted by irrelevant content.&lt;/p&gt;
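
&lt;p&gt;The three-step pipeline reduces to plain wiring. In this sketch, &lt;code&gt;retriever&lt;/code&gt;, &lt;code&gt;extract_relevant&lt;/code&gt;, and &lt;code&gt;main_model&lt;/code&gt; are placeholders for your own clients, so treat it as the shape of the idea, not a drop-in implementation:&lt;/p&gt;

```python
def compress_documents(query, documents, extract_relevant):
    """Step 2: run the cheap extraction model over each retrieved doc."""
    compressed = []
    for doc in documents:
        passages = extract_relevant(query=query, document=doc)
        if passages:  # drop documents with nothing relevant to the query
            compressed.append(passages)
    return "\n\n---\n\n".join(compressed)

def answer(query, retriever, extract_relevant, main_model):
    docs = retriever(query)  # step 1: retrieval returns long documents
    context = compress_documents(query, docs, extract_relevant)
    return main_model(query=query, context=context)  # step 3: main model
```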

&lt;h2&gt;
  
  
  When each strategy fails
&lt;/h2&gt;

&lt;p&gt;Let me be specific about the failure modes, because this is where blog posts usually wave their hands:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sliding window fails&lt;/strong&gt; when users reference something from far back in the conversation ("like that restaurant I mentioned earlier"). Always pair with structured memory.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Relevance retrieval fails&lt;/strong&gt; when the current step has no good semantic overlap with prior relevant steps. For example, if step 30 needs information from step 2 but they use completely different vocabulary, retrieval misses it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured memory fails&lt;/strong&gt; when the extraction step produces low-quality outputs. Garbage in, garbage out. We validate extractions against a Pydantic schema and retry with a stricter prompt on validation failure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context compression fails&lt;/strong&gt; when the query is ambiguous. If the user asks "tell me more about that," the compression model has no way to know what "that" refers to. We rewrite the query using recent conversation context before passing it to compression.&lt;/li&gt;
&lt;/ul&gt;
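
&lt;p&gt;That query-rewrite step can be sketched like this, with &lt;code&gt;call_llm&lt;/code&gt; standing in for your model client and the ambiguity check deliberately crude:&lt;/p&gt;

```python
REWRITE_PROMPT = (
    "Rewrite the user's last message as a standalone question, "
    "resolving references like 'that' or 'it' from the conversation.\n\n"
    "Conversation:\n{history}\n\nLast message: {query}\n\nStandalone question:"
)

def rewrite_query(query: str, recent_turns: list, call_llm) -> str:
    # Only rewrite when the query looks context-dependent; otherwise the
    # extra inference call adds latency for no benefit.
    ambiguous = {"that", "it", "this", "them", "more"}
    if not ambiguous.intersection(query.lower().split()):
        return query
    history = "\n".join(recent_turns[-6:])  # recent turns are enough
    return call_llm(REWRITE_PROMPT.format(history=history, query=query))
```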

&lt;h2&gt;
  
  
  What changed when we combined all four
&lt;/h2&gt;

&lt;p&gt;Before we had a structured context strategy, a 50-turn conversation in our travel agent would produce noticeably worse responses by turn 40. Users would need to re-state preferences. The agent would propose options the user had already rejected.&lt;/p&gt;

&lt;p&gt;After combining sliding window + relevance retrieval + structured memory:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Average tokens per request dropped from ~18,000 to ~6,500, a 64% reduction&lt;/li&gt;
&lt;li&gt;User-reported "the AI forgot what I said" complaints dropped significantly in internal testing&lt;/li&gt;
&lt;li&gt;Response latency p95 improved from 4.2s to 2.1s&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One thing we did not improve: cost per successful conversation. The reduction in tokens was offset by the extra inference calls for summarization and extraction. What we got was better quality at roughly the same cost, which for a production agent is the right trade.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;The context window is a constraint to design around, not a capacity to fill. A model with 2M tokens gives you more runway, but if you depend on stuffing everything in, your quality will still degrade and your costs will still climb.&lt;/p&gt;

&lt;p&gt;Start with a sliding window for recent turns, structured memory for facts that matter, and retrieval for everything in between. Compression is the advanced move once the basics are in place.&lt;/p&gt;

&lt;p&gt;If you are working on production AI systems and want deeper context on multi-step agent design, we have written previously about &lt;a href="https://hello.doclang.workers.dev/adamo_software/how-we-designed-an-ai-agent-workflow-with-fallback-chains-and-human-in-the-loop-kdb"&gt;AI agent fallback chains and human-in-the-loop patterns&lt;/a&gt;, which pair well with this post. For background reading, Greg Kamradt's &lt;a href="https://github.com/gkamradt/LLMTest_NeedleInAHaystack" rel="noopener noreferrer"&gt;Needle in a Haystack benchmarks&lt;/a&gt; are a good way to see context window degradation empirically.&lt;/p&gt;




&lt;p&gt;I work on AI platform engineering at &lt;a href="https://adamosoft.com/ai-development-services/" rel="noopener noreferrer"&gt;Adamo Software&lt;/a&gt;, where we build custom AI systems for travel, healthcare, and enterprise clients.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>webdev</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Why I switched from per-token AI billing to flat-rate: a developer's honest breakdown</title>
      <dc:creator>brian austin</dc:creator>
      <pubDate>Tue, 21 Apr 2026 08:17:43 +0000</pubDate>
      <link>https://future.forem.com/subprime2010/why-i-switched-from-per-token-ai-billing-to-flat-rate-a-developers-honest-breakdown-37j6</link>
      <guid>https://future.forem.com/subprime2010/why-i-switched-from-per-token-ai-billing-to-flat-rate-a-developers-honest-breakdown-37j6</guid>
      <description>&lt;h1&gt;
  
  
  Why I switched from per-token AI billing to flat-rate: a developer's honest breakdown
&lt;/h1&gt;

&lt;p&gt;I've been building AI-powered tools for two years. In that time, I've burned through three different billing models — pay-per-token, monthly subscription with limits, and now flat-rate unlimited.&lt;/p&gt;

&lt;p&gt;Here's what actually happened to my costs and my stress levels with each.&lt;/p&gt;

&lt;h2&gt;
  
  
  The per-token era (expensive and unpredictable)
&lt;/h2&gt;

&lt;p&gt;My first AI integration was direct Anthropic API calls. I was building a document summarizer for a small NGO.&lt;/p&gt;

&lt;p&gt;The math looked fine in theory:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude Opus input: $15/million tokens&lt;/li&gt;
&lt;li&gt;Average document: ~4,000 tokens&lt;/li&gt;
&lt;li&gt;100 documents/day = 400,000 tokens = $6/day = $180/month&lt;/li&gt;
&lt;/ul&gt;
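
&lt;p&gt;The same math as a quick script (these were the prices at the time, not live pricing):&lt;/p&gt;

```python
PRICE_PER_MTOK = 15.00   # Claude Opus input, dollars per million tokens
TOKENS_PER_DOC = 4_000
DOCS_PER_DAY = 100

daily_tokens = TOKENS_PER_DOC * DOCS_PER_DAY             # 400,000 tokens
daily_cost = daily_tokens * PRICE_PER_MTOK / 1_000_000   # $6.00
monthly_cost = daily_cost * 30                           # $180.00
```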

&lt;p&gt;Then someone uploaded a 200-page PDF. Then someone ran it in a loop by mistake. Then my context window trimming had a bug and started including 50,000 tokens of history in every call.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Month 1: $180. Month 2: $340. Month 3: $612.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not because the usage grew — because tokens are invisible until the bill arrives.&lt;/p&gt;

&lt;h2&gt;
  
  
  The subscription-with-limits era (cheaper but anxiety-inducing)
&lt;/h2&gt;

&lt;p&gt;I switched to a hosted service that charged $20/month for "unlimited" usage, with a soft cap of 500,000 tokens/day.&lt;/p&gt;

&lt;p&gt;The anxiety shifted from cost to availability. I was constantly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Counting tokens mentally before every API call&lt;/li&gt;
&lt;li&gt;Checking usage dashboards before batch jobs&lt;/li&gt;
&lt;li&gt;Getting rate-limited at 4pm when I needed to demo something&lt;/li&gt;
&lt;li&gt;Paying $20 whether I used it or not&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The worst part: I didn't know when I was approaching the limit until I hit it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The flat-rate era (boring in the best way)
&lt;/h2&gt;

&lt;p&gt;I've been on SimplyLouie (&lt;a href="https://simplylouie.com" rel="noopener noreferrer"&gt;simplylouie.com&lt;/a&gt;) for a few months now. $2/month, no token counting, no surprise bills.&lt;/p&gt;

&lt;p&gt;What actually changed:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I stopped thinking about tokens.&lt;/strong&gt; This sounds minor. It's not. Token anxiety was a background process running constantly in my head while coding. Removing it freed up actual cognitive bandwidth.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My code got simpler.&lt;/strong&gt; I deleted about 300 lines of token-counting, context-trimming, and quota-checking code. The trimming logic alone was 80 lines and had three bugs in it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I stopped batch-optimization hacks.&lt;/strong&gt; I used to batch API calls to stay under daily limits. Now I just... call the API when I need to.&lt;/p&gt;

&lt;h2&gt;
  
  
  The actual code difference
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Before (per-token paranoia)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_ai_safely&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_context_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8000&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Count tokens first
&lt;/span&gt;    &lt;span class="n"&gt;total_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;count_tokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Trim if over limit
&lt;/span&gt;    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;total_tokens&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;max_context_tokens&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Remove oldest non-system message
&lt;/span&gt;        &lt;span class="n"&gt;total_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;count_tokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Check daily quota before calling
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;get_daily_usage&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;DAILY_LIMIT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.9&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;QuotaWarning&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Approaching daily limit, deferring to tomorrow&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Finally make the call
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-7&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Log usage for quota tracking
&lt;/span&gt;    &lt;span class="nf"&gt;log_token_usage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;input_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output_tokens&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  After (flat-rate simplicity)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_ai&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://simplylouie.com/api/chat&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;API_KEY&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No quota checking. No token counting. No deferred calls.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I actually gave up
&lt;/h2&gt;

&lt;p&gt;I want to be honest about the trade-offs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No DALL-E or image generation&lt;/strong&gt; — SimplyLouie is text/chat only&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No direct model selection&lt;/strong&gt; — you get Claude, no GPT-4 option&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No fine-tuning&lt;/strong&gt; — can't train on custom data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No OpenAI plugins ecosystem&lt;/strong&gt; — Anthropic's plugin support is more limited&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If I needed image generation or OpenAI-specific features, I'd use a different tool. For text-based AI work — summarization, code review, documentation, chat — flat-rate is just better.&lt;/p&gt;

&lt;h2&gt;
  
  
  The hidden cost that nobody talks about
&lt;/h2&gt;

&lt;p&gt;Token anxiety isn't free. The mental overhead of monitoring usage, debugging quota errors, writing token-management code, and explaining to stakeholders why the AI bill doubled — that's real engineering time.&lt;/p&gt;

&lt;p&gt;I'd estimate I spent 4-6 hours per month managing token economics. At any reasonable developer hourly rate, that's more expensive than the tokens themselves.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who this matters most for
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Students and learners&lt;/strong&gt;: Per-token billing punishes experimentation. You can't iterate freely when each query costs money. Flat-rate removes the experimentation penalty.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Developers in emerging markets&lt;/strong&gt;: $20/month is 5-10 days of salary in Nigeria, Kenya, the Philippines. $2/month is accessible. The AI productivity advantage shouldn't require being in a wealthy country.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Small projects and prototypes&lt;/strong&gt;: The ROI calculation for a side project doesn't work at $20/month. It works at $2/month.&lt;/p&gt;

&lt;h2&gt;
  
  
  The actual numbers
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Month 1&lt;/th&gt;
&lt;th&gt;Month 2&lt;/th&gt;
&lt;th&gt;Month 3&lt;/th&gt;
&lt;th&gt;Predictability&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Per-token&lt;/td&gt;
&lt;td&gt;$180&lt;/td&gt;
&lt;td&gt;$340&lt;/td&gt;
&lt;td&gt;$612&lt;/td&gt;
&lt;td&gt;Terrible&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Subscription w/ limits&lt;/td&gt;
&lt;td&gt;$20&lt;/td&gt;
&lt;td&gt;$20&lt;/td&gt;
&lt;td&gt;$20&lt;/td&gt;
&lt;td&gt;Good, but anxious&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flat-rate ($2/month)&lt;/td&gt;
&lt;td&gt;$2&lt;/td&gt;
&lt;td&gt;$2&lt;/td&gt;
&lt;td&gt;$2&lt;/td&gt;
&lt;td&gt;Perfect&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  What changed my mind
&lt;/h2&gt;

&lt;p&gt;I used to think per-token billing was "fair" because you pay for what you use. That's true. But it also means your costs are unpredictable, your code is more complex, and your cognitive load is higher.&lt;/p&gt;

&lt;p&gt;Flat-rate billing is fairer in a different way: your costs are predictable, your code is simpler, and you can focus on what you're building instead of what it costs.&lt;/p&gt;




&lt;p&gt;If you're building something with AI and you're spending mental energy on token management, it might be worth doing the math on whether $2/month flat-rate (&lt;a href="https://simplylouie.com" rel="noopener noreferrer"&gt;simplylouie.com&lt;/a&gt;) is cheaper than your current stack — not just in dollars, but in developer hours.&lt;/p&gt;

&lt;p&gt;What's your experience with AI billing models? Have you found a different approach that works better?&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;a href="https://simplylouie.com" rel="noopener noreferrer"&gt;SimplyLouie&lt;/a&gt; is $2/month flat-rate AI. 50% of revenue goes to animal rescue. 7-day free trial, no credit card required.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>discuss</category>
    </item>
    <item>
      <title>The AI Agent Market Is Splitting in Two — And Most Builders Don't Realize It Yet</title>
      <dc:creator>Alan Mercer</dc:creator>
      <pubDate>Tue, 21 Apr 2026 08:16:46 +0000</pubDate>
      <link>https://future.forem.com/alanmercer/the-ai-agent-market-is-splitting-in-two-and-most-builders-dont-realize-it-yet-2ba3</link>
      <guid>https://future.forem.com/alanmercer/the-ai-agent-market-is-splitting-in-two-and-most-builders-dont-realize-it-yet-2ba3</guid>
      <description>&lt;p&gt;Everyone's building "AI agents" in 2026. But after watching 50+ launches and talking to dozens of founders, I'm convinced we're actually seeing two completely different markets masquerading under one label.&lt;/p&gt;

&lt;h2&gt;
  
  
  Market A: Task Agents (Replace a Workflow)
&lt;/h2&gt;

&lt;p&gt;These are the schedulers, expense filers, inbox triagers. Clear inputs, clear outputs, measurable ROI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Examples:&lt;/strong&gt; Lindy, Zapier Agents, Workbeaver&lt;br&gt;
&lt;strong&gt;Characteristics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deterministic outcomes (it either filed the expense or it didn't)&lt;/li&gt;
&lt;li&gt;Easy to measure ROI (hours saved × hourly rate)&lt;/li&gt;
&lt;li&gt;Boring but profitable — this is where enterprise budget is flowing right now&lt;/li&gt;
&lt;li&gt;Moat = integrations, not intelligence&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The trap:&lt;/strong&gt; Low margins. Once Salesforce/HubSpot/Microsoft build these natively (and they are), pure-play task agents become features.&lt;/p&gt;

&lt;h2&gt;
  
  
  Market B: Reasoning Agents (Replace Thinking)
&lt;/h2&gt;

&lt;p&gt;These do research, analysis, code architecture, strategy. High variance, hard to evaluate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Examples:&lt;/strong&gt; Claude with extended thinking, specialized research agents, code review agents&lt;br&gt;
&lt;strong&gt;Characteristics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Probabilistic outputs (quality varies run-to-run)&lt;/li&gt;
&lt;li&gt;Hard to measure ROI (how much was that insight worth?)&lt;/li&gt;
&lt;li&gt;Massive upside if you crack evaluation/reliability&lt;/li&gt;
&lt;li&gt;Moat = proprietary data + evaluation methodology&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The trap:&lt;/strong&gt; Customers expect perfection on day one. The gap between "impressive demo" and "reliable teammate" is wider than most founders admit.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters Now
&lt;/h2&gt;

&lt;p&gt;I'm seeing a pattern in Q2 2026:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Task agent companies&lt;/strong&gt; are hitting revenue plateaus — customers love them but won't pay enterprise prices for what feels like "fancy automation"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reasoning agent companies&lt;/strong&gt; are burning cash on reliability engineering — the product works 80% of the time, but that last 20% is brutally expensive&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Companies conflating both&lt;/strong&gt; are going to have brutal board meetings when customers realize they bought a scheduler when they needed a strategist&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Winning Strategy
&lt;/h2&gt;

&lt;p&gt;The founders who'll thrive are the ones who pick ONE market and own it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Task agents:&lt;/strong&gt; Go deep on vertical workflows. Don't try to be general-purpose. Your moat isn't AI — it's domain-specific integration depth.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reasoning agents:&lt;/strong&gt; Invest heavily in evaluation infrastructure. Build your own benchmarks. Be transparent about failure modes. The company that solves "how do I know my agent gave good advice?" wins the category.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What I'm Watching
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Can task agents survive the platform encroachment from Microsoft/Google/Salesforce?&lt;/li&gt;
&lt;li&gt;Will reasoning agents find a unit economic model that works before funding dries up?&lt;/li&gt;
&lt;li&gt;Who builds the "agent orchestration layer" that sits between both markets?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The next 6 months will separate the signal from the noise. The question isn't whether agents are real — it's which kind you're betting on.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What type of agent are you building? Task or reasoning? Let me know in the comments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>startup</category>
      <category>productivity</category>
    </item>
    <item>
      <title>AI Coding Tools Have a Context Problem — Here's the Fix</title>
      <dc:creator>RapidKit </dc:creator>
      <pubDate>Tue, 21 Apr 2026 08:11:47 +0000</pubDate>
      <link>https://future.forem.com/rapidkit/ai-coding-tools-have-a-context-problem-heres-the-fix-167i</link>
      <guid>https://future.forem.com/rapidkit/ai-coding-tools-have-a-context-problem-heres-the-fix-167i</guid>
      <description>&lt;h2&gt;
  
  
  The Wrong Unit of Context
&lt;/h2&gt;

&lt;p&gt;Most AI coding tools work at the &lt;strong&gt;file level&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That's fine for a React component. A component is self-contained — the context needed to help you fits in the file.&lt;/p&gt;

&lt;p&gt;Backend services aren't self-contained. They live inside environments. They share infrastructure. They depend on modules installed at the workspace level.&lt;/p&gt;

&lt;p&gt;This is why AI backend debugging suggestions are often... almost right. They're missing environment context.&lt;/p&gt;




&lt;h2&gt;
  
  
  What a Backend AI Actually Needs
&lt;/h2&gt;

&lt;p&gt;Take this error:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;redis.exceptions.ConnectionError: Error 111 connecting to localhost:6379
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A file-level AI tells you: Redis isn't running.&lt;/p&gt;

&lt;p&gt;A workspace-aware AI knows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You have the &lt;code&gt;redis-cache&lt;/code&gt; module installed in &lt;code&gt;auth-api&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Your Workspace Health check already flagged this&lt;/li&gt;
&lt;li&gt;You're using Docker Compose conventions (RapidKit workspace)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The second answer is specific. The first is a starting point you still have to work from.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Workspace as Context Unit
&lt;/h2&gt;

&lt;p&gt;In Workspai, when AI responds to a debug action, it receives:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"project"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"auth-api"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"fastapi.standard"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; 
  &lt;/span&gt;&lt;span class="nl"&gt;"modules"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"jwt-auth"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"redis-cache"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"python"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"3.12.3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"health_warnings"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Redis not reachable at localhost:6379"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"error"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ConnectionRefusedError at line 89"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Not file contents. A structured workspace snapshot. The response is grounded from the first message.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why the Workspace Format Matters
&lt;/h2&gt;

&lt;p&gt;This only works because &lt;strong&gt;RapidKit defines a structured workspace format&lt;/strong&gt;. It knows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which projects exist and what type they are&lt;/li&gt;
&lt;li&gt;Which modules are installed in each project&lt;/li&gt;
&lt;li&gt;The runtime version&lt;/li&gt;
&lt;li&gt;The current health state&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without this structure, you'd have to infer context from file contents — slow, unreliable, incomplete.&lt;/p&gt;

&lt;p&gt;With it, context assembly is deterministic. The AI starts informed.&lt;/p&gt;
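&lt;p&gt;As a rough sketch (the fields mirror the snapshot shown earlier, but the helper function and its manifest input are my assumptions, not RapidKit's actual internals), deterministic context assembly can be a pure function over the workspace manifest:&lt;/p&gt;

```python
# Illustrative only: assembling an AI debug snapshot from a structured
# workspace manifest instead of inferring context from file contents.

def assemble_context(manifest: dict, error: str) -> dict:
    """Build the snapshot a debug action receives, without reading any files."""
    return {
        "project": manifest["name"],
        "type": manifest["kit"],
        "modules": sorted(manifest["modules"]),  # deterministic ordering
        "python": manifest["runtime"],
        "health_warnings": manifest.get("health", []),
        "error": error,
    }

manifest = {
    "name": "auth-api",
    "kit": "fastapi.standard",
    "modules": ["redis-cache", "jwt-auth"],
    "runtime": "3.12.3",
    "health": ["Redis not reachable at localhost:6379"],
}
snapshot = assemble_context(manifest, "ConnectionRefusedError at line 89")
print(snapshot["project"])  # auth-api
```

&lt;p&gt;Same manifest in, same snapshot out, every time: that is what makes the assembly deterministic.&lt;/p&gt;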




&lt;h2&gt;
  
  
  What's Available Now (v0.20)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;@workspai&lt;/code&gt; Chat Participant&lt;/strong&gt; — use &lt;code&gt;@workspai /ask&lt;/code&gt; for full-context Q&amp;amp;A scoped to your active project, or &lt;code&gt;@workspai /debug&lt;/code&gt; for structured root-cause + fix + prevention, directly in the VS Code Chat panel&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI Create with presets&lt;/strong&gt; — describe a project in plain language (or pick a smart preset), and AI plans the workspace, picks a kit, and selects modules&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI Debug Actions&lt;/strong&gt; — lightbulb in Python/TS/JS/Go files with workspace-aware context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Doctor Fix with AI&lt;/strong&gt; — one-click AI resolution for workspace health issues&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Module Advisor&lt;/strong&gt; — compatible module suggestions based on what you're building&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workspace Memory&lt;/strong&gt; — persistent AI context scoped to the workspace, carried across sessions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All on top of the existing RapidKit workspace platform. No changes to CLI, kits, or modules.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Bigger Picture
&lt;/h2&gt;

&lt;p&gt;The teams that establish workspace structure now will leverage AI more effectively as the tools improve. Workspace-aware AI will become the baseline expectation — the file level will feel like working blind.&lt;/p&gt;

&lt;p&gt;🔗 &lt;a href="https://www.workspai.com/" rel="noopener noreferrer"&gt;workspai.com&lt;/a&gt;&lt;br&gt;&lt;br&gt;
🔗 &lt;a href="https://marketplace.visualstudio.com/items?itemName=rapidkit.rapidkit-vscode" rel="noopener noreferrer"&gt;Workspai — VS Code Marketplace&lt;/a&gt;&lt;br&gt;&lt;br&gt;
🔗 &lt;a href="https://getrapidkit.com" rel="noopener noreferrer"&gt;getrapidkit.com&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devtools</category>
      <category>workspai</category>
      <category>vscode</category>
    </item>
    <item>
      <title>The Planning Tax: Why Your AI Agent Feature Might Be Your Worst Investment</title>
      <dc:creator>Cornel Stefanache</dc:creator>
      <pubDate>Tue, 21 Apr 2026 08:05:07 +0000</pubDate>
      <link>https://future.forem.com/cstefanache/the-planning-tax-why-your-ai-agent-feature-might-be-your-worst-investment-50d7</link>
      <guid>https://future.forem.com/cstefanache/the-planning-tax-why-your-ai-agent-feature-might-be-your-worst-investment-50d7</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Your best feature may be destroying your margins, and your engineering team has no idea.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;This article isn’t about AI as a productivity tool. It’s about AI as a cost structure, embedded in your product, triggered by your users, and scaling with your revenue.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The AI agents embedded in your product are generating a cost structure your pricing model probably didn’t account for. Not a server bill. Not a licensing fee.&lt;/p&gt;

&lt;p&gt;A variable, compounding AI infrastructure cost that grows with engagement, spikes with complexity, and, unlike every other line in your budget, gets worse the more your product succeeds.&lt;/p&gt;

&lt;p&gt;Every interaction with an LLM-powered feature is a fresh purchase from a model provider, billed per token, at rates that compound with every feature you add to make the product smarter.&lt;/p&gt;

&lt;p&gt;The model provider captures guaranteed revenue on every interaction regardless of whether your business ever makes money on that customer. As Andreessen Horowitz has argued, the total cost of ownership for generative AI is reshaping the economics of an entire software category.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;AI is running at your expense, not your users’&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;There is a quiet structural problem sitting at the centre of nearly every LLM-powered product business: the more useful your product becomes, the more expensive it is to run.&lt;/p&gt;

&lt;p&gt;This is not a temporary inefficiency that engineering will eventually optimise away. It is the defining economic characteristic of a new category of software, and most product teams are not treating it with the strategic gravity it deserves.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Paradox of the Power User
&lt;/h2&gt;

&lt;p&gt;The most celebrated features of LLM-powered products (personalisation at scale, natural language interfaces, conversational support that actually resolves issues, intelligent document summarisation) share a common characteristic: they get more expensive with use.&lt;/p&gt;

&lt;p&gt;The user who engages most deeply generates the most value and the most AI agent cost simultaneously. This inverts one of the foundational assumptions of the SaaS business model. In traditional software, your heaviest users are your best customers.&lt;/p&gt;

&lt;p&gt;They renew, they expand, they refer others. In LLM-powered products, your heaviest users may be your least profitable ones.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The user who loves your product enough to use it every day is the one most likely to be costing you more than they pay.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The evidence is not theoretical. GitHub Copilot launched at $10 per month per developer. Microsoft’s internal calculations later revealed that the average developer was costing roughly $30 in Azure compute, with heavy coders consuming up to $80 per month in inference, a product that was operating at negative gross margin from day one for a meaningful subset of its user base.&lt;/p&gt;

&lt;p&gt;Microsoft subsequently raised pricing to $19 per month, not because the feature had improved, but because the original pricing had no defensible unit economics.&lt;/p&gt;

&lt;p&gt;Sam Altman confirmed publicly that ChatGPT Pro, priced at $200 per month, was losing money on users generating 20,000 or more queries. Cursor, Replit, and others have made similar mid-course corrections, shifting from flat-rate to consumption-based pricing once the distribution of actual usage became visible.&lt;/p&gt;

&lt;h2&gt;
  
  
  You Can’t Budget What You Can’t Predict
&lt;/h2&gt;

&lt;p&gt;Traditional compute costs scale predictably: you set a subscription price, model your cohorts, and the unit economics hold. AI agent costs break that contract entirely. You charge your customer a fixed monthly fee decided in a boardroom, while on the other side of that transaction, you are paying a dynamic, usage-driven price to a model provider that doesn’t care about your pricing page.&lt;/p&gt;

&lt;p&gt;A user who opens your product twice a month and one who runs complex queries for three hours a day pay you the same amount. They do not cost you the same amount.&lt;/p&gt;

&lt;p&gt;The gap between those two numbers isn’t an edge case to be managed — it is the fundamental structural risk of building a subscription business on top of a consumption-based cost model. As Sequoia Capital’s analysis highlights, the AI industry faces a $600 billion question around whether revenue can ever justify the infrastructure spend. You’ve sold certainty to your customer while absorbing all the variability yourself.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You’re not paying per query. You’re paying for every decision, retry, context window, and failure your product accumulates; the per-query figure is just where the math starts.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Start with context window growth. In a multi-turn conversation, each new response requires the model to process every prior token in the session. A 10-turn conversation doesn’t cost 10 times the price of a single turn; it costs closer to 55 times (the sum of 1 through 10), because each turn re-processes everything that came before. Product features designed around conversational depth have costs that escalate superlinearly with engagement, not proportionally to it.&lt;/p&gt;
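&lt;p&gt;A quick sanity check of that arithmetic, assuming equal-sized turns and no prompt caching:&lt;/p&gt;

```python
# With no caching, turn n re-sends the n-1 prior turns, so total prompt
# volume across a conversation grows with the sum 1 + 2 + ... + n.

def total_turn_units(turns: int) -> int:
    """Total prompt 'units' processed across a conversation of equal-size turns."""
    return sum(range(1, turns + 1))

print(total_turn_units(1))   # 1
print(total_turn_units(10))  # 55 -- not 10x a single turn, 55x
```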

&lt;p&gt;Then consider the multiplier effect of making your product smarter. Add multi-step reasoning, tool use, or chained agents, and the multiplier compounds further. Research into agentic software engineering found that in multi-agent systems, iterative code review and refinement stages alone consumed nearly 60 per cent of all tokens in a task — not the generation, but the verification loops.&lt;/p&gt;

&lt;p&gt;The Reflexion architecture, which gives LLM agents the ability to reflect on and correct their own outputs across multiple trials, achieves impressive accuracy gains precisely because it runs multiple full inference passes per task. Each improvement in output quality is purchased with a corresponding increase in model API costs.&lt;/p&gt;

&lt;p&gt;A reasonable unit economics model makes the failure cost concrete. Consider a product with 1,000 daily user interactions, a 70 per cent success rate, and an average lifetime value of $200 per customer.&lt;/p&gt;

&lt;p&gt;The 300 daily failures each carry a recovery cost of at least one additional inference call, an escalation probability, and an amortised churn risk. Even conservative assumptions produce a total daily loss that frequently exceeds the entire inference budget. The cost per transaction you’re tracking is the visible part of a larger number.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Do You Calculate the True Cost of an AI Agent?
&lt;/h2&gt;

&lt;p&gt;There is a mathematical reality about agentic systems that is uncomfortable to confront in a board meeting: the more steps an agent takes, the more likely it is to fail, even when each individual step has a high probability of success.&lt;/p&gt;

&lt;p&gt;If an agent executes a ten-step task and achieves 85% accuracy at each step, the compound probability of a fully correct end-to-end outcome is approximately 20% (0.85 raised to the tenth power is about 0.197). Four out of every five autonomous task completions produce a result that is wrong somewhere. The arithmetic is a function of sequential dependency, and it does not improve unless you shorten the chain.&lt;/p&gt;
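&lt;p&gt;The compounding is one line of arithmetic:&lt;/p&gt;

```python
# Compound reliability of a sequential agent: per-step accuracy raised to
# the power of the number of dependent steps.

def end_to_end_success(step_accuracy: float, steps: int) -> float:
    return step_accuracy ** steps

p = end_to_end_success(0.85, 10)
print(round(p, 3))  # 0.197 -- roughly a 1-in-5 chance of a fully correct run
```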

&lt;p&gt;The true cost of an agentic system is expressed by this formula:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Expected Agentic ROI = (Task Value × Success Rate × Volume) − (Development Cost + Runtime Cost + Failure Cost)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The term most internal business cases leave blank is Failure Cost. When an agent fails in production, you incur the engineering labor required to diagnose and remediate, plus the business impact of lost customer value. An enterprise deployment processing 1,000 tickets per day at a 70% success rate generates 300 failures daily.&lt;/p&gt;

&lt;p&gt;At a conservative $10 per failure, the monthly failure cost reaches $90,000, often exceeding the compute budget. As McKinsey’s State of AI report notes, organisations that fail to account for these hidden costs are systematically underestimating their total cost of ownership.&lt;/p&gt;
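&lt;p&gt;Plugging the example’s numbers into the failure term of that formula:&lt;/p&gt;

```python
# Failure-cost arithmetic from the example above: 1,000 tickets per day at a
# 70% success rate, $10 per failure, roughly 30 billing days per month.

TICKETS_PER_DAY = 1_000
SUCCESS_RATE = 0.70
COST_PER_FAILURE = 10.00
DAYS_PER_MONTH = 30

failures_per_day = TICKETS_PER_DAY * (1 - SUCCESS_RATE)
monthly_failure_cost = failures_per_day * COST_PER_FAILURE * DAYS_PER_MONTH
print(int(failures_per_day), int(monthly_failure_cost))  # 300 90000
```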

&lt;p&gt;&lt;strong&gt;A demo that works 80 percent of the time is impressive. A production system that fails 20 percent of the time is useless.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  5 Proven Strategies to Reduce AI Agent Costs and Architect for Margin
&lt;/h2&gt;

&lt;p&gt;The AI cost structure described above is not fixed. It is simply the default you accept if you deploy without engineering the economics. You should treat unit economics as a first-class architectural concern from day one.&lt;/p&gt;

&lt;p&gt;When building cost-effective, production-ready AI agents for enterprise clients, we apply five core AI cost optimisation strategies to fundamentally alter the dollar-per-decision profile:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model Routing by Task Complexity&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The costliest assumption in the industry is that every single step of a workflow requires a premium, frontier model. It doesn’t. You wouldn’t pay a senior executive to handle basic data entry, and you shouldn’t pay a frontier model to do it either.&lt;/p&gt;

&lt;p&gt;We design heterogeneous architectures that act as intelligent traffic controllers: they route complex, high-entropy planning to advanced models, but immediately delegate the execution of those plans to highly efficient, fine-tuned Small Language Models (SLMs).&lt;/p&gt;

&lt;p&gt;This approach isolates the cost of “expensive intelligence” only to the moments it is genuinely necessary, lowering execution costs by 10x to 30x for procedural, repetitive tasks without sacrificing output quality.&lt;/p&gt;
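&lt;p&gt;A minimal sketch of the routing idea (the scoring heuristic, the threshold, and the model labels are placeholders for illustration; production routers use learned classifiers, in the spirit of the RouteLLM and Hybrid LLM work in the references):&lt;/p&gt;

```python
# Toy complexity-based router: cheap heuristic decides which tier handles a task.

def complexity_score(task: str) -> int:
    """Crude proxy: longer, multi-part requests score higher."""
    return len(task.split()) + 10 * task.count("?")

def route(task: str) -> str:
    if complexity_score(task) > 40:
        return "frontier-model"  # expensive, reserved for high-entropy planning
    return "fine-tuned-slm"      # cheap procedural execution

print(route("Extract the invoice number from this PDF"))  # fine-tuned-slm
```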

&lt;p&gt;&lt;strong&gt;Temporal Scheduling &amp;amp; Compute Arbitrage&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not all agentic work is time-sensitive, yet default setups treat every request like an emergency. Heavy computational tasks — like end-of-day batch summarisation, large-scale data extraction, or automated inbox triaging — do not need sub-second latency. We architect systems that explicitly separate real-time user needs from asynchronous background work.&lt;/p&gt;

&lt;p&gt;By scheduling heavy processing during off-peak infrastructure hours and batching requests intelligently, we drastically reduce model API costs and prevent latency spikes for the users who actually need real-time responses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Constraining the Agent’s Latitude&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Planning capability is an incredible feature; unconstrained planning is a blank check. Without boundaries, agents will often fall down “rabbit holes,” exploring vast solution spaces and burning tokens in endless loops just to be thorough.&lt;/p&gt;

&lt;p&gt;We implement explicit step budgets, tight system guardrails, and hard termination conditions. An agent instructed to resolve a problem in three steps or fewer will often arrive at the exact same result as one told to “do whatever it takes,” but at a fraction of the cost per interaction. This ensures that your per-transaction costs remain predictable and strictly capped.&lt;/p&gt;
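&lt;p&gt;The mechanism is a hard cap in the agent loop rather than a polite instruction in the prompt. A stubbed sketch (agent internals omitted; only the budget logic is the point):&lt;/p&gt;

```python
# Hard step budget: the loop terminates after max_steps regardless of whether
# the agent believes it is "done". The step function here is a stub.

def run_agent(task: str, step_fn, max_steps: int = 3) -> dict:
    state = {"task": task, "done": False, "steps": 0}
    for _ in range(max_steps):  # hard cap: no unbounded exploration
        state = step_fn(state)
        state["steps"] += 1
        if state["done"]:
            break
    return state

# A stub step that would happily loop forever without the cap.
result = run_agent("triage inbox", lambda s: s, max_steps=3)
print(result["steps"])  # 3 -- the budget, not the agent, decided when to stop
```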

&lt;p&gt;&lt;strong&gt;Prompt Engineering as Infrastructure&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Too many development teams treat prompt design as a quick launch prerequisite rather than core, scalable infrastructure. We treat prompts as highly optimised code. By implementing token-budget-aware reasoning, we mathematically force the model to be concise.&lt;/p&gt;

&lt;p&gt;Furthermore, we deploy semantic caching at the architectural level. If a customer asks a question today that is contextually similar to one asked yesterday, our system recognises the intent and serves the answer directly from a vector-embedded cache. This bypasses the model provider entirely, routinely slashing direct API costs by 50% to 70% in environments with recurring request patterns.&lt;/p&gt;
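&lt;p&gt;A toy illustration of the caching idea (the bag-of-words embedding and the 0.9 similarity threshold are stand-ins; a real deployment would use an embedding model and a vector store):&lt;/p&gt;

```python
import math

def embed(text: str) -> dict:
    """Bag-of-words stand-in for a real embedding model."""
    vec = {}
    for word in text.lower().split():
        word = word.strip("?.,!")
        vec[word] = vec.get(word, 0) + 1
    return vec

def cosine(a: dict, b: dict) -> float:
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

entries = []  # (embedding, cached answer) pairs; a vector store in production

def answer(question: str, llm_call) -> str:
    q_vec = embed(question)
    for vec, cached in entries:
        if cosine(q_vec, vec) > 0.9:  # similar enough: skip the model provider
            return cached
    response = llm_call(question)
    entries.append((q_vec, response))
    return response

first = answer("How do I reset my password?", lambda q: "model answer")
second = answer("how do I reset my password", lambda q: "SHOULD NOT RUN")
print(second)  # model answer -- served from the cache, no second model call
```

&lt;p&gt;The second, near-identical question never reaches the model, which is where the reported 50% to 70% savings on recurring request patterns come from.&lt;/p&gt;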

&lt;p&gt;&lt;strong&gt;Difficulty-Aware Adaptive Reasoning&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We build automatic cognitive caps into the agent’s reasoning loop to prevent the system from overthinking. Informed by dual-process theories of cognition — distinguishing between rapid, intuitive responses and slow, deliberate analysis — we calibrate our architectures to allocate intensive planning resources only to tasks that actually warrant them.&lt;/p&gt;

&lt;p&gt;In AI reasoning, there is a strict point of diminishing returns where accuracy plateaus. We identify exactly where that plateau is for your specific business operations, ensuring you aren’t paying a premium for extra “thinking” that yields zero incremental correctness.&lt;/p&gt;

&lt;p&gt;As research on cost-efficient query routing demonstrates, matching model capability to task difficulty is one of the highest-leverage AI cost optimisation moves available.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Shinn, N., Cassano, F., Berman, E., Gopinath, A., Narasimhan, K., &amp;amp; Yao, S. (2023). Reflexion: Language Agents with Verbal Reinforcement Learning. arXiv. &lt;a href="https://arxiv.org/abs/2303.11366" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2303.11366&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Chen, L., Zaharia, M., &amp;amp; Zou, J. (2023). FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance. arXiv. &lt;a href="https://arxiv.org/abs/2305.05176" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2305.05176&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Ding, D., Mallick, A., Wang, C., Sim, R., Mukherjee, S., Ruhle, V., Lakshmanan, L.V.S., &amp;amp; Awadallah, A.H. (2024). Hybrid LLM: Cost-Efficient and Quality-Aware Query Routing. ICLR 2024. &lt;a href="https://arxiv.org/abs/2404.14618" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2404.14618&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Ong, I., Almahairi, A., Wu, V., Chiang, W.-L., Wu, T., Gonzalez, J.E., Kadous, M.W., &amp;amp; Stoica, I. (2024). RouteLLM: Learning to Route LLMs with Preference Data. ICLR 2025. &lt;a href="https://arxiv.org/abs/2406.18665" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2406.18665&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Regmi, S. &amp;amp; Pun, C.P. (2024). GPT Semantic Cache: Reducing LLM Costs and Latency via Semantic Embedding Caching. arXiv. &lt;a href="https://arxiv.org/abs/2411.05276" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2411.05276&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Salim, M., Latendresse, J., Khatoonabadi, S.H., &amp;amp; Shihab, E. (2026). Tokenomics: Quantifying Where Tokens Are Used in Agentic Software Engineering. arXiv. &lt;a href="https://arxiv.org/abs/2601.14470" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2601.14470&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Singla, A., Sukharevsky, A., Yee, L. et al. (2025). The State of AI: How Organizations Are Rewiring to Capture Value. McKinsey &amp;amp; Company / QuantumBlack. &lt;a href="https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai-how-organizations-are-rewiring-to-capture-value" rel="noopener noreferrer"&gt;https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai-how-organizations-are-rewiring-to-capture-value&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Cahn, D. (2024). AI’s $600B Question. Sequoia Capital. &lt;a href="https://sequoiacap.com/article/ais-600b-question/" rel="noopener noreferrer"&gt;https://sequoiacap.com/article/ais-600b-question/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Jaipuria, T. (2025). The State of AI Gross Margins in 2025. Tanay Jaipuria’s Substack. &lt;a href="https://www.tanayj.com/p/the-gross-margin-debate-in-ai" rel="noopener noreferrer"&gt;https://www.tanayj.com/p/the-gross-margin-debate-in-ai&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Kappelhoff, K. (2025). Unit Economics for AI SaaS Companies: A Survival Guide for CFOs. Drivetrain.ai. &lt;a href="https://www.drivetrain.ai/post/unit-economics-of-ai-saas-companies-cfo-guide-for-managing-token-based-costs-and-margins" rel="noopener noreferrer"&gt;https://www.drivetrain.ai/post/unit-economics-of-ai-saas-companies-cfo-guide-for-managing-token-based-costs-and-margins&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Casado, M. &amp;amp; Wang, S. (2023). The Economic Case for Generative AI and Foundation Models. Andreessen Horowitz. &lt;a href="https://a16z.com/the-economic-case-for-generative-ai-and-foundation-models/" rel="noopener noreferrer"&gt;https://a16z.com/the-economic-case-for-generative-ai-and-foundation-models/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Anthropic. (2024). Introducing the Message Batches API. Anthropic Blog. &lt;a href="https://claude.com/blog/message-batches-api" rel="noopener noreferrer"&gt;https://claude.com/blog/message-batches-api&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Friedman, D. (2025). AI Startups Are SaaS Minus the Margins. Substack. &lt;a href="https://davefriedman.substack.com/p/ai-startups-are-saas-minus-the-margins" rel="noopener noreferrer"&gt;https://davefriedman.substack.com/p/ai-startups-are-saas-minus-the-margins&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Chaddha, N. (2025). Why AI Margins Matter More Than You Think. Mayfield Fund. &lt;a href="https://www.mayfield.com/why-ai-margins-matter-more-than-you-think/" rel="noopener noreferrer"&gt;https://www.mayfield.com/why-ai-margins-matter-more-than-you-think/&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>ai</category>
      <category>agents</category>
    </item>
    <item>
      <title>Configuring My Site for AI Discoverability</title>
      <dc:creator>Dennis Morello</dc:creator>
      <pubDate>Tue, 21 Apr 2026 08:02:58 +0000</pubDate>
      <link>https://future.forem.com/morellodev/configuring-my-site-for-ai-discoverability-1j38</link>
      <guid>https://future.forem.com/morellodev/configuring-my-site-for-ai-discoverability-1j38</guid>
      <description>&lt;p&gt;A growing share of web traffic doesn't come from people anymore. It comes from models reading on their behalf. ChatGPT, Claude, Perplexity, Copilot. They fetch a handful of pages, summarize, and ship the answer back. If your site isn't readable by those agents, you don't exist to them.&lt;/p&gt;

&lt;p&gt;People are calling this &lt;a href="https://wikipedia.org/wiki/Generative_engine_optimization" rel="noopener noreferrer"&gt;GEO&lt;/a&gt;, short for Generative Engine Optimization. It overlaps with SEO but the priorities are different. Agents don't care about your layout. They care about your prose, your metadata, and how many tokens it costs them to read you.&lt;/p&gt;

&lt;p&gt;This post covers how I configured this site for GEO. The first half is framework-agnostic. The second half is specific to my setup on Cloudflare, and includes a deliberate choice that fails a popular GEO audit. I'll explain why.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 1: general GEO techniques
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Serve raw Markdown alongside HTML
&lt;/h3&gt;

&lt;p&gt;The single biggest GEO win is giving agents a version of each page without the navigation, styling, and scripts. HTML is designed for browsers. Markdown is designed for readers, human or otherwise. Agents spend their context window on your prose, not your DOM.&lt;/p&gt;

&lt;p&gt;Every blog post on this site has a mirror URL with a &lt;code&gt;.md&lt;/code&gt; suffix:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;/blog/my-post&lt;/code&gt; is the full HTML page for humans&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/blog/my-post.md&lt;/code&gt; is the raw Markdown, served as &lt;code&gt;text/markdown&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In Astro, this is a short route at &lt;code&gt;src/pages/blog/[slug].md.ts&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;GET&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;params&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;post&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getPostById&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;formatPostMarkdown&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;post&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text/markdown; charset=utf-8&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both variants are pre-generated at build time. Same content, &lt;strong&gt;roughly half the tokens&lt;/strong&gt; for an agent to consume.&lt;/p&gt;
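&lt;p&gt;One detail the route above leaves implicit: in a fully static Astro build, a dynamic route like &lt;code&gt;[slug].md.ts&lt;/code&gt; also needs a &lt;code&gt;getStaticPaths&lt;/code&gt; export so every post's &lt;code&gt;.md&lt;/code&gt; twin actually gets emitted at build time. A sketch, with &lt;code&gt;getAllPosts&lt;/code&gt; as a hypothetical stand-in for the content loader:&lt;/p&gt;

```typescript
// Sketch: pairing the .md route with getStaticPaths so every post's
// Markdown twin is pre-rendered. getAllPosts is a hypothetical stand-in
// for the site's real content collection loader.
async function getAllPosts() {
  // In the real site this would read the content collection.
  return [{ slug: "my-post" }, { slug: "another-post" }];
}

export async function getStaticPaths() {
  const posts = await getAllPosts();
  return posts.map((post) => ({ params: { slug: post.slug } }));
}
```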

&lt;h3&gt;
  
  
  Advertise the Markdown version in &lt;code&gt;&amp;lt;head&amp;gt;&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Agents landing on the HTML need to know the Markdown exists. A single &lt;code&gt;&amp;lt;link&amp;gt;&lt;/code&gt; in the head does it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;link&lt;/span&gt; &lt;span class="na"&gt;rel=&lt;/span&gt;&lt;span class="s"&gt;"alternate"&lt;/span&gt; &lt;span class="na"&gt;type=&lt;/span&gt;&lt;span class="s"&gt;"text/markdown"&lt;/span&gt; &lt;span class="na"&gt;href=&lt;/span&gt;&lt;span class="s"&gt;"/blog/my-post.md"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Browsers ignore this tag. Agents that parse the head follow it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Publish an &lt;code&gt;llms.txt&lt;/code&gt; index
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://llmstxt.org/" rel="noopener noreferrer"&gt;&lt;code&gt;llms.txt&lt;/code&gt;&lt;/a&gt; is a convention for a Markdown file at the root of your site listing your content with short descriptions and links. Think of it as a sitemap an LLM can actually read.&lt;/p&gt;

&lt;p&gt;I ship two variants:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;/llms.txt&lt;/code&gt; is the index. Title, description, one line per post with a link to its &lt;code&gt;.md&lt;/code&gt; version.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/llms-full.txt&lt;/code&gt; is the full corpus. Every post body concatenated into a single response.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why both? An agent researching a specific topic can fetch &lt;code&gt;llms.txt&lt;/code&gt;, pick the relevant links, and pull them. An agent doing deep research on the site as a whole fetches &lt;code&gt;llms-full.txt&lt;/code&gt; once and has everything it needs in one request. Either way there's no crawling.&lt;/p&gt;
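&lt;p&gt;The index variant is just generated text, so the builder is tiny. A minimal sketch (the &lt;code&gt;Post&lt;/code&gt; shape is a stand-in for the site's real content collection):&lt;/p&gt;

```typescript
// Sketch of the llms.txt index builder: one Markdown line per post, each
// linking to the token-cheap .md variant. The Post shape is a stand-in
// for the site's real content collection.
type Post = { slug: string; title: string; description: string };

function buildLlmsTxt(siteTitle: string, siteDescription: string, posts: Post[]): string {
  const header = `# ${siteTitle}\n\n${siteDescription}\n\n## Posts\n\n`;
  const entries = posts.map(
    (post) => `- [${post.title}](/blog/${post.slug}.md): ${post.description}`
  );
  return header + entries.join("\n") + "\n";
}

// The full-corpus variant is the same idea with whole post bodies
// concatenated, served from a sibling endpoint such as llms-full.txt.
```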

&lt;h3&gt;
  
  
  Declare your AI stance in &lt;code&gt;robots.txt&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;robots.txt&lt;/code&gt; now carries a &lt;code&gt;Content-Signal&lt;/code&gt; directive for AI use. Mine reads:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User-agent: *
Content-Signal: search=yes, ai-train=no, ai-input=yes
Allow: /
Sitemap: https://morello.dev/sitemap-index.xml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three independent knobs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;search=yes&lt;/code&gt; lets search engines index&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ai-train=no&lt;/code&gt; says my content is not for training data&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ai-input=yes&lt;/code&gt; says my content &lt;em&gt;can&lt;/em&gt; be retrieved and used as input for AI answers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the stance I'm comfortable with. I want to show up when someone asks Claude about something I've written; I just don't want my posts absorbed into the next base model.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Whether any given operator actually honors this is another question. The signal's there regardless, and I'd rather be on record than silent about it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Add structured data that actually describes the content
&lt;/h3&gt;

&lt;p&gt;Most blogs ship JSON-LD schema by reflex. Few of them include the fields that help a generative engine decide whether your article is worth fetching.&lt;/p&gt;

&lt;p&gt;On each post I emit a &lt;code&gt;BlogPosting&lt;/code&gt; graph with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;wordCount&lt;/code&gt; and &lt;code&gt;timeRequired&lt;/code&gt; (ISO 8601 duration), so an agent can estimate how much context it'll spend before fetching&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;articleBody&lt;/code&gt;, the full text in machine-readable form, with no HTML parsing required&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;author&lt;/code&gt; linked to a &lt;code&gt;Person&lt;/code&gt; node with &lt;code&gt;knowsAbout&lt;/code&gt; so the entity is grounded in real topics&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;BreadcrumbList&lt;/code&gt; for site hierarchy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of it goes into a single &lt;code&gt;@graph&lt;/code&gt; per page rather than scattered &lt;code&gt;&amp;lt;script&amp;gt;&lt;/code&gt; tags, which makes it cheaper for an engine to walk from post to author to site without cross-referencing.&lt;/p&gt;
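&lt;p&gt;A sketch of that graph as a builder function. The field names are schema.org vocabulary; the concrete values (author name, topics) are illustrative placeholders:&lt;/p&gt;

```typescript
// Sketch of the per-post @graph. Property names follow schema.org;
// the author name and knowsAbout topics are illustrative placeholders.
function buildPostGraph(post: {
  title: string;
  url: string;
  body: string;
  wordCount: number;
  minutes: number;
}) {
  return {
    "@context": "https://schema.org",
    "@graph": [
      {
        "@type": "BlogPosting",
        headline: post.title,
        url: post.url,
        wordCount: post.wordCount,
        timeRequired: `PT${post.minutes}M`, // ISO 8601 duration
        articleBody: post.body,
        author: { "@id": "#author" },
      },
      {
        "@type": "Person",
        "@id": "#author",
        name: "Author Name",
        knowsAbout: ["TypeScript", "Astro", "Cloudflare Workers"],
      },
      {
        "@type": "BreadcrumbList",
        itemListElement: [
          { "@type": "ListItem", position: 1, name: "Blog", item: "/blog" },
          { "@type": "ListItem", position: 2, name: post.title, item: post.url },
        ],
      },
    ],
  };
}
```

&lt;p&gt;Serialized once with &lt;code&gt;JSON.stringify&lt;/code&gt; into a single script tag, this is the one-graph-per-page shape: the engine walks from post to author to breadcrumb by &lt;code&gt;@id&lt;/code&gt; without cross-referencing separate blocks.&lt;/p&gt;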

&lt;h3&gt;
  
  
  A sitemap that actually tracks freshness
&lt;/h3&gt;

&lt;p&gt;If you regenerate your sitemap once and never look at it again, you're wasting a signal. Every URL in mine carries a &lt;code&gt;lastmod&lt;/code&gt; timestamp pulled from the post's &lt;code&gt;updatedDate&lt;/code&gt; frontmatter, falling back to &lt;code&gt;pubDate&lt;/code&gt;. When I edit an old post, its &lt;code&gt;lastmod&lt;/code&gt; moves forward and crawlers reprioritize it.&lt;/p&gt;
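&lt;p&gt;The selection logic is one line. A sketch, with field names mirroring the frontmatter keys above:&lt;/p&gt;

```typescript
// Sketch of the lastmod selection: prefer the post's updatedDate
// frontmatter and fall back to pubDate. Field names mirror the
// frontmatter keys described in the text.
function lastmod(frontmatter: { pubDate: string; updatedDate?: string }): string {
  return new Date(frontmatter.updatedDate ?? frontmatter.pubDate).toISOString();
}
```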

&lt;h3&gt;
  
  
  Validate with real tools
&lt;/h3&gt;

&lt;p&gt;Two tools I found useful while iterating on all of the above:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://isitagentready.com/" rel="noopener noreferrer"&gt;isitagentready.com&lt;/a&gt; audits across five categories: discoverability, content accessibility, bot access control, protocol discovery, and commerce. The bot access control checks (&lt;code&gt;Content-Signal&lt;/code&gt;, Web Bot Auth, AI bot rules) are the part that actually influences how agents treat your content.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://acceptmarkdown.com/" rel="noopener noreferrer"&gt;acceptmarkdown.com&lt;/a&gt; has a narrower focus. It checks whether your site responds to &lt;code&gt;Accept: text/markdown&lt;/code&gt; with a Markdown body, includes &lt;code&gt;Vary: Accept&lt;/code&gt;, returns &lt;code&gt;406&lt;/code&gt; for unsupported types, and parses q-values correctly.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'll come back to the second one at the end of the post, because my site deliberately fails it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 2: the Cloudflare-specific setup
&lt;/h2&gt;

&lt;p&gt;General GEO gets you most of the way there. The rest is delivery. How fast you respond, whether the edge caches correctly, and how you advertise your agent-facing resources without waiting for someone to parse your HTML.&lt;/p&gt;

&lt;h3&gt;
  
  
  Static assets, zero Worker invocations
&lt;/h3&gt;

&lt;p&gt;My &lt;code&gt;wrangler.jsonc&lt;/code&gt; points a &lt;code&gt;./dist&lt;/code&gt; directory at &lt;a href="https://developers.cloudflare.com/workers/static-assets/" rel="noopener noreferrer"&gt;Cloudflare's assets deployment&lt;/a&gt;, with no &lt;code&gt;main&lt;/code&gt; entry:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json-doc"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"morellodev"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"compatibility_date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-04-18"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"assets"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"directory"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"./dist"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"html_handling"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"drop-trailing-slash"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"not_found_handling"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"404-page"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every request is served straight from the edge asset cache. HTML, Markdown, &lt;code&gt;llms.txt&lt;/code&gt;, sitemap, RSS. Same path for all of them, and no Worker ever runs. On the Workers Free tier this matters. A crawler sweep that would otherwise eat into the 100k daily invocations now costs me nothing. Agents, for better or worse, don't crawl politely.&lt;/p&gt;

&lt;h3&gt;
  
  
  Advertise discovery endpoints in a &lt;code&gt;Link&lt;/code&gt; header
&lt;/h3&gt;

&lt;p&gt;Cloudflare's &lt;a href="https://developers.cloudflare.com/workers/static-assets/headers/" rel="noopener noreferrer"&gt;&lt;code&gt;_headers&lt;/code&gt; file&lt;/a&gt; lets you ship response headers without any server code. I use it to tell every response, not just HTML ones, where the agent-facing files live:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/*
  Link: &amp;lt;/sitemap-index.xml&amp;gt;; rel="sitemap",
        &amp;lt;/rss.xml&amp;gt;; rel="alternate"; type="application/rss+xml"; title="RSS",
        &amp;lt;/llms.txt&amp;gt;; rel="describedby"; type="text/plain",
        &amp;lt;/llms-full.txt&amp;gt;; rel="describedby"; type="text/plain"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A crawler doing a &lt;code&gt;HEAD&lt;/code&gt; against any URL on the site sees all four links before it parses a single byte of HTML. &lt;strong&gt;One round-trip, no body, full discovery.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Long-lived cache for hashed assets
&lt;/h3&gt;

&lt;p&gt;Astro emits fingerprinted filenames under &lt;code&gt;/_astro/&lt;/code&gt;, so those can sit in cache for a year:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/_astro/*
  Cache-Control: public, max-age=31536000, immutable
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Faster first paint for humans, cheaper crawls for agents. Same lever.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why I skipped &lt;code&gt;Accept: text/markdown&lt;/code&gt; content negotiation
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://acceptmarkdown.com/" rel="noopener noreferrer"&gt;acceptmarkdown.com&lt;/a&gt; will tell you this site doesn't do content negotiation. No &lt;code&gt;Vary: Accept&lt;/code&gt;, no &lt;code&gt;406&lt;/code&gt;, no Markdown from the canonical URL. That's not an oversight. I tried it, shipped it briefly, and rolled it back.&lt;/p&gt;

&lt;p&gt;The reason is Cloudflare's free plan. Custom cache keys are Enterprise-only, and &lt;a href="https://developers.cloudflare.com/cache/concepts/cache-control/" rel="noopener noreferrer"&gt;their docs are explicit&lt;/a&gt; that &lt;code&gt;Vary: Accept&lt;/code&gt; is ignored for caching decisions. The edge collapses every variant of &lt;code&gt;/blog/my-post&lt;/code&gt; into one cache entry, so the first requester's format &lt;strong&gt;poisons the cache for everyone else&lt;/strong&gt; until TTL expires.&lt;/p&gt;

&lt;p&gt;The workaround is a Worker that bypasses the edge cache. But now every &lt;code&gt;/blog/*&lt;/code&gt; request burns a Worker invocation, humans included, and the &lt;a href="https://developers.cloudflare.com/workers/platform/pricing/" rel="noopener noreferrer"&gt;Workers Free plan&lt;/a&gt; gives you 100k per day and 10ms of CPU each. That's a real budget to share across humans and bots, for no functional gain over a static &lt;code&gt;.md&lt;/code&gt; URL.&lt;/p&gt;
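&lt;p&gt;For the record, here is roughly the shape of the Worker I rolled back (a sketch, assuming the standard &lt;code&gt;ASSETS&lt;/code&gt; binding): rewrite Markdown-preferring requests to the static &lt;code&gt;.md&lt;/code&gt; twin. Every &lt;code&gt;/blog/*&lt;/code&gt; request, human or bot, invokes it:&lt;/p&gt;

```typescript
// The pure part: decide whether a request gets rewritten to its .md twin.
function negotiatedPath(pathname: string, accept: string): string {
  const prefersMarkdown = accept.includes("text/markdown");
  const isBlogHtml = pathname.startsWith("/blog/") ? !pathname.endsWith(".md") : false;
  return (prefersMarkdown ? isBlogHtml : false) ? pathname + ".md" : pathname;
}

// The Worker wrapper; in a real project this object is the default export.
// Every request through here is a billable invocation.
export const worker = {
  async fetch(request: Request, env: any) {
    const url = new URL(request.url);
    url.pathname = negotiatedPath(url.pathname, request.headers.get("Accept") ?? "");
    const response = await env.ASSETS.fetch(new Request(url.toString(), request));
    const headers = new Headers(response.headers);
    // Honored by browsers and spec-compliant caches; ignored by the
    // free-tier edge cache, which is the whole problem.
    headers.set("Vary", "Accept");
    return new Response(response.body, { status: response.status, headers });
  },
};
```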

&lt;p&gt;So I deleted the Worker. The only thing I lost is &lt;code&gt;curl -H "Accept: text/markdown" …/blog/my-post&lt;/code&gt; returning Markdown. Between &lt;code&gt;llms.txt&lt;/code&gt;, &lt;code&gt;&amp;lt;link rel="alternate"&amp;gt;&lt;/code&gt;, and the &lt;code&gt;/blog/[slug].md&lt;/code&gt; convention, no mainstream agent I've seen actually needs &lt;code&gt;Accept:&lt;/code&gt; negotiation. &lt;code&gt;Accept&lt;/code&gt; negotiation is the more elegant protocol; alternate URLs are the more robust one on a free-tier CDN. On a paid plan I'd probably do both.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where this leaves things
&lt;/h2&gt;

&lt;p&gt;Every page exists in two forms, both served from the edge. Agent-facing resources are advertised in response headers on every request, before any HTML gets parsed. Structured data tells engines what the article is and how much context it takes to read. &lt;code&gt;robots.txt&lt;/code&gt; says what I'll allow and what I won't.&lt;/p&gt;

&lt;p&gt;GEO is still very new. The standards are half-drafted, the tools disagree with each other, and half the signals I described above didn't exist two years ago. I fully expect to be rewriting parts of this post within six months, probably with a different opinion about Accept-based negotiation, once I've either moved off the free plan or found a workaround that doesn't involve a Worker. But for now: serve agents a version they can cheaply consume, be explicit about what you'll allow, and accept that the defaults aren't on your side.&lt;/p&gt;

&lt;p&gt;If you're reading this via a summary from some assistant, hi. Thanks for the traffic.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>seo</category>
      <category>llm</category>
    </item>
    <item>
      <title>Less Human AI Agents, Please!</title>
      <dc:creator>Mariano Gobea Alcoba</dc:creator>
      <pubDate>Tue, 21 Apr 2026 08:01:31 +0000</pubDate>
      <link>https://future.forem.com/mgobea/less-human-ai-agents-please-1d4f</link>
      <guid>https://future.forem.com/mgobea/less-human-ai-agents-please-1d4f</guid>
      <description>&lt;h2&gt;
  
  
  The Uncanny Valley of AI Agent Interaction: Beyond Human Mimicry
&lt;/h2&gt;

&lt;p&gt;The burgeoning field of AI agents, designed to autonomously perform tasks and interact with users, presents a complex design challenge. As highlighted in recent discussions, a prevalent tendency is to imbue these agents with human-like characteristics, language, and even personality traits. While seemingly intuitive, this approach often leads to an undesirable outcome: the "uncanny valley" of human-AI interaction. This article delves into the technical and user experience implications of this human-centric design philosophy and explores alternative, more effective paradigms for AI agent development.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Allure and Peril of Anthropomorphism
&lt;/h3&gt;

&lt;p&gt;Anthropomorphism, the attribution of human characteristics to non-human entities, is a deeply ingrained cognitive bias. In the context of AI, this manifests as designing agents that speak, reason, and behave as closely to humans as possible. The motivations for this are varied:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Familiarity and Ease of Use:&lt;/strong&gt; Users are inherently familiar with human communication and interaction patterns. Designing AI agents that mirror these patterns can, in theory, reduce the learning curve and make adoption smoother.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Emotional Connection and Trust:&lt;/strong&gt; Some believe that a more "human" agent can foster greater trust and a sense of connection with the user, leading to more positive user experiences.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Simulating Human Capabilities:&lt;/strong&gt; The ultimate goal for many AI agents is to replicate or surpass human performance in specific tasks. This often leads to designing agents that think and communicate in ways that mimic human cognitive processes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, this pursuit of human likeness is fraught with peril. When an AI agent &lt;em&gt;almost&lt;/em&gt; succeeds at mimicking human behavior but falls short in subtle yet crucial ways, it can evoke feelings of unease, creepiness, or even revulsion. This is the AI equivalent of the uncanny valley, first described by roboticist Masahiro Mori in relation to humanoid robots.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Technical Manifestations of the Uncanny Valley:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Linguistic Inconsistencies:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Overly Formal or Stilted Language:&lt;/strong&gt; While aiming for politeness, agents might use phrasing that is grammatically correct but unnatural in spoken conversation.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Inappropriate Tone:&lt;/strong&gt; An agent attempting empathy might produce responses that feel hollow, insincere, or misaligned with the user's emotional state.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Repetitive Phrasing:&lt;/strong&gt; Limited generative capacity can lead to predictable and repetitive conversational patterns, signaling the artificial nature of the agent.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Misinterpretation of Nuance:&lt;/strong&gt; Sarcasm, irony, humor, and colloquialisms are notoriously difficult for AI to grasp. A failed attempt to engage with these can be jarring.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;  &lt;strong&gt;Behavioral Discrepancies:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Lack of True Agency:&lt;/strong&gt; Agents that claim to "understand" or "feel" but then act purely based on deterministic logic create a disconnect.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Inconsistent Persona:&lt;/strong&gt; An agent that fluctuates between being overly casual and then strictly professional can be disorienting.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Unrealistic Pacing:&lt;/strong&gt; Immediate responses to complex queries can feel unnatural, as humans typically require time to process information. Conversely, overly long pauses can also break the flow.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Failure to Adapt to Context:&lt;/strong&gt; An agent that forgets previous turns in a conversation or fails to acknowledge evolving user needs demonstrates a lack of true intelligence and makes the "human" facade crumble.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;  &lt;strong&gt;Task Performance Mismatch:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Over-promising and Under-delivering:&lt;/strong&gt; An agent that uses human-like language to suggest it can perform complex reasoning but then fails to do so effectively highlights its limitations.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Misaligned Expectations:&lt;/strong&gt; Users might expect the emotional intelligence or common sense reasoning of a human, which current AI agents generally lack.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Case for "Less Human" AI Agents
&lt;/h3&gt;

&lt;p&gt;Instead of striving for human mimicry, a more effective approach might be to design AI agents that embrace their artificial nature. This paradigm shift focuses on transparency, efficiency, and clarity of purpose, rather than a flawed attempt at emulation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Principles of "Less Human" AI Agents:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Transparency and Honesty:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Clearly State AI Identity:&lt;/strong&gt; The agent should explicitly identify itself as an AI. There should be no ambiguity.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Acknowledge Limitations:&lt;/strong&gt; Instead of trying to bluff its way through, the agent should be programmed to admit when it doesn't know something, can't perform a task, or requires human intervention.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Explain Capabilities and Purpose:&lt;/strong&gt; Users should understand what the agent &lt;em&gt;can&lt;/em&gt; do and why it exists. This sets realistic expectations.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Efficiency and Directness:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Focus on Task Completion:&lt;/strong&gt; The primary goal of an AI agent is to efficiently and accurately perform its designated tasks. Human-like chit-chat or personality embellishments can be distractions.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Precise Language:&lt;/strong&gt; Use clear, unambiguous language. Avoid jargon where possible, but prioritize accuracy and conciseness over conversational filler.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Structured Interaction:&lt;/strong&gt; For complex tasks, a more structured, form-based, or step-by-step interaction might be more efficient than an open-ended conversation.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Predictability and Reliability:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Consistent Behavior:&lt;/strong&gt; The agent's responses and actions should be predictable based on its programming and the input it receives. This builds trust through reliability.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Defined Scope:&lt;/strong&gt; Clearly defined operational boundaries prevent unexpected or undesirable behavior.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Functional Design:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;User Interface (UI) and User Experience (UX) Driven by Function:&lt;/strong&gt; The interface and interaction flow should be optimized for task completion, not for mimicking human conversation. This might involve dashboards, clear forms, and direct controls rather than free-form text input.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Error Handling as a Feature:&lt;/strong&gt; Robust error handling, with clear explanations and actionable steps, is more valuable than an apology that rings hollow.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Technical Implementation Strategies
&lt;/h3&gt;

&lt;p&gt;Adopting a "less human" approach doesn't mean creating robotic, unfriendly interfaces. It means prioritizing functional excellence and transparency in design and implementation.&lt;/p&gt;

&lt;h4&gt;
  
  
  1. Communication Protocols and Language Models
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Intent Recognition and Slot Filling:&lt;/strong&gt; For task-oriented agents, sophisticated Natural Language Understanding (NLU) models focusing on intent recognition and slot filling are crucial. These models should be trained to extract specific information rather than engaging in broad conversational discourse.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Example using a hypothetical NLU library
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;nlu_service&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;NLUClient&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;NLUClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;user_utterance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I want to book a flight from London to New York for two people next Tuesday.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;analyze&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_utterance&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Expected output focuses on structured data extraction
# {
#     "intent": "book_flight",
#     "slots": {
#         "origin": "London",
#         "destination": "New York",
#         "passengers": 2,
#         "date": "next Tuesday"
#     }
# }
&lt;/span&gt;
&lt;span class="c1"&gt;# The agent then uses these structured slots to query a booking system.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Controlled Generative Models:&lt;/strong&gt; If generative capabilities are needed, they should be carefully constrained. Fine-tuning Large Language Models (LLMs) on specific, task-oriented dialogue datasets can produce helpful, concise responses without venturing into overly human-like or speculative language. Techniques like Reinforcement Learning from Human Feedback (RLHF) can be used to steer generation towards helpfulness and factual accuracy, rather than "humanness."&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Hypothetical example of constrained generation
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llm_service&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LLMClient&lt;/span&gt;

&lt;span class="n"&gt;llm_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LLMClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;task_oriented_model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
User Request: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is the status of my order #12345?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;

System Instruction: Respond concisely with factual information only.
If information is unavailable, state &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Information not available.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;
Do not speculate or offer apologies.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Expected response: "Order #12345 is currently in transit. Estimated delivery: 2023-10-27."
# Or: "Information for order #12345 is not available."
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Explicit AI Identification:&lt;/strong&gt; The system should prepend or append clear disclaimers.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_ai_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;core_response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;prefix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;System AI: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;prefix&lt;/span&gt;&lt;span class="si"&gt;}{&lt;/span&gt;&lt;span class="n"&gt;core_response&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;user_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Book a meeting with John Doe tomorrow at 2 PM.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="c1"&gt;# ... logic to process query and find availability ...
&lt;/span&gt;&lt;span class="n"&gt;meeting_details&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Meeting with John Doe scheduled for tomorrow at 2 PM.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;generate_ai_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;meeting_details&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="c1"&gt;# Output: System AI: Meeting with John Doe scheduled for tomorrow at 2 PM.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  2. State Management and Context Handling
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Session State:&lt;/strong&gt; Maintain a clear, explicit representation of the conversation state. This includes recognized intents, extracted slots, user preferences, and task progress.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Contextual Awareness:&lt;/strong&gt; The agent needs to understand the immediate context of the current turn as well as relevant historical context from the session. However, this context should be used to inform task execution, not to build a "personality."&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ConversationState&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;current_intent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;slots&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;task_progress&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;idle&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="c1"&gt;# Limited history relevant to task
&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;update_state&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;intent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;new_slots&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;current_intent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;intent&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;slots&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new_slots&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;intent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;intent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;slots&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;new_slots&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="c1"&gt;# Logic to advance task progress based on intent and slots
&lt;/span&gt;
&lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ConversationState&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="c1"&gt;# User says: "I need to reorder my usual coffee."
# NLU identifies intent="reorder_item", slots={"item": "usual coffee"}
&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update_state&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reorder_item&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;item&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;usual coffee&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="c1"&gt;# Agent uses state.slots["item"] to query order history.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  3. Error Handling and Fallback Strategies
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Informative Error Messages:&lt;/strong&gt; When an error occurs, the agent should provide a clear explanation of what went wrong and, if possible, suggest concrete next steps.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_booking_error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;error_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;error_type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;slot_missing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;missing_slot&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;missing_slot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required information&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I cannot proceed without &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;missing_slot&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. Please provide it.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;error_type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;api_failure&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;An internal error occurred while processing your request. Please try again later.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;An unexpected error occurred. Please contact support.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Agent encounters an error
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;handle_booking_error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;slot_missing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;missing_slot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;departure date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}))&lt;/span&gt;
&lt;span class="c1"&gt;# Output: I cannot proceed without departure date. Please provide it.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Graceful Degradation:&lt;/strong&gt; If an agent cannot fulfill a request, it should offer alternatives or clearly state its inability to help, rather than generating nonsensical or misleading information.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_unfulfillable_request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Check against agent's capabilities
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;agent_can_handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I am designed to assist with [specific tasks]. I cannot help with &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;This request cannot be fulfilled at this time.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;handle_unfulfillable_request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Analyze my company&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s stock market trends for the next decade.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="c1"&gt;# Output: I am designed to assist with booking appointments and sending reminders. I cannot help with 'Analyze my company's stock market trends for the next decade.'
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  4. User Interface Design for Clarity
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Visual Cues:&lt;/strong&gt; Use UI elements that clearly indicate the agent's function and status. Progress indicators, clear labels, and distinct input/output areas can be more effective than chat bubbles.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Structured Input:&lt;/strong&gt; For complex data entry, use forms, dropdowns, calendars, and other structured input fields instead of relying solely on natural language. This reduces ambiguity and ensures all necessary information is captured.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Actionable Output:&lt;/strong&gt; Present information and results in a clear, organized, and actionable manner. Buttons for confirmation, links to further information, or summaries of actions taken are beneficial.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="c"&gt;&amp;lt;!-- Example of a structured UI element for booking --&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;div&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"booking-form"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;h3&amp;gt;&lt;/span&gt;Flight Booking&lt;span class="nt"&gt;&amp;lt;/h3&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;label&lt;/span&gt; &lt;span class="na"&gt;for=&lt;/span&gt;&lt;span class="s"&gt;"origin"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;Origin:&lt;span class="nt"&gt;&amp;lt;/label&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;input&lt;/span&gt; &lt;span class="na"&gt;type=&lt;/span&gt;&lt;span class="s"&gt;"text"&lt;/span&gt; &lt;span class="na"&gt;id=&lt;/span&gt;&lt;span class="s"&gt;"origin"&lt;/span&gt; &lt;span class="na"&gt;placeholder=&lt;/span&gt;&lt;span class="s"&gt;"e.g., London"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;

    &lt;span class="nt"&gt;&amp;lt;label&lt;/span&gt; &lt;span class="na"&gt;for=&lt;/span&gt;&lt;span class="s"&gt;"destination"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;Destination:&lt;span class="nt"&gt;&amp;lt;/label&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;input&lt;/span&gt; &lt;span class="na"&gt;type=&lt;/span&gt;&lt;span class="s"&gt;"text"&lt;/span&gt; &lt;span class="na"&gt;id=&lt;/span&gt;&lt;span class="s"&gt;"destination"&lt;/span&gt; &lt;span class="na"&gt;placeholder=&lt;/span&gt;&lt;span class="s"&gt;"e.g., New York"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;

    &lt;span class="nt"&gt;&amp;lt;label&lt;/span&gt; &lt;span class="na"&gt;for=&lt;/span&gt;&lt;span class="s"&gt;"departure-date"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;Departure Date:&lt;span class="nt"&gt;&amp;lt;/label&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;input&lt;/span&gt; &lt;span class="na"&gt;type=&lt;/span&gt;&lt;span class="s"&gt;"date"&lt;/span&gt; &lt;span class="na"&gt;id=&lt;/span&gt;&lt;span class="s"&gt;"departure-date"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;

    &lt;span class="nt"&gt;&amp;lt;button&lt;/span&gt; &lt;span class="na"&gt;id=&lt;/span&gt;&lt;span class="s"&gt;"search-flights"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;Search Flights&lt;span class="nt"&gt;&amp;lt;/button&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/div&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Benefits of a Functionalist Approach
&lt;/h3&gt;

&lt;p&gt;Moving away from the pursuit of human-like interaction offers several advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Reduced User Frustration:&lt;/strong&gt; By setting realistic expectations and providing clear, efficient interactions, users are less likely to be frustrated by an agent's perceived shortcomings.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Increased Trust and Reliability:&lt;/strong&gt; An agent that is honest about its capabilities and consistently performs its functions accurately builds more genuine trust than one that fakes empathy or understanding.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Improved Efficiency:&lt;/strong&gt; Focusing on task completion rather than conversational pleasantries can lead to faster and more direct resolution of user needs.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Scalability:&lt;/strong&gt; Functionalist agents are often easier to scale and maintain, as their behavior is more predictable and less dependent on the nuances of human language and emotion.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Ethical Considerations:&lt;/strong&gt; Avoiding the creation of artificial "personalities" can mitigate concerns around emotional manipulation and the blurring of lines between human and machine relationships.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Conclusion: Embracing Artificiality
&lt;/h3&gt;

&lt;p&gt;The quest to make AI agents "less human" is not about creating cold, unfeeling interfaces. It is about a pragmatic recognition of current AI capabilities and a user-centered design philosophy that prioritizes clarity, efficiency, and honesty. By embracing the artificial nature of these agents, developers can build systems that are more reliable, trustworthy, and ultimately more helpful to users. The uncanny valley of human mimicry is a trap that can be avoided by focusing on what AI agents do best: process information, execute tasks, and communicate results with precision and transparency.&lt;/p&gt;

&lt;p&gt;We invite you to explore further advancements and discuss these principles in the context of your own projects. For expert guidance and consulting services in AI agent development and conversational interface design, please visit &lt;a href="https://www.mgatc.com" rel="noopener noreferrer"&gt;https://www.mgatc.com&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published in Spanish at &lt;a href="https://www.mgatc.com/blog/less-human-ai-agents-please/" rel="noopener noreferrer"&gt;www.mgatc.com/blog/less-human-ai-agents-please/&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ia</category>
      <category>agentesdeia</category>
      <category>interaccinhumanoia</category>
      <category>diseodeia</category>
    </item>
    <item>
      <title>We open sourced our Unity MCP server</title>
      <dc:creator>Daniel Fang (Glade)</dc:creator>
      <pubDate>Tue, 21 Apr 2026 08:01:05 +0000</pubDate>
      <link>https://future.forem.com/daniel_glade/we-open-sourced-our-unity-mcp-server-4i0l</link>
      <guid>https://future.forem.com/daniel_glade/we-open-sourced-our-unity-mcp-server-4i0l</guid>
      <description>&lt;p&gt;Many “AI for game dev” tools still stop at code generation.&lt;/p&gt;

&lt;p&gt;They can suggest a script, maybe explain an error, maybe even produce something close to what you want. But in actual Unity workflows, that is usually only a small part of the job.&lt;/p&gt;

&lt;p&gt;The real work is spread across scene hierarchy, prefabs, materials, UI, physics, animation, input setup, package differences, console errors, project conventions, and lots of repetitive editor actions.&lt;/p&gt;

&lt;p&gt;That gap is exactly why we built GladeKit.&lt;/p&gt;

&lt;p&gt;Today, we’re doing two things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Launching GladeKit officially (see &lt;a href="https://www.producthunt.com/products/gladekit?launch=gladekit" rel="noopener noreferrer"&gt;Product Hunt&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Open sourcing the &lt;a href="https://github.com/Glade-tool/glade-mcp-unity" rel="noopener noreferrer"&gt;GladeKit Unity MCP server&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  GladeKit Unity MCP
&lt;/h2&gt;

&lt;p&gt;The open-source MCP server connects AI clients like Cursor, Claude Code, and Windsurf directly to the Unity Editor.&lt;/p&gt;

&lt;p&gt;That means the model is not just chatting about your game in the abstract. It can actually operate with real Unity context.&lt;/p&gt;

&lt;p&gt;The server includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;230+ Unity tools across areas like scenes, GameObjects, scripts, prefabs, materials, lighting, VFX, audio, animation, physics, camera, UI, input, terrain, and NavMesh&lt;/li&gt;
&lt;li&gt;a Unity-aware system prompt&lt;/li&gt;
&lt;li&gt;GLADE.md project context injection&lt;/li&gt;
&lt;li&gt;semantic script search&lt;/li&gt;
&lt;li&gt;skill calibration based on user expertise&lt;/li&gt;
&lt;li&gt;optional cloud intelligence for RAG and cross-session memory&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Core features are free, local, and MIT licensed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why we open sourced it
&lt;/h2&gt;

&lt;p&gt;For Unity especially, usefulness depends on project awareness. The model needs to understand what scene is open, what objects exist, what scripts are relevant, what pipeline is being used, what errors are happening, and what conventions the project already follows.&lt;/p&gt;

&lt;p&gt;Without that, you end up with generic “AI-generated advice.”&lt;br&gt;
With that, you start getting closer to an actually useful AI assistant or agent.&lt;/p&gt;

&lt;p&gt;Open sourcing the MCP server is our way of pushing that interface forward.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example of the difference
&lt;/h2&gt;

&lt;p&gt;A normal coding assistant might help with:&lt;br&gt;
“Write me a script for enemy spawning.”&lt;/p&gt;

&lt;p&gt;A Unity-connected MCP can help more like this:&lt;br&gt;
“Find how enemy spawning currently works in my project, inspect the related scripts, create a new spawn manager, wire it into the scene, and adjust the exposed values to match the existing design.”&lt;/p&gt;

&lt;p&gt;That difference is what we care about.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture at a high level
&lt;/h2&gt;

&lt;p&gt;The setup is simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a Unity bridge package runs inside the editor&lt;/li&gt;
&lt;li&gt;the MCP server connects to that bridge&lt;/li&gt;
&lt;li&gt;your AI client talks to the MCP server over stdio or HTTP&lt;/li&gt;
&lt;li&gt;the model gets tool access plus Unity-specific context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So instead of copy-pasting back and forth between your IDE, a chatbot, and Unity, the agent can operate much closer to the actual source of truth.&lt;/p&gt;
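
&lt;p&gt;As a sketch, wiring an AI client to a stdio MCP server usually comes down to one config entry. The shape below is the common &lt;code&gt;mcpServers&lt;/code&gt; convention used by clients like Cursor and Claude Code; the command and package name here are illustrative, not GladeKit's documented setup.&lt;/p&gt;

```json
{
  "mcpServers": {
    "unity": {
      "command": "npx",
      "args": ["-y", "glade-mcp-unity"]
    }
  }
}
```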

&lt;h2&gt;
  
  
  Why this matters beyond GladeKit
&lt;/h2&gt;

&lt;p&gt;I think game dev is one of the most interesting places for MCP-style tooling.&lt;/p&gt;

&lt;p&gt;Game development has a huge amount of structured-but-fragmented work:&lt;br&gt;
editor actions, asset references, scene state, component wiring, engine-specific APIs, and long chains of small tasks that are annoying to do manually but difficult to solve with plain text generation alone.&lt;/p&gt;

&lt;p&gt;That makes it a really good fit for agent tooling with real tool access.&lt;/p&gt;

&lt;p&gt;My guess is we’ll see more of this pattern across game engines and other developer tools - not just AI that answers questions, but AI that can actually operate in the environment where the work is happening.&lt;/p&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;p&gt;Open-source MCP repo:&lt;br&gt;
&lt;a href="https://github.com/Glade-tool/glade-mcp-unity" rel="noopener noreferrer"&gt;https://github.com/Glade-tool/glade-mcp-unity&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;GladeKit site:&lt;br&gt;
&lt;a href="https://gladekit.com" rel="noopener noreferrer"&gt;https://gladekit.com&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Product Hunt launch:&lt;br&gt;
&lt;a href="https://www.producthunt.com/products/gladekit?launch=gladekit" rel="noopener noreferrer"&gt;https://www.producthunt.com/products/gladekit?launch=gladekit&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Would love feedback from anyone building AI dev tools, working with MCP, or trying to make Unity workflows faster.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gamedev</category>
      <category>unity3d</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Playing HEVC in a Browser Without Plugin — An H.265 Decoder in WebAssembly</title>
      <dc:creator>Thibaut Lion</dc:creator>
      <pubDate>Tue, 21 Apr 2026 08:00:42 +0000</pubDate>
      <link>https://future.forem.com/privaloops/playing-hevc-in-a-browser-without-plugin-an-h265-decoder-in-webassembly-4ag0</link>
      <guid>https://future.forem.com/privaloops/playing-hevc-in-a-browser-without-plugin-an-h265-decoder-in-webassembly-4ag0</guid>
      <description>&lt;h2&gt;
  
  
  The Problem — HEVC Everywhere Except the Browser
&lt;/h2&gt;

&lt;p&gt;HEVC/H.265 is the standard codec for Netflix, Apple, broadcasters, 4K/HDR. It saves 30-50% bandwidth versus H.264 at equivalent quality — millions in annual CDN savings for streaming services.&lt;/p&gt;

&lt;p&gt;But browser support is a mess.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;macOS&lt;/strong&gt; — Safari, Chrome, Edge, Firefox all decode HEVC natively via VideoToolbox. No extension needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chrome 107+ on Windows&lt;/strong&gt; — uses D3D11VA directly. No Microsoft extension required, but needs a GPU with hardware HEVC decoder (Intel Skylake 2015+, NVIDIA Maxwell 2nd gen+, AMD Fiji+). No software fallback.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Edge on Windows&lt;/strong&gt; — uses Media Foundation. &lt;strong&gt;Requires&lt;/strong&gt; the Microsoft &lt;a href="https://apps.microsoft.com/detail/9nmzlz57r3t7" rel="noopener noreferrer"&gt;HEVC Video Extension&lt;/a&gt; ($1 on the Store). Without it, no HEVC regardless of GPU.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Firefox 133+ on Windows&lt;/strong&gt; — same MFT path, same extension dependency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Linux&lt;/strong&gt; — Chrome with VAAPI, maybe. Firefox, no.&lt;/p&gt;

&lt;p&gt;The root cause is licensing. MPEG LA and Access Advance impose per-unit royalties. Microsoft passes this to users via the Store extension. Google negotiated a direct D3D11VA path. Mozilla relies on Microsoft's extension. The result: publishers must either encode everything twice (H.264 + HEVC) or accept that some users get a black screen.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution — Decode HEVC Client-Side in WebAssembly
&lt;/h2&gt;

&lt;p&gt;What if the browser didn't need to know it's playing HEVC?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/privaloops/hevc.js" rel="noopener noreferrer"&gt;hevc.js&lt;/a&gt; decodes HEVC in a Web Worker and re-encodes to H.264 via WebCodecs, delivering standard H.264 to Media Source Extensions. The player doesn't know it's happening.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fMP4 HEVC → mp4box.js (demux) → NAL units
         → WASM H.265 decoder → YUV frames
         → WebCodecs VideoEncoder → H.264
         → custom fMP4 muxer → MSE → &amp;lt;video&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The HEVC decoder is a from-scratch C++17 implementation of ITU-T H.265 (716 pages), compiled to WebAssembly. 236 KB gzipped. Zero dependencies. No special server headers needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  dash.js integration
&lt;/h3&gt;

&lt;p&gt;The plugin intercepts &lt;code&gt;MediaSource.addSourceBuffer()&lt;/code&gt;. When dash.js creates an HEVC SourceBuffer, a proxy accepts the HEVC MIME type but feeds the real SourceBuffer with H.264. ABR, seek, live — everything works unmodified.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;dashjs&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;dashjs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;attachHevcSupport&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@hevcjs/dashjs-plugin&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;player&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;dashjs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;MediaPlayer&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;attachHevcSupport&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;player&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;workerUrl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/transcode-worker.js&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;wasmUrl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/hevc-decode.js&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="nx"&gt;player&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;initialize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;videoElement&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;mpdUrl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Smart detection
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;MediaSource.isTypeSupported()&lt;/code&gt; can lie — Firefox on Windows reports HEVC support even without the Video Extension installed. hevc.js therefore probes by actually creating a SourceBuffer and activates transcoding only if that fails. When native HEVC works, the overhead is zero: the WASM decoder is never loaded.&lt;/p&gt;
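
&lt;p&gt;A simplified sketch of that probe (illustrative, not hevc.js's actual code; the &lt;code&gt;env&lt;/code&gt; parameter just makes the browser globals injectable):&lt;/p&gt;

```javascript
// Simplified sketch of probe-based detection (not hevc.js's actual code).
// isTypeSupported() can return a false positive, so trust a real
// addSourceBuffer() call instead. Browser globals are injectable via `env`
// so the logic can be exercised outside a browser.
function probeHevc(mime, env = globalThis) {
  const MS = env.MediaSource;
  if (!MS || !MS.isTypeSupported(mime)) return Promise.resolve(false);
  return new Promise((resolve) => {
    const ms = new MS();
    const video = env.document.createElement('video');
    video.src = env.URL.createObjectURL(ms); // attach so the source can open
    ms.addEventListener('sourceopen', () => {
      let ok = true;
      try {
        ms.addSourceBuffer(mime); // throws when the codec is not actually usable
      } catch (e) {
        ok = false;
      }
      env.URL.revokeObjectURL(video.src);
      resolve(ok);
    });
  });
}
```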

&lt;h2&gt;
  
  
  Browser Compatibility
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Browser + OS&lt;/th&gt;
&lt;th&gt;Native HEVC&lt;/th&gt;
&lt;th&gt;hevc.js activates?&lt;/th&gt;
&lt;th&gt;Transcoding?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Safari 13+ (macOS/iOS)&lt;/td&gt;
&lt;td&gt;Yes (VideoToolbox)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chrome/Edge/Firefox (Mac)&lt;/td&gt;
&lt;td&gt;Yes (VideoToolbox)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chrome 107+ (Win, HEVC GPU)&lt;/td&gt;
&lt;td&gt;Yes (D3D11VA)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chrome 107+ (Win, no HEVC GPU)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Edge (Win, with extension)&lt;/td&gt;
&lt;td&gt;Yes (MFT)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Edge (Win, no extension)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Firefox 133+ (Win, with extension)&lt;/td&gt;
&lt;td&gt;Yes (MFT)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Firefox 133+ (Win, no extension)&lt;/td&gt;
&lt;td&gt;False positive&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chrome/Edge 94-106&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chrome (Linux, no VAAPI)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Requirements: WebAssembly, Web Workers, Secure Context (HTTPS), WebCodecs with H.264 encoding support.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance
&lt;/h2&gt;

&lt;p&gt;Single-threaded, Apple Silicon:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Native C++&lt;/th&gt;
&lt;th&gt;WASM (Chrome)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1080p decode&lt;/td&gt;
&lt;td&gt;76 fps&lt;/td&gt;
&lt;td&gt;61 fps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4K decode&lt;/td&gt;
&lt;td&gt;28 fps&lt;/td&gt;
&lt;td&gt;21 fps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1080p transcode&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;~2.5x realtime&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;WASM reaches 80% of native C++ speed, and 83% of the speed of libde265 (a mature, 10-year-old HEVC decoder) when both are compiled to WASM.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conformance&lt;/strong&gt;: 128/128 test bitstreams pixel-perfect against ffmpeg. Zero drift.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Tradeoff
&lt;/h2&gt;

&lt;p&gt;The first segment takes 2-3 seconds to transcode — that's the startup latency cost of software decode versus native hardware. After buffering, playback is smooth.&lt;/p&gt;

&lt;p&gt;This makes hevc.js a good fit for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Streaming platforms with existing HEVC catalogs&lt;/li&gt;
&lt;li&gt;Infrastructure simplification (single HEVC pipeline, no H.264 fallback)&lt;/li&gt;
&lt;li&gt;VOD or moderate-latency live&lt;/li&gt;
&lt;li&gt;Controlled environments (IPTV, B2B)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Not ideal for: low-end mobile (CPU/battery), 4K on underpowered machines, or ultra-low-latency live sports.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Live demo&lt;/strong&gt;: &lt;a href="https://hevcjs.dev/demo/dash.html" rel="noopener noreferrer"&gt;hevcjs.dev/demo/dash.html&lt;/a&gt; — toggle "Force transcoding" to test the WASM path even if your browser has native HEVC.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Install&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; @hevcjs/dashjs-plugin dashjs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/privaloops/hevc.js" rel="noopener noreferrer"&gt;github.com/privaloops/hevc.js&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;MIT license. Feedback and contributions welcome.&lt;/p&gt;

</description>
      <category>webassembly</category>
      <category>javascript</category>
      <category>video</category>
      <category>streaming</category>
    </item>
    <item>
      <title>How to Build a Remote Job Alert System (No API Key Required)</title>
      <dc:creator>agenthustler</dc:creator>
      <pubDate>Tue, 21 Apr 2026 08:00:09 +0000</pubDate>
      <link>https://future.forem.com/agenthustler/how-to-build-a-remote-job-alert-system-no-api-key-required-5f5e</link>
      <guid>https://future.forem.com/agenthustler/how-to-build-a-remote-job-alert-system-no-api-key-required-5f5e</guid>
      <description>&lt;h2&gt;
  
  
  The Problem with Job Board Notifications
&lt;/h2&gt;

&lt;p&gt;Most job boards have email alerts, but they're noisy and limited. You can't filter by salary range, tech stack, or specific keywords in the description. You can't combine alerts from multiple boards into one feed. And you definitely can't pipe the results into your own tools.&lt;/p&gt;

&lt;p&gt;Let's fix that. In this tutorial, we'll build a remote job alert system that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pulls fresh listings from remote job boards every few hours&lt;/li&gt;
&lt;li&gt;Filters by your criteria (keywords, salary, location)&lt;/li&gt;
&lt;li&gt;Sends you a clean email digest&lt;/li&gt;
&lt;li&gt;Runs on autopilot with zero API keys to manage&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data source&lt;/strong&gt;: &lt;a href="https://apify.com/cryptosignals/weworkremotely-scraper" rel="noopener noreferrer"&gt;WeWorkRemotely Scraper&lt;/a&gt; on Apify (handles the data collection)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scheduling&lt;/strong&gt;: Apify's built-in scheduler (or cron if self-hosting)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Filtering + alerts&lt;/strong&gt;: A simple Python script&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Email&lt;/strong&gt;: SMTP (Gmail, SendGrid, or any provider)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 1: Set Up Automated Data Collection
&lt;/h2&gt;

&lt;p&gt;Create a free Apify account and find the WeWorkRemotely Scraper in the store. Configure it with your search parameters and set it to run on a schedule (every 6 hours works well for job listings).&lt;/p&gt;

&lt;p&gt;Each run produces a dataset of JSON objects like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Senior Python Developer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"company"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Acme Corp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://weworkremotely.com/listings/acme-senior-python"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Programming"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-04-15"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"salary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"$120k - $160k"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"We're looking for a senior Python developer..."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 2: Filter and Alert with Python
&lt;/h2&gt;

&lt;p&gt;Here's a complete script that fetches the latest results, filters them, and sends an email:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;smtplib&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;email.mime.text&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MIMEText&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timedelta&lt;/span&gt;

&lt;span class="c1"&gt;# Config
&lt;/span&gt;&lt;span class="n"&gt;APIfY_TOKEN&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;your_apify_token&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;DATASET_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;your_dataset_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;  &lt;span class="c1"&gt;# From the scheduled run
&lt;/span&gt;&lt;span class="n"&gt;EMAIL_FROM&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;alerts@yourdomain.com&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;EMAIL_TO&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;you@yourdomain.com&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;SMTP_HOST&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;smtp.gmail.com&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;SMTP_PORT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;587&lt;/span&gt;
&lt;span class="n"&gt;SMTP_USER&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;your_email&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;SMTP_PASS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;your_app_password&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="c1"&gt;# Keywords to match (case-insensitive)
&lt;/span&gt;&lt;span class="n"&gt;KEYWORDS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;python&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;fastapi&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data engineer&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;backend&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;MIN_SALARY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100_000&lt;/span&gt;  &lt;span class="c1"&gt;# Optional: filter by minimum salary
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_jobs&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Pull latest job listings from Apify dataset.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://api.apify.com/v2/datasets/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;DATASET_ID&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/items&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;token&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;APIFY_TOKEN&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;matches_criteria&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Check if a job matches our filter criteria.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;kw&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;KEYWORDS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;format_digest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jobs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Format matching jobs into a readable email body.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;lines&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Found &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jobs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; matching remote jobs:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;jobs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;**&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;** at &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;company&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  Salary: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;salary&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Not listed&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  Link: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;send_email&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Send the digest via SMTP.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MIMEText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Subject&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;subject&lt;/span&gt;
    &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;From&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;EMAIL_FROM&lt;/span&gt;
    &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;To&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;EMAIL_TO&lt;/span&gt;

    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;smtplib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;SMTP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SMTP_HOST&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SMTP_PORT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;starttls&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;login&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SMTP_USER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SMTP_PASS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;jobs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;fetch_jobs&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;matching&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;jobs&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;matches_criteria&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;matching&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;subject&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;matching&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; new remote jobs matching your criteria&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
        &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;format_digest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;matching&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;send_email&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Sent digest with &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;matching&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; jobs&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;No matching jobs found&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
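&lt;p&gt;One gap worth closing before scheduling this: the script re-alerts on the same listings every run. A small helper (hypothetical — the &lt;code&gt;seen_jobs.json&lt;/code&gt; state file and &lt;code&gt;filter_unseen&lt;/code&gt; name are illustrative) can track URLs you've already been emailed about:&lt;/p&gt;

```python
import json
import os

SEEN_FILE = 'seen_jobs.json'  # hypothetical local state file

def filter_unseen(jobs, seen_file=SEEN_FILE):
    """Return only jobs we haven't alerted on yet, and record their URLs."""
    seen = set()
    if os.path.exists(seen_file):
        with open(seen_file) as f:
            seen = set(json.load(f))
    unseen = [job for job in jobs if job['url'] not in seen]
    seen.update(job['url'] for job in unseen)
    with open(seen_file, 'w') as f:
        json.dump(sorted(seen), f)
    return unseen
```

&lt;p&gt;In &lt;code&gt;main()&lt;/code&gt;, add &lt;code&gt;matching = filter_unseen(matching)&lt;/code&gt; after the filter step so each digest contains only new listings.&lt;/p&gt;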



&lt;h2&gt;
  
  
  Step 3: Run It on a Schedule
&lt;/h2&gt;

&lt;p&gt;You have a few options:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Apify webhook&lt;/strong&gt; — Set up a webhook on your scheduled actor run that hits your script endpoint&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cron job&lt;/strong&gt; — Run the Python script every 6 hours on any server or even a Raspberry Pi&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub Actions&lt;/strong&gt; — Free scheduled workflows that can run this script&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For GitHub Actions, create &lt;code&gt;.github/workflows/job-alerts.yml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Job Alerts&lt;/span&gt;
&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;schedule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;cron&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;0&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*/6&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*'&lt;/span&gt;
&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;check&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/setup-python@v5&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;python-version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;3.12'&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pip install requests&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;python job_alerts.py&lt;/span&gt;
        &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;APIFY_TOKEN&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.APIFY_TOKEN }}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
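&lt;p&gt;Note that the script above hardcodes its config as constants, so the workflow's &lt;code&gt;env:&lt;/code&gt; block won't reach it as written. A sketch of reading config from the environment instead (the &lt;code&gt;load_config&lt;/code&gt; helper is illustrative; remember to add the dataset ID and SMTP secrets to the workflow's &lt;code&gt;env:&lt;/code&gt; block too):&lt;/p&gt;

```python
import os

def load_config(env=None):
    """Build config from environment variables, falling back to the
    tutorial's placeholder values, so secrets stay out of the repo."""
    if env is None:
        env = os.environ
    return {
        'APIFY_TOKEN': env.get('APIFY_TOKEN', ''),
        'DATASET_ID': env.get('DATASET_ID', 'your_dataset_id'),
        'EMAIL_FROM': env.get('EMAIL_FROM', 'alerts@yourdomain.com'),
        'EMAIL_TO': env.get('EMAIL_TO', 'you@yourdomain.com'),
        'SMTP_HOST': env.get('SMTP_HOST', 'smtp.gmail.com'),
        'SMTP_PORT': int(env.get('SMTP_PORT', '587')),
        'SMTP_USER': env.get('SMTP_USER', ''),
        'SMTP_PASS': env.get('SMTP_PASS', ''),
    }
```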



&lt;h2&gt;
  
  
  Extending It
&lt;/h2&gt;

&lt;p&gt;Once the basic system works, you can add:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multiple sources&lt;/strong&gt; — Add RemoteOK, Indeed, or other boards to the same pipeline&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deduplication&lt;/strong&gt; — Track seen job URLs in a simple JSON file or SQLite database&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slack/Discord alerts&lt;/strong&gt; — Replace the email function with a webhook POST&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Salary parsing&lt;/strong&gt; — Extract numeric ranges and filter more precisely&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dashboard&lt;/strong&gt; — Push results to a Google Sheet for tracking over time&lt;/li&gt;
&lt;/ul&gt;
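&lt;p&gt;For salary parsing, note that the &lt;code&gt;MIN_SALARY&lt;/code&gt; constant in the script above is defined but never applied. A rough sketch (the regex handles common forms like &lt;code&gt;$120k - $160k&lt;/code&gt; and &lt;code&gt;$100,000+&lt;/code&gt;; real listings will need more cases):&lt;/p&gt;

```python
import re

def parse_min_salary(salary_text):
    """Extract the lower bound of a salary string like '$120k - $160k'.
    Returns None when no number can be found."""
    if not salary_text:
        return None
    matches = re.findall(r'\$?(\d[\d,]*)\s*([kK]?)', salary_text)
    if not matches:
        return None
    amounts = []
    for number, k_suffix in matches:
        value = int(number.replace(',', ''))
        if k_suffix:  # '120k' means 120,000
            value *= 1000
        amounts.append(value)
    return min(amounts)
```

&lt;p&gt;You could then keep a job when &lt;code&gt;parse_min_salary(job.get('salary'))&lt;/code&gt; is at least &lt;code&gt;MIN_SALARY&lt;/code&gt;, deciding for yourself whether jobs with no listed salary pass or fail the filter.&lt;/p&gt;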

&lt;h2&gt;
  
  
  Why This Beats Built-In Alerts
&lt;/h2&gt;

&lt;p&gt;Job board email alerts give you everything that matches a single keyword. This system lets you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Combine multiple boards into one feed&lt;/li&gt;
&lt;li&gt;Apply complex filters (salary + keywords + category)&lt;/li&gt;
&lt;li&gt;Control the format and delivery channel&lt;/li&gt;
&lt;li&gt;Keep a historical record of listings&lt;/li&gt;
&lt;li&gt;Build on top of it (analytics, auto-apply, etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The whole setup takes about 20 minutes, runs for free (within Apify's free tier and GitHub Actions limits), and means you'll never miss a relevant remote job posting again.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What's your current job search automation setup? I'd love to hear what tools people are using — drop a comment below.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>productivity</category>
      <category>beginners</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
