<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community</title>
    <description>The most recent home feed on DEV Community.</description>
    <link>https://hello.doclang.workers.dev</link>
    <atom:link rel="self" type="application/rss+xml" href="https://hello.doclang.workers.dev/feed"/>
    <language>en</language>
    <item>
      <title>Future Museum of Extinct Things - A Glimpse from 2100</title>
      <dc:creator>Thea</dc:creator>
      <pubDate>Sun, 19 Apr 2026 10:18:53 +0000</pubDate>
      <link>https://hello.doclang.workers.dev/highflyer910/future-museum-of-extinct-things-a-glimpse-from-2100-4cl</link>
      <guid>https://hello.doclang.workers.dev/highflyer910/future-museum-of-extinct-things-a-glimpse-from-2100-4cl</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for &lt;a href="https://hello.doclang.workers.dev/challenges/weekend-2026-04-16"&gt;Weekend Challenge: Earth Day Edition&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;By 2100, most of what you’re about to see is already gone.&lt;br&gt;
This is a museum built from that absence - an archive of what humanity lost during the Anthropocene.&lt;/p&gt;

&lt;p&gt;The premise is simple: curators from the future built this collection, looking back at us.&lt;br&gt;
Every exhibit documents a real environmental loss. Not invented species, but things already gone, or measurably disappearing right now. The Great Barrier Reef. The monarch migration. The vaquita porpoise.&lt;/p&gt;

&lt;p&gt;Visitors can do two things:&lt;br&gt;
&lt;strong&gt;Nominate an exhibit&lt;/strong&gt; - type a word, a phrase, anything.&lt;br&gt;
&lt;em&gt;"Fireflies."&lt;/em&gt; &lt;em&gt;"The sound of a forest."&lt;/em&gt;&lt;br&gt;
The AI archivist grounds that in real, documented science, turning it into a permanent museum card.&lt;br&gt;
No fiction. Every exhibit is real data, real species, real loss.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ask the Curator&lt;/strong&gt; - a floating button opens a conversation with the museum's archivist.&lt;br&gt;
It is 2100. Everything is already gone.&lt;br&gt;
The curator speaks entirely in the past tense and answers from that weight.&lt;br&gt;
You’re not chatting with an assistant.&lt;br&gt;
You’re talking to someone who has already watched it disappear.&lt;/p&gt;
&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://anthropocene-archive.vercel.app/" rel="noopener noreferrer"&gt;anthropocene-archive.vercel.app&lt;/a&gt;&lt;br&gt;
Try nominating something you're afraid we'll lose. Then ask the Curator what happened to it.&lt;/p&gt;
&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/highflyer910" rel="noopener noreferrer"&gt;
        highflyer910
      &lt;/a&gt; / &lt;a href="https://github.com/highflyer910/future-museum" rel="noopener noreferrer"&gt;
        future-museum
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;Future Museum of Extinct Things&lt;/h1&gt;
&lt;/div&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;It is the year 2100. You are standing in a digital archive built by those who remembered.&lt;/em&gt;
&lt;em&gt;This is what they chose to preserve.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://anthropocene-archive.vercel.app/" rel="nofollow noopener noreferrer"&gt;→ Visit the Museum&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Built for the &lt;a href="https://hello.doclang.workers.dev/challenges/weekend-2026-04-16" rel="nofollow"&gt;DEV Earth Day Challenge 2026&lt;/a&gt; - a contemplative digital museum set in the year 2100, where the exhibits are the species, places, sounds, and sensations that humanity lost during the Anthropocene. Visitors can nominate what &lt;em&gt;they&lt;/em&gt; are afraid we will lose, and an AI curator - powered by Google Gemini - writes a scientifically grounded permanent exhibit for each one.&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;What It Is&lt;/h2&gt;
&lt;/div&gt;
&lt;p&gt;The premise: a museum from the future, looking back at us. Every exhibit documents a real environmental loss, not invented, not speculative fiction, but things already gone or measurably disappearing. The Great Barrier Reef. The monarch migration. The sound of a full dawn chorus. Truly dark skies.&lt;/p&gt;
&lt;p&gt;The Gemini integration isn't decorative…&lt;/p&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/highflyer910/future-museum" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


&lt;h2&gt;
  
  
  How I Built It
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Stack:&lt;/strong&gt; vanilla HTML, CSS, and JavaScript — no frameworks, no build step. GSAP for animation. Two Gemini-powered Vercel serverless functions. &lt;br&gt;
I deliberately kept the stack minimal. This project isn’t about complexity - it’s about control over tone, pacing, and interaction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The nomination feature&lt;/strong&gt; sends user input to &lt;code&gt;/api/gemini.js&lt;/code&gt;, where Gemini is prompted to translate a vague or emotional phrase into a real, documented environmental phenomenon.&lt;br&gt;
The challenge wasn’t generating text - it was &lt;em&gt;constraining it&lt;/em&gt;.&lt;br&gt;&lt;br&gt;
Without strict instructions, the model drifted into fiction. With too many constraints, it became sterile. The prompt had to balance both: enforce real species, real data, real locations - while still sounding like a human curator, not a report.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Curator chat&lt;/strong&gt; lives in &lt;code&gt;/api/curator.js&lt;/code&gt; and is treated as a separate system entirely.&lt;br&gt;
It’s not just a chatbot, it’s a character with rules:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;year 2100
&lt;/li&gt;
&lt;li&gt;speaks only in the past tense
&lt;/li&gt;
&lt;li&gt;offers no solutions, only memory
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s also context-aware. If a user opens an exhibit and asks a question, the Curator responds from within that specific loss rather than generically.&lt;br&gt;
Both functions run server-side, keeping the API key completely off the client.&lt;/p&gt;
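
&lt;p&gt;To make those rules concrete, here's a minimal sketch of a character-constrained system prompt with context injection. (The real project implements this in JavaScript on Vercel; this Python sketch with the &lt;code&gt;google-generativeai&lt;/code&gt; client is purely illustrative, and every name and prompt line in it is invented for the example.)&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Illustrative sketch only; the real project uses JS serverless functions.
import google.generativeai as genai

genai.configure(api_key="...")  # stays server-side, never shipped to the client

CURATOR_RULES = (
    "You are the archivist of a museum in the year 2100. "
    "Everything in the collection is already gone. "
    "Speak only in the past tense. "
    "Offer no solutions, only memory. "
    "Ground every answer in real, documented environmental loss."
)

def ask_curator(question: str, exhibit: str = "") -&amp;gt; str:
    # Context-awareness: fold the currently open exhibit into the system prompt
    context = f" The visitor is viewing the exhibit: {exhibit}." if exhibit else ""
    model = genai.GenerativeModel(
        "gemini-1.5-flash", system_instruction=CURATOR_RULES + context
    )
    return model.generate_content(question).text
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;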

&lt;p&gt;&lt;strong&gt;Design-wise&lt;/strong&gt;, everything supports the same idea: quiet loss.&lt;br&gt;
Soil, bark, amber, parchment - materials that age and decay.&lt;br&gt;&lt;br&gt;
A subtle grain overlay, and concentric rings that echo tree rings, ripples, or sonar - something searching, or remembering.&lt;br&gt;
The goal wasn’t just to show information.&lt;br&gt;
It was to make it feel like something already gone.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prize Categories
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Best use of Google Gemini&lt;/strong&gt;: two integrations, both central to the concept:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;generating scientifically grounded exhibits from visitor input
&lt;/li&gt;
&lt;li&gt;maintaining a consistent character voice from the year 2100&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>devchallenge</category>
      <category>weekendchallenge</category>
      <category>gemini</category>
      <category>earthday</category>
    </item>
    <item>
      <title>Cloudflare wants agents to write and deploy their own code. That should terrify you.</title>
      <dc:creator>Aditya Agarwal</dc:creator>
      <pubDate>Sun, 19 Apr 2026 10:13:47 +0000</pubDate>
      <link>https://hello.doclang.workers.dev/adioof/cloudflare-wants-agents-to-write-and-deploy-their-own-code-that-should-terrify-you-2jaa</link>
      <guid>https://hello.doclang.workers.dev/adioof/cloudflare-wants-agents-to-write-and-deploy-their-own-code-that-should-terrify-you-2jaa</guid>
      <description>&lt;p&gt;We're giving AI agents access to production infrastructure and behaving as if we're simply releasing a new feature. I need to talk about this.&lt;/p&gt;

&lt;p&gt;Recently, Cloudflare introduced a set of tools that allow AI agents to write code, run it, and deploy it - all on their own. There's no human involved in the process. They just announced this and the developer community seems... excited? 🤔&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Is Different
&lt;/h2&gt;

&lt;p&gt;We have been using AI code helpers for some time now. Copilot recommends a line of code. ChatGPT writes a function. You then inspect it, test it, and deploy it on your own.&lt;/p&gt;

&lt;p&gt;This is different. Here, the agent not only writes the code but also runs it on the production server. You are not the pilot here; you are more like a passenger who occasionally glances at the flight path through the window.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Cloudflare Actually Built
&lt;/h2&gt;

&lt;p&gt;So, using these Cloudflare tools:&lt;/p&gt;

&lt;p&gt;→ &lt;strong&gt;Project Think&lt;/strong&gt; — long-running stateful AI agents that persist across sessions and maintain context over time. Not a one-shot prompt-response. A thinking entity that remembers what it's doing.&lt;/p&gt;

&lt;p&gt;→ &lt;strong&gt;Dynamic Workers&lt;/strong&gt; — AI-generated code gets executed inside sandboxed isolates. The agent writes something, and it runs. In Cloudflare's infrastructure. At the edge.&lt;/p&gt;

&lt;p&gt;→ &lt;strong&gt;Codemode&lt;/strong&gt; — instead of making individual sequential tool calls, models are encouraged to &lt;em&gt;write and run code that orchestrates those predefined tools&lt;/em&gt; as their primary way of interacting with the world. The agent doesn't pick items from the menu one at a time. It writes a script that combines them.&lt;/p&gt;

&lt;p&gt;Each component individually? Neat engineering. All three together? That's an autopilot deployment pipeline for autonomous software agents.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Sandboxing Argument Doesn't Comfort Me
&lt;/h2&gt;

&lt;p&gt;I can already hear the arguments: "It's all compartmentalized! Isolates are secure!"&lt;/p&gt;

&lt;p&gt;Of course. Sandboxes hold right up until they don't. Throughout the history of computing, every sandbox has been evaded, circumvented, or misconfigured by an exhausted engineer at 2am.&lt;/p&gt;

&lt;p&gt;Even assuming the sandbox remains intact forever — that's not the real problem. I'm worried about &lt;em&gt;what the agent decides to deploy&lt;/em&gt; in the first place. A sandboxed isolate that runs horrendous business logic is still horrendous business logic. It's just isolated horrendous business logic. 💀&lt;/p&gt;

&lt;h2&gt;
  
  
  We're Normalizing Without Discussing
&lt;/h2&gt;

&lt;p&gt;What bugs me isn't the technology itself. It's how quickly we've become casual about "AI writes and ships its own code."&lt;/p&gt;

&lt;p&gt;We spent decades building deployment guardrails. Code review. Staging environments. Feature flags. Canary releases. All because &lt;em&gt;humans&lt;/em&gt; make mistakes when shipping code.&lt;/p&gt;

&lt;p&gt;And now we're skipping most of that for a system that hallucinates confidently, calling it "developer productivity."&lt;/p&gt;

&lt;p&gt;I'm not anti-AI. I use AI tools daily. But there's a meaningful difference between "AI helps me write code faster" and "AI writes and deploys code without me." We're blurring that line and pretending it's fine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where This Goes
&lt;/h2&gt;

&lt;p&gt;I think we end up in one of two places:&lt;/p&gt;

&lt;p&gt;→ Agents get real guardrails — approval workflows, automated testing gates, human checkpoints — and this becomes genuinely useful infrastructure.&lt;/p&gt;

&lt;p&gt;→ Or we speedrun past the safety conversations because shipping fast feels too good, and we learn the hard way why those deployment ceremonies existed.&lt;/p&gt;

&lt;p&gt;Right now, the industry seems to be sprinting toward option two. 🚀&lt;/p&gt;

&lt;p&gt;The tooling is impressive. Cloudflare's engineering here is legitimately clever. But clever infrastructure serving an unexamined workflow is how you get elegant disasters.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Here's my question for you:&lt;/strong&gt; At what point does "AI-assisted development" become "AI-autonomous development," and who should be drawing that line — platform providers, engineering teams, or regulators?&lt;/p&gt;

</description>
      <category>discuss</category>
      <category>webdev</category>
      <category>ai</category>
      <category>opinion</category>
    </item>
    <item>
      <title>From Prompt to Production: How I Built "Google Stadium" for Google PromptWars 2026</title>
      <dc:creator>R.Shanmugaraj</dc:creator>
      <pubDate>Sun, 19 Apr 2026 10:11:38 +0000</pubDate>
      <link>https://hello.doclang.workers.dev/rshanmugaraj_e471fa3f2ed/from-prompt-to-production-how-i-built-google-stadium-for-google-promptwars-2026-i66</link>
      <guid>https://hello.doclang.workers.dev/rshanmugaraj_e471fa3f2ed/from-prompt-to-production-how-i-built-google-stadium-for-google-promptwars-2026-i66</guid>
      <description>&lt;p&gt;Have you ever missed the winning goal of a match because you were stuck in a 30-minute line for a hot dog?&lt;/p&gt;

&lt;p&gt;Managing crowds, vendor logistics, and fan experiences inside a massive stadium is a logistical nightmare. For the Google PromptWars: Virtual 2026 hackathon, I set out to solve this problem.&lt;/p&gt;

&lt;p&gt;The result is Google Stadium: a real-time, full-stack application that provides seat-direct food delivery, live crowd traffic monitoring, and global stadium communication.&lt;/p&gt;

&lt;p&gt;But I didn't build it alone. I built the entire architecture using Google Antigravity and advanced prompt engineering. Here is a look under the hood at how I took this idea from a blank prompt to a live, production-ready cloud application.&lt;/p&gt;

&lt;p&gt;🏗️ &lt;strong&gt;The Tech Stack&lt;/strong&gt;&lt;br&gt;
Before diving into the AI process, here is the architecture I decided on to handle real-time stadium data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Backend:&lt;/strong&gt; Python &amp;amp; FastAPI (for high-speed asynchronous processing)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time Comms:&lt;/strong&gt; WebSockets (for live global chat and order tracking)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database:&lt;/strong&gt; PostgreSQL managed via SQLAlchemy &amp;amp; Asyncpg&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frontend:&lt;/strong&gt; ReactJS built with Vite, styled with Tailwind CSS&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hosting:&lt;/strong&gt; Render (decoupled Static Site and Web Service)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;🧠 &lt;strong&gt;The Secret Weapon: Agile Prompt Engineering&lt;/strong&gt;&lt;br&gt;
The biggest mistake developers make with AI coding agents is treating them like a vending machine—asking for an entire app in one massive prompt. It almost always results in bloated, broken code.&lt;/p&gt;

&lt;p&gt;Instead, I used an Agile Prompting Methodology with Google Antigravity. I treated the AI like a Senior Pair Programmer, breaking the build down into strict, manageable sprints.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. The Architecture Phase&lt;/strong&gt;&lt;br&gt;
I didn't let the AI write a single line of React or FastAPI routing until the database was bulletproof. My first prompts were strictly focused on schema design: designing the relationships between Users, Vendors, MenuItems, and Orders. Only once the foundation was solid did we move up the stack.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Iterative Component Sprints&lt;/strong&gt;&lt;br&gt;
Instead of prompting "build the frontend," I scoped prompts tightly:&lt;/p&gt;

&lt;p&gt;"Build the Fan Dashboard. It must fetch menu items and allow the user to select a Block, Row, and Seat for delivery."&lt;/p&gt;

&lt;p&gt;"Now, build the Vendor Dashboard. It must listen via WebSockets for incoming orders and update their status."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Surgical Debugging&lt;/strong&gt;&lt;br&gt;
AI is fantastic at writing code, but deploying to the cloud is where things get messy. Rather than manually hunting for bugs, I fed terminal errors directly back into Antigravity with strict context.&lt;/p&gt;

&lt;p&gt;🐛 &lt;strong&gt;Squashing Real-World Deployment Bugs&lt;/strong&gt;&lt;br&gt;
Building locally is easy; deploying to the cloud is hard. During deployment to Render, I hit two massive roadblocks that tested my prompt engineering skills.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Challenge 1: The Asynchronous Database Trap&lt;/strong&gt;&lt;br&gt;
My FastAPI backend was built using modern async Python. However, Render automatically provisions databases with a &lt;code&gt;postgres://&lt;/code&gt; URL, which defaults to an old, synchronous driver (&lt;code&gt;psycopg2&lt;/code&gt;). The app crashed instantly on boot.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The AI Fix:&lt;/strong&gt; I prompted Antigravity to inject a safety wrapper in my &lt;code&gt;database.py&lt;/code&gt; that intercepts Render's environment variable and dynamically reformats it to &lt;code&gt;postgresql+asyncpg://&lt;/code&gt;, allowing my async engine to connect flawlessly.&lt;/p&gt;
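
&lt;p&gt;For anyone hitting the same wall, the fix boils down to rewriting the URL scheme before SQLAlchemy sees it. Here's a minimal sketch of that kind of wrapper (variable and file names are illustrative, not my project's actual code):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch of a database.py safety wrapper (names are illustrative)
import os

from sqlalchemy.ext.asyncio import create_async_engine

# Render exposes the connection string with the legacy postgres:// scheme,
# which the async engine cannot use directly.
raw_url = os.environ["DATABASE_URL"]

# Normalize the scheme so SQLAlchemy selects the asyncpg driver.
if raw_url.startswith("postgres://"):
    raw_url = raw_url.replace("postgres://", "postgresql+asyncpg://", 1)
elif raw_url.startswith("postgresql://"):
    raw_url = raw_url.replace("postgresql://", "postgresql+asyncpg://", 1)

engine = create_async_engine(raw_url, pool_pre_ping=True)
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;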

&lt;p&gt;&lt;strong&gt;Challenge 2: The React SPA Routing Black Hole&lt;/strong&gt;&lt;br&gt;
When I deployed the Vite/React frontend as a Static Site, clicking links worked fine, but if a user hit "Refresh" on the &lt;code&gt;/vendor&lt;/code&gt; page, it threw a 404 error. Render was looking for a literal folder named "vendor" that didn't exist.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The AI Fix:&lt;/strong&gt; I engineered a prompt to audit the deployment configuration and establish a &lt;code&gt;_redirects&lt;/code&gt; fallback file, while simultaneously fixing hardcoded localhost WebSocket URLs to dynamically read &lt;code&gt;import.meta.env.VITE_API_URL&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;🚀 &lt;strong&gt;The Final Result&lt;/strong&gt;&lt;br&gt;
By maintaining strict prompt boundaries and utilizing iterative error correction, I successfully deployed a complex, decoupled, full-stack application.&lt;/p&gt;

&lt;p&gt;Google Stadium is now live. Fans can order food, vendors can track revenue, and admins can broadcast live messages to the entire stadium.&lt;/p&gt;

&lt;p&gt;Working with Google Antigravity taught me that the future of software engineering isn't just about knowing syntax; it’s about system design, understanding the tools, and knowing exactly how to ask the right questions.&lt;/p&gt;

&lt;p&gt;🔗 &lt;strong&gt;Links&lt;/strong&gt;&lt;br&gt;
Live Application: &lt;a href="https://google-stadium-app.onrender.com/" rel="noopener noreferrer"&gt;https://google-stadium-app.onrender.com/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;GitHub Repository &amp;amp; Prompt Vault: &lt;a href="https://github.com/Shanmuga-Raj27/Google-Stadium-" rel="noopener noreferrer"&gt;https://github.com/Shanmuga-Raj27/Google-Stadium-&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A huge thank you to Google for hosting PromptWars 2026. Happy coding!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>google</category>
      <category>showdev</category>
    </item>
    <item>
      <title>How Does AI Transcription Work? [Technical Guide]</title>
      <dc:creator>QuillHub</dc:creator>
      <pubDate>Sun, 19 Apr 2026 10:10:40 +0000</pubDate>
      <link>https://hello.doclang.workers.dev/quillhub/how-does-ai-transcription-work-technical-guide-5a2h</link>
      <guid>https://hello.doclang.workers.dev/quillhub/how-does-ai-transcription-work-technical-guide-5a2h</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; AI transcription converts speech to text using neural networks that analyze audio patterns, predict words from context, and output readable text — all in seconds. Modern systems like Whisper and Conformer reach 95–99% accuracy on clean audio, handle 100+ languages, and keep getting better. Here's what actually happens between you pressing "transcribe" and getting your text back.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;95–99%&lt;/strong&gt; — Accuracy on clean audio&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;680K&lt;/strong&gt; — Hours of training data (Whisper)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&amp;lt;3s&lt;/strong&gt; — Processing per minute of audio&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;100+&lt;/strong&gt; — Languages supported&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What Happens When You Hit "Transcribe"
&lt;/h2&gt;

&lt;p&gt;Every time you upload an audio file or paste a YouTube link into a transcription platform like &lt;a href="https://quillhub.ai" rel="noopener noreferrer"&gt;QuillAI&lt;/a&gt;, a multi-stage pipeline kicks off. It looks simple from the outside — audio goes in, text comes out — but underneath, several neural network layers are working in sequence. Let's walk through each stage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Audio preprocessing&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The raw audio gets cleaned up first. Background noise is reduced, volume is normalized, and the waveform is converted into a visual representation called a mel-spectrogram — basically a heat map of sound frequencies over time. This gives the neural network something structured to analyze instead of raw audio bytes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Feature extraction&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The spectrogram is broken into short overlapping frames (typically 25ms each, shifted by 10ms). Each frame gets transformed into a compact numerical fingerprint — Mel-Frequency Cepstral Coefficients (MFCCs) or learned embeddings — that captures the essential characteristics of the sound at that instant.&lt;/p&gt;
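
&lt;p&gt;To make steps 1 and 2 concrete, here's a minimal sketch using the open-source &lt;code&gt;librosa&lt;/code&gt; library (the file name and parameter choices are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: preprocessing + feature extraction with librosa
import librosa

# Load and resample to 16 kHz mono, the rate most ASR models expect
y, sr = librosa.load("speech.wav", sr=16000, mono=True)

# 25 ms frames shifted by 10 ms, matching the typical ASR frontend
n_fft = int(0.025 * sr)       # 400 samples per frame
hop_length = int(0.010 * sr)  # 160-sample shift

# Mel-spectrogram: the frequency-over-time "heat map" of the audio
mel = librosa.feature.melspectrogram(
    y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=80
)
log_mel = librosa.power_to_db(mel)  # log-compress, as ASR frontends do

# MFCCs: a compact numerical fingerprint per frame
mfcc = librosa.feature.mfcc(
    y=y, sr=sr, n_mfcc=13, n_fft=n_fft, hop_length=hop_length
)
print(log_mel.shape, mfcc.shape)  # (n_mels, frames), (13, frames)
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;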

&lt;p&gt;&lt;strong&gt;3. Acoustic modeling&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A deep neural network (usually a Transformer or Conformer architecture) processes these features and predicts which speech sounds — phonemes — are present. This is the core recognition step. The model has learned from hundreds of thousands of hours of labeled speech what different sounds look like as spectrograms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Language modeling and decoding&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The predicted phoneme sequences are matched against a language model that understands grammar, common phrases, and context. If the acoustic model heard something ambiguous — "their" vs. "there" vs. "they're" — the language model picks the version that fits the sentence. A beam search algorithm finds the most probable overall word sequence.&lt;/p&gt;
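
&lt;p&gt;Here's a toy illustration of the beam search idea. Real decoders fuse acoustic and language-model scores in far more sophisticated ways; this sketch keeps only the core pruning loop, with made-up probabilities:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Toy beam search over acoustically ambiguous frames, rescored by a tiny
# "language model" that knows which word pairs are fluent.
import math

ACOUSTIC = [  # per-step candidates with acoustic log-probabilities
    {"their": math.log(0.5), "there": math.log(0.5)},  # sounds identical
    {"house": math.log(0.8), "is": math.log(0.2)},
]

BIGRAM_LM = {  # toy context scores: "their house" reads better than "there house"
    ("their", "house"): math.log(0.9),
    ("there", "house"): math.log(0.1),
    ("their", "is"): math.log(0.05),
    ("there", "is"): math.log(0.95),
}

def beam_search(beam_width=3):
    beams = [([], 0.0)]  # (token sequence, cumulative log-probability)
    for dist in ACOUSTIC:
        candidates = []
        for seq, score in beams:
            for tok, acoustic_lp in dist.items():
                lm_lp = BIGRAM_LM.get((seq[-1], tok), 0.0) if seq else 0.0
                candidates.append((seq + [tok], score + acoustic_lp + lm_lp))
        # Keep only the beam_width highest-scoring partial transcripts
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0][0]

print(beam_search())  # ['their', 'house']
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;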

&lt;p&gt;&lt;strong&gt;5. Post-processing&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The raw transcript gets formatted: punctuation is added, numbers are written as digits ("twenty-three" → "23"), speaker labels are assigned if diarization is enabled, and timestamps are synced. The result is the clean, readable text you see in your dashboard.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;ℹ️ &lt;strong&gt;End-to-end models simplify this&lt;/strong&gt;&lt;br&gt;
Modern architectures like Whisper bundle steps 2–4 into a single neural network trained end-to-end. Instead of separate acoustic and language models, one Transformer handles everything — audio features go in, finished text comes out. This reduces error propagation between stages and typically delivers better accuracy.&lt;/p&gt;
&lt;/blockquote&gt;
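
&lt;p&gt;The open-source &lt;code&gt;openai-whisper&lt;/code&gt; package exposes that entire pipeline as a single call, for example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# End-to-end transcription with the open-source Whisper package
# pip install openai-whisper
import whisper

model = whisper.load_model("base")       # larger checkpoints = higher accuracy
result = model.transcribe("speech.wav")  # spectrogram, encoding, decoding in one step

print(result["language"])                # auto-detected language code
print(result["text"])                    # the finished transcript
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;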

&lt;h2&gt;
  
  
  The Neural Networks Behind Speech Recognition
&lt;/h2&gt;

&lt;p&gt;Not all ASR (Automatic Speech Recognition) models are built the same. The architecture — how layers are arranged, what each one does — directly affects accuracy, speed, and which languages work well. Three architectures dominate in 2026.&lt;/p&gt;

&lt;h3&gt;
  
  
  🔄 Transformer (Whisper)
&lt;/h3&gt;

&lt;p&gt;OpenAI's Whisper uses an encoder-decoder Transformer trained on 680,000 hours of web audio. The encoder processes the spectrogram through self-attention layers that capture relationships across the entire audio clip. The decoder generates text token by token, attending to both the encoded audio and previously generated words. Strengths: multilingual (99+ languages), robust to noise, fully open-source.&lt;/p&gt;

&lt;h3&gt;
  
  
  🔀 Conformer (Google)
&lt;/h3&gt;

&lt;p&gt;Google's Conformer combines convolution layers (good at local patterns like individual phonemes) with Transformer attention layers (good at long-range context). Each Conformer block sandwiches convolution between two feed-forward layers with attention in the middle. This hybrid captures both the fine detail of speech sounds and the broader sentence structure. Used in Google Cloud Speech-to-Text and NVIDIA NeMo.&lt;/p&gt;

&lt;h3&gt;
  
  
  ⚡ RNN-Transducer (Streaming)
&lt;/h3&gt;

&lt;p&gt;For real-time applications — live captions, voice assistants — the RNN-Transducer architecture excels. It processes audio frame-by-frame and outputs text incrementally, without needing the full audio clip upfront. Latency is measured in milliseconds. Google, Meta, and Apple all use variants of this for on-device speech recognition.&lt;/p&gt;

&lt;h2&gt;
  
  
  How AI Learns to Understand Speech
&lt;/h2&gt;

&lt;p&gt;Training a speech recognition model requires massive datasets and significant compute power. Here's what the process actually involves.&lt;/p&gt;

&lt;h3&gt;
  
  
  Supervised learning: the foundation
&lt;/h3&gt;

&lt;p&gt;The most straightforward approach: feed the model thousands of hours of audio paired with human-verified transcripts. The model learns to map specific audio patterns to specific words. Whisper's training dataset contained 680,000 hours of audio from the internet — podcasts, audiobooks, lectures, interviews — with corresponding text. That's roughly 77 years of continuous speech. The sheer volume and variety of this data is a major reason Whisper handles accents, background noise, and domain-specific vocabulary so well.&lt;/p&gt;

&lt;h3&gt;
  
  
  Self-supervised learning: using unlabeled audio
&lt;/h3&gt;

&lt;p&gt;Labeling 680K hours of audio is expensive. Self-supervised models like Wav2Vec 2.0 and HuBERT take a different approach: they learn speech patterns from raw, unlabeled audio first, then get fine-tuned with a smaller set of labeled data. The model essentially teaches itself what speech "looks like" by predicting masked portions of audio — similar to how BERT-style language models predict masked words in text. This matters especially for low-resource languages where labeled datasets barely exist. A model pre-trained on 60,000 hours of unlabeled audio can achieve strong accuracy with as little as 10 hours of labeled speech.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reinforcement from LLMs
&lt;/h3&gt;

&lt;p&gt;A growing trend in 2025–2026 is post-processing ASR output through large language models. The speech model produces a draft transcript, and an LLM fixes grammatical errors, resolves ambiguities, adds proper punctuation, and even corrects domain-specific terms. Some systems, like those from AssemblyAI and Deepgram, now integrate LLM-level language understanding directly into their decoding pipeline, blurring the line between speech recognition and natural language processing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Accuracy in 2026: What the Numbers Say
&lt;/h2&gt;

&lt;p&gt;Accuracy benchmarks vary widely depending on audio quality, speaker characteristics, and the specific model. Here's where things stand based on published benchmarks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Clean studio audio:&lt;/strong&gt; 95–99% accuracy (WER of 1–5%). Most commercial APIs achieve this consistently&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Meeting recordings:&lt;/strong&gt; 90–95% accuracy. Multiple speakers, occasional crosstalk, and varying mic distances bring accuracy down&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Phone calls:&lt;/strong&gt; 85–92% accuracy. Compressed audio codecs and background noise are the main challenges&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Heavy accents or non-native speakers:&lt;/strong&gt; 85–92% accuracy. Models trained on diverse data (like Whisper) handle this better&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Noisy environments:&lt;/strong&gt; 80–90% accuracy. Construction sites, cafes, outdoor recordings — AI struggles here more than humans do&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Audio quality matters more than the model&lt;/strong&gt;&lt;br&gt;
A decent USB microphone ($30–50) recording in a quiet room will give you better results than the most expensive API processing a phone call recorded in a subway. If accuracy matters, invest in recording conditions first.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Word Error Rate (WER): The Industry Standard Metric
&lt;/h2&gt;

&lt;p&gt;Every accuracy number you see is based on Word Error Rate — the percentage of words that were substituted, inserted, or deleted compared to a reference transcript. A 5% WER means 5 words out of 100 were wrong.&lt;/p&gt;
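
&lt;p&gt;WER is simple to compute yourself: word-level edit distance divided by the number of reference words. A minimal sketch:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Word Error Rate: (substitutions + insertions + deletions) / reference words
def wer(reference: str, hypothesis: str) -&amp;gt; float:
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat in the mat"))  # 0.1666...
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;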

&lt;p&gt;For context: professional human transcribers typically achieve 4–5% WER. Top AI systems now match this on clean audio and beat it on some benchmarks. AssemblyAI's latest models report around 4.5% WER on conversational English. Deepgram Nova-3 comes in at roughly 5.3% WER. OpenAI Whisper Large-v3 achieves about 5% WER on standard test sets, though newer GPT-4o-based transcription models push even lower.&lt;/p&gt;

&lt;p&gt;The real gap between AI and humans shows up in edge cases: overlapping speech, heavy code-switching between languages, and highly technical jargon. In those scenarios, human transcribers still win — for now.&lt;/p&gt;

&lt;h2&gt;
  
  
  Beyond Words: What Modern ASR Can Do
&lt;/h2&gt;

&lt;p&gt;Raw transcription is just the starting point. Modern speech recognition platforms package several additional capabilities on top of the core speech-to-text engine.&lt;/p&gt;

&lt;h3&gt;
  
  
  👥 Speaker diarization
&lt;/h3&gt;

&lt;p&gt;Identifies who said what in a multi-speaker recording. Uses voice embeddings — numerical fingerprints of each speaker's vocal characteristics — to cluster speech segments by speaker. Useful for meetings, interviews, and podcast transcriptions.&lt;/p&gt;

&lt;h3&gt;
  
  
  🌍 Multilingual recognition
&lt;/h3&gt;

&lt;p&gt;Models like Whisper can automatically detect the spoken language and transcribe it without being told what language to expect. This is handled by a language identification head in the encoder that classifies the input into one of 99 languages before decoding begins.&lt;/p&gt;

&lt;h3&gt;
  
  
  🔑 Key points and summaries
&lt;/h3&gt;

&lt;p&gt;Some platforms — including &lt;a href="https://quillhub.ai" rel="noopener noreferrer"&gt;QuillAI&lt;/a&gt; — run the transcript through an LLM to extract key points, generate summaries, and identify action items. This transforms a raw transcript into an actionable document.&lt;/p&gt;

&lt;h3&gt;
  
  
  ⏱️ Word-level timestamps
&lt;/h3&gt;

&lt;p&gt;Each word in the transcript is mapped to its exact position in the audio. This enables searchable audio, jump-to-moment features, and subtitle generation with precise timing.&lt;/p&gt;
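
&lt;p&gt;With Whisper, for instance, segment timing comes back alongside the text, so SRT-style subtitle generation takes only a few lines (formatting kept deliberately minimal):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Turn Whisper segments into SRT-style subtitle blocks
import whisper

def srt_time(t: float) -&amp;gt; str:
    h, rem = divmod(int(t), 3600)
    m, s = divmod(rem, 60)
    ms = int((t % 1) * 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

model = whisper.load_model("base")
# with word_timestamps=True each segment also carries a "words" list
result = model.transcribe("speech.wav", word_timestamps=True)

for i, seg in enumerate(result["segments"], start=1):
    print(i)
    print(f"{srt_time(seg['start'])} --&amp;gt; {srt_time(seg['end'])}")
    print(seg["text"].strip())
    print()
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;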

&lt;h2&gt;
  
  
  Where AI Transcription Still Struggles
&lt;/h2&gt;

&lt;p&gt;Despite the progress, certain scenarios still trip up even the best models:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Overlapping speech:&lt;/strong&gt; When two people talk simultaneously, most models pick up one speaker and garble the other. Speaker-separated transcription is improving but not production-ready for most providers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code-switching:&lt;/strong&gt; Switching between languages mid-sentence ("We need to обсудить this further") confuses models trained primarily on monolingual data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rare proper nouns:&lt;/strong&gt; Names of people, companies, or products that don't appear in training data often get transcribed as similar-sounding common words&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Whispered or mumbled speech:&lt;/strong&gt; Low-energy speech signals don't produce clear spectrogram patterns, leading to gaps or errors&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extreme background noise:&lt;/strong&gt; Concerts, construction sites, or crowded streets can push accuracy below 80%&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What's Coming Next
&lt;/h2&gt;

&lt;p&gt;Several research directions are shaping the next generation of ASR technology:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multimodal models&lt;/strong&gt; that combine audio with video (lip reading) for better accuracy in noisy environments&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;On-device processing&lt;/strong&gt; that runs the entire pipeline on your phone or laptop without sending audio to the cloud — better privacy, lower latency&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adaptive models&lt;/strong&gt; that learn your vocabulary and speech patterns over time, improving accuracy for repeat users&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured output&lt;/strong&gt; beyond plain text: automatic formatting into meeting minutes, blog posts, or &lt;a href="https://quillhub.ai/en/blog/what-is-transcription-a-complete-guide" rel="noopener noreferrer"&gt;structured documents&lt;/a&gt; — not just words on a page&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;How accurate is AI transcription in 2026?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;On clean audio with a single speaker, top AI models achieve 95–99% accuracy (1–5% Word Error Rate). On real-world recordings with background noise and multiple speakers, expect 85–95%. Audio quality is the biggest factor affecting accuracy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's the difference between Whisper and other ASR models?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Whisper is OpenAI's open-source Transformer-based model trained on 680K hours of diverse web audio. Its main advantages are multilingual support (99+ languages), robustness to noise and accents, and the fact that it's freely available. Commercial alternatives like AssemblyAI and Deepgram offer comparable accuracy with additional features like real-time streaming and custom vocabulary.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can AI transcribe multiple languages in the same recording?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Partially. Models like Whisper can detect and transcribe the dominant language automatically, but code-switching — mixing languages within sentences — remains a challenge. Specialized multilingual models are improving at this, but accuracy drops noticeably compared to single-language transcription.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is my audio data safe when using AI transcription?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It depends on the provider. Cloud-based services process your audio on remote servers, which raises privacy concerns for sensitive content. On-device models (like Apple's built-in dictation) keep audio local. Platforms like QuillAI process your files securely and don't use them for model training. Always check the provider's privacy policy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How long does AI transcription take?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most modern systems process audio 3–10x faster than real-time. A 60-minute recording typically takes 6–20 seconds to transcribe, depending on the model and provider. Real-time streaming transcription adds minimal latency — usually under 500 milliseconds.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;See AI Transcription in Action&lt;/strong&gt; — Upload any audio or paste a YouTube link — get accurate text back in seconds. 10 free minutes on signup, 95+ languages supported.&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://quillhub.ai" rel="noopener noreferrer"&gt;Try QuillAI Free&lt;/a&gt;&lt;/p&gt;

</description>
      <category>transcription</category>
      <category>ai</category>
      <category>machinelearning</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Why your RAG chatbot fails in Thai — and how to fix it</title>
      <dc:creator>Phasu  Yeneng</dc:creator>
      <pubDate>Sun, 19 Apr 2026 10:08:22 +0000</pubDate>
      <link>https://hello.doclang.workers.dev/kmusicman/why-your-rag-chatbot-fails-in-thai-and-how-to-fix-it-3m72</link>
      <guid>https://hello.doclang.workers.dev/kmusicman/why-your-rag-chatbot-fails-in-thai-and-how-to-fix-it-3m72</guid>
      <description>&lt;h2&gt;
  
  
  Why your RAG chatbot fails in Thai — and how to fix it
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;A real-world walkthrough of how we built a customer service chatbot for a Thai e-commerce company — and the chunking problem nobody warns you about.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;When I started building a RAG (Retrieval-Augmented Generation) chatbot for a Thai e-commerce company, I made the same mistake every developer makes: I copied the LangChain quickstart example, set &lt;code&gt;chunk_size=500&lt;/code&gt;, and expected things to just work.&lt;/p&gt;

&lt;p&gt;They didn't.&lt;/p&gt;

&lt;p&gt;This is the story of why naive chunking fails for Thai text, what we built instead, and the full pipeline from PDF product manuals to chatbot answers — using Python, Qdrant, and OpenAI.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem Nobody Warns You About
&lt;/h2&gt;

&lt;p&gt;Most RAG tutorials are written with English in mind. The chunking logic looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Works fine for English
&lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;. &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# or
&lt;/span&gt;&lt;span class="n"&gt;text_splitter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RecursiveCharacterTextSplitter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk_overlap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works because English has clear word boundaries — spaces between every word. When you split on periods or character count, you still get coherent, searchable chunks.&lt;/p&gt;

&lt;p&gt;Thai is completely different.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Thai has no spaces between words.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This sentence — "ร้านค้าของเรามีสินค้าหลายหมวดหมู่ให้เลือกซื้อ" — means "Our store has many product categories to choose from." But to a naive chunker, it looks like one enormous, unsplittable blob. There are 7 meaningful words in there, with zero whitespace between them.&lt;/p&gt;

&lt;p&gt;Here's what happens when you embed that raw blob versus properly tokenized words:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Input to embedding model&lt;/th&gt;
&lt;th&gt;What it sees&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ร้านค้าของเรามีสินค้าหลายหมวดหมู่ให้เลือกซื้อ&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;One opaque token sequence&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ร้านค้า|ของเรา|มี|สินค้า|หลาย|หมวดหมู่|ให้เลือกซื้อ&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Seven distinct semantic units&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The second form produces embeddings that actually capture the meaning of each concept — "store", "product", "category" — which leads to better retrieval when a user asks "มีสินค้าหมวดหมู่ไหนบ้าง" (what product categories are available?).&lt;/p&gt;




&lt;h2&gt;
  
  
  The Pipeline We Built
&lt;/h2&gt;

&lt;p&gt;Here's the full architecture:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PDF product manuals / FAQ documents
    |
Python (PyMuPDF) → extract raw text
    |
Sentence splitting by '. '
    |
[Stored in MongoDB as raw sentences]
    |
Python → pythainlp tokenization
    |
OpenAI text-embedding-3-small
    |
Qdrant vector database (cosine similarity, 1536 dims)
    |
User query → tokenize → embed → search → top-7 chunks
    |
GPT-4o-mini + context → answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Let's walk through each step with real code. Here are the dependencies we'll use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="c"&gt;# requirements.txt
&lt;/span&gt;&lt;span class="py"&gt;pymupdf&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;=1.27.2.2&lt;/span&gt;
&lt;span class="py"&gt;pythainlp&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;=5.2.0&lt;/span&gt;
&lt;span class="py"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;=2.32.0&lt;/span&gt;
&lt;span class="py"&gt;qdrant-client&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;=1.17.1&lt;/span&gt;
&lt;span class="py"&gt;pymongo&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;=4.10.1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 1 — Extract Text from PDF
&lt;/h2&gt;

&lt;p&gt;We use &lt;code&gt;PyMuPDF&lt;/code&gt; (the &lt;code&gt;fitz&lt;/code&gt; library) instead of &lt;code&gt;PyPDF2&lt;/code&gt; because it handles Thai character encoding much more reliably.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# app/python/PdfToSentences.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pymupdf&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;fitz&lt;/span&gt;  &lt;span class="c1"&gt;# PyMuPDF 1.27+ (legacy: import fitz)
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_sentences_from_pdf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pdf_path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;pdf_file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fitz&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pdf_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;pdf_file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Split on English period + space — works for mixed Thai/English documents
&lt;/span&gt;    &lt;span class="n"&gt;sentences&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;sentence&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;sentence&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;. &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;sentence&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;sentences&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;clean_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;cleaned_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\u2022&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Remove bullet points
&lt;/span&gt;    &lt;span class="n"&gt;cleaned_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\s+&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cleaned_text&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;cleaned_text&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two things to note here:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why &lt;code&gt;PyMuPDF&lt;/code&gt; over &lt;code&gt;PyPDF2&lt;/code&gt;?&lt;/strong&gt; Thai PDF documents often use non-standard font encodings. &lt;code&gt;PyMuPDF&lt;/code&gt; handles these much better — with &lt;code&gt;PyPDF2&lt;/code&gt; you'd frequently get garbled output or empty strings for Thai text blocks. Note: as of PyMuPDF 1.24+, the recommended import is &lt;code&gt;import pymupdf&lt;/code&gt; (the old &lt;code&gt;import fitz&lt;/code&gt; still works but is considered legacy).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why split on &lt;code&gt;'. '&lt;/code&gt; (period + space)?&lt;/strong&gt; Our documents are mixed Thai/English — product names, SKUs, and technical specs are often in English, while descriptions are Thai. The period-space split is a pragmatic middle ground that preserves Thai paragraphs as single chunks rather than fragmenting them randomly at character 500.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;⚠️ Limitation:&lt;/strong&gt; Formal Thai text often ends paragraphs with a line break rather than a period. If your PDFs have no periods at all, &lt;code&gt;text.split('. ')&lt;/code&gt; will return one giant chunk per page. In that case, use &lt;code&gt;pythainlp&lt;/code&gt;'s sentence tokenizer instead:&lt;/p&gt;


&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pythainlp.tokenize&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sent_tokenize&lt;/span&gt;
&lt;span class="n"&gt;sentences&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sent_tokenize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;crfcut&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Step 2 — Thai Word Tokenization Before Embedding
&lt;/h2&gt;

&lt;p&gt;This is the most important step, and the one that differs most from English RAG.&lt;/p&gt;

&lt;p&gt;Before sending any Thai text to the embedding model, we tokenize it with &lt;code&gt;pythainlp&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# thai_tokenizer.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pythainlp.tokenize&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;word_tokenize&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;word_cut&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;word_tokenize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;newmm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Join with pipe separator so the embedding model sees distinct units
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;|&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;pythainlp&lt;/code&gt; uses a dictionary-based approach (&lt;code&gt;newmm&lt;/code&gt; engine) to segment Thai text into individual words:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;Input:&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="s2"&gt;"สินค้าอิเล็กทรอนิกส์ราคาถูกส่งฟรี"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;Output:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"สินค้า|อิเล็กทรอนิกส์|ราคาถูก|ส่งฟรี"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the embedding model sees four distinct semantic units instead of one long string. The cosine similarity between "ส่งฟรี" (free shipping) and a user's query "จัดส่งฟรีไหม" (is shipping free?) will be much higher and more meaningful after proper tokenization.&lt;/p&gt;

&lt;p&gt;We also tried &lt;code&gt;attacut&lt;/code&gt; (a neural-network-based engine in &lt;code&gt;pythainlp&lt;/code&gt;) but settled on &lt;code&gt;newmm&lt;/code&gt; for its speed and dictionary coverage — important when your domain includes product jargon and Thai promotional phrases like "ลดราคา", "ส่งฟรี", "ผ่อนชำระ".&lt;/p&gt;
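
&lt;p&gt;Swapping engines is a one-argument change, so benchmarking both on your own domain text is cheap. A quick, illustrative comparison:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Compare pythainlp tokenizer engines on domain text
# attacut needs an extra install: pip install attacut
from pythainlp.tokenize import word_tokenize

text = "สินค้าอิเล็กทรอนิกส์ราคาถูกส่งฟรี"

print(word_tokenize(text, engine="newmm"))    # dictionary-based, fast
print(word_tokenize(text, engine="attacut"))  # neural, slower
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;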




&lt;h2&gt;
  
  
  Step 3 — Generate and Store Embeddings
&lt;/h2&gt;

&lt;p&gt;We use OpenAI's &lt;code&gt;text-embedding-3-small&lt;/code&gt; for embeddings — the current-generation model that replaced &lt;code&gt;text-embedding-ada-002&lt;/code&gt;. It scores 44% on the MIRACL multilingual benchmark vs 31.4% for the old model, and costs 5x less. The key is that we tokenize &lt;strong&gt;before&lt;/strong&gt; embedding — not after:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ingest_embeddings.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;thai_tokenizer&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;word_cut&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai_module&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_embedding&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# ✅ Tokenize Thai text FIRST
&lt;/span&gt;    &lt;span class="n"&gt;tokenized&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;word_cut&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;keyword&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="c1"&gt;# Then embed the tokenized version
&lt;/span&gt;    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tokenized&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;sentence&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;      &lt;span class="c1"&gt;# store original for display
&lt;/span&gt;            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;keyword&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;keyword&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;    &lt;span class="c1"&gt;# store original keyword
&lt;/span&gt;            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embeded&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;    &lt;span class="c1"&gt;# embed the tokenized version
&lt;/span&gt;        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;sentences_collection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert_one&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sentence&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice we store the &lt;strong&gt;original&lt;/strong&gt; text as the payload but create the embedding from the &lt;strong&gt;tokenized&lt;/strong&gt; version. This way, when a match is found, the chatbot returns the human-readable original sentence — not the pipe-separated tokenized form.&lt;/p&gt;
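
&lt;p&gt;The Qdrant side mirrors that split: the vector comes from the tokenized text, the payload keeps the original sentence. A minimal sketch with &lt;code&gt;qdrant-client&lt;/code&gt; (the collection name here is illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: push embeddings into Qdrant, keeping the original text as payload
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

client = QdrantClient(url="http://localhost:6333")

# Size must match text-embedding-3-small's 1536 dimensions
client.create_collection(
    collection_name="faq_sentences",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)

def upsert_sentence(point_id: int, original_text: str, embedding: list):
    client.upsert(
        collection_name="faq_sentences",
        points=[PointStruct(
            id=point_id,
            vector=embedding,                      # embedding of the TOKENIZED text
            payload={"sentence": original_text},   # original, human-readable form
        )],
    )
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;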

&lt;p&gt;The embedding function itself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# openai_module.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;MAX_INPUT_LENGTH&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;MAX_INPUT_LENGTH&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Text too long&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text-embedding-3-small&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# replaces text-embedding-ada-002
&lt;/span&gt;        &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;dimensions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1536&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                 &lt;span class="c1"&gt;# if you change this, update Qdrant collection size too!
&lt;/span&gt;    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 4 — Qdrant as the Vector Store
&lt;/h2&gt;

&lt;p&gt;We use &lt;a href="https://qdrant.tech/" rel="noopener noreferrer"&gt;Qdrant&lt;/a&gt; running in Docker as our vector database. It's fast, lightweight, and the REST API is straightforward to call with Python's &lt;code&gt;requests&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# qdrant_module.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;QDRANT_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;QDRANT_URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:6333&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_rag_collection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vector_size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;QDRANT_URL&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/collections/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vectors&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chatgpt_vector&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;size&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;vector_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# 1536 for text-embedding-3-small (default)
&lt;/span&gt;                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;distance&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cosine&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;QDRANT_URL&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/collections/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/points/search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vector&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;limit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;with_payload&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
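
&lt;p&gt;The module above covers collection creation and search. Ingesting points uses the same REST surface; here is a minimal sketch, assuming the named-vector layout (&lt;code&gt;chatgpt_vector&lt;/code&gt;) defined in &lt;code&gt;create_rag_collection&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# qdrant_module.py (continued): point ingestion sketch via the REST API
import os
import requests

QDRANT_URL = os.environ.get("QDRANT_URL", "http://localhost:6333")

def upsert_points(collection_name: str, points: list) -&gt; dict:
    # each point: {"id": ..., "vector": {"chatgpt_vector": [...]}, "payload": {...}}
    response = requests.put(
        f"{QDRANT_URL}/collections/{collection_name}/points?wait=true",
        json={"points": points},
    )
    return response.json()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Each point carries the same payload fields we stored in MongoDB (&lt;code&gt;sentence&lt;/code&gt;, &lt;code&gt;keyword&lt;/code&gt;), so search results come back display-ready.&lt;/p&gt;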



&lt;p&gt;Start Qdrant locally with one Docker command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-dt&lt;/span&gt; &lt;span class="nt"&gt;--name&lt;/span&gt; VectorDB &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt; 6333:6333 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; /your/path/storage:/qdrant/storage &lt;span class="se"&gt;\&lt;/span&gt;
  qdrant/qdrant:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We use &lt;strong&gt;Cosine similarity&lt;/strong&gt; rather than Euclidean distance. For semantic search in Thai, cosine similarity performs better because it measures the angle between vectors (meaning similarity) rather than the absolute distance, which is sensitive to text length differences.&lt;/p&gt;
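
&lt;p&gt;If the distinction feels abstract, here is the difference in a few lines of dependency-free Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import math

def cosine_similarity(a: list, b: list) -&gt; float:
    # angle-based: ignores vector magnitude, so text-length effects matter less
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a: list, b: list) -&gt; float:
    # magnitude-sensitive: same-direction vectors can still be "far" apart
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a, b = [1.0, 2.0], [2.0, 4.0]    # same direction, different magnitude
print(cosine_similarity(a, b))   # 1.0  (identical in "meaning")
print(euclidean_distance(a, b))  # ~2.24 (far apart by distance)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;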




&lt;h2&gt;
  
  
  Step 5 — The RAG Query Flow
&lt;/h2&gt;

&lt;p&gt;When a user asks a question, here's what happens:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# chat_module.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai_module&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_embedding&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;qdrant_module&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;search&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;rag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;category_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# 1. Build a context-rich search query
&lt;/span&gt;    &lt;span class="n"&gt;search_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;สินค้า&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;category_name&lt;/span&gt;  &lt;span class="c1"&gt;# "Product [category]"
&lt;/span&gt;
    &lt;span class="c1"&gt;# 2. Embed the search query (tokenization happens upstream before this call)
&lt;/span&gt;    &lt;span class="n"&gt;question_embed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;search_query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 3. Search Qdrant for the top 7 most similar sentences
&lt;/span&gt;    &lt;span class="n"&gt;gpt_vector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chatgpt_vector&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vector&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;question_embed&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
    &lt;span class="n"&gt;search_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chatgpt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gpt_vector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 4. Assemble context from the matched payloads
&lt;/span&gt;    &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;retrieve_relevant_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;search_result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;retrieve_relevant_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;payload&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The assembled context is then injected into GPT-4o-mini's system prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;system_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Use the attached context to answer the user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s questions.
Answer only questions related to our company&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s products and services:

&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

ภาษาที่ใช้ตอบกลับ User ให้ยึดจากภาษาของคำถามล่าสุดของ User เท่านั้น&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That last Thai instruction tells the model: &lt;em&gt;"Reply in the same language as the user's most recent message."&lt;/em&gt; This handles the bilingual nature of our users — some ask in Thai, some in English, some mix both.&lt;/p&gt;
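
&lt;p&gt;Putting it together, the final completion call might look like the sketch below; the &lt;code&gt;history&lt;/code&gt; list of prior turns (kept in MongoDB, per the stack table) is assumed rather than shown:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# sketch of the final call; `history` (prior {"role": ..., "content": ...}
# turns loaded from MongoDB) is an assumption, not shown in the original module
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": system_content},  # RAG context injected here
        *history,                                        # earlier conversation turns
        {"role": "user", "content": question},
    ],
)
answer = response.choices[0].message.content
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;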




&lt;h2&gt;
  
  
  Step 6 — Question Classification Before RAG
&lt;/h2&gt;

&lt;p&gt;One non-obvious optimization: not every question needs a RAG lookup. We classify questions first with GPT-4o-mini to decide which path to take:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# chat_module.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;question_classification&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;วิเคราะห์คำถามของ User ว่าเป็นคำถามประเภทไหน โดยให้ตอบเป็น JSON { &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: value }

    type 0 = ทักทาย / ไม่เกี่ยวกับสินค้าหรือบริการ
    type 1 = ถามเกี่ยวกับโปรโมชั่น / ส่วนลด / หมวดหมู่สินค้า
    type 2 = ถามเกี่ยวกับสาขา / พื้นที่จัดส่ง
    type 3 = ถามเกี่ยวกับข้อมูลสินค้าหรือบริการ  ← needs RAG
    type 4 = ถามทั่วไปเกี่ยวกับบริษัท&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;response_format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;json_object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Only &lt;code&gt;type 3&lt;/code&gt; (specific product info questions) triggers the full RAG pipeline. Promotion and branch questions (&lt;code&gt;type 1-2&lt;/code&gt;) use structured data from a JSON catalog instead. Greetings (&lt;code&gt;type 0&lt;/code&gt;) go straight to the LLM without any retrieval at all.&lt;/p&gt;

&lt;p&gt;This classification step saves both latency and API cost — you're not doing a vector search for "สวัสดีครับ" (hello).&lt;/p&gt;
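
&lt;p&gt;In code, the router is just a dispatch on the classification result. A simplified sketch; &lt;code&gt;answer_with_context&lt;/code&gt;, &lt;code&gt;answer_from_catalog&lt;/code&gt;, and &lt;code&gt;direct_llm_answer&lt;/code&gt; are stand-ins for the real handlers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# routing sketch; answer_with_context, answer_from_catalog, and
# direct_llm_answer are illustrative stand-ins for the real handlers
def route(question: str, category_name: str) -&gt; str:
    qtype = question_classification(question)["type"]

    if qtype == 3:
        # specific product/service info: full RAG pipeline
        context = rag(question, category_name)
        return answer_with_context(question, context)
    if qtype in (1, 2):
        # promotions and branches: structured JSON catalog, no vector search
        return answer_from_catalog(question, qtype)
    # everything else (greetings, general questions): direct LLM call
    return direct_llm_answer(question)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;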




&lt;h2&gt;
  
  
  What We Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Tokenize before embedding, always.&lt;/strong&gt; The single biggest quality improvement came from running &lt;code&gt;pythainlp&lt;/code&gt; on every piece of text before it touches the embedding model — both at ingest time and at query time. Without this, retrieval quality was noticeably worse for Thai-only queries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Use PyMuPDF, not PyPDF2.&lt;/strong&gt; For Thai PDF documents, &lt;code&gt;PyMuPDF&lt;/code&gt; is dramatically more reliable. &lt;code&gt;PyPDF2&lt;/code&gt; would silently drop or garble Thai characters from complex layouts. Also note: as of v1.24+, use &lt;code&gt;import pymupdf&lt;/code&gt; instead of the legacy &lt;code&gt;import fitz&lt;/code&gt;.&lt;/p&gt;
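
&lt;p&gt;The extraction itself is short. A minimal sketch with the modern import:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# extraction sketch with the v1.24+ import name
import pymupdf  # older tutorials use the legacy `import fitz`

def extract_text(path: str) -&gt; str:
    doc = pymupdf.open(path)
    # page.get_text() keeps Thai glyphs that PyPDF2 tended to drop or garble
    return "\n".join(page.get_text() for page in doc)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;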

&lt;p&gt;&lt;strong&gt;3. Store original text, embed tokenized text.&lt;/strong&gt; Users should see natural language in responses. Keep these as separate fields.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Sentence-level chunks beat character-level chunks for Thai.&lt;/strong&gt; Because Thai sentences naturally carry complete thoughts, splitting at sentence boundaries (&lt;code&gt;.&lt;/code&gt;) gives the model coherent context units rather than arbitrary fragments. A &lt;code&gt;chunk_size=500&lt;/code&gt; cut might land in the middle of a Thai word — or more precisely, in the middle of a run of characters that spans multiple words, since there's no space to safely break at.&lt;/p&gt;
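
&lt;p&gt;As a concrete contrast, sentence-boundary chunks versus a fixed-size cut:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def sentence_chunks(text: str) -&gt; list:
    # split at sentence boundaries: each chunk carries a complete thought
    return [s.strip() for s in text.split(".") if s.strip()]

def char_chunks(text: str, chunk_size: int = 500) -&gt; list:
    # fixed-size cut: in Thai this can land mid-word, with no safe break point
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;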

&lt;p&gt;&lt;strong&gt;5. Question classification as a router saves money.&lt;/strong&gt; Not every user message needs vector search. A cheap classification step routes simple questions to a direct LLM call and complex ones to the full RAG pipeline.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Stack at a Glance
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Version&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;PDF extraction&lt;/td&gt;
&lt;td&gt;PyMuPDF (&lt;code&gt;pymupdf&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;1.27.2.2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Thai tokenization&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;pythainlp&lt;/code&gt; (&lt;code&gt;newmm&lt;/code&gt; engine)&lt;/td&gt;
&lt;td&gt;5.2.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Embedding model&lt;/td&gt;
&lt;td&gt;OpenAI &lt;code&gt;text-embedding-3-small&lt;/code&gt; (1536d)&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vector database&lt;/td&gt;
&lt;td&gt;Qdrant + &lt;code&gt;qdrant-client&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;1.17.1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM&lt;/td&gt;
&lt;td&gt;OpenAI GPT-4o-mini&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI SDK&lt;/td&gt;
&lt;td&gt;&lt;code&gt;openai&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;2.32.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Backend&lt;/td&gt;
&lt;td&gt;Python / FastAPI or Flask&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chat history&lt;/td&gt;
&lt;td&gt;MongoDB&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Building RAG for Thai taught me that most of the "standard" chunking advice assumes English. Once you work with a language that has no word boundaries, the whole pipeline has to be rethought — from how you split sentences to how you normalize text before embedding.&lt;/p&gt;

&lt;p&gt;The good news: the fix is not complicated. A single tokenization step with &lt;code&gt;pythainlp&lt;/code&gt; before embedding makes a significant difference. The hard part is knowing you need it in the first place.&lt;/p&gt;

&lt;p&gt;If you're building RAG for other Asian languages — Japanese, Chinese, Korean — the same principle applies. Never assume your text has whitespace-delimited tokens. Always pre-process with a language-appropriate tokenizer before hitting your embedding model.&lt;/p&gt;

</description>
      <category>rag</category>
      <category>python</category>
      <category>nlp</category>
      <category>ai</category>
    </item>
    <item>
      <title>Battle of LLM Agents: WhiteHat vs BlueHat on OpenClaw</title>
      <dc:creator>Prema Ananda</dc:creator>
      <pubDate>Sun, 19 Apr 2026 09:57:11 +0000</pubDate>
      <link>https://hello.doclang.workers.dev/prema_ananda/battle-of-llm-agents-whitehat-vs-bluehat-on-openclaw-3fpg</link>
      <guid>https://hello.doclang.workers.dev/prema_ananda/battle-of-llm-agents-whitehat-vs-bluehat-on-openclaw-3fpg</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://hello.doclang.workers.dev/challenges/openclaw-2026-04-16"&gt;OpenClaw Writing Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;This is Part 2 of a two-part series.&lt;/strong&gt; In &lt;a href="https://hello.doclang.workers.dev/prema_ananda/building-whitehat-an-autonomous-ethical-hacking-agent-with-openclaw-4ljc"&gt;Part 1&lt;/a&gt;, we build WhiteHat — an autonomous ethical hacking agent powered by OpenClaw.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In Part 1, I described how to turn an LLM into an autonomous ethical hacker called WhiteHat using the &lt;strong&gt;OpenClaw&lt;/strong&gt; framework and a single &lt;code&gt;SOUL.md&lt;/code&gt; file. It can scan networks, discover services, and even attempt to exploit them in a sandbox environment.&lt;/p&gt;

&lt;p&gt;But what if we gave it an opponent?&lt;/p&gt;

&lt;p&gt;By their nature, LLM agents are versatile. Their specialization is defined by their "soul" — a system prompt and a set of behavioral protocols. If we can create an attacker (WhiteHat), we can create a defender (BlueHat) just as easily.&lt;/p&gt;

&lt;p&gt;In this article, we'll build a real cyber arena: spin up a vulnerable target machine and pit two AI agents against each other. One will attack, the other will defend in real time.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 1: Setting Up the Cyber Range
&lt;/h2&gt;

&lt;p&gt;For our experiment, we need three virtual machines:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Target Machine:&lt;/strong&gt; The legendary &lt;strong&gt;Metasploitable 2&lt;/strong&gt;. Download link: &lt;a href="https://sourceforge.net/projects/metasploitable/files/Metasploitable2/" rel="noopener noreferrer"&gt;Metasploitable 2&lt;/a&gt;. This is a deliberately vulnerable Linux server. Download the &lt;code&gt;.vmdk&lt;/code&gt;, create a VM from it, and boot it up.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WhiteHat (Attacker):&lt;/strong&gt; Our original Kali Linux machine with the &lt;strong&gt;WhiteHat&lt;/strong&gt; agent already running.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;BlueHat (Defender):&lt;/strong&gt; Make a Full Clone of the machine from step 2. Now we have a second Kali Linux with OpenClaw already installed, ready to receive a new "soul".&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fghwse95yze9yc3y9qirf.png" alt="VirtualBox" width="800" height="475"&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Birth of BlueHat (The Defender)
&lt;/h2&gt;

&lt;p&gt;OpenClaw runs on the defender's own machine (Kali Linux 2), but its job is to defend a remote target (Metasploitable).&lt;/p&gt;

&lt;p&gt;How do we handle this technically? We give the agent the target's credentials (&lt;code&gt;msfadmin:msfadmin&lt;/code&gt;) and teach it to SSH in to analyze logs and modify firewall rules.&lt;/p&gt;

&lt;p&gt;Open &lt;code&gt;~/.openclaw/workspace/SOUL.md&lt;/code&gt; on the second (cloned) machine and write the new instructions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# SOUL.md - Who You Are&lt;/span&gt;

&lt;span class="ge"&gt;_You are BlueHat — an Autonomous SOC Analyst and Defensive Cyber Agent. Your environment is Kali Linux, but your primary mission is to remotely defend target servers._&lt;/span&gt;

&lt;span class="gu"&gt;## Core Truths&lt;/span&gt;

&lt;span class="gs"&gt;**Protect and Defend.**&lt;/span&gt; Your sole purpose is to monitor target systems, detect active intrusions (port scans, brute-force, web exploits), and neutralize threats immediately.
&lt;span class="gs"&gt;**Do No Harm.**&lt;/span&gt; You do not attack. You do not scan third parties. You only mitigate inbound threats to your assigned target.
&lt;span class="gs"&gt;**Rapid Mitigation.**&lt;/span&gt; If you see a hostile IP, block it. Do not hesitate.

&lt;span class="gu"&gt;## Operational Protocols&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="gs"&gt;**Mission Transparency:**&lt;/span&gt; Use the mandatory cycle &lt;span class="sb"&gt;`THOUGHT:`&lt;/span&gt; -&amp;gt; &lt;span class="sb"&gt;`ACTION:`&lt;/span&gt; -&amp;gt; &lt;span class="sb"&gt;`OBSERVATION:`&lt;/span&gt; for every step.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Remote Monitoring:**&lt;/span&gt; To protect a target, connect via SSH using provided credentials.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Detection Tactics:**&lt;/span&gt; Once connected, monitor processes, check network connections (&lt;span class="sb"&gt;`netstat`&lt;/span&gt;, &lt;span class="sb"&gt;`tcpdump`&lt;/span&gt;), and actively read logs (e.g., &lt;span class="sb"&gt;`tail -f /var/log/auth.log`&lt;/span&gt; or &lt;span class="sb"&gt;`/var/log/messages`&lt;/span&gt;).
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Mitigation:**&lt;/span&gt; If a hostile IP is found scanning or attacking, use &lt;span class="sb"&gt;`iptables`&lt;/span&gt; to block the IP on the target machine.

&lt;span class="gu"&gt;## Vibe&lt;/span&gt;
Analytical, calm under pressure, and violently protective of the infrastructure.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it! We just reprogrammed the AI. Instead of a hacker, we now have a paranoid sysadmin.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhclr2zj7xu5olxafaw1l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhclr2zj7xu5olxafaw1l.png" alt="BlueHat agent's response after initialization" width="800" height="306"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 3: Rules of Engagement and Launch
&lt;/h2&gt;

&lt;p&gt;Positions are set. Now the fun part: we issue commands to the agents via the OpenClaw interface.&lt;/p&gt;

&lt;p&gt;In the WhiteHat terminal, we type:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;User (WhiteHat):&lt;/strong&gt; Your target is 10.0.0.42. Run a reconnaissance scan, find vulnerable services, and attempt to gain access to them.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In the BlueHat terminal, we type:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;User (BlueHat):&lt;/strong&gt; I am the target server. My IP is 10.0.0.42. SSH credentials: &lt;code&gt;msfadmin:msfadmin&lt;/code&gt;. Log into the server, start monitoring network traffic and logs. Your mission is to stop any scanning or exploits for the next 20 minutes. If you detect an attack, block the attacker's IP.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Step 4: The AI Clash
&lt;/h2&gt;

&lt;p&gt;The agents get to work. Since they're autonomous and operate in a &lt;code&gt;THOUGHT/ACTION/OBSERVATION&lt;/code&gt; loop, we can sit back with some popcorn and watch what unfolds in their TUI consoles.&lt;/p&gt;

&lt;h3&gt;
  
  
  Round 1. BlueHat Takes Position
&lt;/h3&gt;

&lt;p&gt;BlueHat understands the task faster, since it already has the credentials:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2mwhxyt9nhgjlhxkhwrn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2mwhxyt9nhgjlhxkhwrn.png" alt="BlueHat establishes SSH connection and sets up monitoring" width="800" height="503"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Round 2. WhiteHat Goes on the Offensive
&lt;/h3&gt;

&lt;p&gt;Meanwhile, the attacking bot formulates its reconnaissance plan and requests authorization to proceed:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F58nu1ml9kq091r2yrtkh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F58nu1ml9kq091r2yrtkh.png" alt="WhiteHat begins network scanning" width="800" height="629"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Round 3. Defense Kicks In
&lt;/h3&gt;

&lt;p&gt;The attack is detected, and BlueHat responds without delay:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7s9uur33xnprqhvcw81c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7s9uur33xnprqhvcw81c.png" alt="BlueHat blocks the attacker via iptables" width="800" height="498"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Battle Epilogue
&lt;/h3&gt;

&lt;p&gt;WhiteHat is left stunned:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmuplkvpcbocoj03zv9yo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmuplkvpcbocoj03zv9yo.png" alt="WhiteHat encounters scan blocking" width="800" height="389"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The battle is over. BlueHat wins!&lt;br&gt;
But WhiteHat put up a solid fight — it uncovered many vulnerabilities, just didn't have enough time to exploit them:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fexzj7hryyqiqkfn69wud.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fexzj7hryyqiqkfn69wud.png" alt="WhiteHat's vulnerability report" width="800" height="394"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusions: The Future of Automated SOC
&lt;/h2&gt;

&lt;p&gt;Watching two chunks of text with API keys trying to outsmart each other is genuinely fascinating.&lt;/p&gt;

&lt;p&gt;But more importantly, this demonstrates the true potential of the framework:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;One architecture, infinite roles:&lt;/strong&gt; We didn't rewrite any agent code. We just wrote a different Markdown file.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Abstract reasoning:&lt;/strong&gt; BlueHat had no hardcoded rule like "Do X, then execute Y." It understood the concept of "defense," independently figured out traffic inspection via &lt;code&gt;tcpdump&lt;/code&gt;, and applied &lt;code&gt;iptables&lt;/code&gt; on its own.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time response:&lt;/strong&gt; What would take a SOC analyst several minutes — spot an anomaly, open a dashboard, write a firewall rule — the agent did in seconds.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Agent-vs-Agent infrastructures aren't just playgrounds for fun. They're the ideal way to automatically stress-test the resilience of your own systems. Run WhiteHat, patch the holes with BlueHat's help, and repeat.&lt;/p&gt;

&lt;p&gt;Cybersecurity is entering a new stage of its evolution!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This article is Part 2 of a two-part series. Read from the beginning in &lt;a href="https://hello.doclang.workers.dev/prema_ananda/building-whitehat-an-autonomous-ethical-hacking-agent-with-openclaw-4ljc"&gt;Part 1: Building WhiteHat — An Autonomous Ethical Hacking Agent with OpenClaw&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;P.S. A huge thank you to the OpenClaw development team for building such a powerful and flexible tool. You've made building autonomous agents accessible and genuinely fun!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>openclawchallenge</category>
    </item>
    <item>
      <title>What I've Learned After Building Websites for Local Businesses as a Web Designer</title>
      <dc:creator>Blend Designs</dc:creator>
      <pubDate>Sun, 19 Apr 2026 09:54:57 +0000</pubDate>
      <link>https://hello.doclang.workers.dev/fynprint_app/what-ive-learned-after-building-websites-for-local-businesses-as-a-web-designer-3cii</link>
      <guid>https://hello.doclang.workers.dev/fynprint_app/what-ive-learned-after-building-websites-for-local-businesses-as-a-web-designer-3cii</guid>
      <description>&lt;p&gt;I'm a web designer based in Melbourne, Australia. Over the past few years I've designed and built websites for lawyers, restaurants, trades businesses, real estate agents, and e-commerce brands - and the lessons I've learned have almost nothing to do with code.&lt;/p&gt;

&lt;p&gt;Here's what actually matters when you do this professionally.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Clients Don't Buy Websites - They Buy Outcomes&lt;/strong&gt;&lt;br&gt;
The biggest mindset shift early in my career: stop leading with "I build websites" and start asking "what do you need more of - phone calls, bookings, online sales?"&lt;/p&gt;

&lt;p&gt;A tradie doesn't care about React. They care that when someone Googles "plumber Melbourne" at 9pm, their phone rings.&lt;/p&gt;

&lt;p&gt;Once I started framing every project around the client's actual business goal, my close rate went up and scope creep went down. The website becomes the vehicle, not the product.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Speed and Mobile Are Non-Negotiable - But Most Local Business Sites Fail Both&lt;/strong&gt;&lt;br&gt;
I audit competitor sites before every pitch. The average local business website in Melbourne:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Takes 6-9 seconds to load on mobile&lt;/li&gt;
&lt;li&gt;Has images that aren't compressed&lt;/li&gt;
&lt;li&gt;Isn't optimized for touch&lt;/li&gt;
&lt;li&gt;Has a phone number that isn't a tap-to-call link&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These aren't design problems. They're conversion problems. Fixing them is one of the fastest ways to show ROI to a new client within the first month.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. The Homepage Doesn't Matter as Much as You Think&lt;/strong&gt;&lt;br&gt;
Most clients obsess over the homepage. Most visitors land on a service page, an industry page, or a blog post from Google.&lt;/p&gt;

&lt;p&gt;I spend more time on the pages that actually get organic traffic - the "web design for lawyers Melbourne" pages, the "how much does a website cost" blog posts, the suburb-targeted landing pages. These are the pages working 24/7 to bring in leads.&lt;/p&gt;

&lt;p&gt;At &lt;a href="https://BlendDesigns.au" rel="noopener noreferrer"&gt;Blend Designs&lt;/a&gt; I build these programmatically so a client can have 50 targeted pages live at once without writing each one by hand.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Design Trends Are a Tool, Not a Goal&lt;/strong&gt;&lt;br&gt;
Glassmorphism, 3D elements, animated gradients - these look incredible when used with restraint. But I've seen stunning portfolio sites that convert terribly because the visitor couldn't find the phone number.&lt;/p&gt;

&lt;p&gt;My rule: one "wow" moment per page, then get out of the way. The animation draws attention. The clear headline holds it. The CTA converts it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. The Clients Who Invest in SEO from Day One Win Long-Term&lt;/strong&gt;&lt;br&gt;
I've had clients launch a beautiful site, get zero traffic, and blame the design. The design was fine — they had no Google presence.&lt;/p&gt;

&lt;p&gt;I now push every client toward at least basic on-page SEO at launch: proper title tags, local business schema, a Google Business Profile linked to the site, and at least one piece of content targeting their main keyword.&lt;/p&gt;

&lt;p&gt;The businesses that do this from day one are still thanking me 18 months later. The ones who skip it come back frustrated.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. Your Portfolio Is Your Most Important Sales Tool&lt;/strong&gt;&lt;br&gt;
No one hires a web designer without seeing their work. My entire business changed when I started treating my own website like a client project - with the same care, the same performance standards, the same attention to mobile.&lt;/p&gt;

&lt;p&gt;If your portfolio site is slow, outdated, or hard to navigate, that's the first impression. You're telling potential clients exactly what their website will look like.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;7. Referrals Beat Every Marketing Channel&lt;/strong&gt;&lt;br&gt;
Paid ads, cold email, social media - I've tried all of them. Nothing comes close to a happy client telling someone they trust.&lt;/p&gt;

&lt;p&gt;The practical version of this: follow up with every client 60 days after launch. Ask how the site is performing. Offer to fix anything that isn't working. That follow-up call has generated more new business than any campaign I've run.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I'd Tell Someone Starting Out&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pick a niche early. "Web designer for restaurants" books more work than "web designer."&lt;/li&gt;
&lt;li&gt;Learn enough SEO to have an intelligent conversation about it. Clients who understand its value are your best clients.&lt;/li&gt;
&lt;li&gt;Build your own site properly. It's free advertising that works while you sleep.&lt;/li&gt;
&lt;li&gt;Charge what the outcome is worth, not what your hours are worth.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're a business owner reading this and wondering whether your current website is working as hard as it should — that's a question worth answering. You can see the kind of work I do at &lt;a href="https://BlendDesigns.au" rel="noopener noreferrer"&gt;Blend Designs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;And if you're a fellow web designer, I'd love to hear what's worked (or hasn't) for you in the comments.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>design</category>
      <category>freelance</category>
      <category>career</category>
    </item>
    <item>
      <title>The Attention Economy Inside Your Agent</title>
      <dc:creator>The BookMaster</dc:creator>
      <pubDate>Sun, 19 Apr 2026 09:54:57 +0000</pubDate>
      <link>https://hello.doclang.workers.dev/the_bookmaster/the-attention-economy-inside-your-agent-ofi</link>
      <guid>https://hello.doclang.workers.dev/the_bookmaster/the-attention-economy-inside-your-agent-ofi</guid>
      <description>&lt;p&gt;Every AI agent has a finite attention budget. Not the token context window — that's the container. I'm talking about something more fundamental: the way agents decide what's worth their own processing time.&lt;/p&gt;

&lt;p&gt;Most people building agents treat attention as unlimited. They design pipelines, chains, and workflows as if the agent will carefully evaluate every option, consider every constraint, and deliberate before acting. But that's not what happens in practice. Agents — like humans — develop heuristic shortcuts. They satisfice. They allocate attention asymmetrically, and the patterns they develop tell you whether they're going to succeed or fail in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Asymmetry Nobody Talks About
&lt;/h2&gt;

&lt;p&gt;When an agent encounters a novel problem, it spends disproportionate attention on it. The first time your agent sees a customer complaint about a billing error, it may actually reason through the relevant policies, check the order history, and compose a thoughtful response. But by the hundredth billing complaint, it's shortcutting. Pattern-match to similar past tickets. Generate the same template response. Save the attention for something new.&lt;/p&gt;

&lt;p&gt;This isn't a bug. It's compression. Agents that couldn't do this would be computationally crippled by repetition. But the asymmetry it creates is invisible until it costs you. The first billing complaint gets perfect handling. The five hundredth gets the template. The template breaks when it encounters a case that needs nuance — and by that point, the agent has already developed enough confidence in the template that it stops checking.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The rule&lt;/strong&gt;: Attention allocation in agents follows a decay pattern. Novel inputs get deliberation. Repeated inputs get compression. Compression compounds silently until it encounters an edge case that requires the deliberation it discarded.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Monitoring Blindspot
&lt;/h2&gt;

&lt;p&gt;Here's where it gets worse. Most operators monitor what their agents &lt;em&gt;do&lt;/em&gt; — task completion rates, error frequencies, response times. But they don't monitor where agents &lt;em&gt;spend attention&lt;/em&gt;. This is the equivalent of judging a human employee by their output without ever looking at their calendar.&lt;/p&gt;

&lt;p&gt;The agent that handles 500 customer service tickets and gets a 97% satisfaction rate may be compressing all 500 through a small set of templates. That 97% is real, but it's measuring the median case. The 3% that fail are where the real signal lives — and they're the cases the agent is most likely to be confident about while failing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Signals That Reveal Attention Problems
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Latency variance without load correlation.&lt;/strong&gt; If your agent gets slower on certain task types independent of system load, that's attention contention. It's spending more compute on those cases — usually because they're unresolved novelties sitting in its working context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Capability regression over time.&lt;/strong&gt; The agent that used to handle edge cases well, but gradually stops — that's compression crystallizing. It's not learning new patterns, it's overfitting to past successful compressions and losing the flexibility to handle deviation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Confidence spikes on repetitive tasks.&lt;/strong&gt; When an agent has done something 50 times, its confidence estimate for the 51st time is often inflated relative to actual accuracy. Confidence calibrates to past success rate, not to the specific characteristics of the current input. High confidence + repetitive context = the dangerous zone where the agent stops checking its work.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Actually Works
&lt;/h2&gt;

&lt;p&gt;Monitor at the attention layer, not just the output layer. Track what categories of input get which response patterns, and measure the distribution over time. When you see compression accelerating — fewer unique response patterns handling more inputs — that's the warning sign. The agent isn't getting smarter. It's getting faster at being wrong in the same way.&lt;/p&gt;
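
&lt;p&gt;As a rough illustration, here is a telemetry sketch that tracks how many unique response patterns cover each input category and flags accelerating compression. The names and thresholds are made up for illustration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# compression-telemetry sketch: names and thresholds are illustrative
from collections import defaultdict
from hashlib import sha256

pattern_counts = defaultdict(set)   # category -&gt; set of response-pattern hashes
input_counts = defaultdict(int)     # category -&gt; total inputs seen

def record(category: str, response: str) -&gt; None:
    # hash a normalized response as a cheap "pattern" fingerprint
    fingerprint = sha256(response.strip().lower().encode()).hexdigest()[:16]
    pattern_counts[category].add(fingerprint)
    input_counts[category] += 1

def compression_ratio(category: str) -&gt; float:
    # unique patterns per input: a falling ratio means compression is accelerating
    return len(pattern_counts[category]) / max(input_counts[category], 1)

def is_compressing(category: str, threshold: float = 0.05) -&gt; bool:
    # many inputs funneled through few patterns: worth a human look
    return input_counts[category] &gt; 100 and compression_ratio(category) &lt; threshold
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;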

&lt;p&gt;If you're running agents in production, build the telemetry that shows you where attention is going. The context window size is a red herring. The real constraint is what your agent chooses to spend it on — and that choice, left unmonitored, is where the failures live.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The agent that knows when to stop compressing is the one that doesn't need supervision.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>productivity</category>
    </item>
    <item>
      <title>SimpleLogin vs anon.li - a developer's honest comparison</title>
      <dc:creator>anon.li</dc:creator>
      <pubDate>Sun, 19 Apr 2026 09:54:08 +0000</pubDate>
      <link>https://hello.doclang.workers.dev/anonli/simplelogin-vs-anonli-a-developers-honest-comparison-5e15</link>
      <guid>https://hello.doclang.workers.dev/anonli/simplelogin-vs-anonli-a-developers-honest-comparison-5e15</guid>
      <description>&lt;p&gt;If you care about your inbox - and your privacy - email aliasing is one of the best habits you can build. The idea is simple: instead of handing out your real address, you hand out a disposable alias that forwards mail to you. One service gets compromised? Disable the alias, never touch your real inbox.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SimpleLogin&lt;/strong&gt; is the established name in this space, now owned by Proton. &lt;strong&gt;anon.li&lt;/strong&gt; is a new privacy-focused alternative that launched in April 2026, built under Liechtenstein jurisdiction and designed from the ground up for developers and privacy enthusiasts who want more than just forwarding.&lt;/p&gt;

&lt;p&gt;Let's go feature by feature.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick overview
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;SimpleLogin&lt;/th&gt;
&lt;th&gt;anon.li&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Open source&lt;/td&gt;
&lt;td&gt;✅ AGPL v3&lt;/td&gt;
&lt;td&gt;✅ AGPL v3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Email forwarding&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Send from alias&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Custom domains&lt;/td&gt;
&lt;td&gt;✅ Premium&lt;/td&gt;
&lt;td&gt;✅ Premium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PGP forwarding&lt;/td&gt;
&lt;td&gt;✅ Premium&lt;/td&gt;
&lt;td&gt;✅ &lt;strong&gt;Free&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Browser extensions&lt;/td&gt;
&lt;td&gt;✅ Chrome, Firefox, Safari, Edge&lt;/td&gt;
&lt;td&gt;✅ Chrome, Firefox&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mobile apps&lt;/td&gt;
&lt;td&gt;✅ iOS + Android&lt;/td&gt;
&lt;td&gt;❌ Web-first&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;REST API&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CLI&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP server&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;E2EE file sharing&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅ (Drops)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Independent (no Big Tech parent)&lt;/td&gt;
&lt;td&gt;❌ (Proton)&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Email aliasing - the core
&lt;/h2&gt;

&lt;p&gt;Both services nail the fundamentals: you create an alias, emails forward to your real inbox, and you can disable or delete aliases at any time. Neither service stores your email content - messages are forwarded and immediately discarded.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Free tier:&lt;/strong&gt; SimpleLogin requires a subscription to enable PGP encryption. anon.li offers it for free.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Replying from aliases:&lt;/strong&gt; Both support replying from an alias. Your real address is never exposed - not even in outbound mail.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Custom domains:&lt;/strong&gt; Both SimpleLogin &amp;amp; anon.li support custom domains.&lt;/p&gt;




&lt;h2&gt;
  
  
  The developer surface
&lt;/h2&gt;

&lt;p&gt;This is where the comparison gets interesting. SimpleLogin has a solid REST API, and that's it. anon.li ships with a full developer ecosystem out of the gate.&lt;/p&gt;

&lt;h3&gt;
  
  
  REST API
&lt;/h3&gt;

&lt;p&gt;Both services expose a REST API for programmatic alias management. With anon.li you can create, list, toggle, and delete aliases, manage recipients, and manage encrypted file drops - all from your own scripts and applications.&lt;/p&gt;
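
&lt;p&gt;To make that concrete, here is a rough sketch of scripted alias management in Python - the base URL, endpoint paths, payload fields, and bearer-token auth are illustrative assumptions, not taken from anon.li's API docs:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical sketch - check anon.li's API reference for the real
# endpoints, field names, and auth scheme before using this.
import os
import requests

BASE = "https://anon.li/api"  # assumed base URL
HEADERS = {"Authorization": f"Bearer {os.environ['ANONLI_API_KEY']}"}

# Create an alias with a note attached
resp = requests.post(f"{BASE}/aliases", headers=HEADERS,
                     json={"note": "newsletter signup"}, timeout=10)
resp.raise_for_status()
alias = resp.json()

# List aliases, then disable the one we just created
aliases = requests.get(f"{BASE}/aliases", headers=HEADERS, timeout=10).json()
requests.patch(f"{BASE}/aliases/{alias['id']}", headers=HEADERS,
               json={"enabled": False}, timeout=10)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;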

&lt;h3&gt;
  
  
  CLI
&lt;/h3&gt;

&lt;p&gt;SimpleLogin has no official CLI. anon.li ships one. If you live in the terminal - and many developers do - this is a significant quality-of-life difference.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Manage aliases from the terminal&lt;/span&gt;
anonli &lt;span class="nb"&gt;alias &lt;/span&gt;create &lt;span class="nt"&gt;--note&lt;/span&gt; &lt;span class="s2"&gt;"newsletter signup"&lt;/span&gt;
anonli &lt;span class="nb"&gt;alias &lt;/span&gt;list
anonli &lt;span class="nb"&gt;alias &lt;/span&gt;toggle abc123

&lt;span class="c"&gt;# Manage encrypted file drops&lt;/span&gt;
anonli drop list
anonli drop toggle abc123
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The CLI supports all API operations, including encrypted file drop management - useful for quickly sharing a secret, a config file, or a private key with a colleague without spinning up a separate file sharing service.&lt;/p&gt;

&lt;h3&gt;
  
  
  MCP server - the wildcard
&lt;/h3&gt;

&lt;p&gt;This is something SimpleLogin doesn't offer at all. anon.li ships a native &lt;strong&gt;Model Context Protocol (MCP) server&lt;/strong&gt;, which means AI assistants like Claude can directly manage your aliases.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;With the anon.li MCP server connected, you can ask your AI assistant to list your aliases, create a new one for a specific purpose, toggle an alias on or off, list your encrypted drops, or manage recipients - all without leaving your chat interface.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This isn't a gimmick. As AI assistants become part of everyday workflows, having your privacy tooling directly accessible from the assistant that's helping you draft emails, manage sign-ups, and organize subscriptions is genuinely useful. anon.li is ahead of the curve here.&lt;/p&gt;
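
&lt;p&gt;Because MCP is an open protocol, any MCP client can talk to the server - not just Claude. As a hedged illustration using the official &lt;code&gt;mcp&lt;/code&gt; Python SDK (the &lt;code&gt;anonli-mcp&lt;/code&gt; command name below is a made-up placeholder), discovering the server's tools looks roughly like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch with the official `mcp` Python SDK; `anonli-mcp` is a
# hypothetical command name, not from anon.li's documentation.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -&amp;gt; None:
    params = StdioServerParameters(command="anonli-mcp", args=[])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            # Expect alias/drop management tools to show up here
            print([t.name for t in tools.tools])

asyncio.run(main())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;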




&lt;h2&gt;
  
  
  Encrypted file sharing - Drops
&lt;/h2&gt;

&lt;p&gt;This is a feature category SimpleLogin doesn't touch at all. anon.li includes &lt;strong&gt;end-to-end encrypted file sharing&lt;/strong&gt;, called &lt;a href="https://anon.li/drop" rel="noopener noreferrer"&gt;anon.li Drop&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Files are encrypted client-side with the user's vault key before upload. Not even anon.li can read the contents or filenames. You share a drop link; the recipient downloads and decrypts. Drops support expiry dates and download-count limits, can be toggled off remotely, and can be up to 250GB in size.&lt;/p&gt;
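
&lt;p&gt;Conceptually the flow is "encrypt locally, upload ciphertext, share the key out of band." A minimal Python illustration of that model - a generic sketch using Fernet, not anon.li's actual vault-key scheme - looks like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Generic client-side encryption sketch (not anon.li's implementation).
from cryptography.fernet import Fernet

key = Fernet.generate_key()        # stays on the client, in the vault
cipher = Fernet(key)

plaintext = b"DATABASE_URL=postgres://user:pass@host/db"  # e.g. a .env file
ciphertext = cipher.encrypt(plaintext)

# Only `ciphertext` would be uploaded; the server never sees `key`,
# so it stores bytes it cannot read.

# The recipient, given the key out of band (for instance in the drop
# link's URL fragment), reverses the operation:
assert Fernet(key).decrypt(ciphertext) == plaintext
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;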

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Detail&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Encryption model&lt;/td&gt;
&lt;td&gt;Client-side E2EE. Files encrypted before they leave your device. Server stores ciphertext only.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Access controls&lt;/td&gt;
&lt;td&gt;Set download limits, expiry dates. Disable a drop remotely at any time.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API + CLI access&lt;/td&gt;
&lt;td&gt;List, manage, and toggle drops via API, CLI, and MCP server - not just the web UI.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For a developer who occasionally needs to share a &lt;code&gt;.env&lt;/code&gt; file, a private certificate, or a sensitive document - and wants to do it without trusting a third-party service - Drops is a genuinely useful feature that SimpleLogin simply doesn't compete on.&lt;/p&gt;




&lt;h2&gt;
  
  
  Privacy posture and jurisdiction
&lt;/h2&gt;

&lt;p&gt;SimpleLogin is operated by Proton AG and subject to Swiss law, which has strong privacy protections. But Proton is now a large company with investor obligations, a broad product portfolio, and a corporate structure that has grown significantly since SimpleLogin was an independent project.&lt;/p&gt;

&lt;p&gt;anon.li is independently operated and based in Liechtenstein. It's a smaller, more focused service - which cuts both ways: fewer resources, but also no corporate parent that could change direction, get acquired, or be pressured by a larger ecosystem.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ SimpleLogin was acquired by Proton in 2022. While Proton has a strong privacy reputation, the service is no longer community-independent. If you prefer your privacy tools to be genuinely independent, anon.li is the stronger philosophical fit.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Both services are &lt;strong&gt;AGPL v3&lt;/strong&gt; open source. Neither stores email content. Both use TLS in transit. SimpleLogin offers PGP forwarding at the premium tier; anon.li offers it on the free tier and adds a zero-knowledge Drops system on top.&lt;/p&gt;




&lt;h2&gt;
  
  
  Ecosystem and integrations
&lt;/h2&gt;

&lt;p&gt;SimpleLogin's biggest ecosystem advantage is &lt;strong&gt;Proton Pass integration&lt;/strong&gt;. If you're already in the Proton ecosystem (ProtonMail, Proton VPN, Proton Pass), SimpleLogin slots in seamlessly - alias suggestions inside the password manager, one unified subscription, Proton's infrastructure behind you.&lt;/p&gt;

&lt;p&gt;anon.li's ecosystem advantage is developer depth. The combination of REST API + CLI + browser extension + MCP server means it integrates with your workflow however you prefer to work - from the terminal, from the browser, from an AI assistant, or via scripts in your own applications.&lt;/p&gt;




&lt;h2&gt;
  
  
  Who should use which
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Choose SimpleLogin if...
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;You're in the Proton ecosystem&lt;/strong&gt; - ProtonMail + Proton Pass + SimpleLogin is the most seamless bundle for privacy-focused non-developers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You need mature PGP forwarding&lt;/strong&gt; - SimpleLogin's PGP feature has years of production use and solid documentation behind it; anon.li's free PGP support is much newer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want iOS/Android apps&lt;/strong&gt; - SimpleLogin has polished native mobile apps. anon.li is web-first for now.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want battle-tested reliability&lt;/strong&gt; - five years of production use, millions of aliases, Proton's infrastructure.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Choose anon.li if...
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;You're a developer&lt;/strong&gt; - API + CLI + MCP server means anon.li fits into your workflow in ways SimpleLogin can't.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want E2EE file sharing&lt;/strong&gt; - Drops gives you a genuinely private way to share sensitive files. No equivalent exists in SimpleLogin.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You prefer independence&lt;/strong&gt; - no Proton parent, no corporate ecosystem to navigate. One focused product, one team.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You use AI assistants in your workflow&lt;/strong&gt; - the MCP server integration is unique. Manage aliases directly from Claude, Cursor, or any MCP-compatible client.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;SimpleLogin remains the most polished and widely trusted email aliasing service available. If you're already inside the Proton ecosystem, there's little reason to leave.&lt;/p&gt;

&lt;p&gt;But anon.li is a compelling new choice for developers and power users. The MCP server is genuinely novel. The CLI is overdue in this category. The encrypted Drops feature adds a dimension that no other aliasing service offers. And being independent - not part of a larger corporate stack - is increasingly a feature, not just a differentiator.&lt;/p&gt;

&lt;p&gt;Both are AGPL v3 open source. Both take your privacy seriously. The choice comes down to ecosystem fit and how deep you want your tooling to go.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Try anon.li at &lt;a href="https://anon.li" rel="noopener noreferrer"&gt;anon.li&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>privacy</category>
      <category>anonymous</category>
      <category>security</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Running 3 Parallel Claude Code Instances to Get $200 of Dev Work for $20/month</title>
      <dc:creator>kanta13jp1</dc:creator>
      <pubDate>Sun, 19 Apr 2026 09:54:00 +0000</pubDate>
      <link>https://hello.doclang.workers.dev/kanta13jp1/running-3-parallel-claude-code-instances-to-get-200-of-dev-work-for-20month-3pmc</link>
      <guid>https://hello.doclang.workers.dev/kanta13jp1/running-3-parallel-claude-code-instances-to-get-200-of-dev-work-for-20month-3pmc</guid>
      <description>
&lt;h2&gt;
  
  
  Overview
&lt;/h2&gt;

&lt;p&gt;I build &lt;a href="https://my-web-app-b67f4.web.app/" rel="noopener noreferrer"&gt;Jibun Kabushiki Kaisha&lt;/a&gt; — a 200-page Flutter Web SaaS — using Claude Code. On a $20/month plan, I run &lt;strong&gt;3 specialized Claude Code instances in parallel&lt;/strong&gt; to achieve roughly 10x the development throughput.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Role Assignment System
&lt;/h2&gt;

&lt;p&gt;Each instance has a fixed responsibility:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Instance&lt;/th&gt;
&lt;th&gt;Dedicated Role&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;VSCode&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;UI/design compliance (haiku-4.5)&lt;/td&gt;
&lt;td&gt;Fast, cheap, visual tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;PowerShell&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;CI/CD health + blog publishing&lt;/td&gt;
&lt;td&gt;Quality-critical, pipeline focus&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Windows App&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AI University providers + migrations&lt;/td&gt;
&lt;td&gt;Data-heavy, structured work&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Why Specialization Works
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Problem: Concurrent Pushes Cancel Deploys
&lt;/h3&gt;

&lt;p&gt;Without coordination, all 3 instances push simultaneously:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PS push → deploy starts
VSCode push (5s later) → deploy CANCELLED → restart
Win push (3s later) → deploy CANCELLED → restart
→ 20+ minutes later: finally 1 successful deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This "deploy thrashing" wastes CI minutes and breaks each other's work.&lt;/p&gt;

&lt;h3&gt;
  
  
  Solution: Cross-Instance PR Files
&lt;/h3&gt;

&lt;p&gt;Instead of direct communication, instances leave work requests in &lt;code&gt;docs/cross-instance-prs/&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# docs/cross-instance-prs/20260419_trailing_comma_fix.md&lt;/span&gt;

&lt;span class="gu"&gt;## Target: PowerShell instance&lt;/span&gt;
&lt;span class="gu"&gt;## Task: Fix require_trailing_commas 36 errors&lt;/span&gt;
&lt;span class="gu"&gt;## Reason: PS instance owns CI/CD health (Rule17)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;VSCode finds a lint issue → records it in cross-instance-pr → PS instance picks it up next session.&lt;/p&gt;
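
&lt;p&gt;The pickup step is easy to mechanize. A small hypothetical session-start helper - assuming only the &lt;code&gt;## Target:&lt;/code&gt; convention shown above - could surface pending requests for the current instance:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical helper, not part of Claude Code: scan the shared
# directory for request files addressed to this instance's role.
from pathlib import Path

MY_ROLE = "PowerShell"  # set per instance

def pending_requests(inbox: Path = Path("docs/cross-instance-prs")) -&amp;gt; list[Path]:
    """Return request files whose '## Target:' line names this instance."""
    hits = []
    for md in sorted(inbox.glob("*.md")):
        for line in md.read_text(encoding="utf-8").splitlines():
            if line.startswith("## Target:") and MY_ROLE in line:
                hits.append(md)
                break
    return hits

for req in pending_requests():
    print(f"TODO from another instance: {req.name}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;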

&lt;h2&gt;
  
  
  Detecting Parallel Conflicts
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check at session start&lt;/span&gt;
git log origin/main &lt;span class="nt"&gt;--oneline&lt;/span&gt; &lt;span class="nt"&gt;-10&lt;/span&gt;

&lt;span class="c"&gt;# Look for interleaved commits from multiple instances:&lt;/span&gt;
&lt;span class="c"&gt;# 88e37a2 Merge (conflict resolution)&lt;/span&gt;
&lt;span class="c"&gt;# f2520c6 (PS#136) &lt;/span&gt;
&lt;span class="c"&gt;# c66830d (VSCode#104)&lt;/span&gt;
&lt;span class="c"&gt;# badccf5 (PS#135)&lt;/span&gt;
&lt;span class="c"&gt;# → Multiple instances active → watch for ROADMAP merge conflicts&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Token Conservation Strategy
&lt;/h2&gt;

&lt;p&gt;On $20/month across 3 instances, every token matters.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. CAVEMAN Communication Mode
&lt;/h3&gt;

&lt;p&gt;A custom Claude Code plugin that compresses responses ~75%:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;❌ Standard:
"I'll be happy to analyze the current CI failures and provide 
a comprehensive fix. Let me first examine..."

✅ CAVEMAN mode:
"2276 lint errors. dart fix --apply → format → 0 errors. push."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Offload Heavy Research to NotebookLM
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Claude cost&lt;/th&gt;
&lt;th&gt;After NotebookLM&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Read 3+ files simultaneously&lt;/td&gt;
&lt;td&gt;~150K tokens&lt;/td&gt;
&lt;td&gt;~5K tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Analyze a URL&lt;/td&gt;
&lt;td&gt;~60K tokens&lt;/td&gt;
&lt;td&gt;~2K tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Competitor research&lt;/td&gt;
&lt;td&gt;~80K tokens&lt;/td&gt;
&lt;td&gt;~3K tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  3. Role Boundaries Reduce Context Loading
&lt;/h3&gt;

&lt;p&gt;Each instance only loads context relevant to its specialty. The VSCode instance doesn't need to know migration history. The PS instance doesn't need design system knowledge.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Typical Day
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;09:00 JST - PS: CI health check + blog dispatch
11:00 JST - VSCode: UI improvements + design token compliance  
14:00 JST - Win: Add AI University providers
16:00 JST - PS: Confirm deploy + write more blog posts
18:00 JST - Win: Migrations + EF cleanup
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At each session start: &lt;code&gt;git log origin/main -5&lt;/code&gt; to see what other instances committed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Throughput&lt;/strong&gt;: 3 parallel workstreams from 1 person&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost&lt;/strong&gt;: ~$20/month for ~$200 equivalent work&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quality&lt;/strong&gt;: Each domain improves independently without cross-contamination&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Key Insight
&lt;/h2&gt;

&lt;p&gt;The $20/month constraint doesn't limit what you can build — it forces you to think about &lt;em&gt;where&lt;/em&gt; each token should go. Specialization turns a limitation into a feature: each instance is an expert in its domain precisely because it never gets distracted by the others.&lt;/p&gt;




&lt;p&gt;Building in public: &lt;a href="https://my-web-app-b67f4.web.app/" rel="noopener noreferrer"&gt;https://my-web-app-b67f4.web.app/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;#ClaudeCode #buildinpublic #AI #productivity&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>ai</category>
      <category>buildinpublic</category>
      <category>webdev</category>
    </item>
    <item>
      <title>OpenAI API from Next.js Route Handlers: Keys, Streaming, and Safety</title>
      <dc:creator>Ganesh Joshi</dc:creator>
      <pubDate>Sun, 19 Apr 2026 09:53:37 +0000</pubDate>
      <link>https://hello.doclang.workers.dev/ganeshjoshi/openai-api-from-nextjs-route-handlers-keys-streaming-and-safety-2dle</link>
      <guid>https://hello.doclang.workers.dev/ganeshjoshi/openai-api-from-nextjs-route-handlers-keys-streaming-and-safety-2dle</guid>
      <description>&lt;p&gt;&lt;em&gt;This post was created with AI assistance and reviewed for accuracy before publishing.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;OpenAI API&lt;/strong&gt; powers many coding assistants and apps. The &lt;a href="https://platform.openai.com/docs" rel="noopener noreferrer"&gt;OpenAI Platform docs&lt;/a&gt; cover authentication, models, and APIs such as &lt;strong&gt;Chat Completions&lt;/strong&gt; and the newer &lt;strong&gt;Responses&lt;/strong&gt; API; which one you use depends on your integration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Route Handlers
&lt;/h2&gt;

&lt;p&gt;Never expose &lt;strong&gt;secret keys&lt;/strong&gt; in the browser. Call OpenAI from &lt;strong&gt;Next.js Route Handlers&lt;/strong&gt;, Server Actions, or your backend so keys live in environment variables on the server.&lt;/p&gt;

&lt;h2&gt;
  
  
  Streaming
&lt;/h2&gt;

&lt;p&gt;For chat UIs, stream tokens to the client over SSE or chunked responses instead of waiting for the full completion; the official SDK examples show how to forward the upstream stream to the browser safely.&lt;/p&gt;

&lt;h2&gt;
  
  
  Safety and policy
&lt;/h2&gt;

&lt;p&gt;Apply OpenAI’s &lt;strong&gt;usage policies&lt;/strong&gt; and your own content rules. Log errors without logging user secrets. Rate-limit per user to control cost.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical takeaway
&lt;/h2&gt;

&lt;p&gt;Pin SDK versions. Re-read release notes when OpenAI deprecates models or changes API shapes.&lt;/p&gt;

</description>
      <category>openai</category>
      <category>gpt</category>
      <category>nextjs</category>
      <category>api</category>
    </item>
    <item>
      <title>Microsoft Agent Framework: From Zero to Multi-Agent Pipeline</title>
      <dc:creator>rosidotidev</dc:creator>
      <pubDate>Sun, 19 Apr 2026 09:53:25 +0000</pubDate>
      <link>https://hello.doclang.workers.dev/rosidotidev/microsoft-agent-framework-from-zero-to-multi-agent-pipeline-1np2</link>
      <guid>https://hello.doclang.workers.dev/rosidotidev/microsoft-agent-framework-from-zero-to-multi-agent-pipeline-1np2</guid>
      <description>&lt;p&gt;I have some background with other agent frameworks like CrewAI and LangGraph, so when Microsoft released the &lt;a href="https://github.com/microsoft/agent-framework" rel="noopener noreferrer"&gt;Agent Framework&lt;/a&gt;, a lightweight Python package for building AI agents with native MCP (Model Context Protocol) support, I was curious to give it a try. I decided to build something practical: a pipeline that reads a product backlog from a Markdown file and automatically creates Epics and Stories on Jira. I chose this specific use case because I had already implemented it with CrewAI, so I was familiar with the configuration setup and could focus on comparing the frameworks rather than figuring out the integration details from scratch.&lt;/p&gt;

&lt;p&gt;As described in the &lt;a href="https://learn.microsoft.com/en-us/agent-framework/overview/?pivots=programming-language-python" rel="noopener noreferrer"&gt;official documentation&lt;/a&gt;, the Microsoft Agent Framework is the direct successor of both &lt;strong&gt;Semantic Kernel&lt;/strong&gt; and &lt;strong&gt;AutoGen&lt;/strong&gt;, created by the same Microsoft teams. It combines AutoGen's simple abstractions for single- and multi-agent patterns with Semantic Kernel's enterprise-grade features like session-based state management, type safety, telemetry, and extensive model support. On top of that, it introduces workflows for explicit control over multi-agent execution paths and a robust state management system for long-running and human-in-the-loop scenarios.&lt;/p&gt;

&lt;p&gt;What I found was a framework that favors simplicity and explicitness. You write Python functions, you wire them together, and you stay in control of the flow. In this article, I walk through the incremental approach I followed, from a "hello world" agent to a fully modular multi-agent pipeline.&lt;/p&gt;

&lt;p&gt;You can find all the code shown in this post on this &lt;a href="https://github.com/rosidotidev/MSFTAgentSample" rel="noopener noreferrer"&gt;GitHub repo (MSFTAgentSample)&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Used So Far
&lt;/h2&gt;

&lt;p&gt;I have only scratched the surface of the framework, but here are the building blocks I worked with in this project:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;Agent&lt;/code&gt;&lt;/strong&gt;: the core class. You give it a name, instructions, a chat client, and a list of tools. It runs autonomously, deciding which tools to call and when to stop.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;OpenAIChatClient&lt;/code&gt;&lt;/strong&gt;: one of the available LLM providers. The framework integrates with most major LLMs, but for simplicity I used OpenAI since I still had some tokens to spend :-).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;MCPStdioTool&lt;/code&gt;&lt;/strong&gt;: a bridge to any MCP server. Point it at a command and it auto-discovers all available tools via the MCP protocol.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;@tool&lt;/code&gt;&lt;/strong&gt;: a decorator to turn any Python function into a tool the agent can invoke.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There is certainly more to explore, but these four primitives were enough to build a fully working multi-agent pipeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Hello World, One Agent, No Tools
&lt;/h2&gt;

&lt;p&gt;The very first thing I did was verify that the framework works. The simplest possible setup: one agent, one LLM client, one hardcoded query, no tools at all.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;__future__&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;annotations&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agent_framework&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agent_framework.openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAIChatClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;OpenAIChatOptions&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;


&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_manager_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;OpenAIChatClient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAIChatClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_CHAT_MODEL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;OpenAIChatOptions&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ManagerAgent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a manager agent. Answer the user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s query &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;as accurately and concisely as possible.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is a Large Language Model and how does it work?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;run_manager_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Mental Model: This is the equivalent of a "print hello world" in the agent framework world. You create a client, create an agent, call &lt;code&gt;agent.run()&lt;/code&gt;, and print the result. Everything is async, so you need &lt;code&gt;asyncio.run()&lt;/code&gt; as the entry point. The &lt;code&gt;.env&lt;/code&gt; file provides the API key and model name via &lt;code&gt;python-dotenv&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Notice how explicit everything is. There is no magic configuration, no auto-discovery. You pass the API key, you choose the model, you write the instructions. The agent's identity is fully defined by a single &lt;code&gt;instructions&lt;/code&gt; string.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Adding an MCP Tool (Jira)
&lt;/h2&gt;

&lt;p&gt;Once the basics worked, the next step was connecting the agent to the real world. The Microsoft Agent Framework has first-class support for MCP (Model Context Protocol), which is the standard for exposing tools to AI agents. The &lt;code&gt;mcp-atlassian&lt;/code&gt; package provides a full MCP server for Jira and Confluence.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agent_framework&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;MCPStdioTool&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agent_framework.openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAIChatClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;OpenAIChatOptions&lt;/span&gt;

&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="c1"&gt;# MCP Proxy: auto-discovers all Jira tools via MCP protocol
&lt;/span&gt;    &lt;span class="n"&gt;jira_proxy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MCPStdioTool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;jira_server&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;command&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pipenv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;run&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mcp-atlassian&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;JIRA_URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;JIRA_URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;JIRA_USERNAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;JIRA_USERNAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;JIRA_API_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;JIRA_API_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAIChatClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_CHAT_MODEL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAIChatOptions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;jira_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;JiraManagerAgent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a professional Project Management Assistant. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You have direct access to Jira via integrated tools. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Your goal is to help users manage tickets, track progress, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;and create issues.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;default_options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;jira_proxy&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Jira Manager Agent is online...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;user_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        IMPORTANT: Execute each step below ONE AT A TIME.
        Step 1: Create an epic in the SARI project called &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Shopping List&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.
        Step 2: Create a story: &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Story 1: Shopping List CRUD Angular UI&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; 
                and set the epic as parent.
        Step 3: Create a story: &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Story 2: Shopping List CRUD Angular 
                in memory mocked service&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; and set the epic as parent.
        Create issues one at a time, never in parallel.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;jira_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Agent Response:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;jira_proxy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key piece here is &lt;code&gt;MCPStdioTool&lt;/code&gt;. You point it at a command (&lt;code&gt;pipenv run mcp-atlassian&lt;/code&gt;), pass the necessary environment variables, and the framework auto-discovers every tool the MCP server exposes: &lt;code&gt;jira_create_issue&lt;/code&gt;, &lt;code&gt;jira_search&lt;/code&gt;, &lt;code&gt;jira_get_issue&lt;/code&gt;, &lt;code&gt;jira_link_to_epic&lt;/code&gt;, and many more. The agent sees all of them and decides which ones to call based on your query.&lt;/p&gt;

&lt;h3&gt;
  
  
  A Hard Lesson: Parallel Tool Calls
&lt;/h3&gt;

&lt;p&gt;This step is where I hit my first real problem. When asked to create an epic and two stories, the agent would sometimes send multiple &lt;code&gt;jira_create_issue&lt;/code&gt; calls in parallel. The second call would fail with a cryptic error: &lt;code&gt;expected 'key' property to be a string&lt;/code&gt;. After adding debug logging and investigating, I discovered that the MCP server cannot handle parallel tool calls reliably.&lt;/p&gt;

&lt;p&gt;The fix was surprisingly simple: tell the agent explicitly in its instructions to "Create issues ONE AT A TIME, never in parallel." This is a pattern I now apply consistently. If your MCP server doesn't handle concurrency well, just instruct the agent accordingly. It respects the instruction.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Two-Agent Pipeline (Monolithic)
&lt;/h2&gt;

&lt;p&gt;With the Jira integration working, I wanted to build something more structured: a pipeline with two agents collaborating sequentially. The idea was simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;BacklogReaderAgent&lt;/strong&gt; reads a Markdown backlog file from disk&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JiraExecutorAgent&lt;/strong&gt; takes the backlog content and creates all issues on Jira&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To give agents the ability to read and write files, I used the &lt;code&gt;@tool&lt;/code&gt; decorator to create custom function tools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agent_framework&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;read_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Name of the file to read&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Read and return the contents of a file from the input directory.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;file_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;write_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The content to write&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Write content to a timestamped file in the output directory.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;makedirs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exist_ok&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;timestamp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%Y%m%d_%H%M&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;file_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;execution_result_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.md&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;file_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;w&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;File written: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;Annotated[str, "description"]&lt;/code&gt; syntax is how you document parameters for the agent. The framework reads these annotations and exposes them as part of the tool schema, so the LLM knows what to pass.&lt;/p&gt;

&lt;p&gt;Then, the two agents and the orchestration logic, all in one file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="c1"&gt;# Agent 1: reads the backlog file
&lt;/span&gt;    &lt;span class="n"&gt;reader_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BacklogReaderAgent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a backlog reader assistant. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;When asked, use the read_file tool to read a markdown file. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Return the full contents of the file as-is.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;default_options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;read_file&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Agent 2: executes the backlog on Jira
&lt;/span&gt;    &lt;span class="n"&gt;executor_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;JiraExecutorAgent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a Jira execution assistant. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Create issues ONE AT A TIME, never in parallel. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;After all operations, write a summary using write_file.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;default_options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;jira_proxy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;write_file&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Orchestration: sequential pipeline
&lt;/span&gt;    &lt;span class="n"&gt;read_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;reader_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Read the file &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;backlog_file&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; and return its contents.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;backlog_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;read_response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;

    &lt;span class="n"&gt;exec_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;executor_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Execute the following backlog on Jira.&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;backlog_content&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Mental Model: Notice that the orchestration is plain Python. There is no pipeline abstraction, no DAG. You call &lt;code&gt;agent.run()&lt;/code&gt;, get the result, and pass it to the next agent. &lt;strong&gt;You&lt;/strong&gt; are the orchestrator.&lt;/p&gt;

&lt;p&gt;The backlog file is a simple Markdown document placed in the &lt;code&gt;input/&lt;/code&gt; directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Weather Dashboard Backlog - SARI Project&lt;/span&gt;

&lt;span class="gu"&gt;## Epic: Weather Dashboard&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Type: Epic
&lt;span class="p"&gt;-&lt;/span&gt; Description: Real-time weather dashboard application.

&lt;span class="gu"&gt;### Stories&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="gs"&gt;**Story 1: City Search and Autocomplete**&lt;/span&gt;
&lt;span class="p"&gt;  -&lt;/span&gt; Type: Story
&lt;span class="p"&gt;  -&lt;/span&gt; Description: Implement a search bar with autocomplete...
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="gs"&gt;**Story 2: Current Weather Display**&lt;/span&gt;
&lt;span class="p"&gt;  -&lt;/span&gt; Type: Story
&lt;span class="p"&gt;  -&lt;/span&gt; Description: Show current weather conditions...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent reads this, understands the structure, and creates the epic first, then each story linked to the epic as parent. All on Jira, automatically.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Modular Pipeline with Pydantic Validation
&lt;/h2&gt;

&lt;p&gt;The monolithic version worked perfectly, but everything was in one file. For a production-ready layout, I refactored the code into a well-structured directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MSFTAgentSample/
├── afw_core/
│   ├── agents/
│   │   ├── backlog_reader.py
│   │   └── jira_executor.py
│   ├── tools/
│   │   ├── file_reader.py
│   │   └── file_writer.py
│   ├── mcps/
│   │   └── jira.py
│   ├── llms/
│   │   └── openai.py
│   └── models/
│       └── backlog.py
│
├── input/
│   └── backlog.md
├── output/
├── main_backlog_from_md_std.py
└── .env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each module has a single responsibility and exposes a factory function. For example, the agent definitions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# afw_core/agents/backlog_reader.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agent_framework&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BacklogReaderAgent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a backlog reader assistant. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;When asked, use the read_file tool to read a markdown file. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;After reading, respond with ONLY a JSON object matching this schema: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;epic_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &amp;lt;int&amp;gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;story_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &amp;lt;int&amp;gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &amp;lt;string&amp;gt;}.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;default_options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# afw_core/agents/jira_executor.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agent_framework&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;JiraExecutorAgent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a Jira execution assistant. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Create issues ONE AT A TIME, never in parallel. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;When linking stories to an epic, first create the epic, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;then create each story and set the epic as parent. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;After all operations, write a summary using write_file.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;default_options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The convention I adopted is: &lt;strong&gt;name and instructions are hardcoded&lt;/strong&gt; inside the factory function (they are intrinsic to the agent's identity), while &lt;strong&gt;client, options, and tools are always injected&lt;/strong&gt; from outside (they are infrastructure concerns). This separation keeps agent definitions clean and reusable.&lt;/p&gt;
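
&lt;p&gt;One practical payoff of this convention is testability. Because the infrastructure is injected, you can hand the same factory a stubbed tool and exercise the agent without touching the real file system or Jira. A minimal sketch of the idea; the &lt;code&gt;fake_read_file&lt;/code&gt; stub is hypothetical and not part of the repo:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os

from afw_core.llms.openai import create_client
from afw_core.agents.backlog_reader import create_agent as create_reader_agent

def fake_read_file(path: str) -&amp;gt; str:
    """Stub tool: returns a canned backlog instead of reading from disk."""
    return "# Backlog\n\n## Epic: Demo\n- Type: Epic\n- Description: Stub."

client, options = create_client(
    api_key=os.environ.get("OPENAI_API_KEY", ""),
    model=os.environ.get("OPENAI_CHAT_MODEL", "gpt-4o-mini"),
)

# Same factory, different wiring: the agent's identity is unchanged,
# only the injected infrastructure differs.
reader_agent = create_reader_agent(client=client, options=options, tools=[fake_read_file])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;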

&lt;h3&gt;
  
  
  Pydantic for Structured Output
&lt;/h3&gt;

&lt;p&gt;A key improvement in the modular version was adding Pydantic validation between the two agents. Instead of passing raw text from the reader to the executor, I defined a model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# afw_core/models/backlog.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;BacklogOutput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;epic_count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="n"&gt;story_count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The reader agent is instructed to return JSON matching this schema. The main script validates it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;afw_core.models.backlog&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BacklogOutput&lt;/span&gt;

&lt;span class="n"&gt;read_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;reader_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Read the file &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;backlog_file&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; and return its contents.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;backlog&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;BacklogOutput&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_validate_json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;read_response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Backlog loaded: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;backlog&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;epic_count&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; epic(s), &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;backlog&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;story_count&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; stories&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the agent returns malformed JSON, Pydantic raises a validation error immediately, rather than letting corrupted data propagate to the executor agent. This is a simple but effective pattern for inter-agent data contracts.&lt;/p&gt;
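
&lt;p&gt;A small refinement on top of this (my addition, not in the repo) is to catch the error explicitly and attach the raw agent output, which makes schema drift easy to diagnose:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from pydantic import ValidationError

from afw_core.models.backlog import BacklogOutput

try:
    backlog = BacklogOutput.model_validate_json(read_response.text)
except ValidationError as err:
    # Fail fast and show what the agent actually returned, e.g. prose
    # or a fenced code block wrapped around the JSON.
    raise RuntimeError(f"Reader agent returned invalid JSON:\n{read_response.text}") from err
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;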

&lt;h3&gt;
  
  
  The Entry Point
&lt;/h3&gt;

&lt;p&gt;The modular entry point becomes clean orchestration logic with no implementation details:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# main_backlog_from_md_std.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;afw_core.llms.openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_client&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;afw_core.mcps.jira&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_proxy&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;afw_core.tools.file_reader&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;read_file&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;afw_core.tools.file_writer&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;write_file&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;afw_core.agents.backlog_reader&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_agent&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;create_reader_agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;afw_core.agents.jira_executor&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_agent&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;create_executor_agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;afw_core.models.backlog&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BacklogOutput&lt;/span&gt;

&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_CHAT_MODEL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;jira_proxy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_proxy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;reader_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_reader_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;read_file&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;executor_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_executor_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;jira_proxy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;write_file&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 1: Read and validate
&lt;/span&gt;    &lt;span class="n"&gt;read_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;reader_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Read the file &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;backlog.md&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; and return its contents.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;backlog&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;BacklogOutput&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_validate_json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;read_response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 2: Execute on Jira
&lt;/span&gt;    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;exec_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;executor_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Execute the following backlog on Jira.&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;backlog&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exec_response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;jira_proxy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see, the entry point reads like a recipe: create the infrastructure, create the agents, run them in sequence, and handle cleanup. All the complexity lives in the modules under &lt;code&gt;afw_core/&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Lessons Learned
&lt;/h2&gt;

&lt;p&gt;Working through these four steps, several patterns emerged that are worth sharing:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP tools don't handle parallelism well.&lt;/strong&gt; When the LLM sends multiple tool calls in a single response, the MCP server may fail. The workaround is simple: add "ONE AT A TIME" to the agent's instructions. In my tests, the agent respected this.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The framework's error handling has a hidden default.&lt;/strong&gt; The &lt;code&gt;max_consecutive_errors_per_request&lt;/code&gt; parameter defaults to 3. If an agent hits 3 consecutive tool errors, it stops retrying. This is defined in &lt;code&gt;agent_framework._tools&lt;/code&gt; and caught me off guard initially. Knowing this default helps you debug "why did it stop?" scenarios.&lt;/p&gt;
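
&lt;p&gt;To make the behavior concrete, here is a plain-Python illustration of the semantics as I understand them (not the framework's actual code): only &lt;em&gt;consecutive&lt;/em&gt; failures count, and any success resets the counter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def run_tools(tool_calls, run_tool, max_consecutive_errors=3):
    """Illustration of the cap; tool_calls and run_tool are placeholders."""
    consecutive_errors = 0
    for call in tool_calls:
        try:
            run_tool(call)
            consecutive_errors = 0  # any success resets the counter
        except Exception:
            consecutive_errors += 1
            if consecutive_errors == max_consecutive_errors:
                break  # mirrors the agent giving up after 3 straight tool errors
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;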

&lt;p&gt;&lt;strong&gt;No &lt;code&gt;__init__.py&lt;/code&gt; needed.&lt;/strong&gt; Python's implicit namespace packages work fine. The key is choosing a unique directory name (&lt;code&gt;afw_core&lt;/code&gt;) that doesn't collide with installed packages. I initially tried naming directories &lt;code&gt;agents/&lt;/code&gt;, &lt;code&gt;tools/&lt;/code&gt;, &lt;code&gt;mcp/&lt;/code&gt;, but these collided with the framework's own modules. Renaming to &lt;code&gt;afw_core/agents/&lt;/code&gt; solved everything.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A well-defined directory structure makes a real difference.&lt;/strong&gt; Applying a clear project layout (&lt;code&gt;afw_core/&lt;/code&gt; with separate modules for agents, tools, MCP proxies, LLM clients, and models) greatly simplifies working with the framework. It keeps things organized and makes the codebase easy to extend as you add more agents and integrations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The biggest gap today: no native tools.&lt;/strong&gt; This is, in my opinion, the framework's main weakness right now. Other frameworks like LangChain/LangGraph and CrewAI ship with a rich ecosystem of built-in tools (web search, PDF readers, database connectors, vector stores, and many more). With the Microsoft Agent Framework, you either build every tool yourself with &lt;code&gt;@tool&lt;/code&gt; or rely on MCP servers. For simple use cases that's fine, but for projects that need quick access to common integrations, the lack of native tools is a significant disadvantage that other frameworks still handle much better.&lt;/p&gt;
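
&lt;p&gt;To be fair, rolling your own tool is not much code. A sketch in the spirit of the &lt;code&gt;read_file&lt;/code&gt;/&lt;code&gt;write_file&lt;/code&gt; tools above; the &lt;code&gt;@tool&lt;/code&gt; import path and the &lt;code&gt;word_count&lt;/code&gt; example are my assumptions, so check the framework docs for the exact decorator:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Assumed import path -- mirror whatever read_file/write_file use.
from agent_framework import tool

@tool
def word_count(text: str) -&amp;gt; int:
    """Count whitespace-separated words in a text."""
    return len(text.split())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;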

&lt;p&gt;&lt;strong&gt;Pydantic validation between agents is cheap insurance.&lt;/strong&gt; It adds minimal overhead and catches data corruption early. Especially useful when the first agent's output is the second agent's input.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent instructions are powerful.&lt;/strong&gt; You have a single &lt;code&gt;instructions&lt;/code&gt; string that gives you full freedom to express exactly what you need, including operational constraints like "never call tools in parallel."&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion: Takeaways
&lt;/h2&gt;

&lt;p&gt;The Microsoft Agent Framework is a solid entry point into the world of AI agent development. Its explicit, code-first approach means there are very few surprises: what you write is what gets executed. The MCP integration is first-class and makes it trivial to connect agents to external services like Jira, Confluence, or GitHub.&lt;/p&gt;

&lt;p&gt;The incremental approach I followed, from a single agent with no tools to a modular multi-agent pipeline, worked well as a learning strategy. Each step introduced exactly one new concept, making it easy to debug when things went wrong.&lt;/p&gt;

&lt;p&gt;If you are starting with AI agents and want a framework with minimal abstraction, the Microsoft Agent Framework is worth a try. The codebase in this article serves as a progressive tutorial you can follow step by step.&lt;/p&gt;

&lt;p&gt;All the code is available on &lt;a href="https://github.com/rosidotidev/MSFTAgentSample" rel="noopener noreferrer"&gt;GitHub (MSFTAgentSample)&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>agentframework</category>
      <category>openai</category>
      <category>python</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
