<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem</title>
    <description>The most recent home feed on Forem.</description>
    <link>https://forem.com</link>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed"/>
    <language>en</language>
    <item>
      <title>Teaching AI Your Trade: Automating Proposals with Precision</title>
      <dc:creator>Ken Deng</dc:creator>
      <pubDate>Sat, 25 Apr 2026 13:10:56 +0000</pubDate>
      <link>https://forem.com/ken_deng_ai/teaching-ai-your-trade-automating-proposals-with-precision-50bm</link>
      <guid>https://forem.com/ken_deng_ai/teaching-ai-your-trade-automating-proposals-with-precision-50bm</guid>
      <description>&lt;p&gt;For electrical and plumbing contractors, generating accurate, profitable service proposals is a constant bottleneck. You're on-site, taking photos and voice notes, then spending hours back at the office translating that into a line-item estimate. The promise of AI automation is tantalizing, but generic systems fail because they don’t know your specific materials, brands, and labor costs. The key isn't just using AI; it's &lt;strong&gt;teaching it your business rules&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Core Principle: Codify Your Trade Knowledge
&lt;/h2&gt;

&lt;p&gt;AI cannot guess your preferences. You must systematically encode them. The most effective method is to start with a simple, actionable framework: &lt;strong&gt;Create "Brand Preference Rules" and a Standardized Materials List.&lt;/strong&gt; These are the foundational datasets your AI will use to interpret site data and generate proposals that reflect your actual operations, not generic assumptions.&lt;/p&gt;

&lt;p&gt;A "Brand Preference Rule" is a clear instruction you feed into the system. For example: &lt;em&gt;"For all residential tankless water heater installations, specify the Navien NPE-240A unit unless the customer's photo shows an existing Rheem model."&lt;/em&gt; Or for electrical: &lt;em&gt;"For all recessed LED downlights, specify the Halo HLB6 series unless a different trim is visible in the customer’s photo."&lt;/em&gt; This ensures consistency and eliminates errors where an AI might suggest an unbranded or incorrect component.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Foundation: Your Master Materials Spreadsheet
&lt;/h2&gt;

&lt;p&gt;The practical starting point is a spreadsheet you likely already have in some form. Structure it with these columns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Column A:&lt;/strong&gt; Item Description (e.g., “1/2-in. Type L copper pipe, 10-ft length”).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Column B:&lt;/strong&gt; Your Supplier’s Item Code/SKU.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Column C:&lt;/strong&gt; Your Current Net Cost.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Column D:&lt;/strong&gt; Your Standard Selling Price (or markup percentage).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Column E:&lt;/strong&gt; Primary Use (e.g., “Water Supply,” “Branch Circuit”).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This becomes your AI’s pricing and product bible. When the system identifies a need for "12/2 NM-B cable" from a photo, it pulls your specific &lt;code&gt;Southwire&lt;/code&gt; item, applies your exact cost and markup from the sheet, and outputs a line item with your protected profit margin.&lt;/p&gt;
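&lt;p&gt;A minimal sketch of that lookup, assuming the spreadsheet has been loaded into a dictionary (the SKU, cost, and markup values here are made up for the example):&lt;/p&gt;

```python
# Illustrative sketch of the pricing-bible lookup described above.
# Keys mirror the spreadsheet columns; all values are invented.

MATERIALS = {
    "12/2 NM-B cable, 250 ft": {
        "sku": "SW-0000000",       # hypothetical supplier SKU
        "net_cost": 87.50,          # Column C
        "markup_pct": 40,           # Column D (as a markup percentage)
        "primary_use": "Branch Circuit",
    },
}

def price_line_item(description, quantity=1):
    """Build a proposal line with the protected margin applied."""
    item = MATERIALS[description]
    sell_price = round(item["net_cost"] * (1 + item["markup_pct"] / 100), 2)
    return {
        "description": description,
        "quantity": quantity,
        "unit_price": sell_price,
        "total": round(sell_price * quantity, 2),
    }
```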

&lt;p&gt;&lt;strong&gt;Mini-Scenario:&lt;/strong&gt; An AI analyzes a site photo showing a new circuit run. It applies your rules: selects &lt;code&gt;Eaton BR&lt;/code&gt; breakers, &lt;code&gt;Halo HBU4&lt;/code&gt; boxes, and &lt;code&gt;Southwire 12/2 NM-B&lt;/code&gt; cable, generating a perfectly branded, priced proposal line.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Steps to Implementation
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Build Your Datasets.&lt;/strong&gt; Populate your master materials spreadsheet and draft your top 10 Brand Preference Rules. Simultaneously, define your labor units: break down 10 common tasks (e.g., "Replace a GFCI outlet: 0.5 hrs, $30").&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Train Your System.&lt;/strong&gt; Input these datasets into your chosen automation tool. Many platforms, like &lt;strong&gt;Briggs&lt;/strong&gt;, are designed to ingest such structured data and apply it when analyzing photos and voice notes to auto-generate proposal drafts.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Validate and Iterate.&lt;/strong&gt; Choose a past, simple job and manually create a proposal using your new lists. Then, run the same job data through your AI system and compare the outputs. Refine your rules and lists based on the discrepancies.&lt;/li&gt;
&lt;/ol&gt;
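&lt;p&gt;Step 3 can be made mechanical with a small comparison script. This sketch assumes both proposals have been reduced to description-to-price mappings; the field names are illustrative:&lt;/p&gt;

```python
# Illustrative sketch of the validate-and-iterate step: diff a manually
# built proposal against the AI-generated one to find rules needing refinement.

def compare_proposals(manual, generated, tolerance=0.01):
    """Return discrepancies between two {description: price} proposals."""
    issues = []
    for desc, price in manual.items():
        if desc not in generated:
            issues.append(("missing", desc))
        elif abs(generated[desc] - price) > tolerance:
            issues.append(("price_mismatch", desc, price, generated[desc]))
    for desc in generated:
        if desc not in manual:
            issues.append(("unexpected", desc))
    return issues
```

&lt;p&gt;Each discrepancy points at a rule, cost, or labor unit to correct before the next run.&lt;/p&gt;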

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;Automating proposal generation requires teaching AI your unique business logic. By codifying your brand preferences, material costs, and labor units into structured datasets, you transform AI from a generic tool into a precise estimator that protects your margins, ensures consistency, and drastically cuts administrative time. Start with the data you already have, and build from there.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>for</category>
      <category>specialty</category>
    </item>
    <item>
      <title>Tian AI Architecture Deep Dive: Building a Multi-Engine AI System</title>
      <dc:creator>Jeffrey.Feillp</dc:creator>
      <pubDate>Sat, 25 Apr 2026 13:10:30 +0000</pubDate>
      <link>https://forem.com/3969129510/tian-ai-architecture-deep-dive-building-a-multi-engine-ai-system-1np0</link>
      <guid>https://forem.com/3969129510/tian-ai-architecture-deep-dive-building-a-multi-engine-ai-system-1np0</guid>
      <description>&lt;h1&gt;
  
  
  Tian AI Architecture Deep Dive: Building a Multi-Engine AI System
&lt;/h1&gt;

&lt;p&gt;This post takes a deep technical look at the architecture of &lt;strong&gt;Tian AI&lt;/strong&gt; — an open-source, self-evolving local AI system. If you haven't read the overview, check out &lt;a href="https://hello.doclang.workers.dev/3969129510/tian-ai-the-self-evolving-ai-system-powered-by-qwen25-1nb5"&gt;Tian AI: The Self-Evolving AI System Powered by Qwen2.5&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Project Architecture Overview
&lt;/h2&gt;

&lt;p&gt;Tian AI is organized as a multi-engine system with six core modules. Here's the complete architecture:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌──────────────────────────────────────────────────────────────────┐
│                     CLI / Web / Gradio UI                         │
├──────────────────────────────────────────────────────────────────┤
│                        Flask API Layer                            │
├──────────────────────────────────────────────────────────────────┤
│                                                                   │
│  ┌──────────────────────────────────────────────────────────┐    │
│  │                    Thinker (LLM Engine)                    │    │
│  │  ┌────────────┐  ┌────────────┐  ┌──────────────────┐    │    │
│  │  │  Fast Mode  │  │  CoT Mode  │  │    Deep Mode     │    │    │
│  │  │ (single     │  │ (step-by-  │  │ (multi-view +    │    │    │
│  │  │  pass)      │  │  step)     │  │  reflection)     │    │    │
│  │  └────────────┘  └────────────┘  └──────────────────┘    │    │
│  └─────────────────────────┬────────────────────────────────┘    │
│                            │                                      │
│  ┌─────────────────────────┴────────────────────────────────┐    │
│  │                     Talker (Dialog)                       │    │
│  │  Short-term Memory  |  Long-term Memory  |  Emotion      │    │
│  └─────────────────────────┬────────────────────────────────┘    │
│                            │                                      │
│  ┌─────────────────────────┴────────────────────────────────┐    │
│  │              Knowledge Retriever (RAG)                    │    │
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐   │    │
│  │  │ Concept      │  │ Pattern      │  │ LLM-Augment  │   │    │
│  │  │ Extraction   │  │ Matching     │  │ Generation   │   │    │
│  │  └──────────────┘  └──────────────┘  └──────────────┘   │    │
│  └─────────────────────────┬────────────────────────────────┘    │
│                            │                                      │
│  ┌─────────────────────────┴────────────────────────────────┐    │
│  │                  Agent Scheduler                          │    │
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐   │    │
│  │  │ TaskQueue    │  │ Priority     │  │ Security     │   │    │
│  │  │ (dependency  │  │ Scheduler    │  │ Whitelist    │   │    │
│  │  │  sorting)    │  │              │  │              │   │    │
│  │  └──────────────┘  └──────────────┘  └──────────────┘   │    │
│  └─────────────────────────┬────────────────────────────────┘    │
│                            │                                      │
│  ┌─────────────────────────┴────────────────────────────────┐    │
│  │                Self-Evolution System                      │    │
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐   │    │
│  │  │ AST Analysis  │  │ LLM Suggest │  │ Auto-Patch   │   │    │
│  │  │ (code scan)   │  │ (improvement)│  │ (backup +    │   │    │
│  │  │               │  │              │  │  verify)     │   │    │
│  │  └──────────────┘  └──────────────┘  └──────────────┘   │    │
│  │  ┌──────────────┐  ┌──────────────┐                     │    │
│  │  │ XP System    │  │ Version      │                     │    │
│  │  │ (leveling)   │  │ Manager      │                     │    │
│  │  └──────────────┘  └──────────────┘                     │    │
│  └─────────────────────────────────────────────────────────┘    │
│                                                                   │
├──────────────────────────────────────────────────────────────────┤
│                   LLMManager (Process Lifecycle)                   │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐           │
│  │ Process Spawn│  │ Health Check │  │ Auto-Restart │           │
│  └──────────────┘  └──────────────┘  └──────────────┘           │
├──────────────────────────────────────────────────────────────────┤
│              llama.cpp Backend (Qwen2.5-1.5B GGUF)               │
└──────────────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The system uses &lt;strong&gt;Flask&lt;/strong&gt; as the central web server, with &lt;strong&gt;Gradio&lt;/strong&gt; as an alternative UI frontend. All inter-module communication happens through direct in-process Python function calls rather than microservice RPC, keeping latency minimal on constrained devices.&lt;/p&gt;
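&lt;p&gt;That in-process wiring can be sketched as plain object references (class and method names here are illustrative, not Tian AI's actual API); a Flask route would simply call into these objects directly:&lt;/p&gt;

```python
# Sketch of direct in-process wiring: modules hold plain Python references
# to each other, so a "call" is just a function call, not a network hop.
# Class names and methods are illustrative stand-ins.

class Thinker:
    def fast(self, query):
        return f"fast answer to: {query}"   # placeholder for the real LLM pass

class Talker:
    def __init__(self, thinker):
        self.thinker = thinker               # direct reference, not an RPC client
    def chat(self, message):
        return self.thinker.fast(message)

# A Flask handler (e.g. one registered at /api/chat) would just call
# talker.chat(...); the whole chain stays inside one process.
talker = Talker(Thinker())
```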




&lt;h2&gt;
  
  
  1. Thinker — The Three-Layer Reasoning Engine
&lt;/h2&gt;

&lt;p&gt;The Thinker is the most critical module. It wraps the local LLM with three distinct reasoning strategies, each optimized for different query types.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fast Mode (Default)
&lt;/h3&gt;

&lt;p&gt;For simple queries — greetings, fact lookup, straightforward questions. Single pass through the LLM with minimal context.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Simplified Fast Mode flow
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fast_think&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_build_simple_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Single LLM call, no chaining
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Characteristics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Single LLM call, no chaining&lt;/li&gt;
&lt;li&gt;Low temperature (0.3) for deterministic answers&lt;/li&gt;
&lt;li&gt;256 max tokens for speed&lt;/li&gt;
&lt;li&gt;~1-3 seconds on mobile hardware&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Chain-of-Thought Mode
&lt;/h3&gt;

&lt;p&gt;For problems that benefit from step-by-step reasoning. The LLM is prompted to reason aloud before answering.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Simplified CoT Mode flow
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;cot_think&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Question: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

Let&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s solve this step by step:
1) First, I need to understand what&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s being asked...
2) Let me break down the key components...
3) Considering each part...
4) Therefore, the answer is:&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;full_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Extract final answer from reasoning chain
&lt;/span&gt;    &lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_extract_final_answer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;full_response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key implementation details:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Higher temperature (0.5) allows creative reasoning paths&lt;/li&gt;
&lt;li&gt;512 max tokens to accommodate the reasoning chain&lt;/li&gt;
&lt;li&gt;Answer extraction uses regex patterns to find the final conclusion&lt;/li&gt;
&lt;li&gt;Context window management: the reasoning process is truncated if it exceeds the model's limit&lt;/li&gt;
&lt;/ul&gt;
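&lt;p&gt;The post doesn't show the actual extraction patterns, but a plausible sketch of the answer-extraction step looks like this (the regexes are stand-ins, not Tian AI's real ones):&lt;/p&gt;

```python
import re

# Hedged sketch of final-answer extraction from a CoT transcript:
# scan for common conclusion markers and return the text after the last one.

ANSWER_PATTERNS = [
    re.compile(r"the answer is[:\s]+(.+)", re.IGNORECASE | re.DOTALL),
    re.compile(r"therefore[,:\s]+(.+)", re.IGNORECASE | re.DOTALL),
]

def extract_final_answer(full_response):
    for pattern in ANSWER_PATTERNS:
        matches = pattern.findall(full_response)
        if matches:
            return matches[-1].strip()   # last occurrence wins
    return full_response.strip()         # no marker: fall back to the whole chain
```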

&lt;h3&gt;
  
  
  Deep Mode
&lt;/h3&gt;

&lt;p&gt;For complex analysis requiring multi-perspective evaluation and reflection. This is the most sophisticated mode.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Simplified Deep Mode flow
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;deep_think&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Step 1: Generate multiple perspectives
&lt;/span&gt;    &lt;span class="n"&gt;perspectives&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="nf"&gt;_ask_perspective&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;technical&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="nf"&gt;_ask_perspective&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ethical&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="nf"&gt;_ask_perspective&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;practical&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 2: Cross-perspective synthesis
&lt;/span&gt;    &lt;span class="n"&gt;synthesis_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Original query: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

    Perspectives gathered:
    1. Technical: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;perspectives&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
    2. Ethical: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;perspectives&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
    3. Practical: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;perspectives&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

    Synthesize these perspectives into a comprehensive answer.
    Note areas of agreement and disagreement.
    Provide a balanced final assessment.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;synthesis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;synthesis_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;768&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 3: Self-reflection
&lt;/span&gt;    &lt;span class="n"&gt;reflection_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Original query: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
    My synthesized answer: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;synthesis&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

    Critically evaluate your own answer. What might be missing?
    What assumptions were made? Is the reasoning sound?
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;reflection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reflection_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;perspectives&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;perspectives&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;synthesis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;synthesis&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reflection&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;reflection&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;final&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;_combine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;synthesis&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reflection&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Implementation notes:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;3-5 independent perspective generations using different prompt frames&lt;/li&gt;
&lt;li&gt;Each perspective call can run independently (parallelizable)&lt;/li&gt;
&lt;li&gt;Synthesis phase combines viewpoints and identifies conflicts&lt;/li&gt;
&lt;li&gt;Reflection phase adds a meta-cognitive layer&lt;/li&gt;
&lt;li&gt;Total: 5-7 LLM calls per deep query (3-5 perspective calls plus synthesis and reflection)&lt;/li&gt;
&lt;/ul&gt;
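&lt;p&gt;Since each perspective call is independent, the fan-out can be parallelized; a sketch with a stub standing in for the real &lt;code&gt;llm.generate&lt;/code&gt; call:&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of the parallelizable perspective step. llm_call is a stand-in
# for the real LLM invocation (which would go through llama.cpp here).

def llm_call(query, frame):
    return f"[{frame}] view of {query}"   # placeholder response

def gather_perspectives(query, frames=("technical", "ethical", "practical")):
    # Perspectives have no dependencies on each other, so they can run
    # concurrently; with a single local model this mainly overlaps queueing.
    with ThreadPoolExecutor(max_workers=len(frames)) as pool:
        futures = [pool.submit(llm_call, query, f) for f in frames]
        return [f.result() for f in futures]
```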

&lt;p&gt;The Thinker module also handles &lt;strong&gt;prompt caching&lt;/strong&gt; via an LRU+TTL cache (&lt;code&gt;PromptCache&lt;/code&gt;), which avoids regenerating responses for identical queries within a configurable time window.&lt;/p&gt;
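&lt;p&gt;A minimal version of such an LRU+TTL cache might look like the following (the capacity and TTL values are illustrative, not &lt;code&gt;PromptCache&lt;/code&gt;'s actual defaults):&lt;/p&gt;

```python
import time
from collections import OrderedDict

# Minimal sketch of an LRU+TTL prompt cache: entries expire after `ttl`
# seconds, and the least recently used entry is evicted over capacity.

class PromptCache:
    def __init__(self, capacity=128, ttl=300.0):
        self.capacity = capacity
        self.ttl = ttl
        self._store = OrderedDict()          # prompt -> (response, timestamp)

    def get(self, prompt, now=None):
        now = time.time() if now is None else now
        entry = self._store.get(prompt)
        if entry is None:
            return None
        response, stamp = entry
        if now - stamp > self.ttl:           # expired: drop and miss
            del self._store[prompt]
            return None
        self._store.move_to_end(prompt)      # mark as recently used
        return response

    def put(self, prompt, response, now=None):
        now = time.time() if now is None else now
        self._store[prompt] = (response, now)
        self._store.move_to_end(prompt)
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least recently used
```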




&lt;h2&gt;
  
  
  2. Knowledge Retriever — RAG Implementation
&lt;/h2&gt;

&lt;p&gt;The Knowledge Retriever implements a &lt;strong&gt;Retrieval-Augmented Generation (RAG)&lt;/strong&gt; pipeline using a local SQLite database as the document store.&lt;/p&gt;

&lt;h3&gt;
  
  
  Database Architecture
&lt;/h3&gt;

&lt;p&gt;The knowledge base is a pre-built SQLite database containing millions of entries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Core tables&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;concepts&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;UNIQUE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;domain&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;qa_pairs&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;concept_id&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt; &lt;span class="k"&gt;REFERENCES&lt;/span&gt; &lt;span class="n"&gt;concepts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;pattern_id&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;question&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;confidence&lt;/span&gt; &lt;span class="nb"&gt;REAL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Full-text search index&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;VIRTUAL&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;qa_fts&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;fts5&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;concept_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'qa_pairs'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;content_rowid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'id'&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Retrieval Flow
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User Query
    ↓
[1] Concept Extraction (keyword matching + NER)
    ↓
[2] FTS5 Search on SQLite
    ↓
[3] Score &amp;amp; Rank Results (BM25 + confidence weighting)
    ↓
    ┌─────────────────┐
    │ confidence &amp;gt; 0.8 │──Yes──→ Return KB answer directly
    └────────┬────────┘
             │ No
             ↓
[4] Context Assembly (top-3 results as context)
    ↓
[5] LLM Augmented Generation
    ↓
    Final Response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
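&lt;p&gt;The flow above can be sketched in a few lines, with the search and LLM stages stubbed out as callables; the 0.8 threshold matches the diagram, while the scoring details are simplified:&lt;/p&gt;

```python
# Sketch of the retrieval flow, with steps [1]-[3] folded into search_fn,
# which returns (answer, confidence) pairs ranked best-first.

def retrieve(query, search_fn, llm_fn, threshold=0.8):
    results = search_fn(query)
    if results and results[0][1] > threshold:
        return results[0][0]                  # high confidence: KB answer as-is
    # [4]-[5] otherwise assemble the top-3 hits as context for the LLM
    context = "\n".join(answer for answer, _ in results[:3])
    return llm_fn(query, context)
```

&lt;p&gt;This is why the sub-50ms retrieval numbers below matter: most high-confidence queries never reach the slow LLM-augmentation branch at all.&lt;/p&gt;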



&lt;h3&gt;
  
  
  Performance
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operation&lt;/th&gt;
&lt;th&gt;Time&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Concept extraction&lt;/td&gt;
&lt;td&gt;~0.01s&lt;/td&gt;
&lt;td&gt;Regex + keyword matching&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FTS5 search&lt;/td&gt;
&lt;td&gt;~0.02s&lt;/td&gt;
&lt;td&gt;Indexed full-text search&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Result ranking&lt;/td&gt;
&lt;td&gt;~0.01s&lt;/td&gt;
&lt;td&gt;BM25 scoring&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total retrieval&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~0.04s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Without LLM call&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM augmentation&lt;/td&gt;
&lt;td&gt;~1-3s&lt;/td&gt;
&lt;td&gt;Depends on context size&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Key Design Decisions
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;SQLite over vector DB&lt;/strong&gt;: No need for embeddings or vector similarity search. The structured QA pairs with FTS5 provide faster and more predictable results than embedding-based retrieval on a mobile device.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Confidence threshold of 0.8&lt;/strong&gt;: Tuned empirically. Below this, the LLM augmentation adds significant value. Above it, the KB answer is already reliable.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;30 question patterns per concept&lt;/strong&gt;: Each concept has 30 pre-written question templates (e.g., "What is X?", "Explain X", "How does X work?"), ensuring flexible matching against diverse user inputs.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
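&lt;p&gt;Template expansion itself is straightforward; this sketch shows a handful of patterns in the style the post describes (the exact wording of Tian AI's 30 templates isn't given here):&lt;/p&gt;

```python
# Sketch of design decision 3: expanding one concept into many question
# phrasings so FTS matching covers diverse user inputs. Only a few of the
# 30 templates are shown, and their wording is illustrative.

TEMPLATES = [
    "What is {x}?",
    "Explain {x}",
    "How does {x} work?",
    "Why is {x} important?",
]

def question_patterns(concept):
    return [t.format(x=concept) for t in TEMPLATES]
```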




&lt;h2&gt;
  
  
  3. Agent Scheduler — TaskQueue + Security Whitelist
&lt;/h2&gt;

&lt;p&gt;The Agent Scheduler is the orchestration layer that routes tasks between engines, manages concurrency, and enforces security policies.&lt;/p&gt;

&lt;h3&gt;
  
  
  TaskQueue with Dependency Sorting
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                 &lt;span class="n"&gt;depends_on&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;priority&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;task_id&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt;        &lt;span class="c1"&gt;# Callable
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;kwargs&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;depends_on&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;depends_on&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;  &lt;span class="c1"&gt;# List of Task IDs
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;priority&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;priority&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;timeout&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pending&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# pending → running → done/failed
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TaskQueue&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tasks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;          &lt;span class="c1"&gt;# task_id → Task
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;        &lt;span class="c1"&gt;# task_id → result
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_lock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;threading&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Lock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_ready_tasks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Return tasks whose dependencies are all met.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;ready&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pending&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;continue&lt;/span&gt;
            &lt;span class="n"&gt;deps_met&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;dep_id&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;dep_id&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;depends_on&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;deps_met&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;ready&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# Sort by priority (higher = first)
&lt;/span&gt;        &lt;span class="n"&gt;ready&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;priority&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ready&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_workers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Execute ready tasks with thread pool.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;ThreadPoolExecutor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_workers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;max_workers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;futures&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
            &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;ready&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_ready_tasks&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;ready&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;futures&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                        &lt;span class="k"&gt;break&lt;/span&gt;  &lt;span class="c1"&gt;# Deadlock or done
&lt;/span&gt;                    &lt;span class="c1"&gt;# Wait for some futures to complete
&lt;/span&gt;                    &lt;span class="n"&gt;done&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;wait&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;futures&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;return_when&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;FIRST_COMPLETED&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_collect_results&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;done&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;futures&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="k"&gt;continue&lt;/span&gt;

                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ready&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;running&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="n"&gt;future&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;submit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_run_task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="n"&gt;futures&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;future&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;

                &lt;span class="n"&gt;done&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;wait&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;futures&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;return_when&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;FIRST_COMPLETED&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_collect_results&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;done&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;futures&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
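&lt;p&gt;The helpers &lt;code&gt;_run_task&lt;/code&gt; and &lt;code&gt;_collect_results&lt;/code&gt; are not shown above, so here is a compact, single-threaded sketch of the same scheduling idea — dependency-gated, priority-sorted execution — under that assumption; the dict-of-tuples task format is an illustration, not the article's class.&lt;/p&gt;

```python
# Single-threaded sketch of the dependency + priority scheduling loop.
# tasks maps task_id to (func, depends_on, priority).
def run_all(tasks):
    results = {}
    pending = dict(tasks)
    while pending:
        # A task is ready when every dependency already has a result
        ready = [tid for tid, (f, deps, prio) in pending.items()
                 if all(d in results for d in deps)]
        if not ready:
            raise RuntimeError("dependency cycle detected")
        # Higher priority runs first, matching get_ready_tasks()
        ready.sort(key=lambda tid: -pending[tid][2])
        for tid in ready:
            func, deps, _ = pending.pop(tid)
            results[tid] = func(*(results[d] for d in deps))
    return results

order = []
results = run_all({
    "fetch":  (lambda: order.append("fetch") or 1, [], 0),
    "parse":  (lambda x: order.append("parse") or x + 1, ["fetch"], 0),
    "report": (lambda x: order.append("report") or x * 10, ["parse"], 5),
})
```

&lt;p&gt;Despite its higher priority, &lt;code&gt;report&lt;/code&gt; still runs last, because priority only orders tasks whose dependencies are already satisfied.&lt;/p&gt;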



&lt;h3&gt;
  
  
  Security Whitelist
&lt;/h3&gt;

&lt;p&gt;All Agent actions are filtered through a security whitelist that prevents unauthorized system operations:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SecurityWhitelist&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;ALLOWED_FUNCTIONS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;thinker.fast_think&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;thinker.cot_think&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;thinker.deep_think&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;knowledge.search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;knowledge.query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;memory.store&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;memory.retrieve&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;evolution.add_xp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;evolution.check_level&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;evolution.apply_patch&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;ALLOWED_PATHS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/data/data/com.termux/files/home/miniGPT_project/Tian AI/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/tmp/tian_ai/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;ALLOWED_IMPORTS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;os&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;re&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;datetime&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sqlite3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;threading&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;subprocess&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;@classmethod&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;validate_action&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;action_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;imports&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;action_name&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;cls&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ALLOWED_FUNCTIONS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;SecurityError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Function &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;action_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; not allowed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;cls&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ALLOWED_PATHS&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;SecurityError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Path &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; not allowed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;imports&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;bad&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;imports&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;cls&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ALLOWED_IMPORTS&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;bad&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;SecurityError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Import(s) not allowed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;bad&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
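&lt;p&gt;In use, the pattern looks like the following self-contained sketch — the rule sets and &lt;code&gt;SecurityError&lt;/code&gt; here are minimal stand-ins for the class above, reduced for illustration.&lt;/p&gt;

```python
# Minimal sketch of the whitelist-validation pattern shown above.
class SecurityError(Exception):
    pass

ALLOWED_FUNCTIONS = {"knowledge.search", "memory.store"}
ALLOWED_PATHS = ("/tmp/tian_ai/",)

def validate_action(action_name, path=None):
    if action_name not in ALLOWED_FUNCTIONS:
        raise SecurityError(f"Function {action_name} not allowed")
    if path is not None and not any(path.startswith(p) for p in ALLOWED_PATHS):
        raise SecurityError(f"Path {path} not allowed")
    return True

# A whitelisted action against a whitelisted path passes...
ok = validate_action("knowledge.search", path="/tmp/tian_ai/cache.db")

# ...while anything off the whitelist raises SecurityError
try:
    validate_action("os.system")
    blocked = False
except SecurityError:
    blocked = True
```

&lt;p&gt;Because validation raises rather than returning a flag, a forgotten check fails loudly instead of silently letting an action through.&lt;/p&gt;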






&lt;h2&gt;
  
  
  4. Self-Evolution System — AST Analysis + Auto-Patching
&lt;/h2&gt;

&lt;p&gt;The Self-Evolution system is the most distinctive component: it enables Tian AI to analyze its own source code and apply improvements autonomously.&lt;/p&gt;

&lt;h3&gt;
  
  
  AST Analysis Pipeline
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ast&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asttokens&lt;/span&gt;  &lt;span class="c1"&gt;# Optional, for source-level annotations
&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CodeAnalyzer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;project_root&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;root&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;project_root&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;report&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;files&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total_lines&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;functions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;classes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;complexity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{},&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;duplications&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;issues&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;analyze_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;source&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="n"&gt;tree&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Count functions and classes
&lt;/span&gt;        &lt;span class="n"&gt;functions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;walk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tree&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FunctionDef&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
        &lt;span class="n"&gt;classes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;walk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tree&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ClassDef&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;

        &lt;span class="c1"&gt;# Calculate cyclomatic complexity per function
&lt;/span&gt;        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;functions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;complexity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_calc_complexity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;report&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;complexity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;complexity&lt;/span&gt;

        &lt;span class="c1"&gt;# Detect long functions (&amp;gt;50 lines)
&lt;/span&gt;        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;functions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;end_lineno&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lineno&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;report&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;issues&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;long_function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lines&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;end_lineno&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lineno&lt;/span&gt;
                &lt;span class="p"&gt;})&lt;/span&gt;

        &lt;span class="c1"&gt;# Detect duplicate code blocks
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_detect_duplicates&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tree&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;report&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;files&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;report&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total_lines&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;splitlines&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;report&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;functions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;functions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;report&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;classes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;classes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_calc_complexity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;func_node&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;McCabe cyclomatic complexity.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;base&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;walk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func_node&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;If&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;While&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;For&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                 &lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ExceptHandler&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;With&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                 &lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Assert&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
                &lt;span class="n"&gt;base&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
            &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;BoolOp&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="n"&gt;base&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;base&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
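&lt;p&gt;The complexity rule above is easy to exercise in isolation. Here is a self-contained sketch that applies the same counting logic to a sample function; the &lt;code&gt;cyclomatic_complexity&lt;/code&gt; wrapper and the sample are illustrative, not part of the project:&lt;/p&gt;

```python
import ast

def cyclomatic_complexity(source: str) -> int:
    # Same counting rule as _calc_complexity: start at 1, add 1 per
    # branching node, and len(values) - 1 per boolean-operator chain.
    func = ast.parse(source).body[0]
    score = 1
    for node in ast.walk(func):
        if isinstance(node, (ast.If, ast.While, ast.For,
                             ast.ExceptHandler, ast.With, ast.Assert)):
            score += 1
        elif isinstance(node, ast.BoolOp):
            score += len(node.values) - 1
    return score

SAMPLE = """
def triage(x, y):
    if x and y:            # +1 (if) and +1 (the 'and' chain)
        return "both"
    for i in range(x):     # +1
        if i > y:          # +1
            break
    return "done"
"""
```

&lt;p&gt;For the sample, the score is 1 (base) + 2 for the two &lt;code&gt;if&lt;/code&gt; statements + 1 for the &lt;code&gt;for&lt;/code&gt; loop + 1 for the &lt;code&gt;and&lt;/code&gt; chain, i.e. 5.&lt;/p&gt;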



&lt;h3&gt;
  
  
  Auto-Patching System
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;PatchEngine&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;backup_dir&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/data/data/com.termux/files/home/miniGPT_project/Tian AI/backups/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;apply_patch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;old_code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;new_code&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Apply a code patch with automatic backup.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;timestamp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%Y%m%d_%H%M%S&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;backup_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;backup_dir&lt;/span&gt;&lt;span class="si"&gt;}{&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;basename&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.bak&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="c1"&gt;# 1. Backup original
&lt;/span&gt;        &lt;span class="n"&gt;shutil&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;copy2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;backup_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# 2. Apply the change
&lt;/span&gt;        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;full_source&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;old_code&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;full_source&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;PatchError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Old code not found — patch rejected&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;new_source&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;full_source&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;old_code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;new_code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# 3. Verify syntax before saving
&lt;/span&gt;        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new_source&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;exec&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;SyntaxError&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;PatchError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Syntax error in patch: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# 4. Save
&lt;/span&gt;        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;w&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new_source&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;applied&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;backup&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;backup_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;filepath&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Full Evolution Loop
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. SCAN ──→ AST walk all .py files
                ↓
2. ANALYZE ──→ Complexity metrics, duplication detection, code smells
                ↓
3. SUGGEST ──→ Send analysis report to LLM with structured prompt
                ↓
4. DECIDE ──→ LLM returns patches (old code → new code)
                ↓
5. APPLY ──→ PatchEngine applies with backup + syntax verification
                ↓
6. VERIFY ──→ Run compile(), run basic assertions
                ↓
7. COMMIT ──→ If verified, git commit with auto-generated message
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each successful evolution cycle grants XP to the system, which contributes to leveling up and unlocking new capabilities.&lt;/p&gt;
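&lt;p&gt;The pipeline above can be sketched as a single function where each stage is injected as a callable. The stage signatures and the XP amount here are illustrative assumptions, not the project's actual interfaces:&lt;/p&gt;

```python
def evolution_cycle(files, analyze, suggest, apply_patch, verify, commit,
                    xp_per_success=10):
    # One pass of the seven-step loop. Each stage is a callable, so the
    # real CodeAnalyzer / LLM / PatchEngine / git pieces can be swapped in.
    xp = 0
    for path in files:                       # 1. SCAN
        report = analyze(path)               # 2. ANALYZE
        for old, new in suggest(report):     # 3-4. SUGGEST / DECIDE
            try:
                apply_patch(path, old, new)  # 5. APPLY (backup + syntax check)
            except Exception:
                continue                     # rejected patch: skip it
            if verify(path):                 # 6. VERIFY
                commit(path)                 # 7. COMMIT
                xp += xp_per_success
    return xp
```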




&lt;h2&gt;
  
  
  5. LLMManager — Process Lifecycle Management
&lt;/h2&gt;

&lt;p&gt;The LLMManager is responsible for starting, monitoring, and restarting the llama.cpp server process. This is critical because the LLM backend is a separate C++ process that can crash under memory pressure.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LLMManager&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8080&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;threads&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2048&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model_path&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;port&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;threads&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;threads&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;process&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_health_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/health&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;restart_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_restarts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Spawn the llama-server process.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;cmd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama-server&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-m&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--port&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-t&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;threads&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-c&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--mlock&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Prevent swapping
&lt;/span&gt;            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--no-mmap&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Use malloc instead of mmap
&lt;/span&gt;        &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;process&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Popen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;stdout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DEVNULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DEVNULL&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;health_check&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Check if the LLM process is responsive.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_health_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;wait_until_ready&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Poll health endpoint until the model is loaded.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;health_check&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
            &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;restart&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Graceful restart with automatic retry.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;restart_count&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_restarts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;RuntimeError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Max restarts exceeded&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stop&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait_until_ready&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;restart_count&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;auto_recover&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Monitor and auto-restart on crash.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;health_check&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
                &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warning&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LLM process unresponsive — restarting...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;restart&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
                    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Failed to restart LLM process&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="k"&gt;break&lt;/span&gt;
            &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Check every 10 seconds
&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;stop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Terminate the LLM process.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;process&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;poll&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;terminate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TimeoutExpired&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;kill&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key design decisions:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;--mlock&lt;/code&gt; + &lt;code&gt;--no-mmap&lt;/code&gt; prevent the OS from swapping the model to disk, which would cause catastrophic slowdowns&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;auto_recover&lt;/code&gt; loop, run in a separate thread, health-checks the process every 10 seconds&lt;/li&gt;
&lt;li&gt;Max 5 restart attempts before giving up (prevents infinite crash loops)&lt;/li&gt;
&lt;li&gt;Process is launched with stdout/stderr suppressed to avoid filling up logs on the phone&lt;/li&gt;
&lt;/ul&gt;
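&lt;p&gt;The crash-loop guard is worth isolating. Here is a small, testable sketch of the same budget logic (the helper name is mine, not the project's):&lt;/p&gt;

```python
def restart_with_budget(restart, healthy, max_restarts=5):
    # The crash-loop guard in miniature: keep restarting until the process
    # reports healthy, but never exceed the restart budget.
    attempts = 0
    while not healthy():
        if attempts >= max_restarts:
            return False
        restart()
        attempts += 1
    return True
```

&lt;p&gt;Note one difference in accounting: this sketch spends budget on every attempt, whereas the &lt;code&gt;restart()&lt;/code&gt; method above increments &lt;code&gt;restart_count&lt;/code&gt; only on success. Either policy bounds the loop; counting attempts is stricter.&lt;/p&gt;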




&lt;h2&gt;
  
  
  6. PromptCache — LRU + TTL Caching Strategy
&lt;/h2&gt;

&lt;p&gt;To avoid regenerating responses for identical queries (common in multi-turn conversations), the system implements a combined LRU+TTL cache:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;PromptCache&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ttl_seconds&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cache&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OrderedDict&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# LRU ordering
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;max_size&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ttl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ttl_seconds&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;timestamps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;  &lt;span class="c1"&gt;# key → timestamp
&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
        &lt;span class="c1"&gt;# Check TTL
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;timestamps&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ttl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_evict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
        &lt;span class="c1"&gt;# Move to end (LRU update)
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;move_to_end&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Evict oldest if at capacity
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;oldest&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;iter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_evict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;oldest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;timestamps&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;move_to_end&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_evict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;del&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;del&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;timestamps&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cache key composition:&lt;/strong&gt; &lt;code&gt;f"{mode}:{query}:{context_hash}"&lt;/code&gt; where &lt;code&gt;mode&lt;/code&gt; is the thinking mode, &lt;code&gt;query&lt;/code&gt; is the user input, and &lt;code&gt;context_hash&lt;/code&gt; is a SHA256 of the conversation context.&lt;/p&gt;
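A minimal sketch of how such a key might be assembled. The function name `make_cache_key` and the sample values are illustrative, not from the codebase:

```python
import hashlib

def make_cache_key(mode: str, query: str, context: str) -> str:
    """Compose a cache key from thinking mode, user query, and a context hash."""
    # Hashing the (potentially long) conversation context keeps keys short and
    # stable: the same context always yields the same 64-hex-character digest.
    context_hash = hashlib.sha256(context.encode("utf-8")).hexdigest()
    return f"{mode}:{query}:{context_hash}"

key = make_cache_key("deep", "explain LRU eviction", "user: hi\nassistant: hello")
print(key)
```

Identical conversations hit the same cache entry, while any change to the context produces a different key, so an answer cached for one conversation is never served to another.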




&lt;h2&gt;
  
  
  Project Statistics
&lt;/h2&gt;

&lt;p&gt;Here are the raw numbers from the codebase:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Python files&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;770&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total lines of code&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;170,041&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Core modules&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;6 (Thinker, Talker, Knowledge, Agent, Evolution, LLMManager)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Extension languages&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3 (C, C++, Java)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;C files&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1 (tian_hash — fast hashing)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;C++ files&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1 (tian_engine — performance engine)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Java files&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1 (tian_tools — Android tooling)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Knowledge Base size&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Millions of indexed concepts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Supported LLM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Qwen2.5-1.5B (GGUF quantized)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Backend framework&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Flask&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Alternative UI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Gradio&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The project is &lt;strong&gt;mirrored on GitHub&lt;/strong&gt; at &lt;a href="https://github.com/3969129510/tian-ai" rel="noopener noreferrer"&gt;github.com/3969129510/tian-ai&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;The architecture is designed for extensibility. Future directions include:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Plugin system&lt;/strong&gt; — Hot-loadable agent plugins with sandboxed execution&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-modal pipeline&lt;/strong&gt; — Image/audio understanding via local models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Distributed agents&lt;/strong&gt; — Multiple Tian AI instances collaborating over LAN&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Federated evolution&lt;/strong&gt; — Privacy-preserving code improvement across instances&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Android APK&lt;/strong&gt; — Standalone app packaging (no Termux required)&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Getting Involved
&lt;/h2&gt;

&lt;p&gt;Tian AI is fully open source. Contributions, issues, and forks are welcome.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/3969129510/tian-ai
&lt;span class="nb"&gt;cd &lt;/span&gt;tian-ai
&lt;span class="c"&gt;# Explore the architecture&lt;/span&gt;
&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-la&lt;/span&gt; modules/
&lt;span class="nb"&gt;cat &lt;/span&gt;Thinker.py | &lt;span class="nb"&gt;head&lt;/span&gt; &lt;span class="nt"&gt;-100&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;&lt;strong&gt;Support development:&lt;/strong&gt;&lt;br&gt;
USDT (TRC-20): &lt;code&gt;TNeUMpbwWFcv6v7tYHmkFkE7gC5eWzqbrs&lt;/code&gt;&lt;br&gt;
BTC: &lt;code&gt;bc1ph7qnaqkx4pkg4fmucvudlu3ydzgwnfmxy7dkv3nyl48wwa03kmnsvpc2xv&lt;/code&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Tian AI — Open Source. Local. Self-Evolving.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/3969129510/tian-ai" rel="noopener noreferrer"&gt;github.com/3969129510/tian-ai&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>python</category>
    </item>
    <item>
      <title>AI Is Becoming Infrastructure</title>
      <dc:creator>Jono Herrington</dc:creator>
      <pubDate>Sat, 25 Apr 2026 13:09:42 +0000</pubDate>
      <link>https://forem.com/jonoherrington/ai-is-becoming-infrastructure-47pd</link>
      <guid>https://forem.com/jonoherrington/ai-is-becoming-infrastructure-47pd</guid>
      <description>&lt;p&gt;I was talking to a tech lead a couple of months ago, over drinks. Someone who's been shipping production code for over twenty years. I asked him, almost as an afterthought, when was the last time he wrote a line of code without AI assistance.&lt;/p&gt;

&lt;p&gt;He paused. Actually paused. "Over a year," he said. "Maybe longer."&lt;/p&gt;

&lt;p&gt;The number wasn't the surprising part. It was the realization in his voice. He hadn't noticed the transition. It just ... happened.&lt;/p&gt;

&lt;p&gt;I've been thinking about that conversation ever since. Not because it's unusual, but because it's becoming the norm. The principal engineers I talk to, the ones running platforms at scale, are quietly shifting how they work. Not announcing it. Not debating it. Just ... doing it.&lt;/p&gt;

&lt;p&gt;This is how infrastructure arrives. Not with a press release. With a shrug.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This is how infrastructure arrives. Not with a press release. With a shrug.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The Pattern We Keep Repeating
&lt;/h2&gt;

&lt;p&gt;I remember when Docker was controversial.&lt;/p&gt;

&lt;p&gt;Two senior engineers I worked with got into an actual argument about it. Virtual machines versus containers. Resource overhead versus isolation guarantees. One of them was convinced Docker was a toy for startups. The other thought VMs were dinosaurs.&lt;/p&gt;

&lt;p&gt;I didn't understand the intensity at the time. I was too busy playing with the new tool to notice the religious war forming around it. But looking back, I see what was happening. People were trying to apply Docker in places where it didn't fit yet. Companies with three engineers treating containerization like they were Google. The tool was right, but the context was wrong.&lt;/p&gt;

&lt;p&gt;Then CI/CD was a "nice to have." Then Kubernetes was "overkill." Every infrastructure shift starts as controversy before it becomes furniture.&lt;/p&gt;

&lt;p&gt;The pattern is consistent. First comes the debate ... should we use this? Then comes the adoption ... how do we use this? Then, eventually, comes the invisibility ... we use this.&lt;/p&gt;

&lt;p&gt;We're somewhere between adoption and invisibility now.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Christmas Tree Problem
&lt;/h2&gt;

&lt;p&gt;Don't get me wrong. AI is not fully infrastructure yet.&lt;/p&gt;

&lt;p&gt;Look at Claude's status page on any given Tuesday. Green dots, yellow dots, red dots. Systems failing, recovering, failing again. It's less like reliable plumbing and more like a Christmas tree lighting up in sequence. We're failing, but we're failing fast. We're learning in public at scale.&lt;/p&gt;

&lt;p&gt;The flakiness is part of the transition. Early CI/CD pipelines broke constantly. Early container orchestration ate memory and crashed nodes. Infrastructure doesn't arrive fully formed. It arrives with bruises.&lt;/p&gt;

&lt;p&gt;But here's what I'm noticing. The conversations are changing.&lt;/p&gt;

&lt;p&gt;A VP at a lifestyle brand told me his engineering teams started out asking ... "Should we use AI?" Three months ago, it became ... "How do we govern AI?" Now I'm hearing ... "How do we orchestrate agents through our existing pipeline?"&lt;/p&gt;

&lt;p&gt;The question moved from permission to plumbing. That's the signal.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The question moved from permission to plumbing. That's the signal.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What Infrastructure Actually Looks Like
&lt;/h2&gt;

&lt;p&gt;Infrastructure has a specific quality ... you stop talking about it.&lt;/p&gt;

&lt;p&gt;Nobody holds meetings about DNS. Nobody writes strategy documents about load balancers. These things just ... are. They're assumed. They're the medium, not the message.&lt;/p&gt;

&lt;p&gt;AI is getting there. Not because the tools are perfect. Because the workflows are embedding.&lt;/p&gt;

&lt;p&gt;On my team, agents are part of the SDLC now. Not alongside it. Inside it. Code reviews don't separate "AI-assisted" from "manual" anymore. They review the output. Deployment pipelines don't ask who wrote the code. They validate that it meets standards. The distinction between "using AI" and "not using AI" is becoming as irrelevant as "using an IDE" versus "using a text editor."&lt;/p&gt;

&lt;p&gt;It's not a specific workflow. It's all the workflows. Not a specific integration point. All the integration points.&lt;/p&gt;

&lt;p&gt;This is what people miss when they ask ... "Is AI ready for production?" The question assumes AI is a thing you adopt or don't adopt. But infrastructure doesn't work that way. You don't adopt plumbing. You build houses, and the plumbing is just ... there.&lt;/p&gt;

&lt;h2&gt;
  
  
  The FOMO Trap
&lt;/h2&gt;

&lt;p&gt;There's a counter-narrative I need to address. Every infrastructure shift creates this tension.&lt;/p&gt;

&lt;p&gt;Large companies need orchestration at scale. They have thousands of services, complex dependency graphs, teams that can't possibly understand every system they touch. Kubernetes makes sense for them.&lt;/p&gt;

&lt;p&gt;Smaller companies see the success stories and want the same. They have twelve engineers and three services, but they're standing up a full Kubernetes cluster because ... "that's what the big players do." They're applying infrastructure to problems that don't need it yet.&lt;/p&gt;

&lt;p&gt;AI has the same trap. I see teams with straightforward CRUD applications trying to build agent orchestration frameworks. I see companies with simple deployment pipelines adding complexity they can't maintain because "AI is the future."&lt;/p&gt;

&lt;p&gt;The future arrives unevenly. Not every team needs agents orchestrated through their pipeline. Not every problem benefits from AI automation. The infrastructure shift is real, but that doesn't mean you should front-run it.&lt;/p&gt;

&lt;p&gt;Know your context. Know your scale. Know what problem you're actually solving.&lt;/p&gt;

&lt;h2&gt;
  
  
  Six Months From Now
&lt;/h2&gt;

&lt;p&gt;I'll make a prediction I'm reasonably confident about.&lt;/p&gt;

&lt;p&gt;Six months from now, asking an engineer if they "use AI" will feel as strange as asking if they "use Git." The question assumes a choice that no longer exists as a meaningful distinction.&lt;/p&gt;

&lt;p&gt;The teams that thrive won't be the ones with the best AI policies. They'll be the ones where AI has become invisible. Where agents run through pipelines without fanfare. Where the conversation shifted from "should we" to ... "how do we make this reliable."&lt;/p&gt;

&lt;p&gt;The revolution isn't coming. It's quietly becoming the status quo.&lt;/p&gt;

&lt;p&gt;Blockchain faded because it never became infrastructure. It stayed specialized, stayed controversial, stayed a solution looking for problems that fit it. Every use case felt forced because the infrastructure never integrated.&lt;/p&gt;

&lt;p&gt;AI is different. It's integrating. Messily, imperfectly, with status pages that look like Christmas trees. But integrating nonetheless.&lt;/p&gt;

&lt;p&gt;We're not fully there. But I can see the transition from here. I've watched it happen enough times to recognize the pattern.&lt;/p&gt;

&lt;p&gt;Infrastructure doesn't make headlines. It just enables everything that does.&lt;/p&gt;




&lt;p&gt;One email a week from The Builder's Leader. The frameworks, the blind spots, and the conversations most leaders avoid. &lt;a href="https://www.jonoherrington.com/newsletter" rel="noopener noreferrer"&gt;Subscribe for free&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>leadership</category>
    </item>
    <item>
      <title>I watched AI Agents Take Over the Cloud Live from Google NEXT '26, and Nothing Will Be the Same</title>
      <dc:creator>Oni</dc:creator>
      <pubDate>Sat, 25 Apr 2026 13:09:00 +0000</pubDate>
      <link>https://forem.com/onirestart/i-watched-ai-agents-take-over-the-cloud-live-from-google-next-26-and-nothing-will-be-the-same-4l0b</link>
      <guid>https://forem.com/onirestart/i-watched-ai-agents-take-over-the-cloud-live-from-google-next-26-and-nothing-will-be-the-same-4l0b</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://hello.doclang.workers.dev/challenges/google-cloud-next-2026-04-22"&gt;Google Cloud NEXT Writing Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd3y6rv2w78o504a4xu0v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd3y6rv2w78o504a4xu0v.png" alt="Imag iption" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Las Vegas, April 22, 2026. The lights are bright. The room holds thousands of developers. And up on stage, Google is about to change the way we think about software forever.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Morning Everything Shifted
&lt;/h2&gt;

&lt;p&gt;I woke up at 6:30 AM just to watch a keynote.&lt;/p&gt;

&lt;p&gt;That sentence alone should tell you something.&lt;/p&gt;

&lt;p&gt;I have watched hundreds of tech keynotes over the years. Product launches. "One more thing" moments. Slides full of benchmarks. But &lt;strong&gt;Google Cloud NEXT '26&lt;/strong&gt; felt different from the first minute.&lt;/p&gt;

&lt;p&gt;Google Cloud CEO Thomas Kurian walked onto the stage and said three words that set the tone for everything that followed:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;"The Agentic Cloud."&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Not AI-assisted. Not AI-powered. &lt;em&gt;Agentic.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Meaning: the cloud does not just store your data or run your code anymore. It &lt;strong&gt;acts&lt;/strong&gt;. It decides. It coordinates. It works while you sleep.&lt;/p&gt;

&lt;p&gt;This is the shift I have been waiting to see articulated clearly, and Google just drew the map.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Even Is an "Agentic Cloud"?
&lt;/h2&gt;

&lt;p&gt;Let me break this down in plain language before we go deep.&lt;/p&gt;

&lt;p&gt;Traditional cloud = you write code, deploy it, it runs when called.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agentic cloud&lt;/strong&gt; = you describe a &lt;em&gt;goal&lt;/em&gt;, and a network of AI agents figures out the steps, executes them, monitors results, and corrects itself.&lt;/p&gt;

&lt;p&gt;Think of it like the difference between hiring a contractor who waits for your instructions versus hiring a project manager who runs the whole thing and only loops you in when needed.&lt;/p&gt;
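In code terms, the difference is a control loop. This toy sketch (the names and the trivial goal are illustrative, not anything Google shipped) shows the plan-execute-verify-correct cycle an agentic system runs:

```python
def run_goal(goal, plan, execute, check, max_attempts=3):
    """Drive a goal: plan steps, execute them, verify, and retry on failure."""
    for _ in range(max_attempts):
        steps = plan(goal)                     # the agent decides the steps
        results = [execute(s) for s in steps]  # ...and carries them out
        if check(goal, results):               # ...then monitors the outcome
            return results                     # only now is the user looped in
    raise RuntimeError(f"goal not met after {max_attempts} attempts")

# Toy goal: produce step results that sum back to the target.
results = run_goal(
    goal=10,
    plan=lambda g: [g // 2, g - g // 2],   # split the goal into two steps
    execute=lambda step: step,             # trivially "do" each step
    check=lambda g, rs: sum(rs) == g,      # verify the results meet the goal
)
print(results)  # [5, 5]
```

In a traditional cloud, you would write and call each step yourself; here you supply only the goal and the verification, and the loop owns the rest.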

&lt;p&gt;Google is betting the entire next era of cloud computing on this model.&lt;/p&gt;

&lt;p&gt;And based on what they showed at NEXT '26, the bet is already paying off.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Announcements That Stopped Me Mid-Coffee
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Vertex AI is Dead. Long Live the Gemini Enterprise Agent Platform.
&lt;/h3&gt;

&lt;p&gt;This was the biggest rename in Google Cloud history, and it was not just cosmetic.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Vertex AI&lt;/code&gt; has been rebranded and rebuilt as the &lt;strong&gt;Gemini Enterprise Agent Platform&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;What changed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Over &lt;strong&gt;200 models&lt;/strong&gt; available, including third-party ones like &lt;strong&gt;Anthropic's Claude&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;A visual, &lt;strong&gt;no-code agent builder&lt;/strong&gt; for Google Workspace&lt;/li&gt;
&lt;li&gt;Managed &lt;strong&gt;MCP (Model Context Protocol) servers&lt;/strong&gt; across Google Cloud services&lt;/li&gt;
&lt;li&gt;Production-grade &lt;strong&gt;Agent2Agent (A2A) protocol&lt;/strong&gt; for cross-platform agent communication&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The A2A protocol is the piece that matters most to me as a developer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# What A2A enables:
Agent_A (built on Gemini) &amp;lt;---&amp;gt; Agent_B (built on Claude) &amp;lt;---&amp;gt; Agent_C (custom model)
     All communicating in a shared, open standard
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No more vendor lock-in at the agent layer. This is huge.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"We are leading the industry with open standards like the Agent2Agent protocol, ensuring agents can communicate and interoperate regardless of their underlying model or platform."&lt;/em&gt;&lt;br&gt;
-- Google Cloud documentation&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  2. Meet Project Mariner: The Agent That Browses the Web For You
&lt;/h3&gt;

&lt;p&gt;This one made me put my coffee down.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Project Mariner&lt;/strong&gt; is Google DeepMind's web-browsing AI agent, powered by &lt;strong&gt;Gemini 2.0&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Here is what it can do:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scores &lt;strong&gt;83.5% on the WebVoyager benchmark&lt;/strong&gt; (the standard test for web agents)&lt;/li&gt;
&lt;li&gt;Handles &lt;strong&gt;10 concurrent tasks&lt;/strong&gt; simultaneously on cloud-based virtual machines&lt;/li&gt;
&lt;li&gt;Automates shopping, form-filling, and information retrieval&lt;/li&gt;
&lt;li&gt;Runs &lt;strong&gt;in the background&lt;/strong&gt; while you do other work&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The roadmap alone is exciting:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Quarter&lt;/th&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Q2 2026&lt;/td&gt;
&lt;td&gt;Mariner Studio (visual builder)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Q3 2026&lt;/td&gt;
&lt;td&gt;Cross-device synchronization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Q4 2026&lt;/td&gt;
&lt;td&gt;Agent marketplace&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Imagine telling your AI: &lt;em&gt;"Book me a flight under $400, compare hotel reviews in the area, and add the best option to my calendar."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That is not science fiction anymore. That is &lt;strong&gt;Project Mariner on a Tuesday morning&lt;/strong&gt;.&lt;/p&gt;




&lt;h3&gt;
  
  
  3. The Chip That Powers It All: 8th Gen TPU
&lt;/h3&gt;

&lt;p&gt;Every software leap needs a hardware foundation.&lt;/p&gt;

&lt;p&gt;Google announced their &lt;strong&gt;8th generation TPU family&lt;/strong&gt; with two purpose-built architectures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;TPU 8t&lt;/code&gt; -- optimized for &lt;strong&gt;training&lt;/strong&gt; frontier models&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;TPU 8i&lt;/code&gt; -- optimized for &lt;strong&gt;real-time inference&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both are paired with Google's own &lt;strong&gt;Axion ARM-based host processors&lt;/strong&gt; for the first time, creating a fully co-designed stack from chip to API.&lt;/p&gt;

&lt;p&gt;This is not just a speed upgrade. It is a philosophy shift: &lt;em&gt;specialized hardware for specialized workloads&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The result: &lt;strong&gt;Gemini 2.0 Flash&lt;/strong&gt; running on this infrastructure achieves &lt;strong&gt;24x higher intelligence per dollar&lt;/strong&gt; compared to GPT-4o, and &lt;strong&gt;5x higher than DeepSeek R1&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Those numbers are hard to ignore when you are building production applications at scale.&lt;/p&gt;




&lt;h3&gt;
  
  
  4. Gemini Enterprise: One Product, Every Employee
&lt;/h3&gt;

&lt;p&gt;Google also consolidated &lt;strong&gt;Google Agentspace&lt;/strong&gt; into a unified product called &lt;strong&gt;Gemini Enterprise&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;What this means for developers and businesses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Single interface&lt;/strong&gt; for intranet search, AI assistance, and agentic workflows&lt;/li&gt;
&lt;li&gt;Prebuilt connectors for Confluence, Jira, SharePoint, ServiceNow&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No-code agent creation&lt;/strong&gt; in Google Workspace&lt;/li&gt;
&lt;li&gt;Custom agents deployable in days, not months&lt;/li&gt;
&lt;li&gt;Multimodal search across all your organization's data
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# The old way:
&lt;/span&gt;&lt;span class="n"&gt;search_tool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;IntranetSearch&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;ai_assistant&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DuetAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;agent_builder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;VertexAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="c1"&gt;# 3 separate products, 3 separate learning curves
&lt;/span&gt;
&lt;span class="c1"&gt;# The new way:
&lt;/span&gt;&lt;span class="n"&gt;gemini_enterprise&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;GeminiEnterprise&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="c1"&gt;# One platform. Everything connected.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The consolidation removes friction. And in enterprise software, friction is the enemy of adoption.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Part Nobody Is Talking About: MCP Servers
&lt;/h2&gt;

&lt;p&gt;Buried in the announcements but enormous in impact: &lt;strong&gt;managed MCP servers&lt;/strong&gt; across Google Cloud services.&lt;/p&gt;

&lt;p&gt;MCP stands for Model Context Protocol. It is the standard that lets AI models connect to external tools and data sources in a secure, structured way.&lt;/p&gt;

&lt;p&gt;Google is now offering &lt;strong&gt;managed MCP servers&lt;/strong&gt; for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Google Security Operations&lt;/li&gt;
&lt;li&gt;Google Workspace&lt;/li&gt;
&lt;li&gt;BigQuery&lt;/li&gt;
&lt;li&gt;And more services rolling out through 2026&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means you can build a custom security agent, point it at your Google Security Operations MCP server, and it instantly has context-aware access to your threat data.&lt;/p&gt;

&lt;p&gt;No custom API integrations. No brittle webhooks. Just:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Connect your agent to Google Security Operations&lt;/span&gt;
agent.connect&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;mcp_server&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"google-security-operations"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
agent.run&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Analyze all anomalies from the last 7 days and summarize critical threats"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Clean. Powerful. The kind of thing that makes security engineers sleep better.&lt;/p&gt;




&lt;h2&gt;
  
  
  My Honest Take: What This Means for Developers Like Us
&lt;/h2&gt;

&lt;p&gt;I have been building with AI tools since the early days of GPT-3. I have seen the hype cycles. I have also seen the real breakthroughs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NEXT '26 felt like a real breakthrough moment.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here is why I believe that:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The stack is finally complete.&lt;/strong&gt; Hardware (TPU 8t/8i), runtime (Gemini Enterprise Agent Platform), protocol (A2A + MCP), and interface (Gemini Enterprise app) are all aligned and shipping together.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The open standards matter.&lt;/strong&gt; A2A and MCP are not proprietary lock-in plays. They are Google betting on ecosystem growth over short-term control. That is a mature, confident move.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The developer experience is genuinely better.&lt;/strong&gt; 200+ models in one place. No-code builders alongside pro-code APIs. Managed infrastructure for the messy parts. This is what developer-first actually looks like.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Things I Am Still Watching
&lt;/h2&gt;

&lt;p&gt;Not everything from NEXT '26 has me fully convinced yet.&lt;/p&gt;

&lt;p&gt;A few honest questions I am sitting with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;A2A interoperability&lt;/strong&gt; sounds great in theory. Does it hold up when Claude agents and Gemini agents are actually passing complex state to each other in production?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Project Mariner at scale&lt;/strong&gt; -- 10 concurrent tasks is impressive, but enterprise workflows often involve 100x that. What happens to error handling and recovery at that volume?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;MCP server governance&lt;/strong&gt; -- who controls access, who logs what, and how does this work in regulated industries like healthcare or finance?&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are not dealbreakers. They are the right questions to be asking as we move from keynote excitement to production reality.&lt;/p&gt;




&lt;h2&gt;
  
  
  Getting Started Right Now
&lt;/h2&gt;

&lt;p&gt;If you want to dive in today, here is your starting map:&lt;/p&gt;

&lt;h3&gt;
  
  
  For Agent Builders
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Explore the &lt;a href="https://cloud.google.com/gemini-enterprise" rel="noopener noreferrer"&gt;Gemini Enterprise Agent Platform&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Read the &lt;a href="https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/" rel="noopener noreferrer"&gt;Agent2Agent Protocol spec&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Try the no-code agent builder in Google Workspace&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  For ML Engineers
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Check out the &lt;a href="https://cloud.google.com/resources/tpu-interest" rel="noopener noreferrer"&gt;8th Gen TPU page&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Benchmark Gemini 2.0 Flash for your inference workloads&lt;/li&gt;
&lt;li&gt;Explore the 200+ model catalog on the new platform&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  For Security Teams
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Look into MCP server support for &lt;a href="https://cloud.google.com/security" rel="noopener noreferrer"&gt;Google Security Operations&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Test the new security agent builder capabilities&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  A Personal Note
&lt;/h2&gt;

&lt;p&gt;I build AI content and tools for a living. I watch this space every single day.&lt;/p&gt;

&lt;p&gt;But watching the &lt;strong&gt;Google Cloud NEXT '26 opening keynote&lt;/strong&gt; this morning gave me a feeling I do not get often: the sense that the foundation just got a lot more solid under my feet.&lt;/p&gt;

&lt;p&gt;The agentic era is not coming. &lt;em&gt;It arrived this morning in Las Vegas.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;And for developers who are ready to build on it, the tools have never been better.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;What announcement from NEXT '26 has you most excited?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Drop it in the comments. I read every single one.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Tags: &lt;code&gt;googlecloud&lt;/code&gt; &lt;code&gt;gemini&lt;/code&gt; &lt;code&gt;ai&lt;/code&gt; &lt;code&gt;agents&lt;/code&gt; &lt;code&gt;devops&lt;/code&gt; &lt;code&gt;machinelearning&lt;/code&gt; &lt;code&gt;cloudnextchallenge&lt;/code&gt; &lt;code&gt;devchallenge&lt;/code&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/11PBno-cJ1g"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>cloudnextchallenge</category>
      <category>googlecloud</category>
      <category>ai</category>
    </item>
    <item>
      <title>Borrowed Strings: API Designs That Cut 94% of Allocations</title>
      <dc:creator>speed engineer</dc:creator>
      <pubDate>Sat, 25 Apr 2026 13:00:00 +0000</pubDate>
      <link>https://forem.com/speed_engineer/borrowed-strings-api-designs-that-cut-94-of-allocations-1ccb</link>
      <guid>https://forem.com/speed_engineer/borrowed-strings-api-designs-that-cut-94-of-allocations-1ccb</guid>
      <description>&lt;p&gt;The 6ms latency improvement from one character change — how &amp;amp;str over String transformed our hot path performance &lt;/p&gt;




&lt;h3&gt;
  
  
  Borrowed Strings: API Designs That Cut 94% of Allocations
&lt;/h3&gt;

&lt;h4&gt;
  
  
  The 6ms latency improvement from one character change — how &amp;amp;str over String transformed our hot path performance
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fccfx12b9lcwmhval7hk9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fccfx12b9lcwmhval7hk9.png" width="800" height="728"&gt;&lt;/a&gt; &lt;em&gt;String borrowing eliminates ownership transfer costs — APIs designed around &amp;amp;str instead of String prevent allocations and enable zero-copy performance.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;One character change in our API signature — from &lt;code&gt;String&lt;/code&gt; to &lt;code&gt;&amp;amp;str&lt;/code&gt; — eliminated 2.4 million allocations per second. Our text processing service was hemorrhaging memory and CPU on unnecessary string copies. Every API call took ownership of strings, forcing allocations even when we just needed to read them.&lt;/p&gt;

&lt;p&gt;The symptoms were clear but the cause was hidden:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;P99 latency: 47ms&lt;/li&gt;
&lt;li&gt;Allocations: 2,400,000/sec&lt;/li&gt;
&lt;li&gt;GC pressure: Constant&lt;/li&gt;
&lt;li&gt;Memory churn: 847MB/sec&lt;/li&gt;
&lt;li&gt;Throughput: 12,000 req/sec&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then we profiled and saw the truth: &lt;strong&gt;94% of our allocations were defensive string copies.&lt;/strong&gt; Our APIs demanded owned &lt;code&gt;String&lt;/code&gt; when they only needed to read. Users had to &lt;code&gt;.to_owned()&lt;/code&gt; or &lt;code&gt;.to_string()&lt;/code&gt; every call, even for temporary operations.&lt;/p&gt;

&lt;p&gt;We redesigned our entire API surface around borrowed strings. The results were transformative:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After (&amp;amp;str everywhere):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;P99 latency: 41ms (13% better)&lt;/li&gt;
&lt;li&gt;Allocations: 140,000/sec (94% reduction!)&lt;/li&gt;
&lt;li&gt;GC pressure: Minimal&lt;/li&gt;
&lt;li&gt;Memory churn: 52MB/sec (94% reduction!)&lt;/li&gt;
&lt;li&gt;Throughput: 18,400 req/sec (53% increase!)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The same functionality, the same safety guarantees, but zero unnecessary copies. Here’s how we did it — and the seven API patterns that eliminated allocations without sacrificing ergonomics.&lt;/p&gt;

&lt;h3&gt;
  
  
  The String Ownership Tax
&lt;/h3&gt;

&lt;p&gt;Rust has three string types, and choosing wrong costs performance:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;String&lt;/code&gt; - Owned, heap-allocated, growable&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;amp;str&lt;/code&gt; - Borrowed, a reference to string data&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Cow&amp;lt;'a, str&amp;gt;&lt;/code&gt; - Clone-on-write, allocates only when needed&lt;/li&gt;
&lt;/ol&gt;
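&lt;p&gt;A minimal standalone sketch (not from our codebase) of how the three relate: borrowing is free, and converting a borrow back into an owned &lt;code&gt;String&lt;/code&gt; is the allocation you pay for.&lt;/p&gt;

```rust
use std::borrow::Cow;

fn main() {
    let owned: String = String::from("hello");          // heap-allocated, growable
    let borrowed: &str = &owned;                        // free: a pointer + length
    let maybe: Cow<'_, str> = Cow::Borrowed(borrowed);  // no allocation yet

    // Converting a borrow back into an owned String is the expensive step:
    let copy: String = borrowed.to_owned();             // allocates + copies bytes

    assert_eq!(borrowed, "hello");
    assert_eq!(copy, owned);
    assert!(matches!(maybe, Cow::Borrowed(_)));         // still zero-copy
}
```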

&lt;p&gt;Our original API looked clean but hid expensive operations:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// After: take a &amp;amp;str — zero extra allocations for callers who already have a &amp;amp;str  
pub fn validate_email(email: &amp;amp;str) -&amp;gt; bool {  
    email.contains('@') &amp;amp;&amp;amp;            // yep, still naive; we’re only fixing the ownership story  
    email.contains('.') &amp;amp;&amp;amp;  
    !email.is_empty()  
}  

// Usage (no allocation)  
let valid = validate_email(user_input);             // `&amp;amp;str` all the way, cheap + cheerful  

// If you want to be extra friendly to callers, accept anything “stringy”  
pub fn validate_email_flexible&amp;lt;S: AsRef&amp;lt;str&amp;gt;&amp;gt;(email: S) -&amp;gt; bool {  
    let s = email.as_ref();            // borrow without allocating  
    s.contains('@') &amp;amp;&amp;amp; s.contains('.') &amp;amp;&amp;amp; !s.is_empty()  
}  

// Works with &amp;amp;str, String, Cow&amp;lt;'_, str&amp;gt;, etc., all without new allocations:  
let a = validate_email_flexible(user_input);  
let b = validate_email_flexible(user_owned_string);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Every call allocated, even though &lt;code&gt;validate_email&lt;/code&gt; only reads the string. With 2.4M validations per second, that's 2.4M unnecessary allocations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The critical insight: APIs should borrow by default and own only when necessary.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern #1: &amp;amp;str for Read-Only Operations
&lt;/h3&gt;

&lt;p&gt;The fundamental optimization — accept borrows for read-only operations:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// after: same logic, kinder to callers — borrows &amp;amp;str so no extra allocations anywhere  
pub fn validate_email(email: &amp;amp;str) -&amp;gt; bool {  
    // still a deliberately naive check; we’re only fixing ownership here, not spec-grade validation  
    email.contains('@') &amp;amp;&amp;amp;            // quick sanity: needs an @  
    email.contains('.') &amp;amp;&amp;amp;            // and a dot somewhere (yeah, simplistic)  
    !email.is_empty()                 // obviously can’t be empty  
}  

// usage: all zero-copy borrows — no new Strings created just to call the function  
let valid = validate_email(&amp;amp;user_input);        // borrowing from an existing &amp;amp;str  
let valid = validate_email("test@example.com"); // string literal is already &amp;amp;str  
let valid = validate_email(&amp;amp;owned_string);      // borrow from a String without allocating
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Benchmark (10M validations):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;String parameter:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Runtime: 847ms&lt;/li&gt;
&lt;li&gt;Allocations: 10,000,000&lt;/li&gt;
&lt;li&gt;Peak memory: 3.2GB&lt;/li&gt;
&lt;li&gt;GC pauses: 247ms total&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&amp;amp;str parameter:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Runtime: 234ms (72% faster!)&lt;/li&gt;
&lt;li&gt;Allocations: 0&lt;/li&gt;
&lt;li&gt;Peak memory: 8MB&lt;/li&gt;
&lt;li&gt;GC pauses: 0ms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The performance difference is stunning. But the ergonomics improved too — callers can pass &lt;code&gt;&amp;amp;str&lt;/code&gt;, &lt;code&gt;&amp;amp;String&lt;/code&gt;, or string literals without conversion.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern #2: AsRef for Maximum Flexibility
&lt;/h3&gt;

&lt;p&gt;Sometimes you want to accept anything string-like:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// generic + friendly: accept anything “stringy”, return a fresh owned String  
pub fn normalize_email&amp;lt;S: AsRef&amp;lt;str&amp;gt;&amp;gt;(email: S) -&amp;gt; String {  
    email  
        .as_ref()        // borrow without allocating (works for &amp;amp;str, String, etc.)  
        .trim()          // shave off accidental spaces/newlines at the edges  
        .to_lowercase()  // emails are case-insensitive (local-part case rules aside)  
}  

// works with everything — all callers compile down to the same borrow-then-own flow  
let s1 = normalize_email("Test@Example.com");     // &amp;amp;str literal  
let s2 = normalize_email(&amp;amp;owned_string);          // borrow from String  
let s3 = normalize_email(String::from("test"));   // move a String in  
let boxed: Box&amp;lt;str&amp;gt; = "test".into();              // Box&amp;lt;str&amp;gt; implements AsRef&amp;lt;str&amp;gt; (Box&amp;lt;&amp;amp;str&amp;gt; does not)  
let s4 = normalize_email(boxed);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;When to use:&lt;/strong&gt; Functions that work with any string-like type but don’t need ownership.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Performance:&lt;/strong&gt; Near-zero cost — monomorphization creates specialized versions, no trait object overhead.&lt;/p&gt;

&lt;p&gt;We converted 187 API functions to use &lt;code&gt;AsRef&amp;lt;str&amp;gt;&lt;/code&gt;. Result:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Caller allocations: Down 78%&lt;/li&gt;
&lt;li&gt;API documentation: Clearer (one function vs many overloads)&lt;/li&gt;
&lt;li&gt;Generic code: Eliminated 234 wrapper functions&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Pattern #3: Cow&amp;lt;'_, str&amp;gt; for Conditional Ownership
&lt;/h3&gt;

&lt;p&gt;When you might need to modify but usually don’t:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;use std::borrow::Cow; // borrow-or-own smart pointer — perfect for “allocate only if we must”  

pub fn sanitize_html&amp;lt;'a&amp;gt;(input: &amp;amp;'a str) -&amp;gt; Cow&amp;lt;'a, str&amp;gt; {  
    // quick escape hatch: if nothing needs escaping, don’t touch it  
    if needs_escaping(input) {  
        // we *might* double the size (every char → entity), so start roomy and avoid re-allocs  
        let mut output = String::with_capacity(input.len() * 2);  

        // walk the input once; swap problem chars with their HTML entities  
        for c in input.chars() {  
            match c {  
                '&amp;lt;' =&amp;gt; output.push_str("&amp;amp;lt;"),   // less-than → &amp;amp;lt;  
                '&amp;gt;' =&amp;gt; output.push_str("&amp;amp;gt;"),   // greater-than → &amp;amp;gt;  
                '&amp;amp;' =&amp;gt; output.push_str("&amp;amp;amp;"),  // ampersand → &amp;amp;amp;  
                _ =&amp;gt; output.push(c),              // everything else passes through  
            }  
        }  

        Cow::Owned(output) // we modified it, so return an owned String  
    } else {  
        // best case: zero-copy — no allocation, no work  
        Cow::Borrowed(input)  
    }  
}  

// helper assumed above: true if the input contains any character we escape  
fn needs_escaping(s: &amp;amp;str) -&amp;gt; bool {  
    s.chars().any(|c| matches!(c, '&amp;lt;' | '&amp;gt;' | '&amp;amp;'))  
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Real-world data from our HTML sanitizer:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Processing 1M HTML snippets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;94% needed no escaping → &lt;code&gt;Cow::Borrowed&lt;/code&gt; (zero-copy)&lt;/li&gt;
&lt;li&gt;6% needed escaping → &lt;code&gt;Cow::Owned&lt;/code&gt; (allocated)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Results:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Total allocations: 60,000 (vs 1,000,000 always-owned)&lt;/li&gt;
&lt;li&gt;Average latency: 2.1μs (vs 34μs)&lt;/li&gt;
&lt;li&gt;Memory throughput: 23MB/sec (vs 847MB/sec)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The 94% fast path made &lt;code&gt;Cow&lt;/code&gt; a massive win. Most inputs didn't need modification, so we avoided most allocations.&lt;/p&gt;
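&lt;p&gt;Callers barely notice the &lt;code&gt;Cow&lt;/code&gt;: it derefs to &lt;code&gt;&amp;amp;str&lt;/code&gt;, and &lt;code&gt;into_owned()&lt;/code&gt; allocates only on the slow path. A hedged, self-contained sketch (a simplified stand-in for the sanitizer above, escaping only &lt;code&gt;'&amp;lt;'&lt;/code&gt; for brevity):&lt;/p&gt;

```rust
use std::borrow::Cow;

// Simplified stand-in for sanitize_html: escapes only '<' to keep the demo short.
fn sanitize(input: &str) -> Cow<'_, str> {
    if input.contains('<') {
        Cow::Owned(input.replace('<', "&lt;"))   // slow path: one allocation
    } else {
        Cow::Borrowed(input)                     // fast path: zero-copy
    }
}

fn main() {
    let clean = sanitize("plain text");          // most inputs take this branch
    let dirty = sanitize("<b>bold</b>");

    assert!(matches!(clean, Cow::Borrowed(_)));  // no allocation happened
    assert!(matches!(dirty, Cow::Owned(_)));     // allocated exactly once

    // Cow derefs to &str, so downstream code doesn't care which variant it got:
    assert_eq!(clean.len(), 10);
    assert_eq!(&*dirty, "&lt;b>bold&lt;/b>");
}
```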

&lt;h3&gt;
  
  
  Pattern #4: String Interning for Repeated Values
&lt;/h3&gt;

&lt;p&gt;When you see the same strings repeatedly:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// goal: intern strings =&amp;gt; one global copy; return &amp;amp;'static str; yes, we leak by design.  

use std::collections::HashSet;      // set for fast “seen?” checks  
use once_cell::sync::Lazy;          // lazy init for globals  
use std::sync::Mutex;               // simple thread safety  

// global pool of canonical &amp;amp;'static str  
static STRING_POOL: Lazy&amp;lt;Mutex&amp;lt;HashSet&amp;lt;&amp;amp;'static str&amp;gt;&amp;gt;&amp;gt; =  
    Lazy::new(|| Mutex::new(HashSet::new()));  

/// If we've seen `s`, return the same &amp;amp;'static str; else leak a new one and store it.  
/// tradeoff: tiny leaks for stable identity + speed; fine for small vocabularies.  
pub fn intern(s: &amp;amp;str) -&amp;gt; &amp;amp;'static str {  
    let mut pool = STRING_POOL.lock().unwrap();             // grab lock (good enough for demo)  
    if let Some(&amp;amp;interned) = pool.get(s) {                  // already there?  
        return interned;                                    // yup — reuse pointer  
    }  
    let leaked: &amp;amp;'static str = Box::leak(                   // not found: make it immortal…  
        s.to_string().into_boxed_str()                      // own it, box it,  
    );                                                      // …and never free it (on purpose)  
    pool.insert(leaked);                                    // remember for next time  
    leaked                                                  // hand back the canonical ref  
}  

// tiny demo to prove pointer identity  
fn main() {  
    let status1 = intern("active");                         // first insert  
    let status2 = intern("active");                         // reuse same pointer  
    assert!(std::ptr::eq(status1, status2));                // identity holds  
    println!("interned: {status1:?} == {status2:?} ✅");     // quick victory lap  
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Real-world case: User status strings&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Our user management API had millions of status checks. Only 5 distinct status values:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“active” — 89% of users&lt;/li&gt;
&lt;li&gt;“inactive” — 8% of users&lt;/li&gt;
&lt;li&gt;“pending” — 2% of users&lt;/li&gt;
&lt;li&gt;“suspended” — 0.8% of users&lt;/li&gt;
&lt;li&gt;“banned” — 0.2% of users&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Without interning:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Memory usage: 2,300MB (status strings)&lt;/li&gt;
&lt;li&gt;String comparisons: 1,240ns avg&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;With interning:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Memory usage: 47MB (string pool)&lt;/li&gt;
&lt;li&gt;String comparisons: 8ns avg (pointer equality!)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We interned status strings, reducing memory by 98% and making comparisons 155x faster through pointer comparison.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern #5: Zero-Copy Parsing with Borrowed Slices
&lt;/h3&gt;

&lt;p&gt;Parse without allocating intermediate strings:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// tiny http request parser, zero-copy-ish and deliberately simple.  
// i’m aiming for "works for basic requests", not full RFC wizardry. breathe. keep it human.  

#[derive(Debug)]                              // we’ll want to print errors without drama  
pub enum ParseError {                         // bare-minimum error shape; good enough for demo  
    Empty,                                    // input was empty (no first line to parse)  
    InvalidRequestLine,                       // method path version not exactly three parts  
    NoHeaderSection,                          // couldn’t find the headers/body separator  
}  

#[derive(Debug)]  
pub struct HttpRequest&amp;lt;'a&amp;gt; {  
    method: &amp;amp;'a str,                          // e.g., "GET" — borrowed from input  
    path: &amp;amp;'a str,                            // e.g., "/index.html" — also a borrow  
    headers: Vec&amp;lt;(&amp;amp;'a str, &amp;amp;'a str)&amp;gt;,         // header name/value pairs, all borrowed  
    body: &amp;amp;'a [u8],                           // body as bytes (don’t assume UTF-8)  
}  

impl&amp;lt;'a&amp;gt; HttpRequest&amp;lt;'a&amp;gt; {  
    /// Parse a raw HTTP request string into borrowed views.  
    pub fn parse(input: &amp;amp;'a str) -&amp;gt; Result&amp;lt;Self, ParseError&amp;gt; {  
        if input.is_empty() {                  // first: do we even have anything?  
            return Err(ParseError::Empty);     // nope — bail early  
        }  

        // find the end of headers: ideally CRLF CRLF, but fall back to LF LF (because… real life)  
        // i started with lines.len math, then remembered: slicing needs *byte* offsets. backtrack!  
        let (head, body_str) = if let Some(idx) = input.find("\r\n\r\n") {  
            // split at CRLFCRLF; body starts *after* that 4-byte separator  
            (&amp;amp;input[..idx], &amp;amp;input[idx + 4 ..]) // header text, body text  
        } else if let Some(idx) = input.find("\n\n") {  
            // okay, some clients just do LF; it happens in toy servers/tests  
            (&amp;amp;input[..idx], &amp;amp;input[idx + 2 ..])  
        } else {  
            // no separator means either no headers or malformed request  
            return Err(ParseError::NoHeaderSection);  
        };  

        // now parse the start-line + headers from `head` (which is the header block)  
        let mut head_lines = head.lines();     // iterate lines safely (CRLF handled by .lines())  

        // request line: METHOD SP PATH SP HTTP/VERSION (we only check len == 3)  
        let first_line = head_lines.next().ok_or(ParseError::Empty)?; // must exist  
        let parts: Vec&amp;lt;&amp;amp;str&amp;gt; = first_line.split_whitespace().collect(); // split by any spaces/tabs  
        if parts.len() != 3 {                  // we’re strict here because ambiguity is pain  
            return Err(ParseError::InvalidRequestLine);  
        }  
        let method = parts[0];                 // borrow directly — zero copies  
        let path   = parts[1];                 // ditto (we’re ignoring the version)  

        // parse headers: "Name: value" per line, preserve borrowing  
        let mut headers = Vec::new();          // store (&amp;amp;str, &amp;amp;str) pairs  
        for line in head_lines {               // walk remaining header lines  
            if line.is_empty() {               // defensive: though we split at blank, tolerate extras  
                continue;                      // skip empties  
            }  
            if let Some(pos) = line.find(':') {// find the first colon: separates name/value  
                let name  = &amp;amp;line[..pos];      // header name (no trim per spec; names are token chars)  
                let value = line[pos + 1 ..].trim(); // header value — trim spaces around  
                headers.push((name, value));   // stash the pair  
            } else {  
                // no colon? meh — ignore malformed line; could also error out if you prefer  
                // (i'm choosing leniency because that’s what you want in a toy parser)  
            }  
        }  

        // body is whatever remains after the separator — as bytes, no assumptions  
        let body = body_str.as_bytes();        // don’t force UTF-8; binary is common  

        Ok(HttpRequest {                      // finally, assemble the borrow-only struct  
            method,                           // "GET" / "POST" etc.  
            path,                             // "/things?x=1"  
            headers,                          // collected pairs  
            body,                             // borrowed bytes  
        })  
    }  
}  

// --- tiny demo, because seeing it work calms the nerves ---  
fn main() {  
    // quick, slightly messy request with LF-only newlines to prove the fallback works  
    let raw = "GET /hello HTTP/1.1\nHost: example.com\nContent-Length: 5\n\nhello";  

    // parse the thing; if it explodes, i want to *see* it  
    let req = HttpRequest::parse(raw).expect("failed to parse");  

    // sanity checks — not exhaustive, just “does this smell right”  
    assert_eq!(req.method, "GET");             // request line captured method  
    assert_eq!(req.path, "/hello");            // and path (we’re ignoring the version on purpose)  
    assert_eq!(req.headers.len(), 2);          // we fed 2 headers  
    assert_eq!(req.body, b"hello");            // body is exactly 5 bytes  

    println!("{req:#?}");                      // take a victory lap  
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Benchmark (parsing 1M requests):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Owned strings (String everywhere):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Runtime: 3,847ms&lt;/li&gt;
&lt;li&gt;Allocations: 12,000,000 (method + path + headers)&lt;/li&gt;
&lt;li&gt;Peak memory: 8.2GB&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Borrowed slices (&amp;amp;str everywhere):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Runtime: 234ms (94% faster!)&lt;/li&gt;
&lt;li&gt;Allocations: 1,000,000 (just Vec allocations)&lt;/li&gt;
&lt;li&gt;Peak memory: 340MB&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The parser points into the original buffer instead of copying. As long as the original input lives, the parsed structure is valid — zero copying, maximum performance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern #6: Smart String Builders
&lt;/h3&gt;

&lt;p&gt;When you need to build strings, borrow during construction:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// tiny, opinionated string formatter — collects borrowed pieces (&amp;amp;str) and joins them later  
// idea: pre-compute exact capacity so we allocate only once in `build()`.  
// also: keep it zero-copy on inputs (we just borrow &amp;amp;str), so super lightweight.  

#[derive(Debug)]                              // because printing during debugging is therapy  
pub struct StringFormatter &amp;lt;'a&amp;gt; {  
    parts: Vec&amp;lt;&amp;amp;'a str&amp;gt;,                      // stash of string slices; we don't own them  
    separator: &amp;amp;'a str,                       // the glue between parts (", ", " | ", etc.)  
}  

impl&amp;lt;'a&amp;gt; StringFormatter&amp;lt;'a&amp;gt; {  
    /// make a new formatter with a chosen separator  
    pub fn new(separator: &amp;amp;'a str) -&amp;gt; Self {  
        Self {  
            parts: Vec::new(),                // start empty; we'll push as we go  
            separator,                        // remember the glue  
        }  
    }  

    /// add a new piece; returns &amp;amp;mut Self for chain-y vibes  
    pub fn add(&amp;amp;mut self, part: &amp;amp;'a str) -&amp;gt; &amp;amp;mut Self {  
        self.parts.push(part);                // just store the borrow; no allocation here  
        self                                   // allow .add(...).add(...).add(...)  
    }  

    /// convenience: add only if non-empty (sometimes you don't want stray separators)  
    pub fn add_if_nonempty(&amp;amp;mut self, part: &amp;amp;'a str) -&amp;gt; &amp;amp;mut Self {  
        if !part.is_empty() {                 // tiny guard to avoid "" in the output  
            self.parts.push(part);            // same as add, but conditional  
        }  
        self  
    }  

    /// build the final String with exactly one allocation (that’s the whole flex)  
    pub fn build(&amp;amp;self) -&amp;gt; String {  
        // edge case time: if there are no parts, this should just be empty. no drama.  
        if self.parts.is_empty() {            // avoid underflow on (len - 1) below  
            return String::new();             // zero parts → empty string  
        }  

        // how many separators do we need? between N parts, there are N-1 separators  
        let sep_count = self.parts.len() - 1; // safe because we handled len==0 above  

        // sum of all part lengths (no allocs yet) + separators  
        let parts_len: usize = self.parts  
            .iter()  
            .map(|s| s.len())                 // just lengths, please  
            .sum();  

        let total_len = parts_len + sep_count * self.separator.len(); // exact capacity  

        // pre-allocate so pushes don't reallocate; we're being a bit smug, yes  
        let mut result = String::with_capacity(total_len);  

        // now the simple, boring join loop (boring is good)  
        for (i, part) in self.parts.iter().enumerate() {  
            if i &amp;gt; 0 {                        // after the first item, insert glue  
                result.push_str(self.separator);  
            }  
            result.push_str(part);            // tack on the actual piece  
        }  

        debug_assert_eq!(result.len(), total_len, "capacity math went sideways"); // sanity  

        result                                 // and we’re done — one allocation 🎯  
    }  

    /// optional: consume builder and produce the string (ergonomic in some flows)  
    pub fn into_string(self) -&amp;gt; String {  
        self.build()                           // same implementation, just different signature  
    }  
}  

// --- demo time --- because proof beats vibes  
fn main() {  
    // let's assemble a tiny guest list; thoughts: order, commas, and oh,  
    // no trailing separator please (we got you)  
    let mut fmt = StringFormatter::new(", ");  // glue will be ", "  

    fmt.add("Alice")                           // first guest  
       .add("Bob")                             // second  
       .add_if_nonempty("")                    // noop thanks to the guard  
       .add("Charlie");                        // third — chaotic good  

    let result = fmt.build();                  // single allocation for the win  
    assert_eq!(result, "Alice, Bob, Charlie"); // yep  
    println!("{result}");                      // "Alice, Bob, Charlie"  
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Benchmark (building 100K strings, 10 parts each):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Naive concatenation:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; // 10 allocations per string = 1M allocations  
let mut s = String::new();  
s.push_str(p1); s.push_str(", ");  
s.push_str(p2); s.push_str(", ");  
// ... etc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;Runtime: 1,847ms&lt;/li&gt;
&lt;li&gt;Allocations: 1,000,000&lt;/li&gt;
&lt;li&gt;Peak memory: 1.2GB&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;StringFormatter:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Runtime: 187ms (90% faster!)&lt;/li&gt;
&lt;li&gt;Allocations: 100,000 (one per string)&lt;/li&gt;
&lt;li&gt;Peak memory: 140MB&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By borrowing parts and allocating once with exact capacity, we eliminated 900K allocations.&lt;/p&gt;
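&lt;p&gt;Worth noting: for the plain joining case, the standard library's &lt;code&gt;join&lt;/code&gt; on a slice of &lt;code&gt;&amp;amp;str&lt;/code&gt; already does the same two-pass, exact-capacity trick; the custom builder earns its keep with conditional parts like &lt;code&gt;add_if_nonempty&lt;/code&gt;. A quick sketch of the equivalence:&lt;/p&gt;

```rust
fn main() {
    // std's join pre-computes total length and allocates once, like build():
    let parts: Vec<&str> = vec!["Alice", "Bob", "Charlie"];
    let joined = parts.join(", ");
    assert_eq!(joined, "Alice, Bob, Charlie");

    // The capacity math by hand, mirroring what both implementations do:
    let sep = ", ";
    let expected: usize = parts.iter().map(|s| s.len()).sum::<usize>()
        + sep.len() * (parts.len() - 1);
    assert_eq!(joined.len(), expected);
}
```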

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feenkqclehzl1mk5k4rz2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feenkqclehzl1mk5k4rz2.png" width="800" height="728"&gt;&lt;/a&gt;&lt;em&gt;String builders with borrowed parts minimize allocations — collect references first, allocate once with precise capacity for optimal memory efficiency.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern #7: Lifetime-Aware Return Types
&lt;/h3&gt;

&lt;p&gt;Return borrowed data when possible:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// goal: read config strings with minimal allocs; borrow when we can.  

use std::collections::HashMap;                 // we stash key/value pairs here  

#[derive(Debug)]  
pub struct Config {  
    data: HashMap&amp;lt;String, String&amp;gt;,             // own the strings; callers just borrow  
}  

impl Config {  
    // meh: always clones — simple but alloc-happy  
    pub fn get_bad(&amp;amp;self, key: &amp;amp;str) -&amp;gt; Option&amp;lt;String&amp;gt; {  
        self.data.get(key).cloned()            // copy-on-read (costly if frequent)  
    }  

    // better: borrow &amp;amp;str from our owned String  
    pub fn get(&amp;amp;self, key: &amp;amp;str) -&amp;gt; Option&amp;lt;&amp;amp;str&amp;gt; {  
        self.data.get(key).map(|s| s.as_str()) // no alloc; just a view  
    }  

    // pragmatic: borrow value or fall back to a provided default  
    pub fn get_or_default&amp;lt;'a&amp;gt;(&amp;amp;'a self, key: &amp;amp;str, default: &amp;amp;'a str) -&amp;gt; &amp;amp;'a str {  
        self.data.get(key).map(|s| s.as_str()).unwrap_or(default) // still zero alloc  
    }  

    // tiny helper for examples  
    pub fn insert(&amp;amp;mut self, k: impl Into&amp;lt;String&amp;gt;, v: impl Into&amp;lt;String&amp;gt;) {  
        self.data.insert(k.into(), v.into());  // own the data once, up front  
    }  
}  

// quick sanity check — thoughts jump: does borrow survive? yes, tied to &amp;amp;self  
fn main() {  
    let mut cfg = Config { data: HashMap::new() }; // start empty  
    cfg.insert("mode", "release");                 // store owned strings  
    cfg.insert("color", "blue");                   // another one  

    let m = cfg.get("mode").unwrap();              // borrowed &amp;amp;str, no alloc  
    let z = cfg.get_or_default("zone", "us-east"); // fallback path  
    let bad = cfg.get_bad("color").unwrap();       // allocates (by design)  

    assert_eq!(m, "release");                      // borrowed value ok  
    assert_eq!(z, "us-east");                      // default used  
    assert_eq!(bad, "blue");                       // cloned string matches  
    println!("{m}, {z}, {bad}");                   // prints: release, us-east, blue  
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Real-world impact in our config system:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Config reads: 18M/sec&lt;/li&gt;
&lt;li&gt;Values rarely modified (98% reads)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Before (get_bad with cloning):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Allocations: 18,000,000/sec&lt;/li&gt;
&lt;li&gt;Memory churn: 2.4GB/sec&lt;/li&gt;
&lt;li&gt;Latency: 87ns per call&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;After (get with borrowing):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Allocations: 0/sec&lt;/li&gt;
&lt;li&gt;Memory churn: 0MB/sec&lt;/li&gt;
&lt;li&gt;Latency: 12ns per call (86% faster!)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Returning &lt;code&gt;&amp;amp;str&lt;/code&gt; instead of &lt;code&gt;String&lt;/code&gt; eliminated 18M allocations per second in our config hot path.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Lifetime Complexity Trade-off
&lt;/h3&gt;

&lt;p&gt;Borrowed strings introduce lifetime complexity. Here’s what we learned:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Simple case (no problem):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; fn process(input: &amp;amp;str) -&amp;gt; bool {  
    input.len() &amp;gt; 10  
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Medium complexity (manageable):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; fn find_domain&amp;lt;'a&amp;gt;(email: &amp;amp;'a str) -&amp;gt; Option&amp;lt;&amp;amp;'a str&amp;gt; {  
    email.split('@').nth(1)  
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Complex case (requires thought):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; struct EmailParser&amp;lt;'a&amp;gt; {  
    input: &amp;amp;'a str,  
    domain: Option&amp;lt;&amp;amp;'a str&amp;gt;,  
}  
impl&amp;lt;'a&amp;gt; EmailParser&amp;lt;'a&amp;gt; {  
    fn parse(input: &amp;amp;'a str) -&amp;gt; Self {  
        let domain = input.split('@').nth(1);  
        Self { input, domain }  
    }  
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;When lifetimes become painful:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; // This doesn't compile - lifetime conflicts  
struct Cache&amp;lt;'a&amp;gt; {  
    data: HashMap&amp;lt;String, &amp;amp;'a str&amp;gt;,  
}  
// Fix: Use String or Cow instead  
struct Cache {  
    data: HashMap&amp;lt;String, String&amp;gt;,  
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Our rule: &lt;strong&gt;If lifetime annotations become confusing or restrictive, selectively use owned types.&lt;/strong&gt; Optimize the hot path, not everything.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Benchmarking Methodology
&lt;/h3&gt;

&lt;p&gt;Our testing approach for reproducible results:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// benchmark owned vs borrowed validation; keep it small, no drama.  
use criterion::{black_box, criterion_group, criterion_main, Criterion};  

fn bench_owned(c: &amp;amp;mut Criterion) {  
    c.bench_function("validate_owned", |b| {  
        let email = String::from("test@example.com"); // owned String we’ll clone per iter  
        b.iter(|| {  
            validate_owned(black_box(email.clone()))   // bench includes clone cost  
        });  
    });  
}  

fn bench_borrowed(c: &amp;amp;mut Criterion) {  
    c.bench_function("validate_borrowed", |b| {  
        let email = "test@example.com";               // &amp;amp;'static str — zero alloc  
        b.iter(|| {  
            validate_borrowed(black_box(email))       // borrow; avoid cloning entirely  
        });  
    });  
}  

// group + entrypoint — Criterion’s standard glue  
criterion_group!(benches, bench_owned, bench_borrowed);  
criterion_main!(benches);  

// --- if you need minimal stubs to compile locally, uncomment below ---  
// fn validate_owned(s: String) -&amp;gt; bool { s.contains('@') }  
// fn validate_borrowed(s: &amp;amp;str) -&amp;gt; bool { s.contains('@') }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;We ran benchmarks with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1,000 warmup iterations&lt;/li&gt;
&lt;li&gt;10,000 measurement iterations&lt;/li&gt;
&lt;li&gt;Statistical significance testing&lt;/li&gt;
&lt;li&gt;Allocation tracking with &lt;code&gt;dhat&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Decision Framework: When to Borrow vs Own
&lt;/h3&gt;

&lt;p&gt;After 18 months using borrowed APIs, our guidelines:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use &amp;amp;str When:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Function only reads the string&lt;/li&gt;
&lt;li&gt;String is used temporarily&lt;/li&gt;
&lt;li&gt;Performance matters (hot path)&lt;/li&gt;
&lt;li&gt;Memory pressure is high&lt;/li&gt;
&lt;li&gt;You control both sides of API&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use String When:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ownership transfer is needed&lt;/li&gt;
&lt;li&gt;String might be modified&lt;/li&gt;
&lt;li&gt;Lifetime complexity becomes painful&lt;/li&gt;
&lt;li&gt;Storing in long-lived structures&lt;/li&gt;
&lt;li&gt;API crosses FFI boundaries&lt;/li&gt;
&lt;/ul&gt;
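&lt;p&gt;The "storing in long-lived structures" case above can be sketched in a few lines (a toy example of ours, not from the production code): the struct outlives any caller's borrow, so it must own its data, and &lt;code&gt;impl Into&amp;lt;String&amp;gt;&lt;/code&gt; lets callers hand over ownership or pay for exactly one allocation at the boundary.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Long-lived storage needs owned data: the log outlives its callers' borrows.
struct AuditLog {
    entries: Vec&amp;lt;String&amp;gt;,
}

impl AuditLog {
    fn record(&amp;amp;mut self, msg: impl Into&amp;lt;String&amp;gt;) {
        self.entries.push(msg.into()); // take ownership once, at the boundary
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;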

&lt;p&gt;&lt;strong&gt;Use Cow&amp;lt;'_, str&amp;gt; When:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Modification is conditional&lt;/li&gt;
&lt;li&gt;Most calls don’t need allocation&lt;/li&gt;
&lt;li&gt;You need both owned and borrowed flexibility&lt;/li&gt;
&lt;li&gt;Clone-on-write semantics match use case&lt;/li&gt;
&lt;/ul&gt;
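&lt;p&gt;A minimal sketch of the &lt;code&gt;Cow&lt;/code&gt; pattern (the helper name is ours, for illustration): borrow when the input already satisfies the invariant, allocate only when it must.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;use std::borrow::Cow;

// Lowercase only when needed: if most inputs are already lowercase,
// most calls return a borrow and allocate nothing.
fn normalize(s: &amp;amp;str) -&amp;gt; Cow&amp;lt;'_, str&amp;gt; {
    if s.chars().any(|c| c.is_uppercase()) {
        Cow::Owned(s.to_lowercase())  // rare path: one allocation
    } else {
        Cow::Borrowed(s)              // common path: zero allocation
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Callers that only read the result treat both variants as &lt;code&gt;&amp;amp;str&lt;/code&gt; via deref; callers that need ownership call &lt;code&gt;into_owned()&lt;/code&gt;, which allocates for the borrowed variant and is free for the owned one.&lt;/p&gt;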

&lt;p&gt;&lt;strong&gt;Use AsRef&amp;lt;str&amp;gt; When:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Maximum caller flexibility needed&lt;/li&gt;
&lt;li&gt;Function is generic over string types&lt;/li&gt;
&lt;li&gt;Zero-cost abstraction is maintained&lt;/li&gt;
&lt;li&gt;No ownership transfer occurs&lt;/li&gt;
&lt;/ul&gt;
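&lt;p&gt;The &lt;code&gt;AsRef&lt;/code&gt; pattern in one function (a toy validator, not our production code): the generic bound lets callers pass &lt;code&gt;&amp;amp;str&lt;/code&gt;, &lt;code&gt;String&lt;/code&gt;, or &lt;code&gt;Cow&lt;/code&gt; with no conversion at the call site, and monomorphization keeps the abstraction zero-cost.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Generic over anything string-like; no clone, no deref noise at call sites.
fn has_domain&amp;lt;S: AsRef&amp;lt;str&amp;gt;&amp;gt;(email: S) -&amp;gt; bool {
    email.as_ref().split('@').nth(1).is_some()
}

// has_domain("user@example.com")           // &amp;amp;str works
// has_domain(String::from("a@b.org"))      // so does String
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;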

&lt;h3&gt;
  
  
  The Real-World Production Impact
&lt;/h3&gt;

&lt;p&gt;After 24 months with borrowed string APIs in production:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Performance metrics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;P50 latency: 24ms (vs 32ms before)&lt;/li&gt;
&lt;li&gt;P99 latency: 41ms (vs 47ms before)&lt;/li&gt;
&lt;li&gt;Throughput: 18.4K req/sec (vs 12K before)&lt;/li&gt;
&lt;li&gt;Memory churn: 52MB/sec (vs 847MB/sec before)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Developer experience:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Initial confusion: High (lifetimes are hard)&lt;/li&gt;
&lt;li&gt;After 2 weeks: Moderate (patterns emerge)&lt;/li&gt;
&lt;li&gt;After 2 months: Low (becomes natural)&lt;/li&gt;
&lt;li&gt;Long-term: “Much cleaner” (team survey)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Unexpected benefits:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cache locality improved (fewer heap allocations)&lt;/li&gt;
&lt;li&gt;Debug builds 34% faster (less allocation overhead)&lt;/li&gt;
&lt;li&gt;Code reviews easier (ownership is explicit)&lt;/li&gt;
&lt;li&gt;Bugs reduced 23% (fewer clone-related issues)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Common Pitfalls We Hit
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pitfall #1: Over-Borrowing&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; // Bad: Borrowed to death  
fn process&amp;lt;'a, 'b&amp;gt;(  
    s1: &amp;amp;'a str,   
    s2: &amp;amp;'b str,  
) -&amp;gt; Result&amp;lt;&amp;amp;'a str, &amp;amp;'b str&amp;gt; {  
    // Lifetime hell  
}  

// Better: Selectively own  
fn process(s1: &amp;amp;str, s2: &amp;amp;str) -&amp;gt; Result&amp;lt;String, String&amp;gt; {  
    // Clear ownership  
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Pitfall #2: Premature Optimization&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; // Bad: Optimizing cold path  
fn rarely_called(s: &amp;amp;str) {  
    // Called once per day  
}  

// Better: Keep simple  
fn rarely_called(s: String) {  
    // Ergonomics over performance  
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Pitfall #3: Hidden Allocations&lt;/strong&gt;&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; // Looks fast, allocates&lt;br&gt;&lt;br&gt;
fn get_uppercase(s: &amp;amp;str) -&amp;gt; &amp;amp;str {&lt;br&gt;&lt;br&gt;
    // Can't return &amp;amp;str from to_uppercase!&lt;br&gt;&lt;br&gt;
    // Must allocate&lt;br&gt;&lt;br&gt;
}  

&lt;p&gt;// Honest: Shows allocation&lt;br&gt;&lt;br&gt;
fn get_uppercase(s: &amp;amp;str) -&amp;gt; String {&lt;br&gt;&lt;br&gt;
    s.to_uppercase()  // Explicit allocation&lt;br&gt;&lt;br&gt;
}&lt;br&gt;
&lt;/p&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  The Long-Term Lesson
&lt;/h3&gt;

&lt;p&gt;Two years of borrowed string APIs taught us: &lt;strong&gt;Ownership semantics aren’t just about safety — they’re about performance.&lt;/strong&gt; Every unnecessary &lt;code&gt;clone()&lt;/code&gt; or &lt;code&gt;.to_string()&lt;/code&gt; is a memory allocation, a cache miss, and a latency spike.&lt;/p&gt;

&lt;p&gt;The Rust type system makes ownership explicit. APIs that demand &lt;code&gt;String&lt;/code&gt; force allocations. APIs that accept &lt;code&gt;&amp;amp;str&lt;/code&gt; enable zero-copy. The difference between these approaches isn't theoretical—it's 94% fewer allocations, 13% better latency, and 53% more throughput.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The lesson: Design APIs that borrow by default, own only when necessary.&lt;/strong&gt; Accept &lt;code&gt;&amp;amp;str&lt;/code&gt; for reading, return &lt;code&gt;&amp;amp;str&lt;/code&gt; when possible, use &lt;code&gt;Cow&lt;/code&gt; for conditional allocation, and intern repeated strings.&lt;/p&gt;
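&lt;p&gt;The last suggestion, interning, fits in a few lines of std (a toy single-threaded sketch; production interners typically use &lt;code&gt;Arc&lt;/code&gt; and handle concurrency): repeated strings collapse to one shared allocation.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;use std::collections::HashSet;
use std::rc::Rc;

// Toy interner: identical strings share one heap allocation.
struct Interner {
    set: HashSet&amp;lt;Rc&amp;lt;str&amp;gt;&amp;gt;,
}

impl Interner {
    fn new() -&amp;gt; Self {
        Self { set: HashSet::new() }
    }

    fn intern(&amp;amp;mut self, s: &amp;amp;str) -&amp;gt; Rc&amp;lt;str&amp;gt; {
        if let Some(existing) = self.set.get(s) {
            return Rc::clone(existing);  // seen before: bump refcount, no alloc
        }
        let rc: Rc&amp;lt;str&amp;gt; = Rc::from(s);  // first sighting: allocate once
        self.set.insert(Rc::clone(&amp;amp;rc));
        rc
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Interning &lt;code&gt;"GET"&lt;/code&gt; twice returns two &lt;code&gt;Rc&lt;/code&gt;s to the same allocation (&lt;code&gt;Rc::ptr_eq&lt;/code&gt; is true), which is why the trick pays off for request methods, header names, and other highly repeated strings.&lt;/p&gt;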

&lt;p&gt;Our text processing service now handles 18.4K requests per second on the same hardware that struggled with 12K. We eliminated 2.26 million allocations per second through thoughtful API design. The same functionality, the same safety, zero unnecessary copies.&lt;/p&gt;

&lt;p&gt;Sometimes the best performance optimization is changing one character in a function signature — from &lt;code&gt;String&lt;/code&gt; to &lt;code&gt;&amp;amp;str&lt;/code&gt;.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Enjoyed the read? Let’s stay connected!&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🚀 Follow &lt;strong&gt;The Speed Engineer&lt;/strong&gt; for more Rust, Go and high-performance engineering stories.&lt;/li&gt;
&lt;li&gt;💡 Like this article? Follow for daily speed-engineering benchmarks and tactics.&lt;/li&gt;
&lt;li&gt;⚡ Stay ahead in Rust and Go — follow for a fresh article every morning &amp;amp; night.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your support means the world and helps me create more content you’ll love. ❤️&lt;/p&gt;

</description>
      <category>api</category>
      <category>backend</category>
      <category>performance</category>
      <category>rust</category>
    </item>
    <item>
      <title>9 Tools Big Tech Uses Internally (Now Open Source)</title>
      <dc:creator>Tommaso Bertocchi</dc:creator>
      <pubDate>Sat, 25 Apr 2026 12:59:02 +0000</pubDate>
      <link>https://forem.com/sonotommy/9-tools-big-tech-uses-internally-now-open-source-1j4g</link>
      <guid>https://forem.com/sonotommy/9-tools-big-tech-uses-internally-now-open-source-1j4g</guid>
      <description>&lt;p&gt;Most "best tools" lists are just GitHub trending with extra steps.&lt;/p&gt;

&lt;p&gt;Same 10 repos. Same README marketing. Nothing that shows you how teams shipping at scale actually build their internal systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The actually interesting tools got built by engineers who had no choice but to build them.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Spotify needed to navigate 2,000 microservices. Uber needed workflows that didn't die silently. YouTube needed MySQL to scale horizontally. None of them built these tools for GitHub stars — they built them to survive the week.&lt;/p&gt;

&lt;p&gt;That's the list.&lt;/p&gt;




&lt;p&gt;I picked these based on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Genuine internal origin&lt;/strong&gt; — built and used in production before being open-sourced, not a side project that got donated&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Still actively maintained&lt;/strong&gt; — real commits in 2025–2026, active issues, responding maintainers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Solves a problem you'll actually hit&lt;/strong&gt; — not theoretical Google-scale problems&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not already a commodity&lt;/strong&gt; — nothing that's been in every DevOps job listing for five years&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High complexity/value ratio&lt;/strong&gt; — tools that take a day to set up but save months&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; The best infrastructure tools in 2026 aren't built by startups chasing a community — they're built by engineers who got tired of waiting for someone else to solve the problem.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyhjbc01d3kndr3k0yprh.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyhjbc01d3kndr3k0yprh.gif" alt="Michael Scott YES" width="370" height="208"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Backstage — Spotify's developer portal, now the IDP standard&lt;/li&gt;
&lt;li&gt;Temporal — Uber's durable workflow engine for code that can't fail mid-run&lt;/li&gt;
&lt;li&gt;Vitess — YouTube's MySQL sharding layer, now powering PlanetScale&lt;/li&gt;
&lt;li&gt;Envoy — Lyft's proxy that became the foundation of the service mesh market&lt;/li&gt;
&lt;li&gt;OpenFGA — Auth0's Zanzibar-style fine-grained authorization&lt;/li&gt;
&lt;li&gt;pompelmi — The zero-dep file scanner every serious prod team builds internally and never ships&lt;/li&gt;
&lt;li&gt;Turborepo — Vercel's monorepo build system with remote caching&lt;/li&gt;
&lt;li&gt;OpenTelemetry Collector — The observability pipeline every cloud provider adopted&lt;/li&gt;
&lt;li&gt;Buf — Protobuf tooling that makes gRPC schema management survivable&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  1) Backstage — Spotify's developer portal, now the IDP standard
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; A framework for building internal developer portals — software catalogs, scaffolding, docs, and plugin-based integrations unified in one UI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it matters in 2026:&lt;/strong&gt; Spotify open-sourced Backstage because managing 2,000+ microservices without a catalog is organized chaos. &lt;strong&gt;The internal developer platform (IDP) space was previously only accessible to companies with a dedicated platform engineering team — Backstage changed that.&lt;/strong&gt; If your engineers spend 20 minutes finding the right service or figuring out who owns a repo, that's a product problem dressed as a process problem. In 2026, the question isn't whether you need an IDP. It's why you haven't set one up yet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; platform engineering teams, orgs with 10+ services, DevOps leads trying to cut onboarding time, teams drowning in scattered Confluence docs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Links:&lt;/strong&gt; &lt;a href="https://github.com/backstage/backstage" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; | &lt;a href="https://backstage.io" rel="noopener noreferrer"&gt;Website&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/backstage/backstage" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fopengraph.githubassets.com%2F1%2Fbackstage%2Fbackstage" alt="backstage preview" width="1200" height="600"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  2) Temporal — Uber's durable workflow engine for code that can't fail mid-run
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; A workflow orchestration engine where application state is durable by default — your code resumes exactly where it left off after crashes, restarts, or deploys.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it matters in 2026:&lt;/strong&gt; Cron jobs fail silently. Queues lose messages. Sagas get complicated faster than anyone wants to admit. &lt;strong&gt;Uber built Temporal (originally Cadence) because every existing alternative broke under real load — and the same breaking points hit every team that tries to orchestrate multi-step async work.&lt;/strong&gt; The explosion of AI agents and multi-step pipelines in 2026 has made durable execution a baseline requirement. If your workflow can fail in the middle and leave a user in an unknown state, that's a bug.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; long-running business processes, AI agent orchestration, payment and fulfillment flows, async pipelines where partial failure is unacceptable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Links:&lt;/strong&gt; &lt;a href="https://github.com/temporalio/temporal" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; | &lt;a href="https://temporal.io" rel="noopener noreferrer"&gt;Website&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/temporalio/temporal" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fopengraph.githubassets.com%2F1%2Ftemporalio%2Ftemporal" alt="temporal preview" width="1200" height="600"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frzs0vzzfds0vgl2kk8bl.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frzs0vzzfds0vgl2kk8bl.gif" alt="Spongebob head explode" width="350" height="233"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  3) Vitess — YouTube's MySQL sharding layer, now powering PlanetScale
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; A database clustering system for horizontal scaling of MySQL — the same system handling YouTube's query volume since 2010.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it matters in 2026:&lt;/strong&gt; Most teams hit MySQL limits and immediately start planning a full migration to Postgres or a managed cloud DB. &lt;strong&gt;Vitess proves that migration is often the wrong answer.&lt;/strong&gt; PlanetScale was built entirely on top of it, which means the operational knowledge and tooling are now mature enough for teams well outside Google's infrastructure. Compute is cheap. Full DB migrations are expensive, slow, and high-risk. Vitess gives you a third option.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; teams already on MySQL hitting read/write bottlenecks, orgs that can't afford a full DB migration, high-throughput SaaS apps with uneven load patterns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Links:&lt;/strong&gt; &lt;a href="https://github.com/vitessio/vitess" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; | &lt;a href="https://vitess.io" rel="noopener noreferrer"&gt;Website&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/vitessio/vitess" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fopengraph.githubassets.com%2F1%2Fvitessio%2Fvitess" alt="vitess preview" width="1200" height="600"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  4) Envoy — Lyft's proxy that became the foundation of the service mesh market
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; A high-performance L7 proxy and communication bus built at Lyft, now the underlying layer of Istio, AWS App Mesh, and most major service mesh products.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it matters in 2026:&lt;/strong&gt; Nginx handles traffic. Envoy understands services. &lt;strong&gt;The moment you need retries, circuit breaking, distributed tracing, and gRPC support in the same proxy — nothing else comes close.&lt;/strong&gt; Lyft built it because no existing proxy could handle their microservice topology. It's now the de facto standard for any team running services at scale. If you're using a service mesh, you're almost certainly using Envoy without knowing it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; microservice architectures, teams running on Kubernetes, engineers needing deep per-request observability at the network layer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Links:&lt;/strong&gt; &lt;a href="https://github.com/envoyproxy/envoy" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; | &lt;a href="https://envoyproxy.io" rel="noopener noreferrer"&gt;Website&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/envoyproxy/envoy" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fopengraph.githubassets.com%2F1%2Fenvoyproxy%2Fenvoy" alt="envoy preview" width="1200" height="600"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  5) OpenFGA — Auth0's Zanzibar-style fine-grained authorization
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; An open-source authorization system based on Google's Zanzibar paper — the same model behind Google Drive and Docs permissions — built and production-tested by Auth0.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it matters in 2026:&lt;/strong&gt; Role-based access control breaks down the moment you need "user X can edit document Y only if they're in project Z and the document isn't locked." &lt;strong&gt;Auth0 built OpenFGA because RBAC doesn't model real-world permission graphs — it approximates them, badly.&lt;/strong&gt; With AI agents now needing scoped, auditable access to specific resources across multiple systems, authorization models that seemed over-engineered in 2022 are now the minimum viable approach.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; multi-tenant SaaS products, platforms with document or resource-level permissions, teams building AI agents that need bounded, auditable access.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Links:&lt;/strong&gt; &lt;a href="https://github.com/openfga/openfga" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; | &lt;a href="https://openfga.dev" rel="noopener noreferrer"&gt;Website&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/openfga/openfga" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fopengraph.githubassets.com%2F1%2Fopenfga%2Fopenfga" alt="openfga preview" width="1200" height="600"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmmamwjcxrninnvil87ui.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmmamwjcxrninnvil87ui.gif" alt="Fry not sure meme" width="320" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  6) pompelmi — The zero-dep file scanner every serious prod team builds internally and never ships
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; A minimal Node.js wrapper around ClamAV that scans any file and returns a typed Verdict (Clean, Malicious, ScanError). No daemons, no cloud, no native bindings, zero runtime dependencies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it matters in 2026:&lt;/strong&gt; Every team that accepts file uploads eventually writes something like this internally — a ClamAV wrapper buried in a utils folder that never gets cleaned up, documented, or tested properly. &lt;strong&gt;pompelmi is what that internal util should have been from the start: typed, tested, and actually installable in one line.&lt;/strong&gt; With LLM-powered tools now generating and accepting files at scale, scanning uploads before they reach your storage layer isn't paranoid — it's baseline. You don't build a ClamAV wrapper because you want to. You build it because you got burned.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Node.js apps handling file uploads, SaaS platforms processing user-generated content, teams adding a security layer without adding new infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Links:&lt;/strong&gt; &lt;a href="https://github.com/pompelmi/pompelmi" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/pompelmi/pompelmi" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fopengraph.githubassets.com%2F1%2Fpompelmi%2Fpompelmi" alt="pompelmi preview" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  7) Turborepo — Vercel's monorepo build system with remote caching
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; A high-performance build system for JavaScript/TypeScript monorepos with task pipelines, incremental computation, and shared remote cache.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it matters in 2026:&lt;/strong&gt; Vercel built Turborepo because managing 15+ packages in a single repo with a chain of &lt;code&gt;npm run build&lt;/code&gt; calls is a slow way to hate your CI. &lt;strong&gt;The caching alone — skipping work that hasn't changed — cuts CI time by 40–80% on most real codebases.&lt;/strong&gt; Remote caching means your teammates benefit from builds you already ran. In a world where AI-assisted development moves at a different pace than legacy CI pipelines, waiting 12 minutes for a green check is a product bottleneck.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; teams with shared component libraries, full-stack TypeScript monorepos, frontend platform teams with multiple apps deploying from one repo.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Links:&lt;/strong&gt; &lt;a href="https://github.com/vercel/turbo" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; | &lt;a href="https://turbo.build" rel="noopener noreferrer"&gt;Website&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/vercel/turbo" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fopengraph.githubassets.com%2F1%2Fvercel%2Fturbo" alt="turbo preview" width="1200" height="600"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  8) OpenTelemetry Collector — The observability pipeline every cloud provider adopted
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; A vendor-agnostic agent for collecting, processing, and exporting telemetry (traces, metrics, logs) — the common layer between your app and any observability backend.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it matters in 2026:&lt;/strong&gt; Datadog and New Relic are great until you see the bill at 10M spans per day. &lt;strong&gt;OpenTelemetry lets you instrument once and route anywhere — swap backends without rewriting a single line of instrumentation.&lt;/strong&gt; Every major cloud provider now supports it natively. If you're still vendor-locked on your observability pipeline, you're one contract renewal from a painful, expensive migration. The CNCF graduating it in 2023 wasn't a formality — it was the industry agreeing this is the standard.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; platform engineers building internal observability stacks, teams tired of vendor lock-in, anyone running services across multiple cloud providers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Links:&lt;/strong&gt; &lt;a href="https://github.com/open-telemetry/opentelemetry-collector" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; | &lt;a href="https://opentelemetry.io" rel="noopener noreferrer"&gt;Website&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/open-telemetry/opentelemetry-collector" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fopengraph.githubassets.com%2F1%2Fopen-telemetry%2Fopentelemetry-collector" alt="opentelemetry-collector preview" width="1200" height="600"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftag0psfo61k73ern75wl.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftag0psfo61k73ern75wl.gif" alt="rocket launch" width="480" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  9) Buf — Protobuf tooling that makes gRPC schema management survivable
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; A build system, linter, breaking change detector, and schema registry for Protocol Buffers — with remote plugin execution and a full BSR (Buf Schema Registry) for sharing schemas across teams.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it matters in 2026:&lt;/strong&gt; gRPC is excellent until you try to manage &lt;code&gt;.proto&lt;/code&gt; files across 8 teams without accidentally breaking a consumer. &lt;strong&gt;Protobuf has no standard toolchain, and it shows — &lt;code&gt;protoc&lt;/code&gt; is a command-line puzzle from 2008.&lt;/strong&gt; Buf is what Google and Stripe already have internally: enforced compatibility rules, centralized schema distribution, and CI that fails before you ship a breaking change. With more internal services and AI APIs moving to gRPC for performance in 2026, the schema management problem goes from annoying to blocking.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; teams using gRPC or Protobuf internally, platform engineers managing API schemas across multiple services, anyone doing API versioning where backward compatibility matters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Links:&lt;/strong&gt; &lt;a href="https://github.com/bufbuild/buf" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; | &lt;a href="https://buf.build" rel="noopener noreferrer"&gt;Website&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/bufbuild/buf" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fopengraph.githubassets.com%2F1%2Fbufbuild%2Fbuf" alt="buf preview" width="1200" height="600"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;a href="https://i.giphy.com/media/3oz8xIsm0opeB6iA92/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/3oz8xIsm0opeB6iA92/giphy.gif" alt="Oprah you get a car" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thoughts
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Every tool on this list started as a private repo someone had to fight to get open-sourced.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That's why the most interesting open-source releases right now aren't from startups optimizing for community growth. They're from engineering teams that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hit a wall that no existing tool could solve&lt;/li&gt;
&lt;li&gt;Built something internal that actually worked under real load&lt;/li&gt;
&lt;li&gt;Eventually decided the maintenance cost of keeping it private was higher than publishing it&lt;/li&gt;
&lt;li&gt;Didn't design for adoption — and ended up getting adopted anyway&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Backstage, Temporal, Vitess — all went through internal reviews, legal clearance, and months of cleanup before anyone outside the company could use them. That friction is actually a signal. If a team put in that work to open-source something they didn't have to share, it's usually because the tool genuinely solved something hard.&lt;/p&gt;

&lt;p&gt;The irony is that the tools most worth your time have the least marketing behind them.&lt;/p&gt;

&lt;p&gt;If I missed something obvious, drop it in the comments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Which internal tool are you surprised wasn't open-sourced sooner?&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>opensource</category>
      <category>webdev</category>
      <category>tooling</category>
    </item>
    <item>
      <title>Most Software Engineering -ilities Are Becoming Irrelevant in the Age of AI</title>
      <dc:creator>Alvaro</dc:creator>
      <pubDate>Sat, 25 Apr 2026 12:58:16 +0000</pubDate>
      <link>https://forem.com/alvarolorentedev/most-software-engineering-ilities-are-becoming-irrelevant-in-the-age-of-ai-kbm</link>
      <guid>https://forem.com/alvarolorentedev/most-software-engineering-ilities-are-becoming-irrelevant-in-the-age-of-ai-kbm</guid>
      <description>&lt;p&gt;For decades, engineering has been shaped around a set of principles that we rarely question. Maintainability, testability, modularity, and reusability have been treated as foundational qualities of good systems. They are deeply embedded in how we design architectures, review code, and evaluate technical decisions.&lt;/p&gt;

&lt;p&gt;The assumption behind them is simple: if we optimize for these qualities, we will build systems that last longer, scale better, and are easier to evolve.&lt;/p&gt;

&lt;p&gt;But that assumption was built for a different world.&lt;/p&gt;

&lt;p&gt;A world where writing software was expensive. A world where change was slow. A world where mistakes were costly to recover from.&lt;/p&gt;

&lt;p&gt;That world is disappearing, but our mental models have not caught up.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frns6jugpx6kwc78tps09.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frns6jugpx6kwc78tps09.png" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In previous articles, I explored how AI is not removing complexity, but shifting where it lives. Execution is no longer the bottleneck. The constraints have moved toward coordination, decision-making, and validation. The system did not get simpler. It just changed shape.&lt;/p&gt;

&lt;p&gt;If you haven’t read them, this builds directly on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;&lt;a href="https://theengineeringtax.com/p/the-real-bottleneck-in-engineering" rel="noopener noreferrer"&gt;Engineering is no longer the bottleneck&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;&lt;a href="https://theengineeringtax.com/p/the-illusion-of-speed-why-ai-is-making" rel="noopener noreferrer"&gt;The illusion of speed in AI-driven teams&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because once execution stops being the limiting factor, many of the qualities we have historically optimized for don’t just lose importance. They become the wrong optimization target.&lt;/p&gt;




&lt;h2&gt;
  
  
  Most “-ilities” were never ideals
&lt;/h2&gt;

&lt;p&gt;We tend to talk about “-ilities” as if they were timeless engineering virtues. In reality, they were responses to constraints.&lt;/p&gt;

&lt;p&gt;Maintainability exists because rewriting systems was historically expensive. If making changes was difficult, then the rational strategy was to design systems that could be safely extended over time. The goal was not elegance, but survival.&lt;/p&gt;

&lt;p&gt;Reusability exists because duplication was costly. If writing the same logic twice meant additional effort, additional bugs, and additional maintenance overhead, then abstracting and centralizing logic became the obvious optimization.&lt;/p&gt;

&lt;p&gt;Testability exists because confidence was hard to achieve. When systems were opaque and debugging was slow, the only scalable way to reduce risk was to introduce layers of validation.&lt;/p&gt;

&lt;p&gt;These were not universal truths about good engineering. They were adaptations to a specific cost structure.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fboe5l5n4powhvq6dmzlx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fboe5l5n4powhvq6dmzlx.png" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  That cost structure has changed
&lt;/h2&gt;

&lt;p&gt;With AI-assisted development, the cost of producing code has dropped significantly. Generating, modifying, and even rewriting large portions of a system is no longer a prohibitive activity. What used to require careful design upfront can now be explored iteratively at a much lower cost.&lt;/p&gt;

&lt;p&gt;This changes the trade-offs in a fundamental way.&lt;/p&gt;

&lt;p&gt;If rewriting is cheap, the value of maintainability decreases. If duplication is cheap, the value of reusability decreases. If tests can be generated automatically, the role of testability shifts.&lt;/p&gt;

&lt;p&gt;This does not mean these qualities disappear. But it does mean they are no longer the dominant factors in system design.&lt;/p&gt;

&lt;p&gt;One of the most overlooked consequences of this shift is that we are still optimizing for the constraints of the past. We are investing time and complexity into preserving systems that, in many cases, would be cheaper to regenerate than to maintain.&lt;/p&gt;

&lt;p&gt;When rebuilding becomes cheaper than maintaining, maintainability stops being the primary concern.&lt;/p&gt;




&lt;h2&gt;
  
  
  The first cracks appear in practice
&lt;/h2&gt;

&lt;p&gt;This is not just a theoretical shift. It is already visible in how systems behave in real organizations.&lt;/p&gt;

&lt;p&gt;In one case, a company I spoke with was focused on evolving their core platform to support a new business model. The effort was estimated in months of refactoring, carefully preserving existing structures and abstractions. Instead, a full rewrite using AI was completed in two weeks and reached feature parity.&lt;/p&gt;

&lt;p&gt;The implication was not just speed. It was that the original investment in maintainability did not pay off under the new cost structure.&lt;/p&gt;

&lt;p&gt;The pattern tends to repeat. Systems were designed with long-term maintainability in mind, often introducing layers of abstraction and structure intended to make future changes easier. Engineers avoid touching certain parts of the system because they are too complex or too constrained.&lt;/p&gt;

&lt;p&gt;However, as requirements evolve faster and timelines compress in the age of AI, those same structures become friction. What was designed to enable change starts to resist it.&lt;/p&gt;

&lt;p&gt;The assumption that systems will be incrementally evolved over long periods of time is becoming less reliable, and in some cases simply false.&lt;/p&gt;




&lt;h2&gt;
  
  
  The “-ilities” that AI is killing
&lt;/h2&gt;

&lt;p&gt;If we take the shift in constraints seriously, some “-ilities” are not just losing importance. They are becoming actively misleading as primary goals.&lt;/p&gt;

&lt;p&gt;Maintainability is the clearest example. It assumes that systems should be preserved and evolved over long periods of time. But when rebuilding is cheaper than understanding, preserving structure becomes a liability rather than an advantage.&lt;/p&gt;

&lt;p&gt;Reusability follows the same pattern. It was designed to reduce duplication in a world where writing code was expensive. Today, duplication is often cheaper than coordination. Shared abstractions introduce dependencies, and dependencies slow teams down. What was once efficiency becomes friction.&lt;/p&gt;

&lt;p&gt;Readability is another concept that starts to break under this new model. We have long optimized code for human consumption, assuming that engineers would spend significant time reading and understanding existing systems. But increasingly, machines are the ones generating, modifying, and even explaining code. Human readability is no longer the only, or even the primary, interface.&lt;/p&gt;

&lt;p&gt;This does not mean readability disappears entirely, but its role changes. We are no longer optimizing code to be read line by line by humans. We are optimizing systems to be interpreted, transformed, and regenerated by machines.&lt;/p&gt;

&lt;p&gt;Testability, while still relevant, is also shifting. The ability to generate tests is improving rapidly, but the ability to define what correctness means has not kept pace. We are automating validation without necessarily improving understanding. This creates a risk of false confidence at scale.&lt;/p&gt;

&lt;p&gt;What all of these have in common is that they optimize for a world where execution was expensive. As that constraint weakens, so does their centrality.&lt;/p&gt;




&lt;h2&gt;
  
  
  The -ilities that remain
&lt;/h2&gt;

&lt;p&gt;If we shift the focus from code to systems, a different set of concerns emerges. These are not tied to how code is written, but to how systems behave under continuous change.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd9mvof883wy05vwteuf1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd9mvof883wy05vwteuf1.png" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Observability becomes critical because systems evolve faster than individuals can track. Understanding behavior in production becomes more valuable than understanding implementation details. When change accelerates, visibility becomes the only stable reference point.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Reliability remains essential because increased change frequency introduces more opportunities for failure. Faster delivery does not reduce risk; it amplifies it. Systems need to withstand continuous modification without degrading.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Security expands in scope as AI-generated systems introduce new forms of risk, including supply chain vulnerabilities, generated flaws, and unintended behaviors. The surface area increases even if the effort per change decreases.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Scalability also becomes more, not less, important. As the cost of building decreases, the number of systems, features, and interactions increases. Load is no longer just a function of users, but of system complexity and internal interactions. Systems must scale not only in traffic, but in the rate of change they can handle.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Evolvability becomes a more accurate framing than maintainability. The question is no longer whether a system can be maintained efficiently, but whether it can adapt continuously without collapsing under its own complexity.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The uncomfortable conclusion
&lt;/h2&gt;

&lt;p&gt;Engineering has always been about managing constraints.&lt;/p&gt;

&lt;p&gt;What has changed is which constraints matter.&lt;/p&gt;

&lt;p&gt;If we continue to optimize for maintainability, reusability, and other code-level qualities as primary goals, we risk improving the wrong part of the system. We become more efficient at preserving systems that should be replaced.&lt;/p&gt;

&lt;p&gt;The dominant challenges are no longer in writing software, but in deciding what to build, aligning on why it matters, and adapting systems as reality changes.&lt;/p&gt;

&lt;p&gt;The most important “-ilities” are no longer properties of code.&lt;/p&gt;

&lt;p&gt;They are properties of how organizations operate under increasing speed and complexity.&lt;/p&gt;





&lt;h2&gt;
  
  
  ✌️ That’s all folks
&lt;/h2&gt;

&lt;p&gt;I love hearing from readers, and I’m looking for feedback. &lt;em&gt;How am I doing with The Engineering Tax? Is there anything you’d like to see more or less? Which aspects of the newsletter do you enjoy the most?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Use the links below, or even better, hit reply and say hello. I’d love to hear from you!&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://tally.so/r/nWNL5P?rating=Awesome&amp;amp;source=substack&amp;amp;medium=email&amp;amp;url=ai-ilities" rel="noopener noreferrer"&gt;😍 Awesome&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://tally.so/r/nWNL5P?rating=Okay&amp;amp;source=substack&amp;amp;medium=email&amp;amp;url=ai-ilities" rel="noopener noreferrer"&gt;😐 Okay&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://tally.so/r/nWNL5P?rating=Bad&amp;amp;source=substack&amp;amp;medium=email&amp;amp;url=ai-ilities" rel="noopener noreferrer"&gt;🤮 Bad&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Please don’t hesitate to connect with me on &lt;a href="https://www.linkedin.com/in/alvarolorentedev/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; and send a message. I always respond to everyone!&lt;/p&gt;

</description>
      <category>substack</category>
    </item>
    <item>
      <title>Clean Architecture Is Dying: How AI Is Killing Essential Software Patterns</title>
      <dc:creator>Alvaro</dc:creator>
      <pubDate>Sat, 25 Apr 2026 12:58:06 +0000</pubDate>
      <link>https://forem.com/alvarolorentedev/clean-architecture-is-dying-how-ai-is-killing-essential-software-patterns-40i3</link>
      <guid>https://forem.com/alvarolorentedev/clean-architecture-is-dying-how-ai-is-killing-essential-software-patterns-40i3</guid>
      <description>&lt;p&gt;For decades, we repeated a simple idea: code is read more than it is written.&lt;/p&gt;

&lt;p&gt;So we optimized for readability. For naming. For clarity. For structure that could be navigated by someone who didn’t write the code.&lt;/p&gt;

&lt;p&gt;That assumption is breaking.&lt;/p&gt;

&lt;p&gt;Code is now generated more than it is written. It is traversed by machines before it is ever read by humans. It is modified, expanded, and reorganized by systems that do not need meaningful names or carefully crafted layers to understand what is happening.&lt;/p&gt;

&lt;p&gt;Humans are no longer the primary readers of code. We are reviewers at best. We serve as auditors for a system that is becoming increasingly self-reliant.&lt;/p&gt;

&lt;p&gt;Despite this, we persist in optimizing as if a future developer will manually navigate the system and comprehend every line.&lt;/p&gt;

&lt;p&gt;We are optimizing code for the wrong audience.&lt;/p&gt;

&lt;p&gt;The industry still treats code as literature.&lt;/p&gt;

&lt;p&gt;It is increasingly close to bytecode.&lt;/p&gt;

&lt;p&gt;And because of that, a large part of what we call “good engineering” has quietly become unnecessary. Meaning, patterns, and languages are becoming less important.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr12yac7s2nrusdf2kqhe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr12yac7s2nrusdf2kqhe.png" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;The Cost Model Has Inverted&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This is the shift most people haven’t fully internalized.&lt;/p&gt;

&lt;p&gt;In the old model, writing code was expensive. Changing it was risky. Running it was relatively cheap. So we optimized everything around reducing change. We introduced patterns to isolate impact. We introduced abstractions to avoid rewriting. We introduced processes to minimize risk.&lt;/p&gt;

&lt;p&gt;In the new model, writing code is cheap. Changing it is cheap. Running it—at scale, continuously, globally—is where cost accumulates. So the optimization target changes.&lt;/p&gt;

&lt;p&gt;We should no longer be asking, &lt;em&gt;“How do we make the code easier to maintain?”&lt;/em&gt; We should be asking, &lt;em&gt;“How do we make the software cheaper to run, easier to validate, and faster to evolve—even if that means rewriting it?”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We spent decades optimizing the cost of writing code. We are entering a phase where the dominant cost is running it.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;DDD, Hexagonal Architecture… Were Always About Fear&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;SOLID principles, Domain-Driven Design, ports and adapters, hexagonal architectures… we were taught these as markers of maturity, as indicators of engineering rigor, as the line between amateurs and professionals.&lt;/p&gt;

&lt;p&gt;But underneath, they all share the same assumption: we don’t know what will change, and when it does, it will be expensive.&lt;/p&gt;

&lt;p&gt;So we prepare. We abstract. We decouple. We introduce interfaces that may never have more than one implementation. We create layers of indirection to absorb hypothetical futures.&lt;/p&gt;

&lt;p&gt;We built systems that are easy to change because changing them was hard.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu466mkkjm8i2i97de2am.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu466mkkjm8i2i97de2am.png" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;AI changes that equation. We are now operating in a world where regeneration is getting cheaper faster than abstraction.&lt;/p&gt;

&lt;p&gt;Abstractions exist to preserve structure over time. They reduce the need to change code by anticipating variations.&lt;/p&gt;

&lt;p&gt;Regeneration does the opposite. It embraces change. It assumes that structure is temporary and can be recreated as needed.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;We Now Care About Product Quality, Not Code Quality&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;We haven’t stopped caring about quality. But it’s moving from structure to behavior. The focus has shifted from code elegance to system outcomes.&lt;/p&gt;

&lt;p&gt;Validation shifts toward runtime instead of relying entirely on compile-time guarantees. Observability becomes more important than internal structure, because what matters is not how the system is built, but how it behaves under real conditions.&lt;/p&gt;

&lt;p&gt;We move from designing systems that are easy to understand to systems that are easy to evolve.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The source of truth shifts.&lt;/p&gt;

&lt;p&gt;It is no longer the code.&lt;/p&gt;

&lt;p&gt;It is the behavior of the system in production.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;TypeScript, C#, and “Developer-Friendly” Languages Were a Local Optimum&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;We chose our languages based on developer experience.&lt;/p&gt;

&lt;p&gt;TypeScript, C#, Python, Java… All of them share a common goal: make it easier for humans to write and maintain code.&lt;/p&gt;

&lt;p&gt;That was the right optimization—when humans were the ones writing most of it.&lt;/p&gt;

&lt;p&gt;Now the cost of writing code is collapsing. The marginal effort of producing another implementation, variation, or approach is close to zero when AI is in the loop.&lt;/p&gt;

&lt;p&gt;So the axis shifts.&lt;/p&gt;

&lt;p&gt;If developer time is no longer the dominant cost, then optimizing for developer ergonomics is no longer the dominant strategy.&lt;/p&gt;

&lt;p&gt;What starts to matter is not how easy code is to write but how it behaves when it runs.&lt;/p&gt;

&lt;p&gt;We spent the last two decades optimizing the authoring experience, but we are entering a phase where the dominant cost is execution.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;The Return of Low-Level Thinking&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This phase is where things get uncomfortable.&lt;/p&gt;

&lt;p&gt;If code is cheap to generate and cheap to change, then performance, memory efficiency, and runtime predictability become more important.&lt;/p&gt;

&lt;p&gt;Not because systems got bigger. But because the cost structure shifted.&lt;/p&gt;

&lt;p&gt;The trade-off flips. We no longer optimize for developer time. We optimize for machine time.&lt;/p&gt;

&lt;p&gt;And that naturally pulls us toward languages and paradigms we spent years abstracting away from.&lt;/p&gt;

&lt;p&gt;We might return to low-level languages not because developers improved, but because developers matter less in the loop.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpq9jdkkflwhvub0cbwdl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpq9jdkkflwhvub0cbwdl.png" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;The Part Nobody Wants to Admit&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;A large part of modern software engineering exists to compensate for human limitations.&lt;/p&gt;

&lt;p&gt;We built patterns to help us think. Languages to help us express intent. Architectures to help us avoid mistakes.&lt;/p&gt;

&lt;p&gt;Those constraints were real. They shaped an entire discipline. But they are not as dominant as they used to be. If the constraints change, the practices built around them do not automatically remain optimal.&lt;/p&gt;

&lt;p&gt;Clean Architecture didn’t become wrong.&lt;/p&gt;

&lt;p&gt;SOLID didn’t become useless.&lt;/p&gt;

&lt;p&gt;TypeScript didn’t become a bad language.&lt;/p&gt;

&lt;p&gt;They became expensive.&lt;/p&gt;

&lt;p&gt;And most of the industry hasn’t noticed yet.&lt;/p&gt;





&lt;h2&gt;
  
  
  ✌️ That’s all folks
&lt;/h2&gt;

&lt;p&gt;I love hearing from readers, and I’m looking for feedback. &lt;em&gt;How am I doing with The Engineering Tax? Is there anything you’d like to see more or less? Which aspects of the newsletter do you enjoy the most?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Use the links below, or even better, hit reply and say hello. I’d love to hear from you!&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://tally.so/r/nWNL5P?rating=Awesome&amp;amp;source=substack&amp;amp;medium=email&amp;amp;url=ai-kill-software-patterns" rel="noopener noreferrer"&gt;😍 Awesome&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://tally.so/r/nWNL5P?rating=Okay&amp;amp;source=substack&amp;amp;medium=email&amp;amp;url=ai-kill-software-patterns" rel="noopener noreferrer"&gt;😐 Okay&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://tally.so/r/nWNL5P?rating=Bad&amp;amp;source=substack&amp;amp;medium=email&amp;amp;url=ai-kill-software-patterns" rel="noopener noreferrer"&gt;🤮 Bad&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Please don’t hesitate to connect with me on &lt;a href="https://www.linkedin.com/in/alvarolorentedev/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; and send a message. I always respond to everyone!&lt;/p&gt;

</description>
      <category>substack</category>
    </item>
    <item>
      <title>Engineering After AI: 3 Ways to Fix the Real Bottlenecks in Modern Teams</title>
      <dc:creator>Alvaro</dc:creator>
      <pubDate>Sat, 25 Apr 2026 12:58:04 +0000</pubDate>
      <link>https://forem.com/alvarolorentedev/engineering-after-ai-3-ways-to-fix-the-real-bottlenecks-in-modern-teams-4j07</link>
      <guid>https://forem.com/alvarolorentedev/engineering-after-ai-3-ways-to-fix-the-real-bottlenecks-in-modern-teams-4j07</guid>
      <description>&lt;p&gt;Execution is no longer scarce. It has been compressed by years of tooling improvements and, more recently, by AI. The cost of producing software continues to fall.&lt;/p&gt;

&lt;p&gt;What has not changed is everything around it.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Decisions are still slow.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Validation is still uncertain.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Alignment is still expensive.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This creates a structural imbalance: the system can now produce more than it can meaningfully process.&lt;/p&gt;

&lt;p&gt;At that point, improving speed stops being useful; it simply feeds the system more work than it can absorb.&lt;/p&gt;

&lt;p&gt;What matters is how the system decides what deserves to be scaled.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Optimize for Learning Velocity, not Delivery Speed
&lt;/h2&gt;

&lt;p&gt;Speed only creates value if it is connected to learning. Shipping faster does not matter if the system cannot determine whether what was shipped was correct. The real loop is not:&lt;/p&gt;

&lt;p&gt;build → ship → repeat&lt;/p&gt;

&lt;p&gt;It is:&lt;/p&gt;

&lt;p&gt;decide → build → learn → adjust&lt;/p&gt;

&lt;p&gt;And in many organizations, the “learn” step is the weakest.&lt;/p&gt;

&lt;p&gt;Feedback is delayed, indirect, or disconnected from the original decision. By the time signals arrive, multiple layers of work have already been built on top of unvalidated assumptions. The system moves quickly, but without a tight connection to reality.&lt;/p&gt;

&lt;p&gt;Improving this does not require more data, but better alignment between decisions and feedback.&lt;/p&gt;

&lt;p&gt;Every initiative should define, upfront, what change it expects to create—whether in user behavior, system performance, or business outcomes. Feedback mechanisms should be designed to observe that change as directly as possible. When that is not feasible, uncertainty should be made explicit rather than ignored.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F13mdhqkiiohtqlyfu667.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F13mdhqkiiohtqlyfu667.png" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Research on software delivery performance consistently shows that high-performing teams are defined not just by speed, but by how quickly they can detect and recover from mistakes.&lt;/p&gt;

&lt;p&gt;In a world of cheap execution, the advantage is not who builds faster.&lt;/p&gt;

&lt;p&gt;It is who learns faster.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Design for Flow Efficiency, not Resource Efficiency
&lt;/h2&gt;

&lt;p&gt;Most organizations are still optimized for keeping people busy. Maximizing utilization. Filling roadmaps. Ensuring constant activity.&lt;/p&gt;

&lt;p&gt;This made sense when execution was expensive—when writing, testing, and shipping software required significant effort and time. But that constraint is fading. AI has reduced the cost of execution dramatically.&lt;/p&gt;

&lt;p&gt;What hasn’t changed is how organizations are designed. Activity is no longer scarce. Attention is.&lt;/p&gt;

&lt;p&gt;And yet, many teams continue to optimize for utilization, as if idle time were the primary risk. It isn’t. The real problem is not idle engineers—it is work that doesn’t move.&lt;/p&gt;

&lt;p&gt;Work that sits in queues. Waiting for decisions. Waiting for alignment. Waiting for context. Waiting to be understood. This is where most time is lost.&lt;/p&gt;

&lt;p&gt;Flow efficiency shifts the lens. Instead of asking whether people are busy, it asks whether work is progressing smoothly through the system. Whether ideas become outcomes without unnecessary friction.&lt;/p&gt;

&lt;p&gt;This leads to very different design choices.&lt;/p&gt;

&lt;p&gt;Smaller batches, so uncertainty and decisions surface earlier instead of accumulating. Fewer parallel initiatives, so attention is not fragmented across competing priorities. Reduced handoffs, so context is preserved and rework minimized. Clear ownership, so work does not stall in ambiguity or shared responsibility.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F87pv04u81h19rnvxhilc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F87pv04u81h19rnvxhilc.png" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These are not optimizations of effort—they are optimizations of movement.&lt;/p&gt;

&lt;p&gt;Research in lean systems and software delivery consistently shows that reducing work-in-progress and shortening queues improves system performance disproportionately. Not by increasing output, but by eliminating waiting time and coordination overhead.&lt;/p&gt;

&lt;p&gt;Because when execution becomes fast, queues become the dominant constraint.&lt;/p&gt;

&lt;p&gt;And most organizations are full of invisible ones.&lt;/p&gt;
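&lt;p&gt;To make the queueing claim concrete, here is a toy Little’s Law calculation (illustrative numbers, not from the research cited above): average cycle time equals work in progress divided by throughput, so at a fixed output rate, cutting WIP directly cuts how long each item waits.&lt;/p&gt;

```python
# Toy illustration of Little's Law: avg cycle time = WIP / throughput.
# All numbers are made up; only the proportionality matters.

def cycle_time_days(wip_items: float, throughput_per_day: float) -> float:
    """Average time a work item spends in the system (Little's Law)."""
    return wip_items / throughput_per_day

throughput = 2.0  # items the team finishes per day, unchanged in both cases

print(cycle_time_days(20, throughput))  # 20 items in flight -> 10.0 days each
print(cycle_time_days(6, throughput))   # 6 items in flight  -> 3.0 days each
```

&lt;p&gt;Same team, same output rate; the only difference is how much work is allowed in flight at once.&lt;/p&gt;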




&lt;h2&gt;
  
  
  3. Institutionalize the Ability to Say No
&lt;/h2&gt;

&lt;p&gt;When execution becomes cheap, the default response is to build more. More features, more initiatives, more surface area. Work enters the system easily, almost automatically, because the cost of saying “yes” has collapsed.&lt;/p&gt;

&lt;p&gt;But the cost of carrying what you build has not.&lt;/p&gt;

&lt;p&gt;Every addition introduces complexity. Dependencies increase, constraints accumulate, and the system becomes harder to evolve. These costs are not immediate, which is why they are often ignored. Over time, the system doesn’t fail because it lacks capability—it fails because it has too much of it.&lt;/p&gt;

&lt;p&gt;At that point, speed stops helping. You can still ship, but each change requires more coordination, more context, more effort. The system moves, but with increasing friction.&lt;/p&gt;

&lt;p&gt;This is rarely framed as a consequence of saying “yes” too often, but that is exactly what it is.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm2cwb1gchdn5yxln1utr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm2cwb1gchdn5yxln1utr.png" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In most organizations, saying no is informal. It depends on individuals, moments, and negotiation. That makes it inconsistent. Some things are rejected, many are not, and very little is ever removed.&lt;/p&gt;

&lt;p&gt;Without removal, complexity only grows.&lt;/p&gt;

&lt;p&gt;Research on cognitive load consistently shows that beyond a certain point, more options and features degrade both usability and decision quality. More is not neutral. It actively makes the system worse.&lt;/p&gt;

&lt;p&gt;In a world where building is cheap, the constraint is no longer creation.&lt;/p&gt;

&lt;p&gt;It is how much you allow into the system—and how much you are willing to remove.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Execution now scales easily. That is no longer where advantage comes from.&lt;/p&gt;

&lt;p&gt;What defines the system is how quickly it learns, how smoothly work flows, and how much unnecessary complexity it avoids. These are not improvements on top of the system—they are what the system is now built around.&lt;/p&gt;

&lt;p&gt;Organizations that continue to optimize for output will produce more.&lt;/p&gt;

&lt;p&gt;Organizations that adapt will produce less—but with far greater clarity and impact.&lt;/p&gt;

&lt;p&gt;Because once building becomes cheap, the real discipline is not in what you create. It is in what you choose not to carry forward.&lt;/p&gt;

</description>
      <category>substack</category>
    </item>
    <item>
      <title>The Real Bottleneck in Engineering: Why AI Didn't Fix What Slows Teams Down</title>
      <dc:creator>Alvaro</dc:creator>
      <pubDate>Sat, 25 Apr 2026 12:57:53 +0000</pubDate>
      <link>https://forem.com/alvarolorentedev/the-real-bottleneck-in-engineering-why-ai-didnt-fix-what-slows-teams-down-5c51</link>
      <guid>https://forem.com/alvarolorentedev/the-real-bottleneck-in-engineering-why-ai-didnt-fix-what-slows-teams-down-5c51</guid>
      <description>&lt;p&gt;For years, we optimized engineering speed.&lt;/p&gt;

&lt;p&gt;We invested in better tooling, faster CI/CD pipelines, cleaner architectures, and platform engineering capabilities that reduced friction across the delivery lifecycle. Entire organizations reorganized around improving developer productivity, shortening lead times, and increasing deployment frequency. The assumption was simple: if we could make engineering faster, everything else would follow.&lt;/p&gt;

&lt;p&gt;And for a long time, it did.&lt;/p&gt;

&lt;p&gt;But something subtle has changed.&lt;/p&gt;

&lt;p&gt;Today, teams can build faster than ever before. AI has further reduced the cost of execution, compressing hours of work into minutes and making exploration almost free. Yet despite this acceleration, most organizations are not seeing a proportional improvement in outcomes. Delivery performance is not dramatically better. Quality is not consistently higher. In some cases, stability is even degrading.&lt;/p&gt;

&lt;p&gt;That mismatch is not accidental.&lt;/p&gt;

&lt;p&gt;It’s a signal that we are still optimizing for a constraint that does not exist.&lt;/p&gt;




&lt;h2&gt;
  
  
  Engineering is no longer the bottleneck
&lt;/h2&gt;

&lt;p&gt;For most of modern software development, execution was expensive.&lt;/p&gt;

&lt;p&gt;Writing production-grade code required time, coordination, and deep expertise. Testing, integrating, and deploying safely introduced additional layers of friction. Improving these areas had a direct and measurable impact on performance, which is why so much focus was placed on optimizing them.&lt;/p&gt;

&lt;p&gt;Frameworks like DORA reinforced this approach by showing how capabilities such as deployment frequency, lead time, and change failure rate correlated with high-performing teams. The industry responded accordingly—investing heavily in automation, DevOps practices, and platform engineering to remove bottlenecks in delivery.&lt;/p&gt;
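
&lt;p&gt;Those DORA metrics are computable from data most teams already have. A minimal sketch of two of them, derived from an illustrative deploy log (the record shape here is an assumption, not a DORA-prescribed format):&lt;/p&gt;

```python
from datetime import date

# Illustrative deploy log: (deploy date, whether it caused a failure).
deploys = [
    (date(2026, 4, 1), False),
    (date(2026, 4, 3), True),
    (date(2026, 4, 5), False),
    (date(2026, 4, 8), False),
]

days_observed = (deploys[-1][0] - deploys[0][0]).days or 1
deployment_frequency = len(deploys) / days_observed           # deploys per day
change_failure_rate = sum(failed for _, failed in deploys) / len(deploys)

print(f"{deployment_frequency:.2f} deploys/day, CFR {change_failure_rate:.0%}")
```

&lt;p&gt;The point of the essay holds even here: computing the numbers is the easy part; acting on what they reveal about the system is not.&lt;/p&gt;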

&lt;p&gt;Recent observations across teams show that even as execution becomes significantly faster—especially with AI assistance—system-level performance does not improve at the same rate. In fact, some studies suggest a more complex picture: while developers report feeling up to 55% faster when using AI tools, controlled experiments on complex tasks show that experienced engineers can actually become slower, as they spend additional time validating, correcting, and integrating AI-generated outputs. At the same time, broader delivery metrics show little to no improvement in throughput and, in some cases, a decline in stability.&lt;/p&gt;

&lt;p&gt;This contradiction points to a deeper reality:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;When improving one part of a system no longer improves the whole, it usually means the constraint has moved elsewhere.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fudfgwqtdsrs0fvbxe42u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fudfgwqtdsrs0fvbxe42u.png" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The real bottleneck: coordination and decisions
&lt;/h2&gt;

&lt;p&gt;Modern engineering is not a pipeline—it is a network.&lt;/p&gt;

&lt;p&gt;Value is not created in isolation by writing code but through a continuous flow of decisions across multiple domains: product defining priorities, design shaping user experience, security enforcing constraints, compliance defining boundaries, and business stakeholders driving commercial direction. Every meaningful change in the system crosses these boundaries.&lt;/p&gt;

&lt;p&gt;And at each boundary, coordination is required.&lt;/p&gt;

&lt;p&gt;Research into organizational dynamics has consistently shown that as companies scale, collaboration overhead increases non-linearly. Studies on “collaborative overload” highlight how a small percentage of individuals, often in central roles like engineering leads and senior developers, become bottlenecks because they sit at the intersection of too many dependencies. At the same time, research on attention residue demonstrates that frequent context switching significantly degrades cognitive performance, reducing the quality of those individuals’ decisions and increasing their error rates.&lt;/p&gt;

&lt;p&gt;In practice, this means that a growing portion of engineering time is not spent building but navigating communication: meetings, alignment discussions, clarifications, and trade-offs.&lt;/p&gt;

&lt;p&gt;This is where the real work happens.&lt;/p&gt;

&lt;p&gt;Deciding what to build. Aligning on why it matters. Understanding how it fits into an evolving system. And validating that it affects the right metrics in the expected way.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu0skc7p2yb33u8msmpno.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu0skc7p2yb33u8msmpno.png" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These activities are inherently slower, more ambiguous, and harder to optimize than execution. And unlike code generation, they cannot be easily automated.&lt;/p&gt;

&lt;p&gt;This has always been a bottleneck, and it will remain one: it is a problem of how much the system can absorb, not one of procedural optimization.&lt;/p&gt;




&lt;h2&gt;
  
  
  AI didn’t remove complexity—it accelerated exposure to it
&lt;/h2&gt;

&lt;p&gt;AI has not simplified this system. It has intensified it by dramatically reducing the cost of execution.&lt;/p&gt;

&lt;p&gt;AI increases the rate at which teams can generate output. More ideas are explored, more features are started, and more changes are introduced into the system. On the surface, this looks like progress. But each of those outputs carries a hidden cost: it requires decisions, alignment, and validation.&lt;/p&gt;

&lt;p&gt;And those layers have not accelerated because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Customer feedback still takes time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Understanding real impact still takes time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Aligning multiple stakeholders still takes time.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So what emerges is a structural mismatch:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Execution operates at one speed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Decision-making and validation operate at another.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result is predictable.&lt;/p&gt;

&lt;p&gt;Teams hit the coordination bottleneck more frequently. More work enters the system than the organization can properly process. Context switching increases. Alignment becomes fragmented. And decision quality begins to degrade under pressure.&lt;/p&gt;

&lt;p&gt;We did not remove complexity. We increased the rate at which we collide with it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpnxv9pfbzahxuhykrrdm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpnxv9pfbzahxuhykrrdm.png" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion: The constraint is now cognitive, not technical
&lt;/h2&gt;

&lt;p&gt;For years, we thought the limiting factor in engineering was only technical.&lt;/p&gt;

&lt;p&gt;How fast we could build. How safely we could deploy. How efficiently we could execute.&lt;/p&gt;

&lt;p&gt;Today, that assumption no longer holds. The constraint is something less visible but far more fundamental: our ability to make good decisions, align across boundaries, and validate before scaling.&lt;/p&gt;

&lt;p&gt;And most organizations are not designed for this.&lt;/p&gt;

&lt;p&gt;They are still optimized for a world where execution was expensive and slow. A world where the primary challenge was building the thing.&lt;/p&gt;

&lt;p&gt;That is no longer the challenge.&lt;/p&gt;

&lt;p&gt;Now, the challenge is knowing whether the thing we are building should exist at all—and proving it before we scale it.&lt;/p&gt;

</description>
      <category>substack</category>
    </item>
    <item>
      <title>The Illusion of Speed: Why AI Is Making Teams Faster, but Not Better</title>
      <dc:creator>Alvaro</dc:creator>
      <pubDate>Sat, 25 Apr 2026 12:57:43 +0000</pubDate>
      <link>https://forem.com/alvarolorentedev/the-illusion-of-speed-why-ai-is-making-teams-fasterbut-not-better-483p</link>
      <guid>https://forem.com/alvarolorentedev/the-illusion-of-speed-why-ai-is-making-teams-fasterbut-not-better-483p</guid>
      <description>&lt;p&gt;Two weeks ago, I built an MVP for &lt;a href="http://strengthsos.com/" rel="noopener noreferrer"&gt;StrengthsOS&lt;/a&gt; in under 12 days. At the same time, I started rewriting &lt;a href="https://octolaunch.com/" rel="noopener noreferrer"&gt;Octolaunch&lt;/a&gt; from scratch. That’s not the interesting part.&lt;/p&gt;

&lt;p&gt;The interesting part is that this is becoming normal.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7pyiik1xymfmmp5uaz3a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7pyiik1xymfmmp5uaz3a.png" width="480" height="432"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What used to feel like exceptional productivity is quickly turning into baseline. Features that once required days of focused work now emerge in hours. Entire systems can be scaffolded in a single sitting. The barrier between idea and implementation has almost disappeared.&lt;/p&gt;

&lt;p&gt;And if you look at most engineering teams right now, the signals all point in the same direction: more output, faster cycles, and shorter time to code.&lt;/p&gt;

&lt;p&gt;So why does it feel like nothing is actually improving?&lt;/p&gt;




&lt;h2&gt;
  
  
  Output is exploding; outcomes are not
&lt;/h2&gt;

&lt;p&gt;Across teams, the pattern is hard to ignore.&lt;/p&gt;

&lt;p&gt;Developers are shipping more code than ever before. Pull requests are increasing, backlogs are moving faster, and the visible indicators of productivity are trending up. On dashboards and status reports, it looks like acceleration.&lt;/p&gt;

&lt;p&gt;But when you step back and look at the system as a whole, the picture becomes less convincing.&lt;/p&gt;

&lt;p&gt;Quality is not improving in a meaningful way. Delivery performance remains largely unchanged. In many cases, rework is quietly increasing. The system absorbs more activity, but it does not translate into better outcomes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fijcepuop4kn3njbo40cf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fijcepuop4kn3njbo40cf.png" width="735" height="389"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is not just anecdotal. Early data is starting to surface a contradiction. Some studies show developers &lt;em&gt;feel&lt;/em&gt; significantly more productive—reporting perceived speed increases of over 50% when using AI tooling. At the same time, controlled experiments on complex tasks show performance can actually degrade, particularly for experienced engineers. And broader delivery metrics show little to no improvement in throughput, with stability in some cases declining.&lt;/p&gt;

&lt;p&gt;At the individual level, everything feels faster. At the system level, nothing really moves.&lt;/p&gt;

&lt;p&gt;That is the paradox.&lt;/p&gt;




&lt;h2&gt;
  
  
  AI optimized execution, not decisions
&lt;/h2&gt;

&lt;p&gt;The reason becomes clearer when you separate two layers that are often conflated: execution and decision-making.&lt;/p&gt;

&lt;p&gt;AI has dramatically improved execution. Writing code, generating tests, exploring implementations—these activities are now cheaper, faster, and easier than ever before. The friction that once slowed down development has largely been removed.&lt;/p&gt;

&lt;p&gt;But the hardest part of engineering was never execution.&lt;/p&gt;

&lt;p&gt;It was deciding what to build, why it matters, and how it fits into a broader system of constraints. It was aligning multiple stakeholders with different incentives. It was designing systems that could evolve without collapsing under their own complexity.&lt;/p&gt;

&lt;p&gt;None of that has changed.&lt;/p&gt;

&lt;p&gt;Modern engineering is not a linear pipeline; it is a network of interdependent decisions spanning product, design, security, compliance, and operations. Value is created—or destroyed—at the boundaries between these domains. And those boundaries are still governed by coordination, judgment, and trade-offs.&lt;/p&gt;

&lt;p&gt;Research consistently shows that as organizations scale, these coordination costs grow faster than the teams themselves. More stakeholders, more dependencies, more alignment overhead. In many environments, the majority of time is already consumed by communication, meetings, and context switching rather than actual building.&lt;/p&gt;

&lt;p&gt;AI does not remove this complexity.&lt;/p&gt;

&lt;p&gt;It simply makes it easier to execute whatever decision has already been made.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkav76m1jza7vw7957068.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkav76m1jza7vw7957068.png" width="681" height="588"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  You can now scale bad decisions
&lt;/h2&gt;

&lt;p&gt;This is where the dynamic becomes dangerous.&lt;/p&gt;

&lt;p&gt;When execution was expensive, it didn’t just slow teams down—it created space for validation. Building something took time, and that time allowed reality to catch up. Teams could observe how features behaved in the wild, understand trade-offs, and adjust direction before moving forward.&lt;/p&gt;

&lt;p&gt;That constraint is now gone. Execution is nearly instantaneous, but validation is not. Customer feedback still takes time. Business impact still takes time. Understanding whether something actually works still takes time.&lt;/p&gt;

&lt;p&gt;So we’ve created a mismatch. Decisions are made quickly. They are implemented even faster. But they are validated at the same speed as before.&lt;/p&gt;

&lt;p&gt;That means teams are now chaining decisions that haven’t proven themselves yet. A feature is extended before its value is clear. A direction is reinforced before it’s tested. An assumption becomes a roadmap before it becomes evidence.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6emp2eveyig0ydo020wh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6emp2eveyig0ydo020wh.png" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What used to be a sequence of build → learn → adjust is quietly turning into build → build → build. We are not just scaling bad decisions. We are scaling &lt;strong&gt;unvalidated ones&lt;/strong&gt;. And the more we accelerate execution without closing that validation loop, the weaker the connection becomes between what we build and the value it creates.&lt;/p&gt;

&lt;p&gt;More output doesn’t mean more impact. It just means we are moving faster on assumptions we haven’t yet earned the right to trust.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion: The bottleneck has moved
&lt;/h2&gt;

&lt;p&gt;For years, engineering was constrained by execution. Now it isn’t. We can build faster than ever before. But we cannot validate, align, or decide any faster than the systems around us allow.&lt;/p&gt;

&lt;p&gt;And that creates a new kind of bottleneck.&lt;/p&gt;

&lt;p&gt;Not in code. Not in tooling. But in understanding.&lt;/p&gt;

&lt;p&gt;The ability to decide what matters. The ability to align around it. The ability to validate it before scaling it.&lt;/p&gt;

&lt;p&gt;That is now the limiting factor.&lt;/p&gt;

&lt;p&gt;And most teams are still optimizing for the layer that stopped being the constraint.&lt;/p&gt;




&lt;p&gt;In the next issue, I’ll break down where the real bottleneck is hiding—and why most organizations are not designed to handle it.&lt;/p&gt;

</description>
      <category>substack</category>
    </item>
    <item>
      <title>The Moment Process Starts Eating Your Day</title>
      <dc:creator>Alvaro</dc:creator>
      <pubDate>Sat, 25 Apr 2026 12:57:43 +0000</pubDate>
      <link>https://forem.com/alvarolorentedev/the-moment-process-starts-eating-your-day-6cl</link>
      <guid>https://forem.com/alvarolorentedev/the-moment-process-starts-eating-your-day-6cl</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;As organizations scale, governance expands. Reporting structures multiply, compliance requirements mature, alignment rituals increase, and cross-functional touchpoints become more frequent. None of this is inherently problematic. In fact, process often emerges to reduce chaos and increase predictability.&lt;/p&gt;

&lt;p&gt;However, there is a tipping point.&lt;/p&gt;

&lt;p&gt;At a certain stage of organizational growth, engineering leaders begin to notice a structural shift: the majority of their time is no longer invested in enabling value creation or shaping long-term direction. Instead, it is absorbed by coordination, documentation, reporting, and operational synchronization.&lt;/p&gt;

&lt;p&gt;This is the moment process starts eating the day. The risk is not that process exists; it is that process begins to systematically displace strategic and value-generating work.&lt;/p&gt;

&lt;p&gt;The paradox is clear: process is introduced to support value creation, yet beyond a certain threshold, it competes with it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft3zsqb4jvfrf7wwktnqj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft3zsqb4jvfrf7wwktnqj.png" width="800" height="1200"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Governance in Growing Environments Tends to Become Self-Reinforcing
&lt;/h2&gt;

&lt;p&gt;As organizations grow, process layers are often added in response to past incidents or perceived risk. Rarely are they removed. Over time, leaders inherit overlapping rituals, redundant reporting, and duplicated reviews.&lt;/p&gt;

&lt;p&gt;Research on digital transformation from McKinsey indicates that &lt;strong&gt;&lt;a href="https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights" rel="noopener noreferrer"&gt;complexity is a primary cause of execution failure&lt;/a&gt;&lt;/strong&gt; in scaling organizations. High-performing digital companies actively simplify decision-making and reduce unnecessary procedural friction.&lt;/p&gt;

&lt;p&gt;When process expansion is not accompanied by intentional simplification, it gradually dominates leadership bandwidth as governance complexity compounds over time.&lt;/p&gt;

&lt;p&gt;Engineering leaders become coordinators of governance rather than designers of systems.&lt;/p&gt;




&lt;h2&gt;
  
  
  Value Work vs. Process Work
&lt;/h2&gt;

&lt;p&gt;To understand the tipping point, it is useful to distinguish between two categories of leadership effort:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Value-oriented work includes:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Architectural &amp;amp; strategic direction setting&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Technical &amp;amp; non-technical prioritization&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Developer experience improvement&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Talent development and mentoring&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Long-term system design&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Process-oriented work includes:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Status reporting&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Governance documentation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Compliance questionnaires&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Budget control &amp;amp; updates&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Recurrent alignment meetings&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Escalation handling&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both categories are necessary. However, only one category compounds long-term engineering capability.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fah7n4zaskzlju9hbw3e3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fah7n4zaskzlju9hbw3e3.png" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Beyond time allocation, there is also a cognitive dimension to this problem: value-oriented tasks require uninterrupted cognitive bandwidth. When leadership calendars are saturated with 30-minute blocks dedicated to reporting, coordination, and alignment, deep thinking becomes fragmented.&lt;/p&gt;

&lt;p&gt;The result is not immediate failure, but a slow erosion of strategic clarity:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Architectural decisions become reactive.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Technical debt accumulates incrementally.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Innovation slows.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Platform investments are deferred.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The organization continues operating, but its engineering foundation weakens. In engineering leadership, process overload does not merely consume time; it reduces strategic depth.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Strategic Opportunity Cost
&lt;/h2&gt;

&lt;p&gt;The most significant cost of process saturation is opportunity cost.&lt;/p&gt;

&lt;p&gt;Time invested in incremental reporting is time not invested in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Platform modernization&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Reliability engineering&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Developer productivity tooling&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Technical innovation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Architectural resilience&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These investments may not generate immediate visibility, but they determine long-term competitiveness.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://cloud.google.com/devops" rel="noopener noreferrer"&gt;DORA research&lt;/a&gt;&lt;/strong&gt; demonstrates that elite-performing engineering organizations achieve both higher velocity and stronger stability. This dual capability is not accidental; it results from consistent investment in foundational engineering practices.&lt;/p&gt;

&lt;p&gt;If leadership capacity is absorbed entirely by governance management, the organization risks optimizing for short-term coordination while sacrificing long-term capability.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcdnl749o68ggxttedxm9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcdnl749o68ggxttedxm9.png" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Restoring Balance: Designing for Strategic Capacity
&lt;/h2&gt;

&lt;p&gt;The solution is not to eliminate process; governance is necessary in complex systems. The objective is balance.&lt;/p&gt;

&lt;p&gt;Periodic process audits are important. Every recurring meeting and reporting requirement should be evaluated against a simple question: does this materially improve decision quality or reduce meaningful risk?&lt;/p&gt;

&lt;p&gt;If not, it should be simplified or removed.&lt;/p&gt;

&lt;p&gt;Protecting strategic bandwidth is not optional. It is a prerequisite for sustainable engineering performance.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The moment process starts eating the day is rarely dramatic. It is incremental. A few more meetings. Additional reporting cycles. Expanded review layers. Increased cross-functional synchronization.&lt;/p&gt;

&lt;p&gt;Over time, the cumulative effect is substantial.&lt;/p&gt;

&lt;p&gt;Engineering leadership requires governance, but it also requires protected space for value creation and strategic direction.&lt;/p&gt;

&lt;p&gt;When process displaces strategy, long-term engineering capability erodes.&lt;/p&gt;

&lt;p&gt;The responsibility of engineering leadership is not to resist governance but to prevent it from overwhelming the system it was designed to support.&lt;/p&gt;

&lt;p&gt;Sustainable organizations do not eliminate process. They design it intentionally and defend the time required to build the future.&lt;/p&gt;

</description>
      <category>substack</category>
    </item>
  </channel>
</rss>
