jthomas.site // notebook · v.4.2026
note 001 · Engineer · Researcher · Essayist

Emergent behavior, applied to the largest training systems on Earth.

I work on the data that shapes Gemini. I write about the agents that will replace it.

10B+ events/day in production
17 years shipping distributed systems
4 peer-reviewed publications
§ argument 2008 → 2026

A throughline, in three notes.

  1. Built robots that found odor sources by talking to each other. Master's thesis at IISc, under Debasish Ghose.
  2. The work became US Patent 8,838,271 — swarm-optimization algorithms for nuclear-spill detection.
  3. The same primitives — emergence, local rules, fitness landscapes — are how I think about agentic AI, ARC-AGI, and the data systems behind Gemini.
“Complex, intelligent behavior emerges from simple rules applied at scale — the same principle behind modern distributed AI systems.”
(invited talk, IDSIA, Switzerland)
§ writing 03

Recent essays.

autoresearch

When AI Becomes Its Own Scientist

Inside the Evolution Arena and the rise of autoresearch.

An AI agent that proposes its own experiments, runs them against a live 2D simulation, scores the result with a hard mechanical metric, and commits or reverts on its own. A ratchet that only moves forward.
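The commit-or-revert loop can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Evolution Arena's code: `ratchet`, `jitter`, and `toy_score` are hypothetical names, and a toy quadratic stands in for the live 2D simulation and its mechanical metric.

```python
import random
from typing import Callable, List, Tuple

Params = List[float]


def ratchet(
    initial: Params,
    propose: Callable[[Params, random.Random], Params],
    score: Callable[[Params], float],
    steps: int,
    seed: int = 0,
) -> Tuple[Params, float, List[float]]:
    """Commit a proposal only if it strictly beats the current best; otherwise revert."""
    rng = random.Random(seed)
    best, best_score = list(initial), score(initial)
    history = [best_score]  # score of the committed state after each step
    for _ in range(steps):
        candidate = propose(best, rng)      # the agent's proposed experiment
        candidate_score = score(candidate)  # hard, mechanical metric
        if candidate_score > best_score:    # commit
            best, best_score = candidate, candidate_score
        # else: revert, i.e. keep `best` unchanged
        history.append(best_score)
    return best, best_score, history


def jitter(params: Params, rng: random.Random) -> Params:
    """Hypothetical proposal: perturb one randomly chosen parameter."""
    i = rng.randrange(len(params))
    out = list(params)
    out[i] += rng.gauss(0.0, 0.5)
    return out


def toy_score(params: Params) -> float:
    """Stand-in for the simulation: higher is better, peak at (1, -2)."""
    x, y = params
    return -((x - 1.0) ** 2 + (y + 2.0) ** 2)


best, best_score, history = ratchet([0.0, 0.0], jitter, toy_score, steps=500)
```

Because a proposal is committed only when it strictly beats the current best, `history` is non-decreasing by construction; that monotonicity is the ratchet.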

swarm intelligence

When Swarms Write Code

How particle swarm optimization escapes the local-minima trap in ARC-AGI.

Standard LLM agents get stuck. The fix: a PSO-governed swarm of specialized LLM particles with a continuous fitness function that rewards near-misses. The swarm provides strategy; the LLM provides syntax.
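To see why a continuous fitness function helps, compare a binary exact-match reward with a cell-level score on ARC-style grids (2D lists of color indices 0 to 9). This is an assumed, simplified scoring rule for illustration, not the essay's actual fitness function; `exact_match` and `cell_fitness` are hypothetical names.

```python
from typing import List

Grid = List[List[int]]


def exact_match(pred: Grid, target: Grid) -> float:
    """Binary reward: 1.0 only for a perfect grid, 0.0 otherwise."""
    return 1.0 if pred == target else 0.0


def cell_fitness(pred: Grid, target: Grid) -> float:
    """Continuous reward in [0, 1]: fraction of target cells predicted correctly.

    Assumption: a wrong output shape scores 0.0; a richer fitness function
    could also give partial credit for getting the dimensions close.
    """
    if len(pred) != len(target) or any(len(p) != len(t) for p, t in zip(pred, target)):
        return 0.0
    total = sum(len(row) for row in target)
    if total == 0:
        return 1.0
    correct = sum(
        1
        for p_row, t_row in zip(pred, target)
        for p, t in zip(p_row, t_row)
        if p == t
    )
    return correct / total


target = [[1, 0], [0, 1]]
near_miss = [[1, 0], [0, 0]]  # one cell wrong
```

Under `exact_match`, the near miss scores 0.0, the same as a blank grid, so the search gets no signal about which direction is better; `cell_fitness` scores it 0.75 and gives the swarm something to climb.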

fine-tuning

Stop Wrestling with Boilerplate

Local Tinker — a clean API for local LLM fine-tuning.

A Tinker-style API for LoRA fine-tuning of 1B–13B LLMs on your own GPU. Four primitives cover SFT, DPO, PPO, and GRPO — without the HuggingFace + PEFT + bitsandbytes boilerplate.
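Local Tinker's own API is not reproduced here. As background, this sketch shows the LoRA update the paragraph relies on: the pretrained weight `W0` stays frozen, and only a low-rank pair `A` (r × d_in) and `B` (d_out × r) is trained, giving h = W0·x + (α/r)·B·A·x. Pure Python lists keep the example self-contained; real implementations use a tensor library, and the function names are illustrative.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def lora_forward(W0: Matrix, A: Matrix, B: Matrix, x: Vector, alpha: float) -> Vector:
    """h = W0 x + (alpha / r) * B (A x); W0 is frozen, only A and B are trained.

    Shapes: W0 is d_out x d_in, A is r x d_in, B is d_out x r.
    """
    r = len(A)
    base = matvec(W0, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]


def lora_param_count(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters for one adapted matrix: r * (d_in + d_out)."""
    return r * (d_in + d_out)


W0 = [[1.0, 2.0], [3.0, 4.0]]
x = [1.0, 1.0]
A = [[0.5, 0.5]]                # rank r = 1
B_init = [[0.0], [0.0]]         # zero init: the adapter is a no-op at step 0
B_trained = [[1.0], [0.0]]      # hypothetical values after some training

print(lora_forward(W0, A, B_init, x, alpha=1.0))     # [3.0, 7.0]
print(lora_forward(W0, A, B_trained, x, alpha=1.0))  # [4.0, 7.0]
print(lora_param_count(4096, 4096, 8))               # 65536
```

With `B` initialized to zero, as in the original LoRA paper, the adapted layer starts out identical to the base model. For one 4096 × 4096 projection at rank 8, only 65,536 parameters are trained instead of 16,777,216, which is one reason LoRA fine-tuning fits on a single GPU.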

§ selected work 03 / 12

Three projects, one throughline.

Google Cloud · 2019 → present

CrowdCompute & DQaaS

Backend storage and data integrity for the engine generating RLHF and SFT datasets that train Google's foundation models. Plus a low-latency LLMOps platform for prompt versioning at enterprise scale.

Gemini · RLHF · LLMOps · gRPC
Amazon · 2017 → 2019

DataCraft

Architected and shipped a centralized ingestion platform processing 10B+ events/day on Kinesis, Lambda, and S3. Resilient by default; the data lake's front door.

Kinesis · Lambda · S3 · Data infra
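A back-of-the-envelope check puts 10B+ events/day in Kinesis terms. It assumes exactly 10 billion events spread evenly over the day, one event per Kinesis record, and the documented per-shard write limit for provisioned streams (1,000 records/s or 1 MB/s). The `peak_factor` parameter is a hypothetical allowance for bursts. This is not a description of DataCraft's actual shard count.

```python
import math

EVENTS_PER_DAY = 10_000_000_000  # the "10B+" figure, taken as exactly 10B
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400

# Kinesis Data Streams per-shard write limit (provisioned mode):
# 1,000 records/s or 1 MB/s, whichever is reached first.
RECORDS_PER_SHARD_PER_SEC = 1_000


def min_shards(events_per_day: int, peak_factor: float = 1.0) -> int:
    """Lower bound on shard count from the record-rate limit alone.

    peak_factor > 1 models bursts above the daily average (an assumption).
    The 1 MB/s byte limit is ignored, so large records would need more shards.
    """
    avg_rate = events_per_day / SECONDS_PER_DAY
    return math.ceil(avg_rate * peak_factor / RECORDS_PER_SHARD_PER_SEC)


avg_rate = EVENTS_PER_DAY / SECONDS_PER_DAY
print(round(avg_rate))                  # 115741 events/s on average
print(min_shards(EVENTS_PER_DAY))       # 116
print(min_shards(EVENTS_PER_DAY, 3.0))  # 348
```

Aggregating several events into one record (as the Kinesis Producer Library can) lowers the shard count, while large payloads can make the 1 MB/s limit the binding constraint instead.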
IISc · 2006 → 2008 · still cited

Glowworm Swarm Optimization

Master's thesis on multi-source odor localization with swarm robotics — the work behind US Patent 8,838,271 for nuclear spill detection. Cited 12× and the foundation for everything I do now.

Swarm intelligence · Patent · Research
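For readers new to Glowworm Swarm Optimization, here is a compact sketch of the standard algorithm as published by Krishnanand and Ghose. Each agent carries a luciferin value that tracks the objective at its position, moves one step toward a randomly chosen brighter neighbour, and adapts its own decision range to keep roughly `n_t` neighbours. The defaults are typical values and should be treated as assumptions, as should the bounds, `r_s`, and the two-peak objective standing in for an odor or radiation field. This is not the thesis or patent code.

```python
import math
import random
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def gso(
    J: Callable[[Point], float],
    n_agents: int = 50,
    iters: int = 200,
    bounds: Tuple[float, float] = (-3.0, 3.0),
    rho: float = 0.4,    # luciferin decay
    gamma: float = 0.6,  # luciferin enhancement
    beta: float = 0.08,  # decision-range gain
    n_t: int = 5,        # desired number of neighbours
    step: float = 0.03,  # step size s
    r_s: float = 3.0,    # sensor range: upper bound on each decision range
    l0: float = 5.0,     # initial luciferin
    seed: int = 0,
) -> List[Point]:
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [(rng.uniform(lo, hi), rng.uniform(lo, hi)) for _ in range(n_agents)]
    luciferin = [l0] * n_agents
    r_d = [r_s] * n_agents
    for _ in range(iters):
        # 1. Luciferin update: decay plus a reward proportional to J at the agent's position.
        luciferin = [(1 - rho) * l + gamma * J(p) for l, p in zip(luciferin, pos)]
        new_pos = list(pos)
        for i in range(n_agents):
            # 2. Neighbours: brighter agents inside agent i's own decision range.
            nbrs = [
                j for j in range(n_agents)
                if j != i and math.dist(pos[i], pos[j]) < r_d[i] and luciferin[j] > luciferin[i]
            ]
            if nbrs:
                # 3. Choose a neighbour with probability proportional to the luciferin difference.
                weights = [luciferin[j] - luciferin[i] for j in nbrs]
                j = rng.choices(nbrs, weights=weights)[0]
                d = math.dist(pos[i], pos[j])
                if d > 0:
                    new_pos[i] = (
                        pos[i][0] + step * (pos[j][0] - pos[i][0]) / d,
                        pos[i][1] + step * (pos[j][1] - pos[i][1]) / d,
                    )
            # 4. Adapt the decision range toward n_t neighbours, clipped to [0, r_s].
            r_d[i] = min(r_s, max(0.0, r_d[i] + beta * (n_t - len(nbrs))))
        pos = new_pos
    return pos


def two_peaks(p: Point) -> float:
    """Hypothetical field with two sources, near (1, 1) and (-1, -1)."""
    x, y = p
    return math.exp(-((x - 1) ** 2 + (y - 1) ** 2)) + math.exp(-((x + 1) ** 2 + (y + 1) ** 2))


final = gso(two_peaks)
```

Because each agent follows only brighter agents within its own adaptive range, the swarm tends to split into subgroups around different peaks rather than collapsing onto one, which is what suits GSO to locating multiple sources at once.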
§ bibliography 06 entries

Patents, publications & talks.

  1. Detection of Nuclear Spills Using Swarm Optimization Algorithms
    D. Ghose, J. Thomas, K.N. Krishnanand · US Patent 8,838,271 · Patent · 11 cites
  2. Strategies for Locating Multiple Odor Sources using Glowworm Swarm Optimization
    J. Thomas, D. Ghose · IICAI, pp. 842–861 · Conference · 12 cites
  3. A GSO-Based Swarm Algorithm for Odor Source Localization in Turbulent Environments
    J. Thomas, D. Ghose · Handbook of Approximation Algorithms and Metaheuristics, 2nd Ed., pp. 711–737 · Book Chapter · 1 cite
  4. Odor Source Localization using Swarm Robotics
    J. Thomas · Master's thesis, Indian Institute of Science, Bangalore · Thesis · 3 cites
  5. Industry Talk: Data Science Road
    University of New Brunswick · Invited Talk
  6. Odor Source Localization using Swarm Robotics
    IDSIA, Switzerland · Invited Talk
§ curriculum vitae 06 roles

Experience.

Software Engineer (current)
Google · Google Cloud · Mountain View, CA · 2019 → present
SDE — Big Data Technologies
Amazon · Palo Alto, CA
Data Scientist
Datanyze · San Mateo, CA
Energy Analytics SDE
Ascend Analytics · Oakland, CA
Senior Analyst — R&D
Global Analytics · Chennai, India
Associate — R&D
Idea R&D · Pune, India
§ schooling 02 degrees

Education.

MEng, Computer Science

Cornell University · GPA 3.5/4.0
Machine Learning · NLP · Algorithms

MSc (Eng), Aerospace Engineering

Indian Institute of Science · GPA 7.0/8.0
Thesis: Odor Source Localization using Swarm Robotics
§ open source 04 of many

Things on GitHub.

arc-agi · Python

Multi-agent systems for the ARC-AGI abstract-reasoning challenge.

arc-agi-search · Python

Swarm-based search for ARC-AGI: bio-inspired optimization applied to reasoning.

recursive-self-improvement · Python

A recursive self-improvement (RSI) framework for LLMs, aimed at iterative capability enhancement.

deepsnake · Python

Deep Q-Network agent that learns to play Snake from raw reward.
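The core of a DQN agent like `deepsnake` is the bootstrapped target its Q-values are regressed toward: y = r + γ · max over a′ of Q_target(s′, a′), with no bootstrap on terminal steps. The sketch below computes that target for a small batch; the reward values (+1 for food, -1 for dying) are a hypothetical Snake reward scheme, not necessarily the one the repository uses.

```python
from typing import List, Sequence


def dqn_targets(
    rewards: Sequence[float],
    next_q_values: Sequence[Sequence[float]],  # target-network Q(s', a') for every action a'
    dones: Sequence[bool],
    gamma: float = 0.99,
) -> List[float]:
    """y = r + gamma * max_a' Q_target(s', a'), dropping the bootstrap term on terminal steps."""
    return [
        r + (0.0 if done else gamma * max(q_next))
        for r, q_next, done in zip(rewards, next_q_values, dones)
    ]


# Hypothetical batch of three Snake transitions: ate food, hit a wall, ordinary move.
rewards = [1.0, -1.0, 0.0]
next_q = [[0.5, 2.0, 1.0], [3.0, 3.0, 3.0], [0.0, 1.0, -1.0]]
dones = [False, True, False]
targets = dqn_targets(rewards, next_q, dones, gamma=0.9)  # approximately [2.8, -1.0, 0.9]
```

In the full algorithm, Q_target comes from a periodically synced copy of the online network, and training minimizes the squared (or Huber) error between these targets and Q(s, a) for the actions actually taken.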

§ about the author

A short biography.

Joseph Thomas

I started as an aerospace engineer designing intake systems for missiles, became a swarm-robotics researcher under Debasish Ghose at IISc, picked up a US patent and a Cornell MEng along the way, and have spent the last decade shipping data infrastructure at Amazon and Google.

Today I work on the data engine behind Google's foundation models. Off-hours, I write essays and run experiments on multi-agent reasoning — the long tail of the swarm research, in a new vocabulary.

Saratoga, California · Senior Member, IEEE · FBCS

§ service 05 active

On editorial boards & program committees.

Also: Senior Member of IEEE · FBCS · Hackathon Raptors Fellow · IEEEXtreme Proctor · GATE 99th-percentile scholar. Mentor at Google Tech Exchange, Founder Institute, and Opportunity Machine.