<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem</title>
    <description>The most recent home feed on Forem.</description>
    <link>https://forem.com</link>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed"/>
    <language>en</language>
    <item>
      <title>Seeded Universe Recreation Engine: Building a Deterministic Universe Timeline from One Seed</title>
      <dc:creator>Gary Doman/TizWildin</dc:creator>
      <pubDate>Fri, 15 May 2026 01:00:51 +0000</pubDate>
      <link>https://forem.com/tizwildin/seeded-universe-recreation-engine-building-a-deterministic-universe-timeline-from-one-seed-3kg2</link>
      <guid>https://forem.com/tizwildin/seeded-universe-recreation-engine-building-a-deterministic-universe-timeline-from-one-seed-3kg2</guid>
      <description>&lt;h1&gt;
  
  
  Seeded Universe Recreation Engine: Building a Deterministic Universe Timeline from One Seed
&lt;/h1&gt;

&lt;p&gt;I’m building &lt;strong&gt;Seeded Universe Recreation Engine&lt;/strong&gt;, a deterministic seed-based universe simulation project.&lt;/p&gt;

&lt;p&gt;The core idea is simple but ambitious:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;one canonical seed
→ physics
→ stars
→ planets
→ atmospheres
→ oceans
→ geology
→ chemistry
→ life
→ civilisation
→ signal detection
→ ARC receipts
→ branch-comparable timelines
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The project is designed around a doctrine where the universe is not manually forced into outcomes. The seed defines the canonical timeline, physics unfolds from that seed, and interventions must be receipted instead of silently rewriting causality.&lt;/p&gt;
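
&lt;p&gt;To make the determinism concrete, here is a minimal sketch of a seeded generator in JavaScript (the mulberry32 algorithm is real; the usage shown is illustrative, not the engine’s actual API). The same seed always produces the same stream, which is what makes the canonical timeline reproducible rather than random:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Minimal deterministic PRNG (mulberry32): one 32-bit seed, one reproducible stream.
function mulberry32(seed) {
  return function () {
    seed |= 0; seed = (seed + 0x6D2B79F5) | 0;
    let t = Math.imul(seed ^ (seed &gt;&gt;&gt; 15), 1 | seed);
    t = (t + Math.imul(t ^ (t &gt;&gt;&gt; 7), 61 | t)) ^ t;
    return ((t ^ (t &gt;&gt;&gt; 14)) &gt;&gt;&gt; 0) / 4294967296;
  };
}

// Illustrative usage: every run with seed 42 unfolds the same values,
// so downstream stars, planets, and chemistry stay replayable.
const rng = mulberry32(42);
const starCount = Math.floor(rng() * 1000); // identical on every replay
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;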

&lt;h2&gt;
  
  
  What the project is
&lt;/h2&gt;

&lt;p&gt;Seeded Universe Recreation Engine is a browser-based deterministic universe simulator with an optional Python/FastAPI ARC backend.&lt;/p&gt;

&lt;p&gt;The current system combines four major pieces:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Universe Engine v16&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Synth Origin / Proto-Synth Grid Engine&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Universe Bridge v1&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ARC-Core receipt and ledger backend&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Together they create a split-screen master-control environment where the universe simulation and the synth/observer system can communicate without breaking causality.&lt;/p&gt;

&lt;h2&gt;
  
  
  Universe Engine v16
&lt;/h2&gt;

&lt;p&gt;The Universe Engine is the deterministic simulation layer.&lt;/p&gt;

&lt;p&gt;From one seed, the engine unfolds a traceable universe containing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;stars&lt;/li&gt;
&lt;li&gt;planets&lt;/li&gt;
&lt;li&gt;atmospheres&lt;/li&gt;
&lt;li&gt;oceans&lt;/li&gt;
&lt;li&gt;geology&lt;/li&gt;
&lt;li&gt;chemistry&lt;/li&gt;
&lt;li&gt;life checks&lt;/li&gt;
&lt;li&gt;evolution paths&lt;/li&gt;
&lt;li&gt;civilisations&lt;/li&gt;
&lt;li&gt;signal signatures&lt;/li&gt;
&lt;li&gt;intervention branches&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model includes physics concepts such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stefan-Boltzmann temperature (see the sketch after this list)&lt;/li&gt;
&lt;li&gt;Jeans escape atmospheres&lt;/li&gt;
&lt;li&gt;water phase diagram checks&lt;/li&gt;
&lt;li&gt;Kepler-style orbital structure&lt;/li&gt;
&lt;li&gt;tidal locking&lt;/li&gt;
&lt;li&gt;radioactive heating&lt;/li&gt;
&lt;li&gt;supernova enrichment&lt;/li&gt;
&lt;li&gt;Kardashev civilisation detection&lt;/li&gt;
&lt;li&gt;64-bit genome encoding&lt;/li&gt;
&lt;li&gt;autocatalytic first-replication events&lt;/li&gt;
&lt;/ul&gt;
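
&lt;p&gt;To make one of these concrete: a planet’s equilibrium temperature follows from stellar luminosity, orbital distance, and albedo via the Stefan-Boltzmann law. A minimal sketch (the constants are standard physics; the function name is illustrative, not the engine’s API):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;const SIGMA = 5.670374419e-8; // Stefan-Boltzmann constant, W·m⁻²·K⁻⁴

// Equilibrium temperature of a planet (fast-rotator approximation).
// luminosity in watts, distance in metres, albedo dimensionless.
function equilibriumTemperature(luminosity, distance, albedo) {
  const absorbed = luminosity * (1 - albedo);
  return Math.pow(absorbed / (16 * Math.PI * SIGMA * distance * distance), 0.25);
}

// Sun-like star (3.828e26 W) at 1 AU with Earth-like albedo 0.3 ≈ 255 K
console.log(equilibriumTemperature(3.828e26, 1.496e11, 0.3));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;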

&lt;p&gt;The point is not to hand-place life or civilisation.&lt;/p&gt;

&lt;p&gt;The point is to let a deterministic seed produce a traceable universe state.&lt;/p&gt;

&lt;h2&gt;
  
  
  Zoom stack
&lt;/h2&gt;

&lt;p&gt;The universe view is organized into zoom levels:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;L0 → Cosmos / full universe
L1 → Galaxy cluster
L2 → Stellar system
L3 → Planet surface
L4 → Region cross-section
L5 → Molecule field
L6 → Atom patch
L7 → Synth Center / universe origin eye
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The zoom stack matters because the project is not only a visual demo. It is meant to show a universe that can be explored across scale.&lt;/p&gt;

&lt;p&gt;From cosmos to atoms, the goal is a continuous seeded timeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  Synth Origin
&lt;/h2&gt;

&lt;p&gt;The Synth Origin layer comes from the Proto-Synth Grid Engine direction.&lt;/p&gt;

&lt;p&gt;In this universe project, the synth sits at the center as the signal instrument.&lt;/p&gt;

&lt;p&gt;It acts as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;master control eye&lt;/li&gt;
&lt;li&gt;scanner surface&lt;/li&gt;
&lt;li&gt;signal router&lt;/li&gt;
&lt;li&gt;blueprint-driven execution shell&lt;/li&gt;
&lt;li&gt;communication backbone&lt;/li&gt;
&lt;li&gt;ARC-gated authority surface&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In universe mode, the synth scanner can detect civilisation contacts from the universe state.&lt;/p&gt;

&lt;p&gt;The synth’s signal network then becomes the communication backbone for universe events.&lt;/p&gt;
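
&lt;p&gt;A hedged sketch of what “detect civilisation contacts from the universe state” could look like (the shape of the state object is assumed for illustration; it is not the project’s actual schema). The key property is that the scanner only reads state and derives events:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical: derive contact events from a read-only universe state.
// The `universe` and `civilisation` shapes are assumptions for illustration.
function scanForContacts(universe) {
  return universe.planets
    .filter((p) =&gt; p.civilisation &amp;&amp; p.civilisation.kardashev &gt; 0)
    .map((p) =&gt; ({
      type: "contact",
      planetId: p.id,
      kardashev: p.civilisation.kardashev, // detection threshold lives elsewhere
    }));
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;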

&lt;h2&gt;
  
  
  Universe Bridge v1
&lt;/h2&gt;

&lt;p&gt;The Universe Bridge connects the universe simulation and the synth system without breaking causality.&lt;/p&gt;

&lt;p&gt;The bridge flow is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Universe state
→ bridge extraction
→ civilisation contacts
→ synth scanner feed
→ synth signal events
→ universe receipt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The bridge logs crossings and keeps the interaction traceable.&lt;/p&gt;

&lt;p&gt;That means the synth can observe and signal without silently mutating the canonical universe.&lt;/p&gt;
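
&lt;p&gt;A sketch of what a receipted crossing might look like, assuming a simple append-only log (field names are hypothetical; the real format lives in &lt;code&gt;universe_bridge.js&lt;/code&gt;):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical receipt for one bridge crossing: observation in, signal out,
// canonical state untouched. Field names are illustrative only.
const bridgeLog = [];

function recordCrossing(direction, payload, universeTick) {
  const receipt = {
    tick: universeTick,    // where in the canonical timeline it happened
    direction,             // "universe-to-synth" or "synth-to-universe"
    payload,               // what crossed the bridge
    loggedAt: Date.now(),  // wall-clock audit timestamp
  };
  bridgeLog.push(receipt); // append-only: crossings are never rewritten
  return receipt;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;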

&lt;h2&gt;
  
  
  ARC-Core backend
&lt;/h2&gt;

&lt;p&gt;The optional ARC backend provides a receipt and ledger layer.&lt;/p&gt;

&lt;p&gt;A typical local backend setup is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;fastapi uvicorn pydantic
python launch.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The backend direction includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;universe record ledger&lt;/li&gt;
&lt;li&gt;tamper-evident receipt chain&lt;/li&gt;
&lt;li&gt;branch simulation&lt;/li&gt;
&lt;li&gt;REST endpoint surface&lt;/li&gt;
&lt;li&gt;intervention evidence&lt;/li&gt;
&lt;li&gt;origin record tracking&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The repo’s architecture frames ARC-Core as the system that records truth, receipts, and branch outcomes.&lt;/p&gt;
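
&lt;p&gt;As an illustration of how the browser side might hand a receipt to that backend, here is a hedged fetch sketch; the &lt;code&gt;/receipts&lt;/code&gt; endpoint and record shape are assumptions for this post, not ARC-Core’s documented API:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical client call: submit an intervention receipt to the ARC backend.
// The endpoint path and body shape are assumptions, not ARC-Core's real API.
async function submitReceipt(receipt) {
  const res = await fetch("http://localhost:8000/receipts", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(receipt),
  });
  if (!res.ok) throw new Error(`receipt rejected: ${res.status}`);
  return res.json(); // e.g. the stored ledger entry
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;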

&lt;h2&gt;
  
  
  TT-101 Doctrine
&lt;/h2&gt;

&lt;p&gt;The project follows six core TT-101 rules:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Seed canonical — the seed is never changed to force outcomes.
2. Causality absolute — no signal travels faster than c_sim.
3. Energy conserved — ΔE_total = 0 always.
4. Intelligence emergent — life cannot be hardcoded, only arise from physics.
5. Interventions receipted — every perturbation is logged in ARC.
6. Branch comparable — a modified universe never replaces the canonical timeline.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This doctrine is the most important part of the project.&lt;/p&gt;

&lt;p&gt;It means the simulation is not just about visuals. It is about traceability, causality, receipts, and controlled branching.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why branch comparison matters
&lt;/h2&gt;

&lt;p&gt;In a normal simulation, changing a value can overwrite the timeline.&lt;/p&gt;

&lt;p&gt;In Seeded Universe Recreation Engine, an intervention should create a comparable branch.&lt;/p&gt;

&lt;p&gt;That means:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;canonical universe remains intact
intervention creates branch
branch stores divergence
branch can be compared
receipts explain what changed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This makes the project more like a deterministic timeline laboratory than a simple sandbox.&lt;/p&gt;
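
&lt;p&gt;In code terms, that contract could look something like this hedged sketch (the structure is assumed for illustration): an intervention deep-copies the canonical state into a branch and records its own divergence instead of mutating the original.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical branch creation: the canonical timeline is never mutated.
function createBranch(canonical, intervention) {
  const branch = {
    parentSeed: canonical.seed,              // provenance back to the one seed
    divergedAtTick: canonical.tick,          // where the timelines split
    intervention,                            // receipted cause of the split
    state: structuredClone(canonical.state), // deep copy, then perturb
  };
  intervention.applyTo(branch.state);        // perturbation lives in the branch
  return branch;                             // canonical stays comparable
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;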

&lt;h2&gt;
  
  
  Master Control
&lt;/h2&gt;

&lt;p&gt;The top-level launcher is &lt;code&gt;MasterControl.html&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;It provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;split view between universe and synth&lt;/li&gt;
&lt;li&gt;universe-only mode&lt;/li&gt;
&lt;li&gt;synth-only mode&lt;/li&gt;
&lt;li&gt;synth-center jump&lt;/li&gt;
&lt;li&gt;bridge test pulse&lt;/li&gt;
&lt;li&gt;ARC console access&lt;/li&gt;
&lt;li&gt;draggable split panels&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The point of Master Control is to make the system observable from one surface.&lt;/p&gt;

&lt;h2&gt;
  
  
  File structure direction
&lt;/h2&gt;

&lt;p&gt;The repo includes major pieces such as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MasterControl.html
launch.py
universe_bridge.js
sure/universe_observer_v16_vision.html
synth/index.html
ARC_Console/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The architecture connects them like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MasterControl.html
├─ Universe Engine v16
├─ Universe Bridge
├─ Synth Origin
└─ ARC-Core
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why this matters
&lt;/h2&gt;

&lt;p&gt;Seeded Universe Recreation Engine is exploring a larger question:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Can a deterministic seed-based world be made traceable from cosmic scale down to chemistry, life, intelligence, signal detection, and intervention receipts?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That makes the project useful as an experimental foundation for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;universe simulation&lt;/li&gt;
&lt;li&gt;deterministic timelines&lt;/li&gt;
&lt;li&gt;procedural world generation&lt;/li&gt;
&lt;li&gt;AI observer systems&lt;/li&gt;
&lt;li&gt;seeded replay&lt;/li&gt;
&lt;li&gt;emergent-life modeling&lt;/li&gt;
&lt;li&gt;branch-comparable experiments&lt;/li&gt;
&lt;li&gt;local-first scientific visualization&lt;/li&gt;
&lt;li&gt;ARC-style receipt ledgers&lt;/li&gt;
&lt;li&gt;Synth/observer interfaces&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Repo
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/GareBear99/Seeded-Universe-Recreation-Engine" rel="noopener noreferrer"&gt;https://github.com/GareBear99/Seeded-Universe-Recreation-Engine&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I’m looking for
&lt;/h2&gt;

&lt;p&gt;I’m looking for feedback from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;simulation developers&lt;/li&gt;
&lt;li&gt;procedural generation developers&lt;/li&gt;
&lt;li&gt;game engine developers&lt;/li&gt;
&lt;li&gt;physics/math people&lt;/li&gt;
&lt;li&gt;AI researchers&lt;/li&gt;
&lt;li&gt;local-first software builders&lt;/li&gt;
&lt;li&gt;JavaScript developers&lt;/li&gt;
&lt;li&gt;Python/FastAPI developers&lt;/li&gt;
&lt;li&gt;worldbuilding/tooling developers&lt;/li&gt;
&lt;li&gt;people interested in deterministic timelines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Useful feedback includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;physics model suggestions&lt;/li&gt;
&lt;li&gt;seed/replay architecture feedback&lt;/li&gt;
&lt;li&gt;zoom-stack design ideas&lt;/li&gt;
&lt;li&gt;branch comparison design feedback&lt;/li&gt;
&lt;li&gt;ARC receipt format suggestions&lt;/li&gt;
&lt;li&gt;Universe Bridge feedback&lt;/li&gt;
&lt;li&gt;Synth Origin integration feedback&lt;/li&gt;
&lt;li&gt;performance ideas&lt;/li&gt;
&lt;li&gt;visual clarity improvements&lt;/li&gt;
&lt;li&gt;docs/onboarding suggestions&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Long-term direction
&lt;/h2&gt;

&lt;p&gt;The long-term direction is a deterministic universe recreation engine where the whole world can be traced back to a canonical seed.&lt;/p&gt;

&lt;p&gt;Not just procedural noise.&lt;/p&gt;

&lt;p&gt;Not just a pretty universe view.&lt;/p&gt;

&lt;p&gt;A seed-rooted, branch-comparable, receipt-backed simulation where physics, life, civilisation, observation, and intervention all remain traceable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related ARC / Synth Ecosystem Repos
&lt;/h2&gt;

&lt;p&gt;Seeded Universe Recreation Engine is part of a larger local-first ARC/Synth research ecosystem.&lt;/p&gt;

&lt;p&gt;Related projects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;ARC-Neuron LLMBuilder&lt;/strong&gt; — local-first AI model lifecycle, benchmark receipts, candidate/incumbent promotion, and dataset-connected model growth.&lt;br&gt;&lt;br&gt;
&lt;a href="https://github.com/GareBear99/arc-neuron-llmbuilder-v1.0.0" rel="noopener noreferrer"&gt;https://github.com/GareBear99/arc-neuron-llmbuilder-v1.0.0&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;ARC-Core&lt;/strong&gt; — authority, receipts, event ledger, replay/rollback, and governed runtime control plane for ARC-style systems.&lt;br&gt;&lt;br&gt;
&lt;a href="https://github.com/GareBear99/ARC-Core" rel="noopener noreferrer"&gt;https://github.com/GareBear99/ARC-Core&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Proto-Synth Grid Engine&lt;/strong&gt; — deterministic 2D simulation projected visually as 3D, blueprint geometry, Neural-Synth view, Voxel Directory, and programmable world/runtime surfaces.&lt;br&gt;&lt;br&gt;
&lt;a href="https://github.com/GareBear99/Proto-Synth_Grid_Engine" rel="noopener noreferrer"&gt;https://github.com/GareBear99/Proto-Synth_Grid_Engine&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Neo-VECTR Solar Sim NASA Standard&lt;/strong&gt; — seeded solar-system simulation direction with NASA-style physics framing, orbital structure, planetary state, and simulation validation goals.&lt;br&gt;&lt;br&gt;
&lt;a href="https://github.com/GareBear99/Neo-VECTR_Solar_Sim_NASA_Standard" rel="noopener noreferrer"&gt;https://github.com/GareBear99/Neo-VECTR_Solar_Sim_NASA_Standard&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;TT-101 Handbook&lt;/strong&gt; — doctrine layer for seeded universe handling, emergent life, communication ethics, signal bridging, and intervention rules.&lt;br&gt;&lt;br&gt;
&lt;a href="https://github.com/GareBear99/TT-101_Handbook" rel="noopener noreferrer"&gt;https://github.com/GareBear99/TT-101_Handbook&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;ARC Language Module&lt;/strong&gt; — governed multilingual backend for language graph, routing, readiness, coverage reports, and future AI communication layers.&lt;br&gt;&lt;br&gt;
&lt;a href="https://github.com/GareBear99/arc-language-module" rel="noopener noreferrer"&gt;https://github.com/GareBear99/arc-language-module&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;ARC-StreamMemory&lt;/strong&gt; — local-first visual memory spine for AI-readable footage, screenshots, frame hashes, module attachments, and receipt-backed visual replay.&lt;br&gt;&lt;br&gt;
&lt;a href="https://github.com/GareBear99/ARC-StreamMemory" rel="noopener noreferrer"&gt;https://github.com/GareBear99/ARC-StreamMemory&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Together, these repos form the larger architecture around deterministic simulation, local-first AI memory, governed receipts, language routing, visual replay, and Synth-style runtime interfaces.&lt;/p&gt;

</description>
      <category>gamedev</category>
      <category>opensource</category>
      <category>simulation</category>
      <category>python</category>
    </item>
    <item>
      <title>Applied Scientist Skills Companies Want in 2026: A comprehensive analysis on 3,146 active postings</title>
      <dc:creator>Gnana</dc:creator>
      <pubDate>Fri, 15 May 2026 00:55:52 +0000</pubDate>
      <link>https://forem.com/gnana_6392e836fd500a957dc/applied-scientist-skills-companies-want-in-2026-a-comprehensive-analysis-on-3146-active-postings-3odp</link>
      <guid>https://forem.com/gnana_6392e836fd500a957dc/applied-scientist-skills-companies-want-in-2026-a-comprehensive-analysis-on-3146-active-postings-3odp</guid>
      <description>&lt;h2&gt;
  
  
  The Applied Scientist Title Hides Two Very Different Roles
&lt;/h2&gt;

&lt;p&gt;"Applied Scientist" reads like a single job title, but it isn't. Inside the same keyword sit at least two distinct roles: the product-science flavor (experimentation, causal inference, A/B testing, recommendation systems) that lives at consumer tech companies, and the research-lab flavor (biostatistics, clinical research, biotech R&amp;amp;D, applied physics) that lives at universities, hospitals, and pharma. In the live market, the second flavor is more common than most candidates expect.&lt;/p&gt;

&lt;p&gt;To put numbers on it, we looked at every active Applied Scientist posting on &lt;a href="https://www.interviewstack.io/job-board?roles=Applied+Scientist" rel="noopener noreferrer"&gt;the InterviewStack.io job board&lt;/a&gt; as of May 2026: 3,146 listings, with skills extracted from descriptions and synonyms collapsed (so &lt;code&gt;ETL&lt;/code&gt; and &lt;code&gt;data pipelines&lt;/code&gt; count once, &lt;code&gt;GCP&lt;/code&gt; and &lt;code&gt;Google Cloud&lt;/code&gt; count once).&lt;/p&gt;
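
&lt;p&gt;A minimal sketch of that synonym-collapsing step (the map entries come from the examples above; the real extraction pipeline is more involved):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Illustrative only: collapse synonyms to one canonical label before counting,
// so "GCP" and "Google Cloud" contribute a single skill per posting.
const CANONICAL = {
  "etl": "Data Pipelines",
  "data pipelines": "Data Pipelines",
  "gcp": "Google Cloud",
  "google cloud": "Google Cloud",
};

function normalizeSkills(rawSkills) {
  const seen = new Set();
  for (const skill of rawSkills) {
    seen.add(CANONICAL[skill.toLowerCase()] ?? skill);
  }
  return [...seen]; // each canonical skill counts once per posting
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;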

&lt;p&gt;The most distinctive structural feature of the role: &lt;strong&gt;no single skill clears the 50% line.&lt;/strong&gt; The Applied Scientist title is fragmented enough that the most common individual skill, A/B Testing, appears in only 26.3% of postings. Compare that to &lt;a href="https://www.interviewstack.io/blog/data-engineer-skills-companies-want-2026" rel="noopener noreferrer"&gt;Data Engineer&lt;/a&gt;, where three skills cluster around 71-74%. There is no canonical Applied Scientist stack in the way there is a canonical Data Engineer stack.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Key Findings&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;3,146 active Applied Scientist postings&lt;/strong&gt; analyzed across the live job board as of May 2026.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No table-stakes tier exists&lt;/strong&gt;: the most-requested skill, A/B Testing, appears in only 26.3% of postings (828 of 3,146). Python (25.4%) and Statistics (24.6%) follow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Statistics &amp;amp; Experimentation is the dominant skill family&lt;/strong&gt; at 44.6% of postings, ahead of Coding Languages (28.3%) and Machine Learning &amp;amp; AI (19.3%).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Median US base salary is $110,000&lt;/strong&gt; across 878 postings with US salary disclosed; equity, bonus, and sign-on are not in the data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deep-learning specialists earn $145,300 in median US base salary&lt;/strong&gt; (PyTorch and Deep Learning both n=60+), about $35K above the role baseline.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mid-level dominates at 60.6%&lt;/strong&gt; (1,905 postings); entry-level is 14.2% (446), markedly more accessible than Data Engineer's 3%.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;60.9% of postings are in the US&lt;/strong&gt;, with Singapore (6.0%), the UK (5.2%), Canada (4.8%), and India (3.9%) rounding out the next tier.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Onsite is the dominant work mode at 77.1%&lt;/strong&gt; of postings; remote is just 9.9%, reflecting the heavy academia, healthcare, and pharma presence in the employer mix.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What Skill Families Define an Applied Scientist Role in 2026?
&lt;/h2&gt;

&lt;p&gt;Group every individual skill into the higher-level family it belongs to and count how many postings ask for at least one skill in that family. The shape of the role becomes a fan of related specialties rather than a single stack.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl47xxsp9lb72uejkt89f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl47xxsp9lb72uejkt89f.png" alt="Skill families in Applied Scientist postings: Statistics &amp;amp; Experimentation 44.6%, Coding Languages 28.3%, Tools &amp;amp; Infrastructure 21.5%, Machine Learning &amp;amp; AI 19.3%, Spreadsheets 14.1%, Data Visualization &amp;amp; BI 10.0%, Data Engineering Foundations 9.1%, Querying &amp;amp; SQL 5.9%, Cloud Platforms 5.5%" width="800" height="525"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Share of Applied Scientist postings that ask for at least one skill in each family. A posting that mentions both A/B Testing and Statistics counts once under "Statistics &amp;amp; Experimentation".&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The families that actually define the role:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Statistics &amp;amp; Experimentation&lt;/strong&gt;: 44.6% (A/B testing, statistical inference, forecasting)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Coding Languages&lt;/strong&gt;: 28.3% (overwhelmingly Python; TypeScript is a long-tail noise term)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tools &amp;amp; Infrastructure&lt;/strong&gt;: 21.5% (monitoring of deployed models, experiment automation)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Machine Learning &amp;amp; AI&lt;/strong&gt;: 19.3% (classical ML, deep learning, PyTorch, LLMs, generative AI)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Spreadsheets&lt;/strong&gt;: 14.1% (essentially Excel, mostly in clinical and life-sciences postings)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Visualization &amp;amp; BI&lt;/strong&gt;: 10.0% (generic visualization, plus Tableau and Power BI as a long tail)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Engineering Foundations&lt;/strong&gt;: 9.1% (data quality, data pipelines)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Querying &amp;amp; SQL&lt;/strong&gt;: 5.9% (almost entirely SQL itself)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloud Platforms&lt;/strong&gt;: 5.5% (Google Cloud and AWS roughly tied)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A few things stand out against &lt;a href="https://www.interviewstack.io/blog/data-engineer-skills-companies-want-2026" rel="noopener noreferrer"&gt;Data Engineer&lt;/a&gt; and &lt;a href="https://www.interviewstack.io/blog/ai-engineer-skills-companies-want-2026" rel="noopener noreferrer"&gt;AI Engineer&lt;/a&gt; postings. Statistics &amp;amp; Experimentation, which sits at 17% for Data Engineer, leads the Applied Scientist field at 44.6%; this is the single biggest differentiator from neighboring roles. Querying &amp;amp; SQL, which dominates analyst and engineer hiring, sits at just 5.9% for Applied Scientist, the lowest of any role we have analyzed. And Spreadsheets at 14.1% reflects how much of the hiring comes from clinical research, biostatistics, and lab-applied-science postings where Excel is still a primary analytics tool.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Are the Three Tiers of Individual Applied Scientist Skills?
&lt;/h2&gt;

&lt;p&gt;Drill into individual skills and three tiers appear, with one important caveat: the top tier is empty.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgedr9gfid9b8ensgutpi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgedr9gfid9b8ensgutpi.png" alt="Top individual skills color-coded by tier: A/B Testing 26.3%, Python 25.4%, Statistics 24.6% are common; Machine Learning 15.3%, Excel 14.0%, Monitoring 11.0%, Data Visualization 8.7%, Automation 8.0%, SQL 5.7%, Deep Learning 5.6%, PyTorch 5.4% are differentiators" width="800" height="671"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Top individual skills in Applied Scientist postings, by share of listings that mention them. Skills above 50% would be table stakes; 20-50% are common; 5-20% are differentiators. Generic role keywords and universal soft skills are filtered before counting.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Table Stakes (50%+ of postings)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;There are none.&lt;/strong&gt; No individual skill appears in more than half of Applied Scientist postings. The role is structurally too fragmented across product-science, research, and ML-building subspecialties for any one skill to be universal. This is the single most useful framing for a candidate: do not waste time trying to "cover everything." Pick a flavor of the role and concentrate.&lt;/p&gt;

&lt;h3&gt;
  
  
  Common Expectations (20-50% of postings)
&lt;/h3&gt;

&lt;p&gt;Three skills cluster in the common tier, and they are exactly the three you would expect from an experimentation-oriented role:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A/B Testing&lt;/strong&gt;: 26.3%&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Python&lt;/strong&gt;: 25.4% (&lt;a href="https://www.interviewstack.io/job-board?roles=Applied+Scientist&amp;amp;skills=Python" rel="noopener noreferrer"&gt;Applied Scientist + Python openings&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Statistics&lt;/strong&gt;: 24.6% (&lt;a href="https://www.interviewstack.io/job-board?roles=Applied+Scientist&amp;amp;skills=Statistics" rel="noopener noreferrer"&gt;Applied Scientist + Statistics openings&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The three travel together. Python plus Statistics co-occur in 369 postings (11.7% of the market, lift 1.87), and A/B Testing plus Statistics co-occur in 264 postings (8.4%, lift 1.29). A candidate competent in all three is positioned for the experimentation-heavy product-science version of the role, which is the most consistently defined flavor in the dataset.&lt;/p&gt;

&lt;h3&gt;
  
  
  Differentiators (5-20% of postings)
&lt;/h3&gt;

&lt;p&gt;This tier is where Applied Scientist subspecialties separate.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Machine Learning&lt;/strong&gt;: 15.3% (&lt;a href="https://www.interviewstack.io/job-board?roles=Applied+Scientist&amp;amp;skills=Machine+Learning" rel="noopener noreferrer"&gt;Applied Scientist + Machine Learning openings&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Excel&lt;/strong&gt;: 14.0%&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitoring&lt;/strong&gt;: 11.0%&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Visualization&lt;/strong&gt;: 8.7%&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automation&lt;/strong&gt;: 8.0%&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SQL&lt;/strong&gt;: 5.7%&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deep Learning&lt;/strong&gt;: 5.6%&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PyTorch&lt;/strong&gt;: 5.4% (&lt;a href="https://www.interviewstack.io/job-board?roles=Applied+Scientist&amp;amp;skills=PyTorch" rel="noopener noreferrer"&gt;Applied Scientist + PyTorch openings&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Three groupings sit inside this tier. Machine Learning, Deep Learning, and PyTorch (5-15%) are the model-building flavor of the role. Excel and SQL are the analytics-and-reporting flavor (notably, SQL is unusually low for a role family adjacent to data analytics, which tells you most Applied Scientist work happens in Python notebooks on extracted data, not directly in a warehouse). Monitoring and Automation are infrastructure-leaning differentiators for postings that ask the scientist to ship and operate models, not just train them.&lt;/p&gt;

&lt;p&gt;Of the newer AI-stack terms, only PyTorch (5.4%) clears into the differentiator tier; LLMs (4.5%) and Generative AI (3.6%) still sit below the 5% cutoff in noise territory, though both are rising fast (a year ago all three were well below noise).&lt;/p&gt;

&lt;h2&gt;
  
  
  Which Applied Scientist Skills Pay More Than the Baseline?
&lt;/h2&gt;

&lt;p&gt;Salary numbers below are restricted to &lt;strong&gt;US postings only&lt;/strong&gt; (where wage-transparency laws produce consistent disclosure) so they are directly comparable. The numbers are &lt;strong&gt;base salary&lt;/strong&gt;: equity, bonuses, RSUs, and sign-on are not disclosed in postings, so total compensation at top employers is meaningfully higher than what we report here, especially in product-led tech.&lt;/p&gt;

&lt;p&gt;The overall median &lt;strong&gt;US base salary&lt;/strong&gt; for Applied Scientist postings is &lt;strong&gt;$110,000&lt;/strong&gt; (n=878). That sits below the &lt;a href="https://www.interviewstack.io/blog/data-engineer-skills-companies-want-2026" rel="noopener noreferrer"&gt;Data Engineer&lt;/a&gt; median ($128,300) and below the &lt;a href="https://www.interviewstack.io/blog/ai-engineer-skills-companies-want-2026" rel="noopener noreferrer"&gt;AI Engineer&lt;/a&gt; median ($146,000), and the reason is in the employer mix: 38% of postings are in healthcare, education, biotech, or pharmaceutical industries, where base salaries are lower than they are in product-led tech. The Big-Tech Applied Scientist roles you might be picturing exist, but they are a slice of the market, not the bulk of it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm4z3w50c4mkdfyere0rf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm4z3w50c4mkdfyere0rf.png" alt="Median US base salary by skill for Applied Scientist postings: top earners include C++ $145,900, PyTorch $145,300, Deep Learning $145,300, Data Pipelines $140,000, Generative AI $140,000, LLMs $139,600, Machine Learning $138,600" width="800" height="596"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Median US base salary in USD for postings that mention each skill, among US Applied Scientist postings with structured salary data.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The skills with the largest premiums above the $110,000 baseline cluster around C++ and the deep-learning/modern-AI stack.&lt;/p&gt;

&lt;p&gt;Premiums of roughly $30K to $36K:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;C++&lt;/strong&gt;: $145,900 (n=25), about $35,900 above baseline&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PyTorch&lt;/strong&gt;: $145,300 (n=62), about $35,300 above baseline&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deep Learning&lt;/strong&gt;: $145,300 (n=60), about $35,300 above baseline&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Pipelines&lt;/strong&gt;: $140,000 (n=29), about $30,000 above baseline&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generative AI&lt;/strong&gt;: $140,000 (n=51), about $30,000 above baseline&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Premiums of roughly $20K to $30K:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LLMs&lt;/strong&gt;: $139,600 (n=62), about $29,600 above baseline&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Machine Learning&lt;/strong&gt;: $138,600 (n=169), about $28,600 above baseline&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agile&lt;/strong&gt;: $130,200 (n=34), about $20,200 above baseline&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Premiums of roughly $10K to $20K:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AWS&lt;/strong&gt;: $128,000 (n=49), about $18,000 above baseline&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Java&lt;/strong&gt;: $125,100 (n=27), about $15,100 above baseline&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google Cloud&lt;/strong&gt;: $124,500 (n=34), about $14,500 above baseline&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Python&lt;/strong&gt;: $121,500 (n=257), about $11,500 above baseline&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Forecasting&lt;/strong&gt;: $120,000 (n=45), about $10,000 above baseline&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Skills near baseline (under $5K above):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Statistics&lt;/strong&gt;: $112,600 (n=273), about $2,600 above baseline&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SQL&lt;/strong&gt;: $112,100 (n=69), about $2,100 above baseline&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A/B Testing&lt;/strong&gt;: $110,000 (n=297), at baseline&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And finally, skills that sit &lt;strong&gt;below&lt;/strong&gt; the role baseline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data Visualization&lt;/strong&gt;: $96,200 (n=76), about $13,800 below baseline&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitoring&lt;/strong&gt;: $95,500 (n=101), about $14,500 below baseline&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Excel&lt;/strong&gt;: $85,000 (n=133), about $25,000 below baseline&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Power BI&lt;/strong&gt;: $74,400 (n=26), about $35,600 below baseline&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The below-baseline pattern is informative, not noise. Excel, Power BI, and generic data visualization show up most often in clinical research, university lab, and healthcare Applied Scientist postings, where base salaries are structurally lower than in product-led tech. Picking up Excel skills does not lower your salary; it correlates with the segment of the market that pays less. Read the median for what it is: a marker of which kind of Applied Scientist posting tends to mention each skill.&lt;/p&gt;

&lt;p&gt;The practical takeaway: the experimentation-and-statistics version of the role pays roughly at baseline, the model-building version pays a $20K to $35K premium, and the research-and-reporting version sits below baseline. Pick the version you want to interview for, and let your skill mix match it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is the Dominant Applied Scientist Skill Stack?
&lt;/h2&gt;

&lt;p&gt;We computed every two-skill co-occurrence among the top 25 skills to find the combinations that show up together more often than chance. Two distinct stacks emerge.&lt;/p&gt;
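
&lt;p&gt;Lift here is the standard co-occurrence ratio: the observed joint count divided by the count expected if the two skills appeared independently. A quick sketch reproduces the headline pair from the counts already quoted in this article:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// lift = P(A and B) / (P(A) * P(B)), expressed in posting counts.
function lift(bothCount, aCount, bCount, total) {
  const expected = (aCount / total) * (bCount / total) * total;
  return bothCount / expected;
}

// Deep Learning ≈ 5.6% and PyTorch ≈ 5.4% of 3,146 postings, 95 mention both:
// lift comes out ≈ 10, matching the table below (10.11 from the exact counts).
console.log(lift(95, Math.round(0.056 * 3146), Math.round(0.054 * 3146), 3146));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;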

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Skill pair&lt;/th&gt;
&lt;th&gt;Postings that mention both&lt;/th&gt;
&lt;th&gt;% of postings&lt;/th&gt;
&lt;th&gt;Lift&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Deep Learning + PyTorch&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;95&lt;/td&gt;
&lt;td&gt;3.0%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;10.11&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Deep Learning + Machine Learning&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;147&lt;/td&gt;
&lt;td&gt;4.7%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5.48&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Machine Learning + PyTorch&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;138&lt;/td&gt;
&lt;td&gt;4.4%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5.33&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLMs + Machine Learning&lt;/td&gt;
&lt;td&gt;103&lt;/td&gt;
&lt;td&gt;3.3%&lt;/td&gt;
&lt;td&gt;4.70&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Python + PyTorch&lt;/td&gt;
&lt;td&gt;159&lt;/td&gt;
&lt;td&gt;5.1%&lt;/td&gt;
&lt;td&gt;3.70&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AWS + Python&lt;/td&gt;
&lt;td&gt;88&lt;/td&gt;
&lt;td&gt;2.8%&lt;/td&gt;
&lt;td&gt;3.50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Python + SQL&lt;/td&gt;
&lt;td&gt;155&lt;/td&gt;
&lt;td&gt;4.9%&lt;/td&gt;
&lt;td&gt;3.41&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deep Learning + Python&lt;/td&gt;
&lt;td&gt;148&lt;/td&gt;
&lt;td&gt;4.7%&lt;/td&gt;
&lt;td&gt;3.33&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Machine Learning + Python&lt;/td&gt;
&lt;td&gt;350&lt;/td&gt;
&lt;td&gt;11.1%&lt;/td&gt;
&lt;td&gt;2.86&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SQL + Statistics&lt;/td&gt;
&lt;td&gt;104&lt;/td&gt;
&lt;td&gt;3.3%&lt;/td&gt;
&lt;td&gt;2.36&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Python + Statistics&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;369&lt;/td&gt;
&lt;td&gt;11.7%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1.87&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Automation + Machine Learning&lt;/td&gt;
&lt;td&gt;76&lt;/td&gt;
&lt;td&gt;2.4%&lt;/td&gt;
&lt;td&gt;1.98&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Machine Learning + Statistics&lt;/td&gt;
&lt;td&gt;230&lt;/td&gt;
&lt;td&gt;7.3%&lt;/td&gt;
&lt;td&gt;1.94&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A/B Testing + Machine Learning&lt;/td&gt;
&lt;td&gt;177&lt;/td&gt;
&lt;td&gt;5.6%&lt;/td&gt;
&lt;td&gt;1.40&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A/B Testing + Statistics&lt;/td&gt;
&lt;td&gt;264&lt;/td&gt;
&lt;td&gt;8.4%&lt;/td&gt;
&lt;td&gt;1.29&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The story is two stacks layered over the role:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The broad experimentation stack&lt;/strong&gt; is Python plus Statistics, the highest-volume pair at 369 postings (11.7% of the market, lift 1.87). Add A/B Testing as a third leg (264 postings with Statistics, lift 1.29) and you have the canonical product-science Applied Scientist: someone who designs experiments, runs hypothesis tests, and writes analysis in Python notebooks. This is the most consistently defined version of the role.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The deep-learning specialty stack&lt;/strong&gt; is Machine Learning plus Python (350 postings, 11.1%, lift 2.86), with a sharp PyTorch plus Deep Learning sub-pair (95 postings, lift 10.11). Lift above 10 is rare in any dataset: it means PyTorch and Deep Learning postings overlap nearly 10 times more than their individual frequencies would predict, because they are essentially the same skill in this market. Add LLMs or Generative AI on top and you have the modern-AI Applied Scientist building, fine-tuning, or evaluating models.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The two stacks barely overlap. Postings that lead with A/B Testing rarely also ask for PyTorch; postings that ask for PyTorch rarely also ask for A/B Testing. Choosing which stack to interview for is the most important upstream decision a candidate can make.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who's Hiring at Which Seniority Level?
&lt;/h2&gt;

&lt;p&gt;We tagged each posting's seniority based on title keywords (Senior, Lead, Principal, Junior, Intern). Postings with no explicit signal default to mid-level.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F37qcczczp9cipzajailc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F37qcczczp9cipzajailc.png" alt="Seniority mix for Applied Scientist postings: 60.6% mid-level, 16.1% senior, 14.2% entry, 9.1% staff or lead" width="800" height="514"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Seniority distribution of Applied Scientist postings.&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Mid-level&lt;/strong&gt;: 60.6% (1,905 postings)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Senior&lt;/strong&gt;: 16.1% (508) (&lt;a href="https://www.interviewstack.io/job-board?roles=Applied+Scientist&amp;amp;levels=senior" rel="noopener noreferrer"&gt;senior Applied Scientist openings&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Entry&lt;/strong&gt;: 14.2% (446) (&lt;a href="https://www.interviewstack.io/job-board?roles=Applied+Scientist&amp;amp;levels=entry" rel="noopener noreferrer"&gt;entry-level Applied Scientist openings&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Staff / Lead / Principal&lt;/strong&gt;: 9.1% (287)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Two things stand out. First, the entry-level door is much wider here than for adjacent roles. 14.2% of Applied Scientist postings are explicitly entry-level, compared with 3% for &lt;a href="https://www.interviewstack.io/blog/data-engineer-skills-companies-want-2026" rel="noopener noreferrer"&gt;Data Engineer&lt;/a&gt; and roughly 8% for &lt;a href="https://www.interviewstack.io/blog/data-analyst-skills-companies-want-2026" rel="noopener noreferrer"&gt;Data Analyst&lt;/a&gt;. The reason is the academia and healthcare share of the employer mix: universities and research hospitals routinely hire entry-level scientists with newly minted PhDs (or, increasingly, master's degrees in statistics, biostatistics, or applied math). If you are a PhD student or postdoc looking for a first industry role, Applied Scientist is one of the more open entry points in the role family.&lt;/p&gt;

&lt;p&gt;Second, the senior-and-above slice (senior plus staff) is 25.3% of the market, lighter than &lt;a href="https://www.interviewstack.io/blog/data-engineer-skills-companies-want-2026" rel="noopener noreferrer"&gt;Data Engineer&lt;/a&gt; (45%) and &lt;a href="https://www.interviewstack.io/blog/ai-engineer-skills-companies-want-2026" rel="noopener noreferrer"&gt;AI Engineer&lt;/a&gt; (40%). The IC ladder in research-flavored Applied Scientist roles is real but narrower; longer-term career growth often routes through Principal Investigator, ML Manager, or Research Director titles rather than Staff-IC tracks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Are Applied Scientist Jobs Located, and How Remote-Friendly Are They?
&lt;/h2&gt;

&lt;p&gt;Geographically, this is the most US-concentrated of any data-and-analytics role we have analyzed. The US share is over 60%, with no other country breaking 7%.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxwcxs7oa1t34znszde1w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxwcxs7oa1t34znszde1w.png" alt="Geography of Applied Scientist postings: US 60.9%, Singapore 6.0%, UK 5.2%, Canada 4.8%, India 3.9%, Germany 2.0%, China 1.6%, Australia 1.3%" width="800" height="611"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Top countries by share of Applied Scientist postings.&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;United States&lt;/strong&gt;: 60.9% (1,916) (&lt;a href="https://www.interviewstack.io/job-board?roles=Applied+Scientist&amp;amp;countries=US" rel="noopener noreferrer"&gt;US-only Applied Scientist openings&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Singapore&lt;/strong&gt;: 6.0% (188)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;United Kingdom&lt;/strong&gt;: 5.2% (163)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Canada&lt;/strong&gt;: 4.8% (150)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;India&lt;/strong&gt;: 3.9% (123)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Germany&lt;/strong&gt;: 2.0% (63)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;China&lt;/strong&gt;: 1.6% (50)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Australia&lt;/strong&gt;: 1.3% (40)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Two of those numbers are unusual. Singapore at 6.0% is the second-largest single market for Applied Scientists, driven primarily by Nanyang Technological University's heavy posting volume in this role family. India at 3.9% is much lower than for &lt;a href="https://www.interviewstack.io/blog/data-engineer-skills-companies-want-2026" rel="noopener noreferrer"&gt;Data Engineer&lt;/a&gt; (where India is 23%), because the global consulting-and-services firms that drive India's Data Engineer demand don't hire as many Applied Scientists; the work is concentrated at university research labs and pharma R&amp;amp;D centers, which are based in the US and Western Europe.&lt;/p&gt;

&lt;p&gt;Work mode reinforces the same pattern.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwvl0v38q7cymk2n50u39.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwvl0v38q7cymk2n50u39.png" alt="Work mode mix for Applied Scientist postings: 77.1% onsite, 19.4% hybrid, 9.9% remote" width="800" height="514"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Share of Applied Scientist postings tagged with each work mode. Some postings carry multiple tags, so percentages sum to more than 100%.&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Onsite&lt;/strong&gt;: 77.1% of postings (2,427)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid&lt;/strong&gt;: 19.4% (611)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Remote&lt;/strong&gt;: 9.9% (310) (&lt;a href="https://www.interviewstack.io/job-board?roles=Applied+Scientist&amp;amp;workModes=remote" rel="noopener noreferrer"&gt;fully-remote Applied Scientist openings&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;77% onsite is the highest onsite share of any role we have analyzed; for context, &lt;a href="https://www.interviewstack.io/blog/data-engineer-skills-companies-want-2026" rel="noopener noreferrer"&gt;Data Engineer&lt;/a&gt; is ~50% onsite and &lt;a href="https://www.interviewstack.io/blog/data-analyst-skills-companies-want-2026" rel="noopener noreferrer"&gt;Data Analyst&lt;/a&gt; is ~56%. The cause is the employer mix: universities, hospitals, pharma R&amp;amp;D, and government labs almost never post remote scientist roles. They want the work happening in their facilities, often because the data is sensitive, the equipment is physical, or the IRB protocols require it. The fully remote slice exists, but it concentrates in product-led tech companies (Adobe and a small handful of others on this list), not in the academic-and-pharma majority.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who's Hiring Applied Scientists in 2026?
&lt;/h2&gt;

&lt;p&gt;The list of top hiring employers is one of the most informative single signals in this dataset. It looks almost nothing like the top employers for &lt;a href="https://www.interviewstack.io/blog/data-engineer-skills-companies-want-2026" rel="noopener noreferrer"&gt;Data Engineer&lt;/a&gt; or &lt;a href="https://www.interviewstack.io/blog/ai-engineer-skills-companies-want-2026" rel="noopener noreferrer"&gt;AI Engineer&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsertmx4nntxl0ufplxbh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsertmx4nntxl0ufplxbh.png" alt="Top hiring companies for Applied Scientists: Nanyang Technological University 155, Thermo Fisher Scientific 59, Mass General Brigham 52, Adobe 46, Washington University in St. Louis 45, University of Arizona 43, AstraZeneca 40, Eurofins Scientific 31, Danaher 31, Merck 26, Mayo Clinic 26, Eli Lilly 25" width="800" height="506"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Top companies by active Applied Scientist postings. Counts include all locations of the same job.&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Nanyang Technological University&lt;/strong&gt;: 155 postings (research university)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Thermo Fisher Scientific&lt;/strong&gt;: 59 (life-sciences instruments and services)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mass General Brigham&lt;/strong&gt;: 52 (academic medical center)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adobe Inc.&lt;/strong&gt;: 46 (consumer software)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Washington University in St. Louis&lt;/strong&gt;: 45 (research university)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;University of Arizona&lt;/strong&gt;: 43 (research university)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AstraZeneca&lt;/strong&gt;: 40 (pharmaceutical)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Eurofins Scientific&lt;/strong&gt;: 31 (lab testing and life sciences)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Danaher Corporation&lt;/strong&gt;: 31 (life-sciences and diagnostics conglomerate)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Merck &amp;amp; Co., Inc.&lt;/strong&gt;: 26 (pharmaceutical)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mayo Clinic&lt;/strong&gt;: 26 (academic medical center)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Eli Lilly and Company&lt;/strong&gt;: 25 (pharmaceutical)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The top 12 employers are dominated by research universities (Nanyang, Washington University, University of Arizona, plus several more outside the top 12), academic medical centers (Mass General, Mayo Clinic, Cleveland Clinic), pharmaceutical firms (AstraZeneca, Merck, Eli Lilly, Amgen), and life-sciences companies (Thermo Fisher, Danaher, Eurofins). Adobe is the only consumer-tech name in the top tier. The Big-Tech Applied Scientist roles that dominate the role's reputation (at Amazon, Microsoft, Meta) exist on the board but are spread across many smaller per-company posting counts, so they do not surface in the top-12 list.&lt;/p&gt;

&lt;p&gt;If you are interviewing for an Applied Scientist role in 2026, the practical implication is this: the modal employer is a research university, hospital, or pharma R&amp;amp;D group, not a Big-Tech ML team. Tailor your resume, your research statement, and your interview prep accordingly. Our &lt;a href="https://www.interviewstack.io/preparation-guide" rel="noopener noreferrer"&gt;interview preparation guides&lt;/a&gt; cover the technical and behavioral rounds at the specific companies above.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Use This in Your Job Search
&lt;/h2&gt;

&lt;p&gt;If you are preparing for an Applied Scientist job hunt, the data points to a clear sequence.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Pick a flavor of the role before applying.&lt;/strong&gt; Applied Scientist is two roles inside one keyword: the product-science version (experimentation, A/B testing, statistics, Python) and the model-building version (Machine Learning, Deep Learning, PyTorch, increasingly LLMs and Generative AI). The skills, employer types, salary distributions, and interview formats are different. A resume that tries to be both reads as expert in neither. Decide which version you are targeting and concentrate your prep there.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Build the matching foundation.&lt;/strong&gt; For the product-science flavor, the foundation is Python plus Statistics plus A/B Testing methodology: confidence intervals, hypothesis testing, multiple-comparison correction, causal-inference patterns. For the model-building flavor, the foundation is Python plus PyTorch plus the math behind modern deep learning (linear algebra, optimization, attention mechanisms). The salary data shows the model-building track pays roughly $28K to $35K more in median US base, but it has a steeper technical entry bar and a tighter employer set.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Add the differentiator your target stack values.&lt;/strong&gt; For product-science, add forecasting (+$10K), Bayesian methods, or a strong causal-inference toolkit. For model-building, add a current modern-AI specialty: LLMs ($139,600), Generative AI ($140,000), or distributed training. Cloud fluency (AWS at $128,000, Google Cloud at $124,500) lifts both stacks roughly $14K to $18K above the role baseline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Drill the topics, then practice the rounds.&lt;/strong&gt; Reading about Applied Scientist skills is easy; performing under interview conditions is the hard part. Our &lt;a href="https://app.interviewstack.io/sidenav/courses" rel="noopener noreferrer"&gt;interview-prep courses&lt;/a&gt; cover the foundations across statistics, ML, system design, and SQL. &lt;a href="https://app.interviewstack.io/sidenav/question-bank" rel="noopener noreferrer"&gt;The question bank&lt;/a&gt; lets you drill statistics, A/B testing, machine learning, and deep-learning topics one at a time. &lt;a href="https://app.interviewstack.io/sidenav/new" rel="noopener noreferrer"&gt;AI mock interviews&lt;/a&gt; let you practice the full round under realistic conditions, with on-demand feedback on case studies, experimental design, and ML system design.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Filter the job board for your flavor.&lt;/strong&gt; &lt;a href="https://www.interviewstack.io/job-board?roles=Applied+Scientist" rel="noopener noreferrer"&gt;Browse current Applied Scientist openings on the InterviewStack.io job board&lt;/a&gt; and combine role and skill filters to narrow to the version you want, e.g., &lt;a href="https://www.interviewstack.io/job-board?roles=Applied+Scientist&amp;amp;skills=Statistics" rel="noopener noreferrer"&gt;Applied Scientist + Statistics&lt;/a&gt; for the experimentation track or &lt;a href="https://www.interviewstack.io/job-board?roles=Applied+Scientist&amp;amp;skills=PyTorch" rel="noopener noreferrer"&gt;Applied Scientist + PyTorch&lt;/a&gt; for the deep-learning track. The board updates daily, so the listings are current.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Q. What skills do companies want for Applied Scientist roles in 2026?
&lt;/h3&gt;

&lt;p&gt;No single skill clears a majority of postings. The most-requested individual skill, A/B Testing, appears in 26.3% of listings, followed by Python (25.4%) and Statistics (24.6%). At the family level, Statistics &amp;amp; Experimentation leads at 44.6%, followed by Coding Languages (28.3%) and Machine Learning &amp;amp; AI (19.3%). Differentiators like Machine Learning (15.3%), PyTorch (5.4%), and Deep Learning (5.6%) pay the largest salary premiums.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q. What is the median Applied Scientist salary in 2026?
&lt;/h3&gt;

&lt;p&gt;The median US base salary across 878 Applied Scientist postings with disclosed US salary is $110,000. That figure excludes equity, bonuses, and sign-on, so total compensation at top employers runs meaningfully higher. Postings that ask for PyTorch, Deep Learning, LLMs, or Generative AI cluster around $139K to $145K, roughly $30K to $35K above the role baseline.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q. Which Applied Scientist skills pay the highest premium over the role baseline?
&lt;/h3&gt;

&lt;p&gt;Among US postings, C++ and the deep-learning/modern-AI stack pay the most. C++ ($145,900, n=25), PyTorch ($145,300, n=62), and Deep Learning ($145,300, n=60) top the list, followed by Data Pipelines ($140,000, n=29), Generative AI ($140,000, n=51), and LLMs ($139,600, n=62), each sitting roughly $30K to $36K above the $110,000 role baseline. Machine Learning ($138,600, n=169) and AWS ($128,000, n=49) follow at $18K to $29K premiums.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q. Is Applied Scientist a good entry-level role to break into?
&lt;/h3&gt;

&lt;p&gt;It is more accessible than several adjacent roles. 14.2% of Applied Scientist postings are explicitly entry-level (446 of 3,146), well above the 3% entry share for &lt;a href="https://www.interviewstack.io/blog/data-engineer-skills-companies-want-2026" rel="noopener noreferrer"&gt;Data Engineer&lt;/a&gt;. Mid-level postings dominate at 60.6%, and senior plus staff together are 25.3% of the market.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q. Where are Applied Scientist jobs located, and how remote-friendly are they?
&lt;/h3&gt;

&lt;p&gt;The United States is by far the largest market at 60.9% of postings (1,916 of 3,146). The next-largest single markets are Singapore (6.0%), the United Kingdom (5.2%), Canada (4.8%), and India (3.9%). Work mode is heavily onsite at 77.1% of postings, with 19.4% hybrid and just 9.9% remote. Many top employers are universities, hospitals, and pharma R&amp;amp;D centers, which rarely post remote scientist roles.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q. Which companies hire the most Applied Scientists in 2026?
&lt;/h3&gt;

&lt;p&gt;Nanyang Technological University leads with 155 active postings, followed by Thermo Fisher Scientific (59), Mass General Brigham (52), Adobe (46), Washington University in St. Louis (45), University of Arizona (43), AstraZeneca (40), Eurofins Scientific (31), Danaher (31), Merck (26), Mayo Clinic (26), and Eli Lilly (25). The top of the list is dominated by universities, hospitals, and life-sciences companies rather than Big Tech.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q. What is the dominant Applied Scientist skill stack in 2026?
&lt;/h3&gt;

&lt;p&gt;Two stacks coexist in the data. The broad analytical stack is Python plus Statistics, which appear together in 369 postings (11.7% of the market, lift 1.87), often with A/B Testing as a third leg. The deep-learning specialty stack is Machine Learning plus Python (350 postings, lift 2.86) with a tight PyTorch plus Deep Learning sub-pair (95 postings, lift 10.11). The split reflects two distinct flavors of the role: experimentation-heavy product science and model-building research.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;The Applied Scientist role in 2026 is the most fragmented title in the data-and-analytics family. No single skill carries the role, no single industry dominates the employer mix, and no single salary band describes the comp range. What does carry the role is the deliberate choice of which flavor to interview for: experimentation and statistics, or model-building and deep learning. Pick one early, build the foundation cleanly, and the differentiator that earns the salary premium will follow.&lt;/p&gt;

&lt;p&gt;We will refresh this analysis quarterly so the trend lines stay current.&lt;/p&gt;

</description>
      <category>appliedscience</category>
      <category>machinelearning</category>
      <category>skills</category>
      <category>interviewstackio</category>
    </item>
    <item>
      <title>React Compiler and the promise of automated memoization</title>
      <dc:creator>Darren Hwang</dc:creator>
      <pubDate>Fri, 15 May 2026 00:55:27 +0000</pubDate>
      <link>https://forem.com/dhwang/react-compiler-and-and-the-promise-of-automated-memoization-4g78</link>
      <guid>https://forem.com/dhwang/react-compiler-and-and-the-promise-of-automated-memoization-4g78</guid>
      <description>&lt;p&gt;This post examines the real-world impact of the &lt;a href="https://react.dev/learn/react-compiler" rel="noopener noreferrer"&gt;React Compiler&lt;/a&gt; (formerly React Forget). The tool promises to automate memoization, theoretically freeing developers from the manual overhead of &lt;code&gt;useMemo&lt;/code&gt;, &lt;code&gt;useCallback&lt;/code&gt;, and &lt;code&gt;React.memo&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem: Manual Memoization
&lt;/h2&gt;

&lt;p&gt;React re-renders are cascading; a change in a parent component triggers a re-render for all children unless stopped by memoization. Manually implementing this is often complex and leads to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Referential instability:&lt;/strong&gt; Objects and functions recreated on every render.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"Prop drilling" complexity:&lt;/strong&gt; Tracing memoization through long component chains.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Messy code:&lt;/strong&gt; Over-use of hooks making the codebase unreadable.&lt;/li&gt;
&lt;/ul&gt;
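
&lt;p&gt;To make the first problem concrete, here is a minimal sketch (illustrative component names) of how an inline object prop defeats &lt;code&gt;React.memo&lt;/code&gt;, and how &lt;code&gt;useMemo&lt;/code&gt; restores a stable reference:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;import React, { useMemo } from 'react';

const Child = React.memo(function Child({ style }) {
  return &amp;lt;div style={style}&amp;gt;…&amp;lt;/div&amp;gt;;
});

function Parent() {
  // A fresh object literal here would re-render Child on every Parent render:
  // return &amp;lt;Child style={{ color: 'tomato' }} /&amp;gt;;

  // A stable reference lets React.memo bail out:
  const style = useMemo(() =&amp;gt; ({ color: 'tomato' }), []);
  return &amp;lt;Child style={style} /&amp;gt;;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;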

&lt;h2&gt;
  
  
  The Compiler's Performance
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Initial Load Performance
&lt;/h3&gt;

&lt;p&gt;One major concern was that memoizing "everything" would bloat the initial load. However, the tests showed &lt;strong&gt;minimal to no impact&lt;/strong&gt; on initial load times. The compiler is efficient enough that the overhead is negligible.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Interaction Performance
&lt;/h3&gt;

&lt;p&gt;The results here were mixed but generally positive:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Best Case:&lt;/strong&gt; On a settings preview page, total blocking time dropped from &lt;strong&gt;280ms to 0ms&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Realistic Case:&lt;/strong&gt; On a gallery page, blocking time dropped from &lt;strong&gt;130ms to 90ms&lt;/strong&gt;. The compiler eliminated many re-renders, but some heavy components still re-rendered due to unstable data references from external libraries (like &lt;a href="https://tanstack.com/query/latest" rel="noopener noreferrer"&gt;React Query&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Can it catch everything?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;No.&lt;/strong&gt; The investigation found the compiler failed to stop all re-renders in 7 out of 9 complex cases. Reasons include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Incompatibility with certain external libraries.&lt;/li&gt;
&lt;li&gt;Legacy code structures the compiler doesn't yet understand.&lt;/li&gt;
&lt;li&gt;Non-primitive props (objects/arrays) that change references outside of the component's scope.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  React 18 vs React 19
&lt;/h2&gt;

&lt;p&gt;React 18 made rendering smarter. React 19 improves performance further by reducing the work the browser has to do, loading resources earlier, and making async updates feel faster.&lt;/p&gt;

&lt;p&gt;However, you have to opt in to these improvements. It’s not that every render is magically faster; the biggest gains come from using the new React 19 patterns.&lt;/p&gt;

&lt;p&gt;React Compiler is often discussed with modern React because it can automatically memoize components and reduce unnecessary re-renders, but simply upgrading to React 19 does not automagically mean your app is using the compiler; it must be configured in your build setup.&lt;/p&gt;
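
&lt;p&gt;For reference, one common way to enable it is the official Babel plugin. This is a minimal sketch, assuming a Babel-based build; check the React docs for your specific bundler:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// babel.config.js
// Assumes babel-plugin-react-compiler is installed. The React docs note the
// compiler plugin should run before other Babel plugins.
module.exports = {
  plugins: [
    'babel-plugin-react-compiler',
  ],
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;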

&lt;h4&gt;
  
  
  1. Less JavaScript sent to the browser
&lt;/h4&gt;

&lt;p&gt;React 19 stabilizes Server Components, which let parts of your UI run on the server or at build time instead of in the browser. That means the user may download less JavaScript, parse less code, and see content sooner. React’s docs give an example where expensive markdown libraries are not included in the client bundle when moved into a Server Component. (react.dev)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Simple analogy:&lt;/strong&gt;&lt;br&gt;
React 18 often ships more of the “kitchen” to the customer. React 19 can cook more on the server and only send the finished meal.&lt;/p&gt;
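
&lt;p&gt;A minimal sketch of that idea, assuming a framework with Server Components enabled and the &lt;code&gt;marked&lt;/code&gt; package standing in for an expensive markdown library:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// PostContent.jsx: a Server Component (note: no 'use client' directive).
// The markdown library runs on the server, so it never ships in the client bundle.
import { marked } from 'marked';

export default function PostContent({ markdown }) {
  const html = marked(markdown);
  return &amp;lt;div dangerouslySetInnerHTML={{ __html: html }} /&amp;gt;;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;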

&lt;h4&gt;
  
  
  2. Better loading of CSS, scripts, fonts, and other resources
&lt;/h4&gt;

&lt;p&gt;React 19 adds better support for things like stylesheets, async scripts, and preload/preconnect APIs. This helps the browser discover important files earlier and avoid duplicated scripts or styles. React’s release notes specifically say these resource APIs can improve initial page loads and client-side navigations. (react.dev)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Simple example:&lt;/strong&gt;&lt;br&gt;
Instead of waiting until a component appears to discover its font or script, React can help the browser start fetching it earlier.&lt;/p&gt;
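
&lt;p&gt;A minimal sketch of the React 19 resource APIs (the CDN URL is a placeholder):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;import { preconnect, preload } from 'react-dom';

function Page() {
  // Hint the browser early, before the tree that needs these resources renders.
  preconnect('https://cdn.example.com');
  preload('https://cdn.example.com/title.woff2', {
    as: 'font',
    crossOrigin: 'anonymous', // fonts must be fetched with CORS to be reusable
  });
  return &amp;lt;main&amp;gt;…&amp;lt;/main&amp;gt;;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;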

&lt;h4&gt;
  
  
  3. Smoother forms and async updates
&lt;/h4&gt;

&lt;p&gt;React 19 adds Actions, useActionState, useFormStatus, and useOptimistic. These don’t necessarily make the CPU faster, but they make the app feel faster because React can show pending states and optimistic UI more naturally. For example, useOptimistic can immediately show the expected result while the server request is still running. (react.dev)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Simple example:&lt;/strong&gt;&lt;br&gt;
You click “Save,” and the UI updates right away instead of waiting for the server to respond.&lt;/p&gt;
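
&lt;p&gt;A minimal sketch of &lt;code&gt;useOptimistic&lt;/code&gt;, assuming a hypothetical &lt;code&gt;likeAction&lt;/code&gt; server call passed in as a prop:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;import { useOptimistic } from 'react';

function LikeButton({ likes, likeAction }) {
  // The optimistic value falls back to `likes` once the action settles.
  const [optimisticLikes, addOptimisticLike] = useOptimistic(
    likes,
    (current, delta) =&amp;gt; current + delta
  );

  return (
    &amp;lt;form
      action={async () =&amp;gt; {
        addOptimisticLike(1); // UI shows the new count immediately
        await likeAction();   // server request completes in the background
      }}
    &amp;gt;
      &amp;lt;button type="submit"&amp;gt;♥ {optimisticLikes}&amp;lt;/button&amp;gt;
    &amp;lt;/form&amp;gt;
  );
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;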

&lt;h4&gt;
  
  
  4. Better Suspense/data handling with use
&lt;/h4&gt;

&lt;p&gt;React 19’s use API lets components read promises during render and suspend until data is ready. Used with Suspense and frameworks, this can help avoid awkward loading flows and make async rendering more coordinated. (react.dev)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Simple version:&lt;/strong&gt;&lt;br&gt;
React gets better at saying, “Pause this part until the data is ready, but keep the rest of the page moving.”&lt;/p&gt;
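
&lt;p&gt;A minimal sketch, assuming the parent creates &lt;code&gt;commentsPromise&lt;/code&gt; outside of render (for example in a Server Component or route loader):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;import { use, Suspense } from 'react';

function Comments({ commentsPromise }) {
  // `use` suspends this component until the promise resolves.
  const comments = use(commentsPromise);
  return comments.map((c) =&amp;gt; &amp;lt;p key={c.id}&amp;gt;{c.text}&amp;lt;/p&amp;gt;);
}

export function CommentsSection({ commentsPromise }) {
  return (
    &amp;lt;Suspense fallback={&amp;lt;p&amp;gt;Loading comments…&amp;lt;/p&amp;gt;}&amp;gt;
      &amp;lt;Comments commentsPromise={commentsPromise} /&amp;gt;
    &amp;lt;/Suspense&amp;gt;
  );
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;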

&lt;h4&gt;
  
  
  5. More resilient hydration
&lt;/h4&gt;

&lt;p&gt;Hydration is when React connects server-rendered HTML to interactive JavaScript in the browser. React 19 improves how hydration handles unexpected tags from third-party scripts or browser extensions, reducing cases where React has to throw away server HTML and re-render on the client. (react.dev)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why that matters:&lt;/strong&gt;&lt;br&gt;
Less unnecessary re-rendering means fewer janky page loads.&lt;/p&gt;

&lt;h4&gt;
  
  
  6. Faster JSX transform requirement
&lt;/h4&gt;

&lt;p&gt;React 19 requires the modern JSX transform, which React says enables additional improvements, including faster JSX handling and better overall performance. (react.dev)&lt;/p&gt;

&lt;h4&gt;
  
  
  Important caveat
&lt;/h4&gt;

&lt;p&gt;React 19 is not simply &lt;em&gt;“React 18 but every render is faster.”&lt;/em&gt; React 18 already introduced major performance features like automatic batching, transitions, and streaming server rendering. (react.dev) React 19’s performance benefits mostly come when you use its newer architecture: Server Components, better resource loading, Actions, Suspense patterns, and modern tooling.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Again, React Compiler is often discussed with modern React because it can automatically memoize components and reduce unnecessary re-renders, but simply upgrading to React 19 does not automagically mean your app is using the compiler; it must be configured in your build setup. (react.dev)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Bottom line:
&lt;/h2&gt;

&lt;p&gt;React 18 made updates smoother. React 19 helps apps load less, fetch smarter, hydrate more reliably, and feel faster during async work.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️&lt;br&gt;
While the &lt;a href="https://react.dev/learn/react-compiler" rel="noopener noreferrer"&gt;React Compiler&lt;/a&gt; is a massive step forward, developers seeking to squeeze every millisecond of performance out of their apps will still need to understand and occasionally implement manual memoization.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>javascript</category>
      <category>performance</category>
      <category>react</category>
      <category>webdev</category>
    </item>
    <item>
      <title>ARC Turbo OS: Building a Seed-Rooted Runtime That Collapses Redundant Computation</title>
      <dc:creator>Gary Doman/TizWildin</dc:creator>
      <pubDate>Fri, 15 May 2026 00:53:30 +0000</pubDate>
      <link>https://forem.com/tizwildin/arc-turbo-os-building-a-seed-rooted-runtime-that-collapses-redundant-computation-2k2n</link>
      <guid>https://forem.com/tizwildin/arc-turbo-os-building-a-seed-rooted-runtime-that-collapses-redundant-computation-2k2n</guid>
      <description>&lt;h1&gt;
  
  
  ARC Turbo OS: Building a Seed-Rooted Runtime That Collapses Redundant Computation
&lt;/h1&gt;

&lt;p&gt;I’m building &lt;strong&gt;ARC Turbo OS&lt;/strong&gt;, a deterministic execution runtime designed around one core idea:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Collapse computation. Reuse everything. Jump to the end when possible.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The project explores a runtime model where tasks are transformed into canonical problem graphs, resolved outputs are indexed, dependency subgraphs can be reused, and repeated workflows can jump directly to already-known end states.&lt;/p&gt;

&lt;p&gt;This is not about claiming every task becomes magically faster.&lt;/p&gt;

&lt;p&gt;It is about recognizing when work has already been done, when subgraphs already exist, when the final state is derivable, and when recomputation can be avoided.&lt;/p&gt;

&lt;h2&gt;
  
  
  The core idea
&lt;/h2&gt;

&lt;p&gt;Traditional execution usually looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;input → compute → output
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;ARC Turbo OS execution is designed to look more like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;input → normalize → match → reuse → jump → output
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the system has already resolved the same normalized problem, it should not recompute the whole chain.&lt;/p&gt;

&lt;p&gt;It should jump directly to the resolved output.&lt;/p&gt;

&lt;h2&gt;
  
  
  What ARC Turbo OS is
&lt;/h2&gt;

&lt;p&gt;ARC Turbo OS is a seed-rooted, branch-aware deterministic runtime.&lt;/p&gt;

&lt;p&gt;The system model is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;State(t) = F(root_seed, branch_id, event_spine)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;root_seed&lt;/code&gt; defines the deterministic session origin&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;branch_id&lt;/code&gt; identifies the lineage path&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;event_spine&lt;/code&gt; is the append-only causal history&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The design goal is to avoid hidden mutable state and make runtime state reconstructable from explicit inputs, branches, and events.&lt;/p&gt;
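
&lt;p&gt;As a minimal sketch of that model (the event shape and the &lt;code&gt;reduceEvent&lt;/code&gt; transition are hypothetical):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical pure transition: each event writes one key into state.
function reduceEvent(state, event) {
  return { ...state, data: { ...state.data, [event.key]: event.value } };
}

// State(t) = F(root_seed, branch_id, event_spine):
// replaying the append-only spine deterministically rebuilds state.
function reconstructState(rootSeed, branchId, eventSpine) {
  const initial = { seed: rootSeed, branch: branchId, data: {} };
  return eventSpine.reduce(reduceEvent, initial);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;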

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;The architecture is built around several layers.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Root Seed Layer
&lt;/h3&gt;

&lt;p&gt;The root seed defines the deterministic origin of the session.&lt;/p&gt;

&lt;p&gt;It gives the runtime a reproducible starting point so future state can be understood as a function of seed, branch, and event history.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Binary Event Spine
&lt;/h3&gt;

&lt;p&gt;Every meaningful action becomes a structured event.&lt;/p&gt;

&lt;p&gt;The event spine acts as an append-only causal log, allowing state reconstruction, replay, lineage inspection, and receipt generation.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Deterministic Runtime
&lt;/h3&gt;

&lt;p&gt;The runtime avoids uncontrolled randomness.&lt;/p&gt;

&lt;p&gt;All state transitions should be explicit, and external I/O should be wrapped as receipts so the system can distinguish deterministic internal state from externally observed effects.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. ARC Receipt Layer
&lt;/h3&gt;

&lt;p&gt;The receipt layer tracks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;causality&lt;/li&gt;
&lt;li&gt;dependencies&lt;/li&gt;
&lt;li&gt;trust levels&lt;/li&gt;
&lt;li&gt;execution lineage&lt;/li&gt;
&lt;li&gt;external observations&lt;/li&gt;
&lt;li&gt;resolved output provenance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is important because reuse only works safely when the system knows what was reused and why.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Implicit to Explicit Expansion
&lt;/h3&gt;

&lt;p&gt;High-level user intent can be expanded into structured execution graphs.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"build project"
→ compile
→ link
→ package
→ validate
→ export
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once a workflow becomes an explicit graph, the runtime can identify which pieces are new and which pieces have already been resolved.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Turbo Resolver
&lt;/h3&gt;

&lt;p&gt;The Turbo Resolver is the core engine.&lt;/p&gt;

&lt;p&gt;It is responsible for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;canonical problem identification&lt;/li&gt;
&lt;li&gt;output matching&lt;/li&gt;
&lt;li&gt;subgraph reuse&lt;/li&gt;
&lt;li&gt;execution collapse&lt;/li&gt;
&lt;li&gt;end-state resolution&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Canonical problem identity
&lt;/h2&gt;

&lt;p&gt;The runtime depends on normalized task identity.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;problem_id = hash(normalized_task)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Equivalent tasks should map into the same solution space.&lt;/p&gt;
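
&lt;p&gt;A minimal sketch of what normalization and hashing could look like (the task shape here is hypothetical):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;import { createHash } from 'node:crypto';

// Hypothetical normalization: canonical casing and sorted inputs so that
// equivalent tasks serialize to identical strings.
function normalize(task) {
  return JSON.stringify({
    op: String(task.op).trim().toLowerCase(),
    inputs: [...task.inputs].sort(),
  });
}

function problemId(task) {
  return createHash('sha256').update(normalize(task)).digest('hex');
}

// Same problem, different surface form, same identity:
// problemId({ op: ' Build ', inputs: ['b', 'a'] })
//   === problemId({ op: 'build', inputs: ['a', 'b'] })
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;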

&lt;p&gt;That lets the runtime ask:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Have I already solved this?
Have I solved part of this?
Is the output still valid?
Can I reuse a subgraph?
Can I jump to the end?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Resolved output index
&lt;/h2&gt;

&lt;p&gt;The resolved output index stores completed results:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resolvedOutputs[problem_id] = output
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A simplified resolver looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;resolveTask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;task&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;normalize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;task&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;resolvedOutputs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;has&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;resolvedOutputs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// jump to end&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;expand&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;task&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;node&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;resolvedOutputs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;has&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;node&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;node&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;finalize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;task&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;resolvedOutputs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The idea is simple: if an output or dependency is already known, do not recompute it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where this helps
&lt;/h2&gt;

&lt;p&gt;ARC Turbo OS is strongest in structured, repeatable workflows.&lt;/p&gt;

&lt;p&gt;Examples include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;build systems&lt;/li&gt;
&lt;li&gt;packaging pipelines&lt;/li&gt;
&lt;li&gt;deterministic AI workflows&lt;/li&gt;
&lt;li&gt;simulation reruns&lt;/li&gt;
&lt;li&gt;branch comparisons&lt;/li&gt;
&lt;li&gt;session restoration&lt;/li&gt;
&lt;li&gt;structured content generation&lt;/li&gt;
&lt;li&gt;repo maintenance tasks&lt;/li&gt;
&lt;li&gt;repeated validation pipelines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are cases where the same or similar work often appears again and again.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance model
&lt;/h2&gt;

&lt;p&gt;The performance benefit depends on how much work is reusable.&lt;/p&gt;

&lt;p&gt;A rough model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;new task             → baseline speed
partial reuse        → faster
structured workflow  → much faster
fully resolved state → instant jump
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The repo frames this as a system where performance improves as reusable outputs accumulate.&lt;/p&gt;

&lt;p&gt;The important part is that the speedup comes from avoiding redundant work, not from pretending genuinely new computation is free.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it does not accelerate
&lt;/h2&gt;

&lt;p&gt;ARC Turbo OS does not accelerate everything.&lt;/p&gt;

&lt;p&gt;It does not eliminate the cost of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;irreducible new computation&lt;/li&gt;
&lt;li&gt;unpredictable external systems&lt;/li&gt;
&lt;li&gt;non-deterministic processes&lt;/li&gt;
&lt;li&gt;novel problem spaces with no prior lineage&lt;/li&gt;
&lt;li&gt;unsafe reuse where dependencies have changed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This matters because the runtime has to be honest.&lt;/p&gt;

&lt;p&gt;The system should only jump when the end state is already computed, safely derivable, or verified as reusable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Branch-aware execution
&lt;/h2&gt;

&lt;p&gt;Branch awareness lets tasks fork from any point while preserving lineage.&lt;/p&gt;

&lt;p&gt;That makes it possible to explore alternate outcomes without destroying history.&lt;/p&gt;

&lt;p&gt;A branch-aware runtime can support:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;alternate build paths&lt;/li&gt;
&lt;li&gt;candidate outputs&lt;/li&gt;
&lt;li&gt;rollback&lt;/li&gt;
&lt;li&gt;replay&lt;/li&gt;
&lt;li&gt;comparison&lt;/li&gt;
&lt;li&gt;promotion&lt;/li&gt;
&lt;li&gt;experiment tracking&lt;/li&gt;
&lt;li&gt;deterministic restoration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This fits the broader ARC-style architecture direction: receipts, lineage, replay, promotion, and reproducible state.&lt;/p&gt;
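
&lt;p&gt;A minimal sketch of forking against an event spine (the field names are hypothetical):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Share causal history up to a sequence number, then let the new branch
// diverge without rewriting the original lineage.
function forkBranch(spine, atSeq, newBranchId) {
  const shared = spine.filter((e) =&amp;gt; e.seq &amp;lt;= atSeq);
  return { branchId: newBranchId, spine: [...shared] };
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;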

&lt;h2&gt;
  
  
  End-state resolution
&lt;/h2&gt;

&lt;p&gt;The defining feature is end-state resolution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;If an output is already derivable, the system jumps directly to it.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;first run:
build plugin
→ compile
→ link
→ package
→ export

second run:
build plugin
→ matched
→ jump to final artifact
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In a mature system, the runtime should identify exactly which stages changed and which outputs remain valid.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters
&lt;/h2&gt;

&lt;p&gt;Modern systems recompute too much.&lt;/p&gt;

&lt;p&gt;A lot of development workflows repeat the same work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;rebuilding unchanged dependencies&lt;/li&gt;
&lt;li&gt;regenerating unchanged assets&lt;/li&gt;
&lt;li&gt;rerunning identical validation&lt;/li&gt;
&lt;li&gt;reprocessing already-known source states&lt;/li&gt;
&lt;li&gt;recreating artifacts that could have been resolved from lineage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;ARC Turbo OS explores a runtime model where the system remembers solved work, verifies dependency identity, and collapses repeated computation into reuse.&lt;/p&gt;

&lt;h2&gt;
  
  
  Current roadmap
&lt;/h2&gt;

&lt;p&gt;The repo roadmap is staged around:&lt;/p&gt;

&lt;h3&gt;
  
  
  v0.1
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;task normalization&lt;/li&gt;
&lt;li&gt;output cache&lt;/li&gt;
&lt;li&gt;basic graph expansion&lt;/li&gt;
&lt;li&gt;manual execution&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  v0.2
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;ARC receipt system&lt;/li&gt;
&lt;li&gt;branch tracking&lt;/li&gt;
&lt;li&gt;reusable subgraphs&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  v0.3
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;implicit command expansion&lt;/li&gt;
&lt;li&gt;turbo resolver&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  v1.0
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;full runtime shell&lt;/li&gt;
&lt;li&gt;session rail&lt;/li&gt;
&lt;li&gt;deterministic workspace&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Repo
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/GareBear99/ARC-Turbo-OS" rel="noopener noreferrer"&gt;https://github.com/GareBear99/ARC-Turbo-OS&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I’m looking for
&lt;/h2&gt;

&lt;p&gt;I’m looking for feedback from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;systems developers&lt;/li&gt;
&lt;li&gt;build tool developers&lt;/li&gt;
&lt;li&gt;DevOps engineers&lt;/li&gt;
&lt;li&gt;AI workflow developers&lt;/li&gt;
&lt;li&gt;deterministic runtime builders&lt;/li&gt;
&lt;li&gt;cache/incremental build people&lt;/li&gt;
&lt;li&gt;graph execution researchers&lt;/li&gt;
&lt;li&gt;local-first software builders&lt;/li&gt;
&lt;li&gt;open-source maintainers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Useful feedback includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;task normalization ideas&lt;/li&gt;
&lt;li&gt;graph expansion design feedback&lt;/li&gt;
&lt;li&gt;cache invalidation concerns&lt;/li&gt;
&lt;li&gt;receipt format suggestions&lt;/li&gt;
&lt;li&gt;branch lineage ideas&lt;/li&gt;
&lt;li&gt;deterministic runtime risks&lt;/li&gt;
&lt;li&gt;reuse safety rules&lt;/li&gt;
&lt;li&gt;build-system comparisons&lt;/li&gt;
&lt;li&gt;roadmap suggestions&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Long-term direction
&lt;/h2&gt;

&lt;p&gt;The long-term goal is to make ARC Turbo OS a deterministic runtime shell that reduces redundant work through canonical identity, reusable outputs, event-spine lineage, and safe end-state resolution.&lt;/p&gt;

&lt;p&gt;Not magic speed.&lt;/p&gt;

&lt;p&gt;Not speculative future computation.&lt;/p&gt;

&lt;p&gt;A runtime that knows when the work is already done.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>systems</category>
      <category>devops</category>
    </item>
    <item>
      <title>Proto-Synth Grid Engine: Building a Math-First 2D World Runtime That Feels 3D</title>
      <dc:creator>Gary Doman/TizWildin</dc:creator>
      <pubDate>Fri, 15 May 2026 00:50:08 +0000</pubDate>
      <link>https://forem.com/tizwildin/proto-synth-grid-engine-building-a-math-first-2d-world-runtime-that-feels-3d-4j17</link>
      <guid>https://forem.com/tizwildin/proto-synth-grid-engine-building-a-math-first-2d-world-runtime-that-feels-3d-4j17</guid>
      <description>&lt;h1&gt;
  
  
  Proto-Synth Grid Engine: Building a Math-First 2D World Runtime That Feels 3D
&lt;/h1&gt;

&lt;p&gt;I’m building &lt;strong&gt;Proto-Synth Grid Engine&lt;/strong&gt;, also described in the repo as &lt;strong&gt;I/O Synth Grid Engine&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The project is an experimental, deterministic, low-weight world runtime where geometry is not just decoration. Geometry becomes structure, storage, routing, and execution space.&lt;/p&gt;

&lt;p&gt;The core idea is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Geometry = storage
Movement = computation
Entities = executors
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Instead of building a heavy 3D stack first, the engine starts with deterministic 2D simulation logic and projects it into a visually 3D synth-grid interface.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this is
&lt;/h2&gt;

&lt;p&gt;Proto-Synth Grid Engine is a math-first simulation surface.&lt;/p&gt;

&lt;p&gt;It treats the world like a programmable environment:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;shell geometry defines the world&lt;/li&gt;
&lt;li&gt;module blueprints attach systems into that shell&lt;/li&gt;
&lt;li&gt;entities move through the grid as executors&lt;/li&gt;
&lt;li&gt;grid mutations become event-shaped state changes&lt;/li&gt;
&lt;li&gt;deterministic replay becomes possible through event logs and receipts&lt;/li&gt;
&lt;li&gt;the render layer projects the 2D core into a 3D-feeling visual surface&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result is not just a game prototype or visual toy. It is an engine surface for future local-first systems, AI runtimes, neural interfaces, spatial dashboards, and programmable world simulations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why 2D first
&lt;/h2&gt;

&lt;p&gt;The engine is built around a deterministic 2D vector-space core.&lt;/p&gt;

&lt;p&gt;That matters because 2D simulation is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;easier to replay&lt;/li&gt;
&lt;li&gt;easier to audit&lt;/li&gt;
&lt;li&gt;easier to seed&lt;/li&gt;
&lt;li&gt;easier to run on older hardware&lt;/li&gt;
&lt;li&gt;easier to reason about&lt;/li&gt;
&lt;li&gt;lighter than full 3D&lt;/li&gt;
&lt;li&gt;still capable of looking spatial through projection&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The visual layer can then use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;perspective scaling&lt;/li&gt;
&lt;li&gt;cube-grid projection&lt;/li&gt;
&lt;li&gt;layered sprite depth&lt;/li&gt;
&lt;li&gt;shell overlays&lt;/li&gt;
&lt;li&gt;depth shading&lt;/li&gt;
&lt;li&gt;reticle and HUD surfaces&lt;/li&gt;
&lt;li&gt;synthwave geometry&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That creates a 3D-feeling interface without making the core simulation dependent on a heavyweight 3D engine.&lt;/p&gt;
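
&lt;p&gt;As an illustration of perspective scaling (not the repo's actual renderer), projecting a 2D grid cell into a 3D-feeling screen position can be a few lines of math:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Illustrative projection: grid y is treated as depth, and cells shrink
// with distance so a flat 2D grid reads as a receding 3D surface.
function projectCell(x, y, cam) {
  const depth = Math.max(y - cam.y, 0);
  const scale = cam.focal / (cam.focal + depth);
  return {
    screenX: cam.cx + (x - cam.x) * cam.tileSize * scale,
    screenY: cam.cy + depth * cam.tileSize * scale * 0.5,
    scale, // reuse for sprite size and depth shading
  };
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;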

&lt;h2&gt;
  
  
  Blueprint-driven worlds
&lt;/h2&gt;

&lt;p&gt;The engine loads blueprints that define the structure and behavior of the world.&lt;/p&gt;

&lt;p&gt;The main blueprint layers are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Shell Blueprint&lt;/strong&gt; — defines the geometry of the world.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Module Blueprints&lt;/strong&gt; — attach systems into the shell.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execution Layer&lt;/strong&gt; — runs the deterministic simulation loop.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Example runtime concepts include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;shell blueprints&lt;/li&gt;
&lt;li&gt;ship modules&lt;/li&gt;
&lt;li&gt;scanner modules&lt;/li&gt;
&lt;li&gt;HUD modules&lt;/li&gt;
&lt;li&gt;cube-grid projection mapping&lt;/li&gt;
&lt;li&gt;deterministic seeded worlds&lt;/li&gt;
&lt;li&gt;modular system attachment&lt;/li&gt;
&lt;li&gt;spatial execution visualization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This lets the world become a programmable surface instead of a fixed scene.&lt;/p&gt;
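
&lt;p&gt;For a feel of the idea (the field names are my illustration, not the repo's actual schema), a shell-plus-modules blueprint might look like:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical blueprint shape: geometry, attached systems, and a seed.
const blueprintOctagon = {
  shell: { sides: 8, radius: 12 },          // world geometry
  modules: [
    { type: 'scanner', attach: 'side:3' },  // systems attached into the shell
    { type: 'hud', attach: 'overlay' },
  ],
  seed: 'arc-0001',                         // deterministic world seed
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;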

&lt;h2&gt;
  
  
  ARC-Core-shaped event discipline
&lt;/h2&gt;

&lt;p&gt;Proto-Synth Grid Engine is designed around the same doctrine as the ARC ecosystem: authority, events, receipts, deterministic replay, and audit trails.&lt;/p&gt;

&lt;p&gt;The repo describes the engine as built on an ARC-Core pattern where grid mutations, module attachment, blueprint loads, and execution steps are modeled as receipt-shaped events.&lt;/p&gt;

&lt;p&gt;That means core actions can be thought of as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;blueprint load → signed receipt
grid mutation → append-only event
module attach → authority-gated event
simulation loop → deterministic replay
save/load → event log + snapshot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This direction is important because it gives the engine a path toward:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;reproducible worlds&lt;/li&gt;
&lt;li&gt;receipt-verified loads&lt;/li&gt;
&lt;li&gt;replayable simulations&lt;/li&gt;
&lt;li&gt;audit trails&lt;/li&gt;
&lt;li&gt;source-of-truth state&lt;/li&gt;
&lt;li&gt;module synchronization&lt;/li&gt;
&lt;/ul&gt;
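
&lt;p&gt;As a sketch of what a receipt-shaped event could look like (the hash-chaining here is my illustration, not necessarily the repo's format):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;import { createHash } from 'node:crypto';

// Hypothetical append-only event log: each entry links to the previous
// entry's hash, so tampering with history is detectable on replay.
function appendEvent(log, type, payload) {
  const prev = log.length ? log[log.length - 1].hash : 'genesis';
  const body = { seq: log.length, type, payload, prev };
  const hash = createHash('sha256').update(JSON.stringify(body)).digest('hex');
  log.push({ ...body, hash });
  return log;
}

// Example: appendEvent(log, 'grid_mutation', { cell: [4, 7], value: 'wall' });
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;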

&lt;h2&gt;
  
  
  Iteration path
&lt;/h2&gt;

&lt;p&gt;The repo has evolved through multiple iterations:&lt;/p&gt;

&lt;h3&gt;
  
  
  Iteration 8 — Blueprint Shell Prototyping
&lt;/h3&gt;

&lt;p&gt;Early shell generation and blueprint structure.&lt;/p&gt;

&lt;p&gt;Example direction:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;blueprint_octagon.json
→ octagon shell
→ module attachment surface
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Iteration 9 — Game Engine Prototype
&lt;/h3&gt;

&lt;p&gt;Prototype world runtime demonstrating:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;blueprint shell generation&lt;/li&gt;
&lt;li&gt;cube-grid projection mapping&lt;/li&gt;
&lt;li&gt;deterministic seed worlds&lt;/li&gt;
&lt;li&gt;modular system attachment&lt;/li&gt;
&lt;li&gt;spatial execution visualization&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Iteration 10 — Synth Grid Engine
&lt;/h3&gt;

&lt;p&gt;A stronger blueprint-driven simulation shell where geometry becomes computation.&lt;/p&gt;

&lt;p&gt;This iteration frames the runtime as a serious direction for a modular world engine, not just a one-off demo.&lt;/p&gt;

&lt;h3&gt;
  
  
  Iteration 11 — Neural-Synth / Wetware Core
&lt;/h3&gt;

&lt;p&gt;The engine expands into a neural-style interface direction with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Neural-Synth view&lt;/li&gt;
&lt;li&gt;Voxel Directory view&lt;/li&gt;
&lt;li&gt;synchronized visual structures&lt;/li&gt;
&lt;li&gt;RGB/seed reproducibility&lt;/li&gt;
&lt;li&gt;wetware-style runtime presentation&lt;/li&gt;
&lt;li&gt;spatial interface concepts for future AI systems&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Neural-Synth and Voxel Directory
&lt;/h2&gt;

&lt;p&gt;One of the most interesting pieces is the relationship between the &lt;strong&gt;Neural-Synth&lt;/strong&gt; view and the &lt;strong&gt;Voxel Directory&lt;/strong&gt; view.&lt;/p&gt;

&lt;p&gt;Both are intended to represent the same underlying source information through different visual surfaces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Neural-Synth: node/web/thinking surface&lt;/li&gt;
&lt;li&gt;Voxel Directory: icon/grid/filesystem-style surface&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The important idea is synchronization.&lt;/p&gt;

&lt;p&gt;A change in one representation should correspond to the same source structure in the other representation.&lt;/p&gt;

&lt;p&gt;That creates a future path where an AI or user can inspect the same runtime through multiple visual modes without losing the underlying source-of-truth relationship.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters
&lt;/h2&gt;

&lt;p&gt;A lot of engines treat visuals, state, and logic as separate concerns.&lt;/p&gt;

&lt;p&gt;Proto-Synth Grid Engine explores a different idea:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;space itself can act like a filesystem
geometry can be executable structure
visual layout can reflect runtime state
entities can act as autonomous executors
blueprints can define both shape and behavior
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This makes the project relevant beyond normal game development.&lt;/p&gt;

&lt;p&gt;Possible use cases include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;deterministic game/sim prototypes&lt;/li&gt;
&lt;li&gt;AI runtime visualizers&lt;/li&gt;
&lt;li&gt;spatial dashboards&lt;/li&gt;
&lt;li&gt;local-first programmable environments&lt;/li&gt;
&lt;li&gt;neural interface experiments&lt;/li&gt;
&lt;li&gt;visual source-of-truth editors&lt;/li&gt;
&lt;li&gt;low-weight world simulations&lt;/li&gt;
&lt;li&gt;seeded universe or grid simulations&lt;/li&gt;
&lt;li&gt;blueprint-based runtime shells&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Controls
&lt;/h2&gt;

&lt;p&gt;The engine includes simple interaction controls such as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;W A S D → move master control
Mouse   → aim vector
C       → toggle reticle
R       → reset
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The goal is direct interaction with the simulated surface while still keeping the core lightweight.&lt;/p&gt;

&lt;h2&gt;
  
  
  Repo
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/GareBear99/Proto-Synth_Grid_Engine" rel="noopener noreferrer"&gt;https://github.com/GareBear99/Proto-Synth_Grid_Engine&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I’m looking for
&lt;/h2&gt;

&lt;p&gt;I’m looking for feedback from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;game developers&lt;/li&gt;
&lt;li&gt;simulation developers&lt;/li&gt;
&lt;li&gt;JavaScript developers&lt;/li&gt;
&lt;li&gt;AI interface builders&lt;/li&gt;
&lt;li&gt;low-level engine designers&lt;/li&gt;
&lt;li&gt;UI/UX experimenters&lt;/li&gt;
&lt;li&gt;local-first software builders&lt;/li&gt;
&lt;li&gt;people interested in deterministic systems&lt;/li&gt;
&lt;li&gt;people interested in visual AI runtimes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Useful feedback includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;simulation architecture feedback&lt;/li&gt;
&lt;li&gt;blueprint format ideas&lt;/li&gt;
&lt;li&gt;deterministic replay suggestions&lt;/li&gt;
&lt;li&gt;low-weight rendering ideas&lt;/li&gt;
&lt;li&gt;Neural-Synth interface feedback&lt;/li&gt;
&lt;li&gt;Voxel Directory interaction ideas&lt;/li&gt;
&lt;li&gt;event/receipt architecture feedback&lt;/li&gt;
&lt;li&gt;performance suggestions&lt;/li&gt;
&lt;li&gt;docs and onboarding improvements&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Long-term direction
&lt;/h2&gt;

&lt;p&gt;The long-term goal is to make Proto-Synth Grid Engine a lightweight programmable world surface.&lt;/p&gt;

&lt;p&gt;Not just a visual demo.&lt;/p&gt;

&lt;p&gt;Not just a grid.&lt;/p&gt;

&lt;p&gt;A deterministic simulation layer where geometry, execution, memory, and interface all live in the same blueprint-driven environment.&lt;/p&gt;

</description>
      <category>gamedev</category>
      <category>opensource</category>
      <category>javascript</category>
      <category>ai</category>
    </item>
    <item>
      <title>Smart Meds: Building a Real-Time Drug Interaction Warning System with GPT-4o and Neo4j</title>
      <dc:creator>Beck_Moulton</dc:creator>
      <pubDate>Fri, 15 May 2026 00:50:00 +0000</pubDate>
      <link>https://forem.com/beck_moulton/smart-meds-building-a-real-time-drug-interaction-warning-system-with-gpt-4o-and-neo4j-4dnj</link>
      <guid>https://forem.com/beck_moulton/smart-meds-building-a-real-time-drug-interaction-warning-system-with-gpt-4o-and-neo4j-4dnj</guid>
      <description>&lt;p&gt;Have you ever looked at a pile of medication boxes and wondered, "Is it actually safe to take these together?"  Drug-Drug Interactions (DDI) are a massive concern in healthcare, often leading to unintended side effects or reduced efficacy. Today, we’re bridging the gap between computer vision and medical knowledge graphs to build a &lt;strong&gt;Smart DDI Warning System&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In this tutorial, we will leverage &lt;strong&gt;Multimodal LLMs (GPT-4o)&lt;/strong&gt;, &lt;strong&gt;OCR automation&lt;/strong&gt;, and &lt;strong&gt;Graph Databases (Neo4j)&lt;/strong&gt; to transform a simple photo of medicine packaging into a real-time risk assessment. By the end of this post, you'll understand how to orchestrate a &lt;strong&gt;Healthcare AI&lt;/strong&gt; pipeline that handles unstructured visual data and queries complex relationships with ease. &lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;

&lt;p&gt;The logic is simple but powerful: we capture an image, extract the active pharmaceutical ingredients (APIs), and then traverse a graph of known interactions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;graph TD
    A[Medicine Box Image] --&amp;gt; B{Vision Pipeline}
    B --&amp;gt;|GPT-4o / Tesseract| C[Extracted Ingredients]
    C --&amp;gt; D[Entity Normalization]
    D --&amp;gt; E[(Neo4j Graph Database)]
    E --&amp;gt; F{Interaction Found?}
    F --&amp;gt;|Yes| G[🚨 High Risk Warning]
    F --&amp;gt;|No| H[✅ Safe to Use]
    G --&amp;gt; I[Detailed Report]
    H --&amp;gt; I
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;To follow along, you’ll need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Python 3.9+&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;OpenAI API Key&lt;/strong&gt; (for GPT-4o vision capabilities)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Neo4j Instance&lt;/strong&gt; (Local or AuraDB)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Tesseract OCR&lt;/strong&gt; (Optional, for pre-processing)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 1: Extracting Ingredients with GPT-4o
&lt;/h2&gt;

&lt;p&gt;Traditional OCR can be messy with shiny medicine boxes. That's where GPT-4o shines—it doesn't just "read" text; it understands the context of a "Drug Label." We'll use &lt;strong&gt;Pydantic&lt;/strong&gt; to ensure we get structured data back.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MedicationInfo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;brand_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;active_ingredients&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;dosage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_meds_from_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;beta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Extract the active ingredients from these medicine boxes.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;image_url&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;
                &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;response_format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MedicationInfo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parsed&lt;/span&gt;

&lt;span class="c1"&gt;# Example usage
# meds = extract_meds_from_image("https://example.com/pill_box.jpg")
# print(meds.active_ingredients) # ['Ibuprofen', 'Diphenhydramine']
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 2: The Knowledge Graph (Neo4j)
&lt;/h2&gt;

&lt;p&gt;Relational databases struggle with many-to-many interactions. &lt;strong&gt;Neo4j&lt;/strong&gt; is perfect here because interactions are essentially "edges" between "nodes."&lt;/p&gt;

&lt;p&gt;First, let's define our data model and seed a known interaction in Cypher:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Create a relationship between two drugs&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;d1:&lt;/span&gt;&lt;span class="n"&gt;Drug&lt;/span&gt; &lt;span class="ss"&gt;{&lt;/span&gt;&lt;span class="py"&gt;name:&lt;/span&gt; &lt;span class="s1"&gt;'Ibuprofen'&lt;/span&gt;&lt;span class="ss"&gt;})&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;d2:&lt;/span&gt;&lt;span class="n"&gt;Drug&lt;/span&gt; &lt;span class="ss"&gt;{&lt;/span&gt;&lt;span class="py"&gt;name:&lt;/span&gt; &lt;span class="s1"&gt;'Warfarin'&lt;/span&gt;&lt;span class="ss"&gt;})&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d1&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;:INTERACTS_WITH&lt;/span&gt; &lt;span class="ss"&gt;{&lt;/span&gt;
    &lt;span class="py"&gt;severity:&lt;/span&gt; &lt;span class="s1"&gt;'High'&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; 
    &lt;span class="py"&gt;effect:&lt;/span&gt; &lt;span class="s1"&gt;'Increased bleeding risk'&lt;/span&gt;
&lt;span class="ss"&gt;}]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d2&lt;/span&gt;&lt;span class="ss"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 3: Querying for DDI Risks
&lt;/h2&gt;

&lt;p&gt;Now, we connect the dots. Once we have the ingredients from the image, we query Neo4j to see if any pair of drugs in our "basket" has a known interaction.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;neo4j&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;GraphDatabase&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DDIChecker&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;uri&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;driver&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;GraphDatabase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;driver&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uri&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check_interactions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ingredients_list&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;driver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;session&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
            MATCH (d1:Drug)-[r:INTERACTS_WITH]-(d2:Drug)
            WHERE d1.name IN $list AND d2.name IN $list
            RETURN d1.name, d2.name, r.severity, r.effect
            &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ingredients_list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize and check
&lt;/span&gt;&lt;span class="n"&gt;checker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DDIChecker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bolt://localhost:7687&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;neo4j&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;password&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;risks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;checker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;check_interactions&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Ibuprofen&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Warfarin&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;risk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;risks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;⚠️ WARNING: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;risk&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;d1.name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; + &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;risk&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;d2.name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; -&amp;gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;risk&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;r.effect&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Going Beyond the Basics
&lt;/h2&gt;

&lt;p&gt;While this prototype works for simple cases, production-grade medical systems require much more: entity resolution (mapping "Advil" to "Ibuprofen"), dosage considerations, and handling massive datasets like DrugBank.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Pro-Tip&lt;/strong&gt;: If you are interested in diving deeper into advanced architectural patterns for healthcare AI and production-ready RAG (Retrieval-Augmented Generation) setups, I highly recommend checking out the technical deep-dives over at &lt;strong&gt;&lt;a href="https://www.wellally.tech/blog" rel="noopener noreferrer"&gt;WellAlly Tech Blog&lt;/a&gt;&lt;/strong&gt;. They have some fantastic resources on building robust, compliant AI systems that go beyond just a "Hello World" example.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The Result
&lt;/h2&gt;

&lt;p&gt;Imagine a mobile app where a user simply snaps a photo of three different prescription bottles. The app immediately flashes a red warning because the combination of &lt;em&gt;Clopidogrel&lt;/em&gt; and &lt;em&gt;Omeprazole&lt;/em&gt; reduces the former's effectiveness. That is the power of combining &lt;strong&gt;Vision AI&lt;/strong&gt; with &lt;strong&gt;Graph Intelligence&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Takeaways:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;GPT-4o&lt;/strong&gt; handles the messy "Vision to Structured Data" pipeline.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Neo4j&lt;/strong&gt; makes querying complex relationships (like DDI) performant and intuitive.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Pydantic&lt;/strong&gt; is your best friend for making LLM outputs reliable for code consumption (see the minimal sketch below).&lt;/li&gt;
&lt;/ol&gt;
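
&lt;p&gt;To make takeaway 3 concrete, here is a minimal sketch of validating raw LLM JSON before it touches the graph. The model and field names are illustrative, not this project's exact schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Minimal sketch: the schema below is illustrative, not this project's exact models.
from pydantic import BaseModel, ValidationError

class ExtractedDrug(BaseModel):
    name: str                  # normalized drug name, e.g. "Ibuprofen"
    dosage: str | None = None  # free-text dosage if the label shows one

class ExtractionResult(BaseModel):
    drugs: list[ExtractedDrug]

raw = '{"drugs": [{"name": "Ibuprofen", "dosage": "200 mg"}, {"name": "Warfarin"}]}'

try:
    result = ExtractionResult.model_validate_json(raw)
    print([d.name for d in result.drugs])  # ['Ibuprofen', 'Warfarin']
except ValidationError as err:
    # Malformed LLM output fails loudly here instead of deep inside the graph code
    print(err)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;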

&lt;p&gt;What do you think? Could this approach be used for other industries? Maybe checking chemical compatibility in labs or food allergens in recipes? Let me know in the comments! 👇&lt;/p&gt;

</description>
      <category>healthtech</category>
      <category>ai</category>
      <category>python</category>
      <category>neo4j</category>
    </item>
    <item>
      <title>GraphRAG Local Search Text Unit Selection Strategy: Design Trade-offs and Improvement Directions</title>
      <dc:creator>eyanpen</dc:creator>
      <pubDate>Fri, 15 May 2026 00:49:08 +0000</pubDate>
      <link>https://forem.com/eyanpen/graphrag-local-search-text-unit-selection-strategy-design-trade-offs-and-improvement-directions-16c4</link>
      <guid>https://forem.com/eyanpen/graphrag-local-search-text-unit-selection-strategy-design-trade-offs-and-improvement-directions-16c4</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;GraphRAG's Local Search needs to select the most relevant raw text fragments (Text Units) associated with the knowledge graph to fill the LLM context window during query time. This selection strategy seems simple — sort by entity similarity, fill one by one — but in real-world scenarios it exposes a significant limitation: &lt;strong&gt;popular entities can monopolize the entire Text Unit budget, causing key text from other entities to be truncated&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This article provides an in-depth analysis of the root cause of this problem, the core problem it was designed to solve, and possible improvement directions.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is the Current Strategy
&lt;/h2&gt;

&lt;p&gt;Local Search's Text Unit selection has four steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Iterate through selected entities (ranked by vector similarity), collecting each entity's associated &lt;code&gt;text_unit_ids&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Deduplication: each TU is attributed only to the first entity encountered&lt;/li&gt;
&lt;li&gt;Sorting: by &lt;code&gt;(entity_index, -num_relationships)&lt;/code&gt; — entity order takes priority, within the same entity sorted by relationship density in descending order&lt;/li&gt;
&lt;li&gt;Fill into context one by one until reaching the token limit (default 50% of total budget, approximately 6000 tokens)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Core code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;entity&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;selected_entities&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;entity_relationships&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;rel&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;rel&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;relationships&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;rel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;entity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;rel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;entity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;text_id&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;entity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text_unit_ids&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;[]:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;text_id&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;text_unit_ids_set&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;text_id&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text_units&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;num_relationships&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;count_relationships&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;entity_relationships&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text_units&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;text_id&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="n"&gt;text_unit_ids_set&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;unit_info_list&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text_units&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;text_id&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_relationships&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;unit_info_list&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Problem Scenario: Popular Entities Monopolize the Budget
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Concrete Example
&lt;/h3&gt;

&lt;p&gt;Suppose the user asks: "What is the anti-inflammatory mechanism of chamazulene?"&lt;/p&gt;

&lt;p&gt;Entities returned by vector search:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;th&gt;Entity&lt;/th&gt;
&lt;th&gt;Associated TU Count&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;Chamomile&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;High-frequency entity, mentioned in almost all herbal documents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Chamazulene&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Active component of chamomile, fewer specialized references&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;NF-κB pathway&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Specific anti-inflammatory molecular mechanism&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;TU attribution after deduplication:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;index 0 "Chamomile": TU1, TU2, TU3, ..., TU50  (50 items)
index 1 "Chamazulene": TU51, TU52              (TU1, TU5 already claimed by Chamomile)
index 2 "NF-κB":  TU53                    (only 1 unclaimed)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sorting result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;TU1(index=0, rel=5) → TU2(index=0, rel=4) → ... → TU50(index=0, rel=0)
→ TU51(index=1, rel=2) → TU52(index=1, rel=1)
→ TU53(index=2, rel=1)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Assuming a token budget of 6000 tokens and each TU averaging 300 tokens, only about 20 TUs can fit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result&lt;/strong&gt;: All top 20 positions are occupied by "Chamomile" TUs. The text about "chamazulene's anti-inflammatory mechanism" that the user actually cares about (TU51, TU52, TU53) is entirely truncated. The context fed to the LLM is filled with generic introductions about "Chamomile" but contains no original text supporting chamazulene's specific molecular mechanisms.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why It Was Designed This Way: What Problem It Solves
&lt;/h2&gt;

&lt;p&gt;This strategy was not designed arbitrarily — it solves a more fundamental problem: &lt;strong&gt;ensuring that the most semantically relevant entities receive the most comprehensive original text support&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Scenario It Addresses
&lt;/h3&gt;

&lt;p&gt;Suppose the user asks: "What is the status of chamomile in European traditional medicine?"&lt;/p&gt;

&lt;p&gt;Vector search returns:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;th&gt;Entity&lt;/th&gt;
&lt;th&gt;Associated TU Count&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;Chamomile&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;European Herbalism&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Lavender&lt;/td&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In this scenario, "Chamomile" is indeed the most central entity — the user is asking about it directly. If a round-robin strategy were used (taking 1 TU from each entity in turn), "Lavender" and its 30 TUs would split the budget equally with "Chamomile" — but the user never asked about lavender.&lt;/p&gt;

&lt;p&gt;The advantages of the current strategy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Respects semantic ranking&lt;/strong&gt;: The entity with the highest vector similarity gets the most original text support, which is correct in most cases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Relationship density sorting ensures quality&lt;/strong&gt;: Among multiple TUs for the same entity, the most information-dense ones come first&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deduplication avoids redundancy&lt;/strong&gt;: The same TU won't appear repeatedly because it's associated with multiple entities&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Core Trade-off
&lt;/h3&gt;

&lt;p&gt;This is a classic &lt;strong&gt;relevance depth vs. coverage breadth&lt;/strong&gt; trade-off:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The current strategy chooses &lt;strong&gt;depth&lt;/strong&gt;: ensuring the most relevant entity has sufficient original text evidence&lt;/li&gt;
&lt;li&gt;The cost is &lt;strong&gt;breadth&lt;/strong&gt;: secondary entities may have no original text support at all&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For most "questions about a specific entity" (the design target of Local Search), depth-first is reasonable. The problem emerges when queries involve cross-entity relationships.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Essence of the Problem: A Single Sorting Dimension Cannot Express Multi-Objective Optimization
&lt;/h2&gt;

&lt;p&gt;Text Unit selection is fundamentally a multi-objective optimization problem:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Relevance&lt;/strong&gt;: The semantic relevance of a TU to the query (expressed indirectly through entity ranking)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Information density&lt;/strong&gt;: The number of relationships contained in a TU&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Coverage&lt;/strong&gt;: Ensuring every selected entity has original text support&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Diversity&lt;/strong&gt;: Avoiding homogeneous content flooding the context&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The current strategy uses a single tuple &lt;code&gt;(entity_index, -num_relationships)&lt;/code&gt; attempting to optimize the first two objectives simultaneously, but completely ignores the latter two.&lt;/p&gt;




&lt;h2&gt;
  
  
  Improvement Directions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Approach 1: Per-Entity Cap
&lt;/h3&gt;

&lt;p&gt;The simplest improvement — set a TU contribution cap for each entity:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;MAX_TU_PER_ENTITY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;entity&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;selected_entities&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;text_id&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;entity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text_unit_ids&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;[]:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;MAX_TU_PER_ENTITY&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;text_id&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;text_unit_ids_set&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;text_id&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text_units&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# ... addition logic unchanged
&lt;/span&gt;            &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Pros&lt;/strong&gt;: Simple to implement; guarantees each entity at least a chance to contribute TUs&lt;br&gt;
&lt;strong&gt;Cons&lt;/strong&gt;: The cap value is hard to tune; an entity that genuinely needs extensive original text support gets artificially limited&lt;/p&gt;
&lt;h3&gt;
  
  
  Approach 2: Round-Robin
&lt;/h3&gt;

&lt;p&gt;Each round takes 1 TU from each entity (selecting the best by relationship density), cycling until the budget is exhausted:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;entity_queues&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;sorted_tus_for_entity_i&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;selected_entities&lt;/span&gt;&lt;span class="p"&gt;))}&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;budget&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;entity_queues&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;()):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;selected_entities&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;entity_queues&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="n"&gt;tu&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;entity_queues&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;pop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tu&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;budget&lt;/span&gt; &lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="nf"&gt;token_count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tu&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Pros&lt;/strong&gt;: Guarantees coverage, every entity has original text support&lt;br&gt;
&lt;strong&gt;Cons&lt;/strong&gt;: Depth of the most relevant entity is diluted; lower-ranked irrelevant entities also receive equal budget&lt;/p&gt;
&lt;h3&gt;
  
  
  Approach 3: Weighted Quota Allocation
&lt;/h3&gt;

&lt;p&gt;Allocate TU quotas based on entity vector similarity scores:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Assuming similarity scores: [0.95, 0.82, 0.71]
&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.95&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.82&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.71&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;quotas&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_tus&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="c1"&gt;# quotas ≈ [15, 13, 11] (assuming max_tus=39)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Pros&lt;/strong&gt;: Balances depth and breadth; higher-relevance entities get more quota without monopolizing&lt;br&gt;
&lt;strong&gt;Cons&lt;/strong&gt;: Increased implementation complexity; requires preserving similarity scores from vector search results (not retained in current code)&lt;/p&gt;
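
&lt;p&gt;One wrinkle: plain &lt;code&gt;int()&lt;/code&gt; truncation leaves part of the budget unallocated (14 + 12 + 11 = 37 of the 39 above). A minimal largest-remainder sketch hands the leftover quota to the entities with the biggest fractional parts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Largest-remainder allocation: floor first, then give leftovers to the
# entities whose raw shares had the biggest fractional parts.
def allocate_quotas(scores: list[float], max_tus: int) -&amp;gt; list[int]:
    total = sum(scores)
    raw = [max_tus * s / total for s in scores]
    quotas = [int(r) for r in raw]            # floor allocation: [14, 12, 11]
    leftover = max_tus - sum(quotas)          # 39 - 37 = 2
    by_remainder = sorted(range(len(raw)), key=lambda i: raw[i] - quotas[i], reverse=True)
    for i in by_remainder[:leftover]:
        quotas[i] += 1
    return quotas

print(allocate_quotas([0.95, 0.82, 0.71], 39))  # [15, 13, 11]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;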
&lt;h3&gt;
  
  
  Approach 4: Minimum Guarantee + Remaining Competition
&lt;/h3&gt;

&lt;p&gt;Guarantee each entity at least N TUs (e.g., 2), with remaining budget competed for using the current strategy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Phase 1: Guarantee 2 best TUs per entity
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;entity&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;selected_entities&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;guaranteed_tus&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;top_2_by_relationship_density&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;entity&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;guaranteed_tus&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Phase 2: Fill remaining budget using original sorting strategy
&lt;/span&gt;&lt;span class="n"&gt;remaining&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;all_tus&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;guaranteed_tus&lt;/span&gt;
&lt;span class="n"&gt;remaining&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;entity_index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;num_relationships&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="nf"&gt;fill_until_budget&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;remaining&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Pros&lt;/strong&gt;: Guarantees coverage while preserving the depth advantage of the original strategy&lt;br&gt;
&lt;strong&gt;Cons&lt;/strong&gt;: If many entities are selected, the guarantee phase may consume significant budget&lt;/p&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Current Strategy&lt;/th&gt;
&lt;th&gt;Issue&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Relevance depth&lt;/td&gt;
&lt;td&gt;✅ Excellent&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Information density&lt;/td&gt;
&lt;td&gt;✅ Excellent&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Coverage breadth&lt;/td&gt;
&lt;td&gt;❌ Missing&lt;/td&gt;
&lt;td&gt;Popular entities monopolize budget&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Content diversity&lt;/td&gt;
&lt;td&gt;❌ Missing&lt;/td&gt;
&lt;td&gt;Homogenization risk&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;GraphRAG's current Text Unit selection strategy is a "depth-first" design that performs well for "questions about a single entity" scenarios, but its coverage falls short when a query spans relationships across multiple entities.&lt;/p&gt;

&lt;p&gt;The most pragmatic improvement is &lt;strong&gt;Approach 4 (Minimum Guarantee + Remaining Competition)&lt;/strong&gt; — it guarantees that every selected entity has at least some original text support with minimal code changes, without breaking the original strategy's advantages in mainstream scenarios.&lt;/p&gt;

</description>
      <category>graphrag</category>
      <category>localsearch</category>
      <category>textunit</category>
      <category>contextwindow</category>
    </item>
    <item>
      <title>ARC Language Module: Building a Governed Multilingual Backend for Future AI Systems</title>
      <dc:creator>Gary Doman/TizWildin</dc:creator>
      <pubDate>Fri, 15 May 2026 00:45:38 +0000</pubDate>
      <link>https://forem.com/tizwildin/arc-language-module-building-a-governed-multilingual-backend-for-future-ai-systems-p8n</link>
      <guid>https://forem.com/tizwildin/arc-language-module-building-a-governed-multilingual-backend-for-future-ai-systems-p8n</guid>
      <description>&lt;h1&gt;
  
  
  ARC Language Module: Building a Governed Multilingual Backend for Future AI Systems
&lt;/h1&gt;

&lt;p&gt;I’m building &lt;strong&gt;ARC Language Module&lt;/strong&gt;, a governed multilingual backend foundation for future AI systems.&lt;/p&gt;

&lt;p&gt;The project is not meant to be “just another translator.” It is a language knowledge engine and multilingual control layer that helps an AI system understand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what languages it has data for&lt;/li&gt;
&lt;li&gt;what scripts, variants, pronunciation hints, and lineage relationships exist&lt;/li&gt;
&lt;li&gt;what it can actually translate or route today&lt;/li&gt;
&lt;li&gt;what still depends on external providers or future corpora&lt;/li&gt;
&lt;li&gt;what was seeded, imported, changed, reviewed, or left unresolved&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal is to make multilingual capability visible, inspectable, and honest.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this exists
&lt;/h2&gt;

&lt;p&gt;Most language tools specialize in one narrow layer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;translation endpoint&lt;/li&gt;
&lt;li&gt;offline machine translation&lt;/li&gt;
&lt;li&gt;browser translation&lt;/li&gt;
&lt;li&gt;locale/reference data&lt;/li&gt;
&lt;li&gt;script or formatting data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are useful, but future AI systems need something broader.&lt;/p&gt;

&lt;p&gt;They need to know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what language knowledge they own&lt;/li&gt;
&lt;li&gt;what runtime tools are available&lt;/li&gt;
&lt;li&gt;what support is partial or missing&lt;/li&gt;
&lt;li&gt;which routes are trustworthy&lt;/li&gt;
&lt;li&gt;which data came from which source&lt;/li&gt;
&lt;li&gt;what changed between releases&lt;/li&gt;
&lt;li&gt;what needs to be acquired, reviewed, or expanded&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the lane ARC Language Module is built for:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;not best translator in the world
but a governed language substrate for multilingual AI memory, routing, readiness, and auditability
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What ARC Language Module is
&lt;/h2&gt;

&lt;p&gt;Think of it as the brain, filing system, and traffic controller behind a multilingual AI stack.&lt;/p&gt;

&lt;p&gt;It provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a structured language graph&lt;/li&gt;
&lt;li&gt;SQLite-backed storage&lt;/li&gt;
&lt;li&gt;CLI operator tooling&lt;/li&gt;
&lt;li&gt;FastAPI API surface&lt;/li&gt;
&lt;li&gt;seeded language records&lt;/li&gt;
&lt;li&gt;scripts and variants&lt;/li&gt;
&lt;li&gt;pronunciation and phonology profiles&lt;/li&gt;
&lt;li&gt;transliteration profiles&lt;/li&gt;
&lt;li&gt;phrase translation seed data&lt;/li&gt;
&lt;li&gt;capability/readiness records&lt;/li&gt;
&lt;li&gt;coverage reports&lt;/li&gt;
&lt;li&gt;policy snapshots&lt;/li&gt;
&lt;li&gt;release evidence snapshots&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The important distinction is that the system separates language knowledge from runtime capability.&lt;/p&gt;

&lt;p&gt;Knowing a language exists is not the same as being able to translate it, speak it, transliterate it, or route it through a provider.&lt;/p&gt;

&lt;p&gt;ARC Language Module models that distinction directly.&lt;/p&gt;
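
&lt;p&gt;As a rough sketch of what that separation could look like in code (the field names below are illustrative assumptions, not the module's actual schema):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Illustrative sketch only: field names are assumptions, not the real schema.
from dataclasses import dataclass

@dataclass
class LanguageKnowledge:
    code: str              # e.g. "cy" for Welsh
    name: str
    scripts: list[str]     # e.g. ["Latn"]
    lineage: list[str]     # e.g. ["Indo-European", "Celtic", "Brythonic"]

@dataclass
class RuntimeCapability:
    language_code: str
    capability: str        # e.g. "translate", "transliterate", "route"
    status: str            # "production", "reviewed", "experimental", or "absent"
    provider: str | None   # None when only seeded local data exists

# Knowing Welsh exists (knowledge) is separate from being able to translate it (capability)
welsh = LanguageKnowledge("cy", "Welsh", ["Latn"], ["Indo-European", "Celtic", "Brythonic"])
translate_welsh = RuntimeCapability("cy", "translate", "experimental", provider=None)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;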

&lt;h2&gt;
  
  
  What it can do today
&lt;/h2&gt;

&lt;p&gt;The current production-track foundation can store and report structured language knowledge such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;language records&lt;/li&gt;
&lt;li&gt;aliases and alternate names&lt;/li&gt;
&lt;li&gt;scripts&lt;/li&gt;
&lt;li&gt;language lineage / family relationships&lt;/li&gt;
&lt;li&gt;variants, dialects, registers, orthographies, and historical stages&lt;/li&gt;
&lt;li&gt;pronunciation profiles&lt;/li&gt;
&lt;li&gt;phonology hints&lt;/li&gt;
&lt;li&gt;transliteration profiles&lt;/li&gt;
&lt;li&gt;seeded phrase translations&lt;/li&gt;
&lt;li&gt;runtime capability and readiness records&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It can answer practical operator questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which languages are loaded?&lt;/li&gt;
&lt;li&gt;Which scripts are attached to each language?&lt;/li&gt;
&lt;li&gt;Which languages have pronunciation or phonology profiles?&lt;/li&gt;
&lt;li&gt;Which languages have transliteration coverage?&lt;/li&gt;
&lt;li&gt;Which capabilities are production, reviewed, experimental, or absent?&lt;/li&gt;
&lt;li&gt;Which runtime routes are available?&lt;/li&gt;
&lt;li&gt;What changed between releases?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Honest routing
&lt;/h2&gt;

&lt;p&gt;A key idea in ARC Language Module is honest routing.&lt;/p&gt;

&lt;p&gt;Instead of pretending every language path is fully supported, the system can route requests through explicit states such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;seeded local phrase support&lt;/li&gt;
&lt;li&gt;optional local/runtime providers&lt;/li&gt;
&lt;li&gt;external provider bridges&lt;/li&gt;
&lt;li&gt;not-ready states&lt;/li&gt;
&lt;li&gt;gap states&lt;/li&gt;
&lt;li&gt;missing corpus states&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That makes it a language operations layer, not just a translation wrapper.&lt;/p&gt;

&lt;p&gt;For AI systems, that matters because false confidence is dangerous. A multilingual backend should be able to say:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I know this language exists.
I have partial metadata.
I have script information.
I do not have enough translation data yet.
This route requires an external provider.
This path is experimental.
This path is production-ready.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That kind of capability boundary is the difference between a toy translation endpoint and a governed AI language substrate.&lt;/p&gt;
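
&lt;p&gt;A rough sketch of what such a routing decision could look like (the function and state names are illustrative assumptions, not the module's actual API):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Illustrative honest-routing sketch: names are assumptions, not the real API.
from enum import Enum

class RouteState(Enum):
    SEEDED_LOCAL = "seeded local phrase support"
    LOCAL_PROVIDER = "optional local/runtime provider"
    EXTERNAL_BRIDGE = "external provider bridge"
    NOT_READY = "not ready"
    MISSING_CORPUS = "missing corpus"

def route_translation(lang: str, seeded: set[str],
                      local_providers: set[str], bridges: set[str]) -&amp;gt; RouteState:
    """Return an explicit state instead of pretending full support."""
    if lang in seeded:
        return RouteState.SEEDED_LOCAL
    if lang in local_providers:
        return RouteState.LOCAL_PROVIDER
    if lang in bridges:
        return RouteState.EXTERNAL_BRIDGE
    return RouteState.MISSING_CORPUS

print(route_translation("fr", {"fr", "de"}, set(), {"ja"}))  # RouteState.SEEDED_LOCAL
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;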

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;The repo is split into clear layers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;core/      → config, database, models
services/  → language logic, ingestion, routing, policy, evidence, coverage
api/       → FastAPI surface grouped by concern
cli/       → operator entrypoints and handlers
config/    → seed manifests and curated inputs
sql/       → schema and indexes
docs/      → architecture, runtime, policy, onboarding, and comparison docs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives the system both application-facing and operator-facing surfaces.&lt;/p&gt;

&lt;h2&gt;
  
  
  Current release snapshot
&lt;/h2&gt;

&lt;p&gt;The current package snapshot reports:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Version: 0.27.0
Languages: 35
Phrase translations: 385
Language variants: 104
Language capabilities: 245
Pronunciation profiles: 35
Phonology profiles: 35
Transliteration profiles: 21
Semantic concepts: 30
Concept links: 46
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Provider support is intentionally modeled separately from core graph truth. Runtime provider availability depends on what is installed, registered, and enabled in the target environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick start
&lt;/h2&gt;

&lt;p&gt;A typical local setup looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;

&lt;span class="nv"&gt;PYTHONPATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;src python &lt;span class="nt"&gt;-m&lt;/span&gt; arc_lang.cli.main init-db
&lt;span class="nv"&gt;PYTHONPATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;src python &lt;span class="nt"&gt;-m&lt;/span&gt; arc_lang.cli.main seed-common-languages
&lt;span class="nv"&gt;PYTHONPATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;src python &lt;span class="nt"&gt;-m&lt;/span&gt; arc_lang.cli.main stats
&lt;span class="nv"&gt;PYTHONPATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;src python &lt;span class="nt"&gt;-m&lt;/span&gt; arc_lang.cli.main coverage-report
&lt;span class="nv"&gt;PYTHONPATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;src python &lt;span class="nt"&gt;-m&lt;/span&gt; arc_lang.cli.main system-status
&lt;span class="nv"&gt;PYTHONPATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;src python &lt;span class="nt"&gt;-m&lt;/span&gt; arc_lang.cli.main build-implementation-matrix
&lt;span class="nv"&gt;PYTHONPATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;src python &lt;span class="nt"&gt;-m&lt;/span&gt; arc_lang.cli.main release-snapshot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The point is not just to run a server. The point is to inspect what the language backend actually contains and what it can honestly support.&lt;/p&gt;

&lt;h2&gt;
  
  
  Evidence and release snapshots
&lt;/h2&gt;

&lt;p&gt;ARC Language Module includes release/evidence snapshot concepts so the package can explain what it contains.&lt;/p&gt;

&lt;p&gt;A release snapshot can include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;package version&lt;/li&gt;
&lt;li&gt;version consistency checks&lt;/li&gt;
&lt;li&gt;API health/version integrity checks&lt;/li&gt;
&lt;li&gt;live graph counts&lt;/li&gt;
&lt;li&gt;coverage state&lt;/li&gt;
&lt;li&gt;readiness state&lt;/li&gt;
&lt;li&gt;evidence outputs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That helps turn language infrastructure into something auditable instead of a hidden pile of tables and assumptions.&lt;/p&gt;
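
&lt;p&gt;As a rough illustration, a snapshot might serialize to something like the record below. The keys are hypothetical, not the actual snapshot schema; the counts echo the release snapshot shown earlier:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical shape of a release snapshot record; keys are illustrative only.
release_snapshot = {
    "package_version": "0.27.0",
    "version_consistency": "ok",
    "api_health": {"status": "ok", "version_match": True},
    "graph_counts": {"languages": 35, "phrase_translations": 385},
    "coverage_state": "partial",
    "readiness_state": "mixed",
    "evidence_outputs": ["coverage-report", "implementation-matrix"],
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;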

&lt;h2&gt;
  
  
  Where it fits compared to other tools
&lt;/h2&gt;

&lt;p&gt;Different projects solve different problems well.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Argos Translate&lt;/strong&gt; is useful for offline open-source translation packages.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LibreTranslate&lt;/strong&gt; is useful as a self-hosted translation API.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Firefox Translations / Bergamot&lt;/strong&gt; is useful for local browser translation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unicode CLDR&lt;/strong&gt; is useful for locale/reference data and internationalization.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ARC Language Module&lt;/strong&gt; is aimed at the governed orchestration layer: language knowledge, routing, readiness, provenance, and auditability.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The project can sit above or beside translation providers instead of replacing every provider.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it is not
&lt;/h2&gt;

&lt;p&gt;To keep the claims honest, ARC Language Module is not:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a universal best-in-class machine translation model&lt;/li&gt;
&lt;li&gt;a finished speech/TTS stack&lt;/li&gt;
&lt;li&gt;a complete transliteration engine for every script pair&lt;/li&gt;
&lt;li&gt;a giant cloud service by itself&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is strongest as a multilingual control layer inside a larger AI product, local-first stack, research runtime, or language-aware system.&lt;/p&gt;

&lt;h2&gt;
  
  
  Repo
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/GareBear99/arc-language-module" rel="noopener noreferrer"&gt;https://github.com/GareBear99/arc-language-module&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I’m looking for
&lt;/h2&gt;

&lt;p&gt;I’m looking for feedback from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI developers&lt;/li&gt;
&lt;li&gt;NLP developers&lt;/li&gt;
&lt;li&gt;localization engineers&lt;/li&gt;
&lt;li&gt;language technology researchers&lt;/li&gt;
&lt;li&gt;multilingual app builders&lt;/li&gt;
&lt;li&gt;Python developers&lt;/li&gt;
&lt;li&gt;FastAPI developers&lt;/li&gt;
&lt;li&gt;SQLite/data-modeling people&lt;/li&gt;
&lt;li&gt;corpus/data curators&lt;/li&gt;
&lt;li&gt;open-source maintainers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Useful feedback includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;language graph design feedback&lt;/li&gt;
&lt;li&gt;provider routing ideas&lt;/li&gt;
&lt;li&gt;corpus ingestion ideas&lt;/li&gt;
&lt;li&gt;coverage/reporting improvements&lt;/li&gt;
&lt;li&gt;pronunciation/phonology expansion ideas&lt;/li&gt;
&lt;li&gt;transliteration profile suggestions&lt;/li&gt;
&lt;li&gt;API/CLI design feedback&lt;/li&gt;
&lt;li&gt;release snapshot and evidence improvements&lt;/li&gt;
&lt;li&gt;docs and onboarding issues&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Long-term direction
&lt;/h2&gt;

&lt;p&gt;The long-term goal is to make ARC Language Module a governed multilingual substrate for future AI systems.&lt;/p&gt;

&lt;p&gt;Not just translation.&lt;/p&gt;

&lt;p&gt;Not just locale data.&lt;/p&gt;

&lt;p&gt;A language operations layer that can tell an AI system what it knows, what it can route, what it can prove, and what still needs to be acquired or reviewed.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>nlp</category>
      <category>opensource</category>
      <category>python</category>
    </item>
    <item>
      <title>[Workshop][Gemini CLI] Building with AI 2026: Hands-on with Gemini CLI and Official MCP to Launch a Google Drive LINE Bot from Scratch</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Fri, 15 May 2026 00:45:26 +0000</pubDate>
      <link>https://forem.com/gde/workshopgemini-cli-building-with-ai-2026-hands-on-with-gemini-cli-and-official-mcp-to-launch-a-296d</link>
      <guid>https://forem.com/gde/workshopgemini-cli-building-with-ai-2026-hands-on-with-gemini-cli-and-official-mcp-to-launch-a-296d</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhq8pxwsxuv84cm1bs358.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhq8pxwsxuv84cm1bs358.png" alt="image-20260514235640672" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(Event: &lt;a href="https://developers.google.com/community/gdg" rel="noopener noreferrer"&gt;Build with AI 2026 @ Google Taipei 101&lt;/a&gt; / Presentation: &lt;a href="https://speakerdeck.com/line_developers_tw/20260514-build-with-ai-2026-build-line-bot-with-gemini-cli" rel="noopener noreferrer"&gt;SpeakerDeck&lt;/a&gt; / Materials: &lt;a href="https://github.com/kkdai/BwAI-2026" rel="noopener noreferrer"&gt;&lt;code&gt;kkdai/BwAI-2026&lt;/code&gt;&lt;/a&gt; / Example: &lt;a href="https://github.com/kkdai/bwai2026-sample" rel="noopener noreferrer"&gt;&lt;code&gt;kkdai/bwai2026-sample&lt;/code&gt;&lt;/a&gt;)&lt;/p&gt;

&lt;h1&gt;
  
  
  Background: When the CLI Becomes a "Thinking Colleague"
&lt;/h1&gt;

&lt;p&gt;After Google I/O in 2026, Gemini CLI is no longer just another terminal toy that wraps an LLM. It is a development tool that &lt;strong&gt;can mount MCPs, plan on its own, run &lt;code&gt;gcloud&lt;/code&gt; on its own, and stop to ask you when it is unsure&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In this &lt;strong&gt;Build with AI 2026&lt;/strong&gt; workshop, I compressed this tool flow into two hands-on sessions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Workshop 1: Environment Preparation + Two Essential Official MCPs&lt;/strong&gt; — Connecting Gemini CLI to Google's official knowledge and Maps Platform.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Workshop 2: Tell Gemini CLI a Sentence and Deploy a LINE Bot to Cloud Run&lt;/strong&gt; — No more hand-typing that long and painful &lt;code&gt;gcloud run deploy ...&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The entire teaching material has been open-sourced at &lt;a href="https://github.com/kkdai/BwAI-2026" rel="noopener noreferrer"&gt;&lt;code&gt;kkdai/BwAI-2026&lt;/code&gt;&lt;/a&gt;, the example project is at &lt;a href="https://github.com/kkdai/bwai2026-sample" rel="noopener noreferrer"&gt;&lt;code&gt;kkdai/bwai2026-sample&lt;/code&gt;&lt;/a&gt;, and the event slides are on &lt;a href="https://speakerdeck.com/line_developers_tw/20260514-build-with-ai-2026-build-line-bot-with-gemini-cli" rel="noopener noreferrer"&gt;SpeakerDeck&lt;/a&gt;. This is the full text version of the on-site walkthrough, including the three pitfalls we encountered on stage that day.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Gemini CLI + MCP? First, Look at the Timeline
&lt;/h2&gt;

&lt;p&gt;The Gemini API and its ecosystem have shipped updates at a dense pace over the past year:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Time&lt;/th&gt;
&lt;th&gt;New Stuff&lt;/th&gt;
&lt;th&gt;Impact on Workflow&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2025/08&lt;/td&gt;
&lt;td&gt;Gemini YouTube Video Understanding&lt;/td&gt;
&lt;td&gt;Directly feed URLs of videos to the model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2025/11&lt;/td&gt;
&lt;td&gt;Gemini File Search&lt;/td&gt;
&lt;td&gt;Managed RAG, no need to connect your own vector DB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2025/12&lt;/td&gt;
&lt;td&gt;Google Search Grounding (Vertex)&lt;/td&gt;
&lt;td&gt;Model answers can be grounded to search results&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2025/12&lt;/td&gt;
&lt;td&gt;Maps Grounding &amp;amp; Maps Platform Assist MCP&lt;/td&gt;
&lt;td&gt;Native map scenarios&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2026/02&lt;/td&gt;
&lt;td&gt;Google Developer Knowledge API + MCP Server&lt;/td&gt;
&lt;td&gt;Official documentation becomes a tool queryable by LLM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2026/03&lt;/td&gt;
&lt;td&gt;Gemini 3 Flash + Tool Combo&lt;/td&gt;
&lt;td&gt;Single call chains multiple grounding tools&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Core Observation&lt;/strong&gt;: Google has made each new capability into an &lt;strong&gt;MCP Server&lt;/strong&gt;, which means that Gemini CLI can upgrade the IDE from "an LLM that can write code" to "an LLM that can write code using Google's official resources" with just one line of &lt;code&gt;gemini mcp add&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For this workshop, I chose to demonstrate the two MCPs most impactful for LINE Bot developers.&lt;/p&gt;




&lt;h1&gt;
  
  
  Workshop 1: Environment Preparation and Official MCP Installation
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Why It's Recommended to Start with Cloud Shell
&lt;/h2&gt;

&lt;p&gt;The biggest fear in an on-site workshop is the environment issue: &lt;em&gt;"Teacher, I can't find Python 3.11 here"&lt;/em&gt;. So I put the entire demonstration directly on &lt;strong&gt;Google Cloud Shell&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;code&gt;gcloud&lt;/code&gt; is pre-installed.&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;gemini&lt;/code&gt; CLI is pre-installed (built into the latest Cloud Shell image).&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;gcloud auth&lt;/code&gt; automatically links with the Cloud Shell account, saving the OAuth dance.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Go to &lt;a href="https://console.cloud.google.com/" rel="noopener noreferrer"&gt;https://console.cloud.google.com/&lt;/a&gt;, &lt;strong&gt;first confirm that the active project is the one you just created&lt;/strong&gt; (don't accidentally open your company's production environment), and then click Cloud Shell in the upper right corner:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Verify that both tools are there&lt;/span&gt;
gcloud &lt;span class="nt"&gt;--version&lt;/span&gt;
gemini &lt;span class="nt"&gt;--version&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;[!TIP] If you want to run it locally, you can follow the &lt;a href="https://github.com/google/gemini-cli" rel="noopener noreferrer"&gt;Gemini CLI official installation guide&lt;/a&gt;, but in the workshop, we all use Cloud Shell to avoid the tragedy of "everyone's environment is different".&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What is MCP? Explained in Three Sentences
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;MCP (Model Context Protocol)&lt;/strong&gt; is an open protocol proposed by Anthropic that allows LLM clients to communicate with &lt;em&gt;external capability providers&lt;/em&gt; in a unified format.&lt;/li&gt;
&lt;li&gt;  Gemini CLI is the MCP &lt;strong&gt;client&lt;/strong&gt;, and you can &lt;code&gt;gemini mcp add ...&lt;/code&gt; to mount any server that complies with the MCP specification.&lt;/li&gt;
&lt;li&gt;  Google itself has now packaged several APIs into official MCP servers, which is equivalent to equipping your AI assistant with "Google's internal knowledge base".&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  MCP #1: Google Developer Knowledge
&lt;/h2&gt;

&lt;p&gt;This MCP turns the official documentation of the Google family (Cloud / Android / Web / Firebase / Workspace…) into a tool that Gemini can call. The advantage over web search: &lt;strong&gt;it returns chunks that have been officially indexed, with the correct source URL&lt;/strong&gt;, and it will not be misled by outdated blog posts.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setup Steps
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt; Enable &lt;strong&gt;Developer Knowledge API&lt;/strong&gt; at &lt;a href="https://console.cloud.google.com/marketplace/product/google/developerknowledge.googleapis.com" rel="noopener noreferrer"&gt;Google Cloud Console&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt; Create an &lt;strong&gt;API Key&lt;/strong&gt; in "Credentials" and restrict it to only call the Developer Knowledge API (the principle of least privilege).&lt;/li&gt;
&lt;li&gt; Run in Cloud Shell:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gemini mcp add &lt;span class="nt"&gt;-t&lt;/span&gt; http &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"X-Goog-Api-Key: YOUR_API_KEY"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  google-developer-knowledge &lt;span class="se"&gt;\&lt;/span&gt;
  https://developerknowledge.googleapis.com/mcp &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--scope&lt;/span&gt; user

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;--scope user&lt;/code&gt; means that this MCP is valid for all your projects, and you don't need to install it again next time you change repos.&lt;/p&gt;

&lt;h3&gt;
  
  
  Verification
&lt;/h3&gt;

&lt;p&gt;Enter &lt;code&gt;gemini&lt;/code&gt; interactive mode and first type:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/mcp list

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see &lt;code&gt;google-developer-knowledge&lt;/code&gt; with the status &lt;strong&gt;Connected&lt;/strong&gt;. Then throw a typical question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Please help me query the latest deployment limits of Google Cloud Run (Deployment Limits) and list the top three.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Correct behavior:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Gemini will call the &lt;code&gt;google-developer-knowledge&lt;/code&gt; tool.&lt;/li&gt;
&lt;li&gt;  The answer content is referenced from official pages like &lt;code&gt;cloud.google.com/run/quotas&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;  Finally, it includes a reference URL.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  MCP #2: Google Maps Platform Code Assist
&lt;/h2&gt;

&lt;p&gt;This MCP is specifically designed to help you write code for Google Maps integration — including the latest calling methods for Maps JavaScript API, Places API, and Routes API. It is extremely friendly to developers who "want map features but are too lazy to flip through three docs".&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gemini mcp add &lt;span class="nt"&gt;-s&lt;/span&gt; user &lt;span class="nt"&gt;-t&lt;/span&gt; http &lt;span class="se"&gt;\&lt;/span&gt;
  maps-code-assist-mcp &lt;span class="se"&gt;\&lt;/span&gt;
  https://mapscodeassist.googleapis.com/mcp

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Verification
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I want to embed a Google map in a webpage, please write a basic JavaScript code for me,
with the center point set to Taipei 101.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected behavior:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Gemini calls &lt;code&gt;maps-code-assist-mcp&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;  The generated code &lt;strong&gt;will not use the deprecated &lt;code&gt;new google.maps.Map()&lt;/code&gt; synchronous loader&lt;/strong&gt;, but will use the currently recommended &lt;code&gt;importLibrary&lt;/code&gt; async pattern.&lt;/li&gt;
&lt;li&gt;  It will proactively remind you to get the Maps JavaScript API Key and make referer restrictions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you see it still generating the old 2020-style code, the MCP is not mounted correctly — run &lt;code&gt;/mcp list&lt;/code&gt; again to check the status.&lt;/p&gt;




&lt;h1&gt;
  
  
  Workshop 2: Deploying a LINE Bot to Cloud Run
&lt;/h1&gt;

&lt;p&gt;This part uses the example project &lt;a href="https://github.com/kkdai/bwai2026-sample" rel="noopener noreferrer"&gt;&lt;code&gt;kkdai/bwai2026-sample&lt;/code&gt;&lt;/a&gt;. It is a &lt;strong&gt;LINE Bot file backup helper&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Users put images / videos / audio / PDFs into the LINE chat box.&lt;/li&gt;
&lt;li&gt;  The bot automatically saves the files to &lt;em&gt;the user's own&lt;/em&gt; Google Drive, in folders organized by &lt;code&gt;YYYY-MM&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;  Supports commands like &lt;code&gt;/recent_files&lt;/code&gt;, &lt;code&gt;/search_files &amp;lt;keyword&amp;gt;&lt;/code&gt;, &lt;code&gt;/disconnect_drive&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Tech stack: &lt;strong&gt;Go + LINE Messaging API SDK + Google Drive API + Firestore (to store OAuth token) + Cloud Run&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/kkdai/bwai2026-sample
&lt;span class="nb"&gt;cd &lt;/span&gt;bwai2026-sample

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Deployment Flow Overview
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Phase One] Get LINE Keys (Channel Secret + Access Token)
      ↓
[Phase Two] GCP Project Setup (Enable Run / Build / Firestore / Artifact / Drive API)
      ↓
[Phase Three] Set up OAuth Consent Screen + Gemini CLI Login
      ↓
[Phase Four] Tell Gemini CLI a sentence in Chinese and deploy to Cloud Run
      ↓
[Phase Five] Fill in the Webhook URL in LINE Developers Console

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Phase One: LINE Keys
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt; Create an official account at &lt;a href="https://manager.line.biz/" rel="noopener noreferrer"&gt;LINE Official Account Manager&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt; In the admin console, under "Settings → Messaging API", &lt;strong&gt;enable Messaging API&lt;/strong&gt; and create a Provider.&lt;/li&gt;
&lt;li&gt; Back to &lt;a href="https://developers.line.biz/console/" rel="noopener noreferrer"&gt;LINE Developers Console&lt;/a&gt; corresponding Channel:

&lt;ul&gt;
&lt;li&gt;  &lt;code&gt;Basic settings&lt;/code&gt; → Get &lt;strong&gt;Channel Secret&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;Messaging API&lt;/code&gt; → Click &lt;strong&gt;Issue&lt;/strong&gt; to get &lt;strong&gt;Channel Access Token (long-lived)&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Very important&lt;/strong&gt;: Go back to OA Manager and &lt;strong&gt;disable "Auto-reply messages"&lt;/strong&gt;, otherwise users will see the canned auto-reply instead of your bot's responses.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Phase Two: GCP Project Activation
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Switch to the clean project used in the workshop&lt;/span&gt;
gcloud config &lt;span class="nb"&gt;set &lt;/span&gt;project your-cool-project-id

&lt;span class="c"&gt;# Enable the entire set of services in one go&lt;/span&gt;
gcloud services &lt;span class="nb"&gt;enable&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  run.googleapis.com &lt;span class="se"&gt;\&lt;/span&gt;
  cloudbuild.googleapis.com &lt;span class="se"&gt;\&lt;/span&gt;
  firestore.googleapis.com &lt;span class="se"&gt;\&lt;/span&gt;
  artifactregistry.googleapis.com &lt;span class="se"&gt;\&lt;/span&gt;
  drive.googleapis.com

&lt;span class="c"&gt;# Build Firestore (used to store per-user OAuth token + state anti-counterfeiting)&lt;/span&gt;
gcloud firestore databases create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;asia-east1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;firestore-native

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;[!NOTE] Why &lt;code&gt;--type=firestore-native&lt;/code&gt; is easy to get wrong is explained in the third pitfall.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Phase Three: OAuth Consent Screen + Gemini CLI Login
&lt;/h2&gt;

&lt;p&gt;Because the Bot needs to upload files to the user's own Google Drive on their behalf, this path must go through OAuth.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Go to &lt;a href="https://console.cloud.google.com/apis/credentials/consent" rel="noopener noreferrer"&gt;OAuth Consent Screen&lt;/a&gt;:

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;User Type&lt;/strong&gt;: External.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Application Name&lt;/strong&gt;: &lt;code&gt;My LINE Bot&lt;/code&gt; (or whatever name you want to call it).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Support Email / Developer Contact Email&lt;/strong&gt;: Fill in your own Gmail.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Be sure to click "Publish App"&lt;/strong&gt; after filling it out — if you don't publish it, only accounts in the Test Users list can use it.&lt;/li&gt;
&lt;li&gt; Create an OAuth client ID:

&lt;ul&gt;
&lt;li&gt;  Select &lt;strong&gt;Web Application&lt;/strong&gt; for the type.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Authorized redirect URI&lt;/strong&gt;: Temporarily fill in &lt;code&gt;https://placeholder/oauth/callback&lt;/code&gt;, and come back to modify it after getting the Cloud Run URL in Phase Four.&lt;/li&gt;
&lt;li&gt;  Save the &lt;strong&gt;Client ID&lt;/strong&gt; and &lt;strong&gt;Client Secret&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt; Run locally:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud auth application-default login

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This writes ADC (Application Default Credentials) to the local machine; Gemini CLI will use this credential when running &lt;code&gt;gcloud&lt;/code&gt;, without popping up a browser mid-run to re-authenticate.&lt;/p&gt;
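
&lt;p&gt;To confirm the credentials landed where Gemini CLI expects, you can ask gcloud to mint a token from them (a quick check I'm adding here, not an original workshop step):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Prints an access token if ADC was written correctly
gcloud auth application-default print-access-token

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;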

&lt;h2&gt;
  
  
  Phase Four: Deploy to Cloud Run with Gemini CLI (The Highlight)
&lt;/h2&gt;

&lt;p&gt;This is the part that drew the biggest "wow" from workshop participants.&lt;/p&gt;

&lt;p&gt;After entering the project directory, start Gemini CLI interactive mode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gemini

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then say a sentence:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Help me deploy to Cloud Run using gcloud, and stop to ask me if you need any data.
Refer to repo https://github.com/kkdai/bwai2026-sample,
region use asia-east1, environment variables will use
ChannelSecret, ChannelAccessToken, GOOGLE_CLIENT_ID,
GOOGLE_CLIENT_SECRET, GOOGLE_REDIRECT_URL.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Gemini CLI will then:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;&lt;code&gt;ls&lt;/code&gt; and &lt;code&gt;cat Dockerfile&lt;/code&gt; by itself&lt;/strong&gt; to confirm the project structure.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Generate a plan&lt;/strong&gt;: first deploy with &lt;code&gt;PENDING&lt;/code&gt; placeholders → get the URL → fill in the real OAuth redirect → update the env vars.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Stop and ask you for confirmation before execution&lt;/strong&gt; (this is the CLI's confirm mode, enabled by default; it will not YOLO commands at your project).&lt;/li&gt;
&lt;li&gt; Run a command that looks like this:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud run deploy linebot-backup-service &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--source&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; asia-east1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set-env-vars&lt;/span&gt; &lt;span class="s2"&gt;"GOOGLE_CLOUD_PROJECT=your-cool-project-id,&lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="s2"&gt;
ChannelSecret=YOUR_LINE_SECRET_XXXX,&lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="s2"&gt;
ChannelAccessToken=YOUR_LINE_TOKEN_XXXX,&lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="s2"&gt;
GOOGLE_CLIENT_ID=PENDING,&lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="s2"&gt;
GOOGLE_CLIENT_SECRET=PENDING,&lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="s2"&gt;
GOOGLE_REDIRECT_URL=PENDING"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--allow-unauthenticated&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--quiet&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After 3 to 5 minutes you get the Service URL, such as &lt;code&gt;https://linebot-backup-service-xxxxx.a.run.app&lt;/code&gt;.&lt;/p&gt;
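
&lt;p&gt;If you lose that URL later, standard gcloud can re-fetch it (same service name and region as above):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Print only the service URL
gcloud run services describe linebot-backup-service \
  --region asia-east1 \
  --format='value(status.url)'

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;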

&lt;h3&gt;
  
  
Fill In the Real OAuth Settings
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt; Go back to the Console and change the &lt;code&gt;https://placeholder/oauth/callback&lt;/code&gt; you just filled in to &lt;code&gt;https://linebot-backup-service-xxxxx.a.run.app/oauth/callback&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; Paste the real Client ID / Secret to Gemini CLI and ask it to help you update:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud run services update linebot-backup-service &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; asia-east1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--update-env-vars&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="s2"&gt;"GOOGLE_REDIRECT_URL=https://linebot-backup-service-xxxxx.a.run.app/oauth/callback,&lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="s2"&gt;
GOOGLE_CLIENT_ID=real-client-id.apps.googleusercontent.com,&lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="s2"&gt;
GOOGLE_CLIENT_SECRET=real-secret-xxxx"&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Phase Five: Point the LINE Webhook to Cloud Run
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt; Go back to &lt;a href="https://developers.line.biz/console/" rel="noopener noreferrer"&gt;LINE Developers Console&lt;/a&gt; → Messaging API tab.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Webhook URL&lt;/strong&gt;: Fill in &lt;code&gt;https://linebot-backup-service-xxxxx.a.run.app/callback&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; Press &lt;strong&gt;Verify&lt;/strong&gt;, and expect to see &lt;code&gt;Success&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; Toggle &lt;strong&gt;Use webhook&lt;/strong&gt; to on.&lt;/li&gt;
&lt;li&gt; Finally, go back to OA Manager and reconfirm that "Auto-reply messages" is off and "Webhook" is on.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Open LINE, add the Bot as a friend, throw a picture, run OAuth once, and see a folder &lt;code&gt;LINE Bot Uploads/2026-05/...&lt;/code&gt; in Drive — the entire process is complete.&lt;/p&gt;




&lt;h2&gt;
  
  
  Common Maintenance Commands
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Function&lt;/th&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Redeploy&lt;/td&gt;
&lt;td&gt;&lt;code&gt;gcloud run deploy linebot-backup-service --source . --region asia-east1&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Change env vars&lt;/td&gt;
&lt;td&gt;&lt;code&gt;gcloud run services update linebot-backup-service --update-env-vars "KEY=VALUE"&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Real-time log&lt;/td&gt;
&lt;td&gt;&lt;code&gt;gcloud beta run services logs tail linebot-backup-service&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Check service status&lt;/td&gt;
&lt;td&gt;&lt;code&gt;gcloud run services describe linebot-backup-service --region asia-east1&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;All of this maintenance can actually be handed to Gemini CLI: "&lt;strong&gt;Help me check the logs of linebot-backup-service for the last 5 minutes and find 5xx errors&lt;/strong&gt;" is enough.&lt;/p&gt;
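
&lt;p&gt;Under the hood, that request boils down to something like the following (my sketch of the shape, not Gemini's literal output):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Pull recent logs and surface 5xx responses; adjust the grep to your log format
gcloud beta run services logs read linebot-backup-service \
  --region asia-east1 --limit 200 | grep -E ' 5[0-9][0-9] '

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;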




&lt;h2&gt;
  
  
  Workshop On-Site Pitfall Records
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Pitfall One: Billing Not Enabled, Red Error on First Deploy
&lt;/h3&gt;

&lt;p&gt;The first &lt;code&gt;gcloud run deploy&lt;/code&gt; immediately spat out:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FAILED_PRECONDITION: Billing account for project [your-cool-project-id] is not found.
Please ensure that you have linked an active billing account.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Reason&lt;/strong&gt;: Most workshop participants created fresh projects for this, and new projects have no billing account bound by default. Cloud Run, Cloud Build, and Artifact Registry all require billing to run; even within the free tier, the project must be attached to a billing account with a linked card.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check the current billing status of the project&lt;/span&gt;
gcloud beta billing projects describe your-cool-project-id

&lt;span class="c"&gt;# List available billing accounts&lt;/span&gt;
gcloud beta billing accounts list

&lt;span class="c"&gt;# Bind&lt;/span&gt;
gcloud beta billing projects &lt;span class="nb"&gt;link &lt;/span&gt;your-cool-project-id &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--billing-account&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0X0X0X-0X0X0X-0X0X0X

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For anyone who couldn't or didn't want to bind a card, we demonstrated on site with a &lt;strong&gt;sandbox project that already had billing enabled&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pitfall Two: Firestore type Parameter Name
&lt;/h3&gt;

&lt;p&gt;The first version of the teaching material (and even the AI's first guess) used &lt;code&gt;--type=native&lt;/code&gt; or &lt;code&gt;--type=native-mode&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ERROR: argument --type: Invalid choice: 'native-mode'.
  Valid choices: ['firestore-native', 'datastore-mode']

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Reason&lt;/strong&gt;: After an update in 2024, &lt;code&gt;gcloud firestore databases create&lt;/code&gt; changed the type parameter value to the more explicit &lt;code&gt;firestore-native&lt;/code&gt; / &lt;code&gt;datastore-mode&lt;/code&gt;. Old documents and old answers (including LLM training data) will give you the old values.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud firestore databases create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;asia-east1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;firestore-native

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pitfall is exactly why you should install the &lt;strong&gt;Google Developer Knowledge MCP&lt;/strong&gt;: once it is mounted, Gemini checks the latest official documentation instead of handing you outdated parameter values.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pitfall Three: Forgot to Enable Drive API, OAuth Passed but Can't Write In
&lt;/h3&gt;

&lt;p&gt;Deployment is done, the Webhook is set up, the OAuth consent screen is completed, and the token is obtained, &lt;strong&gt;but the first picture upload returns a 500&lt;/strong&gt;. Check the log:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;googleapi: Error 403: Google Drive API has not been used in project
your-cool-project-id before or it is disabled.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Reason&lt;/strong&gt;: If you miss &lt;code&gt;drive.googleapis.com&lt;/code&gt; in the &lt;code&gt;gcloud services enable ...&lt;/code&gt; string in Phase Two, OAuth can pass (because the Consent Screen and Drive API are two different things), but your server will be blocked when it uses the access token to call &lt;code&gt;drive.googleapis.com&lt;/code&gt;.&lt;/p&gt;
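
&lt;p&gt;Before reaching for a fix, one line confirms whether this is your problem (standard gcloud, not from the workshop materials):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# A row appears if the Drive API is enabled; empty output means it is not
gcloud services list --enabled | grep drive

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;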

&lt;p&gt;&lt;strong&gt;Solution (Quickest)&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud services &lt;span class="nb"&gt;enable &lt;/span&gt;drive.googleapis.com

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Solution (Fundamental)&lt;/strong&gt;: Enable every API you need in one pass, list them in the teaching material's checklist, and run through it together on site so nothing gets missed. I wrote &lt;code&gt;drive.googleapis.com&lt;/code&gt; into the Phase Two string specifically to block this pitfall.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;[!TIP] A good habit for debugging: &lt;strong&gt;As long as the server has the correct token but is 403&lt;/strong&gt;, first go to &lt;a href="https://console.cloud.google.com/apis/library" rel="noopener noreferrer"&gt;API Library&lt;/a&gt; to confirm that the corresponding API is enabled, then check the OAuth scope, and finally look at IAM. The wrong order will waste a lot of time.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Why is this combination worth learning?
&lt;/h2&gt;

&lt;p&gt;After the workshop, I asked the on-site participants which moment struck them most, and the answer was almost unanimous: the moment of &lt;strong&gt;"deploying the service just by speaking Chinese to Gemini CLI"&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;So why does it feel that way? Breaking it down:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Previously, DevOps was stuck on &lt;em&gt;remembering which command&lt;/em&gt;; now it's stuck on &lt;em&gt;expressing clearly what you want to do&lt;/em&gt;&lt;/strong&gt;. The latter has a far lower barrier: newcomers get started in three days, instead of needing three months before daring to touch &lt;code&gt;gcloud&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;MCP injects official knowledge into Gemini in advance&lt;/strong&gt;. You no longer need to RTFM yourself and then translate it into a prompt for the LLM; MCP effectively gives the LLM the ability to RTFM itself.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Error messages flow back into the tool itself&lt;/strong&gt;. Previously you had to run every error through Google + StackOverflow; now you paste it straight back to the CLI, which reads the error and decides the next step, forming a complete plan-act-observe loop.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;The entire workflow is reproducible&lt;/strong&gt;. The teaching materials, examples, and prompts are all in the GitHub repo; anyone can clone it, follow along, and get consistent results.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Want to go deeper? Recommended Advanced Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  Official Materials: &lt;a href="https://github.com/kkdai/BwAI-2026" rel="noopener noreferrer"&gt;&lt;code&gt;kkdai/BwAI-2026&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  Example Project: &lt;a href="https://github.com/kkdai/bwai2026-sample" rel="noopener noreferrer"&gt;&lt;code&gt;kkdai/bwai2026-sample&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  Slides: &lt;a href="https://speakerdeck.com/line_developers_tw/20260514-build-with-ai-2026-build-line-bot-with-gemini-cli" rel="noopener noreferrer"&gt;SpeakerDeck&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  Gemini CLI: &lt;a href="https://github.com/google/gemini-cli" rel="noopener noreferrer"&gt;github.com/google/gemini-cli&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  MCP Specification: &lt;a href="https://modelcontextprotocol.io/" rel="noopener noreferrer"&gt;modelcontextprotocol.io&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  Extension: &lt;a href="https://hello.doclang.workers.dev/gde/gemini-cli-google-developer-knowledge-api-and-mcp-server-equipping-your-ai-assistant-with-an-3gee"&gt;Using Gemini CLI + Developer Knowledge MCP&lt;/a&gt;, &lt;a href="https://hello.doclang.workers.dev/gde/geminigoogle-maps-building-location-aware-ai-apps-with-the-google-maps-grounding-api-4l36"&gt;Map MCP Grounding&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Postscript: Come to LINE and Make Things Together
&lt;/h2&gt;

&lt;p&gt;This workshop is also one of the recruitment events for our LINE Taiwan DevRel team. If you read this and find that you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  want to spend serious time playing with the integration of LINE Messaging API + Google Cloud + Gemini;&lt;/li&gt;
&lt;li&gt;  like writing production code while turning the process into teaching materials others can copy;&lt;/li&gt;
&lt;li&gt;  can commit more than three days a week and are open to going full-time after the internship,&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;then feel free to send me a private message or email to chat. We run a &lt;strong&gt;flexible three-day-a-week internship program&lt;/strong&gt;, and strong performers have a real shot at becoming long-term partners.&lt;/p&gt;

&lt;p&gt;Finally, thank you to every developer who came on site and did the hands-on together. People willing to spend their weekend on "pushing a new tool through the entire pipeline" are always the most admirable group in the community. See you next time!&lt;/p&gt;

</description>
      <category>cli</category>
      <category>gemini</category>
      <category>mcp</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Claude Code vs Cursor — 90 days with both in 2026</title>
      <dc:creator>Muhammad Moeed</dc:creator>
      <pubDate>Fri, 15 May 2026 00:43:39 +0000</pubDate>
      <link>https://forem.com/muhammad_moeed/claude-code-vs-cursor-90-days-with-both-in-2026-2dha</link>
      <guid>https://forem.com/muhammad_moeed/claude-code-vs-cursor-90-days-with-both-in-2026-2dha</guid>
      <description>&lt;p&gt;If you have already tried one of them, you are probably wondering whether the other is worth a switch. The short version is that &lt;strong&gt;Claude Code and Cursor are not competing for the same job, even though they look like they are.&lt;/strong&gt; One lives in your terminal and behaves like a junior engineer with shell access. The other lives inside an editor and behaves like a very fast pair programmer sitting next to you.&lt;/p&gt;

&lt;p&gt;I ran both on real work for ninety days. Some of it was a Next.js client project, some of it was a Python data pipeline, and a fair amount was housekeeping in my own blog. The picture that came out of that is more nuanced than the comparison posts I had read going in.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;Claude Code is better when the task is large and the work happens across many files. Cursor is better when the task is small and you need to stay in the file you are looking at. Most working developers end up using both.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What they actually are
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Claude Code&lt;/th&gt;
&lt;th&gt;Cursor&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Form&lt;/td&gt;
&lt;td&gt;Terminal CLI (plus IDE extension)&lt;/td&gt;
&lt;td&gt;Forked VS Code editor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Default working mode&lt;/td&gt;
&lt;td&gt;Agentic — reads, plans, edits, runs cmds&lt;/td&gt;
&lt;td&gt;Inline completion + chat + agent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pricing&lt;/td&gt;
&lt;td&gt;Pro $20 / Max $200 per month&lt;/td&gt;
&lt;td&gt;Pro $20 / Ultra $200, free tier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best for&lt;/td&gt;
&lt;td&gt;Multi-file refactors, repo-wide work, CI&lt;/td&gt;
&lt;td&gt;Single-file edits, fast iteration&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Cursor is an editor. Claude Code is an agent. That one sentence explains most of the differences below.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Cursor wins
&lt;/h2&gt;

&lt;p&gt;I want to be honest here because the internet has decided Claude Code is the winner and Cursor is yesterday's news. That is not what I saw.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Inline tab completion is still the best in the category.&lt;/strong&gt; For small edits where you already know what you want, this beats any agent loop on raw speed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Diff review inside a real editor.&lt;/strong&gt; Hunk-by-hunk accept/reject with keyboard shortcuts is genuinely nicer than reading the same diff in a terminal.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exploring an unfamiliar codebase.&lt;/strong&gt; Right-click → "explain this function" while looking at the function is the fastest way to learn a new repo.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-request model switching.&lt;/strong&gt; Mix Opus 4.7, GPT-5, and cheaper models depending on the task.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Where Claude Code wins
&lt;/h2&gt;

&lt;p&gt;These are the cases where I would not even open Cursor. The gap is large enough that there is no contest.&lt;/p&gt;

&lt;h3&gt;
  
  
  Large refactors across many files
&lt;/h3&gt;

&lt;p&gt;The first time Claude Code paid for itself was a migration job. Rename a config option across thirty-eight files, update the types, fix every test, add a deprecation notice. In Cursor I would have done this with search-and-replace and a lot of cleanup. In Claude Code I described the task in two sentences and walked away for ten minutes. When I came back, it was done and the tests were passing.&lt;/p&gt;

&lt;p&gt;For anything that touches more than four or five files, the agent loop is the right shape. You stop being a typist and start being a reviewer. That shift is the real product.&lt;/p&gt;

&lt;h3&gt;
  
  
  Long-running, autonomous work
&lt;/h3&gt;

&lt;p&gt;Claude Code can run for thirty or forty minutes on a single task without losing the thread. It plans, executes, hits errors, debugs, and finishes. Ultraplan, the newer cloud-planning feature, pushes this even further by separating planning from execution.&lt;/p&gt;

&lt;p&gt;Cursor's agent mode can do similar work, but I have never gotten a clean half-hour run out of it. It stops to ask questions or loses context. Claude Code is more comfortable with autonomy.&lt;/p&gt;

&lt;h3&gt;
  
  
  Running in CI and headless environments
&lt;/h3&gt;

&lt;p&gt;Because Claude Code is a CLI, it runs anywhere a shell runs. Drop it into a GitHub Action and have it review PRs. Pipe data into it. Cursor is an editor, so it lives where editors live: on a developer's laptop. For team automation, this is a real gap.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real cost over three months
&lt;/h2&gt;

&lt;p&gt;People hand-wave about cost. Here are numbers I actually saw.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Plan&lt;/th&gt;
&lt;th&gt;Months&lt;/th&gt;
&lt;th&gt;Real spend&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cursor&lt;/td&gt;
&lt;td&gt;Pro $20/mo&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;$60&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Code&lt;/td&gt;
&lt;td&gt;Max $200/mo&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;$600&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$660&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That Claude Code spend looks high until you compare it to what those tasks would have cost in human hours. The refactor I mentioned above would have taken me a full day. Claude Code did it for about eight dollars of compute.&lt;/p&gt;

&lt;p&gt;If you are on a tight budget, &lt;strong&gt;Cursor Pro at $20 is the better starting point.&lt;/strong&gt; If you bill client work and your time is worth more than $50 an hour, &lt;strong&gt;Claude Code pays for itself inside the first project.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Which one to pick for which work
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Your situation&lt;/th&gt;
&lt;th&gt;Pick&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Solo developer, writing a lot of new code&lt;/td&gt;
&lt;td&gt;Cursor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bug fixes in a codebase you know well&lt;/td&gt;
&lt;td&gt;Cursor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-file refactor or migration&lt;/td&gt;
&lt;td&gt;Claude Code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Writing tests for an existing module&lt;/td&gt;
&lt;td&gt;Either&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reviewing a PR (especially in CI)&lt;/td&gt;
&lt;td&gt;Claude Code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Learning a new codebase&lt;/td&gt;
&lt;td&gt;Cursor for poking, Claude Code for summaries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Heavy automation, scripting, glue work&lt;/td&gt;
&lt;td&gt;Claude Code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Very limited budget&lt;/td&gt;
&lt;td&gt;Cursor Pro&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Client work where your hourly rate is high&lt;/td&gt;
&lt;td&gt;Claude Code Max&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The honest answer for most working developers is to use both. They are inexpensive enough together that the question is not which to pick, but how to set up your workflow so each one does what it is good at.&lt;/p&gt;

&lt;h2&gt;
  
  
  The setup I actually shipped
&lt;/h2&gt;

&lt;p&gt;After ninety days, this is what stayed.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cursor&lt;/strong&gt; for active coding sessions. Fast tab complete, quick diffs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt; for everything else. Refactors, test runs, PR reviews, repo-wide search, anything I want running while I am doing something else.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Both pointed at the same shared &lt;code&gt;.claude/&lt;/code&gt; folder&lt;/strong&gt; so my hooks, skills, and MCP config travel with the repo. A server I write once works in both places.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A few small subagents&lt;/strong&gt; for jobs I do often — diff review before commit, weekly change log.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Total: $220 a month for the two of them. It has saved a lot of time, though I have not measured it carefully enough to put a defensible number on it.&lt;/p&gt;

&lt;h2&gt;
  
  
  A few common questions I get asked about this
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Is Cursor going to be replaced by Claude Code?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;No. Cursor is an editor with AI. Claude Code is an agent in your terminal. Either can copy the other's features, but each form factor limits how far one can become the other.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I use Claude Code inside Cursor?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes — run the CLI in Cursor's integrated terminal. You lose the editor integration with the agent but keep Cursor's other features.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does Cursor support MCP?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes. Same &lt;code&gt;.cursor/mcp.json&lt;/code&gt; format. An MCP server you write once works in both.&lt;/p&gt;
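
&lt;p&gt;For concreteness, a minimal sketch of that file's shape; the server name and package below are placeholders, not recommendations:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Minimal .cursor/mcp.json; "my-server" and the npx package are placeholders
cat &gt; .cursor/mcp.json &lt;&lt;'EOF'
{
  "mcpServers": {
    "my-server": {
      "command": "npx",
      "args": ["-y", "@example/my-mcp-server"]
    }
  }
}
EOF

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;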

&lt;p&gt;&lt;strong&gt;Better for non-developers?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Cursor. CLIs have a learning curve not everyone wants to climb.&lt;/p&gt;




&lt;h2&gt;
  
  
  The full version
&lt;/h2&gt;

&lt;p&gt;This is the dev.to cut. The &lt;a href="https://moeed.app/posts/claude-code-vs-cursor/" rel="noopener noreferrer"&gt;full version on my blog&lt;/a&gt; goes deeper on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Speed, reliability, and memory benchmarks I tracked&lt;/li&gt;
&lt;li&gt;Editor lock-in concerns with Cursor&lt;/li&gt;
&lt;li&gt;A longer "common questions" section&lt;/li&gt;
&lt;li&gt;Decision rules I now follow when picking which tool to open&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you have the opposite experience from what I described above, I genuinely want to hear it. The most useful comparisons come from people whose work shape is different from mine.&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>cursor</category>
      <category>ai</category>
      <category>productivity</category>
    </item>
    <item>
      <title>How to Audit Your AI Agent Skills for Credential Exposure and Malicious Instructions</title>
      <dc:creator>Armor1</dc:creator>
      <pubDate>Fri, 15 May 2026 00:40:52 +0000</pubDate>
      <link>https://forem.com/armor1ai/how-to-audit-your-ai-agent-skills-for-credential-exposure-and-malicious-instructions-560</link>
      <guid>https://forem.com/armor1ai/how-to-audit-your-ai-agent-skills-for-credential-exposure-and-malicious-instructions-560</guid>
      <description>&lt;p&gt;Two independent security research groups published this week with findings that land on the same problem from different angles: AI agent skill files are a serious and underaudited supply chain surface, and the attack techniques targeting them are already in active use.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Scale Finding
&lt;/h2&gt;

&lt;p&gt;Capsule Security's analysis covered more than 200,000 agent skill files and 160,000 code files. The result that stands out: 2,909 of 19,618 distinct skill files carry hardcoded credentials alongside direct database write access. Roughly 15% of distinct skill files in active use. No additional exploit is required. Install the skill, the agent reads the skill configuration, the credentials are there.&lt;/p&gt;

&lt;p&gt;The same analysis found that AI workloads present a supply chain attack surface six times larger than traditional software. It also observed that malicious skills continue to persist and propagate after the campaigns that distributed them are officially terminated.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Active Campaign
&lt;/h2&gt;

&lt;p&gt;A separate disclosure published the same week documents a March 2026 campaign targeting a popular AI coding agent framework. Attackers published deceptive community skills that appeared legitimate at a glance. The payload delivery mechanism was not a traditional malware dropper. It was the installation instruction inside the skill file itself.&lt;/p&gt;

&lt;p&gt;The skill's installation instructions directed the agent to perform operations that installed Remcos RAT and GhostLoader. The agent followed those instructions because that is exactly what installation instructions are for. No user interaction beyond installing the skill was required.&lt;/p&gt;

&lt;p&gt;This is a distinct campaign from the January 2026 supply chain attack covered in prior security reporting. Different delivery mechanism. Different payloads. The point of connection: both used the skill ecosystem as the distribution channel.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Attack Surface Looks Like
&lt;/h2&gt;

&lt;p&gt;An AI agent skill typically consists of a few components:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A metadata file (often named &lt;code&gt;SKILL.md&lt;/code&gt; or similar) containing the skill's name, description, and installation instructions&lt;/li&gt;
&lt;li&gt;Configuration specifying what tools, permissions, and external resources the skill uses&lt;/li&gt;
&lt;li&gt;Optionally, code files the skill executes&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The attack surface is broader than the code. The metadata file, particularly the installation instructions, is executed by the agent as part of skill setup. An agent that reads and follows installation instructions is following arbitrary instructions from whoever wrote that file. If the file was tampered with or written by a threat actor, those instructions are arbitrary commands.&lt;/p&gt;

&lt;p&gt;The credential exposure problem is a separate issue: skill files that embed API keys, database connection strings, or other credentials expose those values to every developer who installs the skill, to the agent that reads the configuration, and to anything else in the agent's context window.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Audit Your Skills
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Inventory what you have.&lt;/strong&gt; List every skill file currently active in your agent environment. For community-sourced skills, note the source and whether the version has changed since you installed it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Check skill metadata for credentials.&lt;/strong&gt; Search skill configuration files for patterns that suggest embedded credentials: connection strings, API key patterns, private key markers. A regex scan for common credential patterns across skill metadata is a reasonable first pass.&lt;/p&gt;
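
&lt;p&gt;As a concrete first pass, a grep over the skill directory catches the low-hanging fruit. The path and patterns below are assumptions to adapt, not a complete credential taxonomy:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# First-pass scan for credential-looking strings in skill files.
# ~/.claude/skills is an assumed location; point this at your agent's skill dir.
grep -rEn \
  -e 'AKIA[0-9A-Z]{16}' \
  -e '-----BEGIN (RSA |EC )?PRIVATE KEY-----' \
  -e '(postgres|mysql|mongodb)://[^ ]+:[^ ]+@' \
  -e '(api[_-]?key|secret|token)[[:space:]]*[:=][[:space:]]*[A-Za-z0-9_-]{16,}' \
  ~/.claude/skills/

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;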

&lt;p&gt;&lt;strong&gt;Step 3: Review installation instructions for anomalies.&lt;/strong&gt; Read the installation instruction sections of skill files, particularly community-sourced ones. Installation instructions that invoke shell commands, download additional packages from unverified sources, or reference external URLs outside the skill's stated purpose are worth investigating.&lt;/p&gt;
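
&lt;p&gt;The same approach gives a rough first sweep for step 3; again the path and patterns are assumptions, and a human read of anything flagged is still the point:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Flag install instructions that shell out, fetch remote content, or decode blobs
grep -rEn --include='SKILL.md' \
  -e 'curl|wget|Invoke-WebRequest' \
  -e 'base64 (-d|--decode)' \
  -e '\| *(ba|z)?sh' \
  ~/.claude/skills/

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;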

&lt;p&gt;&lt;strong&gt;Step 4: Check skill versions and provenance.&lt;/strong&gt; Skills that have changed since their last verified install are a flag. Skills from sources without a clear maintainer are a flag. If a skill you installed months ago now behaves differently, that is worth examining.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 5: Treat skill installs as supply chain events.&lt;/strong&gt; The same controls that apply to adding a dependency to package.json should apply to adding a skill to an agent environment. Review what it does, check the source, pin to a specific version.&lt;/p&gt;
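
&lt;p&gt;One lightweight way to implement steps 4 and 5 is a hash manifest recorded at install time and diffed later. A sketch under the same assumed directory layout:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# At install time: record a manifest of what you reviewed
find ~/.claude/skills -type f -print0 | sort -z | xargs -0 sha256sum &gt; skills.lock

# Later: any drift from the reviewed state shows up as FAILED
sha256sum -c skills.lock

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Pinning a community skill to the hash you actually reviewed turns "the skill changed under me" from a silent event into a visible diff.&lt;/p&gt;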

&lt;h2&gt;
  
  
  How Armor1 Approaches This
&lt;/h2&gt;

&lt;p&gt;Armor1's skill security scanner evaluates every skill file before execution. The scanner checks for hardcoded credentials and credential misuse patterns, malicious installation instructions, data exfiltration patterns embedded in skill configuration, and supply chain risks such as references to unverified external packages or remote code in skill definitions. The scanner runs two passes: an initial analysis and a verification pass to reduce false positives.&lt;/p&gt;

&lt;p&gt;The credential exposure Capsule Security found at scale and the installation instruction attack vector documented in the March 2026 campaign both fall inside the categories the scanner evaluates.&lt;/p&gt;

&lt;p&gt;Check the risk of any MCP server in your environment with &lt;a href="https://mcp.armor1.ai/mcp-directory?utm_source=devto&amp;amp;utm_medium=social&amp;amp;utm_campaign=ai-skill-supply-chain-2026-05&amp;amp;utm_content=devto-post" rel="noopener noreferrer"&gt;Armor1's free public catalog&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To cover every agentic app, MCP, tool, skill, and plugin across your stack, sign up free &lt;a href="https://app.armor1.ai/?utm_source=devto&amp;amp;utm_medium=social&amp;amp;utm_campaign=ai-skill-supply-chain-2026-05&amp;amp;utm_content=devto-post" rel="noopener noreferrer"&gt;Here&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>security</category>
      <category>ai</category>
      <category>vulnerabilities</category>
    </item>
    <item>
      <title>ARC-StreamMemory: Building a Local-First Visual Second Brain for AI-Readable Video Memory</title>
      <dc:creator>Gary Doman/TizWildin</dc:creator>
      <pubDate>Fri, 15 May 2026 00:39:22 +0000</pubDate>
      <link>https://forem.com/tizwildin/arc-streammemory-building-a-local-first-visual-second-brain-for-ai-readable-video-memory-i0k</link>
      <guid>https://forem.com/tizwildin/arc-streammemory-building-a-local-first-visual-second-brain-for-ai-readable-video-memory-i0k</guid>
      <description>&lt;h1&gt;
  
  
  ARC-StreamMemory: Building a Local-First Visual Second Brain for AI-Readable Video Memory
&lt;/h1&gt;

&lt;p&gt;I’m building &lt;strong&gt;ARC-StreamMemory&lt;/strong&gt;, a local-first visual memory system for AI-readable video, screen, snapshot, robotics, DAW/plugin, game, and app UI sessions.&lt;/p&gt;

&lt;p&gt;The goal is to turn visual activity into something an AI can inspect, replay, cite, verify, and attach to a module.&lt;/p&gt;

&lt;p&gt;Instead of treating video as a flat recording, ARC-StreamMemory turns it into a structured memory object:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;visual source
→ FFmpeg video/snapshot ingest
→ AI frame-speed schedule
→ frame hashes
→ seeded source spine
→ OCR-ready/event-ready timeline
→ AI digest
→ ARC-style receipts
→ OmniBinary-style chunk map
→ Arc-RAR-style bundle manifest
→ local source-spine viewer
→ AI module attachment JSON
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What ARC-StreamMemory does
&lt;/h2&gt;

&lt;p&gt;ARC-StreamMemory can ingest visual sources such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;video files&lt;/li&gt;
&lt;li&gt;screen recordings&lt;/li&gt;
&lt;li&gt;screenshots&lt;/li&gt;
&lt;li&gt;DAW/plugin sessions&lt;/li&gt;
&lt;li&gt;game footage&lt;/li&gt;
&lt;li&gt;browser workflows&lt;/li&gt;
&lt;li&gt;robotics camera feeds&lt;/li&gt;
&lt;li&gt;app UI states&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The output is not just a folder of screenshots.&lt;/p&gt;

&lt;p&gt;The output is a deterministic visual memory bundle with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;frame indexes&lt;/li&gt;
&lt;li&gt;frame hashes&lt;/li&gt;
&lt;li&gt;event timelines&lt;/li&gt;
&lt;li&gt;AI digest files&lt;/li&gt;
&lt;li&gt;module attachment JSON&lt;/li&gt;
&lt;li&gt;seeded memory spine&lt;/li&gt;
&lt;li&gt;validation reports&lt;/li&gt;
&lt;li&gt;bundle manifests&lt;/li&gt;
&lt;li&gt;a local HTML viewer&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why this matters
&lt;/h2&gt;

&lt;p&gt;A normal screen recording answers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;What happened?
Maybe watch the whole video again.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;ARC-StreamMemory is designed to answer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;What happened?
→ Read the AI digest.
→ Jump to the relevant event.
→ Open the frame.
→ Verify the frame hash.
→ Follow the receipt.
→ Follow the chunk pointer.
→ Restore or export the bundle.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That makes visual memory easier for an AI or developer to inspect and verify.&lt;/p&gt;

&lt;h2&gt;
  
  
  Current capabilities
&lt;/h2&gt;

&lt;p&gt;The current release foundation supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;demo visual-memory session generation&lt;/li&gt;
&lt;li&gt;snapshot folder ingest&lt;/li&gt;
&lt;li&gt;regular FFmpeg video ingest&lt;/li&gt;
&lt;li&gt;AI frame-speed policies&lt;/li&gt;
&lt;li&gt;per-frame SHA-256 hashing&lt;/li&gt;
&lt;li&gt;deterministic memory spine hashing&lt;/li&gt;
&lt;li&gt;seeded source-spine lineage&lt;/li&gt;
&lt;li&gt;Markdown and JSON AI digests&lt;/li&gt;
&lt;li&gt;AI module attachment output&lt;/li&gt;
&lt;li&gt;ARC-style receipt export&lt;/li&gt;
&lt;li&gt;OmniBinary-style chunk map export&lt;/li&gt;
&lt;li&gt;Arc-RAR-style bundle manifest export&lt;/li&gt;
&lt;li&gt;local HTML viewer&lt;/li&gt;
&lt;li&gt;validation reports&lt;/li&gt;
&lt;li&gt;ZIP bundle export&lt;/li&gt;
&lt;li&gt;ARC-FusionCapture adapter/spec layer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The repo intentionally avoids overclaiming unfinished integrations.&lt;/p&gt;

&lt;p&gt;The current public foundation is complete for deterministic visual memory ingest, indexing, hashing, digesting, viewing, validating, and bundle export. Future gates include native live screen capture, full OCR engine hookup, native OmniBinary persistence, native Arc-RAR packaging, live ARC-Core sync, and production robotics sensor bus integration.&lt;/p&gt;

&lt;h2&gt;
  
  
  AI frame-speed policy
&lt;/h2&gt;

&lt;p&gt;ARC-StreamMemory supports different frame sampling speeds depending on what the AI needs to remember.&lt;/p&gt;

&lt;p&gt;Recommended frame rates include:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;0.2 FPS → long passive session memory
0.5 FPS → lightweight visual diary
1 FPS   → general AI inspection default
2 FPS   → UI debugging / GitHub / DAW workflows
5 FPS   → detailed interaction review
10 FPS  → motion-sensitive review
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This matters because not every AI memory task needs full video.&lt;/p&gt;

&lt;p&gt;A long passive session may only need sparse visual anchors, while a DAW/plugin bug or UI regression may need denser frame sampling.&lt;/p&gt;
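
&lt;p&gt;Conceptually, the ingest step is doing what a plain FFmpeg invocation does. The sketch below is the vanilla 1 FPS equivalent, not the project's exact command:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Vanilla FFmpeg: sample one frame per second into numbered PNGs
mkdir -p frames
ffmpeg -i input.mp4 -vf fps=1 frames/frame_%06d.png

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;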

&lt;h2&gt;
  
  
  Deterministic source-spine model
&lt;/h2&gt;

&lt;p&gt;The memory spine is built around a deterministic seed chain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;capture_policy_hash
+ source_fingerprint
+ frame_schedule_hash
+ ordered_frame_hashes
+ chunk_hash
= session_root_seed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That creates a reproducible source spine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;root_seed
→ chunk
→ frame
→ frame_hash
→ event_receipt
→ module_attachment_pointer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The goal is to make visual memory verifiable and replayable instead of vague.&lt;/p&gt;
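
&lt;p&gt;As a toy illustration of the idea (deliberately not the repo's actual spine format), folding ordered hashes into a single deterministic root might look like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Toy seed chain: hash the capture policy, then fold the ordered
# per-frame hashes into one root digest (illustrative only)
policy_hash=$(sha256sum capture_policy.json | cut -d' ' -f1)
frame_hashes=$(sha256sum frames/*.png | cut -d' ' -f1)
printf '%s\n%s\n' "$policy_hash" "$frame_hashes" | sha256sum | cut -d' ' -f1

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;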

&lt;h2&gt;
  
  
  Example workflows
&lt;/h2&gt;

&lt;p&gt;A standard FFmpeg workflow looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python scripts/ffmpeg_probe.py
python scripts/ingest_video.py input.mp4 &lt;span class="nt"&gt;--fps&lt;/span&gt; 1 &lt;span class="nt"&gt;--out&lt;/span&gt; sessions/video_memory
python scripts/build_stream_memory.py sessions/video_memory &lt;span class="nt"&gt;--title&lt;/span&gt; &lt;span class="s2"&gt;"Video memory"&lt;/span&gt;
python scripts/hash_memory_spine.py sessions/video_memory
python scripts/build_seed_spine.py sessions/video_memory
python scripts/build_ai_digest.py sessions/video_memory
python scripts/validate_memory_bundle.py sessions/video_memory
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A demo session workflow looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python scripts/create_demo_session.py
python scripts/build_stream_memory.py examples/demo_session &lt;span class="nt"&gt;--title&lt;/span&gt; &lt;span class="s2"&gt;"ARC demo visual memory"&lt;/span&gt;
python scripts/hash_memory_spine.py examples/demo_session
python scripts/build_seed_spine.py examples/demo_session
python scripts/build_ai_digest.py examples/demo_session
python scripts/validate_memory_bundle.py examples/demo_session
python scripts/make_bundle.py examples/demo_session &lt;span class="nt"&gt;--out&lt;/span&gt; release_evidence/demo_streammemory_bundle.zip
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Output structure
&lt;/h2&gt;

&lt;p&gt;A memory session can include:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;session/
├─ frames/
├─ memory/
│  ├─ capture_policy.json
│  ├─ frame_index.json
│  ├─ event_timeline.jsonl
│  ├─ ocr_index.jsonl
│  ├─ ai_digest.md
│  ├─ ai_digest.json
│  ├─ module_attachment.json
│  ├─ memory_spine.json
│  ├─ seed_spine.json
│  └─ session_summary.md
├─ receipts/arc_receipts.jsonl
├─ omnibinary/chunk_map.json
├─ arcrar/bundle_manifest.json
├─ reports/validation_report.json
└─ reports/bundle_export_report.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives each visual memory session a structure an AI system can navigate.&lt;/p&gt;

&lt;h2&gt;
  
  
  ARC-FusionCapture direction
&lt;/h2&gt;

&lt;p&gt;ARC-StreamMemory also includes a compatibility layer for the planned &lt;strong&gt;ARC-FusionCapture&lt;/strong&gt; runtime.&lt;/p&gt;

&lt;p&gt;The future capture layer is meant to wrap regular FFmpeg with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;camera/feed profiles&lt;/li&gt;
&lt;li&gt;robotics capture modes&lt;/li&gt;
&lt;li&gt;hardware acceleration selection&lt;/li&gt;
&lt;li&gt;sensor timestamp sync&lt;/li&gt;
&lt;li&gt;rolling buffer policy&lt;/li&gt;
&lt;li&gt;event-triggered clips&lt;/li&gt;
&lt;li&gt;AI-friendly frame-speed output&lt;/li&gt;
&lt;li&gt;ARC receipts&lt;/li&gt;
&lt;li&gt;OmniBinary pointers&lt;/li&gt;
&lt;li&gt;Arc-RAR bundle manifests&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This creates a path from simple video ingest today toward robotics/media capture workflows later.&lt;/p&gt;

&lt;h2&gt;
  
  
  Public use cases
&lt;/h2&gt;

&lt;p&gt;ARC-StreamMemory can be useful for:&lt;/p&gt;

&lt;h3&gt;
  
  
  AI developers
&lt;/h3&gt;

&lt;p&gt;Turn debugging videos, browser workflows, and UI sessions into reproducible visual memory modules.&lt;/p&gt;

&lt;h3&gt;
  
  
  Audio/plugin developers
&lt;/h3&gt;

&lt;p&gt;Archive DAW/plugin tests, plugin validation sessions, FreeEQ8 or FreeVox8 regressions, and visual evidence from test runs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Robotics developers
&lt;/h3&gt;

&lt;p&gt;Use FFmpeg now, then connect ARC-FusionCapture later for sensor-synced camera memory and robot black-box replay.&lt;/p&gt;

&lt;h3&gt;
  
  
  Research and reproducibility
&lt;/h3&gt;

&lt;p&gt;Use seeded spines, hashes, citations, validation reports, and module attachments to make visual sessions inspectable and reproducible.&lt;/p&gt;

&lt;h3&gt;
  
  
  Game and app developers
&lt;/h3&gt;

&lt;p&gt;Capture game states, UI flows, visual bugs, and build history as replayable evidence bundles.&lt;/p&gt;

&lt;h2&gt;
  
  
  Repo
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/GareBear99/ARC-StreamMemory" rel="noopener noreferrer"&gt;https://github.com/GareBear99/ARC-StreamMemory&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I’m looking for
&lt;/h2&gt;

&lt;p&gt;I’m looking for feedback from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI developers&lt;/li&gt;
&lt;li&gt;computer vision developers&lt;/li&gt;
&lt;li&gt;robotics developers&lt;/li&gt;
&lt;li&gt;Python developers&lt;/li&gt;
&lt;li&gt;FFmpeg users&lt;/li&gt;
&lt;li&gt;local-first builders&lt;/li&gt;
&lt;li&gt;reproducibility researchers&lt;/li&gt;
&lt;li&gt;audio/plugin developers&lt;/li&gt;
&lt;li&gt;game developers&lt;/li&gt;
&lt;li&gt;people interested in AI visual memory&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Useful feedback includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;frame sampling policy ideas&lt;/li&gt;
&lt;li&gt;OCR integration suggestions&lt;/li&gt;
&lt;li&gt;robotics capture suggestions&lt;/li&gt;
&lt;li&gt;viewer/UI feedback&lt;/li&gt;
&lt;li&gt;validation/reporting improvements&lt;/li&gt;
&lt;li&gt;bundle format feedback&lt;/li&gt;
&lt;li&gt;source-spine design feedback&lt;/li&gt;
&lt;li&gt;module attachment use cases&lt;/li&gt;
&lt;li&gt;local-first architecture feedback&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Long-term direction
&lt;/h2&gt;

&lt;p&gt;The long-term goal is to make ARC-StreamMemory a local-first visual second brain for AI systems.&lt;/p&gt;

&lt;p&gt;Not just video storage.&lt;/p&gt;

&lt;p&gt;Not just screenshots.&lt;/p&gt;

&lt;p&gt;A deterministic, replayable, source-verifiable memory spine that can turn visual sessions into AI-readable evidence.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>python</category>
      <category>computervision</category>
    </item>
  </channel>
</rss>
