DEV Community: Bharath Kumar

The Silent Bug: How a DOM Click Target Issue Was Breaking Formbricks Surveys

Bharath Kumar — Sun, 19 Apr 2026 08:18:02 +0000

Here's something that will frustrate you once you see it.
You set up a Formbricks survey trigger. Configure it to fire when a user clicks .submit-btn. Deploy it. Test it yourself — works perfectly. Ship it.
Then nothing happens. Zero surveys triggered. No errors. No warnings. Just silence.
That's the bug I fixed in PR #7327. And the reason it's interesting isn't the fix itself — it's what it taught me about how SDKs fail in the real world.

What Was Actually Breaking
The Formbricks JS SDK lets you trigger surveys based on user actions — including CSS selector click actions. You tell it "when someone clicks .feedback-btn, show this survey."
The SDK listened for click events and checked if the clicked element matched your selector:
```typescript
if (!targetElement.matches(".feedback-btn")) {
  return false // action dropped, survey never shows
}
```
Looks fine. Works fine — until your button has any content inside it.
```html
<button class="feedback-btn">
  <svg>...</svg>
  <span>Give Feedback</span>
</button>
```

Now when a user clicks the SVG icon inside the button, event.target is the <svg>, not the .feedback-btn. The .matches() check runs against the SVG. It returns false. The survey is dropped silently.
The only way to trigger the survey was to click the exact 1-2px padding of the button where no child element exists. Which nobody does.

Why Nobody Reported It Directly
This is the part that stuck with me.
The bug had almost certainly been there for a while. But nobody filed an issue saying "event.target doesn't match the selector for nested elements." They filed issues saying "the survey trigger doesn't work reliably" or "only fires sometimes." They assumed it was a configuration problem and gave up.
The bug was invisible because it failed silently. No console error. No warning. Just... nothing.
This is a classic SDK failure mode — the kind that's hard to debug because the feedback loop is broken. The user did everything right. The SDK said nothing. The survey never showed.

How Common Was This Really?
Extremely common. This affects virtually every real-world button.
Modern design systems — shadcn/ui, Radix UI, MUI, Headless UI — almost always put content inside buttons. Icon buttons. Buttons with text wrappers. Buttons with badges. Every single one of these would silently fail with the old behavior.
When I demonstrated the reproduction to Dhruwang:

Click the SVG → Survey does not trigger ❌
Click the text → Survey does not trigger ❌
Click the 1-2px button edge → Survey triggers ✅

His response: "Looks good 🚀" — merged.

The Fix: .closest() as a Fallback
The solution is a DOM method called .closest(). Starting with the clicked element itself, it walks up the DOM tree and returns the first element that matches the selector, or null if nothing does.
```typescript
// Before: only checks the exact clicked element
if (!targetElement.matches(cssSelector)) return false

// After: falls back to checking ancestors
let matchedElement: Element = targetElement
const matchesDirectly = targetElement.matches(cssSelector)

if (!matchesDirectly) {
  const ancestor = targetElement.closest(cssSelector)
  if (!ancestor) return false
  matchedElement = ancestor // use the button, not the SVG
}
```
When the user clicks the SVG icon, .closest(".feedback-btn") walks up the DOM, finds the parent button, and returns it. The survey fires correctly.
Performance note: .closest() is only called as a fallback. If the direct match succeeds — which it does for simple elements — the code takes the same fast path as before. No regression for the common case.

What This Taught Me About SDK Design
Three things that I keep coming back to:

Silent failures are worse than loud failures. An error in the console is annoying. A survey that silently never fires is a support ticket three weeks later when the customer asks why they have zero responses. SDKs that fail silently destroy trust slowly. When the SDK can't do what it was configured to do, it should say so.
The gap between "works in testing" and "works in production" is the DOM. In testing you click the button. In production users click whatever their cursor lands on — which is almost always a child element. The SDK has to handle the messy reality of how people actually interact with interfaces, not the clean version you test with.
Event delegation is harder than it looks. event.target gives you the most specific element that was clicked. That's often not the element you care about. Any SDK that listens to click events and matches CSS selectors needs to account for this — otherwise it breaks on every button with an icon.
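One way to apply this in an SDK is to register a single click listener and resolve every configured selector through .closest(). The sketch below is illustrative, not Formbricks code: ClickableNode, SelectorAction, and createClickDispatcher are hypothetical names, and the structural ClickableNode type stands in for the DOM Element so the matching logic can also run outside a browser.

```typescript
// Minimal structural stand-in for the two DOM Element methods used here.
// A real DOM Element provides both.
interface ClickableNode {
  matches(selector: string): boolean
  closest(selector: string): ClickableNode | null
}

type SelectorAction = (matched: ClickableNode) => void

// Hypothetical dispatcher: one listener, many selector actions. Each action
// fires if its selector matches the click target or any of its ancestors.
function createClickDispatcher(actions: Map<string, SelectorAction>) {
  return (target: ClickableNode | null): number => {
    if (!target) return 0
    const node = target // non-null from here on, also inside the callback
    let fired = 0
    actions.forEach((action, selector) => {
      // closest() tests the node itself first, then walks up its ancestors
      const matched = node.closest(selector)
      if (matched) {
        action(matched)
        fired++
      }
    })
    return fired
  }
}

// Browser wiring (sketch): narrow event.target to an Element before dispatching.
// const dispatch = createClickDispatcher(actions)
// document.addEventListener("click", (event) => {
//   if (event.target instanceof Element) dispatch(event.target)
// })
```

Returning the number of fired actions gives tests, and any debug logging, something concrete to check.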

The Regression Tests
I added three tests. The first fails on the old code and passes on the new; the other two lock in behavior the fix must preserve:
✅ Clicking a child inside .my-btn → action fires correctly
✅ Clicking an element with no matching ancestor → correctly returns false
✅ Clicking the target directly → .closest() is not called (fast path preserved)
The third test matters. It confirms the fix doesn't slow down the common case. .closest() is only invoked when the direct match fails.
232 tests. 19 files. All passing.
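The third check is the subtle one, because it asserts that something did not happen. A framework-free sketch of all three, assuming a fake element that only understands single-class selectors and counts its own .closest() calls; FakeElement and resolveClickTarget are hypothetical stand-ins, not the actual Formbricks tests.

```typescript
// Hypothetical fake element: supports only ".class" selectors and counts
// how often closest() is called, acting as a spy.
class FakeElement {
  className: string
  parent: FakeElement | null
  closestCalls = 0

  constructor(className: string, parent: FakeElement | null = null) {
    this.className = className
    this.parent = parent
  }

  matches(selector: string): boolean {
    return selector === `.${this.className}`
  }

  closest(selector: string): FakeElement | null {
    this.closestCalls++
    // Like the DOM method: test this element first, then each ancestor
    for (let el: FakeElement | null = this; el; el = el.parent) {
      if (el.matches(selector)) return el
    }
    return null
  }
}

// Matching logic from the fix: direct match first, ancestor fallback second.
function resolveClickTarget(target: FakeElement, selector: string): FakeElement | null {
  if (target.matches(selector)) return target // fast path, no closest()
  return target.closest(selector)
}

const button = new FakeElement("my-btn")
const icon = new FakeElement("icon", button)
const stray = new FakeElement("stray")

// 1. Clicking a child inside .my-btn resolves to the button
console.log(resolveClickTarget(icon, ".my-btn") === button) // true
// 2. No matching ancestor resolves to null
console.log(resolveClickTarget(stray, ".my-btn")) // null
// 3. Clicking the button directly never calls closest()
resolveClickTarget(button, ".my-btn")
console.log(button.closestCalls) // 0
```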

Why I Picked This Up
I was exploring the Formbricks codebase looking for reliability gaps — places where the SDK could fail silently without the developer knowing. This was one of the clearest examples I found.
The issue (#7314) had been sitting open. The reproduction wasn't obvious unless you thought about how click events actually propagate through the DOM. Once I understood it, the fix was clear.
That's usually how it goes with SDK bugs. Understanding the problem takes 90% of the time. Writing the fix takes 10%.

Links

PR #7327: https://github.com/formbricks/formbricks/pull/7327
My GitHub: https://github.com/bharathkumar39293
WebhookDrop (another project in this space): https://web-hook-drop-t4k6.vercel.app

I Fixed a DoS Vulnerability in Formbricks — and Added a Second Layer Nobody Asked For

Bharath Kumar — Thu, 09 Apr 2026 11:10:45 +0000

A story about picking up a security issue, going beyond the spec, and what defense-in-depth actually means in practice

The issue

Someone opened a GitHub issue on Formbricks pointing out that the userId parameter in the SDK had no length validation. Next.js's 4MB default body limit was the only thing standing between a bad actor and the server.

The fix suggested was straightforward: add .max(255) to the Zod schema. That's it.

I picked it up the same day. But as I dug in, I realized the schema fix alone wasn't enough.

Why 255?

Before writing a single line, I thought about what userIds actually look like in production:

UUIDs: 36 characters
Emails (RFC 5321 max): 254 characters
Custom IDs: typically tens to hundreds of characters

255 covers everything real. It rejects everything abusive. The number isn't arbitrary — it's the smallest limit that breaks nothing legitimate.

The schema fix (Layer 1)

The issue pointed at one schema. I found four that needed fixing:

```typescript
// packages/types/displays.ts
userId: z.string().max(255, {
  message: "User ID cannot exceed 255 characters"
}).optional()

// packages/types/js.ts
userId: z.string().max(255)  // ZJsUserIdentifyInput
userId: z.string().max(255)  // ZJsPersonSyncParams
```

This validates at the API boundary — if an oversized userId reaches the server, it gets rejected before touching the database.

But here's what bothered me: the payload still travels over the network first.

The SDK guard (Layer 2)

The Formbricks JS SDK runs in the browser. setUserId() is called client-side. If I only validate on the server, a 4MB string still gets serialized, sent over the network, and processed by Next.js before being rejected.

That's wasteful at best. At scale, with many concurrent requests, it's a real resource drain.

So I added an early rejection guard directly in user.ts:

```typescript
const MAX_USER_ID_LENGTH = 255;

if (userId.length > MAX_USER_ID_LENGTH) {
  logger.error(`UserId exceeds maximum length of ${MAX_USER_ID_LENGTH} characters`);
  return okVoid();
}
```

This runs before updateQueue.updateUserId() is ever called. The oversized string never leaves the browser. No network call. No server processing. No database touch.

The issue didn't ask for this. But once I saw the attack surface clearly, the schema fix alone felt incomplete.
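Here is the guard's control flow as a self-contained sketch. The SDK's logger, update queue, and okVoid() are replaced by hypothetical stand-ins (Result, FakeQueue, setUserIdSketch), since the surrounding user.ts code isn't shown.

```typescript
const MAX_USER_ID_LENGTH = 255

// Hypothetical stand-ins for the SDK's result type and update queue
type Result = { ok: true } | { ok: false; error: string }
const okVoid = (): Result => ({ ok: true })

class FakeQueue {
  sentUserIds: string[] = []
  updateUserId(id: string): void {
    this.sentUserIds.push(id)
  }
}

function setUserIdSketch(userId: string, queue: FakeQueue, log: (msg: string) => void): Result {
  // Reject oversized IDs before anything is queued or sent over the network.
  // Like the guard above, this logs and returns ok rather than throwing.
  if (userId.length > MAX_USER_ID_LENGTH) {
    log(`UserId exceeds maximum length of ${MAX_USER_ID_LENGTH} characters`)
    return okVoid()
  }
  queue.updateUserId(userId)
  return okVoid()
}

const queue = new FakeQueue()
const logs: string[] = []
setUserIdSketch("a".repeat(256), queue, (m) => logs.push(m))
setUserIdSketch("user_123", queue, (m) => logs.push(m))
console.log(queue.sentUserIds) // [ 'user_123' ]
console.log("😀".length) // 2 (UTF-16 code units)
```

One detail worth knowing: string.length counts UTF-16 code units, not visible characters, so an emoji counts as 2. Zod's .max() on strings also compares against .length, so both layers agree on where the limit falls.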

The test

I added a unit test to lock in this behavior:

```typescript
test("should reject userId longer than 255 characters and not send updates", async () => {
  const longId = "a".repeat(256);
  const result = await setUserId(longId);

  expect(result.ok).toBe(true);
  expect(mockLogger.error).toHaveBeenCalledWith(
    "UserId exceeds maximum length of 255 characters"
  );
  expect(mockUpdateQueue.updateUserId).not.toHaveBeenCalled();
  expect(mockUpdateQueue.processUpdates).not.toHaveBeenCalled();
});
```

The test verifies three things: the function returns cleanly, the error is logged, and the update queue is never triggered. Future refactors can't accidentally regress this silently.

What I learned

The schema fix was the correct answer to the issue as written. The SDK guard was the correct answer to the actual problem.

These are different things. Reading an issue description and reading the underlying risk are different skills. The description tells you what to change. The risk tells you why, and once you understand why, you often see that the suggested change is necessary but not sufficient.

Defense in depth isn't a fancy term. It just means: don't rely on a single check. If the client-side guard fails or gets bypassed somehow, the server-side schema catches it. If someone calls the API directly without the SDK, the schema catches it. Two independent layers, neither depending on the other.
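To make the independence concrete, here is a dependency-free sketch of the server-side layer treating the request body as untrusted input. It stands in for the Zod schema rather than reproducing it, and validateUserIdBody is a hypothetical name. A direct API call that never went through the SDK is still rejected.

```typescript
const MAX_USER_ID_LENGTH = 255

type Validation = { ok: true; userId?: string } | { ok: false; error: string }

// Server-side layer: validates the body on its own terms, regardless of
// whether the SDK's client-side guard already ran.
function validateUserIdBody(body: unknown): Validation {
  if (typeof body !== "object" || body === null) {
    return { ok: false, error: "Body must be an object" }
  }
  const userId = (body as { userId?: unknown }).userId
  if (userId === undefined) return { ok: true } // optional, like .optional()
  if (typeof userId !== "string") return { ok: false, error: "User ID must be a string" }
  if (userId.length > MAX_USER_ID_LENGTH) {
    return { ok: false, error: "User ID cannot exceed 255 characters" }
  }
  return { ok: true, userId }
}

// A direct API call that bypasses the SDK entirely
const direct = validateUserIdBody({ userId: "x".repeat(10_000) })
console.log(direct.ok) // false
```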

The PR got merged. Matti left a note: "The additional validation makes sense."

That's the whole story.

Links

Merged PR: #7378
Original issue: #7375
My GitHub: bharathkumar39293

I'm a final year CS student graduating in 2026, looking for backend/infra roles. If this kind of thinking interests your team, I'd love to connect.

I Built a Rate Limiter SDK from Scratch — Here's Every Decision I Made and Why

Bharath Kumar — Sun, 05 Apr 2026 03:27:18 +0000

I'm a final-year CS student who contributes to open source — Formbricks, Trigger.dev. While doing that I kept running into the same class of problems: rate limiting, retry logic, SDK reliability.
So I built a rate limiter SDK from scratch. Not to follow a tutorial. To actually understand every decision.
This post is about those decisions — why Redis over PostgreSQL, why sliding window over fixed window, why fail-open over fail-closed, and a few others. Each one taught me something that no tutorial ever explained.
Live demo: https://rate-limiter-sdk.vercel.app
GitHub: https://github.com/bharathkumar39293/Rate-Limiter-SDK

What I built
A rate limiter that any Node.js developer can drop into their app with one npm install:
```typescript
import { RateLimiterClient } from 'rate-limiter-sdk'

const limiter = new RateLimiterClient({
  apiKey: 'your-api-key',
  serverUrl: 'https://your-server.com'
})

// Inside a request handler (Express-style req/res in scope)
const result = await limiter.check({ userId: 'user_123', limit: 100, window: 60 })

if (!result.allowed) {
  return res.status(429).json({ retryAfter: result.retryAfter })
}
```
One line. Everything handled. That's the goal of an SDK — hide the complexity so the developer never has to think about it.
The stack: TypeScript, Node.js, Express, Redis, PostgreSQL, Docker. Let me walk through the decisions.

Decision 1: Redis over PostgreSQL for the rate limiting logic
This was the first question I had to answer. I already know PostgreSQL. Why bring in Redis at all?
The answer is simple once you think about it.
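Rate limiting happens on every single request, before anything else runs. At scale that's thousands of times per second. PostgreSQL is built for durability: by default every committed write waits for its write-ahead log to be flushed to disk, and queries pay for parsing, planning, and transaction bookkeeping. That's fine for storing user data. It's not fine for something that needs to respond in under a millisecond.
Redis keeps its data in RAM and executes simple commands like ZADD and ZCARD in microseconds, so the network round trip usually dominates. The underlying hardware gap is what makes this possible: a main-memory reference takes on the order of 100 nanoseconds, while a spinning-disk seek takes on the order of 10 milliseconds, roughly 100,000x apart. End to end the gap is smaller, since both databases are reached over the network and PostgreSQL caches hot pages in memory, but Redis keeps the disk and the query planner off the hot path.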
So the rule became clear: Redis for real-time decisions. PostgreSQL for permanent history. Different jobs, different tools.

Decision 2: Sliding window over fixed window
This is the one I get asked about most. Both algorithms count requests over a time window — but they behave very differently under pressure.
Fixed window divides time into rigid buckets: 0-60s, 60-120s, and so on. Limit is 100 requests per bucket. Sounds fine.
The problem: a user can send 100 requests at second 59 and another 100 at second 61. That's 200 requests in 2 seconds — double the limit — and both batches pass the check. The bucket boundary is a hole.
Sliding window doesn't use buckets. The window always looks back exactly N seconds from right now. If you sent 100 requests in the last 60 seconds, you're blocked. Doesn't matter when the clock ticks over.
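The boundary hole is easy to reproduce. This simulation uses plain TypeScript with no Redis (makeFixedWindow, makeSlidingWindow, and countAllowed are illustrative helpers) and replays 100 requests at second 59 plus 100 at second 61 against a limit of 100 per 60 seconds.

```typescript
const LIMIT = 100
const WINDOW_MS = 60_000

// Fixed window: one counter per 60-second bucket, keyed by bucket index.
function makeFixedWindow() {
  const counts = new Map<number, number>()
  return (nowMs: number): boolean => {
    const bucket = Math.floor(nowMs / WINDOW_MS)
    const count = counts.get(bucket) ?? 0
    if (count >= LIMIT) return false
    counts.set(bucket, count + 1)
    return true
  }
}

// Sliding window log: keep timestamps of accepted requests, drop those
// older than the window, and compare what remains against the limit.
function makeSlidingWindow() {
  let log: number[] = []
  return (nowMs: number): boolean => {
    log = log.filter((t) => t > nowMs - WINDOW_MS)
    if (log.length >= LIMIT) return false
    log.push(nowMs)
    return true
  }
}

// 100 requests at second 59, then 100 at second 61
const timestamps: number[] = []
for (let i = 0; i < 100; i++) timestamps.push(59_000)
for (let i = 0; i < 100; i++) timestamps.push(61_000)

function countAllowed(allow: (nowMs: number) => boolean): number {
  return timestamps.filter((t) => allow(t)).length
}

console.log(countAllowed(makeFixedWindow())) // 200
console.log(countAllowed(makeSlidingWindow())) // 100
```

The fixed window sees two fresh buckets (0 and 1) and admits both batches; the sliding window still counts the first 100 at second 61 and rejects the second batch.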
The implementation uses a Redis sorted set. Each request is stored as an entry with its timestamp as the score. To check the limit:
```typescript
// Remove entries older than the window
await redis.zremrangebyscore(key, 0, now - windowMs)

// Count what's left: these are all within the window
const count = await redis.zcard(key)

// Make the decision
if (count >= limit) return { allowed: false, retryAfter: ... }

// Allow: add this request
await redis.zadd(key, now, requestId)
```
Four lines of logic. The sliding window moves automatically because we always remove old entries before counting.
Cloudflare, for example, has written about using a sliding-window approach for its rate limiting. There's a reason.
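The same three steps can be mirrored in memory, which also shows one way to compute the retryAfter value the Redis snippet leaves out. This is a sketch under assumptions, not the project's code: SlidingWindowLog is hypothetical, a plain array stands in for the sorted set, and retryAfter is the time until the oldest entry leaves the window, rounded up to whole seconds.

```typescript
interface CheckResult {
  allowed: boolean
  remaining: number
  retryAfter?: number // seconds, only set when blocked
}

// In-memory stand-in for one Redis sorted set per key (score = timestamp in ms).
class SlidingWindowLog {
  private entries = new Map<string, number[]>()

  check(key: string, limit: number, windowMs: number, nowMs: number): CheckResult {
    // Like zremrangebyscore(key, 0, now - windowMs): drop scores at or below the cutoff
    const live = (this.entries.get(key) ?? []).filter((t) => t > nowMs - windowMs)
    this.entries.set(key, live)

    // Like zcard(key): everything left is inside the window
    if (live.length >= limit) {
      // The oldest entry is the next one to leave the window
      const oldest = live.length > 0 ? Math.min(...live) : nowMs
      const retryAfter = Math.ceil((oldest + windowMs - nowMs) / 1000)
      return { allowed: false, remaining: 0, retryAfter }
    }

    // Like zadd(key, now, requestId): record this request
    live.push(nowMs)
    return { allowed: true, remaining: limit - live.length }
  }
}

const limiter = new SlidingWindowLog()
for (let i = 0; i < 3; i++) limiter.check("user_123", 3, 60_000, 1_000)
console.log(limiter.check("user_123", 3, 60_000, 31_000))
// { allowed: false, remaining: 0, retryAfter: 30 }
console.log(limiter.check("user_123", 3, 60_000, 61_000).allowed) // true
```

Two things the in-memory version hides. In Redis, the remove, count, and add steps should run as one atomic operation (for example, a Lua script sent with EVAL); as three separate commands, two concurrent requests can both see a count of 99 and both be admitted. And the member passed to ZADD must be unique per request: ZADD on an existing member only updates its score, so two requests sharing a member are counted once.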

Decision 3: Fail-open over fail-closed
This was the most important design decision in the SDK client — and the one that took the longest to think through.
When the rate limiter server is unreachable (network down, timeout, crash), the SDK has two options:

Fail closed → block all requests. Safe, strict.
Fail open → allow all requests. Risky, but resilient.

I chose fail-open. Here's why.
My rate limiter is a secondary service. It exists to protect the developer's app — not to be the app itself. If my server goes down and I fail closed, I just blocked every user of every app that's using my SDK. The developer's product is now broken because of my infrastructure problem.
That's a worse outcome than allowing a few extra requests temporarily.
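```typescript
try {
  // ... the axios.post call to the rate limiter server
} catch (error: any) {
  // Server unreachable (no HTTP response at all): fail open
  if (!error.response) {
    console.warn('[RateLimiter] Server unreachable — failing open')
    return { allowed: true, remaining: -1 }
  }
  // The server answered with an error status: pass its body through
  return error.response.data
}
```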
The remaining: -1 is a deliberate signal. Negative remaining means "we allowed this but couldn't actually check." Developers who want to monitor fail-open events can watch for it.
The principle: never let your secondary service take down someone's primary app.
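Because remaining: -1 marks a fail-open result, a caller can track how often the limiter was bypassed without any extra API. A minimal sketch; CheckResult mirrors the fields used above, and FailOpenMonitor is a hypothetical helper, not part of the SDK.

```typescript
// Shape of a check result as described above (assumed; retryAfter optional)
interface CheckResult {
  allowed: boolean
  remaining: number
  retryAfter?: number
}

// Hypothetical caller-side monitor: counts checks that were allowed
// without the rate limiter server actually being consulted.
class FailOpenMonitor {
  private total = 0
  private failOpen = 0

  record(result: CheckResult): CheckResult {
    this.total++
    // remaining: -1 is the SDK's "allowed, but not checked" signal
    if (result.allowed && result.remaining === -1) this.failOpen++
    return result
  }

  failOpenRatio(): number {
    return this.total === 0 ? 0 : this.failOpen / this.total
  }
}

const monitor = new FailOpenMonitor()
monitor.record({ allowed: true, remaining: 42 })
monitor.record({ allowed: true, remaining: -1 }) // server was unreachable
monitor.record({ allowed: false, remaining: 0, retryAfter: 12 })
console.log(monitor.failOpenRatio()) // 0.3333333333333333
```

Reporting that ratio to whatever metrics system the app already uses turns a silent fail-open into something visible.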

Decision 4: Fire-and-forget for PostgreSQL logging
Every request — allowed or rejected — gets logged to PostgreSQL for analytics. But I don't await the log call.
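```typescript
const result = await checkRateLimit(apiKey, userId, limit, window)

// No await: fire and forget. The .catch keeps a failed write from
// surfacing as an unhandled promise rejection.
logRequest({ apiKey, userId, allowed: result.allowed, remaining: result.remaining })
  .catch((err) => console.error('[RateLimiter] Failed to log request', err))

// Response goes out immediately
return res.status(result.allowed ? 200 : 429).json(result)
```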
Why? Because the client doesn't care about logging. The decision is already made. If I await the PostgreSQL write, I'm adding ~5ms of latency to every single request — for something the client gets zero value from.
Fire-and-forget: start the operation, send the response immediately, let the log finish in the background.
The tradeoff: if the server crashes in that 5ms window, the log is lost. That's acceptable for analytics data.
The rule: never make clients wait for things they don't care about.
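The ordering is easy to observe in isolation. In this sketch logRequest is a hypothetical stand-in that finishes on a later timer tick, and handle plays the role of the route handler. The rejection handler matters: in Node.js 15 and later, a rejected promise with no handler terminates the process by default, so an un-awaited write should never be left bare.

```typescript
const events: string[] = []

// Hypothetical log writer: pretends to be a slow PostgreSQL insert.
function logRequest(entry: { userId: string; allowed: boolean }): Promise<void> {
  events.push(`log started for ${entry.userId}`)
  return new Promise((resolve) =>
    setTimeout(() => {
      events.push("log finished")
      resolve()
    }, 5)
  )
}

// Stand-in for the route handler: decide, fire the log, respond.
function handle(userId: string): { status: number } {
  const allowed = true // the rate limit decision is already made here
  // Fire and forget: no await, but always attach a rejection handler
  logRequest({ userId, allowed }).catch((err) => events.push(`log failed: ${String(err)}`))
  events.push("response sent")
  return { status: allowed ? 200 : 429 }
}

handle("user_123")
console.log(events) // [ 'log started for user_123', 'response sent' ]
```

The response is recorded before the write completes; "log finished" only appears on a later tick.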

Decision 5: In-memory cache for API key validation
Every request needs to validate the API key against PostgreSQL. But if I hit the database on every single request, I'm adding a DB round-trip to every rate limit check — defeating the purpose of using Redis for speed.
The solution is an in-memory Set:
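```typescript
import type { Request, Response, NextFunction } from 'express'
// db is the app's PostgreSQL pool (e.g. a pg Pool), created elsewhere

const validKeys = new Set<string>()

export async function authMiddleware(req: Request, res: Response, next: NextFunction) {
  const apiKey = req.headers['x-api-key']
  if (typeof apiKey !== 'string') return res.status(401).json({ error: 'Missing API key' })

  // Fast path: already verified
  if (validKeys.has(apiKey)) return next()

  // Slow path: first time seeing this key
  const result = await db.query('SELECT id FROM api_keys WHERE key = $1', [apiKey])
  if (result.rows.length === 0) return res.status(401).json({ error: 'Invalid API key' })

  // Cache it for next time
  validKeys.add(apiKey)
  next()
}
```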
First request from a key: hits PostgreSQL (~5ms). Every subsequent request: hits the Set (~0.001ms). At scale that's thousands of database queries saved per second.
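The Set resets on server restart, which is fine. The DB is the source of truth; this is just a speed layer. The trade-off is revocation: a key deleted from the database keeps passing this check until the process restarts.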
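If revoked keys need to stop working without a restart, one option is a time-to-live on each cached entry. A sketch with an injectable clock so the expiry is testable; TtlKeyCache and the 60-second TTL in the usage are illustrative, not part of the project. It keeps the Set's has/add shape, so it could stand in for validKeys in the middleware.

```typescript
// Hypothetical TTL cache for verified API keys: a key is trusted for ttlMs
// after its last successful database check, then must be re-verified.
class TtlKeyCache {
  private verifiedAt = new Map<string, number>()
  private ttlMs: number
  private now: () => number

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs
    this.now = now
  }

  has(apiKey: string): boolean {
    const at = this.verifiedAt.get(apiKey)
    if (at === undefined) return false
    if (this.now() - at >= this.ttlMs) {
      this.verifiedAt.delete(apiKey) // expired: force a fresh DB lookup
      return false
    }
    return true
  }

  add(apiKey: string): void {
    this.verifiedAt.set(apiKey, this.now())
  }
}

// Usage with a fake clock
let fakeNow = 0
const cache = new TtlKeyCache(60_000, () => fakeNow)
cache.add("key_abc")
console.log(cache.has("key_abc")) // true
fakeNow = 60_000
console.log(cache.has("key_abc")) // false
```

The TTL bounds how long a revoked key can keep working, at the cost of one database lookup per key per TTL period.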

Decision 6: Plain React over Next.js for the dashboard
This one is simple but I get asked about it.
The dashboard is an internal analytics tool. It shows request counts, blocked percentages, per-user breakdowns. Nobody is Googling for it. There are no public pages to index.
Next.js is great for server-side rendering and SEO. Neither of those things matter for an internal dashboard that only authenticated users see.
Adding Next.js for this use case is overengineering. Plain React, talking to the Express API, is exactly the right tool.
The principle: use the simplest tool that solves the problem correctly.

Decision 7: 2-second timeout on every SDK call
The SDK calls my server on every limiter.check() call. If my server is slow — maybe it's under load, maybe it's in the middle of a deploy — the SDK should not hang the developer's app indefinitely.
```typescript
const response = await axios.post(serverUrl, options, {
  headers: { 'x-api-key': this.apiKey },
  timeout: 2000 // give up after 2 seconds
})
```
Two seconds is the threshold. After that, the request times out, the catch block runs, and we fail-open. The developer's app never hangs waiting for my server.
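axios's timeout option does this work in the SDK. To show the timeout-then-fail-open flow without axios, here is a sketch built on Promise.race; checkWithTimeout and the injected call function are hypothetical. Unlike the catch block in Decision 3, which fails open only when there is no HTTP response, this sketch fails open on any error. Racing also only stops the caller from waiting; the slow request itself keeps running in the background.

```typescript
interface CheckResult {
  allowed: boolean
  remaining: number
  retryAfter?: number
}

// remaining: -1 marks "allowed, but not actually checked"
const failOpen = (): CheckResult => ({ allowed: true, remaining: -1 })

// Hypothetical wrapper: returns the server's answer if it arrives within
// timeoutMs; on timeout or on any thrown error, fails open.
async function checkWithTimeout(
  call: () => Promise<CheckResult>,
  timeoutMs = 2000
): Promise<CheckResult> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const timedOut = new Promise<CheckResult>((resolve) => {
    timer = setTimeout(() => {
      console.warn("[RateLimiter] Server timed out, failing open")
      resolve(failOpen())
    }, timeoutMs)
  })
  try {
    return await Promise.race([call(), timedOut])
  } catch {
    console.warn("[RateLimiter] Server unreachable, failing open")
    return failOpen()
  } finally {
    // Stop the pending timer once we have an answer
    clearTimeout(timer)
  }
}

// Usage: a server that answers too late, and one that answers right away
const slowServer = () =>
  new Promise<CheckResult>((resolve) => setTimeout(() => resolve({ allowed: false, remaining: 0 }), 500))
const fastServer = async (): Promise<CheckResult> => ({ allowed: true, remaining: 99 })

checkWithTimeout(slowServer, 50).then((r) => console.log(r.remaining)) // -1
checkWithTimeout(fastServer, 50).then((r) => console.log(r.remaining)) // 99
```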

What I learned
Building this taught me something I didn't expect: the interesting part of backend engineering is almost never the happy path.
Anyone can write the code that works when everything is fine. The decisions that matter are:

What happens when Redis goes down?
What happens when the DB is slow?
What happens when two requests arrive at the same millisecond?
How do you make it fast without making it fragile?

These are the questions that show up in production. Building this project — and contributing to Formbricks and Trigger.dev — forced me to think about all of them.
That's why I built it. Not to add a line to a resume. To actually understand the problems.

Links

Live demo: https://rate-limiter-sdk.vercel.app
GitHub: https://github.com/bharathkumar39293/Rate-Limiter-SDK
My other project (webhook delivery engine): https://web-hook-drop-t4k6.vercel.app

If you're building something similar or have questions about any of these decisions — drop a comment. Happy to dig into it.