What 'critical' actually means in a bug report

Severity Labs · Notes

Severity inflation is the most common pathology in self-hosted bug bounty programs. Here's how to tell a real critical from a 9.8 in a sandbox.

April 8, 2026 · 4 min read

Open any unmanaged bug bounty inbox and look at the severity distribution on the last fifty reports. There will be more "critical" submissions than the actual rate of critical bugs in your codebase would justify. By a lot.

This isn't because hunters are malicious. It's because the incentives push that direction (bigger label, bigger payout, bigger reputation) and because most programs don't push back. Severity inflation becomes the default, and over time, "critical" stops meaning anything specific.

This post is about how we decide when a report is actually critical, versus when it's a high or a medium dressed up.

The CVSS 9.8 problem

Most "critical" reports come with a CVSS v3 vector that lands somewhere between 9.0 and 10.0, typically AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H. That's every metric at its worst value except scope, which stays unchanged (S:U). It scores 9.8.

Three things to notice about it:

AV:N, exploitable from the network. Almost everything web is.
PR:N, no privileges required. Often inflated. The hunter created an account and is reasoning that "anyone could do this", but if the bug requires being logged in, it's PR:L.
UI:N, no user interaction. Often wrong for client-side bugs. Stored XSS that fires only when a victim opens a specific page is UI:R, not UI:N.

Get those three right and a lot of "critical" reports become "high", which is the right label. Correcting PR:N to PR:L, or UI:N to UI:R, is enough on its own to drop the 9.8 vector to 8.8.
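To see how much those two metrics move the number, here is a minimal sketch of the CVSS v3.1 base score calculation. The weights and formulas follow the published v3.1 specification; the function names (base_score, rating) are ours, and the sketch ignores temporal and environmental metrics and does no input validation.

```python
# Metric weights from the CVSS v3.1 specification.
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required is weighted differently when scope is changed (S:C).
PR_WEIGHTS = {
    "U": {"N": 0.85, "L": 0.62, "H": 0.27},
    "C": {"N": 0.85, "L": 0.68, "H": 0.5},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 spec."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (scaled // 10000 + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a vector like 'AV:N/AC:L/.../A:H'."""
    m = dict(part.split(":") for part in vector.split("/"))
    scope = m["S"]
    iss = 1 - (
        (1 - WEIGHTS["C"][m["C"]])
        * (1 - WEIGHTS["I"][m["I"]])
        * (1 - WEIGHTS["A"][m["A"]])
    )
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = (
        8.22
        * WEIGHTS["AV"][m["AV"]]
        * WEIGHTS["AC"][m["AC"]]
        * PR_WEIGHTS[scope][m["PR"]]
        * WEIGHTS["UI"][m["UI"]]
    )
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if scope == "C":
        total *= 1.08
    return roundup(min(total, 10))


def rating(score: float) -> str:
    """Map a base score to the v3.1 qualitative severity rating."""
    if score == 0:
        return "none"
    if score < 4.0:
        return "low"
    if score < 7.0:
        return "medium"
    if score < 9.0:
        return "high"
    return "critical"


for v in (
    "AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H",  # as submitted: 9.8 critical
    "AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H",  # login required: 8.8 high
    "AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H",  # victim must act: 8.8 high
):
    print(v, base_score(v), rating(base_score(v)))
```

Correcting a single metric is enough to cross the 9.0 line. With both PR:L and UI:R the same vector scores 8.0.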

When critical is actually critical

Real criticals share a small set of properties. We use these as a sanity check:

Pre-auth, mass impact. SQLi at a public endpoint that returns the whole user table. Authentication bypass that lets you log in as anyone. RCE on a web-accessible service. These are the ones that justify waking someone up at 2am.

Money or trust at risk. Payment processing bugs that let an attacker charge other users, change order totals, or refund themselves at scale. Privilege escalation in a financial workflow. Anything that lets the attacker move money in a direction they shouldn't.

Total compromise of one user with low effort. Account takeover via unauthenticated password reset, or via a single-click XSS that exfiltrates auth tokens. Even if it's per-user, the per-user severity is high enough.

Compliance-defined. PII exposure that triggers GDPR or HIPAA notification thresholds. Data residency violations. Some categories of bug are critical because the regulatory cost is critical, not because the technical impact alone would be.
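The four properties above can be written down as a checklist that a triager fills in from the reproduction. This is a rough sketch rather than our actual tooling: the Evidence fields and the critical_reasons function are hypothetical names, and each flag is meant to be set only when the report demonstrates it.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Facts demonstrated by the report (hypothetical field names)."""
    pre_auth: bool = False                 # exploitable without logging in
    mass_impact: bool = False              # affects many users or records at once
    moves_money: bool = False              # attacker can charge, refund, or alter totals
    full_account_takeover: bool = False    # total compromise of one user
    low_effort: bool = False               # at most one click from the victim
    regulatory_notification: bool = False  # crosses a notification threshold


def critical_reasons(e: Evidence) -> list[str]:
    """Return the criteria the report satisfies; an empty list means not critical."""
    reasons = []
    if e.pre_auth and e.mass_impact:
        reasons.append("pre-auth, mass impact")
    if e.moves_money:
        reasons.append("money or trust at risk")
    if e.full_account_takeover and e.low_effort:
        reasons.append("total compromise of one user with low effort")
    if e.regulatory_notification:
        reasons.append("compliance-defined")
    return reasons


# An unauthenticated SQLi that dumps the whole user table:
print(critical_reasons(Evidence(pre_auth=True, mass_impact=True)))
# A bug that needs a login and touches one record matches nothing:
print(critical_reasons(Evidence()))
```

Returning the matched reasons rather than a bare yes or no means every critical label comes with its justification attached.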

When critical is overstated

We see these patterns repeatedly:

Self-XSS as critical. A hunter finds that they can inject script into their own profile that fires only for them. This is informational. It becomes a real bug only when there's a delivery mechanism (e.g., another user can view that profile, or the script can pivot to a real victim).

SSRF without exploitation. A hunter shows that a URL fetcher will hit an internal address. Without demonstrating that the internal address has sensitive data exposed (cloud metadata, internal admin panel, etc.), this is a medium at best.

Reflected XSS in a low-trust path. Reflected XSS on /search that requires the victim to click a link with the payload. Severity here depends on what the path can do. Most of the time it's a high, not a critical, because real-world exploitation requires a phishing setup.

Subdomain takeover on a parked subdomain. If the subdomain has no trust relationship with anything important and isn't referenced anywhere, it's a low. It becomes a critical only if cookies are scoped wide, or the subdomain receives traffic from authenticated sessions, or it appears in OAuth redirect URIs.

RCE in a sandbox. A hunter runs arbitrary code in your build sandbox. That's the sandbox's job: it's meant to run arbitrary code. It's critical only if they can escape the sandbox and touch real production data.
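Each of these patterns has the same shape: a default severity, plus specific evidence that escalates it. As one illustration, the subdomain takeover rule above could be sketched like this. The function and parameter names are hypothetical, and the three conditions are the ones listed in that paragraph.

```python
def subdomain_takeover_severity(
    wide_cookie_scope: bool,
    receives_authenticated_traffic: bool,
    in_oauth_redirect_uris: bool,
) -> str:
    """Score a takeover of a parked subdomain from demonstrated trust relationships."""
    if wide_cookie_scope or receives_authenticated_traffic or in_oauth_redirect_uris:
        return "critical"
    # No trust relationship and not referenced anywhere.
    return "low"


# Parked subdomain with nothing pointing at it:
print(subdomain_takeover_severity(False, False, False))  # low
# Same subdomain, but listed as an OAuth redirect URI:
print(subdomain_takeover_severity(False, False, True))   # critical
```

The point of writing it this way is that the escalation has to come from something the report demonstrates, not from the bug class alone.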

What good severity arguments look like

When a hunter pushes back on a severity downgrade, we evaluate the argument on three axes:

Is the technical claim correct? Did we miss something in reproduction? If so, we re-score and apologize.
Is the impact concrete? "Could lead to" and "in theory" don't move severity. "Here's the exact data exposed and to whom" does.
Does the business context change the math? A low-CVSS bug on a payment endpoint can outrank a high-CVSS bug on a marketing subdomain. We're open to this argument; we're not open to it being asserted without evidence.

The argument we never accept is "other programs paid critical for this." Other programs' choices are not our scoring.

What this means for hunters

Don't open with "critical". Describe the bug, prove the impact, and let the score follow. Every triager has been burned by inflated reports, and they're more likely to give you a fair score if your initial pitch is honest.

If you genuinely think we've miscalled something, send a technical argument. If you send "I think this is critical because of all the reasons", expect us to disagree without engaging.

What this means for security teams

Define what critical means for your program, in writing. Ours is roughly: "pre-auth mass impact, money or trust at risk, total compromise of one user with one click, or regulatory-defined." Your program's definition might be different. That's fine, as long as it's written down.

Then enforce it. Severity inflation in your inbox is a process failure, not a hunter failure. It's a sign that the people doing triage are optimizing for hunter happiness over engineering accuracy. The fix is written justifications on every score, every time, and pushback when the math doesn't add up.

If you'd rather have someone else fight that fight, that's what we do.