
DEV Community

Cory


Why random.randint(300, 850) is a bad fake credit score

I searched GitHub for Python test files that reference credit_score. Across about 100 repos, every single one hardcodes an integer:

user["credit_score"] = 720

Not one of them uses a generator, a factory, or even random.randint. They just pick a number and move on. And as it turns out, random.randint isn't much better.

720 doesn't mean what you think it means

A FICO 8 score of 720 falls in the "good" tier. But if your code handles Equifax Beacon 5.0 scores, the valid range is 334-818 rather than FICO 8's 300-850. A 720 is still "good" there, but the boundary math is different.

If your lending flow branches on tier thresholds and you're testing with a hardcoded 720, you're testing one path of one model. You'll never catch the edge case where a 320 is valid for FICO 8 but impossible for Equifax Beacon 5.0.
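To make the boundary issue concrete, here's a minimal sketch of model-aware validation. The ranges come from the post; the dict keys and function name are mine, for illustration only:

```python
# Model-specific valid ranges (from the post); keys are illustrative.
MODEL_RANGES = {
    "fico8": (300, 850),   # FICO Score 8
    "fico5": (334, 818),   # Equifax Beacon 5.0
}

def is_valid_score(score: int, model: str) -> bool:
    """Check a score against its model-specific range."""
    lo, hi = MODEL_RANGES[model]
    return lo <= score <= hi

# 320 is a legal FICO 8 score but impossible for Beacon 5.0:
assert is_valid_score(320, "fico8")
assert not is_valid_score(320, "fico5")
```

A hardcoded 720 passes both checks, which is exactly why it never exercises this branch.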

random.randint is worse

random.randint(300, 850) will happily give you 845. That's a valid FICO 8 score, but it's above the max value for Equifax Beacon 5.0. If your code doesn't validate against model-specific ranges, you won't know until production.
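You can see how often this bites with a quick simulation. Roughly 12% of uniform draws from 300-850 fall outside Beacon 5.0's 334-818 range:

```python
import random

random.seed(0)  # deterministic for the example
draws = [random.randint(300, 850) for _ in range(1000)]

# Valid FICO 8 scores that are impossible Equifax Beacon 5.0 scores:
invalid_for_beacon = [s for s in draws if not 334 <= s <= 818]
print(f"{len(invalid_for_beacon)} of 1000 draws are invalid for Beacon 5.0")
```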

The gap

Here's what surprised me most: many of the repos I looked at already use Faker for names, addresses, and emails. But when it comes to credit scores, they drop back to a hardcoded integer. The gap exists because there hasn't been a Faker provider for credit scores, so even developers who know better default to 720 and move on.

What I wanted

I wanted fake credit scores that:

  • Come from real scoring models with real ranges
  • Can be constrained to a tier (poor, fair, good, very_good, exceptional)
  • Include the bureau name and model name, not just a number
  • Build on Faker, a testing library I'm a big fan of

How it works

Install it:

pip install faker-credit-score

Add it to your Faker setup:

from faker import Faker
from faker_credit_score import CreditScore

fake = Faker()
fake.add_provider(CreditScore)

Generate scores:

fake.credit_score()
# 791

fake.credit_score("fico5")
# 687 (Equifax Beacon 5.0, range 334-818)

Test specific tiers:

fake.credit_score(tier="poor")
# 542

fake.credit_score(tier="exceptional")
# 831

Get the full picture:

result = fake.credit_score_full("fico5")
# CreditScoreResult(name='Equifax Beacon 5.0', provider='Equifax', score=687)

name, provider, score = result  # destructure it

Classify a score you already have:

fake.credit_score_tier(score=720)
# 'good'
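The tier names line up with the commonly published FICO 8 bands (poor under 580, fair 580-669, good 670-739, very good 740-799, exceptional 800-850). A minimal classifier along those lines, not the provider's actual source, might look like:

```python
# Assumed tier bands (the widely published FICO 8 ranges);
# a sketch, not the provider's implementation.
TIERS = [
    ("poor", 300, 579),
    ("fair", 580, 669),
    ("good", 670, 739),
    ("very_good", 740, 799),
    ("exceptional", 800, 850),
]

def tier_of(score: int) -> str:
    """Map a score to its tier name."""
    for name, lo, hi in TIERS:
        if lo <= score <= hi:
            return name
    raise ValueError(f"score {score} outside 300-850")

assert tier_of(720) == "good"
```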

You can also combine tier and model. If you ask for an "exceptional" score on a model that tops out at 818, the range gets clamped:

fake.credit_score(score_type="fico5", tier="exceptional")
# 812 (clamped to 800-818)
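The clamping described above amounts to intersecting the tier band with the model's valid range. A minimal sketch, assuming the band and range values from the post (not the library's actual code):

```python
import random

def clamped_range(tier_band: tuple, model_range: tuple) -> tuple:
    """Intersect a tier band with a model's valid score range."""
    lo = max(tier_band[0], model_range[0])
    hi = min(tier_band[1], model_range[1])
    if lo > hi:
        raise ValueError("tier is impossible for this model")
    return lo, hi

# 'exceptional' is 800-850 on FICO 8, but Beacon 5.0 tops out at 818:
lo, hi = clamped_range((800, 850), (334, 818))
assert (lo, hi) == (800, 818)
score = random.randint(lo, hi)  # always a legal Beacon 5.0 score
```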

The models

Ten scoring models, each with real ranges:

Model                                        Range
FICO Score 8, 9, 10, 10 T                    300-850
VantageScore 3.0, 4.0                        300-850
UltraFICO                                    300-850
Equifax Beacon 5.0                           334-818
Experian/Fair Isaac Risk Model V2SM          320-844
TransUnion FICO Risk Score, Classic 04       309-839
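Written out as data, the table above is small enough to audit at a glance. The dict keys below are mine, purely illustrative; the ranges are copied from the table:

```python
# The ten models and their ranges, from the table above.
SCORE_MODELS = {
    "FICO Score 8": (300, 850),
    "FICO Score 9": (300, 850),
    "FICO Score 10": (300, 850),
    "FICO Score 10 T": (300, 850),
    "VantageScore 3.0": (300, 850),
    "VantageScore 4.0": (300, 850),
    "UltraFICO": (300, 850),
    "Equifax Beacon 5.0": (334, 818),
    "Experian/Fair Isaac Risk Model V2SM": (320, 844),
    "TransUnion FICO Risk Score, Classic 04": (309, 839),
}

# Only the bureau-specific vintage models deviate from 300-850:
nonstandard = {m for m, r in SCORE_MODELS.items() if r != (300, 850)}
assert len(nonstandard) == 3
```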

The point

If your code branches on credit scores, your tests should use scores that behave like real ones.

faker-credit-score on PyPI | GitHub

Top comments (2)

Ashan_Dev

This is the kind of thing you don't realize is a problem until your test suite passes with a 720 and then the production model rejects it for being in the bottom percentile.
Generating scores without mapping them to a specific version like FICO 8 is just asking for data drift issues later.
Do you have a plan to handle the different score distributions for vintage models like Classic 04?

Cory

Interesting idea! I’d love to offer percentile-aware distributions, but as far as I know that data is not publicly available on a per-model basis. I have considered adding model-specific synthetic distributions if there’s user interest.