CT-log scanner disposable email detection
A new disposable-email provider registers a domain at 10:00, pushes the DNS and stands up a mail server, hits Let’s Encrypt at 13:00 to issue a TLS cert so their https:// UI loads without browser security warnings, and is live to receive their first signup-form fraud traffic by 14:00. That’s a four-hour window from registration to live-and-abusing. The npm disposable-email-domains package will catch the new domain in roughly eight weeks.
We catch it in roughly four hours.
The mechanism is Certificate Transparency log monitoring. Every publicly trusted TLS certificate issued by a major CA is logged publicly at issuance. That is a deliberate design choice: Certificate Transparency is an IETF standard (RFC 6962), created so that rogue or compromised CAs cannot issue certs in secret. We poll the public CT log aggregator at crt.sh hourly, filter for patterns that suggest disposable-mail intent, and ship the candidate domains into our review/scrape pipeline.
This post: how the pipeline works, what patterns it watches, what the false-positive rate looks like, and why this is the freshness backbone behind the disposable email checker API.
What Certificate Transparency is
Quick refresher for context. When a Certificate Authority (Let’s Encrypt, DigiCert, GlobalSign, etc.) issues a TLS certificate, it submits the cert to one or more public CT log servers. The logs are append-only, cryptographically verifiable, and publicly queryable. Browsers (Chrome, Firefox, Safari) require a publicly trusted cert to be in at least two CT logs before they’ll trust it; certs that aren’t CT-logged trigger browser errors.
The upshot: every legitimate TLS cert is public knowledge within seconds of issuance. crt.sh, maintained by Sectigo, is a publicly queryable aggregator of the major CT logs. You can search it via its web UI or its API for any cert matching a domain pattern.
A new disposable-mail operator needs HTTPS for their public UI. They issue a cert. The cert hits CT logs. We see it.
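To make the lookup concrete, here is a minimal sketch of building a crt.sh search URL and reading names out of the response. It is an illustration, not our production poller. It assumes crt.sh's q and output=json query parameters, % as the wildcard character, and a name_value field holding newline-separated names in each result row. The function names are hypothetical, and the HTTP fetch is left to the caller so the parsing can be checked offline.

```python
import json
from urllib.parse import urlencode

CRT_SH = "https://crt.sh/"


def build_query_url(identity_pattern: str) -> str:
    """Return a crt.sh search URL that asks for JSON output.

    Assumption: crt.sh treats '%' in the q parameter as a wildcard.
    """
    return CRT_SH + "?" + urlencode({"q": identity_pattern, "output": "json"})


def extract_names(payload: str) -> set:
    """Collect the distinct DNS names from a crt.sh JSON response body."""
    names = set()
    for row in json.loads(payload):
        # Assumption: name_value holds one or more names separated by newlines.
        for name in row.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name:
                names.add(name)
    return names
```

A caller would fetch build_query_url("%temp%mail%.com") with any HTTP client and pass the response body to extract_names. Lowercasing and de-duplicating up front keeps later stages from processing the same domain twice.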
The patterns we monitor
An hourly cron job queries crt.sh’s JSON API for certs issued in the last 60 minutes matching these glob-style patterns:
*temp*mail*.com
*temp*mail*.net
*temp*mail*.org
*temp*mail*.xyz
*temp*mail*.io
*throwaway*.com
*throwaway*.net
*minute*mail*
*burner*mail*
*disposable*
*tempinbox*
*fakemail*
*quickmail*
*one*time*email*
*1*time*email*
*1mail*
*10mail*
*disposable*inbox*
Plus a longer tail of more-specific patterns: French (mail-temporaire, email-jetable), German (wegwerf-email, wegwerfmail), Spanish (correo-temporal), Russian (temp*pochta), Portuguese (email-descartavel), Hindi (asthayee*). Multilingual coverage matters because disposable operators ship in non-English markets too — and English-only pattern matching misses them entirely until the customer-consensus channel eventually catches up.
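The patterns above read as shell-style globs, so a first-pass matcher can be sketched with Python's standard fnmatch module. This is a minimal sketch under that assumption: PATTERNS is only a subset of the list above, and matching_patterns is a hypothetical helper name.

```python
from fnmatch import fnmatchcase

# A subset of the patterns listed above, treated as shell-style globs.
PATTERNS = [
    "*temp*mail*.com",
    "*throwaway*.net",
    "*minute*mail*",
    "*burner*mail*",
    "*disposable*",
]


def matching_patterns(domain: str) -> list:
    """Return every pattern the domain matches (an empty list means no match)."""
    # Normalize case and drop any trailing root dot before matching.
    name = domain.strip().lower().rstrip(".")
    return [p for p in PATTERNS if fnmatchcase(name, p)]


print(matching_patterns("QuickTempMail.com"))  # ['*temp*mail*.com']
print(matching_patterns("example.com"))        # []
```

Note that a glob star also matches across dots, so a pattern can fire on a subdomain label as well as on the registered name. That is acceptable here because a match only creates a candidate; the later stages decide.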
What happens after a match
The pipeline runs every match through a five-stage filter:
- Allowlist check. Match against the 380-entry legit allowlist. If the candidate is on the allowlist (extremely rare for CT-log candidates, but defensive), drop.
- DNS resolution. Resolve the apex. If the result is NXDOMAIN or there is no MX record, drop the candidate from this run: the cert was issued but the operator hasn’t stood up mail infrastructure yet. We re-queue these for 24h so they get rechecked.
- Headless HTTP probe. Playwright visits the apex, captures the rendered DOM. If the page is empty / 404 / parked, drop.
- Temp-mail UI detection. Look for known temp-mail patterns in the rendered HTML: an address dropdown widget, inbox-by-URL pattern, “no signup needed” copy, “temporary inbox” / “disposable email” keywords. If matched, classify as candidate temp-mail.
- Fingerprint extraction. Pull any AdSense / GA4 / GTM / Facebook Pixel / Yandex Metrika IDs from the HTML. Cross-reference against the operator-cluster table. If matched, the new domain joins the existing operator. If unmatched, it becomes a new candidate operator.
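Stages 4 and 5 both operate on the rendered HTML, and can be sketched as a keyword scan plus a few regular expressions. This is a simplified sketch, not the production detector: the keyword list comes from the copy quoted above, the ID regexes are approximations of the usual AdSense, GA4, and GTM identifier shapes, and the function names are hypothetical.

```python
import re

# Phrases quoted in stage 4; a real detector would also inspect page widgets.
UI_KEYWORDS = ("temporary inbox", "disposable email", "no signup needed")

# Approximate shapes of common tracking IDs (assumptions, not official grammars).
FINGERPRINT_PATTERNS = {
    "adsense": re.compile(r"\bca-pub-\d{10,20}\b"),
    "ga4": re.compile(r"\bG-[A-Z0-9]{6,12}\b"),
    "gtm": re.compile(r"\bGTM-[A-Z0-9]{4,10}\b"),
}


def looks_like_temp_mail(html: str) -> bool:
    """Return True if any known temp-mail phrase appears in the page."""
    text = html.lower()
    return any(keyword in text for keyword in UI_KEYWORDS)


def extract_fingerprints(html: str) -> dict:
    """Map each fingerprint kind to the set of IDs found in the HTML."""
    return {kind: set(rx.findall(html)) for kind, rx in FINGERPRINT_PATTERNS.items()}
```

The extracted IDs are what gets compared against the operator-cluster table: a shared AdSense or analytics ID across two domains is strong evidence of a shared operator.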
After stage 5, the domain enters the domain_candidates queue. An operator analyst can review batched candidates and either ship-to-production or reject. Patterns that hit very high confidence (clear temp-mail UI + matched fingerprint to known operator) auto-promote without manual review.
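The decision logic across the five stages can be summarized as one triage function. The sketch below is a hedged reading of the stages described above: the Evidence fields and Outcome names are hypothetical, and each flag stands in for the result of a stage that would run separately in practice.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    DROP = "drop"
    RETRY_IN_24H = "retry_in_24h"
    REVIEW = "review"
    AUTO_PROMOTE = "auto_promote"


@dataclass
class Evidence:
    """Results gathered for one candidate domain (hypothetical field names)."""
    on_allowlist: bool = False
    resolves: bool = True
    has_mx: bool = True
    page_is_live: bool = True
    has_temp_mail_ui: bool = True
    known_operator: Optional[str] = None


def triage(e: Evidence) -> Outcome:
    """Apply the stages in order; the first failing stage decides the outcome."""
    if e.on_allowlist:
        return Outcome.DROP
    if not (e.resolves and e.has_mx):
        # Cert exists but mail infrastructure does not yet: recheck later.
        return Outcome.RETRY_IN_24H
    if not e.page_is_live or not e.has_temp_mail_ui:
        return Outcome.DROP
    if e.known_operator is not None:
        # Clear temp-mail UI plus a fingerprint match to a known operator.
        return Outcome.AUTO_PROMOTE
    return Outcome.REVIEW
```

Ordering the cheap checks first (allowlist, DNS) means the expensive headless probe only runs for candidates that could plausibly be live.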
Average end-to-end pipeline latency: ~4 hours from cert issuance to candidate-in-queue.
Real example: a typical day’s harvest
A representative recent run (anonymized to avoid pointing a current operator at a specific date):
- 48 certs matched patterns in the 60-minute window.
- 22 dropped at DNS resolution (cert issued, infrastructure not yet stood up).
- 8 dropped at HTTP probe (404 or parked page).
- 12 dropped at temp-mail UI check (other intent: tempmail.io could be a legitimate productivity service rather than disposable mail, and we don’t false-positive based on name alone).
- 3 hit known operator fingerprint, auto-promoted to disposable_mail_domains under existing operator entry.
- 3 created new candidate operator entries — queued for review.
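As a quick sanity check on that funnel, the numbers from this run add up as follows. The variable names are only for illustration; the figures are the ones listed above.

```python
matched = 48
dropped = {"dns": 22, "http_probe": 8, "ui_check": 12}

# Whatever is not dropped must be either auto-promoted or queued for review.
survivors = matched - sum(dropped.values())
auto_promoted, new_operators = 3, 3
assert survivors == auto_promoted + new_operators

survival_rate = survivors / matched
print(f"{survivors} of {matched} matches survived ({survival_rate:.1%})")
# prints: 6 of 48 matches survived (12.5%)
```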
Daily volume: ~30-60 new candidates entering the pipeline, ~5-10 promoting to production same-day, the rest cycling through review.
What this looks like for a brand new disposable operator
The four-hour timeline, blocked into stages:
| Time | Event |
|---|---|
| T+0 | Operator registers quicktempmail.example. |
| T+30m | Operator stands up server, configures DNS. |
| T+1h | Operator runs certbot for Let’s Encrypt cert. |
| T+1h+seconds | Cert is logged to CT logs by Let’s Encrypt. |
| T+1h to T+2h | crt.sh ingests the new cert into its aggregator. |
| T+2h | Our hourly poll picks up the match. |
| T+2h to T+3h | DNS + HTTP probe + UI detection runs. |
| T+3h to T+4h | Fingerprint match + queue insertion. |
| T+4h | New domain live in disposable_mail_domains (if auto-promoted; otherwise queued for review). |
| T+8w | npm disposable-email-domains package adds the same domain (eight weeks later, in the next manual update). |
The eight-week comparison isn’t theoretical. We benchmark our coverage against the npm package monthly, and the median lag for new disposable domains reaching the npm list has ranged from 47 to 56 days. The slowest we’ve measured: 84 days.
For your signup form: a user trying to sign up today with bob@quicktempmail.example (registered this morning) gets blocked by vrfymail; the same address slides past every blocklist that depends on the npm package for the next two months.
Why this works specifically for disposable mail
CT-log monitoring is general-purpose — it’s used in security research for phishing, brand-protection, malware C2 detection, plenty of other things. For disposable mail specifically, it works well because:
- Operators need HTTPS. A temp-mail product is a web app. Users won’t trust it without https:// and a padlock in the address bar. Cert issuance is non-optional.
- Domain naming telegraphs intent. Brand-protection problems are hard because attackers go to great lengths to disguise their domains. Disposable-mail operators want discoverability: the brand needs to be memorable. Patterns like “temp,” “throwaway,” “burner” are baked into the value proposition.
- Pattern false-positive rate is low. Surprisingly, a domain matching *temp*mail* is overwhelmingly likely to be temp-mail. The non-temp-mail uses of that pattern (productivity tools with “temp” in the name, etc.) get filtered at the UI-detection stage.
The combined precision/recall is high enough that the pipeline produces production-grade signal with minimal human review for clear-cut cases.
Where CT-log monitoring falls short
Three failure modes worth naming:
- Operators that don’t ship a public UI. Some adversarial operators don’t need an HTTPS web app — they expose only an API or only an MX server. Those don’t get a cert and we don’t see them via CT logs. We catch them via customer-consensus promotion instead.
- Operators issuing certs on wildcard subdomains only. A wildcard cert for *.cloudflare.com doesn’t tell us about disposable-thing.cloudflare.com. We work around this by also monitoring SAN-list expansion on certs covering many domains.
- Operators using a CA outside the standard logs. Theoretical: every modern public CA logs to CT. Internal CAs (corporate PKIs) don’t, but neither do they issue certs for disposable-mail apexes.
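One way to picture the SAN-list handling is to split a certificate's names into wildcard bases and concrete hostnames, then flag certs that cover many names. This is a loose sketch of the idea rather than our actual implementation: it assumes the newline-separated name list that crt.sh returns, and the threshold of 10 is an arbitrary placeholder.

```python
def split_sans(name_value: str) -> tuple:
    """Split a newline-separated SAN list into (wildcard bases, concrete names)."""
    wildcard_bases = set()
    concrete = set()
    for raw in name_value.splitlines():
        name = raw.strip().lower().rstrip(".")
        if not name:
            continue
        if name.startswith("*."):
            # "*.host.example" only reveals the base, not the hosts under it.
            wildcard_bases.add(name[2:])
        else:
            concrete.add(name)
    return wildcard_bases, concrete


def is_multi_domain_cert(name_value: str, threshold: int = 10) -> bool:
    """Flag certs whose SAN list covers at least `threshold` distinct names."""
    wildcard_bases, concrete = split_sans(name_value)
    return len(wildcard_bases | concrete) >= threshold
```

The concrete names from a flagged cert can then be run through the same pattern matching as any other candidate, while the wildcard bases are recorded as blind spots.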
The general assumption: if an operator wants users to find them and use the service, they get a cert and we see them. Adversarial operators that hide from users hide from us too — but they also hide from the people they’re trying to defraud.
How CT-log catches feed the operator graph
Once a new candidate domain promotes to disposable_mail_domains, the same fingerprint and MX-pattern detection feeds back: if the operator stands up 10 more domains under the same backend, all 10 are caught by Tier-2 pattern match (50ms) instead of waiting for separate CT-log catches. The freshness pipeline (CT-log + customer-consensus + scrapers) acts as the new-operator discovery channel. The detection pipeline (operator graph + MX clusters + IP ranges) acts as the scale-out channel.
The combined effect: a new operator’s first domain gets caught within 4 hours of cert issuance. Domains 2-50 from the same operator get caught at first verify, before they’re independently certed.
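The scale-out side can be illustrated with an MX-suffix lookup: once an operator's mail backend is known, any new domain pointing at it is attributed immediately. The sketch below is a simplified stand-in for the Tier-2 match; the table contents and the function name are hypothetical.

```python
from typing import Iterable, Mapping, Optional


def match_operator(
    mx_hosts: Iterable[str],
    mx_suffix_to_operator: Mapping[str, str],
) -> Optional[str]:
    """Return the operator whose known MX suffix covers one of the MX hosts."""
    for host in mx_hosts:
        host = host.strip().lower().rstrip(".")
        for suffix, operator in mx_suffix_to_operator.items():
            # Match the suffix itself or any host beneath it, on a label boundary.
            if host == suffix or host.endswith("." + suffix):
                return operator
    return None


# Hypothetical table built from earlier CT-log catches.
known = {"mail.operator-a.example": "operator-a"}
print(match_operator(["mx1.mail.operator-a.example."], known))  # operator-a
print(match_operator(["mx.unrelated.example"], known))          # None
```

Matching on a label boundary (the leading dot) avoids attributing a lookalike host such as notmail.operator-a.example to the operator by accident.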
FAQ
Doesn’t crt.sh rate-limit?
Sectigo’s crt.sh API is generous to well-behaved clients — our hourly poll with pattern-narrowed queries stays well within their published limits. For higher-volume querying you can run your own CT log reader against the actual CT log servers (Google’s Argon, Cloudflare’s Nimbus, etc.) — we do this as a backup but the primary pipeline runs against crt.sh.
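For readers curious what reading a log directly involves, RFC 6962 logs expose a get-sth endpoint (current tree size) and a get-entries endpoint (a range of entries, with an inclusive end index). The sketch below only builds the request URLs for walking a log in batches. The log URL is a placeholder, the batch size of 256 is an assumption, and the helper names are hypothetical; real logs may return fewer entries than requested, so a reader has to advance by what it actually received.

```python
from typing import Iterator
from urllib.parse import urlencode


def get_sth_url(log_url: str) -> str:
    """URL of the signed tree head, which reports the current tree size."""
    return log_url.rstrip("/") + "/ct/v1/get-sth"


def entry_batch_urls(log_url: str, start: int, tree_size: int, batch: int = 256) -> Iterator[str]:
    """Yield get-entries URLs that together cover entries [start, tree_size)."""
    base = log_url.rstrip("/") + "/ct/v1/get-entries"
    while start < tree_size:
        # RFC 6962 get-entries uses 0-based indices with an inclusive end.
        end = min(start + batch, tree_size) - 1
        yield base + "?" + urlencode({"start": start, "end": end})
        start = end + 1
```

Persisting the last index read between runs lets each poll pick up exactly where the previous one stopped.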
Why hourly and not real-time?
Real-time CT log monitoring is doable (streaming the log feeds directly), but it adds infrastructure complexity for diminishing return. The marginal value of catching a domain at T+1h vs T+4h is small; the marginal cost of streaming infrastructure is large. Hourly is the sweet spot.
How do you avoid flagging legitimate domains with “temp” or “mail” in the name?
Pattern matching is a filter, not a verdict. Patterns produce candidates. Candidates go through DNS + HTTP + UI detection + fingerprint checks before they are classified. A domain named tempmaild.io that doesn’t have a temp-mail UI never lands in the disposable table. The patterns are intentionally broad, and we drop most matches.
Can I monitor CT logs myself?
Yes — crt.sh’s API is public and well-documented. The infrastructure to do this is one cron job plus a few hundred lines of code. If you’re running your own disposable detection and want the freshness layer, this is a tractable build.
What about non-Let’s Encrypt CAs?
All major public CAs log to CT: DigiCert, Sectigo (formerly Comodo CA, and the company behind crt.sh), GlobalSign, Entrust. Coverage approaches 100% for any cert that browsers will trust. The patterns we monitor are CA-agnostic.
Get fresh disposable detection without running your own CT-log pipeline
Every /v1/check request hits the latest version of the production disposable database, including this morning’s CT-log catches. Free tier: 5,000 verifies/month, no card. Get an API key →