Detecting Synthetic Identities in Subscription Funnels Using Traffic Forensics
A stepwise forensic method to expose synthetic accounts via traffic, DNS, temporal clustering, and email/phone reuse — with queries and thresholds.
Hook: Why marketers and SEO owners must hunt synthetic identities now
Unexplained drops in conversion rates, sudden churn of trial users, and weird referral spikes are often blamed on algorithm changes or seasonality. Increasingly in 2026, those symptoms point to a different culprit: synthetic identities gaming subscription funnels. This article gives a stepwise, forensic method to expose synthetic accounts using traffic patterns, DNS anomalies, temporal signup clustering, and email/phone reuse — with concrete queries and thresholds you can run today.
Executive summary — what to do first
If you manage growth, retention, or SEO health, run these three quick checks in the first hour:
- Query recent signups by IP, ASN, and device fingerprint and flag any IPs with >5 signups/day.
- Check for bursts: compute inter-arrival times for signups and flag any window whose inter-arrival z-score falls below -4 (gaps far shorter than your baseline).
- Look for reused identifiers: the same phone number on >2 accounts, or disposable-domain emails making up more than 40% of recent signups.
These fast triage steps separate likely synthetic clusters from normal customer noise so you can prioritize deeper forensic work.
Background context — why this matters in 2026
Two trends in late 2025 / early 2026 make synthetic identity hunting urgent for marketers and site owners:
- Enterprises are learning legacy identity checks are often “good enough” for benign users but insufficient against sophisticated automation and synthetic farms — costing firms billions annually.
- Changes to major providers (for example, Gmail’s account/address changes and expanded AI features introduced in early 2026) have shifted attacker tactics; disposable and aliasing behaviors have evolved.
“Banks overestimate their identity defenses to the tune of $34B a year.” — PYMNTS, Jan 2026
That observation applies to subscription businesses as well: weak verification plus high-volume signups equals amplified fraud risk. The method below is designed for marketing and SEO teams with access to analytics, backend logs, and DNS data.
Step 0 — Data sources you need (and why)
Before queries, collect these data feeds into a central analytics or SIEM pipeline:
- Signup events (timestamped, user_id, email, phone, ip, device_fingerprint, user_agent)
- Web server and WAF logs (raw requests, referrer, cookies)
- DNS resolution logs and resolver responses for domains observed during signups (MX, TXT, A, AAAA)
- Reverse DNS and ASN mapping for IPs
- Third-party enrichments: disposable-email lists, phone carrier/line-type lookup, IP reputation
- Certificate Transparency and WHOIS for suspicious domains used by emails or referrers
Store events in a time-series friendly store (BigQuery, Snowflake, Elastic, or Postgres). Keep raw logs for at least 90 days to analyze bursts and attacker behavior evolution.
Step 1 — Traffic-pattern signals and queries
Start with straightforward, high-signal features that are quick to compute:
- Signups per IP and per ASN per 24h
- Distinct device_fingerprints per IP
- Ratio of signups with identical user_agents
- Session duration / events per session after signup (bots often have low event depth)
Example SQL — signups per IP (Postgres)
SELECT ip, COUNT(*) AS signups_24h
FROM signups
WHERE created_at > now() - interval '24 hours'
GROUP BY ip
HAVING COUNT(*) > 5
ORDER BY signups_24h DESC;
Threshold guidance: flag IPs with >5 signups/day as suspicious. For high-volume SaaS, increase threshold to 10–20 but combine with other signals (ASN, device diversity).
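The advice to pair the per-IP count with device diversity can be sketched in Python. This is a minimal illustration under stated assumptions: the `flag_ips` helper, the dictionary keys, and the sample addresses are hypothetical, and the default of 5 simply mirrors the threshold above.

```python
from collections import defaultdict

def flag_ips(signups, max_per_ip=5):
    """Return {ip: (signup_count, distinct_fingerprints)} for IPs over the threshold."""
    counts = defaultdict(int)
    fingerprints = defaultdict(set)
    for s in signups:
        counts[s["ip"]] += 1
        fingerprints[s["ip"]].add(s["device_fingerprint"])
    return {
        ip: (n, len(fingerprints[ip]))
        for ip, n in counts.items()
        if n > max_per_ip
    }

# Hypothetical sample: six signups from one IP using two fingerprints,
# plus a single signup from another IP.
sample = [{"ip": "203.0.113.7", "device_fingerprint": f"fp{i % 2}"} for i in range(6)]
sample.append({"ip": "198.51.100.2", "device_fingerprint": "fpA"})
flagged = flag_ips(sample)
```

Reading the two numbers together helps with triage: many signups on very few fingerprints looks scripted, while many signups on many fingerprints may just be a shared office or campus network.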
Elasticsearch query: identical user agent spikes (assumes signup events carry a created_at date field)
POST /_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term": { "event": "signup" } },
        { "range": { "created_at": { "gte": "now-24h" } } }
      ]
    }
  },
  "aggs": {
    "ua_buckets": {
      "terms": { "field": "user_agent.keyword", "size": 20 }
    }
  }
}
Threshold guidance: any user agent contributing >20% of signups in 24 hours is anomalous unless you have a known marketing campaign using a specific UA.
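Once you have the bucket counts, the 20% share check is simple arithmetic. The sketch below assumes you have already pulled the user agent string for each signup in the last 24 hours. `dominant_user_agents` and the sample strings are hypothetical.

```python
from collections import Counter

def dominant_user_agents(user_agents, max_share=0.20):
    """Return {user_agent: share} for agents above max_share of all signups."""
    total = len(user_agents)
    if total == 0:
        return {}
    counts = Counter(user_agents)
    return {ua: n / total for ua, n in counts.items() if n / total > max_share}

# Hypothetical sample: one agent accounts for 3 of 10 signups.
uas = ["bot/1.0"] * 3 + [f"Mozilla/5.0 variant-{i}" for i in range(7)]
dominant = dominant_user_agents(uas)
```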
Step 2 — Temporal signup clustering (burst detection)
Synthetic signups often arrive in temporally compact bursts. Use inter-arrival times and burst detection to find these clusters.
Compute inter-arrival times (BigQuery)
WITH ordered AS (
SELECT
user_id,
created_at,
LAG(created_at) OVER (ORDER BY created_at) AS prev_ts
FROM `project.dataset.signups`
WHERE created_at > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
)
SELECT
created_at,
TIMESTAMP_DIFF(created_at, prev_ts, SECOND) AS interarrival_seconds
FROM ordered
WHERE prev_ts IS NOT NULL
ORDER BY interarrival_seconds
LIMIT 1000;
Then compute a z-score for each rolling window's median inter-arrival time against your baseline windows. Flag periods where the median inter-arrival is under 5 seconds and the z-score is below -4, a classic automated-signup signal.
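A minimal Python sketch of the windowed z-score follows. It makes two assumptions that are not in the method above. First, windows are non-overlapping blocks of 20 gaps. Second, the z-score is computed on log-transformed medians, because raw gaps are non-negative and right-skewed, so a plain z-score on them seldom reaches -4. The function names and sample numbers are hypothetical.

```python
import math
from statistics import mean, median, stdev

def window_medians(timestamps, window=20):
    """Median inter-arrival (seconds) for consecutive, non-overlapping windows."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    return [median(gaps[i:i + window])
            for i in range(0, len(gaps) - window + 1, window)]

def burst_windows(baseline_medians, recent_medians, max_median=5.0, z_cut=-4.0):
    """Flag recent windows that are short both in absolute terms and versus baseline."""
    if len(baseline_medians) < 2:
        return []
    logs = [math.log1p(m) for m in baseline_medians]  # log scale (assumption)
    mu, sigma = mean(logs), stdev(logs)
    if sigma == 0:
        return []
    flagged = []
    for i, m in enumerate(recent_medians):
        z = (math.log1p(m) - mu) / sigma
        if m < max_median and z < z_cut:
            flagged.append((i, m, z))
    return flagged

# Hypothetical baseline: window medians of roughly one signup per minute.
baseline = [55, 60, 65, 58, 62, 61, 57, 63]
flagged = burst_windows(baseline, [59, 2])
```

Requiring both conditions keeps a naturally busy site from alerting on every fast window, and keeps a quiet site from alerting on modest speed-ups.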
Burst thresholds
- Median inter-arrival < 5s for groups of >20 signups: treat as urgent
- Signups per minute > 10 sustained for >5 minutes: suspicious
- Clustered signups from same IP/ASN > 50 within 1 hour: high confidence
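The sustained-rate rule (more than 10 signups per minute for more than 5 minutes) can be checked with a short pass over minute buckets. This is a sketch that assumes timestamps are epoch seconds. `sustained_bursts` is a hypothetical helper.

```python
from collections import Counter

def sustained_bursts(timestamps, per_minute=10, min_minutes=5):
    """Return (first_minute, last_minute) runs where every minute has more than
    per_minute signups and the run lasts longer than min_minutes."""
    counts = Counter(int(t // 60) for t in timestamps)
    hot = sorted(m for m, n in counts.items() if n > per_minute)
    runs, start, prev = [], None, None
    for m in hot:
        if start is None:
            start = prev = m
        elif m == prev + 1:
            prev = m
        else:
            if prev - start + 1 > min_minutes:
                runs.append((start, prev))
            start = prev = m
    if start is not None and prev - start + 1 > min_minutes:
        runs.append((start, prev))
    return runs

# Hypothetical sample: 11 signups per minute for six consecutive minutes,
# plus one isolated busy minute that should not be reported.
ts = [m * 60 + s for m in range(100, 106) for s in range(11)]
ts += [200 * 60 + s for s in range(11)]
runs = sustained_bursts(ts)
```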
Step 3 — DNS anomalies and domain provenance
Attackers increasingly leverage throwaway domains, short-lived MX records, and misconfigured DNS records to route verification messages or host landing pages. Questions to ask:
- Are email domains used in signups newly registered (< 30 days)?
- Do MX records point to cheap forwarding services or disposable mail providers?
- Is reverse DNS absent or showing generic cloud provider PTRs?
Quick DNS enrichment workflow
- Extract domain from email addresses in recent signups.
- Query WHOIS creation date and MX/TXT records.
- Cross-reference against known disposable domain lists.
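Example script logic (Python sketch)
# whois_lookup and dns_query are placeholders for your own enrichment helpers.
from datetime import date

def high_risk_domains(domains, whois_lookup, dns_query, disposable_list, forwarding_mx):
    flagged = []
    for domain in domains:
        created = whois_lookup(domain)      # expected: creation date, or None
        mx_hosts = dns_query(domain, "MX")  # expected: list of MX hostnames
        txt = dns_query(domain, "TXT")      # fetched for review; not scored here
        age_days = (date.today() - created).days if created else None
        is_new = age_days is not None and age_days < 30
        uses_forwarder = any(host in forwarding_mx for host in mx_hosts)
        if is_new or domain in disposable_list or uses_forwarder:
            flagged.append(domain)
    return flagged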
Thresholds: domains < 30 days old and domains matching known disposable providers should be scored high. Also watch for common hosting providers where attackers use cheap cloud VMs — combine with ASN signals.
Step 4 — Email and phone reuse and aliasing
Synthetic accounts frequently reuse or algorithmically vary email and phone data to bypass simple uniqueness checks. Analyze reuse patterns at scale.
Email checks
- Count accounts per email local-part prefix variations (e.g., alice+promo vs alice+xyz). Many signups from the same base address using plus-aliasing indicates scripted signups.
- Flag domains with high disposable rates: compute disposable_signups / total_signups for last 7 days — if > 40% treat as polluted channel.
- Watch for mass use of newly supported address-aliasing features (e.g., Gmail primary address changes in 2026) — update parsing rules accordingly.
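The email checks above can be sketched in Python as follows. The sketch only strips plus-tags and lowercases addresses. Provider-specific aliasing rules are left out, and the helper names, the `min_accounts` default, and the sample domains are assumptions for illustration.

```python
from collections import Counter

def base_address(email):
    """Lowercase the address and drop any '+tag' suffix from the local part."""
    local, _, domain = email.strip().lower().rpartition("@")
    return f"{local.split('+', 1)[0]}@{domain}"

def alias_clusters(emails, min_accounts=3):
    """Base addresses that appear behind at least min_accounts signups."""
    counts = Counter(base_address(e) for e in emails)
    return {addr: n for addr, n in counts.items() if n >= min_accounts}

def disposable_ratio(emails, disposable_domains):
    """Share of signups whose email domain is on the disposable list."""
    if not emails:
        return 0.0
    hits = sum(1 for e in emails
               if e.strip().lower().rpartition("@")[2] in disposable_domains)
    return hits / len(emails)

# Hypothetical sample data.
emails = [
    "alice+promo@example.com",
    "Alice+xyz@example.com",
    "alice@example.com",
    "bob@tempmail.example",
]
clusters = alias_clusters(emails)
ratio = disposable_ratio(emails, {"tempmail.example"})
```

Keeping normalization in one function makes it easier to update when a provider changes its aliasing behavior.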
Phone checks
- Phones used by multiple accounts: flag numbers used by >2 accounts in 30 days.
- Carrier lookup: virtual numbers (VoIP) are riskier. Flag if line_type == 'voip' and accounts > 1.
- Country mismatch: phone country not matching IP geolocation — increase fraud score.
SQL example — phone reuse (Postgres)
SELECT phone_normalized, COUNT(*) AS accounts, array_agg(user_id) AS users
FROM signups
WHERE created_at > now() - interval '30 days'
GROUP BY phone_normalized
HAVING COUNT(*) > 1
ORDER BY accounts DESC;
Thresholds: treat phones with >2 accounts as suspicious; >5 accounts is likely an attack cluster.
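For teams that score in application code rather than SQL, the three phone checks can be combined in one pass. This is a sketch. The field names (`line_type`, `phone_country`, `ip_country`) and the reason labels are hypothetical and would come from your own carrier and geolocation enrichments.

```python
from collections import defaultdict

def phone_flags(signups):
    """Map each phone number to the list of reasons it looks risky."""
    users = defaultdict(set)
    voip, mismatch = set(), set()
    for s in signups:
        users[s["phone"]].add(s["user_id"])
        if s["line_type"] == "voip":
            voip.add(s["phone"])
        if s["phone_country"] != s["ip_country"]:
            mismatch.add(s["phone"])

    flags = {}
    for phone, accounts in users.items():
        n = len(accounts)
        reasons = []
        if n > 5:
            reasons.append("likely_attack_cluster")
        elif n > 2:
            reasons.append("reused")
        if phone in voip and n > 1:
            reasons.append("voip_shared")
        if phone in mismatch:
            reasons.append("country_mismatch")
        if reasons:
            flags[phone] = reasons
    return flags

# Hypothetical sample: one VoIP number shared by three accounts,
# and one mobile number whose country differs from the IP geolocation.
rows = [
    {"user_id": f"u{i}", "phone": "+15550100", "line_type": "voip",
     "phone_country": "US", "ip_country": "US"}
    for i in range(3)
]
rows.append({"user_id": "u9", "phone": "+445550199", "line_type": "mobile",
             "phone_country": "GB", "ip_country": "BR"})
flags = phone_flags(rows)
```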
Step 5 — Multi-dimensional linking and graph analysis
Synthetic identity attacks are rarely isolated. Build a graph of identifiers and find connected components — this reveals attacker campaigns and account farms.
- Nodes: user_id, email, phone, ip, device_fingerprint, cookie_id
- Edges: observed co-occurrence in the same event (e.g., a user_id linked to the phone number it signed up with)
Graph playbook
- Construct edges from signup and login events for the past 90 days.
- Extract connected components, then run graph community detection (for example, Louvain) to split large components into clusters.
- Prioritize clusters by size, cross-signal purity (e.g., high disposable email ratio, majority VoIP phones).
Sample query: find identifiers shared by many accounts (Cypher-like)
// Using a graph DB or Neo4j
MATCH (n:Identifier)-[:LINKS_TO]-(u:User)
WITH n, count(DISTINCT u) as accounts
WHERE accounts > 10
RETURN n, accounts
ORDER BY accounts DESC;
High-confidence attack clusters often contain:
- Single IP/ASN connecting many device_fingerprints
- Multiple emails from same disposable domain family
- Phone numbers all flagged as VoIP with the same carrier
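If you do not have a graph database available, connected components can be approximated in plain Python with a union-find structure. The sketch below links each account to the identifiers it used and groups accounts that share any of them. `account_clusters`, the field names, and the sample rows are hypothetical.

```python
from collections import defaultdict

def find(parent, x):
    """Return the root of x, compressing the path as it walks up."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def account_clusters(signups, id_fields=("phone", "ip", "device_fingerprint"),
                     min_size=2):
    """Group user_ids that are connected through any shared identifier."""
    parent = {}
    for s in signups:
        user = ("user", s["user_id"])
        parent.setdefault(user, user)
        for field in id_fields:
            value = s.get(field)
            if not value:
                continue
            ident = (field, value)
            parent.setdefault(ident, ident)
            root_user, root_ident = find(parent, user), find(parent, ident)
            if root_user != root_ident:
                parent[root_user] = root_ident

    clusters = defaultdict(set)
    for node in parent:
        if node[0] == "user":
            clusters[find(parent, node)].add(node[1])
    big = [c for c in clusters.values() if len(c) >= min_size]
    return sorted(big, key=len, reverse=True)

# Hypothetical sample: u1 and u2 share a phone, u2 and u3 share an IP.
rows = [
    {"user_id": "u1", "phone": "p1", "ip": "i1"},
    {"user_id": "u2", "phone": "p1", "ip": "i2"},
    {"user_id": "u3", "phone": "p3", "ip": "i2"},
    {"user_id": "u4", "phone": "p4", "ip": "i4"},
]
clusters = account_clusters(rows)
```

Note that very common identifiers, such as a carrier NAT IP, can glue unrelated accounts into one component, so it may be worth excluding high-degree identifiers before linking.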
Step 6 — Operational threat hunting playbook
Turn detection into a repeatable playbook for marketing and security teams.
- Triage: run the “first hour” checks daily via scheduled queries and alert on thresholds (IP >5/day, clusters >20 signups/hour).
- Enrich: for any flagged IP/cluster, fetch WHOIS, CT logs, ASN details, and phone carrier.
- Contain: block suspicious IPs, apply step-up verification (SMS OTP, 2FA), or require manual review for accounts in the cluster.
- Remediate: suspend or quarantine accounts after review and monitor for backfill attempts.
- Feedback loop: feed confirmed fraud outcomes back to your scoring models to reduce false positives.
Advanced strategies: automation, ML, and how to avoid false positives
Implement a scoring model combining these signals with weights derived from historical true-positive rates. Use unsupervised techniques (isolation forests, HDBSCAN) to discover novel attack vectors.
- Feature set: signup_rate, ip_asn_entropy, device_fingerprint_count, email_domain_age, phone_line_type, interarrival_zscore, ua_uniqueness
- Training labels: mark confirmed fraud from manual review and automated blocks to calibrate weights.
- Deploy model as a risk score; use thresholds to gate actions (score > 80 → block; 50–80 → step-up verification).
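A weighted score with the gating thresholds above might look like the following sketch. The weights are placeholders chosen for readability, not values derived from real true-positive rates, and each signal is assumed to be pre-scaled to the range 0 to 1. `risk_score` and `action` are hypothetical names.

```python
# Placeholder weights (assumption); calibrate against your own labeled outcomes.
WEIGHTS = {
    "signup_rate": 3,
    "ip_asn_entropy": 1,
    "device_fingerprint_count": 1,
    "email_domain_age": 2,
    "phone_line_type": 2,
    "interarrival_zscore": 2,
    "ua_uniqueness": 1,
}

def risk_score(signals, weights=WEIGHTS):
    """Combine signals (each clamped to 0..1) into a 0..100 score."""
    total = sum(weights.values())
    raw = sum(w * min(max(signals.get(name, 0.0), 0.0), 1.0)
              for name, w in weights.items())
    return 100.0 * raw / total

def action(score):
    """Gate actions using the thresholds from the text."""
    if score > 80:
        return "block"
    if score >= 50:
        return "step_up"
    return "allow"

# Hypothetical signups with some signals fully triggered.
medium = risk_score({"signup_rate": 1.0, "email_domain_age": 1.0, "ua_uniqueness": 1.0})
high = risk_score({name: 1.0 for name in WEIGHTS if name != "ua_uniqueness"})
```

Keeping the score and the action in separate functions lets you tune thresholds without retraining or reweighting the model.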
To reduce false positives, combine behavioral signals (post-signup engagement patterns) with identity signals. Legitimate users often convert to engaged sessions within 24–72 hours; synthetic accounts show low engagement and similar post-signup behaviors.
Case study (anonymized): a subscription SaaS hit by synthetic funnel abuse
Situation: a mid-market SaaS company saw a 12% spike in trial signups but a 40% drop in trial-to-paid conversions in January 2026.
Forensic steps taken:
- Triage spotted 3 IPs with 120 signups within 6 hours — all signups used phone numbers flagged as VoIP.
- DNS enrichment showed many emails used newly created domains (<15 days) and MX records pointing to forwarding services.
- Graph analysis linked 800 accounts via 12 common device_fingerprints.
Actions: suspended high-risk accounts, required SMS verification for remaining suspicious signups, blocked offending IP ranges, and implemented daily automated alerts. Conversion metrics normalized within 10 days and customer support load decreased.
2026 trends and future predictions
Expect the following trends through 2026:
- Attackers will increasingly combine AI to generate plausible persona data (names, bios, social links) that evade simple heuristics; cross-signal linking will become essential.
- Disposable and aliasing techniques will adapt to major provider changes (like Gmail’s early-2026 updates), so maintain up-to-date parsing and disposable lists.
- DNS and certificate transparency monitoring will be a high-value source: attackers will rapidly create domains and use short-lived certificates to avoid detection.
Investing in automated traffic forensics now — combining DNS, traffic, and identity signals — will pay dividends as synthetic identity sophistication increases.
Actionable takeaways — checklist you can implement this week
- Schedule the three first-hour triage checks: run the IP, inter-arrival, and identifier-reuse queries daily.
- Integrate DNS/WHOIS enrichment into your signup pipeline to flag new domains immediately.
- Build a graph of identifiers to find connected components — start with a rolling 90-day window.
- Apply step-up verification for accounts scoring above your medium-risk threshold (score > 50).
- Log all remediation actions and feed outcomes back to your detection model to refine thresholds.
Closing — Start your traffic forensic program now
Detecting synthetic identities is no longer an optional security enhancement; it's essential for preserving acquisition ROI, conversion integrity, and SEO-driven growth. Use the stepwise method above to convert forensic signals into reliable detection rules, then automate and scale those rules into your marketing and security workflows.
Ready to act? If you want a ready-to-run checklist, query pack (Postgres, BigQuery, Elastic, Splunk), and a lightweight graph notebook tuned for subscription funnels, download our 2026 Traffic Forensics Playbook — or contact our team for a guided audit.