
Detecting Synthetic Identities in Subscription Funnels Using Traffic Forensics

Unknown

2026-01-29

10 min read

A stepwise forensic method to expose synthetic accounts via traffic, DNS, temporal clustering, and email/phone reuse — with queries and thresholds.

Why marketers and SEO owners must hunt synthetic identities now

Unexplained drops in conversion rates, sudden churn of trial users, and weird referral spikes are often blamed on algorithm changes or seasonality. Increasingly in 2026, those symptoms point to a different culprit: synthetic identities gaming subscription funnels. This article gives a stepwise, forensic method to expose synthetic accounts using traffic patterns, DNS anomalies, temporal signup clustering, and email/phone reuse, with concrete queries you can run today and thresholds you can tune to your own traffic.

Executive summary — what to do first

If you manage growth, retention, or SEO health, run these three quick checks in the first hour:

Query recent signups by IP, ASN, and device fingerprint and flag any IPs with >5 signups/day.
Check for bursts: compute inter-arrival times for signups and flag any window where the signup rate's z-score exceeds 5 (signups arriving far faster than your baseline).
Look for reused identifiers: the same phone number used by >2 accounts, or a channel whose signups show a high disposable-email-domain ratio (>40%).

These fast triage steps separate likely synthetic clusters from normal customer noise so you can prioritize deeper forensic work.

Background context — why this matters in 2026

Two trends in late 2025 / early 2026 make synthetic identity hunting urgent for marketers and site owners:

Enterprises are learning that legacy identity checks, while often “good enough” for benign users, are insufficient against sophisticated automation and synthetic-identity farms, a gap that costs firms billions annually.
Changes to major providers (for example, Gmail’s account/address changes and expanded AI features introduced in early 2026) have shifted attacker tactics; disposable and aliasing behaviors have evolved.

“Banks overestimate their identity defenses to the tune of $34B a year.” — PYMNTS, Jan 2026

That observation applies to subscription businesses as well: weak verification plus high-volume signups equals amplified fraud risk. The method below is designed for marketing and SEO teams with access to analytics, backend logs, and DNS data.

Step 0 — Data sources you need (and why)

Before queries, collect these data feeds into a central analytics or SIEM pipeline:

Signup events (timestamped, user_id, email, phone, ip, device_fingerprint, user_agent)
Web server and WAF logs (raw requests, referrer, cookies)
DNS resolution logs and resolver responses for domains observed during signups (MX, TXT, A, AAAA)
Reverse DNS and ASN mapping for IPs
Third-party enrichments: disposable-email lists, phone carrier/line-type lookup, IP reputation
Certificate Transparency and WHOIS for suspicious domains used by emails or referrers

Store events in a time-series friendly store (BigQuery, Snowflake, Elastic, or Postgres). Keep raw logs for at least 90 days to analyze bursts and attacker behavior evolution.

Step 1 — Traffic-pattern signals and queries

Start with straightforward, high-signal features that are quick to compute:

Signups per IP and per ASN per 24h
Distinct device_fingerprints per IP
Ratio of signups with identical user_agents
Session duration / events per session after signup (bots often have low event depth)

Example SQL — signups per IP (Postgres)

SELECT ip, COUNT(*) AS signups_24h
FROM signups
WHERE created_at > now() - interval '24 hours'
GROUP BY ip
HAVING COUNT(*) > 5
ORDER BY signups_24h DESC;

Threshold guidance: flag IPs with >5 signups/day as suspicious. For high-volume SaaS, increase threshold to 10–20 but combine with other signals (ASN, device diversity).
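To combine the IP count with device diversity as suggested above, the same aggregation can run in application code. The following is a minimal Python sketch, assuming signups are already filtered to the last 24 hours and arrive as dicts with hypothetical ip and device_fingerprint keys. It reports each flagged IP's signup count alongside its number of distinct fingerprints so an analyst can weigh both.

```python
from collections import defaultdict

def flag_ips(signups, threshold=5):
    """Return {ip: (signup_count, distinct_fingerprints)} for IPs above threshold.

    signups: iterable of dicts with hypothetical 'ip' and 'device_fingerprint'
    keys, already restricted to the 24-hour window.
    """
    counts = defaultdict(int)
    fingerprints = defaultdict(set)
    for s in signups:
        counts[s["ip"]] += 1
        fingerprints[s["ip"]].add(s["device_fingerprint"])
    return {
        ip: (n, len(fingerprints[ip]))
        for ip, n in counts.items()
        if n > threshold  # strictly more than `threshold` signups per day
    }
```

Many signups from few fingerprints suggests one machine scripting accounts. Many fingerprints behind one IP can mean fingerprint spoofing or a shared NAT, so check the ASN before acting.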

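Elasticsearch query DSL — identical user agent spikes (assumes each signup document has an event field and an @timestamp field)

POST /_search
{
  "size": 0,
  "track_total_hits": true,
  "query": {
    "bool": {
      "filter": [
        { "term": { "event": "signup" } },
        { "range": { "@timestamp": { "gte": "now-24h" } } }
      ]
    }
  },
  "aggs": {
    "ua_buckets": {
      "terms": { "field": "user_agent.keyword", "size": 20 }
    }
  }
}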

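Threshold guidance: any single user agent contributing >20% of signups in 24 hours is anomalous, especially when that share is well above its usual baseline, unless a known marketing campaign explains the concentration (for example, traffic arriving through one app's in-app browser).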
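The 20% rule comes down to a share calculation over the 24-hour window. Below is a minimal Python sketch, assuming you have already pulled one full user-agent string per signup. The function name and the max_share parameter are illustrative.

```python
from collections import Counter

def dominant_user_agents(user_agents, max_share=0.20):
    """Return {user_agent: share} for UAs above max_share of all signups.

    user_agents: list of full UA strings, one per signup in the window.
    """
    total = len(user_agents)
    if total == 0:
        return {}
    counts = Counter(user_agents)
    # Share is computed against all signups in the window, not per bucket.
    return {ua: n / total for ua, n in counts.items() if n / total > max_share}
```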

Step 2 — Temporal signup clustering

Synthetic signups often arrive in temporally compact bursts. Use inter-arrival times and burst detection to find these clusters.

Compute inter-arrival times (BigQuery)

WITH ordered AS (
  SELECT
    user_id,
    created_at,
    LAG(created_at) OVER (ORDER BY created_at) AS prev_ts
  FROM `project.dataset.signups`
  WHERE created_at > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
)
SELECT
  created_at,
  TIMESTAMP_DIFF(created_at, prev_ts, SECOND) AS interarrival_seconds
FROM ordered
WHERE prev_ts IS NOT NULL
ORDER BY interarrival_seconds
LIMIT 1000;

Then compute z-scores of the inter-arrival intervals over a rolling window. Flag periods where the median inter-arrival time is under 5 seconds and the z-score is below -4. Together these are a classic signal of automated signups.
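A minimal Python sketch of the rolling check follows. One caution: raw inter-arrival gaps are heavily right-skewed. For Poisson-like arrivals their standard deviation is about equal to their mean, so a z-score below -4 on raw gaps is rarely reachable. This sketch therefore log-scales each window's median gap with log1p and z-scores it against the log-medians of all windows in the input. That transform, the 20-gap window, and using the whole input as the baseline are all assumptions, not part of the method above.

```python
import math
import statistics

def flag_bursts(timestamps, window=20, max_median=5.0, z_cutoff=-4.0):
    """Flag runs of `window` consecutive gaps that look automated.

    timestamps: ascending signup times in seconds.
    Each window's median gap is log-scaled (log1p) and z-scored against the
    log-medians of every window in the series. A window is flagged when its
    median gap is < max_median seconds AND its z-score is < z_cutoff.
    Returns a list of (start_gap_index, median_gap_seconds, z).
    """
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    medians = [statistics.median(gaps[i:i + window])
               for i in range(len(gaps) - window + 1)]
    if len(medians) < 2:
        return []
    logs = [math.log1p(m) for m in medians]
    mean, sd = statistics.fmean(logs), statistics.pstdev(logs)
    if sd == 0:
        return []  # every window looks the same; nothing stands out
    flagged = []
    for i, (m, lg) in enumerate(zip(medians, logs)):
        z = (lg - mean) / sd
        if m < max_median and z < z_cutoff:
            flagged.append((i, m, z))
    return flagged
```

In production you would usually take the baseline from a trailing, burst-free period. If a large share of the input is itself automated, the bursts inflate the standard deviation and can mask themselves.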

Burst thresholds

Median inter-arrival < 5s for groups of >20 signups — treat as urgent
Signups per minute > 10 sustained for >5 minutes — suspicious
Clustered signups from same IP/ASN > 50 within 1 hour — high confidence
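The second threshold (more than 10 signups per minute, sustained for more than 5 minutes) is a run-length check over per-minute counts. Below is a minimal Python sketch, assuming timestamps in epoch seconds. Setting min_minutes=6 encodes "more than 5 minutes", and the function name is illustrative.

```python
from collections import Counter

def sustained_rate_alerts(timestamps, per_minute=10, min_minutes=6):
    """Find runs of consecutive minutes where signups/minute > per_minute.

    timestamps: signup times in epoch seconds.
    Returns (start_minute, run_length) tuples for runs of at least
    min_minutes minutes; minutes are counted as epoch_seconds // 60.
    """
    counts = Counter(int(t // 60) for t in timestamps)
    hot = sorted(m for m, n in counts.items() if n > per_minute)
    runs = []
    run_start = prev = None
    for m in hot:
        if prev is not None and m == prev + 1:
            prev = m  # the current run continues
            continue
        if run_start is not None and prev - run_start + 1 >= min_minutes:
            runs.append((run_start, prev - run_start + 1))
        run_start = prev = m  # start a new run
    if run_start is not None and prev - run_start + 1 >= min_minutes:
        runs.append((run_start, prev - run_start + 1))
    return runs
```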

Step 3 — DNS anomalies and domain provenance

Attackers increasingly leverage throwaway domains, short-lived MX records, and misconfigured DNS records to route verification messages or host landing pages. Questions to ask:

Are email domains used in signups newly registered (< 30 days)?
Do MX records point to cheap forwarding services or disposable mail providers?
Is reverse DNS absent or showing generic cloud provider PTRs?
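For the reverse-DNS question, a small classifier can label each signup IP's PTR record. The Python sketch below uses a deliberately short, hypothetical pattern list that covers default AWS EC2 and Google Cloud PTR names; you would extend it from your own logs. The lookup_ptr helper uses the standard-library resolver, so its results depend on your host's DNS configuration.

```python
import re
import socket

# Hypothetical sample patterns for generic cloud PTR names; extend from your logs.
GENERIC_PTR_PATTERNS = [
    re.compile(r"\.compute(-1)?\.amazonaws\.com$"),  # AWS EC2 defaults
    re.compile(r"\.googleusercontent\.com$"),        # Google Cloud defaults
]

def lookup_ptr(ip):
    """Return the PTR hostname for ip, or None if there is no reverse record."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except (socket.herror, socket.gaierror):
        return None

def classify_ptr(ptr):
    """Label a reverse-DNS result as 'missing', 'generic', or 'named'."""
    if not ptr:
        return "missing"
    host = ptr.rstrip(".").lower()
    if any(p.search(host) for p in GENERIC_PTR_PATTERNS):
        return "generic"
    return "named"
```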

Quick DNS enrichment workflow

Extract domain from email addresses in recent signups.
Query WHOIS creation date and MX/TXT records.
Cross-reference against known disposable domain lists.

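Example script logic (Python sketch)

# Sketch: needs dnspython and python-whois; disposable_list and forwarding_hosts are lists you maintain
from datetime import datetime
import dns.resolver
import whois
def is_high_risk(domain, disposable_list, forwarding_hosts):
    created = whois.whois(domain).creation_date
    created = created[0] if isinstance(created, list) else created  # some registries return several dates
    age_days = (datetime.now() - created.replace(tzinfo=None)).days if created else None
    try:
        mx_hosts = [str(r.exchange).rstrip('.').lower() for r in dns.resolver.resolve(domain, 'MX')]
    except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN):
        mx_hosts = []  # no MX records published
    forwards = any(h.endswith(f) for h in mx_hosts for f in forwarding_hosts)
    return (age_days is not None and age_days < 30) or domain in disposable_list or forwards
# usage: high_risk = [d for d in recent_signup_domains if is_high_risk(d, disposable_list, forwarding_hosts)]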

Thresholds: domains < 30 days old and domains matching known disposable providers should be scored high. Also watch for common hosting providers where attackers use cheap cloud VMs — combine with ASN signals.

Step 4 — Email and phone reuse and aliasing

Synthetic accounts frequently reuse or algorithmically vary email and phone data to bypass simple uniqueness checks. Analyze reuse patterns at scale.

Email checks

Count accounts per base address across local-part variations (e.g., alice+promo vs alice+xyz). Many signups from the same base address using plus-aliasing indicate scripted signups.
Flag channels with high disposable rates: compute disposable_signups / total_signups per acquisition channel over the last 7 days; if the ratio exceeds 40%, treat the channel as polluted.
Watch for mass use of newly supported address-aliasing features (e.g., Gmail primary address changes in 2026) — update parsing rules accordingly.
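Plus-alias grouping needs a normalization step before counting. Below is a minimal Python sketch built on two simplifications. First, everything after the first + in the local part is dropped for every domain, although not every provider treats + as a tag separator. Second, dots are ignored only for gmail.com and googlemail.com. The min_accounts cutoff of 3 is a hypothetical value, since the method gives no threshold for alias groups. The disposable_ratio function computes the 40% channel check from a list of email domains and a disposable-domain set you supply.

```python
from collections import defaultdict

# Assumption: only these domains ignore dots in the local part.
DOT_INSENSITIVE = {"gmail.com", "googlemail.com"}

def base_address(email):
    """Collapse plus-aliases (and Gmail dots) to a canonical base address."""
    local, _, domain = email.strip().lower().rpartition("@")
    local = local.split("+", 1)[0]  # drop the +tag (simplification)
    if domain in DOT_INSENSITIVE:
        local = local.replace(".", "")
        domain = "gmail.com"
    return f"{local}@{domain}"

def alias_groups(emails, min_accounts=3):
    """Return {base: sorted distinct addresses} for bases with many variants."""
    groups = defaultdict(set)
    for e in emails:
        groups[base_address(e)].add(e.strip().lower())
    return {b: sorted(v) for b, v in groups.items() if len(v) >= min_accounts}

def disposable_ratio(domains, disposable):
    """Share of signups (given as email domains) that use a disposable domain."""
    return sum(d in disposable for d in domains) / len(domains) if domains else 0.0
```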

Phone checks

Phones used by multiple accounts: flag numbers used by >2 accounts in 30 days.
Carrier lookup: virtual numbers (VoIP) are riskier. Flag if line_type == 'voip' and accounts > 1.
Country mismatch: phone country not matching IP geolocation — increase fraud score.

SQL example — phone reuse (Postgres)

SELECT phone_normalized, COUNT(*) AS accounts, array_agg(user_id) AS users
FROM signups
WHERE created_at > now() - interval '30 days'
GROUP BY phone_normalized
HAVING COUNT(*) > 1
ORDER BY accounts DESC;

Thresholds: treat phones with >2 accounts as suspicious; >5 accounts is likely an attack cluster.
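The three phone checks and the reuse thresholds can be folded into one labeling function. Below is a minimal Python sketch. PhoneSignal and its fields are hypothetical names for data you would get from the reuse query, a carrier lookup, and IP geolocation. Country mismatch is returned as a reason that should raise a score, not as a label on its own, which matches the guidance above.

```python
from dataclasses import dataclass

@dataclass
class PhoneSignal:
    accounts: int        # accounts using this number in the last 30 days
    line_type: str       # from a carrier lookup, e.g. "mobile" or "voip"
    phone_country: str   # country of the number
    ip_country: str      # country from IP geolocation

def phone_risk(sig):
    """Return (label, reasons) using the reuse, VoIP, and country checks."""
    reasons = []
    if sig.line_type == "voip" and sig.accounts > 1:
        reasons.append("shared VoIP number")
    if sig.phone_country != sig.ip_country:
        reasons.append("phone/IP country mismatch")  # raises a score, not a label by itself
    if sig.accounts > 5:
        label = "attack-cluster"
    elif sig.accounts > 2 or "shared VoIP number" in reasons:
        label = "suspicious"
    else:
        label = "ok"
    return label, reasons
```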

Step 5 — Multi-dimensional linking and graph analysis

Synthetic identity attacks are rarely isolated. Build a graph of identifiers and find connected components — this reveals attacker campaigns and account farms.

Nodes: user_id, email, phone, ip, device_fingerprint, cookie_id
Edges: observed co-occurrence (e.g., user_id <- phone)

Graph playbook

Construct edges from signup and login events for the past 90 days.
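Run community detection (e.g., Louvain) to find clusters. HDBSCAN is a density-based method for feature vectors, so apply it to per-account features rather than to the identifier graph directly.
Prioritize clusters by size and cross-signal purity (e.g., high disposable email ratio, majority VoIP phones).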
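Before community detection, connected components give a cheap first pass over the identifier graph. The sketch below uses union-find in Python. It is a swapped-in technique for the components step, not Louvain itself. Node names such as "user:1" and "phone:A" follow a hypothetical prefixing convention so that different identifier types never collide.

```python
from collections import defaultdict

def connected_components(edges):
    """Group identifier nodes linked by co-occurrence edges (union-find).

    edges: iterable of (node_a, node_b) pairs, e.g. ("user:42", "phone:+1555").
    Returns a list of sets of nodes, largest component first.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb  # merge the two components

    groups = defaultdict(set)
    for node in parent:
        groups[find(node)].add(node)
    return sorted(groups.values(), key=len, reverse=True)
```

Shared infrastructure such as a corporate or carrier-grade NAT can merge unrelated users into one giant component. Consider dropping very high-degree IP nodes before linking.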

Sample query — find identifiers shared by many accounts (Cypher)

// Using a graph DB or Neo4j
MATCH (n:Identifier)-[:LINKS_TO]-(u:User)
WITH n, count(DISTINCT u) as accounts
WHERE accounts > 10
RETURN n, accounts
ORDER BY accounts DESC;

High-confidence attack clusters often contain:

Single IP/ASN connecting many device_fingerprints
Multiple emails from same disposable domain family
Phone numbers all flagged as VoIP with the same carrier
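These traits can be measured per cluster to support the prioritization step in the graph playbook. Below is a minimal Python sketch, assuming each cluster member is a dict with hypothetical email, line_type, ip, and device_fingerprint keys. High disposable and VoIP ratios, combined with few distinct IPs but many devices, match the profile described above.

```python
def cluster_profile(members, disposable_domains):
    """Summarize one cluster's size and cross-signal purity.

    members: list of dicts with hypothetical keys 'email', 'line_type',
    'ip', and 'device_fingerprint'.
    """
    n = len(members)
    if n == 0:
        return {"size": 0}
    domains = [m["email"].rsplit("@", 1)[-1].lower() for m in members]
    return {
        "size": n,
        "disposable_ratio": sum(d in disposable_domains for d in domains) / n,
        "voip_ratio": sum(m["line_type"] == "voip" for m in members) / n,
        "distinct_ips": len({m["ip"] for m in members}),
        "distinct_devices": len({m["device_fingerprint"] for m in members}),
    }
```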

Step 6 — Operational threat hunting playbook

Turn detection into a repeatable playbook for marketing and security teams.

Triage: run the “first hour” checks daily via scheduled queries and alert on thresholds (IP >5/day, clusters >20 signups/hour).
Enrich: for any flagged IP/cluster, fetch WHOIS, CT logs, ASN details, and phone carrier.
Contain: block suspicious IPs, apply step-up verification (SMS OTP, 2FA), or require manual review for accounts in the cluster.
Remediate: suspend or quarantine accounts after review and monitor for backfill attempts.
Feedback loop: feed confirmed fraud outcomes back to your scoring models to reduce false positives.

Advanced strategies: automation, ML, and how to avoid false positives

Implement a scoring model combining these signals with weights derived from historical true-positive rates. Use unsupervised techniques (isolation forests, HDBSCAN) to discover novel attack vectors.

Feature set: signup_rate, ip_asn_entropy, device_fingerprint_count, email_domain_age, phone_line_type, interarrival_zscore, ua_uniqueness
Training labels: mark confirmed fraud from manual review and automated blocks to calibrate weights.
Deploy model as a risk score; use thresholds to gate actions (score > 80 → block; 50–80 → step-up verification).
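The score-and-gate step can be sketched as a weighted sum in Python. The weights below are hypothetical placeholders, not values from this method; the text says they should come from historical true-positive rates. Each feature is assumed to be pre-normalized to a risk value between 0 and 1 (for example, a domain under 30 days old mapping to 1.0). The output is scaled to 0-100 so the block and step-up thresholds above apply directly.

```python
# Hypothetical weights (sum to 100); calibrate them from confirmed-fraud labels.
WEIGHTS = {
    "signup_rate": 15,
    "ip_asn_entropy": 10,
    "device_fingerprint_count": 15,
    "email_domain_age": 15,
    "phone_line_type": 15,
    "interarrival_zscore": 20,
    "ua_uniqueness": 10,
}

def risk_score(signals):
    """Weighted 0-100 score from per-feature risk values in [0, 1].

    signals: dict of feature name -> normalized risk. Missing features count
    as 0, and out-of-range values are clipped to [0, 1].
    """
    total = sum(WEIGHTS.values())
    raw = sum(w * min(max(signals.get(f, 0.0), 0.0), 1.0) for f, w in WEIGHTS.items())
    return 100 * raw / total

def gate(score):
    """Map a score to the action thresholds: >80 block, 50-80 step-up."""
    if score > 80:
        return "block"
    if score >= 50:
        return "step_up"
    return "allow"
```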

To reduce false positives, combine behavioral signals (post-signup engagement patterns) with identity signals. Legitimate users often convert to engaged sessions within 24–72 hours; synthetic accounts show low engagement and similar post-signup behaviors.

Case study (anonymized): a subscription SaaS hit by synthetic funnel abuse

Situation: a mid-market SaaS company saw a 12% spike in trial signups but a 40% drop in trial-to-paid conversions in January 2026.

Forensic steps taken:

Triage spotted 3 IPs with 120 signups within 6 hours — all signups used phone numbers flagged as VoIP.
DNS enrichment showed many emails used newly created domains (<15 days) and MX records pointing to forwarding services.
Graph analysis linked 800 accounts via 12 common device_fingerprints.

Actions: suspended high-risk accounts, required SMS verification for remaining suspicious signups, blocked offending IP ranges, and implemented daily automated alerts. Conversion metrics normalized within 10 days and customer support load decreased.

2026 trends and future predictions

Expect the following trends through 2026:

Attackers will increasingly use AI to generate plausible persona data (names, bios, social links) that evades simple heuristics; cross-signal linking will become essential.
Disposable and aliasing techniques will adapt to major provider changes (like Gmail’s early-2026 updates), so maintain up-to-date parsing and disposable lists.
DNS and certificate transparency monitoring will be a high-value source: attackers will rapidly create domains and use short-lived certificates to avoid detection.

Investing in automated traffic forensics now — combining DNS, traffic, and identity signals — will pay dividends as synthetic identity sophistication increases.

Actionable takeaways — checklist you can implement this week

Schedule the first-hour triage: run the IP, inter-arrival, and identifier-reuse queries daily.
Integrate DNS/WHOIS enrichment into your signup pipeline to flag new domains immediately.
Build a graph of identifiers to find connected components — start with a rolling 90-day window.
Apply step-up verification for accounts scoring above your medium-risk threshold (score > 50).
Log all remediation actions and feed outcomes back to your detection model to refine thresholds.

Closing — Start your traffic forensic program now

Detecting synthetic identities is no longer an optional security enhancement; it's essential for preserving acquisition ROI, conversion integrity, and SEO-driven growth. Use the stepwise method above to convert forensic signals into reliable detection rules, then automate and scale those rules into your marketing and security workflows.

Ready to act? If you want a ready-to-run checklist, query pack (Postgres, BigQuery, Elastic, Splunk), and a lightweight graph notebook tuned for subscription funnels, download our 2026 Traffic Forensics Playbook — or contact our team for a guided audit.
