Data Quality: The Real Reason Salesforce Exits Fail
You can't migrate garbage and expect gold. Here's how to measure, fix, and validate data quality before you commit $2M to an exit.
The $1.8M Data Quality Tax
SaaS company, 8 years on Salesforce, 12M records. Exit plan: migrate to HubSpot in 6 months, budget $1.2M.
Month 3 of migration: discovered 2.4M duplicate Contacts, 840K Accounts with no associated records, 1.2M orphaned Opportunities pointing to deleted Accounts.
Outcome: Migration paused. Spent $600K cleaning data. Timeline extended 9 months. Total cost: $1.8M over budget.
Root cause: Started migration without data quality baseline.
Why Data Quality Kills Exits
1. Duplicates Multiply Costs
Migration vendors charge by record count. If 30% of your Accounts are duplicates, you pay 30% more to migrate garbage.
Deduplication after migration costs 3-5x more than before (new system learning curve, unfamiliar tools, lost context).
2. Broken Relationships Break Business Process
Opportunity without Account? Contact without Account? Task without related record?
These orphaned records either fail migration (error out) or create broken workflows in the new system. Sales team loses context. Deals stall.
3. Incomplete Data = Incomplete Migration
Required fields in the old system ≠ required fields in the new system. If 40% of Accounts are missing Industry, and the new CRM requires Industry for routing rules, the migration fails validation.
Data Quality Scoring Framework
Our assessment tool scores data quality from 0 to 100 across five dimensions:
1. Completeness (Weight: 25%)
// Per-object completeness score
required_fields_populated = COUNT(records WHERE all_required_fields != NULL)
total_records = COUNT(records)
completeness_score = (required_fields_populated / total_records) × 100
// Example
Accounts: 87% have Name, BillingStreet, BillingCity
Contacts: 92% have FirstName, LastName, Email
Opportunities: 78% have Amount, CloseDate, StageName
Overall completeness: (87 + 92 + 78) / 3 = 85.7%
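The completeness pseudocode above can be sketched as runnable Python. The required-field lists and sample records are illustrative, not from a real org:

```python
# Assumed required fields per object (illustrative, mirroring the example above)
REQUIRED = {"Account": ["Name", "BillingStreet", "BillingCity"]}

def completeness(records, required_fields):
    """Percent of records where every required field is non-empty."""
    if not records:
        return 0.0
    complete = sum(
        1 for r in records
        if all(r.get(f) not in (None, "") for f in required_fields)
    )
    return round(100.0 * complete / len(records), 1)

accounts = [
    {"Name": "Acme", "BillingStreet": "1 Main St", "BillingCity": "Madison"},
    {"Name": "Globex", "BillingStreet": None, "BillingCity": "Austin"},
]
print(completeness(accounts, REQUIRED["Account"]))  # 50.0
```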
2. Uniqueness (Weight: 30%)
// Duplicate detection
duplicates = COUNT(records WHERE fuzzy_match(Name, Email, Phone) > 0.85)
total_records = COUNT(records)
uniqueness_score = ((total_records - duplicates) / total_records) × 100
// Example
Accounts: 2.4M total, 680K duplicates → 71.7% unique
Contacts: 8.1M total, 1.9M duplicates → 76.5% unique
Leads: 3.2M total, 420K duplicates → 86.9% unique
Overall uniqueness: (71.7 + 76.5 + 86.9) / 3 = 78.4%
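A minimal fuzzy-duplicate sketch in Python, using the standard library's `difflib` as a stand-in for whatever matcher your dedup tool uses (the 0.85 threshold comes from the pseudocode above; the records are made up):

```python
import difflib

def fuzzy_ratio(a, b):
    """Similarity in [0, 1] between two strings, case-insensitive."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()

def uniqueness(records, key_fields, threshold=0.85):
    # Naive O(n^2) pairwise scan: fine for a sketch, not for 8M Contacts —
    # production dedup needs blocking/indexing before fuzzy comparison.
    keys = [" ".join(str(r.get(f) or "") for f in key_fields) for r in records]
    duplicates = set()
    for i in range(len(keys)):
        for j in range(i + 1, len(keys)):
            if fuzzy_ratio(keys[i], keys[j]) > threshold:
                duplicates.add(j)  # the later record is treated as the duplicate
    return round(100.0 * (len(records) - len(duplicates)) / len(records), 1)

records = [
    {"Name": "Acme Corp", "Email": "info@acme.com"},
    {"Name": "ACME Corp.", "Email": "info@acme.com"},   # fuzzy duplicate of above
    {"Name": "Initech", "Email": "hello@initech.com"},
]
print(uniqueness(records, ["Name", "Email"]))  # 66.7
```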
3. Consistency (Weight: 20%)
// Format consistency
consistent_format = COUNT(records WHERE matches_pattern(field, expected_pattern))
total_records = COUNT(records)
consistency_score = (consistent_format / total_records) × 100
// Examples
Phone: (608) 555-1234 vs 608-555-1234 vs 6085551234
Email: tyler@example.com vs TYLER@EXAMPLE.COM vs tyler@example.COM
State: WI vs Wisconsin vs wi vs WISCONSIN
Phone consistency: 68% (32% formatting variations)
Email consistency: 91%
State consistency: 74%
Overall consistency: (68 + 91 + 74) / 3 = 77.7%
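Format consistency is just pattern matching against a canonical form. The canonical formats below are assumptions for illustration, not Salesforce defaults:

```python
import re

PATTERNS = {
    "Phone": re.compile(r"^\(\d{3}\) \d{3}-\d{4}$"),                  # (608) 555-1234
    "Email": re.compile(r"^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$"),  # lowercase only
    "State": re.compile(r"^[A-Z]{2}$"),                               # WI, TX, ...
}

def consistency(values, pattern):
    """Percent of values matching the canonical format."""
    hits = sum(1 for v in values if pattern.match(v or ""))
    return round(100.0 * hits / len(values), 1)

phones = ["(608) 555-1234", "608-555-1234", "6085551234", "(414) 555-9876"]
print(consistency(phones, PATTERNS["Phone"]))  # 50.0
```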
4. Referential Integrity (Weight: 15%)
// Relationship validity
valid_relationships = COUNT(records WHERE parent_exists AND child_exists)
total_relationships = COUNT(relationships)
integrity_score = (valid_relationships / total_relationships) × 100
// Example orphaned records
Opportunities without Account: 142,000 (11.8% orphaned)
Contacts without Account: 89,000 (1.1% orphaned)
Tasks without related WhatId/WhoId: 340,000 (15.2% orphaned)
Integrity score: 88.2%, driven by the 11.8% Opportunity orphan rate (the simple average orphan rate across the three objects is 9.4%)
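Checking referential integrity reduces to verifying that every child's foreign key points at a parent that still exists. A sketch with made-up IDs:

```python
def integrity(children, parent_ids, fk="AccountId"):
    """Percent of child records whose foreign key resolves to an existing parent."""
    valid = sum(1 for c in children if c.get(fk) in parent_ids)
    return round(100.0 * valid / len(children), 1)

account_ids = {"001A", "001B"}
opportunities = [
    {"Id": "006A", "AccountId": "001A"},  # valid parent
    {"Id": "006B", "AccountId": "001B"},  # valid parent
    {"Id": "006C", "AccountId": None},    # orphan: empty lookup
    {"Id": "006D", "AccountId": "001X"},  # orphan: parent no longer exists
]
print(integrity(opportunities, account_ids))  # 50.0
```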
5. Accuracy (Weight: 10%)
// Sample-based validation
sampled_records = RANDOM_SAMPLE(records, 500)
manually_verified = COUNT(sampled_records WHERE human_validates_correct)
accuracy_score = (manually_verified / 500) × 100
// Typical findings
Account Industry matches reality: 82%
Contact Title matches LinkedIn: 76%
Opportunity Amount matches closed deal: 91%
Overall accuracy: (82 + 76 + 91) / 3 = 83%
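The sampling step can be automated even though the verification itself is manual. A sketch where a callback stands in for the human reviewer (the stale-title check is an invented example):

```python
import random

def accuracy_sample(records, verify, sample_size=500, seed=42):
    """Draw a random sample and score it with a verification callback."""
    rng = random.Random(seed)  # fixed seed so the audit sample is reproducible
    sample = rng.sample(records, min(sample_size, len(records)))
    verified = sum(1 for r in sample if verify(r))
    return round(100.0 * verified / len(sample), 1)

# Stand-in for a human reviewer: flag titles recorded as unknown
records = [{"Title": "CTO"} if i % 2 == 0 else {"Title": "unknown"} for i in range(10)]
print(accuracy_sample(records, lambda r: r["Title"] != "unknown"))  # 50.0
```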
Overall Data Quality Score
DQ_Score = (Completeness × 0.25) +
(Uniqueness × 0.30) +
(Consistency × 0.20) +
(Integrity × 0.15) +
(Accuracy × 0.10)
// Example calculation
DQ_Score = (85.7 × 0.25) + (78.4 × 0.30) + (77.7 × 0.20) +
(88.2 × 0.15) + (83.0 × 0.10)
= 21.4 + 23.5 + 15.5 + 13.2 + 8.3
= 81.9
// Interpretation
90-100: Excellent (exit-ready)
75-89: Good (minor cleanup needed)
60-74: Fair (significant remediation required)
< 60: Poor (exit not recommended until fixed)
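The weighted roll-up and interpretation bands above can be expressed directly. Note the article's 81.9 sums the already-rounded per-dimension terms; computing from the unrounded products gives 82.0:

```python
WEIGHTS = {"completeness": 0.25, "uniqueness": 0.30, "consistency": 0.20,
           "integrity": 0.15, "accuracy": 0.10}

BANDS = [(90, "Excellent (exit-ready)"),
         (75, "Good (minor cleanup needed)"),
         (60, "Fair (significant remediation required)"),
         (0,  "Poor (exit not recommended until fixed)")]

def dq_score(dims):
    """Weighted overall score plus its interpretation band."""
    score = sum(dims[k] * w for k, w in WEIGHTS.items())
    band = next(label for floor, label in BANDS if score >= floor)
    return round(score, 1), band

print(dq_score({"completeness": 85.7, "uniqueness": 78.4, "consistency": 77.7,
                "integrity": 88.2, "accuracy": 83.0}))
# → (82.0, 'Good (minor cleanup needed)')
```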
Real Case Study: Financial Services Exit
Initial Assessment
- Records: 18M (Accounts 2.1M, Contacts 9.4M, Opps 6.5M)
- DQ Score: 63 (Fair)
- Blockers: 1.8M duplicate Contacts, 840K orphaned Opps, inconsistent Phone/Email formats
Remediation Plan (12 weeks, $420K)
Phase 1: Deduplication (4 weeks, $180K)
- Tool: Salesforce Data.com + custom Apex batch jobs
- Process: fuzzy match on Name + Email + Phone, human review queue for ambiguous matches
- Result: Merged 1.8M duplicates → 720K unique records, 1.08M deleted
Phase 2: Orphan Resolution (3 weeks, $90K)
- Opportunities without Account: 840K records
- Strategy: Match on Contact.Account → 680K resolved, 160K archived (historical, >5 years old)
- Contacts without Account: Create "Unknown Account" placeholder for 89K records, flag for ops team review
Phase 3: Data Enrichment (3 weeks, $80K)
- Missing Industry: ZoomInfo API enrichment for 420K Accounts
- Missing Phone/Email: Manual ops team outreach for top 5,000 strategic accounts
- Standardize formats: Phone → E.164, Email → lowercase, State → 2-letter code
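The Phase 3 standardization rules (Phone → E.164, Email → lowercase, State → 2-letter code) can be sketched as below. The state map is a tiny excerpt, and a real project should use a library like `phonenumbers` rather than this hand-rolled US-only parser:

```python
import re

STATE_CODES = {"wisconsin": "WI", "wi": "WI", "texas": "TX", "tx": "TX"}  # excerpt

def to_e164(raw, country_code="+1"):
    """US-only E.164 normalization sketch; returns None for unparseable input."""
    digits = re.sub(r"\D", "", raw or "")
    if len(digits) == 10:
        return country_code + digits
    if len(digits) == 11 and digits.startswith("1"):
        return "+" + digits
    return None  # flag for manual review

def standardize(record):
    record["Phone"] = to_e164(record.get("Phone"))
    if record.get("Email"):
        record["Email"] = record["Email"].strip().lower()
    state = (record.get("State") or "").strip().lower()
    record["State"] = STATE_CODES.get(state, record.get("State"))
    return record

print(standardize({"Phone": "(608) 555-1234",
                   "Email": "TYLER@EXAMPLE.COM",
                   "State": "Wisconsin"}))
# → {'Phone': '+16085551234', 'Email': 'tyler@example.com', 'State': 'WI'}
```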
Phase 4: Validation (2 weeks, $70K)
- Re-run DQ assessment
- Spot-check 1,000 random records
- UAT with sales team on cleaned data
Post-Remediation Results
- Records: 16.2M (10% reduction, driven by deduplication and archiving)
- DQ Score: 89 (Good, just under the Excellent threshold)
- Migration proceeded on schedule
- Zero data-related blockers during migration
Common Data Quality Issues by Object
Accounts
- Duplicates (company acquired, name changed, merger not reflected)
- Test accounts in production ("Test Co", "ACME", "Sample Account")
- Missing Industry/Type (sales can't route)
- Inconsistent naming (IBM vs. IBM Corporation vs. International Business Machines)
Contacts
- Duplicates (same person, multiple email addresses)
- Outdated (person left the company 3 years ago, record still marked active)
- Missing Account relationship (orphaned from deleted Account)
- Bad email formats (typos, invalid domains)
Opportunities
- Orphaned (Account deleted but Opp remains)
- Missing Amount (required for forecasting in new CRM)
- Stale (created 5 years ago, Stage still "Prospecting")
- No Contact Roles (can't identify buyer committee)
Automated Remediation Patterns
Deduplication Batch Job
global class DeduplicateAccountsBatch implements Database.Batchable<sObject> {
    global Database.QueryLocator start(Database.BatchableContext bc) {
        return Database.getQueryLocator([
            SELECT Id, Name, BillingStreet, BillingCity, Phone, Website
            FROM Account
            WHERE IsDeleted = false
            ORDER BY CreatedDate ASC
        ]);
    }
    global void execute(Database.BatchableContext bc, List<Account> scope) {
        Map<String, Account> uniqueAccounts = new Map<String, Account>();
        for (Account acc : scope) {
            String fuzzyKey = generateFuzzyKey(acc);
            if (uniqueAccounts.containsKey(fuzzyKey)) {
                // Duplicate found: merge into the earliest-created survivor
                Database.merge(uniqueAccounts.get(fuzzyKey), acc, false);
            } else {
                uniqueAccounts.put(fuzzyKey, acc);
            }
        }
        // Note: this only catches duplicates within a single batch scope;
        // cross-batch dedup needs a precomputed key map or Duplicate Rules.
    }
    global void finish(Database.BatchableContext bc) {
        // Required by the interface; send a summary email or chain the next job here
    }
    private String generateFuzzyKey(Account acc) {
        // Null-safe: blank out missing fields so key generation never throws
        String raw = (acc.Name == null ? '' : acc.Name)
            + (acc.BillingStreet == null ? '' : acc.BillingStreet)
            + (acc.Phone == null ? '' : acc.Phone);
        return raw.toLowerCase().replaceAll('[^a-z0-9]', '');
    }
}
Orphan Detection Query
// Find Opportunities with an empty Account lookup
SELECT Id, Name, AccountId, Amount, CloseDate
FROM Opportunity
WHERE AccountId = null
ORDER BY Amount DESC NULLS LAST
// Find Opportunities whose Account lookup points at a missing record
// (run as a separate query: SOQL rejects an anti-join inside an OR clause)
SELECT Id, Name, AccountId, Amount, CloseDate
FROM Opportunity
WHERE AccountId NOT IN (SELECT Id FROM Account)
ORDER BY Amount DESC NULLS LAST
// Find Contacts without an Account, using the same two-query pattern
SELECT Id, Name, Email, AccountId
FROM Contact
WHERE AccountId = null
ORDER BY CreatedDate DESC
SELECT Id, Name, Email, AccountId
FROM Contact
WHERE AccountId NOT IN (SELECT Id FROM Account)
ORDER BY CreatedDate DESC
Data Quality SLAs for Exit Readiness
| Dimension | Minimum | Target |
|---|---|---|
| Completeness | 85% | 95% |
| Uniqueness | 90% | 98% |
| Consistency | 80% | 95% |
| Integrity | 95% | 99% |
| Accuracy | 80% | 90% |
| Overall DQ Score | 85 | 92 |
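The SLA table above translates into a simple readiness gate. A sketch that reports which dimensions still miss their minimums, using the example scores from earlier in the article:

```python
SLA_MINIMUM = {"completeness": 85, "uniqueness": 90, "consistency": 80,
               "integrity": 95, "accuracy": 80}

def sla_gaps(scores, sla=SLA_MINIMUM):
    """Dimensions below their exit-readiness minimum, as {dim: (actual, required)}."""
    return {dim: (scores[dim], floor)
            for dim, floor in sla.items() if scores[dim] < floor}

gaps = sla_gaps({"completeness": 85.7, "uniqueness": 78.4, "consistency": 77.7,
                 "integrity": 88.2, "accuracy": 83.0})
print(gaps)  # uniqueness, consistency, and integrity miss their minimums
```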
Remediation Cost Estimate Formula
// Cost to remediate data quality issues
deduplication_cost = duplicate_count × $0.50 // $0.50 per duplicate merged
orphan_resolution = orphan_count × $0.30 // $0.30 per orphan resolved
enrichment_cost = missing_field_count × $0.20 // $0.20 per field enriched
validation_cost = total_records × $0.05 // $0.05 per record validated
total_remediation_cost = deduplication_cost + orphan_resolution +
enrichment_cost + validation_cost
// Example: 12M records, DQ Score 63
duplicates: 1.8M × $0.50 = $900K
orphans: 840K × $0.30 = $252K
missing fields: 2.1M × $0.20 = $420K
validation: 12M × $0.05 = $600K
Total: $2.17M remediation cost
// Rule of thumb: remediation cost = 15-25% of migration cost for DQ < 75
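The cost formula is a straight linear model over the per-record rates, so it is trivial to script. This reproduces the worked example (the rates are the article's, not market quotes):

```python
RATES = {  # per-record unit costs from the formula above
    "duplicate": 0.50, "orphan": 0.30, "missing_field": 0.20, "validation": 0.05,
}

def remediation_cost(duplicates, orphans, missing_fields, total_records, rates=RATES):
    """Total remediation cost in dollars."""
    return (duplicates * rates["duplicate"]
            + orphans * rates["orphan"]
            + missing_fields * rates["missing_field"]
            + total_records * rates["validation"])

cost = remediation_cost(1_800_000, 840_000, 2_100_000, 12_000_000)
print(f"${cost / 1e6:.2f}M")  # $2.17M, matching the worked example
```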
Exit-Ready Data Quality Checklist
- Run DQ assessment (use our tool or build your own)
- If DQ Score < 85, plan remediation before exit planning
- Deduplicate Accounts, Contacts, Leads (highest ROI cleanup)
- Resolve orphaned records (delete or re-parent)
- Enrich missing required fields (Industry, Type, Source)
- Standardize formats (Phone E.164, Email lowercase, State 2-char)
- Archive stale data (>3 years old, no activity)
- Validate with UAT (sales team spot-checks 500 records)
- Re-assess DQ score (target 85+)
- Lock data quality (no bulk imports until migration complete)
Red Flags That Block Exits
- Duplicate rate > 20%: Migration cost doubles, dedup required first
- Orphan rate > 10%: Business process broken, fix before migrating
- Missing External IDs: Can't do upserts, only inserts = duplicate risk in new system
- Inconsistent naming: Matching logic fails, creates duplicates post-migration
- No data steward: Who approves merge decisions? Who validates enrichment? Exit stalls.
Need a Data Quality Assessment?
Our Exit Assessment includes comprehensive data quality scoring across all objects, duplicate analysis, orphan detection, remediation cost estimates, and a prioritized action plan.