Data Quality: The Real Reason Salesforce Exits Fail
You can't migrate garbage and expect gold. Here's how to measure, fix, and validate data quality before you commit $2M to an exit.
The $1.8M Data Quality Tax
SaaS company, 8 years on Salesforce, 12M records. Exit plan: migrate to HubSpot in 6 months, budget $1.2M.
Month 3 of migration: discovered 2.4M duplicate Contacts, 840K Accounts with no associated records, 1.2M orphaned Opportunities pointing to deleted Accounts.
Outcome: Migration paused. Spent $600K cleaning data. Timeline extended 9 months. Total cost: $1.8M over budget.
Root cause: Started migration without data quality baseline.
Why Data Quality Kills Exits
1. Duplicates Multiply Costs
Migration vendors charge by record count. If 30% of your Accounts are duplicates, you pay 30% more to migrate garbage.
Deduplication after migration costs 3-5x more than before (new system learning curve, unfamiliar tools, lost context).
2. Broken Relationships Break Business Process
Opportunity without Account? Contact without Account? Task without related record?
These orphaned records either fail migration (error out) or create broken workflows in the new system. Sales team loses context. Deals stall.
3. Incomplete Data = Incomplete Migration
Required fields in the old system ≠ required fields in the new system. If 40% of Accounts are missing Industry, and the new CRM requires Industry for routing rules, the migration fails validation.
Data Quality Scoring Framework
Our assessment tool scores data quality from 0 to 100 across five dimensions:
1. Completeness (Weight: 25%)
// Per-object completeness score
required_fields_populated = COUNT(records WHERE all_required_fields != NULL)
total_records = COUNT(records)
completeness_score = (required_fields_populated / total_records) × 100
// Example
Accounts: 87% have Name, BillingStreet, BillingCity
Contacts: 92% have FirstName, LastName, Email
Opportunities: 78% have Amount, CloseDate, StageName
Overall completeness: (87 + 92 + 78) / 3 = 85.7%
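The completeness pseudocode above can be sketched as runnable Python. The required-field lists and sample records are illustrative, not from a real org:

```python
# Assumed required fields per object (illustrative, mirroring the example above)
REQUIRED = {"Account": ["Name", "BillingStreet", "BillingCity"]}

def completeness(records, required_fields):
    """Percent of records where every required field is non-empty."""
    if not records:
        return 0.0
    complete = sum(
        1 for r in records
        if all(r.get(f) not in (None, "") for f in required_fields)
    )
    return round(100.0 * complete / len(records), 1)

accounts = [
    {"Name": "Acme", "BillingStreet": "1 Main St", "BillingCity": "Madison"},
    {"Name": "Globex", "BillingStreet": None, "BillingCity": "Austin"},
]
print(completeness(accounts, REQUIRED["Account"]))  # 50.0
```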
2. Uniqueness (Weight: 30%)
// Duplicate detection
duplicates = COUNT(records WHERE fuzzy_match(Name, Email, Phone) > 0.85)
total_records = COUNT(records)
uniqueness_score = ((total_records - duplicates) / total_records) × 100
// Example
Accounts: 2.4M total, 680K duplicates → 71.7% unique
Contacts: 8.1M total, 1.9M duplicates → 76.5% unique
Leads: 3.2M total, 420K duplicates → 86.9% unique
Overall uniqueness: (71.7 + 76.5 + 86.9) / 3 = 78.4%
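A minimal fuzzy-duplicate sketch in Python, using the standard library's `difflib` as a stand-in for whatever matcher your dedup tool uses (the 0.85 threshold comes from the pseudocode above; the records are made up):

```python
import difflib

def fuzzy_ratio(a, b):
    """Similarity in [0, 1] between two strings, case-insensitive."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()

def uniqueness(records, key_fields, threshold=0.85):
    # Naive O(n^2) pairwise scan: fine for a sketch, not for 8M Contacts —
    # production dedup needs blocking/indexing before fuzzy comparison.
    keys = [" ".join(str(r.get(f) or "") for f in key_fields) for r in records]
    duplicates = set()
    for i in range(len(keys)):
        for j in range(i + 1, len(keys)):
            if fuzzy_ratio(keys[i], keys[j]) > threshold:
                duplicates.add(j)  # the later record is treated as the duplicate
    return round(100.0 * (len(records) - len(duplicates)) / len(records), 1)

records = [
    {"Name": "Acme Corp", "Email": "info@acme.com"},
    {"Name": "ACME Corp.", "Email": "info@acme.com"},   # fuzzy duplicate of above
    {"Name": "Initech", "Email": "hello@initech.com"},
]
print(uniqueness(records, ["Name", "Email"]))  # 66.7
```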
3. Consistency (Weight: 20%)
// Format consistency
consistent_format = COUNT(records WHERE matches_pattern(field, expected_pattern))
total_records = COUNT(records)
consistency_score = (consistent_format / total_records) × 100
// Examples
Phone: (608) 555-1234 vs 608-555-1234 vs 6085551234
Email: tyler@example.com vs TYLER@EXAMPLE.COM vs tyler@example.COM
State: WI vs Wisconsin vs wi vs WISCONSIN
Phone consistency: 68% (32% formatting variations)
Email consistency: 91%
State consistency: 74%
Overall consistency: (68 + 91 + 74) / 3 = 77.7%
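Format consistency is just pattern matching against a canonical form. The canonical formats below are assumptions for illustration, not Salesforce defaults:

```python
import re

PATTERNS = {
    "Phone": re.compile(r"^\(\d{3}\) \d{3}-\d{4}$"),                  # (608) 555-1234
    "Email": re.compile(r"^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$"),  # lowercase only
    "State": re.compile(r"^[A-Z]{2}$"),                               # WI, TX, ...
}

def consistency(values, pattern):
    """Percent of values matching the canonical format."""
    hits = sum(1 for v in values if pattern.match(v or ""))
    return round(100.0 * hits / len(values), 1)

phones = ["(608) 555-1234", "608-555-1234", "6085551234", "(414) 555-9876"]
print(consistency(phones, PATTERNS["Phone"]))  # 50.0
```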
4. Referential Integrity (Weight: 15%)
// Relationship validity
valid_relationships = COUNT(records WHERE parent_exists AND child_exists)
total_relationships = COUNT(relationships)
integrity_score = (valid_relationships / total_relationships) × 100
// Example orphaned records
Opportunities without Account: 142,000 (11.8% orphaned)
Contacts without Account: 89,000 (1.1% orphaned)
Tasks without related WhatId/WhoId: 340,000 (15.2% orphaned)
Integrity score: 88.2%, driven by the 11.8% Opportunity orphan rate (the simple average orphan rate across the three objects is 9.4%)
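Checking referential integrity reduces to verifying that every child's foreign key points at a parent that still exists. A sketch with made-up IDs:

```python
def integrity(children, parent_ids, fk="AccountId"):
    """Percent of child records whose foreign key resolves to an existing parent."""
    valid = sum(1 for c in children if c.get(fk) in parent_ids)
    return round(100.0 * valid / len(children), 1)

account_ids = {"001A", "001B"}
opportunities = [
    {"Id": "006A", "AccountId": "001A"},  # valid parent
    {"Id": "006B", "AccountId": "001B"},  # valid parent
    {"Id": "006C", "AccountId": None},    # orphan: empty lookup
    {"Id": "006D", "AccountId": "001X"},  # orphan: parent no longer exists
]
print(integrity(opportunities, account_ids))  # 50.0
```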
5. Accuracy (Weight: 10%)
// Sample-based validation
sampled_records = RANDOM_SAMPLE(records, 500)
manually_verified = COUNT(sampled_records WHERE human_validates_correct)
accuracy_score = (manually_verified / 500) × 100
// Typical findings
Account Industry matches reality: 82%
Contact Title matches LinkedIn: 76%
Opportunity Amount matches closed deal: 91%
Overall accuracy: (82 + 76 + 91) / 3 = 83%
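The sampling step can be automated even though the verification itself is manual. A sketch where a callback stands in for the human reviewer (the stale-title check is an invented example):

```python
import random

def accuracy_sample(records, verify, sample_size=500, seed=42):
    """Draw a random sample and score it with a verification callback."""
    rng = random.Random(seed)  # fixed seed so the audit sample is reproducible
    sample = rng.sample(records, min(sample_size, len(records)))
    verified = sum(1 for r in sample if verify(r))
    return round(100.0 * verified / len(sample), 1)

# Stand-in for a human reviewer: flag titles recorded as unknown
records = [{"Title": "CTO"} if i % 2 == 0 else {"Title": "unknown"} for i in range(10)]
print(accuracy_sample(records, lambda r: r["Title"] != "unknown"))  # 50.0
```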
Overall Data Quality Score
DQ_Score = (Completeness × 0.25) +
(Uniqueness × 0.30) +
(Consistency × 0.20) +
(Integrity × 0.15) +
(Accuracy × 0.10)
// Example calculation
DQ_Score = (85.7 × 0.25) + (78.4 × 0.30) + (77.7 × 0.20) +
(88.2 × 0.15) + (83.0 × 0.10)
= 21.4 + 23.5 + 15.5 + 13.2 + 8.3
= 81.9
// Interpretation
90-100: Excellent (exit-ready)
75-89: Good (minor cleanup needed)
60-74: Fair (significant remediation required)
< 60: Poor (exit not recommended until fixed)
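The weighted roll-up and interpretation bands above can be expressed directly. Note the article's 81.9 sums the already-rounded per-dimension terms; computing from the unrounded products gives 82.0:

```python
WEIGHTS = {"completeness": 0.25, "uniqueness": 0.30, "consistency": 0.20,
           "integrity": 0.15, "accuracy": 0.10}

BANDS = [(90, "Excellent (exit-ready)"),
         (75, "Good (minor cleanup needed)"),
         (60, "Fair (significant remediation required)"),
         (0,  "Poor (exit not recommended until fixed)")]

def dq_score(dims):
    """Weighted overall score plus its interpretation band."""
    score = sum(dims[k] * w for k, w in WEIGHTS.items())
    band = next(label for floor, label in BANDS if score >= floor)
    return round(score, 1), band

print(dq_score({"completeness": 85.7, "uniqueness": 78.4, "consistency": 77.7,
                "integrity": 88.2, "accuracy": 83.0}))
# → (82.0, 'Good (minor cleanup needed)')
```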
Real Case Study: Financial Services Exit
Initial Assessment
- Records: 18M (Accounts 2.1M, Contacts 9.4M, Opps 6.5M)
- DQ Score: 63 (Fair)
- Blockers: 1.8M duplicate Contacts, 840K orphaned Opps, inconsistent Phone/Email formats
Remediation Plan (12 weeks, $420K)
Phase 1: Deduplication (4 weeks, $180K)
- Tool: Salesforce Data.com + custom Apex batch jobs
- Process: fuzzy match on Name + Email + Phone, human review queue for ambiguous matches
- Result: Merged 1.8M duplicates → 720K unique records, 1.08M deleted
Phase 2: Orphan Resolution (3 weeks, $90K)
- Opportunities without Account: 840K records
- Strategy: Match on Contact.Account → 680K resolved, 160K archived (historical, >5 years old)
- Contacts without Account: Create "Unknown Account" placeholder for 89K records, flag for ops team review
Phase 3: Data Enrichment (3 weeks, $80K)
- Missing Industry: ZoomInfo API enrichment for 420K Accounts
- Missing Phone/Email: Manual ops team outreach for top 5,000 strategic accounts
- Standardize formats: Phone → E.164, Email → lowercase, State → 2-letter code
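The Phase 3 standardization rules (Phone → E.164, Email → lowercase, State → 2-letter code) can be sketched as below. The state map is a tiny excerpt, and a real project should use a library like `phonenumbers` rather than this hand-rolled US-only parser:

```python
import re

STATE_CODES = {"wisconsin": "WI", "wi": "WI", "texas": "TX", "tx": "TX"}  # excerpt

def to_e164(raw, country_code="+1"):
    """US-only E.164 normalization sketch; returns None for unparseable input."""
    digits = re.sub(r"\D", "", raw or "")
    if len(digits) == 10:
        return country_code + digits
    if len(digits) == 11 and digits.startswith("1"):
        return "+" + digits
    return None  # flag for manual review

def standardize(record):
    record["Phone"] = to_e164(record.get("Phone"))
    if record.get("Email"):
        record["Email"] = record["Email"].strip().lower()
    state = (record.get("State") or "").strip().lower()
    record["State"] = STATE_CODES.get(state, record.get("State"))
    return record

print(standardize({"Phone": "(608) 555-1234",
                   "Email": "TYLER@EXAMPLE.COM",
                   "State": "Wisconsin"}))
# → {'Phone': '+16085551234', 'Email': 'tyler@example.com', 'State': 'WI'}
```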
Phase 4: Validation (2 weeks, $70K)
- Re-run DQ assessment
- Spot-check 1,000 random records
- UAT with sales team on cleaned data
Post-Remediation Results
- Records: 16.2M (10% reduction, driven by deduplication and archiving)
- DQ Score: 89 (Good, just under the Excellent threshold)
- Migration proceeded on schedule
- Zero data-related blockers during migration
Common Data Quality Issues by Object
Accounts
- Duplicates (company acquired, name changed, merger not reflected)
- Test accounts in production ("Test Co", "ACME", "Sample Account")
- Missing Industry/Type (sales can't route)
- Inconsistent naming (IBM vs. IBM Corporation vs. International Business Machines)
Contacts
- Duplicates (same person, multiple email addresses)
- Outdated (person left the company 3 years ago, record still marked active)
- Missing Account relationship (orphaned from deleted Account)
- Bad email formats (typos, invalid domains)
Opportunities
- Orphaned (Account deleted but Opp remains)
- Missing Amount (required for forecasting in new CRM)
- Stale (created 5 years ago, Stage still "Prospecting")
- No Contact Roles (can't identify buyer committee)
Automated Remediation Patterns
Deduplication Batch Job
global class DeduplicateAccountsBatch implements Database.Batchable<sObject> {
    global Database.QueryLocator start(Database.BatchableContext bc) {
        return Database.getQueryLocator([
            SELECT Id, Name, BillingStreet, BillingCity, Phone, Website
            FROM Account
            WHERE IsDeleted = false
            ORDER BY CreatedDate ASC
        ]);
    }
    global void execute(Database.BatchableContext bc, List<Account> scope) {
        Map<String, Account> uniqueAccounts = new Map<String, Account>();
        for (Account acc : scope) {
            String fuzzyKey = generateFuzzyKey(acc);
            if (uniqueAccounts.containsKey(fuzzyKey)) {
                // Duplicate found: merge into the earliest-created survivor
                Database.merge(uniqueAccounts.get(fuzzyKey), acc, false);
            } else {
                uniqueAccounts.put(fuzzyKey, acc);
            }
        }
        // Note: this only catches duplicates within a single batch scope;
        // cross-batch dedup needs a precomputed key map or Duplicate Rules.
    }
    global void finish(Database.BatchableContext bc) {
        // Required by the interface; send a summary email or chain the next job here
    }
    private String generateFuzzyKey(Account acc) {
        // Null-safe: blank out missing fields so key generation never throws
        String raw = (acc.Name == null ? '' : acc.Name)
            + (acc.BillingStreet == null ? '' : acc.BillingStreet)
            + (acc.Phone == null ? '' : acc.Phone);
        return raw.toLowerCase().replaceAll('[^a-z0-9]', '');
    }
}
Orphan Detection Query
// Find Opportunities with an empty Account lookup
SELECT Id, Name, AccountId, Amount, CloseDate
FROM Opportunity
WHERE AccountId = null
ORDER BY Amount DESC NULLS LAST
// Find Opportunities whose Account lookup points at a missing record
// (run as a separate query: SOQL rejects an anti-join inside an OR clause)
SELECT Id, Name, AccountId, Amount, CloseDate
FROM Opportunity
WHERE AccountId NOT IN (SELECT Id FROM Account)
ORDER BY Amount DESC NULLS LAST
// Find Contacts without an Account, using the same two-query pattern
SELECT Id, Name, Email, AccountId
FROM Contact
WHERE AccountId = null
ORDER BY CreatedDate DESC
SELECT Id, Name, Email, AccountId
FROM Contact
WHERE AccountId NOT IN (SELECT Id FROM Account)
ORDER BY CreatedDate DESC
Data Quality SLAs for Exit Readiness
| Dimension | Minimum | Target |
|---|---|---|
| Completeness | 85% | 95% |
| Uniqueness | 90% | 98% |
| Consistency | 80% | 95% |
| Integrity | 95% | 99% |
| Accuracy | 80% | 90% |
| Overall DQ Score | 85 | 92 |
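The SLA table above translates into a simple readiness gate. A sketch that reports which dimensions still miss their minimums, using the example scores from earlier in the article:

```python
SLA_MINIMUM = {"completeness": 85, "uniqueness": 90, "consistency": 80,
               "integrity": 95, "accuracy": 80}

def sla_gaps(scores, sla=SLA_MINIMUM):
    """Dimensions below their exit-readiness minimum, as {dim: (actual, required)}."""
    return {dim: (scores[dim], floor)
            for dim, floor in sla.items() if scores[dim] < floor}

gaps = sla_gaps({"completeness": 85.7, "uniqueness": 78.4, "consistency": 77.7,
                 "integrity": 88.2, "accuracy": 83.0})
print(gaps)  # uniqueness, consistency, and integrity miss their minimums
```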
Remediation Cost Estimate Formula
// Cost to remediate data quality issues
deduplication_cost = duplicate_count × $0.50 // $0.50 per duplicate merged
orphan_resolution = orphan_count × $0.30 // $0.30 per orphan resolved
enrichment_cost = missing_field_count × $0.20 // $0.20 per field enriched
validation_cost = total_records × $0.05 // $0.05 per record validated
total_remediation_cost = deduplication_cost + orphan_resolution +
enrichment_cost + validation_cost
// Example: 12M records, DQ Score 63
duplicates: 1.8M × $0.50 = $900K
orphans: 840K × $0.30 = $252K
missing fields: 2.1M × $0.20 = $420K
validation: 12M × $0.05 = $600K
Total: $2.17M remediation cost
// Rule of thumb: remediation cost = 15-25% of migration cost for DQ < 75
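The cost formula is a straight linear model over the per-record rates, so it is trivial to script. This reproduces the worked example (the rates are the article's, not market quotes):

```python
RATES = {  # per-record unit costs from the formula above
    "duplicate": 0.50, "orphan": 0.30, "missing_field": 0.20, "validation": 0.05,
}

def remediation_cost(duplicates, orphans, missing_fields, total_records, rates=RATES):
    """Total remediation cost in dollars."""
    return (duplicates * rates["duplicate"]
            + orphans * rates["orphan"]
            + missing_fields * rates["missing_field"]
            + total_records * rates["validation"])

cost = remediation_cost(1_800_000, 840_000, 2_100_000, 12_000_000)
print(f"${cost / 1e6:.2f}M")  # $2.17M, matching the worked example
```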
Exit-Ready Data Quality Checklist
- Run DQ assessment (use our tool or build your own)
- If DQ Score < 85, plan remediation before exit planning
- Deduplicate Accounts, Contacts, Leads (highest ROI cleanup)
- Resolve orphaned records (delete or re-parent)
- Enrich missing required fields (Industry, Type, Source)
- Standardize formats (Phone E.164, Email lowercase, State 2-char)
- Archive stale data (>3 years old, no activity)
- Validate with UAT (sales team spot-checks 500 records)
- Re-assess DQ score (target 85+)
- Lock data quality (no bulk imports until migration complete)
Red Flags That Block Exits
- Duplicate rate > 20%: Migration cost doubles, dedup required first
- Orphan rate > 10%: Business process broken, fix before migrating
- Missing External IDs: Can't do upserts, only inserts = duplicate risk in new system
- Inconsistent naming: Matching logic fails, creates duplicates post-migration
- No data steward: Who approves merge decisions? Who validates enrichment? Exit stalls.
Need a Data Quality Assessment?
Our Exit Assessment includes comprehensive data quality scoring across all objects, duplicate analysis, orphan detection, remediation cost estimates, and a prioritized action plan.