Data Migration at Scale: Processes for the Professional Data Migrator
Dedicated hardware, encrypted custody chains, forensic-grade deletion, and the operational discipline that separates a professional migration from a CSV and a prayer.
The Bar Is Low. Raise It.
Here's a scenario that plays out at consulting firms every week: a junior developer exports 4 million Contact records to their laptop, transforms them in a Python script, loads them into the target system, and deletes the CSV from their Downloads folder. The migration "succeeds." The client signs off. Everyone moves on.
Nobody asks where the data lived during transformation. Nobody checks whether the laptop's hard drive was encrypted. Nobody documents which machine held 4 million people's personal information for three weeks. And nobody performs a secure deletion when the project ends—because "I deleted the file" feels like enough.
It isn't enough. Not for HIPAA-covered entities. Not for organizations subject to FERPA, PCI DSS, or state privacy laws. Not even for organizations with no regulatory obligations, because data stewardship isn't just a compliance exercise. It's a professional standard.
This post lays out the complete framework we use for enterprise data migrations. Not because our process is the only valid one, but because almost nobody in this industry publishes theirs. We think that needs to change.
The gap between what's possible and what's practiced is where professional standards are built. Dedicated storage, encrypted custody chains, forensic-grade deletion—none of it is technically difficult. All of it is operationally rare.
Start With the Assessment, Not the Migration
The most expensive mistake in data migration is starting the migration. Specifically, starting it before you understand what you're migrating, how complex the source system is, and whether the migration even makes sense.
We've seen organizations commit to 6-month timelines and $150K budgets based on a 30-minute sales call. Three months in, they discover 500 GB of attachments spread across three storage mechanisms, 14 integrations nobody documented, and field history tracking data that compliance requires them to preserve. The budget doubles. The timeline triples. The relationship sours.
A proper exit assessment takes 2-3 weeks and answers the questions that matter before money gets spent:
- Data volume and complexity. Total records, object count, custom objects, attachment volume. Not estimates—exact numbers pulled from the org's metadata API.
- Schema analysis. Relationship graphs, lookup chain depths, junction objects, polymorphic references. A typical enterprise Salesforce org has 200+ relationships. Every one needs to be mapped and extracted in the correct order.
- Integration dependencies. Connected apps, middleware, external IDs, API consumers, scheduled jobs. Everything that touches the org needs to be documented before anything moves.
- Automation inventory. Flows, Process Builders, Apex triggers, validation rules. Each one represents business logic that lives inside the platform and would need to be rebuilt elsewhere.
- Compliance requirements. What data is subject to regulation? What history must be preserved? What are the retention obligations? These constraints shape the entire migration architecture.
- Cost comparison. Current spend vs. projected destination costs vs. migration investment. Three-year and five-year projections with break-even analysis. Numbers, not narratives.
The output is a report with a go/no-go recommendation. We've told organizations not to leave Salesforce. If the platform works for you and the cost structure makes sense, the honest answer is "stay." An assessment that always recommends migration isn't an assessment—it's a sales tool.
Dedicated Hardware: One Client, One Drive
This is the foundation of our data security model, and it's the thing that surprises people most when we describe our process: every client engagement gets a dedicated, encrypted storage device. Not a partition. Not a folder. Not a cloud bucket with access controls. A physical drive that only touches one client's data.
Why Physical Isolation Matters
Logical isolation (separate directories, separate cloud accounts) is theoretically sufficient. Access controls work. Encryption works. But logical isolation fails in ways that physical isolation cannot:
- Misconfigured permissions can expose data across boundaries. A single IAM policy error in a cloud environment can make one client's data visible to another client's migration pipeline. This has happened at major consulting firms. It will happen again.
- Shared storage means shared risk. If a multi-tenant storage system is compromised, every client on that system is affected. A dedicated drive limits blast radius to exactly one engagement.
- Disposal is unambiguous. When the engagement ends, you securely erase one drive. There's no question about whether all copies were found, whether a backup job cached data somewhere, or whether a temporary file persisted in an unexpected location.
The drives themselves matter too. We use professional-grade NVMe SSDs rated for sustained workloads—not consumer drives that throttle under load. When you're transforming 4 TB of data, the difference between a drive that sustains 3,000 MB/s writes and one that throttles to 300 MB/s after 30 seconds is the difference between a 3-hour operation and a 30-hour one.
Workspace Provisioning
We provision at least 2x the expected data volume. Migrating 4 TB? The drive is 8 TB. This isn't extravagance—it's operational necessity.
During a migration, you need space for source extracts, working copies, transformation outputs, staging data, validation dumps, and rollback snapshots. If you're running tight on space, you start making bad decisions: skipping validation steps, compressing data to fit, deleting intermediate outputs you might need later. Ample workspace eliminates that entire category of pressure.
Encryption at Every Stage
The drive is encrypted from the moment it's provisioned. AES-256, hardware-level encryption. Data is encrypted at rest, and it's encrypted during every transfer—extraction from source, transformation operations, and loading to the target.
There are no unencrypted copies. The staging environment is encrypted. The working directory is encrypted. If we create a validation dump to compare source and target record counts, that dump lives on the same encrypted media. There is no point in the migration lifecycle where client data exists in plaintext on unencrypted storage.
This sounds obvious when stated explicitly. But consider how most migrations actually work: data exported to CSV on a laptop. CSVs emailed between team members. Working copies on shared network drives. Transformation outputs on a developer's local machine. None of these are encrypted by default on most corporate hardware. The data is exposed at every stage, on every device it touches.
Chain of Custody
We track every interaction with client data from intake to deletion. The documentation starts when the drive is provisioned:
- Drive serial number recorded at provisioning
- Encryption method and key management documented
- Access restricted to assigned migration engineers
- All operations logged with timestamps
- Physical custody maintained—no cloud staging unless explicitly agreed in the engagement contract
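One way to make that operations log tamper-evident is to hash-chain the entries, so editing any record after the fact invalidates everything that follows. A minimal sketch of the idea; the serial number and actor names are placeholders:

```python
import hashlib
import json
from datetime import datetime, timezone

def log_event(log: list[dict], actor: str, action: str) -> list[dict]:
    """Append a custody event whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "actor": actor,
        "action": action,
        "prev": prev_hash,
    }
    payload = json.dumps(entry, sort_keys=True).encode()
    entry["hash"] = hashlib.sha256(payload).hexdigest()
    log.append(entry)
    return log

def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash; a single edited entry breaks the chain."""
    prev = "0" * 64
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["prev"] != prev:
            return False
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if digest != entry["hash"]:
            return False
        prev = entry["hash"]
    return True

log: list[dict] = []
log_event(log, "engineer-1", "drive SN-XXXX provisioned, AES-256 enabled")
log_event(log, "engineer-1", "extraction started: Account, Contact")
assert verify_chain(log)
```

This doesn't replace physical custody; it just makes "who did what, when" a verifiable record instead of a recollection.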
Chain of custody isn't just about security theater. It's about being able to answer specific questions after the fact: Who had access to the data? When? What did they do with it? Where was the data physically located on a given date?
If your client is a healthcare organization subject to HIPAA, or a university subject to FERPA, or a financial institution subject to PCI DSS, these questions aren't hypothetical. They're audit questions. And "I think the data was on Dave's laptop, but Dave left the company" is not an acceptable answer.
The Migration Lifecycle
With the infrastructure in place—dedicated hardware, encryption, documented custody—the actual migration follows a six-stage lifecycle. Each stage has defined inputs, outputs, and verification criteria.
Stage 1: Provision
A new encrypted drive is provisioned and labeled for the engagement. The drive serial number, capacity, encryption method, and assigned engineers are documented in the project record. This happens before any data touches our systems.
Stage 2: Extract
Data is pulled from the source system directly to the dedicated drive. We use a combination of Bulk API 2.0, REST API, and custom extraction tooling depending on the data type and volume.
The extraction order matters. Parent objects must be extracted before child objects to preserve referential integrity. In a typical Salesforce org, this means Accounts before Contacts, Contacts before Activities, Opportunities before OpportunityLineItems, and so on through the entire relationship graph.
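Dependency ordering is a topological sort over the lookup graph. A sketch using Python's standard library, with a toy four-object slice standing in for a real org's relationship graph:

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Child object -> the parent objects it looks up to (illustrative slice).
lookups = {
    "Contact": {"Account"},
    "Opportunity": {"Account"},
    "OpportunityLineItem": {"Opportunity", "Product2"},
    "Task": {"Contact", "Opportunity"},
}

# Parents come out before children, so every lookup a record carries
# can be resolved by the time its object is extracted.
extraction_order = list(TopologicalSorter(lookups).static_order())
print(extraction_order)
```

For a real org the graph comes from the metadata describe calls, not a hand-written dict, and cyclic lookups (which Salesforce permits) need to be detected and deferred rather than sorted.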
For a 10-million-record org with substantial file attachments, extraction alone can take days. Salesforce API limits apply—Bulk API 2.0 supports up to 100 million records per 24-hour period, but file retrieval (ContentVersion records) is significantly slower and subject to different limits.
No intermediate shared storage is used. Data moves from the source API directly to the encrypted drive. There are no cloud staging buckets, no S3 intermediate steps, no temporary copies on shared infrastructure.
Stage 3: Transform
This is where source data is reshaped for the target system. Field mappings, data type conversions, value translations, concatenations, conditional logic, and data cleansing all happen on-drive.
The transformation layer handles the complexity that makes migrations hard:
- ID remapping. Salesforce IDs are org-specific. Every lookup relationship needs to be resolved against the new IDs in the target system. For an org with 200+ relationships, this is a graph problem, not a table problem.
- Picklist translation. Source values ("Active," "Closed Won," "Board Member") may need to map to different values in the target system. Every picklist field is a decision point.
- Data cleansing. Duplicate detection, phone number normalization, email validation, address standardization. Source data is always messier than anyone expects.
- Structural transformation. The source and target schemas rarely align 1:1. Objects get split, combined, or restructured. Fields change types. Relationships change cardinality.
All working copies and transformation outputs stay on the same encrypted media. There's no point where partially transformed data leaks to another device or storage system.
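In code, the ID remapping and picklist translation steps reduce to lookups against mapping tables built during the project, with a hard failure when a mapping is missing. A sketch with illustrative IDs and values:

```python
# Built as parents load: source Salesforce ID -> new target-system ID.
id_map = {
    "001A000001xQ9zZ": "acct-8841",   # an Account (illustrative IDs)
    "003A000002bR7yY": "cont-1207",   # a Contact
}

# Per-field picklist translations decided during mapping workshops.
picklist_map = {
    "StageName": {"Closed Won": "won", "Closed Lost": "lost"},
}

def transform_record(record: dict, lookup_fields: set[str]) -> dict:
    """Resolve every lookup against the new IDs and translate picklists."""
    out = {}
    for field, value in record.items():
        if field in lookup_fields:
            # Fail loudly: a lookup with no mapping means its parent never
            # loaded, and silently dropping it destroys meaning.
            out[field] = id_map[value]
        elif field in picklist_map:
            out[field] = picklist_map[field][value]
        else:
            out[field] = value
    return out

opp = {"Name": "Renewal FY26", "AccountId": "001A000001xQ9zZ",
       "StageName": "Closed Won"}
print(transform_record(opp, lookup_fields={"AccountId"}))
```

The `KeyError` on a missing mapping is deliberate: it surfaces the ordering or mapping bug at transform time instead of at validation time, three stages later.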
Stage 4: Load
Validated data is loaded to the target system. We use batch processing with configurable batch sizes (typically 200 records per batch for Salesforce targets), checkpointing after each batch, and automatic retry on transient failures.
Checkpointing is critical. If a load operation fails at batch 4,500 of 10,000, we need to resume at batch 4,501—not restart from the beginning. Checkpoint state is persisted to the encrypted drive, enabling pause, resume, and recovery at any point.
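A checkpointed loader is small enough to sketch. Here `send_batch` is a placeholder for whatever pushes one batch to the target API, and the checkpoint file lives on the encrypted drive like everything else:

```python
import json
from pathlib import Path

def load_with_checkpoints(records: list[dict], batch_size: int,
                          checkpoint: Path, send_batch) -> int:
    """Load in batches, resuming from the last completed batch on re-run."""
    done = 0
    if checkpoint.exists():
        done = json.loads(checkpoint.read_text())["batches_done"]
    batches = [records[i:i + batch_size]
               for i in range(0, len(records), batch_size)]
    for n, batch in enumerate(batches):
        if n < done:
            continue  # already loaded on a previous run
        send_batch(batch)  # may raise on transient failure; state is safe
        # Persist progress only after the batch succeeds.
        checkpoint.write_text(json.dumps({"batches_done": n + 1}))
    return len(batches)
```

If `send_batch` raises at batch 4,500, the checkpoint still reads 4,500 completed batches, and the next run skips straight past them. Real retry logic would wrap the call with backoff, but the resume guarantee comes from writing the checkpoint only after success.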
Record counts and referential integrity are verified before and after loading. If 4,247,893 Contact records were extracted, exactly 4,247,893 Contact records need to land in the target system. If a lookup relationship pointed to Account record A-001 in the source, it needs to point to the corresponding Account in the target. Every relationship is validated.
Stage 5: Verify
Post-migration validation confirms data landed correctly. This goes beyond record counts:
- Record count reconciliation by object type
- Referential integrity checks across all relationships
- Sample data comparison—field-by-field verification of randomly selected records
- Attachment verification—file counts, sizes, and parent record associations
- Business rule validation—do the numbers add up? Do the reports produce the same results?
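The first two checks reduce to straightforward comparisons. A sketch with illustrative counts:

```python
def reconcile(source: dict[str, int], target: dict[str, int]) -> list[str]:
    """Compare per-object record counts; any delta is a finding."""
    findings = []
    for obj, n_src in source.items():
        n_tgt = target.get(obj, 0)
        if n_src != n_tgt:
            findings.append(f"{obj}: source={n_src} target={n_tgt} delta={n_src - n_tgt}")
    return findings

def orphaned_lookups(records: list[dict], field: str,
                     parent_ids: set[str]) -> list[dict]:
    """Records whose lookup points at a parent that never landed."""
    return [r for r in records if r.get(field) and r[field] not in parent_ids]

# Illustrative counts only.
src = {"Account": 812_440, "Contact": 4_247_893}
tgt = {"Account": 812_440, "Contact": 4_247_801}
print(reconcile(src, tgt))  # Contact is short 92 records: investigate before sign-off
```

An empty findings list is a precondition for the sign-off gate below, not proof of success on its own; sample comparison and business rule checks still have to pass.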
The client signs off on validation before we proceed to deletion. This is a hard gate. No sign-off, no deletion. If something looks wrong, we investigate and fix it before moving forward.
Stage 6: Delete
After the client signs off, all data on the dedicated drive is securely erased. Not "deleted." Not "formatted." Securely erased following NIST 800-88 guidelines.
For SSDs, this means cryptographic erase (CE) or block erase commands that render data unrecoverable. The distinction matters: a standard file deletion marks disk space as available but doesn't overwrite the data. A quick format rewrites the file system table but leaves the data intact. Either one can be reversed with forensic tools. Cryptographic erase destroys the encryption key, rendering the ciphertext permanently unreadable.
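The logic of cryptographic erase is easy to demonstrate. The toy cipher below is a one-time pad standing in for the drive's AES-256 hardware encryption; the point is the same either way: destroy the key, and the ciphertext that remains on the media is unrecoverable noise.

```python
import secrets

def encrypt(plaintext: bytes, key: bytes) -> bytes:
    # XOR one-time pad: a stand-in for the drive's hardware AES-256.
    return bytes(p ^ k for p, k in zip(plaintext, key))

data = b"4,247,893 contact records"
key = secrets.token_bytes(len(data))   # the drive's media encryption key
stored = encrypt(data, key)            # what physically sits on the NAND

# Cryptographic erase: destroy the key, not the data.
key = None

# The ciphertext is still physically present, but without the key it is
# indistinguishable from random bytes. There is nothing left to recover.
assert stored != data
```

On real drives the key never leaves the controller; the erase command instructs the controller to discard it, which is why cryptographic erase completes in seconds regardless of capacity.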
The client receives a deletion certificate documenting exactly what was erased and how: the drive serial number, the erasure method, and the date the erasure was performed.
After deletion, client data exists in exactly one place: the client's systems. We don't keep copies. We don't archive "just in case." We don't retain data for future reference. The engagement is over. The data is gone.
What About Backups?
We don't back up client data to secondary systems. The dedicated drive is the single location for client data during the engagement.
This is a deliberate choice, not an oversight. Every copy of data is a liability. Every backup location is another system that needs to be tracked, secured, and eventually wiped. The more copies exist, the harder it is to guarantee that deletion is complete.
If drive failure is a concern (and it's a legitimate concern for multi-month engagements with terabytes of data), we can provision mirrored drives as part of the engagement. Both drives are tracked in the chain of custody documentation. Both are securely erased at completion. The client knows exactly how many copies of their data exist and exactly where they are.
For most engagements, mirroring isn't necessary. Source data can always be re-extracted from the source system. The transformation logic and mapping configurations are project documentation—intellectual property that persists as delivery artifacts. The actual data is always re-pullable.
The Cloud Question
We don't use cloud storage by default. All data stays on local encrypted hardware under physical custody.
There are legitimate reasons to use cloud staging: geographically distributed source and target systems, team members in multiple locations, or integration architectures that require cloud intermediaries. If cloud staging is required, it's discussed explicitly, documented in the engagement agreement, and subject to the same security standards—encryption, access control, chain of custody, and secure deletion.
But the default is local. Physical custody is simpler to reason about, simpler to audit, and simpler to explain to compliance officers. "The data was on an encrypted drive in my office" is a better answer than "the data was in an S3 bucket in us-east-1 with an IAM policy that we're pretty sure was configured correctly."
Why Nobody Talks About This
Most migration vendors don't publish their data handling processes. There are a few reasons for this.
They haven't formalized one. Many consulting firms handle data migrations as ad-hoc projects. The approach varies by team, by project, by whoever happens to be assigned. There's no documented process to publish because there's no documented process.
They're afraid of liability. Publishing a specific process creates a standard they can be held to. If they claim NIST 800-88 compliance and then skip the secure erase step on a busy Friday, that's a documented failure. It's safer (for the vendor) to say nothing.
They don't think clients care. And historically, clients haven't asked. The conversation focuses on timelines, costs, and feature parity. Data security during the migration is an afterthought—until it isn't. Until the compliance audit. Until the data breach. Until the regulatory inquiry.
We think transparency is the right answer. Clients should know exactly how their data is handled. They should be able to evaluate our process against their security requirements before the engagement starts. And they should receive documented proof that the process was followed when the engagement ends.
The Regulatory Reality
If your data is subject to HIPAA, FERPA, PCI DSS, SOX, GDPR, CCPA, or any state-level privacy regulation, your vendors' data handling is your liability. You can't outsource compliance. When a migration vendor mishandles your data, it's your organization that faces the regulatory consequences.
This means you need to ask your migration vendor six uncomfortable questions:
- Where will our data be stored during the migration?
- Is the storage encrypted? What algorithm? Hardware or software encryption?
- Who will have access to our data? How is access controlled?
- How is data deleted after the engagement? Can you demonstrate NIST 800-88 compliance?
- Will you provide documented proof of deletion?
- Will you sign a Data Processing Agreement (DPA) or Business Associate Agreement (BAA)?
If your vendor can't answer these questions with specifics—serial numbers, encryption algorithms, deletion methods—that tells you everything you need to know about their process.
We sign DPAs and BAAs as a standard part of engagement. Mutual NDA before any data access. These aren't optional add-ons—they're baseline professional practice.
Common Failures in Professional Migration
After 20 years of data migrations, patterns emerge. Here are the failures we see most often.
Starting Without an Assessment
The most expensive failure. Organizations commit to timelines and budgets before understanding complexity. The $50K estimate becomes $200K. The 3-month timeline becomes 14 months. The root cause is always the same: nobody took 2-3 weeks to understand what they were dealing with before committing resources.
Ignoring Attachments
File migration is consistently underscoped. Organizations plan meticulously for record migration and then discover at load time that they have 400 GB of attachments spread across three storage mechanisms (Attachment objects, ContentDocument/ContentVersion, and Chatter feed attachments). File extraction is slow, storage-intensive, and needs to happen early in the project—not as an afterthought.
Losing Referential Integrity
This is the most common technical failure. Records load successfully, but the relationships between them are broken. An Opportunity that referenced Account A-001 in Salesforce now points to nothing in the target system because Account IDs weren't remapped correctly. The data is there. The meaning is gone.
Preventing this requires dependency-ordered extraction and systematic ID remapping across the entire relationship graph. It's a graph problem. Tools that treat it as a table problem will fail on any non-trivial org.
Skipping the Parallel Run
Cutting over in a single weekend feels efficient. It's reckless. A parallel run—where both source and target systems operate simultaneously for 2-4 weeks—gives users time to verify data, lets integrations be tested under real conditions, and keeps rollback simple. The cost of running two systems for a few weeks is trivial compared to the cost of discovering on Monday morning that half your data is wrong.
No Secure Deletion
The project ends. Everyone celebrates. The consultant's laptop still has 4 million Contact records in a CSV on the desktop. The cloud staging bucket is still active. The shared drive still has transformation outputs from three months ago. Nobody goes back and cleans up because the project is "done."
This is where most of the industry is today. It's not malicious—it's negligent. And negligence is cold comfort during a data breach investigation.
Building the Discipline
Professional data migration isn't about having the best tools, though tools matter. It's about operational discipline: doing the boring things consistently, documenting what you did, and cleaning up after yourself.
Provision a dedicated drive. Document the serial number. Encrypt everything. Log your access. Verify your record counts. Validate your relationships. Get the client's sign-off. Erase the drive. Provide the certificate.
None of this is technically difficult. All of it is operationally rare. That gap—between what's possible and what's practiced—is where professional standards are built.
We've published our complete data security process because we believe the industry needs a visible standard. If your migration vendor does something different, that's fine—but they should be able to tell you what it is. If they can't, that silence speaks louder than any security certification.
Getting Started
If you're planning a data migration—Salesforce exit or otherwise—start with the assessment. Understand what you have before you decide how to move it. The complete exit guide covers the full decision framework, from evaluating whether exit makes sense to choosing extraction methods to avoiding the pitfalls that trip up most organizations.
If you're a practitioner looking to raise your own standards, the framework described in this post is a starting point. Dedicated storage, encryption at rest, chain of custody documentation, and secure deletion are baseline practices that any migration professional can adopt. They don't require specialized tools. They require discipline.
Your clients' data is their most valuable asset. Handle it like it matters—because it does.
— Tyler Colby, February 12, 2026
Planning a Data Migration?
Start with an assessment. We'll give you the full picture—data volume, complexity, compliance requirements, realistic timelines—before you commit to anything.