Building Enterprise-Grade Multi-Org Sync: Architecture Decisions That Matter

A deep dive into the patterns, trade-offs, and lessons learned building a production-ready multi-org Salesforce synchronization platform for Financial Services Cloud.

When a financial services enterprise operates multiple Salesforce orgs across divisions, the synchronization problem isn't technical—it's architectural. The question isn't "how do we move data?" It's "how do we move data without creating a maintenance nightmare, compliance gaps, or operational chaos?"

Over the past year, we've been building a multi-org synchronization platform for enterprise Financial Services Cloud deployments. This isn't a theoretical exercise—it's running in production, handling real client data across multiple divisions with different regulatory requirements, different business processes, and different ideas about what "the right data" looks like.

This post covers the architecture decisions that actually matter when you're building this kind of system. Not the decisions that look good on whiteboards, but the ones that determine whether your operations team can sleep at night.

The Problem Space: Why Multi-Org Sync Is Harder Than It Looks

Consider a typical enterprise financial services organization. They have:

  • A Wealth Management division with its own Salesforce org, customized for high-net-worth client relationships
  • A Mortgage Services division running a separate org with loan origination workflows
  • A Commercial Banking division with yet another org handling business accounts
  • A Hub org that needs the unified view for enterprise reporting and cross-sell

Each division has autonomy. They've customized their orgs. They have different picklist values, different validation rules, different ideas about required fields. And yet, when a client walks into a branch, the advisor needs to see the complete picture—across all divisions.

Architect's Note

The temptation here is to consolidate into a single org. Resist it. Division autonomy exists for good reasons—regulatory, operational, and organizational. Your sync architecture needs to respect that autonomy while still providing the unified view.

The challenges compound quickly:

  • Conflict resolution: When the same client's phone number is different in two orgs, which one wins?
  • Audit trails: Regulators want to know who changed what, when, and why—across all orgs
  • Error recovery: When a sync fails halfway through, how do you recover without duplicating data?
  • Governor limits: Salesforce doesn't care about your sync complexity; you get the same limits as everyone else
  • Operational visibility: When something goes wrong at 2 AM, how fast can your on-call team understand the problem?

Architecture Decision: Hub-and-Spoke with External Orchestration

After evaluating several patterns, we settled on a hub-and-spoke model with external orchestration. Here's the topology:

Multi-Org Architecture
                    ┌─────────────────┐
                    │   Slack Bolt    │  Control Plane
                    │   Application   │  (Socket Mode)
                    └────────┬────────┘
                             │ REST API / JWT
                    ┌────────▼────────┐
                    │    HUB ORG      │  Golden Record
                    │ (FSC Enterprise)│  + Sync Metadata
                    └────────┬────────┘
           ┌─────────────────┼─────────────────┐
           │                 │                 │
    ┌──────▼──────┐   ┌──────▼──────┐   ┌──────▼──────┐
    │   WEALTH    │   │  MORTGAGE   │   │ COMMERCIAL  │
    │ MANAGEMENT  │   │  SERVICES   │   │  BANKING    │
    └─────────────┘   └─────────────┘   └─────────────┘

Why Hub-and-Spoke?

Three alternatives were considered and rejected:

Pattern        Complexity   Why Rejected
Peer-to-Peer   O(n²)        No authoritative source; conflicts multiply
Single Org     O(1)         Violates division autonomy requirements
Data Cloud     O(n)         Analytics layer, not operational sync

Hub-and-spoke provides O(n) complexity with a single source of truth. When conflicts arise, the Hub is the reconciliation authority. Division orgs maintain their autonomy—they can have their own picklist values, their own validation rules, their own processes—but they participate in the sync with clear rules about what data flows where.

Why External Orchestration?

Running sync logic inside Salesforce seems natural, but it creates three problems:

  1. Governor limit consumption in your Hub: Sync operations are expensive. Running them in the Hub means your Hub's limits are consumed by infrastructure, not business logic.
  2. Limited observability: Debug logs, system monitoring, and alerting are constrained inside Salesforce. External orchestration lets you use modern observability tools.
  3. No cross-org transaction coordination: Salesforce transactions are single-org. External orchestration can coordinate the multi-org dance.

Architect's Note

This is a trade-off, not a silver bullet. External orchestration means you need to manage infrastructure, handle authentication carefully, and deal with network partitions. For organizations with strong Salesforce-only policies, Platform Events with Queueable chains is the next-best option.

Implementation Pattern: Hexagonal Architecture

The control plane follows hexagonal architecture (ports and adapters). This sounds academic, but it's deeply practical: it lets us swap data sources without touching business logic.

src/data/client.interface.ts
interface DataClient {
  // Jobs
  createJob(input: CreateJobInput): Promise<SyncJob>;
  getJob(id: string): Promise<SyncJob | null>;
  updateJobProgress(id: string, progress: JobProgress): Promise<void>;

  // Conflicts
  getPendingConflicts(limit?: number): Promise<SyncConflict[]>;
  resolveConflict(id: string, resolution: Resolution): Promise<SyncConflict>;

  // Connections
  getConnections(): Promise<OrgConnection[]>;
  testConnection(id: string): Promise<ConnectionTestResult>;

  // Audit
  createAuditLog(entry: AuditLogEntry): Promise<void>;
  queryAuditLogs(filters: AuditFilters): Promise<AuditLog[]>;
}

The interface is the port. We have two adapters:

  • MockClient: In-memory implementation for development, demos, and testing
  • SalesforceClient: JSForce-based implementation for production

A factory selects the adapter based on environment:

src/data/index.ts
const dataSource = process.env.DATA_SOURCE || 'mock';

export const dataClient: DataClient =
  dataSource === 'salesforce'
    ? new SalesforceClient(connectionManager)
    : new MockClient();

This pattern paid dividends immediately. We could demo the entire platform to stakeholders before the Salesforce integration was complete. More importantly, we can run the full test suite without Salesforce connectivity—1,773 tests execute in seconds, not minutes.
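To make the payoff concrete, here is a hypothetical sketch of what that testability looks like in practice: business logic is exercised against an in-memory adapter with no Salesforce connectivity. The trimmed-down interface, MockClient internals, and startAccountSync helper below are illustrative, not the production code.

```typescript
// Trimmed port: just enough surface for the example.
interface SyncJob { id: string; status: 'queued' | 'running' | 'completed'; }

interface DataClient {
  createJob(objectName: string): Promise<SyncJob>;
  getJob(id: string): Promise<SyncJob | null>;
}

// In-memory adapter: fast, deterministic, no network.
class MockClient implements DataClient {
  private jobs = new Map<string, SyncJob>();
  private seq = 0;

  async createJob(_objectName: string): Promise<SyncJob> {
    const job: SyncJob = { id: `job-${++this.seq}`, status: 'queued' };
    this.jobs.set(job.id, job);
    return job;
  }

  async getJob(id: string): Promise<SyncJob | null> {
    return this.jobs.get(id) ?? null;
  }
}

// Business logic depends only on the port, never on a concrete adapter,
// so the same function runs unchanged against the Salesforce adapter.
async function startAccountSync(client: DataClient): Promise<string> {
  const { id } = await client.createJob('Account');
  return id;
}
```

Because nothing above touches the network, a full suite of tests like this runs in milliseconds.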

Resilience Pattern: Circuit Breakers

Multi-org sync is particularly vulnerable to cascade failures. If one division org goes down for maintenance, naively retrying will:

  1. Consume API limits on the healthy orgs
  2. Fill up error queues with predictable failures
  3. Obscure real problems behind a wall of "Connection refused" noise

We implement a per-connection circuit breaker:

Circuit Breaker State Machine
CLOSED ──[5 consecutive failures]──> OPEN
   ↑                                   │
   │                              [60s timeout]
   │                                   │
   └────────[success]────────── HALF_OPEN

The circuit breaker tracks failures per org connection. When an org hits 5 consecutive failures, its circuit opens. No requests flow to that org for 60 seconds. After the timeout, a single test request determines if the circuit closes or reopens.
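A minimal sketch of that state machine, matching the thresholds above (5 consecutive failures to open, 60-second cooldown, single probe in half-open). The class shape and injectable clock are illustrative, not the production implementation.

```typescript
type CircuitState = 'CLOSED' | 'OPEN' | 'HALF_OPEN';

class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  private state: CircuitState = 'CLOSED';

  constructor(
    private readonly failureThreshold = 5,
    private readonly cooldownMs = 60_000,
    private readonly now: () => number = Date.now, // injectable for tests
  ) {}

  /** Current state, promoting OPEN -> HALF_OPEN once the cooldown elapses. */
  getState(): CircuitState {
    if (this.state === 'OPEN' && this.now() - this.openedAt >= this.cooldownMs) {
      this.state = 'HALF_OPEN';
    }
    return this.state;
  }

  /** True if a request may flow to this org right now. */
  allowRequest(): boolean {
    return this.getState() !== 'OPEN';
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = 'CLOSED';
  }

  recordFailure(): void {
    if (this.getState() === 'HALF_OPEN') {
      // The probe failed: reopen immediately and restart the cooldown.
      this.state = 'OPEN';
      this.openedAt = this.now();
      return;
    }
    if (++this.failures >= this.failureThreshold) {
      this.state = 'OPEN';
      this.openedAt = this.now();
    }
  }
}
```

The orchestrator would keep one instance per org connection and consult `allowRequest()` before each call.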

This surfaces clearly in the operator interface:

Org Connections
  Hub Org Production             Circuit: Closed
  Wealth Management Production   Circuit: Closed
  Mortgage Services Production   Circuit: Half-Open
  Commercial Banking Production  Circuit: Open

When operations see "Circuit: Open", they know that org is having issues and the system is protecting itself. No investigation needed into why syncs to that org are failing—the failure is expected and managed.

Conflict Resolution: Making It Fast and Auditable

Data conflicts are inevitable in multi-org environments. The question is how fast you can resolve them and whether you can prove why you resolved them that way.

We support four resolution strategies:

  • Source Wins: Hub org data overwrites division data. Use when the Hub is authoritative.
  • Target Wins: Division data is preserved. Use when divisions have the fresher data.
  • Newest Wins: Compare LastModifiedDate; the most recently updated record wins.
  • Manual Review: Human decision required. Use for high-value or sensitive records.

The interface shows conflicts side-by-side with three-button resolution. Operators can resolve a conflict in under 30 seconds—compared to 3+ minutes navigating Salesforce UI across multiple orgs.
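A hedged sketch of how the automatic strategies can be applied. LastModifiedDate is Salesforce's standard field; the record shape, strategy names, and resolver signature below are illustrative.

```typescript
interface VersionedRecord {
  values: Record<string, unknown>;
  lastModifiedDate: string; // ISO 8601 timestamp, as Salesforce returns it
}

type Strategy = 'source_wins' | 'target_wins' | 'newest_wins' | 'manual';

// Returns the winning record, or flags the conflict for a human.
function resolve(
  strategy: Strategy,
  source: VersionedRecord, // Hub org version
  target: VersionedRecord, // division org version
): VersionedRecord | 'needs_review' {
  switch (strategy) {
    case 'source_wins':
      return source;
    case 'target_wins':
      return target;
    case 'newest_wins': {
      // Compare LastModifiedDate; ties go to the Hub (source).
      const s = Date.parse(source.lastModifiedDate);
      const t = Date.parse(target.lastModifiedDate);
      return s >= t ? source : target;
    }
    case 'manual':
      return 'needs_review';
  }
}
```

Whatever the resolver returns, the audit entry described below is written in the same transaction so the decision is always reconstructible.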

Every resolution is audited:

  • Who resolved it
  • When it was resolved
  • Which strategy was applied
  • What the before/after values were
  • Any notes the operator added

This audit trail satisfies SOX and GDPR requirements. When a regulator asks "why did this client's address change?", you can produce the complete chain of custody.

Error Recovery: Idempotency and Retry Semantics

Sync jobs fail. Networks partition. Salesforce has maintenance windows. The question isn't whether errors happen—it's whether you can recover gracefully.

Our error recovery system implements three tiers:

Tier 1: Transient Failures (Automatic Retry)

Rate limits, network timeouts, temporary Salesforce unavailability. These retry automatically with exponential backoff. The operator sees nothing unless retries exhaust.
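A minimal sketch of that tier-1 retry loop. The attempt count, base delay, jitter, and error classification callback are illustrative defaults, not the production values.

```typescript
const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

async function withRetry<T>(
  op: () => Promise<T>,
  isTransient: (err: unknown) => boolean,
  maxAttempts = 5,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      // Non-transient errors escalate immediately (tier 2 or tier 3),
      // as does exhausting the retry budget.
      if (!isTransient(err) || attempt >= maxAttempts) throw err;
      // 500ms, 1s, 2s, 4s... plus jitter to avoid thundering herds.
      const delay = baseDelayMs * 2 ** (attempt - 1) + Math.random() * 100;
      await sleep(delay);
    }
  }
}
```

The operator only ever hears about the final throw; successful retries stay invisible, as the tier description requires.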

Tier 2: Recoverable Failures (Manual Intervention)

Authentication expired, configuration changed, validation rules blocking records. These pause the job and notify operators. The underlying cause must be fixed before resumption.

Tier 3: Fatal Failures (Job Termination)

Data corruption, irreconcilable conflicts, system errors. The job terminates, full diagnostic information is captured, and a post-mortem process begins.

Architect's Note

Idempotency is non-negotiable. Every sync operation must be safely re-executable. If a job fails after processing 10,000 of 50,000 records, resumption should not duplicate those 10,000 records. We achieve this through External IDs (Global_Client_ID__c) and upsert operations.
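A sketch of what upsert-based idempotency looks like. Global_Client_ID__c is the external ID from the note above; the minimal connection type stands in for the jsforce Connection used in production, and the 200-record batch size mirrors the standard DML chunk size. Record shapes and function names are illustrative.

```typescript
// Structural stand-in for the subset of jsforce we need here.
interface UpsertCapable {
  sobject(name: string): {
    upsert(records: object[], externalIdField: string): Promise<unknown>;
  };
}

interface ClientRecord {
  Global_Client_ID__c: string; // external ID: re-runs update, never duplicate
  [field: string]: unknown;
}

/** Split records into DML-sized batches. */
function chunk<T>(records: T[], size = 200): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < records.length; i += size) {
    batches.push(records.slice(i, i + size));
  }
  return batches;
}

/**
 * Idempotent sync: if a job dies after N records, re-running upserts those
 * same records in place on the external ID instead of inserting duplicates.
 */
async function syncClients(conn: UpsertCapable, records: ClientRecord[]): Promise<number> {
  let batches = 0;
  for (const batch of chunk(records)) {
    await conn.sobject('Contact').upsert(batch, 'Global_Client_ID__c');
    batches++;
  }
  return batches;
}
```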

The error recovery service maintains operation state across restarts:

Recovery Operation
interface RecoverableOperation {
  id: string;                    // Unique operation identifier
  type: 'sync_job' | 'conflict_resolution' | 'bulk_operation';
  payload: Record<string, unknown>;  // Operation-specific data
  context: OperationContext;     // Org, user, timestamp
  retryCount: number;
  lastError?: string;
  state: 'pending' | 'in_progress' | 'failed' | 'completed';
}

This enables two critical capabilities:

  1. Resume after crash: If the orchestrator restarts, pending operations are recovered and resumed
  2. Replay for debugging: Failed operations can be replayed with full context to diagnose issues
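The resume-after-crash path can be sketched as a startup pass over a trimmed version of the operation shape above. The in-memory array standing in for the persistent store, and the executor callback, are illustrative; in production this state survives restarts.

```typescript
// Trimmed version of the RecoverableOperation shape for the sketch.
interface RecoverableOperation {
  id: string;
  type: 'sync_job' | 'conflict_resolution' | 'bulk_operation';
  payload: Record<string, unknown>;
  retryCount: number;
  lastError?: string;
  state: 'pending' | 'in_progress' | 'failed' | 'completed';
}

type Executor = (op: RecoverableOperation) => Promise<void>;

/** On orchestrator startup, re-run anything that did not complete. */
async function recoverPending(ops: RecoverableOperation[], run: Executor): Promise<number> {
  // 'in_progress' means we crashed mid-operation; idempotent operations
  // (external-ID upserts) make re-running them safe.
  const resumable = ops.filter(o => o.state === 'pending' || o.state === 'in_progress');
  for (const op of resumable) {
    try {
      op.state = 'in_progress';
      await run(op);
      op.state = 'completed';
    } catch (err) {
      op.retryCount++;
      op.lastError = String(err);
      op.state = 'failed'; // surfaces in the operator interface
    }
  }
  return resumable.length;
}
```

Replay for debugging is the same loop pointed at a single failed operation, with its captured payload and context.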

Testing Strategy: 1,773 Tests and Counting

Multi-org sync systems are notoriously difficult to test. You need Salesforce orgs. You need realistic data. You need to simulate failures. Traditional integration testing is slow and brittle.

Our approach: hexagonal architecture enables comprehensive unit testing without Salesforce connectivity. Integration tests run against scratch orgs in CI.

  • 52 test suites
  • 1,773 tests passing
  • 94% code coverage
  • Full suite runs in under 30 seconds

Critical areas receive snapshot testing to catch UI regressions. The Block Kit components that operators interact with daily are tested against known-good snapshots—any unexpected change fails CI.

Test isolation was a hard-won lesson. Early test suites had flaky failures due to shared state. The solution: fresh service instances per test block, unique operation IDs, explicit time ranges for analytics queries. Tests now run reliably in any order.

Observability: Metrics That Actually Help

Dashboards full of green checkmarks are useless if they don't help you fix problems at 2 AM. Our observability focuses on actionable metrics:

Sync Health Metrics

  • Jobs by status: How many are running, queued, completed, failed?
  • Records processed per hour: Is throughput normal or degraded?
  • Conflict rate: Are conflicts within expected bounds or spiking?
  • Mean time to resolution: How fast are operators resolving conflicts?

Connection Health Metrics

  • API consumption by org: Which org is consuming its limits?
  • Error rate by org: Which org is having problems?
  • Circuit breaker state: Which orgs are degraded?
  • Authentication status: Which connections need credential refresh?

Trend Analysis

The analytics service computes 7/30/90 day trends for all key metrics. When an operator opens the dashboard, they immediately see if today is normal or anomalous compared to historical baselines.
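A minimal sketch of that baseline comparison, assuming a daily metric series. The 7/30/90-day windows come from the text; the 50% anomaly tolerance and all names below are illustrative.

```typescript
interface DailyMetric { date: string; value: number; }

function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

/** Compare today's value against the trailing-window average. */
function isAnomalous(
  history: DailyMetric[],
  today: number,
  windowDays: number, // 7, 30, or 90
  tolerance = 0.5,    // flag values more than 50% off baseline
): boolean {
  const window = history.slice(-windowDays).map(m => m.value);
  if (window.length === 0) return false; // no baseline yet, nothing to flag
  const baseline = mean(window);
  if (baseline === 0) return today !== 0;
  return Math.abs(today - baseline) / baseline > tolerance;
}
```

Running this per metric and per window is enough to color the dashboard "normal" or "anomalous" at a glance.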

Architect's Note

We deliberately avoided building custom dashboards. Slack's App Home provides the operator interface. This means operators don't need to context-switch—alerts and dashboards live where they already work. Mobile support comes free.

Lessons Learned

A year of building and operating this platform has surfaced several non-obvious lessons:

1. External IDs Are Non-Negotiable

Global_Client_ID__c (or equivalent) must exist before multi-org sync is even possible. Retrofitting external IDs to existing data is a project in itself. Start there.

2. Conflict Resolution UX Matters More Than Conflict Resolution Logic

We spent significant effort on sophisticated conflict resolution algorithms. What actually moved the needle was making the resolution interface fast and obvious. Operators who can resolve conflicts in 30 seconds will stay on top of the queue. Operators who need 3 minutes will fall behind.

3. Test Isolation Is Worth the Investment

Early tests shared state and were flaky. The time spent making tests truly isolated—fresh service instances, unique identifiers, explicit time ranges—paid back 10x in CI reliability and developer confidence.

4. Circuit Breakers Need Visibility

A circuit breaker that silently drops requests is worse than no circuit breaker. Operators need to see circuit state at a glance. When Commercial Banking's circuit opens, that's information they need immediately—not buried in logs.

5. Audit Everything, Filter Later

We capture comprehensive audit logs for every operation. Storage is cheap; forensic capability is expensive. When a regulator asks what happened six months ago, you want to have the data.

What's Next: AgentForce Integration

The next phase brings AI-assisted operations through Salesforce AgentForce. The vision:

  • Natural language sync management: "Start an account sync for Wealth Management" instead of navigating menus
  • Intelligent conflict suggestions: AI analyzes conflict patterns and suggests resolution strategies
  • Anomaly detection: AI identifies unusual sync patterns before they become problems
  • Predictive scheduling: AI recommends optimal sync windows based on historical load patterns

AgentForce integration means sync management can happen inside Salesforce or inside Slack—wherever the operator prefers to work. The orchestration layer remains the same; only the interface expands.

Conclusion

Multi-org Salesforce synchronization is fundamentally an architecture problem. Get the architecture right—hub-and-spoke, external orchestration, hexagonal design—and the implementation follows naturally. Get it wrong, and you'll fight the architecture for the life of the system.

The patterns described here aren't theoretical. They're running in production, handling real enterprise data, meeting real compliance requirements. They've been tested against real failures, real edge cases, and real 2 AM pages.

If your organization is struggling with multi-org synchronization—scattered audit trails, slow conflict resolution, unreliable sync jobs—the problem is likely architectural, not technical. And that's a problem that can be solved.

Need Help With Multi-Org Sync?

We've built this for enterprise financial services. Let's talk about whether our approach fits your architecture.

Schedule Architecture Review: tyler@colbysdatamovers.com