Pathfinder 2.0: Sync Engine, 132 Audit Checks, and the Road to a Unified Salesforce Platform
We just merged seven thousand lines of new code into Pathfinder. A full multi-org sync engine. A conflict resolution system with five pluggable strategies. A circuit breaker. A rate limiter. RBAC. An audit log. Sixteen REST API endpoints. A React UI for managing it all. And forty-seven new audit checks that bring the total to 132. This is what it took to build it, and why it matters.
Two days ago, we published the story of our audit engine running 85 checks against a live nonprofit Salesforce org. That post described the audit leg of what we have been building. Today we are publishing the other two legs: Query and Sync.
Pathfinder started as a migration tool. Extract data from a legacy CRM, transform it, load it into Salesforce. Over the past several months it has grown into something larger. The vision is a unified platform for Salesforce data operations: Audit your org health, Query across orgs with a hybrid SQL-graph language, Sync data between orgs with conflict detection, and Migrate from legacy systems. Four capabilities, one platform, zero data leaving your machine.
This post covers the sync engine merge in detail. It is long, it is technical, and it is the most significant upgrade Pathfinder has received since its initial release.
Why a Sync Engine
If you run a single Salesforce org, you do not need what we built. But most organizations past a certain scale do not run a single org. They run two, or five, or fifteen. A hub org for corporate data. A sales cloud for the revenue team. A service cloud for support. A marketing cloud. Maybe a separate org for a subsidiary or acquisition. Maybe a sandbox that drifted so far from production that it became its own thing.
The problem is not having multiple orgs. The problem is keeping them in sync. When a sales rep updates an account phone number in the sales org, that change needs to propagate to the service org and the hub. When a contact email changes in the hub, downstream orgs need to know. When two orgs have different values for the same field on the same record, someone or something needs to decide which value wins.
This is the multi-org synchronization problem, and it is one of the most underserved areas in the Salesforce ecosystem. MuleSoft and Informatica solve it at enterprise scale with enterprise pricing. Salesforce Connect solves a subset with external objects but does not handle conflict resolution. Most mid-market organizations solve it with manual exports, scheduled batch jobs, and prayer.
We built a sync engine that handles the hard parts: conflict detection, resolution strategies, circuit breaking, rate limiting, role-based access control, and immutable audit logging. It runs on your desktop or your server. No middleware. No cloud subscription. No data leaving your infrastructure.
Architecture Overview
The sync engine is a new Rust crate called pf-sync that integrates into the Pathfinder workspace alongside the existing pathfinder-core, connectors, pf-query, and fabric-engine crates. Here is how the pieces fit together:
The sync engine itself is organized into seven modules. Each one handles a distinct concern.
The Seven Modules
1. Types (types.rs)
The foundation. Every sync operation revolves around five core types that mirror the Salesforce custom objects used for sync state management:
- SyncJob — A single synchronization run. Tracks status (Queued, Running, Completed, Failed, Cancelled, CompletedWithErrors), record counts, conflict counts, timing, and who triggered it. Supports dry-run mode for testing without writing.
- SyncConfig — The definition of what to sync. Source org, target org, object type, field mappings, external ID field, batch size, schedule, and conflict strategy. This is the user-facing configuration surface.
- SyncConflict — A field-level disagreement between source and target. Stores both values, modification timestamps, resolution status, and who resolved it.
- SyncLog — An immutable audit entry. Who did what, when, and what happened.
- SyncSchedule — Cron-based scheduling with timezone support and last/next run tracking.
All types implement Serialize and Deserialize via serde, so they serialize to JSON for the REST API and to Salesforce custom object records for persistence. The enum variants use snake_case serialization to match Salesforce picklist conventions.
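To make the snake_case convention concrete, here is a minimal sketch of how a status enum maps to Salesforce picklist values. The real crate gets this for free from serde's derive with a rename_all attribute; the hand-rolled method below is illustrative only, so the sketch stays dependency-free, and the variant names follow the SyncJob statuses listed above:

```rust
// Sketch: the sync job status enum and its snake_case picklist mapping.
// In pf-sync this mapping comes from serde's rename_all = "snake_case";
// as_picklist_value() here is a hypothetical stand-in for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SyncJobStatus {
    Queued,
    Running,
    Completed,
    Failed,
    Cancelled,
    CompletedWithErrors,
}

impl SyncJobStatus {
    /// The string stored in the Salesforce picklist field.
    pub fn as_picklist_value(self) -> &'static str {
        match self {
            SyncJobStatus::Queued => "queued",
            SyncJobStatus::Running => "running",
            SyncJobStatus::Completed => "completed",
            SyncJobStatus::Failed => "failed",
            SyncJobStatus::Cancelled => "cancelled",
            SyncJobStatus::CompletedWithErrors => "completed_with_errors",
        }
    }
}
```

The same value round-trips through the JSON API and the custom object record, which is what keeps the two persistence surfaces consistent.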
2. Conflict Resolution Strategies (strategies/mod.rs)
When a sync job finds that the same field on the same record has different values in the source and target orgs, it needs a strategy. We implemented five as pluggable trait objects:
pub trait Strategy: Send + Sync {
    fn name(&self) -> ConflictStrategy;
    fn resolve(&self, conflict: &SyncConflict) -> Resolution;
}
The five implementations:
- SourceWins — The source org value is always applied. This is the right choice for hub-to-spoke architectures where the hub is the system of record.
- TargetWins — The target org value is kept. Use this when the downstream org owns the data and upstream changes should not overwrite local edits.
- NewestWins — Compares LastModifiedDate timestamps and applies whichever value was modified more recently. This is the most common strategy for bidirectional sync. The implementation handles null timestamps gracefully — if only one side has a timestamp, it wins.
- ManualReview — Does not resolve the conflict. Instead, it escalates it to a human by setting the resolution action to Escalate. The conflict stays in Pending status until someone reviews it in the UI or API.
- CustomRule — Falls back to ManualReview in the current release. The interface is ready for user-defined rules (e.g., "if the field is Phone and the source matches the regex for US format, use source").
Strategies are selected per-config, not per-conflict. You define the strategy when you create the sync configuration, and every conflict for that job uses the same strategy. If you need different strategies for different fields, you create separate configs.
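To illustrate the timestamp handling described above, here is a minimal sketch of the NewestWins rule against simplified stand-in types. The field names and the Resolution shape are assumptions for the example, not the crate's actual definitions; epoch-millisecond integers stand in for LastModifiedDate:

```rust
// Sketch of NewestWins over simplified types (illustrative names).
#[derive(Debug, PartialEq)]
pub enum Resolution {
    UseSource,
    UseTarget,
    Escalate,
}

pub struct SyncConflict {
    pub source_value: String,
    pub target_value: String,
    // Epoch millis stand in for LastModifiedDate; None models a
    // missing timestamp.
    pub source_modified: Option<u64>,
    pub target_modified: Option<u64>,
}

pub struct NewestWins;

impl NewestWins {
    pub fn resolve(&self, c: &SyncConflict) -> Resolution {
        match (c.source_modified, c.target_modified) {
            // Both sides have timestamps: the more recent edit wins.
            (Some(s), Some(t)) if s >= t => Resolution::UseSource,
            (Some(_), Some(_)) => Resolution::UseTarget,
            // Only one side has a timestamp: it wins.
            (Some(_), None) => Resolution::UseSource,
            (None, Some(_)) => Resolution::UseTarget,
            // Neither side has one: escalate to a human.
            (None, None) => Resolution::Escalate,
        }
    }
}
```

Note the final arm: when neither side carries a timestamp there is nothing to compare, so the only safe move is escalation, which is the same posture ManualReview takes for every conflict.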
3. Conflict Detection (conflict.rs)
The detection engine is where performance matters. When you are comparing tens of thousands of records between two orgs, you cannot do it with nested loops. Our implementation uses hash-indexed lookup for O(n+m) comparison:
pub fn detect_conflicts(
source_records: &[serde_json::Value],
target_records: &[serde_json::Value],
external_id_field: &str,
fields: &[String],
) -> ComparisonResult
The algorithm:
- Build a HashMap of target records keyed by external ID. This is one pass through the target data, O(m).
- Iterate source records. For each one, look up the matching target record by external ID. This is O(1) per lookup.
- For matched pairs, compare each tracked field. If values differ, emit a SyncConflict with both values and timestamps.
- Source records with no target match go into the new_records bucket. Target records with no source match go into orphaned.
The result is a ComparisonResult with four buckets: new_records, conflicts, matched (identical records), and orphaned (target-only records). This gives the caller complete visibility into what will happen before any writes occur.
We tested this with 10,000 source records against 8,000 target records. The detection runs in under a millisecond on a laptop. The bottleneck will always be the network, not the comparison.
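The hash-indexed comparison above can be sketched with plain string maps. The real engine compares serde_json::Value records as shown in the signature earlier; this version uses dependency-free stand-ins, and the record shapes are assumptions made for the example:

```rust
use std::collections::{HashMap, HashSet};

// Records simplified to string field maps; illustrative stand-in for
// the serde_json::Value records the real detector compares.
type Record = HashMap<String, String>;

#[derive(Default, Debug)]
pub struct ComparisonResult {
    pub new_records: Vec<String>,         // source-only external IDs
    pub conflicts: Vec<(String, String)>, // (external ID, field) pairs
    pub matched: usize,                   // identical record pairs
    pub orphaned: Vec<String>,            // target-only external IDs
}

pub fn detect_conflicts(
    source: &[(String, Record)],
    target: &[(String, Record)],
    fields: &[&str],
) -> ComparisonResult {
    // Pass 1: index target records by external ID — O(m).
    let by_id: HashMap<&str, &Record> =
        target.iter().map(|(id, r)| (id.as_str(), r)).collect();

    let mut seen: HashSet<&str> = HashSet::new();
    let mut result = ComparisonResult::default();

    // Pass 2: one O(1) lookup per source record — O(n).
    for (id, src) in source {
        match by_id.get(id.as_str()) {
            None => result.new_records.push(id.clone()),
            Some(tgt) => {
                seen.insert(id.as_str());
                let mut identical = true;
                for field in fields {
                    if src.get(*field) != tgt.get(*field) {
                        result.conflicts.push((id.clone(), field.to_string()));
                        identical = false;
                    }
                }
                if identical {
                    result.matched += 1;
                }
            }
        }
    }

    // Target records never matched by a source record are orphans.
    for (id, _) in target {
        if !seen.contains(id.as_str()) {
            result.orphaned.push(id.clone());
        }
    }
    result
}
```

No nested loops anywhere: two linear passes plus constant-time lookups, which is why the comparison itself never becomes the bottleneck.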
4. Circuit Breaker (circuit_breaker.rs)
When a Salesforce org goes down or becomes unreachable, you do not want your sync engine hammering it with retry requests. The circuit breaker pattern prevents cascade failures with a three-state machine:
CLOSED → OPEN → HALF_OPEN → CLOSED
  fail(5x)     trips the circuit open
  timeout(60s) allows one test request through
  success(2x)  closes the circuit
- Closed — Normal operation. Requests pass through. Failures are counted.
- Open — Circuit tripped after 5 consecutive failures. All requests fail immediately without hitting the org. After 60 seconds, transitions to Half-Open.
- Half-Open — Allows one test request through. If it succeeds (2 successes needed), the circuit closes. If it fails, the circuit reopens.
The thresholds are configurable: failure threshold (default 5), reset timeout (default 60 seconds), and success threshold (default 2). All state transitions are thread-safe via parking_lot::RwLock, so the circuit breaker can be shared across async tasks.
This is not theoretical. We have seen Salesforce orgs become unreachable during maintenance windows, and we have seen sync jobs that did not have circuit breakers pile up thousands of failed requests that triggered Salesforce's own rate limiting, making the recovery take even longer.
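A single-threaded sketch of the state machine, with the three thresholds from above as constructor parameters. The real module shares this state across async tasks behind parking_lot::RwLock, and it limits Half-Open to one in-flight probe; this simplified version omits both, and the method names are assumptions for the example:

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CircuitState { Closed, Open, HalfOpen }

// Simplified single-threaded sketch; names and structure illustrative.
pub struct CircuitBreaker {
    state: CircuitState,
    failures: u32,
    successes: u32,
    opened_at: Option<Instant>,
    failure_threshold: u32,  // default 5
    success_threshold: u32,  // default 2
    reset_timeout: Duration, // default 60s
}

impl CircuitBreaker {
    pub fn new(failure_threshold: u32, success_threshold: u32, reset_timeout: Duration) -> Self {
        Self { state: CircuitState::Closed, failures: 0, successes: 0,
               opened_at: None, failure_threshold, success_threshold, reset_timeout }
    }

    /// May this request proceed? An open circuit admits a probe only
    /// after the reset timeout, by moving to HalfOpen.
    pub fn allow(&mut self) -> bool {
        match self.state {
            CircuitState::Closed | CircuitState::HalfOpen => true,
            CircuitState::Open => {
                let timeout = self.reset_timeout;
                let timed_out = self.opened_at.map_or(false, |t| t.elapsed() >= timeout);
                if timed_out {
                    self.state = CircuitState::HalfOpen;
                    self.successes = 0;
                    true
                } else {
                    false // fail fast without hitting the org
                }
            }
        }
    }

    pub fn record_success(&mut self) {
        match self.state {
            CircuitState::HalfOpen => {
                self.successes += 1;
                if self.successes >= self.success_threshold {
                    self.state = CircuitState::Closed;
                    self.failures = 0;
                }
            }
            _ => self.failures = 0,
        }
    }

    pub fn record_failure(&mut self) {
        match self.state {
            // A failed probe reopens the circuit immediately.
            CircuitState::HalfOpen => self.trip(),
            CircuitState::Closed => {
                self.failures += 1;
                if self.failures >= self.failure_threshold { self.trip(); }
            }
            CircuitState::Open => {}
        }
    }

    fn trip(&mut self) {
        self.state = CircuitState::Open;
        self.opened_at = Some(Instant::now());
    }

    pub fn state(&self) -> CircuitState { self.state }
}
```

The key property is that an open circuit fails requests locally, so a struggling org sees zero traffic until the timeout elapses.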
5. Rate Limiter (rate_limiter.rs)
The rate limiter uses a sliding window algorithm. Each user and operation combination gets its own window of timestamps. When a request comes in, we remove expired timestamps, count the remaining ones, and allow or deny based on the configured maximum.
Default limits:
| Operation | Max Requests | Window |
|---|---|---|
| sync:start | 5 | 15 minutes |
| sync:cancel | 10 | 15 minutes |
| conflict:resolve | 20 | 5 minutes |
| conflict:bulk-resolve | 3 | 15 minutes |
| connect:test | 3 | 5 minutes |
| connect:add | 5 | 15 minutes |
The limiter also exposes a peek() method that checks the limit without consuming a request. The REST API uses this to show users their remaining quota before they hit the wall.
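A sketch of the sliding-window mechanics, keyed by user-and-operation as described above. peek() is named in the post; the other names and the internal layout are assumptions for the example:

```rust
use std::collections::{HashMap, VecDeque};
use std::time::{Duration, Instant};

// Sliding-window limiter keyed by (user, operation). Illustrative
// sketch; only peek() is a name the post itself uses.
pub struct RateLimiter {
    max_requests: usize,
    window: Duration,
    hits: HashMap<(String, String), VecDeque<Instant>>,
}

impl RateLimiter {
    pub fn new(max_requests: usize, window: Duration) -> Self {
        Self { max_requests, window, hits: HashMap::new() }
    }

    // Drop timestamps that have aged out of the window.
    fn prune(&mut self, key: &(String, String)) {
        let window = self.window;
        if let Some(q) = self.hits.get_mut(key) {
            while q.front().map_or(false, |t| t.elapsed() > window) {
                q.pop_front();
            }
        }
    }

    /// Consume one request if the caller is under the limit.
    pub fn check(&mut self, user: &str, op: &str) -> bool {
        let key = (user.to_string(), op.to_string());
        self.prune(&key);
        let q = self.hits.entry(key).or_default();
        if q.len() < self.max_requests {
            q.push_back(Instant::now());
            true
        } else {
            false
        }
    }

    /// Remaining quota without consuming a request.
    pub fn peek(&mut self, user: &str, op: &str) -> usize {
        let key = (user.to_string(), op.to_string());
        self.prune(&key);
        let used = self.hits.get(&key).map_or(0, |q| q.len());
        self.max_requests.saturating_sub(used)
    }
}
```

Because every (user, operation) pair owns its own timestamp queue, one user exhausting sync:start cannot starve another user or another operation.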
6. RBAC (auth.rs)
Three roles, twenty operations, a simple permission hierarchy:
Viewer < Operator < Admin
  Viewer   — view (read)
  Operator — operate (read + write)
  Admin    — admin (read + write + configure)
Viewer can see sync status, conflict lists, audit logs, and connection lists. Operator can start syncs, cancel jobs, resolve conflicts, and trigger bulk operations. Admin can add and remove connections, create and delete configurations, and manage schedules.
The authorization check is a single function call: check_authorization(role, operation) -> bool. Every API endpoint checks this before executing. The role comes from the JWT token managed by the pf-platform crate.
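The hierarchy makes the check a one-liner: derive an ordering on the roles and compare against each operation's minimum. A sketch with a representative subset of the twenty operations (the operation names here are assumptions; only check_authorization comes from the post):

```rust
// Derived Ord gives Viewer < Operator < Admin by declaration order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Role { Viewer, Operator, Admin }

// Illustrative subset of the operation catalog.
#[derive(Debug, Clone, Copy)]
pub enum Operation {
    ViewJobs, ViewConflicts, ViewLogs, ViewConnections,          // read
    StartSync, CancelJob, ResolveConflict, BulkResolve,          // operate
    AddConnection, RemoveConnection, CreateConfig, DeleteConfig, // configure
}

/// Minimum role each operation requires.
fn required_role(op: Operation) -> Role {
    use Operation::*;
    match op {
        ViewJobs | ViewConflicts | ViewLogs | ViewConnections => Role::Viewer,
        StartSync | CancelJob | ResolveConflict | BulkResolve => Role::Operator,
        AddConnection | RemoveConnection | CreateConfig | DeleteConfig => Role::Admin,
    }
}

/// Any role at or above the operation's minimum is authorized.
pub fn check_authorization(role: Role, op: Operation) -> bool {
    role >= required_role(op)
}
```

Encoding the hierarchy as an ordering means adding an operation is a one-line match arm, and a forgotten arm is a compile error rather than a silent permission hole.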
7. Audit Log (audit_log.rs)
Every sync operation is logged with who, what, when, and the result. The log is append-only and cannot be modified or deleted. This is a compliance requirement for organizations that need SOX or SOC 2 audit trails.
Each entry records the log level (Debug, Info, Warn, Error, Fatal), the component that generated it, the message, an optional job ID for correlation, an optional record ID for granularity, and optional structured metadata as JSON.
The in-memory implementation is backed by a Mutex<Vec<SyncLog>>. In production deployments, this is backed by Salesforce Sync_Log__c custom object records, giving you the audit trail inside the org itself.
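The append-only discipline is enforced by the API surface: the log type exposes append and read, and nothing else. A sketch of the in-memory shape, with a reduced SyncLog and illustrative method names (the real entry also carries a timestamp, record ID, and JSON metadata):

```rust
use std::sync::Mutex;

// Ordered so a minimum-level filter is a simple >= comparison.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum LogLevel { Debug, Info, Warn, Error, Fatal }

// Reduced sketch of an entry; the real SyncLog carries more fields.
#[derive(Debug, Clone)]
pub struct SyncLog {
    pub level: LogLevel,
    pub component: String,
    pub message: String,
    pub job_id: Option<String>, // correlation with a sync job
}

// Append-only: there is no update or delete path.
pub struct AuditLog {
    entries: Mutex<Vec<SyncLog>>,
}

impl AuditLog {
    pub fn new() -> Self { Self { entries: Mutex::new(Vec::new()) } }

    pub fn append(&self, entry: SyncLog) {
        self.entries.lock().unwrap().push(entry);
    }

    /// Read entries at or above a level, optionally scoped to a job.
    pub fn filter(&self, min_level: LogLevel, job_id: Option<&str>) -> Vec<SyncLog> {
        self.entries.lock().unwrap().iter()
            .filter(|e| e.level >= min_level)
            .filter(|e| job_id.map_or(true, |j| e.job_id.as_deref() == Some(j)))
            .cloned()
            .collect()
    }
}
```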
The REST API
The sync engine is exposed through sixteen Axum REST API endpoints mounted at /api/v1/sync/* on the fabric-server:
| Method | Path | Purpose |
|---|---|---|
| GET | /sync/dashboard | Stats: connections, jobs, records, conflicts |
| GET | /sync/jobs | List jobs (filterable by status) |
| POST | /sync/jobs | Start a new sync job |
| GET | /sync/jobs/:id | Get job details |
| POST | /sync/jobs/:id/cancel | Cancel a running job |
| GET | /sync/jobs/:id/logs | Get logs for a specific job |
| GET | /sync/conflicts | List conflicts (filterable by status/job) |
| GET | /sync/conflicts/:id | Get conflict details |
| POST | /sync/conflicts/:id/resolve | Resolve a single conflict |
| POST | /sync/conflicts/bulk-resolve | Resolve multiple conflicts at once |
| GET | /sync/configs | List sync configurations |
| POST | /sync/configs | Create a new configuration |
| GET | /sync/configs/:id | Get configuration details |
| GET | /sync/logs | Audit logs (filterable by level/job) |
| GET | /sync/rate-limits/:user | Check rate limit status |
| GET | /sync/circuit-breaker | Circuit breaker state |
Every mutating endpoint checks RBAC authorization and rate limits before executing. The POST /sync/jobs endpoint also checks the circuit breaker — if the target org's circuit is open, the job is rejected immediately with a clear error message instead of queuing up doomed requests.
All responses use a consistent envelope: { success: bool, data: T | null, error: string | null }. This makes client-side error handling trivial.
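In Rust terms, that envelope is one generic struct with two constructors. A sketch (the constructor names are assumptions; the real type would also derive Serialize so serde emits exactly the success/data/error shape shown above):

```rust
// Sketch of the response envelope; ok()/err() names are illustrative.
pub struct ApiResponse<T> {
    pub success: bool,
    pub data: Option<T>,
    pub error: Option<String>,
}

impl<T> ApiResponse<T> {
    /// Successful response: data present, error absent.
    pub fn ok(data: T) -> Self {
        Self { success: true, data: Some(data), error: None }
    }

    /// Failed response: error present, data absent.
    pub fn err(message: impl Into<String>) -> Self {
        Self { success: false, data: None, error: Some(message.into()) }
    }
}
```

Clients branch once on success and then read exactly one of the two optional fields, which is what keeps error handling uniform across all sixteen endpoints.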
The CLI
The same sync engine is accessible from the command line via two new subcommands on pf-audit:
# Check sync status
pf-audit sync status
# Start a sync job
pf-audit sync start account-full-sync
# List conflicts
pf-audit sync conflicts --pending
# Resolve a conflict
pf-audit sync resolve <conflict-id> source_wins
# Manage orgs
pf-audit orgs list
pf-audit orgs register my-production-org
pf-audit orgs health
The CLI is designed for CI/CD pipelines and automation. You can schedule sync jobs with cron, pipe the output to monitoring tools, and use exit codes for alerting.
The Desktop UI
For interactive use, the Tauri desktop app now has a full Sync Engine page with four tabs:
- Dashboard — Real-time stats cards showing active connections, jobs in the last 24 hours, records synced, and pending conflicts. Plus a system health panel showing circuit breaker state, rate limiter status, and current auth role.
- Jobs — Expandable job list with status badges, record counts, conflict counts, and duration. Click a job to see the breakdown: succeeded, failed, conflicts, and timing.
- Conflicts — A resolution table showing every field-level conflict with source and target values side by side. Filter by status or job. Resolve individually or in bulk with a single click.
- Configurations — All sync configs with source/target org, object type, conflict strategy, and active/inactive status. Create new configs or modify existing ones.
The UI is built with React 18 and TypeScript, styled with Tailwind, and uses the same design language as the rest of Pathfinder. Dark theme, cyan accents, clean typography.
132 Audit Checks
The second major piece of this merge is the audit engine expansion. We went from 85 checks to 132, adding four new modules that cover areas we were not reaching before.
Nonprofit Checks (15 new)
These are tailored for NPSP and Salesforce Nonprofit Cloud orgs. If you run a nonprofit Salesforce implementation, these are the things that break your reporting, your compliance, and your donor relationships:
- Donation Amount Anomalies — Zero, null, and negative donation amounts that distort revenue reports
- Unattributed Donations — Closed-won opportunities with no campaign, making fundraising ROI unmeasurable
- Recurring Donation Health — Stale open RDs with no payment in 90+ days, missing installment schedules
- Household Account Integrity — Empty household accounts and singleton households from incomplete migrations
- Soft Credit Coverage — Donations missing contact roles, breaking household giving totals
- GAU Allocation Completeness — Closed-won donations without General Accounting Unit allocations
- NPSP Trigger Handler State — Disabled trigger handlers that were turned off for a data load and never re-enabled
- Address Verification Gaps — Unverified addresses leading to returned mail
- Acknowledgment Status — Donations over 30 days old that have not been acknowledged (IRS compliance)
- Plus six more covering pledge hygiene, donor levels, payment methods, campaign ROI, engagement plans, and relationship mapping
User Adoption Checks (12 new)
These measure how effectively the org is actually being used:
- Inactive Licensed Users — Active users who have not logged in for 60+ days. Each unused license is $150-300/month.
- Report Utilization — Reports not run in 90+ days. Clutter and a sign that analytics are not being leveraged.
- Mobile Adoption — Percentage of logins from the Salesforce mobile app. Low mobile usage means field staff are not capturing data in real time.
- Task Completion Rate — Overdue open tasks indicating process breakdowns or abandoned workflows.
- Case Response Time — Cases open for more than 14 days, indicating SLA breaches.
- Plus seven more covering automation adoption, Chatter usage, email template freshness, knowledge articles, page layout complexity, list view sprawl, and custom app utilization
Data Lifecycle Checks (12 new)
These focus on data archival, retention, and storage optimization:
- Stale Leads — Unconverted leads untouched for 180+ days that inflate pipeline metrics
- Orphaned Attachments — ContentDocument records with no linked parent, consuming storage
- Legacy Attachment/Note Migration — Old Attachment and Note objects that should be migrated to Salesforce Files and Enhanced Notes
- Email Message Volume — Storage-heavy email accumulation with retention policy analysis
- Record Volume Distribution — Objects with 500K+ records that need custom indexing
- Plus seven more covering completed task accumulation, old closed cases, field history retention, recycle bin status, sandbox freshness, and audit trail completeness
Automation Health Checks (8 new)
These dig into the automation layer:
- Flow Error Rate — Flow interview failures in the last 30 days with identification of the worst offenders
- Duplicate Automation Detection — Objects with 3+ active record-triggered flows that risk order-of-execution conflicts
- Process Builder Migration — Active Process Builder processes that need to be migrated to Flow (PB is retired)
- Workflow Rule Migration — Active workflow rules that are deprecated and need Flow equivalents
- Trigger Best Practices — Objects with multiple active Apex triggers (best practice is one per object)
- Plus three more covering scheduled job health, validation rule coverage, and platform event volume monitoring
45 base checks compile into every build — data quality, security, compliance, schema, and performance. 87 extended checks are feature-gated behind audit-store and run against DuckDB for sub-second execution on pre-extracted datasets. The DuckDB path is the enterprise path: extract once, query fast, run all 132 checks in under two seconds.
The BridgeQL Query Engine
We have only touched on the query engine in previous posts, so here is the summary. BridgeQL is a SQL-graph hybrid query language that lets you query Salesforce data with SQL syntax and traverse object relationships with graph semantics. It was originally a standalone 21,000-line project across seven crates. We ported it into a single pf-query crate with 8,000 lines and 120 tests.
-- SQL for tabular analysis
SELECT Name, Industry, AnnualRevenue
FROM Account
WHERE AnnualRevenue > 1000000
ORDER BY AnnualRevenue DESC
-- Graph traversal for relationship discovery
TRAVERSE Account -> Contact -> Opportunity
WHERE Account.Industry = 'Technology'
DEPTH 3
The query engine has its own REST endpoint (POST /api/v1/query) and an interactive REPL accessible from the CLI (pf-audit query --org myorg --repl). It is the connective tissue that lets the audit engine, the sync engine, and the migration engine share a common query layer.
What We Ported and Why
The sync engine did not come from nowhere. It is a Rust port of the core business logic from SF Sync Center, our 40,000-line TypeScript Slack application for multi-org sync management.
We ported the business logic — conflict detection, resolution strategies, RBAC, rate limiting, circuit breaking, audit logging — because these are the parts that need to be fast, testable, and embeddable. A Rust crate can run inside the Tauri desktop app, behind a REST API, or as a library in another Rust project. A TypeScript Slack app can only run as a Slack app.
What we did not port is the Slack UI shell. That stays as-is. SF Sync Center will become a thin Slack client that calls into the Pathfinder sync API for its business logic. Slack handles the conversational UI. Pathfinder handles the engine. Clean separation.
Test Coverage
The sync engine shipped with 41 tests across its seven modules:
| Module | Tests | What They Cover |
|---|---|---|
| Conflict Detection | 8 | Matching, new records, orphans, large datasets, field comparisons |
| Circuit Breaker | 7 | All state transitions, thresholds, timeout recovery, forced states |
| Rate Limiter | 7 | Under/over limit, user isolation, peek vs. consume, reset |
| RBAC | 6 | Role hierarchy, permission checking, role ordering |
| Strategies | 6 | All five strategies with timestamp edge cases |
| Types | 4 | Serialization roundtrips, display formatting |
| Audit Log | 3 | Append, filter, metadata |
Combined with the existing test suite, the full workspace now has 3,044 passing tests. The pathfinder-core crate alone has 2,882 tests. The pf-query crate has 120 tests covering the parser, executor, storage layer, and graph operations.
The Full Platform Vision
Here is the picture when all four legs are complete:
Audit tells you what is wrong. Query lets you investigate. Sync keeps your orgs aligned. Migrate moves data from legacy systems into the Salesforce ecosystem. The Fabric AI engine sits underneath all four, providing pattern recognition, code generation, and intelligent automation suggestions.
Three of the four legs are now production-ready on main. The migration engine has been there from the beginning — it is how Pathfinder started. The audit engine reached 132 checks today. The query engine has been merged for two weeks. The sync engine just merged.
What Is Next
The immediate roadmap:
- Slack shell refactor — Connecting SF Sync Center's Slack UI to the Pathfinder sync API. The business logic moves to Rust; the Slack app becomes a thin client.
- Website overhaul — Eight new pages on colbysdatamovers.com reflecting the full platform capabilities. Dedicated pages for the sync engine, the query engine, the audit suite, and the platform architecture.
- More audit checks — We are targeting 200 checks by end of quarter. The next modules will cover Sales Cloud, Service Cloud, and Marketing Cloud verticals with cloud-specific checks.
- Persistent sync state — Moving from in-memory sync state to SQLite-backed persistence in the desktop app and Salesforce custom object persistence for the REST API.
The Numbers
Here is the merge by the numbers:
- 49 files changed across 8 crates and the React frontend
- +7,403 lines of new code
- 8 feature commits on the integration/sync-engine branch
- 132 audit checks across 15 modules (was 85)
- 41 new tests in pf-sync alone
- 3,044+ tests passing across the full workspace
- 16 REST API endpoints for sync management
- 5 conflict resolution strategies
- 3 RBAC roles with 20 mapped operations
- 6 rate-limited operations with sliding window enforcement
- 1 circuit breaker with configurable thresholds and automatic recovery
- 0 data leaving your machine
Everything is Rust. Everything is tested. Everything runs on your hardware. No cloud dependency, no SaaS subscription, no data exfiltration risk. That has been the principle from day one and it has not changed.
Ready for a Salesforce Audit?
132 checks. Sub-second execution. Nonprofit, security, compliance, performance, adoption, and automation health. Zero data leaves your machine.