MCP, Multi-Vendor Routing, and Building an AI Development Platform
sf-fabric is not a chatbot. It is a platform. Multi-vendor LLM routing, 7 MCP tools exposed over JSON-RPC, session persistence with atomic writes, and the architecture that makes Claude call Salesforce patterns autonomously.
The Vendor Abstraction
sf-fabric does not depend on any specific LLM provider. The Vendor trait defines the contract that every provider implements:
// From sf-fabric-llm/src/vendor.rs
#[async_trait]
pub trait Vendor: Send + Sync {
/// Display name (e.g., "anthropic", "openai", "ollama")
fn name(&self) -> &str;
/// Whether this vendor has valid credentials
fn is_configured(&self) -> bool;
/// Apply settings (API keys, base URLs)
async fn configure(
&mut self, settings: VendorSettings
) -> Result<()>;
/// Non-streaming chat request
async fn send(
&self,
messages: &[Message],
opts: &ChatOptions,
) -> Result<ChatResponse>;
/// Streaming chat request
async fn send_stream(
&self,
messages: &[Message],
opts: &ChatOptions,
tx: tokio::sync::mpsc::Sender<StreamUpdate>,
) -> Result<()>;
/// List available models
async fn list_models(&self) -> Result<Vec<String>>;
}
Six methods. That is the entire contract. name for identification. send for synchronous requests. send_stream for streaming with a tokio mpsc channel. list_models for discovery. is_configured for health checks. configure for setup.
The trait requires Send + Sync because vendor instances are shared across async tasks. A single VendorManager can route requests to multiple vendors concurrently from different tasks without data races.
ChatOptions and ChatResponse
The request and response types are vendor-neutral:
pub struct ChatOptions {
pub model: String,
pub temperature: f64, // default: 0.7
pub top_p: f64, // default: 0.9
pub presence_penalty: f64, // default: 0.0
pub frequency_penalty: f64, // default: 0.0
pub max_tokens: Option<u32>,
pub stream: bool,
pub raw: bool, // Skip system message assembly
pub timeout: Option<Duration>, // 60s sync, 120s stream
pub max_retries: u8, // default: 2
}
pub struct ChatResponse {
pub content: String,
pub model: String,
pub usage: Option<TokenUsage>,
}
pub struct TokenUsage {
pub prompt_tokens: u32,
pub completion_tokens: u32,
pub total_tokens: u32,
}
pub struct StreamUpdate {
pub content: String,
pub done: bool,
}
The max_retries field enables automatic retry on transient failures. The is_retryable_status function identifies which HTTP status codes should trigger a retry:
pub fn is_retryable_status(status: u16) -> bool {
matches!(status, 429 | 503 | 502 | 500)
}
// Retry with exponential backoff:
// Attempt 0: immediate
// Attempt 1: 500ms delay
// Attempt 2: 1000ms delay
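That schedule can be sketched as a loop around is_retryable_status. The real code paths are async; with_retries is a hypothetical synchronous distillation using std::thread::sleep, with the operation reporting an HTTP status on failure:

```rust
use std::time::Duration;

fn is_retryable_status(status: u16) -> bool {
    matches!(status, 429 | 500 | 502 | 503)
}

/// Run `op`, retrying up to `max_retries` extra attempts when it
/// fails with a retryable HTTP status. Backoff: 500ms, then 1000ms.
fn with_retries<T>(
    max_retries: u8,
    mut op: impl FnMut() -> Result<T, u16>,
) -> Result<T, u16> {
    let mut attempt = 0u8;
    loop {
        match op() {
            Ok(v) => return Ok(v),
            Err(status) if attempt < max_retries
                && is_retryable_status(status) =>
            {
                attempt += 1;
                // Attempt 1: 500ms, attempt 2: 1000ms
                std::thread::sleep(Duration::from_millis(
                    500 * attempt as u64,
                ));
            }
            Err(status) => return Err(status),
        }
    }
}
```

Non-retryable statuses (e.g. 401, 404) fail immediately, so a bad API key never burns a backoff delay.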
The Anthropic Implementation
Here is how the Anthropic vendor implements the trait. The key complexity is converting sf-fabric's message format to Anthropic's API format, where system messages are extracted into a separate parameter:
// From sf-fabric-llm/src/anthropic.rs
fn prepare_messages(
messages: &[Message]
) -> (Option<String>, Vec<ApiMessage>) {
let mut system_parts = Vec::new();
let mut api_messages = Vec::new();
for msg in messages {
match msg.role {
MessageRole::System => {
system_parts.push(msg.content.clone());
}
MessageRole::User => {
api_messages.push(ApiMessage {
role: "user".to_string(),
content: msg.content.clone(),
});
}
MessageRole::Assistant => {
api_messages.push(ApiMessage {
role: "assistant".to_string(),
content: msg.content.clone(),
});
}
MessageRole::Meta => {
// Filtered out: internal only
}
}
}
let system = if system_parts.is_empty() {
None
} else {
Some(system_parts.join("\n\n"))
};
(system, api_messages)
}
The Meta role is significant. Session metadata (timestamps, state flags) uses the Meta role so it persists in the session file but never reaches the LLM. The vendor_messages() method on Session filters out Meta messages before sending.
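To make the joining and filtering concrete, here is a runnable sketch of the same logic with simplified stand-in types (Message, MessageRole, and ApiMessage are minimal versions of the real ones):

```rust
#[derive(Clone, Copy, PartialEq)]
enum MessageRole { System, User, Assistant, Meta }

struct Message { role: MessageRole, content: String }

#[derive(Debug, PartialEq)]
struct ApiMessage { role: String, content: String }

/// System messages are pulled out and joined; Meta is dropped.
fn prepare_messages(
    messages: &[Message],
) -> (Option<String>, Vec<ApiMessage>) {
    let mut system_parts = Vec::new();
    let mut api_messages = Vec::new();
    for msg in messages {
        match msg.role {
            MessageRole::System => system_parts.push(msg.content.clone()),
            MessageRole::User => api_messages.push(ApiMessage {
                role: "user".to_string(),
                content: msg.content.clone(),
            }),
            MessageRole::Assistant => api_messages.push(ApiMessage {
                role: "assistant".to_string(),
                content: msg.content.clone(),
            }),
            MessageRole::Meta => {} // internal only, never sent
        }
    }
    let system = if system_parts.is_empty() {
        None
    } else {
        Some(system_parts.join("\n\n"))
    };
    (system, api_messages)
}
```

Two System messages (say, a pattern prompt and a strategy prompt) become one "\n\n"-joined system parameter, while a Meta message simply vanishes from the API payload.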
The Anthropic implementation also handles SSE streaming, buffering the incoming bytes and splitting them on blank-line event boundaries because Anthropic's streaming format uses Server-Sent Events with tagged event types:
// SSE parsing in the streaming implementation
while let Some(chunk) = stream.next().await {
let chunk = chunk?;
buffer.push_str(&String::from_utf8_lossy(&chunk));
// Process complete SSE events
while let Some(event_end) = buffer.find("\n\n") {
let event_text = buffer[..event_end].to_string();
buffer = buffer[event_end + 2..].to_string();
let mut event_type = String::new();
let mut event_data = String::new();
for line in event_text.lines() {
if let Some(t) = line.strip_prefix("event: ") {
event_type = t.to_string();
} else if let Some(d) = line.strip_prefix("data: ") {
event_data = d.to_string();
}
}
match event_type.as_str() {
"content_block_delta" => {
// Parse delta, extract text, send to channel
let _ = tx.send(StreamUpdate {
content: delta.text,
done: false,
}).await;
}
"message_stop" => {
let _ = tx.send(StreamUpdate {
content: String::new(),
done: true,
}).await;
}
_ => { /* ping, message_start, etc. */ }
}
}
}
The VendorManager
The VendorManager owns all vendor instances and routes requests to the correct one. The type signature tells the concurrency story:
pub type VendorHandle = Arc<RwLock<Box<dyn Vendor>>>;
pub struct VendorManager {
vendors: HashMap<String, VendorHandle>,
default_vendor: Option<String>,
default_model: Option<String>,
}
impl VendorManager {
pub fn register(&mut self, vendor: Box<dyn Vendor>) {
let name = vendor.name().to_lowercase();
self.vendors.insert(
name, Arc::new(RwLock::new(vendor))
);
}
pub fn resolve(
&self,
vendor_name: Option<&str>,
model: Option<&str>,
) -> Result<(VendorHandle, String)> {
let vendor_key = vendor_name
.map(|v| v.to_lowercase())
.or_else(|| self.default_vendor.clone())
.ok_or_else(|| anyhow!(
"No vendor specified and no default. \
Run `sf-fabric --setup`."
))?;
let vendor = self.vendors.get(&vendor_key)
.ok_or_else(|| anyhow!(
"Unknown vendor: {vendor_key}"
))?.clone();
let model_name = model
.map(|m| m.to_string())
.or_else(|| self.default_model.clone())
.ok_or_else(|| anyhow!(
"No model specified and no default."
))?;
Ok((vendor, model_name))
}
}
Arc<RwLock<Box<dyn Vendor>>> deserves explanation:
- Box<dyn Vendor> is a heap-allocated trait object. It allows different vendor implementations (AnthropicVendor, OpenAIVendor, OllamaVendor) to be stored in the same HashMap.
- RwLock allows concurrent reads (multiple send() calls) with exclusive writes (configure() calls). In practice, configure() is called once at startup. After that, all access is concurrent reads.
- Arc enables shared ownership across async tasks. When the Chatter spawns a streaming task, it clones the Arc, not the vendor.
Model routing is by convention. If the model string starts with "claude", route to Anthropic. If it starts with "gpt", route to OpenAI. If it starts with a local model name, route to Ollama. The user can override by specifying the vendor explicitly.
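That convention amounts to a prefix check. A hypothetical helper (vendor names from the text; the real routing lives in the VendorManager, and an explicitly specified vendor overrides it):

```rust
/// Route a model name to a vendor by naming convention.
fn vendor_for_model(model: &str) -> &'static str {
    let m = model.to_lowercase();
    if m.starts_with("claude") {
        "anthropic"
    } else if m.starts_with("gpt") {
        "openai"
    } else {
        // Local model names fall through to the local runtime.
        "ollama"
    }
}
```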
The MCP Server
MCP (Model Context Protocol) allows LLMs like Claude to call sf-fabric as a tool server. sf-fabric exposes 7 tools over JSON-RPC via stdio:
// From sf-fabric-mcp/src/lib.rs
7 MCP Tools:
1. list_patterns - List all available patterns
2. get_pattern - Get pattern details and prompt
3. run_pattern - Execute a pattern against input
4. list_orgs - List connected Salesforce orgs
5. execute_soql - Run SOQL against an org
6. describe_sobject - Get SObject schema description
7. get_org_limits - Get org API usage limits
The MCP server implements the JSON-RPC 2.0 protocol over standard input/output. Claude (or any MCP client) sends JSON requests on stdin and receives JSON responses on stdout:
pub struct McpServer {
pub patterns: PatternStore,
pub strategies: StrategyStore,
pub contexts: ContextStore,
pub sessions: SessionStore,
pub template_engine: TemplateEngine,
pub vendor_manager: VendorManager,
}
impl McpServer {
pub async fn run_stdio(self) -> Result<()> {
let stdin = io::stdin();
let stdout = io::stdout();
let reader = stdin.lock();
for line in reader.lines() {
let line = line?;
if line.trim().is_empty() { continue; }
let request: JsonRpcRequest =
match serde_json::from_str(&line) {
Ok(req) => req,
Err(e) => {
// Send parse error response
continue;
}
};
// Notifications (no id) get no response
if request.id.is_none() { continue; }
let response =
self.handle_request(&request).await;
// Write the JSON-RPC response to stdout
serde_json::to_writer(&mut stdout.lock(),
&response)?;
}
Ok(())
}
async fn handle_request(
&self, request: &JsonRpcRequest
) -> Result<Value> {
match request.method.as_str() {
"initialize" => self.handle_initialize(),
"tools/list" => self.handle_tools_list(),
"tools/call" =>
self.handle_tools_call(
request.params.as_ref()
).await,
_ => anyhow::bail!(
"Unknown method: {}", request.method
),
}
}
}
The tool definitions follow MCP's schema format. Here is the run_pattern tool, which is the most powerful:
{
"name": "run_pattern",
"description": "Execute an sf-fabric pattern against
input text using an LLM",
"inputSchema": {
"type": "object",
"properties": {
"pattern": {
"type": "string",
"description": "Pattern name to execute"
},
"message": {
"type": "string",
"description": "Input text to process"
},
"strategy": {
"type": "string",
"description": "Optional reasoning strategy"
},
"org": {
"type": "string",
"description": "Optional Salesforce org alias"
}
},
"required": ["pattern", "message"]
}
}
How Claude Calls Patterns Autonomously
When sf-fabric is registered as an MCP server with Claude, something interesting happens. Claude can discover and call patterns without human instruction. Here is the interaction flow:
User to Claude: "Review this Apex class for security issues"
Claude's internal reasoning:
1. User wants a security review of Apex code
2. I have access to sf-fabric via MCP
3. Let me check available patterns...
Claude calls MCP: tools/call
{ "name": "list_patterns", "arguments": {} }
sf-fabric returns:
[{"name": "review_apex", "category": "review",
"tags": ["apex", "security"]},
{"name": "review_security", "category": "security"},
...]
Claude's reasoning:
review_security is the best match.
Let me run it with the user's code.
Claude calls MCP: tools/call
{
"name": "run_pattern",
"arguments": {
"pattern": "review_security",
"message": "[user's Apex code]",
"strategy": "security_first"
}
}
sf-fabric:
1. Loads review_security pattern
2. Prepends security_first strategy
3. Assembles system message
4. Sends to configured LLM vendor
5. Returns response
Claude to User:
[Formatted security review with findings]
Claude autonomously chose the right pattern, applied the right strategy, and used sf-fabric's specialized review instead of its own generic knowledge. The user never mentioned patterns or strategies. Claude discovered the tools via MCP and chose to use them because they were more appropriate for the task.
Session Persistence
sf-fabric persists conversations as named sessions. Each session is a JSON file containing the message history:
// From sf-fabric-core/src/session.rs
const MAX_SESSION_MESSAGES: usize = 500;
pub struct Session {
pub name: String,
pub messages: Vec<Message>,
}
impl Session {
pub fn add_message(&mut self, message: Message) {
if self.messages.len() >= MAX_SESSION_MESSAGES {
// Remove oldest non-system message
if let Some(pos) = self.messages.iter()
.position(|m| m.role != MessageRole::System)
{
self.messages.remove(pos);
}
}
self.messages.push(message);
}
/// Filter out Meta messages before sending to vendor
pub fn vendor_messages(&self) -> Vec<&Message> {
self.messages.iter()
.filter(|m| m.role != MessageRole::Meta)
.collect()
}
}
pub struct SessionStore {
dir: PathBuf,
}
impl SessionStore {
/// Atomic write: temp file + rename
pub fn save(&self, session: &Session) -> Result<()> {
self.ensure_dir()?;
let path = self.session_path(&session.name);
let content = serde_json::to_string_pretty(session)?;
// Write to temp file first
let tmp_path = path.with_extension("json.tmp");
std::fs::write(&tmp_path, &content)?;
// Atomic rename
std::fs::rename(&tmp_path, &path)?;
Ok(())
}
}
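The temp-then-rename trick is plain std::fs. A minimal standalone sketch (save_atomic is a hypothetical distillation of SessionStore::save):

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Write-to-temp-then-rename: on most platforms a rename within
/// the same filesystem is atomic, so a crash mid-write leaves the
/// previous file intact and at worst orphans the .tmp file, which
/// the next save simply overwrites.
fn save_atomic(path: &Path, content: &str) -> io::Result<()> {
    let tmp = path.with_extension("json.tmp");
    fs::write(&tmp, content)?;
    fs::rename(&tmp, path)
}
```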
Three design decisions:
- 500-message cap. Sessions cannot grow unbounded. At 500 messages, the oldest non-system message is evicted. System messages are preserved because they contain the pattern and strategy instructions that define the session's behavior. Removing them would change what the AI knows.
- Atomic writes. Session persistence uses write-to-temp-then-rename. If the process crashes mid-write, the previous session file is intact. The tmp file is orphaned and cleaned up on next save. This prevents corrupted session files from killing the user's conversation history.
- Meta filtering. The Meta message role stores session metadata (timestamps, configuration snapshots) that should persist in the file but never reach the LLM. vendor_messages() filters these out. This gives you a clean audit trail without polluting the AI's context window.
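The cap-and-evict behavior can be exercised directly. A runnable sketch with the cap shrunk to 3 for the demo and simplified stand-in types:

```rust
#[derive(Clone, Copy, PartialEq)]
enum MessageRole { System, User, Assistant }

struct Message { role: MessageRole, content: String }

const MAX: usize = 3; // demo value; the real cap is 500

struct Session { messages: Vec<Message> }

impl Session {
    fn add_message(&mut self, message: Message) {
        if self.messages.len() >= MAX {
            // Evict the oldest non-system message; system
            // messages (pattern/strategy prompts) survive.
            if let Some(pos) = self.messages.iter()
                .position(|m| m.role != MessageRole::System)
            {
                self.messages.remove(pos);
            }
        }
        self.messages.push(message);
    }
}
```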
The Full Architecture
sf-fabric Architecture:
┌────────────────────────────────────────────┐
│ CLI / MCP │
│ sf-fabric-cli sf-fabric-mcp │
│ (direct use) (Claude/tool use) │
└──────────────┬──────────────┬──────────────┘
│ │
┌──────────────▼──────────────▼──────────────┐
│ sf-fabric-core │
│ Chatter ← PatternStore + StrategyStore │
│ ← ContextStore + SessionStore │
│ ← TemplateEngine │
└──────────────┬──────────────┬──────────────┘
│ │
┌──────────────▼──────┐ ┌────▼───────────────┐
│ sf-fabric-llm │ │ sf-fabric-salesforce│
│ VendorManager │ │ OrgManager │
│ ├─ Anthropic │ │ SoqlClient │
│ ├─ OpenAI │ │ MetadataClient │
│ └─ Ollama │ │ TemplateResolver │
└─────────────────────┘ └────────────────────┘
6 crates. Clear dependency boundaries.
Core depends on nothing external.
LLM and Salesforce are swappable layers.
The architecture is deliberately layered. sf-fabric-core knows about patterns, strategies, sessions, and templates. It does not know about Anthropic or Salesforce. sf-fabric-llm knows about LLM vendors. It does not know about patterns. sf-fabric-salesforce knows about Salesforce APIs. It does not know about LLMs.
The Chatter in sf-fabric-core orchestrates everything. It takes a ChatRequest (which names a pattern, strategy, session, vendor, variables, message, and org), assembles the system message, resolves variables, and sends the result through the vendor. The Chatter is the only piece that touches all three layers.
This separation is not academic cleanliness. It is practical. When Anthropic changes their API version, only anthropic.rs changes. When Salesforce releases a new API version, only sf-fabric-salesforce changes. When we add a new pattern format, only sf-fabric-core changes. No change ripples across boundaries.
What This Enables
The combination of multi-vendor routing, MCP tooling, and session persistence creates something more than the sum of its parts. You can:
- Run a pattern against Claude Opus for the initial analysis, then run a follow-up against GPT-4 for a second opinion, in the same session.
- Let Claude autonomously discover and call patterns via MCP, selecting the right tool for each sub-task in a complex workflow.
- Persist a session across days, picking up where you left off with full context.
- Use Ollama for local development (free, fast, private) and switch to Anthropic for production (higher quality, audit trail).
- Run the same pattern against three orgs (dev, staging, prod) with different {{sf:*}} variable resolutions, comparing the results.
sf-fabric is not an AI chatbot. It is an AI development platform. The patterns encode the expertise. The strategies encode the reasoning. The vendors provide the compute. The MCP server provides the integration. And the session system provides the memory.
Each piece is simple. The vendor trait is 6 methods. The MCP server is 7 tools. The session store is JSON files with atomic writes. The pattern store is directories with two files each. But composed together, they create a system that scales Salesforce expertise across teams, vendors, and time.