Salesforce AI · November 12, 2025 · 15 min read

The Atlas Reasoning Engine: How Agentforce Actually Makes Decisions

Agentforce is not a chatbot with Salesforce access. It is an autonomous agent built on a reasoning loop. Understanding that loop is the difference between an agent that works and one that hallucinates its way through your org.

Tyler Colby · Founder, Colby's Data Movers

What Atlas Actually Is

Atlas is Salesforce's reasoning engine. It is the brain behind Agentforce. When a user sends a message to an Agentforce agent, Atlas decides what to do with it. Not a prompt template. Not a lookup table. An actual reasoning loop that plans, acts, observes, and decides what to do next.

The architecture is a ReAct loop (Reason + Act). This is not proprietary to Salesforce. ReAct is a well-established pattern in AI agent design, published by Yao et al. at Princeton in 2022. Salesforce's contribution is implementing it within the constraints of enterprise data, security, and compliance.

Here is the loop:

User sends message
     │
     ▼
┌─────────────────┐
│ CLASSIFY TOPIC   │  Which topic does this message belong to?
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ PLAN             │  What actions are needed to answer this?
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ ACT              │  Execute the first action (query, API call, Apex)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ OBSERVE          │  What did the action return?
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ EVALUATE         │  Do I have enough to answer? Or do I need more?
└────────┬────────┘
         │
    ┌────┴────┐
    │         │
    ▼         ▼
  RESPOND   LOOP BACK TO PLAN
    │
    ▼
  Apply guardrails, format response, deliver

Each iteration of this loop is a "turn." A simple question ("What is the status of my case?") takes 1-2 turns. A complex request ("Summarize all open opportunities over $50K and flag the ones at risk") takes 3-5 turns. Atlas has a configurable maximum turn count, typically set to 5-8.
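The loop reduces to a few lines of pseudocode. Here is a Python sketch with hypothetical `plan`/`act`/`respond`/`escalate` hooks standing in for the real machinery; this is an illustration of the ReAct pattern, not Salesforce's implementation:

```python
MAX_TURNS = 8  # turn budget; per the text, Atlas is typically configured for 5-8

def react_loop(message, plan, act, respond, escalate):
    """Generic ReAct loop: plan -> act -> observe -> evaluate."""
    observations = []
    for _ in range(MAX_TURNS):
        action = plan(message, observations)       # PLAN: pick next action
        if action is None:                         # EVALUATE: enough info
            return respond(message, observations)  # RESPOND (guardrails, format)
        observations.append(act(action))           # ACT, then OBSERVE the result
    return escalate(message, observations)         # turn budget exhausted

# Toy hooks: one lookup resolves the question
plan = lambda msg, obs: None if obs else "get_order_status"
act = lambda action: {"status": "shipped"}
respond = lambda msg, obs: f"Order is {obs[0]['status']}."
escalate = lambda msg, obs: "Transferring you to a human agent."

print(react_loop("Where is my order?", plan, act, respond, escalate))
# -> Order is shipped.
```

The key property is that `plan` sees all prior observations each turn, which is what lets a later step build on an earlier one.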

Topic Classification: The First Decision

Before Atlas reasons about anything, it classifies the incoming message into a topic. Topics are the top-level organizational unit in Agentforce. Each topic has its own instructions, actions, and guardrails.

Example Agent Topics:
  1. Order Status       - "Where is my order?"
  2. Product Questions  - "Does the X9000 work outdoors?"
  3. Returns            - "I need to return an item"
  4. Account Management - "Update my address"
  5. Escalation         - "Let me talk to a human"

Topic classification is an LLM call. Atlas sends the user message plus the list of topic names and descriptions to the model, and the model returns the best match. The topic description is what the model uses to decide. Bad descriptions produce bad classification.

// Topic configuration in Agent Builder

Topic: Order Status
Description: "Customer is asking about the delivery status,
  shipping timeline, tracking information, or expected arrival
  date of an existing order."
Classification Instruction: "Classify here when the customer
  mentions an order number, tracking, delivery, or shipping."

Topic: Product Questions
Description: "Customer is asking about product features,
  specifications, compatibility, pricing, or availability.
  NOT about an order they already placed."
Classification Instruction: "Classify here when the question
  is about a product the customer has NOT yet purchased."

The second topic's description includes "NOT about an order they already placed." This negative constraint is essential. Without it, a question like "Does my order include the outdoor mounting kit?" could be classified as either Order Status or Product Questions. The explicit boundary resolves the ambiguity.
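Since classification is just an LLM call over topic names and descriptions, the prompt it rides on can be sketched. This is a hypothetical format; Salesforce does not publish the actual classifier prompt:

```python
# Hypothetical sketch of a topic-classification prompt.
TOPICS = {
    "Order Status": "Delivery status, tracking, or arrival date of an existing order.",
    "Product Questions": "Product features, specs, pricing. NOT about an order already placed.",
}

def build_classifier_prompt(message):
    lines = ["Classify the message into exactly one topic.", ""]
    for name, desc in TOPICS.items():
        lines.append(f"- {name}: {desc}")  # the description is all the model sees
    lines += ["", f"Message: {message}", "Topic:"]
    return "\n".join(lines)

print(build_classifier_prompt("Does my order include the mounting kit?"))
```

Seen this way, the stakes are obvious: the description lines are the model's entire basis for the decision, so every word in them is load-bearing.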

Planning: What Atlas Decides To Do

Once the topic is classified, Atlas enters the planning phase. It reads the topic's instructions and the list of available actions, then generates a plan.

Instructions are natural language directives that shape the agent's behavior. They are the most powerful and most misunderstood part of Agentforce configuration.

// Topic Instructions (Order Status)

"You help customers check on their orders. Follow these steps:

1. Ask for the order number if the customer hasn't provided one.
2. Look up the order using the Get Order Status action.
3. If the order is shipped, provide the tracking number and
   estimated delivery date.
4. If the order is processing, tell the customer the expected
   ship date.
5. If the order is on hold, explain the reason and offer to
   escalate to a support agent.
6. Never share the customer's full payment information.
7. Never modify an order in this topic. Direct order changes
   to the Account Management topic."

These instructions are injected into the LLM prompt as system context. Atlas uses them to plan which actions to execute and in what order. The numbered steps are not a rigid script. They are guidelines that the reasoning engine interprets dynamically based on the conversation context.
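Assembled naively, the system context for the planning call might look like this sketch (hypothetical helper; Salesforce does not publish the actual prompt assembly):

```python
# Hypothetical sketch of how topic instructions and action descriptions
# might be combined into system context for the planner LLM.
def build_planner_context(topic_name, instructions, actions):
    parts = [f"Topic: {topic_name}", "", "Instructions:"]
    parts += [f"{i}. {step}" for i, step in enumerate(instructions, 1)]
    parts += ["", "Available actions:"]
    parts += [f"- {name}: {desc}" for name, desc in actions.items()]
    return "\n".join(parts)

ctx = build_planner_context(
    "Order Status",
    ["Ask for the order number if the customer hasn't provided one.",
     "Look up the order using the Get Order Status action."],
    {"Get Order Status": "Retrieves status, tracking, and delivery date."},
)
print(ctx)
```

The point of the sketch is that instructions and action descriptions land in the same context window, which is why both need to be written for the model, not for a human reader.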

Action Execution: The Act Phase

Actions are the things Atlas can do. There are four types:

Action Types in Agentforce:
  1. Flow Actions     - Execute a Salesforce Flow
  2. Apex Actions     - Call an @InvocableMethod
  3. Query Actions    - Run SOQL queries
  4. Template Actions - Generate text from a Prompt Template

Each action has a name, description, and input/output parameters. Atlas reads the descriptions to decide which action to invoke. The description is doing heavy lifting here. If the description is vague, Atlas will use the action at the wrong time or not at all.

// Apex Action: Get Order Status
public with sharing class OrderStatusAction {
  @InvocableMethod(
    label='Get Order Status'
    description='Retrieves the current status, tracking number, and estimated delivery date for a specific order. Requires the order number (e.g., ORD-12345). Returns order status, tracking URL, and line items.'
  )
  public static List<OrderStatusResult> getOrderStatus(
      List<OrderStatusRequest> requests
  ) {
    // Implementation...
    // (OrderStatusRequest / OrderStatusResult wrapper classes omitted)
  }
}

The description tells Atlas exactly when to use this action and what it returns. "Retrieves the current status" tells Atlas this is a read operation. "Requires the order number" tells Atlas it needs to collect this input from the user before calling the action. "Returns order status, tracking URL, and line items" tells Atlas what data it will have after the action completes.
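Those signals can be thought of as structured metadata the planner extracts from the description. A minimal sketch (hypothetical schema, not Atlas internals):

```python
# Hypothetical action metadata: which inputs must be collected before
# the action can run, and whether it is a read operation.
ACTION = {
    "name": "Get Order Status",
    "required_inputs": ["order_number"],  # "Requires the order number"
    "read_only": True,                    # "Retrieves" signals a read
}

def missing_inputs(action, collected):
    """Inputs the agent still has to ask the user for."""
    return [p for p in action["required_inputs"] if p not in collected]

print(missing_inputs(ACTION, {}))                             # ask the user first
print(missing_inputs(ACTION, {"order_number": "ORD-12345"}))  # ready to call
```

When the description omits a required input, this is exactly the step that fails: the planner calls the action without the data it needs, or never calls it at all.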

The Observe-Evaluate Loop

After executing an action, Atlas enters the observe phase. It looks at the action's output and decides: is this enough to answer the user, or do I need more information?

Turn 1:
  User: "Where is my order ORD-78234?"
  Atlas PLAN: Call Get Order Status with order_number = "ORD-78234"
  Atlas ACT: Execute getOrderStatus("ORD-78234")
  Atlas OBSERVE: {
    status: "shipped",
    tracking_number: "1Z999AA10123456784",
    carrier: "UPS",
    estimated_delivery: "2025-11-15",
    line_items: [
      { name: "Outdoor Heater X9000", qty: 1 },
      { name: "Patio Cover Accessory", qty: 2 }
    ]
  }
  Atlas EVALUATE: I have the status, tracking, and delivery date.
    This is enough to answer. Proceed to RESPOND.

Turn 2 (response):
  Agent: "Your order ORD-78234 has shipped via UPS.
    Tracking number: 1Z999AA10123456784.
    Estimated delivery: November 15, 2025.
    Items: 1x Outdoor Heater X9000, 2x Patio Cover Accessory."

Simple case: one action, one response. But watch what happens with a more complex query:

Turn 1:
  User: "I ordered the X9000 but I think the X12000
    would have been better. Can I exchange it?"
  Atlas PLAN: Need to check order details AND product comparison.
    Two actions needed.
  Atlas ACT: Execute getOrderStatus (to find order details)
  Atlas OBSERVE: Order found. Status: "delivered". Item: X9000.

Turn 2:
  Atlas PLAN: Now I need the exchange policy for delivered items
    AND product comparison between X9000 and X12000.
  Atlas ACT: Execute getReturnPolicy (to check exchange eligibility)
  Atlas OBSERVE: Delivered items eligible for exchange within 30 days.
    Order delivered 12 days ago. Exchange eligible.

Turn 3:
  Atlas PLAN: User wants to know if X12000 is better.
    Get comparison data.
  Atlas ACT: Execute getProductComparison(X9000, X12000)
  Atlas OBSERVE: X12000 has higher BTU, larger coverage area,
    $800 more expensive.

Turn 4:
  Atlas EVALUATE: I have order status, exchange eligibility,
    and product comparison. Enough to give a complete answer.
  Agent: "Your X9000 was delivered 12 days ago, so it's eligible
    for exchange. The X12000 has [comparison details]. The price
    difference would be $800. Would you like me to start the
    exchange process?"

Four turns. Three actions. One coherent response. Atlas decided on its own that it needed three pieces of information and executed them sequentially, carrying context from each step to the next. This is the power of the ReAct loop.
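The trace above reduces to a plan step that picks the next action from what has already been observed. A toy sketch with hypothetical action names and canned results:

```python
# Sketch of the exchange scenario: planning is a function of the
# observations accumulated so far (hypothetical names throughout).
def plan_next(observed):
    if "order" not in observed:
        return "getOrderStatus"
    if "policy" not in observed:
        return "getReturnPolicy"
    if "comparison" not in observed:
        return "getProductComparison"
    return None  # enough information: respond

# Canned action results standing in for real queries
results = {
    "getOrderStatus": ("order", {"status": "delivered", "item": "X9000"}),
    "getReturnPolicy": ("policy", {"exchange_window_days": 30}),
    "getProductComparison": ("comparison", {"price_delta": 800}),
}

observed, turns = {}, 0
while (action := plan_next(observed)) is not None:
    key, value = results[action]
    observed[key] = value  # OBSERVE: carry context to the next plan step
    turns += 1
print(turns)  # -> 3 actions before responding
```

Nothing in the sketch hard-codes the sequence; the order falls out of which gaps remain, which is why Atlas can reorder or skip steps as the conversation demands.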

Guardrails: Where Trust Is Enforced

Guardrails constrain what Atlas can do and say. They are applied at two points: before action execution and before response delivery.

Guardrail Types:
  1. Topic Guardrails    - "Never discuss competitor products"
  2. Action Guardrails   - "Require confirmation before creating records"
  3. Output Guardrails   - "Never include PII in responses"
  4. Escalation Rules    - "Transfer to human if sentiment is negative"

// Example: Topic-level guardrails (Order Status)
Guardrail Instructions:
- "Do not share the customer's full payment card number.
   Show only the last four digits."
- "Do not modify order status. This topic is read-only."
- "If the customer expresses frustration more than twice,
   offer to connect them with a human agent."
- "Do not speculate about delivery dates if the carrier
   tracking shows no updates. Say 'tracking information
   is not yet available' instead."

Guardrails are injected into the system prompt alongside topic instructions. The LLM enforces them during reasoning. They are not hard-coded rules. They are natural language constraints that the model interprets. This means they are probabilistic, not deterministic. A guardrail that says "never share PII" will work 99% of the time. For the other 1%, you need the Einstein Trust Layer (covered in a separate post).
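The probabilistic/deterministic distinction is worth making concrete. A deterministic post-processing check, like this illustrative card-number mask (not the Einstein Trust Layer, just a sketch of the idea), catches what a prompt-level guardrail misses:

```python
import re

# Illustration only: mask anything shaped like a 16-digit payment card
# number in outbound text, keeping the last four digits.
CARD = re.compile(r"\b(?:\d[ -]?){12}(\d[ -]?){3}\d\b")

def mask_cards(text):
    return CARD.sub(
        lambda m: "****-****-****-" + re.sub(r"\D", "", m.group())[-4:],
        text,
    )

print(mask_cards("Card on file: 4111 1111 1111 1111."))
# -> Card on file: ****-****-****-1111.
```

A regex runs the same way every time; an instruction in a prompt does not. That is the whole argument for layering deterministic filters behind the LLM.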

Gotcha 1: Topic Overlap

The most common failure mode in Agentforce implementations is topic overlap. Two or more topics have descriptions that are similar enough that the classifier cannot reliably distinguish them.

// BAD: Overlapping topics
Topic: "Billing" - "Questions about charges, payments, and invoices"
Topic: "Refunds" - "Questions about refunds, credits, and payment reversals"

// User message: "I was charged twice for my order"
// Atlas classification: ???
// Could be Billing (it's about a charge)
// Could be Refunds (they probably want money back)

The fix is either to merge the topics or to add explicit boundaries:

// GOOD: Clear boundaries
Topic: "Billing" - "Questions about understanding charges, reading
  invoices, and payment methods. NOT about getting money back."
Topic: "Refunds" - "Customer wants a refund, credit, or reversal
  of a charge. Customer believes they were overcharged or charged
  incorrectly."

Test topic classification with 50-100 sample messages before going live. Log the classifications and look for misroutes. If more than 5% of messages are misclassified, your topic boundaries need work.
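The pre-launch misroute check can be scripted. A sketch, where `classify()` is a stand-in for calling the agent's actual topic classifier on each labeled sample:

```python
# Compute the misroute rate over labeled sample messages.
def misroute_rate(samples, classify):
    misses = [(msg, want, got) for msg, want in samples
              if (got := classify(msg)) != want]
    return len(misses) / len(samples), misses

samples = [
    ("I was charged twice for my order", "Refunds"),
    ("What does this line on my invoice mean?", "Billing"),
    ("How do I update my payment method?", "Billing"),
    ("I want my money back", "Refunds"),
]

# Toy keyword classifier standing in for the LLM call
classify = lambda msg: ("Refunds"
    if any(w in msg.lower() for w in ("charged twice", "money back"))
    else "Billing")

rate, misses = misroute_rate(samples, classify)
print(f"{rate:.0%} misrouted")  # flag the topic set for rework above 5%
```

With 50-100 real samples, the `misses` list is more useful than the rate: the misrouted messages show you exactly which boundary sentence to add.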

Gotcha 2: Instruction Ordering Sensitivity

The order of instructions matters more than it should. LLMs attend most to the beginning and end of a prompt and least to the middle (the "lost in the middle" effect), with a recency bias toward whatever comes last. This can cause surprising behavior.

// Instructions where ordering matters:

"1. Always greet the customer by name.
 2. Look up the customer's account.
 3. Check for open cases.
 4. Summarize any recent interactions.
 5. Never discuss internal escalation procedures with the customer.
 6. Always ask if there's anything else you can help with."

// Problem: Instruction 5 (the safety guardrail) is buried
// in the middle. It gets less weight than instruction 6.
// The agent sometimes mentions escalation procedures.

// Fix: Put guardrails LAST (highest recency weight):
"1. Always greet the customer by name.
 2. Look up the customer's account.
 3. Check for open cases.
 4. Summarize any recent interactions.
 5. Always ask if there's anything else you can help with.
 6. IMPORTANT: Never discuss internal escalation procedures.
 7. IMPORTANT: Never share customer PII in your responses."

Prefix critical guardrails with "IMPORTANT:" and place them at the end of the instruction list. This is a hack around LLM attention patterns, not a principled solution. But it works consistently in production.

Gotcha 3: Cold Start with Too Many Topics

More topics means more classification options. More classification options means lower accuracy per option. I have seen agents with 15+ topics where classification accuracy drops below 70%. The agent spends its first turn misclassifying and its second turn recovering. By the third turn, the user has typed "AGENT" to get a human.

Topic Count vs Classification Accuracy (observed):
  3-5 topics:     92-97% accuracy
  6-8 topics:     85-92% accuracy
  9-12 topics:    75-85% accuracy
  13-15 topics:   65-78% accuracy
  16+ topics:     Below 70% (unusable)

Recommendation: Start with 5 topics. Maximum 10.
Merge related topics aggressively.

If your business truly needs 15 distinct conversation paths, use a two-tier approach. The first tier classifies into 4-5 broad categories. The second tier narrows within the category. Atlas supports this through topic chaining.
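A two-tier router is just two smaller classification calls. A toy sketch, with hypothetical tiers and a word-overlap stand-in for the LLM:

```python
# Hypothetical two-tier routing: broad category first, then narrow.
TIERS = {
    "Orders":  ["Order Status", "Returns"],
    "Account": ["Account Management", "Billing"],
}

def route(message, classify):
    tier = classify(message, list(TIERS))   # first call: few broad options
    topic = classify(message, TIERS[tier])  # second call: few narrow options
    return tier, topic

# Toy classifier: pick the option sharing the most words with the message
def classify(message, options):
    words = set(message.lower().split())
    return max(options, key=lambda o: len(words & set(o.lower().split())))

print(route("what is my order status", classify))
```

Each call now chooses among 2-5 options instead of 15+, which is where the accuracy recovery comes from; the cost is one extra LLM call per conversation.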

Gotcha 4: More Instructions Makes Agents Worse

This is counterintuitive but consistent. Teams write 30-40 instructions per topic trying to cover every edge case. The agent gets worse, not better.

The reason is context window dilution. Every instruction competes for the LLM's attention. With 40 instructions, each one gets less weight. Critical guardrails get buried alongside formatting preferences. The model cannot distinguish between "never share PII" and "use bullet points for lists."

// BAD: 35 instructions, many redundant or low-value
"1. Greet the customer warmly.
 2. Use their first name.
 3. Be empathetic and professional.
 4. Use proper grammar.
 5. Avoid jargon.
 ...
 33. Always confirm before making changes.
 34. Never share internal procedures.
 35. Never share PII."

// GOOD: 8 focused instructions
"1. Greet the customer by first name.
 2. Look up their account and recent cases.
 3. For order issues: check order status first, then return policy.
 4. For product questions: use the product knowledge base action.
 5. Confirm before making any changes to the account.
 6. Offer to escalate if the customer is frustrated.
 7. IMPORTANT: Never share customer PII or internal procedures.
 8. IMPORTANT: Never modify records without explicit customer consent."

Eight well-crafted instructions outperform thirty-five verbose ones. Every time. The rule of thumb: if an instruction does not directly affect the agent's decision-making, remove it. "Use proper grammar" is not an instruction. It is a default behavior. Do not waste context window space on it.

Testing Atlas Behavior

Agentforce agents must be tested like software, not like chatbots. That means reproducible test cases with expected outcomes.

// Test case structure
Test: Order Status - Shipped Order
  Input: "Where is order ORD-12345?"
  Expected Topic: Order Status
  Expected Action: getOrderStatus("ORD-12345")
  Expected Response Contains: tracking number, estimated delivery
  Expected Response Excludes: payment information, internal notes
  Max Turns: 2

Test: Topic Boundary - Billing vs Refund
  Input: "I was charged twice"
  Expected Topic: Refunds (not Billing)
  Expected Action: getOrderCharges (to verify double charge)
  Expected Response Contains: acknowledgment, refund offer
  Max Turns: 3

Test: Guardrail - PII Protection
  Input: "What credit card do I have on file?"
  Expected Topic: Account Management
  Expected Response Contains: last four digits only
  Expected Response Excludes: full card number, CVV, expiration

Build a test suite of at least 30 cases covering happy paths, edge cases, and guardrail violations. Run it after every topic or instruction change. Automated testing for Agentforce is still immature, so most teams run these manually in the Agent Builder preview. It is tedious but essential.
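Even when the cases are run manually, the pass/fail checks can be automated against captured transcripts. A sketch, where the `result` dict stands in for whatever you record from the Agent Builder preview:

```python
# Check one recorded agent run against a test case's expectations.
def check(case, result):
    failures = []
    if result["topic"] != case["expected_topic"]:
        failures.append(f"topic: got {result['topic']}")
    for must in case.get("contains", []):        # required response content
        if must not in result["response"]:
            failures.append(f"missing: {must}")
    for banned in case.get("excludes", []):      # guardrail violations
        if banned in result["response"]:
            failures.append(f"leaked: {banned}")
    if result["turns"] > case.get("max_turns", 8):
        failures.append(f"turns: {result['turns']}")
    return failures  # empty list means the case passed

case = {"expected_topic": "Order Status",
        "contains": ["tracking"], "excludes": ["card number"], "max_turns": 2}
result = {"topic": "Order Status", "turns": 2,
          "response": "Shipped. tracking: 1Z999AA10123456784"}
print(check(case, result))  # -> []
```

Scoring transcripts this way keeps the judgment out of the tester's head and makes regressions visible the moment an instruction change breaks a case.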

Production Monitoring

Once the agent is live, monitor three metrics:

Key Agentforce Metrics:
  1. Resolution Rate     - % of conversations resolved without escalation
     Target: 70-80%
     Below 60%: Topics or instructions need rework

  2. Average Turn Count  - How many reasoning loops per conversation
     Target: 2-3 turns
     Above 5: Agent is struggling, likely misclassifying or
              hitting wrong actions

  3. Escalation Triggers - Why conversations transfer to humans
     Track: Topic, turn count at escalation, last action before escalation
     Pattern: If 40% of escalations come from one topic, that topic
              needs better actions or instructions

Salesforce provides conversation logs in Einstein Analytics. Review them weekly for the first month. Look for patterns in failed conversations. A single misclassified topic can drive 30% of your escalations. Fix the topic description and the escalation rate drops overnight.
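Given an export of conversation logs, the three metrics reduce to a few lines. A sketch over a hypothetical log schema (adapt the field names to your actual export):

```python
from collections import Counter

# Hypothetical log schema: one dict per conversation.
def agent_metrics(logs):
    resolved = sum(1 for c in logs if not c["escalated"])
    esc_topics = Counter(c["topic"] for c in logs if c["escalated"])
    return {
        "resolution_rate": resolved / len(logs),       # target 70-80%
        "avg_turns": sum(c["turns"] for c in logs) / len(logs),  # target 2-3
        "top_escalation_topics": esc_topics.most_common(3),
    }

logs = [
    {"topic": "Order Status", "turns": 2, "escalated": False},
    {"topic": "Billing", "turns": 6, "escalated": True},
    {"topic": "Order Status", "turns": 3, "escalated": False},
    {"topic": "Billing", "turns": 5, "escalated": True},
]
print(agent_metrics(logs))
```

The escalation counter is the weekly-review shortcut: when one topic dominates `top_escalation_topics`, that topic's description, actions, or instructions are the first thing to fix.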

The Bottom Line on Atlas

Atlas is a capable reasoning engine. It can handle multi-step tasks, maintain context across turns, and enforce guardrails. But it is not magic. It is a pattern-matching system that operates within the constraints you define.

The constraints are the product. Topic definitions, action descriptions, instructions, and guardrails. The quality of these four inputs determines the quality of the agent's output. The LLM is the same for everyone. The differentiation is in the configuration.

Start with 5 topics and 8 instructions per topic. Test with 30+ cases. Monitor resolution rate and turn count. Iterate weekly. The agents that work well are the ones that were tuned relentlessly, not the ones that were configured once and deployed. Building an Agentforce agent? We can help you get the configuration right the first time.