Designing a SQL-Graph Hybrid Query Language in Rust
BridgeQL unifies tabular and graph queries in one language. This is the technical story of its parser, lexer, and AST architecture. Hand-written recursive descent. 50+ node types. Zero parser generators.
Why Not Just Use SQL?
Salesforce data is relational on the surface and graph-shaped underneath. Account connects to Contact connects to Opportunity connects back to Account through a different lookup. You can traverse these relationships in SOQL using dot notation (Contact.Account.Name) or child subqueries (SELECT Id, (SELECT Id FROM Contacts) FROM Account). But the moment you need to express "find all paths from this Account to any related Case through any chain of lookups," SOQL falls apart.
Graph query languages like Cypher handle traversals beautifully but lack much of SQL's relational toolkit — window functions, set operations, and the full range of join semantics. SQL handles tabular operations beautifully but cannot express variable-length path traversals.
BridgeQL handles both. One language. One parser. One AST. You can SELECT columns, JOIN tables, MATCH patterns, and traverse paths in the same query. This post explains how it works at the implementation level.
The Lexer
The lexer is a hand-written scanner that produces a stream of tokens. No regular expressions. No generator tools. Just a match on the current character that advances through the source string.
BridgeQL's lexer handles both SQL and graph keywords, which means the keyword set is larger than either paradigm alone. The TokenKind enum has over 60 variants:
// bridgeql-core/src/lexer.rs (simplified)
pub enum TokenKind {
// SQL keywords
Select, From, Where, Join, Inner, Left, Right,
Full, Cross, On, As, Distinct, All,
Insert, Into, Values, Update, Set, Delete,
Group, By, Having, Order, Asc, Desc,
Limit, Offset, Nulls, First, Last,
And, Or, Not, In, Like, Between, Exists,
Is, Null, Case, When, Then, Else, End,
Union, Intersect, Except,
// Graph keywords
Match, Optional, Create, Detach, Return,
// Temporal keywords
At, Time, Timeline, History,
// Literals
Integer(i64),
Float(f64),
StringLiteral(String),
True, False,
// Identifiers and operators
Identifier(String),
Star, Dot, Comma, Semicolon,
LeftParen, RightParen,
LeftBracket, RightBracket,
LeftBrace, RightBrace,
Eq, NotEq, Lt, LtEq, Gt, GtEq,
Plus, Minus, Slash, Percent, Pipe,
Colon, Arrow, LeftArrow, Dash, DoubleDash,
DotDot, // For path length ranges
// Special
Parameter(String), // $1, $name
Eof,
}
The challenge is disambiguation. The dash character (-) could be a minus operator, the start of a graph edge pattern (-[...]->), or the sign of a negative number literal. Likewise, -> could be the arrow that closes a graph edge, or just a minus followed by a greater-than in an ordinary expression. The lexer resolves these with one character of lookahead:
// Disambiguating - vs -> vs -[
fn scan_dash(&mut self) -> Token {
self.advance(); // consume '-'
match self.peek() {
Some('>') => {
self.advance();
self.make_token(TokenKind::Arrow)
}
Some('[') => {
// Don't consume '[' - let the parser handle it
// as part of edge pattern parsing
self.make_token(TokenKind::Dash)
}
Some('-') => {
self.advance();
self.make_token(TokenKind::DoubleDash)
}
_ => self.make_token(TokenKind::Minus)
}
}
The lexer also handles SQL-style string escaping (single quotes with '' for literal quote) and graph-style property maps ({key: value}). These use different delimiter rules and the lexer tracks context to know which rules apply.
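To make the SQL-side escaping rule concrete, here is a small sketch of a single-quoted string scanner that folds a doubled '' into a literal quote. This is an illustration, not BridgeQL's actual lexer code; the function name and iterator-based shape are ours.

```rust
// Hypothetical sketch of SQL-style string scanning: inside single
// quotes, '' is an escaped literal quote. Assumes the opening
// quote has already been consumed by the caller.
fn scan_sql_string(chars: &mut std::iter::Peekable<std::str::Chars<'_>>) -> Option<String> {
    let mut out = String::new();
    while let Some(c) = chars.next() {
        if c == '\'' {
            if chars.peek() == Some(&'\'') {
                // '' inside a string means one literal single quote
                chars.next();
                out.push('\'');
            } else {
                return Some(out); // closing quote
            }
        } else {
            out.push(c);
        }
    }
    None // unterminated string literal
}

fn main() {
    // The source text O''Brien' decodes to O'Brien
    let mut it = "O''Brien' AND more".chars().peekable();
    assert_eq!(scan_sql_string(&mut it), Some("O'Brien".to_string()));
    println!("ok");
}
```

The graph-style property maps ({key: value}) use different delimiters entirely, which is why the lexer tracks which context it is in.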
The AST
BridgeQL's AST has 50+ node types organized into four groups: SQL queries, graph queries, hybrid queries, and expressions. The top-level Query enum is the entry point:
// bridgeql-core/src/ast.rs
/// A complete BridgeQL query
pub enum Query {
// SQL Queries
Select(Box<SelectQuery>),
Insert(InsertQuery),
Update(UpdateQuery),
Delete(DeleteQuery),
// Graph Queries
Match(Box<MatchQuery>),
CreateGraph(CreateGraphQuery),
DeleteGraph(Box<DeleteGraphQuery>),
SetGraph(Box<SetGraphQuery>),
// Hybrid (contains both SQL and graph components)
Hybrid(Box<HybridQuery>),
}
The Hybrid variant is the interesting one. It combines a SQL SELECT with a graph MATCH in one query:
/// Hybrid query combining SQL and Graph
pub struct HybridQuery {
/// SQL SELECT part
pub select: SelectQuery,
/// Graph MATCH part
pub match_clause: MatchQuery,
}
This lets you write queries like:
SELECT a.Name, b.Email, COUNT(c.Id) as case_count
MATCH (a:Account)-[:LOOKUP]->(b:Contact)-[:LOOKUP]->(c:Case)
WHERE a.Industry = 'Technology'
RETURN a.Name, b.Email, case_count
ORDER BY case_count DESC
LIMIT 10
The SELECT defines the columns. The MATCH defines the traversal pattern. The WHERE filters. The RETURN projects. Everything compiles to one AST node that the executor can optimize as a unit.
Graph Pattern Types
The graph portion of the AST centers on the PatternElement enum, which alternates between nodes and edges:
/// A graph pattern (sequence of nodes and edges)
pub struct Pattern {
pub elements: Vec<PatternElement>,
}
/// Element in a pattern (either node or edge)
pub enum PatternElement {
Node(NodePattern),
Edge(EdgePattern),
}
/// Node pattern: (variable:Label:Label {props})
pub struct NodePattern {
pub variable: Option<String>,
pub labels: Vec<String>,
pub properties: Option<MapExpr>,
}
/// Edge pattern: -[variable:TYPE*min..max {props}]->
pub struct EdgePattern {
pub variable: Option<String>,
pub edge_type: Option<String>,
pub direction: Direction,
pub length: Option<PathLength>,
pub properties: Option<MapExpr>,
}
/// Edge direction
pub enum Direction {
Outgoing, // ->
Incoming, // <-
Both, // --
}
/// Path length specification: *min..max
pub struct PathLength {
pub min: Option<u32>,
pub max: Option<u32>,
}
A pattern like (a:Account)-[:LOOKUP*1..3]->(b:Contact) parses into:
Pattern {
elements: [
Node(NodePattern {
variable: Some("a"),
labels: ["Account"],
properties: None,
}),
Edge(EdgePattern {
variable: None,
edge_type: Some("LOOKUP"),
direction: Direction::Outgoing,
length: Some(PathLength {
min: Some(1),
max: Some(3),
}),
properties: None,
}),
Node(NodePattern {
variable: Some("b"),
labels: ["Contact"],
properties: None,
}),
]
}
The PathLength struct handles variable-length traversals. PathLength { min: Some(1), max: Some(3) } means "follow this edge 1, 2, or 3 hops." PathLength { min: None, max: None } (the * wildcard) means "any number of hops." This is what lets BridgeQL express "find all paths of any length from Account to Case," which is impossible in SOQL.
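To show what those bounds mean operationally, here is a small sketch (our own names, not BridgeQL's executor) of how PathLength's min and max could drive a traversal over an adjacency list: enumerate every simple path whose hop count falls within [min, max].

```rust
use std::collections::HashMap;

// Hypothetical sketch of variable-length traversal: collect all
// simple paths from `start` whose hop count lies in [min, max].
// Mirrors PathLength { min, max } semantics; the real executor
// is more sophisticated (streaming, cycle policies, etc.).
fn paths_within(
    adj: &HashMap<&str, Vec<&str>>,
    start: &str,
    min: usize,
    max: usize,
) -> Vec<Vec<String>> {
    let mut results = Vec::new();
    let mut stack = vec![vec![start.to_string()]];
    while let Some(path) = stack.pop() {
        let hops = path.len() - 1;
        if hops >= min {
            results.push(path.clone());
        }
        if hops < max {
            if let Some(nexts) = adj.get(path.last().unwrap().as_str()) {
                for &n in nexts {
                    // Skip nodes already on the path (keep paths simple)
                    if !path.iter().any(|p| p == n) {
                        let mut extended = path.clone();
                        extended.push(n.to_string());
                        stack.push(extended);
                    }
                }
            }
        }
    }
    results
}

fn main() {
    let mut adj = HashMap::new();
    adj.insert("Acme", vec!["c1", "c2"]);
    adj.insert("c1", vec!["case1"]);
    // *1..2 from Acme yields: Acme->c1, Acme->c2, Acme->c1->case1
    let paths = paths_within(&adj, "Acme", 1, 2);
    assert_eq!(paths.len(), 3);
    println!("{} paths", paths.len());
}
```

With min = 1 and max = 2 the single-node path is excluded and nothing deeper than two hops is explored, which is exactly the pruning a *1..2 bound buys the executor.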
The Expression System
Both SQL and graph queries share the same expression system. This is a deliberate design decision. A WHERE clause in SQL and a WHERE clause after MATCH use identical expression nodes:
/// Expression
pub enum Expr {
Literal(Literal),
Identifier(String),
QualifiedIdentifier { qualifier: String, name: String },
BinaryOp { left: Box<Expr>, op: BinaryOp, right: Box<Expr> },
UnaryOp { op: UnaryOp, expr: Box<Expr> },
Function { name: String, args: Vec<Expr>, distinct: bool },
Case { operand: Option<Box<Expr>>, when_clauses: Vec<(Expr, Expr)>, else_clause: Option<Box<Expr>> },
IsNull { expr: Box<Expr>, negated: bool },
InList { expr: Box<Expr>, list: Vec<Expr>, negated: bool },
InSubquery { expr: Box<Expr>, subquery: Box<SelectQuery>, negated: bool },
Between { expr: Box<Expr>, low: Box<Expr>, high: Box<Expr>, negated: bool },
Like { expr: Box<Expr>, pattern: Box<Expr>, negated: bool },
Exists { subquery: Box<SelectQuery>, negated: bool },
Subquery(Box<SelectQuery>),
Map(MapExpr),
List(Vec<Expr>),
Parameter(String),
}
/// Unified Value type across SQL and graph paradigms
pub enum Literal {
Null,
Boolean(bool),
Integer(i64),
Float(f64),
String(String),
}
The unified Literal enum is important. In SQL, NULL is a keyword. In graph properties, null is a JSON-like value. By sharing one Literal type, we guarantee that null in a SQL WHERE clause and null in a graph property map are the same thing at the AST level. No implicit conversions. No surprising type mismatches.
The BinaryOp enum includes standard SQL operators plus a string Concat operator. Operator precedence is defined directly on the enum:
pub enum BinaryOp {
Add, Sub, Mul, Div, Mod, // Arithmetic
Eq, NotEq, Lt, LtEq, Gt, GtEq, // Comparison
And, Or, // Logical
Concat, // String
}
impl BinaryOp {
pub fn precedence(&self) -> u8 {
match self {
Self::Or => 1,
Self::And => 2,
Self::Eq | Self::NotEq | Self::Lt
| Self::LtEq | Self::Gt | Self::GtEq => 3,
Self::Add | Self::Sub | Self::Concat => 4,
Self::Mul | Self::Div | Self::Mod => 5,
}
}
}
Putting precedence on the operator type itself (instead of in the parser) means the parser's expression parsing is a generic Pratt parser that just calls op.precedence(). If we add a new operator, we add one line to the enum and one line to the precedence method. The parser does not change.
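A minimal version of that precedence-climbing loop looks roughly like this. It is a simplified sketch, not BridgeQL's parser: token and function names are ours, and it evaluates directly instead of building an AST to keep it short. The precedence values deliberately match the BinaryOp table above (Add = 4, Mul = 5).

```rust
#[derive(Clone, Copy, PartialEq)]
enum Tok { Num(i64), Plus, Star, Eof }

// Precedence lives on the operator, just like BinaryOp::precedence.
fn prec(t: Tok) -> u8 {
    match t {
        Tok::Plus => 4,
        Tok::Star => 5,
        _ => 0, // not a binary operator
    }
}

// Pratt / precedence-climbing loop: parse a primary, then keep
// consuming operators whose precedence is >= min_prec, parsing the
// right side one level higher for left associativity.
fn parse_expr(toks: &[Tok], pos: &mut usize, min_prec: u8) -> i64 {
    let mut left = match toks[*pos] {
        Tok::Num(n) => { *pos += 1; n }
        _ => panic!("expected number"),
    };
    loop {
        let op = toks[*pos];
        let p = prec(op);
        if p == 0 || p < min_prec {
            break;
        }
        *pos += 1;
        let right = parse_expr(toks, pos, p + 1);
        left = match op {
            Tok::Plus => left + right,
            Tok::Star => left * right,
            _ => unreachable!(),
        };
    }
    left
}

fn main() {
    // 1 + 2 * 3 groups as 1 + (2 * 3) because Star binds tighter
    let toks = [Tok::Num(1), Tok::Plus, Tok::Num(2), Tok::Star, Tok::Num(3), Tok::Eof];
    let mut pos = 0;
    assert_eq!(parse_expr(&toks, &mut pos, 1), 7);
    println!("ok");
}
```

Adding an operator here means adding one arm to prec and one to the evaluation match — the loop itself never changes, which is the property the post describes.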
The Parser
BridgeQL uses a hand-written recursive descent parser. No yacc. No ANTLR. No nom. The reasons are practical:
- Error messages. Generated parsers produce cryptic errors. A hand-written parser can say "expected field name after SELECT, found ')'" instead of "unexpected token at position 7."
- Hybrid disambiguation. The parser needs context-sensitive lookahead to decide whether a query is pure SQL, pure graph, or hybrid. A SELECT followed by columns followed by MATCH is hybrid. A SELECT followed by columns followed by FROM is SQL. This is trivial in hand-written code and painful in grammar files.
- Incremental development. We add new syntax regularly. Modifying a hand-written parser is editing Rust code. Modifying a grammar file requires understanding the generator tool, regenerating, debugging the generated code.
The parser structure follows the grammar directly. Each production rule is a method:
// bridgeql-core/src/parser.rs
pub struct Parser {
tokens: Vec<Token>,
position: usize,
}
impl Parser {
pub fn new(source: &str) -> Result<Self, ParseError> {
let mut lexer = Lexer::new(source);
let tokens = lexer.tokenize()?;
Ok(Self { tokens, position: 0 })
}
pub fn parse(&mut self) -> Result<Query, ParseError> {
let query = self.parse_query()?;
if self.check(&TokenKind::Semicolon) {
self.advance();
}
if !self.is_at_end() {
let token = self.current();
return Err(ParseError::UnexpectedToken {
expected: "end of input".to_string(),
found: format!("{}", token.kind),
position: token.span.start,
});
}
Ok(query)
}
fn parse_query(&mut self) -> Result<Query, ParseError> {
match self.current().kind {
TokenKind::Select => self.parse_select_or_hybrid(),
TokenKind::Insert => Ok(Query::Insert(self.parse_insert()?)),
TokenKind::Update => Ok(Query::Update(self.parse_update()?)),
TokenKind::Delete => Ok(Query::Delete(self.parse_delete()?)),
TokenKind::Match | TokenKind::Optional =>
Ok(Query::Match(Box::new(self.parse_match()?))),
TokenKind::Create => self.parse_create(),
_ => Err(ParseError::UnexpectedToken {
expected: "query keyword (SELECT, INSERT, \
UPDATE, DELETE, MATCH, CREATE)".to_string(),
found: format!("{}", self.current().kind),
position: self.current().span.start,
})
}
}
}
The critical method is parse_select_or_hybrid. It parses a SELECT query and then checks whether a MATCH clause follows. If it does, the query becomes a Hybrid. If it does not, it stays a pure Select:
fn parse_select_or_hybrid(&mut self)
-> Result<Query, ParseError>
{
let select = self.parse_select_query()?;
// Check if MATCH follows (hybrid query)
if self.check(&TokenKind::Match) {
let match_clause = self.parse_match()?;
Ok(Query::Hybrid(Box::new(HybridQuery {
select,
match_clause,
})))
} else {
Ok(Query::Select(Box::new(select)))
}
}
That is a handful of lines of disambiguation. In a grammar file, expressing "SELECT can optionally be followed by MATCH to form a different production" requires careful factoring to avoid ambiguity errors.
Parsing Graph Patterns
Graph patterns are the trickiest part. The syntax (a:Person {name: 'Alice'})-[r:KNOWS*1..3]->(b:Person) requires the parser to alternate between node patterns and edge patterns, handling optional variables, labels, properties, types, directions, and path lengths.
fn parse_pattern(&mut self) -> Result<Pattern, ParseError> {
let mut elements = Vec::new();
// First element must be a node
elements.push(PatternElement::Node(
self.parse_node_pattern()?
));
// Alternate edge-node until no more edges
while self.is_edge_start() {
elements.push(PatternElement::Edge(
self.parse_edge_pattern()?
));
elements.push(PatternElement::Node(
self.parse_node_pattern()?
));
}
Ok(Pattern { elements })
}
fn parse_node_pattern(&mut self)
-> Result<NodePattern, ParseError>
{
self.expect(&TokenKind::LeftParen)?;
let mut node = NodePattern::new();
// Optional variable name
if self.check_identifier() && !self.check(&TokenKind::Colon) {
node.variable = Some(self.consume_identifier()?);
}
// Optional labels (:Label:Label)
while self.check(&TokenKind::Colon) {
self.advance();
node.labels.push(self.consume_identifier()?);
}
// Optional properties {key: value}
if self.check(&TokenKind::LeftBrace) {
node.properties = Some(self.parse_map_expr()?);
}
self.expect(&TokenKind::RightParen)?;
Ok(node)
}
fn parse_edge_pattern(&mut self)
-> Result<EdgePattern, ParseError>
{
// Determine direction and consume opening tokens.
// The lexer emits '<-' as a single LeftArrow token, while a
// leading '-' (as in -[...]-> or -[...]-) arrives as Dash.
let incoming = self.check(&TokenKind::LeftArrow);
if incoming {
self.advance(); // consume the '<-' token
} else {
self.expect(&TokenKind::Dash)?; // consume the leading '-'
}
let mut edge = EdgePattern::new(Direction::Both);
// Optional edge details in brackets
if self.check(&TokenKind::LeftBracket) {
self.advance();
// Optional variable
if self.check_identifier()
&& !self.check(&TokenKind::Colon) {
edge.variable = Some(self.consume_identifier()?);
}
// Optional type (:TYPE)
if self.check(&TokenKind::Colon) {
self.advance();
edge.edge_type = Some(self.consume_identifier()?);
}
// Optional path length (*min..max)
if self.check(&TokenKind::Star) {
self.advance();
edge.length = Some(self.parse_path_length()?);
}
// Optional properties
if self.check(&TokenKind::LeftBrace) {
edge.properties = Some(self.parse_map_expr()?);
}
self.expect(&TokenKind::RightBracket)?;
}
// Closing tokens: the lexer emits '->' as Arrow, a plain '-' as Dash
if !incoming && self.check(&TokenKind::Arrow) {
self.advance();
edge.direction = Direction::Outgoing;
} else {
self.expect(&TokenKind::Dash)?;
if incoming {
edge.direction = Direction::Incoming;
}
}
Ok(edge)
}
Why Rust's Type System Is Perfect for This
Three Rust features make this codebase dramatically better than it would be in any other language:
Exhaustive match. When you match on the Query enum, the compiler forces you to handle every variant. Add a new query type and every match statement in the codebase becomes a compile error until you handle it. In Java or Python, a new enum variant typically falls through silently to a default case unless you opt into extra tooling. In Rust, it is a hard error.
// The executor. Add a new Query variant and this
// won't compile until you add the new arm.
pub fn execute(&mut self, query: &Query)
-> Result<QueryResult, ExecutionError>
{
match query {
Query::Select(select) =>
sql::execute_select(self.storage, select),
Query::Insert(insert) =>
sql::execute_insert(self.storage, insert),
Query::Update(update) =>
sql::execute_update(self.storage, update),
Query::Delete(delete) =>
sql::execute_delete(self.storage, delete),
Query::Match(match_q) =>
graph::execute_match(self.storage, match_q),
Query::CreateGraph(create) =>
graph::execute_create(self.storage, create),
Query::DeleteGraph(delete) =>
graph::execute_delete_graph(self.storage, delete),
Query::SetGraph(set) =>
graph::execute_set_graph(self.storage, set),
Query::Hybrid(hybrid) =>
graph::execute_hybrid(self.storage, hybrid),
}
}
Algebraic data types. The Expr enum is a textbook algebraic data type. Each variant carries exactly the data it needs. A BinaryOp carries left, op, and right. An InList carries expr, list, and negated. There is no "generic expression with optional fields." Every expression variant is precisely typed.
Compare this to a typical Java AST where Expression is a base class with 20 optional fields, most of which are null for any given node. In Rust, the type system tells you exactly what data each node has. Pattern matching destructures it safely.
Zero-cost abstractions with Box. The recursive nature of AST nodes (an Expr can contain other Exprs) requires heap allocation. Rust's Box provides this with zero overhead beyond the pointer. No garbage collector. No reference counting on the hot path. When the AST goes out of scope, the entire tree is deallocated deterministically.
// The Expr::BinaryOp variant carries Box<Expr> for left and right.
// Each side is one heap allocation, with no GC overhead. (Note the
// name clash: the `op` field holds the BinaryOp operator enum.)
Expr::BinaryOp {
left: Box<Expr>,
op: BinaryOp,
right: Box<Expr>,
}
// Pattern matching destructures without runtime checks
match expr {
Expr::BinaryOp { left, op, right } => {
let left_val = evaluate(left)?;
let right_val = evaluate(right)?;
apply_op(op, left_val, right_val)
}
// ...
}
Real Query Examples
Here are four queries showing the full range of BridgeQL syntax and how they parse:
Pure SQL:
SELECT a.Name, COUNT(c.Id) as contact_count
FROM Account a
LEFT JOIN Contact c ON c.AccountId = a.Id
WHERE a.Industry = 'Technology'
GROUP BY a.Name
HAVING COUNT(c.Id) > 5
ORDER BY contact_count DESC
LIMIT 20
Parses to: Query::Select(SelectQuery { ... }) with JoinClause, group_by, having, order_by, and limit populated.
Pure Graph:
MATCH (a:Account {Industry: 'Technology'})
-[:LOOKUP]->(c:Contact)
-[:LOOKUP]->(cs:Case {Status: 'Open'})
WHERE a.AnnualRevenue > 1000000
RETURN a.Name, c.Email, cs.Subject
ORDER BY a.Name
Parses to: Query::Match(MatchQuery { ... }) with a five-element Pattern (Account node, LOOKUP edge, Contact node, LOOKUP edge, Case node), property filters on two nodes, a WHERE clause, and a RETURN clause.
Hybrid SQL + Graph:
SELECT a.Name, b.Email, COUNT(c.Id) as case_count
MATCH (a:Account)-[:LOOKUP]->(b:Contact)
-[:LOOKUP]->(c:Case)
WHERE a.Industry IN ('Technology', 'Finance')
AND c.CreatedDate >= '2025-01-01'
RETURN a.Name, b.Email, case_count
ORDER BY case_count DESC
LIMIT 10
Parses to: Query::Hybrid(HybridQuery { select: SelectQuery, match_clause: MatchQuery }). The SELECT defines the output shape. The MATCH defines the traversal. This is the query type that neither SQL nor Cypher can express alone.
Variable-length path traversal:
MATCH path = (a:Account {Name: 'Acme Corp'})
-[:LOOKUP*1..5]->(target)
WHERE target:Case AND target.Status = 'Escalated'
RETURN path, length(path) as hops
This finds all paths from Acme Corp to any escalated Case within 5 hops. The path variable captures the entire traversal. PathLength { min: Some(1), max: Some(5) } constrains the depth. This query is completely impossible in SOQL.
Temporal Extensions
BridgeQL also includes temporal query clauses for time-travel queries against Salesforce field history:
pub enum TemporalClause {
/// AT TIME 'timestamp' - query at a specific point
AtTime(String),
/// TIMELINE 'start', 'end' - query over a range
Timeline { start: String, end: String },
/// HISTORY 'duration' - query from now backwards
History(String),
}
// Example: What was this Account's Industry 30 days ago?
SELECT Name, Industry
FROM Account
WHERE Id = '001xx000001'
AT TIME '2025-12-15T00:00:00Z'
// Example: Track all changes to Opportunity Stage over time
SELECT Name, StageName
FROM Opportunity
TIMELINE '2025-01-01', '2025-12-31'
WHERE Amount > 100000
The temporal clauses translate to FieldHistory queries against Salesforce. The AT TIME clause finds the field value at a specific moment by walking the history records backward. The TIMELINE clause returns a series of snapshots. The HISTORY clause is sugar for "TIMELINE from N duration ago to now."
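That backward walk is easy to sketch. The code below is our simplification, not the real Salesforce adapter: the Change struct stands in for a FieldHistory record (a timestamp plus old and new values), and value_at resolves AT TIME semantics against changes sorted by timestamp.

```rust
// Hypothetical stand-in for a FieldHistory record: at `ts`, the
// field changed from `old` to `new`.
struct Change {
    ts: i64, // epoch seconds, for simplicity
    old: &'static str,
    new: &'static str,
}

// Value of the field at time `t`, given changes sorted ascending by
// ts: the `new` of the latest change at or before t; if t predates
// every change, the `old` of the earliest change; None if there is
// no history at all.
fn value_at(changes: &[Change], t: i64) -> Option<&'static str> {
    let mut val = None;
    for c in changes {
        if c.ts <= t {
            val = Some(c.new); // latest change so far wins
        } else if val.is_none() {
            return Some(c.old); // t is before the first change
        }
    }
    val
}

fn main() {
    let history = [
        Change { ts: 100, old: "Retail", new: "Technology" },
        Change { ts: 200, old: "Technology", new: "Finance" },
    ];
    assert_eq!(value_at(&history, 50), Some("Retail"));
    assert_eq!(value_at(&history, 150), Some("Technology"));
    assert_eq!(value_at(&history, 300), Some("Finance"));
    println!("ok");
}
```

A TIMELINE clause would apply the same resolution at each snapshot point in the range instead of at a single instant.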
What We Got Wrong (And Fixed)
The first version of the parser did not use Pratt parsing for expressions. It used a naive recursive approach that got operator precedence wrong for chained comparisons. a > 5 AND b < 10 OR c = 3 would parse as a > (5 AND b) < (10 OR c) = 3, which is nonsensical.
Switching to Pratt parsing (where each operator calls a sub-parser at its precedence level) fixed this entirely. The precedence method on BinaryOp drives the parsing now. Adding new operators means adding one precedence value, not restructuring the parser.
The second mistake was trying to share too much between SQL and graph parsing. The FROM clause in SQL and the pattern in MATCH look nothing alike syntactically. Early versions tried to unify them into a "data source" abstraction. This made both parsers worse. We separated them and unified only at the expression level, which is where the actual overlap exists.
The third mistake was not using Box for recursive types from the start. Rust requires every type to have a known, finite size, so a type cannot directly contain itself. SelectQuery can contain a subquery, which is another SelectQuery. Without Box, this is a compile error ("recursive type has infinite size"). This is obvious in hindsight but cost half a day of confused error messages when we first hit it.
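The problem and its fix are easy to show in miniature with a toy expression type (not BridgeQL's actual Expr):

```rust
// Toy illustration of the recursive-type problem.
// Without Box this does not compile:
//     enum Toy { Num(i64), Neg(Toy) }  // error[E0072]: infinite size
// Boxing the recursive field gives the variant a known,
// pointer-sized footprint, so the type's size is computable.
enum Toy {
    Num(i64),
    Neg(Box<Toy>),
}

fn eval(e: &Toy) -> i64 {
    match e {
        Toy::Num(n) => *n,
        Toy::Neg(inner) => -eval(inner),
    }
}

fn main() {
    let e = Toy::Neg(Box::new(Toy::Num(5)));
    assert_eq!(eval(&e), -5);
    println!("ok");
}
```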
The Numbers
BridgeQL parser stats:
Lexer: ~780 lines
Parser: ~1,400 lines
AST: ~750 lines
Total: ~2,930 lines of Rust
Token types: 63
AST node types: 52
Parse methods: 34
Test coverage: 91%
Parse latency: <1ms for queries under 1KB
~5ms for 10KB queries with subqueries
2,930 lines. That is the entire frontend for a language that handles SQL, graph patterns, hybrid queries, temporal clauses, and path traversals. Rust's expressiveness helps. A Java equivalent would be 8,000-12,000 lines to achieve the same thing with visitor patterns and class hierarchies.
The parser is the foundation that everything else builds on. The executor (bridgeql-execution), the Salesforce adapter (bridgeql-salesforce), and the audit engine (bridgeql-assess) all consume the AST. Get the AST right and the downstream code writes itself. Get it wrong and every consumer inherits the mistakes.
We got it right by keeping it simple. No inheritance. No visitors. No generic type parameters on AST nodes. Just enums, structs, and exhaustive matching. Rust makes this not just possible but pleasant.