Expressir Parsers
Purpose
This page explains how Expressir parses EXPRESS schemas from various formats into its Ruby data model. Understanding the parsing architecture is essential for troubleshooting parsing issues, optimizing performance, and working with different EXPRESS formats.
References
-
EXPRESS Language - Understanding the source language
-
Data Model - Understanding the target model
-
Guide: Format Schemas - Using the parser via CLI
-
Guide: Parsing Files - Using the parser API
Concepts
- Parser
-
Component that reads text in EXPRESS syntax and produces an Abstract Syntax Tree (AST)
- AST
-
Abstract Syntax Tree - intermediate tree representation of parsed code
- Transform
-
Conversion of AST into Expressir’s Ruby data model
- Reference Resolution
-
Process of linking references (by name) to their target definitions
- Visitor
-
Design pattern for traversing and transforming the AST
- Cache
-
Stored parsed schemas for faster subsequent loads
Parser Architecture
Expressir uses a multi-stage parsing pipeline:
┌─────────────────┐
│  EXPRESS Text   │
│  (.exp file)    │
└────────┬────────┘
         │
         ▼
    ┌──────────┐
    │ Parsanol │  (PEG parser)
    │ Grammar  │
    └────┬─────┘
         │
         ▼
┌────────────────┐
│   AST (Tree)   │  (intermediate)
└────────┬───────┘
         │
         ▼
    ┌──────────┐
    │ Visitor  │  (transform)
    └────┬─────┘
         │
         ▼
┌────────────────┐
│   Data Model   │  (Ruby objects)
│   Repository   │
└────────┬───────┘
         │
         ▼
┌────────────────────┐
│     Reference      │
│     Resolution     │
└────────┬───────────┘
         │
         ▼
┌────────────────────┐
│  Finalized Model   │
│   (ready to use)   │
└────────────────────┘
Backend: Parsanol
Expressir uses the Parsanol parser backend:
- Parsanol (High-Performance Rust)
-
High-performance Rust backend
-
18-44x faster parsing
-
99.5% fewer allocations
-
Supports source position tracking (Slice)
-
Available automatically when the Parsanol gem is installed
Expressir automatically uses Parsanol when available:
# Automatically uses native Rust parser (Parsanol)
repo = Expressir::Express::Parser.from_file("geometry.exp")
# Check if native parser is available
if defined?(Parsanol::Native) && Parsanol::Native.available?
  # Using the Rust parser (18-44x faster, per the figures above)
end
See the Parsanol documentation for performance details.
Stage 1: Lexical Analysis and Parsing
Expressir uses Parsanol, a high-performance Parsing Expression Grammar (PEG) parser:
# Grammar rules defined in Parser class
rule(:entityDecl) do
  (entityHead >> entityBody >> tEND_ENTITY >> op_delim).as(:entityDecl)
end
rule(:entityHead) do
  (tENTITY >> entityId >> subsuper >> op_delim).as(:entityHead)
end
Parsanol advantages:
-
Ruby DSL: Grammar rules are written directly in Ruby
-
Composable rules: Complex grammars from simple parts
-
Error reporting: Clear parse failure messages
-
Type-safe: Strongly typed AST nodes
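To make the PEG idea concrete, the following is a minimal from-scratch sketch of sequence, ordered choice, and result labeling in plain Ruby. It does not use Parsanol and says nothing about how Parsanol is implemented; the classes `Lit`, `Seq`, `Alt`, and `Named` are hypothetical names invented for this illustration, loosely mirroring `>>` and `.as(...)` in the grammar rules above.

```ruby
require "strscan"

# Terminal: matches a regular expression, skipping leading whitespace.
class Lit
  def initialize(pattern)
    @pattern = pattern
  end

  def parse(scanner)
    start = scanner.pos
    scanner.skip(/\s*/)
    text = scanner.scan(@pattern)
    return text if text

    scanner.pos = start # restore the position on failure
    nil
  end
end

# Sequence: every part must match in order (compare `>>` above).
class Seq
  def initialize(*parts)
    @parts = parts
  end

  def parse(scanner)
    start = scanner.pos
    results = []
    @parts.each do |part|
      result = part.parse(scanner)
      if result.nil?
        scanner.pos = start # backtrack: the whole sequence fails
        return nil
      end
      results << result
    end
    results
  end
end

# Ordered choice: the first alternative that matches wins.
class Alt
  def initialize(*options)
    @options = options
  end

  def parse(scanner)
    @options.each do |option|
      result = option.parse(scanner)
      return result unless result.nil?
    end
    nil
  end
end

# Label: wraps a result in a named hash node (compare `.as(:name)` above).
class Named
  def initialize(name, parser)
    @name = name
    @parser = parser
  end

  def parse(scanner)
    result = @parser.parse(scanner)
    result.nil? ? nil : { @name => result }
  end
end

identifier  = Named.new(:id, Lit.new(/[a-z_][a-z0-9_]*/i))
simple_type = Named.new(:type, Alt.new(Lit.new(/REAL/), Lit.new(/STRING/)))
attribute   = Named.new(:attr, Seq.new(identifier, Lit.new(/:/), simple_type, Lit.new(/;/)))

ATTRIBUTE_AST  = attribute.parse(StringScanner.new("name : STRING;"))
ATTRIBUTE_FAIL = attribute.parse(StringScanner.new("name STRING;")) # missing colon
```

Each parser either consumes input and returns a value, or restores the scanner position and returns `nil`. That backtracking rule is what lets ordered choice try its alternatives one after another.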
Stage 2: AST Generation
Parsing produces a hierarchical tree structure:
# Example AST for: ENTITY person; name : STRING; END_ENTITY;
{
  entityDecl: {
    entityHead: {
      entityId: { str: "person" },
      ...
    },
    entityBody: {
      explicitAttr: [
        {
          attributeDecl: { str: "name" },
          parameterType: { str: "STRING" }
        }
      ]
    }
  }
}
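A nested hash like this can be explored with `Hash#dig`. The sketch below uses a hand-written copy of the example tree with the elided part left out; the helper `summarize_entity` is a hypothetical name, and the real parser output may differ in shape and detail.

```ruby
# Hand-written copy of the example AST (the elided part is omitted).
AST = {
  entityDecl: {
    entityHead: { entityId: { str: "person" } },
    entityBody: {
      explicitAttr: [
        { attributeDecl: { str: "name" }, parameterType: { str: "STRING" } },
      ],
    },
  },
}.freeze

# Returns the entity name and a name => type map of its explicit attributes.
def summarize_entity(ast)
  name  = ast.dig(:entityDecl, :entityHead, :entityId, :str)
  attrs = ast.dig(:entityDecl, :entityBody, :explicitAttr) || []
  types = attrs.to_h do |attr|
    [attr.dig(:attributeDecl, :str), attr.dig(:parameterType, :str)]
  end
  { name: name, attributes: types }
end

p summarize_entity(AST)
```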
Stage 3: AST Transformation
The Visitor pattern transforms AST to data model:
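class Visitor
  # Simplified sketch: builds an Entity from an :entityDecl node
  def visit_entityDecl(node)
    entity = Model::Declarations::Entity.new
    # entityId is a { str: ... } node in the AST above, so unwrap the string
    entity.id = node[:entityHead][:entityId][:str].to_s
    entity.attributes = visit_attributes(node[:entityBody])
    entity
  end
end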
Transformation responsibilities:
-
Create appropriate model objects
-
Set attributes and relationships
-
Attach parent links
-
Preserve source text (if requested)
-
Extract documentation (remarks)
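The first three responsibilities can be sketched with a self-contained toy visitor. `Entity`, `Attribute`, and `ToyVisitor` below are hypothetical stand-ins defined with `Struct`, not Expressir's model classes, and the dispatch-by-node-key scheme is an assumption made for illustration.

```ruby
# Hypothetical stand-ins for model classes.
Attribute = Struct.new(:id, :type, :parent)
Entity = Struct.new(:id, :attributes, :parent)

class ToyVisitor
  # Dispatches on the single top-level key of a node, e.g. :entityDecl.
  def visit(node, parent = nil)
    key, body = node.first
    public_send("visit_#{key}", body, parent)
  end

  # Creates the model object, sets its attributes, attaches parent links.
  def visit_entityDecl(body, parent)
    entity = Entity.new(body.dig(:entityHead, :entityId, :str), [], parent)
    attrs = body.dig(:entityBody, :explicitAttr) || []
    entity.attributes = attrs.map { |attr| visit_explicitAttr(attr, entity) }
    entity
  end

  def visit_explicitAttr(body, parent)
    Attribute.new(body.dig(:attributeDecl, :str), body.dig(:parameterType, :str), parent)
  end
end

TOY_ENTITY = ToyVisitor.new.visit(
  entityDecl: {
    entityHead: { entityId: { str: "person" } },
    entityBody: {
      explicitAttr: [{ attributeDecl: { str: "name" }, parameterType: { str: "STRING" } }],
    },
  },
)

puts TOY_ENTITY.id
puts TOY_ENTITY.attributes.map { |attr| "#{attr.id} : #{attr.type}" }
```

Because every child keeps a `parent` link, code holding an attribute can walk back up to its entity without searching the whole model.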
Supported Formats
Expressir distinguishes the following EXPRESS formats. The textual format is supported today; STEPmod XML support is planned:
EXPRESS Language (ISO 10303-11)
Standard textual EXPRESS format:
SCHEMA geometry_schema;
  ENTITY point;
    x : REAL;
    y : REAL;
    z : REAL;
  END_ENTITY;
END_SCHEMA;
File extension: .exp
Characteristics:
-
Text-based, human-readable
-
Supports full EXPRESS language
-
Most common format
Usage:
repo = Expressir::Express::Parser.from_file("geometry.exp")
STEPmod EXPRESS XML
XML representation used in STEPmod repository:
<express>
<schema name="geometry_schema">
<entity name="point">
<explicit name="x" type="REAL"/>
<explicit name="y" type="REAL"/>
<explicit name="z" type="REAL"/>
</entity>
</schema>
</express>
File extension: .xml
Characteristics:
-
XML format
-
Modular schema organization
-
Used in ISO STEP modular repository
Note: Not yet supported; support is planned.
Parsing Process
Single File Parsing
Parse one EXPRESS file:
# Basic parsing
repository = Expressir::Express::Parser.from_file("schema.exp")
# With options
repository = Expressir::Express::Parser.from_file(
  "schema.exp",
  skip_references: false,  # Resolve references (default: false)
  include_source: true,    # Attach source text (default: nil)
  root_path: "/base/path"  # Base for relative paths (default: nil)
)
Process:
-
Read file content
-
Parse to AST
-
Transform to model
-
Resolve references (unless skipped)
-
Return Repository
Multiple File Parsing
Parse several files into one repository:
files = ["schema1.exp", "schema2.exp", "schema3.exp"]
# With progress tracking
repository = Expressir::Express::Parser.from_files(files) do |filename, schemas, error|
  if error
    puts "Error parsing #{filename}: #{error.message}"
  else
    puts "Loaded #{schemas.length} schemas from #{filename}"
  end
end
Process:
-
Parse each file individually
-
Collect all schemas
-
Create unified Repository
-
Resolve cross-file references
-
Return complete Repository
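The control flow of these steps can be sketched without Expressir. In the example below, `load_all` and `FAKE_PARSER` are hypothetical names: `parse_one` is any callable that returns an array of schemas or raises, standing in for the real parser, and the block receives the same `(filename, schemas, error)` triple as the callback shown above.

```ruby
# Hypothetical helper: `parse_one` takes a filename and returns an array of
# schemas, or raises. It stands in for the real parser.
def load_all(files, parse_one)
  schemas = []
  failures = {}
  files.each do |filename|
    parsed, error = begin
      [parse_one.call(filename), nil]
    rescue StandardError => e
      [nil, e]
    end

    if error
      failures[filename] = error
    else
      schemas.concat(parsed)
    end
    # Report progress; a failure in one file does not stop the loop.
    yield filename, parsed, error if block_given?
  end
  # Cross-file reference resolution would run here, once, over `schemas`.
  { schemas: schemas, failures: failures }
end

# Fake parser for demonstration only.
FAKE_PARSER = lambda do |filename|
  raise ArgumentError, "syntax error" if filename.include?("broken")

  [File.basename(filename, ".exp")]
end

LOAD_RESULT = load_all(["a.exp", "broken.exp", "b.exp"], FAKE_PARSER) do |filename, schemas, error|
  if error
    puts "Error parsing #{filename}: #{error.message}"
  else
    puts "Loaded #{schemas.length} schemas from #{filename}"
  end
end
```

Deferring resolution until every file is loaded is the point of the design: a reference in one file may target a declaration in a file that has not been parsed yet.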
Reference Resolution
What is Reference Resolution?
EXPRESS uses names to reference other elements:
TYPE length_measure = REAL;
END_TYPE;
ENTITY line;
  length : length_measure; -- Reference to type above
END_ENTITY;
After parsing, the attribute's type is only the name length_measure. Reference resolution replaces that name with a link to the actual Type object.
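The idea can be sketched as a name lookup over a table of declarations. Everything below is a hypothetical toy (`TypeDecl`, `EntityDecl`, `AttributeDecl`, `UnresolvedRef`, `resolve_references`); it shows the principle only and ignores scoping rules that a real resolver has to respect.

```ruby
# Hypothetical stand-ins, not Expressir's model classes.
TypeDecl = Struct.new(:id, :underlying)
AttributeDecl = Struct.new(:id, :type)
EntityDecl = Struct.new(:id, :attributes)
UnresolvedRef = Struct.new(:name)

# Replaces each UnresolvedRef with the declaration of the same name.
# Returns the names that had no matching declaration.
def resolve_references(declarations)
  by_name = declarations.to_h { |decl| [decl.id, decl] }
  unresolved = []
  declarations.grep(EntityDecl).each do |entity|
    entity.attributes.each do |attr|
      next unless attr.type.is_a?(UnresolvedRef)

      target = by_name[attr.type.name]
      if target
        attr.type = target
      else
        unresolved << attr.type.name
      end
    end
  end
  unresolved
end

LENGTH_MEASURE = TypeDecl.new("length_measure", "REAL")
LINE = EntityDecl.new("line", [
  AttributeDecl.new("length", UnresolvedRef.new("length_measure")),
  AttributeDecl.new("colour", UnresolvedRef.new("colour_name")), # no such type
])
MISSING = resolve_references([LENGTH_MEASURE, LINE])

puts LINE.attributes.first.type.equal?(LENGTH_MEASURE)
puts MISSING.inspect
```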
Resolution Process
# Automatic resolution (default)
repo = Expressir::Express::Parser.from_file("schema.exp")
# References already resolved
# Manual resolution
repo = Expressir::Express::Parser.from_file("schema.exp", skip_references: true)
# References not yet resolved
repo.resolve_all_references
# Now resolved
Interface Resolution
USE FROM and REFERENCE FROM create cross-schema references:
SCHEMA application_schema;
  USE FROM geometry_schema;              -- Import all
  REFERENCE FROM support_schema (date);  -- Import specific
  ENTITY geometric_model;
    base : point;    -- From geometry_schema
    created : date;  -- From support_schema
  END_ENTITY;
END_SCHEMA;
Resolution finds point in geometry_schema and date in support_schema.
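A simplified lookup for this example might search local declarations first and then each interface in order. The sketch below is an assumption-laden toy: `SchemaStub`, `InterfaceStub`, and `lookup` are hypothetical names, declarations are plain symbols, and the differences in what USE FROM and REFERENCE FROM make visible, as well as chained interfaces, are ignored.

```ruby
# Hypothetical, simplified structures.
# declarations: name => declaration; interfaces: imports from other schemas.
SchemaStub = Struct.new(:id, :declarations, :interfaces)
# items == nil means "import everything" (as in USE FROM geometry_schema;).
InterfaceStub = Struct.new(:kind, :schema_id, :items)

# Looks up `name` locally first, then through each interface in order.
def lookup(name, schema, schemas_by_id)
  return schema.declarations[name] if schema.declarations.key?(name)

  schema.interfaces.each do |iface|
    next unless iface.items.nil? || iface.items.include?(name)

    source = schemas_by_id[iface.schema_id]
    found = source && source.declarations[name]
    return found if found
  end
  nil
end

GEOMETRY = SchemaStub.new("geometry_schema", { "point" => :point_entity }, [])
SUPPORT = SchemaStub.new("support_schema", { "date" => :date_entity, "time" => :time_entity }, [])
APPLICATION = SchemaStub.new(
  "application_schema",
  { "geometric_model" => :geometric_model_entity },
  [
    InterfaceStub.new(:use, "geometry_schema", nil),
    InterfaceStub.new(:reference, "support_schema", ["date"]),
  ],
)
SCHEMAS_BY_ID = [GEOMETRY, SUPPORT, APPLICATION].to_h { |schema| [schema.id, schema] }

p lookup("point", APPLICATION, SCHEMAS_BY_ID)
p lookup("date", APPLICATION, SCHEMAS_BY_ID)
p lookup("time", APPLICATION, SCHEMAS_BY_ID) # not listed in REFERENCE FROM
```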
Error Handling
Parse Failures
When parsing fails, Expressir raises detailed errors:
begin
  repo = Expressir::Express::Parser.from_file("invalid.exp")
rescue Expressir::Express::Error::SchemaParseFailure => e
  puts "Failed to parse: #{e.filename}"
  puts e.message
  puts e.parse_failure_cause.ascii_tree # Detailed error location
end
Error information includes:
-
File name
-
Line and column numbers
-
Expected tokens
-
Actual tokens found
-
Parse tree context
Common Parse Errors
Missing semicolon:
Expected ';' at line 10, column 5
Invalid keyword:
Unexpected keyword 'FOO' at line 15, column 3
Mismatched END statement:
Expected 'END_ENTITY' but found 'END_TYPE' at line 20
Invalid identifier:
Expected identifier at line 8, column 12
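When reporting such errors to users, it can help to show the offending source line. The sketch below assumes messages shaped like the examples above ("... at line N, column M"), which may not match the actual message format; `error_location` and `source_excerpt` are hypothetical helper names.

```ruby
# Assumes messages shaped like the examples above.
LOCATION_PATTERN = /at line (?<line>\d+)(?:, column (?<column>\d+))?/

# Extracts the 1-based line and (optional) column from a message.
def error_location(message)
  match = LOCATION_PATTERN.match(message)
  return nil unless match

  { line: match[:line].to_i, column: match[:column]&.to_i }
end

# Renders the offending line with `context` lines around it and a caret
# under the 1-based column (no caret when the column is unknown).
def source_excerpt(source, line, column, context: 1)
  lines = source.lines(chomp: true)
  first = [line - context, 1].max
  last = [line + context, lines.length].min
  rows = []
  (first..last).each do |number|
    rows << format("%4d | %s", number, lines[number - 1])
    # The gutter "%4d | " is 7 characters wide.
    rows << "#{' ' * (6 + column)}^" if number == line && column
  end
  rows.join("\n")
end

source = "ENTITY point\nx : REAL;\nEND_ENTITY;\n"
location = error_location("Expected ';' at line 1, column 13")
puts source_excerpt(source, location[:line], location[:column])
```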
Recovery Strategies
Skip broken file:
files.each do |file|
  begin
    repo = Expressir::Express::Parser.from_file(file)
    process(repo)
  rescue Expressir::Express::Error::SchemaParseFailure => e
    warn "Skipping #{file}: #{e.message}"
  end
end
Continue parsing remaining files:
Expressir::Express::Parser.from_files(files) do |filename, schemas, error|
  if error
    warn "Failed: #{filename}"
  else
    # Process successful schemas
  end
end
Performance Considerations
Benchmarking
Measure parsing performance:
require 'benchmark'
time = Benchmark.realtime do
  repo = Expressir::Express::Parser.from_file("large_schema.exp")
end
puts "Parsed in #{time.round(2)} seconds"
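A single measurement is noisy, so repeating the run and looking at the fastest and middle timings usually gives a steadier picture. The helper below is a hypothetical sketch built only on the standard `Benchmark.realtime`; the name `measure` and its return shape are assumptions.

```ruby
require "benchmark"

# Runs the block `runs` times and reports the fastest and the middle
# wall-clock time in seconds (the upper middle for an even run count).
def measure(runs: 5)
  times = Array.new(runs) { Benchmark.realtime { yield } }.sort
  { min: times.first, median: times[times.length / 2], runs: runs }
end

# Stand-in workload; replace the block body with the parse call to measure.
result = measure(runs: 3) { 10_000.times.sum { |i| i * i } }
puts format("fastest %.4fs, median %.4fs over %d runs",
            result[:min], result[:median], result[:runs])
```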
Optimization Techniques
Skip reference resolution for analysis:
# Faster if you don't need resolved references
repo = Expressir::Express::Parser.from_file("schema.exp", skip_references: true)
Omit source text:
# Reduces memory usage
repo = Expressir::Express::Parser.from_file("schema.exp", include_source: false)
Parse in parallel (for multiple files):
require 'parallel'
repos = Parallel.map(files) do |file|
  Expressir::Express::Parser.from_file(file, skip_references: true)
end
# Combine and resolve references once
combined = combine_repositories(repos)
combined.resolve_all_references
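The snippet above calls `combine_repositories` without defining it. One possible shape for such a helper is sketched below with toy `Struct` classes; `ToyRepository` and `ToySchema` are hypothetical stand-ins, and a merge over real repositories would need to carry over whatever else the repository holds.

```ruby
# Hypothetical stand-ins; a real repository holds more than this.
ToyRepository = Struct.new(:schemas)
ToySchema = Struct.new(:id, :file)

# Merges several repositories into one, rejecting duplicate schema ids so
# that later name lookups stay unambiguous.
def combine_repositories(repos)
  combined = ToyRepository.new([])
  seen = {}
  repos.each do |repo|
    repo.schemas.each do |schema|
      if seen.key?(schema.id)
        raise ArgumentError,
              "schema #{schema.id} defined in both #{seen[schema.id]} and #{schema.file}"
      end

      seen[schema.id] = schema.file
      combined.schemas << schema
    end
  end
  combined
end

repo_a = ToyRepository.new([ToySchema.new("geometry_schema", "geometry.exp")])
repo_b = ToyRepository.new([ToySchema.new("support_schema", "support.exp")])
puts combine_repositories([repo_a, repo_b]).schemas.map(&:id).inspect
```

Note that the `parallel` gem runs work in separate processes by default and serializes each result back to the parent, so the parsed repositories must survive that round trip for this approach to pay off.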
Caching
Use caching for repeated parses:
# Expressir has built-in cache support
Expressir::Express::Cache.enable
# First parse: slow
repo1 = Expressir::Express::Parser.from_file("schema.exp")
# Second parse: fast (from cache)
repo2 = Expressir::Express::Parser.from_file("schema.exp")
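To illustrate what a parse cache has to do, here is a self-contained sketch of a content-addressed cache keyed by the SHA-256 digest of the file. It is not Expressir's built-in cache and makes no claim about how that cache works; `ParseCache` is a hypothetical class, and the block passed to `fetch` stands in for the expensive parse.

```ruby
require "digest/sha2"
require "tmpdir"

# Hypothetical content-addressed cache: results are stored under the digest
# of the source file, so editing the file invalidates the entry automatically.
class ParseCache
  attr_reader :hits, :misses

  def initialize(dir)
    @dir = dir
    @hits = 0
    @misses = 0
  end

  # Returns the cached value for `path`, or runs the block and stores its result.
  def fetch(path)
    key = Digest::SHA256.file(path).hexdigest
    entry = File.join(@dir, "#{key}.bin")
    if File.exist?(entry)
      @hits += 1
      # Marshal.load must only be used on data you wrote yourself.
      return Marshal.load(File.binread(entry))
    end

    @misses += 1
    value = yield path
    File.binwrite(entry, Marshal.dump(value))
    value
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "schema.exp")
  File.write(path, "SCHEMA demo; END_SCHEMA;")
  cache = ParseCache.new(dir)
  2.times { cache.fetch(path) { |file| File.read(file).split } }
  puts "hits=#{cache.hits} misses=#{cache.misses}"
end
```

Keying on content rather than on path and modification time costs one read of the file per lookup, but it never serves a stale result after an edit.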
See Benchmark Performance guide for details.
Advanced Topics
Custom Visitors
Extend parsing with custom transformations:
class MyVisitor < Expressir::Express::Visitor
  def visit_entity(node)
    entity = super
    # Custom processing (assumes the entity class defines custom_flag=)
    entity.custom_flag = true
    entity
  end
end
Incremental Parsing
Parse schemas on demand:
# Parse every file without resolving references (cheaper than a full load)
repos = files.map do |file|
  Expressir::Express::Parser.from_file(file, skip_references: true)
end
# Resolve references only for the schema you need
selected_repo = repos.find { |r| r.schemas.first.id == "target_schema" }
selected_repo.resolve_all_references
Parsing Best Practices
- Always handle errors
-
Use begin/rescue blocks to handle parse failures gracefully
- Validate before parsing
-
Check file existence and readability first
- Use progress callbacks
-
For multiple files, track progress with callbacks
- Skip references when possible
-
If you don’t need resolved references, skip for speed
- Cache for production
-
Enable caching for applications that parse repeatedly
- Profile large schemas
-
Use benchmarking to identify bottlenecks
- Process incrementally
-
For very large sets, parse and process one at a time
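The "validate before parsing" practice can be sketched with standard `File` predicates. The helper name `preflight` and the specific checks, including the `.exp` extension check, are assumptions chosen for illustration.

```ruby
# Returns a list of problems; an empty list means the file is worth parsing.
def preflight(path)
  return ["#{path}: not a regular file"] unless File.file?(path)
  return ["#{path}: not readable"] unless File.readable?(path)
  return ["#{path}: empty file"] if File.zero?(path)

  problems = []
  problems << "#{path}: unexpected extension" unless File.extname(path).casecmp?(".exp")
  problems
end

# Usage: keep only the files that pass every check.
candidates = ["schema1.exp", "schema2.exp"]
parseable = candidates.select { |file| preflight(file).empty? }
puts "#{parseable.length} of #{candidates.length} files pass preflight"
```

Cheap checks like these turn a confusing parse failure into a clear message about the file itself, and they cost far less than starting the parser.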
Troubleshooting
Parser Hangs
Symptom: Parser doesn’t complete
Causes:
-
Malformed file with infinite recursion
-
Very large schema
-
Memory exhaustion
Solutions:
-
Validate file structure first
-
Parse smaller chunks
-
Increase memory limits
Next Steps
Now that you understand parsing:
- Try parsing
- Learn the CLI
- Master the API
- Optimize performance
Bibliography
-
Parsanol - High-performance PEG parser for Ruby
-
PEG on Wikipedia - Understanding PEG parsers
-
EXPRESS Language - Understanding what is being parsed
-
Data Model - Understanding the parsing result