Parsers

Purpose

This page explains how Expressir parses EXPRESS schemas into its Ruby data model. Understanding the parsing architecture is essential for troubleshooting parsing issues, optimizing performance, and working with different EXPRESS formats.

References

EXPRESS Language - Understanding the source language
Data Model - Understanding the target model
Guide: Format Schemas - Using the parser via CLI
Guide: Parsing Files - Using the parser API

Concepts

Parser: Component that reads text in EXPRESS syntax and produces an Abstract Syntax Tree (AST)
AST: Abstract Syntax Tree - intermediate tree representation of parsed code
Transform: Conversion of AST into Expressir’s Ruby data model
Reference Resolution: Process of linking references (by name) to their target definitions
Visitor: Design pattern for traversing and transforming the AST
Cache: Parsed schemas stored for faster subsequent loads

Parser Architecture

Expressir uses a multi-stage parsing pipeline:

┌─────────────────┐
│  EXPRESS Text   │
│   (.exp file)   │
└────────┬────────┘
         │
         ▼
   ┌───────────┐
   │ Parsanol  │ (PEG parser)
   │ Grammar   │
   └─────┬─────┘
         │
         ▼
┌─────────────────┐
│   AST (Tree)    │ (intermediate)
└────────┬────────┘
         │
         ▼
   ┌───────────┐
   │  Visitor  │ (transform)
   └─────┬─────┘
         │
         ▼
┌─────────────────┐
│   Data Model    │ (Ruby objects)
│   Repository    │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│    Reference    │
│   Resolution    │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Finalized Model │
│ (ready to use)  │
└─────────────────┘

Backend: Parsanol

Expressir uses the Parsanol parser backend:

Parsanol (High-Performance Rust)

High-performance Rust backend
18-44x faster parsing
99.5% fewer allocations
Supports source position tracking (Slice)
Available automatically when the Parsanol gem is installed

Expressir automatically uses Parsanol when available:

# Automatically uses native Rust parser (Parsanol)
repo = Expressir::Express::Parser.from_file("geometry.exp")

# Check if native parser is available
if defined?(Parsanol::Native) && Parsanol::Native.available?
  # The native Rust parser (Parsanol) is in use
end

See the Parsanol documentation for performance details.

Stage 1: Lexical Analysis and Parsing

Expressir uses Parsanol, a high-performance Parsing Expression Grammar (PEG) parser:

# Grammar rules defined in Parser class
rule(:entityDecl) do
  (entityHead >> entityBody >> tEND_ENTITY >> op_delim).as(:entityDecl)
end

rule(:entityHead) do
  (tENTITY >> entityId >> subsuper >> op_delim).as(:entityHead)
end

Parsanol advantages:

Ruby DSL: Grammar rules are written as Ruby code
Composable rules: Complex grammars from simple parts
Error reporting: Clear parse failure messages
Labeled captures: .as(:name) labels determine the structure of the AST
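To make the PEG ideas above concrete (sequencing, ordered choice, repetition, and labeled captures), here is a minimal, self-contained Ruby sketch. It does not use Parsanol: the Peg module, its combinators, and parse_entity are hypothetical stand-ins that only mimic how rules such as entityDecl compose and how .as(:name) labels shape the result. It handles a toy subset only (one ENTITY with lowercase identifiers and STRING or REAL attributes).

```ruby
# Hypothetical toy PEG combinators for illustration; not Parsanol's API.
# Each parser is a lambda taking (input, pos) and returning [value, new_pos] or nil.
module Peg
  module_function

  # Skip whitespace between tokens.
  def skip_ws(input, pos)
    pos += 1 while pos < input.length && input[pos].match?(/\s/)
    pos
  end

  # Match a literal keyword or symbol.
  def str(text)
    lambda do |input, pos|
      pos = skip_ws(input, pos)
      input[pos, text.length] == text ? [text, pos + text.length] : nil
    end
  end

  # Match a regular expression starting exactly at the current position.
  def re(pattern)
    lambda do |input, pos|
      pos = skip_ws(input, pos)
      m = pattern.match(input, pos)
      m && m.begin(0) == pos ? [m[0], pos + m[0].length] : nil
    end
  end

  # Sequence (like `a >> b`): every part must match; labeled captures are merged.
  def seq(*parsers)
    lambda do |input, pos|
      captures = {}
      parsers.each do |parser|
        result = parser.call(input, pos)
        return nil unless result
        captures.merge!(result[0]) if result[0].is_a?(Hash)
        pos = result[1]
      end
      [captures, pos]
    end
  end

  # Ordered choice: the first alternative that matches wins.
  def choice(*parsers)
    lambda do |input, pos|
      parsers.each do |parser|
        result = parser.call(input, pos)
        return result if result
      end
      nil
    end
  end

  # Zero or more repetitions; the inner parser must consume input to avoid looping.
  def many(parser)
    lambda do |input, pos|
      values = []
      while (result = parser.call(input, pos))
        values << result[0]
        pos = result[1]
      end
      [values, pos]
    end
  end

  # Label a match, in the spirit of `.as(:name)`.
  def as(parser, name)
    lambda do |input, pos|
      result = parser.call(input, pos)
      result && [{ name => result[0] }, result[1]]
    end
  end
end

IDENT = Peg.re(/[a-z_][a-z0-9_]*/)
ATTRIBUTE = Peg.seq(
  Peg.as(IDENT, :attributeId), Peg.str(":"),
  Peg.as(Peg.choice(Peg.str("STRING"), Peg.str("REAL")), :type), Peg.str(";")
)
ENTITY_DECL = Peg.as(
  Peg.seq(Peg.str("ENTITY"), Peg.as(IDENT, :entityId), Peg.str(";"),
          Peg.as(Peg.many(ATTRIBUTE), :attributes),
          Peg.str("END_ENTITY"), Peg.str(";")),
  :entityDecl
)

# Parse one entity declaration; the whole input must be consumed.
def parse_entity(source)
  result = ENTITY_DECL.call(source, 0)
  raise ArgumentError, "parse failed" unless result && Peg.skip_ws(source, result[1]) == source.length
  result[0]
end

ast = parse_entity("ENTITY person; name : STRING; END_ENTITY;")
puts ast[:entityDecl][:entityId] # prints person
```

Because choice commits to the first alternative that succeeds, alternative order matters in a PEG grammar: a shorter keyword listed before a longer one that shares its prefix would always win.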

Stage 2: AST Generation

Parsing produces a hierarchical tree structure:

# Example AST for: ENTITY person; name : STRING; END_ENTITY;
{
  entityDecl: {
    entityHead: {
      entityId: { str: "person" },
      ...
    },
    entityBody: {
      explicitAttr: [
        {
          attributeDecl: { str: "name" },
          parameterType: { str: "STRING" }
        }
      ]
    }
  }
}

Stage 3: AST Transformation

The Visitor pattern transforms the AST into the data model:

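class Visitor
  # Illustrative: turn an entityDecl AST node (see Stage 2) into a model object
  def visit_entityDecl(node)
    entity = Model::Declarations::Entity.new
    entity.id = node[:entityHead][:entityId][:str]          # "person", not the { str: ... } hash
    entity.attributes = visit_attributes(node[:entityBody])  # helper visitor method, not shown
    entity
  end
end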

Transformation responsibilities:

Create appropriate model objects
Set attributes and relationships
Attach parent links
Preserve source text (if requested)
Extract documentation (remarks)
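The first three responsibilities can be illustrated with a small, self-contained sketch. It walks a hash AST shaped like the Stage 2 example (with the elided parts dropped) and builds Ruby objects that carry parent links. ToyEntity, ToyAttribute and ToyVisitor are hypothetical; they are not Expressir's Model::Declarations classes or its actual visitor.

```ruby
# Hypothetical, simplified model classes; not Expressir's Model::Declarations.
ToyEntity = Struct.new(:id, :attributes, :parent, keyword_init: true)
ToyAttribute = Struct.new(:id, :type, :parent, keyword_init: true)

# Dispatches each one-key hash node ({ label => body }) to visit_<label>.
class ToyVisitor
  def visit(node, parent = nil)
    label, body = node.first
    public_send("visit_#{label}", body, parent)
  end

  # Build an entity from an AST shaped like the Stage 2 example.
  def visit_entityDecl(body, parent)
    entity = ToyEntity.new(id: body[:entityHead][:entityId][:str], parent: parent)
    entity.attributes = body[:entityBody][:explicitAttr].map do |attr|
      ToyAttribute.new(
        id: attr[:attributeDecl][:str],
        type: attr[:parameterType][:str], # still just a name; resolution happens later
        parent: entity                    # parent link back to the owning entity
      )
    end
    entity
  end
end

ast = {
  entityDecl: {
    entityHead: { entityId: { str: "person" } },
    entityBody: {
      explicitAttr: [{ attributeDecl: { str: "name" }, parameterType: { str: "STRING" } }]
    }
  }
}
person = ToyVisitor.new.visit(ast)
puts person.id                         # prints person
puts person.attributes.first.parent.id # prints person
```

Keeping the attribute type as a plain name at this stage is deliberate: linking it to a definition is the job of the reference resolution stage.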

Stage 4: Reference Resolution

The final stage links references to their definitions:

# Before resolution
attribute.type  # => SimpleReference(id: "length_measure")

# After resolution
attribute.type.ref  # => Type(id: "length_measure")

Supported Formats

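The following EXPRESS formats are covered here. Currently only the textual EXPRESS language is supported; support for the two XML formats is planned: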

EXPRESS Language (ISO 10303-11)

Standard textual EXPRESS format:

SCHEMA geometry_schema;
  ENTITY point;
    x : REAL;
    y : REAL;
    z : REAL;
  END_ENTITY;
END_SCHEMA;

File extension: .exp

Characteristics:

Text-based, human-readable
Supports full EXPRESS language
Most common format

Usage:

repo = Expressir::Express::Parser.from_file("geometry.exp")

STEPmod EXPRESS XML

XML representation used in STEPmod repository:

<express>
  <schema name="geometry_schema">
    <entity name="point">
      <explicit name="x" type="REAL"/>
      <explicit name="y" type="REAL"/>
      <explicit name="z" type="REAL"/>
    </entity>
  </schema>
</express>

File extension: .xml

Characteristics:

XML format
Modular schema organization
Used in ISO STEP modular repository

Note: Future support planned

EXPRESS XML (ISO 10303-28)

Standardized XML representation:

File extension: .xml

Characteristics:

Follows ISO 10303-28 specification
Designed for data exchange
Precise mapping to EXPRESS constructs

Note: Future support planned

Parsing Process

Single File Parsing

Parse one EXPRESS file:

# Basic parsing
repository = Expressir::Express::Parser.from_file("schema.exp")

# With options
repository = Expressir::Express::Parser.from_file(
  "schema.exp",
  skip_references: false,   # true skips reference resolution (default: false)
  include_source: true,     # Attach source text (default: nil)
  root_path: "/base/path"   # Base for relative paths (default: nil)
)

Process:

Read file content
Parse to AST
Transform to model
Resolve references (unless skipped)
Return Repository

Multiple File Parsing

Parse several files into one repository:

files = ["schema1.exp", "schema2.exp", "schema3.exp"]

# With progress tracking
repository = Expressir::Express::Parser.from_files(files) do |filename, schemas, error|
  if error
    puts "Error parsing #{filename}: #{error.message}"
  else
    puts "Loaded #{schemas.length} schemas from #{filename}"
  end
end

Process:

Parse each file individually
Collect all schemas
Create unified Repository
Resolve cross-file references
Return complete Repository

String Parsing

Parse EXPRESS from a string:

express_code = <<~EXPRESS
  SCHEMA example;
    ENTITY person;
      name : STRING;
    END_ENTITY;
  END_SCHEMA;
EXPRESS

repository = Expressir::Express::Parser.from_exp(express_code)

Use cases:

Testing
Dynamic schema generation
Template processing
Schema fragments

Reference Resolution

What is Reference Resolution?

EXPRESS uses names to reference other elements:

TYPE length_measure = REAL;
END_TYPE;

ENTITY line;
  length : length_measure;  -- Reference to type above
END_ENTITY;

After parsing, length_measure is only a name held in a reference object. Reference resolution finds the actual Type object that the name refers to.

Resolution Process

# Automatic resolution (default)
repo = Expressir::Express::Parser.from_file("schema.exp")
# References already resolved

# Manual resolution
repo = Expressir::Express::Parser.from_file("schema.exp", skip_references: true)
# References not yet resolved
repo.resolve_all_references
# Now resolved

Interface Resolution

USE FROM and REFERENCE FROM create cross-schema references:

SCHEMA application_schema;
  USE FROM geometry_schema;  -- Import all
  REFERENCE FROM support_schema (date);  -- Import specific

  ENTITY geometric_model;
    base : point;  -- From geometry_schema
    created : date;  -- From support_schema
  END_ENTITY;
END_SCHEMA;

Resolution finds point in geometry_schema and date in support_schema.

Resolution Scope

Resolution searches in order:

Current entity/function (local scope)
Parent scopes (for nested contexts)
Current schema (schema-level declarations)
Interfaced schemas (USE FROM / REFERENCE FROM)
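A scope-chain lookup of this kind can be sketched in a few lines of plain Ruby. The sketch assumes lookup proceeds from the innermost scope outward to the schema and only then into interfaced schemas; Scope and resolve are hypothetical names, and Expressir's actual classes and search details may differ. It reuses the schemas from the Interface Resolution example above.

```ruby
# Hypothetical scope model for illustration; Expressir's real classes differ.
# declarations: Hash of lowercase name => definition
# interfaces:   Array of [schema_scope, names], where names is nil for
#               "import everything" (USE FROM s;) or an Array of imported names
class Scope
  attr_reader :name, :parent, :declarations, :interfaces

  def initialize(name, parent: nil, declarations: {}, interfaces: [])
    @name = name
    @parent = parent
    @declarations = declarations
    @interfaces = interfaces
  end
end

# Look up +name+ from +scope+ outward to the schema, then in interfaced schemas.
# Returns the definition, or nil if the reference cannot be resolved.
# Transitive interfaces and renaming with AS are deliberately not handled.
def resolve(name, scope)
  key = name.downcase # EXPRESS identifiers are case-insensitive
  current = scope
  schema = scope
  while current
    return current.declarations[key] if current.declarations.key?(key)

    schema = current # the outermost scope visited is the schema
    current = current.parent
  end

  schema.interfaces.each do |source, names|
    next if names && !names.include?(key)
    return source.declarations[key] if source.declarations.key?(key)
  end
  nil
end

geometry = Scope.new("geometry_schema", declarations: { "point" => :point_entity })
support = Scope.new("support_schema", declarations: { "date" => :date_entity, "time" => :time_entity })
app = Scope.new("application_schema",
                declarations: { "geometric_model" => :geometric_model_entity },
                interfaces: [[geometry, nil], [support, ["date"]]])
model = Scope.new("geometric_model", parent: app)

p resolve("point", model) # prints :point_entity
p resolve("DATE", model)  # prints :date_entity
p resolve("time", model)  # prints nil (REFERENCE FROM imported only date)
```

A nil result corresponds to the unresolved case described next.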

Unresolved References

If a reference cannot be resolved:

attribute.type.ref  # => nil (not found)

This typically indicates:

Typo in reference name
Missing interface declaration
Missing schema in repository
Incorrect schema order

Error Handling

Parse Failures

When parsing fails, Expressir raises detailed errors:

begin
  repo = Expressir::Express::Parser.from_file("invalid.exp")
rescue Expressir::Express::Error::SchemaParseFailure => e
  puts "Failed to parse: #{e.filename}"
  puts e.message
  puts e.parse_failure_cause.ascii_tree  # Detailed error location
end

Error information includes:

File name
Line and column numbers
Expected tokens
Actual tokens found
Parse tree context

Common Parse Errors

Missing semicolon:

Expected ';' at line 10, column 5

Invalid keyword:

Unexpected keyword 'FOO' at line 15, column 3

Mismatched END statement:

Expected 'END_ENTITY' but found 'END_TYPE' at line 20

Invalid identifier:

Expected identifier at line 8, column 12

Recovery Strategies

Skip broken file:

files.each do |file|
  begin
    repo = Expressir::Express::Parser.from_file(file)
    process(repo)
  rescue Expressir::Express::Error::SchemaParseFailure => e
    warn "Skipping #{file}: #{e.message}"
    next
  end
end

Continue parsing remaining files:

Expressir::Express::Parser.from_files(files) do |filename, schemas, error|
  if error
    warn "Failed: #{filename}"
  else
    # Process successful schemas
  end
end

Performance Considerations

Benchmarking

Measure parsing performance:

require 'benchmark'

time = Benchmark.realtime do
  repo = Expressir::Express::Parser.from_file("large_schema.exp")
end

puts "Parsed in #{time.round(2)} seconds"
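A single timing can be noisy because of file-system caching, lazy loading and garbage collection. One option is to discard a warm-up run and report the median of several runs, as in this minimal sketch using Ruby's standard Benchmark module. median_runtime is a hypothetical helper, not part of Expressir.

```ruby
require "benchmark"

# Hypothetical helper: run the block once as warm-up, then time it `runs` times
# and return the median wall-clock duration in seconds.
def median_runtime(runs: 5)
  raise ArgumentError, "runs must be positive" unless runs.positive?

  yield # warm-up run (file-system cache, lazy loading); not measured
  times = Array.new(runs) { Benchmark.realtime { yield } }.sort
  mid = times.length / 2
  times.length.odd? ? times[mid] : (times[mid - 1] + times[mid]) / 2.0
end

# Example (requires Expressir):
# seconds = median_runtime(runs: 5) { Expressir::Express::Parser.from_file("large_schema.exp") }
# puts "Median parse time: #{seconds.round(2)} seconds"
```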

Optimization Techniques

Skip reference resolution for analysis:

# Faster if you don't need resolved references
repo = Expressir::Express::Parser.from_file("schema.exp", skip_references: true)

Omit source text:

# Reduces memory usage
repo = Expressir::Express::Parser.from_file("schema.exp", include_source: false)

Parse in parallel (for multiple files):

require 'parallel'

repos = Parallel.map(files) do |file|
  Expressir::Express::Parser.from_file(file, skip_references: true)
end

# Combine and resolve references once
combined = combine_repositories(repos)  # merging helper, not shown
combined.resolve_all_references

Caching

Use caching for repeated parses:

# Expressir has built-in cache support
Expressir::Express::Cache.enable

# First parse: slow
repo1 = Expressir::Express::Parser.from_file("schema.exp")

# Second parse: fast (from cache)
repo2 = Expressir::Express::Parser.from_file("schema.exp")

See the Benchmark Performance guide for details.
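If an application needs its own cache layer, one approach is to key parsed results by file path plus a digest of the file contents, so an edited file invalidates its entry automatically. The sketch below is hypothetical (ParseCache is not Expressir::Express::Cache) and keeps everything in memory; the parser is injected as a block so the sketch runs without Expressir.

```ruby
require "digest"

# Hypothetical application-level cache; not Expressir::Express::Cache.
# Entries are keyed by path plus a SHA-256 digest of the file contents, so an
# edited file is parsed again while an unchanged file is served from memory.
class ParseCache
  def initialize(&parser)
    raise ArgumentError, "a parser block is required" unless parser

    @parser = parser
    @entries = {}
  end

  def fetch(path)
    key = [File.expand_path(path), Digest::SHA256.file(path).hexdigest]
    @entries.fetch(key) { @entries[key] = @parser.call(path) }
  end

  def size
    @entries.size
  end
end

# Example (requires Expressir):
# cache = ParseCache.new { |path| Expressir::Express::Parser.from_file(path) }
# repo = cache.fetch("schema.exp") # parsed
# repo = cache.fetch("schema.exp") # served from memory while the file is unchanged
```

Entries for outdated digests are never evicted here, so a long-running process would need an eviction policy.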

Advanced Topics

Custom Visitors

Extend parsing with custom transformations:

class MyVisitor < Expressir::Express::Visitor
  def visit_entity(node)
    entity = super
    # Custom processing, e.g. keep a list of every entity the visitor builds
    (@visited_entities ||= []) << entity
    entity
  end
end

Incremental Parsing

Parse schemas on demand:

# Parse every file, deferring reference resolution
repos = files.map do |file|
  Expressir::Express::Parser.from_file(file, skip_references: true)
end

# Resolve references only in the repository you need
# (references into schemas from other files stay unresolved)
selected_repo = repos.find { |r| r.schemas.first.id == "target_schema" }
selected_repo.resolve_all_references

Grammar Extension

Expressir’s grammar can be extended for custom syntax:

class CustomParser < Expressir::Express::Parser::Parser
  rule(:custom_construct) do
    # Custom grammar rules
  end
end

Parsing Best Practices

Always handle errors: Use begin/rescue blocks to handle parse failures gracefully
Validate before parsing: Check file existence and readability first
Use progress callbacks: For multiple files, track progress with callbacks
Skip references when possible: If you don’t need resolved references, skip for speed
Cache for production: Enable caching for applications that parse repeatedly
Profile large schemas: Use benchmarking to identify bottlenecks
Process incrementally: For very large sets, parse and process one at a time
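Validating before parsing can be as simple as checking each path with Ruby's File predicates and separating usable files from problems. partition_parseable is a hypothetical helper; its valid list could then be handed to the parser.

```ruby
# Hypothetical helper: split paths into files that can be handed to the parser
# and a Hash of path => reason for the ones that cannot.
def partition_parseable(paths)
  problems = {}
  valid = paths.select do |path|
    reason =
      if !File.exist?(path)
        "does not exist"
      elsif !File.file?(path)
        "is not a regular file"
      elsif !File.readable?(path)
        "is not readable"
      end
    problems[path] = reason if reason
    reason.nil?
  end
  [valid, problems]
end

# Example (requires Expressir):
# valid, problems = partition_parseable(Dir.glob("schemas/*.exp"))
# problems.each { |path, reason| warn "Skipping #{path}: #{reason}" }
# repository = Expressir::Express::Parser.from_files(valid)
```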

Troubleshooting

Parser Hangs

Symptom: Parser doesn’t complete

Causes:

Malformed file with infinite recursion
Very large schema
Memory exhaustion

Solutions:

Validate file structure first
Parse smaller chunks
Increase memory limits

Reference Resolution Fails

Symptom: Many unresolved references

Causes:

Missing interface declarations
Incorrect schema order
Typos in names

Solutions:

Check USE FROM / REFERENCE FROM
Parse schemas in dependency order
Validate names match
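Parsing schemas in dependency order can be sketched with Ruby's standard TSort module. SchemaGraph and interface_targets are hypothetical helpers, and the regex is a rough heuristic that ignores comments and string literals; in practice the interface data from parsed schemas is more reliable.

```ruby
require "tsort"

# Hypothetical dependency graph: schema name => names of schemas it interfaces with.
class SchemaGraph
  include TSort

  def initialize(deps)
    @deps = deps
  end

  def tsort_each_node(&block)
    @deps.each_key(&block)
  end

  def tsort_each_child(node, &block)
    @deps.fetch(node, []).each(&block)
  end
end

# Rough heuristic: pull schema names out of USE FROM / REFERENCE FROM clauses.
def interface_targets(express_source)
  express_source.scan(/\b(?:USE|REFERENCE)\s+FROM\s+(\w+)/i).flatten.map(&:downcase).uniq
end

app_source = <<~EXPRESS
  SCHEMA application_schema;
    USE FROM geometry_schema;
    REFERENCE FROM support_schema (date);
  END_SCHEMA;
EXPRESS

deps = {
  "application_schema" => interface_targets(app_source),
  "geometry_schema" => [],
  "support_schema" => []
}
# TSort emits dependencies before the schemas that depend on them
p SchemaGraph.new(deps).tsort # prints ["geometry_schema", "support_schema", "application_schema"]
```

If schemas interface with each other in a cycle, tsort raises TSort::Cyclic; each_strongly_connected_component can group such schemas instead.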

Memory Issues

Symptom: Out of memory errors

Causes:

Very large schemas
Including source text
Parsing many files at once

Solutions:

Parse incrementally
Skip source text inclusion
Use streaming approaches

Next Steps

Now that you understand parsing:

Try parsing: Parse your first schema
Learn the CLI: Format schemas with CLI
Master the API: Parse files programmatically
Optimize performance: Benchmark and optimize

Bibliography

Parsanol - High-performance PEG parser for Ruby
PEG on Wikipedia - Understanding PEG parsers
EXPRESS Language - Understanding what is being parsed
Data Model - Understanding the parsing result
