Expressir Parsers

Purpose

This page explains how Expressir parses EXPRESS schemas from various formats into its Ruby data model. Understanding the parsing architecture is essential for troubleshooting parsing issues, optimizing performance, and working with different EXPRESS formats.


Concepts

Parser

Component that reads text in EXPRESS syntax and produces an Abstract Syntax Tree (AST)

AST

Abstract Syntax Tree - intermediate tree representation of parsed code

Transform

Conversion of AST into Expressir’s Ruby data model

Reference Resolution

Process of linking references (by name) to their target definitions

Visitor

Design pattern for traversing and transforming the AST

Cache

Stored parsed schemas for faster subsequent loads

Parser Architecture

Expressir uses a multi-stage parsing pipeline:

┌─────────────────┐
│  EXPRESS Text   │
│   (.exp file)   │
└────────┬────────┘
         │
         ▼
   ┌──────────┐
   │ Parsanol │ (PEG parser)
   │ Grammar  │
   └────┬─────┘
        │
        ▼
┌────────────────┐
│   AST (Tree)   │ (intermediate)
└────────┬───────┘
         │
         ▼
   ┌──────────┐
   │ Visitor  │ (transform)
   └────┬─────┘
        │
        ▼
┌────────────────┐
│  Data Model    │ (Ruby objects)
│  Repository    │
└────────┬───────┘
         │
         ▼
┌────────────────────┐
│ Reference          │
│ Resolution         │
└────────┬───────────┘
         │
         ▼
┌────────────────────┐
│ Finalized Model    │
│ (ready to use)     │
└────────────────────┘

Backend: Parsanol

Expressir uses the Parsanol parser backend:

Parsanol (High-Performance Rust)
  • High-performance Rust backend

  • 18-44x faster parsing

  • 99.5% fewer allocations

  • Supports source position tracking (Slice)

  • Available automatically when the Parsanol gem is installed

Expressir automatically uses Parsanol when available:

# Automatically uses native Rust parser (Parsanol)
repo = Expressir::Express::Parser.from_file("geometry.exp")

# Check if native parser is available
if defined?(Parsanol::Native) && Parsanol::Native.available?
  # The native Rust parser is in use
end

See the Parsanol documentation for performance details.

Stage 1: Lexical Analysis and Parsing

Expressir uses Parsanol, a high-performance Parsing Expression Grammar (PEG) parser:

# Grammar rules defined in Parser class
rule(:entityDecl) do
  (entityHead >> entityBody >> tEND_ENTITY >> op_delim).as(:entityDecl)
end

rule(:entityHead) do
  (tENTITY >> entityId >> subsuper >> op_delim).as(:entityHead)
end

Parsanol advantages:

  • Ruby DSL: Grammar rules are written as ordinary Ruby code

  • Composable rules: Complex grammars from simple parts

  • Error reporting: Clear parse failure messages

  • Type-safe: Strongly typed AST nodes
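To make the idea of composable rules concrete, here is a minimal PEG-style sketch in plain Ruby. It is not Parsanol's implementation or API. The helpers literal, pattern and sequence are hypothetical names invented for this illustration. Each parser is a lambda that takes the input and a position and returns either [value, new_position] or nil.

```ruby
# Match an exact string at the current position.
def literal(text)
  ->(input, pos) { input[pos, text.length] == text ? [text, pos + text.length] : nil }
end

# Match a regular expression anchored at the current position.
def pattern(regex)
  anchored = /\G(?:#{regex.source})/
  lambda do |input, pos|
    m = anchored.match(input, pos)
    m && [m[0], pos + m[0].length]
  end
end

# Run parsers one after another; fail as soon as one of them fails.
def sequence(*parsers)
  lambda do |input, pos|
    values = []
    ok = parsers.all? do |parser|
      result = parser.call(input, pos)
      next false unless result

      value, pos = result
      values << value
      true
    end
    ok ? [values, pos] : nil
  end
end

# A larger rule built from three small ones.
space = pattern(/\s+/)
identifier = pattern(/[a-z][a-z0-9_]*/)
entity_head = sequence(literal("ENTITY"), space, identifier, literal(";"))

p entity_head.call("ENTITY person;", 0) # [["ENTITY", " ", "person", ";"], 14]
p entity_head.call("ENTITY person", 0)  # nil
```

The real grammar expresses the same composition with the rule DSL and the >> operator shown above.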

Stage 2: AST Generation

Parsing produces a hierarchical tree structure:

# Example AST for: ENTITY person; name : STRING; END_ENTITY;
{
  entityDecl: {
    entityHead: {
      entityId: { str: "person" },
      ...
    },
    entityBody: {
      explicitAttr: [
        {
          attributeDecl: { str: "name" },
          parameterType: { str: "STRING" }
        }
      ]
    }
  }
}
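Because the AST is made of nested hashes and arrays, it can be inspected with ordinary Ruby. The sketch below reuses the shape of the example above with the elided part left out. The each_node helper is a hypothetical name introduced here, and the exact node layout produced by the real parser may differ.

```ruby
# AST shape taken from the example, without the elided entries.
ast = {
  entityDecl: {
    entityHead: { entityId: { str: "person" } },
    entityBody: {
      explicitAttr: [
        { attributeDecl: { str: "name" }, parameterType: { str: "STRING" } }
      ]
    }
  }
}

# Recursively yield every value stored under the given key.
def each_node(tree, key, &block)
  case tree
  when Hash
    tree.each do |k, value|
      block.call(value) if k == key
      each_node(value, key, &block)
    end
  when Array
    tree.each { |item| each_node(item, key, &block) }
  end
end

attribute_names = []
each_node(ast, :attributeDecl) { |node| attribute_names << node[:str] }
entity_name = ast.dig(:entityDecl, :entityHead, :entityId, :str)

puts "#{entity_name}: #{attribute_names.join(', ')}" # person: name
```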

Stage 3: AST Transformation

The Visitor pattern transforms the AST into the data model:

class Visitor
  def visit_entityDecl(node)
    entity = Model::Declarations::Entity.new
    entity.id = node[:entityHead][:entityId]
    entity.attributes = visit_attributes(node[:entityBody])
    entity
  end
end

Transformation responsibilities:

  • Create appropriate model objects

  • Set attributes and relationships

  • Attach parent links

  • Preserve source text (if requested)

  • Extract documentation (remarks)
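The first three responsibilities can be sketched without Expressir. The classes below (SketchEntity, SketchAttribute, SketchVisitor) are hypothetical stand-ins, not the library's model or visitor. The sketch only shows how a visitor can dispatch on the node name, build objects, and attach parent links while it descends.

```ruby
# Hypothetical stand-ins for model classes; Expressir's real classes differ.
SketchEntity = Struct.new(:id, :attributes, :parent)
SketchAttribute = Struct.new(:id, :type, :parent)

class SketchVisitor
  # Dispatch on the node's single top-level key, for example :entityDecl.
  def visit(node)
    name, body = node.first
    send("visit_#{name}", body)
  end

  def visit_entityDecl(node)
    entity = SketchEntity.new(node[:entityHead][:entityId][:str], [])
    entity.attributes = node[:entityBody][:explicitAttr].map do |attr_node|
      visit_explicitAttr(attr_node, entity)
    end
    entity
  end

  def visit_explicitAttr(node, parent)
    # Attach the parent link while the child is being built.
    SketchAttribute.new(node[:attributeDecl][:str], node[:parameterType][:str], parent)
  end
end

sketch_ast = {
  entityDecl: {
    entityHead: { entityId: { str: "person" } },
    entityBody: {
      explicitAttr: [
        { attributeDecl: { str: "name" }, parameterType: { str: "STRING" } }
      ]
    }
  }
}

sketch_entity = SketchVisitor.new.visit(sketch_ast)
puts sketch_entity.id                           # person
puts sketch_entity.attributes.first.type        # STRING
puts sketch_entity.attributes.first.parent.id   # person
```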

Stage 4: Reference Resolution

The final stage links each reference to its definition:

# Before resolution
attribute.type  # => SimpleReference(id: "length_measure")

# After resolution
attribute.type.ref  # => Type(id: "length_measure")
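The linking step can be pictured as a lookup by name. This is a minimal sketch under simplifying assumptions: one flat namespace, and two hypothetical structs (SketchType, SketchReference) in place of Expressir's model classes. Only the id and ref names follow the snippet above.

```ruby
# Hypothetical stand-ins; the real model classes carry far more state.
SketchType = Struct.new(:id)
SketchReference = Struct.new(:id, :ref)

# Point each reference at the definition with the same id, if one exists.
def resolve_references(references, definitions)
  index = definitions.to_h { |definition| [definition.id, definition] }
  references.each { |reference| reference.ref = index[reference.id] }
end

length_measure = SketchType.new("length_measure")
known = SketchReference.new("length_measure", nil)
unknown = SketchReference.new("lenght_measure", nil) # deliberate typo

resolve_references([known, unknown], [length_measure])

p known.ref.equal?(length_measure) # true
p unknown.ref                      # nil
```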

Supported Formats

Expressir currently parses the textual EXPRESS language format. The two XML formats described below are not yet supported.

EXPRESS Language (ISO 10303-11)

Standard textual EXPRESS format:

SCHEMA geometry_schema;
  ENTITY point;
    x : REAL;
    y : REAL;
    z : REAL;
  END_ENTITY;
END_SCHEMA;

File extension: .exp

Characteristics:

  • Text-based, human-readable

  • Supports full EXPRESS language

  • Most common format

Usage:

repo = Expressir::Express::Parser.from_file("geometry.exp")

STEPmod EXPRESS XML

XML representation used in the STEPmod repository:

<express>
  <schema name="geometry_schema">
    <entity name="point">
      <explicit name="x" type="REAL"/>
      <explicit name="y" type="REAL"/>
      <explicit name="z" type="REAL"/>
    </entity>
  </schema>
</express>

File extension: .xml

Characteristics:

  • XML format

  • Modular schema organization

  • Used in the ISO STEP modular repository

Note: Not yet supported; support is planned.

EXPRESS XML (ISO 10303-28)

Standardized XML representation.

File extension: .xml

Characteristics:

  • Follows ISO 10303-28 specification

  • Designed for data exchange

  • Precise mapping to EXPRESS constructs

Note: Not yet supported; support is planned.

Parsing Process

Single File Parsing

Parse one EXPRESS file:

# Basic parsing
repository = Expressir::Express::Parser.from_file("schema.exp")

# With options
repository = Expressir::Express::Parser.from_file(
  "schema.exp",
  skip_references: false,   # Resolve references (default: false)
  include_source: true,     # Attach source text (default: nil)
  root_path: "/base/path"   # Base for relative paths (default: nil)
)

Process:

  1. Read file content

  2. Parse to AST

  3. Transform to model

  4. Resolve references (unless skipped)

  5. Return Repository

Multiple File Parsing

Parse several files into one repository:

files = ["schema1.exp", "schema2.exp", "schema3.exp"]

# With progress tracking
repository = Expressir::Express::Parser.from_files(files) do |filename, schemas, error|
  if error
    puts "Error parsing #{filename}: #{error.message}"
  else
    puts "Loaded #{schemas.length} schemas from #{filename}"
  end
end

Process:

  1. Parse each file individually

  2. Collect all schemas

  3. Create unified Repository

  4. Resolve cross-file references

  5. Return complete Repository
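For larger file sets it can help to collect the callback results instead of printing them. The ParseReport class below is a hypothetical helper, not part of Expressir. It accepts the same three block arguments as the from_files example above, and here it is driven by hand so the sketch runs without the gem.

```ruby
# Collects the outcome of each callback invocation.
class ParseReport
  attr_reader :loaded, :failed

  def initialize
    @loaded = {} # filename => number of schemas
    @failed = {} # filename => error message
  end

  # Same three arguments as the from_files block.
  def record(filename, schemas, error)
    if error
      @failed[filename] = error.message
    else
      @loaded[filename] = schemas.length
    end
  end

  def summary
    "#{@loaded.size} file(s) loaded, #{@failed.size} failed"
  end
end

report = ParseReport.new
# With Expressir the block would forward its arguments:
#   Expressir::Express::Parser.from_files(files) { |f, s, e| report.record(f, s, e) }
report.record("schema1.exp", [:schema_a, :schema_b], nil)
report.record("schema2.exp", nil, StandardError.new("unexpected token"))

puts report.summary # 1 file(s) loaded, 1 failed
```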

String Parsing

Parse EXPRESS from a string:

express_code = <<~EXPRESS
  SCHEMA example;
    ENTITY person;
      name : STRING;
    END_ENTITY;
  END_SCHEMA;
EXPRESS

repository = Expressir::Express::Parser.from_exp(express_code)

Use cases:

  • Testing

  • Dynamic schema generation

  • Template processing

  • Schema fragments

Reference Resolution

What is Reference Resolution?

EXPRESS uses names to reference other elements:

TYPE length_measure = REAL;
END_TYPE;

ENTITY line;
  length : length_measure;  -- Reference to type above
END_ENTITY;

After parsing, length_measure is only a name held in a reference object. Reference resolution looks that name up and links it to the actual Type object.

Resolution Process

# Automatic resolution (default)
repo = Expressir::Express::Parser.from_file("schema.exp")
# References already resolved

# Manual resolution
repo = Expressir::Express::Parser.from_file("schema.exp", skip_references: true)
# References not yet resolved
repo.resolve_all_references
# Now resolved

Interface Resolution

USE FROM and REFERENCE FROM create cross-schema references:

SCHEMA application_schema;
  USE FROM geometry_schema;  -- Import all
  REFERENCE FROM support_schema (date);  -- Import specific

  ENTITY geometric_model;
    base : point;  -- From geometry_schema
    created : date;  -- From support_schema
  END_ENTITY;
END_SCHEMA;

Resolution finds point in geometry_schema and date in support_schema.

Resolution Scope

Resolution searches in order:

  1. Current entity/function (local scope)

  2. Current schema (schema-level declarations)

  3. Interfaced schemas (USE FROM / REFERENCE FROM)

  4. Parent scopes (for nested contexts)
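An ordered search like this amounts to trying each scope in turn and stopping at the first hit. The sketch below models scopes as plain hashes, which is an assumption made for illustration. The scoped_lookup helper and the sample names are hypothetical, and Expressir's own resolver works on model objects.

```ruby
# Scopes are passed in search order; the first scope that knows the name wins.
def scoped_lookup(name, scopes)
  scopes.each do |scope|
    return scope[name] if scope.key?(name)
  end
  nil
end

local_scope      = { "x" => :local_x }
schema_scope     = { "x" => :schema_x, "length_measure" => :schema_type }
interfaced_scope = { "point" => :geometry_point }

search_order = [local_scope, schema_scope, interfaced_scope]

p scoped_lookup("x", search_order)       # :local_x (the local scope shadows the schema)
p scoped_lookup("point", search_order)   # :geometry_point
p scoped_lookup("missing", search_order) # nil
```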

Unresolved References

If a reference cannot be resolved:

attribute.type.ref  # => nil (not found)

This typically indicates:

  • Typo in reference name

  • Missing interface declaration

  • Missing schema in repository

  • Incorrect schema order
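To tell a typo from a missing declaration, one option is to compare each unresolved name with the names that are declared. The sketch below uses DidYouMean::SpellChecker from Ruby's bundled did_you_mean gem. SketchRef and diagnose_unresolved are hypothetical names, and gathering the references and declared names from a real repository is left out.

```ruby
require "did_you_mean"

SketchRef = Struct.new(:id, :ref) # hypothetical stand-in for a reference object

# Return { unresolved name => similar declared names } for every
# reference whose ref is still nil.
def diagnose_unresolved(references, known_names)
  checker = DidYouMean::SpellChecker.new(dictionary: known_names)
  references.select { |reference| reference.ref.nil? }
            .to_h { |reference| [reference.id, checker.correct(reference.id)] }
end

sketch_refs = [
  SketchRef.new("length_measure", :resolved_type),
  SketchRef.new("lenght_measure", nil), # typo of a declared name
  SketchRef.new("colour", nil)          # nothing similar is declared
]

diagnosis = diagnose_unresolved(sketch_refs, %w[length_measure point date])
diagnosis.each do |name, suggestions|
  hint = suggestions.empty? ? "no similar name declared" : "did you mean #{suggestions.join(', ')}?"
  puts "#{name}: #{hint}"
end
```

An empty suggestion list points toward a missing interface declaration or a schema that was never loaded, rather than a spelling mistake.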

Error Handling

Parse Failures

When parsing fails, Expressir raises detailed errors:

begin
  repo = Expressir::Express::Parser.from_file("invalid.exp")
rescue Expressir::Express::Error::SchemaParseFailure => e
  puts "Failed to parse: #{e.filename}"
  puts e.message
  puts e.parse_failure_cause.ascii_tree  # Detailed error location
end

Error information includes:

  • File name

  • Line and column numbers

  • Expected tokens

  • Actual tokens found

  • Parse tree context

Common Parse Errors

Missing semicolon:

Expected ';' at line 10, column 5

Invalid keyword:

Unexpected keyword 'FOO' at line 15, column 3

Mismatched END statement:

Expected 'END_ENTITY' but found 'END_TYPE' at line 20

Invalid identifier:

Expected identifier at line 8, column 12

Recovery Strategies

Skip broken file:

files.each do |file|
  repo = Expressir::Express::Parser.from_file(file)
  process(repo)
rescue Expressir::Express::Error::SchemaParseFailure => e
  # Report the failure and move on to the next file
  warn "Skipping #{file}: #{e.message}"
end

Continue parsing remaining files:

Expressir::Express::Parser.from_files(files) do |filename, schemas, error|
  if error
    warn "Failed: #{filename}"
  else
    # Process successful schemas
  end
end

Performance Considerations

Benchmarking

Measure parsing performance:

require 'benchmark'

repo = nil  # Declared outside the block so the result stays available
time = Benchmark.realtime do
  repo = Expressir::Express::Parser.from_file("large_schema.exp")
end

puts "Parsed in #{time.round(2)} seconds"
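A single run can be skewed by disk caching or garbage collection, so repeating the measurement gives a steadier figure. The measure helper below is a hypothetical name, and the workload is a stand-in. With Expressir the block would wrap the Parser.from_file call.

```ruby
require "benchmark"

# Run the block `runs` times; report the fastest and the mean wall-clock time.
def measure(runs: 3)
  times = Array.new(runs) { Benchmark.realtime { yield } }
  { best: times.min, mean: times.sum / runs }
end

# Stand-in workload so the sketch runs without any schema file.
stats = measure(runs: 5) { (1..10_000).reduce(:+) }
puts format("best %.4fs, mean %.4fs", stats[:best], stats[:mean])
```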

Optimization Techniques

Skip reference resolution for analysis:

# Faster if you don't need resolved references
repo = Expressir::Express::Parser.from_file("schema.exp", skip_references: true)

Omit source text:

# Reduces memory usage
repo = Expressir::Express::Parser.from_file("schema.exp", include_source: false)

Parse in parallel (for multiple files):

require 'parallel'

repos = Parallel.map(files) do |file|
  Expressir::Express::Parser.from_file(file, skip_references: true)
end

# Combine and resolve references once.
# combine_repositories is a placeholder for your own merge helper, not an Expressir method.
combined = combine_repositories(repos)
combined.resolve_all_references

Caching

Use caching for repeated parses:

# Expressir has built-in cache support
Expressir::Express::Cache.enable

# First parse: slow
repo1 = Expressir::Express::Parser.from_file("schema.exp")

# Second parse: fast (from cache)
repo2 = Expressir::Express::Parser.from_file("schema.exp")
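Whatever cache is used, a cached result should be discarded when the file changes. The sketch below illustrates that idea at application level. MtimeCache is a hypothetical class, not Expressir's built-in cache, and the loader block is a stand-in for a Parser.from_file call.

```ruby
require "tempfile"

# Memoizes an expensive loader per file, keyed by path and modification
# time, so that an edited file is loaded again.
class MtimeCache
  def initialize(&loader)
    @loader = loader
    @entries = {}
  end

  def fetch(path)
    key = [File.expand_path(path), File.mtime(path)]
    @entries[key] ||= @loader.call(path)
  end
end

loader_calls = 0
# Stand-in loader; with Expressir it would call Parser.from_file(path).
cache = MtimeCache.new do |path|
  loader_calls += 1
  File.read(path).length
end

Tempfile.create(["schema", ".exp"]) do |file|
  file.write("SCHEMA example; END_SCHEMA;")
  file.flush
  cache.fetch(file.path) # loads the file
  cache.fetch(file.path) # served from memory
end

puts loader_calls # 1
```

Entries for older modification times are never evicted in this sketch, so a long-running process would need a cleanup step.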

See the Benchmark Performance guide for details.

Advanced Topics

Custom Visitors

Extend parsing with custom transformations:

class MyVisitor < Expressir::Express::Visitor
  def visit_entity(node)
    entity = super
    # Custom processing. custom_flag is illustrative: the model class must define it.
    entity.custom_flag = true
    entity
  end
end

Incremental Parsing

Parse schemas on demand:

# Parse every file, but skip reference resolution for now
repos = files.map do |file|
  Expressir::Express::Parser.from_file(file, skip_references: true)
end

# Resolve references only for the schema that is actually needed
selected_repo = repos.find { |r| r.schemas.first.id == "target_schema" }
selected_repo.resolve_all_references

Grammar Extension

Expressir’s grammar can be extended for custom syntax:

class CustomParser < Expressir::Express::Parser::Parser
  rule(:custom_construct) do
    # Custom grammar rules
  end
end

Parsing Best Practices

Always handle errors

Use begin/rescue blocks to handle parse failures gracefully

Validate before parsing

Check file existence and readability first

Use progress callbacks

For multiple files, track progress with callbacks

Skip references when possible

If you don’t need resolved references, skip for speed

Cache for production

Enable caching for applications that parse repeatedly

Profile large schemas

Use benchmarking to identify bottlenecks

Process incrementally

For very large sets, parse and process one at a time
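The "validate before parsing" practice can be sketched with the File class alone. The partition_parsable helper is a hypothetical name. It only checks that each path is a readable, non-empty regular file and says nothing about whether the content is valid EXPRESS.

```ruby
require "tempfile"

# Split paths into those worth handing to the parser and those to report.
def partition_parsable(paths)
  paths.partition do |path|
    File.file?(path) && File.readable?(path) && !File.zero?(path)
  end
end

parsable = rejected = nil
Tempfile.create(["ok", ".exp"]) do |file|
  file.write("SCHEMA s; END_SCHEMA;")
  file.flush
  parsable, rejected = partition_parsable([file.path, "no_such_dir/missing.exp"])
end

puts "parsable: #{parsable.length}, rejected: #{rejected.join(', ')}"
```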

Troubleshooting

Parser Hangs

Symptom: Parser doesn’t complete

Causes:

  • Malformed file with infinite recursion

  • Very large schema

  • Memory exhaustion

Solutions:

  • Validate file structure first

  • Parse smaller chunks

  • Increase memory limits

Reference Resolution Fails

Symptom: Many unresolved references

Causes:

  • Missing interface declarations

  • Incorrect schema order

  • Typos in names

Solutions:

  • Check USE FROM / REFERENCE FROM

  • Parse schemas in dependency order

  • Validate names match
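A dependency order can be computed with TSort from Ruby's standard library. In the sketch below the dependency map is written by hand. SchemaOrder is a hypothetical class, and extracting the USE FROM / REFERENCE FROM targets from real files is left out.

```ruby
require "tsort"

# Orders schemas so that every schema comes after the schemas it interfaces.
class SchemaOrder
  include TSort

  # dependencies: { "schema" => ["schemas it uses or references"] }
  def initialize(dependencies)
    @dependencies = dependencies
  end

  def tsort_each_node(&block)
    @dependencies.each_key(&block)
  end

  def tsort_each_child(node, &block)
    @dependencies.fetch(node, []).each(&block)
  end
end

schema_deps = {
  "application_schema" => %w[geometry_schema support_schema],
  "geometry_schema" => [],
  "support_schema" => []
}

parse_order = SchemaOrder.new(schema_deps).tsort
p parse_order # ["geometry_schema", "support_schema", "application_schema"]
```

If two schemas interface each other, tsort raises TSort::Cyclic. In that case the files have to be loaded together before references are resolved.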

Memory Issues

Symptom: Out of memory errors

Causes:

  • Very large schemas

  • Including source text

  • Parsing many files at once

Solutions:

  • Parse incrementally

  • Skip source text inclusion

  • Use streaming approaches

Next Steps

Now that you understand parsing:

Try parsing

Parse your first schema

Learn the CLI

Format schemas with CLI

Master the API

Parse files programmatically

Optimize performance

Benchmark and optimize
