Transpilation to an Intermediate Rule-List / AST #117

Closed
opened 2026-05-16 19:22:34 +00:00 by guettli · 1 comment
guettli commented 2026-05-16 19:22:34 +00:00 (Migrated from codeberg.org)

Follow-up from #90

Transpilation to an Intermediate Rule-List / AST

Since Sieve is a declarative language, the plan is broken down into three logical phases: Parsing (converting text to an Abstract Syntax Tree), Lowering (converting the AST to your flat rule-list), and Execution (running the rules against an email object).


Phase 1: Define the Data Models (The Target Format)

Before parsing, you need to define what your intermediate rule-list looks like in pure Dart. This format should closely mirror the subset of features you support.

1.1 Action Models

Represent the specific side-effects your engine allows.

  • FileIntoAction(String folder)
  • KeepAction()
  • DiscardAction()
  • MarkAsSeenAction() // Often mapped to a flag action
  • FlagAction(List<String> flags)

1.2 Condition Models

Represent how matches are evaluated.

  • HeaderCondition(List<String> headers, String matchType, List<String> keyList)
  • SizeCondition(String comparison, int bytes)

1.3 Rule Container

A structured rule pairs conditions with actions.

class SieveRule {
  final String joinType; // 'allof', 'anyof', or 'single'
  final List<Condition> conditions;
  final List<Action> actions;
}


Phase 2: The Parser & Compiler (Text to Data)

Instead of writing a regex-heavy parser from scratch, use a parser combinator library like petitparser in Dart. It makes tokenizing Sieve commands incredibly robust.

2.1 Lexing Tokenizer

Build rules to recognize Sieve grammar tokens:

  • Commands: if, elsif, else, require
  • Test Commands: address, header, exists, allof, anyof, true
  • Tagged Arguments: :contains, :is, :matches, :comparator
  • Data Types: Strings, Block Strings (text: multi-line), and Lists (["a", "b"])

2.2 AST Generation & Normalization

As the parser reads the Sieve script, map it directly to your SieveRule objects.

  • Handle Implicit Keeps: Sieve specification dictates that if no action is explicitly taken, an implicit keep occurs. Your compiler should append a KeepAction() to the end of the script execution loop if no terminal action (discard or fileinto) is met.
  • Sifting Block Structures: Flatten elsif and else blocks into standard, sequential SieveRule blocks to simplify evaluation logic.

Phase 3: The Execution Engine (Data to Action)

The execution engine takes an incoming email representation (e.g., a MimeMessage object) and your compiled list of SieveRule objects.

3.1 Context State Tracker

Create an execution context object to track the state of the email as it passes through rules.

class SieveExecutionContext {
  bool isCancelled = false; // Set true if 'discard' is hit
  Set<String> targetFolders = {}; // For 'fileinto' actions
  Set<String> flagsToAdd = {}; // For 'flag' or 'mark-as-seen'
  bool keepInInbox = true; // Flips to false if 'fileinto' or 'discard' occurs
}

3.2 Evaluation Loop

Iterate through your compiled SieveRule objects sequentially:

  1. Evaluate Conditions: Check the email's headers/size against the rule's conditions.
  2. Execute Actions (If conditions match):
  • If fileinto, add the folder to targetFolders and set keepInInbox = false.
  • If discard, set isCancelled = true and break the loop.
  • If mark-as-seen or flag, add those flags to flagsToAdd.
  1. Finalize: After the loop ends, inspect the SieveExecutionContext to determine the final routing and state modifications to apply to the email store.

Phase 4: Verification & Testing Strategy

Because email filtering can destructive (e.g., a bug causing unintended discard actions), implement a two-tiered testing suite:

  • Unit Tests (The Engine): Pass raw SieveRule Dart objects and mock emails into the execution engine to verify string matchers (:contains, :is), case insensitivity, and action outcomes.
  • Integration Tests (The Parser + Engine): Feed raw Sieve string scripts into the pipeline and assert the final SieveExecutionContext matches expected states.

Example Sieve Test Case Matrix

Ensure your tests cover your specific subset boundary limits:

IF header :contains "subject" "Spam" -> Action: discard
IF header :is ["from", "reply-to"] "boss@work.com" -> Action: flag "\\Important", fileinto "Work"
```</String></String></String>

Follow-up from #90 Transpilation to an Intermediate Rule-List / AST Since Sieve is a declarative language, the plan is broken down into three logical phases: **Parsing** (converting text to an Abstract Syntax Tree), **Lowering** (converting the AST to your flat rule-list), and **Execution** (running the rules against an email object). --- ## Phase 1: Define the Data Models (The Target Format) Before parsing, you need to define what your intermediate rule-list looks like in pure Dart. This format should closely mirror the subset of features you support. ### 1.1 Action Models Represent the specific side-effects your engine allows. * `FileIntoAction(String folder)` * `KeepAction()` * `DiscardAction()` * `MarkAsSeenAction()` // Often mapped to a flag action * `FlagAction(List<String> flags)` ### 1.2 Condition Models Represent how matches are evaluated. * `HeaderCondition(List<String> headers, String matchType, List<String> keyList)` * `SizeCondition(String comparison, int bytes)` ### 1.3 Rule Container A structured rule pairs conditions with actions. ```dart class SieveRule { final String joinType; // 'allof', 'anyof', or 'single' final List<Condition> conditions; final List<Action> actions; } ``` --- ## Phase 2: The Parser & Compiler (Text to Data) Instead of writing a regex-heavy parser from scratch, use a parser combinator library like **`petitparser`** in Dart. It makes tokenizing Sieve commands incredibly robust. ### 2.1 Lexing Tokenizer Build rules to recognize Sieve grammar tokens: * **Commands:** `if`, `elsif`, `else`, `require` * **Test Commands:** `address`, `header`, `exists`, `allof`, `anyof`, `true` * **Tagged Arguments:** `:contains`, `:is`, `:matches`, `:comparator` * **Data Types:** Strings, Block Strings (`text:` multi-line), and Lists (`["a", "b"]`) ### 2.2 AST Generation & Normalization As the parser reads the Sieve script, map it directly to your `SieveRule` objects. * **Handle Implicit Keeps:** Sieve specification dictates that if no action is explicitly taken, an implicit `keep` occurs. Your compiler should append a `KeepAction()` to the end of the script execution loop if no terminal action (`discard` or `fileinto`) is met. * **Sifting Block Structures:** Flatten `elsif` and `else` blocks into standard, sequential `SieveRule` blocks to simplify evaluation logic. --- ## Phase 3: The Execution Engine (Data to Action) The execution engine takes an incoming email representation (e.g., a `MimeMessage` object) and your compiled list of `SieveRule` objects. ### 3.1 Context State Tracker Create an execution context object to track the state of the email as it passes through rules. ```dart class SieveExecutionContext { bool isCancelled = false; // Set true if 'discard' is hit Set<String> targetFolders = {}; // For 'fileinto' actions Set<String> flagsToAdd = {}; // For 'flag' or 'mark-as-seen' bool keepInInbox = true; // Flips to false if 'fileinto' or 'discard' occurs } ``` ### 3.2 Evaluation Loop Iterate through your compiled `SieveRule` objects sequentially: 1. **Evaluate Conditions:** Check the email's headers/size against the rule's conditions. 2. **Execute Actions (If conditions match):** * If `fileinto`, add the folder to `targetFolders` and set `keepInInbox = false`. * If `discard`, set `isCancelled = true` and break the loop. * If `mark-as-seen` or `flag`, add those flags to `flagsToAdd`. 3. **Finalize:** After the loop ends, inspect the `SieveExecutionContext` to determine the final routing and state modifications to apply to the email store. --- ## Phase 4: Verification & Testing Strategy Because email filtering can destructive (e.g., a bug causing unintended `discard` actions), implement a two-tiered testing suite: * **Unit Tests (The Engine):** Pass raw `SieveRule` Dart objects and mock emails into the execution engine to verify string matchers (`:contains`, `:is`), case insensitivity, and action outcomes. * **Integration Tests (The Parser + Engine):** Feed raw Sieve string scripts into the pipeline and assert the final `SieveExecutionContext` matches expected states. ### Example Sieve Test Case Matrix Ensure your tests cover your specific subset boundary limits: ```text IF header :contains "subject" "Spam" -> Action: discard IF header :is ["from", "reply-to"] "boss@work.com" -> Action: flag "\\Important", fileinto "Work" ```</String></String></String> ```
guettlibot commented 2026-05-16 20:57:19 +00:00 (Migrated from codeberg.org)

Implemented in commit 606958e. Added a complete Sieve transpilation pipeline in lib/core/sieve/: sealed data models (SieveCondition, SieveAction, SieveRule), SieveParser (hand-written recursive-descent, RFC 5228 subset), and SieveInterpreter. 40 unit tests pass.

Implemented in commit 606958e. Added a complete Sieve transpilation pipeline in lib/core/sieve/: sealed data models (SieveCondition, SieveAction, SieveRule), SieveParser (hand-written recursive-descent, RFC 5228 subset), and SieveInterpreter. 40 unit tests pass.
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: guettli/sharedinbox#117