ScanRat2 1.0.0

dotnet add package ScanRat2 --version 1.0.0
NuGet\Install-Package ScanRat2 -Version 1.0.0
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.
<PackageReference Include="ScanRat2" Version="1.0.0" />
For projects that support PackageReference, copy this XML node into the project file to reference the package.
paket add ScanRat2 --version 1.0.0
#r "nuget: ScanRat2, 1.0.0"
#r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.
// Install ScanRat2 as a Cake Addin
#addin nuget:?package=ScanRat2&version=1.0.0

// Install ScanRat2 as a Cake Tool
#tool nuget:?package=ScanRat2&version=1.0.0

ScanRat2 - PEG Parser Combinators for F# with support for Left Recursion and Memoization

Nuget ScanRat2 is a mashup of the IronMeta PackRat parsing algorithm, and the concepts of the FParsec and Sprache projects.

Note: ScanRat2 is a fork of ScanRat https://github.com/pragmatrix/ScanRat, with few minor modifications. The code slightly modernized and now is Fable compatible.

Features

  • Automatic memoization of the parsing results.
  • Direct and mutual left recursion as specified in the paper Left Recursion in Parsing Expression Grammars.
  • Computation Expressions to conventiently parse sequences (inspired by Sprache's LinQ SelectMany "hack").

and what's missing:

  • RegExp support
  • Source indices must be accessible inside the result generators (define an additional production operator?).

Get Started

To use ScanRat2 in Visual Studio, install the NuGet package or clone this repository and add the ScanRat/ScanRat.fsproj as a reference to your project.

In your F# source file, add

open ScanRat.ScanRat

and start writing grammars. A grammar is specified as a collection of production rules. The rules are build from a number of combinators.

Generic Combinators

The generic combinators listed here can be applied to any parsing rule of any type.

  • + is the sequence combinator

      let twoDigits = digit + digit
    

    is a production rule that parses two digits, not one, not zero not three.

    Sequence combinators are left associative, which means that they are combined from left to right. The parsing result type of the + sequence combinator is a tuple that contains the parsing result of the left rule and the parsing result of the right production rule.

    The +. and .+ sequence combinators can be used to only process the result of the rule that is placed at the side of the dot.

  • |- is the choice combinator

      let eitherLeftOrRight = left |- right
    

    Parses either left or right. If both rules match the input, the first rule is preferred. Both rules must be of the same result type.

    The choice combinator is also left associative, but has a lower priority than the sequence combinators, which means that you can put sequences inside the choice rules without using parenthesis:

      let driverDecision = accellerate + overtake |- driveSlowly
    
  • --> is the production combinator

    --> is used to capture and convert the parsing result. It expects a function that takes the parsing result from the rule on the left side and converts its result.

    For example, when parsing a two digit number, and digit itself is a rule that returns an integer, the resulting number can be computed on the spot:

      let twoDigitNumber = digit + digit --> fun (digit1, digit2) -> digit1 * 10 + digit2
    

String Specific Combinators

The string specific combinators are optimized to handle string based input.

  • ~~ Parses a simple string. This is an unary combinator that is placed in front of a string to convert it to a parsing rule. The rule's return type is a string.

      let hello = ~~"Hello"
    

    defines a rule that parses the string "Hello".

  • oneOf Parses one character of a string. The rule's return type is a character.

      let oneOrTwo = oneOf "12"
    

    is a shorthand and optimized form of

      let oneOrTwo = (~~"1" |- ~~"2") --> fun str -> str.[0]
    

    To conveniently parse a digit and return the integer value of it, oneOf can be used like:

      let digit = oneOf "0123456789" --> fun c -> int(c) - int('0')
    

Parsing

Parsing is done by calling the parse function. Two arguments are required, the first one is the grammar and the second one is the input (a string for now):

let digit = oneOf "0123456789" --> fun c -> int(c) - int('0')
let r = parse digit "3"

The result of a parse is either Success or Failure:

match r with
| Success s -> s.Value
| Failure f -> failwith "error"

Recursive parsing grammars

Because a rule may need to be accessed before the point it has been defined, recursive rules are specified in a slightly different way:

let digit = oneOf "0123456789" --> fun d -> int(d) - int('0')
let digits = production "digits"
digits.rule
	<- digits + digit --> fun (a, b) -> a*10+b
	|- digit

Here the production function creates an initially named but empty production rule. After that, the rule's term can then be assigned to production's rule property. This makes it possible for the term to refer back to the production and - like in this example - specify a left recursive grammar that parses digits.

Parsing Sequences

Combining more than three items with the + sequence combinator may get a bit annoying, because each new item generates a new tuple that contains the previous result type as its first argument.

For longer sequences, there is an alternative which makes use of Computation Expressions:

The rule:

let addressRule = nameGrammar + streetGrammar + cityGrammar + phoneGrammar --> fun (((name, street), city), phone) -> { Name = name; Street = street; City = city; Phone = phone }

may also be specified by a much more readable and extensible:

parseq {
	let! name = nameGrammar
	let! street = streetGrammar
	let! city = cityGrammar
	let! phone = phoneGrammar
	return { Name = name; Street = street; City = city; Phone = phone }
}

Error handling

TBD

Limitations

  • If a direct left and right recursion is used in one rule, the algorithm incorrectly right-associates the parses as noted by Laurence Tratt in his paper.
  • The parsers that are built inside a computation expression can not be memoized, but the parsers they refer to, can. Therefore it's more efficient to refer to parsers from inside computation expressions instead of putting them inline.

Acknowledges

Many thanks go to

License

MIT

Product Compatible and additional computed target framework versions.
.NET net8.0 is compatible.  net8.0-android was computed.  net8.0-browser was computed.  net8.0-ios was computed.  net8.0-maccatalyst was computed.  net8.0-macos was computed.  net8.0-tvos was computed.  net8.0-windows was computed. 
Compatible target framework(s)
Included target framework(s) (in package)
Learn more about Target Frameworks and .NET Standard.

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version Downloads Last updated
1.0.0 62 6/6/2024