Parsing with FParsec
Push has a very simple program structure:
Program ::= literal | operation | ( program *) (1)
This is reflected in the F# code, using FParsec: (2)
do pushProgramRef := choice [ pushSimple pushList ]
FParsec parses the document by combining together atomic parsers in different ways. This makes building a parser a very intuitive process. This methodology is reflected here.
pushSimple – parses “simple” Push types (integers, floats, Booleans, etc) as well as any extensions to the language.
pushList – parses lists.
Defining Recursive Parsers
The definition for pushList is simple enough:
type PushParser<'a> = Parser<'a, unit> let ws = spaces let str s = pstring s let openList : PushParser<string> = str "(" .>> ws let closeList : PushParser<string> = ws >>. str ")" let internal pushProgram, internal pushProgramRef = createParserForwardedToRef() let internal listSeries = (sepEndBy pushProgram spaces) |>> PushList let internal pushList = between openList closeList listSeries
The problem here is that pushProgram actually needs to be defined in terms of pushList (2) above, however, pushList is defined in terms of pushProgram, per recursive formula (1). To break this circle, FParsec introduces createParserForwardedToRef() function, that lets us define pushProgram parser, before any “meat” is actually added to it. That is done later, by creating the reference cell equivalent of the pushProgram parser above (2).
Otherwise things are pretty intuitive: we define pushList as pushProgram’s separated by white-space and enclosed in brackets.
Parsing Simple Types.
Push should be completely extensible, meaning, it should be possible to implement a simple type (e.g.: URL) in any .NET language and using the hooks of the system just plug it in without recompilation. This means a mechanism is required to plug into the current parsing framework. We will return to this later.
Simple parsing is done by first defining some stock simple type parsers:
// push identifier is a regular identifer let internal commonIdentifier : PushParser<string> = identifier (IdentifierOptions(isAsciiIdStart = isAsciiIdStart, isAsciiIdContinue = isAsciiIdContinue)) let internal pushType = commonIdentifier >>= findType let internal pushOp = tuple2 (pushType .>> str ".") stringTokenNoDot >>= findOp let internal pushOperation = pushOp |>> Operation let nodot : PushParser<unit> = notFollowedByString "." // values of simple types let internal pushIdentifier = commonIdentifier >>= validIdentifier .>> nodot |>> createIdentifier >>= createValue let internal pushSimple = choice [ pushSimpleTypes attempt pushIdentifier attempt pushOperation attempt pushLiteral ]
pushIdentifier is an identifer that can be used in something like CODE.DEFINE operation introduced in Push 3. Essentially, this operation defines a macro by assigning the top of CODE stack to an identifier. pushIdentifier should be:
- A valid identifier (commonIdentifer)
- Not a Push keyword (result piped into validIdentifier parser, that assures just that)
- Not followed by a “.” (should be followed by the “nodot” parser).
pushOp a tuple of (type, operationName):is piped into the findOp parser to determine the valid operation INTEGER.+ is represented as (“INTEGER”, “+”) and the check of whether such operation exists is performed.
pushType has all the characteristics of a common identifier, except that it has to be a member of the type keywords list (INTEGER, BOOLEAN, etc), hence the result of applying commonIdentifier parser is piped into findType parser for validation.
These three parsers, combined with the pushList parser can parse expressions like:
((a B) CODE.QUOTE INTEGER.- “file.push” EXEC.SAVE)
pushLiteral is shamelessly stolen from FParsec samples:
let stringLiteral : PushParser<string> = let escape = anyOf "\"\\/bfnrt" |>> function | 'b' -> "\b" | 'f' -> "\u000C" | 'n' -> "\n" | 'r' -> "\r" | 't' -> "\t" | c -> string c // every other char is mapped to itself let unicodeEscape = str "u" >>. pipe4 hex hex hex hex (fun h3 h2 h1 h0 -> let hex2int c = (int c &&& 15) + (int c >>> 6)*9 // hex char to int (hex2int h3)*4096 + (hex2int h2)*256 + (hex2int h1)*16 + hex2int h0 |> char |> string ) between (str "\"") (str "\"") (stringsSepBy (manySatisfy (fun c -> c <> '"' && c <> '\\')) (str "\\" >>. (unicodeEscape <|> escape)))
pushSimpleTypes is used to parse every other possible Push type: those described in the documentation as well as any possible extension. More about it in the next post.