Parsing a Push Program

Parsing with FParsec

 

Push has a very simple program structure:

Program ::= literal | operation | ( program *)                    (1)

This is reflected in the F# code, using FParsec:                  (2)

 do pushProgramRef := choice [                 
                              pushSimple
                              pushList
                             ]
 

FParsec parses the document by combining together atomic parsers in different ways. This makes building a parser a very intuitive process. This methodology is reflected here.

pushSimple – parses “simple” Push types (integers, floats, Booleans, etc) as well as any extensions to the language.

pushList – parses lists.

Defining Recursive Parsers

The definition for pushList is simple enough:

  type PushParser<'a> = Parser<'a, unit>
 
  let ws = spaces
  let str s = pstring s

  let openList : PushParser<string> = str "(" .>> ws
  let closeList : PushParser<string> = ws >>. str ")"

  let internal pushProgram, internal pushProgramRef = createParserForwardedToRef()

  let internal listSeries = (sepEndBy pushProgram spaces) |>> PushList
  let internal pushList = between openList closeList listSeries

The problem here is that pushProgram actually needs to be defined in terms of pushList (2) above, however, pushList is defined in terms of pushProgram, per recursive formula (1). To break this circle, FParsec introduces createParserForwardedToRef() function, that lets us define pushProgram parser, before any “meat” is actually added to it. That is done later, by creating the reference cell equivalent of the pushProgram parser above (2).

Otherwise things are pretty intuitive: we define pushList as pushProgram’s separated by white-space and enclosed in brackets.

Parsing Simple Types.

Push should be completely extensible, meaning, it should be possible to implement a simple type (e.g.: URL) in any .NET language and using the hooks of the system just plug it in without recompilation. This means a mechanism is required to plug into the current parsing framework. We will return to this later.

Simple parsing is done by first defining some stock simple type parsers:

// push identifier is a regular identifer
  let internal commonIdentifier : PushParser<string> = identifier (IdentifierOptions(isAsciiIdStart = isAsciiIdStart, 
                                                                        isAsciiIdContinue = isAsciiIdContinue))

  let internal pushType = commonIdentifier >>= findType
  let internal pushOp = tuple2 (pushType .>> str ".") stringTokenNoDot >>= findOp
  let internal pushOperation = pushOp |>> Operation
 
  let nodot : PushParser<unit> = notFollowedByString "."

  // values of simple types
  let internal pushIdentifier = commonIdentifier >>= validIdentifier .>> nodot |>> createIdentifier >>= createValue

  let internal pushSimple = choice [
                                  pushSimpleTypes
                                  attempt pushIdentifier
                                  attempt pushOperation
                                  attempt pushLiteral
                                   ]


pushIdentifier is an identifer that can be used in something like CODE.DEFINE operation introduced in Push 3. Essentially, this operation defines a macro by assigning the top of CODE stack to an identifier. pushIdentifier should be:

  • A valid identifier (commonIdentifer)
  • Not a Push keyword (result piped into validIdentifier parser, that assures just that)
  • Not followed by a “.” (should be followed by the “nodot” parser).

 

pushOp a tuple of (type, operationName):is piped into the findOp parser to determine the valid operation INTEGER.+ is represented as (“INTEGER”, “+”) and the check of whether such operation exists is performed.

pushType has all the characteristics of a common identifier, except that it has to be a member of the type keywords list (INTEGER, BOOLEAN, etc), hence the result of applying commonIdentifier parser is piped into findType parser for validation.

These three parsers, combined with the pushList parser can parse expressions like:


((a B) CODE.QUOTE INTEGER.- “file.push” EXEC.SAVE)

pushLiteral is shamelessly stolen from FParsec samples:

    let stringLiteral : PushParser<string> =
        let escape =  anyOf "\"\\/bfnrt"
                      |>> function
                          | 'b' -> "\b"
                          | 'f' -> "\u000C"
                          | 'n' -> "\n"
                          | 'r' -> "\r"
                          | 't' -> "\t"
                          | c   -> string c // every other char is mapped to itself
 
        let unicodeEscape =
            str "u" >>. pipe4 hex hex hex hex (fun h3 h2 h1 h0 ->
                let hex2int c = (int c &&& 15) + (int c >>> 6)*9 // hex char to int
                (hex2int h3)*4096 + (hex2int h2)*256 + (hex2int h1)*16 + hex2int h0
                |> char |> string
            )
 
        between (str "\"") (str "\"")
                (stringsSepBy (manySatisfy (fun c -> c <> '"' && c <> '\\'))
                              (str "\\" >>. (unicodeEscape <|> escape)))


pushSimpleTypes is used to parse every other possible Push type: those described in the documentation as well as any possible extension. More about it in the next post.

 

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.