Archive

Archive for the ‘FParsec’ Category

FParsec: Creating a Parser Dynamically

January 27, 2012 Leave a comment

In one of the previous posts the following definition for parsing simple Push types was discussed:

let internal pushSimple = choice [
                                   pushSimpleTypes
                                   attempt pushIdentifier
                                   attempt pushOperation
                                   attempt pushLiteral
                                 ]

The most noteworthy entry here is pushSimpleTypes. This parser is created dynamically from internally implemented Push types (INTEGER, FLOAT, etc) and may be dynamically extended if it becomes necessary to parse a new type.

The dynamic parser is built by exposing a delegate type:

// override this delegate to parse extended types
ExtendedTypeParser = delegate of string -> PushTypeBase</pre>

This delegate, implemented by a developer simply takes a string and returns an object of a type, derived from PushTypeBase. More about this type in later posts.

The delegate does not have to be implemented if a type does not provide its own parser. This is the case with CODE, EXEC, and NAME types. When a type is implemented, the delegate is bound to the Parser property of the object that implements the given type. This property may have a value of Unchecked.defaultof<ExtendedTypeParser>, or null in C#.

Otherwise the property returns the actual parsing function. Here is the code in the case of FLOAT type.

    // custom parsing
    static member parse s =
        let result = ref Unchecked.defaultof<float>
        if not (System.Double.TryParse(s, result)) 
        then 
            Unchecked.defaultof<PushTypeBase> 
        else 
            new Float(!result) :> PushTypeBase

    override t.Parser 
        with get() = 
            ExtendedTypeParser(Float.parse)

Given this code, we can now create an FParsec type of parser dynamically:

let internal pushExtended (dlgt : ExtendedTypeParser) = attempt (stringToken >>= (dlgt.Invoke >> createValue))

This parser first parses a string token and then pipes it into a function that is actually a composition of two functions: one is the delegate doing the parsing and converting the token to a value of PushTypeBase derivative type, the result of which is taken and converted into a parser of type PushParser<Push>, where Push is a representation of the simple object plugged into the AST that the parser produces. createValue  has a familiar signature:

    let createValue (value : #PushTypeBase) =
        fun stream ->
            let mutable reply = new Reply<Push>()
            if Unchecked.defaultof<#PushTypeBase> = value
            then 
                reply.Status <- Error
                reply.Error <- messageError("Delegate parser returned null")
            else
                reply.Status <- Ok
                reply.Result <- Value(value)
            reply    

Now for the actual parsers, we first discover all of our extended types and collect them. We then look at the collection and for each of the type gather their Parser properties. This collection of parsers is then filtered and an FParsec type parser is created for each of the entries.

    // takes the map of types, returns the parsers for these types
    let discoverParsers =
        stockTypes.Types
        |> Map.map (fun key value -> 
            value.GetType().GetProperties(BindingFlags.Public ||| BindingFlags.NonPublic ||| BindingFlags.Instance)
            |> Array.find (fun p -> p.PropertyType = typeof<ExtendedTypeParser>))
    

    let createLiteral (s : string) = fst (createPushObject (stockTypes.Types.["LITERAL"].GetType()) [| s.Trim([|'\"'|]) |])

    let internal pushExtended (dlgt : ExtendedTypeParser) = attempt (stringToken >>= (dlgt.Invoke >> createValue))
    let internal pushLiteral = stringLiteral |>> createLiteral >>= createValue
     
    // dynamically create the list of simple type parsers
    // also, filter out types that do not implement a parser
    let pushSimpleTypes =
        let parsers = 
            discoverParsers 
            |> Map.fold(fun lst key value -> value.GetValue(stockTypes.Types.[key], null) :?> ExtendedTypeParser :: lst) List.empty
            |> List.filter(fun callback -> not (callback = Unchecked.defaultof<ExtendedTypeParser>))
            |> List.map(fun callback -> pushExtended callback)
        choice parsers

pushSimpleTypes first gets all of the values of Parser properties of known Push types (internal as well as possibly additional), then casts them to the ExtendedParserTypedelegate type, filters out all those elements of the new list that are not set to anything meaningful (not (callback = Unchecked.defaultof<ExtendedTypeParser>) and creates a list of these delgates. In the third step this list is converted to a list of pushExtended type FParsec parses.

Here is a C# example from the test code of extending the Push engine with a parser that parses URLs. An URL has to start with “http:”

       static UrlPushType UrlParse(string url)
        {
            try
            {
                if (!url.Trim().StartsWith("http://", true, CultureInfo.InvariantCulture))
                {
                    return null;
                }

                Uri uri = new Uri(url);
                if (uri.HostNameType != UriHostNameType.Dns)
                {
                    return null;
                }

                return new UrlPushType(uri);

            }
            catch (Exception)
            {
                return null;                
            }
        }

        public override Type.ExtendedTypeParser Parser
        {
            get 
            {
                return new Type.ExtendedTypeParser(UrlParse);
            }
        }
Categories: F#, FParsec

FParsec: Parsing a List of Dynamic Keywords

January 25, 2012 Leave a comment

In the previous post we mentioned code that parses Push types and Push operations:

let internal pushType = commonIdentifier >>= findType

Tokens for Push types are a dynamic collection of keywords. “Dynamic” in a sense that a developer might add to it when extending the language. We therefore need to create a parser that would recognize tokens from a dynamic set.

Since Parser<‘a, ‘u> type is just an abbreviation for CharStream<‘u> -> Reply<‘a> (a function that takes CharStream and returns a Reply, more on that in this article), we can simply write:

    let findType t = 
        fun stream ->
            let mutable reply = new Reply<string>()
            match t with
            | FindType res -> 
                reply.Status <- Ok
                reply.Result <- res
            | _ -> 
                reply.Status <- Error
                reply.Error <- messageError("Unknown type: " + t)
            reply

We use F# active patterns to wrap the logic for finding the right type in the collection:

    let (|FindType|_|) t =
        match stockTypes.Types.TryFind(t) with
        | Some s -> Some t
        | None -> None

This looks slightly weird, but all we need to do is find a particular key in the F# map: stockTypes.Types is defined as a Map<string, PushTypeBase>. It is a map of type names to their representations as objects.

We then use the FParsec “>>=” operator, that takes the result of its left side, applies a function on its right side and returns a parser, returned by that function.

Categories: F#, FParsec

Parsing a Push Program

January 24, 2012 Leave a comment

Parsing with FParsec

 

Push has a very simple program structure:

Program ::= literal | operation | ( program *)                    (1)

This is reflected in the F# code, using FParsec:                  (2)

 do pushProgramRef := choice [                 
                              pushSimple
                              pushList
                             ]
 

FParsec parses the document by combining together atomic parsers in different ways. This makes building a parser a very intuitive process. This methodology is reflected here.

pushSimple – parses “simple” Push types (integers, floats, Booleans, etc) as well as any extensions to the language.

pushList – parses lists.

Defining Recursive Parsers

The definition for pushList is simple enough:

  type PushParser<'a> = Parser<'a, unit>
 
  let ws = spaces
  let str s = pstring s

  let openList : PushParser<string> = str "(" .>> ws
  let closeList : PushParser<string> = ws >>. str ")"

  let internal pushProgram, internal pushProgramRef = createParserForwardedToRef()

  let internal listSeries = (sepEndBy pushProgram spaces) |>> PushList
  let internal pushList = between openList closeList listSeries

The problem here is that pushProgram actually needs to be defined in terms of pushList (2) above, however, pushList is defined in terms of pushProgram, per recursive formula (1). To break this circle, FParsec introduces createParserForwardedToRef() function, that lets us define pushProgram parser, before any “meat” is actually added to it. That is done later, by creating the reference cell equivalent of the pushProgram parser above (2).

Otherwise things are pretty intuitive: we define pushList as pushProgram’s separated by white-space and enclosed in brackets.

Parsing Simple Types.

Push should be completely extensible, meaning, it should be possible to implement a simple type (e.g.: URL) in any .NET language and using the hooks of the system just plug it in without recompilation. This means a mechanism is required to plug into the current parsing framework. We will return to this later.

Simple parsing is done by first defining some stock simple type parsers:

// push identifier is a regular identifer
  let internal commonIdentifier : PushParser<string> = identifier (IdentifierOptions(isAsciiIdStart = isAsciiIdStart, 
                                                                        isAsciiIdContinue = isAsciiIdContinue))

  let internal pushType = commonIdentifier >>= findType
  let internal pushOp = tuple2 (pushType .>> str ".") stringTokenNoDot >>= findOp
  let internal pushOperation = pushOp |>> Operation
 
  let nodot : PushParser<unit> = notFollowedByString "."

  // values of simple types
  let internal pushIdentifier = commonIdentifier >>= validIdentifier .>> nodot |>> createIdentifier >>= createValue

  let internal pushSimple = choice [
                                  pushSimpleTypes
                                  attempt pushIdentifier
                                  attempt pushOperation
                                  attempt pushLiteral
                                   ]


pushIdentifier is an identifer that can be used in something like CODE.DEFINE operation introduced in Push 3. Essentially, this operation defines a macro by assigning the top of CODE stack to an identifier. pushIdentifier should be:

  • A valid identifier (commonIdentifer)
  • Not a Push keyword (result piped into validIdentifier parser, that assures just that)
  • Not followed by a “.” (should be followed by the “nodot” parser).

 

pushOp a tuple of (type, operationName):is piped into the findOp parser to determine the valid operation INTEGER.+ is represented as (“INTEGER”, “+”) and the check of whether such operation exists is performed.

pushType has all the characteristics of a common identifier, except that it has to be a member of the type keywords list (INTEGER, BOOLEAN, etc), hence the result of applying commonIdentifier parser is piped into findType parser for validation.

These three parsers, combined with the pushList parser can parse expressions like:


((a B) CODE.QUOTE INTEGER.- “file.push” EXEC.SAVE)

pushLiteral is shamelessly stolen from FParsec samples:

    let stringLiteral : PushParser<string> =
        let escape =  anyOf "\"\\/bfnrt"
                      |>> function
                          | 'b' -> "\b"
                          | 'f' -> "\u000C"
                          | 'n' -> "\n"
                          | 'r' -> "\r"
                          | 't' -> "\t"
                          | c   -> string c // every other char is mapped to itself
 
        let unicodeEscape =
            str "u" >>. pipe4 hex hex hex hex (fun h3 h2 h1 h0 ->
                let hex2int c = (int c &&& 15) + (int c >>> 6)*9 // hex char to int
                (hex2int h3)*4096 + (hex2int h2)*256 + (hex2int h1)*16 + hex2int h0
                |> char |> string
            )
 
        between (str "\"") (str "\"")
                (stringsSepBy (manySatisfy (fun c -> c <> '"' && c <> '\\'))
                              (str "\\" >>. (unicodeEscape <|> escape)))


pushSimpleTypes is used to parse every other possible Push type: those described in the documentation as well as any possible extension. More about it in the next post.

 

Categories: F#, FParsec