Sunday, May 12, 2013

F# Parsing XML, aiming to get to Type Providers

For a new project, I've started using some F# again. Type providers are brand new since last I used the language, so I thought I'd start out by writing my own type provider to deal with some xml apis I'll be using.

One of things I missed from F# was sequence generators. Here's a sequence generator that turns XML into a slightly different form, hopefully easier to consume from a type provider:
//#module XmlTesting
open System.Xml
let xml = """
<foo snark="boojum">
<bar>
<val t="somet" >tcontents</val>
</bar>
</foo>
"""
let r = XmlReader.Create(new System.IO.StringReader(xml))
type SimpleNode = {
name : string;
ancestors : List<SimpleNode>;
content : Option<string>;
attributes : Map<string, string>;
}
let defaultNode = { name = ""; ancestors = List.empty; content = None; attributes = Map.empty }
let xmlSeqOfAttributes (r: XmlReader) = seq {
let buildAttr() = (r.Name, r.Value)
let hasAttribute = r.MoveToFirstAttribute()
match hasAttribute with
| true ->
yield buildAttr()
while r.MoveToNextAttribute() do
yield buildAttr()
r.MoveToElement() |> ignore
| false -> ()}
let rec rdr ancestors : seq<SimpleNode> = seq {
let hasEntity = r.Read()
match hasEntity, r.NodeType with
| true, XmlNodeType.Element ->
let newNode = { name = r.Name; ancestors = ancestors; content = None; attributes = Map(xmlSeqOfAttributes(r)) }
yield! rdr (newNode::ancestors)
| true, XmlNodeType.Text ->
let content = r.ReadContentAsString()
yield { ancestors.Head with content = Some content }
yield! rdr ancestors.Tail
| true, XmlNodeType.EndElement ->
yield ancestors.Head
yield! rdr ancestors.Tail
| true, _ -> yield! rdr ancestors
| false, _ -> ()
}
and rdrStart = rdr(List.empty)
let x = "---------------------------------------------"
Seq.toList rdrStart
view raw gistfile1.fs hosted with ❤ by GitHub

Here's the output:
val it : SimpleNode list =
[{name = "val";
ancestors =
[{name = "bar";
ancestors = [{name = "foo";
ancestors = [];
content = null;
attributes = map [("snark", "boojum")];}];
content = null;
attributes = map [];}; {name = "foo";
ancestors = [];
content = null;
attributes = map [("snark", "boojum")];}];
content = Some "tcontents";
attributes = map [("t", "somet")];};
{name = "bar";
ancestors = [{name = "foo";
ancestors = [];
content = null;
attributes = map [("snark", "boojum")];}];
content = null;
attributes = map [];}; {name = "foo";
ancestors = [];
content = null;
attributes = map [("snark", "boojum")];}]
view raw gistfile1.fs hosted with ❤ by GitHub