Use LINQ to convert CSV to XML by ThinqLinq

Use LINQ to convert CSV to XML

File conversion is a typical programming task. VB9 should make this task easier to code and maintain as we can eliminate much of the plumbing code and focus on the task at hand. The following example uses the MAY CTP of LINQ and leverages the TextFieldParser added to VB 8 to convert a simple CSV file to XML. For this sample, we will convert a simple CSV file that contains standard windows file extensions with a brief description of the file. Here are the first couple of lines:

.doc,Microsoft Word Document
.log,Log File
.msg,Mail Message
.rtf,Rich Text Format
.txt,Text File

For this task, I want to create an XML document with a single root node and multiple child nodes with the extension and description as attributes of a FileType element as follows:

<FileTypes>
   <FileType extension=".doc" description="Microsoft Word Document" />
   <FileType extension=".log" description="Log File" />
   <FileType extension=".msg" description="Mail Message" />
   <FileType extension=".rtf" description="Rich Text Format" />
   <FileType extension=".txt" description="Text File" />
   <FileType extension=".wpd" description="WordPerfect Document" />
</FileTypes>

To accomplish this, we need to do 3 simple steps: iterate through the csv file, Add the data to our XML document, save the document to disk. First, here is the complete code. We will examine each part below.

Public Shared Sub CsvXml(ByVal sourceFile As String, ByVal destinationFile As String)

   'Create a base XML element called FileTypes
   Dim value As New XElement("FileTypes")

   'Create a new delimited file reader using VB 2005's TextFieldParser
   Using reader As New Microsoft.VisualBasic.FileIO.TextFieldParser(sourceFile)
      reader.Delimiters = New String() {","}

      'Read data from the CSV
      While Not reader.EndOfData
         Dim currentRow() As String = reader.ReadFields

         'Use XLINQ to add the child elements to the base element
         value.Add(<FileType extension=<%= currentRow(0) %> description=<%= currentRow(1) %>/>)
      End While
   End Using

   'Save it
   value.Save(destinationFile)
End Sub

With LINQ to XML changes the model for XML from a document centric model to an Element centric model, thus to begin working with XML, we simply need to create a new element

   Dim value As New XElement("FileTypes")

Next, we need to open and read the lines of the CSV file. VB 2005 added a helpful class called the TextFieldParse which takes a stream, uri, or TextReader. To be simple, we will just use a string for the uri of the file we want to open. We also need to identify the kind of delimiter the CSV file uses. In this case, we are using a comma delimited file, so we will add a "," as the array of delimiters. In addition, we will leverage the new "Using" keyword to make sure we dispose of our file once we are done with it, even if we error while iterating through the file.

   Using reader As New Microsoft.VisualBasic.FileIO.TextFieldParser(sourceFile)
      reader.Delimiters = New String() {","}

To read each line, we set up a loop to continue iterating until we are out of data. We get the actual values for each line by calling the ReadFields method of the TextFieldParser.

      'Read data from the CSV
      While Not reader.EndOfData
         Dim currentRow() As String = reader.ReadFields

Now that we have the data we want to work with we can leverage the new functional construction available in LINQ to XML to add new elements. Our element name is FileType and has two attributes--extension and description. Using the <%=  ... %> syntax familiar to ASP or CodeSmith users we can easily insert the values from our CurrentRow array of values into the appropriate locations

         'Use XLINQ to add the child elements to the base element
         value.Add(<FileType extension=<%= currentRow(0) %> description=<%= currentRow(1) %>/>)
      End While
   End Using

Perhaps the easiest part of the process is to save the resulting XML element. LINQ to XML includes a simple Save method on the element which saves the element and all of it's children with a simple call.

   value.Save(destinationFile)
End Sub

Naturally, this sample could be extended easily to be more flexible. In particular, taking a CSV file with the names of the fields in the first row and dynamically building an XML document based on this CSV would be a helpful extension which I may try to demonstrate when I have more time.

Posted on - Comment
Categories: LINQ - Linq to XML -
comments powered by Disqus