LINQ to CSV using DynamicObject by ThinqLinq

LINQ to CSV using DynamicObject

When we wrote LINQ in Action we included a sample of how to simply query against a CSV file using the following LINQ query:


From line In File.ReadAllLines(“books.csv”) 
Where Not Line.StartsWith(“#”) 
Let parts = line.Split(“,”c) 
Select Isbn = parts(0), Title = parts(1), Publisher = parts(3)

While this code does make dealing with CSV easier, it would be nicer if we could refer to our columns as if they were properties where the property name came from the header row in the CSV file, perhaps using syntax like the following:


From line In MyCsvFile
Select line.Isbn, line.Title, line.Publisher

With strongly typed (compile time) structures, it is challenging to do this when dealing with variable data structures like CSV files. One of the big enhancements that is coming with .Net 4.0 is the inclusion of Dynamic language features, including the new DynamicObject data type. In the past, working with dynamic runtime structures, we were limited to using reflection tricks to access properties that didn't actually exist. The addition of dynamic language constructs offers better ways of dispatching the call request over dynamic types. Let's see what we need to do to expose a CSV row using the new dynamic features in Visual Studio 2010.

First, let's create an object that will represent each row that we are reading. This class will inherit from the new System.Dynamic.DynamicObject base class. This will set up the base functionality to handle the dynamic dispatching for us. All we need to do is add implementation to tell the object how to fetch values based on a supplied field name. We'll implement this by taking a string representing the current row. We'll split that based on the separator (a comma). We also supply a dictionary containing the field names and their index. Given these two pieces of information, we can override the TryGetMember and TrySetMember to Get and Set the property based on the field name:


Imports System.Dynamic

Public Class DynamicCsv
    Inherits DynamicObject

    Private _fieldIndex As Dictionary(Of String, Integer)
    Private _RowValues() As String

    Friend Sub New(ByVal currentRow As String,
                   ByVal fieldIndex As Dictionary(Of String, Integer))
        _RowValues = currentRow.Split(","c)
        _fieldIndex = fieldIndex
    End Sub

    Public Overrides Function TryGetMember(ByVal binder As GetMemberBinder,
                                           ByRef result As Object) As Boolean
        If _fieldIndex.ContainsKey(binder.Name) Then
            result = _RowValues(_fieldIndex(binder.Name))
            Return True
        End If
        Return False
    End Function

    Public Overrides Function TrySetMember(ByVal binder As SetMemberBinder,
                                           ByVal value As Object) As Boolean
        If _fieldIndex.ContainsKey(binder.Name) Then
            _RowValues(_fieldIndex(binder.Name)) = value.ToString
            Return True
        End If
        Return False
    End Function
End Class

With this in place, now we just need to add a class to handle iterating over the individual rows in our CSV file. As we pointed out in our book, using File.ReadAllLines can be a significant performance bottleneck for large files. Instead we will implement a custom Enumerator. In our customer enumerable, we initialize the process with the GetEnumerator method. This method opens the stream based on the supplied filename. It also sets up our dictionary of field names based on the values in the first row. Because we keep the stream open through the lifetime of this class, we implement IDisposable to clean up the stream.

As we iterate over the results calling MoveNext, we will read each subsequent row and create a DynamicCsv instance object. We return this row as an Object (Dynamic in C#) so that we will be able to consume it as a dynamic type in .Net 4.0. Here's the implementation:


Imports System.Collections

Public Class DynamicCsvEnumerator
    Implements IEnumerator(Of Object)
    Implements IEnumerable(Of Object)

    Private _FileStream As IO.TextReader
    Private _FieldNames As Dictionary(Of String, Integer)
    Private _CurrentRow As DynamicCsv
    Private _filename As String

    Public Sub New(ByVal fileName As String)
        _filename = fileName
    End Sub

    Public Function GetEnumerator() As IEnumerator(Of Object) _
        Implements IEnumerable(Of Object).GetEnumerator

        _FileStream = New IO.StreamReader(_filename)
        Dim headerRow = _FileStream.ReadLine
        Dim fields = headerRow.Split(","c)
        _FieldNames = New Dictionary(Of String, Integer)
        For i = 0 To fields.Length - 1
            _FieldNames.Add(GetSafeFieldName(fields(i)), i)
        Next
       
        Return Me
    End Function

    Function GetSafeFieldName(ByVal input As String) As String
        Return input.Replace(" ", "_")
    End Function

    Public Function GetEnumerator1() As IEnumerator Implements IEnumerable.GetEnumerator
        Return GetEnumerator()
    End Function

    Public ReadOnly Property Current As Object Implements IEnumerator(Of Object).Current
        Get
            Return _CurrentRow
        End Get
    End Property

    Public ReadOnly Property Current1 As Object Implements IEnumerator.Current
        Get
            Return Current
        End Get
    End Property

    Public Function MoveNext() As Boolean Implements IEnumerator.MoveNext
        Dim line = _FileStream.ReadLine
        If line IsNot Nothing AndAlso line.Length > 0 Then
            _CurrentRow = New DynamicCsv(line, _FieldNames)
            Return True
        Else
            Return False
        End If
    End Function

    Public Sub Reset() Implements IEnumerator.Reset
        _FileStream.Close()
        GetEnumerator()
    End Sub

#Region "IDisposable Support"
    Private disposedValue As Boolean ' To detect redundant calls

    ' IDisposable
    Protected Overridable Sub Dispose(ByVal disposing As Boolean)
        If Not Me.disposedValue Then
            If disposing Then
                _FileStream.Dispose()
            End If
            _CurrentRow = Nothing
        End If
        Me.disposedValue = True
    End Sub

    ' This code added by Visual Basic to correctly implement the disposable pattern.
    Public Sub Dispose() Implements IDisposable.Dispose
        Dispose(True)
        GC.SuppressFinalize(Me)
    End Sub
#End Region

End Class

Now that we have our custom enumerable, we can consume it using standard dot notation by turning Option Strict Off in Visual Basic or referencing it as a Dynamic type in C#:

VB:



Public Sub OpenCsv()
    Dim data = New DynamicCsvEnumerator("C:\temp\Customers.csv")
    For Each item In data
        TestContext.WriteLine(item.CompanyName & ": " & item.Contact_Name)
    Next

End Sub

C#:


[TestMethod]
public void OpenCsvSharp()
{
    var data = new DynamicCsvEnumerator(@"C:\temp\customers.csv");
    foreach (dynamic item in data)
    {
        TestContext.WriteLine(item.CompanyName + ": " + item.Contact_Name);
    }
}

In addition, since we are exposing this as an IEnumerable, we can use all of the same LINQ operators over our custom class:

VB:


Dim query = From c In data
            Where c.City = "London"
            Order By c.CompanyName
            Select c.Contact_Name, c.CompanyName

For Each item In query
    TestContext.WriteLine(item.CompanyName & ": " & item.Contact_Name)
Next

C#:


[TestMethod]
public void LinqCsvSharp()
{
    var data = new DynamicCsvEnumerator(@"C:\temp\customers.csv");
    var query = from dynamic c in data 
                where c.City == "London"
                orderby c.CompanyName
                select new { c.Contact_Name, c.CompanyName };

    foreach (var item in query)
    {
        TestContext.WriteLine(item.CompanyName + ": " + item.Contact_Name);
    }
}

Note: This sample makes a couple assumptions about the underlying data and implementation. First, we take an extra step to translate header strings that contain spaces to replace the space with an underscore. While including spaces is legal in the csv header, it isn't legal in VB to say: " MyObject.Some Property With Spaces". Thus we'll manage this by requiring the code to access this property as follows: "MyObject.Some_Property_With_Spaces".

Second, this implementation doesn't handle strings that contain commas. Typically fields in CSV files that contain commas are wrapped by quotes (subsequently quotes are likewise escaped by double quotes). This implementation does not account for either situation. I purposely did not incorporate those details in order to focus on the use of DynamicObject in this sample. I welcome enhancement suggestions to make this more robust.

Posted on - Comment
Categories: LINQ - VB - C# - VB Dev Center - Dynamic -
comments powered by Disqus