LINQ to CSV using DynamicObject Part 2 by ThinqLinq

LINQ to CSV using DynamicObject Part 2

In the last post, I showed how to use DynamicObject to make consuming CSV files easier. In that example, we showed how we can access CSV columns using the standard dot (.) notation that we use on other objects. Using DynamicObject, we can refer to item.CompanyName and item.Contact_Name rather than item(0) and item(1).

While I’m happy about the new syntax, I’m not content with replacing spaces with underscores as that doesn’t agree with the coding guidelines of using Pascal casing for properties. Because we have control on how the accessors work, we can modify the convention. Let’s reconsider the CSV file that we’re working with. Here’s the beginning:

CustomerID,COMPANYNAME,Contact Name,CONTACT_TITLE,Address,City,Region,PostalCode,Country,Phone,Fax
ALFKI,Alfreds Futterkiste,Maria Anders,Sales Representative,Obere Str. 57,Berlin,NULL,12209,Germany,030-0074321,030-0076545
ANATR,Ana Trujillo Emparedados y helados,Ana Trujillo,Owner,Avda. de la Constituci¢n 2222,Mexico D.F.,NULL,5021,Mexico,(5) 555-4729,(5) 555-3745
ANTON,Antonio Moreno Taqueria,Antonio Moreno,Owner,Mataderos  2312,Mexico D.F.,NULL,5023,Mexico,(5) 555-3932,NULL

Notice here that the header row contains values with a mix of mixed case, all upper, words with spaces, and underscores. To standardize this, we could parse the header and force an upper case at the beginning of each word. That would take a fair amount of parsing code. As a fan of case insensitive programming languages, I figured that if we just strip the spaces and underscores and work against the strings in a case insensitive manner, I’d be happy. In the end, we’ll be able to consume the above CSV with the following code:


Dim data = New DynamicCsvEnumerator("C:\temp\Customers.csv")
Dim query = From c In data
            Where c.City = "London"
            Order By c.CompanyName
            Select c.ContactName, c.CompanyName, c.ContactTitle

To make this change, we change how we parse the header row and the binder name when fetching properties. In our DynamicCsvEnumerator, we already isolated the parsing of the header with a GetSafeFieldName method. Previously we simply returned the input value replacing a space with an underscore. Extending this is trivial:


    Function GetSafeFieldName(ByVal input As String) As String
        'Return input.Replace(" ", "_")
        Return input.
            Replace(" ", "").
            Replace("_", "").
            ToUpperInvariant()
    End Function

That's it for setting up the header parsing changes. We don't need to worry about spaces in the incoming property accessor because it's not legal to use spaces in a method name. I'll also assume that the programmer won't use underscores in the method names by convention. Thus, the only change we need to make in our property accessor is to uppercase the incoming field name to handle the case insensitivity feature. Here's the revised TryGetMember implementation.


    Public Overrides Function TryGetMember(ByVal binder As GetMemberBinder,
                                           ByRef result As Object) As Boolean
        Dim fieldName = binder.Name.ToUpperInvariant()
        If _fieldIndex.ContainsKey(fieldName) Then
            result = _RowValues(_fieldIndex(fieldName))
            Return True
        End If
        Return False
    End Function

All we do is force the field name to upper case and then we can look it up in the dictionary of field indexes that we setup last time. Simple yet effective.

Posted on - Comment
Categories: LINQ - VB Dev Center - Visual Studio - Dynamic -
comments powered by Disqus