40FINGERS DNN Nederlands40FINGERS DNN English

Our Blog

Menu: Blog

Data conversion with Expresso and regular expressions

By: Timo Breumelhof

I often get data from clients in Excel or XML format, which I have to convert to some kind of XML like format. Mostly this is a one time conversion for which a custom solution is not viable.

The target data is the XML to import into a module or a template with a specific structure.

You can use database tools to do this kind of conversions import to a database use SQL etc. or XSLT, but I tend to use Expresso more and more in this scenario.

Expresso is a free tool to generate regular expressions.
Regular expressions are used for advanced text search and replacements.
It took me a while to understand regular expressions (and I'm still no expert) but if you get to know them they are very powerful. More information on regular expressions: http://www.regular-expressions.info/

While Expresso is normally used to develop and test regular expressions, I'm using it a lot as an advanced text replacement tool. (My other favorite text tool (Notepad++) has some regex support too, but not like Expresso)

An example I worked on today.

The data (in Excel) I got was this:

    FirstName, MiddleName, LastName (250 records)

I had to convert this to a format like this (simplified):

    <content id="#" ...................................>
        ............<FN>Firstname</FN><MN>Middlename</MN><LN>Lastname</LN>..................
    </content>

This is what I did:

1. Add an id column to the Excel sheet
2. Export the Excel sheet to CVS
3. Copy the Exported text to the "Sample text" field in Expresso
4. I used this regular expression:
    (.*?),(.*?),(.*?),(.*?)\n
5. My replacement String was:
    <content id="$1" ...................................>
      ............<FN>$2</FN><MN>$3</MN><LN>$4</LN>..................
    </content>

This regular expression searches for the text between the commas and found text between () can be referenced in the replacement string using $n.

n being the order in which the text was detected.

This all took me about 2 minutes, which is much faster then any other tool I can think of..

Categories: General