Making the Best of Data Conversion

Ask most any developer what's the worst part of a software project and mostly likely you'll hear, “data conversion.” And what's most magical when working a project spec? “We're going to start fresh with new data.”

Reflecting back on the year gone by, clean or messy data migration definitely has a huge impact project success as a whole.

What makes the data migration part of a project so hard?

You look at old data on screen, in reports, or with tech specs. That's often all you get to look at before starting, and that's not enough to judge its true quality. You can't fully judge how well its going to work until you do real trials into the new system. And you can't really do that until you have that new system ready.

So here are painfully accumulated tips about data conversion. I'm writing this from the point of view of the “joint planning team”--lessons to both client and consultant/developer. And I'm not going to say which ones I have the most success sticking to. I'm doing this partly as therapy and partly as New Year's resolution. I promise to reread this myself on every new 2010 project.

Data conversion and the project plan
  • Do as much clean up as possible in the old system. The data will look familiar there, and staff who have to do it will be faster on the old.
  • Don't wait until the end. Put time in the schedule to for clean-up straight through the planning and configuration of the new. Reflect each decision about the new back to the old. Understand how you will get there and when the route can be opened.
  • It's often hard to reserve enough time for clean-up. Explore whether you can effectively use volunteers, interns, or temp staff to do routine clean up work instead of taking consultant project time to do it or burdening organizational staff beyond what is reasonable or essential for each side to do.
Data conversion and the budget
  • Here's a tough one: See if you take data conversion outside of the rest of the project estimate and work it as time and materials. That protects the consultant and focuses program manager attention on only doing as much as is necessary.
  • The fall-back choice is to allocate a specific, reasonable number of hours, say as a percent of the whole project, and agree that the consultant will do much and as best they can within that block of time, and no more.
Data conversion and "letting go"
  • A corollary to the last: Consider converting the essential top level stuff—the primary contact lists—and not all the historical details (all donation or membership history, services rendered, etc). The top level lists are often cleaner, easier to check, more essential, and less encumbered with old-to-new data structure changes.
  • For historical data you really do need, consider adding special new fields to summarize the old. For example, instead of importing an entire “trainings” history, can you live with aggregating “total workshops attended” for each contact? And remembering preferred training topics, how about just tagging the contact with “attended series A workshops” or “workshop B series”? You might be surprised how much headache this saves where the old data turns out not as clean as it now appears on screen.
Data conversion in stages
  • It frequently makes sense to plan on a data migration way station--a temporary clean-up and check-me database. It could be Excel, Access or something else that has strong query and reporting tools. Maybe all the status categories need to be remapped, or collected from three different current places. Maybe the notes field holds way more than notes and needs analysis
  • On the other hand, the new system may have fantastic clean up tools available, such as address correction and verification, or deduping. Maybe it specifically has importers for the old donor database you are moving form. Check them on sample data and use them where you can.
  • What about all the rest of the details of the old? If keeping the old system around does not incur licensing or hosting costs, why not just leave it up, frozen as an archive for historical data. Again, only convert what is really needed.
Data conversion and testing
  • Count on doing at least one trial conversion for staff to test before the day you go live. Once you have worked that into the plan, it makes sense to do it as soon as a trial version of the new software is up. Let folks test the new software with some real, existing data. It improves the testing and improves the data migration. And make sure you have put time on the calendar for that checking. Discovering gaps six months into a new system, say when quarterly or annual reports first have to run for real, really will mess things up.
  • Do have a series of benchmark queries to run on old and new system to quickly check results. For example, maybe the count of contacts by status group has to match exactly. Maybe the total dollar value of donations by year will not match 100% for this, that and the other reason. Make an agreement in advance, the new system will vary by as much as 3% earlier than the last two years.
  • Conversely, agree on a couple representative accounts you will use as a standard base line for checking the conversion. I remember one large scale donor management system with a huge master contact list mostly sparsely populated and just a handful with tons of details. Running overall lists just wouldn't cut it. Every step in the development process had to be run against one particular wealthy donor with a 28-page single spaced personal profile. Each new iteration of the software had to pass muster on him or it was back to the cubicles for us.

Data conversion documentation

  • Data conversions usually involve a lot of steps. List them out in a shared document and explain what each does. When you have a series of scripts that have to run, number them in order, like “step100_clear main contact table,” “step 200_import from spreadsheet xyz,” “step310_relink contacts table to database X_manual step,” step 315 _import from database x,” “step 400_run dedup query,” “step510_extract priority A people to excel and give to ED to check personally.” By putting that first step in there, you have already set things up for running the conversion process as a whole more than once. Get every step, manual or automated, in there. Consider counting by 10s at least so you insert more steps as you fine tune the overall process.
  • Use a shared Google doc or some other collaboratively editable document to hold the conversion steps. Even if your steps lean toward the technical, take the time to go through them collaboratively, understand what they mean, and sign off that this is the plan.

Data conversion therapy
  • Accept that data conversion has a therapeutic component. You can count on the staff person who is most impatient with the old system to miss data and procedures that don't show up exactly the same in the new. Hmm. My Tai Chi sensibility says maybe the framework is more meditation than therapy: it's all about grasping the essence firmly, remaining calm to find the path from the old to the new, and letting go of what you can.
  • And through it all, be reasonable and patient with each other. This means the organizational lead needs to look at the raw underbelly of the data and appreciate whatever is giving he consultant the most trouble. The consultant needs to sit with users and understand the consequences, workarounds, new ways of doing things if new data doesn't 100% line up with the old.
  • Data conversion sounds like such a mine field that you might ask, why would anyone ever agree to do it? Well, you often have to, to get the project at all. In addition, I have to say, data conversion does have a creative appeal. Its almost always unique. Unlocking old data is like unraveling a mystery. It requires real detective work. It requires knowledge of multiple systems, which takes time and experience to accumulate. If both sides agree to a reasonable plan, it is possible for all to find satisfaction and joy when the new reports line up with the old.

And speaking of the therapeutic and the meditative, just writing this has been both therapeutic and meditative for me. hope it helps you as well, and best wishes for the New Year.


There is another serious

There is another serious hickup to mention. The existing data represents some business process, errors in following the process, and sly bugs in the system.

If you don't have a strongly defined set of business processes, the new system will surely point that out. And the existing data will probably have some long forgotten about corner case that happens often enough to screw the code, but not frequently enough anyone remembers it while you design the new system.

So I always insist on using real live data as part of the first iteration of the development code. There is still time & budget to determine a reasonable corrective course of action.