5 best ETL practices

 In ETL

 

It is not possible to provide a list of instructions to build ideal and perfect ETL. Not only because of complexity of a problem, but also because of  diversity. The scope of ETL can vary depending on project requirements. Every part of building the solution: creating architecture, defining sources, mappings, building validation, applying business rules is important and have it’s own specificity.

However, there are some practices, which might help you and I would like to share them with you.

 

1. Start with visualization

When you start new ETL project, typically you meet with client to determine list of sources, type of sources and then build mappings. It can be role of analyst or ETL consultant, but probably both are involved.

I would suggest you to create a visualization of the ETL process before moving to actual building. Even on very early stage of project you can take a piece of paper and draw the process (or just use professional architectural tool).

Ask yourself questions:

  • Do I know everything (from analysis perspective) I should know? If not, ascertain that business knows about this shortage.
  • Do the sources give all required information and data for target? Do you have a list of sources, dictionaries and mappings?
  • Do you know dependencies in systems ? Do you know all relations ?

 

2. Analyze

Even, if you have an analyst in your team – you should always analyze the data. Trying to find bugs in mappings, relations or data on the beginning might let you save tons of hours of work. For example, when you get first extract of source data, check it for NULLs, primary key and foreign key occurence. It might not be compatibile with what business thinks. It happens very often in medium size data systems like SAP, where you can find a lot of financial exceptions.

 

3. Choose proper ETL tool

This is a crucial step from technical point of view. I suggest you to analyze your case and if you have a comfort of choosing the ETL tool – make use of it !

For example, if client use SAP as source and want free ETL tool then Talend Open Studio might be a very good choice. On the other hand, if the source and destination is Oracle Database, on which there is a build OBIEE reporting system – it might be a good idea to propose ODI as tool. And lastly, if you have a loot of demands that will require a lot of customisation – you might think about combining Python & SQL and build your own tool. This are just example cases you might see, but every case if different and should be diagnosed carefully.

Nevertheless, I encourage you to analyze your case and possible options. It might save you a lot of time later.

 

4. Scalability

It’s not only a very trendy word, but also one of key feature good ETL should possess. Always think about it and test your solution for it. Even if your ETL is now operating on 10 000 rows daily, it might grow later to 1 000 000 rows or more. Asking business about possible growth of data is also a very good idea.

For example: when using webservices as source system, always check how they behave with bigger volums. It is a common problem with lack of performance.

 

5. ETL should serve people, not the other way

Therefore, build a proper logging. But think about it as helpful and useful information for maintenance, not the painful must-do. Building too big and too complex logging can make future harsh.

Secondly, when building alerts always check who should get them and how they will be delivered. It might be not the best option to create automatic e-mail send to 1000 people every day.

And lastly, write proper documentation. And start this task during ETL building (even just abstract), not two weeks after you finish. Try to write it as simple and as easy-to-understand as possible.

 

Of course the list is not full, this are just some of ideas. I would be very happy to discuss  with you in comments.

Mateusz
Creator & author of blog.
Recommended Posts
Showing 2 comments
  • Katya

    Nice article. Thank your for your insgihts.
    1 practise is very important and I also always try to do that.
    On the other hand 5 is what most of developers forget, that building ETL is not just “art for art’s sake” but it have to be used!
    Keep posting ! Very good quality. 🙂

  • Colton Nery

    Appreciate it for all your efforts that you have put in this. very interesting info .

Contact Me

You can contact me with this form. :)

Not readable? Change text.