
Renewal and automation of data warehouse infrastructure.

  • Writer: Peter Molnar
  • Jul 2, 2023
  • 2 min read

The goal of the project was to process each day's business data and make it available the next morning, and also to accelerate development, improve its quality, and increase data security.


The challenge I faced was that the client used outdated software and hardware, and because of the huge amount of data, processing took 2 days. As a result, each day's business data became available only 2 days later. The system design wasn't optimal, so cloning an environment took 1 week if there were no errors, and could take even longer if there were. Backups were sometimes corrupted, and there was no reliable way to restore in the event of a disaster.


I overcame these problems by designing a new hardware infrastructure that met our goals. Accordingly, we purchased new servers, storage systems, SAN network devices, and backup systems. We improved the HA (high availability) capability of the databases and redesigned the placement of the databases on hosts and storage in line with cloning needs. We created new procedures for building environments, replaced outdated software with modern technologies, and optimized the ETL processes. We also improved the data provision capability of the source systems and made them reliable.


The client gained the following benefits. Business decisions are better founded, and the client can react to market changes more quickly. Each day's business data is now available within 6 hours, by dawn the next day, instead of 2 days later. The development process has been greatly accelerated and its quality has improved. Data quality has also improved and the number of errors has decreased. Instead of 1-2 tests per month, they can run 8-10 tests per month, because they can build a new test environment in 3 hours instead of 1 week. Database backup completes in 2 hours instead of 1-2 days. The HA solution is stable and reliable. The monitoring system has been substantially improved: they can predict in advance whether a delay is expected in the ETL process and intervene in time.
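To illustrate the idea of predicting an ETL delay in advance, here is a minimal sketch in Python. It is not the monitoring logic used in the project; it simply assumes that we know how long the steps usually take (from historical runs) and extrapolates the finish time from the pace of the current run. The function names, parameters, and the 06:00 deadline are all hypothetical.

```python
from datetime import datetime, timedelta


def projected_finish(started_at, now, done_expected_min, total_expected_min):
    """Estimate when the ETL run will finish.

    done_expected_min:  how long the already completed steps usually take
                        (hypothetical figure taken from historical runs)
    total_expected_min: how long the whole run usually takes
    """
    if done_expected_min <= 0:
        raise ValueError("no completed steps to extrapolate from")
    elapsed_min = (now - started_at).total_seconds() / 60
    # pace > 1 means the run is slower than usual, pace < 1 means faster
    pace = elapsed_min / done_expected_min
    remaining_min = (total_expected_min - done_expected_min) * pace
    return now + timedelta(minutes=remaining_min)


def delay_expected(started_at, now, done_expected_min, total_expected_min, deadline):
    """Return True if the projected finish time is later than the deadline."""
    finish = projected_finish(started_at, now, done_expected_min, total_expected_min)
    return finish > deadline


if __name__ == "__main__":
    start = datetime(2023, 7, 2, 0, 0)
    deadline = datetime(2023, 7, 2, 6, 0)  # assumed "data ready" deadline
    # Steps that usually take 90 minutes have taken 120 minutes so far.
    now = datetime(2023, 7, 2, 2, 0)
    print(projected_finish(start, now, 90, 300))
    print(delay_expected(start, now, 90, 300, deadline))
```

A check like this can run after each ETL step, so that an alert goes out while there is still time to intervene. A real system would likely use per-step history rather than a single pace factor.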


Hidden and known issues I solved in this project: one hidden error, caused by earlier incorrect planning, was that the OLTP and OLAP systems competed with each other for storage resources. After we accelerated the OLAP system, it regularly consumed the storage system's cache and slowed down the OLTP system. We had to redesign and separate the OLAP and OLTP storage systems.


 
 
 



© 2023 DevOpsThinking | All Rights Reserved