Tuesday, December 1, 2020

How to build a data warehouse without ETL and database using Synapse serverless for structured data

(This document is intended to share an idea that is not implemented yet.)

The relational database and ETL are two significant components of traditional data warehouse development. However, we may no longer need either of them in modern data warehousing, where we apply the "lakehouse" concept and separate storage from the compute engine.

Looking at Microsoft's modern data warehousing architecture, we notice that the data lake plays the most critical role. For semi-structured or unstructured data, there is no doubt that we should land it in the data lake as needed. The first question is whether we should also extract relational data and load it into the data lake. The answer is that it depends on the architecture: if we want to move to modern data warehousing, then the relational data should move to the data lake as well. Microsoft Azure Synapse Analytics serverless is the solution to explore for querying it there. The diagram below illustrates the data flow for both batch and real-time processing according to the Lambda architecture.


Please note that Azure Synapse Link for SQL Server has been released, which changes the overall approach to real-time data warehousing.
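
As a rough illustration of the serverless idea (not an implemented solution), structured data that has landed in the lake as Parquet files could be exposed as a view in a Synapse serverless SQL pool through OPENROWSET, so that no load into a relational database is needed. The small Python helper below only generates the T-SQL text. The function name, the view name, and the storage URL are hypothetical placeholders, and the sketch assumes the files are Parquet and that the pool already has access to the storage account.

```python
def build_lake_view_sql(view_name: str, storage_url: str, schema: str = "dbo") -> str:
    """Return T-SQL that exposes Parquet files in the data lake as a view.

    Hypothetical helper for illustration only: it builds the statement text
    and does not connect to or execute anything against Synapse.
    """
    # Escape closing brackets so the names are valid bracketed identifiers.
    safe_schema = schema.replace("]", "]]")
    safe_view = view_name.replace("]", "]]")
    # Escape single quotes so the URL is a valid T-SQL string literal.
    safe_url = storage_url.replace("'", "''")
    return (
        f"CREATE VIEW [{safe_schema}].[{safe_view}] AS\n"
        "SELECT *\n"
        "FROM OPENROWSET(\n"
        f"    BULK '{safe_url}',\n"
        "    FORMAT = 'PARQUET'\n"
        ") AS [result];"
    )


if __name__ == "__main__":
    # Placeholder storage account, container, and folder (assumptions).
    print(
        build_lake_view_sql(
            "DimCustomer",
            "https://examplelake.dfs.core.windows.net/warehouse/dim_customer/*.parquet",
        )
    )
```

The generated statement would be run once in the serverless SQL pool. After that, reporting tools could query the view like an ordinary table while the data itself stays in the lake.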


