In this article, we are going to look into setting up a data warehouse (ClickHouse). This is particularly useful for someone who is getting started with analytics, data engineering, or SQL. ClickHouse has good documentation, and it also provides some sample datasets for exploring the data warehouse. I am going to cover the following in this article:
Efficient data organization and management are crucial for reducing maintenance effort and improving data usability. In this article, part of my Delta Lake series, I will introduce you to the Medallion Architecture, a data design pattern used to structure data in a lakehouse environment. The pattern arranges data into successive tiers (typically called bronze, silver, and gold), running from raw ingestion to downstream consumption, so that data is refined in a structured, progressive manner.
In the big data ecosystem, data keeps growing. It is important to manage it efficiently, both to meet performance requirements and to control the cost of storing large amounts of data. These days, most data, especially data used for analytics and machine learning, resides on cloud infrastructure from providers such as AWS, Azure, or GCP. As the data grows, it is important to have a solid understanding of optimization to help data teams work efficiently with data with better…