spark

Medallion Architecture

Lakehouse data design pattern

Vasav

3 minute read

Efficient data organization and management are crucial for reducing maintenance effort and improving data usability. In this article, part of my Delta Lake series, I will introduce you to the Medallion Architecture, a data design pattern used to structure data in a lakehouse environment. This framework defines a series of tiers, from data ingestion through to downstream consumption, so that data is organized in a structured, progressive manner.

Databricks Delta Lake

Optimize Delta Lake tables

Vasav

5 minute read

In the big data ecosystem, data keeps growing. It is important to manage data efficiently, both to meet performance requirements and to control the cost of storing large amounts of data. Most data, especially for analytics and machine learning, now resides on cloud infrastructure from providers such as AWS, Azure, or GCP. As the data grows, it is important to have a solid understanding of optimization to help data teams work efficiently with data with better…

how to install apache spark on windows

Setting up Apache Spark on Windows

Vasav

4 minute read

For a Spark developer, setting up Apache Spark on Windows is a crucial first step toward building scalable and efficient big data applications. In this blog, we will walk you through the setup process so you can start using Spark to process large datasets and build data-intensive applications.