The Ultimate Guide to Databricks Associate Spark Developer Certification
How to Ace the Exam
I recently cleared the Databricks Certified Associate Developer for Apache Spark certification. In this article, I list the resources I used to prepare for the exam.
The following topics need to be studied to pass the exam:
- Spark architecture
- Adaptive Query Execution (AQE)
- DataFrame API for manipulation tasks (see the practice sketch after this list)
- selecting columns
- renaming columns
- manipulating columns
- filtering rows
- dropping rows
- sorting rows
- aggregating rows
- joining dataframes
- reading dataframes
- writing and partitioning dataframes
- working with user-defined functions (UDFs)
- Spark SQL functions
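To get hands-on with most of these topics, here is a minimal PySpark sketch. It assumes a local SparkSession and a small, made-up sales DataFrame; the column names, values, and output path are purely illustrative.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("exam-practice").getOrCreate()

# AQE is toggled through a configuration flag
spark.conf.set("spark.sql.adaptive.enabled", "true")

# Hypothetical sample data for illustration only
sales = spark.createDataFrame(
    [(1, "US", "electronics", 1200.0), (2, "DE", "books", 35.5), (3, "US", "books", 18.0)],
    ["order_id", "country", "category", "amount"],
)

# Selecting and renaming columns
df = sales.select("order_id", "country", F.col("amount").alias("revenue"))
df = df.withColumnRenamed("country", "country_code")

# Manipulating columns, filtering and dropping rows
df = df.withColumn("revenue_eur", F.col("revenue") * 0.9)
df = df.filter(F.col("revenue") > 20).dropna().dropDuplicates()

# Sorting and aggregating rows with Spark SQL functions
top = df.orderBy(F.col("revenue").desc())
totals = sales.groupBy("country").agg(F.sum("amount").alias("total_amount"))

# Joining DataFrames
joined = sales.join(totals, on="country", how="inner")

# A simple UDF (built-in functions are preferred when one exists)
shout = F.udf(lambda s: s.upper() if s else None, StringType())
labeled = sales.withColumn("category_upper", shout("category"))

# Writing with partitioning, then reading the data back
sales.write.mode("overwrite").partitionBy("country").parquet("/tmp/sales_parquet")
reloaded = spark.read.parquet("/tmp/sales_parquet")
```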
Preparations
Based on the official website, here is the distribution of the questions you can expect in the exam:
- Apache Spark Architecture Concepts – 17% (10/60)
- Apache Spark Architecture Applications – 11% (7/60)
- Apache Spark DataFrame API Applications – 72% (43/60)
As you can see from the distribution, the DataFrame API is the most important section of the exam. You need to score at least 70% to pass, i.e., answer 42 of the 60 questions correctly.
DataFrame API
As described above, this is the most important section. During the exam, you can use a PDF version of the documentation, but without search functionality. To practice the DataFrame API, first create a Databricks Community Edition account and start writing code in a notebook.
Use the documentation to understand the syntax of each API. Try different parameters to get a clearer picture of how methods behave and what arguments they expect.
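As a rough sketch of the kind of experimentation I mean, the snippet below reads the same file with inferred and explicit schemas and compares the results. It assumes the spark session that a Databricks notebook provides; the file path and schema are placeholders.

```python
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

# Read a CSV with an inferred schema (the path is a placeholder)
df_inferred = (
    spark.read
    .option("header", "true")       # treat the first line as column names
    .option("inferSchema", "true")  # let Spark scan the file and guess column types
    .csv("/FileStore/practice/sample.csv")
)

# Read the same file with an explicit schema and compare the resulting types
schema = StructType([
    StructField("name", StringType(), True),
    StructField("value", DoubleType(), True),
])
df_explicit = spark.read.csv("/FileStore/practice/sample.csv", header=True, schema=schema)

df_inferred.printSchema()
df_explicit.printSchema()
```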
I have been uploading PySpark practice code to this GitHub repo and will keep updating it with more examples.
Spark Architecture
I relied heavily on books and online resources to deepen my understanding of the Apache Spark architecture. I also used ChatGPT to clarify a few topics that remained unclear after reading the books.
Below are all the resources I used for preparation. I recommend using them in the following order:
- Books
- Online Resources
- Databricks documentation and API practice
- Mock exam provided by Databricks
- (Optional) Udemy course with mock Databricks exams
Books
- Spark – The Definitive Guide: Big data processing made simple
- Learning Spark: Lightning-Fast Data Analytics, Second Edition
Online Resources
- https://towardsdatascience.com/ultimate-pyspark-cheat-sheet-7d3938d13421
- https://medium.com/free-code-camp/deep-dive-into-spark-internals-and-architecture-f6e32045393b
- https://spark.apache.org/docs/3.0.0/api/python/pyspark.sql.html
- https://www.rakirahman.me/spark-certification-study-guide-part-1/#tasks
- https://mageswaran1989.medium.com/spark-jargon-for-starters-af1fd8117ada
- https://shrutibhawsar94.medium.com/study-guide-for-clearing-databricks-certified-associate-developer-for-apache-spark-3-0-69377dba0107
- https://www.databricks.com/session/deep-dive-into-monitoring-spark-applications-using-web-ui-and-sparklisteners
- https://selectfrom.dev/spark-performance-tuning-spill-7318363e18cb
- https://medium.com/data-arena/databricks-certified-associate-developer-for-apache-spark-tips-to-get-prepared-for-the-exam-cf947795065b
- https://towardsdatascience.com/10-mcqs-to-practice-before-your-databricks-apache-spark-3-0-developer-exam-bd886060b9ab
- https://jaceklaskowski.gitbooks.io/mastering-spark-sql/content/spark-sql-Column.html
- https://jaceklaskowski.gitbooks.io/mastering-spark-sql/content/spark-sql-bucketing.html
- https://jaceklaskowski.gitbooks.io/mastering-spark-sql/content/spark-sql-SparkSession.html?q=
- https://mungingdata.com/apache-spark/dates-times/
- https://github.com/spark-examples/pyspark-examples
- https://www.youtube.com/watch?v=d9Mt67UKSio