Create Pandas Dataframe on Databricks
Workaround to read csv from DBFS using pandas
Introduction
I had some issue reading a csv directly on the databricks community edition. So after going through some articles, I finally found the workaround. Databricks has disabled to use csv directly for pandas as you may encounter FileNotFoundError: [Errno 2] No such file or directory:
.
Workaround
You need to copy the file from dbfs
to a temp
directory and after that you should be able to create a pandas dataframe. Here is a code snippet for the same.
dbutils.fs.cp("/FileStore/tables/games/vgsales.csv", "file:///tmp/vgsales.csv")
pandas_df = pd.read_csv("file:///tmp/vgsales.csv")
cp
command will copy the file from dbfs
to temp
and then you can leverage pandas on the databricks platform.
Share this post
Twitter
Reddit
LinkedIn
Pinterest
Email