I had some trouble reading a CSV file directly on Databricks Community Edition. After going through some articles, I finally found a workaround. Databricks does not let pandas read a CSV directly from a path, so you may encounter FileNotFoundError: [Errno 2] No such file or directory.
- Import NumPy and pandas
import numpy as np
import pandas as pd
- Import data visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
- Read a CSV file into a dataframe
df = pd.read_csv('file.csv')  # file in the current working directory
df = pd.read_csv('/path/file.csv')  # file at an absolute path
- Get data types and other information about the dataframe
df.info()
- View the first few rows of a dataframe
df.head()  # returns the first 5 rows by default
df.head(5)  # same result; pass any n to get the first n rows
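If you want to try the steps above without a file on disk, here is a minimal sketch that reads CSV text from memory instead. The sample data, the column names, and the variable names (csv_text, first_five, first_two) are made up for illustration and are not from any real dataset.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for 'file.csv'
csv_text = """name,city,score
Ana,Lisbon,91
Ben,Oslo,78
Cara,Lisbon,85
Dev,Pune,69
Eli,Oslo,88
Fay,Pune,73
"""

# read_csv accepts a path or a file-like object; StringIO avoids touching disk
df = pd.read_csv(io.StringIO(csv_text))

# info() prints column names, non-null counts, and dtypes
df.info()

# head() returns the first 5 rows by default; pass n for a different count
first_five = df.head()
first_two = df.head(2)
print(first_two)
```

Swapping io.StringIO(csv_text) for a real path gives the same calls as in the cheat sheet lines above.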
- Get value counts for a column/Series
df['column name'].value_counts()
df['column…