Pandas using Jupyter Notebook#
Prerequisites and basic setup#
Install Anaconda on your machine.
Open a command prompt.
Navigate to your desired directory.
Run the command jupyter notebook to start the server.
In a web browser, open http://localhost:8888/ to access Jupyter Notebook.
Import statements#
Import pandas and numpy to get started with basic data analysis
import numpy as np
import pandas as pd
Import data visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
Import a csv file as a dataframe
df = pd.read_csv('file.csv')        # file in the current working directory
df = pd.read_csv('/path/file.csv')  # file at an explicit path
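To try the call without a file on disk, here is a minimal sketch that reads CSV text from an in-memory buffer. The column names and values are made up for illustration and are not part of the original example.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for file.csv
csv_text = """name,reason,score
alice,EMS: BACK PAINS,3
bob,Fire: GAS LEAK,5
carol,EMS: FALL VICTIM,4
"""

# read_csv accepts a path or any file-like object
sample_df = pd.read_csv(io.StringIO(csv_text))

print(sample_df.shape)   # (rows, columns)
print(sample_df.dtypes)  # inferred type of each column
```

Reading from a buffer like this is handy for quick experiments; for real data, pass the file path as shown above.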
Get data types and other information about the dataframe
df.info()
View the first few rows of a dataframe
df.head()   # returns the first 5 rows by default
df.head(5)  # same result, with the row count passed explicitly
Obtain value counts for a column/series
df['column name'].value_counts()
df['column name'].value_counts().head(5) # gives the 5 most frequent values
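As a rough illustration of what value_counts returns, the sketch below builds a small dataframe with made-up city names (an assumption for this example only) and also shows the normalize option for getting fractions instead of raw counts.

```python
import pandas as pd

# Hypothetical data for illustration
sample_df = pd.DataFrame(
    {"city": ["Pune", "Delhi", "Pune", "Mumbai", "Pune", "Delhi"]}
)

# Counts per distinct value, sorted from most to least frequent
counts = sample_df["city"].value_counts()

# The two most frequent values
top_two = counts.head(2)

# Fractions of the total instead of raw counts
shares = sample_df["city"].value_counts(normalize=True)

print(counts)
print(top_two)
print(shares)
```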
Find the number of unique values in a series. There are two ways to do this: apply the built-in len function to the result of unique, or call the nunique method.
count = len(df['column name'].unique())
count = df['column name'].nunique()
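The two approaches can disagree when the column has missing values, because unique keeps NaN as a value while nunique skips it by default. A small sketch with made-up data (the column name and values are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value
sample_df = pd.DataFrame({"color": ["red", "blue", "red", np.nan, "green"]})

# unique() keeps NaN, so it is counted as a value here
count_with_nan = len(sample_df["color"].unique())

# nunique() skips NaN by default
count_without_nan = sample_df["color"].nunique()

# Passing dropna=False makes nunique() count NaN as well
count_keep_nan = sample_df["color"].nunique(dropna=False)

print(count_with_nan, count_without_nan, count_keep_nan)
```

If the column has no missing values, both approaches give the same number.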
Create a new column from an existing column using a lambda expression
# assume the existing column holds values of the form "reason: text" and we want to extract the reason
df['new col'] = df['existing col'].apply(lambda x: x.split(':')[0])
df['new col'] # display the new column
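Here is a self-contained sketch of the same idea on made-up data; the column names and sample strings are assumptions for illustration. It also shows a vectorized alternative using the .str accessor, which leaves missing values as NaN rather than raising an error the way calling split on a non-string would.

```python
import pandas as pd

# Hypothetical column holding values shaped like "reason: text"
sample_df = pd.DataFrame(
    {"title": ["EMS: BACK PAINS", "Fire: GAS LEAK", "Traffic: VEHICLE ACCIDENT"]}
)

# Keep the part before the first colon, one row at a time
sample_df["reason"] = sample_df["title"].apply(lambda x: x.split(":")[0])

# Vectorized alternative using the .str accessor
sample_df["reason_str"] = sample_df["title"].str.split(":").str[0]

print(sample_df[["reason", "reason_str"]])
```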
Data Visualization#
seaborn cheat sheet