Vasav

Pandas using Jupyter Notebook

Prerequisites and basic setup

  1. Install Anaconda on your machine
  2. Open a command prompt
  3. Navigate to your desired directory
  4. Run the command jupyter notebook to start the server
  5. In a web browser, open http://localhost:8888/ to access Jupyter Notebook
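Once the notebook opens, a quick sanity check is to print the library versions from the first cell. This is a minimal sketch that assumes the Anaconda install included numpy and pandas, which is the usual case; the `versions` dictionary is only an illustrative name.

```python
# Run in a notebook cell (or any Python prompt) to check the environment.
import sys

import numpy as np
import pandas as pd

versions = {
    "python": sys.version.split()[0],
    "numpy": np.__version__,
    "pandas": pd.__version__,
}

for name, version in versions.items():
    print(f"{name}: {version}")
```

If either import fails, the notebook is probably running in a different environment from the one where Anaconda installed the packages.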

Import statements

  1. Import pandas and numpy to get started with basic data analysis
import numpy as np
import pandas as pd
  1. Import data visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
  1. Load a CSV file into a dataframe
df = pd.read_csv('file.csv') # file in the current working directory
df = pd.read_csv('/path/file.csv') # or give the full path to the file
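To try read_csv without a file on disk, the sketch below feeds it a small in-memory CSV. The column names and values are made up for illustration and stand in for whatever file.csv contains.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for file.csv.
csv_text = """name,city,score
Asha,Pune,81
Ben,Austin,
Chen,Pune,92
"""

# read_csv accepts a path or any file-like object, so StringIO works here.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (rows, columns)
print(df.dtypes)  # the empty score cell is read as NaN, so score is float64
```

The empty cell in the second row shows how a missing value changes the inferred type of a numeric column.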
  1. Get data types and other information about the dataframe
df.info()
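As a small illustration, the sketch below builds a hypothetical dataframe and calls info() on it. info() prints its summary rather than returning it, so the two related calls after it are handy when the values are needed in code.

```python
import pandas as pd

# Hypothetical data with one missing score.
df = pd.DataFrame(
    {
        "city": ["Pune", "Austin", "Pune"],
        "score": [81.0, None, 92.0],
    }
)

# Prints the index range, column names, non-null counts, dtypes and memory use.
df.info()

# Related calls that return values instead of printing.
dtypes = df.dtypes          # dtype of each column
missing = df.isna().sum()   # number of missing values per column
print(missing)
```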
  1. View the first few rows of a dataframe
df.head() # returns the first 5 rows by default
df.head(5) # same result, with the row count given explicitly
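A short sketch of how the row count argument behaves, using a made-up ten-row dataframe. tail() is the matching method for the last rows.

```python
import pandas as pd

# Ten rows of hypothetical data.
df = pd.DataFrame({"n": range(1, 11)})

first_default = df.head()   # no argument: first 5 rows
first_three = df.head(3)    # first 3 rows
last_two = df.tail(2)       # last 2 rows

print(first_three)
print(last_two)
```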
  1. Get the count of each distinct value in a column/series
df['column name'].value_counts()
df['column name'].value_counts().head(5) # the 5 most frequent values
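The sketch below runs value_counts on a hypothetical series to show its defaults: results are sorted with the most frequent value first, and missing values are left out unless dropna=False is passed.

```python
import pandas as pd

# Hypothetical column with one missing entry.
cities = pd.Series(
    ["Pune", "Austin", "Pune", None, "Oslo", "Pune", "Austin"], name="city"
)

counts = cities.value_counts()                     # missing values excluded
shares = cities.value_counts(normalize=True)       # fractions instead of counts
with_missing = cities.value_counts(dropna=False)   # missing values counted too

print(counts)
print(with_missing)
```

Chaining .head(5) onto counts then works as a "top 5" because of the default sort order.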
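  1. Count the unique values in a series. There are two ways to do this: take len() of the result of unique(), or call nunique(). The two differ when the column has missing values: unique() includes NaN in its result, while nunique() ignores NaN by default.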
count = len(df['column name'].unique())
count = df['column name'].nunique()
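The two approaches only agree when the column has no missing values. A minimal sketch with a hypothetical series that contains NaN:

```python
import numpy as np
import pandas as pd

# Two distinct labels plus missing values.
s = pd.Series(["a", "b", "a", np.nan, "b", np.nan])

via_len = len(s.unique())                   # unique() keeps NaN as one entry
via_nunique = s.nunique()                   # NaN is dropped by default
via_nunique_all = s.nunique(dropna=False)   # count NaN as a value as well

print(via_len, via_nunique, via_nunique_all)
```

Passing dropna=False to nunique() makes it match the len() approach.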
  1. Create a new column from an existing column using a lambda expression
# Assume each value looks like 'reason: text' and we want to extract the reason part.
df['new col'] = df['existing col'].apply(lambda x: x.split(':')[0])
df['new col'] # display the new column
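The lambda above raises an error if the column contains a missing value, because NaN has no split method. The sketch below shows two ways around that on made-up 'reason: text' data: guarding inside the lambda, or using the vectorized .str accessor, which passes missing values through. The column names with "(apply)" and "(str)" are only illustrative.

```python
import pandas as pd

# Hypothetical 'reason: text' values, including one missing entry.
df = pd.DataFrame(
    {"existing col": ["billing: card declined", "login: forgot password", None]}
)

# Same idea as the lambda in the text, with a guard for non-string values.
df["reason (apply)"] = df["existing col"].apply(
    lambda x: x.split(":")[0] if isinstance(x, str) else x
)

# Vectorized alternative: split each string, then take the first piece.
df["reason (str)"] = df["existing col"].str.split(":").str[0]

print(df)
```

Both columns end up with the same reasons; the .str version is usually the shorter one to write for plain string operations.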

Data Visualization

seaborn cheat sheet