Pandas using Jupyter Notebook
Prerequisites and basic setup
- Install Anaconda on your machine
- Open command prompt
- Navigate to your desired directory
- Run the following command to start the server:
jupyter notebook
- In a web browser, open the following URL to access Jupyter Notebook:
http://localhost:8888/
Import statements
- Import pandas and numpy to get started with basic data analysis
import numpy as np
import pandas as pd
- Import data visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
- Import a CSV file as a dataframe
df = pd.read_csv('file.csv')  # file in the current working directory
df = pd.read_csv('/path/file.csv')  # file at an absolute path
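To try read_csv without a file on disk, here is a minimal sketch that reads CSV text from an in-memory buffer. The column names and values are made up for illustration and stand in for whatever file.csv contains.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for file.csv
csv_text = "name,city,score\nAnn,Pune,81\nBob,Delhi,75\nCara,Pune,90\n"

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)             # (number of rows, number of columns)
print(df.columns.tolist())  # column names taken from the header row
```

Swapping io.StringIO(csv_text) for a path string gives the same call shown above.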
- Get data types and other information about the dataframe
df.info()
- View the first few rows of a dataframe
df.head()  # returns the first 5 rows by default
df.head(5)  # same result, with the row count given explicitly
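As a rough illustration of info() and head() together, the sketch below builds a small dataframe by hand. The data is hypothetical; any dataframe loaded with read_csv behaves the same way.

```python
import pandas as pd

# Hypothetical sample data with 7 rows
df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune", "Mumbai", "Pune", "Delhi", "Agra"],
    "score": [81, 75, 90, 66, 72, 88, 79],
})

# Prints column names, non-null counts, dtypes and memory usage
df.info()

default_head = df.head()  # no argument: first 5 rows
first_two = df.head(2)    # explicit argument: first 2 rows

print(first_two)
```

In a notebook, df.head() is displayed automatically only when it is the last expression in a cell; otherwise wrap it in print().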
- Obtain value count for a column/series
df['column name'].value_counts()
df['column name'].value_counts().head(5)  # the 5 most frequent values and their counts
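A small sketch of value_counts() on made-up data shows why head() gives the most frequent values: the result is sorted by count in descending order. The normalize=True variant, which returns proportions instead of counts, is an optional extra not used in the original.

```python
import pandas as pd

# Hypothetical column of city names
df = pd.DataFrame({"city": ["Pune", "Delhi", "Pune", "Mumbai", "Pune", "Delhi"]})

counts = df["city"].value_counts()   # sorted from most to least frequent
top2 = counts.head(2)                # the two most frequent values

shares = df["city"].value_counts(normalize=True)  # fractions that sum to 1

print(top2)
print(shares)
```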
- Find the number of unique values in a series. There are two ways to do this: pass the result of the unique method to the built-in len function, or call the nunique method.
count = len(df['column name'].unique())
count = df['column name'].nunique()
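The two approaches can disagree when the column has missing values: unique() keeps NaN as a value, while nunique() drops it by default. A minimal sketch on a made-up series:

```python
import numpy as np
import pandas as pd

# Hypothetical series with one missing value
s = pd.Series(["a", "b", "a", np.nan])

count_len = len(s.unique())                  # counts 'a', 'b' and NaN
count_nunique = s.nunique()                  # NaN excluded by default
count_with_nan = s.nunique(dropna=False)     # NaN included

print(count_len, count_nunique, count_with_nan)
```

On columns with no missing values, both approaches return the same number.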
- Create a new column from an existing column using a lambda expression
# Assume the existing column holds values of the form 'reason: text' and we want to extract the reason.
df['new col'] = df['existing col'].apply(lambda x: x.split(':')[0])
df['new col']  # display the new column
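Here is a runnable sketch of the same idea on made-up 'reason: text' values, alongside the vectorized .str accessor as one possible alternative to apply. The sample strings and the 'new col (str)' column name are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical values in the 'reason: text' format
df = pd.DataFrame({
    "existing col": ["EMS: BACK PAINS", "Fire: GAS LEAK", "Traffic: VEHICLE ACCIDENT"],
})

# Lambda version: split each string on ':' and keep the first part
df["new col"] = df["existing col"].apply(lambda x: x.split(":")[0])

# Vectorized version: same result through the .str accessor
df["new col (str)"] = df["existing col"].str.split(":").str[0]

print(df)
```

The .str version passes missing values through as NaN, whereas the lambda would raise an AttributeError on a NaN entry, since a float has no split method.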
Data Visualization
seaborn cheat sheet