Pandas using Jupyter Notebook

Basics of Python pandas to get started with the library and Jupyter Notebook

October 30, 2023 · 1 min · Vasav Anandjiwala

Prerequisites and basic setup

Install Anaconda on your machine
Open a command prompt
Navigate to your desired directory
Run the command jupyter notebook to start the server
In a web browser, open http://localhost:8888/ to access Jupyter Notebook

Import statements

Import pandas and NumPy to get started with basic data analysis

import numpy as np
import pandas as pd
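A quick sanity check in the first cell confirms that both libraries import correctly in the notebook's kernel and shows which versions Anaconda installed. This is a minimal sketch; the helper name library_versions is hypothetical and only used for illustration.

```python
import numpy as np
import pandas as pd

def library_versions():
    """Return the installed pandas and NumPy versions (hypothetical helper)."""
    return {"pandas": pd.__version__, "numpy": np.__version__}

print(library_versions())
```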

Import data visualization libraries

import matplotlib.pyplot as plt
import seaborn as sns
# Jupyter magic: render matplotlib/seaborn plots directly below the cell
%matplotlib inline

Read a CSV file into a DataFrame

df = pd.read_csv('file.csv')        # path relative to the notebook's working directory
df = pd.read_csv('/path/file.csv')  # absolute path to the file
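To experiment with read_csv before you have a file on disk, note that it also accepts any file-like object. The sketch below feeds it an in-memory io.StringIO buffer; the sample rows and the name and reason columns are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for 'file.csv'
csv_text = """name,reason
Alice,Travel: business trip
Bob,Health: checkup
Carol,Travel: vacation
"""

# read_csv accepts file-like objects, so StringIO behaves like a file on disk
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```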

Get data types and other information about the DataFrame

df.info()  # prints the index range, column names, non-null counts, dtypes and memory usage
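df.info() only prints its summary. When you need the same information as values you can work with, the dtypes and shape attributes return it directly. The small DataFrame below is hypothetical sample data.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "age": [34, 28, 45],
    "score": [88.5, 92.0, 79.5],
})

df.info()         # printed summary: columns, non-null counts, dtypes
print(df.dtypes)  # Series mapping each column name to its dtype
print(df.shape)   # tuple: (number of rows, number of columns)
```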

View the first few rows of a DataFrame

df.head()   # returns the first 5 rows (the default)
df.head(5)  # the same, with the number of rows given explicitly
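head takes any row count, and its counterpart tail returns rows from the end. Because Jupyter only displays the last expression in a cell, the sketch below uses print to show several results at once; the single-column DataFrame is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"value": range(10)})

print(df.head())   # first 5 rows (default n=5)
print(df.head(3))  # first 3 rows
print(df.tail(2))  # last 2 rows
```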

Obtain value counts for a column (Series)

df['column name'].value_counts()
df['column name'].value_counts().head(5)  # the 5 most frequent values, since results are sorted by count in descending order
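A small example makes the ordering visible, and value_counts(normalize=True) returns relative frequencies instead of raw counts. The Series values here are hypothetical.

```python
import pandas as pd

s = pd.Series(["Travel", "Health", "Travel", "Work", "Travel", "Health"])

counts = s.value_counts()                # most frequent value first
shares = s.value_counts(normalize=True)  # fractions that sum to 1

print(counts.head(2))
print(shares.round(2))
```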

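Find the number of unique values in a Series. There are two ways to do this: apply Python's built-in len function to the array returned by unique, or call the nunique method.

count = len(df['column name'].unique())  # NaN, if present, is counted as a value
count = df['column name'].nunique()      # NaN is excluded by default (dropna=True)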
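The two approaches only agree when the column has no missing values. The sketch below uses a hypothetical Series containing one NaN to show the difference, and how nunique(dropna=False) matches the len approach.

```python
import numpy as np
import pandas as pd

s = pd.Series(["a", "b", "a", np.nan])

print(len(s.unique()))          # unique() keeps NaN, so it is counted
print(s.nunique())              # NaN dropped by default
print(s.nunique(dropna=False))  # NaN counted as its own value
```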

Create a new column from an existing column using a lambda expression

# assume values look like 'reason: text' and we want to extract the reason part
df['new col'] = df['existing col'].apply(lambda x: x.split(':')[0])
df['new col']  # as the last expression in a cell, Jupyter displays the new column
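As an alternative to apply with a lambda, pandas' vectorized string methods can do the same extraction: .str.split splits every value, and .str[0] picks the first piece of each result. Unlike x.split, which raises an error on a float NaN, these methods return NaN for missing values. The column names and sample values below are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "existing col": ["Travel: business trip", "Health: checkup", "Work", None],
})

# Split each value on ':' and keep the text before the first colon;
# values without a colon are kept whole, missing values stay missing
df["new col"] = df["existing col"].str.split(":").str[0]
print(df["new col"])
```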

Data Visualization

seaborn cheat sheet
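As a first seaborn plot, a count plot pairs naturally with value_counts. The sketch below assumes the imports from earlier in this post; plot_top_values is a hypothetical helper that draws bars only for the n most frequent categories, and the reason column is sample data. With %matplotlib inline, the figure appears below the cell.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

def plot_top_values(df, column, n=5):
    """Bar chart of the n most frequent values in `column` (hypothetical helper)."""
    top = df[column].value_counts().head(n).index.tolist()  # most frequent first
    _, ax = plt.subplots(figsize=(6, 4))
    sns.countplot(data=df, x=column, order=top, ax=ax)  # bars follow the order of `top`
    ax.set_title(f"Top {n} values of {column}")
    return ax

df = pd.DataFrame({"reason": ["Travel", "Health", "Travel", "Work", "Travel", "Health"]})
ax = plot_top_values(df, "reason", n=2)
```

Passing order both limits the plot to the chosen categories and sorts the bars by frequency, which seaborn would not do on its own.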