Pandas using jupyter notebook

April 19, 2020 Vasav



Prerequisites and basic setup

Install Anaconda on your machine
Open a command prompt
Navigate to your desired directory
Run the command jupyter notebook to start the server
In a web browser, open http://localhost:8888/ to access Jupyter Notebook

Import statements

Import pandas and numpy to get started with basic data analysis

import numpy as np
import pandas as pd

Import data visualization libraries

import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

Import a CSV file as a DataFrame

df = pd.read_csv('file.csv')  # file in the current working directory
df = pd.read_csv('/path/file.csv')  # or give the full path to the file
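
To try read_csv without a file on disk, here is a minimal sketch that reads CSV text from memory using io.StringIO. The column names and values are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for 'file.csv'
csv_text = "name,age\nAda,36\nLinus,28\n"

# read_csv accepts a path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # column names taken from the header row
```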

Get data types and other information about the dataframe

df.info()
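
Note that df.info() prints its summary and returns None. If you need the same details as values you can work with, df.dtypes and df.count() are one option. A small sketch with made-up data:

```python
import pandas as pd

# Hypothetical data with one missing value in the 'city' column
df = pd.DataFrame({"city": ["Pune", "Oslo", None], "temp": [31.5, 12.0, 18.2]})

df.info()               # prints columns, non-null counts, dtypes and memory usage

dtypes = df.dtypes      # Series mapping each column to its dtype
non_null = df.count()   # Series with the non-null count per column
print(dtypes)
print(non_null)
```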

View the first few rows of a DataFrame

df.head() # returns the first 5 rows by default
df.head(5) # same result, with the number of rows given explicitly
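
A quick sketch of how the argument to head changes the result, using a made-up ten-row DataFrame:

```python
import pandas as pd

# Hypothetical DataFrame with 10 rows
df = pd.DataFrame({"n": range(10)})

first_default = df.head()   # no argument: first 5 rows
first_three = df.head(3)    # first 3 rows
last_two = df.tail(2)       # tail is the counterpart for the last rows

print(len(first_default), len(first_three), len(last_two))
```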

Obtain value counts for a column/series

df['column name'].value_counts()
df['column name'].value_counts().head(5) # the 5 most frequent values with their counts
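
value_counts returns a Series sorted from the most frequent value to the least frequent, which is why chaining head gives the top values. A minimal sketch with made-up data:

```python
import pandas as pd

# Hypothetical column of category labels
s = pd.Series(["fire", "ems", "ems", "traffic", "ems", "fire"])

counts = s.value_counts()   # index = distinct values, values = their counts
top2 = counts.head(2)       # the two most frequent values

print(counts)
print(top2)
```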

Find the number of unique values in a series. There are two ways to do this: pass the result of unique() to the built-in len function, or call the nunique method.

count = len(df['column name'].unique())
count = df['column name'].nunique()
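
The two approaches agree unless the column has missing values: unique() keeps NaN as one of the returned values, while nunique() leaves it out by default. A small sketch with made-up data:

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value
df = pd.DataFrame({"color": ["red", "blue", "red", np.nan]})

via_len = len(df["color"].unique())            # NaN counted as a value
via_nunique = df["color"].nunique()            # NaN excluded by default
with_nan = df["color"].nunique(dropna=False)   # include NaN to match len(unique())

print(via_len, via_nunique, with_nan)
```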

Create a new column from an existing column using a lambda expression

# assume the existing column holds values like 'reason: text' and we want to extract the reason
df['new col'] = df['existing col'].apply(lambda x: x.split(':')[0])
df['new col'] # display the new column
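
Here is a runnable sketch of the same idea. The column names and values are made up for illustration. It also shows the .str accessor, which is an alternative to apply for this kind of string split.

```python
import pandas as pd

# Hypothetical data in 'reason: text' form
df = pd.DataFrame({"title": ["EMS: BACK PAIN", "Fire: GAS LEAK", "Traffic: VEHICLE ACCIDENT"]})

# apply runs the lambda on each value and keeps the part before the colon
df["reason"] = df["title"].apply(lambda x: x.split(":")[0])

# the same result using pandas string methods instead of a lambda
vectorized = df["title"].str.split(":").str[0]

print(df)
```

One practical difference: the lambda raises an error on missing values, because NaN has no split method, while the .str version passes them through as NaN.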

Data Visualization

seaborn cheat sheet

