Pandas Datetime

Manipulating datetime columns in pandas

Vasav

3 minute read

This post explains how to work with dates and times in pandas. Datasets very often contain date and time columns, and how such a column should be transformed depends on the use case.
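As a minimal sketch of the kind of transformation meant here, assume a hypothetical column named `order_date` that holds date strings (the column name and the values are made up for illustration). Parsing it with `pd.to_datetime` makes the `.dt` accessor available for deriving parts such as year, month, or weekday name:

```python
import pandas as pd

# Hypothetical sample data; the column name and values are illustrative only.
df = pd.DataFrame({"order_date": ["2021-01-15", "2021-02-20", "2021-03-05"]})

# Parse the strings into datetime64 values.
df["order_date"] = pd.to_datetime(df["order_date"])

# Derive components through the .dt accessor.
df["year"] = df["order_date"].dt.year
df["month"] = df["order_date"].dt.month
df["weekday"] = df["order_date"].dt.day_name()
```

Which derived columns are worth keeping depends on the use case; this only shows the general pattern.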

Pyspark Data Sources

Pyspark supported data sources

Vasav

13 minute read

Spark supports various data sources. Some core data sources are built into Spark, while others are available from and maintained by developers in the community. In this post, I explain the core data sources supported by pyspark.

Combine csv files python

Combine multiple csv files in python

Vasav

1 minute read

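import glob
import os

import pandas as pd

# Look for CSV files in the current working directory.
path = os.getcwd()
extension = 'csv'
output_file = "combined_file.csv"

csv_files = glob.glob(os.path.join(path, '*.{}'.format(extension)))

# Read each file, skipping the output of a previous run.
df_list = []
for file in csv_files:
    if os.path.basename(file) == output_file:
        continue
    df_list.append(pd.read_csv(file))

# Stack the frames and write them out without the index column.
pd.concat(df_list, ignore_index=True).to_csv(output_file, index=False)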

Note: To do the same with Excel files, change the value of extension (for example to 'xlsx') and use the read_excel method instead of read_csv.
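The note above can be sketched as a small reusable function that takes the extension and the reader as arguments. The function name `combine_files` and its parameters are hypothetical, introduced only for this illustration. It also skips the output file, so rerunning it does not fold an earlier result back in:

```python
import glob
import os

import pandas as pd


def combine_files(directory, extension, reader, output_name):
    """Concatenate every file with the given extension found in `directory`.

    `reader` is a pandas reader such as pd.read_csv or pd.read_excel.
    Returns the combined DataFrame and the intended output path.
    """
    pattern = os.path.join(directory, "*.{}".format(extension))
    output_path = os.path.join(directory, output_name)
    # Skip the output file so a rerun does not include a previous result.
    files = sorted(
        f for f in glob.glob(pattern)
        if os.path.abspath(f) != os.path.abspath(output_path)
    )
    combined = pd.concat([reader(f) for f in files], ignore_index=True)
    return combined, output_path


# Excel usage (pd.read_excel needs an engine such as openpyxl installed):
# combined, out = combine_files(".", "xlsx", pd.read_excel, "combined_file.xlsx")
# combined.to_excel(out, index=False)
```

For CSV files, pass `"csv"` and `pd.read_csv` and write the result with `to_csv(out, index=False)`.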