Setup Clickhouse on Mac
Setup datawarehouse on local machine for development and testing
Introduction
In this article, we are going to look into setting up a datawarehouse(Clickhouse). This is perticularly useful for someone who is getting started with analytics,data engineering or sql. Clickhouse has good documentation and it also provides some sample datasets to explore the datawarehouse. I am going to cover the following in this article:
- Setting up Clickhouse using docker
- Create a connection with Clickhouse local instance using DBeaver
- Loading a sample dataset
prerequisite
- You should have docker desktop setup on your machine.
- Docker daemon should be up and running.
- DBeaver should be installed on your machine.
Steps to setup Clickhouse
docker pull clickhouse/clickhouse-server
- Download the ClickHouse server Docker image from the repository.mkdir ~/Documents/<User>/Volumes/clickhouse
- Create a directory in your Documents folder to store ClickHouse data and logs.cd ~/Documents/<User>/Volumes/clickhouse
- Navigate to the newly created directory.mkdir -p clickhouse_data
- Make a subdirectory to store ClickHouse data files.mkdir -p clickhouse_logs
- Make another subdirectory to store ClickHouse log files.docker run -d -v /Users/<user>/Documents/Experiment/clickhouse/clickhouse_data:/var/lib/clickhouse -v /Users/<user>/Documents/Experiment/clickhouse/clickhouse_logs:/var/log/clickhouse-server -p 8123:8123 -p 9000:9000 --name clickhouse-server --ulimit nofile=262144:262144 clickhouse/clickhouse-server
- Run the ClickHouse server in a Docker container, mapping your data and log directories to the container, and setting the necessary ports and file limits.docker ps
- Validate if the container is running.
Establish Connection in DBeaver
- Open DBeaver and Create a new connection:
- Click on
Database > New Database Connection
. - Select
ClickHouse
from the list of available database drivers. - Click
Next
.
- Click on
- Configure the Connection:
- Click on connect by
URL
option and providejdbc:clickhouse://localhost:8123/default
in the input - database:
default
- user:
default
- password: We have not provided it so leave it blank. This is for local development and testing. Not meant for production setup.
- Click on connect by
- Test the Connection:
- Click on Test Connection to ensure that the connection to the ClickHouse server is successful.
- If the connection is successful, click Finish to save the connection.
Setup Sample Data
- For importing the data, change the
socket_timeout
property underdriver properties
in dbeaver. It is available underedit connection
menu. Set the value to300000
to make sure import works as expected.
We are going to import NYC taxi data
and it is documented on the official site as well.
- Create a new sql script on DBeaver and use the following commands in the same order as below:
CREATE TABLE trips (
trip_id UInt32,
pickup_datetime DateTime,
dropoff_datetime DateTime,
pickup_longitude Nullable(Float64),
pickup_latitude Nullable(Float64),
dropoff_longitude Nullable(Float64),
dropoff_latitude Nullable(Float64),
passenger_count UInt8,
trip_distance Float32,
fare_amount Float32,
extra Float32,
tip_amount Float32,
tolls_amount Float32,
total_amount Float32,
payment_type Enum('CSH' = 1, 'CRE' = 2, 'NOC' = 3, 'DIS' = 4, 'UNK' = 5),
pickup_ntaname LowCardinality(String),
dropoff_ntaname LowCardinality(String)
)
ENGINE = MergeTree
PRIMARY KEY (pickup_datetime, dropoff_datetime);
INSERT INTO trips
SELECT
trip_id,
pickup_datetime,
dropoff_datetime,
pickup_longitude,
pickup_latitude,
dropoff_longitude,
dropoff_latitude,
passenger_count,
trip_distance,
fare_amount,
extra,
tip_amount,
tolls_amount,
total_amount,
payment_type,
pickup_ntaname,
dropoff_ntaname
FROM gcs(
'https://storage.googleapis.com/clickhouse-public-datasets/nyc-taxi/trips_{0..2}.gz',
'TabSeparatedWithNames'
);
Validate the insertion using the following query:
SELECT count()
FROM trips;
Next Steps
- You can use this datawarehouse to perform ETL.
- Ingest data from a data lake created on cloud.
- Practice SQL using any avaialble dataset.
I am going to cover few other features as i explore more. Happy learning.
Share this post
Twitter
Reddit
LinkedIn
Pinterest
Email