Recently I got a chance to work on a requirement to clone an existing Hive database: create another database with the same tables and data.
This task involved three steps:
- Create new database DB2.
- Copy all data from DB1 to DB2.
- Create all tables in DB2.
Step 1: This is simple, as we can create DB2 as follows:
CREATE DATABASE <DB Name>;
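For example, to create the target database used later in this post (the COMMENT and IF NOT EXISTS clauses are optional Hive syntax, added here for illustration):

```sql
-- Create the clone database if it does not already exist.
-- By default Hive puts its directory under /user/hive/warehouse/.
CREATE DATABASE IF NOT EXISTS flight_backup
COMMENT 'Clone of the flight database';
```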
Step 2: To copy all data from DB1 to DB2, I used a Cloudera Manager snapshot. First we have to enable snapshots for the directory of the database we want to clone. Log in to Cloudera Manager.
Enable and take Snapshot
Go to HDFS > File Browser, then navigate to the path of the database you want to clone, for example /user/hive/warehouse/flight.db, and enable snapshots from the drop-down menu. Once snapshots are enabled, take a snapshot from the same menu.
Enable snapshot
Once snapshots are enabled, click Take Snapshot.
Give the snapshot a name and click OK.
Our snapshot is ready.
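If you prefer the command line over Cloudera Manager, the same enable-and-snapshot steps can be done with the standard HDFS snapshot commands (allowSnapshot requires HDFS superuser privileges; the snapshot name flight_snap1 here is just an example):

```shell
# Allow snapshots on the source database directory (HDFS superuser)
hdfs dfsadmin -allowSnapshot /user/hive/warehouse/flight.db

# Take a snapshot; it appears under the directory's .snapshot folder
hdfs dfs -createSnapshot /user/hive/warehouse/flight.db flight_snap1
```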
Restore Snapshot to new Database
In this step we will copy all data from the old database to the new one. First we create the new database.
create database flight_backup;
Now we will restore the snapshot into this database (flight_backup). Go to HDFS > File Browser > /user/hive/warehouse/flight.db and click Restore Directory From Snapshot As...
Provide the path where you want to restore it. In our case this is the new database path in HDFS, i.e. /user/hive/warehouse/flight_backup.db.
Click Restore. Once it's done, we can see that all the data has been restored to the new path.
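The restore step also has a command-line equivalent: snapshot contents are readable under the directory's .snapshot folder, so they can be copied out with distcp (the snapshot name flight_snap1 is hypothetical, and you may need to adjust paths or add -update depending on whether the target directory already exists):

```shell
# Copy the snapshot contents into the new database directory
hadoop distcp /user/hive/warehouse/flight.db/.snapshot/flight_snap1 \
              /user/hive/warehouse/flight_backup.db
```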
But when we check from Hue, we see there are no tables in this database. That is because we have only copied the data files in HDFS; the table definitions in the Hive metastore have not been migrated yet.
Step 3: Now we have to migrate/create the tables from the old database (flight). There are two ways to do this.
- Create tables manually with a statement like the following, if there are only a few tables:
CREATE TABLE NEWDB.NEW_TABLE1 AS select * from OLDDB.OLD_TABLE1;
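One caveat: CREATE TABLE ... AS SELECT does not preserve partitioning or bucketing from the source table. For partitioned tables, a sketch that keeps the layout looks like this (exact syntax varies by Hive version; older versions also require an explicit PARTITION (...) clause on the INSERT):

```sql
-- Copy the full table definition, including partitions and storage format
CREATE TABLE NEWDB.NEW_TABLE1 LIKE OLDDB.OLD_TABLE1;

-- Enable dynamic partitioning so the INSERT can fill in the partitions
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO TABLE NEWDB.NEW_TABLE1 SELECT * FROM OLDDB.OLD_TABLE1;
```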
- Use the script below if there are many tables.
#!/bin/bash
# Author: Prashant Raghava
# 1/24/2020
# This program copies all tables from the given Hive database
##############################################################
# List tables in the old database, filtering out Hive's console chatter.
HiveTables=$(hive -e "use OLD_DB_NAME; show tables;" 2>/dev/null | grep -E -v "WARN|^$|^Logging|^OK|^Time taken")

# Create each table in the new database from the old one (CTAS).
for Table in $HiveTables
do
    hive -e "CREATE TABLE NEW_DB_NAME.${Table} AS SELECT * FROM OLD_DB_NAME.${Table};"
done
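The fragile part of this script is filtering Hive's console chatter out of the show tables output. Here is a standalone sketch of just that filter, using canned sample output in place of a real hive call (the sample noise lines are made up for illustration):

```shell
# Simulated output of: hive -e "use OLD_DB_NAME; show tables;"
sample_output="Logging initialized using configuration in hive-log4j.properties
OK
flights
airports

Time taken: 1.234 seconds"

# The same filter the script uses: drop warnings, blank lines,
# logging, OK markers and timing lines, keeping only table names.
tables=$(printf '%s\n' "$sample_output" | grep -E -v "WARN|^$|^Logging|^OK|^Time taken")

printf '%s\n' "$tables"
```

Running this prints only flights and airports, which is exactly what the loop in the script iterates over.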
To execute this script, SSH to one of the Hive server hosts, make sure the hive shell works there (i.e. you can launch it on that host), and make sure your user has permission to create tables. The script will take some time to complete, depending on the number of tables in the old database. Once it's done, we can go back and check the database to verify the tables. During execution you will see the Spark jobs that run to create each table.
Let's compare the tables in the old and new databases through Hue.
Tables in flight DB | Tables in flight_backup DB
We can see that all tables have been created successfully. (There is also one temp table, which we can ignore.) All data has been populated properly in all the tables.
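For a check that goes beyond eyeballing the table lists in Hue, you can compare row counts between the two databases (a sketch, assuming the same hive shell access as the script above):

```shell
# Print a source/clone row count pair for each table in the old database
for Table in $(hive -e "use flight; show tables;" 2>/dev/null | grep -E -v "WARN|^$|^Logging|^OK|^Time taken")
do
    hive -e "SELECT 'flight.${Table}', COUNT(*) FROM flight.${Table}
             UNION ALL
             SELECT 'flight_backup.${Table}', COUNT(*) FROM flight_backup.${Table};"
done
```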
That is how we can successfully clone a database and its tables in Hive.
Please feel free to add your comments, or reach out if you have any questions.