Support Questions

Find answers, ask questions, and share your expertise

Exporting in hive , I have 4 TB of table in Prod cluster i want to move this to Dev using Dsitcp but i have less for export in Prod..

Rising Star

Hi All ,

I have 4 TB of table in Prod cluster i want to move this to Dev using Dsitcp but i have less for export in Prod..

So i want to to split the table into some chunks ..can any one help me here

i have try like this

export table tablename where count(customer_id)> 100000 to 'hdfs_exports_location';

export table tablename having count(customer_id)> 100000 to 'hdfs_exports_location';

export table db.tablename partition (count(sname))> "2") to 'apps/hive/warehouse';

and i try like this finally , its also not working

export table db.tablename partition (count(sname)) = "2") to 'apps/hive/warehouse'

But no use..please suggest me

1 ACCEPTED SOLUTION

@rama

You seem to have data in Hive in Production and you want to move it to Dev also to Hive. You could use Sqoop for Hive <-->Hive transfer. Sqoop will parallelize the transfer for you. You just have to address the possible network separation from Production to Development. If that is a show stopper for you, then you could export from Production Hive to pipe delimited files (splitting could be partition if you have any or you could just export a reasonable number of rows from your table ), move them around via a jump box and then reload those in development.

View solution in original post

5 REPLIES 5

Rising Star
@rama

Here are your options:

1. You can go with falcon Hive replication from Prod to Dev if its the same cluster config.

2. You can just distcp the /user/hive/warehouse from PROD to DEV and generate the 'create table' DDL statement from hive, change the NN info on the table and recreate them in DEV. You need to also generate the manual ALTER TABLE ADD Partition statement to get the partitions recognized.

3. You can use Hive Replication to copy the tables.

Rising Star

Thank you so much @grajagopal

Can you enhance the 2 step with example please

2)You can just distcp the /user/hive/warehouse from PROD to DEV and generate the 'create table' DDL statement from hive, change the NN info on the table and recreate them in DEV. You need to also generate the manual ALTER TABLE ADD Partition statement to get the partitions recognized.

@rama

You seem to have data in Hive in Production and you want to move it to Dev also to Hive. You could use Sqoop for Hive <-->Hive transfer. Sqoop will parallelize the transfer for you. You just have to address the possible network separation from Production to Development. If that is a show stopper for you, then you could export from Production Hive to pipe delimited files (splitting could be partition if you have any or you could just export a reasonable number of rows from your table ), move them around via a jump box and then reload those in development.

Rising Star

Thank you @Constantin Stanca

I have did by below steps and works good..

#I distcp the table from Prod to Dev (but table meta data is not visible in dev cluster)

#I created same table schema in Dev what we had created table in Prod

Then i get the table with data.

So this process is good or will face data loss problem?

Explorer

This is a standard approach. You could add a data count check on source and target, just to give you peace of mind.

Take a Tour of the Community
Don't have an account?
Your experience may be limited. Sign in to explore more.