Created 10-19-2016 02:00 PM
Hi all,
I have a 4 TB table in the Prod cluster and I want to move it to Dev using DistCp, but I have limited space for the export in Prod.
So I want to split the table into chunks. Can anyone help me here?
I have tried the following:
export table tablename where count(customer_id)> 100000 to 'hdfs_exports_location';
export table tablename having count(customer_id)> 100000 to 'hdfs_exports_location';
export table db.tablename partition (count(sname))> "2") to 'apps/hive/warehouse';
Finally, I tried this, and it is also not working:
export table db.tablename partition (count(sname)) = "2") to 'apps/hive/warehouse'
But no use. Please suggest.
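(Note: Hive's EXPORT statement does not accept WHERE, HAVING, or aggregate expressions; it only takes a literal partition spec. A minimal sketch of the supported form, hypothetically assuming the table is partitioned on a date column named sdate:)
-- Export one partition at a time; 'sdate' is a placeholder partition column.
EXPORT TABLE db.tablename PARTITION (sdate='2016-10-01')
TO 'hdfs_exports_location/sdate_2016_10_01';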
Created 10-19-2016 05:19 PM
Here are your options:
1. You can go with Falcon Hive replication from Prod to Dev if it's the same cluster config.
2. You can just DistCp the /user/hive/warehouse from PROD to DEV, generate the 'create table' DDL statement from Hive, change the NameNode (NN) info in the DDL, and recreate the tables in DEV. You also need to generate the manual ALTER TABLE ADD PARTITION statements to get the partitions recognized (see the sketch after this list).
3. You can use Hive Replication to copy the tables.
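A minimal sketch of option 2, assuming a partitioned table db.tablename and placeholder NameNode hosts prod-nn and dev-nn (adjust hosts, ports, and paths to your clusters; 'sdate' is a hypothetical partition column):
# 1. Copy the table's warehouse directory from PROD to DEV (run on DEV).
hadoop distcp hdfs://prod-nn:8020/apps/hive/warehouse/db.db/tablename \
              hdfs://dev-nn:8020/apps/hive/warehouse/db.db/tablename
# 2. On PROD, capture the DDL; edit any LOCATION clause so it points at the
#    DEV NameNode, then run the edited DDL in DEV's Hive.
hive -e "SHOW CREATE TABLE db.tablename;" > tablename.ddl
# 3. On DEV, register the copied partitions: MSCK REPAIR discovers them all,
#    or issue one ALTER TABLE ... ADD PARTITION per partition.
hive -e "MSCK REPAIR TABLE db.tablename;"
hive -e "ALTER TABLE db.tablename ADD PARTITION (sdate='2016-10-01');"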
Created 10-20-2016 07:24 AM
Thank you so much @grajagopal
Can you expand on step 2 with an example, please?
2) You can just DistCp the /user/hive/warehouse from PROD to DEV, generate the 'create table' DDL statement from Hive, change the NameNode (NN) info in the DDL, and recreate the tables in DEV. You also need to generate the manual ALTER TABLE ADD PARTITION statements to get the partitions recognized.
Created 10-21-2016 02:30 AM
@rama
You seem to have data in Hive in Production and you want to move it to Hive in Dev as well. You could use Sqoop for a Hive <--> Hive transfer; Sqoop will parallelize the transfer for you. You just have to address the possible network separation between Production and Development. If that is a show-stopper for you, then you could export from Production Hive to pipe-delimited files (splitting could be by partition if you have any, or you could just export a reasonable number of rows from your table at a time), move them around via a jump box, and then reload them in Development.
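A minimal sketch of the pipe-delimited route, assuming customer_id is a numeric column usable for range-based chunking, /tmp/exports is a hypothetical staging path, and db.tablename_stage is a staging table with the same columns declared ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE:
-- On PROD: write one bounded chunk of rows as pipe-delimited text files.
INSERT OVERWRITE DIRECTORY '/tmp/exports/tablename_chunk_001'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
SELECT * FROM db.tablename
WHERE customer_id BETWEEN 1 AND 100000;
-- On DEV: after moving the files over, load them into the staging table
-- and then insert into the real table.
LOAD DATA INPATH '/tmp/exports/tablename_chunk_001' INTO TABLE db.tablename_stage;
INSERT INTO TABLE db.tablename SELECT * FROM db.tablename_stage;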
Created 10-21-2016 01:52 PM
Thank you @Constantin Stanca
I did it with the steps below and it works well:
# I DistCp'd the table from Prod to Dev (but the table metadata is not visible in the Dev cluster).
# I created the same table schema in Dev as the one we had created in Prod.
Then I get the table with its data.
So, is this process good, or will I face a data-loss problem?
Created 12-20-2016 03:45 PM
This is a standard approach. You could add a data count check on source and target, just to give you peace of mind.
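A minimal sketch of such a check, run against both clusters (names as used earlier in the thread):
# Run on both PROD and DEV; the two row counts should match.
hive -e "SELECT COUNT(*) FROM db.tablename;"
# A quick byte-level sanity check on the copied directory:
hdfs dfs -du -s /apps/hive/warehouse/db.db/tablename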