Need to bring prod hive table data into test environment using distcp.

I need to bring data from a prod Hive table into a test Hive table. Since this is a Hadoop-to-Hadoop transfer, I can't use Sqoop (which moves data between Hadoop and relational databases), so I'm using distcp to copy the data across the clusters. But there is one more requirement: filtering. Say I have 10 million records in the prod Hive table; I want to filter them using some criteria and bring only the matching records into the test table. Is there a way to pass filter parameters to the distcp command on the fly? Or any other suggestions? Thanks in advance.

Re: Need to bring prod hive table data into test environment using distcp.

You can use the distcp -filters option to exclude certain paths or path patterns from the copy.

For details, see:

http://www.ericlin.me/how-to-use-filters-to-exclude-files-when-in-distcp

hadoop distcp -filters /path/to/filterfile.txt hdfs://source/path hdfs://destination/path
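Note that this works at the file level only. To check a filter file before running the copy, a rough local preview can help. The sketch below assumes (based on my reading of Hadoop's RegexCopyFilter) that distcp skips a path when any pattern in the file matches the whole path string, which is why the patterns start with .* here. The function names and example patterns are hypothetical, and grep -E only approximates Java regex syntax, so keep the patterns simple.

```shell
#!/usr/bin/env bash
# Sketch: build a distcp -filters file and preview which paths it would exclude.
# Assumption: distcp drops a path when a pattern matches the WHOLE path string,
# so each pattern is anchored implicitly and written with a leading .*.

# Write one regex per line (example patterns are hypothetical).
write_filter_file() {
  local file="$1"
  cat > "$file" <<'EOF'
.*/_SUCCESS
.*\.Trash.*
.*/_tmp.*
EOF
}

# Read candidate paths on stdin and print only those distcp would still copy.
# -x requires a whole-line match, -v keeps the lines that match no pattern.
preview_copied_paths() {
  local file="$1"
  grep -E -x -v -f "$file"
}
```

For example, piping the output of hdfs dfs -ls -R hdfs://source/path, reduced to the path column with awk '{print $8}', into preview_copied_paths filters.txt lists the files that would still be copied.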

Re: Need to bring prod hive table data into test environment using distcp.

Thanks, Namit Maheshwari. The data I'm bringing into test is Hive table data, and I need to filter it with some criteria, like a WHERE condition in a Hive query. As I understand it, distcp -filters excludes whole files, not individual records. I want to filter the Hive data in production using some criteria and then bring only the filtered data into the test environment.

Re: Need to bring prod hive table data into test environment using distcp.

@Prabhu Muthaiyan

Filter the data in the prod Hive table with a query and write the result to a file (for example, a staging directory in HDFS). Then, as @Namit Maheshwari mentioned, use distcp to transfer it between the environments. If you just want to limit the amount of data without applying any filter, copy only a subset of the files under the table's HDFS folder.
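The two-step approach above can be sketched as a small shell script. Everything here is an illustrative assumption: the function names, table, WHERE clause, staging paths and NameNode URIs are hypothetical, the export uses the hive CLI (beeline -e would work similarly), and INSERT OVERWRITE DIRECTORY with ROW FORMAT requires Hive 0.11 or later. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: filter rows on prod with Hive, then copy the result to test with distcp.
# Hypothetical: all table names, WHERE clauses, paths and NameNode URIs.

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Print HiveQL that writes only the matching rows to an HDFS directory.
# INSERT OVERWRITE DIRECTORY replaces the directory's contents, so point it
# at a dedicated staging path.
export_query() {
  local table="$1" where="$2" staging_dir="$3"
  printf "INSERT OVERWRITE DIRECTORY '%s'\nROW FORMAT DELIMITED FIELDS TERMINATED BY ','\nSELECT * FROM %s WHERE %s;\n" \
    "$staging_dir" "$table" "$where"
}

# Step 1: filter on prod. Step 2: copy the staging directory to test.
# src_nn is the prod NameNode URI, dst_uri the target directory on test.
transfer_filtered() {
  local table="$1" where="$2" staging_dir="$3" src_nn="$4" dst_uri="$5"
  run hive -e "$(export_query "$table" "$where" "$staging_dir")" || return 1
  run hadoop distcp "${src_nn}${staging_dir}" "$dst_uri"
}

# No row-level filter: copy only selected files or partition directories.
# distcp accepts several source paths before the destination.
copy_subset() {
  local dst_uri="$1"
  shift
  run hadoop distcp "$@" "$dst_uri"
}
```

On the test cluster, the copied files could then be attached with LOAD DATA INPATH '/tmp/sales_import' INTO TABLE test_db.sales, assuming the test table is a comma-delimited text table. For an ORC or Parquet table, load into a text staging table first and INSERT ... SELECT from it.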

Re: Need to bring prod hive table data into test environment using distcp.

Thank you Bala!

Re: Need to bring prod hive table data into test environment using distcp.

@Prabhu Muthaiyan Glad that it helped you. Happy Hadooping!!
