Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Need to bring prod hive table data into test environment using distcp.

avatar

I need to bring prod hive table data into test hive table. Since it's a hadoop to hadoop, i can't use sqoop, hence i can use discp to transfer data across the clusters. But i have one more scenario to be handled while bringing data, that is filtering. Say i have 10 million records in prod hive table, i want to filter using some criteria and bring it to test table. is there a way to give filter parameters in distcp command on the fly? Or any other suggestions? Thanks in advance.

1 ACCEPTED SOLUTION

avatar
hide-solution

This problem has been solved!

Want to get a detailed solution you have to login/registered on the community

Register/Login
5 REPLIES 5

avatar

You can use distcp -filters to ignore few path, patterns

Refer this:

http://www.ericlin.me/how-to-use-filters-to-exclude-files-when-in-distcp

hadoop distcp -filters /path/to/filterfile.txt hdfs://source/path hdfs://destination/path

avatar

Thanks Namit Maheshwari, data i am bringing into test is hive data, i need to filter using some criteria, like where condition in hive query. distcp -filters to exclude some files right, not on the data level. I want to filter the hive data using some criteria in production, and then want to bring the filtered data into test region.

avatar
hide-solution

This problem has been solved!

Want to get a detailed solution you have to login/registered on the community

Register/Login

avatar

Thank you Bala!

avatar

@Prabhu Muthaiyan Glad that it helped you. Happy Hadooping!!