How to use distcp editor of HUE?

Expert Contributor

Hello Everyone,

 

Has anyone tested the Distcp editor available in HUE with CDP 7.1.5?

 

Currently I see the screen shown in the attached screenshot (distcp screenshot.PNG).

 

On this screen, I'm not able to manually enter the source cluster path. As soon as I click on the source path text box, it opens a dialog box to select a path from the current cluster's HDFS.

 

Has anybody tested pulling data from another cluster? What config changes are required for getting this to work?

 

Thanks,

Megh

1 ACCEPTED SOLUTION

Expert Contributor

In my case, although the DistCp editor didn't work out, the same thing could be achieved within HUE by using the DistCp action in Oozie.

 

I designed a simple Oozie workflow with a DistCp action and managed to get it working.
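The workflow itself was not posted, so the following is only a minimal sketch of what such a definition might look like, generated with Python's standard library. The schema versions (`uri:oozie:workflow:0.5`, `uri:oozie:distcp-action:0.2`), the node names, the `${jobTracker}` and `${nameNode}` job properties, and the host names in the usage example are all assumptions, not details from this thread. The optional `oozie.launcher.mapreduce.job.hdfs-servers` property is the one the Oozie DistCp action documentation describes for copies between two secure clusters; check it against your Oozie version before relying on it.

```python
import xml.etree.ElementTree as ET

# Assumed schema versions; adjust to what your Oozie release supports.
WF_NS = "uri:oozie:workflow:0.5"
DISTCP_NS = "uri:oozie:distcp-action:0.2"


def build_distcp_workflow(name, source, target, hdfs_servers=None):
    """Return workflow.xml text containing a single DistCp action.

    source and target are fully qualified URIs. hdfs_servers is an
    optional list of NameNode URIs for copies between secure clusters.
    """
    wf = ET.Element("workflow-app", {"xmlns": WF_NS, "name": name})
    ET.SubElement(wf, "start", {"to": "distcp-node"})

    action = ET.SubElement(wf, "action", {"name": "distcp-node"})
    distcp = ET.SubElement(action, "distcp", {"xmlns": DISTCP_NS})
    # ${jobTracker} and ${nameNode} are resolved from job.properties.
    ET.SubElement(distcp, "job-tracker").text = "${jobTracker}"
    ET.SubElement(distcp, "name-node").text = "${nameNode}"

    if hdfs_servers:
        conf = ET.SubElement(distcp, "configuration")
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = (
            "oozie.launcher.mapreduce.job.hdfs-servers"
        )
        ET.SubElement(prop, "value").text = ",".join(hdfs_servers)

    # First arg is the copy source, second arg is the destination.
    ET.SubElement(distcp, "arg").text = source
    ET.SubElement(distcp, "arg").text = target

    ET.SubElement(action, "ok", {"to": "end"})
    ET.SubElement(action, "error", {"to": "fail"})

    kill = ET.SubElement(wf, "kill", {"name": "fail"})
    ET.SubElement(kill, "message").text = (
        "DistCp failed: ${wf:errorMessage(wf:lastErrorNode())}"
    )
    ET.SubElement(wf, "end", {"name": "end"})
    return ET.tostring(wf, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical NameNode addresses and paths, for illustration only.
    print(build_distcp_workflow(
        "distcp-wf",
        "hdfs://hdp-nn.example.com:8020/data/in",
        "hdfs://cdp-nn.example.com:8020/data/out",
    ))
```

The printed XML can be saved as `workflow.xml` and submitted through the HUE workflow editor or the Oozie UI, which keeps users off the terminal.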

 

Thanks,

Megh


9 REPLIES

Expert Contributor

Hello

 

The HUE DistCp editor is designed to replicate data within the cluster and/or to and from the object store.

You can click the "..." button next to the input box to see which directories your account has access to, but only within the scope of the current cluster.

 

For data replication between two clusters, use Cloudera Manager/Replication Manager

https://docs.cloudera.com/cdp/latest/data-migration/topics/rm-dc-data-replication.html

 

 

Expert Contributor

Hi @Daming Xue ,

 

Thanks for your reply.

 

The other cluster I have is HDP, so as far as I understand, using replication from CM won't work.

 

In any case, I think the actual purpose of DistCp is copying data between different clusters. The reason I'm exploring a UI-based alternative is that I don't want to give users terminal access for distcp. Is there any other possibility?

 

Thanks,

Megh

Expert Contributor

Hello @vidanimegh 

 

In that case, you probably have to run the distcp command to replicate the data 

 

Here are some examples: 

https://docs.cloudera.com/runtime/7.2.2/scaling-namespaces/topics/hdfs-using-distcp.html
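As a rough illustration of what an inter-cluster copy looks like, the sketch below assembles a `hadoop distcp` command line with fully qualified source and target URIs. The helper name, the host names, and the paths are hypothetical; `-update` and `-m` are standard DistCp options, but verify them against the documentation for your version.

```python
import shlex


def build_distcp_command(source, target, update=False, max_maps=None):
    """Return the argv list for a hadoop distcp invocation.

    Both paths must be fully qualified URIs (for example hdfs://host:port/path)
    so that each one names its own cluster explicitly.
    """
    for uri in (source, target):
        if "://" not in uri:
            raise ValueError(f"expected a fully qualified URI, got: {uri!r}")

    cmd = ["hadoop", "distcp"]
    if update:
        # Copy only files that are missing or differ at the target.
        cmd.append("-update")
    if max_maps is not None:
        # Upper bound on the number of simultaneous map tasks.
        cmd += ["-m", str(max_maps)]
    cmd += [source, target]
    return cmd


if __name__ == "__main__":
    # Hypothetical NameNode addresses, for illustration only.
    argv = build_distcp_command(
        "hdfs://hdp-nn.example.com:8020/data/in",
        "hdfs://cdp-nn.example.com:8020/data/out",
        update=True,
        max_maps=10,
    )
    print(shlex.join(argv))
```

Building the command as a list rather than a single string avoids shell quoting problems if it is later passed to `subprocess.run`.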

Expert Contributor

Hi @Daming Xue ,

 

I'm aware of the distcp command, but for that I would need to give users access to the terminal, which is something I want to avoid for security reasons. I want them to run their distcp jobs through a web UI.

 

I hope this clarifies.

 

Thanks,

Megh

Expert Contributor

Just to add on to this, how can I suggest a feature improvement to the HUE community for adding remote cluster support to the DistCp editor?

 

Thanks,

Megh

Expert Contributor

It will become tricky once Kerberos comes into the picture, especially if both clusters are secure clusters.

 

For feature requests, you can try Hue's community forum:

https://discourse.gethue.com/categories

 

If you have any connections to Hue committers, they can help create a feature request directly in Hue's internal JIRA.

Expert Contributor

Thanks for the suggestion.

 

I've created a community post here.

 

Unfortunately, I don't have any direct connections to Hue committers, but I'll wait and see if somebody provides an update on this.

 

Thanks,

Megh

Master Guru

@vidanimegh Cloudera Streams Replication Manager is one way to replicate between an HDP and a CDP cluster. The setup is a little tricky, but at the end of the day you can use Streams Replication Manager as a bridge between the two clusters and replicate the data.

https://docs.cloudera.com/csp/2.0.1/srm-overview/topics/srm-replication-overview.html

 


Cheers!
