Created 04-09-2021 01:26 AM
Hello Everyone,
Has anyone tested the Distcp editor available in HUE with CDP 7.1.5?
Currently, I see the screen below:
On this screen, I can't enter the source cluster path manually. As soon as I click the source path text box, a dialog opens for selecting a path from the current cluster's HDFS.
Has anybody tested pulling data from another cluster? What config changes are required to get this working?
Thanks,
Megh
Created 04-29-2021 11:03 PM
In my case, although the DistCp editor didn't work out, I achieved the same thing within HUE by using a DistCp action in Oozie.
I designed a simple Oozie workflow with a DistCp action and got it working.
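For anyone trying the same approach, here is a minimal sketch of what such a workflow can look like. It is an assumption-laden illustration, not the exact workflow used above: it writes out a workflow.xml with a single DistCp action using Python's standard ElementTree. The schema versions (`uri:oozie:workflow:0.5`, `uri:oozie:distcp-action:0.2`), the workflow name and the NameNode hosts in the usage example are assumptions/hypothetical placeholders; check which schema versions your Oozie server supports.

```python
import xml.etree.ElementTree as ET

# Assumed schema versions; adjust to what your Oozie installation supports.
WF_NS = "uri:oozie:workflow:0.5"
DISTCP_NS = "uri:oozie:distcp-action:0.2"


def build_distcp_workflow(name: str, source: str, target: str) -> str:
    """Return workflow.xml text containing a single DistCp action.

    ${jobTracker} and ${nameNode} are left as Oozie EL variables, to be
    filled in from job.properties (or the HUE workflow editor).
    """
    wf = ET.Element("workflow-app", {"name": name, "xmlns": WF_NS})
    ET.SubElement(wf, "start", {"to": "distcp-node"})

    action = ET.SubElement(wf, "action", {"name": "distcp-node"})
    distcp = ET.SubElement(action, "distcp", {"xmlns": DISTCP_NS})
    ET.SubElement(distcp, "job-tracker").text = "${jobTracker}"
    ET.SubElement(distcp, "name-node").text = "${nameNode}"
    # First <arg> is the source, second is the target; use fully
    # qualified URIs so each side points at the right cluster.
    ET.SubElement(distcp, "arg").text = source
    ET.SubElement(distcp, "arg").text = target
    ET.SubElement(action, "ok", {"to": "end"})
    ET.SubElement(action, "error", {"to": "fail"})

    # Kill node reached when the DistCp action fails.
    kill = ET.SubElement(wf, "kill", {"name": "fail"})
    ET.SubElement(kill, "message").text = (
        "DistCp failed: [${wf:errorMessage(wf:lastErrorNode())}]"
    )
    ET.SubElement(wf, "end", {"name": "end"})
    return ET.tostring(wf, encoding="unicode")


if __name__ == "__main__":
    print(build_distcp_workflow(
        "hdp-to-cdp-copy",                          # hypothetical workflow name
        "hdfs://hdp-nn.example.com:8020/data/src",  # hypothetical remote NameNode
        "hdfs://cdp-nn.example.com:8020/data/dst",  # hypothetical local NameNode
    ))
```

Generating the XML programmatically is only one option; the same workflow can be built by dragging a DistCp action onto the canvas in HUE's Oozie editor, which keeps users out of the terminal entirely.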
Thanks,
Megh
Created on 04-09-2021 06:53 PM - edited 04-09-2021 06:53 PM
Hello
The HUE DistCp editor is designed to replicate data within the cluster and/or with an object store.
You can click the "..." button next to the input box to see which directories your account has access to, but only within the current cluster's scope.
For data replication between two clusters, use Cloudera Manager/Replication Manager:
https://docs.cloudera.com/cdp/latest/data-migration/topics/rm-dc-data-replication.html
Created 04-10-2021 12:58 AM
Hi @Daming Xue,
Thanks for your reply.
The other cluster I have is HDP, so as far as I understand, using replication from CM won't work.
In any case, I think the main purpose of DistCp is copying data between different clusters. The reason I'm exploring a UI-based alternative is that I don't want to give users terminal access for running DistCp. Is there any other option?
Thanks,
Megh
Created 04-10-2021 01:29 AM
Hello @vidanimegh
In that case, you will probably have to run the distcp command to replicate the data.
Here are some examples:
https://docs.cloudera.com/runtime/7.2.2/scaling-namespaces/topics/hdfs-using-distcp.html
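For reference, a cross-cluster DistCp run takes fully qualified source and target URIs, so DistCp knows which cluster each side lives on. The sketch below only assembles the `hadoop distcp` command line (no Hadoop client is needed to run it); the NameNode hosts and port 8020 in the usage example are hypothetical placeholders, and `-update` is just one of DistCp's standard options.

```python
import shlex
from typing import List, Sequence


def distcp_argv(source: str, target: str, options: Sequence[str] = ()) -> List[str]:
    """Build the argument list for `hadoop distcp [options] <source> <target>`.

    Both paths must be fully qualified URIs, e.g.
    hdfs://<namenode-host>:<port>/path, so each side resolves to the
    intended cluster rather than the local default filesystem.
    """
    for uri in (source, target):
        if "://" not in uri:
            raise ValueError(f"expected a fully qualified URI, got {uri!r}")
    return ["hadoop", "distcp", *options, source, target]


if __name__ == "__main__":
    argv = distcp_argv(
        "hdfs://hdp-nn.example.com:8020/data/events",  # hypothetical source NameNode
        "hdfs://cdp-nn.example.com:8020/data/events",  # hypothetical target NameNode
        options=["-update"],  # skip files that already match at the target
    )
    # Print a shell-safe command line for review before running it.
    print(shlex.join(argv))
```

Returning an argument list instead of a single string avoids shell-quoting problems if the command is later launched via `subprocess.run(argv)` from a controlled service rather than an interactive terminal.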
Created 04-10-2021 02:42 AM
Hi @Daming Xue,
I'm aware of the distcp command, but to use it I would need to give users terminal access, which is something I want to avoid for security reasons. I want them to run their DistCp jobs through a web UI.
I hope this clarifies.
Thanks,
Megh
Created 04-10-2021 02:43 AM
Just to add to this: how can I suggest a feature improvement to the HUE community, namely support for remote clusters in the DistCp editor?
Thanks,
Megh
Created on 04-10-2021 05:35 AM - edited 04-10-2021 05:38 AM
It becomes tricky once Kerberos comes into the picture, especially when both clusters are secure clusters.
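As one concrete example of the extra Kerberos setup: the Oozie DistCp action documentation describes adding the `oozie.launcher.mapreduce.job.hdfs-servers` property, listing both NameNodes, to the action's `<configuration>` when copying between two secure clusters, so delegation tokens are obtained for both sides. The sketch below builds that fragment with Python's ElementTree; the NameNode URIs are hypothetical, and cross-realm trust between the two Kerberos realms is usually still needed on top of this.

```python
import xml.etree.ElementTree as ET
from typing import List


def hdfs_servers_configuration(namenodes: List[str]) -> ET.Element:
    """Build an Oozie action <configuration> element listing every NameNode
    the DistCp launcher should obtain delegation tokens for."""
    if not namenodes:
        raise ValueError("at least one NameNode URI is required")
    conf = ET.Element("configuration")
    prop = ET.SubElement(conf, "property")
    ET.SubElement(prop, "name").text = "oozie.launcher.mapreduce.job.hdfs-servers"
    # Comma-separated list, e.g. source cluster first, then target cluster.
    ET.SubElement(prop, "value").text = ",".join(namenodes)
    return conf


if __name__ == "__main__":
    conf = hdfs_servers_configuration([
        "hdfs://hdp-nn.example.com:8020",  # hypothetical source NameNode
        "hdfs://cdp-nn.example.com:8020",  # hypothetical target NameNode
    ])
    print(ET.tostring(conf, encoding="unicode"))
```

Per the distcp-action schema, this `<configuration>` element sits inside `<distcp>` after `<name-node>` (and any `<prepare>`) and before the `<arg>` entries.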
For feature requests, you can try Hue's community forum:
https://discourse.gethue.com/categories
If you have a connection to any Hue committer, they can help create a feature request directly in its internal JIRA.
Created 04-10-2021 08:05 AM
Thanks for the suggestion.
I've created a community post here.
Unfortunately, I don't have any direct connections to Hue committers, but I'll wait and see if somebody provides an update on this.
Thanks,
Megh
Created 04-11-2021 01:29 AM
@vidanimegh Cloudera Streams Replication Manager is the way you can replicate between HDP and CDP clusters. The setup is a little tricky, but at the end of the day you can use Streams Replication Manager as a bridge between the two clusters and replicate the data.
https://docs.cloudera.com/csp/2.0.1/srm-overview/topics/srm-replication-overview.html