How to use distcp editor of HUE?
Created ‎04-09-2021 01:26 AM
Hello Everyone,
Has anyone tested the Distcp editor available in HUE with CDP 7.1.5?
Currently I see the below screen:
In this, I'm not able to manually enter the source cluster path. As soon as I click on the source path text box, it opens a dialog box to select the path from the current cluster HDFS.
Has anybody tested pulling data from another cluster? What config changes are required for getting this to work?
Thanks,
Megh
Created ‎04-29-2021 11:03 PM
In my case, although the DistCp editor didn't work out, the same thing could be achieved within HUE by using the DistCp action in Oozie.
I designed a simple Oozie workflow with a DistCp action and managed to get it working.
Thanks,
Megh
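The workflow itself was not shared in the thread, so the following is only a minimal sketch of what an Oozie workflow with a DistCp action might look like, rendered from Python. The NameNode URIs, paths, and workflow name are hypothetical placeholders, and the `oozie.launcher.mapreduce.job.hdfs-servers` property is included on the assumption that both clusters are Kerberized; the working setup described above may have differed.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# Template for a workflow with a single DistCp action.
# ${jobTracker} and ${nameNode} are left as Oozie EL variables to be
# supplied through job.properties; they are not filled in here.
WORKFLOW_TEMPLATE = """<workflow-app name="{name}" xmlns="uri:oozie:workflow:0.5">
    <start to="distcp-node"/>
    <action name="distcp-node">
        <distcp xmlns="uri:oozie:distcp-action:0.2">
            <job-tracker>${{jobTracker}}</job-tracker>
            <name-node>${{nameNode}}</name-node>
            <configuration>
                <property>
                    <!-- Assumption: needed so the launcher can obtain
                         delegation tokens for both secure clusters. -->
                    <name>oozie.launcher.mapreduce.job.hdfs-servers</name>
                    <value>{source_nn},{target_nn}</value>
                </property>
            </configuration>
            <arg>{source_nn}{source_path}</arg>
            <arg>{target_nn}{target_path}</arg>
        </distcp>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>DistCp failed: [${{wf:errorMessage(wf:lastErrorNode())}}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
"""


def render_distcp_workflow(name, source_nn, source_path, target_nn, target_path):
    """Render the workflow XML and check that it is well-formed."""
    xml_text = WORKFLOW_TEMPLATE.format(
        name=escape(name, {'"': "&quot;"}),
        source_nn=escape(source_nn),
        source_path=escape(source_path),
        target_nn=escape(target_nn),
        target_path=escape(target_path),
    )
    ET.fromstring(xml_text)  # raises ParseError if the XML is malformed
    return xml_text


if __name__ == "__main__":
    # Hypothetical host names and paths, for illustration only.
    print(render_distcp_workflow(
        name="distcp-hdp-to-cdp",
        source_nn="hdfs://hdp-nn.example.com:8020",
        source_path="/data/in",
        target_nn="hdfs://cdp-nn.example.com:8020",
        target_path="/data/in",
    ))
```

The rendered XML could then be saved as `workflow.xml` in HDFS, or the same action could be assembled in the HUE workflow editor by dragging in a DistCp action and entering the two fully qualified URIs as its arguments.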
Created on ‎04-09-2021 06:53 PM - edited ‎04-09-2021 06:53 PM
Hello
The HUE DistCp editor is designed to replicate data within the cluster and/or with an object store.
You can click the "..." button next to the input box to see which directories your account has access to, but only within the scope of the current cluster.
For data replication between two clusters, use Cloudera Manager/Replication Manager:
https://docs.cloudera.com/cdp/latest/data-migration/topics/rm-dc-data-replication.html
Created ‎04-10-2021 12:58 AM
Hi @Daming Xue ,
Thanks for your reply.
The other cluster I have is HDP, so as far as I understand, using replication from CM won't work.
In any case, I think the actual purpose of DistCp is copying data between different clusters. The reason I'm exploring a UI-based alternative is that I don't want to give users terminal access for distcp. Is there any other possibility?
Thanks,
Megh
Created ‎04-10-2021 01:29 AM
Hello @vidanimegh
In that case, you will probably have to run the distcp command to replicate the data.
Here are some examples:
https://docs.cloudera.com/runtime/7.2.2/scaling-namespaces/topics/hdfs-using-distcp.html
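As a rough illustration of the command-line approach mentioned above, here is a small Python sketch that assembles a `hadoop distcp` invocation between two clusters. The host names and paths are hypothetical; only the `-update` and `-m` options are used, and the right URI scheme (`hdfs://`, `webhdfs://`, and so on) depends on the versions and security setup of the two clusters.

```python
import shlex


def build_distcp_command(source_uri, target_uri, update=False, max_maps=None):
    """Return the argv list for a cross-cluster `hadoop distcp` run.

    Both URIs must be fully qualified (scheme plus NameNode address),
    because a bare path would resolve against the local cluster only.
    """
    for uri in (source_uri, target_uri):
        if "://" not in uri:
            raise ValueError(f"expected a fully qualified URI, got: {uri!r}")

    cmd = ["hadoop", "distcp"]
    if update:
        cmd.append("-update")           # copy only files missing or changed on the target
    if max_maps is not None:
        cmd += ["-m", str(max_maps)]    # cap the number of simultaneous map tasks
    cmd += [source_uri, target_uri]
    return cmd


if __name__ == "__main__":
    # Hypothetical NameNode addresses, for illustration only.
    argv = build_distcp_command(
        "hdfs://hdp-nn.example.com:8020/data/in",
        "hdfs://cdp-nn.example.com:8020/data/in",
        update=True,
        max_maps=10,
    )
    # Print the command so it can be reviewed before anyone runs it.
    print(shlex.join(argv))
```

The sketch only builds and prints the command; running it still requires a host with a configured Hadoop client, which is the terminal access the original poster wanted to avoid.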
Created ‎04-10-2021 02:42 AM
Hi @Daming Xue ,
I'm aware of the distcp command, but for that I need to give users access to the terminal, which is something I want to avoid for security reasons. I want them to run their distcp jobs through a web UI.
I hope this clarifies.
Thanks,
Megh
Created ‎04-10-2021 02:43 AM
Just to add to this: how can I suggest to the HUE community, as a feature improvement, that the DistCp editor support remote clusters?
Thanks,
Megh
Created on ‎04-10-2021 05:35 AM - edited ‎04-10-2021 05:38 AM
It will become tricky once Kerberos comes into the picture, especially if both clusters are secure clusters.
For feature requests, you can try Hue's community forum:
https://discourse.gethue.com/categories
If you have connections to any Hue committers, they can help create a feature request directly in its internal JIRA.
Created ‎04-10-2021 08:05 AM
Thanks for the suggestion.
I've created a community post here.
Unfortunately, I don't have any direct connections to any Hue committers, but I'll wait and see if somebody provides an update on this.
Thanks,
Megh
Created ‎04-11-2021 01:29 AM
@vidanimegh Cloudera Streams Replication Manager is one way to replicate between an HDP and a CDP cluster. The setup is a little tricky, but at the end of the day you can use Streams Replication Manager as a bridge between the two clusters and replicate the data.
https://docs.cloudera.com/csp/2.0.1/srm-overview/topics/srm-replication-overview.html
Cheers!
