
Can we load data directly from HDF Nifi cluster to HDFS in a different HDP cluster?

I want to read a CSV file containing latitude/longitude data through NiFi, with each record hitting a SolrCloud instance for reverse geocoding. After that, the enriched data needs to be loaded into a different HDP cluster for Hive processing. I understand that loading the data directly into Hive tables from NiFi is also an option.

However, I'm not sure how to load data directly into HDFS on a different cluster. Can any of you point me in the right direction? Are there any documents or blogs I can refer to?

2 REPLIES

@Gaurav Mallikarjuna

For simply copying data from one cluster to another you can use the DistCp tool.

hadoop distcp hdfs://nn1:8020/source hdfs://nn2:8020/destination 

Here, hdfs://nn1:8020/source is the data source and hdfs://nn2:8020/destination is the destination. This expands the namespace under /source on NameNode "nn1" into a temporary file, partitions its contents among a set of map tasks, and starts copying from "nn1" to "nn2". Note that DistCp requires absolute paths.
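
As a rough sketch of the absolute-path requirement, a small wrapper could check both paths before building the command. The function name build_distcp_cmd is hypothetical (it is not part of Hadoop), and the host names and port 8020 are simply the ones from the example above. It only prints the command so you can review it before running it yourself.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): prints the DistCp command for a
# cross-cluster copy instead of running it, so it can be reviewed first.
# Port 8020 is assumed from the example above; adjust it for your clusters.
build_distcp_cmd() {
  src_nn="$1"; src_path="$2"; dst_nn="$3"; dst_path="$4"
  # DistCp requires absolute paths, so reject anything not starting with "/".
  for p in "$src_path" "$dst_path"; do
    case "$p" in
      /*) ;;
      *) echo "error: path must be absolute: $p" >&2; return 1 ;;
    esac
  done
  printf 'hadoop distcp hdfs://%s:8020%s hdfs://%s:8020%s\n' \
    "$src_nn" "$src_path" "$dst_nn" "$dst_path"
}

# Usage: prints the same command as the example above.
build_distcp_cmd nn1 /source nn2 /destination
```

A relative path such as "source" makes the function return a non-zero status and print an error instead of a command.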

For more details and options on DistCp you can check the following guide:

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_administration/content/using_distcp.html

@Dinesh Chitlangia I am aware of DistCp. As mentioned in the question, I have to parse each row and do reverse geocoding using Solr, so that each record is enriched with geolocation information. I want to write the updated flowfiles directly into a different cluster rather than writing them to the HDF cluster and then using DistCp to copy them to another cluster. I'm trying to avoid this unnecessary overhead.
