Get Data from HDP using HDF

I'm using both HDF and HDP, and I'm using NiFi in HDF to stream data into HDP. In a specific ETL use case, however, I need to fetch data from HDFS on HDP. What is the best practice for doing this in NiFi? Can HDF connect to the HDP HDFS?

1 ACCEPTED SOLUTION

Super Mentor
@nedox nedox

You will want to use one of the available HDFS processors to get data from your HDP HDFS file system.

1. GetHDFS <-- Use for a standalone NiFi installation
2. ListHDFS --> RPG (Remote Process Group) --> FetchHDFS <-- Use for a NiFi cluster installation

All of the HDFS-based NiFi processors have a property that lets you specify the path to the HDFS site XML files. Obtain a copy of the core-site.xml and hdfs-site.xml files from your HDP cluster and place them somewhere on the HDF hosts running NiFi. Point to these files using the "Hadoop Configuration Resources" processor property.

example:

[Screenshot: 13176-screen-shot-2017-03-02-at-32249-pm.png]
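
Before pointing the processor at the copied files, it can help to confirm that core-site.xml really names the HDP NameNode and not something local to the HDF hosts. The following is only a minimal sketch of such a check: the helper name `read_hadoop_property`, the sample XML, and the host `hdp-namenode.example.com` are made up for illustration, while `fs.defaultFS` is the standard Hadoop property that holds the default file system URI.

```python
import xml.etree.ElementTree as ET


def read_hadoop_property(xml_text, name):
    """Return the value of one property from Hadoop *-site.xml text, or None."""
    root = ET.fromstring(xml_text)
    # Hadoop site files are a flat list of <property><name/><value/></property>.
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


# Hypothetical file content; in practice, read the core-site.xml copied from HDP.
SAMPLE_CORE_SITE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hdp-namenode.example.com:8020</value>
  </property>
</configuration>
"""

print(read_hadoop_property(SAMPLE_CORE_SITE, "fs.defaultFS"))
```

If the printed URI points at the HDP cluster, the processor should resolve paths against that cluster's HDFS once the file is listed in "Hadoop Configuration Resources".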

Thanks,

Matt

7 REPLIES

Super Guru
@nedox nedox

HDF is not an ETL tool. How much data do you want to fetch from HDP? If it's a big chunk (millions of records or more), then why not use Sqoop? Can you please describe what you intend to do with the data you fetch?

No, it's a rather small chunk: thousands of records.

Super Guru

@nedox nedox

Then just use GetHDFS or ListHDFS -> FetchHDFS. In these processors you will have to specify the client config files from your HDP cluster; that is how the processor knows where to connect. You also specify which keytab and principal to use if Kerberos is enabled, and which directories to fetch files from.
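
To make the list of settings concrete, here is a rough sketch that collects them into one place. It is not a NiFi API: the function `hdfs_processor_properties` and all paths, the principal, and the directory are hypothetical. The keys are meant to mirror the property names shown in the processor's configuration dialog ("Hadoop Configuration Resources" takes a comma-separated list of files), so check them against your NiFi version.

```python
def hdfs_processor_properties(config_files, directory, principal=None, keytab=None):
    """Collect the values you would type into an HDFS processor's properties."""
    props = {
        # Comma-separated paths to the client config files copied from HDP.
        "Hadoop Configuration Resources": ",".join(config_files),
        # HDFS directory on the HDP cluster to list or fetch from.
        "Directory": directory,
    }
    # The Kerberos settings only matter when the HDP cluster is kerberized.
    if principal and keytab:
        props["Kerberos Principal"] = principal
        props["Kerberos Keytab"] = keytab
    return props


# Hypothetical values for illustration only.
example = hdfs_processor_properties(
    ["/etc/nifi/hdp-conf/core-site.xml", "/etc/nifi/hdp-conf/hdfs-site.xml"],
    "/data/incoming",
    principal="nifi@EXAMPLE.COM",
    keytab="/etc/security/keytabs/nifi.keytab",
)
for key, value in example.items():
    print(f"{key}: {value}")
```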

Master Guru

Certainly! You can get files from HDFS using the GetHDFS processor or the ListHDFS -> FetchHDFS processors.

How can I specify in the ListHDFS processor that it needs to list the HDFS of my HDP cluster rather than the HDF one?

Expert Contributor

There are a number of HDFS-based processors in NiFi, including GetHDFS, FetchHDFS, GetHDFSEvents, etc.

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.hadoop.GetHDFS/

Native processors can read or write to HDFS, depending on your requirement.

Full docs below:

http://docs.hortonworks.com/HDPDocuments/HDF2/HDF-2.1.2/index.html
