Created 03-02-2017 03:20 PM
I'm using both HDF and HDP, with NiFi in HDF streaming data into HDP. For a specific ETL use case I need to fetch data from HDFS on the HDP cluster. What's the best practice for doing this in NiFi? Can HDF connect to HDP HDFS?
Created on 03-02-2017 08:23 PM - edited 08-19-2019 02:02 AM
You will want to use one of the available HDFS processors to get data from your HDP HDFS file system.
1. GetHDFS <-- Use with a standalone NiFi installation
2. ListHDFS --> RPG (Remote Process Group) --> FetchHDFS <-- Use with a NiFi cluster installation

All of the HDFS-based NiFi processors have a property that lets you specify the paths to the Hadoop *-site.xml configuration files. Obtain a copy of the core-site.xml and hdfs-site.xml files from your HDP cluster and place them somewhere on the HDF hosts running NiFi. Point to these files using the "Hadoop Configuration Resources" processor property.
example:
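As a rough sketch of what the property value looks like, the Python snippet below joins the two file paths into the comma-separated string that "Hadoop Configuration Resources" accepts, and reads fs.defaultFS from core-site.xml to confirm the files point at the HDP NameNode. The helper names, the sample file contents, and the NameNode address are assumptions made up for illustration; they are not part of NiFi.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def read_hadoop_property(xml_path, name):
    """Return the value of a named property from a Hadoop *-site.xml file, or None."""
    root = ET.parse(xml_path).getroot()
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


def hadoop_config_resources(paths):
    """Build the comma-separated value for "Hadoop Configuration Resources"."""
    missing = [p for p in paths if not os.path.isfile(p)]
    if missing:
        raise FileNotFoundError("missing config files: " + ", ".join(missing))
    return ",".join(paths)


if __name__ == "__main__":
    # Stand-in files; in practice these are the copies taken from the HDP cluster.
    conf_dir = tempfile.mkdtemp()
    core_site = os.path.join(conf_dir, "core-site.xml")
    hdfs_site = os.path.join(conf_dir, "hdfs-site.xml")
    with open(core_site, "w") as f:
        # Hypothetical NameNode address, for illustration only.
        f.write(
            "<configuration><property>"
            "<name>fs.defaultFS</name>"
            "<value>hdfs://hdp-namenode.example.com:8020</value>"
            "</property></configuration>"
        )
    with open(hdfs_site, "w") as f:
        f.write("<configuration></configuration>")

    # Value to paste into the processor property.
    print(hadoop_config_resources([core_site, hdfs_site]))
    # File system the processor will connect to.
    print(read_hadoop_property(core_site, "fs.defaultFS"))
```

If fs.defaultFS shows the HDP NameNode (or its HA nameservice), the processor will list and fetch from the HDP cluster rather than from anything local to HDF.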
Thanks,
Matt
Created 03-02-2017 03:24 PM
HDF is not an ETL tool. How much data do you want to fetch from HDP? If it's a big chunk (millions of records or more), then why not use Sqoop? Can you please describe what you intend to do with the data you fetch?
Created 03-02-2017 03:38 PM
No, it's rather a small chunk of thousands of records.
Created 03-02-2017 06:56 PM
Then just use GetHDFS or ListHDFS -> FetchHDFS. In these processors you will have to specify the client config files from your HDP cluster; that is how the processor knows where to connect. You also specify which keytab and principal to use if Kerberos is enabled, and which directories to fetch files from.
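One way to check whether the keytab and principal are needed is to look at hadoop.security.authentication in the core-site.xml copied from the HDP cluster. The sketch below is only an illustration: the function names are hypothetical, and the returned strings are the property labels as they appear on the HDFS processors in the NiFi versions I have seen, so verify them against your own install.

```python
import xml.etree.ElementTree as ET


def kerberos_enabled(core_site_xml_text):
    """Return True if core-site.xml sets hadoop.security.authentication to kerberos."""
    root = ET.fromstring(core_site_xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == "hadoop.security.authentication":
            value = prop.findtext("value") or ""
            return value.strip().lower() == "kerberos"
    # Hadoop defaults to "simple" authentication when the property is absent.
    return False


def required_kerberos_properties(core_site_xml_text):
    """List the processor properties to fill in (labels assumed, check your NiFi version)."""
    if kerberos_enabled(core_site_xml_text):
        return ["Kerberos Principal", "Kerberos Keytab"]
    return []


if __name__ == "__main__":
    sample = (
        "<configuration><property>"
        "<name>hadoop.security.authentication</name>"
        "<value>kerberos</value>"
        "</property></configuration>"
    )
    print(required_kerberos_properties(sample))
```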
Created 03-02-2017 03:38 PM
How can I specify in the ListHDFS processor that it needs to list the HDFS of my HDP cluster rather than the HDF one?
Created 03-02-2017 03:28 PM
There are a number of HDFS-based processors in NiFi, including GetHDFS, FetchHDFS, GetHDFSEvents, etc.
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.hadoop.GetHDFS/
Native processors can read or write to HDFS, depending on your requirement.
Full docs below:
http://docs.hortonworks.com/HDPDocuments/HDF2/HDF-2.1.2/index.html