Hortonworks Data Flow (Apache Nifi)


Hi,

I am new to HDF and have a few questions about it and its configuration. Could anyone please help with the following?

  1. What steps are required to define a workflow so that a NiFi job can be invoked? I am looking for something similar to Oozie, which can schedule any Hadoop-related task, and would like to know how to achieve the same in HDF.
  2. What are the ways to secure access to an HDF cluster? We want to run an HDF cluster on AWS, with a VPC established from our network to AWS. We also want the cluster to be secured and ring-fenced so that only designated people and machines can invoke NiFi processing. How can we achieve this?
  3. Extending the security question, is something similar to Knox available for HDF? If not, how can we achieve similar ring-fencing?

Thanks

@Greenhorn Techie

1. NiFi is not a replacement for Oozie. You can't schedule jobs with it, though you can run cron commands and execute shell commands within NiFi. A flow is not a start-and-stop operation; it runs continuously until you explicitly stop it. If that's what you're asking, take a look at the REST API, which can start and stop a workflow. In the next release, NiFi will have scripting capabilities, so essentially you will be able to execute Groovy, shell, maybe Python and maybe Pig, but I cannot comment on the last two.

2. https://community.hortonworks.com/content/kbentry/886/securing-nifi-step-by-step.html

3. File a JIRA.
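
To illustrate the REST API route mentioned in point 1, here is a minimal sketch of building a start/stop request for a process group. It assumes the `PUT /flow/process-groups/{id}` endpoint documented for NiFi 1.x; older releases expose different paths, so check the REST API docs for your version. The base URL and group id are hypothetical placeholders, and a secured cluster would also need authentication, which is not shown.

```python
import json
import urllib.request


def build_state_request(base_url, group_id, state):
    """Build (but do not send) a request asking NiFi to start or stop a process group.

    Assumes the NiFi 1.x endpoint PUT {base_url}/flow/process-groups/{group_id}.
    """
    if state not in ("RUNNING", "STOPPED"):
        raise ValueError("state must be RUNNING or STOPPED")
    url = f"{base_url.rstrip('/')}/flow/process-groups/{group_id}"
    body = json.dumps({"id": group_id, "state": state}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical base URL and group id, for illustration only.
req = build_state_request(
    "http://localhost:8080/nifi-api", "hypothetical-group-id", "RUNNING"
)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

An external scheduler could call something like this to start a flow and later stop it, which is about as close as the REST API gets to an Oozie-style trigger.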


@Artem Ervits Thanks for the info. For the first query, my intention was not to see whether NiFi works as an Oozie replacement, but to see how to get Oozie-like functionality in the HDF world. On further reading, I found that scheduling can be configured at the processor level (timer-based, cron-based, event-based, etc.). This is sufficient for our requirements.
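
For anyone else looking at the cron-based option: as far as I can tell, the processor's CRON-driven strategy takes a Quartz-style expression, which starts with a seconds field, rather than the five-field Unix cron format. A small sketch that labels the fields (the helper name and the sample expression are my own, for illustration only):

```python
# Field order of a Quartz-style cron expression; the trailing year field is optional.
QUARTZ_FIELDS = (
    "seconds", "minutes", "hours", "day_of_month", "month", "day_of_week", "year",
)


def label_cron(expr):
    """Split a Quartz-style cron expression and label each field by position."""
    parts = expr.split()
    if len(parts) not in (6, 7):
        raise ValueError("expected 6 or 7 space-separated fields")
    return dict(zip(QUARTZ_FIELDS, parts))


# "0 0 2 * * ?" reads as: second 0, minute 0, hour 2, every day of every month.
print(label_cron("0 0 2 * * ?"))
```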

For security, I need to look into it more deeply. I will come back later with further queries.

Many Thanks


@Neeraj Sabharwal @Artem Ervits

Just wondering what the best mechanism is for ingesting data from relational sources into HDP: a combination of the ExecuteSQL and PutHDFS processors, or Sqoop delivering the data to HDP?

Many Thanks