Member since
07-25-2016
55
Posts
28
Kudos Received
1
Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
5752 | 07-26-2016 12:31 AM |
10-24-2016
11:58 PM
Hi, I recently came across Hortonworks Data Cloud: http://hortonworks.github.io/hdp-aws/. I was curious whether it can also launch an HDF cluster (basically a NiFi cluster). If not, is there a plan to add support for it? Thanks, Obaid
Labels:
- Apache NiFi
- Hortonworks Cloudbreak
10-24-2016
11:22 PM
Thanks @Dominika B for sharing the link, it seems interesting. I have a very basic question: Amazon EMR lets you launch and manage Hadoop and Spark clusters, so what would be the benefit of using Hortonworks cloud versus just using EMR? Thanks, Obaid
10-23-2016
08:04 AM
3 Kudos
Hi all, I am a newbie to HDP and Cloudbreak. I want to move some of our on-site Hadoop clusters/jobs to AWS. The two solutions I have come across are Cloudbreak and EMR, but I am not sure which one to use for launching Hadoop jobs on AWS. Pros and cons of either approach would be really helpful (in terms of cost, ease of use, monitoring, metrics, latency, etc.). One cost optimization feature I am particularly interested in is launching the cluster whenever a job or jobs need to run, and killing the cluster/nodes when there are no more jobs to execute. Thanks, Obaid
Labels:
- Apache Ambari
- Hortonworks Cloudbreak
09-27-2016
08:28 PM
1 Kudo
Hi all, I have a scenario where I want to trigger a signal for a flow to start processing whenever there is data available on S3 to process. In such a scenario, all processors will be EventDriven (except for the trigger) and will only run if there is data to process (or if we somehow trigger them to start processing).
Scenario:
- Whenever a file (or files) lands on S3, launch SQL queries (create table, copy data, etc.)
So, what would be a good way to define such a trigger? Thanks
Labels:
- Apache NiFi
09-22-2016
05:34 PM
Thanks a lot @Matt Burgess and @Pierre Villard for the quick response.
09-22-2016
05:25 PM
Hi, I have a scenario where I want to ignore FlowFiles if one of their attributes contains an invalid value (for example, filename holds the name of a directory rather than the name of a file). Is there a way to completely discard/ignore a FlowFile based on an attribute value? Thanks, Obaid
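The check described in this question can be sketched outside NiFi as a plain predicate over an attribute map. This is only an illustration of the filtering logic, not a NiFi API: `is_valid_filename` and the rule "empty or ending in a slash means directory" are assumptions made for the example. In NiFi itself, attribute-based routing of this kind is usually done with the RouteOnAttribute processor, with the unwanted relationship auto-terminated.

```python
def is_valid_filename(attributes):
    """Hypothetical rule: reject empty names and names that look like a directory."""
    name = attributes.get("filename", "")
    return bool(name) and not name.endswith("/")


# Each dict stands in for the attribute map of one FlowFile (illustrative data).
flowfiles = [
    {"filename": "data/part-0001.csv"},
    {"filename": "data/"},   # a directory name, should be discarded
    {"filename": ""},        # missing value, should be discarded
]

# Keep only the entries that pass the check; the rest are simply dropped.
kept = [ff for ff in flowfiles if is_valid_filename(ff)]
print(kept)
```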
Labels:
- Apache NiFi
09-21-2016
07:58 PM
3 Kudos
Hi, I have a JSON message 'store' which contains an array of 'books'. I want to calculate the sum/average of all book prices. Is there a way to do it in NiFi? I explored JsonPath and JOLT, but so far I haven't found a way to do it. Thanks.
Input:
{ "store": {
"books": [
{ "category": "reference",
"author": "Nigel Rees",
"title": "Sayings of the Century",
"price": 8.95
},
{ "category": "fiction",
"author": "Evelyn Waugh",
"title": "Sword of Honour",
"price": 12.99
},
{ "category": "fiction",
"author": "Herman Melville",
"title": "Moby Dick",
"isbn": "0-553-21311-3",
"price": 8.99
},
{ "category": "fiction",
"author": "J. R. R. Tolkien",
"title": "The Lord of the Rings",
"isbn": "0-395-19395-8",
"price": 22.99
}
]
}
}
Output:
Sum of all prices : 53.92
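For reference, the aggregation itself is small once the JSON is parsed. The following is a minimal sketch in plain Python, run outside NiFi; `price_stats` is a hypothetical helper name, and whether equivalent logic fits a scripted processor in a given flow is an assumption that is not verified here.

```python
import json

# Sample message from the post above.
message = """
{ "store": { "books": [
  { "category": "reference", "author": "Nigel Rees",
    "title": "Sayings of the Century", "price": 8.95 },
  { "category": "fiction", "author": "Evelyn Waugh",
    "title": "Sword of Honour", "price": 12.99 },
  { "category": "fiction", "author": "Herman Melville",
    "title": "Moby Dick", "isbn": "0-553-21311-3", "price": 8.99 },
  { "category": "fiction", "author": "J. R. R. Tolkien",
    "title": "The Lord of the Rings", "isbn": "0-395-19395-8", "price": 22.99 }
] } }
"""


def price_stats(raw):
    """Return (sum, average) of store.books[*].price, rounded to 2 decimals."""
    books = json.loads(raw)["store"]["books"]
    prices = [book["price"] for book in books]
    total = sum(prices)
    # Guard against an empty array to avoid dividing by zero.
    average = total / len(prices) if prices else 0.0
    return round(total, 2), round(average, 2)


total, average = price_stats(message)
print("Sum of all prices :", total)
print("Average price :", average)
```

Rounding to two decimals is a choice made for this example so that floating-point noise does not show up in the printed sum.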
Labels:
- Apache NiFi
09-04-2016
08:22 PM
@Sam Hjelmfelt So far I have not been able to find a feasible way of sending alerts through a NiFi cluster, and I am curious to know how I should deploy alerts in a production NiFi cluster. Thanks
09-03-2016
09:10 AM
Thanks @Sam Hjelmfelt for your reply. Yes, if the data lands on the Primary node, PutEmail works as expected. However, if the data lands on a Slave node, no email is generated and the flowfiles get stuck on the connection forever (i.e. Slave nodes are not able to talk to the Primary node). Following is an example flow (template is attached, please check it out): In the dataflow below, we generate flowfiles, run MergeContent (every 20 seconds), and then pass the result on to two PutEmail processors in parallel. The first PutEmail runs on the Primary node, whereas the second PutEmail runs on all Slave nodes (Timer event). For the PutEmail on the Primary node, out of 9 generated files only 1 got processed, whereas 8 got stuck on the connection (it seems the Slave nodes are not able to talk to the Primary node). The second PutEmail worked just fine, i.e. it processed all 9 flowfiles. So, is there a way to generate 1 email alert if a processor fails in a cluster? PS: putemaillimitations.xml
09-02-2016
11:00 PM
2 Kudos
Hi, I am trying to use PutEmail in my workflow to send an email alert whenever something fails. I have 8 slave nodes and my dataflow runs on all of them (meaning not just on the primary node). The issue is that I get multiple emails if one processor has errors. I think this is because PutEmail is running on all 8 slaves, and therefore I get multiple emails. Is there a way to ensure that we always get 1 email instead of 8? Thanks, Obaid
Labels:
- Apache NiFi