Member since: 06-20-2017
Posts: 6
Kudos Received: 0
Solutions: 1

My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 901 | 08-02-2017 05:05 AM |
08-03-2017 03:30 AM
Yes, that is my understanding.
08-02-2017 03:43 PM
Agreed Bryan, I especially like slide #5 in your SlideShare deck, as it defines a basic architecture for pretty much any IoT play. As simple as it looks, it raises a bunch of questions in my mind: for example, why route events between the edge locations (I assume the blue box represents locations where the IoT events land in the customer network?) and the HDF/HDP cluster in the orange box? Seems like a message broker (Kafka) makes sense here to guarantee delivery. I know this is an HDF 2 view, so I'm just trying to home in on a clear approach.
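For the guaranteed-delivery piece, here is a minimal Kafka producer sketch of what I have in mind (the broker addresses and the iot-events topic are placeholders): acks=all makes the broker wait for all in-sync replicas before acknowledging, and retries covers transient failures.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object EdgeEventPublisher {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092,broker2:9092") // placeholder brokers
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    // Delivery guarantees: wait for all in-sync replicas, retry transient failures
    props.put("acks", "all")
    props.put("retries", "5")

    val producer = new KafkaProducer[String, String](props)
    // An edge event keyed by device id; the key also drives partition assignment
    val record = new ProducerRecord[String, String]("iot-events", "sensor-42", """{"temp":21.5}""")
    producer.send(record) // returns a Future[RecordMetadata]; block or attach a callback if needed
    producer.close()
  }
}
```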
08-02-2017 05:29 AM
Doesn't seem like streaming data directly to HDFS will make it very easy to find/aggregate at the end of each window. What about creating a key/value store (with Redis, HBase, or Elasticsearch, for example) and using it to look up all the keys associated with each window?
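Roughly what I'm picturing, sketched with Redis via the Jedis client (the window:<id> key scheme and the endpoint are my own assumptions; HBase or Elasticsearch would follow the same pattern):

```scala
import redis.clients.jedis.Jedis
import scala.collection.JavaConverters._

// Index each event key under its window on ingest, so the window's
// contents can be looked up cheaply when the window closes.
object WindowIndex {
  private val jedis = new Jedis("localhost", 6379) // assumed Redis endpoint

  // On ingest: add the event key to the set for its window id.
  def record(windowId: String, eventKey: String): Unit =
    jedis.sadd(s"window:$windowId", eventKey)

  // At window close: fetch every key seen in that window for aggregation.
  def keysFor(windowId: String): Set[String] =
    jedis.smembers(s"window:$windowId").asScala.toSet
}
```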
08-02-2017 05:05 AM
Hoang, I think what that post is saying is that while there is not yet a load-balancing solution for STS on a secure cluster, in a non-Kerberos environment any external load balancer, like haproxy or httpd + mod_jk, can be used to distribute requests. The Thrift servers will be independent (no ZooKeeper coordination) and the load balancer will simply round-robin client requests.
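As a sketch, an haproxy config along these lines would do it (hostnames and the port are placeholders; use whatever port your Thrift servers actually listen on):

```
# Round-robin JDBC connections across two independent Spark Thrift Servers.
listen spark-thrift
    bind *:10015
    mode tcp            # Thrift/JDBC is plain TCP, not HTTP
    balance roundrobin
    server sts1 sts1.example.com:10015 check
    server sts2 sts2.example.com:10015 check
```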
08-02-2017 04:53 AM
Hi Opao, I'm not sure I follow your thinking here, so let me restate the problem: you have a Spark program (written in Scala? Java?) that needs to take data from an RDD and use it as input to a query against Hive (so not really MapReduce?) on the same cluster, and then use the response of the query in your Spark program, yes? Why the need to spawn a new JVM? It seems like you could use SparkSQL, spawn a Hive context, and execute the query inline. Could you elaborate? bob
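Something like this rough Spark 2.x sketch is what I mean (the table and column names are made up):

```scala
import org.apache.spark.sql.SparkSession

object InlineHiveQuery {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("inline-hive-query")
      .enableHiveSupport() // talks to the cluster's Hive metastore; no extra JVM
      .getOrCreate()

    // Values pulled from an RDD/DataFrame earlier in the same job...
    val ids = Seq(1, 2, 3)

    // ...feed straight into a Hive query; the result comes back as a DataFrame
    val df = spark.sql(s"SELECT * FROM some_hive_table WHERE id IN (${ids.mkString(",")})")
    df.show()
  }
}
```

(On Spark 1.6 the equivalent is new HiveContext(sc) plus hiveContext.sql(...).)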
08-02-2017 02:44 AM
I am trying to understand basic usage patterns for both, and would like to create a simple checklist to answer the question. Something like:

- Is scale critical, particularly in terms of volume? Kafka.
- Do I need/want to transform data on ingest? NiFi is perfect, but Kafka events are immutable payloads.
- Can a producer of msgs outpace and overwhelm a consumer? Obviously Kafka, to decouple them (see the consumer sketch below).
- Can the source be modified? With Kafka, the source must publish events; with NiFi, no changes are required.

I do get that in many scenarios both will be appropriate, so I'm just trying to get a good handle on strengths & weaknesses. Thoughts? thanks, bob
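To illustrate the decoupling point, a minimal consumer sketch (broker and topic names are placeholders, matching the producer idea earlier): the consumer pulls from the broker's log at its own pace, so a fast producer can't overwhelm it.

```scala
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer
import scala.collection.JavaConverters._

object PacedConsumer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092") // placeholder broker
    props.put("group.id", "slow-consumers")
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(Collections.singletonList("iot-events"))
    while (true) {
      // poll() returns only what's ready; unread events simply wait in the log
      val records = consumer.poll(1000L)
      records.asScala.foreach(r => println(s"${r.key} -> ${r.value}"))
    }
  }
}
```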