Member since 06-01-2016 | 28 Posts | 2 Kudos Received | 0 Solutions
08-04-2016
12:18 PM
Thanks, the mile-away view was all I needed, but you also put another model into my head about parsing/normalization before the data even goes into Kafka.
08-03-2016
04:51 PM
We have multiple applications which need to chew on the data, but we also have several processes to normalize data (which ones have to run depends on the source data format). For example, a document feed needs to hit Tika, but a news feed (RSS content) does not. We also have to process e-mails, and an e-mail could go down the Tika path if attachments are present. Based on that logic, do you still feel a single queue would be sufficient, with all of that decision tree/disassembly in a single Spark app for efficiency?
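The branching described above can be sketched as a small predicate. This is only an illustration: the record layout (a dict with `source` and `attachments` keys) and the function names are assumptions, not part of any real pipeline.

```python
from typing import Any, Dict, Iterable, List, Tuple

Record = Dict[str, Any]


def needs_tika(record: Record) -> bool:
    """Decide whether a record has to go through Tika.

    Assumed (hypothetical) record fields: "source" is one of
    "document", "news", "email"; "attachments" is a list for e-mails.
    """
    source = record.get("source")
    if source == "document":
        return True  # document feeds always need Tika
    if source == "news":
        return False  # RSS content does not
    if source == "email":
        # e-mails only need Tika when attachments are present
        return bool(record.get("attachments"))
    raise ValueError(f"unknown source: {source!r}")


def split_by_tika(records: Iterable[Record]) -> Tuple[List[Record], List[Record]]:
    """Partition records into (needs Tika, does not need Tika)."""
    with_tika: List[Record] = []
    without_tika: List[Record] = []
    for record in records:
        (with_tika if needs_tika(record) else without_tika).append(record)
    return with_tika, without_tika
```

The same predicate could live inside a single Spark application or inside a separate dispatcher; the sketch does not favor either design.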
08-03-2016
04:31 PM
In theory, design-wise, I've been debating with a developer on an architecture using Spark and Kafka for a data processor. I prefer a model where data comes into a queue in Kafka, gets picked up by a dispatcher service (Spark), and is distributed into another Kafka queue based on the content delivered. He prefers a single Kafka queue, where the Spark application does all the individual extraction/disassembly of data. My argument for multiple queues is that I can divide up data that is multi-type (e-mails, for example) and distribute it into queues to be more efficient. He feels that a single Spark application can do that more efficiently based on the distributed model. Which one is better? Is one better?
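To make the dispatcher idea concrete, here is a minimal sketch of the routing step only. The topic names, the `type` field, and the `send` callback (a stand-in for whatever Kafka producer call is actually used) are all hypothetical; no real Kafka or Spark API is shown.

```python
from typing import Any, Callable, Dict, Iterable

Record = Dict[str, Any]

# Hypothetical topic names, chosen only for illustration.
TOPIC_BY_TYPE = {
    "email": "ingest.email",
    "document": "ingest.document",
    "news": "ingest.news",
}
DEFAULT_TOPIC = "ingest.unrouted"


def choose_topic(record: Record) -> str:
    """Pick the downstream queue from the record's (assumed) "type" field."""
    return TOPIC_BY_TYPE.get(record.get("type"), DEFAULT_TOPIC)


def dispatch(records: Iterable[Record],
             send: Callable[[str, Record], None]) -> Dict[str, int]:
    """Route every record to its topic and return a per-topic count.

    `send(topic, record)` is a placeholder for a producer call.
    """
    counts: Dict[str, int] = {}
    for record in records:
        topic = choose_topic(record)
        send(topic, record)
        counts[topic] = counts.get(topic, 0) + 1
    return counts
```

In the single-queue design the same `choose_topic` decision would simply become a branch inside the one Spark application instead of a write to a second queue.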
Labels: Apache Kafka, Apache Spark
08-01-2016
04:59 PM
Thanks, that answers all my questions. I'd be all in HDInsight if MS would give me a free dev environment 🙂
08-01-2016
03:50 PM
Thanks, and I do agree. I'm still working on breaking out of the layered-architecture mindset. HDF set back my progress on that, because in another thread someone recommended that it be a different cluster from HDP.
Any chance you have a link which outlines which components are recommended to run together on the same machine and which are not? For example, I know it's not recommended to have an HDFS NameNode on the same machine as a DataNode. I was curious whether there are documents that break that out better.
08-01-2016
03:10 PM
I've been mulling over a design architecture for a deployment and I'm looking for some input on how to physically lay out the environment. My current thought is to build a single data storage cluster with HDFS, made up of small machines with large storage, plus a separate cluster for my processing layer (Spark/YARN/Oozie/Elastic/etc.) and a DB cluster holding Hive. I don't know if this model is necessarily efficient, though, or if I should stick with a single cluster and just manage the services on each individual node. What are everyone's thoughts on these two options?
07-28-2016
02:07 PM
InvokeHTTP looks to have gotten me closer. The only issue now is that I am only getting the POST response; the original text is stripped. Thanks, this solves my original problem though.
07-28-2016
01:31 PM
Attached, thanks. I can trace the PostHTTP, and I see the response from the POST, but the text doesn't seem to make it through to the ReplaceText or the second PutFile.
07-28-2016
01:11 PM
That's not what is happening, or at least it's not working that way in my setup. The flow I have is: HandleHttpRequest --> PutFile --> PostHTTP --> ReplaceText --> PutFile --> HandleHttpResponse. The only thing that seems to apply a change to the submitted text is the ReplaceText. The first PutFile contains the text as submitted; the second PutFile contains only the text as modified by the ReplaceText. The PostHTTP result does not seem to be incorporated into the result.
07-28-2016
12:42 PM
I've successfully created a process in NiFi to send a POST to a web service, but I'm unable to find out how to incorporate the POST response into the HTTP response for the caller. Does anyone know if this is possible, or did I go the wrong route for a post/modify/response service? EDIT: The flow I have is: HandleHttpRequest --> PutFile --> PostHTTP --> ReplaceText --> PutFile --> HandleHttpResponse. The only thing that seems to apply a change to the submitted text is the ReplaceText. The first PutFile contains the text as submitted; the second PutFile contains only the text as modified by the ReplaceText. The PostHTTP result does not seem to be incorporated into the result.
Labels: Apache NiFi