Member since: 12-14-2015
Posts: 70
Kudos Received: 94
Solutions: 16

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 6159 | 03-14-2017 03:56 PM |
| | 1356 | 03-07-2017 07:20 PM |
| | 4358 | 01-23-2017 05:57 AM |
| | 5692 | 01-23-2017 05:40 AM |
| | 1724 | 10-18-2016 03:36 PM |
09-21-2016
03:17 AM
Thanks @ajaysingh
09-21-2016
02:51 AM
2 Kudos
I know Syncsort is a possible solution here, but I wanted to check whether HDF can do the job, and whether there is any recommendation other than Syncsort?
09-01-2016
02:00 AM
1 Kudo
Just an update - the 'SelectHiveQL' processor has been added as part of NiFi 0.7.
08-30-2016
12:19 PM
3 Kudos
Before I answer the question specifically, let me address (based on my research) the fault tolerance of each of the components within Storm:

1) Nimbus - a stateless daemon that sits on the master node, deploys the job (topology), and keeps track of it. There are two scenarios. First, if Nimbus goes down after you submit the topology, it has no adverse effect on the running topology, because the topology runs on the worker nodes (not on the master node). If you have kept the Nimbus process under supervision, it will be restarted, and because it is fail-fast, when it comes back it retrieves the metadata of all active topologies on the cluster from Zookeeper and resumes tracking them. Second, if Nimbus goes down before you submit the topology, you simply have to restart it.

2) Supervisor - this daemon is responsible for tracking the worker processes (JVM processes) on its node and coordinating their state with Nimbus through Zookeeper. If this daemon goes down, the worker processes are not affected and keep running unless they crash on their own. Once the supervisor comes back (via supervisord or monit), it collects the state from Zookeeper and resumes tracking the worker processes. If a timeout occurs, Nimbus reschedules the topology on a different worker node.

3) Worker processes (JVM processes) - these container processes actually execute your topology's components (spouts + bolts). If one goes down, the supervisor simply restarts it on a different port (on the same worker node); if it runs out of ports, it notifies Nimbus, which reschedules the process on a different worker node.

4) Worker node (supervisor + worker processes) - in this scenario Nimbus stops receiving heartbeats from the worker node (due to timeout) and simply reassigns the work to different worker node(s) in the cluster.

5) Zookeeper (ZK) - from all of the above you might have inferred that all of this state is stored in ZK. What if it goes down, or can it go down? ZK is not a single-node process either; it runs as its own cluster, and the state stored in ZK is constantly replicated, so even if a single ZK node goes down, a new leader is elected and Apache Storm keeps communicating with the ensemble.

Now, going back to the specific question: when a supervisor with 4 slots (ports) goes down, the very first thing Nimbus tries to do is restart the processes on the SAME worker node on the available ports. The processes for which no port is available are reassigned to different worker nodes - so yes, the executor threads in the worker processes on those nodes will increase. From a design perspective, you should not necessarily have to plan for redundant ports, because Nimbus is designed to take care of this by either restarting the processes on the available ports or by redistributing them among the other worker nodes. A storm.yaml sketch of what a "slot" is follows below.
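For context, here is a minimal storm.yaml sketch showing how slots are declared per supervisor; the four ports are the common defaults and the timeout value is illustrative, so adjust both for your cluster:

```yaml
# storm.yaml on a supervisor node (illustrative values)
# Each port listed under supervisor.slots.ports is one worker "slot";
# a supervisor with four entries can host up to four worker JVMs.
supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703

# How long Nimbus waits for supervisor heartbeats before it considers
# the node dead and reassigns its work to other supervisors.
nimbus.supervisor.timeout.secs: 60
```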
08-30-2016
12:14 PM
Thanks @Rajkumar Singh
08-30-2016
12:07 AM
2 Kudos
Say there were 3 extra slots in the cluster and a supervisor with 4 slots (supervisor.slots.ports) goes down - what happens? Does Storm automatically increase the number of executor threads in the worker processes on the other supervisors?
Labels:
- Apache Storm
08-20-2016
05:55 PM
3 Kudos
@gkeys The MergeContent processor has two properties that I normally use to determine the output file size:
- Minimum Number of Entries
- Minimum Group Size

For your question "how do I increase the file size to reach a desired size (say 1 GB)?" - set Minimum Group Size to the size you would like (i.e. 1 GB) AND set Minimum Number of Entries to 1. This merges the content up to 1 GB before it writes out to the next processor (see the sketch below).

Can you clarify your other question, "how do I double the size of the existing setting?" - do you mean double the size of the incoming file? That is straightforward: just set Minimum Number of Entries to 2 and Minimum Group Size to 0 B.
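For reference, a minimal sketch of the MergeContent settings for the 1 GB case; the Maximum Group Size and Max Bin Age lines are my own assumptions (to cap merged files and flush partial bins), not something from the original question:

```
MergeContent (illustrative settings for ~1 GB output files)
  Merge Strategy            : Bin-Packing Algorithm
  Minimum Number of Entries : 1
  Minimum Group Size        : 1 GB
  Maximum Group Size        : 1 GB       (assumption: cap each merged file)
  Max Bin Age               : 5 min      (assumption: flush partially filled bins)
```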
08-06-2016
09:56 PM
1 Kudo
@Iyappan Gopalakrishnan Download the nifi-0.7.0-bin.zip file from the downloads page: https://nifi.apache.org/download.html Once you unzip the file, you will see the standard NiFi folder structure. Then, depending on your OS, use 'bin/run-nifi.bat' on Windows or 'bin/nifi.sh start' on Mac/Linux. More details on how to start NiFi are here: https://nifi.apache.org/docs/nifi-docs/html/getting-started.html#starting-nifi You can tail logs/nifi-app.log to confirm that it starts properly. OPTIONAL: by default, NiFi starts on port 8080 - but if you see a port conflict or want to start it on a different port, edit the file 'conf/nifi.properties', search for 8080, and update the port number. A command sketch is below. If you like the answer, please make sure to upvote or accept it.
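A minimal sketch of those steps on Mac/Linux, assuming the zip is unpacked in the current directory; the 9090 port mentioned in the comment is just an example value:

```sh
# unpack the NiFi 0.7.0 binary distribution
unzip nifi-0.7.0-bin.zip
cd nifi-0.7.0

# optional: change the HTTP port from the default 8080 by editing
# conf/nifi.properties, e.g. set nifi.web.http.port=9090

# start NiFi and watch the application log
bin/nifi.sh start
tail -f logs/nifi-app.log
```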
08-06-2016
03:19 AM
@Iyappan Gopalakrishnan Follow the steps below: 1) Save your HDF flow files as XML templates. 2) Download NiFi 0.7 from the Apache NiFi downloads site (https://nifi.apache.org/download.html). 3) Unzip the file, edit the port (if you would like), and start NiFi. 4) Import the templates. If this answer and comment are helpful, please upvote my answer and/or select it as the best answer. Thank you!!
08-04-2016
04:40 AM
3 Kudos
There is now a new NiFi processor, 'SelectHiveQL', that queries data from Hive. There is also a processor, 'PutHiveQL', to insert or update data directly in Hive. A configuration sketch is below.
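For illustration, a minimal sketch of how the two processors are typically configured, assuming a Hive connection-pool controller service named HiveConnectionPool and a placeholder database/table; the property names are as I recall them from the Hive processors, so verify them against your NiFi version:

```
SelectHiveQL  (query data out of Hive)
  Hive Database Connection Pooling Service : HiveConnectionPool
  HiveQL Select Query                      : SELECT * FROM my_db.my_table
  Output Format                            : Avro

PutHiveQL  (insert/update data in Hive from incoming HiveQL flowfiles)
  Hive Database Connection Pooling Service : HiveConnectionPool
  Batch Size                               : 100
```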