09-09-2016
11:46 PM
3 Kudos
@Deepesh Bhatia Before anything else: many containers that are available and properly sized for the job is a GOOD THING! Many containers that are needed but underused (badly sized) is not. What you want is the least amount of resources for the fastest job execution. I intentionally did not say the least number of containers; all that matters is using optimal resources for performance. Now to your questions:

1. One container is used for each task. A mapper is a task; a reducer is a task. If a query translates into 10 mappers and 1 reducer, that is 10 + 1 = 11 containers to finish the job. If your default queue (let's assume) has 64 GB of RAM and 64 cores, and you set the memory per container to 1 GB and the cores per container to 1, YARN can allocate up to 64 containers across all jobs running at the same time. Containers can be reused (that setting is true by default) to reduce the overhead of creating new containers. If you understand how much data is processed per task and determine that it is, let's say, up to 512 MB, you could reduce the memory allocated per container, but that won't help unless you also have enough cores to take advantage of 128 containers x 512 MB; if you had 128 cores, you could run 128 containers. You can reduce the number of containers by increasing the RAM and cores per container, but why would you do that? The point is that you need to create containers of the best size to handle your job mix. If they have more resources allocated than they use, you waste resources; if they are too small to process the data per task, you get bottlenecks. The best practice is to set a container size globally that meets the majority of requirements, and override it at the individual job level when a particular job needs a different size.

2. You have the root queue, and default is a child of the root queue. Assuming you create another queue and segregate the resources of the root queue as 50% for default and 50% for the new queue (simplified), if you don't specify which queue to submit the job to, the job will be submitted to default. If you delete default, you always have to set the execution queue; if you forget to do it, your job will just hang there with no resources allocated, since there is no default queue, until you kill it and resubmit it to a specific queue. The Resource Manager UI shows the load per queue, including the number of containers and their RAM and core utilization. Each job shows up in the queue it was submitted to.

3. Have enough resources available for those tasks. If you have applications that cannot wait for execution, you need to either create a queue that guarantees those resources, increase the overall resources of your cluster, or optimize your jobs to use fewer resources.

References: YARN (what it is, what it does, etc.): http://hortonworks.com/apache/yarn/#section_1 The Resource Manager is also covered in the documentation and is available from the Ambari UI via YARN and Quick Links at the top/center of the screen.

If any of the responses addressed your question, please don't forget to vote and accept the answer. If you fix the issue on your own, don't forget to post the answer to your own question. A moderator will review it and accept it.
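The container arithmetic in point 1 can be sketched as a quick calculation. This is only an illustrative sketch: the 64 GB / 64 vcore queue and the 1 GB / 512 MB container sizes are the example numbers from the answer above, not real cluster settings, and the function is not a YARN API.

```python
def max_containers(queue_mem_mb, queue_vcores, container_mem_mb, container_vcores):
    """Containers a queue can run concurrently: the tighter of the
    memory limit and the vcore limit wins."""
    by_memory = queue_mem_mb // container_mem_mb
    by_cores = queue_vcores // container_vcores
    return min(by_memory, by_cores)

# 64 GB queue, 1 GB / 1 vcore containers -> 64 containers
print(max_containers(65536, 64, 1024, 1))   # 64

# Halving container memory to 512 MB doesn't help on its own:
# memory would allow 128 containers, but 64 vcores still cap it at 64.
print(max_containers(65536, 64, 512, 1))    # 64

# With 128 vcores available, the 512 MB containers can all run.
print(max_containers(65536, 128, 512, 1))   # 128
```

This is why shrinking containers without adding cores (or vice versa) wastes the shrunk dimension: the minimum of the two limits decides.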
09-09-2016
08:20 AM
1 Kudo
Hi, at this moment this kind of scenario is not really supported. The processor expects that log messages are appended to a file with a fixed name, something like: - /my/var/log/my-app.log - /my/var/log/my-app.log.1 - /my/var/log/my-app.log.2 with messages being appended to 'my-app.log'. A workaround would be to update your processor every day with a script/crontab that changes the filename and rolling filename properties. There is a PR to support the scenario you are describing [1], and it is currently under review. You can give it a try if you are in a position to; otherwise it will probably make it into the next version of NiFi (1.1.0, I think). [1] https://github.com/apache/nifi/pull/980
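The daily script/crontab workaround could start from a small helper like the one below. This is only a sketch under assumptions: the `my-app.log` / `my-app.log.YYYY-MM-DD` naming pattern and the `filenames_for` helper are hypothetical illustrations, not the asker's actual log layout, and the step that pushes the new value into the processor's properties is deliberately left as a comment because the mechanism varies by NiFi version.

```python
import datetime

def filenames_for(today):
    """Compute the active log filename and yesterday's rolled-over name
    for a hypothetical daily-rolling log layout (illustrative pattern)."""
    yesterday = today - datetime.timedelta(days=1)
    active = "my-app.log"
    rolled = "my-app.log.%s" % yesterday.strftime("%Y-%m-%d")
    return active, rolled

active, rolled = filenames_for(datetime.date(2016, 9, 9))
print(active)  # my-app.log
print(rolled)  # my-app.log.2016-09-08
# A daily cron job would then push `rolled` into the processor's
# rolling-filename property (e.g. via NiFi's REST API or by updating
# the flow configuration), which is the manual step the PR automates.
```

Once the PR lands, TailFile handles the rolling pattern itself and no such script is needed.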
09-06-2016
10:43 PM
Thanks a lot, Matt, for the quick response. I need one more suggestion. Currently, application logs get recycled on a daily basis. Let's assume NiFi copies the files at a particular point in time using ListFile -> FetchFile. Some more data will be written before the next log file is created. What's the best way to copy/sync the file that is being appended, using NiFi? The current version of NiFi is 0.6.0.1.2.0.1-1. Thanks, Deepesh