Member since: 04-24-2017 · Posts: 61 · Kudos Received: 6 · Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 5438 | 12-23-2018 01:06 PM
 | 3617 | 12-14-2018 10:59 AM
04-29-2019
11:18 AM
Sorry about the confusion; yes, you are right. I completely missed the part that applications inside a queue can be preempted. Thanks for pointing out the engineering blog, which clearly states that; much of the other documentation does not mention this explicitly and talks about preemption only in terms of other queues. Now, is there a way to switch off preemption for applications "within" the queue? I went through all the Fair Scheduler configurations, and there is no setting to switch off preemption only within a queue; you can only switch preemption off at the queue level. The workaround is to create multiple queues, use one queue for the critical jobs which you do not want preempted, and disable preemption on that particular queue; use the other queues for jobs you are OK with getting preempted. In that case, even if you submit a new critical job to that queue, it will not preempt any container and will wait until previously submitted critical jobs release some resources. If I put maximum applications per queue at 10, the cluster is under-utilized: yes, setting a max application count might not be a good idea in this use case. Even if you set it to 10, if 2 applications have taken all the resources, a third application can come in and preempt those containers. So for this use case I think splitting the queues and configuring the individual queues as mentioned above is the way to go. I am not able to think of any better solution; I would like to hear your feedback.
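The multi-queue workaround above can be sketched in the Fair Scheduler allocation file (fair-scheduler.xml). The queue names and weights here are illustrative, not from the original thread:

```xml
<?xml version="1.0"?>
<allocations>
  <!-- Critical jobs go here; allowPreemptionFrom=false means other queues
       cannot preempt containers out of this queue. -->
  <queue name="critical">
    <weight>2.0</weight>
    <allowPreemptionFrom>false</allowPreemptionFrom>
  </queue>
  <!-- Best-effort jobs; containers here remain eligible for preemption
       when other queues are starved below their fair share. -->
  <queue name="besteffort">
    <weight>1.0</weight>
  </queue>
</allocations>
```

Jobs submitted to the critical queue can still be made to wait on each other (no within-queue preemption exists), but nothing outside the queue will take their containers.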
04-29-2019
07:24 AM
An application will never get preempted for another application in the same queue. The preemption logic only preempts from another queue which is running above its fair share. So if 10 applications are running in queue A and taking all its resources, and queue A is running over its fair share, then a new application coming into queue A will wait until some application finishes and resources become available; the 10 running applications will not get preempted. But if a job is posted to queue B and preemption is enabled on queue A (and the other conditions are satisfied), the applications in queue A can get preempted. So most probably your application is getting preempted because of another queue, not because of applications in the same queue. You can disable preemption on critical queues so that no other queue can preempt from them.
03-25-2019
08:09 AM
Hi, it looks like the application is trying to find an HDFS file under the YARN local directory: hdfs:/user/user1/pqrstu/config/input_abc1234.xml. The file being created here should be just input_abc1234.xml. I am not sure what might be causing this. Can you please give us the exact command you are using to submit the Spark job? Thanks, Bimal
02-28-2019
01:22 PM
Hi, it should be batched, and multiple containers should be preempted to fulfill the requirement. In your example the application should have been preempted. If it is not getting preempted, you need to review all the other settings and see what is stopping preemption, for example how long the scheduler waits before preempting. If you give us a screenshot of RM Web UI > Scheduler (expand the queues to show the graphical picture) and the settings details of the queues, we can take a look. It should show one queue using 100% and the other queue having applications pending for longer than the timeout after which containers are supposed to be preempted. Regards, Bimal
02-28-2019
01:14 PM
Hi, multiple containers get preempted to fulfill the need of one big container, provided those small containers are running in a queue over its fair share and the other conditions make them eligible for preemption, as mentioned in the link. So if there are two 5 GB containers from that queue on a node that can be preempted, they will be preempted and the freed resources assigned to the 10 GB container in the starved queue. Regards, Bimal
02-13-2019
06:47 AM
1 Kudo
Hi, I assume your concern is whether Flume will take space on the local file system, not on HDFS. Flume can have file channels, where the data sits on the local file system while it is consumed by the sinks. Also, if you are using a Spillable Memory Channel, overflow data will be stored on local disk. So we need to account for the file space required on the local file system where the Flume channel is running. But neither Kafka nor Flume should take space on HDFS; even Kafka log files are stored on the local file system, not on HDFS. Regards, Bimal
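For sizing purposes, the local-disk footprint of a file channel comes from its checkpoint and data directories. A minimal sketch, assuming an agent named agent1 and paths chosen for illustration:

```properties
# Flume file channel: both directories live on the LOCAL file system and
# must be sized for the backlog that can build up while the sink drains.
agent1.channels.ch1.type = file
agent1.channels.ch1.checkpointDir = /var/lib/flume/checkpoint
agent1.channels.ch1.dataDirs = /var/lib/flume/data
# Upper bound on events buffered in the channel.
agent1.channels.ch1.capacity = 1000000
```

The worst-case disk usage is roughly the channel capacity multiplied by the average event size, plus checkpoint overhead.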
01-28-2019
07:49 AM
Hi, it looks like the put queue is filling up immediately because the data is not being committed to the sink as fast as it is arriving in the channel. Can you try reducing hdfs.batchSize to 100 and see if that helps? Regards, Bimal
01-23-2019
07:10 AM
Hi, there can be many reasons for this, so check the following to begin with and let us know if these settings are fine: 1. What is the value of fs.namenode.delegation.token.max-lifetime on your cluster? This controls how long a token can keep being renewed; if it is set to less than 2 days, that would explain the behavior. 2. Check the logs for any exception while trying to renew the tokens and, if there is one, resolve it. 3. The renewal also depends on the AM implementation, which in this case is Gobblin, so you need to check how the keytabs are passed to the AM in Gobblin and whether the configurations on the Gobblin end, like gobblin.yarn.login.interval.minutes and gobblin.yarn.token.renew.interval.minutes, are set correctly. We do not support Gobblin, but for similar YARN applications like Spark we pass the keytab while submitting the application, and it is used to renew the token; so this needs to be looked at from the Gobblin side too. Thanks & regards, Bimal
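The token lifetime setting from point 1 lives in hdfs-site.xml. The value shown is the Hadoop default of 7 days in milliseconds, for reference only:

```xml
<property>
  <name>fs.namenode.delegation.token.max-lifetime</name>
  <!-- Maximum total lifetime of a delegation token, in milliseconds.
       Default is 7 days (604800000 ms). A token cannot be renewed past
       this limit, so a value shorter than the job's runtime will make
       long-running jobs fail partway through. -->
  <value>604800000</value>
</property>
```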
12-26-2018
01:36 PM
1 Kudo
Hi, 1. If the files are not moved to another folder (like questions 1 and 2 I mentioned), when the folder has too many files, for example 1 billion files, and the server is full, what do I have to do? Maybe reconfigure with another spool folder? Ans: You can configure Flume to delete completed files so that they do not keep accumulating in your directory: deletePolicy controls when to delete completed files, either never or immediate. 2. The error you are getting is due to the regex and pattern being incorrect. This combination works: tier1.sources.source1.interceptors.i1.regex = ^(?:\\[)(\\d\\d\\d\\d-\\d\\d-\\d\\d\\s\\d\\d:\\d\\d:\\d\\d) tier1.sources.source1.interceptors.i1.serializers.s1.pattern = yyyy-MM-dd HH:mm:ss With the above regex we match anything starting with [dddd-dd-dd dd:dd:dd, discard the leading [, and capture the rest of the pattern. The captured data matches the pattern yyyy-MM-dd HH:mm:ss, so it is correctly translated to a timestamp. Thus [ 2012-10-18 18:47:57] ... will be interpreted properly and converted into a timestamp. If the regex and the pattern do not map, you will not get a timestamp in the header. With your regex the selected group does not match the pattern yyyy-MM-dd HH:mm:ss, hence the timestamp in the header comes out as null and you get the exception. Please let me know if you have any questions. Regards, Bimal
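Putting both answers together, a spooling-directory source with immediate deletion and the timestamp-extracting interceptor could look like this. The tier1/source1 names follow the properties already quoted above; the spool path and the interceptor type lines are my assumptions for a complete sketch:

```properties
tier1.sources.source1.type = spooldir
tier1.sources.source1.spoolDir = /var/spool/flume/in
# Delete files as soon as they are fully ingested, so the folder never fills up.
tier1.sources.source1.deletePolicy = immediate

# Regex extractor: capture the timestamp after the leading '[' and store it
# in the event header named 'timestamp' as epoch milliseconds.
tier1.sources.source1.interceptors = i1
tier1.sources.source1.interceptors.i1.type = regex_extractor
tier1.sources.source1.interceptors.i1.regex = ^(?:\\[)(\\d\\d\\d\\d-\\d\\d-\\d\\d\\s\\d\\d:\\d\\d:\\d\\d)
tier1.sources.source1.interceptors.i1.serializers = s1
tier1.sources.source1.interceptors.i1.serializers.s1.type = org.apache.flume.interceptor.RegexExtractorInterceptorMillisSerializer
tier1.sources.source1.interceptors.i1.serializers.s1.name = timestamp
tier1.sources.source1.interceptors.i1.serializers.s1.pattern = yyyy-MM-dd HH:mm:ss
```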