Member since: 04-24-2017
Posts: 61
Kudos Received: 6
Solutions: 2

My Accepted Solutions

Title | Views | Posted
---|---|---
 | 4498 | 12-23-2018 01:06 PM
 | 2918 | 12-14-2018 10:59 AM
04-29-2019
11:18 AM
Sorry about the confusion; yes, you are right, I completely missed the part that applications inside a queue can be preempted. Thanks for pointing out the engineering blog, which clearly states that; most other documentation does not mention this explicitly and talks about preemption only in terms of other queues.

Is there a way to switch off preemption for applications "within" the queue: I went through all the fair share configurations and there is no setting to switch off preemption only within a queue; you can only switch preemption on or off at the queue level. A workaround can be to create multiple queues: use one queue for the critical jobs that you do not want preempted and disable preemption on that queue, and use the other queues for jobs you are OK with getting preempted. In that case, even if you submit a new critical job to the critical queue, it will not preempt any containers and will wait until the previously submitted critical jobs release some resources. A sketch of such an allocation file is shown at the end of this reply.

If I set maximum applications per queue to 10, the cluster is under-utilized: Yes, setting a maximum number of applications might not be a good idea in this use case. Even if you set it to 10, if 2 applications have taken all the resources, a third application can still come in and preempt those containers. So for this use case I think splitting the queues and configuring them individually as mentioned above is the way to go. I am not able to think of a better solution; I would like to hear your feedback.
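For reference, here is a minimal sketch of the allocation file for that workaround. The queue names and weights are just placeholders, and the allowPreemptionFrom element is only available on newer Fair Scheduler versions, so please check that your Hadoop/CDH release supports it before relying on it.

```xml
<?xml version="1.0"?>
<!-- Example fair-scheduler.xml: queue names and weights are placeholders -->
<allocations>
  <queue name="critical">
    <weight>2.0</weight>
    <!-- Other queues may not preempt containers from this queue -->
    <allowPreemptionFrom>false</allowPreemptionFrom>
  </queue>
  <queue name="batch">
    <weight>1.0</weight>
    <!-- Jobs here can still be preempted when other queues are starved -->
  </queue>
</allocations>
```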
04-29-2019
07:24 AM
An application will never get preempted for another application in the same queue. The preemption logic only preempts from another queue that is running above its fair share. So if 10 applications are running in queue A and taking all of its resources, and queue A is running over fair share, a new application submitted to queue A will wait until some applications finish and resources become available; the 10 running applications will not get preempted. But if a job is posted to queue B and preemption is enabled against queue A (and the other conditions are satisfied), the applications in queue A can get preempted. So most probably your application is getting preempted because of another queue and not because of applications in the same queue. You can disable preemption on critical queues so that no other queue can preempt from them.
03-25-2019
08:09 AM
Hi, It looks like the application is trying to find an HDFS file under the YARN local directory: hdfs:/user/user1/pqrstu/config/input_abc1234.xml. The file being created here should be just input_abc1234.xml. I am not sure what might be causing this. Can you please share the exact command you are using to submit the Spark job? Thanks, Bimal
02-28-2019
01:22 PM
Hi, It should be batched: multiple containers should be preempted to fulfill the requirement, so in your example the application should have been preempted. If it is not getting preempted, you need to go through the other settings and see what is stopping the preemption, for example how long the scheduler waits before preempting. If you share a screenshot of RM Web UI > Scheduler with the queues expanded to show the graphical picture, along with the settings of the queues, we can take a look. It should show one queue using 100% and the other queue having applications pending for longer than the timeout after which containers are supposed to be preempted. Regards, Bimal
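For reference, these are the kinds of settings that control whether and when preemption kicks in; the values below are placeholders rather than recommendations, so tune them to your workload.

```xml
<!-- yarn-site.xml: Fair Scheduler preemption must be enabled globally -->
<property>
  <name>yarn.scheduler.fair.preemption</name>
  <value>true</value>
</property>

<!-- fair-scheduler.xml: seconds a queue waits below its fair share before
     it is allowed to preempt containers from other queues (example value) -->
<defaultFairSharePreemptionTimeout>60</defaultFairSharePreemptionTimeout>
```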
02-28-2019
01:14 PM
Hi, Multiple containers get preempted to fulfill the need of one big container, provided those small containers are running in a queue that is over its fair share and the other conditions mentioned in the link make them eligible for preemption. So if there are two 5 GB containers from that queue on a node that can be preempted, they will be preempted and the freed resources assigned to the 10 GB container in the starved queue. Regards, Bimal
02-13-2019
06:47 AM
1 Kudo
Hi, I assume your concern is whether Flume will take space on the local file system, not on HDFS. Flume can have file channels, where the data sits on the local file system until it is consumed by the sinks. Also, if you are using the Spillable Memory Channel, data will be stored on local disk. So we need to account for the file space required on the local file system of the node where the Flume channel is running. But neither Kafka nor Flume should take space on HDFS; even Kafka log files are stored on the local file system, not on HDFS. Regards, Bimal
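As an illustration of where that local disk usage comes from, here is a minimal file-channel sketch; the agent and channel names and the directories are placeholders, and both directories must be sized on the local file system of the node running the agent.

```properties
# Placeholder agent/channel names; directories live on the local file system
tier1.channels = ch1
tier1.channels.ch1.type = file
# Checkpoint and data directories consume local disk until sinks drain the channel
tier1.channels.ch1.checkpointDir = /var/lib/flume-ng/file-channel/checkpoint
tier1.channels.ch1.dataDirs = /var/lib/flume-ng/file-channel/data
# Upper bound on events held in the channel (and hence on local disk usage)
tier1.channels.ch1.capacity = 1000000
```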
01-28-2019
07:49 AM
Hi, It looks like the channel's put queue is filling up immediately because the data is not getting committed by the sink as fast as it is arriving at the channel. Can you try reducing hdfs.batchSize to 100 and see if that helps? Regards, Bimal
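A minimal sketch of that change, assuming an HDFS sink named sink1 on an agent named tier1 (the names are placeholders, so adjust them to match your configuration):

```properties
# Lower batch size so the sink commits to HDFS more frequently
tier1.sinks.sink1.type = hdfs
tier1.sinks.sink1.hdfs.batchSize = 100
```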
01-23-2019
07:10 AM
Hi, There can be many reasons for this, so to begin with, check the following and let us know if these settings are fine:
1. What is the value of dfs.namenode.delegation.token.max-lifetime on your cluster? Tokens cannot be renewed beyond this lifetime, so if it is set to less than 2 days, that would explain the behavior.
2. Check the logs to see if there was an exception while trying to renew the tokens and, if so, resolve that.
3. The renewal also depends on the AM implementation, which in this case is Gobblin, so you need to check how the keytab is being passed to the AM in Gobblin and whether the Gobblin-side configurations such as gobblin.yarn.login.interval.minutes and gobblin.yarn.token.renew.interval.minutes are set correctly.
We do not support Gobblin, but for similar YARN applications like Spark, we pass the keytab while submitting the application, and it is used to renew the tokens (see the sketch below). So this needs to be looked at from the Gobblin side too. Thanks & regards, Bimal
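For comparison, this is roughly how a long-running Spark application is given a keytab at submission so that its delegation tokens keep getting renewed; the principal, keytab path, class, and jar below are placeholders.

```bash
# Placeholders: replace the principal, keytab path, class and jar with your own
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --principal user1@EXAMPLE.COM \
  --keytab /path/to/user1.keytab \
  --class com.example.MyApp \
  myapp.jar
```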
12-26-2018
01:36 PM
1 Kudo
Hi,
1. If the files are not moved to another folder (as in questions 1 and 2 I mentioned), and the folder accumulates too many files, for example 1 billion files, so the server fills up, what do I have to do? Maybe reconfigure with another spool folder?
Ans: You can configure Flume to delete the ingested files so that they do not keep accumulating in your directory:
deletePolicy    never    When to delete completed files: never or immediate
2. The error you are getting is due to the regex and pattern being incorrect. This combination works:
tier1.sources.source1.interceptors.i1.regex = ^(?:\\[)(\\d\\d\\d\\d-\\d\\d-\\d\\d\\s\\d\\d:\\d\\d:\\d\\d)
tier1.sources.source1.interceptors.i1.serializers.s1.pattern = yyyy-MM-dd HH:mm:ss
With the above regex we match anything starting with [dddd-dd-dd dd:dd:dd, discard the leading [ and capture the rest of the pattern. The captured group matches the pattern yyyy-MM-dd HH:mm:ss, so it is correctly translated to a timestamp. So [ 2012-10-18 18:47:57] ... will be interpreted properly and converted into a timestamp. If the regex and the pattern do not map to each other, you will not get a timestamp in the header. With your regex the captured group does not match the pattern yyyy-MM-dd HH:mm:ss, hence the timestamp in the header comes out as null and you get the exception (see the fuller interceptor sketch at the end of this reply). Please let me know if you have any questions. Regards, Bimal
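Putting it together, a minimal sketch of the interceptor section, assuming the same agent and source names as above; the serializer class shown is the standard Flume millis serializer, but please double-check it against your Flume version.

```properties
# Regex extractor interceptor on source1 (agent/source names follow the thread above)
tier1.sources.source1.interceptors = i1
tier1.sources.source1.interceptors.i1.type = regex_extractor
tier1.sources.source1.interceptors.i1.regex = ^(?:\\[)(\\d\\d\\d\\d-\\d\\d-\\d\\d\\s\\d\\d:\\d\\d:\\d\\d)
tier1.sources.source1.interceptors.i1.serializers = s1
# Convert the captured group to epoch millis and store it under the "timestamp" header
tier1.sources.source1.interceptors.i1.serializers.s1.type = org.apache.flume.interceptor.RegexExtractorInterceptorMillisSerializer
tier1.sources.source1.interceptors.i1.serializers.s1.name = timestamp
tier1.sources.source1.interceptors.i1.serializers.s1.pattern = yyyy-MM-dd HH:mm:ss
```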
12-23-2018
01:06 PM
1 Kudo
Hi, Please find the answers below:
1. When sending files to Hadoop, the files in the spool are not moved anywhere, which makes me wonder: if there is a new file in the spool, how does Flume recognize the old and new files?
Ans: Completely ingested files in the spool directory get renamed with a suffix; see the following configuration:
fileSuffix    .COMPLETED    Suffix to append to completely ingested files
2. After Flume uploads a file to Hadoop, will the files in the spool be moved to another folder? Or does Flume have a mechanism to back up files?
Ans: Same as above, the file is renamed with the suffix.
3. I know that Flume has some properties that help work with regex, but I don't know if Flume supports sending files to Hadoop and sorting those files into regex-based directories. If so, how do I do it?
Ans: You can use an HDFS directory path containing formatting escape sequences that the HDFS sink replaces to generate the directory/file name used to store the events. For example, to store the files in different directories based on the date:
hdfs.path = /flume/%Y-%m-%d
For more detail on the escape sequences see the following link: https://flume.apache.org/FlumeUserGuide.html#hdfs-sink
4. Does Flume support sending files to Hadoop and categorizing them into directories based on the date sent? (I have read that part in HDFS Sink but when I tried it, it failed.)
Ans: If you share the configuration you are using, I can try to fix the issues with it. A sketch of such a configuration is shown after this list.
5. While using Flume to send files to Hadoop, can I modify the file contents, such as adding file names into the data stream, or changing ";" into "|"?
Ans: If you just want to add the file name to the data, try the following configuration for the spooldir source type:
basenameHeader    false    Whether to add a header storing the basename of the file.
basenameHeaderKey    basename    Header key to use when appending the basename of the file to the event header.
If you want to do a regex replace, you will have to use the Search and Replace Interceptor, where you can specify the search regex and the replacement string. See the following link: https://flume.apache.org/FlumeUserGuide.html#search-and-replace-interceptor
6. Can I use any API or tool to monitor Flume file transfer to Hadoop? For example, during file transfer, see how many files have been transferred to Hadoop, how many were submitted successfully, and how many failed.
Ans: I am not sure if anything is available specifically for the spooldir source, but see the Monitoring section and check whether something there works for you: https://flume.apache.org/FlumeUserGuide.html#monitoring
7. Does Flume record transaction logs with Hadoop? For example, how many files have been uploaded to Hadoop, ...
Ans: I don't think so, but it would need more research to see whether you can track which files have been written. You can check your spool directory for the files already sent.
Thanks, Bimal
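Regarding question 4, here is a minimal sketch of a spooldir source feeding an HDFS sink that buckets files by date. The agent, source, channel, and sink names, the spool directory, and the HDFS path are all placeholders, and it assumes either that the events carry a timestamp header or that hdfs.useLocalTimeStamp is enabled as shown.

```properties
# Placeholder names: tier1 agent with spooldir source -> memory channel -> HDFS sink
tier1.sources = src1
tier1.channels = ch1
tier1.sinks = sink1

tier1.sources.src1.type = spooldir
tier1.sources.src1.spoolDir = /data/incoming
tier1.sources.src1.channels = ch1

tier1.channels.ch1.type = memory

tier1.sinks.sink1.type = hdfs
tier1.sinks.sink1.channel = ch1
# %Y-%m-%d is expanded by the HDFS sink into a per-day directory
tier1.sinks.sink1.hdfs.path = /flume/%Y-%m-%d
tier1.sinks.sink1.hdfs.fileType = DataStream
# Use local time for the escape sequences if events carry no timestamp header
tier1.sinks.sink1.hdfs.useLocalTimeStamp = true
```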
12-14-2018
10:59 AM
1 Kudo
Hi, After saving the changes, you should have seen the icon to refresh the cluster; clicking it performs the steps to update the values. The configuration looks good. Check the value of CM > Flume > Configuration > Agent; this tells you which node the tier1 agent is configured to run on. You can check the logs on that node to confirm whether sink1 got started (by default the logs are under /var/log/flume-ng). If you do not see the data in HDFS, please check those logs; you should see a corresponding error message if there is any issue writing to HDFS. Regards, Bimal
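A quick way to check that on the node running the agent, assuming the default log location mentioned above (the exact log file name varies by installation):

```bash
# On the node where the tier1 agent runs; list and follow the agent log
ls /var/log/flume-ng/
tail -f /var/log/flume-ng/*.log

# Search recent output for sink start-up messages or HDFS write errors
grep -iE "sink1|error|exception" /var/log/flume-ng/*.log | tail -n 50
```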
12-08-2018
07:45 AM
Great, Sunil. Regards, Bimal
12-07-2018
08:35 PM
Hi Alex, Look for the log-aggregation-related messages in the NodeManager log file on one of the nodes where a container of the application was running. In the normal case you should see:
2018-12-07 20:27:59,994 INFO org.apache.spark.network.yarn.YarnShuffleService: Stopping application application_1544179594403_0020
... org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Application just finished : application_1544179594403_0020
.. org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Uploading logs for container container_e06_1544179594403_0020_01_000001. Current good log dirs are /yarn/container-logs
Do you see these messages for the failing application, or do you see some error/exception instead? If you can paste the relevant log for the failing application, I can take a look. Regards, Bimal
12-07-2018
09:29 AM
Hi Sunil, That means the spark-submit is asking for a container size of 4 GB; the --executor-memory must be getting set to 4g. Can you check the Spark command being used and set --executor-memory and --driver-memory to 6g? Regards, Bimal
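For example, a rough sketch of what that command might look like; the class, jar, and any other options are placeholders to be taken from your existing command, with only the memory flags being the point here.

```bash
# Placeholders for the application class/jar; raise executor and driver memory to 6g
spark-submit \
  --master yarn \
  --executor-memory 6g \
  --driver-memory 6g \
  --class com.example.MyJob \
  myjob.jar
```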
12-06-2018
09:39 AM
Hi Sunil, This error indicates that the container size in YARN is set to 4.00 GB and your Spark application needs more memory to run. Container Memory: yarn.nodemanager.resource.memory-mb. As a test, you can increase the container size in the YARN configuration to, say, 6 GB or 8 GB and see if the application succeeds. (If using Cloudera Manager, you will see this under CM > YARN > Configuration > Container Memory, yarn.nodemanager.resource.memory-mb.) Regards, Bimal
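If you are not on Cloudera Manager, the equivalent change in yarn-site.xml would look roughly like this; 6 GB is only an example value, and the NodeManagers need a restart after changing it.

```xml
<!-- yarn-site.xml: total memory the NodeManager can hand out to containers -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <!-- example value: 6 GB -->
  <value>6144</value>
</property>
```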
12-04-2018
02:58 PM
Do you see some error or info related to a timeout, or anything else indicating why the application failed and had to be restarted? You can gather the YARN logs for the application and check the Application Master section to see what the reason for the failure was. Maybe the Spark configuration being used in CDSW is different from the one used by spark-submit.
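A sketch of gathering those logs; the application ID below is a placeholder, so take the real one from the RM Web UI or from `yarn application -list`.

```bash
# Replace with the actual application ID of the failed run
yarn logs -applicationId application_1234567890123_0001 > app.log
# The Application Master container is usually the first one in the output
less app.log
```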
11-26-2018
07:29 AM
1 Kudo
Hi, It seems like you are using JDK 1.7, which is not compatible with the Kafka version being used. Using JDK 1.8 should resolve this issue. Also see the following link: https://stackoverflow.com/questions/10382929/how-to-fix-java-lang-unsupportedclassversionerror-unsupported-major-minor-versi Regards, Bimal