Member since: 04-24-2017
Posts: 12
Kudos Received: 0
Solutions: 0
12-26-2017
12:01 PM
@Harald Berghoff Thank you for responding. Please find my responses inline below.

Q: The machine names from the error log are the expected ones? So this basically means: env-2 is example1.com and example2.com is your HDFS master node (port 8020 should be the HDFS service on the NameNode)?
A: Yes, env-2 is example1.com and example2.com is the HDFS master node.

Q: Are all the issues related to the communication between env-2 and your NameNode, or do you have other hosts involved as well?
A: Other hosts are involved as well; the issue is seen on other nodes too.

Q: Does the process on env-1 start 5 times a day, or is it started once and continues to run (sleeping instead of terminating)?
A: In env-1 the process runs once and continues to run. Only 5 files are moved to HDFS in a single day. We don't restart any process.

Q: The ticket renewal on env-1 is identical to the ticket renewal on env-2?
A: Yes, ticket renewal is identical in both environments.

Q: I am just wondering if it is possible that your process on env-2 only takes the ticket at start-up, and when the ticket expires, it just doesn't pick up the renewed ticket? If after a restart of your processes on env-2 all authentication issues are gone for the next ~20 hours, this might be the case. And if on env-1 the process is started 5 times a day instead of running continuously, that might be the reason the issue does not occur on env-1.
A: In env-1 the process runs continuously and we never restart it. In env-2, when the GSS issue pops up, we restart our process and all authentication issues are gone for the next ~20 hours. My question is why a restart is needed in env-2 while env-1 works fine without one. As I mentioned in my previous comment, the only difference between env-1 and env-2 is the load: the number of files moved to HDFS simultaneously is huge in env-2.

Please let me know if you need any more information for the analysis. Thank you.
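If the start-up-only-ticket theory is right, the usual fix on the Hadoop client side is to log in from a keytab (UserGroupInformation.loginUserFromKeytab) and call UserGroupInformation.checkTGTAndReloginFromKeytab() from the ingestion loop before each batch, rather than relying on an external cron kinit that the running JVM may never see. The sketch below simulates only the decision logic with made-up timestamps, so it runs without a Hadoop client on the classpath; the 0.80 window fraction mirrors the one Hadoop's UGI uses internally.

```java
import java.time.Duration;
import java.time.Instant;

/**
 * Sketch of the relogin check a long-running ingestion loop needs.
 * With a real Hadoop client this decision is what
 * UserGroupInformation.checkTGTAndReloginFromKeytab() makes for you;
 * here it is simulated so the example is self-contained.
 */
public class ReloginCheck {
    // Hadoop's UGI attempts a relogin once ~80% of the lifetime has elapsed
    static final double RENEW_WINDOW = 0.80;

    /** Returns true once 80% of the ticket lifetime has elapsed. */
    static boolean shouldRelogin(Instant issuedAt, Duration lifetime, Instant now) {
        Duration age = Duration.between(issuedAt, now);
        return age.toMillis() >= (long) (lifetime.toMillis() * RENEW_WINDOW);
    }

    public static void main(String[] args) {
        Instant issued = Instant.parse("2017-12-26T00:00:00Z");
        Duration lifetime = Duration.ofHours(24); // ticket_lifetime = 24h
        // The window opens 19h12m after issue; a fresh TGT should be fetched then.
        System.out.println(shouldRelogin(issued, lifetime, issued.plusSeconds(19 * 3600))); // false
        System.out.println(shouldRelogin(issued, lifetime, issued.plusSeconds(20 * 3600))); // true
    }
}
```

Calling the check before each batch is cheap: it is a no-op while the ticket is young, and it refreshes the in-process credentials before they expire, which a cron-driven kinit against the on-disk cache cannot guarantee for an already-running JVM.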
12-26-2017
09:14 AM
We have an application that ingests files from the local file system to HDFS in an AD Kerberos-enabled environment; it basically moves files from a local directory to an HDFS path. Once the ingestion process starts, after about 20 hours we see the error below appear randomly, then continuously, until finally no files are moved at all.

Error:
java.io.IOException: java.io.IOException: java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "example1.com/xxxxx"; destination host is: "example2.com":8020;

We have the application running in two environments, Env-1 and Env-2. The same ingestion process works fine without any error in Env-1, while in Env-2 we see the GSS exception. The load and volume of incoming files differ between the two:

Env-1: 5 files per day are moved to HDFS without any error, and the same process runs every day.
Env-2: 6000 files are moved to HDFS every 5 minutes, and the GSS exception appears after about 20 hours. The 6000 files come from 42 different directories simultaneously, using 150 threads in total. At any time 150 files can be moved in parallel; as threads are released they pick up the next files, so the process runs continuously.

Can anyone comment on the following concerns:
1. Is this related to load on the KDC server?
2. Are there any parameters on the AD server that restrict how many TGTs the KDC can issue at a time?
3. Could this be related to Kerberos clock tolerance? On the AD server, the maximum tolerance is set to 5 minutes.
4. Please suggest any parameters that should be added to krb5.conf to handle a huge number of simultaneous requests to AD.

We have already checked the following on the AD server and Env-2:
1. AD server and Env-2 time are in sync.
2. The Kerberos ticket is not expired; we have a cron job that renews the ticket every 4 hours.
3. Ticket lifetimes are set in krb5.conf accordingly:
   renew_lifetime = 7d
   ticket_lifetime = 24h

Can anyone suggest what the issue might be? Thank you.
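One detail worth noting: the failures start after roughly 20 hours, and ticket_lifetime is 24h. Hadoop's UserGroupInformation attempts a renewal/relogin once about 80% of the ticket lifetime has elapsed, and 0.8 × 24h = 19.2h, which is close to the observed ~20h. This is only circumstantial, but it suggests the in-process ticket (not the cron-renewed cache) is the one expiring. A minimal arithmetic check, where the 0.80 window fraction is the one Hadoop uses and the 24h comes from the krb5.conf above:

```java
public class RenewWindowMath {
    public static void main(String[] args) {
        double ticketLifetimeHours = 24.0; // ticket_lifetime = 24h in krb5.conf
        double renewWindow = 0.80;         // fraction used by Hadoop's UserGroupInformation
        double windowOpensAfter = ticketLifetimeHours * renewWindow;
        System.out.println(windowOpensAfter + "h"); // 19.2h, close to the observed ~20h
    }
}
```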
06-01-2017
11:45 AM
Hi, I have successfully performed data replication of HDFS files using Falcon. Now I would like to perform HDFS file replication with deltas: during replication, existing HDFS files should not be overwritten but updated in place.
Labels:
- Apache Falcon
05-30-2017
03:58 AM
Hi, is there any solution to this error yet? I am facing the same issue in my cluster. I'd appreciate your help. Thanks!