Member since: 10-10-2017
Posts: 13
Kudos Received: 0
Solutions: 0
04-05-2019
08:37 PM
Any insights on this?
03-29-2019
06:26 PM
I am using the S3A staging directory committer with my object store (not AWS), and I am trying to upload a large directory (about 50 TB) from a Hortonworks Hadoop client. The job uses MapReduce to upload the directory to the object store: the S3A staging committer starts multipart upload (MPU) operations to the object store in the tasks, and they are all committed during the job commit phase.

Problem and questions:

1. MapReduce logs as seen on the Hadoop client. In my case all the task commits complete successfully, but the S3A committer job commit phase fails, and the error I see is:

```
INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=FAILED. Redirecting to the job history server
INFO mapreduce.Job: map 0% reduce 100%
INFO mapreduce.Job: Job job_1553199983818_0003 failed with state FAILED due to Job commit from a prior MRAppMaster attempt is potentially in progress. Preventing multiple commit executions
```

Are these warnings only? None of my tasks failed on the ResourceManager.

2. S3A committer error logs during the job commit phase, after which it started deleting the files. I couldn't understand from the logs below why the job commit failed.

Logs:

```
INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter is org.apache.hadoop.fs.s3a.commit.staging.DirectoryStagingCommitter
INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Attempt num: 2 is last retry: true because a commit was started.
INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.JobEventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$NoopEventHandler
INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.jobhistory.EventType for class org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler
INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.rm.ContainerAllocator$EventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter
INFO [main] org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system [hdfs://xxxxxx:8020]
INFO [main] org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system [hdfs://xxxxxx:8020]
INFO [main] org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system [hdfs://xxxxxx:8020]
INFO [main] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Emitting job history data to the timeline server is not enabled
INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Not attempting to recover. Recovery is not supported by class org.apache.hadoop.fs.s3a.commit.staging.DirectoryStagingCommitter. Use an OutputCommitter that supports recovery.
INFO [main] org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system [hdfs://xxxxxx:8020]
INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Previous history file is at hdfs://xxxxxx:8020/user/xxxxxx/.staging/job_1553199983818_0003/job_1553199983818_0003_1.jhist
INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Starting to clean up previous job's temporary files
INFO [main] org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter: Task committer attempt_1553199983818_0003_m_000000_0: aborting job job_1553199983818_0003 in state FAILED
INFO [main] org.apache.hadoop.fs.s3a.commit.staging.StagingCommitter: Starting: Task committer attempt_1553199983818_0003_m_000000_0: aborting job in state job_1553199983818_0003
INFO [main] org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter: Task committer attempt_1553199983818_0003_m_000000_0: no pending commits to abort
INFO [main] org.apache.hadoop.fs.s3a.commit.staging.StagingCommitter: Task committer attempt_1553199983818_0003_m_000000_0: aborting job in state job_1553199983818_0003 : duration 0:00.007s
INFO [main] org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter: Starting: Cleanup job job_1553199983818_0003
INFO [main] org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter: Starting: Aborting all pending commits under s3a://xxxxxx/user/xxxxxx/30000
```

The YARN logs don't say much about why the S3A job commit failed. Any insights into what I can look at to figure this out?

The MRAppMaster code the log messages come from:

```
try {
  String user = UserGroupInformation.getCurrentUser().getShortUserName();
  Path stagingDir = MRApps.getStagingAreaDir(conf, user);
  FileSystem fs = getFileSystem(conf);

  boolean stagingExists = fs.exists(stagingDir);
  Path startCommitFile = MRApps.getStartJobCommitFile(conf, user, jobId);
  boolean commitStarted = fs.exists(startCommitFile);
  Path endCommitSuccessFile = MRApps.getEndJobCommitSuccessFile(conf, user, jobId);
  boolean commitSuccess = fs.exists(endCommitSuccessFile);
  Path endCommitFailureFile = MRApps.getEndJobCommitFailureFile(conf, user, jobId);
  boolean commitFailure = fs.exists(endCommitFailureFile);
  if (!stagingExists) {
    isLastAMRetry = true;
    LOG.info("Attempt num: " + appAttemptID.getAttemptId() +
        " is last retry: " + isLastAMRetry +
        " because the staging dir doesn't exist.");
    errorHappenedShutDown = true;
    forcedState = JobStateInternal.ERROR;
    shutDownMessage = "Staging dir does not exist " + stagingDir;
    LOG.fatal(shutDownMessage);
  } else if (commitStarted) {
    // A commit was started so this is the last time, we just need to know
    // what result we will use to notify, and how we will unregister
    errorHappenedShutDown = true;
    isLastAMRetry = true;
    LOG.info("Attempt num: " + appAttemptID.getAttemptId() +
        " is last retry: " + isLastAMRetry +
        " because a commit was started.");
    copyHistory = true;
    if (commitSuccess) {
      shutDownMessage =
          "Job commit succeeded in a prior MRAppMaster attempt " +
          "before it crashed. Recovering.";
      forcedState = JobStateInternal.SUCCEEDED;
    } else if (commitFailure) {
      shutDownMessage =
          "Job commit failed in a prior MRAppMaster attempt " +
          "before it crashed. Not retrying.";
      forcedState = JobStateInternal.FAILED;
    } else {
      if (isCommitJobRepeatable()) {
        // cleanup previous half done commits if committer supports
        // repeatable job commit.
        errorHappenedShutDown = false;
        cleanupInterruptedCommit(conf, fs, startCommitFile);
      } else {
        // The commit is still pending, commit error
        shutDownMessage =
            "Job commit from a prior MRAppMaster attempt is " +
            "potentially in progress. Preventing multiple commit executions";
        forcedState = JobStateInternal.ERROR;
      }
    }
  }
} catch (IOException e) {
  throw new YarnRuntimeException("Error while initializing", e);
}
```
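For anyone looking at this, here is roughly how I plan to dig further. Per the MRAppMaster code above, this branch means attempt 1 wrote the "commit started" marker but neither the success nor the failure marker before it went away, and the DirectoryStagingCommitter's job commit is not repeatable, so attempt 2 refuses to commit again; the real error should therefore be in attempt 1's log. A minimal sketch (application ID and staging path copied from the logs above; the marker file names are my assumption from the MRApps helpers, so verify them against your Hadoop version):

```
# Pull the aggregated logs for the whole application and look at the FIRST
# AM attempt -- that is where commitJob actually ran and died.
yarn logs -applicationId application_1553199983818_0003 > app_logs.txt
grep -n -A20 "commitJob\|Abort\|Exception" app_logs.txt | less

# See which commit marker files attempt 1 left in the MR staging directory
# (names assumed to be COMMIT_STARTED / COMMIT_SUCCESS / COMMIT_FAIL).
hdfs dfs -ls hdfs://xxxxxx:8020/user/xxxxxx/.staging/job_1553199983818_0003/

# If the huge multipart-upload commit simply outlives the AM/RM liveness
# limits, these settings may be worth reviewing (names only, no values implied):
#   yarn.am.liveness-monitor.expiry-interval-ms
#   mapreduce.task.timeout
```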
Labels:
- Apache Hadoop
10-23-2018
07:50 PM
@Soumitra Sulav Today I redeployed my HDP cluster and it seems to be working with both of the methods you shared above. I am not sure why it wasn't working with the previous setup; it looks like an intermittent issue. I will keep you posted in case I hit it again. Thanks for all your help with this.
10-22-2018
08:57 PM
@Soumitra Sulav While getting logs from the YARN ResourceManager web UI on port 8088 in a kerberized cluster, it fails with an authentication error (HTTP 401, unauthorized access). I am using Chrome and am not sure how to make the web UI validate my Kerberos ticket. Any suggestions?
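For reference, this is roughly what I am checking on my side (the hostname and realm below are placeholders; the Chrome flag name differs between Chrome versions, and on Windows/macOS it is usually set via policy rather than on the command line):

```
# Confirm SPNEGO itself works against the RM REST API with a valid ticket.
kinit myuser@EXAMPLE.COM
curl --negotiate -u : "http://rm-host.example.com:8088/ws/v1/cluster/info"

# Chrome only attempts Kerberos/SPNEGO for hosts it has been told to trust;
# older releases use --auth-server-whitelist, newer ones --auth-server-allowlist.
google-chrome --auth-server-whitelist="*.example.com"
```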
10-09-2018
11:38 PM
@Soumitra Sulav I tried both Method #1 and Method #2 today and have attached the logs below; the logs are the same for both methods. It is no longer complaining about AWS, but it still fails. I verified the JCEKS setup by simply running a -ls command as that user, and also what you suggested in the comment above, and both worked. Just to add: the cluster is kerberized.

Logs:

0: jdbc:hive2://nile3-vm7.centera.lab.test.com> CREATE DATABASE IF NOT EXISTS datab LOCATION 's3a://s3aTestBucket/db1';
INFO : Compiling command(queryId=hive_20181009185046_b19ccbf4-1cfd-4148-96cd-e20a6fe45b1f): CREATE DATABASE IF NOT EXISTS datab LOCATION 's3a://s3aTestBucket/db1'
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=hive_20181009185046_b19ccbf4-1cfd-4148-96cd-e20a6fe45b1f); Time taken: 0.055 seconds
INFO : Executing command(queryId=hive_20181009185046_b19ccbf4-1cfd-4148-96cd-e20a6fe45b1f): CREATE DATABASE IF NOT EXISTS datab LOCATION 's3a://s3aTestBucket/db1'
INFO : Starting task [Stage-0:DDL] in serial mode
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.lang.reflect.UndeclaredThrowableException)
INFO : Completed executing command(queryId=hive_20181009185046_b19ccbf4-1cfd-4148-96cd-e20a6fe45b1f); Time taken: 0.318 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.lang.reflect.UndeclaredThrowableException) (state=08S01,code=1)
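For completeness, this is what I plan to check on the HiveServer2 side, since the MetaException above only wraps the real cause (the provider path is copied from my earlier comment below; the log path and service user are assumptions about a default HDP layout):

```
# HiveServer2/Metastore run the DDL, so test the credential provider and the
# bucket as the hive service user, not only as my own user.
sudo -u hive hadoop credential list \
  -provider jceks://hdfs@nile3-vm6.centra.lab.test.com:8020/user/test/s3a.jceks
sudo -u hive hadoop fs -ls s3a://s3aTestBucket/

# The UndeclaredThrowableException is a wrapper; the underlying stack trace
# should be in the HiveServer2 / Metastore logs (path depends on the install).
grep -iA20 "s3a\|GSSException\|AmazonClientException" /var/log/hive/hiveserver2.log
```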
10-08-2018
03:38 PM
@Soumitra Sulav I tried Method 1, i.e. added fs.s3a.bucket.s3aTestBucket.security.credential.provider.path=jceks://hdfs@nile3-vm6.centra.lab.test.com:8020/user/test/s3a.jceks and restarted HDFS from Ambari, but it didn't work. Any suggestions? Please find the logs below. I didn't try Method 2, as it would expose my credentials in the Ambari UI.

Logs:

0: jdbc:hive2://nile3-vm7.centra.lab.test.com> CREATE DATABASE IF NOT EXISTS table3 LOCATION 's3a://s3aTestBucket/user/table3';
INFO : Compiling command(queryId=hive_20181008105923_0324b26a-64b7-4c8f-91e3-635c62442173): CREATE DATABASE IF NOT EXISTS table3 LOCATION 's3a://s3aTestBucket/user/table3'
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=hive_20181008105923_0324b26a-64b7-4c8f-91e3-635c62442173); Time taken: 230.585 seconds
INFO : Executing command(queryId=hive_20181008105923_0324b26a-64b7-4c8f-91e3-635c62442173): CREATE DATABASE IF NOT EXISTS table3 LOCATION 's3a://s3aTestBucket/user/table3'
INFO : Starting task [Stage-0:DDL] in serial mode
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.net.SocketTimeoutException: doesBucketExist on s3aTestBucket: com.amazonaws.AmazonClientException: No AWS Credentials provided by BasicAWSCredentialsProvider EnvironmentVariableCredentialsProvider InstanceProfileCredentialsProvider : com.amazonaws.SdkClientException: Unable to load credentials from service endpoint
INFO : Completed executing command(queryId=hive_20181008105923_0324b26a-64b7-4c8f-91e3-635c62442173); Time taken: 115.487 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.net.SocketTimeoutException: doesBucketExist on s3aTestBucket: com.amazonaws.AmazonClientException: No AWS Credentials provided by BasicAWSCredentialsProvider EnvironmentVariableCredentialsProvider InstanceProfileCredentialsProvider : com.amazonaws.SdkClientException: Unable to load credentials from service endpoint (state=08S01,code=1)
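A quick way to separate a Hive-side problem from a JCEKS/bucket problem might be to exercise exactly the same per-bucket override with plain Hadoop commands (a sketch only; the provider path is copied from above):

```
# List the aliases stored in the JCEKS file; fs.s3a.access.key and
# fs.s3a.secret.key should both show up.
hadoop credential list \
  -provider jceks://hdfs@nile3-vm6.centra.lab.test.com:8020/user/test/s3a.jceks

# Confirm the per-bucket provider override resolves the keys and the bucket
# is reachable from plain Hadoop, bypassing Hive entirely.
hadoop fs \
  -Dfs.s3a.bucket.s3aTestBucket.security.credential.provider.path=jceks://hdfs@nile3-vm6.centra.lab.test.com:8020/user/test/s3a.jceks \
  -ls s3a://s3aTestBucket/
```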
10-08-2018
08:14 AM
I am using a non-AWS endpoint for S3A and have run into a basic issue: Hive does not honor the s3a endpoint when it is not AWS. distcp, hadoop fs, Spark, and MapReduce jobs all find my s3a endpoint and complete successfully, but Hive ignores it and expects AWS S3 credentials, as shown in the example below.
I tried three options, and the error was the same for all three, as shown below:
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.net.SocketTimeoutException: doesBucketExist on s3aTestBucket: com.amazonaws.AmazonClientException: No AWS Credentials provided by BasicAWSCredentialsProvider EnvironmentVariableCredentialsProvider InstanceProfileCredentialsProvider : com.amazonaws.SdkClientException: Unable to load credentials from service endpoint
INFO : Completed executing command(queryId=hive_20181007232623_f38e7fac-5aed-4d4a-b08a-9cbfc950d7a6); Time taken: 116.608 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.net.SocketTimeoutException: doesBucketExist on s3aTestBucket: com.amazonaws.AmazonClientException: No AWS Credentials provided by BasicAWSCredentialsProvider EnvironmentVariableCredentialsProvider InstanceProfileCredentialsProvider : com.amazonaws.SdkClientException: Unable to load credentials from service endpoint (state=08S01,code=1)
Option 1: Ran the CREATE DATABASE command (shown below), passing my S3 credentials using JCEKS in the HDFS core-site.xml as
hadoop.security.credential.provider.path=jceks://hdfs@nile3-vm6.centera.lab.emc.com:8020/user/test/s3a.jceks
Running the Hive query:
0: jdbc:hive2://nile3-vm7.centera.lab.emc.com> CREATE DATABASE IF NOT EXISTS table1 LOCATION 's3a://s3aTestBucket/user/table1';
INFO : Compiling command(queryId=hive_20181007232623_f38e7fac-5aed-4d4a-b08a-9cbfc950d7a6): CREATE DATABASE IF NOT EXISTS table1 LOCATION 's3a://s3aTestBucket/user/table1'
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=hive_20181007232623_f38e7fac-5aed-4d4a-b08a-9cbfc950d7a6); Time taken: 230.907 seconds
INFO : Executing command(queryId=hive_20181007232623_f38e7fac-5aed-4d4a-b08a-9cbfc950d7a6): CREATE DATABASE IF NOT EXISTS table1 LOCATION 's3a://s3aTestBucket/user/table1'
INFO : Starting task [Stage-0:DDL] in serial mode
Option 2: Passing user:secret-key in the URL while creating the database. I also tried CREATE DATABASE IF NOT EXISTS table1 LOCATION 's3a://s3-user:s3-secret-key@s3aTestBucket/user/table1'; but it didn't work.

Option 3: Added the below property to hive-site:
hive.security.authorization.sqlstd.confwhitelist.append=hive\.mapred\.supports\.subdirectories|fs\.s3a\.access\.key|fs\.s3a\.secret\.key
Then, in the Hive shell from Ambari, ran:
set fs.s3a.access.key=s3-access-key;
set fs.s3a.secret.key=s3-secret-key;
CREATE DATABASE IF NOT EXISTS table1 LOCATION 's3a://s3aTestBucket/user/table1';
I saw a similar post from a while back but am not sure whether the issue was ever solved: https://community.hortonworks.com/questions/71891/hdp-250-hive-doesnt-seem-to-honor-an-s3a-endpoint.html
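For context, a sketch of how the credentials and a non-AWS endpoint are typically wired up for S3A (the endpoint URL below is a placeholder and the property names are the standard fs.s3a.* keys; nothing here is specific to Hive, which is exactly the part that is not working for me):

```
# Store the access/secret keys as aliases in a JCEKS file on HDFS (prompts for values).
hadoop credential create fs.s3a.access.key \
  -provider jceks://hdfs@nile3-vm6.centera.lab.emc.com:8020/user/test/s3a.jceks
hadoop credential create fs.s3a.secret.key \
  -provider jceks://hdfs@nile3-vm6.centera.lab.emc.com:8020/user/test/s3a.jceks

# For a non-AWS store, the endpoint (and usually path-style access) must be
# visible to every S3A client, including HiveServer2/Metastore, e.g. in core-site.xml:
#   fs.s3a.endpoint          = http://my-object-store.example.com:9020   (placeholder)
#   fs.s3a.path.style.access = true
#   hadoop.security.credential.provider.path = jceks://hdfs@nile3-vm6.centera.lab.emc.com:8020/user/test/s3a.jceks

# Sanity check outside Hive with the same configuration files.
hadoop fs -ls s3a://s3aTestBucket/
```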
Labels:
- Apache Hadoop
- Apache Hive
10-06-2018
01:32 AM
[spark@vm1 spark2-client]$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --num-executors 1 --driver-memory 512m --executor-memory 512m --executor-cores 1 examples/jars/spark-examples_2.11-2.3.1.3.0.1.0-187.jar 10

Logs:
Traceback (most recent call last):
File "/usr/bin/hdp-select", line 448, in <module>
listPackages(getPackages("all"))
File "/usr/bin/hdp-select", line 266, in listPackages
os.path.basename(os.path.dirname(os.readlink(linkname))))
OSError: [Errno 22] Invalid argument: '/usr/hdp/current/oozie-client'
ls: cannot access /usr/hdp//hadoop/lib: No such file or directory
Exception in thread "main" java.lang.IllegalStateException: hdp.version is not set while running Spark under HDP, please set through HDP_VERSION in spark-env.sh or add a java-opts file in conf with -Dhdp.version=xxx
at org.apache.spark.launcher.Main.main(Main.java:118)
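In case it is relevant, the hdp-select traceback above points at a broken /usr/hdp/current/oozie-client symlink, which looks like the reason hdp.version is never resolved (hence the empty /usr/hdp//hadoop/lib path). A sketch of the checks plus the workaround the error message itself suggests (the version string is inferred from the spark-examples jar name, so adjust it to the installed stack):

```
# Inspect the symlink hdp-select chokes on and the versions it knows about.
ls -l /usr/hdp/current/oozie-client
hdp-select versions

# Workaround from the error message: pin hdp.version explicitly for Spark.
export HDP_VERSION=3.0.1.0-187
echo "-Dhdp.version=3.0.1.0-187" > conf/java-opts   # run from the spark2-client dir

./bin/spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn --deploy-mode client \
  --num-executors 1 --driver-memory 512m --executor-memory 512m --executor-cores 1 \
  examples/jars/spark-examples_2.11-2.3.1.3.0.1.0-187.jar 10
```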
Labels:
- Apache Hadoop
- Apache Spark