Member since: 09-09-2014
Posts: 17
Kudos Received: 1
Solutions: 1

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 41643 | 01-05-2015 01:24 PM
10-15-2015
06:14 PM
Dear Friends,
We need your help. We recently updated the domain name/IP in our Kerberos Active Directory authentication settings in Cloudera Manager, and now we are seeing the health issues below and cannot start our cluster or the CM service. Any help is much appreciated!
We have two NameNodes; the second was added when we enabled High Availability.
Thanks much in advance, and please let me know if you have any questions.
Kind regards
Andy
For YARN's JobHistory Server, here is the error:
This role's process exited. This role is supposed to be started.
Failed to start namenode.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /data/1/dfs/nn is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:313)
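When the NameNode reports an inconsistent storage directory like this, the first thing to verify is that the directory configured in dfs.namenode.name.dir still exists, is mounted, and is readable. A minimal check, reusing the path from the error above (the helper function name is made up for illustration):

```shell
# Report whether a NameNode storage directory exists and show its
# ownership; /data/1/dfs/nn is the path from the stack trace above.
check_nn_dir() {
  d="$1"
  if [ -d "$d" ]; then
    echo "exists: $d"
    ls -ld "$d"
  else
    echo "MISSING: $d (check mounts and dfs.namenode.name.dir)"
  fi
}

check_nn_dir /tmp            # sanity check against a directory that exists
check_nn_dir /data/1/dfs/nn  # the directory from the stack trace
```

If the directory is missing or unreadable, the fix is usually on the OS side (a mount that did not come back, or permissions changed by the security work), not in HDFS itself.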
Here is the list of health issues:
cluster
HBase
HBase Master Health
catalogserver (name_node)
StateStore Connectivity
impalad (first_data_node)
StateStore Connectivity, Impala Daemon Ready Check, Web Server Status
impalad (2nd_data_node)
StateStore Connectivity, Impala Daemon Ready Check, Unexpected Exits, Web Server Status
impalad (3rd_data_node)
StateStore Connectivity, Impala Daemon Ready Check, Unexpected Exits, Web Server Status
jobhistory (name_node)
Process Status
master (name_node)
Process Status
oozie_server (name_node)
Web Server Status
----
So, here are the errors for the other services in detail:
A) HDFS also has these two health issues:
1) NameNode summary: <name_node_name> (Availability: Standby, Health: Good), <2nd name node> (Availability: Stopped, Health: Bad).
This health test is bad because the Service Monitor did not find an active NameNode.
2) Canary test failed to create the parent directory for /tmp/.cloudera_health_monitoring_canary_files.
B) Oozie error:
The Cloudera Manager Agent is not able to communicate with this role's web server.
log entry:
ERROR org.apache.oozie.servlet.V0AdminServlet
SERVER[<name_node>] USER[hue] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] URL[GET http://<name_node>:11000/oozie/v0/admin/instrumentation] error, null java.lang.UnsupportedOperationException
C) HBase has two errors:
1) HBase Master Health
Master summary: <name_node> (Availability: Unknown, Health: Bad). This health test is bad because the Service Monitor did not find an active Master.
2) master (<short_name_node>)
Process Status
This role's process exited. This role is supposed to be started.
ERROR org.apache.hadoop.hbase.master.HMasterCommandLine
Master exiting
java.lang.RuntimeException: HMaster Aborted
------------
And here is the CM health issue with the error details. Thank you.
The Reports Manager is not running.
This role's status is as expected. The role is stopped.
WARN org.hibernate.engine.jdbc.spi.SqlExceptionHelper
SQL Error: 0, SQLState: null
3:56:25.884 PM ERROR org.hibernate.engine.jdbc.spi.SqlExceptionHelper
Connections could not be acquired from the underlying database!
3:56:25.884 PM WARN com.mchange.v2.resourcepool.BasicResourcePool
com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask@10ABCD -- Acquisition Attempt Failed!!! Clearing pending acquires.
While trying to acquire a needed new resource, we failed to succeed more than the maximum number of allowed acquisition attempts (5). Last acquisition attempt exception:
ERROR com.cloudera.headlamp.HeadlampServer
Unable to upgrade schema to latest version.
org.hibernate.exception.GenericJDBCException: Could not open connection
01-23-2015
05:37 PM
Dear Friends,

We are using CM/CDH 5.3 and planning to enable High Availability (HA), as an alternative to running a Secondary NameNode, to protect ourselves from any possible NameNode failure. After we buy the extra machine (identical to our NameNode), what steps do we need to follow, either by running commands on our CentOS 6.6 NameNode machine or in Cloudera Manager, before we start the steps described in CM at http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_hag_hdfs_ha_enabling.html#cmug_topic_5_12_unique_1 (i.e., adding the new node (machine) to the cluster, etc.)?

Thanks in advance, and please let me know if you have any questions. Any help is much appreciated.

Kind regards,
Andy
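For reference, the host-side preparation before running CM's Enable High Availability wizard is usually: install the CM agent packages on the new machine, point the agent at the CM server, start it, and then add the host from CM's Hosts page. A rough sketch, assuming standard CM 5 package names; the CM server hostname is a placeholder, and the root-only commands are shown as comments with the config edit demonstrated against an inline copy of the stock file:

```shell
# Placeholder CM server hostname -- replace with your own.
CM_SERVER=cm-server.example.com

# 1. On the new CentOS host (as root):
#      yum install -y cloudera-manager-agent cloudera-manager-daemons
# 2. Point the agent at the CM server in /etc/cloudera-scm-agent/config.ini.
#    Demonstrated here against an inline copy of the default stanza:
NEW_INI=$(sed "s/^server_host=.*/server_host=${CM_SERVER}/" <<'EOF'
[General]
server_host=localhost
EOF
)
echo "$NEW_INI"
# 3. Start the agent (service cloudera-scm-agent start); the host then
#    shows up in CM under Hosts > Add New Hosts and can join the cluster
#    before you launch the HA wizard.
```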
01-21-2015
03:19 PM
Dear Friends,

I need to transfer files of 2-5 GB (in different directories) into Hadoop. Can I use Flume? If not, is another Hadoop tool available? Is there a recommended tool or best practice? Is it possible to automate each file-transfer Flume job (schedule it weekly directly, or with another tool)? Can I set up Flume (or any job-scheduling tool that runs Flume jobs) so that I get notified (preferably by email) if a specific file-transfer job fails?

Any help/links much appreciated. Thanks much in advance, and please let me know if you need more info.

Kind regards,
Andy
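For reference, Flume's spooling-directory source is the usual way to watch a local directory and ship completed files into HDFS, though for a handful of large 2-5 GB files a scheduled `hdfs dfs -put` can be simpler. A sketch of a flume.conf under those assumptions; the agent name, paths, and NameNode host below are all placeholders:

```properties
# Hypothetical single-agent pipeline: local spool directory -> memory
# channel -> HDFS sink. All names and paths are placeholders.
agent.sources  = spool
agent.channels = mem
agent.sinks    = tohdfs

# Watch a local directory; files dropped here are ingested once complete.
agent.sources.spool.type     = spooldir
agent.sources.spool.spoolDir = /data/incoming
agent.sources.spool.channels = mem

agent.channels.mem.type     = memory
agent.channels.mem.capacity = 10000

# Write into date-partitioned HDFS paths, keeping the raw bytes.
agent.sinks.tohdfs.type                  = hdfs
agent.sinks.tohdfs.channel               = mem
agent.sinks.tohdfs.hdfs.path             = hdfs://namenode.example.com/ingest/%Y-%m-%d
agent.sinks.tohdfs.hdfs.fileType         = DataStream
agent.sinks.tohdfs.hdfs.useLocalTimeStamp = true
```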
01-21-2015
02:53 PM
Dear Friends,

We are planning to use Flume to transfer files of 2-5 GB (in different directories) on a weekly basis, and we want to make sure we will be notified (preferably by email) if any of our Flume jobs fail. Can we use Oozie (a workflow or coordinator in Hue)? If not, is another Hadoop tool available that provides the above functionality (job scheduling and error notification)?

Any help/links much appreciated. Thanks much in advance, and please let me know if you need more info.

Kind regards,
Andy
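For reference, an Oozie workflow (driven by a coordinator for the weekly schedule) can route a failed action to Oozie's built-in email action. A sketch of the workflow side; the app name, recipient, and ingest script are placeholders, and the email action also requires SMTP settings in oozie-site.xml:

```xml
<!-- Hypothetical workflow: run the ingest step, email on failure. -->
<workflow-app name="weekly-ingest" xmlns="uri:oozie:workflow:0.4">
    <start to="ingest"/>

    <action name="ingest">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>ingest.sh</exec>
            <file>ingest.sh</file>
        </shell>
        <ok to="end"/>
        <error to="notify-failure"/>
    </action>

    <action name="notify-failure">
        <email xmlns="uri:oozie:email-action:0.1">
            <to>ops@example.com</to>
            <subject>Workflow ${wf:id()} failed</subject>
            <body>Node ${wf:lastErrorNode()}: ${wf:errorMessage(wf:lastErrorNode())}</body>
        </email>
        <ok to="fail"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Ingest failed; notification sent.</message>
    </kill>
    <end name="end"/>
</workflow-app>
```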
01-05-2015
01:24 PM
1 Kudo
Thanks, Romain, for your time and your answer. I restarted CDH and the issue has been fixed! This is strange, but still very good news 🙂 I have not changed any configuration and had restarted the service many times before. Anyway, I will wait a couple more days and then update the case. Thanks again. Kind regards, Andy
01-05-2015
12:54 PM
Thanks much, Romain, for your time and attention. A couple of quick questions: can you help me with the steps I need to take from here? Last week, I created a new post at http://community.cloudera.com/t5/Data-Ingestion-Integration/KMS-AuthenticationToken-ignored-Invalid-signature-A009-HTTP/m-p/23238 I guess our issue is related to the bug (https://issues.apache.org/jira/browse/HADOOP-11151). Is there a way you can help me find out whether this issue is going to be fixed in 5.3.1, as it seems it is not fixed in 5.3? If my assumption about the existing bug is correct, I hope I can find a safe workaround for it. P.S. I can run the Pig job using grunt with no problem, so maybe it is something related to the way Hue deals with the delegation token (owner=my_active_dir_user, realuser=oozie/ourserver@our_realm), as you can see from the error message in the new post above. Appreciate your professional support. Kind regards, Andy
01-01-2015
05:38 PM
Dear Friends,

Based on my original post at http://community.cloudera.com/t5/Web-UI-Hue-Beeswax/Pig-script-in-Hue-START-RETRY-status/m-p/23161 and the last response ("This is probably related to KMS, as stated somewhere else, I hope they will be able to help about that!"), I think the issue (Start_Retry halt status in the Pig Editor in Hue) is due to a recent bug described at https://issues.apache.org/jira/browse/HADOOP-11151

We recently upgraded both CM and CDH from 5.2 to 5.3, but the bug still exists, so it seems it has not been completely fixed: the error messages mentioned in the URL above are exactly the same as the error messages we got, shown below (highlighted in bold). Would you please let us know if my understanding is correct and, if not, how you suggest we fix the issue? If yes, does anybody know whether it is going to be fixed in a future release, i.e. 5.3.1? Thanks much for your attention.

P.S. More on KMS: http://hadoop.apache.org/docs/current/hadoop-kms/index.html

Kind regards,
Andy

----------
from /var/log/hadoop-kms/kms.log

2015-01-01 16:25:46,866 WARN org.apache.hadoop.security.authentication.server.AuthenticationFilter: AuthenticationToken ignored: org.apache.hadoop.security.authentication.util.SignerException: Invalid signature

---------
from the Pig job's log in Hue (Oozie dashboard/Workflows --> Log tab)

2015-01-01 16:25:46,875 WARN org.apache.oozie.command.wf.ActionStartXCommand: SERVER[our_name_node.com] USER[my_active_dir_user_name] GROUP[-] TOKEN[] APP[pig-app-hue-script] JOB[0000012-141228125521164-oozie-oozi-W] ACTION[0000012-141228125521164-oozie-oozi-W@pig] Error starting action [pig]. ErrorType [TRANSIENT], ErrorCode [JA009], Message [JA009: HTTP status [403], message [Forbidden]] org.apache.oozie.action.ActionExecutorException: JA009: HTTP status [403], message [Forbidden] ...
Caused by: java.io.IOException: HTTP status [403], message [Forbidden] at org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:169) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:223) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.getDelegationToken(DelegationTokenAuthenticator.java:145) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.getDelegationToken(DelegationTokenAuthenticatedURL.java:346) at org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:799) at org.apache.hadoop.crypto.key.KeyProviderDelegationTokenExtension.addDelegationTokens(KeyProviderDelegationTokenExtension.java:86)
01-01-2015
03:17 PM
This is exactly what we did, and it fixed the problem. Thanks much, Darren. FYI: before that step, we had to temporarily re-enable the user that is able to create the credentials. We had decided earlier to disable that user to improve security, and that was the main source of the issue. Appreciate your support. Kind regards, Andy
12-30-2014
05:04 PM
Thanks much, Romain.

From the URL you kindly shared with me, I just ran beeline and was able to get into its command line. Then I ran the following command, all on one line (not sure I need to do that), replacing my AD username and password:

!connect jdbc:hive2://localhost:10000 <ADUserName> <Password> org.apache.hive.jdbc.HiveDriver
0: jdbc:hive2://localhost:10000> SHOW TABLES;

but I got this error message:

Connecting to jdbc:hive2://localhost:10000
Error: Could not open connection to jdbc:hive2://localhost:10000: Peer indicated failure: Unsupported mechanism type PLAIN (state=08S01,code=0)
0: jdbc:hive2://localhost:10000 (closed)>

I also tried the following:

sudo service hive-server2 start

but got the error: hive-server2: unrecognized service

Can you tell me if I missed something and how to run the above commands correctly, or is there any way I can test whether HiveServer2 is running correctly? Thanks much, Romain.

P.S. Originally, running beeline from the default location (/usr/lib/hive/bin/beeline) did not work, so I just ran it by typing beeline and it worked 🙂

Cheers,
Andy
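For reference, "Unsupported mechanism type PLAIN" usually means the client attempted plain SASL against a Kerberized HiveServer2; with Kerberos, the JDBC URL must carry HiveServer2's service principal, and the username/password arguments are not used. A sketch of building the URL, where the hostname and realm are placeholders:

```shell
# Placeholders -- substitute your HiveServer2 host and Kerberos realm.
HS2_HOST=hs2-host.example.com
REALM=EXAMPLE.COM

# With Kerberos, authentication comes from your ticket cache, so the
# URL carries the hive service principal instead of a user/password:
URL="jdbc:hive2://${HS2_HOST}:10000/default;principal=hive/${HS2_HOST}@${REALM}"
echo "$URL"

# Usage (commented out -- needs a live cluster):
#   kinit your_ad_user@${REALM}
#   beeline -u "$URL" -e 'SHOW TABLES;'
```

On a CM-managed cluster there is also no hive-server2 init script; HiveServer2 is started and stopped from Cloudera Manager, which is why `service hive-server2 start` reported an unrecognized service.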
12-30-2014
11:45 AM
Thanks Romain Here is the log info obtained by following your procedure. I am also reviewing it now. much appreciate your support and have a great day. 2014-12-30 11:35:54,537 INFO org.apache.oozie.command.wf.ActionStartXCommand: SERVER[<our_server_name.com>] USER[<my_active_dir_user_name>] GROUP[-] TOKEN[] APP[pig-app-hue-script] JOB[0000004-141228125521164-oozie-oozi-W] ACTION[0000004-141228125521164-oozie-oozi-W@:start:] Start action [0000004-141228125521164-oozie-oozi-W@:start:] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10] 2014-12-30 11:35:54,537 INFO org.apache.oozie.command.wf.ActionStartXCommand: SERVER[<our_server_name.com>] USER[<my_active_dir_user_name>] GROUP[-] TOKEN[] APP[pig-app-hue-script] JOB[0000004-141228125521164-oozie-oozi-W] ACTION[0000004-141228125521164-oozie-oozi-W@:start:] [***0000004-141228125521164-oozie-oozi-W@:start:***]Action status=DONE 2014-12-30 11:35:54,537 INFO org.apache.oozie.command.wf.ActionStartXCommand: SERVER[<our_server_name.com>] USER[<my_active_dir_user_name>] GROUP[-] TOKEN[] APP[pig-app-hue-script] JOB[0000004-141228125521164-oozie-oozi-W] ACTION[0000004-141228125521164-oozie-oozi-W@:start:] [***0000004-141228125521164-oozie-oozi-W@:start:***]Action updated in DB! 2014-12-30 11:35:54,630 INFO org.apache.oozie.command.wf.ActionStartXCommand: SERVER[<our_server_name.com>] USER[<my_active_dir_user_name>] GROUP[-] TOKEN[] APP[pig-app-hue-script] JOB[0000004-141228125521164-oozie-oozi-W] ACTION[0000004-141228125521164-oozie-oozi-W@pig] Start action [0000004-141228125521164-oozie-oozi-W@pig] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10] 2014-12-30 11:35:54,912 WARN org.apache.oozie.command.wf.ActionStartXCommand: SERVER[<our_server_name.com>] USER[<my_active_dir_user_name>] GROUP[-] TOKEN[] APP[pig-app-hue-script] JOB[0000004-141228125521164-oozie-oozi-W] ACTION[0000004-141228125521164-oozie-oozi-W@pig] Error starting action [pig]. 
ErrorType [TRANSIENT], ErrorCode [JA009], Message [JA009: HTTP status [403], message [Forbidden]] org.apache.oozie.action.ActionExecutorException: JA009: HTTP status [403], message [Forbidden] at org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:412) at org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:396) at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:990) at org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1145) at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:228) at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63) at org.apache.oozie.command.XCommand.call(XCommand.java:281) at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:323) at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:252) at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:174) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: HTTP status [403], message [Forbidden] at org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:169) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:223) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.getDelegationToken(DelegationTokenAuthenticator.java:145) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.getDelegationToken(DelegationTokenAuthenticatedURL.java:346) at org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:799) at 
org.apache.hadoop.crypto.key.KeyProviderDelegationTokenExtension.addDelegationTokens(KeyProviderDelegationTokenExtension.java:86) at org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2017) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:121) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80) at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:127) at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:556) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:430) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1295) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1292) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1292) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:564) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:559) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:559) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:550) at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:975) ... 
10 more 2014-12-30 11:35:54,913 INFO org.apache.oozie.command.wf.ActionStartXCommand: SERVER[<our_server_name.com>] USER[<my_active_dir_user_name>] GROUP[-] TOKEN[] APP[pig-app-hue-script] JOB[0000004-141228125521164-oozie-oozi-W] ACTION[0000004-141228125521164-oozie-oozi-W@pig] Next Retry, Attempt Number [1] in [60,000] milliseconds 2014-12-30 11:36:54,952 INFO org.apache.oozie.command.wf.ActionStartXCommand: SERVER[<our_server_name.com>] USER[<my_active_dir_user_name>] GROUP[-] TOKEN[] APP[pig-app-hue-script] JOB[0000004-141228125521164-oozie-oozi-W] ACTION[0000004-141228125521164-oozie-oozi-W@pig] Start action [0000004-141228125521164-oozie-oozi-W@pig] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10] 2014-12-30 11:36:55,256 WARN org.apache.oozie.command.wf.ActionStartXCommand: SERVER[<our_server_name.com>] USER[<my_active_dir_user_name>] GROUP[-] TOKEN[] APP[pig-app-hue-script] JOB[0000004-141228125521164-oozie-oozi-W] ACTION[0000004-141228125521164-oozie-oozi-W@pig] Error starting action [pig]. 
ErrorType [TRANSIENT], ErrorCode [JA009], Message [JA009: HTTP status [403], message [Forbidden]] org.apache.oozie.action.ActionExecutorException: JA009: HTTP status [403], message [Forbidden] at org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:412) at org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:396) at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:990) at org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1145) at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:228) at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63) at org.apache.oozie.command.XCommand.call(XCommand.java:281) at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:174) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: HTTP status [403], message [Forbidden] at org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:169) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:223) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.getDelegationToken(DelegationTokenAuthenticator.java:145) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.getDelegationToken(DelegationTokenAuthenticatedURL.java:346) at org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:799) at org.apache.hadoop.crypto.key.KeyProviderDelegationTokenExtension.addDelegationTokens(KeyProviderDelegationTokenExtension.java:86) at 
org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2017) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:121) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80) at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:127) at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:556) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:430) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1295) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1292) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1292) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:564) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:559) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:559) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:550) at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:975) ... 
8 more 2014-12-30 11:36:55,256 INFO org.apache.oozie.command.wf.ActionStartXCommand: SERVER[<our_server_name.com>] USER[<my_active_dir_user_name>] GROUP[-] TOKEN[] APP[pig-app-hue-script] JOB[0000004-141228125521164-oozie-oozi-W] ACTION[0000004-141228125521164-oozie-oozi-W@pig] Next Retry, Attempt Number [2] in [60,000] milliseconds 2014-12-30 11:37:55,370 INFO org.apache.oozie.command.wf.ActionStartXCommand: SERVER[<our_server_name.com>] USER[<my_active_dir_user_name>] GROUP[-] TOKEN[] APP[pig-app-hue-script] JOB[0000004-141228125521164-oozie-oozi-W] ACTION[0000004-141228125521164-oozie-oozi-W@pig] Start action [0000004-141228125521164-oozie-oozi-W@pig] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10] 2014-12-30 11:37:55,653 WARN org.apache.oozie.command.wf.ActionStartXCommand: SERVER[<our_server_name.com>] USER[<my_active_dir_user_name>] GROUP[-] TOKEN[] APP[pig-app-hue-script] JOB[0000004-141228125521164-oozie-oozi-W] ACTION[0000004-141228125521164-oozie-oozi-W@pig] Error starting action [pig]. 
ErrorType [TRANSIENT], ErrorCode [JA009], Message [JA009: HTTP status [403], message [Forbidden]] org.apache.oozie.action.ActionExecutorException: JA009: HTTP status [403], message [Forbidden] at org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:412) at org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:396) at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:990) at org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1145) at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:228) at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63) at org.apache.oozie.command.XCommand.call(XCommand.java:281) at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:174) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: HTTP status [403], message [Forbidden] at org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:169) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:223) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.getDelegationToken(DelegationTokenAuthenticator.java:145) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.getDelegationToken(DelegationTokenAuthenticatedURL.java:346) at org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:799) at org.apache.hadoop.crypto.key.KeyProviderDelegationTokenExtension.addDelegationTokens(KeyProviderDelegationTokenExtension.java:86) at 
org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2017) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:121) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80) at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:127) at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:556) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:430) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1295) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1292) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1292) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:564) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:559) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:559) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:550) at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:975) ... 8 more 2014-12-30 11:37:55,654 INFO org.apache.oozie.command.wf.ActionStartXCommand: SERVER[<our_server_name.com>] USER[<my_active_dir_user_name>] GROUP[-] TOKEN[] APP[pig-app-hue-script] JOB[0000004-141228125521164-oozie-oozi-W] ACTION[0000004-141228125521164-oozie-oozi-W@pig] Next Retry, Attempt Number [3] in [60,000] milliseconds
12-29-2014
05:34 PM
Can you tell me where I can get the recent log? Much appreciate it. Here is the short version I got from the server log in Hue:

[29/Dec/2014 16:56:53 -0800] kerberos_ DEBUG handle_response(): returning <Response [200]>
[29/Dec/2014 16:56:53 -0800] kerberos_ ERROR handle_other(): Mutual authentication unavailable on 200 response
[29/Dec/2014 16:56:53 -0800] kerberos_ DEBUG handle_other(): Handling: 200
[29/Dec/2014 16:56:53 -0800] connectionpool DEBUG "GET /oozie/v1/job/0000002-141228125521164-oozie-oozi-W?timezone=America%2FLos_Angeles&doAs=<my_username> HTTP/1.1" 200 3293
[29/Dec/2014 16:56:53 -0800] resource DEBUG GET Got response: {"apps":null}
[29/Dec/2014 16:56:53 -0800] kerberos_ DEBUG handle_response(): returning <Response [200]>
[29/Dec/2014 16:56:53 -0800] kerberos_ ERROR handle_other(): Mutual authentication unavailable on 200 response
[29/Dec/2014 16:56:53 -0800] kerberos_ DEBUG handle_other(): Handling: 200
[29/Dec/2014 16:56:53 -0800] connectionpool DEBUG "GET /ws/v1/cluster/apps?user=<my_user_name>&finalStatus=UNDEFINED HTTP/1.1" 200 None
[29/Dec/2014 16:56:53 -0800] connectionpool DEBUG Setting read timeout to None
[29/Dec/2014 16:56:53 -0800] connectionpool DEBUG Setting read timeout to None
[29/Dec/2014 16:56:53 -0800] access INFO <server_ip and my_username> - "GET /jobbrowser/ HTTP/1.1"
[29/Dec/2014 16:56:53 -0800] resource DEBUG GET Got response: {"total":153,"workflows":[{"appP...
[29/Dec/2014 16:56:53 -0800] kerberos_ DEBUG handle_response(): returning <Response [200]> [29/Dec/2014 16:56:53 -0800] kerberos_ ERROR handle_other(): Mutual authentication unavailable on 200 response [29/Dec/2014 16:56:53 -0800] kerberos_ DEBUG handle_other(): Handling: 200 [29/Dec/2014 16:56:53 -0800] connectionpool DEBUG "GET /oozie/v1/jobs?filter=user%3D<mysusername>%3Bname%3Dpig-app-hue-script&timezone=America%2FLos_Angeles&jobtype=wf&len=100&doAs=<myusername> HTTP/1.1" 200 None [29/Dec/2014 16:56:53 -0800] connectionpool DEBUG Setting read timeout to None [29/Dec/2014 16:56:53 -0800] connectionpool INFO Resetting dropped connection: <our_server_name> [29/Dec/2014 16:56:53 -0800] access INFO <server_ip and my username> - "GET /pig/dashboard/ HTTP/1.1" [29/Dec/2014 16:56:53 -0800] api WARNING Autocomplete data fetching error default.None: Bad status for request TGetTablesReq(schemaName=u'default', sessionHandle=TSessionHandle(sessionId=THandleIdentifier(secret='\xfd\xe7=\xeb \xa9\xbfJ\xc2\x91\x8b\xee\x07.j\xd3\xc5', guid='\xadu\x81e\xb0\x9aI\xaa\xb8\xb5-\x86\xcd\x03\xe7\x8c')), tableName='.*', tableTypes=None, catalogName=None): TGetTablesResp(status=TStatus(errorCode=0, errorMessage='java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient', sqlState=None, infoMessages=None, statusCode=3), operationHandle=None) [29/Dec/2014 16:56:53 -0800] thrift_util DEBUG Thrift call <class 'TCLIService.TCLIService.Client'>.GetTables returned in 3018ms: TGetTablesResp(status=TStatus(errorCode=0, errorMessage='java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient', sqlState=None, infoMessages=None, statusCode=3), operationHandle=None) [29/Dec/2014 16:56:50 -0800] thrift_util DEBUG Thrift call: <class 'TCLIService.TCLIService.Client'>.GetTables(args=(TGetTablesReq(schemaName=u'default', 
sessionHandle=TSessionHandle(sessionId=THandleIdentifier(secret='\xfd\xe7=\xeb \xa9\xbfJ\xc2\x91\x8b\xee\x07.j\xd3\xc5', guid='\xadu\x81e\xb0\x9aI\xaa\xb8\xb5-\x86\xcd\x03\xe7\x8c')), tableName='.*', tableTypes=None, catalogName=None),), kwargs={}) [29/Dec/2014 16:56:50 -0800] dbms DEBUG Query Server: {'server_host': '<our_server_name.com', 'server_port': 10000, 'server_name': 'beeswax', 'principal': 'hive/<our_server_name.com@realm'} [29/Dec/2014 16:56:50 -0800] access INFO <server_ip and my_user_name> - "GET /beeswax/api/autocomplete/default HTTP/1.1" Kind regards Andy
12-29-2014
05:22 PM
Hi Romain,

I have just run the simple one-line Pig script in Hue again, and here is the new log (short version). Hope it helps. Thanks, Romain. Just FYI, the last four lines below repeat many times, and maybe that points to the problem: Hue tries to submit the job to Oozie but cannot, maybe due to a Kerberos authentication failure.

handle_response(): returning <Response [200]>
[29/Dec/2014 16:56:53 -0800] kerberos_ ERROR handle_other(): Mutual authentication unavailable on 200 response
[29/Dec/2014 16:56:53 -0800] kerberos_ DEBUG handle_other(): Handling: 200
[29/Dec/2014 16:56:53 -0800] connectionpool DEBUG Resetting dropped connection: <our server name which is removed for security reason>
[29/Dec/2014 16:56:53 -0800] access INFO < IP_ADDRESS and Active Directory user name which are removed for security reason >- "GET /pig/dashboard/ HTTP/1.1"
[29/Dec/2014 16:56:53 -0800] api WARNING Autocomplete data fetching error default.None: Bad status for request TGetTablesReq(schemaName=u'default',
12-29-2014
04:47 PM
Thanks, Romain. We do not have the Sentry service added in CM, but I can see a "Sentry Tables" item under "Hadoop Security" in Hue, and when I select it, it generates the following error:

Could not connect to localhost:8038 (code THRIFTTRANSPORT): TTransportException('Could not connect to localhost:8038',)

Can you help me verify whether my user has access to the default database, as you mentioned? I use my network AD user to log in to Hue via Kerberos/LDAP, and we set Hue up for this purpose.

P.S. The strange thing is, yesterday, when I logged in to Hue, I did not see the error message, but today I got the same Hive error message again. And as mentioned before, this error is new, appearing right after we upgraded both CDH and CM from 5.2 to 5.3 last week.

Appreciate your support. Please let me know if you have any questions.

Kind regards,
Andy
12-29-2014
04:37 PM
Thanks Romain,

We upgraded both CDH and CM to 5.3 last week. I will provide the new error messages, but I think they will be the same.

P.S. I can run the same Pig command in the Grunt shell successfully, so I guess it is something with the Pig editor in Hue. Just FYI, we are using Kerberos/LDAP and AD users to log in to Hue.

Much appreciate your support, and please let me know if you have any questions.

Kind regards
Andy
12-27-2014
04:02 PM
Dear Friends,

We have just upgraded our CM and CDH from 5.2 to 5.3 successfully, and Hue was also working fine. Today, when I log in to Hue, I get the following message:

Potential misconfiguration detected. Fix and restart Hue. Hive Editor: The application won't work without a running HiveServer2.

When I click the Hive editor link in Hue, the database panel just keeps trying to refresh the sample table list, and running a simple SELECT statement generates the error below. Would you please help?

Your query has the following error(s): java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

The strange thing is that if I log in to Hue as another user, it does not show me the message above, and I can also run the sample Hive query fine. I can see other people have the same issue:

http://community.cloudera.com/t5/Apache-Hadoop-Concepts-and/Problem-with-CDH-5-VM-quot-Potential-misconfiguration-detected/m-p/13744/thread-id/1141

P.S. Manually refreshing the tables in the Hive editor did not work, and even reinstalling the sample tables in Hue generated the following:

Could not install table: Error creating table sample_07: Bad status for request TExecuteStatementReq(confOverlay={}, sessionHandle=TSessionHandle(sessionId=THandleIdentifier(secret='e\(replaced by me for security reasons)', guid='\x(again replaced by me for security reasons)')), runAsync=True, statement="CREATE TABLE `sample_07` (\n `code` string ,\n `description` string ,\n `total_emp` int ,\n `salary` int )\nROW FORMAT DELIMITED\n FIELDS TERMINATED BY '\t'\nSTORED AS TextFile"): TExecuteStatementResp(status=TStatus(errorCode=10072, errorMessage='Error while compiling statement: FAILED: SemanticException [Error 10072]: Database does not exist: default', sqlState='42000', infoMessages=None, statusCode=3), operationHandle=None).

Thanks much for your support, and please let me know if you have any questions.

Kind regards
Andy
12-27-2014
03:40 PM
Dear Friends,

Recently, after adding the Flume service in CM, our one-line Pig test script (below) in Hue stopped working, and we started getting Status: START_RETRY in the Pig script editor in Hue.

A = load '/andy/ZipCodes.csv';

It seems the job is not even submitted to Oozie. The only workaround is to restart the Pig and Oozie services in CM, but that only works for a day; the next day, I need to do the same. There is no error in the script itself, since it works after the two service restarts. We recently upgraded CM and CDH to 5.3, but we still see the same issue.

P.S. I removed the Flume service and the problem went away, i.e. the next day the script still worked. But we need to use Flume, so this workaround does not help much, unfortunately. Would you please help? Thanks in advance.

Here is the info I got from the Hue server logs (please see the last line):

base ERROR Internal Server Error: /pig/dashboard/
Traceback (most recent call last):
  File "/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hue/build/env/lib/python2.6/site-packages/Django-1.4.5-py2.6.egg/django/core/handlers/base.py", line 111, in get_response
    response = callback(request, *callback_args, **callback_kwargs)
  File "/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hue/apps/oozie/src/oozie/views/dashboard.py", line 110, in decorate
    return view_func(request, *args, **kwargs)
  File "/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hue/apps/pig/src/pig/views.py", line 57, in dashboard
    jobs = pig_api.get_jobs()
  File "/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hue/apps/pig/src/pig/api.py", line 143, in get_jobs
    return get_oozie(self.user).get_workflows(**kwargs).jobs
TypeError: get_workflows() got an unexpected keyword argument 'user'
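One detail worth noting: the traceback paths still point at the CDH-5.2.0 parcel even though the cluster was upgraded to 5.3. A `TypeError` like this typically means the caller and callee come from mismatched versions, so the call fails in Python before any request ever reaches Oozie. A minimal illustration (the function names and signatures here are hypothetical, purely to show the failure mode):

```python
# Sketch: an older client function without a 'user' parameter,
# called by newer code that passes user= as a keyword argument.
def get_workflows(offset=1, cnt=50):
    """Old client signature (hypothetical): no 'user' parameter."""
    return []

# The newer caller passes a keyword the old signature does not accept,
# so Python raises TypeError immediately -- no RPC is attempted.
try:
    get_workflows(user='andy')
except TypeError as err:
    print(err)
```

This matches the observed symptom that the job never shows up in Oozie at all.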
12-27-2014
02:15 PM
Dear Friends,

I have started using Flume and have added its service using CM. Then I added the two lines below to the Flume configuration file:

tier1.sinks.sink1.hdfs.kerberosPrincipal = $KERBEROS_PRINCIPAL
tier1.sinks.sink1.hdfs.kerberosKeytab = $KERBEROS_KEYTAB

But the Flume service did not start; instead it generated the following configuration error:

Agent(<our_name_node_name>): Role is missing Kerberos keytab.

Would you please help?

P.S. I had previously enabled Kerberos in CM (by following the link below) before adding the Flume service. I can see principals for all services (on all of our hosts) in CM EXCEPT the Flume one, on the Administration > Kerberos > Credentials tab.

http://blog.cloudera.com/blog/2014/07/new-in-cloudera-manager-5-1-direct-active-directory-integration-for-kerberos-authentication/

Thanks much in advance, and please let me know if you have any questions.

Kind regards
Andy
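For context, a minimal Kerberos-enabled HDFS sink would look like the sketch below. The agent, channel, and sink names are taken from the two lines above; the source type and HDFS path are hypothetical. CM substitutes the `$KERBEROS_PRINCIPAL`/`$KERBEROS_KEYTAB` variables with the Flume role's own credentials, so the missing Flume entry under Administration > Kerberos > Credentials may be exactly why the role reports a missing keytab:

```properties
# Sketch of a minimal Kerberos-enabled HDFS sink
# (names assumed from the two configuration lines above).
tier1.sources  = source1
tier1.channels = channel1
tier1.sinks    = sink1

tier1.sinks.sink1.type        = hdfs
tier1.sinks.sink1.channel     = channel1
tier1.sinks.sink1.hdfs.path   = /flume/events   # hypothetical path
# CM fills these in from the Flume role's generated credentials;
# if no Flume principal exists, the substitution has nothing to use.
tier1.sinks.sink1.hdfs.kerberosPrincipal = $KERBEROS_PRINCIPAL
tier1.sinks.sink1.hdfs.kerberosKeytab    = $KERBEROS_KEYTAB
```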