Member since
10-24-2015
12
Posts
3
Kudos Received
0
Solutions
03-29-2017
05:34 PM
Is /usr/hdp/current/zeppelin-server/jarss/ the temporary location these jars are copied to? If so, does this path need to be specified in the Zeppelin UI --> Interpreter --> Jdbc --> zeppelin.interpreter.localRepo?
12-29-2016
12:59 AM
SYMPTOMS: An HDP cluster running on Isilon has recently been upgraded to Isilon 8.0.1.0 and kerberized. When trying to start services through Ambari, the user receives the following stderr:
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/resourcemanager.py", line 289, in <module>
Resourcemanager().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in execute
method(env)
File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/resourcemanager.py", line 124, in start
self.wait_for_dfs_directories_created(params.entity_groupfs_store_dir, params.entity_groupfs_active_dir)
File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/resourcemanager.py", line 246, in wait_for_dfs_directories_created
self.wait_for_dfs_directory_created(dir_path, ignored_dfs_dirs)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/decorator.py", line 55, in wrapper
return function(*args, **kwargs)
File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/resourcemanager.py", line 268, in wait_for_dfs_directory_created
list_status = util.run_command(dir_path, 'GETFILESTATUS', method='GET', ignore_status_codes=['404'], assertable_result=False)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 192, in run_command
raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'curl -sS -L -w '%{http_code}' -X GET --negotiate -u : 'http://hwx.isilon.support:8082/webhdfs/v1/hwx/done/?op=GETFILESTATUS&user.name=hdfs'' returned status_code=401.
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>401 Authorization Required</title>
</head><body>
<h1>Authorization Required</h1>
<p>This server could not verify that you
are authorized to access the document
requested. Either you supplied the wrong
credentials (e.g., bad password), or your
browser doesn't understand how to supply
the credentials required.</p>
</body></html>
Services are not able to authenticate.
ROOT CAUSE: Isilon bug number: 84897
RESOLUTION: During the Isilon 8.0.1.0 upgrade, the permissions on /etc/krb5.conf are incorrectly changed to 660 (instead of 644). As a result, WebHDFS is unable to check the Kerberos tickets. This issue is fixed in Isilon 8.0.1.1. The workaround is to change the permissions on /etc/krb5.conf back to 644. Please reach out to Dell/EMC if you need assistance with the workaround.
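The permission problem described above can be checked with a few lines of Python. This is only a minimal sketch: the helper names are made up for illustration, and it assumes the check runs on the node that holds /etc/krb5.conf, with a Python interpreter available there.

```python
import os
import stat

def krb5_conf_is_world_readable(path="/etc/krb5.conf"):
    """Return True if 'other' users have read permission on path."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IROTH)

def describe_mode(path="/etc/krb5.conf"):
    """Return the permission bits as an octal string, e.g. '644'."""
    return format(stat.S_IMODE(os.stat(path).st_mode), "o")
```

With mode 660 the first helper returns False, which matches the broken state after the upgrade; with mode 644 it returns True.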
12-29-2016
12:08 AM
SYMPTOMS:
Starting the Falcon server from Ambari is successful, but the Falcon web UI is not available when the user tries to access it.
Logs show one or more cluster entities trying to initialize:
INFO - [main:] ~ Initializing FS: hdfs://fc1.support.com:8020 for cluster: falconcluster1
INFO - [main:] ~ Initializing FS: hdfs://ft2.support.com:8020 for cluster: falcontest2
Startup is delayed by numerous attempts to contact the cluster entity:
2016-12-05 13:49:17,240 INFO - [main:] ~ Retrying connect to server: ft2.support.com/xxx.xxx.xxx.xx:8020. Already tried 0 time(s); maxRetries=45 (Client:835)
....
....
2016-12-05 14:19:52,971 INFO - [main:] ~ Retrying connect to server: ft2.support.com/xxx.xxx.xxx.xx:8020. Already tried 44 time(s); maxRetries=45 (Client:835)
2016-12-05 14:20:13,000 ERROR - [main:] ~ Failed to initialize FS for cluster : (SharedLibraryHostingService:200)
org.apache.falcon.FalconException: Failed to initialize FS for cluster : falcontest2
at org.apache.falcon.service.SharedLibraryHostingService.addLibsTo(SharedLibraryHostingService.java:85)
ROOT CAUSE: Falcon is working as expected. A fix was introduced in HDP 2.3.0 to address:
https://issues.apache.org/jira/browse/FALCON-1165
Previously, if Falcon could not access a cluster entity, it would not restart successfully. Newer versions still start, but only after exhausting the connection retries shown in the log above.
WORKAROUND:
Verify that the cluster specified in that particular entity is accessible.
If the cluster is not accessible or is no longer available, remove the cluster entity definition from the Falcon store (default location: /hadoop/falcon/store/CLUSTER).
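A quick way to verify that a cluster endpoint is accessible is a plain TCP connection test. The sketch below is an illustration only; the function name is hypothetical, and a successful connection shows that the port is open, not that HDFS itself is healthy.

```python
import socket

def namenode_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and DNS lookup failures.
        return False

# Example call, using the host and port from the log above:
# namenode_reachable("ft2.support.com", 8020)
```

If this returns False for the endpoint named in a cluster entity, that entity is a likely cause of the delayed startup.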
12-28-2016
11:05 PM
1 Kudo
HDP Environment:
Secure cluster (Kerberos enabled)
Users managed by Active Directory (AD)
SYMPTOMS: When the user tries to access the Falcon UI at the following address, they are prompted to enter their username and password:
http://<falcon server host>:15000/index.html?user.name=admin#/
After entering the correct AD username and password, the user gets an exception in the UI. Using curl to negotiate with the Falcon UI URL works without issues.
ROOT CAUSE: The user is not using Kerberos to authenticate.
RESOLUTION: To access the Falcon UI after enabling Kerberos, the user needs to authenticate using SPNEGO to negotiate with Kerberos, not with a username and password. Each browser supports SPNEGO, but the configuration is different for each browser. Safari needs no further configuration. After configuring the browser to negotiate using SPNEGO, the user must kinit and can then try to access the Falcon UI again.
12-28-2016
10:17 PM
SYMPTOM: Oozie workflow is failing with NoClassDefFoundError even though required jars are uploaded to the Oozie sharelib
ROOT CAUSE: Oozie needs to know which jars are needed for a specific action. For example, if you are using the Hive action but it needs HBase jars, this needs to be explicitly defined in oozie-site.xml.
RESOLUTION: Add oozie.action.sharelib.for.hive to oozie-site.xml and list the sharelibs that are needed for this action. In the case where the Hive action will use HBase:
oozie.action.sharelib.for.hive = hive,hcatalog,hbase,oozie,hive2
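For clusters where oozie-site.xml is edited by hand rather than through Ambari, the setting takes the usual Hadoop property form. The sketch below only renders that fragment; the helper name is made up for illustration, and the name and value are the ones from this post.

```python
import xml.etree.ElementTree as ET

def hadoop_property(name, value):
    """Render one <property> element in the Hadoop *-site.xml style."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")

snippet = hadoop_property(
    "oozie.action.sharelib.for.hive",
    "hive,hcatalog,hbase,oozie,hive2",
)
print(snippet)
```

The printed element would go inside the <configuration> root of oozie-site.xml.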
12-28-2016
10:09 PM
SYMPTOMS:
When using HDFS commands to change ownership of a directory on a cluster backed by Isilon, the command returns the following error:
[hdfs@hdp.test.cluster ~]$ hadoop fs -chown -R hduser1:hdfs /user/hduser1
chown: changing ownership of '/user/hduser1': Failed to get id rec: 2:hduser1
Running the "hdfs groups" command returns the following exception:
[hdfs@hdp.test.cluster ~]$ hdfs groups hduser1
Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException): bad protocol: org.apache.hadoop.tools.GetUserMappingsProtocol
at org.apache.hadoop.ipc.Client.call(Client.java:1427)
at org.apache.hadoop.ipc.Client.call(Client.java:1358)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy8.getGroupsForUser(Unknown Source)
at org.apache.hadoop.tools.protocolPB.GetUserMappingsProtocolClientSideTranslatorPB.getGroupsForUser(GetUserMappingsProtocolClientSideTranslatorPB.java:57)
at org.apache.hadoop.tools.GetGroupsBase.run(GetGroupsBase.java:71)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at org.apache.hadoop.hdfs.tools.GetGroups.main(GetGroups.java:96)
The user is verified to be locally available on all nodes and to be in the groups hadoop and hdfs:
[hdfs@hdp.test.cluster ~]$ id -gn hduser1
hadoop
[hdfs@hdp.test.cluster ~]$ id -Gn hduser1
hadoop hdfs
ROOT CAUSE: The user does not exist on Isilon.
RESOLUTION: The user must be added to Isilon:
isiloncluster1-1# isi auth groups create hduser1 --zone zone1 \
--provider local
isiloncluster1-1# isi auth users create hduser1 --primary-group hduser1 \
--zone zone1 --provider local \
--home-directory /ifs/isiloncluster1/zone1/hadoop/user/hduser1
Please reach out to EMC Isilon support if you have issues adding this user to Isilon.
12-28-2016
09:54 PM
SYMPTOMS: Unable to submit a Falcon process due to an XML validation error. In this case, the <timeout> tag is causing the issue:
ERROR: Bad Request;javax.xml.bind.UnmarshalException <br>- with linked exception: <br>[org.xml.sax.SAXParseException; lineNumber: 11; columnNumber: 14; cvc-complex-type.2.4.a: Invalid content was found starting with element 'timeout'. One of '{"uri:falcon:process:0.1":sla, "uri:falcon:process:0.1":timezone, "uri:falcon:process:0.1":inputs, "uri:falcon:process:0.1":outputs, "uri:falcon:process:0.1":properties, "uri:falcon:process:0.1":workflow}' is expected.]
ROOT CAUSE: Tags in the Falcon process XML need to be in the correct order.
RESOLUTION: Each tag needs to follow the order defined in the XML schema. This order is the same as the order in which the properties are listed in the Falcon documentation:
Falcon Entity Specification
For example, the <timeout> tag must come after <order> and before <frequency> in the XML.
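The ordering rule can be checked before submitting an entity. The sketch below is an illustration under a stated assumption: KNOWN_ORDER is a partial list assembled from this post (order, timeout, frequency) and from the element names in the error message above, not the full Falcon schema, and both helper names are hypothetical. Tags that are not in the list are ignored.

```python
import xml.etree.ElementTree as ET

# Assumption: partial ordering taken from this post and the error message,
# not the complete Falcon process schema.
KNOWN_ORDER = ["order", "timeout", "frequency", "sla", "timezone",
               "inputs", "outputs", "properties", "workflow"]

def local_name(tag):
    """Strip an '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]

def out_of_order_tags(process_xml):
    """Return known child tags that appear after a tag they should precede."""
    root = ET.fromstring(process_xml)
    rank = {name: i for i, name in enumerate(KNOWN_ORDER)}
    highest = -1
    misplaced = []
    for child in root:
        name = local_name(child.tag)
        if name not in rank:
            continue  # unknown to this sketch, so not checked
        if rank[name] < highest:
            misplaced.append(name)
        else:
            highest = rank[name]
    return misplaced
```

An empty result only means that the tags this sketch knows about are in sequence; the schema validation performed by Falcon remains the authority.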
12-28-2016
06:18 PM
Flume will not deliver partial events. The rollSize and rollInterval are approximate thresholds and are not hard limits. Once the threshold set by rollSize or rollInterval is met, an entire batch will be drained out to the file before it is rolled. Data should not be split up.
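The behaviour described above can be pictured with a toy model. This is not Flume code: the function name and the byte counts are invented for illustration, and the roll check is simply placed after each whole batch to mirror the description in this answer.

```python
def roll_files(batches, roll_size):
    """Toy model: write whole batches, then roll once the threshold is met.

    batches   -- list of batches, each a list of event sizes in bytes
    roll_size -- approximate size threshold in bytes
    Returns the final size of every file that was written.
    """
    files = []
    current = 0
    for batch in batches:
        # The whole batch is drained into the current file first.
        current += sum(batch)
        # Only afterwards is the threshold checked, so a file can exceed it.
        if current >= roll_size:
            files.append(current)
            current = 0
    if current:
        files.append(current)
    return files
```

With three batches of 40-byte events and a 100-byte threshold, the first file in this model ends up larger than the threshold because the second batch is written in full before the roll.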