Member since: 02-27-2019
Posts: 7
Kudos Received: 3
Solutions: 1
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1480 | 07-13-2016 01:57 AM |
04-03-2017
03:57 PM
ISSUE: All of the datanodes in the cluster are shown as down in the Ambari dashboard, but HDFS is up, available, and healthy. The output of the ps command shows that the DataNode service is actually running on the datanodes:
[root@node2 ~]# ps -ef|grep datanode
hdfs 11150 1 0 Aug09 ? 00:03:01 /usr/jdk64/jdk1.8.0_60/bin/java -Dproc_datanode -Xmx1024m -Dhdp.version=2.3.6.0-3796 -Djava.net.preferIPv4Stack=true
-Dhdp.version= -Djava.net.preferIPv4Stack=true -Dhdp.version= -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/var/log/hadoop/hdfs -Dhadoop.log.file=hadoop.log
-Dhadoop.home.dir=/usr/hdp/2.3.6.0-3796/hadoop -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.3.6.0-3796/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.6.0-3796/hadoop/lib/native
-Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhdp.version=2.3.6.0-3796 -Dhadoop.log.dir=/var/log/hadoop/hdfs -Dhadoop.log.file=hadoop-hdfs-datanode-node2.openstacklocal.log
-Dhadoop.home.dir=/usr/hdp/2.3.6.0-3796/hadoop -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,RFA -Djava.library.path=:/usr/hdp/2.3.6.0-3796/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.6.0-3796/hadoop/lib/native:/usr/hdp/2.3.6.0-3796/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.6.0-3796/hadoop/lib/native
-Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -server -server -XX:ParallelGCThreads=4 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/hdfs/hs_err_pid%p.log
-XX:NewSize=200m -XX:MaxNewSize=200m -Xloggc:/var/log/hadoop/hdfs/gc.log-201608091928 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps
-Xms1024m -Xmx1024m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -server -XX:ParallelGCThreads=4 -XX:+UseConcMarkSweepGC
-XX:ErrorFile=/var/log/hadoop/hdfs/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -Xloggc:/var/log/hadoop/hdfs/gc.log-201608091928 -verbose:gc -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms1024m -Xmx1024m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -server -XX:ParallelGCThreads=4
-XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/hdfs/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -Xloggc:/var/log/hadoop/hdfs/gc.log-201608091928
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms1024m -Xmx1024m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT
-Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.datanode.DataNode
SOLUTION: This issue is resolved in Ambari 2.2.2.0.
WORKAROUND: For earlier Ambari versions you can resolve this issue by making the following changes (a consolidated script is sketched below):
1. Make a backup of the /usr/lib/python2.6/site-packages/ambari_agent/PythonReflectiveExecutor.py script:
cp /usr/lib/python2.6/site-packages/ambari_agent/PythonReflectiveExecutor.py /usr/lib/python2.6/site-packages/ambari_agent/PythonReflectiveExecutor.py_ORIG
2. Remove the file /usr/lib/python2.6/site-packages/ambari_agent/PythonReflectiveExecutor.pyc:
rm /usr/lib/python2.6/site-packages/ambari_agent/PythonReflectiveExecutor.pyc
3. Edit the /usr/lib/python2.6/site-packages/ambari_agent/PythonReflectiveExecutor.py script and change this:
sys.path.append(self.script_dir)
to this:
sys.path.insert(0, self.script_dir)
4. Restart the ambari-agent on the host.
ROOT CAUSE: In this particular case the installation of additional Python modules via 'pip install unixODBC-devel' caused the Ambari-cached hdfs.py to conflict with the Python hdfs libraries. This issue is outlined in the following bugs reported by Hortonworks and Apache:
Hortonworks: BUG-54974 - ambari cached hdfs.py conflicts with python hdfs lib resulting into monitoring errors (resolved in Ambari 2.2.2.0)
Apache Ambari: AMBARI-14926 - ambari cached hdfs.py conflicts with python hdfs lib resulting into monitoring errors (resolved in Apache Ambari 2.4)
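The steps above can also be applied in one pass. The following is a minimal sketch of the workaround as a shell script, assuming the default agent path shown above; verify the path on your host before running it.
#!/bin/bash
# Minimal sketch of the workaround steps above (assumes the default Ambari agent install path)
AGENT_DIR=/usr/lib/python2.6/site-packages/ambari_agent
# 1. Back up the original script
cp "$AGENT_DIR/PythonReflectiveExecutor.py" "$AGENT_DIR/PythonReflectiveExecutor.py_ORIG"
# 2. Remove the stale compiled module
rm -f "$AGENT_DIR/PythonReflectiveExecutor.pyc"
# 3. Prepend the script directory to sys.path instead of appending it
sed -i 's/sys.path.append(self.script_dir)/sys.path.insert(0, self.script_dir)/' "$AGENT_DIR/PythonReflectiveExecutor.py"
# 4. Restart the agent so the change takes effect
ambari-agent restart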
04-03-2017
03:54 PM
1 Kudo
PROBLEM: user1 has submitted a YARN application with the application ID application_1473860344791_0001. To look at the YARN logs you would execute the following command:
yarn logs -applicationId application_1473860344791_0001
If you run the command above as 'user2' you may see output similar to the following:
16/09/19 23:00:23 INFO impl.TimelineClientImpl: Timeline service address: http://mycluster.somedomain.com:8188/ws/v1/timeline/
16/09/19 23:00:23 INFO client.RMProxy: Connecting to ResourceManager at mycluster.somedomain.com/192.168.1.89:8050
/app-logs/user2/logs/application_1473860344791_0001 does not exist.
Log aggregation has not completed or is not enabled.
ROOT CAUSE: When log aggregation has been enabled, each user's application logs will, by default, be placed in the directory hdfs:///app-logs/<USERNAME>/logs/<APPLICATION_ID>. By default only the user that submitted the job and members of the hadoop group will have access to read the log files. In the example directory listing below you can see that the permissions are 770, so there is no access for anyone other than the owner and members of the hadoop group.
[root@mycluster ~]$ hdfs dfs -ls /app-logs
Found 3 items
drwxrwx--- - hive hadoop 0 2017-03-10 15:33 /app-logs/hive
drwxrwx--- - user1 hadoop 0 2017-03-10 15:37 /app-logs/user1
drwxrwx--- - spark hadoop 0 2017-03-10 15:39 /app-logs/spark
SOLUTION: The message above can be misleading and does not necessarily indicate that log aggregation has not been enabled. To obtain the YARN logs for an application, the 'yarn logs' command must be executed as the user that submitted the application. In the example below the application was submitted by user1. If we execute the same command as the user 'user1', we should get output like the following when log aggregation is enabled.
yarn logs -applicationId application_1473860344791_0001
16/09/19 23:10:33 INFO impl.TimelineClientImpl: Timeline service address: http://mycluster.somedomain.com:8188/ws/v1/timeline/
16/09/19 23:10:33 INFO client.RMProxy: Connecting to ResourceManager at mycluster.somedomain.com/192.168.1.89:8050
16/09/19 23:10:34 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
16/09/19 23:10:34 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
Container: container_e03_1473860344791_0001_01_000001 on mycluster.somedomain.com_45454
===============================================================================
LogType:stderr
Log Upload Time:Wed Sep 14 09:44:15 -0400 2016
LogLength:0
Log Contents:
End of LogType:stderr
.....truncated output.....
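If you want to confirm whether the message is really a permissions issue rather than aggregation being disabled, a quick check of the aggregation setting and the log directory ownership helps. This is a minimal sketch using the paths and user names from the example above; on a Kerberized cluster the sudo step would also require a ticket for that user.
# Check whether log aggregation is enabled at all
grep -A1 'yarn.log-aggregation-enable' /etc/hadoop/conf/yarn-site.xml
# Inspect who owns the aggregated logs for this application
hdfs dfs -ls /app-logs/user1/logs/application_1473860344791_0001
# If you have sudo rights, run the command as the submitting user
sudo -u user1 yarn logs -applicationId application_1473860344791_0001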
REFERENCE:
The following document describes how to use log aggregation to collect logs for long-running YARN applications.
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_yarn-resource-management/content/ch_log_aggregation.html
04-03-2017
03:51 PM
ISSUE: Importing data from a MySQL database into HDFS with Sqoop works when we run it from the command line:
sqoop import --connect jdbc:mysql://mysqlserver.somedomain.com:3306/sample --username user1 --password password --table sample_test --target-dir /user/user1/data
When we attempt to run the same import in an Oozie workflow, the job fails with the following error:
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
SOLUTION: Setting oozie.libpath in job.properties and copying the database connector JAR to that directory resolved the issue.
Original job.properties:
oozie.use.system.libpath=True
security_enabled=True
nameNode=hdfs://HD0
credentials={u'hcat': {'xml_name': u'hcat', 'properties': [('hcat.metastore.uri', u'thrift://server.somedomain.com:9083'), ('hcat.metastore.principal', u'hive/server.somedomain.com@SOMEREALM.COM')]}, u'hive2': {'xml_name': u'hive2', 'properties': [('hive2.jdbc.url', 'jdbc:hive2://server.somedomain.com:10000/default'), ('hive2.server.principal', 'hive/server.somedomain.com@SOMEREALM.COM')]}, None: {'xml_name': None, 'properties': []}}
jobTracker=hd0
mapreduce.job.user.name=user1
oozie.wf.application.path=/user/user1/a/workflow.xml
oozie.wf.rerun.failnodes=false
security_enabled=True
user.name=user1
New job.properties:
oozie.use.system.libpath=True
oozie.libpath=${nameNode}/user/oozie/share/lib/sqoop
security_enabled=True
nameNode=hdfs://HD0
credentials={u'hcat': {'xml_name': u'hcat', 'properties': [('hcat.metastore.uri', u'thrift://server.somedomain.com:9083'), ('hcat.metastore.principal', u'hive/server.somedomain.com@SOMEREALM.COM')]}, u'hive2': {'xml_name': u'hive2', 'properties': [('hive2.jdbc.url', 'jdbc:hive2://server.somedomain.com:10000/default'), ('hive2.server.principal', 'hive/server.somedomain.com@SOMEREALM.COM')]}, None: {'xml_name': None, 'properties': []}}
jobTracker=hd0
mapreduce.job.user.name=user1
oozie.wf.application.path=/user/user1/a/workflow.xml
oozie.wf.rerun.failnodes=false
security_enabled=True
user.name=user1
After making this change you need to copy the database connector JAR file to the same path set for oozie.libpath:
hdfs dfs -put /usr/share/java/mysql-connector-java.jar /user/oozie/share/lib/sqoop/
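A quick way to confirm that the connector ends up where the workflow will look for it is to list the configured libpath. This is a minimal sketch using the paths from the job.properties above.
# Create the libpath directory if it does not exist, copy the connector, and verify
hdfs dfs -mkdir -p /user/oozie/share/lib/sqoop
hdfs dfs -put /usr/share/java/mysql-connector-java.jar /user/oozie/share/lib/sqoop/
hdfs dfs -ls /user/oozie/share/lib/sqoop/ | grep mysql-connector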
04-03-2017
03:47 PM
OBSERVATIONS: After navigating to "Register Version" we see that HDP 2.4 and 2.5 are not registered versions.
We identified some INFO-level messages in the logs that tell us there is an issue with one of the XML files read by Ambari during startup:
15 Feb 2017 10:52:15,526 INFO [ambari-client-thread-67] AmbariMetaInfo:1417 - Stack HDP-2.0.6.GlusterFS is not active, skipping VDF
15 Feb 2017 10:52:15,528 INFO [ambari-client-thread-67] AmbariMetaInfo:1415 - Stack HDP-2.5 is not valid, skipping VDF: Could not parse XML /var/lib/ambari-server/resources/stacks/HDP/2.4/services/ARCADIA-ENTERPRISE/configuration/arcadia-config.xml: javax.xml.bind.UnmarshalException
- with linked exception:
[org.xml.sax.SAXParseException; systemId: file:/var/lib/ambari-server/resources/stacks/HDP/2.4/services/ARCADIA-ENTERPRISE/configuration/arcadia-config.xml; lineNumber: 1; columnNumber: 1; Premature end of file.]; Could not parse XML /var/lib/ambari-server/resources/stacks/HDP/2.4/services/ARCADIA-ENTERPRISE.bak/configuration/arcadia-config.xml: javax.xml.bind.UnmarshalException
- with linked exception:
[org.xml.sax.SAXParseException; systemId: file:/var/lib/ambari-server/resources/stacks/HDP/2.4/services/ARCADIA-ENTERPRISE.bak/configuration/arcadia-config.xml; lineNumber: 1; columnNumber: 1; Premature end of file.]; Could not parse XML /var/lib/ambari-server/resources/stacks/HDP/2.4/services/ARCADIA-ENTERPRISE.final.bak/configuration/arcadia-config.xml: javax.xml.bind.UnmarshalException
- with linked exception:
[org.xml.sax.SAXParseException; systemId: file:/var/lib/ambari-server/resources/stacks/HDP/2.4/services/ARCADIA-ENTERPRISE.final.bak/configuration/arcadia-config.xml; lineNumber: 1; columnNumber: 1; Premature end of file.]
15 Feb 2017 10:52:15,528 INFO [ambari-client-thread-67] AmbariMetaInfo:1417 - Stack HDP-2.0.6 is not active, skipping VDF
15 Feb 2017 10:52:15,528 INFO [ambari-client-thread-67] AmbariMetaInfo:1417 - Stack HDP-2.3.ECS is not active, skipping VDF
15 Feb 2017 10:52:15,529 INFO [ambari-client-thread-67] AmbariMetaInfo:1415 - Stack HDP-2.4 is not valid, skipping VDF: Could not parse XML /var/lib/ambari-server/resources/stacks/HDP/2.4/services/ARCADIA-ENTERPRISE/configuration/arcadia-config.xml: javax.xml.bind.UnmarshalException
- with linked exception:
[org.xml.sax.SAXParseException; systemId: file:/var/lib/ambari-server/resources/stacks/HDP/2.4/services/ARCADIA-ENTERPRISE/configuration/arcadia-config.xml; lineNumber: 1; columnNumber: 1; Premature end of file.]; Could not parse XML /var/lib/ambari-server/resources/stacks/HDP/2.4/services/ARCADIA-ENTERPRISE.bak/configuration/arcadia-config.xml: javax.xml.bind.UnmarshalException
- with linked exception:
[org.xml.sax.SAXParseException; systemId: file:/var/lib/ambari-server/resources/stacks/HDP/2.4/services/ARCADIA-ENTERPRISE.bak/configuration/arcadia-config.xml; lineNumber: 1; columnNumber: 1; Premature end of file.]; Could not parse XML /var/lib/ambari-server/resources/stacks/HDP/2.4/services/ARCADIA-ENTERPRISE.final.bak/configuration/arcadia-config.xml: javax.xml.bind.UnmarshalException
- with linked exception:
[org.xml.sax.SAXParseException; systemId: file:/var/lib/ambari-server/resources/stacks/HDP/2.4/services/ARCADIA-ENTERPRISE.final.bak/configuration/arcadia-config.xml; lineNumber: 1; columnNumber: 1; Premature end of file.]
In this case the file /var/lib/ambari-server/resources/stacks/HDP/2.4/services/ARCADIA-ENTERPRISE/configuration/arcadia-config.xml was empty. This generated an error that Ambari could not recover from and resulted in a failure to register the HDP stack version. As a result we could not add repositories for HDP 2.4 or 2.5.
WORKAROUND: There are two ways to work around this issue.
1. Put at least one construct in the XML file indicated in the log. For example:
<?xml version="1.0"?>
<!-- Some comment -->
This will prevent Ambari from throwing an exception and failing to register the HDP version. You will need to restart the Ambari server after you update this file.
2. Move the empty XML file to another directory and comment its entry out of the metainfo.xml file for the custom service. In this case we moved arcadia-config.xml to /var/tmp, modified /var/lib/ambari-server/resources/stacks/HDP/2.4/services/ARCADIA-ENTERPRISE/metainfo.xml, and commented out the reference to arcadia-config.xml. You will need to restart Ambari after making this change.
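To locate any other empty stack definition files without combing through the log, a simple search of the stacks directory works. This is a minimal sketch, assuming the default Ambari resource path shown above and that the problem file is truly zero bytes.
# List zero-byte XML files under the Ambari stack definitions
find /var/lib/ambari-server/resources/stacks -name '*.xml' -size 0 -print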
SOLUTION: This issue will be corrected in Ambari 3.0.0 via BUG-75022
07-13-2016
01:57 AM
@Sreeram Chintalapudi This could happen if there is an incorrect RULE in "hadoop.security.auth_to_local" in /etc/hadoop/conf/core-site.xml. You can review or modify the contents by navigating to Ambari -> HDFS -> Configs -> Advanced -> Advanced core-site.xml -> hadoop.security.auth_to_local. It should look similar to this:
RULE:[1:$1@$0](ambari-qa-EXAMPLE@EXAMPLE.COM)s/.*/ambari-qa/
RULE:[1:$1@$0](hbase-EXAMPLE@EXAMPLE.COM)s/.*/hbase/
RULE:[1:$1@$0](hdfs-EXAMPLE@EXAMPLE.COM)s/.*/hdfs/
RULE:[1:$1@$0](spark-EXAMPLE@EXAMPLE.COM)s/.*/spark/
RULE:[1:$1@$0](.*@EXAMPLE.COM)s/@.*//
RULE:[1:$1@$0](.*@.*EXAMPLE.COM)s/@.*//
RULE:[2:$1@$0](amshbase@EXAMPLE.COM)s/.*/ams/
RULE:[2:$1@$0](amszk@EXAMPLE.COM)s/.*/ams/
RULE:[2:$1@$0](atlas@EXAMPLE.COM)s/.*/atlas/
RULE:[2:$1@$0](dn@EXAMPLE.COM)s/.*/hdfs/
RULE:[2:$1@$0](falcon@EXAMPLE.COM)s/.*/falcon/
RULE:[2:$1@$0](hbase@EXAMPLE.COM)s/.*/hbase/
RULE:[2:$1@$0](hive@EXAMPLE.COM)s/.*/hive/
RULE:[2:$1@$0](jhs@EXAMPLE.COM)s/.*/mapred/
RULE:[2:$1@$0](jn@EXAMPLE.COM)s/.*/hdfs/
RULE:[2:$1@$0](knox@EXAMPLE.COM)s/.*/knox/
RULE:[2:$1@$0](nm@EXAMPLE.COM)s/.*/yarn/
RULE:[2:$1@$0](nn@EXAMPLE.COM)s/.*/hdfs/
RULE:[2:$1@$0](oozie@EXAMPLE.COM)s/.*/oozie/
RULE:[2:$1@$0](rm@EXAMPLE.COM)s/.*/yarn/
RULE:[2:$1@$0](yarn@EXAMPLE.COM)s/.*/yarn/
DEFAULT
If you make any changes to the rules you will need to restart the affected services. Hope this helps, Steve
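A convenient way to check how a given principal will be mapped under the current rules, without restarting anything, is the HadoopKerberosName utility. This is a minimal sketch using the example realm above; substitute your own principals.
# Print the local user each principal maps to under the current auth_to_local rules
hadoop org.apache.hadoop.security.HadoopKerberosName nn/namenode.example.com@EXAMPLE.COM
hadoop org.apache.hadoop.security.HadoopKerberosName ambari-qa-EXAMPLE@EXAMPLE.COM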
01-11-2016
09:35 PM
1 Kudo
Our security department is asking about the Knox REST API and the protections that are built into the service.
Labels: Apache Knox
09-29-2015
03:58 PM
1 Kudo
We are using SAS Access on a Windows 7 workstation and attempting to connect to an HDP 2.2.6 Kerberos-enabled cluster. We have been able to install and configure the client software on the Windows workstation, but we are not able to connect to the cluster for read/write operations and are getting Kerberos errors. I understand that this is not a supported configuration, but I am looking for someone who has experience configuring this solution.
Labels: Apache Hadoop