Member since: 06-10-2016
Posts: 30
Kudos Received: 4
Solutions: 5
My Accepted Solutions
Title | Views | Posted
---|---|---
| 1064 | 01-23-2018 11:27 PM
| 1158 | 10-30-2017 08:23 PM
| 1137 | 02-24-2017 07:15 PM
| 980 | 12-11-2016 11:07 PM
| 3756 | 09-01-2016 09:35 PM
01-23-2018
11:27 PM
I solved it by following these instructions: https://cwiki.apache.org/confluence/display/AMBARI/Cleaning+up+Ambari+Metrics+System+Data
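In short, the cleanup boils down to something like this (a minimal sketch assuming the default embedded-mode AMS directories; check hbase.rootdir and hbase.tmp.dir in ams-hbase-site before deleting anything):
# Stop Ambari Metrics (Collector, Monitors, Grafana) from the Ambari UI first.
# Then, on the Collector host, clear the embedded HBase data (assumed default paths):
rm -rf /var/lib/ambari-metrics-collector/hbase/*
rm -rf /var/lib/ambari-metrics-collector/hbase-tmp/*
# Restart Ambari Metrics from the Ambari UI; the Collector recreates its tables on startup.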
01-23-2018
11:27 PM
Hello, the host where the NameNode and AMS services run filled up. I cleared the space, but now the AMS Collector doesn't start. This is the AMS Collector's error message:
/var/log/ambari-metrics-collector/hbase-ams-master-hw.example.com.out
2018-01-23 10:29:01,077 INFO [main] zookeeper.ZooKeeper: Initiating client connection, connectString=hw.example.com:61181 sessionTimeout=120000 watcher=org.apache.hadoop.hbase.zookeeper.PendingWatcher@c540f5a
2018-01-23 10:29:01,095 INFO [main-SendThread(hw.example.com:61181)] zookeeper.ClientCnxn: Opening socket connection to server hw.example.com/10.1.0.12:61181. Will not attempt to authenticate using SASL (unknown error)
2018-01-23 10:29:01,114 WARN [main-SendThread(hw.example.com:61181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
2018-01-23 10:29:02,222 INFO [main-SendThread(hw.example.com:61181)] zookeeper.ClientCnxn: Opening socket connection to server hw.example.com/10.1.0.12:61181. Will not attempt to authenticate using SASL (unknown error)
2018-01-23 10:29:02,222 WARN [main-SendThread(hw.example.com:61181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
2018-01-23 10:29:02,324 WARN [main] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=hw.example.com:61181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /ams-hbase-secure/master
2018-01-23 10:29:02,324 ERROR [main] zookeeper.RecoverableZooKeeper: ZooKeeper getData failed after 1 attempts
2018-01-23 10:29:02,324 WARN [main] zookeeper.ZKUtil: clean znode for master0x0, quorum=hw.example.com:61181, baseZNode=/ams-hbase-secure Unable to get data of znode /ams-hbase-secure/master
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /ams-hbase-secure/master
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:354)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataNoWatch(ZKUtil.java:714)
at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.deleteIfEquals(MasterAddressTracker.java:267)
at org.apache.hadoop.hbase.ZNodeClearer.clear(ZNodeClearer.java:149)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2838)
2018-01-23 10:29:02,325 ERROR [main] zookeeper.ZooKeeperWatcher: clean znode for master0x0, quorum=hw.example.com:61181, baseZNode=/ams-hbase-secure Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /ams-hbase-secure/master
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:354)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataNoWatch(ZKUtil.java:714)
at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.deleteIfEquals(MasterAddressTracker.java:267)
at org.apache.hadoop.hbase.ZNodeClearer.clear(ZNodeClearer.java:149)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2838)
2018-01-23 10:29:02,325 WARN [main] zookeeper.ZooKeeperNodeTracker: Can't get or delete the master znode
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /ams-hbase-secure/master
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:354)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataNoWatch(ZKUtil.java:714)
at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.deleteIfEquals(MasterAddressTracker.java:267)
at org.apache.hadoop.hbase.ZNodeClearer.clear(ZNodeClearer.java:149)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2838)
/var/log/ambari-metrics-collector/ambari-metrics-collector.log
2018-01-23 10:29:01,191 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hw.example.com/10.1.0.12:61181. Will not attempt to authenticate using SASL (unknown error)
2018-01-23 10:29:01,192 WARN org.apache.zookeeper.ClientCnxn: Session 0x16123a0fb540000 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
2018-01-23 10:29:01,298 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=hw.example.com:61181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /ams-hbase-secure/meta-region-server
2018-01-23 10:29:02,339 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hw.example.com/10.1.0.12:61181. Will not attempt to authenticate using SASL (unknown error)
2018-01-23 10:29:02,340 WARN org.apache.zookeeper.ClientCnxn: Session 0x16123a0fb540000 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
No services are listening on ports 6188 and 61181. I've set HBase's tick time with "hbase.zookeeper.property.tickTime = 6000". Thanks in advance.
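In case it helps anyone debugging the same symptoms, these are the kinds of checks I ran (plain Linux tools, nothing AMS-specific; the log file name is the one quoted above):
# Confirm nothing is listening on the Collector API port (6188) or the embedded ZooKeeper port (61181):
ss -tlnp | egrep '6188|61181'
# Look at why the embedded HBase master exits right after the ZooKeeper connection errors:
tail -n 200 /var/log/ambari-metrics-collector/hbase-ams-master-hw.example.com.out
# And verify there is free space again where AMS keeps its data (the original trigger was a full disk):
df -h /var/lib/ambari-metrics-collector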
11-24-2017
03:55 PM
I just added new DataNodes to my cluster, but one of them isn't live. The DataNode's log shows:
2017-11-24 10:18:57,761 WARN datanode.DataNode (BPServiceActor.java:retrieveNamespaceInfo(227)) - Problem connecting to server: namenode.example.com/192.168.0.2:8020
2017-11-24 10:19:18,785 INFO ipc.Client (Client.java:handleConnectionFailure(906)) - Retrying connect to server: namenode.example.com/192.168.0.2:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
Regarding the basic checks:
- The /etc/hosts file on every host lists the IPs of all hosts
- IPv6 is disabled on the interface dedicated to Hadoop
- firewalld is stopped
- SELinux is disabled
- I can ping in both directions
So I restarted the DataNode, but the problem persists. Here is the startup logging:
2017-11-24 10:25:34,053 INFO ipc.Server (Server.java:run(821)) - Starting Socket Reader #1 for port 8010
2017-11-24 10:25:34,115 INFO datanode.DataNode (DataNode.java:initIpcServer(941)) - Opened IPC server at /0.0.0.0:8010
2017-11-24 10:25:34,155 INFO datanode.DataNode (BlockPoolManager.java:refreshNamenodes(152)) - Refresh request received for nameservices: null
2017-11-24 10:25:34,171 INFO datanode.DataNode (BlockPoolManager.java:doRefreshNamenodes(201)) - Starting BPOfferServices for nameservices: <default>
2017-11-24 10:25:34,179 INFO datanode.DataNode (BPServiceActor.java:run(761)) - Block pool <registering> (Datanode Uuid unassigned) service to namenode.example.com/192.168.0.2:8020 starting to offer service
2017-11-24 10:25:34,183 INFO ipc.Server (Server.java:run(1064)) - IPC Server Responder: starting
2017-11-24 10:25:34,183 INFO ipc.Server (Server.java:run(900)) - IPC Server listener on 8010: starting
2017-11-24 10:25:50,309 INFO ipc.Client (Client.java:handleConnectionFailure(906)) - Retrying connect to server: namenode.example.com/192.168.0.2:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
What should I do? Thanks in advance.
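A few connectivity checks that go beyond ping, run from the affected DataNode (a sketch with standard tools; the hostname and port are the ones from the log above):
# Can the DataNode open a TCP connection to the NameNode RPC port?
nc -zv namenode.example.com 8020
# Does the name resolve to the address the NameNode is actually bound to?
getent hosts namenode.example.com
# On the NameNode host: is port 8020 listening on 192.168.0.2 and not only on localhost?
ss -tlnp | grep 8020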
Labels:
- Apache Hadoop
10-30-2017
08:23 PM
I solved it by adding a comment under Spark2 > Configs > Advanced spark2-env. After that, I restarted Spark2 and its clients, and the new configuration files were deployed.
10-26-2017
05:29 PM
Hi @Aditya Sirna, the directory /usr/hdp/2.6.2.0-205/spark2/conf/ is empty, but these packages are installed:
spark2_2_6_2_0_205-python-2.1.1.2.6.2.0-205.noarch
spark2_2_6_2_0_205-2.1.1.2.6.2.0-205.noarch
10-26-2017
04:20 PM
Hi, I just installed Spark2 from the Ambari wizard and Spark2's configuration directory is empty:
> ls -l /etc/spark2/2.6.2.0-205/0
total 0
The installation output is:
14:39:22,875 - Backing up /etc/spark2/conf to /etc/spark2/conf.backup if destination doesn't exist already.
14:39:22,875 - Execute[('cp', '-R', '-p', '/etc/spark2/conf', '/etc/spark2/conf.backup')] {'not_if': 'test -e /etc/spark2/conf.backup', 'sudo': True}
14:39:22,897 - Checking if need to create versioned conf dir /etc/spark2/2.6.2.0-205/0
14:39:22,900 - call[('ambari-python-wrap', u'/usr/bin/conf-select', 'dry-run-create', '--package', 'spark2', '--stack-version', u'2.6.2.0-205', '--conf-version', '0')] {'logoutput': False, 'sudo': True, 'quiet': False, 'stderr': -1}
14:39:22,940 - call returned (0, '/etc/spark2/2.6.2.0-205/0', '')
14:39:22,941 - Package spark2 will have new conf directories: /etc/spark2/2.6.2.0-205/0
14:39:22,946 - Checking if need to create versioned conf dir /etc/spark2/2.6.2.0-205/0
14:39:22,952 - call[('ambari-python-wrap', u'/usr/bin/conf-select', 'create-conf-dir', '--package', 'spark2', '--stack-version', u'2.6.2.0-205', '--conf-version', '0')] {'logoutput': False, 'sudo': True, 'quiet': False, 'stderr': -1}
14:39:22,987 - call returned (1, '/etc/spark2/2.6.2.0-205/0 exist already', '')
14:39:22,988 - checked_call[('ambari-python-wrap', u'/usr/bin/conf-select', 'set-conf-dir', '--package', 'spark2', '--stack-version', u'2.6.2.0-205', '--conf-version', '0')] {'logoutput': False, 'sudo': True, 'quiet': False}
14:39:23,022 - checked_call returned (0, '/usr/hdp/2.6.2.0-205/spark2/conf -> /etc/spark2/2.6.2.0-205/0')
14:39:23,023 - Ensuring that spark2 has the correct symlink structure
14:39:23,024 - Execute[('cp', '-R', '-p', '/etc/spark2/conf', '/etc/spark2/conf.backup')] {'not_if': 'test -e /etc/spark2/conf.backup', 'sudo': True}
14:39:23,033 - Skipping Execute[('cp', '-R', '-p', '/etc/spark2/conf', '/etc/spark2/conf.backup')] due to not_if
14:39:23,034 - Directory['/etc/spark2/conf'] {'action': ['delete']}
14:39:23,034 - Removing directory Directory['/etc/spark2/conf'] and all its content
14:39:23,035 - Link['/etc/spark2/conf'] {'to': '/etc/spark2/conf.backup'}
14:39:23,035 - Creating symbolic Link['/etc/spark2/conf'] to /etc/spark2/conf.backup
14:39:23,036 - Link['/etc/spark2/conf'] {'action': ['delete']}
14:39:23,036 - Deleting Link['/etc/spark2/conf']
14:39:23,037 - Link['/etc/spark2/conf'] {'to': '/usr/hdp/current/spark2-client/conf'}
14:39:23,037 - Creating symbolic Link['/etc/spark2/conf'] to /usr/hdp/current/spark2-client/conf
14:39:23,037 - /etc/hive/conf is already linked to /etc/hive/2.6.2.0-205/0
I'm using Ambari 2.5.2.0, HDP 2.6.2.0-205, and Spark2 2.1.1. Do you know what happened? Is there a way to reinstall the Spark2 configuration? Thanks in advance.
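For anyone hitting the same thing, these are the checks I'd suggest (a sketch; conf-select is the same HDP utility that appears in the output above):
# Follow the symlink chain: /etc/spark2/conf -> /usr/hdp/current/spark2-client/conf -> /etc/spark2/2.6.2.0-205/0
ls -l /etc/spark2/conf /usr/hdp/current/spark2-client/conf
ls -l /etc/spark2/2.6.2.0-205/0
# If the versioned directory is empty, make sure the links point at it:
/usr/bin/conf-select set-conf-dir --package spark2 --stack-version 2.6.2.0-205 --conf-version 0
# Then let Ambari redeploy the configuration files by restarting Spark2 and its clients
# (a trivial change under Advanced spark2-env is what eventually triggered it for me; see my 10-30-2017 reply above).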
Labels:
- Apache Spark
10-03-2017
03:20 PM
I'm upgrading from HDP 2.4.2.0 to 2.6.2.0. All tasks from the Ambari wizard completed OK, but Spark fails. I got this error message while restarting the Spark Thrift Server:
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/SPARK/1.2.1/package/scripts/spark_thrift_server.py", line 87, in <module>
SparkThriftServer().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 329, in execute
method(env)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 830, in restart
self.stop(env, upgrade_type=upgrade_type)
File "/var/lib/ambari-agent/cache/common-services/SPARK/1.2.1/package/scripts/spark_thrift_server.py", line 57, in stop
import params
File "/var/lib/ambari-agent/cache/common-services/SPARK/1.2.1/package/scripts/params.py", line 262, in <module>
livy_principal = livy_kerberos_principal.replace('_HOST', config['hostname'].lower())
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/config_dictionary.py", line 73, in __getattr__
raise Fail("Configuration parameter '" + self.name + "' was not found in configurations dictionary!")
resource_management.core.exceptions.Fail: Configuration parameter 'livy.server.launch.kerberos.principal' was not found in configurations dictionary!
I added livy.server.launch.kerberos.principal to /etc/livy/conf/livy-defaults.conf, but it doesn't work. What should I do? Thanks in advance!
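For reference, the parameter apparently has to be in the cluster configuration that Ambari hands to its scripts, not in the local livy-defaults.conf. A hedged sketch using Ambari's configs.sh helper, assuming it is present at the usual Ambari 2.x path and that the config type is livy-conf; the host, cluster name, principal, and keytab below are placeholders:
# Add the missing Livy Kerberos properties to the livy-conf config type via the Ambari API helper:
/var/lib/ambari-server/resources/scripts/configs.sh set ambari.example.com MYCLUSTER livy-conf \
  "livy.server.launch.kerberos.principal" "livy/_HOST@EXAMPLE.COM"
/var/lib/ambari-server/resources/scripts/configs.sh set ambari.example.com MYCLUSTER livy-conf \
  "livy.server.launch.kerberos.keytab" "/etc/security/keytabs/livy.service.keytab"
# Then restart the Spark services from Ambari so params.py sees the new keys.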
Labels:
- Apache Spark
09-17-2017
04:20 AM
Hello, while following the documentation for upgrading to Ambari 2.5.2 I'm stuck on this line: "Record the location of the Metrics Collector component before you begin the upgrade process." What does it mean? Does it refer to the path of the Metrics Collector's database? Thanks in advance.
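(If "location" just means the host the Metrics Collector component runs on, it can be read from the Ambari API with something like the call below; the Ambari host, cluster name, and credentials are placeholders.)
# List the host(s) running the METRICS_COLLECTOR component:
curl -s -u admin:admin \
  'http://ambari.example.com:8080/api/v1/clusters/MYCLUSTER/services/AMBARI_METRICS/components/METRICS_COLLECTOR?fields=host_components/HostRoles/host_name'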
Labels:
- Apache Ambari
06-26-2017
10:37 PM
I need to manage three queues in YARN: Production (PROD), Development (DEV), and Research (LABS). PROD will require 50% of the cluster resources only two days per month, while DEV and LABS each require 50% of the resources for the rest of the month. I want DEV and LABS to run at 50% each normally, and for those two days redistribute the capacity to PROD: 50%, DEV: 25%, LABS: 25%. Do you have an idea how to achieve this? Thanks in advance.
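What I have in mind so far, as a rough sketch with the standard Capacity Scheduler properties (the idea is to keep two sets of capacities and switch between them; the file path and exact values are assumptions):
# yarn.scheduler.capacity.root.queues=prod,dev,labs
# Normal days:  yarn.scheduler.capacity.root.dev.capacity=50, root.labs.capacity=50, root.prod.capacity=0
# PROD days:    yarn.scheduler.capacity.root.prod.capacity=50, root.dev.capacity=25, root.labs.capacity=25
# (capacities at one level must add up to 100; if 0 is rejected, give prod a small value and raise
#  dev/labs maximum-capacity so they can use the idle headroom on normal days)
# After changing capacity-scheduler.xml, or the same values in Ambari, apply without restarting YARN:
yarn rmadmin -refreshQueues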
Labels:
- Apache YARN
03-30-2017
07:57 PM
I tried the YARN REST API and got this error message:
[yarn@foo ~]$ curl -v -X PUT -d '{"state": "KILLED"}' 'http://foo.example.com:8088/ws/v1/cluster/apps/application_1487024494103_0099'
* About to connect() to foo.example.com port 8088 (#0)
* Trying 192.168.1.1...
* Connected to foo.example.com (192.168.1.1) port 8088 (#0)
> PUT /ws/v1/cluster/apps/application_1487024494103_0099 HTTP/1.1
> User-Agent: curl/7.29.0
> Host: foo.example.com:8088
> Accept: */*
> Content-Length: 19
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 19 out of 19 bytes
< HTTP/1.1 500 Internal Server Error
< Cache-Control: no-cache
< Expires: Thu, 30 Mar 2017 19:51:36 GMT
< Date: Thu, 30 Mar 2017 19:51:36 GMT
< Pragma: no-cache
< Expires: Thu, 30 Mar 2017 19:51:36 GMT
< Date: Thu, 30 Mar 2017 19:51:36 GMT
< Pragma: no-cache
< Content-Type: application/json
< Transfer-Encoding: chunked
< Server: Jetty(6.1.26.hwx)
<
* Connection #0 to host foo.example.com left intact
{"RemoteException":{"exception":"WebApplicationException","javaClassName":"javax.ws.rs.WebApplicationException"}}
03-16-2017
02:47 PM
I'm trying to kill an application in YARN, but I keep getting the message "Waiting for application ID to be killed". Is there a way to kill it faster? Thanks in advance.
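For the record, the two usual ways I know of to force it (the application id below is the one from my 03-30-2017 REST attempt, shown above):
# From the command line on any YARN client/gateway host:
yarn application -kill application_1487024494103_0099
# Or via the ResourceManager REST API (see the curl call in my 03-30-2017 post above).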
Tags:
- Hadoop Core
- Spark
- YARN
Labels:
- Apache Spark
- Apache YARN
02-24-2017
07:15 PM
I found the problem: the host that filled up has the file /var/lib/ambari-agent/data/structured-out-status.json, and it differs from the one on the other nodes. I followed these steps as root:
rm -f /var/lib/ambari-agent/data/structured-out-status.json
ambari-agent restart
Then I deleted the PID files in /var/run for the services that weren't responding to restarts (such as ZooKeeper and the Ambari Metrics Collector). After that, Ambari showed those processes as down, so I started them and now everything works correctly.
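The PID file cleanup was along these lines (a sketch; the exact file names depend on which services were wedged, so treat the paths as examples):
# Only for services whose PID file points at a process that no longer exists,
# which is why Ambari could not stop or restart them (example paths):
rm -f /var/run/zookeeper/zookeeper_server.pid
rm -f /var/run/ambari-metrics-collector/ambari-metrics-collector.pid
# Then start the affected services again from the Ambari UI.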
02-24-2017
06:24 PM
An application filled up the HDD, and after the cleanup the log is corrupted (these are the last five lines):
2017/02/24 05:30:15 [I] Completed XXX.XXX.XXX.XXX - "GET / HTTP/1.1" 500 Internal Server Error 2528 bytes in 26900us
2017/02/24 05:31:15 [I] Completed XXX.XXX.XXX.XXX - "GET / HTTP/1.1" 500 Internal Server Error 2528 bytes in 14789us
2017/02/24 05:32:15 [I] Completed XXX.XXX.XXX.XXX - "GET / HTTP/1.1" 500 Internal Server Error 2528 bytes in 20252us
2017/02/24 05:33:15 [I] Completed XXX.XXX.XXX.XXX - "GET / HTTP/1.1" 500 Internal Server Error 2528 bytes in 16111us
2017/02
02-24-2017
05:54 PM
Following these steps for restarting Ambari Metrics, I'm stuck on stopping Grafana, and the operation stays in the background operations list. Should I kill it manually? Thanks in advance.
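What I had in mind as the manual route, in case the operation never completes (a sketch; verify the process name on your Collector host before killing anything):
# Find the stuck Grafana process on the Ambari Metrics Collector host:
ps -ef | grep -i [g]rafana
# As a last resort, kill it (the daemon is usually named grafana-server):
pkill -f grafana-server
# Then re-run the stop/start from Ambari so the background operation can finish and its state catches up.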
Labels:
- Apache Ambari
12-11-2016
11:07 PM
The problem was a previous Zeppelin installation from Ambari (v0.6.0) that was in maintenance mode but hadn't been uninstalled. So when Zeppelin v0.6.1 starts up, it loads an environment variable called CLASSPATH with the wrong classpath (because I use Spark 2.11). I solved it by adding this line at the top of ${HOME}/zeppelin-0.6.1/bin/common.sh:
unset CLASSPATH
12-07-2016
11:47 PM
On HDP 2.4 I've installed Zeppelin 0.6.1 with the Spark interpreter built with Scala 2.10 (the Spark version is 1.6.1). All interpreters work well except the Spark interpreter, which fails. The error in the log is:
INFO [2016-12-05 13:25:35,638] ({pool-2-thread-4} SchedulerFactory.java[jobStarted]:131) - Job remoteInterpretJob_1480965935638 started by scheduler org.apache.zeppelin.spark.SparkInterpreter1640235141
ERROR [2016-12-05 13:25:35,650] ({pool-2-thread-4} Job.java[run]:189) - Job failed
java.lang.IncompatibleClassChangeError: Implementing class
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.zeppelin.spark.Utils.isScala2_10(Utils.java:88)
at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:570)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:341)
at org.apache.zeppelin.scheduler.Job.run(Job.java:176)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
INFO [2016-12-05 13:25:35,651] ({pool-2-thread-4} SchedulerFactory.java[jobFinished]:137) - Job remoteInterpretJob_1480965935638 finished by scheduler org.apache.zeppelin.spark.SparkInterpreter1640235141
In the zeppelin-env.sh file the environment variables are:
export MASTER=yarn-client
export HADOOP_CONF_DIR="/etc/hadoop/conf"
export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.4.2.0-258 -Dspark.yarn.queue=default"
export SPARK_HOME="/usr/hdp/current/spark-client"
export PYTHONPATH="${SPARK_HOME}/python:${SPARK_HOME}/python/lib/py4j-0.8.2.1-src.zip"
export SPARK_YARN_USER_ENV="PYTHONPATH=${PYTHONPATH}"
Do you have any idea how to correct this error? Thanks in advance.
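The root cause turned out to be a stale CLASSPATH inherited from an old Zeppelin 0.6.0 install (see my 12-11-2016 reply above). A quick way to spot that kind of leftover, as a sketch:
# Check whether the interpreter process would inherit a stray CLASSPATH from the environment:
env | grep '^CLASSPATH='
# And confirm which Spark/Scala build the interpreter will actually run against:
/usr/hdp/current/spark-client/bin/spark-submit --version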
Labels:
- Apache Spark
- Apache Zeppelin
09-01-2016
09:35 PM
3 Kudos
I solved the issue: in the file `/etc/hosts` the short hostname came before the long one:
192.168.1.3 datanode datanode.example.com
I switched the order:
192.168.1.3 datanode.example.com datanode
09-01-2016
08:46 PM
I can't start one of my DataNodes (the rest are running):
2016-09-01 16:35:37,489 ERROR datanode.DataNode (DataNode.java:secureMain(2545)) - Exception in secureMain
java.io.IOException: Login failure for dn/datanode@EXAMPLE.COM from keytab /etc/security/keytabs/dn.service.keytab: javax.security.auth.login.LoginException: Unable to obtain password from user
File permissions:
-r--------. 1 hdfs hadoop 408 Sep 1 15:36 /etc/security/keytabs/dn.service.keytab
File content:
Keytab name: FILE:/etc/security/keytabs/dn.service.keytab
KVNO Timestamp Principal
---- ------------------- ------------------------------------------------------
1 09/01/2016 15:36:21 dn/datanode.example.com@EXAMPLE.COM
1 09/01/2016 15:36:21 dn/datanode.example.com@EXAMPLE.COM
1 09/01/2016 15:36:21 dn/datanode.example.com@EXAMPLE.COM
1 09/01/2016 15:36:21 dn/datanode.example.com@EXAMPLE.COM
1 09/01/2016 15:36:21 dn/datanode.example.com@EXAMPLE.COM
Also, I found that the KDC has the principal `dn/datanode.example.com@EXAMPLE.COM` and not `dn/datanode@EXAMPLE.COM`, and this command works:
kinit -kt /etc/security/keytabs/dn.service.keytab dn/datanode.example.com@EXAMPLE.COM
So, why is HDFS using the wrong principal? Should I regenerate the Kerberos keys from the Ambari UI? Thanks in advance.
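For completeness, the checks that exposed the mismatch (standard Kerberos/OS tools, nothing HDP-specific):
# List every principal stored in the DataNode keytab:
klist -kt /etc/security/keytabs/dn.service.keytab
# The login principal comes from dfs.datanode.kerberos.principal; with the usual dn/_HOST@EXAMPLE.COM
# form, _HOST expands to the fully qualified hostname, so this should print datanode.example.com:
hostname -f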
Labels:
- Apache Hadoop
07-19-2016
06:16 AM
I enabled Kerberos authentication for HDFS. The NameNode and Secondary NameNode are running, and querying them through Kerberos works. The issue is with the DataNode; I get this error message:
java.lang.RuntimeException: Cannot start secure DataNode without configuring either privileged resources or SASL RPC data transfer protection and SSL for HTTP. Using privileged resources in combination with SASL RPC data transfer protection is not supported.
at org.apache.hadoop.hdfs.server.datanode.DataNode.checkSecureConfig(DataNode.java:1217)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1103)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:432)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2423)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2310)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2357)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2538)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2562)
2016-07-19 03:03:24,433 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
2016-07-19 03:03:24,434 INFO datanode.DataNode (LogAdapter.java:info(47)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at datanode.domain.com/192.168.1.3
************************************************************
This is my DataNode configuration (hdfs-site.xml):
<!-- DataNode security config -->
<property>
<name>dfs.datanode.keytab.file</name>
<value>/path/to/hdfs.keytab</value>
</property>
<property>
<name>dfs.datanode.kerberos.principal</name>
<value>hadoop/kerberos.domain.com@DOMAIN.COM</value>
</property>
<property>
<name>dfs.datanode.address</name>
<value>0.0.0.0:1004</value>
</property>
<property>
<name>dfs.datanode.http.address</name>
<value>0.0.0.0:1006</value>
</property> Following this answer I use an user called "ambari" with sudo for deploying HDP and Ambari Agent is running by root. Package JSVC is installed. Thanks in advance.
Labels:
- Apache Hadoop
06-10-2016
09:44 PM
1 Kudo
In "Cluster install wizard", "Review" step, when I hit "Deploy" button I get the error message 500 status codereceived on DELETE method for API: /api/v1/clusters/mycluster
Error message: Server error
The ambari-server log shows:
10 Jun 2016 18:31:19,317 ERROR [pool-3-thread-1] AmbariJpaLocalTxnInterceptor:180 - [DETAILED ERROR] Rollback reason:
Local Exception Stack:
Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.6.2.v20151217-774c696): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: DELETE command denied to user 'ambari'@'localhost' for table 'alert_notice'
Error Code: 1142
Call: DELETE FROM alert_notice WHERE (history_id IN (?))
bind => [1 parameter bound]
Query: DeleteAllQuery(name="AlertNoticeEntity.removeByHistoryIds" referenceClass=AlertNoticeEntity sql="DELETE FROM alert_notice WHERE (history_id IN ?)")
And the ambari user's privileges in MySQL:
+--------------+--------+-------------------------------------------+------------+------------+
| host | user | password | Grant_priv | Super_priv |
+--------------+--------+-------------------------------------------+------------+------------+
| localhost | ambari | *AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA | Y | Y |
| 192.168.1.2 | ambari | *AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA | Y | Y |
| 127.0.0.1 | ambari | *AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA | Y | Y |
+--------------+--------+-------------------------------------------+------------+------------+ Also, ambari user has granted privileges on `ambari.*`. I running ambari-server (2.2.2.0) on Debian 7.11 (Wheezy), and MySQL server version is 5.5.49. Thanks in advance.
Labels:
- Apache Ambari