Member since 02-23-2016
48 Posts
7 Kudos Received
5 Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
| 1899 | 10-27-2016 05:49 AM |
| 12676 | 08-25-2016 07:02 AM |
| 2594 | 04-22-2016 04:31 AM |
| 1372 | 04-22-2016 03:53 AM |
| 11880 | 03-01-2016 10:38 PM |
11-08-2016
06:33 AM
Hi Sagar, thank you for your hints, but I can't test them because my cluster has been destroyed. 🙂 Klaus
10-28-2016
06:56 AM
Hello, the client installation process failed with this error on all nodes:
stderr: /var/lib/ambari-agent/data/errors-3460.txt
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/OOZIE/4.0.0.2.0/package/scripts/oozie_client.py", line 75, in <module>
OozieClient().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
method(env)
File "/var/lib/ambari-agent/cache/common-services/OOZIE/4.0.0.2.0/package/scripts/oozie_client.py", line 36, in install
self.install_packages(env)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 404, in install_packages
Package(name)
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 158, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 121, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/__init__.py", line 49, in action_install
self.install_package(package_name, self.resource.use_repos, self.resource.skip_repos)
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/apt.py", line 53, in wrapper
return function_to_decorate(self, name, *args[2:])
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/apt.py", line 97, in install_package
self.checked_call_until_not_locked(cmd, sudo=True, env=INSTALL_CMD_ENV, logoutput=self.get_logoutput())
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/__init__.py", line 72, in checked_call_until_not_locked
return self.wait_until_not_locked(cmd, is_checked=True, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/__init__.py", line 80, in wait_until_not_locked
code, out = func(cmd, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
result = function(command, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
tries=tries, try_sleep=try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call
raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of '/usr/bin/apt-get -q -o Dpkg::Options::=--force-confdef --allow-unauthenticated --assume-yes install 'oozie-2-3-.*'' returned 100. Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package oozie-2-3-.*
E: Couldn't find any package by regex 'oozie-2-3-.*'
stdout: /var/lib/ambari-agent/data/output-3460.txt
2016-10-28 08:03:58,249 - The hadoop conf dir /usr/hdp/current/hadoop-client/conf exists, will call conf-select on it for version 2.4.0.0-169
2016-10-28 08:03:58,250 - Checking if need to create versioned conf dir /etc/hadoop/2.4.0.0-169/0
2016-10-28 08:03:58,250 - call['conf-select create-conf-dir --package hadoop --stack-version 2.4.0.0-169 --conf-version 0'] {'logoutput': False, 'sudo': True, 'quiet': False, 'stderr': -1}
2016-10-28 08:03:58,271 - call returned (1, '/etc/hadoop/2.4.0.0-169/0 exist already', '')
2016-10-28 08:03:58,271 - checked_call['conf-select set-conf-dir --package hadoop --stack-version 2.4.0.0-169 --conf-version 0'] {'logoutput': False, 'sudo': True, 'quiet': False}
2016-10-28 08:03:58,295 - checked_call returned (0, '/usr/hdp/2.4.0.0-169/hadoop/conf -> /etc/hadoop/2.4.0.0-169/0')
2016-10-28 08:03:58,295 - Ensuring that hadoop has the correct symlink structure
2016-10-28 08:03:58,295 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2016-10-28 08:03:58,296 - Group['spark'] {}
2016-10-28 08:03:58,297 - Group['hadoop'] {}
2016-10-28 08:03:58,297 - Group['users'] {}
2016-10-28 08:03:58,297 - User['hive'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2016-10-28 08:03:58,298 - User['zookeeper'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2016-10-28 08:03:58,298 - User['oozie'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'users']}
2016-10-28 08:03:58,299 - User['ams'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2016-10-28 08:03:58,299 - User['tez'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'users']}
2016-10-28 08:03:58,300 - User['accumulo'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2016-10-28 08:03:58,300 - User['spark'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2016-10-28 08:03:58,301 - User['ambari-qa'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'users']}
2016-10-28 08:03:58,301 - User['flume'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2016-10-28 08:03:58,302 - User['kafka'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2016-10-28 08:03:58,303 - User['hdfs'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2016-10-28 08:03:58,303 - User['yarn'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2016-10-28 08:03:58,304 - User['mapred'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2016-10-28 08:03:58,304 - User['hbase'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2016-10-28 08:03:58,305 - User['hcat'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2016-10-28 08:03:58,305 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2016-10-28 08:03:58,306 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] {'not_if': '(test $(id -u ambari-qa) -gt 1000) || (false)'}
2016-10-28 08:03:58,316 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] due to not_if
2016-10-28 08:03:58,317 - Directory['/tmp/hbase-hbase'] {'owner': 'hbase', 'recursive': True, 'mode': 0775, 'cd_access': 'a'}
2016-10-28 08:03:58,317 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2016-10-28 08:03:58,318 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase'] {'not_if': '(test $(id -u hbase) -gt 1000) || (false)'}
2016-10-28 08:03:58,322 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase'] due to not_if
2016-10-28 08:03:58,322 - Group['hdfs'] {}
2016-10-28 08:03:58,322 - User['hdfs'] {'fetch_nonlocal_groups': True, 'groups': [u'hadoop', u'hdfs']}
2016-10-28 08:03:58,323 - Directory['/etc/hadoop'] {'mode': 0755}
2016-10-28 08:03:58,334 - File['/usr/hdp/current/hadoop-client/conf/hadoop-env.sh'] {'content': InlineTemplate(...), 'owner': 'hdfs', 'group': 'hadoop'}
2016-10-28 08:03:58,335 - Directory['/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir'] {'owner': 'hdfs', 'group': 'hadoop', 'mode': 0777}
2016-10-28 08:03:58,350 - Repository['HDP-2.4'] {'base_url': 'http://public-repo-1.hortonworks.com/HDP/ubuntu14/2.x/updates/2.4.0.0', 'action': ['create'], 'components': [u'HDP', 'main'], 'repo_template': '{{package_type}} {{base_url}} {{components}}', 'repo_file_name': 'HDP', 'mirror_list': None}
2016-10-28 08:03:58,354 - File['/tmp/tmpgzSjgM'] {'content': 'deb http://public-repo-1.hortonworks.com/HDP/ubuntu14/2.x/updates/2.4.0.0 HDP main'}
2016-10-28 08:03:58,355 - Writing File['/tmp/tmpgzSjgM'] because contents don't match
2016-10-28 08:03:58,355 - File['/tmp/tmpKdatW3'] {'content': StaticFile('/etc/apt/sources.list.d/HDP.list')}
2016-10-28 08:03:58,356 - Writing File['/tmp/tmpKdatW3'] because contents don't match
2016-10-28 08:03:58,358 - Repository['HDP-UTILS-1.1.0.20'] {'base_url': 'http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.20/repos/ubuntu12', 'action': ['create'], 'components': [u'HDP-UTILS', 'main'], 'repo_template': '{{package_type}} {{base_url}} {{components}}', 'repo_file_name': 'HDP-UTILS', 'mirror_list': None}
2016-10-28 08:03:58,360 - File['/tmp/tmpgWh4hC'] {'content': 'deb http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.20/repos/ubuntu12 HDP-UTILS main'}
2016-10-28 08:03:58,360 - Writing File['/tmp/tmpgWh4hC'] because contents don't match
2016-10-28 08:03:58,360 - File['/tmp/tmpy240ZL'] {'content': StaticFile('/etc/apt/sources.list.d/HDP-UTILS.list')}
2016-10-28 08:03:58,360 - Writing File['/tmp/tmpy240ZL'] because contents don't match
2016-10-28 08:03:58,362 - Package['unzip'] {}
2016-10-28 08:03:58,382 - Skipping installation of existing package unzip
2016-10-28 08:03:58,383 - Package['curl'] {}
2016-10-28 08:03:58,402 - Skipping installation of existing package curl
2016-10-28 08:03:58,402 - Package['hdp-select'] {}
2016-10-28 08:03:58,423 - Skipping installation of existing package hdp-select
2016-10-28 08:03:58,695 - Package['zip'] {}
2016-10-28 08:03:58,719 - Skipping installation of existing package zip
2016-10-28 08:03:58,720 - Package['mysql-connector-java'] {}
2016-10-28 08:03:58,739 - Skipping installation of existing package mysql-connector-java
2016-10-28 08:03:58,739 - Package['extjs'] {}
2016-10-28 08:03:58,759 - Skipping installation of existing package extjs
2016-10-28 08:03:58,759 - Package['oozie-2-3-.*'] {}
2016-10-28 08:03:58,779 - Installing package oozie-2-3-.* ('/usr/bin/apt-get -q -o Dpkg::Options::=--force-confdef --allow-unauthenticated --assume-yes install 'oozie-2-3-.*'')
2016-10-28 08:03:59,244 - Execution of '['/usr/bin/apt-get', '-q', '-o', 'Dpkg::Options::=--force-confdef', '--allow-unauthenticated', '--assume-yes', 'install', u'oozie-2-3-.*']' returned 100. Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package oozie-2-3-.*
E: Couldn't find any package by regex 'oozie-2-3-.*'
2016-10-28 08:03:59,245 - Failed to install package oozie-2-3-.*. Executing `/usr/bin/apt-get update -qq`
2016-10-28 08:04:32,864 - Retrying to install package oozie-2-3-.*
When I do this manually, version 2.4.0.0-169 is available:
# apt-get install oozie\*
Reading package lists... Done
Building dependency tree
Reading state information... Done
Note, selecting 'oozie-2-4-0-0-169' for regex 'oozie*'
Note, selecting 'oozie-server' for regex 'oozie*'
Note, selecting 'oozie-2-4-0-0-169-server' for regex 'oozie*'
Note, selecting 'oozie-client' for regex 'oozie*'
Note, selecting 'oozie' for regex 'oozie*'
Note, selecting 'oozie-2-4-0-0-169-client' for regex 'oozie*'
The following extra packages will be installed:
bigtop-tomcat
The following NEW packages will be installed:
bigtop-tomcat oozie oozie-2-4-0-0-169 oozie-2-4-0-0-169-client
oozie-2-4-0-0-169-server oozie-client oozie-server
0 upgraded, 7 newly installed, 0 to remove and 0 not upgraded.
Need to get 672 MB of archives.
After this operation, 789 MB of additional disk space will be used.
Do you want to continue? [Y/n] n
Abort.
How can I tell Ambari to use the current version? 🙂 Klaus
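The mismatch in the log above can be reproduced without touching apt. The following is only a minimal sketch: it uses the package names that apt-get listed in this post and approximates apt's regex handling with an unanchored Python `re` search, which is an assumption. The helper names `matching_packages` and `stack_version_to_pattern` are hypothetical and are not part of Ambari; the post does not show where Ambari takes the `2-3` part of the pattern from.

```python
import re

# Package names that apt-get reported for the regex 'oozie*' in the post above.
AVAILABLE = [
    "oozie",
    "oozie-client",
    "oozie-server",
    "oozie-2-4-0-0-169",
    "oozie-2-4-0-0-169-client",
    "oozie-2-4-0-0-169-server",
]


def matching_packages(pattern, names):
    """Return the names that contain a match for pattern (unanchored search)."""
    regex = re.compile(pattern)
    return [name for name in names if regex.search(name)]


def stack_version_to_pattern(component, stack_version):
    """Hypothetical helper: ('oozie', '2.4') -> 'oozie-2-4-.*'."""
    return "%s-%s-.*" % (component, stack_version.replace(".", "-"))


if __name__ == "__main__":
    # The pattern from the failing install finds nothing in a 2.4 repository.
    print(matching_packages("oozie-2-3-.*", AVAILABLE))
    # A pattern built from stack version 2.4 finds the versioned packages.
    print(matching_packages(stack_version_to_pattern("oozie", "2.4"), AVAILABLE))
```

The sketch only shows that a pattern built from 2.3 cannot match packages from the HDP 2.4.0.0 repository configured in the log; it does not show how to change the version Ambari uses.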
Labels: Apache Ambari, Apache Oozie
10-27-2016
05:49 AM
Hi Josh, I found this in the Tracer log file:
2016-10-27 07:23:48,988 [start.Main] ERROR: Thread 'tracer' died.
org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /tracers/trace-
at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
at org.apache.accumulo.fate.zookeeper.ZooUtil.putEphemeralSequential(ZooUtil.java:463)
at org.apache.accumulo.fate.zookeeper.ZooReaderWriter.putEphemeralSequential(ZooReaderWriter.java:99)
at org.apache.accumulo.tracer.TraceServer.registerInZooKeeper(TraceServer.java:297)
at org.apache.accumulo.tracer.TraceServer.<init>(TraceServer.java:235)
at org.apache.accumulo.tracer.TraceServer.main(TraceServer.java:339)
at org.apache.accumulo.tracer.TracerExecutable.execute(TracerExecutable.java:33)
at org.apache.accumulo.start.Main$1.run(Main.java:93)
at java.lang.Thread.run(Thread.java:745)
After deleting the Tracer ZooKeeper directory (rmr /tracers), the Tracer process started without problems. Many thanks for your support. 🙂 Klaus
10-26-2016
01:15 PM
Hello, I have a fresh installation of Accumulo, and my problem is that the Tracer process terminates with:
2016-10-26 14:56:50,314 [start.Main] ERROR: Thread 'tracer' died.
org.apache.accumulo.core.client.AccumuloException: Internal error processing waitForFateOperation
at org.apache.accumulo.core.client.impl.TableOperationsImpl.doFateOperation(TableOperationsImpl.java:303)
at org.apache.accumulo.core.client.impl.TableOperationsImpl.doFateOperation(TableOperationsImpl.java:261)
at org.apache.accumulo.core.client.impl.TableOperationsImpl.doTableFateOperation(TableOperationsImpl.java:1427)
at org.apache.accumulo.core.client.impl.TableOperationsImpl.create(TableOperationsImpl.java:188)
at org.apache.accumulo.core.client.impl.TableOperationsImpl.create(TableOperationsImpl.java:155)
at org.apache.accumulo.tracer.TraceServer.<init>(TraceServer.java:211)
at org.apache.accumulo.tracer.TraceServer.main(TraceServer.java:339)
at org.apache.accumulo.tracer.TracerExecutable.execute(TracerExecutable.java:33)
at org.apache.accumulo.start.Main$1.run(Main.java:93)
I have no idea why. Could someone help, please? 🙂 Klaus
Labels: Apache Accumulo
08-25-2016
07:02 AM
Hi Robert, I have no logs from the TaskManagers in the log dir. I played a bit with the heap.mb size of the TaskManagers, and after entering 4096 for it the TaskManagers started. Thanks for your interest in helping. 🙂 Klaus
08-24-2016
03:18 PM
Hello, I have an issue similar to the one discussed here. These are the settings: I see no TaskManagers. The overview shows:
Task Managers: 0
Task Slots: 0
Available Task Slots: 0
Running the example word count job, I receive:
/usr/apache/flink-1.1.1/bin# /usr/apache/flink-1.1.1/bin/flink run /usr/apache/flink-1.1.1/examples/streaming/WordCount.jar
Cluster configuration: Standalone cluster with JobManager at dedcm4229/10.79.210.78:6130
Using address dedcm4229:6130 to connect to JobManager.
JobManager web interface address http://dedcm4229:8081
Starting execution of program
Executing WordCount example with default input data set.
Use --input to specify file input.
Printing result to stdout. Use --output to specify output path.
Submitting job with JobID: 47fee79c80eba58333eec5c3c3ee1cf0. Waiting for job completion.
08/24/2016 16:32:07 Job execution switched to status RUNNING.
08/24/2016 16:32:07 Source: Collection Source -> Flat Map(1/1) switched to SCHEDULED
08/24/2016 16:32:07 Job execution switched to status FAILING.
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. Task to schedule: < Attempt #0 (Source: Collection Source -> Flat Map (1/1)) @ (unassigned) - [SCHEDULED] > with groupID < 963af48f2c5d35ff2fcaa1bc235543a7 > in sharing group < SlotSharingGroup [7168183d09cf33bacf5ac595e608bd87, 963af48f2c5d35ff2fcaa1bc235543a7] >. Resources available to scheduler: Number of instances=0, total number of slots=0, available slots=0
at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:256)
at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleImmediately(Scheduler.java:131)
at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:306)
at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:454)
at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:326)
at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:741)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1332)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1291)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1291)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
08/24/2016 16:32:07 Source: Collection Source -> Flat Map(1/1) switched to CANCELED
08/24/2016 16:32:07 Keyed Aggregation -> Sink: Unnamed(1/1) switched to CANCELED
08/24/2016 16:32:07 Job execution switched to status FAILED.
------------------------------------------------------------
The program finished with the following exception:
org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Job execution failed.
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:413)
at org.apache.flink.client.program.StandaloneClusterClient.submitJob(StandaloneClusterClient.java:92)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:389)
at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:68)
at org.apache.flink.streaming.examples.wordcount.WordCount.main(WordCount.java:93)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:509)
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:403)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:331)
at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:777)
at org.apache.flink.client.CliFrontend.run(CliFrontend.java:253)
at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1005)
at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1048)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$8.apply$mcV$sp(JobManager.scala:822)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$8.apply(JobManager.scala:768)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$8.apply(JobManager.scala:768)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. Task to schedule: < Attempt #0 (Source: Collection Source -> Flat Map (1/1)) @ (unassigned) - [SCHEDULED] > with groupID < 963af48f2c5d35ff2fcaa1bc235543a7 > in sharing group < SlotSharingGroup [7168183d09cf33bacf5ac595e608bd87, 963af48f2c5d35ff2fcaa1bc235543a7] >. Resources available to scheduler: Number of instances=0, total number of slots=0, available slots=0
at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:256)
at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleImmediately(Scheduler.java:131)
at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:306)
at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:454)
at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:326)
at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:741)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1332)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1291)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1291)
... 9 more
Could someone please look at the log above and advise how to fix this issue? 🙂 Klaus
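The decisive part of the long trace above is the last sentence of the NoResourceAvailableException message. As a rough sketch, the three counters can be pulled out of that message and turned into a one-line diagnosis. The function names `parse_scheduler_resources` and `diagnose` are hypothetical, and the regular expression assumes the message wording shown in this post.

```python
import re

# Tail of the exception message, copied from the log above.
MESSAGE = (
    "Resources available to scheduler: Number of instances=0, "
    "total number of slots=0, available slots=0"
)


def parse_scheduler_resources(text):
    """Extract the three counters from the exception message, or return None."""
    match = re.search(
        r"Number of instances=(\d+), total number of slots=(\d+), "
        r"available slots=(\d+)",
        text,
    )
    if match is None:
        return None
    instances, total, available = (int(group) for group in match.groups())
    return {
        "instances": instances,
        "total_slots": total,
        "available_slots": available,
    }


def diagnose(resources):
    """Turn the counters into a short, human-readable hint."""
    if resources["instances"] == 0:
        return "no TaskManager is registered with the JobManager"
    if resources["available_slots"] == 0:
        return "TaskManagers are registered but all slots are in use"
    return "free slots exist"


if __name__ == "__main__":
    print(diagnose(parse_scheduler_resources(MESSAGE)))
```

With zero instances, the advice in the exception text (lower the parallelism or raise the slots per TaskManager) does not apply yet; the TaskManagers have to start first, which matches the empty overview described in the post.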
Labels: Apache Flink
08-24-2016
09:39 AM
On my system this works:
ACCUMULO_CONF_DIR=/etc/accumulo/conf/server accumulo init
After the init, no further issues were found. Many thanks for your detailed help. 🙂 Klaus
08-23-2016
11:18 AM
Additionally, I ran:
tables -l
accumulo.metadata => !0
accumulo.replication => +rep
accumulo.root => +r
trace => 1
checkTablets: the scan gets stuck.
/usr/bin/accumulo admin checkTablets
2016-08-23 12:19:18,521 [fs.VolumeManagerImpl] WARN : dfs.datanode.synconclose set to false in hdfs-site.xml: data loss is possible on hard system reset or power loss
*** Looking for offline tablets ***
Scanning zookeeper
+r<<@(null,de-hd-cluster.data-node3.com:9997[25669407cc8000b],de-hd-cluster.data-node3.com:9997[25669407cc8000b]) is ASSIGNED_TO_DEAD_SERVER #walogs:1
*** Looking for missing files ***
Scanning : accumulo.root (-inf,~ : [] 9223372036854775807 false)
GetMasterStats told me:
/usr/bin/accumulo org.apache.accumulo.test.GetMasterStats
2016-08-23 11:15:21,623 [fs.VolumeManagerImpl] WARN : dfs.datanode.synconclose set to false in hdfs-site.xml: data loss is possible on hard system reset or power loss
State: NORMAL
Goal State: NORMAL
Unassigned tablets: 1
Dead tablet servers count: 0
Tablet Servers
Name: de-hd-cluster.data-node3.com:9997
Ingest: 0.00
Last Contact: 1471943720583
OS Load Average: 0.12
Queries: 0.00
Time Difference: 1.3
Total Records: 0
Lookups: 0
Recoveries: 0
🙂 Klaus
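The "Last Contact" value in the stats above is easier to compare with the log timestamps once it is converted to a date. This small sketch assumes the value is milliseconds since the Unix epoch; the post does not state the unit, and the function name `epoch_millis_to_utc` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def epoch_millis_to_utc(millis):
    """Convert milliseconds since the Unix epoch to an aware UTC datetime."""
    seconds, remainder = divmod(millis, 1000)
    # Integer seconds plus a timedelta avoids float rounding of the milliseconds.
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(
        milliseconds=remainder
    )


if __name__ == "__main__":
    # Value copied from the "Last Contact" line above.
    print(epoch_millis_to_utc(1471943720583).isoformat())
```

Under that assumption the value is 2016-08-23 09:15:20 UTC. If the host clock runs at UTC+2 (not stated in the post), that is about one second before the 11:15:21 log line, so the tablet server was still reporting to the master when the stats were taken.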
08-23-2016
08:18 AM
Hello Josh, thanks for your quick reply. I thought that the peaks in memory usage had something to do with the table issue. On the Accumulo monitor page I now see:
In recent logs I see only this warning:
[fs.VolumeManagerImpl] WARN : dfs.datanode.synconclose set to false in hdfs-site.xml: data loss is possible on hard system reset or power loss
After a restart I see:
2016-08-23 09:19:30,318 [replication.WorkDriver] DEBUG: Sleeping 30000 ms before next work assignment
2016-08-23 09:19:36,776 [master.Master] DEBUG: Finished gathering information from 1 servers in 0.00 seconds
2016-08-23 09:19:36,776 [master.Master] DEBUG: not balancing because there are unhosted tablets: 1
2016-08-23 09:19:43,087 [recovery.RecoveryManager] DEBUG: Unable to initate log sort for hdfs://de-hd-cluster.name-node.com:8020/apps/accumulo/data/wal/de-hd-cluster.data-node3.com+9997/91ece971-7485-4acf-aa7f-dcde00fafce9: java.io.FileNotFoundException: File does not exist: /apps/accumulo/data/wal/de-hd-cluster.data-node3.com+9997/91ece971-7485-4acf-aa7f-dcde00fafce9
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLease(FSNamesystem.java:2835)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.recoverLease(NameNodeRpcServer.java:733)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.recoverLease(ClientNamenodeProtocolServerSideTranslatorPB.java:663)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)
2016-08-23 09:19:43,611 [state.ZooTabletStateStore] DEBUG: root tablet logSet [hdfs://de-hd-cluster.name-node.com:8020/apps/accumulo/data/wal/de-hd-cluster.data-node3.com+9997/91ece971-7485-4acf-aa7f-dcde00fafce9]
2016-08-23 09:19:43,611 [state.ZooTabletStateStore] DEBUG: Returning root tablet state: +r<<@(null,de-hd-cluster.data-node3.com:9997[25669407cc8000b],de-hd-cluster.data-node3.com:9997[25669407cc8000b])
2016-08-23 09:19:43,611 [recovery.RecoveryManager] DEBUG: Recovering hdfs://de-hd-cluster.name-node.com:8020/apps/accumulo/data/wal/de-hd-cluster.data-node3.com+9997/91ece971-7485-4acf-aa7f-dcde00fafce9 to hdfs://de-hd-cluster.name-node.com:8020/apps/accumulo/data/recovery/91ece971-7485-4acf-aa7f-dcde00fafce9
2016-08-23 09:19:43,614 [conf.AccumuloConfiguration] INFO : Loaded class : org.apache.accumulo.server.master.recovery.HadoopLogCloser
2016-08-23 09:19:43,615 [recovery.RecoveryManager] INFO : Starting recovery of hdfs://de-hd-cluster.name-node.com:8020/apps/accumulo/data/wal/de-hd-cluster.data-node3.com+9997/91ece971-7485-4acf-aa7f-dcde00fafce9 (in : 300s), tablet +r<< holds a reference
2016-08-23 09:19:43,615 [master.Master] DEBUG: [Root Table]: scan time 0.00 seconds
2016-08-23 09:19:43,615 [master.Master] DEBUG: [Root Table] sleeping for 60.00 seconds
2016-08-23 09:19:46,779 [master.Master] DEBUG: Finished gathering information from 1 servers in 0.00 seconds
2016-08-23 09:19:46,779 [master.Master] DEBUG: not balancing because there are unhosted tablets: 1
2016-08-23 09:19:56,782 [master.Master] DEBUG: Finished gathering information from 1 servers in 0.00 seconds
2016-08-23 09:19:56,782 [master.Master] DEBUG: not balancing because there are unhosted tablets: 1
2016-08-23 09:20:00,318 [replication.WorkDriver] DEBUG: Sleeping 30000 ms before next work assignment
2016-08-23 09:20:06,785 [master.Master] DEBUG: Finished gathering information from 1 servers in 0.00 seconds
2016-08-23 09:20:06,785 [master.Master] DEBUG: not balancing because there are unhosted tablets: 1
2016-08-23 09:20:16,788 [master.Master] DEBUG: Finished gathering information from 1 servers in 0.00 seconds
2016-08-23 09:20:16,788 [master.Master] DEBUG: not balancing because there are unhosted tablets: 1
2016-08-23 09:24:44,144 [conf.AccumuloConfiguration] INFO : Loaded class : org.apache.accumulo.server.master.recovery.HadoopLogCloser
2016-08-23 09:24:44,144 [recovery.RecoveryManager] INFO : Starting recovery of hdfs://de-hd-cluster.name-node.com:8020/apps/accumulo/data/wal/de-hd-cluster.data-node3.com+9997/91ece971-7485-4acf-aa7f-dcde00fafce9 (in : 300s), tablet +r<< holds a reference
Here are the tables in Hadoop:
root@NameNode:~# hadoop fs -ls -R /apps/accumulo/data/tables/
drwxr-xr-x - accumulo hdfs 0 2016-04-19 14:16 /apps/accumulo/data/tables/!0
drwxr-xr-x - accumulo hdfs 0 2016-08-08 13:33 /apps/accumulo/data/tables/!0/default_tablet
-rw-r--r-- 3 accumulo hdfs 871 2016-08-08 13:33 /apps/accumulo/data/tables/!0/default_tablet/F0002flt.rf
drwxr-xr-x - accumulo hdfs 0 2016-08-10 10:57 /apps/accumulo/data/tables/!0/table_info
-rw-r--r-- 3 accumulo hdfs 933 2016-08-08 10:14 /apps/accumulo/data/tables/!0/table_info/A0002bqu.rf
-rw-r--r-- 3 accumulo hdfs 933 2016-08-08 10:19 /apps/accumulo/data/tables/!0/table_info/A0002bqx.rf
-rw-r--r-- 3 accumulo hdfs 122 2016-08-10 10:57 /apps/accumulo/data/tables/!0/table_info/A004gpfm.rf_tmp
-rw-r--r-- 3 accumulo hdfs 688 2016-08-08 13:33 /apps/accumulo/data/tables/!0/table_info/F0002fl0.rf
drwxr-xr-x - accumulo hdfs 0 2016-04-19 14:16 /apps/accumulo/data/tables/+r
drwxr-xr-x - accumulo hdfs 0 2016-08-10 10:57 /apps/accumulo/data/tables/+r/root_tablet
-rw-r--r-- 3 accumulo hdfs 974 2016-08-08 10:19 /apps/accumulo/data/tables/+r/root_tablet/A0002bqz.rf
-rw-r--r-- 3 accumulo hdfs 16 2016-08-10 10:57 /apps/accumulo/data/tables/+r/root_tablet/A004gpfl.rf_tmp
-rw-r--r-- 3 accumulo hdfs 754 2016-08-10 10:13 /apps/accumulo/data/tables/+r/root_tablet/C004eodm.rf
-rw-r--r-- 3 accumulo hdfs 364 2016-08-10 10:18 /apps/accumulo/data/tables/+r/root_tablet/F004ew4v.rf
-rw-r--r-- 3 accumulo hdfs 364 2016-08-10 10:29 /apps/accumulo/data/tables/+r/root_tablet/F004fdch.rf
-rw-r--r-- 3 accumulo hdfs 364 2016-08-10 10:34 /apps/accumulo/data/tables/+r/root_tablet/F004fn1f.rf
-rw-r--r-- 3 accumulo hdfs 364 2016-08-10 10:39 /apps/accumulo/data/tables/+r/root_tablet/F004ftix.rf
-rw-r--r-- 3 accumulo hdfs 364 2016-08-10 10:44 /apps/accumulo/data/tables/+r/root_tablet/F004g3af.rf
-rw-r--r-- 3 accumulo hdfs 364 2016-08-10 10:54 /apps/accumulo/data/tables/+r/root_tablet/F004glat.rf
drwxr-xr-x - accumulo hdfs 0 2016-04-19 14:16 /apps/accumulo/data/tables/+rep
drwxr-xr-x - accumulo hdfs 0 2016-04-19 14:16 /apps/accumulo/data/tables/+rep/default_tablet
drwxr-xr-x - accumulo hdfs 0 2016-04-19 14:18 /apps/accumulo/data/tables/1
drwxr-xr-x - accumulo hdfs 0 2016-08-10 10:57 /apps/accumulo/data/tables/1/default_tablet
-rw-r--r-- 3 accumulo hdfs 2524936 2016-07-23 23:11 /apps/accumulo/data/tables/1/default_tablet/A0002041.rf
-rw-r--r-- 3 accumulo hdfs 1502864 2016-07-29 11:17 /apps/accumulo/data/tables/1/default_tablet/C00024ci.rf
-rw-r--r-- 3 accumulo hdfs 899175 2016-08-03 18:50 /apps/accumulo/data/tables/1/default_tablet/C00028be.rf
-rw-r--r-- 3 accumulo hdfs 1428721 2016-08-07 13:21 /apps/accumulo/data/tables/1/default_tablet/C0002av5.rf
-rw-r--r-- 3 accumulo hdfs 211245 2016-08-08 05:11 /apps/accumulo/data/tables/1/default_tablet/C0002bj6.rf
-rw-r--r-- 3 accumulo hdfs 30474 2016-08-08 07:42 /apps/accumulo/data/tables/1/default_tablet/C0002bn1.rf
-rw-r--r-- 3 accumulo hdfs 50286 2016-08-08 10:03 /apps/accumulo/data/tables/1/default_tablet/C0002bqh.rf
-rw-r--r-- 3 accumulo hdfs 122 2016-08-10 10:57 /apps/accumulo/data/tables/1/default_tablet/C004gpfk.rf_tmp
-rw-r--r-- 3 accumulo hdfs 905 2016-08-08 13:28 /apps/accumulo/data/tables/1/default_tablet/F0002byb.rf
The command
root@hdp-accumulo-instance> scan -np -t accumulo.root
hangs. Do you know how I can get rid of this table? 🙂 Klaus
08-22-2016
07:14 AM
Hello, I receive the following messages from Accumulo every 10 seconds:
monitor_de-hd-cluster.name-node.com.debug.log:
2016-08-22 07:43:14,841 [impl.ThriftScanner] DEBUG: Failed to locate tablet for table : !0 row : ~err_
2016-08-22 07:43:23,167 [monitor.Monitor] INFO : Failed to obtain problem reports
java.lang.RuntimeException: org.apache.accumulo.core.client.impl.ThriftScanner$ScanTimedOutException
at org.apache.accumulo.core.client.impl.ScannerIterator.hasNext(ScannerIterator.java:161)
at org.apache.accumulo.server.problems.ProblemReports$3.hasNext(ProblemReports.java:252)
at org.apache.accumulo.server.problems.ProblemReports.summarize(ProblemReports.java:310)
at org.apache.accumulo.monitor.Monitor.fetchData(Monitor.java:346)
at org.apache.accumulo.monitor.Monitor$1.run(Monitor.java:486)
at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.accumulo.core.client.impl.ThriftScanner$ScanTimedOutException
at org.apache.accumulo.core.client.impl.ThriftScanner.scan(ThriftScanner.java:230)
at org.apache.accumulo.core.client.impl.ScannerIterator$Reader.run(ScannerIterator.java:80)
at org.apache.accumulo.core.client.impl.ScannerIterator.hasNext(ScannerIterator.java:151)
... 6 more
2016-08-22 07:43:23,510 [impl.ThriftScanner] DEBUG: Failed to locate tablet for table : !0 row : ~err_
2016-08-22 07:43:26,533 [monitor.Monitor] INFO : Failed to obtain problem reports
java.lang.RuntimeException: org.apache.accumulo.core.client.impl.ThriftScanner$ScanTimedOutException
at org.apache.accumulo.core.client.impl.ScannerIterator.hasNext(ScannerIterator.java:161)
at org.apache.accumulo.server.problems.ProblemReports$3.hasNext(ProblemReports.java:252)
at org.apache.accumulo.server.problems.ProblemReports.summarize(ProblemReports.java:310)
at org.apache.accumulo.monitor.Monitor.fetchData(Monitor.java:346)
at org.apache.accumulo.monitor.Monitor$1.run(Monitor.java:486)
at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.accumulo.core.client.impl.ThriftScanner$ScanTimedOutException
at org.apache.accumulo.core.client.impl.ThriftScanner.scan(ThriftScanner.java:230)
at org.apache.accumulo.core.client.impl.ScannerIterator$Reader.run(ScannerIterator.java:80)
at org.apache.accumulo.core.client.impl.ScannerIterator.hasNext(ScannerIterator.java:151)
... 6 more
After stopping Accumulo, the alternating memory usage was gone. The cluster is not used by anyone and has nothing to do. Attached are all debug log files after a restart of Accumulo. Could anyone assist? 🙂 Klaus
Labels: Apache Accumulo, Apache Hadoop