Member since: 01-09-2016
Posts: 56
Kudos Received: 44
Solutions: 4
My Accepted Solutions
Title | Views | Posted
---|---|---
| 5411 | 12-27-2016 05:00 AM
| 1153 | 09-28-2016 08:19 AM
| 9395 | 09-12-2016 09:44 AM
| 3582 | 02-09-2016 10:48 AM
07-14-2017
07:51 AM
1 Kudo
I'm a newbie in NiFi and I'm trying to import data from MySQL to DynamoDB. I can fetch the data from MySQL (5 million records), but I don't understand how to ingest the query result into a DynamoDB table. How should I configure PutDynamoDB? In particular, how should I fill the Json Document attribute?
Labels: Apache NiFi
07-11-2017
12:11 AM
@Eugene Koifman Thanks for the clarification!
04-20-2017
01:00 PM
14 Kudos
After an unsuccessful upgrade, I was forced to completely remove HDP 2.4 and Ambari 2.5 and install HDP 2.6. I wanted to avoid reinstalling the OS, so I took advantage of this instruction. Unfortunately, it is not complete. For a problem-free installation of HDP 2.6, you also need to do things like removing service users and cleaning the cron. So here is my plan of action:
1. Stop all services in Ambari or kill them. In my case, Ambari damaged its database during the downgrade and could not start, so I manually killed all the processes on all nodes:
ps -u hdfs (see the list of all service users below)
kill PID
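If many services are still running, killing them one by one gets tedious. Here is a minimal sketch of the same idea as a loop, assuming the service users listed in step 15 below (adjust the list to what actually exists on your nodes):
for u in hdfs yarn mapred hive hbase zookeeper oozie storm kafka falcon flume knox ranger ams ambari-qa; do
  pkill -u "$u"   # sends SIGTERM; add -9 only if a process refuses to stop
done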
2. Run the python script on all cluster nodes:
python /usr/lib/python2.6/site-packages/ambari_agent/HostCleanup.py --silent --skip=users
3. Remove Hadoop packages on all nodes:
yum remove hive\*
yum remove oozie\*
yum remove pig\*
yum remove zookeeper\*
yum remove tez\*
yum remove hbase\*
yum remove ranger\*
yum remove knox\*
yum remove storm\*
yum remove accumulo\*
yum remove falcon\*
yum remove ambari-metrics-hadoop-sink
yum remove smartsense-hst
yum remove slider_2_4_2_0_258
yum remove ambari-metrics-monitor
yum remove spark2_2_5_3_0_37-yarn-shuffle
yum remove spark_2_5_3_0_37-yarn-shuffle
yum remove ambari-infra-solr-client
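To double-check that no HDP or Ambari packages survived this step, something like the following can be run afterwards (a sketch; exact package names vary between HDP versions):
yum list installed | grep -i -e hdp -e hadoop -e ambari -e hive -e hbase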
4. Remove ambari-server (on the Ambari host) and ambari-agent (on all nodes):
ambari-server stop
ambari-agent stop
yum erase ambari-server
yum erase ambari-agent
5. Remove repositories on all nodes:
rm -rf /etc/yum.repos.d/ambari.repo /etc/yum.repos.d/HDP*
yum clean all
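After cleaning the repos, it is worth confirming that only the OS repositories are still listed, for example:
yum repolist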
6. Remove log folders on all nodes:
rm -rf /var/log/ambari-agent
rm -rf /var/log/ambari-metrics-grafana
rm -rf /var/log/ambari-metrics-monitor
rm -rf /var/log/ambari-server/
rm -rf /var/log/falcon
rm -rf /var/log/flume
rm -rf /var/log/hadoop
rm -rf /var/log/hadoop-mapreduce
rm -rf /var/log/hadoop-yarn
rm -rf /var/log/hive
rm -rf /var/log/hive-hcatalog
rm -rf /var/log/hive2
rm -rf /var/log/hst
rm -rf /var/log/knox
rm -rf /var/log/oozie
rm -rf /var/log/solr
rm -rf /var/log/zookeeper
7. Remove Hadoop folders, including HDFS data, on all nodes:
rm -rf /hadoop/*
rm -rf /hdfs/hadoop
rm -rf /hdfs/lost+found
rm -rf /hdfs/var
rm -rf /local/opt/hadoop
rm -rf /tmp/hadoop
rm -rf /usr/bin/hadoop
rm -rf /usr/hdp
rm -rf /var/hadoop
8. Remove config folders on all nodes:
rm -rf /etc/ambari-agent
rm -rf /etc/ambari-metrics-grafana
rm -rf /etc/ambari-server
rm -rf /etc/ams-hbase
rm -rf /etc/falcon
rm -rf /etc/flume
rm -rf /etc/hadoop
rm -rf /etc/hadoop-httpfs
rm -rf /etc/hbase
rm -rf /etc/hive
rm -rf /etc/hive-hcatalog
rm -rf /etc/hive-webhcat
rm -rf /etc/hive2
rm -rf /etc/hst
rm -rf /etc/knox
rm -rf /etc/livy
rm -rf /etc/mahout
rm -rf /etc/oozie
rm -rf /etc/phoenix
rm -rf /etc/pig
rm -rf /etc/ranger-admin
rm -rf /etc/ranger-usersync
rm -rf /etc/spark2
rm -rf /etc/tez
rm -rf /etc/tez_hive2
rm -rf /etc/zookeeper
9. Remove PIDs on all nodes:
rm -rf /var/run/ambari-agent
rm -rf /var/run/ambari-metrics-grafana
rm -rf /var/run/ambari-server
rm -rf /var/run/falcon
rm -rf /var/run/flume
rm -rf /var/run/hadoop
rm -rf /var/run/hadoop-mapreduce
rm -rf /var/run/hadoop-yarn
rm -rf /var/run/hbase
rm -rf /var/run/hive
rm -rf /var/run/hive-hcatalog
rm -rf /var/run/hive2
rm -rf /var/run/hst
rm -rf /var/run/knox
rm -rf /var/run/oozie
rm -rf /var/run/webhcat
rm -rf /var/run/zookeeper
10. Remove library folders on all nodes:
rm -rf /usr/lib/ambari-agent
rm -rf /usr/lib/ambari-infra-solr-client
rm -rf /usr/lib/ambari-metrics-hadoop-sink
rm -rf /usr/lib/ambari-metrics-kafka-sink
rm -rf /usr/lib/ambari-server-backups
rm -rf /usr/lib/ams-hbase
rm -rf /usr/lib/mysql
rm -rf /var/lib/ambari-agent
rm -rf /var/lib/ambari-metrics-grafana
rm -rf /var/lib/ambari-server
rm -rf /var/lib/flume
rm -rf /var/lib/hadoop-hdfs
rm -rf /var/lib/hadoop-mapreduce
rm -rf /var/lib/hadoop-yarn
rm -rf /var/lib/hive2
rm -rf /var/lib/knox
rm -rf /var/lib/smartsense
rm -rf /var/lib/storm
11. Clean the folder /var/tmp/* on all nodes:
rm -rf /var/tmp/*
12. Delete HST from cron on all nodes:
0 * * * * /usr/hdp/share/hst/bin/hst-scheduled-capture.sh sync
0 2 * * 0 /usr/hdp/share/hst/bin/hst-scheduled-capture.sh
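Rather than editing the crontab by hand, those HST entries can be filtered out non-interactively; a sketch assuming they live in root's crontab (check other users' crontabs and /etc/cron.d as well):
# remove the hst-scheduled-capture.sh jobs from root's crontab
crontab -u root -l | grep -v hst-scheduled-capture.sh | crontab -u root -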
13. Remove databases. I removed the MySQL and Postgres instances so that Ambari would install and configure fresh databases:
yum remove mysql mysql-server
yum erase postgresql
rm -rf /var/lib/pgsql
rm -rf /var/lib/mysql
14. Remove symlinks on all nodes. Especially check the folders /usr/sbin and /usr/lib/python2.6/site-packages:
cd /usr/bin
rm -rf accumulo
rm -rf atlas-start
rm -rf atlas-stop
rm -rf beeline
rm -rf falcon
rm -rf flume-ng
rm -rf hbase
rm -rf hcat
rm -rf hdfs
rm -rf hive
rm -rf hiveserver2
rm -rf kafka
rm -rf mahout
rm -rf mapred
rm -rf oozie
rm -rf oozied.sh
rm -rf phoenix-psql
rm -rf phoenix-queryserver
rm -rf phoenix-sqlline
rm -rf phoenix-sqlline-thin
rm -rf pig
rm -rf python-wrap
rm -rf ranger-admin
rm -rf ranger-admin-start
rm -rf ranger-admin-stop
rm -rf ranger-kms
rm -rf ranger-usersync
rm -rf ranger-usersync-start
rm -rf ranger-usersync-stop
rm -rf slider
rm -rf sqoop
rm -rf sqoop-codegen
rm -rf sqoop-create-hive-table
rm -rf sqoop-eval
rm -rf sqoop-export
rm -rf sqoop-help
rm -rf sqoop-import
rm -rf sqoop-import-all-tables
rm -rf sqoop-job
rm -rf sqoop-list-databases
rm -rf sqoop-list-tables
rm -rf sqoop-merge
rm -rf sqoop-metastore
rm -rf sqoop-version
rm -rf storm
rm -rf storm-slider
rm -rf worker-lanucher
rm -rf yarn
rm -rf zookeeper-client
rm -rf zookeeper-server
rm -rf zookeeper-server-cleanup
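To catch any leftover links in the folders mentioned above, it helps to list dangling symlinks first and delete them only after review; a sketch assuming GNU find:
# list symlinks whose target no longer exists
find /usr/bin /usr/sbin /usr/lib/python2.6/site-packages -xtype l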
15. Remove service users on all nodes:
userdel -r accumulo
userdel -r ambari-qa
userdel -r ams
userdel -r falcon
userdel -r flume
userdel -r hbase
userdel -r hcat
userdel -r hdfs
userdel -r hive
userdel -r kafka
userdel -r knox
userdel -r mapred
userdel -r oozie
userdel -r ranger
userdel -r spark
userdel -r sqoop
userdel -r storm
userdel -r tez
userdel -r yarn
userdel -r zeppelin
userdel -r zookeeper
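The same list of users as a loop, in case it is easier to paste once per node (a minimal sketch; userdel just warns for users that do not exist on a given node):
for u in accumulo ambari-qa ams falcon flume hbase hcat hdfs hive kafka knox mapred oozie ranger spark sqoop storm tez yarn zeppelin zookeeper; do
  userdel -r "$u"
done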
16. Run the find commands below on all nodes. You will definitely find several more files/folders; remove them:
find / -name *ambari*
find / -name *accumulo*
find / -name *atlas*
find / -name *beeline*
find / -name *falcon*
find / -name *flume*
find / -name *hadoop*
find / -name *hbase*
find / -name *hcat*
find / -name *hdfs*
find / -name *hdp*
find / -name *hive*
find / -name *hiveserver2*
find / -name *kafka*
find / -name *mahout*
find / -name *mapred*
find / -name *oozie*
find / -name *phoenix*
find / -name *pig*
find / -name *ranger*
find / -name *slider*
find / -name *sqoop*
find / -name *storm*
find / -name *yarn*
find / -name *zookeeper*
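The same searches can be driven from a single loop; a sketch (2>/dev/null just hides permission noise):
for c in ambari accumulo atlas beeline falcon flume hadoop hbase hcat hdfs hdp hive kafka mahout mapred oozie phoenix pig ranger slider sqoop storm yarn zookeeper; do
  find / -name "*${c}*" 2>/dev/null
done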
17. Reboot all nodes:
reboot
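Almost every step above has to be repeated on every node. One way to avoid logging in to each host by hand is a small ssh loop; a sketch assuming passwordless root ssh and a hypothetical hosts.txt file listing all cluster nodes:
# run an arbitrary cleanup command on every node listed in hosts.txt
for h in $(cat hosts.txt); do
  ssh root@"$h" "yum clean all"
done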
Tags: Ambari, delete, Design & Architecture, hdp-2.3.4, How-ToTutorial, remove, uninstall
04-19-2017
11:04 AM
2 Kudos
I'm trying to copy a transactional table from a production cluster (HDP 2.5) to a dev cluster (HDP 2.6).
I set these ACID settings on the dev cluster:
hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
hive.support.concurrency=true
hive.enforce.bucketing=true
hive.exec.dynamic.partition.mode=nonstrict
hive.compactor.initiator.on=true
hive.compactor.worker.threads=3
Then I import the table from prod to dev:
hive> export table hana.easy_check to 'export/easy_check';
hadoop distcp -prbugp hdfs://hdp-nn1:8020/user/hive/export/easy_check/ hdfs://dev-nn2:8020/user/hive/export/
hive> import from 'export/easy_check';
However, when I run any SQL query on this table in the dev cluster, I get an error:
2017-04-19 11:08:33,879 [ERROR] [Dispatcher thread {Central}] |impl.VertexImpl|: Vertex Input: easy_check initializer failed, vertex=vertex_1492584180580_0005_1_00 [Map 1]
org.apache.tez.dag.app.dag.impl.AMUserCodeException: java.lang.RuntimeException: serious problem
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallback.onFailure(RootInputInitializerManager.java:319)
at com.google.common.util.concurrent.Futures$4.run(Futures.java:1140)
at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
at com.google.common.util.concurrent.ExecutionList$RunnableExecutorPair.execute(ExecutionList.java:150)
at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:135)
at com.google.common.util.concurrent.ListenableFutureTask.done(ListenableFutureTask.java:91)
at java.util.concurrent.FutureTask.finishCompletion(FutureTask.java:384)
at java.util.concurrent.FutureTask.setException(FutureTask.java:251)
at java.util.concurrent.FutureTask.run(FutureTask.java:271)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: serious problem
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1258)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1285)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:307)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:409)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:155)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:273)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:266)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:266)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
... 3 more
Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Not enough history available for (0,x). Oldest available base: hdfs://development/apps/hive/warehouse/hana.db/easy_check/ym=2017-01/base_0001497
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1235)
... 15 more
Caused by: java.io.IOException: Not enough history available for (0,x). Oldest available base: hdfs://development/apps/hive/warehouse/hana.db/easy_check/ym=2017-01/base_0001497
at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:594)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.callInternal(OrcInputFormat.java:773)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.access$600(OrcInputFormat.java:738)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator$1.run(OrcInputFormat.java:763)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator$1.run(OrcInputFormat.java:760)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:760)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:738)
... 4 more
What is wrong? Both clusters run Hive 1.2.1.
# Detailed Table Information
Database: hana
Owner: hive
CreateTime: Wed Apr 19 13:27:00 MSK 2017
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://development/apps/hive/warehouse/hana.db/easy_check
Table Type: MANAGED_TABLE
Table Parameters:
NO_AUTO_COMPACTION false
compactor.mapreduce.map.memory.mb 2048
compactorthreshold.hive.compactor.delta.num.threshold 4
compactorthreshold.hive.compactor.delta.pct.threshold 0.3
last_modified_by hive
last_modified_time 1489647024
orc.bloom.filter.columns calday, request, material
orc.compress ZLIB
orc.compress.size 262144
orc.create.index true
orc.row.index.stride 5000
orc.stripe.size 67108864
transactional true
transient_lastDdlTime 1492597620
# Storage Information
SerDe Library: org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Compressed: No
Num Buckets: 1
Bucket Columns: [material]
Sort Columns: []
Storage Desc Params:
serialization.format 1
Labels: Apache Hive
04-11-2017
05:19 AM
Hi @Jay SenSharma, I have dumps from before any upgrades (Ambari 2.2.2.0) and from after the downgrade (Ambari 2.4.1.0). The dump taken after the downgrade is bad because of "Downgrade Can Create Multiple Mappings For Latest Configs". Anyway, I tried to restore from it and run ambari-server upgrade; the errors are the same. Actually, I think the problem is with the tables clusterconfigmapping and serviceconfigmapping. Is there any way to get correct versions of these tables?
Call: INSERT INTO serviceconfigmapping (config_id, service_config_id) VALUES (?, ?)
bind => [2 parameters bound]
at org.apache.ambari.server.upgrade.SchemaUpgradeHelper.executeDMLUpdates(SchemaUpgradeHelper.java:240)
at org.apache.ambari.server.upgrade.SchemaUpgradeHelper.main(SchemaUpgradeHelper.java:430)
Caused by: javax.persistence.RollbackException: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.6.2.v20151217-774c696): org.eclipse.persistence.exceptions.DatabaseException
04-10-2017
02:29 PM
I have a problem upgrading Ambari from 2.4.0.1 to 2.5.0. Actually, my problems started with Ambari not starting after a downgrade. After dealing with "Downgrade Can Create Multiple Mappings For Latest Configs", I ran Ambari as "ambari-server start --skip-database-check", fulfilled all prerequisites for the upgrade to 2.5 from the instruction, and finally, on the ambari-server upgrade step, I got these errors. Any ideas how to recover Ambari?
ambari-server.log:
ERROR: Error executing schema upgrade, please check the server logs.
ERROR: Error output from schema upgrade command:
ERROR: Exception in thread "main" org.apache.ambari.server.AmbariException: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.6.2.v20151217-774c696): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: java.sql.BatchUpdateException: Batch entry 4 INSERT INTO serviceconfigmapping (config_id, service_config_id) VALUES (18, 403) was aborted. Call getNextException to see the cause.
Error Code: 0
Call: INSERT INTO serviceconfigmapping (config_id, service_config_id) VALUES (?, ?)
bind => [2 parameters bound]
at org.apache.ambari.server.upgrade.SchemaUpgradeHelper.executeDMLUpdates(SchemaUpgradeHelper.java:240)
at org.apache.ambari.server.upgrade.SchemaUpgradeHelper.main(SchemaUpgradeHelper.java:430)
Caused by: javax.persistence.RollbackException: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.6.2.v20151217-774c696): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: java.sql.BatchUpdateException: Batch entry 4 INSERT INTO serviceconfigmapping (config_id, service_config_id) VALUES (18, 403) was aborted. Call getNextException to see the cause.
Error Code: 0
Call: INSERT INTO serviceconfigmapping (config_id, service_config_id) VALUES (?, ?)
bind => [2 parameters bound]
at org.eclipse.persistence.internal.jpa.transaction.EntityTransactionImpl.commit(EntityTransactionImpl.java:159)
at org.apache.ambari.server.orm.AmbariJpaLocalTxnInterceptor.invoke(AmbariJpaLocalTxnInterceptor.java:153)
at org.apache.ambari.server.state.cluster.ClusterImpl.addDesiredConfig(ClusterImpl.java:2180)
at org.apache.ambari.server.state.cluster.ClusterImpl.addDesiredConfig(ClusterImpl.java:2148)
at org.apache.ambari.server.upgrade.AbstractUpgradeCatalog.updateConfigurationPropertiesForCluster(AbstractUpgradeCatalog.java:598)
at org.apache.ambari.server.upgrade.AbstractUpgradeCatalog.updateConfigurationPropertiesWithValuesFromXml(AbstractUpgradeCatalog.java:521)
at org.apache.ambari.server.upgrade.AbstractUpgradeCatalog.updateConfigurationPropertiesWithValuesFromXml(AbstractUpgradeCatalog.java:482)
at org.apache.ambari.server.upgrade.AbstractUpgradeCatalog.addNewConfigurationsFromXml(AbstractUpgradeCatalog.java:421)
at org.apache.ambari.server.upgrade.UpgradeCatalog242.executeDMLUpdates(UpgradeCatalog242.java:127)
at org.apache.ambari.server.upgrade.AbstractUpgradeCatalog.upgradeData(AbstractUpgradeCatalog.java:943)
at org.apache.ambari.server.upgrade.SchemaUpgradeHelper.executeDMLUpdates(SchemaUpgradeHelper.java:237)
... 1 more
Caused by: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.6.2.v20151217-774c696): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: java.sql.BatchUpdateException: Batch entry 4 INSERT INTO serviceconfigmapping (config_id, service_config_id) VALUES (18, 403) was aborted. Call getNextException to see the cause.
Error Code: 0
Call: INSERT INTO serviceconfigmapping (config_id, service_config_id) VALUES (?, ?)
bind => [2 parameters bound]
at org.eclipse.persistence.exceptions.DatabaseException.sqlException(DatabaseException.java:340)
at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.processExceptionForCommError(DatabaseAccessor.java:1620)
at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.executeJDK12BatchStatement(DatabaseAccessor.java:926)
at org.eclipse.persistence.internal.databaseaccess.ParameterizedSQLBatchWritingMechanism.executeBatch(ParameterizedSQLBatchWritingMechanism.java:179)
at org.eclipse.persistence.internal.databaseaccess.ParameterizedSQLBatchWritingMechanism.executeBatchedStatements(ParameterizedSQLBatchWritingMechanism.java:134)
at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.writesCompleted(DatabaseAccessor.java:1845)
at org.eclipse.persistence.internal.sessions.AbstractSession.writesCompleted(AbstractSession.java:4300)
at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.writesCompleted(UnitOfWorkImpl.java:5592)
at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.acquireWriteLocks(UnitOfWorkImpl.java:1646)
at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.commitTransactionAfterWriteChanges(UnitOfWorkImpl.java:1614)
at org.eclipse.persistence.internal.sessions.RepeatableWriteUnitOfWork.commitRootUnitOfWork(RepeatableWriteUnitOfWork.java:285)
at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.commitAndResume(UnitOfWorkImpl.java:1169)
at org.eclipse.persistence.internal.jpa.transaction.EntityTransactionImpl.commit(EntityTransactionImpl.java:134)
... 11 more
Caused by: java.sql.BatchUpdateException: Batch entry 4 INSERT INTO serviceconfigmapping (config_id, service_config_id) VALUES (18, 403) was aborted. Call getNextException to see the cause.
at org.postgresql.jdbc2.AbstractJdbc2Statement$BatchResultHandler.handleError(AbstractJdbc2Statement.java:2740)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1891)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:405)
at org.postgresql.jdbc2.AbstractJdbc2Statement.executeBatch(AbstractJdbc2Statement.java:2889)
at org.eclipse.persistence.internal.databaseaccess.DatabasePlatform.executeBatch(DatabasePlatform.java:2336)
at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.executeJDK12BatchStatement(DatabaseAccessor.java:922)
... 21 more
ambari-eclipselink.log:
[EL Info]: 2016-07-28 09:22:30.365--UnitOfWork(820730256)-- Communication failure detected when attempting to perform read query outside of a transaction. Attempting to retry query. Error was: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.6.2.v20151217-774c696): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: org.postgresql.util.PSQLException: FATAL: terminating connection due to administrator command
Error Code: 0
Call: SELECT DISTINCT task_id FROM host_role_command WHERE ((role = ?) AND (status = ?)) ORDER BY task_id
bind => [2 parameters bound]
Query: ReportQuery(referenceClass=HostRoleCommandEntity sql="SELECT DISTINCT task_id FROM host_role_command WHERE ((role = ?) AND (status = ?)) ORDER BY task_id").
[EL Error]: 2016-07-28 09:22:30.372--ServerSession(879319843)--
[EL Info]: 2016-07-28 09:22:30.372--UnitOfWork(820730256)-- Communication failure detected when attempting to perform read query outside of a transaction. Attempting to retry query. Error was: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.6.2.v20151217-774c696): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: org.postgresql.util.PSQLException: Connection refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
Error Code: 0
Query: ReportQuery(referenceClass=HostRoleCommandEntity sql="SELECT DISTINCT task_id FROM host_role_command WHERE ((role = ?) AND (status = ?)) ORDER BY task_id").
[EL Error]: 2016-07-28 09:22:30.831--ServerSession(879319843)--
[EL Info]: 2016-07-28 09:22:30.833--UnitOfWork(1847253395)-- Communication failure detected when attempting to perform read query outside of a transaction. Attempting to retry query. Error was: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.6.2.v20151217-774c696): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: org.postgresql.util.PSQLException: Connection refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
Error Code: 0
Query: ReportQuery(name="HostRoleCommandEntity.findCountByCommandStatuses" referenceClass=HostRoleCommandEntity sql="SELECT COUNT(task_id) FROM host_role_command WHERE (status IN ?)").
[EL Error]: 2016-07-28 09:22:30.834--ServerSession(879319843)--
[EL Info]: 2016-07-28 09:22:30.834--UnitOfWork(1847253395)-- Communication failure detected when attempting to perform read query outside of a transaction. Attempting to retry query. Error was: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.6.2.v20151217-774c696): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: org.postgresql.util.PSQLException: Connection refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
Error Code: 0
Query: ReportQuery(name="HostRoleCommandEntity.findCountByCommandStatuses" referenceClass=HostRoleCommandEntity sql="SELECT COUNT(task_id) FROM host_role_command WHERE (status IN ?)").
[EL Error]: 2016-07-28 09:22:30.913--ServerSession(879319843)--
[EL Info]: 2016-07-28 09:22:30.915--UnitOfWork(691671603)-- Communication failure detected when attempting to perform read query outside of a transaction. Attempting to retry query. Error was: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.6.2.v20151217-774c696): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: org.postgresql.util.PSQLException: Connection refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
Error Code: 0
Query: ReportQuery(referenceClass=RequestEntity sql="SELECT request_id AS a1 FROM request ORDER BY request_id DESC LIMIT ? OFFSET ?").
[EL Error]: 2016-07-28 09:22:30.916--ServerSession(879319843)--
[EL Info]: 2016-07-28 09:22:30.916--UnitOfWork(691671603)-- Communication failure detected when attempting to perform read query outside of a transaction. Attempting to retry query. Error was: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.6.2.v20151217-774c696): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: org.postgresql.util.PSQLException: Connection refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
Error Code: 0
Query: ReportQuery(referenceClass=RequestEntity sql="SELECT request_id AS a1 FROM request ORDER BY request_id DESC LIMIT ? OFFSET ?").
[EL Error]: 2016-07-28 09:22:31.136--ServerSession(879319843)--
[EL Info]: 2016-07-28 09:22:31.138--UnitOfWork(831902125)-- Communication failure detected when attempting to perform read query outside of a transaction. Attempting to retry query. Error was: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.6.2.v20151217-774c696): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: org.postgresql.util.PSQLException: Connection refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
Error Code: 0
Query: ReadAllQuery(name="AlertCurrentEntity.findByHostAndName" referenceClass=AlertCurrentEntity sql="SELECT t1.alert_id AS a1, t1.definition_id AS a2, t1.history_id AS a3, t1.latest_text AS a4, t1.latest_timestamp AS a5, t1.maintenance_state AS a6, t1.original_timestamp AS a7 FROM alert_history t0, alert_definition t2, alert_current t1 WHERE ((((t0.cluster_id = ?) AND (t2.definition_name = ?)) AND (t0.host_name = ?)) AND ((t0.alert_id = t1.history_id) AND (t2.definition_id = t0.alert_definition_id))) LIMIT ? OFFSET ?").
[EL Error]: 2016-07-28 09:22:31.14--ServerSession(879319843)--
[EL Info]: 2016-07-28 09:22:31.14--UnitOfWork(831902125)-- Communication failure detected when attempting to perform read query outside of a transaction. Attempting to retry query. Error was: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.6.2.v20151217-774c696): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: org.postgresql.util.PSQLException: Connection refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
Error Code: 0
Query: ReadAllQuery(name="AlertCurrentEntity.findByHostAndName" referenceClass=AlertCurrentEntity sql="SELECT t1.alert_id AS a1, t1.definition_id AS a2, t1.history_id AS a3, t1.latest_text AS a4, t1.latest_timestamp AS a5, t1.maintenance_state AS a6, t1.original_timestamp AS a7 FROM alert_history t0, alert_definition t2, alert_current t1 WHERE ((((t0.cluster_id = ?) AND (t2.definition_name = ?)) AND (t0.host_name = ?)) AND ((t0.alert_id = t1.history_id) AND (t2.definition_id = t0.alert_definition_id))) LIMIT ? OFFSET ?").
[EL Warning]: 2016-07-28 09:22:33.401--UnitOfWork(691671603)--
[EL Warning]: 2016-07-28 09:58:01.779--ServerSession(879319843)-- The reference column name [resource_type_id] mapped on the element [field permissions] does not correspond to a valid id or basic field/column on the mapping reference. Will use referenced column name as provided.
[EL Info]: 2016-07-28 09:58:03.095--ServerSession(879319843)-- EclipseLink, version: Eclipse Persistence Services - 2.6.2.v20151217-774c696
[EL Info]: 2016-07-28 09:58:03.525--ServerSession(879319843)-- /file:/usr/lib/ambari-server/ambari-server-2.2.2.0.460.jar_ambari-server_url=jdbc:postgresql://localhost/ambari_user=ambari login successful
[EL Warning]: 2016-08-24 16:52:09.931--ServerSession(687329219)-- The reference column name [resource_type_id] mapped on the element [field permissions] does not correspond to a valid id or basic field/column on the mapping reference. Will use referenced column name as provided.
[EL Info]: 2016-08-24 16:52:10.135--ServerSession(687329219)-- EclipseLink, version: Eclipse Persistence Services - 2.6.2.v20151217-774c696
[EL Info]: 2016-08-24 16:52:10.287--ServerSession(687329219)-- /file:/usr/lib/ambari-server/ambari-server-2.2.2.0.460.jar_ambari-views_url=jdbc:postgresql://localhost/ambari_user=ambari login successful
HDP-2.4.2.0
Ambari 2.4.0.1
Postgres 8.4.20
Labels: Apache Ambari
04-06-2017
02:32 PM
@Jay SenSharma Hi Jay, thanks for your answer. I thought about upgrading Ambari to 2.5, but how can I do that if my current ambari-server does not even start? So I can't accomplish prerequisites such as:
- Ensure all services in the cluster are running.
- Run each Service Check (found under the Service Actions menu) and confirm they execute successfully.
- Clear all alerts, or understand why they are being generated. Remediate as necessary.
- etc.
04-06-2017
09:00 AM
I tried to upgrade HDP from 2.4.2 to 2.5.3 and encountered the error "There is no active namenodes". The problem is described here: namenode-restart-fails-during-hdp-upgrade. So I decided to do a downgrade and increase the timeout. However, after the downgrade Ambari doesn't start because of "DB configs consistency check failed." I guess the bug is "Downgrade Can Create Multiple Mappings For Latest Configs". Recovery from backup failed. How can I solve this problem?
HDP-2.4.2.0
Ambari 2.4.0.1
Postgres 8.4.20
From ambari-server-check-database.log:
ERROR - You have config(s), in cluster development, that is(are) selected more than once in clusterconfigmapping table: ams-env,webhcat-log4j,ranger-site,ranger-ugsync-site,ranger-admin-site,admin-properties,ranger-hive-policymgr-ssl,ranger-env,hcat-env,hive-site,ranger-hive-plugin-properties,webhcat-site,hive-exec-log4j,ams-log4j,webhcat-env,ams-ssl-server,hiveserver2-site,ams-hbase-log4j,hive-log4j,ams-ssl-client,ams-hbase-policy,ams-grafana-ini,ams-hbase-security-site,hive-env,ams-grafana-env,usersync-properties,ranger-hive-security
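For reference, the duplicates the consistency check complains about can be listed straight from the Ambari database; a sketch assuming the default Postgres database and user (both named ambari):
# show config types that are marked selected more than once in clusterconfigmapping
psql -U ambari ambari -c "SELECT type_name, COUNT(*) FROM clusterconfigmapping WHERE selected = 1 GROUP BY type_name HAVING COUNT(*) > 1;"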
Labels: Apache Ambari
02-03-2017
07:14 AM
Solved it! The sick JN didn't stop when I stopped it in Ambari, and not even when I stopped HDFS in Ambari. I killed the JN process manually, replaced the data from the healthy JN, and started HDFS. Now it works! 🙂
02-02-2017
08:59 AM
Hi @Brandon Wilson, your solution works perfectly, but only if the "edits_inprogress_" file has the same name on both JournalNodes (JN). In the case of my dev cluster, I did not deal with the problem for two months. During this time, the healthy JN created a new "edits_inprogress_" file, but the sick JN still asks for the old "edits_inprogress_" file. I did all 4 steps of your algorithm, but the sick JN again asks for the old file. The content of /hadoop/hdfs/journal/devcluster/current is the same on both nodes. What should I do?
Log of the healthy JN (edits_inprogress_0000000000016172345):
2017-02-02 10:15:12,513 INFO namenode.FileJournalManager (FileJournalManager.java:finalizeLogSegment(133)) - Finalizing edits file /hadoop/hdfs/journal/devcluster/current/edits_inprogress_0000000000016172345 -> /hadoop/hdfs/journal/devcluster/current/edits_0000000000016172345-0000000000016172394
Log of the sick JN (edits_inprogress_0000000000011766543):
2017-02-02 10:15:57,744 WARN namenode.FSImage (EditLogFileInputStream.java:scanEditLog(350)) - Caught exception after scanning through 0 ops from /hadoop/hdfs/journal/devcluster/current/edits_inprogress_0000000000011766543 while determining its valid length. Position was 1036288
java.io.IOException: Can't scan a pre-transactional edit log.
12-27-2016
05:00 AM
Hi @Aravindan Vijayan, yesterday Ambari Metrics suddenly started working (and still works). The only thing I changed yesterday was installing Apache Atlas, which required restarting almost all components; maybe that helped.
Thanks for your assistance!
12-26-2016
05:49 AM
Hi @Aravindan Vijayan, I have 7 nodes (2 NN + 5 DN). Response to the GET call:
http://<METRICS_COLLECTOR_HOST>:6188/ws/v1/timeline/metrics/metadata (it's too long, I cut it)
{"type":"COUNTER","seriesStartTime":1482480880891,"metricname":"regionserver.WAL.rollRequest","supportsAggregation":true},{"type":"COUNTER","seriesStartTime":1482480880869,"metricname":"jvm.Master.JvmMetrics.GcCount","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880889,"metricname":"master.FileSystem.MetaHlogSplitSize_99th_percentile","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880889,"metricname":"master.FileSystem.HlogSplitSize_98th_percentile","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880874,"metricname":"master.Master.QueueCallTime_median","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880889,"metricname":"master.FileSystem.HlogSplitSize_max","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880894,"metricname":"master.Balancer.BalancerCluster_median","supportsAggregation":true},{"type":"COUNTER","seriesStartTime":1482480880896,"metricname":"metricssystem.MetricsSystem.PublishNumOps","supportsAggregation":true},{"type":"COUNTER","seriesStartTime":1482480880874,"metricname":"master.Master.exceptions.FailedSanityCheckException","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880869,"metricname":"jvm.Master.JvmMetrics.MemHeapUsedM","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880894,"metricname":"master.AssignmentManger.Assign_mean","supportsAggregation":true},{"type":"COUNTER","seriesStartTime":1482480880896,"metricname":"metricssystem.MetricsSystem.Sink_timelineDropped","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880889,"metricname":"master.FileSystem.MetaHlogSplitTime_95th_percentile","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880894,"metricname":"master.AssignmentManger.BulkAssign_95th_percentile","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880869,"metricname":"jvm.Master.JvmMetrics.MemNonHeapUsedM","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880894,"metricname":"master.AssignmentManger.Assign_99.9th_percentile","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880874,"metricname":"master.Master.RequestSize_mean","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880874,"metricname":"master.Master.RequestSize_min","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880894,"metricname":"master.AssignmentManger.Assign_99th_percentile","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880891,"metricname":"regionserver.WAL.AppendSize_99th_percentile","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880894,"metricname":"master.Balancer.BalancerCluster_99.9th_percentile","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880894,"metricname":"master.AssignmentManger.BulkAssign_75th_percentile","supportsAggregation":true},{"type":"COUNTER","seriesStartTime":1482480880889,"metricname":"master.FileSystem.MetaHlogSplitTime_num_ops","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880891,"metricname":"regionserver.WAL.SyncTime_90th_percentile","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880894,"metricname":"master.Balancer.BalancerCluster_90th_percentile","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880891,"metricname":"regionserver.WAL.AppendTime_max","supportsAggregation":true}],"logfeeder":[{"type":"Long","seriesStartTime":1482480913546,"metricname":"output.sol
r.write_logs","supportsAggregation":true},{"type":"Long","seriesStartTime":1482480913546,"metricname":"input.files.count","supportsAggregation":true},{"type":"Long","seriesStartTime":1482480943578,"metricname":"filter.error.keyvalue","supportsAggregation":true},{"type":"Long","seriesStartTime":1482480913546,"metricname":"filter.error.grok","supportsAggregation":true},{"type":"Long","seriesStartTime":1482480913546,"metricname":"input.files.read_bytes","supportsAggregation":true},{"type":"Long","seriesStartTime":1482480913546,"metricname":"output.solr.write_bytes","supportsAggregation":true},{"type":"Long","seriesStartTime":1482480913546,"metricname":"input.files.read_lines","supportsAggregation":true}]} http://<METRICS_COLLECTOR_HOST>:6188/ws/v1/timeline/metrics/hosts
{"hdp-dn3.hostname":["accumulo","datanode","journalnode","HOST","nodemanager","hbase","logfeeder"],"hdp-dn5.hostname":["accumulo","datanode","HOST","nodemanager","hbase","logfeeder"],"hdp-dn2.hostname":["accumulo","datanode","HOST","nodemanager","logfeeder"],"hdp-nn1.hostname":["accumulo","nimbus","resourcemanager","journalnode","HOST","applicationhistoryserver","namenode","hbase","kafka_broker","logfeeder"],"hdp-dn1.hostname":["accumulo","hiveserver2","datanode","hivemetastore","HOST","nodemanager","logfeeder"],"hdp-dn4.hostname":["accumulo","datanode","HOST","nodemanager","hbase","logfeeder"],"hdp-nn2.hostname":["hiveserver2","hivemetastore","journalnode","resourcemanager","HOST","jobhistoryserver","namenode","ams-hbase","logfeeder"]} Config files cat /etc/ambari-metrics-collector/conf/ams-env.sh
# Set environment variables here.
# The java implementation to use. Java 1.6 required.
export JAVA_HOME=/usr/jdk64/jdk1.8.0_77
# Collector Log directory for log4j
export AMS_COLLECTOR_LOG_DIR=/var/log/ambari-metrics-collector
# Monitor Log directory for outfile
export AMS_MONITOR_LOG_DIR=/var/log/ambari-metrics-monitor
# Collector pid directory
export AMS_COLLECTOR_PID_DIR=/var/run/ambari-metrics-collector
# Monitor pid directory
export AMS_MONITOR_PID_DIR=/var/run/ambari-metrics-monitor
# AMS HBase pid directory
export AMS_HBASE_PID_DIR=/var/run/ambari-metrics-collector/
# AMS Collector heapsize
export AMS_COLLECTOR_HEAPSIZE=1024m
# HBase normalizer enabled
export AMS_HBASE_NORMALIZER_ENABLED=False
# HBase compaction policy enabled
export AMS_HBASE_FIFO_COMPACTION_ENABLED=True
# HBase Tables Initialization check enabled
export AMS_HBASE_INIT_CHECK_ENABLED=True
# AMS Collector options
export AMS_COLLECTOR_OPTS="-Djava.library.path=/usr/lib/ams-hbase/lib/hadoop-native"
# AMS Collector GC options
export AMS_COLLECTOR_GC_OPTS="-XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/ambari-metrics-collector/collector-gc.log-`date +'%Y%m%d%H%M'`"
export AMS_COLLECTOR_OPTS="$AMS_COLLECTOR_OPTS $AMS_COLLECTOR_GC_OPTS"
cat /etc/ambari-metrics-collector/conf/ams-site.xml
<configuration>
<property>
<name>phoenix.query.maxGlobalMemoryPercentage</name>
<value>25</value>
</property>
<property>
<name>phoenix.spool.directory</name>
<value>/tmp</value>
</property>
<property>
<name>timeline.metrics.aggregator.checkpoint.dir</name>
<value>/var/lib/ambari-metrics-collector/checkpoint</value>
</property>
<property>
<name>timeline.metrics.aggregators.skip.blockcache.enabled</name>
<value>false</value>
</property>
<property>
<name>timeline.metrics.cache.commit.interval</name>
<value>3</value>
</property>
<property>
<name>timeline.metrics.cache.enabled</name>
<value>true</value>
</property>
<property>
<name>timeline.metrics.cache.size</name>
<value>150</value>
</property>
<property>
<name>timeline.metrics.cluster.aggregate.splitpoints</name>
<value>kafka.server.BrokerTopicMetrics.FailedFetchRequestsPerSec.meanRate</value>
</property>
<property>
<name>timeline.metrics.cluster.aggregator.daily.checkpointCutOffMultiplier</name>
<value>2</value>
</property>
<property>
<name>timeline.metrics.cluster.aggregator.daily.disabled</name>
<value>false</value>
</property>
<property>
<name>timeline.metrics.cluster.aggregator.daily.interval</name>
<value>86400</value>
</property>
<property>
<name>timeline.metrics.cluster.aggregator.daily.ttl</name>
<value>63072000</value>
</property>
<property>
<name>timeline.metrics.cluster.aggregator.hourly.checkpointCutOffMultiplier</name>
<value>2</value>
</property>
<property>
<name>timeline.metrics.cluster.aggregator.hourly.disabled</name>
<value>false</value>
</property>
<property>
<name>timeline.metrics.cluster.aggregator.hourly.interval</name>
<value>3600</value>
</property>
<property>
<name>timeline.metrics.cluster.aggregator.hourly.ttl</name>
<value>31536000</value>
</property>
<property>
<name>timeline.metrics.cluster.aggregator.interpolation.enabled</name>
<value>true</value>
</property>
<property>
<name>timeline.metrics.cluster.aggregator.minute.checkpointCutOffMultiplier</name>
<value>2</value>
</property>
<property>
<name>timeline.metrics.cluster.aggregator.minute.disabled</name>
<value>false</value>
</property>
<property>
<name>timeline.metrics.cluster.aggregator.minute.interval</name>
<value>300</value>
</property>
<property>
<name>timeline.metrics.cluster.aggregator.minute.ttl</name>
<value>2592000</value>
</property>
<property>
<name>timeline.metrics.cluster.aggregator.second.checkpointCutOffMultiplier</name>
<value>2</value>
</property>
<property>
<name>timeline.metrics.cluster.aggregator.second.disabled</name>
<value>false</value>
</property>
<property>
<name>timeline.metrics.cluster.aggregator.second.interval</name>
<value>120</value>
</property>
<property>
<name>timeline.metrics.cluster.aggregator.second.timeslice.interval</name>
<value>30</value>
</property>
<property>
<name>timeline.metrics.cluster.aggregator.second.ttl</name>
<value>259200</value>
</property>
<property>
<name>timeline.metrics.daily.aggregator.minute.interval</name>
<value>86400</value>
</property>
<property>
<name>timeline.metrics.hbase.compression.scheme</name>
<value>SNAPPY</value>
</property>
<property>
<name>timeline.metrics.hbase.data.block.encoding</name>
<value>FAST_DIFF</value>
</property>
<property>
<name>timeline.metrics.hbase.fifo.compaction.enabled</name>
<value>true</value>
</property>
<property>
<name>timeline.metrics.hbase.init.check.enabled</name>
<value>true</value>
</property>
<property>
<name>timeline.metrics.host.aggregate.splitpoints</name>
<value>kafka.server.BrokerTopicMetrics.FailedFetchRequestsPerSec.meanRate</value>
</property>
<property>
<name>timeline.metrics.host.aggregator.daily.checkpointCutOffMultiplier</name>
<value>2</value>
</property>
<property>
<name>timeline.metrics.host.aggregator.daily.disabled</name>
<value>false</value>
</property>
<property>
<name>timeline.metrics.host.aggregator.daily.ttl</name>
<value>31536000</value>
</property>
<property>
<name>timeline.metrics.host.aggregator.hourly.checkpointCutOffMultiplier</name>
<value>2</value>
</property>
<property>
<name>timeline.metrics.host.aggregator.hourly.disabled</name>
<value>false</value>
</property>
<property>
<name>timeline.metrics.host.aggregator.hourly.interval</name>
<value>3600</value>
</property>
<property>
<name>timeline.metrics.host.aggregator.hourly.ttl</name>
<value>2592000</value>
</property>
<property>
<name>timeline.metrics.host.aggregator.minute.checkpointCutOffMultiplier</name>
<value>2</value>
</property>
<property>
<name>timeline.metrics.host.aggregator.minute.disabled</name>
<value>false</value>
</property>
<property>
<name>timeline.metrics.host.aggregator.minute.interval</name>
<value>300</value>
</property>
<property>
<name>timeline.metrics.host.aggregator.minute.ttl</name>
<value>604800</value>
</property>
<property>
<name>timeline.metrics.host.aggregator.ttl</name>
<value>86400</value>
</property>
<property>
<name>timeline.metrics.service.checkpointDelay</name>
<value>60</value>
</property>
<property>
<name>timeline.metrics.service.cluster.aggregator.appIds</name>
<value>datanode,nodemanager,hbase</value>
</property>
<property>
<name>timeline.metrics.service.default.result.limit</name>
<value>15840</value>
</property>
<property>
<name>timeline.metrics.service.handler.thread.count</name>
<value>20</value>
</property>
<property>
<name>timeline.metrics.service.http.policy</name>
<value>HTTP_ONLY</value>
</property>
<property>
<name>timeline.metrics.service.operation.mode</name>
<value>distributed</value>
</property>
<property>
<name>timeline.metrics.service.resultset.fetchSize</name>
<value>2000</value>
</property>
<property>
<name>timeline.metrics.service.rpc.address</name>
<value>0.0.0.0:60200</value>
</property>
<property>
<name>timeline.metrics.service.use.groupBy.aggregators</name>
<value>true</value>
</property>
<property>
<name>timeline.metrics.service.watcher.delay</name>
<value>30</value>
</property>
<property>
<name>timeline.metrics.service.watcher.disabled</name>
<value>true</value>
</property>
<property>
<name>timeline.metrics.service.watcher.initial.delay</name>
<value>600</value>
</property>
<property>
<name>timeline.metrics.service.watcher.timeout</name>
<value>30</value>
</property>
<property>
<name>timeline.metrics.service.webapp.address</name>
<value>hdp-nn2.hostname:6188</value>
</property>
<property>
<name>timeline.metrics.sink.collection.period</name>
<value>10</value>
</property>
<property>
<name>timeline.metrics.sink.report.interval</name>
<value>60</value>
</property>
</configuration>
cat /etc/ams-hbase/conf/hbase-site.xml
<configuration>
<property>
<name>dfs.block.access.token.enable</name>
<value>true</value>
</property>
<property>
<name>dfs.blockreport.initialDelay</name>
<value>120</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>134217728</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.prodcluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.client.read.shortcircuit</name>
<value>true</value>
</property>
<property>
<name>dfs.client.read.shortcircuit.streams.cache.size</name>
<value>4096</value>
</property>
<property>
<name>dfs.client.retry.policy.enabled</name>
<value>false</value>
</property>
<property>
<name>dfs.cluster.administrators</name>
<value> hdfs</value>
</property>
<property>
<name>dfs.content-summary.limit</name>
<value>5000</value>
</property>
<property>
<name>dfs.datanode.address</name>
<value>0.0.0.0:50010</value>
</property>
<property>
<name>dfs.datanode.balance.bandwidthPerSec</name>
<value>6250000</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/hdfs/hadoop/hdfs/data</value>
<final>true</final>
</property>
<property>
<name>dfs.datanode.data.dir.perm</name>
<value>750</value>
</property>
<property>
<name>dfs.datanode.du.reserved</name>
<value>65906998272</value>
</property>
<property>
<name>dfs.datanode.failed.volumes.tolerated</name>
<value>0</value>
<final>true</final>
</property>
<property>
<name>dfs.datanode.http.address</name>
<value>0.0.0.0:50075</value>
</property>
<property>
<name>dfs.datanode.https.address</name>
<value>0.0.0.0:50475</value>
</property>
<property>
<name>dfs.datanode.ipc.address</name>
<value>0.0.0.0:8010</value>
</property>
<property>
<name>dfs.datanode.max.transfer.threads</name>
<value>16384</value>
</property>
<property>
<name>dfs.domain.socket.path</name>
<value>/var/lib/hadoop-hdfs/dn_socket</value>
</property>
<property>
<name>dfs.encrypt.data.transfer.cipher.suites</name>
<value>AES/CTR/NoPadding</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>shell(/bin/true)</value>
</property>
<property>
<name>dfs.ha.namenodes.prodcluster</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.heartbeat.interval</name>
<value>3</value>
</property>
<property>
<name>dfs.hosts.exclude</name>
<value>/etc/hadoop/conf/dfs.exclude</value>
</property>
<property>
<name>dfs.http.policy</name>
<value>HTTP_ONLY</value>
</property>
<property>
<name>dfs.https.port</name>
<value>50470</value>
</property>
<property>
<name>dfs.internal.nameservices</name>
<value>prodcluster</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/hadoop/hdfs/journal</value>
</property>
<property>
<name>dfs.journalnode.http-address</name>
<value>0.0.0.0:8480</value>
</property>
<property>
<name>dfs.journalnode.https-address</name>
<value>0.0.0.0:8481</value>
</property>
<property>
<name>dfs.namenode.accesstime.precision</name>
<value>0</value>
</property>
<property>
<name>dfs.namenode.audit.log.async</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.avoid.read.stale.datanode</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.avoid.write.stale.datanode</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>/hdfs/hadoop/hdfs/namesecondary</value>
</property>
<property>
<name>dfs.namenode.checkpoint.edits.dir</name>
<value>${dfs.namenode.checkpoint.dir}</value>
</property>
<property>
<name>dfs.namenode.checkpoint.period</name>
<value>21600</value>
</property>
<property>
<name>dfs.namenode.checkpoint.txns</name>
<value>1000000</value>
</property>
<property>
<name>dfs.namenode.fslock.fair</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.handler.count</name>
<value>600</value>
</property>
<property>
<name>dfs.namenode.http-address.prodcluster.nn1</name>
<value>hdp-nn1.hostname:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.prodcluster.nn2</name>
<value>hdp-nn2.hostname:50070</value>
</property>
<property>
<name>dfs.namenode.https-address.prodcluster.nn1</name>
<value>hdp-nn1.hostname:50470</value>
</property>
<property>
<name>dfs.namenode.https-address.prodcluster.nn2</name>
<value>hdp-nn2.hostname:50470</value>
</property>
<property>
<name>dfs.namenode.inode.attributes.provider.class</name>
<value>org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/hdfs/hadoop/hdfs/namenode</value>
<final>true</final>
</property>
<property>
<name>dfs.namenode.name.dir.restore</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.rpc-address.prodcluster.nn1</name>
<value>hdp-nn1.hostname:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.prodcluster.nn2</name>
<value>hdp-nn2.hostname:8020</value>
</property>
<property>
<name>dfs.namenode.safemode.threshold-pct</name>
<value>0.99</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hdp-dn3.hostname:8485;hdp-nn1.hostname:8485;hdp-nn2.hostname:8485/prodcluster</value>
</property>
<property>
<name>dfs.namenode.stale.datanode.interval</name>
<value>30000</value>
</property>
<property>
<name>dfs.namenode.startup.delay.block.deletion.sec</name>
<value>3600</value>
</property>
<property>
<name>dfs.namenode.write.stale.datanode.ratio</name>
<value>1.0f</value>
</property>
<property>
<name>dfs.nameservices</name>
<value>prodcluster</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.permissions.superusergroup</name>
<value>hdfs</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.replication.max</name>
<value>50</value>
</property>
<property>
<name>dfs.support.append</name>
<value>true</value>
<final>true</final>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
<final>true</final>
</property>
<property>
<name>fs.permissions.umask-mode</name>
<value>077</value>
</property>
<property>
<name>nfs.exports.allowed.hosts</name>
<value>* rw</value>
</property>
<property>
<name>nfs.file.dump.dir</name>
<value>/tmp/.hdfs-nfs</value>
</property>
</configuration>
cat /etc/ams-hbase/conf/hbase-env.sh
# Set environment variables here.
# The java implementation to use. Java 1.6+ required.
export JAVA_HOME=/usr/jdk64/jdk1.8.0_77
# HBase Configuration directory
export HBASE_CONF_DIR=${HBASE_CONF_DIR:-/etc/ams-hbase/conf}
# Extra Java CLASSPATH elements. Optional.
additional_cp=
if [ -n "$additional_cp" ];
then
export HBASE_CLASSPATH=${HBASE_CLASSPATH}:$additional_cp
else
export HBASE_CLASSPATH=${HBASE_CLASSPATH}
fi
# The maximum amount of heap to use for hbase shell.
export HBASE_SHELL_OPTS="-Xmx256m"
# Extra Java runtime options.
# Below are what we set by default. May only work with SUN JVM.
# For more on why as well as other possible settings,
# see http://wiki.apache.org/hadoop/PerformanceTuning
export HBASE_OPTS="-XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/ambari-metrics-collector/hs_err_pid%p.log -Djava.io.tmpdir=/var/lib/ambari-metrics-collector/hbase-tmp"
export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/ambari-metrics-collector/gc.log-`date +'%Y%m%d%H%M'`"
# Uncomment below to enable java garbage collection logging.
# export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:$HBASE_HOME/logs/gc-hbase.log"
# Uncomment and adjust to enable JMX exporting
# See jmxremote.password and jmxremote.access in $JRE_HOME/lib/management to configure remote password access.
# More details at: http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html
#
# export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
export HBASE_MASTER_OPTS=" -Xms512m -Xmx512m -Xmn102m -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly"
export HBASE_REGIONSERVER_OPTS=" -Xmn128m -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -Xms896m -Xmx896m"
# export HBASE_THRIFT_OPTS="$HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10103"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10104"
# File naming hosts on which HRegionServers will run. $HBASE_HOME/conf/regionservers by default.
export HBASE_REGIONSERVERS=${HBASE_CONF_DIR}/regionservers
# Extra ssh options. Empty by default.
# export HBASE_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HBASE_CONF_DIR"
# Where log files are stored. $HBASE_HOME/logs by default.
export HBASE_LOG_DIR=/var/log/ambari-metrics-collector
# A string representing this instance of hbase. $USER by default.
# export HBASE_IDENT_STRING=$USER
# The scheduling priority for daemon processes. See 'man nice'.
# export HBASE_NICENESS=10
# The directory where pid files are stored. /tmp by default.
export HBASE_PID_DIR=/var/run/ambari-metrics-collector/
# Seconds to sleep between slave commands. Unset by default. This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HBASE_SLAVE_SLEEP=0.1
# Tell HBase whether it should manage it's own instance of Zookeeper or not.
export HBASE_MANAGES_ZK=false
# use embedded native libs
_HADOOP_NATIVE_LIB="/usr/lib/ams-hbase/lib/hadoop-native/"
export HBASE_OPTS="$HBASE_OPTS -Djava.library.path=${_HADOOP_NATIVE_LIB}"
# Unset HADOOP_HOME to avoid importing HADOOP installed cluster related configs like: /usr/hdp/2.2.0.0-2041/hadoop/conf/
export HADOOP_HOME=/usr/lib/ams-hbase/
# Explicitly Setting HBASE_HOME for AMS HBase so that there is no conflict
export HBASE_HOME=/usr/lib/ams-hbase/
rpm -qa | grep ambari
ambari-metrics-collector-2.4.0.1-1.x86_64
ambari-metrics-hadoop-sink-2.4.0.1-1.x86_64
ambari-agent-2.4.0.1-1.x86_64
ambari-infra-solr-client-2.4.0.1-1.x86_64
ambari-logsearch-logfeeder-2.4.0.1-1.x86_64
ambari-metrics-monitor-2.4.0.1-1.x86_64
ambari-metrics-grafana-2.4.0.1-1.x86_64
ambari-infra-solr-2.4.0.1-1.x86_64
12-23-2016
08:50 AM
@Aravindan Vijayan I did this:
1) Turn on Maintenance mode
2) Stop Ambari Metrics
3) hadoop fs -rmr /ams/hbase/*
4) rm -rf /var/lib/ambari-metrics-collector/hbase-tmp/*
5)
[zk: localhost:2181(CONNECTED) 0] ls /
[registry, controller, brokers, storm, zookeeper, infra-solr,
hiveserver2-hive2, hbase-unsecure, yarn-leader-election, tracers, hadoop-ha,
admin, isr_change_notification, services, templeton-hadoop, accumulo,
controller_epoch, hiveserver2, llap-unsecure, rmstore, ranger_audits,
consumers, config, ams-hbase-unsecure]
[zk: localhost:2181(CONNECTED) 1] rmr /ams-hbase-unsecure
[zk: localhost:2181(CONNECTED) 2] ls /
[registry, controller, brokers, storm, zookeeper, infra-solr,
hiveserver2-hive2, hbase-unsecure, yarn-leader-election, tracers, hadoop-ha,
admin, isr_change_notification, services, templeton-hadoop, accumulo,
controller_epoch, hiveserver2, llap-unsecure, rmstore, ranger_audits,
consumers, config]
6) Start Ambari Metrics
7) Turn off Maintenance mode
After about 15 minutes I got this log:
2016-12-23 11:35:08,673 ERROR org.mortbay.log: /ws/v1/timeline/metrics/
javax.ws.rs.WebApplicationException: javax.xml.bind.MarshalException
- with linked exception:
[org.mortbay.jetty.EofException]
at com.sun.jersey.core.provider.jaxb.AbstractRootElementProvider.writeTo(AbstractRootElementProvider.java:159)
at com.sun.jersey.spi.container.ContainerResponse.write(ContainerResponse.java:306)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1437)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:895)
at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:843)
at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:804)
at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1294)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:767)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: javax.xml.bind.MarshalException
- with linked exception:
[org.mortbay.jetty.EofException]
at com.sun.xml.bind.v2.runtime.MarshallerImpl.write(MarshallerImpl.java:325)
at com.sun.xml.bind.v2.runtime.MarshallerImpl.marshal(MarshallerImpl.java:249)
at javax.xml.bind.helpers.AbstractMarshallerImpl.marshal(AbstractMarshallerImpl.java:95)
at com.sun.jersey.core.provider.jaxb.AbstractRootElementProvider.writeTo(AbstractRootElementProvider.java:179)
at com.sun.jersey.core.provider.jaxb.AbstractRootElementProvider.writeTo(AbstractRootElementProvider.java:157)
... 37 more
Caused by: org.mortbay.jetty.EofException
at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:634)
at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:580)
at com.sun.jersey.spi.container.servlet.WebComponent$Writer.write(WebComponent.java:307)
at com.sun.jersey.spi.container.ContainerResponse$CommittingOutputStream.write(ContainerResponse.java:134)
at com.sun.xml.bind.v2.runtime.output.UTF8XmlOutput.flushBuffer(UTF8XmlOutput.java:416)
at com.sun.xml.bind.v2.runtime.output.UTF8XmlOutput.endDocument(UTF8XmlOutput.java:141)
at com.sun.xml.bind.v2.runtime.XMLSerializer.endDocument(XMLSerializer.java:856)
at com.sun.xml.bind.v2.runtime.MarshallerImpl.postwrite(MarshallerImpl.java:374)
at com.sun.xml.bind.v2.runtime.MarshallerImpl.write(MarshallerImpl.java:321)
... 41 more
2016-12-23 11:35:09,796 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.PhoenixHBaseAccessor: Saved 8606 metadata records.
2016-12-23 11:35:09,843 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.PhoenixHBaseAccessor: Saved 7 hosted apps metadata records.
2016-12-23 11:35:25,123 INFO TimelineClusterAggregatorMinute: Started Timeline aggregator thread @ Fri Dec 23 11:35:25 MSK 2016
2016-12-23 11:35:25,124 INFO TimelineClusterAggregatorMinute: Last Checkpoint read : Fri Dec 23 11:30:00 MSK 2016
2016-12-23 11:35:25,124 INFO TimelineClusterAggregatorMinute: Rounded off checkpoint : Fri Dec 23 11:30:00 MSK 2016
2016-12-23 11:35:25,124 INFO TimelineClusterAggregatorMinute: Last check point time: 1482481800000, lagBy: 325 seconds.
2016-12-23 11:35:25,124 INFO TimelineClusterAggregatorMinute: Start aggregation cycle @ Fri Dec 23 11:35:25 MSK 2016, startTime = Fri Dec 23 11:30:00 MSK 2016, endTime = Fri Dec 23 11:35:00 MSK 2016
2016-12-23 11:35:25,143 INFO TimelineClusterAggregatorMinute: 0 row(s) updated.
2016-12-23 11:35:25,143 INFO TimelineClusterAggregatorMinute: Aggregated cluster metrics for METRIC_AGGREGATE_MINUTE, with startTime = Fri Dec 23 11:30:00 MSK 2016, endTime = Fri Dec 23 11:35:00 MSK 2016
2016-12-23 11:35:25,143 INFO TimelineClusterAggregatorMinute: End aggregation cycle @ Fri Dec 23 11:35:25 MSK 2016
2016-12-23 11:35:25,143 INFO TimelineClusterAggregatorMinute: End aggregation cycle @ Fri Dec 23 11:35:25 MSK 2016
2016-12-23 11:35:25,152 INFO TimelineMetricHostAggregatorMinute: Started Timeline aggregator thread @ Fri Dec 23 11:35:25 MSK 2016
2016-12-23 11:35:25,153 INFO TimelineMetricHostAggregatorMinute: Last Checkpoint read : Fri Dec 23 11:30:00 MSK 2016
2016-12-23 11:35:25,153 INFO TimelineMetricHostAggregatorMinute: Rounded off checkpoint : Fri Dec 23 11:30:00 MSK 2016
2016-12-23 11:35:25,153 INFO TimelineMetricHostAggregatorMinute: Last check point time: 1482481800000, lagBy: 325 seconds.
2016-12-23 11:35:25,153 INFO TimelineMetricHostAggregatorMinute: Start aggregation cycle @ Fri Dec 23 11:35:25 MSK 2016, startTime = Fri Dec 23 11:30:00 MSK 2016, endTime = Fri Dec 23 11:35:00 MSK 2016
2016-12-23 11:35:25,907 INFO TimelineMetricHostAggregatorMinute: 0 row(s) updated.
2016-12-23 11:35:25,907 INFO TimelineMetricHostAggregatorMinute: Aggregated host metrics for METRIC_RECORD_MINUTE, with startTime = Fri Dec 23 11:30:00 MSK 2016, endTime = Fri Dec 23 11:35:00 MSK 2016
2016-12-23 11:35:25,907 INFO TimelineMetricHostAggregatorMinute: End aggregation cycle @ Fri Dec 23 11:35:25 MSK 2016
2016-12-23 11:35:25,907 INFO TimelineMetricHostAggregatorMinute: End aggregation cycle @ Fri Dec 23 11:35:25 MSK 2016
2016-12-23 11:40:26,448 INFO TimelineMetricHostAggregatorMinute: Started Timeline aggregator thread @ Fri Dec 23 11:40:26 MSK 2016
2016-12-23 11:40:26,448 INFO TimelineClusterAggregatorMinute: Started Timeline aggregator thread @ Fri Dec 23 11:40:26 MSK 2016
2016-12-23 11:40:26,449 INFO TimelineMetricHostAggregatorMinute: Last Checkpoint read : Fri Dec 23 11:35:00 MSK 2016
2016-12-23 11:40:26,449 INFO TimelineMetricHostAggregatorMinute: Rounded off checkpoint : Fri Dec 23 11:35:00 MSK 2016
2016-12-23 11:40:26,449 INFO TimelineMetricHostAggregatorMinute: Last check point time: 1482482100000, lagBy: 326 seconds.
2016-12-23 11:40:26,449 INFO TimelineClusterAggregatorMinute: Last Checkpoint read : Fri Dec 23 11:35:00 MSK 2016
2016-12-23 11:40:26,449 INFO TimelineMetricHostAggregatorMinute: Start aggregation cycle @ Fri Dec 23 11:40:26 MSK 2016, startTime = Fri Dec 23 11:35:00 MSK 2016, endTime = Fri Dec 23 11:40:00 MSK 2016
2016-12-23 11:40:26,450 INFO TimelineClusterAggregatorMinute: Rounded off checkpoint : Fri Dec 23 11:35:00 MSK 2016
2016-12-23 11:40:26,450 INFO TimelineClusterAggregatorMinute: Last check point time: 1482482100000, lagBy: 326 seconds.
2016-12-23 11:40:26,450 INFO TimelineClusterAggregatorMinute: Start aggregation cycle @ Fri Dec 23 11:40:26 MSK 2016, startTime = Fri Dec 23 11:35:00 MSK 2016, endTime = Fri Dec 23 11:40:00 MSK 2016
2016-12-23 11:40:26,464 INFO TimelineClusterAggregatorMinute: 0 row(s) updated.
2016-12-23 11:40:26,464 INFO TimelineClusterAggregatorMinute: Aggregated cluster metrics for METRIC_AGGREGATE_MINUTE, with startTime = Fri Dec 23 11:35:00 MSK 2016, endTime = Fri Dec 23 11:40:00 MSK 2016
2016-12-23 11:40:26,465 INFO TimelineClusterAggregatorMinute: End aggregation cycle @ Fri Dec 23 11:40:26 MSK 2016
2016-12-23 11:40:26,465 INFO TimelineClusterAggregatorMinute: End aggregation cycle @ Fri Dec 23 11:40:26 MSK 2016
2016-12-23 11:40:46,839 INFO org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 25694 actions to finish
2016-12-23 11:40:46,899 INFO TimelineMetricHostAggregatorMinute: 22847 row(s) updated.
2016-12-23 11:40:46,899 INFO TimelineMetricHostAggregatorMinute: Aggregated host metrics for METRIC_RECORD_MINUTE, with startTime = Fri Dec 23 11:35:00 MSK 2016, endTime = Fri Dec 23 11:40:00 MSK 2016
2016-12-23 11:40:46,899 INFO TimelineMetricHostAggregatorMinute: End aggregation cycle @ Fri Dec 23 11:40:46 MSK 2016
2016-12-23 11:40:46,899 INFO TimelineMetricHostAggregatorMinute: End aggregation cycle @ Fri Dec 23 11:40:46 MSK 2016
2016-12-23 11:41:40,503 INFO org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 8014 actions to finish
What should I do next?
... View more
12-23-2016
06:10 AM
Hi @Aravindan Vijayan I have 7 nodes (2 nn + 5 dn). Here is the info:
rpm -qa | grep ambari
ambari-metrics-collector-2.4.0.1-1.x86_64
ambari-metrics-hadoop-sink-2.4.0.1-1.x86_64
ambari-agent-2.4.0.1-1.x86_64
ambari-infra-solr-client-2.4.0.1-1.x86_64
ambari-logsearch-logfeeder-2.4.0.1-1.x86_64
ambari-metrics-monitor-2.4.0.1-1.x86_64
ambari-metrics-grafana-2.4.0.1-1.x86_64
ambari-infra-solr-2.4.0.1-1.x86_64
cat /etc/ambari-metrics-collector/conf/ams-env.sh
# Set environment variables here.
# The java implementation to use. Java 1.6 required.
export JAVA_HOME=/usr/jdk64/jdk1.8.0_77
# Collector Log directory for log4j
export AMS_COLLECTOR_LOG_DIR=/var/log/ambari-metrics-collector
# Monitor Log directory for outfile
export AMS_MONITOR_LOG_DIR=/var/log/ambari-metrics-monitor
# Collector pid directory
export AMS_COLLECTOR_PID_DIR=/var/run/ambari-metrics-collector
# Monitor pid directory
export AMS_MONITOR_PID_DIR=/var/run/ambari-metrics-monitor
# AMS HBase pid directory
export AMS_HBASE_PID_DIR=/var/run/ambari-metrics-collector/
# AMS Collector heapsize
export AMS_COLLECTOR_HEAPSIZE=1024m
# HBase normalizer enabled
export AMS_HBASE_NORMALIZER_ENABLED=False
# HBase compaction policy enabled
export AMS_HBASE_FIFO_COMPACTION_ENABLED=True
# HBase Tables Initialization check enabled
export AMS_HBASE_INIT_CHECK_ENABLED=True
# AMS Collector options
export AMS_COLLECTOR_OPTS="-Djava.library.path=/usr/lib/ams-hbase/lib/hadoop-native"
# AMS Collector GC options
export AMS_COLLECTOR_GC_OPTS="-XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/ambari-metrics-collector/collector-gc.log-`date +'%Y%m%d%H%M'`"
export AMS_COLLECTOR_OPTS="$AMS_COLLECTOR_OPTS $AMS_COLLECTOR_GC_OPTS"
cat /etc/ams-hbase/conf/hbase-env.sh
# Set environment variables here.
# The java implementation to use. Java 1.6+ required.
export JAVA_HOME=/usr/jdk64/jdk1.8.0_77
# HBase Configuration directory
export HBASE_CONF_DIR=${HBASE_CONF_DIR:-/etc/ams-hbase/conf}
# Extra Java CLASSPATH elements. Optional.
additional_cp=
if [ -n "$additional_cp" ];
then
export HBASE_CLASSPATH=${HBASE_CLASSPATH}:$additional_cp
else
export HBASE_CLASSPATH=${HBASE_CLASSPATH}
fi
# The maximum amount of heap to use for hbase shell.
export HBASE_SHELL_OPTS="-Xmx256m"
# Extra Java runtime options.
# Below are what we set by default. May only work with SUN JVM.
# For more on why as well as other possible settings,
# see http://wiki.apache.org/hadoop/PerformanceTuning
export HBASE_OPTS="-XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/ambari-metrics-collector/hs_err_pid%p.log -Djava.io.tmpdir=/var/lib/ambari-metrics-collector/hbase-tmp"
export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/ambari-metrics-collector/gc.log-`date +'%Y%m%d%H%M'`"
# Uncomment below to enable java garbage collection logging.
# export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:$HBASE_HOME/logs/gc-hbase.log"
# Uncomment and adjust to enable JMX exporting
# See jmxremote.password and jmxremote.access in $JRE_HOME/lib/management to configure remote password access.
# More details at: http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html
#
# export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
export HBASE_MASTER_OPTS=" -Xms512m -Xmx512m -Xmn102m -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly"
export HBASE_REGIONSERVER_OPTS=" -Xmn128m -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -Xms896m -Xmx896m"
# export HBASE_THRIFT_OPTS="$HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10103"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10104"
# File naming hosts on which HRegionServers will run. $HBASE_HOME/conf/regionservers by default.
export HBASE_REGIONSERVERS=${HBASE_CONF_DIR}/regionservers
# Extra ssh options. Empty by default.
# export HBASE_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HBASE_CONF_DIR"
# Where log files are stored. $HBASE_HOME/logs by default.
export HBASE_LOG_DIR=/var/log/ambari-metrics-collector
# A string representing this instance of hbase. $USER by default.
# export HBASE_IDENT_STRING=$USER
# The scheduling priority for daemon processes. See 'man nice'.
# export HBASE_NICENESS=10
# The directory where pid files are stored. /tmp by default.
export HBASE_PID_DIR=/var/run/ambari-metrics-collector/
# Seconds to sleep between slave commands. Unset by default. This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HBASE_SLAVE_SLEEP=0.1
# Tell HBase whether it should manage it's own instance of Zookeeper or not.
export HBASE_MANAGES_ZK=false
# use embedded native libs
_HADOOP_NATIVE_LIB="/usr/lib/ams-hbase/lib/hadoop-native/"
export HBASE_OPTS="$HBASE_OPTS -Djava.library.path=${_HADOOP_NATIVE_LIB}"
# Unset HADOOP_HOME to avoid importing HADOOP installed cluster related configs like: /usr/hdp/2.2.0.0-2041/hadoop/conf/
export HADOOP_HOME=/usr/lib/ams-hbase/
# Explicitly Setting HBASE_HOME for AMS HBase so that there is no conflict
export HBASE_HOME=/usr/lib/ams-hbase/
... View more
12-23-2016
06:02 AM
Hi @Rahul Pathak I have tried that, but without success. Here is the ambari-metrics-collector.log:
2016-12-23 08:49:16,046 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping phoenix metrics system...
2016-12-23 08:49:16,047 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: phoenix metrics system stopped.
2016-12-23 08:49:16,048 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: phoenix metrics system shutdown complete.
2016-12-23 08:49:16,048 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl: Stopping ApplicationHistory
2016-12-23 08:49:16,048 FATAL org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: Error starting ApplicationHistoryServer
org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.MetricsSystemInitializationException: Error creating Metrics Schema in HBase using Phoenix.
at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.PhoenixHBaseAccessor.initMetricSchema(PhoenixHBaseAccessor.java:470)
at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.HBaseTimelineMetricStore.initializeSubsystem(HBaseTimelineMetricStore.java:94)
at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.HBaseTimelineMetricStore.serviceInit(HBaseTimelineMetricStore.java:86)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:84)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:137)
at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:147)
Caused by: org.apache.phoenix.exception.PhoenixIOException: SYSTEM.CATALOG
at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:111)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:1292)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:1257)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.createTable(ConnectionQueryServicesImpl.java:1453)
at org.apache.phoenix.schema.MetaDataClient.createTableInternal(MetaDataClient.java:2180)
at org.apache.phoenix.schema.MetaDataClient.createTable(MetaDataClient.java:865)
at org.apache.phoenix.compile.CreateTableCompiler$2.execute(CreateTableCompiler.java:194)
at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:343)
at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:331)
at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:329)
at org.apache.phoenix.jdbc.PhoenixStatement.executeUpdate(PhoenixStatement.java:1421)
at org.apache.phoenix.query.ConnectionQueryServicesImpl$13.call(ConnectionQueryServicesImpl.java:2378)
at org.apache.phoenix.query.ConnectionQueryServicesImpl$13.call(ConnectionQueryServicesImpl.java:2327)
at org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:78)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:2327)
at org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:233)
at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.createConnection(PhoenixEmbeddedDriver.java:142)
at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:202)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:270)
at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.query.DefaultPhoenixDataSource.getConnection(DefaultPhoenixDataSource.java:82)
at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.PhoenixHBaseAccessor.getConnection(PhoenixHBaseAccessor.java:376)
at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.PhoenixHBaseAccessor.getConnectionRetryingOnException(PhoenixHBaseAccessor.java:354)
at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.PhoenixHBaseAccessor.initMetricSchema(PhoenixHBaseAccessor.java:398)
... 8 more
Caused by: org.apache.hadoop.hbase.TableNotFoundException: SYSTEM.CATALOG
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1264)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1162)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1146)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1103)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getRegionLocation(ConnectionManager.java:938)
at org.apache.hadoop.hbase.client.HRegionLocator.getRegionLocation(HRegionLocator.java:83)
at org.apache.hadoop.hbase.client.HTable.getRegionLocation(HTable.java:504)
at org.apache.hadoop.hbase.client.HTable.getKeysAndRegionsInRange(HTable.java:720)
at org.apache.hadoop.hbase.client.HTable.getKeysAndRegionsInRange(HTable.java:690)
at org.apache.hadoop.hbase.client.HTable.getStartKeysInRange(HTable.java:1757)
at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1712)
at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1692)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:1275)
... 31 more
2016-12-23 08:49:16,052 INFO org.apache.hadoop.util.ExitUtil: Exiting with status -1
2016-12-23 08:49:16,069 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down ApplicationHistoryServer at hdp-nn2.hostname/10.255.242.181
************************************************************/
2016-12-23 08:49:16,115 WARN org.apache.hadoop.hbase.io.util.HeapMemorySizeUtil: hbase.regionserver.global.memstore.upperLimit is deprecated by hbase.regionserver.global.memstore.size
... View more
12-22-2016
09:14 AM
1 Kudo
Ambari Metrics works intermittently. It works for a few minutes and then stops showing graphs. Sometimes the Metric Collector just stops; after a manual start it works, but a few minutes later it stops again. What is going wrong? screen-ams1.png
My settings:
HDP 2.5
Ambari 2.4.0.1
No Kerberos
iptables off
hbase.zookeeper.property.tickTime = 6000
Metrics Service operation mode = distributed
hbase.cluster.distributed = true
hbase.zookeeper.property.clientPort = 2181
hbase.rootdir=hdfs://prodcluster/ams/hbase
Logs
ambari-metrics-collector.log
2016-12-22 11:59:23,030 ERROR org.mortbay.log: /ws/v1/timeline/metrics
javax.ws.rs.WebApplicationException: javax.xml.bind.MarshalException
- with linked exception:
[org.mortbay.jetty.EofException]
at com.sun.jersey.core.provider.jaxb.AbstractRootElementProvider.writeTo(AbstractRootElementProvider.java:159)
at com.sun.jersey.spi.container.ContainerResponse.write(ContainerResponse.java:306)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1437)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:895)
at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:843)
at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:804)
at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1294)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:767)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: javax.xml.bind.MarshalException
- with linked exception:
[org.mortbay.jetty.EofException]
at com.sun.xml.bind.v2.runtime.MarshallerImpl.write(MarshallerImpl.java:325)
at com.sun.xml.bind.v2.runtime.MarshallerImpl.marshal(MarshallerImpl.java:249)
at javax.xml.bind.helpers.AbstractMarshallerImpl.marshal(AbstractMarshallerImpl.java:95)
at com.sun.jersey.core.provider.jaxb.AbstractRootElementProvider.writeTo(AbstractRootElementProvider.java:179)
at com.sun.jersey.core.provider.jaxb.AbstractRootElementProvider.writeTo(AbstractRootElementProvider.java:157)
... 37 more
Caused by: org.mortbay.jetty.EofException
at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:634)
at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:580)
at com.sun.jersey.spi.container.servlet.WebComponent$Writer.write(WebComponent.java:307)
at com.sun.jersey.spi.container.ContainerResponse$CommittingOutputStream.write(ContainerResponse.java:134)
at com.sun.xml.bind.v2.runtime.output.UTF8XmlOutput.flushBuffer(UTF8XmlOutput.java:416)
at com.sun.xml.bind.v2.runtime.output.UTF8XmlOutput.endDocument(UTF8XmlOutput.java:141)
at com.sun.xml.bind.v2.runtime.XMLSerializer.endDocument(XMLSerializer.java:856)
at com.sun.xml.bind.v2.runtime.MarshallerImpl.postwrite(MarshallerImpl.java:374)
at com.sun.xml.bind.v2.runtime.MarshallerImpl.write(MarshallerImpl.java:321)
... 41 more
2016-12-22 12:00:05,549 INFO TimelineClusterAggregatorMinute: 0 row(s) updated.
2016-12-22 12:00:19,105 INFO TimelineClusterAggregatorMinute: Aggregated cluster metrics for METRIC_AGGREGATE_MINUTE, with startTime = Thu Dec 22 11:50:00 MSK 2016, endTime = Thu Dec 22 11:55:00 MSK 2016
2016-12-22 12:00:19,111 INFO TimelineClusterAggregatorMinute: End aggregation cycle @ Thu Dec 22 12:00:19 MSK 2016
2016-12-22 12:00:19,111 INFO TimelineClusterAggregatorMinute: End aggregation cycle @ Thu Dec 22 12:00:19 MSK 2016
2016-12-22 12:00:24,077 INFO TimelineClusterAggregatorMinute: Started Timeline aggregator thread @ Thu Dec 22 12:00:24 MSK 2016
2016-12-22 12:00:24,083 INFO TimelineClusterAggregatorMinute: Last Checkpoint read : Thu Dec 22 11:55:00 MSK 2016
2016-12-22 12:00:24,083 INFO TimelineClusterAggregatorMinute: Rounded off checkpoint : Thu Dec 22 11:55:00 MSK 2016
2016-12-22 12:00:24,084 INFO TimelineClusterAggregatorMinute: Last check point time: 1482396900000, lagBy: 324 seconds.
2016-12-22 12:00:24,085 INFO TimelineClusterAggregatorMinute: Start aggregation cycle @ Thu Dec 22 12:00:24 MSK 2016, startTime = Thu Dec 22 11:55:00 MSK 2016, endTime = Thu Dec 22 12:00:00 MSK 2016
hbase-ams-master-hdp-nn2.hostname.log
2016-12-22 11:22:46,455 INFO [hdp-nn2.hostname,61300,1482394421267_ChoreService_1] zookeeper.ZooKeeper: Session: 0x3570f2523bb3db4 closed
2016-12-22 11:22:46,455 INFO [hdp-nn2.hostname,61300,1482394421267_ChoreService_1-EventThread] zookeeper.ClientCnxn: EventThread shut down
2016-12-22 11:23:31,207 INFO [timeline] timeline.HadoopTimelineMetricsSink: Unable to connect to collector, http://hdp-nn2.hostname:6188/ws/v1/timeline/metrics
This exceptions will be ignored for next 100 times
2016-12-22 11:23:31,208 WARN [timeline] timeline.HadoopTimelineMetricsSink: Unable to send metrics to collector by address:http://hdp-nn2.hostname:6188/ws/v1/timeline/metrics
2016-12-22 11:23:41,484 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=159.41 KB, freeSize=150.39 MB, max=150.54 MB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=59, evicted=0, evictedPerRun=0.0
2016-12-22 11:23:46,460 INFO [hdp-nn2.hostname,61300,1482394421267_ChoreService_1] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x12f8ba29 connecting to ZooKeeper ensemble=hdp-nn1.hostname:2181,hdp-dn1.hostname:2181,hdp-nn2.hostname:2181
2016-12-22 11:23:46,461 INFO [hdp-nn2.hostname,61300,1482394421267_ChoreService_1] zookeeper.ZooKeeper: Initiating client connection, connectString=hdp-nn1.hostname:2181,hdp-dn1.hostname:2181,hdp-nn2.hostname:2181 sessionTimeout=120000 watcher=org.apache.hadoop.hbase.zookeeper.PendingWatcher@20fc295f
2016-12-22 11:23:46,464 INFO [hdp-nn2.hostname,61300,1482394421267_ChoreService_1-SendThread(hdp-nn2.hostname:2181)] zookeeper.ClientCnxn: Opening socket connection to server hdp-nn2.hostname/10.255.242.181:2181. Will not attempt to authenticate using SASL (unknown error)
2016-12-22 11:23:46,466 INFO [hdp-nn2.hostname,61300,1482394421267_ChoreService_1-SendThread(hdp-nn2.hostname:2181)] zookeeper.ClientCnxn: Socket connection established to hdp-nn2.hostname/10.255.242.181:2181, initiating session
2016-12-22 11:23:46,469 INFO [hdp-nn2.hostname,61300,1482394421267_ChoreService_1-SendThread(hdp-nn2.hostname:2181)] zookeeper.ClientCnxn: Session establishment complete on server hdp-nn2.hostname/10.255.242.181:2181, sessionid = 0x3570f2523bb3db5, negotiated timeout = 40000
2016-12-22 11:23:46,495 INFO [hdp-nn2.hostname,61300,1482394421267_ChoreService_1] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x3570f2523bb3db5
2016-12-22 11:23:46,499 INFO [hdp-nn2.hostname,61300,1482394421267_ChoreService_1] zookeeper.ZooKeeper: Session: 0x3570f2523bb3db5 closed
2016-12-22 11:23:46,499 INFO [hdp-nn2.hostname,61300,1482394421267_ChoreService_1-EventThread] zookeeper.ClientCnxn: EventThread shut down
2016-12-22 11:28:41,485 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=159.41 KB, freeSize=150.39 MB, max=150.54 MB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=89, evicted=0, evictedPerRun=0.0
2016-12-22 11:29:49,178 INFO [WALProcedureStoreSyncThread] wal.WALProcedureStore: Remove log: hdfs://prodcluster/ams/hbase/MasterProcWALs/state-00000000000000000001.log
2016-12-22 11:29:49,180 INFO [WALProcedureStoreSyncThread] wal.WALProcedureStore: Removed logs: [hdfs://prodcluster/ams/hbase/MasterProcWALs/state-00000000000000000002.log]
2016-12-22 11:33:41,484 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=159.41 KB, freeSize=150.39 MB, max=150.54 MB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=119, evicted=0, evictedPerRun=0.0
2016-12-22 11:38:41,484 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=159.41 KB, freeSize=150.39 MB, max=150.54 MB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=149, evicted=0, evictedPerRun=0.0
2016-12-22 11:43:41,485 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=159.41 KB, freeSize=150.39 MB, max=150.54 MB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=179, evicted=0, evictedPerRun=0.0
2016-12-22 11:44:51,222 INFO [timeline] timeline.HadoopTimelineMetricsSink: Unable to connect to collector, http://hdp-nn2.hostname:6188/ws/v1/timeline/metrics
This exceptions will be ignored for next 100 times
2016-12-22 11:44:51,223 WARN [timeline] timeline.HadoopTimelineMetricsSink: Unable to send metrics to collector by address:http://hdp-nn2.hostname:6188/ws/v1/timeline/metrics
2016-12-22 11:48:41,484 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=159.41 KB, freeSize=150.39 MB, max=150.54 MB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=209, evicted=0, evictedPerRun=0.0
2016-12-22 11:50:51,205 INFO [timeline] timeline.HadoopTimelineMetricsSink: Unable to connect to collector, http://hdp-nn2.hostname:6188/ws/v1/timeline/metrics
This exceptions will be ignored for next 100 times
2016-12-22 11:50:51,205 WARN [timeline] timeline.HadoopTimelineMetricsSink: Unable to send metrics to collector by address:http://hdp-nn2.hostname:6188/ws/v1/timeline/metrics
I removed the folder /var/lib/ambari-metrics-collector/hbase-tmp and restarted AMS as recommended here: https://community.hortonworks.com/articles/11805/how-to-solve-ambari-metrics-corrupted-data.html, but it did not help.
... View more
Labels:
- Labels:
-
Apache Ambari
10-22-2016
09:11 AM
got it, thanks!
... View more
10-21-2016
11:11 AM
3 Kudos
@rbiswas, @Lester Martin I tested 4 variants of partitioning on 6 queries:
Daily partitions (calday=2016-10-20)
Year-month partitions (year_month=2016-10)
Year partitions (year=2016)
No partitions (but 10 files with yearly data)
I created 4 tables following @rbiswas's recommendations. Here is yearly aggregate information, just to give you an idea of the scale of the data.
partition | size | records
---|---|---
year=2006 | 539.4 K | 12 217
year=2007 | 2.8 M | 75 584
year=2008 | 6.4 M | 155 850
year=2009 | 9.1 M | 228 247
year=2010 | 9.3 M | 225 357
year=2011 | 8.5 M | 196 280
year=2012 | 19.5 M | 448 145
year=2013 | 113.4 M | 2 494 787
year=2014 | 196.7 M | 4 038 632
year=2015 | 204.3 M | 4 047 002
year=2016 | 227.2 M | 4 363 214
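For reference, a minimal HiveQL sketch of how the yearly-partitioned ORC variant could be declared. This is illustrative only: apart from order_date, order_time, the yearly partition, the sort order and the bloom filter mentioned in this thread, the column names and exact table properties are assumptions, not the DDL I actually ran.
-- Illustrative sketch (assumed names and properties)
CREATE TABLE orders_by_year (
  order_date string,        -- e.g. '2016-10-20'
  order_time timestamp,     -- e.g. '2016-10-20 12:45:55'
  amount     double         -- placeholder payload column
)
PARTITIONED BY (`year` int) -- year=2006 ... year=2016
CLUSTERED BY (order_time) SORTED BY (order_time) INTO 1 BUCKETS
STORED AS ORC
TBLPROPERTIES (
  'orc.compress'             = 'ZLIB',
  'orc.create.index'         = 'true',
  'orc.bloom.filter.columns' = 'order_time',
  'orc.row.index.stride'     = '10000'
);
With a single bucket per partition, each year ends up as one sorted ORC file, which matches the one-file-per-year layout discussed in this thread.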
I ran every query 5 times, discarded the worst and best results, and took the average of the remaining three. The results are below. Obviously, daily partitioning is the worst case, but the choice among the remaining options is not so clear-cut; the results depend on the query. In the end I decided that yearly partitioning would be optimal in our case. @rbiswas, thanks for the idea! @rbiswas, I have a couple of questions:
1. Given that I have fewer than 10,000 records per day, would it be better to set orc.row.index.stride to less than 12000?
2. In my table I have the columns Order_date string (looks like '2016-10-20') and Order_time timestamp (looks like '2016-10-20 12:45:55'). The table is sorted by order_time as you recommended and has a bloom filter index. But the filter WHERE to_date(order_time) BETWEEN ... any period works 15-20% slower than WHERE order_date BETWEEN ... any period. I actually expected that using the column with the bloom filter would speed up query execution. Why did it not happen?
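To make question 2 concrete, the two filter shapes look roughly like this (table name and dates are placeholders, reusing the sketch above):
-- ~15-20% slower in my tests: the filter is an expression over order_time
SELECT count(*)
FROM orders_by_year
WHERE to_date(order_time) BETWEEN '2016-01-01' AND '2016-03-31';
-- faster: the filter is on the plain order_date column
SELECT count(*)
FROM orders_by_year
WHERE order_date BETWEEN '2016-01-01' AND '2016-03-31';
My guess, to be confirmed, is that the to_date() call turns the predicate into an expression that cannot be pushed down into ORC, so neither the min/max row-group statistics nor the bloom filter on order_time are used, while the plain order_date comparison can be pushed down.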
... View more
10-18-2016
10:58 AM
@rbiswas Thank you! It's an interesting idea. I'll test partitions by year (YYYY), by year and month as suggested by @Lester Martin (YYYY-MM), and daily (YYYY-MM-DD), and I'll share the results here. By the way, what would be the difference between your two approaches? Partitioning by year and compacting each year into one file both give the same 10 files.
... View more
10-17-2016
01:45 PM
@Lester Martin Thank you, I'll keep the monthly partition option (YYYY-MM) in reserve. It complicates queries, but if it's the only way, I'll have to use it.
... View more
10-17-2016
11:36 AM
@jramakrishnan Thanks. I have already read these links; there is no clear answer. What would be a good file format for small partitions: CSV, ORC, something else? HBase as an alternative metastore is fine, but my Hive 1.2.1 still uses MySQL.
There was also an idea about generating a hash from the date. I would be glad if someone explained this idea in detail.
... View more
10-17-2016
08:14 AM
3 Kudos
There is a lot of information about why you should avoid small files and a large number of partitions in Hive. But what if I can't avoid them? I have to store a Hive table with 10 years of history data. It currently contains 3710 daily partitions. Every partition is really small, from 80 to 15,000 records. In CSV format the partitions vary from 25 KB to 10 MB; in ORC format from 10 KB to 2 MB, though I don't think the ORC format would be effective at such a small size.
Queries against this table usually include a date or a range of dates, so daily partitioning is preferred. What would be the optimal approach (in terms of performance) for such a large amount of small data?
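To make the layout concrete, here is a minimal sketch of the kind of table I mean (calday is the partition column I use; the other names are placeholders):
-- Illustrative only: one small daily partition, roughly 80 to 15,000 rows each
CREATE TABLE history (
  event_time timestamp,
  payload    string
)
PARTITIONED BY (calday string)  -- e.g. calday='2016-10-17', ~3710 partitions so far
STORED AS ORC;
-- Typical query: a single date or a range of dates
SELECT count(*)
FROM history
WHERE calday BETWEEN '2016-10-01' AND '2016-10-17';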
... View more
Labels:
- Labels:
-
Apache Hive
09-28-2016
08:19 AM
2 Kudos
Finally, I solved this problem by copying the ZEPPELIN_HOME/incubator-zeppelin folder from the dev cluster (which I installed a month ago) to the production cluster. Now it works fine.
... View more
09-26-2016
03:56 PM
@pankaj singh thanks for the help. Following https://zeppelin.apache.org/docs/0.7.0-SNAPSHOT/install/upgrade.html, I stopped Zeppelin 0.6 from Ambari, stopped Zeppelin 0.7, copied the conf folder from /etc/zeppelin/2.5.0.0-1245/0 to /home/zeppelin_price/incubator-zeppelin/conf, and ran Zeppelin 0.7. Nothing changed. Still "Interpreter hive not found"; the logs are the same as I posted before.
... View more
09-26-2016
11:17 AM
1 Kudo
My HDP-2.5 cluster has Zeppelin 0.6 out of the box, and it works fine. But when I install an additional Zeppelin 0.7.0-SNAPSHOT (on the same datanode as Zeppelin 0.6) I get the error "Interpreter hive not found". It's not the tutorial code; I create a new notebook and try to run my own query. I followed this instruction: https://zeppelin.apache.org/docs/0.5.5-incubating/install/yarn_install.html. Using this instruction I successfully installed Zeppelin 0.7.0 on an HDP-2.4 cluster. Hadoop version 2.5.0.0-1245
Dependencies in JDBC interpreter:
org.apache.hive:hive-jdbc:2.0.1
org.apache.hadoop:hadoop-common:2.7.2
File hive-site.xml is copied from /etc/hive/conf/hive-site.xml to /home/zeppelin_price/incubator-zeppelin/conf
zeppelin-env.sh
export ZEPPELIN_PORT=8096
export JAVA_HOME=/usr/jdk64/jdk1.7.0_79
export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.5.0.0-1245"
export HADOOP_CONF_DIR=/etc/hadoop/conf
All other variables are commented out.
Log zeppelin-zeppelin_price-hdp-dn2.co.vectis.local.out:
ZEPPELIN_CLASSPATH: ::/home/zeppelin_price/incubator-zeppelin/zeppelin-server/target/lib/*:/home/zeppelin_price/incubator-zeppelin/zeppelin-zengine/target/lib/*:/home/zeppelin_price/incubator-zeppelin/zeppelin-interpreter/target/lib/*:/home/zeppelin_price/incubator-zeppelin/*::/home/zeppelin_price/incubator-zeppelin/conf:/home/zeppelin_price/incubator-zeppelin/zeppelin-interpreter/target/classes:/home/zeppelin_price/incubator-zeppelin/zeppelin-zengine/target/classes:/home/zeppelin_price/incubator-zeppelin/zeppelin-server/target/classes
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/zeppelin_price/incubator-zeppelin/zeppelin-server/target/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/zeppelin_price/incubator-zeppelin/zeppelin-server/target/lib/zeppelin-interpreter-0.7.0-SNAPSHOT.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/zeppelin_price/incubator-zeppelin/zeppelin-zengine/target/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/zeppelin_price/incubator-zeppelin/zeppelin-zengine/target/lib/zeppelin-interpreter-0.7.0-SNAPSHOT.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/zeppelin_price/incubator-zeppelin/zeppelin-interpreter/target/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Sep 26, 2016 1:19:29 PM com.sun.jersey.api.core.PackagesResourceConfig init
INFO: Scanning for root resource and provider classes in the packages:
org.apache.zeppelin.rest
Sep 26, 2016 1:19:29 PM com.sun.jersey.api.core.ScanningResourceConfig logClasses
INFO: Root resource classes found:
class org.apache.zeppelin.rest.ZeppelinRestApi
class org.apache.zeppelin.rest.ConfigurationsRestApi
class org.apache.zeppelin.rest.InterpreterRestApi
class org.apache.zeppelin.rest.NotebookRestApi
class org.apache.zeppelin.rest.CredentialRestApi
class org.apache.zeppelin.rest.LoginRestApi
class org.apache.zeppelin.rest.SecurityRestApi
class org.apache.zeppelin.rest.HeliumRestApi
Sep 26, 2016 1:19:29 PM com.sun.jersey.api.core.ScanningResourceConfig init
INFO: No provider classes found.
Sep 26, 2016 1:19:29 PM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
INFO: Initiating Jersey application, version 'Jersey: 1.13 06/29/2012 05:14 PM'
Sep 26, 2016 1:19:30 PM com.sun.jersey.spi.inject.Errors processErrorMessages
WARNING: The following warnings have been detected with resource and/or provider classes:
WARNING: A HTTP GET method, public javax.ws.rs.core.Response org.apache.zeppelin.rest.CredentialRestApi.getCredentials(java.lang.String) throws java.io.IOException,java.lang.IllegalArgumentException, should not consume any entity.
WARNING: A sub-resource method, public javax.ws.rs.core.Response org.apache.zeppelin.rest.NotebookRestApi.createNote(java.lang.String) throws java.io.IOException, with URI template, "/", is treated as a resource method
WARNING: A sub-resource method, public javax.ws.rs.core.Response org.apache.zeppelin.rest.NotebookRestApi.getNotebookList() throws java.io.IOException, with URI template, "/", is treated as a resource method
WARNING: A HTTP GET method, public javax.ws.rs.core.Response org.apache.zeppelin.rest.InterpreterRestApi.listInterpreter(java.lang.String), should not consume any entity.
Log zeppelin-zeppelin_price-hdp-dn2.co.vectis.local.log:
ERROR [2016-09-26 13:21:27,987] ({qtp396501207-56} NotebookServer.java[runParagraph]:1154) - Exception from run
org.apache.zeppelin.interpreter.InterpreterException: paragraph_1474884934206_-1103344564's Interpreter hive not found
at org.apache.zeppelin.notebook.Note.run(Note.java:489)
at org.apache.zeppelin.socket.NotebookServer.runParagraph(NotebookServer.java:1152)
at org.apache.zeppelin.socket.NotebookServer.onMessage(NotebookServer.java:195)
at org.apache.zeppelin.socket.NotebookSocket.onWebSocketText(NotebookSocket.java:56)
at org.eclipse.jetty.websocket.common.events.JettyListenerEventDriver.onTextMessage(JettyListenerEventDriver.java:128)
at org.eclipse.jetty.websocket.common.message.SimpleTextMessage.messageComplete(SimpleTextMessage.java:69)
at org.eclipse.jetty.websocket.common.events.AbstractEventDriver.appendMessage(AbstractEventDriver.java:65)
at org.eclipse.jetty.websocket.common.events.JettyListenerEventDriver.onTextFrame(JettyListenerEventDriver.java:122)
at org.eclipse.jetty.websocket.common.events.AbstractEventDriver.incomingFrame(AbstractEventDriver.java:161)
at org.eclipse.jetty.websocket.common.WebSocketSession.incomingFrame(WebSocketSession.java:309)
at org.eclipse.jetty.websocket.common.extensions.ExtensionStack.incomingFrame(ExtensionStack.java:214)
at org.eclipse.jetty.websocket.common.Parser.notifyFrame(Parser.java:220)
at org.eclipse.jetty.websocket.common.Parser.parse(Parser.java:258)
at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.readParse(AbstractWebSocketConnection.java:632)
at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onFillable(AbstractWebSocketConnection.java:480)
at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)
... View more
Labels:
09-12-2016
09:44 AM
4 Kudos
Solved it! The problem was with the parameters hive.llap.daemon.yarn.container.mb and llap_heap_size.
Ambari sets the default value of llap_heap_size to about 96% of hive.llap.daemon.yarn.container.mb (when I move the "% of Cluster Capacity" slider), although it should be about 80%. For example, an LLAP container of 30720 MB was getting an Xmx of 29696 MB (~96%), where roughly 24576 MB (80%) would be appropriate. Setting the correct parameters manually allowed HiveServer2 Interactive to start.
... View more
09-07-2016
06:37 PM
2 Kudos
On a freshly installed HDP-2.5 I can't start HiveServer2 Interactive. The cluster is highly available. I tried to install HiveServer2 Interactive on both the active NN and the standby NN, but with the same unsuccessful result. I didn't find any obvious exceptions in the logs. Here is the stderr:
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_server_interactive.py", line 512, in check_llap_app_status
status = do_retries()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/decorator.py", line 55, in wrapper
return function(*args, **kwargs)
File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_server_interactive.py", line 505, in do_retries
raise Fail(status_str)
Fail: LLAP app 'llap0' current state is COMPLETE.
2016-09-07 20:37:48,705 - LLAP app 'llap0' deployment unsuccessful.
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_server_interactive.py", line 535, in <module>
HiveServerInteractive().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in execute
method(env)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 720, in restart
self.start(env, upgrade_type=upgrade_type)
File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_server_interactive.py", line 123, in start
raise Fail("Skipping START of Hive Server Interactive since LLAP app couldn't be STARTED.")
resource_management.core.exceptions.Fail: Skipping START of Hive Server Interactive since LLAP app couldn't be STARTED.
The stdout is too long, so here are some excerpts:
2016-09-07 20:31:49,638 - Starting LLAP
2016-09-07 20:31:49,643 - Command: /usr/hdp/current/hive-server2-hive2/bin/hive --service llap --instances 1 --slider-am-container-mb 5120 --size 30720m --cache 0m --xmx 29696m --loglevel INFO --output /var/lib/ambari-agent/tmp/llap-slider2016-09-07_17-31-49 --args " -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:TLABSize=8m -XX:+ResizeTLAB -XX:+UseNUMA -XX:+AggressiveOpts -XX:MetaspaceSize=1024m -XX:InitiatingHeapOccupancyPercent=80 -XX:MaxGCPauseMillis=200"
2016-09-07 20:31:49,643 - checked_call['/usr/hdp/current/hive-server2-hive2/bin/hive --service llap --instances 1 --slider-am-container-mb 5120 --size 30720m --cache 0m --xmx 29696m --loglevel INFO --output /var/lib/ambari-agent/tmp/llap-slider2016-09-07_17-31-49 --args " -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:TLABSize=8m -XX:+ResizeTLAB -XX:+UseNUMA -XX:+AggressiveOpts -XX:MetaspaceSize=1024m -XX:InitiatingHeapOccupancyPercent=80 -XX:MaxGCPauseMillis=200"'] {'logoutput': True, 'user': 'hive', 'stderr': -1}
which: no hbase in (/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/var/lib/ambari-agent:/var/lib/ambari-agent)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.5.0.0-1245/hive2/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.5.0.0-1245/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
INFO cli.LlapServiceDriver: LLAP service driver invoked with arguments=--hiveconf
INFO conf.HiveConf: Found configuration file file:/etc/hive2/2.5.0.0-1245/0/conf.server/hive-site.xml
WARN conf.HiveConf: HiveConf of name hive.llap.daemon.allow.permanent.fns does not exist
WARN cli.LlapServiceDriver: Ignoring unknown llap server parameter: [hive.aux.jars.path]
WARN conf.HiveConf: HiveConf of name hive.llap.daemon.allow.permanent.fns does not exist
INFO metastore.HiveMetaStore: 0: Opening raw store with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
INFO metastore.ObjectStore: ObjectStore, initialize called
WARN conf.HiveConf: HiveConf of name hive.llap.daemon.allow.permanent.fns does not exist
INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,Database,Type,FieldSchema,Order"
INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is MYSQL
INFO metastore.ObjectStore: Initialized ObjectStore
INFO metastore.HiveMetaStore: Added admin role in metastore
INFO metastore.HiveMetaStore: Added public role in metastore
INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
INFO metastore.HiveMetaStore: 0: get_all_functions
INFO HiveMetaStore.audit: ugi=hive ip=unknown-ip-addr cmd=get_all_functions
WARN cli.LlapServiceDriver: Java versions might not match : JAVA_HOME=[/usr/jdk64/jdk1.8.0_77],process jre=[/usr/jdk64/jdk1.8.0_77/jre]
INFO cli.LlapServiceDriver: Using [/usr/jdk64/jdk1.8.0_77] for JAVA_HOME
INFO cli.LlapServiceDriver: Copied hadoop metrics2 properties file from file:/etc/hive2/2.5.0.0-1245/0/conf.server/hadoop-metrics2-llapdaemon.properties
INFO cli.LlapServiceDriver: LLAP service driver finished
Prepared /var/lib/ambari-agent/tmp/llap-slider2016-09-07_17-31-49/run.sh for running LLAP on Slider
2016-09-07 20:32:18,650 - checked_call returned (0, 'Prepared /var/lib/ambari-agent/tmp/llap-slider2016-09-07_17-31-49/run.sh for running LLAP on Slider', 'which: no hbase in (/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/var/lib/ambari-agent:/var/lib/ambari-agent)\nSLF4J: Class path contains multiple SLF4J bindings.\nSLF4J: Found binding in [jar:file:/usr/hdp/2.5.0.0-1245/hive2/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: Found binding in [jar:file:/usr/hdp/2.5.0.0-1245/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.\nSLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]\nINFO cli.LlapServiceDriver: LLAP service driver invoked with arguments=--hiveconf\nINFO conf.HiveConf: Found configuration file file:/etc/hive2/2.5.0.0-1245/0/conf.server/hive-site.xml\nWARN conf.HiveConf: HiveConf of name hive.llap.daemon.allow.permanent.fns does not exist\nWARN cli.LlapServiceDriver: Ignoring unknown llap server parameter: [hive.aux.jars.path]\nWARN conf.HiveConf: HiveConf of name hive.llap.daemon.allow.permanent.fns does not exist\nINFO metastore.HiveMetaStore: 0: Opening raw store with implementation class:org.apache.hadoop.hive.metastore.ObjectStore\nINFO metastore.ObjectStore: ObjectStore, initialize called\nWARN conf.HiveConf: HiveConf of name hive.llap.daemon.allow.permanent.fns does not exist\nINFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,Database,Type,FieldSchema,Order"\nINFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is MYSQL\nINFO metastore.ObjectStore: Initialized ObjectStore\nINFO metastore.HiveMetaStore: Added admin role in metastore\nINFO metastore.HiveMetaStore: Added public role in metastore\nINFO metastore.HiveMetaStore: No user is added in admin role, since config is empty\nINFO metastore.HiveMetaStore: 0: get_all_functions\nINFO HiveMetaStore.audit: ugi=hive\tip=unknown-ip-addr\tcmd=get_all_functions\t\nWARN cli.LlapServiceDriver: Java versions might not match : JAVA_HOME=[/usr/jdk64/jdk1.8.0_77],process jre=[/usr/jdk64/jdk1.8.0_77/jre]\nINFO cli.LlapServiceDriver: Using [/usr/jdk64/jdk1.8.0_77] for JAVA_HOME\nINFO cli.LlapServiceDriver: Copied hadoop metrics2 properties file from file:/etc/hive2/2.5.0.0-1245/0/conf.server/hadoop-metrics2-llapdaemon.properties\nINFO cli.LlapServiceDriver: LLAP service driver finished')
2016-09-07 20:32:18,651 - Run file path: /var/lib/ambari-agent/tmp/llap-slider2016-09-07_17-31-49/run.sh
2016-09-07 20:32:18,652 - Execute['/var/lib/ambari-agent/tmp/llap-slider2016-09-07_17-31-49/run.sh'] {'user': 'hive'}
2016-09-07 20:32:48,625 - Submitted LLAP app name : llap0
2016-09-07 20:32:48,627 - checked_call['/usr/hdp/current/hive-server2-hive2/bin/hive --service llapstatus --name llap0 --findAppTimeout 0'] {'logoutput': False, 'user': 'hive', 'stderr': -1}
2016-09-07 20:32:59,607 - checked_call returned (0, '{\n "amInfo" : {\n "appName" : "llap0",\n "appType" : "org-apache-slider",\n "appId" : "application_1473264739795_0004"\n },\n "state" : "LAUNCHING",\n "appStartTime" : 1473269567664\n}', 'which: no hbase in (/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/var/lib/ambari-agent:/var/lib/ambari-agent)\nSLF4J: Class path contains multiple SLF4J bindings.\nSLF4J: Found binding in [jar:file:/usr/hdp/2.5.0.0-1245/hive2/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: Found binding in [jar:file:/usr/hdp/2.5.0.0-1245/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.\nSLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]\nINFO cli.LlapStatusServiceDriver: LLAP status invoked with arguments = --hiveconf\nINFO conf.HiveConf: Found configuration file file:/etc/hive2/2.5.0.0-1245/0/conf.server/hive-site.xml\nWARN conf.HiveConf: HiveConf of name hive.llap.daemon.allow.permanent.fns does not exist\nINFO impl.TimelineClientImpl: Timeline service address: http://hdp-nn1.co.vectis.local:8188/ws/v1/timeline/\nINFO client.AHSProxy: Connecting to Application History server at hdp-nn1.co.vectis.local/10.255.242.180:10200\nINFO cli.LlapStatusServiceDriver: LLAP status finished')
2016-09-07 20:32:59,608 - Received 'llapstatus' command 'output' : {
"amInfo" : {
"appName" : "llap0",
"appType" : "org-apache-slider",
"appId" : "application_1473264739795_0004"
},
"state" : "LAUNCHING",
"appStartTime" : 1473269567664
}
2016-09-07 20:32:59,608 - Marker index for start of JSON data for 'llapsrtatus' comamnd : 0
2016-09-07 20:32:59,610 - LLAP app 'llap0' current state is LAUNCHING.
2016-09-07 20:32:59,611 - Will retry 19 time(s), caught exception: LLAP app 'llap0' current state is LAUNCHING.. Sleeping for 2 sec(s)
2016-09-07 20:33:01,614 - checked_call['/usr/hdp/current/hive-server2-hive2/bin/hive --service llapstatus --name llap0 --findAppTimeout 0'] {'logoutput': False, 'user': 'hive', 'stderr': -1}
2016-09-07 20:33:15,295 - checked_call returned (0, '{\n "amInfo" : {\n "appName" : "llap0",\n "appType" : "org-apache-slider",\n "appId" : "application_1473264739795_0004",\n "containerId" : "container_e12_1473264739795_0004_01_000001",\n "hostname" : "hdp-dn2.co.vectis.local",\n "amWebUrl" : "http://hdp-dn2.co.vectis.local:40485/"\n },\n "state" : "LAUNCHING",\n "originalConfigurationPath" : "hdfs://prodcluster/user/hive/.slider/cluster/llap0/snapshot",\n "generatedConfigurationPath" : "hdfs://prodcluster/user/hive/.slider/cluster/llap0/generated",\n "desiredInstances" : 1,\n "liveInstances" : 0,\n "appStartTime" : 1473269583908\n}', 'which: no hbase in (/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/var/lib/ambari-agent:/var/lib/ambari-agent)\nSLF4J: Class path contains multiple SLF4J bindings.\nSLF4J: Found binding in [jar:file:/usr/hdp/2.5.0.0-1245/hive2/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: Found binding in [jar:file:/usr/hdp/2.5.0.0-1245/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.\nSLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]\nINFO cli.LlapStatusServiceDriver: LLAP status invoked with arguments = --hiveconf\nINFO conf.HiveConf: Found configuration file file:/etc/hive2/2.5.0.0-1245/0/conf.server/hive-site.xml\nWARN conf.HiveConf: HiveConf of name hive.llap.daemon.allow.permanent.fns does not exist\nINFO impl.TimelineClientImpl: Timeline service address: http://hdp-nn1.co.vectis.local:8188/ws/v1/timeline/\nINFO client.AHSProxy: Connecting to Application History server at hdp-nn1.co.vectis.local/10.255.242.180:10200\nWARN curator.CuratorZookeeperClient: session timeout [10000] is less than connection timeout [15000]\nINFO impl.LlapZookeeperRegistryImpl: Llap Zookeeper Registry is enabled with registryid: llap0\nINFO impl.LlapRegistryService: Using LLAP registry type org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl@4e6f2bb5\nINFO impl.LlapZookeeperRegistryImpl: UGI security is not enabled, or non-daemon environment. Skipping setting up ZK auth.\nINFO imps.CuratorFrameworkImpl: Starting\nINFO impl.LlapRegistryService: Using LLAP registry (client) type: Service LlapRegistryService in state LlapRegistryService: STARTED\nINFO state.ConnectionStateManager: State change: CONNECTED\nINFO cli.LlapStatusServiceDriver: No information found in the LLAP registry\nINFO cli.LlapStatusServiceDriver: LLAP status finished')
2016-09-07 20:33:15,295 - Received 'llapstatus' command 'output' : {
"amInfo" : {
"appName" : "llap0",
"appType" : "org-apache-slider",
"appId" : "application_1473264739795_0004",
"containerId" : "container_e12_1473264739795_0004_01_000001",
"hostname" : "hdp-dn2.co.vectis.local",
"amWebUrl" : "http://hdp-dn2.co.vectis.local:40485/"
},
"state" : "LAUNCHING",
"originalConfigurationPath" : "hdfs://prodcluster/user/hive/.slider/cluster/llap0/snapshot",
"generatedConfigurationPath" : "hdfs://prodcluster/user/hive/.slider/cluster/llap0/generated",
"desiredInstances" : 1,
"liveInstances" : 0,
"appStartTime" : 1473269583908
}
... View more
Labels:
- Labels:
-
Apache Hive
05-03-2016
12:43 PM
Hi @Ian Roberts, thanks for the clarification.
... View more
04-04-2016
02:30 PM
If I use user-limit-factor=2.5, then why do I need to set yarn.scheduler.capacity.root.it.capacity=40? I could set yarn.scheduler.capacity.root.it.capacity=100 and the result would be the same.
Is yarn.scheduler.capacity.root.it.capacity just a lower limit?
... View more
04-04-2016
10:05 AM
Hi nmaillard, I tried that already. yarn.scheduler.capacity.root.it.user-limit-factor=2
yarn.scheduler.capacity.root.price.user-limit-factor=1
In this case, ituser1 picks up 63 containers, but if priceuser1 arrives at this time, ituser1 does not give up the vacant containers for him; it continues to use them for itself. I expected ituser1 to release 31 containers for priceuser1, but that did not happen. I guess this is because ituser1 thinks it is eligible for 63 containers instead of 32.
... View more