Member since: 01-09-2016
Posts: 56
Kudos Received: 44
Solutions: 4
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 9598 | 12-27-2016 05:00 AM |
 | 2320 | 09-28-2016 08:19 AM |
 | 15849 | 09-12-2016 09:44 AM |
 | 7221 | 02-09-2016 10:48 AM |
07-11-2017
12:11 AM
@Eugene Koifman Thanks for the clarification!
04-20-2017
01:00 PM
14 Kudos
After an unsuccessful upgrade, I was forced to completely remove HDP 2.4 and Ambari 2.5 and install HDP 2.6. I wanted to avoid reinstalling the OS, so I followed this instruction. Unfortunately, it is not complete: for a problem-free installation of HDP 2.6 you also need to do things like removing the service users and cleaning up cron. So here is my plan of action:
1. Stop all services in Ambari or kill them. In my case, Ambari damaged its database during the downgrade and could not start, so I manually killed all the processes on all nodes (a scripted variant is sketched below):
ps -u hdfs (see the list of all service users below)
kill PID
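A minimal scripted variant of this kill step, assuming pkill is available and that the service-user list below (it mirrors step 15) matches your nodes:
for u in hdfs yarn mapred hive hcat hbase zookeeper oozie falcon flume kafka knox ranger spark sqoop storm tez accumulo ams ambari-qa zeppelin; do
  pkill -9 -u "$u"    # hard-kill every process still owned by that service user
done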
2. Run the python cleanup script on all cluster nodes:
python /usr/lib/python2.6/site-packages/ambari_agent/HostCleanup.py --silent --skip=users
3. Remove Hadoop packages on all nodes (a loop variant is sketched after the list):
yum remove hive\*
yum remove oozie\*
yum remove pig\*
yum remove zookeeper\*
yum remove tez\*
yum remove hbase\*
yum remove ranger\*
yum remove knox\*
yum remove storm\*
yum remove accumulo\*
yum remove falcon\*
yum remove ambari-metrics-hadoop-sink
yum remove smartsense-hst
yum remove slider_2_4_2_0_258
yum remove ambari-metrics-monitor
yum remove spark2_2_5_3_0_37-yarn-shuffle
yum remove spark_2_5_3_0_37-yarn-shuffle
yum remove ambari-infra-solr-client
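If you prefer a loop, a rough equivalent of the yum commands above (the -y flag is my addition; drop it to confirm each removal):
for pkg in hive oozie pig zookeeper tez hbase ranger knox storm accumulo falcon \
           ambari-metrics-hadoop-sink smartsense-hst slider_2_4_2_0_258 ambari-metrics-monitor \
           spark2_2_5_3_0_37-yarn-shuffle spark_2_5_3_0_37-yarn-shuffle ambari-infra-solr-client; do
  yum -y remove "${pkg}*"    # the trailing * also catches versioned package names
done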
4. Remove ambari-server (on the Ambari host) and ambari-agent (on all nodes):
ambari-server stop
ambari-agent stop
yum erase ambari-server
yum erase ambari-agent
5. Remove repositories on all nodes (a quick check is sketched after the commands):
rm -rf /etc/yum.repos.d/ambari.repo /etc/yum.repos.d/HDP*
yum clean all
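As an optional sanity check of my own, the HDP and Ambari repos should no longer show up in:
yum repolist enabled | grep -iE 'hdp|ambari'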
6. Remove log folders on all nodes:
rm -rf /var/log/ambari-agent
rm -rf /var/log/ambari-metrics-grafana
rm -rf /var/log/ambari-metrics-monitor
rm -rf /var/log/ambari-server/
rm -rf /var/log/falcon
rm -rf /var/log/flume
rm -rf /var/log/hadoop
rm -rf /var/log/hadoop-mapreduce
rm -rf /var/log/hadoop-yarn
rm -rf /var/log/hive
rm -rf /var/log/hive-hcatalog
rm -rf /var/log/hive2
rm -rf /var/log/hst
rm -rf /var/log/knox
rm -rf /var/log/oozie
rm -rf /var/log/solr
rm -rf /var/log/zookeeper
7. Remove Hadoop folders, including HDFS data, on all nodes:
rm -rf /hadoop/*
rm -rf /hdfs/hadoop
rm -rf /hdfs/lost+found
rm -rf /hdfs/var
rm -rf /local/opt/hadoop
rm -rf /tmp/hadoop
rm -rf /usr/bin/hadoop
rm -rf /usr/hdp
rm -rf /var/hadoop
8. Remove config folders on all nodes:
rm -rf /etc/ambari-agent
rm -rf /etc/ambari-metrics-grafana
rm -rf /etc/ambari-server
rm -rf /etc/ams-hbase
rm -rf /etc/falcon
rm -rf /etc/flume
rm -rf /etc/hadoop
rm -rf /etc/hadoop-httpfs
rm -rf /etc/hbase
rm -rf /etc/hive
rm -rf /etc/hive-hcatalog
rm -rf /etc/hive-webhcat
rm -rf /etc/hive2
rm -rf /etc/hst
rm -rf /etc/knox
rm -rf /etc/livy
rm -rf /etc/mahout
rm -rf /etc/oozie
rm -rf /etc/phoenix
rm -rf /etc/pig
rm -rf /etc/ranger-admin
rm -rf /etc/ranger-usersync
rm -rf /etc/spark2
rm -rf /etc/tez
rm -rf /etc/tez_hive2
rm -rf /etc/zookeeper
9. Remove PIDs on all nodes:
rm -rf /var/run/ambari-agent
rm -rf /var/run/ambari-metrics-grafana
rm -rf /var/run/ambari-server
rm -rf /var/run/falcon
rm -rf /var/run/flume
rm -rf /var/run/hadoop
rm -rf /var/run/hadoop-mapreduce
rm -rf /var/run/hadoop-yarn
rm -rf /var/run/hbase
rm -rf /var/run/hive
rm -rf /var/run/hive-hcatalog
rm -rf /var/run/hive2
rm -rf /var/run/hst
rm -rf /var/run/knox
rm -rf /var/run/oozie
rm -rf /var/run/webhcat
rm -rf /var/run/zookeeper
10. Remove library folders on all nodes:
rm -rf /usr/lib/ambari-agent
rm -rf /usr/lib/ambari-infra-solr-client
rm -rf /usr/lib/ambari-metrics-hadoop-sink
rm -rf /usr/lib/ambari-metrics-kafka-sink
rm -rf /usr/lib/ambari-server-backups
rm -rf /usr/lib/ams-hbase
rm -rf /usr/lib/mysql
rm -rf /var/lib/ambari-agent
rm -rf /var/lib/ambari-metrics-grafana
rm -rf /var/lib/ambari-server
rm -rf /var/lib/flume
rm -rf /var/lib/hadoop-hdfs
rm -rf /var/lib/hadoop-mapreduce
rm -rf /var/lib/hadoop-yarn
rm -rf /var/lib/hive2
rm -rf /var/lib/knox
rm -rf /var/lib/smartsense
rm -rf /var/lib/storm
11. Clean the /var/tmp folder on all nodes:
rm -rf /var/tmp/*
12. Delete the HST jobs from cron on all nodes, i.e. remove these two entries (a one-liner is sketched after them):
0 * * * * /usr/hdp/share/hst/bin/hst-scheduled-capture.sh sync
0 2 * * 0 /usr/hdp/share/hst/bin/hst-scheduled-capture.sh
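A non-interactive way to drop those two entries, assuming they live in root's crontab (check with crontab -l first):
crontab -l | grep -v 'hst-scheduled-capture.sh' | crontab -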
13. Remove databases. I removed the MySQL and Postgres instances so that Ambari would install and configure fresh databases:
yum remove mysql mysql-server
yum erase postgresql
rm -rf /var/lib/pgsql
rm -rf /var/lib/mysql
14. Remove symlinks on all nodes. Especially check the folders /usr/sbin and /usr/lib/python2.6/site-packages (a find sketch for spotting dangling symlinks follows the list):
cd /usr/bin
rm -rf accumulo
rm -rf atlas-start
rm -rf atlas-stop
rm -rf beeline
rm -rf falcon
rm -rf flume-ng
rm -rf hbase
rm -rf hcat
rm -rf hdfs
rm -rf hive
rm -rf hiveserver2
rm -rf kafka
rm -rf mahout
rm -rf mapred
rm -rf oozie
rm -rf oozied.sh
rm -rf phoenix-psql
rm -rf phoenix-queryserver
rm -rf phoenix-sqlline
rm -rf phoenix-sqlline-thin
rm -rf pig
rm -rf python-wrap
rm -rf ranger-admin
rm -rf ranger-admin-start
rm -rf ranger-admin-stop
rm -rf ranger-kms
rm -rf ranger-usersync
rm -rf ranger-usersync-start
rm -rf ranger-usersync-stop
rm -rf slider
rm -rf sqoop
rm -rf sqoop-codegen
rm -rf sqoop-create-hive-table
rm -rf sqoop-eval
rm -rf sqoop-export
rm -rf sqoop-help
rm -rf sqoop-import
rm -rf sqoop-import-all-tables
rm -rf sqoop-job
rm -rf sqoop-list-databases
rm -rf sqoop-list-tables
rm -rf sqoop-merge
rm -rf sqoop-metastore
rm -rf sqoop-version
rm -rf storm
rm -rf storm-slider
rm -rf worker-lanucher
rm -rf yarn
rm -rf zookeeper-client
rm -rf zookeeper-server
rm -rf zookeeper-server-cleanup
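Since the packages are already gone at this point, most leftovers here are dangling symlinks; with GNU find (an assumption about your distro) you can spot them like this:
find /usr/bin /usr/sbin /usr/lib/python2.6/site-packages -xtype l    # lists symlinks whose target no longer exists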
15. Remove service users on all nodes (a loop variant follows the list):
userdel -r accumulo
userdel -r ambari-qa
userdel -r ams
userdel -r falcon
userdel -r flume
userdel -r hbase
userdel -r hcat
userdel -r hdfs
userdel -r hive
userdel -r kafka
userdel -r knox
userdel -r mapred
userdel -r oozie
userdel -r ranger
userdel -r spark
userdel -r sqoop
userdel -r storm
userdel -r tez
userdel -r yarn
userdel -r zeppelin
userdel -r zookeeper
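The same step as a loop, if you prefer (userdel -r also removes the home directories; the groupdel line is my addition, in case groups of the same name are left behind):
for u in accumulo ambari-qa ams falcon flume hbase hcat hdfs hive kafka knox mapred oozie ranger spark sqoop storm tez yarn zeppelin zookeeper; do
  userdel -r "$u"
  groupdel "$u" 2>/dev/null    # ignore the error if no such group exists
done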
16. Run the find searches below on all nodes (a quoted loop is sketched after the list). You will definitely find several more files/folders; remove them.
find / -name *ambari*
find / -name *accumulo*
find / -name *atlas*
find / -name *beeline*
find / -name *falcon*
find / -name *flume*
find / -name *hadoop*
find / -name *hbase*
find / -name *hcat*
find / -name *hdfs*
find / -name *hdp*
find / -name *hive*
find / -name *hiveserver2*
find / -name *kafka*
find / -name *mahout*
find / -name *mapred*
find / -name *oozie*
find / -name *phoenix*
find / -name *pig*
find / -name *ranger*
find / -name *slider*
find / -name *sqoop*
find / -name *storm*
find / -name *yarn*
find / -name *zookeeper*
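If you script these searches, quote the patterns so the shell does not expand them against the current directory; a sketch:
for name in ambari accumulo atlas beeline falcon flume hadoop hbase hcat hdfs hdp hive hiveserver2 kafka mahout mapred oozie phoenix pig ranger slider sqoop storm yarn zookeeper; do
  find / -name "*${name}*" 2>/dev/null
done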
17. Reboot all nodes:
reboot
04-19-2017
11:04 AM
2 Kudos
I'm trying to copy a transactional table from a production cluster (HDP 2.5) to a dev cluster (HDP 2.6).
I set these ACID settings in the dev cluster:
hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
hive.support.concurrency=true
hive.enforce.bucketing=true
hive.exec.dynamic.partition.mode=nonstrict
hive.compactor.initiator.on=true
hive.compactor.worker.threads=3
Then I import the table from prod to dev:
hive> export table hana.easy_check to 'export/easy_check';
hadoop distcp -prbugp hdfs://hdp-nn1:8020/user/hive/export/easy_check/ hdfs://dev-nn2:8020/user/hive/export/
hive> import from 'export/easy_check';
However, when I run any SQL query on this table in the dev cluster, I get an error:
2017-04-19 11:08:33,879 [ERROR] [Dispatcher thread {Central}] |impl.VertexImpl|: Vertex Input: easy_check initializer failed, vertex=vertex_1492584180580_0005_1_00 [Map 1]
org.apache.tez.dag.app.dag.impl.AMUserCodeException: java.lang.RuntimeException: serious problem
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallback.onFailure(RootInputInitializerManager.java:319)
at com.google.common.util.concurrent.Futures$4.run(Futures.java:1140)
at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
at com.google.common.util.concurrent.ExecutionList$RunnableExecutorPair.execute(ExecutionList.java:150)
at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:135)
at com.google.common.util.concurrent.ListenableFutureTask.done(ListenableFutureTask.java:91)
at java.util.concurrent.FutureTask.finishCompletion(FutureTask.java:384)
at java.util.concurrent.FutureTask.setException(FutureTask.java:251)
at java.util.concurrent.FutureTask.run(FutureTask.java:271)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: serious problem
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1258)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1285)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:307)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:409)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:155)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:273)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:266)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:266)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
... 3 more
Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Not enough history available for (0,x). Oldest available base: hdfs://development/apps/hive/warehouse/hana.db/easy_check/ym=2017-01/base_0001497
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1235)
... 15 more
Caused by: java.io.IOException: Not enough history available for (0,x). Oldest available base: hdfs://development/apps/hive/warehouse/hana.db/easy_check/ym=2017-01/base_0001497
at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:594)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.callInternal(OrcInputFormat.java:773)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.access$600(OrcInputFormat.java:738)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator$1.run(OrcInputFormat.java:763)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator$1.run(OrcInputFormat.java:760)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:760)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:738)
... 4 more
What is wrong? Both clusters run Hive 1.2.1. Here are the table details:
# Detailed Table Information
Database: hana
Owner: hive
CreateTime: Wed Apr 19 13:27:00 MSK 2017
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://development/apps/hive/warehouse/hana.db/easy_check
Table Type: MANAGED_TABLE
Table Parameters:
NO_AUTO_COMPACTION false
compactor.mapreduce.map.memory.mb 2048
compactorthreshold.hive.compactor.delta.num.threshold 4
compactorthreshold.hive.compactor.delta.pct.threshold 0.3
last_modified_by hive
last_modified_time 1489647024
orc.bloom.filter.columns calday, request, material
orc.compress ZLIB
orc.compress.size 262144
orc.create.index true
orc.row.index.stride 5000
orc.stripe.size 67108864
transactional true
transient_lastDdlTime 1492597620
# Storage Information
SerDe Library: org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Compressed: No
Num Buckets: 1
Bucket Columns: [material]
Sort Columns: []
Storage Desc Params:
serialization.format 1
Labels:
- Apache Hive
02-03-2017
07:14 AM
Solved it! The sick JN didn't stop when I stopped it in Ambari, or even when I stopped HDFS in Ambari. I killed the JN process manually, replaced its data with the data from the healthy JN, and started HDFS. Now it works! 🙂
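For anyone hitting the same thing, a rough sketch of the replacement I did (the healthy host name and the ownership are assumptions here; the journal path is the one from my cluster, adjust to yours):
# on the sick JournalNode, after killing its process
mv /hadoop/hdfs/journal/devcluster/current /hadoop/hdfs/journal/devcluster/current.bad
rsync -a <healthy-jn-host>:/hadoop/hdfs/journal/devcluster/current /hadoop/hdfs/journal/devcluster/
chown -R hdfs:hadoop /hadoop/hdfs/journal/devcluster/current    # match the ownership shown on the healthy node
# then start HDFS from Ambari again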
02-02-2017
08:59 AM
Hi @Brandon Wilson Your solution works perfectly, but only if the "edits_inprogress_" file has the same name on both JournalNodes (JN). In the case of my devcluster, I did not deal with the problem for two months. During that time the healthy JN created a new "edits_inprogress_" file, but the sick JN still asks for the old "edits_inprogress_" file. I did all 4 steps of your algorithm, but the sick JN asks for the old file again. The content of /hadoop/hdfs/journal/devcluster/current is the same on both nodes. What to do?
Log of the healthy JN (edits_inprogress_0000000000016172345):
2017-02-02 10:15:12,513 INFO namenode.FileJournalManager (FileJournalManager.java:finalizeLogSegment(133)) - Finalizing edits file /hadoop/hdfs/journal/devcluster/current/edits_inprogress_0000000000016172345 -> /hadoop/hdfs/journal/devcluster/current/edits_0000000000016172345-0000000000016172394
Log of the sick JN (edits_inprogress_0000000000011766543):
2017-02-02 10:15:57,744 WARN namenode.FSImage (EditLogFileInputStream.java:scanEditLog(350)) - Caught exception after scanning through 0 ops from /hadoop/hdfs/journal/devcluster/current/edits_inprogress_0000000000011766543 while determining its valid length. Position was 1036288
java.io.IOException: Can't scan a pre-transactional edit log.
12-27-2016
05:00 AM
Hi @Aravindan Vijayan Yesterday Ambari Metrics suddenly started working (and still works). The only thing I changed yesterday was installing Apache Atlas, which required restarting almost all components; maybe that helped.
Thanks for your assistance!
12-26-2016
05:49 AM
Hi @Aravindan Vijayan I have 7 nodes (2 nn + 5 dn). Response to GET calls http://<METRICS_COLLECTOR_HOST>:6188/ws/v1/timeline/metrics/metadata (it's too long, I cut it)
{"type":"COUNTER","seriesStartTime":1482480880891,"metricname":"regionserver.WAL.rollRequest","supportsAggregation":true},{"type":"COUNTER","seriesStartTime":1482480880869,"metricname":"jvm.Master.JvmMetrics.GcCount","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880889,"metricname":"master.FileSystem.MetaHlogSplitSize_99th_percentile","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880889,"metricname":"master.FileSystem.HlogSplitSize_98th_percentile","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880874,"metricname":"master.Master.QueueCallTime_median","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880889,"metricname":"master.FileSystem.HlogSplitSize_max","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880894,"metricname":"master.Balancer.BalancerCluster_median","supportsAggregation":true},{"type":"COUNTER","seriesStartTime":1482480880896,"metricname":"metricssystem.MetricsSystem.PublishNumOps","supportsAggregation":true},{"type":"COUNTER","seriesStartTime":1482480880874,"metricname":"master.Master.exceptions.FailedSanityCheckException","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880869,"metricname":"jvm.Master.JvmMetrics.MemHeapUsedM","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880894,"metricname":"master.AssignmentManger.Assign_mean","supportsAggregation":true},{"type":"COUNTER","seriesStartTime":1482480880896,"metricname":"metricssystem.MetricsSystem.Sink_timelineDropped","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880889,"metricname":"master.FileSystem.MetaHlogSplitTime_95th_percentile","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880894,"metricname":"master.AssignmentManger.BulkAssign_95th_percentile","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880869,"metricname":"jvm.Master.JvmMetrics.MemNonHeapUsedM","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880894,"metricname":"master.AssignmentManger.Assign_99.9th_percentile","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880874,"metricname":"master.Master.RequestSize_mean","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880874,"metricname":"master.Master.RequestSize_min","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880894,"metricname":"master.AssignmentManger.Assign_99th_percentile","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880891,"metricname":"regionserver.WAL.AppendSize_99th_percentile","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880894,"metricname":"master.Balancer.BalancerCluster_99.9th_percentile","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880894,"metricname":"master.AssignmentManger.BulkAssign_75th_percentile","supportsAggregation":true},{"type":"COUNTER","seriesStartTime":1482480880889,"metricname":"master.FileSystem.MetaHlogSplitTime_num_ops","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880891,"metricname":"regionserver.WAL.SyncTime_90th_percentile","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880894,"metricname":"master.Balancer.BalancerCluster_90th_percentile","supportsAggregation":true},{"type":"GAUGE","seriesStartTime":1482480880891,"metricname":"regionserver.WAL.AppendTime_max","supportsAggregation":true}],"logfeeder":[{"type":"Long","seriesStartTime":1482480913546,"metricname":"output.sol
r.write_logs","supportsAggregation":true},{"type":"Long","seriesStartTime":1482480913546,"metricname":"input.files.count","supportsAggregation":true},{"type":"Long","seriesStartTime":1482480943578,"metricname":"filter.error.keyvalue","supportsAggregation":true},{"type":"Long","seriesStartTime":1482480913546,"metricname":"filter.error.grok","supportsAggregation":true},{"type":"Long","seriesStartTime":1482480913546,"metricname":"input.files.read_bytes","supportsAggregation":true},{"type":"Long","seriesStartTime":1482480913546,"metricname":"output.solr.write_bytes","supportsAggregation":true},{"type":"Long","seriesStartTime":1482480913546,"metricname":"input.files.read_lines","supportsAggregation":true}]} http://<METRICS_COLLECTOR_HOST>:6188/ws/v1/timeline/metrics/hosts
{"hdp-dn3.hostname":["accumulo","datanode","journalnode","HOST","nodemanager","hbase","logfeeder"],"hdp-dn5.hostname":["accumulo","datanode","HOST","nodemanager","hbase","logfeeder"],"hdp-dn2.hostname":["accumulo","datanode","HOST","nodemanager","logfeeder"],"hdp-nn1.hostname":["accumulo","nimbus","resourcemanager","journalnode","HOST","applicationhistoryserver","namenode","hbase","kafka_broker","logfeeder"],"hdp-dn1.hostname":["accumulo","hiveserver2","datanode","hivemetastore","HOST","nodemanager","logfeeder"],"hdp-dn4.hostname":["accumulo","datanode","HOST","nodemanager","hbase","logfeeder"],"hdp-nn2.hostname":["hiveserver2","hivemetastore","journalnode","resourcemanager","HOST","jobhistoryserver","namenode","ams-hbase","logfeeder"]} Config files cat /etc/ambari-metrics-collector/conf/ams-env.sh
# Set environment variables here.
# The java implementation to use. Java 1.6 required.
export JAVA_HOME=/usr/jdk64/jdk1.8.0_77
# Collector Log directory for log4j
export AMS_COLLECTOR_LOG_DIR=/var/log/ambari-metrics-collector
# Monitor Log directory for outfile
export AMS_MONITOR_LOG_DIR=/var/log/ambari-metrics-monitor
# Collector pid directory
export AMS_COLLECTOR_PID_DIR=/var/run/ambari-metrics-collector
# Monitor pid directory
export AMS_MONITOR_PID_DIR=/var/run/ambari-metrics-monitor
# AMS HBase pid directory
export AMS_HBASE_PID_DIR=/var/run/ambari-metrics-collector/
# AMS Collector heapsize
export AMS_COLLECTOR_HEAPSIZE=1024m
# HBase normalizer enabled
export AMS_HBASE_NORMALIZER_ENABLED=False
# HBase compaction policy enabled
export AMS_HBASE_FIFO_COMPACTION_ENABLED=True
# HBase Tables Initialization check enabled
export AMS_HBASE_INIT_CHECK_ENABLED=True
# AMS Collector options
export AMS_COLLECTOR_OPTS="-Djava.library.path=/usr/lib/ams-hbase/lib/hadoop-native"
# AMS Collector GC options
export AMS_COLLECTOR_GC_OPTS="-XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/ambari-metrics-collector/collector-gc.log-`date +'%Y%m%d%H%M'`"
export AMS_COLLECTOR_OPTS="$AMS_COLLECTOR_OPTS $AMS_COLLECTOR_GC_OPTS"
cat /etc/ambari-metrics-collector/conf/ams-site.xml
<configuration>
<property>
<name>phoenix.query.maxGlobalMemoryPercentage</name>
<value>25</value>
</property>
<property>
<name>phoenix.spool.directory</name>
<value>/tmp</value>
</property>
<property>
<name>timeline.metrics.aggregator.checkpoint.dir</name>
<value>/var/lib/ambari-metrics-collector/checkpoint</value>
</property>
<property>
<name>timeline.metrics.aggregators.skip.blockcache.enabled</name>
<value>false</value>
</property>
<property>
<name>timeline.metrics.cache.commit.interval</name>
<value>3</value>
</property>
<property>
<name>timeline.metrics.cache.enabled</name>
<value>true</value>
</property>
<property>
<name>timeline.metrics.cache.size</name>
<value>150</value>
</property>
<property>
<name>timeline.metrics.cluster.aggregate.splitpoints</name>
<value>kafka.server.BrokerTopicMetrics.FailedFetchRequestsPerSec.meanRate</value>
</property>
<property>
<name>timeline.metrics.cluster.aggregator.daily.checkpointCutOffMultiplier</name>
<value>2</value>
</property>
<property>
<name>timeline.metrics.cluster.aggregator.daily.disabled</name>
<value>false</value>
</property>
<property>
<name>timeline.metrics.cluster.aggregator.daily.interval</name>
<value>86400</value>
</property>
<property>
<name>timeline.metrics.cluster.aggregator.daily.ttl</name>
<value>63072000</value>
</property>
<property>
<name>timeline.metrics.cluster.aggregator.hourly.checkpointCutOffMultiplier</name>
<value>2</value>
</property>
<property>
<name>timeline.metrics.cluster.aggregator.hourly.disabled</name>
<value>false</value>
</property>
<property>
<name>timeline.metrics.cluster.aggregator.hourly.interval</name>
<value>3600</value>
</property>
<property>
<name>timeline.metrics.cluster.aggregator.hourly.ttl</name>
<value>31536000</value>
</property>
<property>
<name>timeline.metrics.cluster.aggregator.interpolation.enabled</name>
<value>true</value>
</property>
<property>
<name>timeline.metrics.cluster.aggregator.minute.checkpointCutOffMultiplier</name>
<value>2</value>
</property>
<property>
<name>timeline.metrics.cluster.aggregator.minute.disabled</name>
<value>false</value>
</property>
<property>
<name>timeline.metrics.cluster.aggregator.minute.interval</name>
<value>300</value>
</property>
<property>
<name>timeline.metrics.cluster.aggregator.minute.ttl</name>
<value>2592000</value>
</property>
<property>
<name>timeline.metrics.cluster.aggregator.second.checkpointCutOffMultiplier</name>
<value>2</value>
</property>
<property>
<name>timeline.metrics.cluster.aggregator.second.disabled</name>
<value>false</value>
</property>
<property>
<name>timeline.metrics.cluster.aggregator.second.interval</name>
<value>120</value>
</property>
<property>
<name>timeline.metrics.cluster.aggregator.second.timeslice.interval</name>
<value>30</value>
</property>
<property>
<name>timeline.metrics.cluster.aggregator.second.ttl</name>
<value>259200</value>
</property>
<property>
<name>timeline.metrics.daily.aggregator.minute.interval</name>
<value>86400</value>
</property>
<property>
<name>timeline.metrics.hbase.compression.scheme</name>
<value>SNAPPY</value>
</property>
<property>
<name>timeline.metrics.hbase.data.block.encoding</name>
<value>FAST_DIFF</value>
</property>
<property>
<name>timeline.metrics.hbase.fifo.compaction.enabled</name>
<value>true</value>
</property>
<property>
<name>timeline.metrics.hbase.init.check.enabled</name>
<value>true</value>
</property>
<property>
<name>timeline.metrics.host.aggregate.splitpoints</name>
<value>kafka.server.BrokerTopicMetrics.FailedFetchRequestsPerSec.meanRate</value>
</property>
<property>
<name>timeline.metrics.host.aggregator.daily.checkpointCutOffMultiplier</name>
<value>2</value>
</property>
<property>
<name>timeline.metrics.host.aggregator.daily.disabled</name>
<value>false</value>
</property>
<property>
<name>timeline.metrics.host.aggregator.daily.ttl</name>
<value>31536000</value>
</property>
<property>
<name>timeline.metrics.host.aggregator.hourly.checkpointCutOffMultiplier</name>
<value>2</value>
</property>
<property>
<name>timeline.metrics.host.aggregator.hourly.disabled</name>
<value>false</value>
</property>
<property>
<name>timeline.metrics.host.aggregator.hourly.interval</name>
<value>3600</value>
</property>
<property>
<name>timeline.metrics.host.aggregator.hourly.ttl</name>
<value>2592000</value>
</property>
<property>
<name>timeline.metrics.host.aggregator.minute.checkpointCutOffMultiplier</name>
<value>2</value>
</property>
<property>
<name>timeline.metrics.host.aggregator.minute.disabled</name>
<value>false</value>
</property>
<property>
<name>timeline.metrics.host.aggregator.minute.interval</name>
<value>300</value>
</property>
<property>
<name>timeline.metrics.host.aggregator.minute.ttl</name>
<value>604800</value>
</property>
<property>
<name>timeline.metrics.host.aggregator.ttl</name>
<value>86400</value>
</property>
<property>
<name>timeline.metrics.service.checkpointDelay</name>
<value>60</value>
</property>
<property>
<name>timeline.metrics.service.cluster.aggregator.appIds</name>
<value>datanode,nodemanager,hbase</value>
</property>
<property>
<name>timeline.metrics.service.default.result.limit</name>
<value>15840</value>
</property>
<property>
<name>timeline.metrics.service.handler.thread.count</name>
<value>20</value>
</property>
<property>
<name>timeline.metrics.service.http.policy</name>
<value>HTTP_ONLY</value>
</property>
<property>
<name>timeline.metrics.service.operation.mode</name>
<value>distributed</value>
</property>
<property>
<name>timeline.metrics.service.resultset.fetchSize</name>
<value>2000</value>
</property>
<property>
<name>timeline.metrics.service.rpc.address</name>
<value>0.0.0.0:60200</value>
</property>
<property>
<name>timeline.metrics.service.use.groupBy.aggregators</name>
<value>true</value>
</property>
<property>
<name>timeline.metrics.service.watcher.delay</name>
<value>30</value>
</property>
<property>
<name>timeline.metrics.service.watcher.disabled</name>
<value>true</value>
</property>
<property>
<name>timeline.metrics.service.watcher.initial.delay</name>
<value>600</value>
</property>
<property>
<name>timeline.metrics.service.watcher.timeout</name>
<value>30</value>
</property>
<property>
<name>timeline.metrics.service.webapp.address</name>
<value>hdp-nn2.hostname:6188</value>
</property>
<property>
<name>timeline.metrics.sink.collection.period</name>
<value>10</value>
</property>
<property>
<name>timeline.metrics.sink.report.interval</name>
<value>60</value>
</property>
</configuration>
cat /etc/ams-hbase/conf/hbase-site.xml
<configuration>
<property>
<name>dfs.block.access.token.enable</name>
<value>true</value>
</property>
<property>
<name>dfs.blockreport.initialDelay</name>
<value>120</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>134217728</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.prodcluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.client.read.shortcircuit</name>
<value>true</value>
</property>
<property>
<name>dfs.client.read.shortcircuit.streams.cache.size</name>
<value>4096</value>
</property>
<property>
<name>dfs.client.retry.policy.enabled</name>
<value>false</value>
</property>
<property>
<name>dfs.cluster.administrators</name>
<value> hdfs</value>
</property>
<property>
<name>dfs.content-summary.limit</name>
<value>5000</value>
</property>
<property>
<name>dfs.datanode.address</name>
<value>0.0.0.0:50010</value>
</property>
<property>
<name>dfs.datanode.balance.bandwidthPerSec</name>
<value>6250000</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/hdfs/hadoop/hdfs/data</value>
<final>true</final>
</property>
<property>
<name>dfs.datanode.data.dir.perm</name>
<value>750</value>
</property>
<property>
<name>dfs.datanode.du.reserved</name>
<value>65906998272</value>
</property>
<property>
<name>dfs.datanode.failed.volumes.tolerated</name>
<value>0</value>
<final>true</final>
</property>
<property>
<name>dfs.datanode.http.address</name>
<value>0.0.0.0:50075</value>
</property>
<property>
<name>dfs.datanode.https.address</name>
<value>0.0.0.0:50475</value>
</property>
<property>
<name>dfs.datanode.ipc.address</name>
<value>0.0.0.0:8010</value>
</property>
<property>
<name>dfs.datanode.max.transfer.threads</name>
<value>16384</value>
</property>
<property>
<name>dfs.domain.socket.path</name>
<value>/var/lib/hadoop-hdfs/dn_socket</value>
</property>
<property>
<name>dfs.encrypt.data.transfer.cipher.suites</name>
<value>AES/CTR/NoPadding</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>shell(/bin/true)</value>
</property>
<property>
<name>dfs.ha.namenodes.prodcluster</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.heartbeat.interval</name>
<value>3</value>
</property>
<property>
<name>dfs.hosts.exclude</name>
<value>/etc/hadoop/conf/dfs.exclude</value>
</property>
<property>
<name>dfs.http.policy</name>
<value>HTTP_ONLY</value>
</property>
<property>
<name>dfs.https.port</name>
<value>50470</value>
</property>
<property>
<name>dfs.internal.nameservices</name>
<value>prodcluster</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/hadoop/hdfs/journal</value>
</property>
<property>
<name>dfs.journalnode.http-address</name>
<value>0.0.0.0:8480</value>
</property>
<property>
<name>dfs.journalnode.https-address</name>
<value>0.0.0.0:8481</value>
</property>
<property>
<name>dfs.namenode.accesstime.precision</name>
<value>0</value>
</property>
<property>
<name>dfs.namenode.audit.log.async</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.avoid.read.stale.datanode</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.avoid.write.stale.datanode</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>/hdfs/hadoop/hdfs/namesecondary</value>
</property>
<property>
<name>dfs.namenode.checkpoint.edits.dir</name>
<value>${dfs.namenode.checkpoint.dir}</value>
</property>
<property>
<name>dfs.namenode.checkpoint.period</name>
<value>21600</value>
</property>
<property>
<name>dfs.namenode.checkpoint.txns</name>
<value>1000000</value>
</property>
<property>
<name>dfs.namenode.fslock.fair</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.handler.count</name>
<value>600</value>
</property>
<property>
<name>dfs.namenode.http-address.prodcluster.nn1</name>
<value>hdp-nn1.hostname:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.prodcluster.nn2</name>
<value>hdp-nn2.hostname:50070</value>
</property>
<property>
<name>dfs.namenode.https-address.prodcluster.nn1</name>
<value>hdp-nn1.hostname:50470</value>
</property>
<property>
<name>dfs.namenode.https-address.prodcluster.nn2</name>
<value>hdp-nn2.hostname:50470</value>
</property>
<property>
<name>dfs.namenode.inode.attributes.provider.class</name>
<value>org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/hdfs/hadoop/hdfs/namenode</value>
<final>true</final>
</property>
<property>
<name>dfs.namenode.name.dir.restore</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.rpc-address.prodcluster.nn1</name>
<value>hdp-nn1.hostname:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.prodcluster.nn2</name>
<value>hdp-nn2.hostname:8020</value>
</property>
<property>
<name>dfs.namenode.safemode.threshold-pct</name>
<value>0.99</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hdp-dn3.hostname:8485;hdp-nn1.hostname:8485;hdp-nn2.hostname:8485/prodcluster</value>
</property>
<property>
<name>dfs.namenode.stale.datanode.interval</name>
<value>30000</value>
</property>
<property>
<name>dfs.namenode.startup.delay.block.deletion.sec</name>
<value>3600</value>
</property>
<property>
<name>dfs.namenode.write.stale.datanode.ratio</name>
<value>1.0f</value>
</property>
<property>
<name>dfs.nameservices</name>
<value>prodcluster</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.permissions.superusergroup</name>
<value>hdfs</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.replication.max</name>
<value>50</value>
</property>
<property>
<name>dfs.support.append</name>
<value>true</value>
<final>true</final>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
<final>true</final>
</property>
<property>
<name>fs.permissions.umask-mode</name>
<value>077</value>
</property>
<property>
<name>nfs.exports.allowed.hosts</name>
<value>* rw</value>
</property>
<property>
<name>nfs.file.dump.dir</name>
<value>/tmp/.hdfs-nfs</value>
</property>
</configuration>
cat /etc/ams-hbase/conf/hbase-env.sh
# Set environment variables here.
# The java implementation to use. Java 1.6+ required.
export JAVA_HOME=/usr/jdk64/jdk1.8.0_77
# HBase Configuration directory
export HBASE_CONF_DIR=${HBASE_CONF_DIR:-/etc/ams-hbase/conf}
# Extra Java CLASSPATH elements. Optional.
additional_cp=
if [ -n "$additional_cp" ];
then
export HBASE_CLASSPATH=${HBASE_CLASSPATH}:$additional_cp
else
export HBASE_CLASSPATH=${HBASE_CLASSPATH}
fi
# The maximum amount of heap to use for hbase shell.
export HBASE_SHELL_OPTS="-Xmx256m"
# Extra Java runtime options.
# Below are what we set by default. May only work with SUN JVM.
# For more on why as well as other possible settings,
# see http://wiki.apache.org/hadoop/PerformanceTuning
export HBASE_OPTS="-XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/ambari-metrics-collector/hs_err_pid%p.log -Djava.io.tmpdir=/var/lib/ambari-metrics-collector/hbase-tmp"
export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/ambari-metrics-collector/gc.log-`date +'%Y%m%d%H%M'`"
# Uncomment below to enable java garbage collection logging.
# export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:$HBASE_HOME/logs/gc-hbase.log"
# Uncomment and adjust to enable JMX exporting
# See jmxremote.password and jmxremote.access in $JRE_HOME/lib/management to configure remote password access.
# More details at: http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html
#
# export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
export HBASE_MASTER_OPTS=" -Xms512m -Xmx512m -Xmn102m -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly"
export HBASE_REGIONSERVER_OPTS=" -Xmn128m -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -Xms896m -Xmx896m"
# export HBASE_THRIFT_OPTS="$HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10103"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10104"
# File naming hosts on which HRegionServers will run. $HBASE_HOME/conf/regionservers by default.
export HBASE_REGIONSERVERS=${HBASE_CONF_DIR}/regionservers
# Extra ssh options. Empty by default.
# export HBASE_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HBASE_CONF_DIR"
# Where log files are stored. $HBASE_HOME/logs by default.
export HBASE_LOG_DIR=/var/log/ambari-metrics-collector
# A string representing this instance of hbase. $USER by default.
# export HBASE_IDENT_STRING=$USER
# The scheduling priority for daemon processes. See 'man nice'.
# export HBASE_NICENESS=10
# The directory where pid files are stored. /tmp by default.
export HBASE_PID_DIR=/var/run/ambari-metrics-collector/
# Seconds to sleep between slave commands. Unset by default. This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HBASE_SLAVE_SLEEP=0.1
# Tell HBase whether it should manage it's own instance of Zookeeper or not.
export HBASE_MANAGES_ZK=false
# use embedded native libs
_HADOOP_NATIVE_LIB="/usr/lib/ams-hbase/lib/hadoop-native/"
export HBASE_OPTS="$HBASE_OPTS -Djava.library.path=${_HADOOP_NATIVE_LIB}"
# Unset HADOOP_HOME to avoid importing HADOOP installed cluster related configs like: /usr/hdp/2.2.0.0-2041/hadoop/conf/
export HADOOP_HOME=/usr/lib/ams-hbase/
# Explicitly Setting HBASE_HOME for AMS HBase so that there is no conflict
export HBASE_HOME=/usr/lib/ams-hbase/
rpm -qa | grep ambari
ambari-metrics-collector-2.4.0.1-1.x86_64
ambari-metrics-hadoop-sink-2.4.0.1-1.x86_64
ambari-agent-2.4.0.1-1.x86_64
ambari-infra-solr-client-2.4.0.1-1.x86_64
ambari-logsearch-logfeeder-2.4.0.1-1.x86_64
ambari-metrics-monitor-2.4.0.1-1.x86_64
ambari-metrics-grafana-2.4.0.1-1.x86_64
ambari-infra-solr-2.4.0.1-1.x86_64
12-23-2016
08:50 AM
@Aravindan Vijayan I did this:
1) Turn on Maintenance mode
2) Stop Ambari Metrics
3) hadoop fs -rmr /ams/hbase/*
4) rm -rf /var/lib/ambari-metrics-collector/hbase-tmp/*
5)
[zk: localhost:2181(CONNECTED) 0] ls /
[registry, controller, brokers, storm, zookeeper, infra-solr,
hiveserver2-hive2, hbase-unsecure, yarn-leader-election, tracers, hadoop-ha,
admin, isr_change_notification, services, templeton-hadoop, accumulo,
controller_epoch, hiveserver2, llap-unsecure, rmstore, ranger_audits,
consumers, config, ams-hbase-unsecure]
[zk: localhost:2181(CONNECTED) 1] rmr /ams-hbase-unsecure
[zk: localhost:2181(CONNECTED) 2] ls /
[registry, controller, brokers, storm, zookeeper, infra-solr,
hiveserver2-hive2, hbase-unsecure, yarn-leader-election, tracers, hadoop-ha,
admin, isr_change_notification, services, templeton-hadoop, accumulo,
controller_epoch, hiveserver2, llap-unsecure, rmstore, ranger_audits,
consumers, config]
6) Start Ambari Metrics
7) Turn off Maintenance mode
After about 15 minutes I got this log:
2016-12-23 11:35:08,673 ERROR org.mortbay.log: /ws/v1/timeline/metrics/
javax.ws.rs.WebApplicationException: javax.xml.bind.MarshalException
- with linked exception:
[org.mortbay.jetty.EofException]
at com.sun.jersey.core.provider.jaxb.AbstractRootElementProvider.writeTo(AbstractRootElementProvider.java:159)
at com.sun.jersey.spi.container.ContainerResponse.write(ContainerResponse.java:306)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1437)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:895)
at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:843)
at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:804)
at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1294)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:767)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: javax.xml.bind.MarshalException
- with linked exception:
[org.mortbay.jetty.EofException]
at com.sun.xml.bind.v2.runtime.MarshallerImpl.write(MarshallerImpl.java:325)
at com.sun.xml.bind.v2.runtime.MarshallerImpl.marshal(MarshallerImpl.java:249)
at javax.xml.bind.helpers.AbstractMarshallerImpl.marshal(AbstractMarshallerImpl.java:95)
at com.sun.jersey.core.provider.jaxb.AbstractRootElementProvider.writeTo(AbstractRootElementProvider.java:179)
at com.sun.jersey.core.provider.jaxb.AbstractRootElementProvider.writeTo(AbstractRootElementProvider.java:157)
... 37 more
Caused by: org.mortbay.jetty.EofException
at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:634)
at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:580)
at com.sun.jersey.spi.container.servlet.WebComponent$Writer.write(WebComponent.java:307)
at com.sun.jersey.spi.container.ContainerResponse$CommittingOutputStream.write(ContainerResponse.java:134)
at com.sun.xml.bind.v2.runtime.output.UTF8XmlOutput.flushBuffer(UTF8XmlOutput.java:416)
at com.sun.xml.bind.v2.runtime.output.UTF8XmlOutput.endDocument(UTF8XmlOutput.java:141)
at com.sun.xml.bind.v2.runtime.XMLSerializer.endDocument(XMLSerializer.java:856)
at com.sun.xml.bind.v2.runtime.MarshallerImpl.postwrite(MarshallerImpl.java:374)
at com.sun.xml.bind.v2.runtime.MarshallerImpl.write(MarshallerImpl.java:321)
... 41 more
2016-12-23 11:35:09,796 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.PhoenixHBaseAccessor: Saved 8606 metadata records.
2016-12-23 11:35:09,843 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.PhoenixHBaseAccessor: Saved 7 hosted apps metadata records.
2016-12-23 11:35:25,123 INFO TimelineClusterAggregatorMinute: Started Timeline aggregator thread @ Fri Dec 23 11:35:25 MSK 2016
2016-12-23 11:35:25,124 INFO TimelineClusterAggregatorMinute: Last Checkpoint read : Fri Dec 23 11:30:00 MSK 2016
2016-12-23 11:35:25,124 INFO TimelineClusterAggregatorMinute: Rounded off checkpoint : Fri Dec 23 11:30:00 MSK 2016
2016-12-23 11:35:25,124 INFO TimelineClusterAggregatorMinute: Last check point time: 1482481800000, lagBy: 325 seconds.
2016-12-23 11:35:25,124 INFO TimelineClusterAggregatorMinute: Start aggregation cycle @ Fri Dec 23 11:35:25 MSK 2016, startTime = Fri Dec 23 11:30:00 MSK 2016, endTime = Fri Dec 23 11:35:00 MSK 2016
2016-12-23 11:35:25,143 INFO TimelineClusterAggregatorMinute: 0 row(s) updated.
2016-12-23 11:35:25,143 INFO TimelineClusterAggregatorMinute: Aggregated cluster metrics for METRIC_AGGREGATE_MINUTE, with startTime = Fri Dec 23 11:30:00 MSK 2016, endTime = Fri Dec 23 11:35:00 MSK 2016
2016-12-23 11:35:25,143 INFO TimelineClusterAggregatorMinute: End aggregation cycle @ Fri Dec 23 11:35:25 MSK 2016
2016-12-23 11:35:25,143 INFO TimelineClusterAggregatorMinute: End aggregation cycle @ Fri Dec 23 11:35:25 MSK 2016
2016-12-23 11:35:25,152 INFO TimelineMetricHostAggregatorMinute: Started Timeline aggregator thread @ Fri Dec 23 11:35:25 MSK 2016
2016-12-23 11:35:25,153 INFO TimelineMetricHostAggregatorMinute: Last Checkpoint read : Fri Dec 23 11:30:00 MSK 2016
2016-12-23 11:35:25,153 INFO TimelineMetricHostAggregatorMinute: Rounded off checkpoint : Fri Dec 23 11:30:00 MSK 2016
2016-12-23 11:35:25,153 INFO TimelineMetricHostAggregatorMinute: Last check point time: 1482481800000, lagBy: 325 seconds.
2016-12-23 11:35:25,153 INFO TimelineMetricHostAggregatorMinute: Start aggregation cycle @ Fri Dec 23 11:35:25 MSK 2016, startTime = Fri Dec 23 11:30:00 MSK 2016, endTime = Fri Dec 23 11:35:00 MSK 2016
2016-12-23 11:35:25,907 INFO TimelineMetricHostAggregatorMinute: 0 row(s) updated.
2016-12-23 11:35:25,907 INFO TimelineMetricHostAggregatorMinute: Aggregated host metrics for METRIC_RECORD_MINUTE, with startTime = Fri Dec 23 11:30:00 MSK 2016, endTime = Fri Dec 23 11:35:00 MSK 2016
2016-12-23 11:35:25,907 INFO TimelineMetricHostAggregatorMinute: End aggregation cycle @ Fri Dec 23 11:35:25 MSK 2016
2016-12-23 11:35:25,907 INFO TimelineMetricHostAggregatorMinute: End aggregation cycle @ Fri Dec 23 11:35:25 MSK 2016
2016-12-23 11:40:26,448 INFO TimelineMetricHostAggregatorMinute: Started Timeline aggregator thread @ Fri Dec 23 11:40:26 MSK 2016
2016-12-23 11:40:26,448 INFO TimelineClusterAggregatorMinute: Started Timeline aggregator thread @ Fri Dec 23 11:40:26 MSK 2016
2016-12-23 11:40:26,449 INFO TimelineMetricHostAggregatorMinute: Last Checkpoint read : Fri Dec 23 11:35:00 MSK 2016
2016-12-23 11:40:26,449 INFO TimelineMetricHostAggregatorMinute: Rounded off checkpoint : Fri Dec 23 11:35:00 MSK 2016
2016-12-23 11:40:26,449 INFO TimelineMetricHostAggregatorMinute: Last check point time: 1482482100000, lagBy: 326 seconds.
2016-12-23 11:40:26,449 INFO TimelineClusterAggregatorMinute: Last Checkpoint read : Fri Dec 23 11:35:00 MSK 2016
2016-12-23 11:40:26,449 INFO TimelineMetricHostAggregatorMinute: Start aggregation cycle @ Fri Dec 23 11:40:26 MSK 2016, startTime = Fri Dec 23 11:35:00 MSK 2016, endTime = Fri Dec 23 11:40:00 MSK 2016
2016-12-23 11:40:26,450 INFO TimelineClusterAggregatorMinute: Rounded off checkpoint : Fri Dec 23 11:35:00 MSK 2016
2016-12-23 11:40:26,450 INFO TimelineClusterAggregatorMinute: Last check point time: 1482482100000, lagBy: 326 seconds.
2016-12-23 11:40:26,450 INFO TimelineClusterAggregatorMinute: Start aggregation cycle @ Fri Dec 23 11:40:26 MSK 2016, startTime = Fri Dec 23 11:35:00 MSK 2016, endTime = Fri Dec 23 11:40:00 MSK 2016
2016-12-23 11:40:26,464 INFO TimelineClusterAggregatorMinute: 0 row(s) updated.
2016-12-23 11:40:26,464 INFO TimelineClusterAggregatorMinute: Aggregated cluster metrics for METRIC_AGGREGATE_MINUTE, with startTime = Fri Dec 23 11:35:00 MSK 2016, endTime = Fri Dec 23 11:40:00 MSK 2016
2016-12-23 11:40:26,465 INFO TimelineClusterAggregatorMinute: End aggregation cycle @ Fri Dec 23 11:40:26 MSK 2016
2016-12-23 11:40:26,465 INFO TimelineClusterAggregatorMinute: End aggregation cycle @ Fri Dec 23 11:40:26 MSK 2016
2016-12-23 11:40:46,839 INFO org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 25694 actions to finish
2016-12-23 11:40:46,899 INFO TimelineMetricHostAggregatorMinute: 22847 row(s) updated.
2016-12-23 11:40:46,899 INFO TimelineMetricHostAggregatorMinute: Aggregated host metrics for METRIC_RECORD_MINUTE, with startTime = Fri Dec 23 11:35:00 MSK 2016, endTime = Fri Dec 23 11:40:00 MSK 2016
2016-12-23 11:40:46,899 INFO TimelineMetricHostAggregatorMinute: End aggregation cycle @ Fri Dec 23 11:40:46 MSK 2016
2016-12-23 11:40:46,899 INFO TimelineMetricHostAggregatorMinute: End aggregation cycle @ Fri Dec 23 11:40:46 MSK 2016
2016-12-23 11:41:40,503 INFO org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 8014 actions to finish
What to do next?
12-23-2016
06:10 AM
Hi @Aravindan Vijayan I have 7 nodes (2 nn + 5 dn). Here is the info:
rpm -qa | grep ambari
ambari-metrics-collector-2.4.0.1-1.x86_64
ambari-metrics-hadoop-sink-2.4.0.1-1.x86_64
ambari-agent-2.4.0.1-1.x86_64
ambari-infra-solr-client-2.4.0.1-1.x86_64
ambari-logsearch-logfeeder-2.4.0.1-1.x86_64
ambari-metrics-monitor-2.4.0.1-1.x86_64
ambari-metrics-grafana-2.4.0.1-1.x86_64
ambari-infra-solr-2.4.0.1-1.x86_64
cat /etc/ambari-metrics-collector/conf/ams-env.sh
# Set environment variables here.
# The java implementation to use. Java 1.6 required.
export JAVA_HOME=/usr/jdk64/jdk1.8.0_77
# Collector Log directory for log4j
export AMS_COLLECTOR_LOG_DIR=/var/log/ambari-metrics-collector
# Monitor Log directory for outfile
export AMS_MONITOR_LOG_DIR=/var/log/ambari-metrics-monitor
# Collector pid directory
export AMS_COLLECTOR_PID_DIR=/var/run/ambari-metrics-collector
# Monitor pid directory
export AMS_MONITOR_PID_DIR=/var/run/ambari-metrics-monitor
# AMS HBase pid directory
export AMS_HBASE_PID_DIR=/var/run/ambari-metrics-collector/
# AMS Collector heapsize
export AMS_COLLECTOR_HEAPSIZE=1024m
# HBase normalizer enabled
export AMS_HBASE_NORMALIZER_ENABLED=False
# HBase compaction policy enabled
export AMS_HBASE_FIFO_COMPACTION_ENABLED=True
# HBase Tables Initialization check enabled
export AMS_HBASE_INIT_CHECK_ENABLED=True
# AMS Collector options
export AMS_COLLECTOR_OPTS="-Djava.library.path=/usr/lib/ams-hbase/lib/hadoop-native"
# AMS Collector GC options
export AMS_COLLECTOR_GC_OPTS="-XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/ambari-metrics-collector/collector-gc.log-`date +'%Y%m%d%H%M'`"
export AMS_COLLECTOR_OPTS="$AMS_COLLECTOR_OPTS $AMS_COLLECTOR_GC_OPTS"
cat /etc/ams-hbase/conf/hbase-env.sh
# Set environment variables here.
# The java implementation to use. Java 1.6+ required.
export JAVA_HOME=/usr/jdk64/jdk1.8.0_77
# HBase Configuration directory
export HBASE_CONF_DIR=${HBASE_CONF_DIR:-/etc/ams-hbase/conf}
# Extra Java CLASSPATH elements. Optional.
additional_cp=
if [ -n "$additional_cp" ];
then
export HBASE_CLASSPATH=${HBASE_CLASSPATH}:$additional_cp
else
export HBASE_CLASSPATH=${HBASE_CLASSPATH}
fi
# The maximum amount of heap to use for hbase shell.
export HBASE_SHELL_OPTS="-Xmx256m"
# Extra Java runtime options.
# Below are what we set by default. May only work with SUN JVM.
# For more on why as well as other possible settings,
# see http://wiki.apache.org/hadoop/PerformanceTuning
export HBASE_OPTS="-XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/ambari-metrics-collector/hs_err_pid%p.log -Djava.io.tmpdir=/var/lib/ambari-metrics-collector/hbase-tmp"
export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/ambari-metrics-collector/gc.log-`date +'%Y%m%d%H%M'`"
# Uncomment below to enable java garbage collection logging.
# export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:$HBASE_HOME/logs/gc-hbase.log"
# Uncomment and adjust to enable JMX exporting
# See jmxremote.password and jmxremote.access in $JRE_HOME/lib/management to configure remote password access.
# More details at: http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html
#
# export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
export HBASE_MASTER_OPTS=" -Xms512m -Xmx512m -Xmn102m -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly"
export HBASE_REGIONSERVER_OPTS=" -Xmn128m -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -Xms896m -Xmx896m"
# export HBASE_THRIFT_OPTS="$HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10103"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10104"
# File naming hosts on which HRegionServers will run. $HBASE_HOME/conf/regionservers by default.
export HBASE_REGIONSERVERS=${HBASE_CONF_DIR}/regionservers
# Extra ssh options. Empty by default.
# export HBASE_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HBASE_CONF_DIR"
# Where log files are stored. $HBASE_HOME/logs by default.
export HBASE_LOG_DIR=/var/log/ambari-metrics-collector
# A string representing this instance of hbase. $USER by default.
# export HBASE_IDENT_STRING=$USER
# The scheduling priority for daemon processes. See 'man nice'.
# export HBASE_NICENESS=10
# The directory where pid files are stored. /tmp by default.
export HBASE_PID_DIR=/var/run/ambari-metrics-collector/
# Seconds to sleep between slave commands. Unset by default. This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HBASE_SLAVE_SLEEP=0.1
# Tell HBase whether it should manage it's own instance of Zookeeper or not.
export HBASE_MANAGES_ZK=false
# use embedded native libs
_HADOOP_NATIVE_LIB="/usr/lib/ams-hbase/lib/hadoop-native/"
export HBASE_OPTS="$HBASE_OPTS -Djava.library.path=${_HADOOP_NATIVE_LIB}"
# Unset HADOOP_HOME to avoid importing HADOOP installed cluster related configs like: /usr/hdp/2.2.0.0-2041/hadoop/conf/
export HADOOP_HOME=/usr/lib/ams-hbase/
# Explicitly Setting HBASE_HOME for AMS HBase so that there is no conflict
export HBASE_HOME=/usr/lib/ams-hbase/
12-23-2016
06:02 AM
Hi @Rahul Pathak I have tried, but without success. Here is the ambari-metrics-collector.log:
2016-12-23 08:49:16,046 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping phoenix metrics system...
2016-12-23 08:49:16,047 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: phoenix metrics system stopped.
2016-12-23 08:49:16,048 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: phoenix metrics system shutdown complete.
2016-12-23 08:49:16,048 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl: Stopping ApplicationHistory
2016-12-23 08:49:16,048 FATAL org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: Error starting ApplicationHistoryServer
org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.MetricsSystemInitializationException: Error creating Metrics Schema in HBase using Phoenix.
at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.PhoenixHBaseAccessor.initMetricSchema(PhoenixHBaseAccessor.java:470)
at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.HBaseTimelineMetricStore.initializeSubsystem(HBaseTimelineMetricStore.java:94)
at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.HBaseTimelineMetricStore.serviceInit(HBaseTimelineMetricStore.java:86)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:84)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:137)
at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:147)
Caused by: org.apache.phoenix.exception.PhoenixIOException: SYSTEM.CATALOG
at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:111)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:1292)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:1257)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.createTable(ConnectionQueryServicesImpl.java:1453)
at org.apache.phoenix.schema.MetaDataClient.createTableInternal(MetaDataClient.java:2180)
at org.apache.phoenix.schema.MetaDataClient.createTable(MetaDataClient.java:865)
at org.apache.phoenix.compile.CreateTableCompiler$2.execute(CreateTableCompiler.java:194)
at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:343)
at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:331)
at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:329)
at org.apache.phoenix.jdbc.PhoenixStatement.executeUpdate(PhoenixStatement.java:1421)
at org.apache.phoenix.query.ConnectionQueryServicesImpl$13.call(ConnectionQueryServicesImpl.java:2378)
at org.apache.phoenix.query.ConnectionQueryServicesImpl$13.call(ConnectionQueryServicesImpl.java:2327)
at org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:78)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:2327)
at org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:233)
at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.createConnection(PhoenixEmbeddedDriver.java:142)
at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:202)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:270)
at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.query.DefaultPhoenixDataSource.getConnection(DefaultPhoenixDataSource.java:82)
at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.PhoenixHBaseAccessor.getConnection(PhoenixHBaseAccessor.java:376)
at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.PhoenixHBaseAccessor.getConnectionRetryingOnException(PhoenixHBaseAccessor.java:354)
at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.PhoenixHBaseAccessor.initMetricSchema(PhoenixHBaseAccessor.java:398)
... 8 more
Caused by: org.apache.hadoop.hbase.TableNotFoundException: SYSTEM.CATALOG
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1264)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1162)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1146)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1103)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getRegionLocation(ConnectionManager.java:938)
at org.apache.hadoop.hbase.client.HRegionLocator.getRegionLocation(HRegionLocator.java:83)
at org.apache.hadoop.hbase.client.HTable.getRegionLocation(HTable.java:504)
at org.apache.hadoop.hbase.client.HTable.getKeysAndRegionsInRange(HTable.java:720)
at org.apache.hadoop.hbase.client.HTable.getKeysAndRegionsInRange(HTable.java:690)
at org.apache.hadoop.hbase.client.HTable.getStartKeysInRange(HTable.java:1757)
at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1712)
at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1692)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:1275)
... 31 more
2016-12-23 08:49:16,052 INFO org.apache.hadoop.util.ExitUtil: Exiting with status -1
2016-12-23 08:49:16,069 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down ApplicationHistoryServer at hdp-nn2.hostname/10.255.242.181
************************************************************/
2016-12-23 08:49:16,115 WARN org.apache.hadoop.hbase.io.util.HeapMemorySizeUtil: hbase.regionserver.global.memstore.upperLimit is deprecated by hbase.regionserver.global.memstore.size