Created on 12-05-2017 11:35 AM - edited 08-17-2019 07:44 PM
Hello, I have the following blueprint:
{ "Blueprints": { "stack_name": "HDP", "stack_version": "2.6" }, "host_groups": [ { "name": "namenode1", "cardinality" : "1", "components": [ { "name" : "HST_AGENT" }, { "name" : "HDFS_CLIENT" }, { "name" : "ZKFC" }, { "name" : "ZOOKEEPER_SERVER" }, { "name" : "HST_SERVER" }, { "name" : "HBASE_CLIENT"}, { "name" : "METRICS_MONITOR" }, { "name" : "JOURNALNODE" }, { "name" : "HBASE_MASTER"}, { "name" : "NAMENODE" }, { "name" : "APP_TIMELINE_SERVER" }, { "name" : "METRICS_GRAFANA" } ] }, { "name": "namenode2", "cardinality" : "1", "components": [ { "name" : "ACTIVITY_EXPLORER" }, { "name" : "HST_AGENT" }, { "name" : "HDFS_CLIENT" }, { "name" : "ZKFC" }, { "name" : "ZOOKEEPER_SERVER" }, { "name" : "HBASE_CLIENT"}, { "name" : "HISTORYSERVER" }, { "name" : "METRICS_MONITOR" }, { "name" : "JOURNALNODE" }, { "name" : "HBASE_MASTER"}, { "name" : "NAMENODE" }, { "name" : "METRICS_COLLECTOR" } ] }, { "name": "namenode3", "cardinality" : "1", "components": [ { "name" : "ACTIVITY_ANALYZER" }, { "name" : "HST_AGENT" }, { "name" : "MAPREDUCE2_CLIENT" }, { "name" : "YARN_CLIENT" }, { "name" : "HDFS_CLIENT" }, { "name" : "ZOOKEEPER_SERVER" }, { "name" : "HBASE_CLIENT"}, { "name" : "METRICS_MONITOR" }, { "name" : "JOURNALNODE" }, { "name" : "RESOURCEMANAGER" } ] }, { "name": "hosts_group", "cardinality" : "3", "components": [ { "name" : "NODEMANAGER" }, { "name" : "HST_AGENT" }, { "name" : "MAPREDUCE2_CLIENT" }, { "name" : "YARN_CLIENT" }, { "name" : "HDFS_CLIENT" }, { "name" : "HBASE_REGIONSERVER"}, { "name" : "DATANODE" }, { "name" : "HBASE_CLIENT"}, { "name" : "METRICS_MONITOR" }, { "name" : "ZOOKEEPER_CLIENT" } ] } ], "configurations": [ { "core-site": { "properties" : { "fs.defaultFS" : "hdfs://HACluster", "ha.zookeeper.quorum": "%HOSTGROUP::namenode1%:2181,%HOSTGROUP::namenode2%:2181,%HOSTGROUP::namenode3%:2181", "hadoop.proxyuser.yarn.hosts": "%HOSTGROUP::namenode2%,%HOSTGROUP::namenode3%" } } }, { "hdfs-site": { "properties" : { "dfs.client.failover.proxy.provider.HACluster" : "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider", "dfs.ha.automatic-failover.enabled" : "true", "dfs.ha.fencing.methods" : "shell(/bin/true)", "dfs.ha.namenodes.HACluster" : "nn1,nn2", "dfs.namenode.http-address" : "%HOSTGROUP::namenode1%:50070", "dfs.namenode.http-address.HACluster.nn1" : "%HOSTGROUP::namenode1%:50070", "dfs.namenode.http-address.HACluster.nn2" : "%HOSTGROUP::namenode2%:50070", "dfs.namenode.https-address" : "%HOSTGROUP::namenode1%:50470", "dfs.namenode.https-address.HACluster.nn1" : "%HOSTGROUP::namenode1%:50470", "dfs.namenode.https-address.HACluster.nn2" : "%HOSTGROUP::namenode2%:50470", "dfs.namenode.rpc-address.HACluster.nn1" : "%HOSTGROUP::namenode1%:8020", "dfs.namenode.rpc-address.HACluster.nn2" : "%HOSTGROUP::namenode2%:8020", "dfs.namenode.shared.edits.dir" : "qjournal://%HOSTGROUP::namenode1%:8485;%HOSTGROUP::namenode2%:8485;%HOSTGROUP::namenode3%:8485/mycluster", "dfs.nameservices" : "HACluster" } } }, { "yarn-site": { "properties": { "yarn.resourcemanager.ha.enabled": "true", "yarn.resourcemanager.ha.rm-ids": "rm1,rm2", "yarn.resourcemanager.hostname.rm1": "%HOSTGROUP::namenode2%", "yarn.resourcemanager.hostname.rm2": "%HOSTGROUP::namenode3%", "yarn.resourcemanager.webapp.address.rm1": "%HOSTGROUP::namenode2%:8088", "yarn.resourcemanager.webapp.address.rm2": "%HOSTGROUP::namenode3%:8088", "yarn.resourcemanager.webapp.https.address.rm1": "%HOSTGROUP::namenode2%:8090", "yarn.resourcemanager.webapp.https.address.rm2": "%HOSTGROUP::namenode3%:8090", 
"yarn.resourcemanager.recovery.enabled": "true", "yarn.resourcemanager.store.class": "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore", "yarn.resourcemanager.zk-address": "%HOSTGROUP::namenode1%:2181,%HOSTGROUP::namenode2%:2181,%HOSTGROUP::namenode3%:2181", "yarn.client.failover-proxy-provider": "org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider", "yarn.resourcemanager.cluster-id": "yarn-cluster", "yarn.resourcemanager.ha.automatic-failover.zk-base-path": "/yarn-leader-election" } } }, { "hdfs-site" : { "properties_attributes" : { }, "properties" : { "dfs.datanode.data.dir" : "/mnt/secondary1,/mnt/secondary2" } } }, { "hadoop-env" : { "properties_attributes" : { }, "properties" : { "namenode_heapsize" : "2048m" } } }, { "activity-zeppelin-shiro": { "properties": { "users.admin": "admin" } } }, { "hbase-site" : { "properties" : { "hbase.rootdir" : "hdfs://HACluster/apps/hbase/data" } } } ] }
{ "blueprint":"HACluster", "default_password":"admin", "host_groups": [ { "name": "namenode1", "hosts": [ { "fqdn": "namenode1" } ] }, { "name": "namenode2", "hosts": [ { "fqdn": "namenode2" } ] }, { "name": "namenode3", "hosts": [ { "fqdn": "namenode3" } ] }, { "name": "hosts_group", "hosts": [ { "fqdn": "datanode1" }, { "fqdn": "datanode2" }, { "fqdn": "datanode3" } ] } ] }
When I launch this configuration, HBase is the only service that doesn't work. I get the following errors (screenshot attached).
What am I missing?
Thank you.
Created 12-05-2017 02:32 PM
I found the error... By default, "hbase_regionserver_heapsize" was set to 4096m, which is more memory than my servers have, so the RegionServers were not able to start.
I changed that value to 1024 and everything worked!
"hbase_regionserver_heapsize" : "4096m", "hbase_regionserver_heapsize" : "1024",
Created 12-05-2017 11:54 AM
Can you please attach the region server logs located under /var/log/hbase/hbase-hbase-regionserver-{hostname}.log?
Thanks,
Aditya
Created 12-05-2017 12:00 PM
This is the output:
$ cat /var/log/hbase/hbase-hbase-regionserver-namenode1.log
2017-12-05 12:11:29,525 INFO [timeline] availability.MetricSinkWriteShardHostnameHashingStrategy: Calculated collector shard namenode2 based on hostname: namenode1
2017-12-05 12:15:09,962 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=1.67 MB, freeSize=1.59 GB, max=1.59 GB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0, evictions=29, evicted=0, evictedPerRun=0.0
2017-12-05 12:20:09,961 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=1.67 MB, freeSize=1.59 GB, max=1.59 GB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0, evictions=59, evicted=0, evictedPerRun=0.0
2017-12-05 12:25:09,961 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=1.67 MB, freeSize=1.59 GB, max=1.59 GB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0, evictions=89, evicted=0, evictedPerRun=0.0
2017-12-05 12:30:09,961 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=1.67 MB, freeSize=1.59 GB, max=1.59 GB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0, evictions=119, evicted=0, evictedPerRun=0.0
2017-12-05 12:35:09,961 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=1.67 MB, freeSize=1.59 GB, max=1.59 GB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0, evictions=149, evicted=0, evictedPerRun=0.0
2017-12-05 12:40:09,961 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=1.67 MB, freeSize=1.59 GB, max=1.59 GB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0, evictions=179, evicted=0, evictedPerRun=0.0
2017-12-05 12:45:09,961 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=1.67 MB, freeSize=1.59 GB, max=1.59 GB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0, evictions=209, evicted=0, evictedPerRun=0.0
2017-12-05 12:50:09,961 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=1.67 MB, freeSize=1.59 GB, max=1.59 GB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0, evictions=239, evicted=0, evictedPerRun=0.0
2017-12-05 12:55:09,961 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=1.67 MB, freeSize=1.59 GB, max=1.59 GB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0, evictions=269, evicted=0, evictedPerRun=0.0
$ cat /var/log/hbase/hbase-hbase-regionserver-datanode1.log
Tue Dec 5 12:08:28 CET 2017 Starting regionserver on datanode1
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 13671
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 32000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 16000
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Tue Dec 5 12:18:37 CET 2017 Starting regionserver on datanode1
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 13671
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 32000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 16000
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Tue Dec 5 12:21:20 CET 2017 Starting regionserver on datanode1
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 13671
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 32000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 16000
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Tue Dec 5 12:46:37 CET 2017 Starting regionserver on datanode1
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 13671
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 32000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 16000
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Created 12-05-2017 12:04 PM
I do not see any errors in the above logs. Can you tail these logs while restarting the region servers to see if any ERROR lines are printed? That would be helpful for debugging.
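For example (a sketch that assumes the standard HDP log path shown earlier and the short hostname in the file name):

# Follow the RegionServer log during the restart and surface only ERROR lines.
tail -F /var/log/hbase/hbase-hbase-regionserver-$(hostname -s).log | grep --line-buffered ERROR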
Created 12-05-2017 12:16 PM
When the region servers are restarted, only the following is displayed when tailing the log (same as above) :(
Tue Dec 5 13:11:53 CET 2017 Starting regionserver on datanode2
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 13671
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 32000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 16000
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited