Member since: 03-09-2018
Posts: 8
Kudos Received: 1
Solutions: 1
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 4167 | 02-03-2020 05:52 AM |
02-03-2020 05:52 AM
1 Kudo
OK, it seems I have fixed the issue myself: somehow, after the HDFS HA setup, two Hive Metastore processes were running on the same host. I stopped the Spark2 and Hive services through Ambari, killed (SIGTERMed) the additional Metastore process that was still running, and then started the Hive and Spark2 services again.
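In case someone runs into the same symptom, the check I used boils down to something like this (a rough sketch; the grep pattern simply matches the Metastore JVM, and the PID placeholder is whatever the listing shows):

# With Hive and Spark2 stopped in Ambari, look for any Metastore JVM that is still alive
ps -ef | grep -i '[h]ivemetastore'

# If a leftover process shows up, send it a SIGTERM (replace the placeholder with the PID from the listing above)
kill <leftover-metastore-pid>

# Re-check that nothing is left, then start Hive and Spark2 again from Ambari
ps -ef | grep -i '[h]ivemetastore'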
01-31-2020 02:42 AM
Hi,
I configured HDFS HA on our HDP 3.1 cluster, and since then the Spark2 Thrift Server fails to start.
Our Spark setup still uses the Hive Metastore (MySQL database).
I performed the HA configuration according to the HDP/Cloudera documentation through Ambari and also ran
hive --config /etc/hive/conf/conf.server --service metatool -updateLocation hdfs://<nameservice-id>/apps/hive/warehouse hdfs://<namenode>:8020/apps/hive/warehouse
and
hive --config /etc/hive/conf/conf.server --service metatool -updateLocation hdfs://<nameservice-id>/apps/spark/warehouse hdfs://<namenode>:8020/apps/spark/warehouse
I suspect that the new nameservice ID is still missing in some configuration entry somewhere.
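To double-check what the Metastore actually recorded after the update, something like the following should help (a rough sketch; the config file locations are the HDP defaults on our nodes and may differ elsewhere):

# Print the filesystem root currently stored in the Metastore; after the update it should use the nameservice ID
hive --config /etc/hive/conf/conf.server --service metatool -listFSRoot

# Check which warehouse directories Hive and Spark are configured with
grep -A1 'hive.metastore.warehouse.dir' /etc/hive/conf/hive-site.xml
grep 'spark.sql.warehouse.dir' /etc/spark2/conf/spark-defaults.conf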
The file /var/log/spark2/spark-spark-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1...log contains the following exceptions:
20/01/31 11:27:13 INFO HiveClientImpl: Warehouse location for Hive client (version 3.0.0) is /apps/spark/warehouse
20/01/31 11:27:13 INFO HiveMetaStoreClient: Trying to connect to metastore with URI thrift://annamaster.lan.riscsw.shp:9083
20/01/31 11:27:13 INFO HiveMetaStoreClient: Opened a connection to metastore, current connections: 1
20/01/31 11:27:13 INFO HiveMetaStoreClient: Connected to metastore.
20/01/31 11:27:13 INFO RetryingMetaStoreClient: RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ugi=spark (auth:SIMPLE) retries=1 delay=5 lifetime=0
20/01/31 11:27:13 WARN RetryingMetaStoreClient: MetaStoreClient lost connection. Attempting to reconnect (1 of 1) after 5s. getDatabase
org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_database(ThriftHiveMetastore.java:1163)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_database(ThriftHiveMetastore.java:1150)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:1607)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:1602)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212)
at com.sun.proxy.$Proxy28.getDatabase(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2956)
at com.sun.proxy.$Proxy28.getDatabase(Unknown Source)
at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1875)
at org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1864)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$databaseExists$1.apply$mcZ$sp(HiveClientImpl.scala:346)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$databaseExists$1.apply(HiveClientImpl.scala:346)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$databaseExists$1.apply(HiveClientImpl.scala:346)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:278)
at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:216)
at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:215)
at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:261)
at org.apache.spark.sql.hive.client.HiveClientImpl.databaseExists(HiveClientImpl.scala:345)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:195)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:195)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:195)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:194)
at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:114)
at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:102)
at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:53)
at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.org$apache$spark$sql$hive$thriftserver$HiveThriftServer2$$startHiveThriftServer2(HiveThriftServer2.scala:106)
at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:100)
at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:904)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
20/01/31 11:27:18 INFO RetryingMetaStoreClient: RetryingMetaStoreClient trying reconnect as spark (auth:SIMPLE)
20/01/31 11:27:18 INFO HiveMetaStoreClient: Closed a connection to metastore, current connections: 0
20/01/31 11:27:18 INFO HiveMetaStoreClient: Trying to connect to metastore with URI thrift://annamaster.lan.riscsw.shp:9083
20/01/31 11:27:18 INFO HiveMetaStoreClient: Opened a connection to metastore, current connections: 1
20/01/31 11:27:18 INFO HiveMetaStoreClient: Connected to metastore.
20/01/31 11:27:18 WARN HiveClientImpl: HiveClient got thrift exception, destroying client and retrying (0 tries remaining)
org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.transport.TTransportException
at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1879)
at org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1864)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$databaseExists$1.apply$mcZ$sp(HiveClientImpl.scala:346)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$databaseExists$1.apply(HiveClientImpl.scala:346)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$databaseExists$1.apply(HiveClientImpl.scala:346)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:278)
at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:216)
at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:215)
at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:261)
at org.apache.spark.sql.hive.client.HiveClientImpl.databaseExists(HiveClientImpl.scala:345)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:195)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:195)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:195)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:194)
at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:114)
at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:102)
at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:53)
at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.org$apache$spark$sql$hive$thriftserver$HiveThriftServer2$$startHiveThriftServer2(HiveThriftServer2.scala:106)
at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:100)
at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:904)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_database(ThriftHiveMetastore.java:1163)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_database(ThriftHiveMetastore.java:1150)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:1607)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:1602)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212)
at com.sun.proxy.$Proxy28.getDatabase(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2956)
at com.sun.proxy.$Proxy28.getDatabase(Unknown Source)
at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1875)
... 30 more
20/01/31 11:27:23 WARN HiveClientImpl: Deadline exceeded
Exception in thread "main" org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.transport.TTransportException;
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:194)
at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:114)
at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:102)
at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:53)
at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.org$apache$spark$sql$hive$thriftserver$HiveThriftServer2$$startHiveThriftServer2(HiveThriftServer2.scala:106)
at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:100)
at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:904)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.transport.TTransportException
at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1879)
at org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1864)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$databaseExists$1.apply$mcZ$sp(HiveClientImpl.scala:346)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$databaseExists$1.apply(HiveClientImpl.scala:346)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$databaseExists$1.apply(HiveClientImpl.scala:346)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:278)
at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:216)
at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:215)
at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:261)
at org.apache.spark.sql.hive.client.HiveClientImpl.databaseExists(HiveClientImpl.scala:345)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:195)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:195)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:195)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
... 17 more
Caused by: org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_database(ThriftHiveMetastore.java:1163)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_database(ThriftHiveMetastore.java:1150)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:1607)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:1602)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212)
at com.sun.proxy.$Proxy28.getDatabase(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2956)
at com.sun.proxy.$Proxy28.getDatabase(Unknown Source)
at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1875)
... 30 more
20/01/31 11:27:23 INFO SparkContext: Invoking stop() from shutdown hook
03-15-2019 08:10 AM
Hi, thanks very much. That resolved the issue.
03-14-2019 04:28 PM
I shut down all HDP services and subsequently the ambari-agents and the ambari-server on a cluster due to some scheduled network maintenance. When starting the ambari-server and ambari-agents again, the ambari-agent on the headnode (which also runs ambari-server) does not register with the ambari-server (the server is not getting a request at all). The ambari-agent on the slave node registers fine.

Ambari version is 2.6.1.0 (recently upgraded according to the HDP documentation; everything worked after that), HDP version 2.6.0.3, Python 2.7.5 / CentOS 7 on the headnode (Ambari running as root).

ambari-agent start produces:

Verifying Python version compatibility...
Using python /usr/bin/python
Checking for previously running Ambari Agent...
Checking ambari-common dir...
Starting ambari-agent
Verifying ambari-agent process status...
Ambari Agent successfully started
Agent PID at: /run/ambari-agent/ambari-agent.pid
Agent out at: /var/log/ambari-agent/ambari-agent.out
Agent log at: /var/log/ambari-agent/ambari-agent.log

ambari-agent status
ambari-agent currently not running

ps aux | grep ambari_agent
root 15892 0.0 0.0 238584 17624 pts/4 S 11:29 0:00 /usr/bin/python /usr/lib/python2.6/site-packages/ambari_agent/AmbariAgent.py start
root 15900 0.0 0.0 312740 18292 pts/4 Sl 11:29 0:00 /usr/bin/python /usr/lib/python2.6/site-packages/ambari_agent/main.py start

There is no PID file for ambari-agent; the directory /run/ambari-agent exists, the permissions (root:root 755) are fine, and there is no stale PID file.

cat /var/log/ambari-agent/ambari-agent.log
....
INFO 2019-03-14 13:09:26,515 main.py:145 - loglevel=logging.INFO
INFO 2019-03-14 13:09:26,516 main.py:145 - loglevel=logging.INFO
INFO 2019-03-14 13:09:26,516 main.py:145 - loglevel=logging.INFO
INFO 2019-03-14 13:09:26,517 DataCleaner.py:39 - Data cleanup thread started
INFO 2019-03-14 13:09:26,518 DataCleaner.py:120 - Data cleanup started
INFO 2019-03-14 13:09:26,523 DataCleaner.py:122 - Data cleanup finished

However, the loglevel of ambari-agent should be DEBUG:

cat /etc/ambari-agent/conf/ambari-agent.ini
...
loglevel=DEBUG
...

- How can I increase the log level of ambari-agent?
- ambari-agent seems to be running, but it is not creating a PID file.
- According to the ambari-server log, there seems to be no connection attempt from the ambari-agent.

I reinstalled the ambari-agent, but that did not help. Any help appreciated.
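For reference, the checks I am planning to run next (a rough sketch; <ambari-server-host> is a placeholder and 8440 is only the default agent registration port, so adjust for your setup):

# Stop the agent and make sure no leftover agent processes survive
ambari-agent stop
ps aux | grep '[a]mbari_agent'

# Confirm which server host the agent is configured to register with
grep -A3 '^\[server\]' /etc/ambari-agent/conf/ambari-agent.ini

# Verify the registration port on the ambari-server is reachable from this host
openssl s_client -connect <ambari-server-host>:8440 </dev/null

# Start the agent again and watch the log for registration attempts
ambari-agent start
tail -f /var/log/ambari-agent/ambari-agent.log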
Labels:
- Apache Ambari
03-09-2018 07:44 PM
Using HDP-2.6.0.3 I ran

hbase org.apache.hadoop.hbase.util.LoadTestTool -compression NONE -write 8:8 -num_keys 1048576

generating an HBase table with the following characteristics:

- 1048576 rows
- row key length 39 bytes
- 8 columns/row with a mean size of 8 bytes each

That should sum up to a storage requirement of approximately 1048576 * (39 + 8*8) = 108003328 bytes, i.e. about 103 MB.

When I check the storage usage for that HBase table in HDFS,

hdfs dfs -du -h -s /apps/hbase/data/data/default/cluster_test

gives

853.7 M /apps/hbase/data/data/default/cluster_test

I have an HDFS replication factor of 3; however, hdfs dfs -du should report the disk usage before replication anyway. HBase region replication for the table is 1:

hbase(main):001:0> describe 'cluster_test'
Table cluster_test is ENABLED
cluster_test, {TABLE_ATTRIBUTES => {DURABILITY => 'USE_DEFAULT', REGION_REPLICATION => '1'}
COLUMN FAMILIES DESCRIPTION
{NAME => 'test_cf', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.2240 seconds

https://community.hortonworks.com/questions/46350/how-much-actual-space-required-to-store-10gb-to-hd.html mentions higher disk usage in HBase, and http://blog.cloudera.com/blog/2010/08/hadoophbase-capacity-planning/ mentions a doubling of disk usage during compactions.

Could someone please clarify why the HBase table data is using more than 8x the space in HDFS (853.7 MB vs. roughly 103 MB) compared to the actual data being stored in the table? What am I missing here?
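To narrow down where the extra bytes live, these are the checks I can still run (a rough sketch; the paths assume the default HDP hbase.rootdir of /apps/hbase/data):

# Break the table's usage down by region and column family
hdfs dfs -du -h /apps/hbase/data/data/default/cluster_test

# Check space held outside the table directory (archived HFiles, WALs)
hdfs dfs -du -h -s /apps/hbase/data/archive /apps/hbase/data/WALs

# Flush memstores, force a major compaction, then re-measure
echo "flush 'cluster_test'" | hbase shell
echo "major_compact 'cluster_test'" | hbase shell
hdfs dfs -du -h -s /apps/hbase/data/data/default/cluster_test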
Labels:
- Apache Hadoop
- Apache HBase