Member since
03-01-2016
104
Posts
95
Kudos Received
3
Solutions
07-14-2020
02:47 PM
@Thirupathi These articles were written with HDP 2.6.x versions in mind. With HDP 3 and CDH 6 shipping Phoenix 5.0, many of these issues have been resolved, but I cannot comment on a case-by-case basis here. You will need to log a support ticket for a more comprehensive discussion of specific JIRAs.
07-02-2019
08:10 AM
1 Kudo
In HDP 2.x we had the hbck tool, which was used to identify and fix inconsistencies in HBase tables. With HBase 2.x and HDP 3.x, however, this tool is deprecated, and it is strongly advised not to run it on production clusters, as it may lead to data corruption.

HBase 2.0 introduces a completely revamped HBase Assignment Manager built on the Proc V2 (procedure v2) state-machine framework. In HDP 2.x we maintained HBASE_REGION_STATE and TABLE_STATE in ZooKeeper and in HMaster memory (a possibility for inconsistencies), but HDP 3 uses the MasterProcWAL procedure store, which is kept in HDFS; these procedures are scheduled and executed by HMaster. Given the changes in the internals of the new Assignment Manager, the old hbck fix options no longer work, though hbck can still provide a report on table/region states.

To fix region assignment problems that may still occur, a new HBCK2 tool has been designed and is shipped independently of the HBase build artifacts (and hence is not part of the HDP 3.x distribution). The plan is to let it evolve on its own, with new fix options for previously unforeseen issues added and integrated into the tool without the need for a whole new HBase release. HBCK2, as of now, is not as intuitive and broadly useful as the old hbck, but it helps fix issues related to Proc V2 (introduced in HBase 2.0) and other problems such as regions stuck in transition.
Note: HBCK2 requires HBase 2.0.3 at minimum, which ships starting with HDP 3.1.

To build hbase-hbck2:

# cd <work directory>
# git clone https://github.com/apache/hbase-operator-tools.git
# cd hbase-operator-tools/
# mvn clean install -DskipTests
# cd hbase-hbck2/target

and pick up hbase-hbck2-1.0.0-SNAPSHOT.jar.

Running HBCK2: once HBCK2 has been uploaded to the target cluster, execute it by specifying the jar path in the "-j" option of the "hbck" command:

# su - hbase
$ hbase hbck -j <jar path>/hbase-hbck2-1.0.0-SNAPSHOT.jar <COMMAND>

On HDP 3.0 you may encounter the following error:

Exception in thread "main" java.lang.UnsupportedOperationException: Requires 2.0.3 at least.

In that case, use the -s option instead:

$ hbase hbck -j <jar path>/hbase-hbck2-1.0.0-SNAPSHOT.jar -s <COMMAND>

Reference: HBASE-14350, HBASE-19121, HBase-Operator-Tools. Special mention and thanks to Karthik Palanisamy for critical inputs here.
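Once built, typical fix operations look like the sketch below. The table name and encoded region name are placeholders, and the exact subcommand set depends on the HBCK2 version you built, so run the jar without arguments first to list what your build supports:

```shell
# Print usage and the list of commands supported by this HBCK2 build
hbase hbck -j /tmp/hbase-hbck2-1.0.0-SNAPSHOT.jar

# Force a table out of a stuck state (placeholder table name)
hbase hbck -j /tmp/hbase-hbck2-1.0.0-SNAPSHOT.jar setTableState my_table ENABLED

# Schedule assignment of a region stuck in transition, by its encoded region name
hbase hbck -j /tmp/hbase-hbck2-1.0.0-SNAPSHOT.jar assigns <encoded-region-name>
```

These commands require a running HMaster to accept the scheduled procedures, so they cannot be dry-run offline.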
- Find more articles tagged with:
- Data Processing
- FAQ
- HBase
11-17-2018
01:22 AM
3 Kudos
HBase, Phoenix And Ranger

In Part 1, Part 2 and Part 3 of this article series, we discussed the internals of Phoenix index maintenance and the major issues hit around this feature. In this article we will discuss the Phoenix - Ranger relationship, how it works, and what had been broken until recently, causing several reported issues.

How native HBase authorization works: ACLs in HBase are implemented in a coprocessor called AccessController (hbase.security.authorization=true). Users are granted specific permissions, such as Read, Write, Execute, Create, and Admin, against resources such as global, namespaces, tables, cells, or endpoints (all self-explanatory). There is an additional kind of user called a "superuser". Superusers can perform any operation available in HBase on any resource. The user who runs HBase on your cluster is a superuser, as are any principals assigned to the configuration property hbase.superuser in hbase-site.xml. Much more detail on this subject is here.

How do things change with the Ranger HBase plugin enabled? Once Ranger is involved, one can create policies for HBase either from the Ranger Policy Manager or via grant/revoke commands from the HBase shell. These grant/revoke commands are mapped to Ranger policies: Ranger intercepts the relevant HBase shell commands and adds or edits Ranger policies according to the user/group and resource information provided in the command. And of course, the user running these commands must be an admin user. It has been seen that grant/revoke commands mapped to Ranger create multiple issues, including redundant or conflicting policies. We therefore have an option to disable this feature completely and allow only the Ranger Policy Manager to manage permissions. You can disable the command route by setting the following parameter in the Ranger configs (ranger-hbase-security.xml):

<property>
  <name>xasecure.hbase.update.xapolicies.on.grant.revoke</name>
  <value>false</value>
  <description>Should HBase plugin update Ranger policies for updates to permissions done using GRANT/REVOKE?</description>
</property>

How it works in Phoenix with Ranger: simply put, having a Phoenix table implies the existence of an HBase table as well, and therefore any permissions required to access that HBase table are also required for the Phoenix table. But this is not the complete truth: Phoenix also has SYSTEM tables which manage table metadata, so users need sufficient permissions on these system tables to be able to log in to the Phoenix shell and to view, create, or delete tables. By design, only the first user ever to connect to Phoenix needs the CREATE permission on all SYSTEM tables; this is a first-time operation so that the system tables get created if they do not already exist. After that, regular users should require only READ on the system tables, and users who need to create tables in Phoenix would need WRITE as well.

But this functionality broke due to PHOENIX-3652 (partly fixed in HDP 2.6.1) and other Ranger-level complexities, and as a result Phoenix expected full permissions on the system tables. Users observed either of the following exceptions, during Phoenix shell launch or during any DDL operation:

Error: org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions (user=test@HWX.COM scope=SYSTEM:CATALOG, family=0:SALT_BUCKETS, params=[table=SYSTEM:CATALOG,family=0:SALT_BUCKETS],action=WRITE)

OR

Error: org.apache.hadoop.hbase.security.AccessDeniedException: org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions for user 'test@EXAMPLE.COM' (action=admin)

To get things working temporarily, users created a policy in Ranger and gave all access to these system tables as follows:
Table: SYSTEM.*
Column Family: *
Column: *
Groups: public
Permissions: Read, Write, Create, Admin

Now this was all good in an ideal world, but in the real world it raises a lot of security concerns: customers do not want users to have full access to these system tables, for the obvious fear of manipulation of user tables and their metadata. To address this concern, our developers started working on PHOENIX-4198 (fix available with HDP 2.6.3), under which only RX permissions on the SYSTEM.CATALOG table are needed, and the rest of the authorization is handled by a coprocessor endpoint querying either Ranger or native HBase ACLs as appropriate. It is important to know that this feature does not support working with Ranger yet (work in progress).

However, the above feature was designed specifically for SYSTEM.CATALOG, and users reported issues with SYSTEM.STATS as well, where write permission was required in order to drop a table. This has been reported in PHOENIX-4753, and the issue is still unresolved. You may see exceptions such as:

org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions (user=user01t01@EXAMPLE.COM, scope=SYSTEM:STATS, family=0:, params=[table=SYSTEM:STATS,family=0:],action=WRITE)

Here again, the workaround is to give the user or group write permission on SYSTEM.STATS:

grant '@group', 'RWX', 'SYSTEM:STATS'

Also See: Part 1, Part 2, Part 3
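Under the PHOENIX-4198 model described above, the reduced grants can be sketched from the HBase shell as follows. The user and group names are placeholders, and with the Ranger plugin enabled you would model the same permissions as Ranger policies rather than shell grants:

```shell
hbase shell <<'EOF'
# Read/Execute on SYSTEM:CATALOG is enough for regular users under PHOENIX-4198
grant 'some_user', 'RX', 'SYSTEM:CATALOG'
# PHOENIX-4753 workaround: write access on SYSTEM:STATS so the group can drop tables
grant '@some_group', 'RWX', 'SYSTEM:STATS'
# Verify what was granted
user_permission 'SYSTEM:CATALOG'
EOF
```

These commands must be run by an HBase admin user on a live cluster.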
- Find more articles tagged with:
- Data Processing
- HBase
- How-ToTutorial
- index
- Phoenix
- Ranger
11-17-2018
12:56 AM
4 Kudos
Issues with Global Indexes

In Part 1 and Part 2 of this article series we discussed index internals and some frequently faced issues. In this article we will cover a few more index issues in the form of scenarios.

Scenario 4: Index writes are failing, client retries are exhausted, and the handler pool is saturated while index table regions are in transition.

Here the client is writing to the data table on server 1, which triggers an index update for server 2 (via a server-to-server RPC). If the index region is stuck in transition, the index update RPC hangs and eventually times out, and because of this the RPC between the client and server 1 also gets stuck and times out. Since the client makes several retries to write the mutation, this again leads to handler saturation on region server 1, causing another "deadlock"-like situation. This should be fixed in two steps:

1. Fix all index RITs first; without this, no client-side index maintenance or server-side index rebuild will succeed.
2. As a holistic tuning, keep the server-side RPC timeout (hbase.rpc.timeout) smaller than the Phoenix client-side timeout (phoenix.query.timeoutMs), so that server-side RPCs are not left stuck behind hung client-side queries.

Scenario 5: Row count mismatch between a Phoenix data table and its index table when the data table is bulk loaded for existing primary keys.

There is a limitation in CSV BulkLoad for Phoenix tables with secondary indexes. When an index update is carried out from the data table server to the index table server, the first step is to retrieve the existing row state from the index table, delete it, and then insert the updated row. CSV BulkLoad, however, skips this check-and-delete step and directly upserts the data into the index table, duplicating rows for the same primary key. As of this writing, the only workaround is to delete the index and rebuild it from scratch using IndexTool (the async way).

Scenario 6: Region servers crashing, index table disabled, and ZK connections maxing out.

In some cases region servers crashed due to long GC pauses, index updates to other servers failed with exceptions such as "unable to create native threads", and eventually the index table went into the "disabled" state. It was also observed that ZK connections were maxing out from the region servers ("Too many connections" in the ZK log). There could be many intertwined reasons as to which issue triggered which, but PHOENIX-4685 was seen playing a part in many such cases.

Basically, region servers (in an attempt to update the index) create sessions with ZooKeeper to do meta lookups, and this connection cache is maintained in the region server heap. It eventually grows large and causes GC pauses leading to server crashes; once a region server crashes, index updates fail on that server, the index goes into the disabled state, and the vicious circle continues. A careful examination of the situation and detailed log analysis is required, though, before attributing an incident to this bug.

In Part 4 of this article series, we will talk about the Phoenix - Ranger relationship. Also See: Part 1, Part 2
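For scenario 5, the drop-and-rebuild workaround can be sketched as follows. The schema, table, column, ZooKeeper quorum, and output path are placeholders; the IndexTool class name is the MapReduce tool shipped with Phoenix:

```shell
# Recreate the index asynchronously (names are placeholders)
/usr/hdp/current/phoenix-client/bin/sqlline.py zk1:2181:/hbase-unsecure <<'EOF'
DROP INDEX MY_IDX ON MY_SCHEMA.MY_TABLE;
CREATE INDEX MY_IDX ON MY_SCHEMA.MY_TABLE (SOME_COL) ASYNC;
EOF

# Build the index with the MapReduce-based IndexTool; the index state
# moves from "building" to "active" when the job completes
hbase org.apache.phoenix.mapreduce.index.IndexTool \
  --schema MY_SCHEMA --data-table MY_TABLE --index-table MY_IDX \
  --output-path /tmp/my_idx_build
```

The async route avoids the client-side timeouts of a synchronous CREATE INDEX on large tables.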
- Find more articles tagged with:
- Data Processing
- HBase
- How-ToTutorial
- index
- Phoenix
11-17-2018
12:22 AM
5 Kudos
Issues with Global Indexes

In Part 1 of this article series we discussed the internals of index maintenance; in this part we will cover some of the major issues faced during the life cycle of index maintenance. Before we get into the issues, we need to understand the various "states" of an index table, which reflect its health in general:

- BUILDING ("b"): partially rebuilds the index from the last disabled timestamp.
- UNUSABLE ("d") / INACTIVE ("i"): the index is no longer considered for use in queries; however, index maintenance continues to be performed.
- ACTIVE ("a"): the index is up to date and ready to use.
- DISABLE ("x"): no further index maintenance is performed on the index, and it is no longer considered for use in queries.
- REBUILD ("r"): completely rebuilds the index; upon completion the index is enabled for use in queries again.

What happens when an index update fails for any reason? The answer is not straightforward, as there is a choice of implementations based on the use case or table type:

Choice 1: Block writes to the data table but let the index continue to serve read requests. Maintain a point of "consistency" in the form of a timestamp taken just before the failure occurred, and keep the write block until the index table is rebuilt in the background and back in sync with the data table. Properties involved:

phoenix.index.failure.block.write=true
phoenix.index.failure.handling.rebuild=true

This option is not available in HDP 2 but is available with HDP 3.0.

Choice 2: Writes to the data table are not stopped, but the index table in question is disabled, detected by rebuilder threads (driven from the server hosting SYSTEM.CATALOG), marked "inactive", and partially rebuilt. In this mode the index table does not serve any client requests. This is the implementation used with HDP 2.
Properties involved:

phoenix.index.failure.handling.rebuild=true
phoenix.index.failure.handling.rebuild.interval=10000 (interval, here 10 seconds, at which the server checks whether any index table needs a partial rebuild)
phoenix.index.failure.handling.rebuild.overlap.time=1 (time to go back before index_disable_timestamp so the rebuild can start from that point)

A few troubleshooting scenarios: the following scenarios give more insight into how index maintenance, updates, and failure handling are done in Phoenix (and we will only talk about choice 2 above).

Scenario 1: The index update is written to the WAL, and before it is written to the data or index table, the region server hosting the data table crashes. The WAL is replayed and the index updates are committed via server-to-server RPC.

Scenario 2: The data table is written, but the server-to-server RPC to the index table fails. This is where the state of the index table changes to disabled. A rebuilder thread on the server hosting SYSTEM.CATALOG keeps checking index states; as soon as it detects a "disabled" index table, it starts the rebuild process by first marking the table "inactive", then running a rebuild scan on the data table regions, and finally making index updates via server-to-server RPCs. Client queries during this time refer only to the data table.

Here it is good to know about INDEX_DISABLE_TIMESTAMP, the timestamp at which the index got disabled. It is 0 if the index is active or was disabled manually by the client, and non-zero if the index was disabled during write failures; the rebuild only proceeds once the disabled timestamp is set. You can use the following query to check the value of this column:

select TABLE_NAME, cast(INDEX_DISABLE_TIMESTAMP as timestamp) from SYSTEM.CATALOG where index_state is not null limit 10;

+--------------------+----------------------------------------+
| TABLE_NAME         | TO_TIMESTAMP(INDEX_DISABLE_TIMESTAMP)  |
+--------------------+----------------------------------------+
| TEST_INDEX_PERF    | 2018-05-26 10:28:54.079                |
| TEST1_INDEX_PERF   | 2018-05-26 10:28:54.079                |
+--------------------+----------------------------------------+
2 rows selected (0.089 seconds)

Once the rebuild completes in the background, the index table's state changes back to "active". All this while, the data table keeps serving read and write requests.

Scenario 3: The index went into the disabled state, HBase became unresponsive, handlers are saturated (verified from Grafana), queries are dead slow, and nothing is moving. Let's break this down into a sequence of the most probable events:
1. Multiple clients write to region server 1 (data table), using up all of the default handlers.
2. No handlers are left on region server 1 to write the index update to region server 2, which hosts the index table regions.
3. Since the index update is not written on RS2, the client RPC on RS1 does not free up (and, if the situation continues, times out after hbase.rpc.timeout).
4. Because the index update failed, the index table goes into the disabled state.
5. Rebuilder threads detect the disabled state of the index and start rebuilding the table, subsequently contending for the same default handler pool and aggravating the situation further.

This is a very common "deadlock" scenario, and users struggle to find what caused all these issues and where to start fixing them. (In computer science, this kind of situation is also known as the "dining philosophers problem".) The above sequence of events can cause some or all of the following:

1. queries getting hung or timing out
2. region servers becoming unresponsive
3. clients unable to log in to the Phoenix shell
4. long GC pauses (due to the creation of a large number of objects)

Point 4 above would eventually break the session with ZooKeeper and may bring the region server down.

What is the solution to this problem? Since a common pool of default handlers served both clients and servers, which caused these issues, it was decided to create a dedicated index handler pool and a custom RPC scheduler for it, plus a custom RPC controller in the chain of controllers to filter outgoing index RPCs and tag them with higher priority. The following parameters were expected to be added for this (already part of HDP 2.6):

<property>
  <name>hbase.region.server.rpc.scheduler.factory.class</name>
  <value>org.apache.hadoop.hbase.ipc.PhoenixRpcSchedulerFactory</value>
</property>
<property>
  <name>hbase.rpc.controllerfactory.class</name>
  <value>org.apache.hadoop.hbase.ipc.controller.ServerRpcControllerFactory</value>
</property>

However, another issue was introduced by these added parameters (PHOENIX-3360, PHOENIX-3994). Since clients shared the same hbase-site.xml, they started sending normal requests tagged with index priority. Similarly, index rebuild scans sent their RPCs tagged with index priority, using the index handler pool for something it was not designed for. This led many users into another "deadlock" situation, where index writes fail because most index handlers are busy doing rebuild scans or are being used by clients.

The fix for PHOENIX-3994 (part of HDP 2.6.5) removes the dependency on these parameters for index priority, so they are needed neither on the server side nor on the client side. However, Ambari still adds these parameters, which can still create issues. A quick hack is to remove these two properties from all client-side hbase-site.xml files. For clients such as NiFi, which source hbase-site.xml from the phoenix-client jars, it is best to zip the updated hbase-site.xml into the jar itself. If you have many or large index tables which require a substantial number of RPCs, you can also define "phoenix.rpc.index.handler.count" in custom hbase-site.xml and give it a value proportional to the total handler count you have defined.

We will discuss a couple more scenarios in Part 3 of this article series. Also See: Part 1, Part 4
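As an illustration, a dedicated index handler pool sized relative to the regular handler count might be configured like this in custom hbase-site.xml. The value shown is only an example; size it in proportion to hbase.regionserver.handler.count on your cluster:

```xml
<property>
  <name>phoenix.rpc.index.handler.count</name>
  <!-- illustrative value: roughly a third to half of the regular handler count -->
  <value>30</value>
</property>
```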
- Find more articles tagged with:
- Data Processing
- HBase
- How-ToTutorial
- index
- Phoenix
11-17-2018
12:03 AM
6 Kudos
Phoenix Secondary Indexes

Phoenix secondary indexes are useful for point lookups or scans against non-primary-key columns of Phoenix (non-row-key columns of HBase). This saves the "full scan" of the data table you would otherwise do when retrieving data by a non-row-key column. You create a secondary index by choosing an existing non-primary-key column from the data table and making it the index's primary key or a covered column. By covered column, we mean an exact copy of the covered column's data from the data table into the index table.

Types of secondary index:

Functional index: built on functions (expressions) rather than just columns.

Global secondary index: this is the one where we make an exact copy of the covered columns and call it the index table. In simple terms, it is an upsert-select of all chosen columns from the data table into the index table. Since a lot of writing is involved during the initial stages of index creation, this type of index works best for read-heavy use cases where data is written only occasionally and read frequently. There are two ways to create a global index:

- Sync way: the data table is upsert-selected, the rows are transported to the client, and the client in turn writes to the index table. A very cumbersome and error-prone (timeouts etc.) method.
- Async way: the index is written asynchronously via a MapReduce job. Here the index state becomes "building", and each mapper works with one data table region and writes to the index regions.

For specific commands on creating the various types of indexes, refer here.

A global index thus makes the following assumptions:

1. You have plenty of available disk space to create copies of the indexed columns.
2. You do not worry about the write penalty (across the network!) of maintaining indexes.
3. The query is fully covered, i.e. all queried columns are part of the index. Note that a global index is not used if the query refers to a column that is not part of the index table (unless a hint is used).

Local index: what if none or only some of the above assumptions hold? That is where a local index becomes useful: it is part of the data table itself, in the form of a shadow column family (eliminating assumption 1); it is the best fit for write-heavy use cases (eliminating assumption 2); and it can be used for partially covered queries, since the data and index co-reside (eliminating assumption 3).

For all practical purposes, I will talk about global indexes only, as that is the most common use case and the most stable option so far.

How global index maintenance works

To go into the details of index maintenance, we also need to know about two further flavors of global secondary index:

Immutable global secondary index: the index is written once and never updated in place. Only the client that writes to the data table is responsible for writing to the index table as well (at the same time!); it is purely the client's responsibility to keep the data and index tables in sync. Use cases such as time-series data or event logs can take advantage of immutable data and index tables. (Create the data table with the IMMUTABLE_ROWS=true option, and all indexes created on it default to immutable.)

Mutable global secondary index: here, index maintenance is done via server-to-server RPCs (network and handler overhead, remember!) between the data table server and the index table server. For simplicity, we can assume that if the client successfully wrote to the data table, the writes to the index table were also completed by the data region server. However, many issues exist around this aspect, which we will discuss in Part 2.

There are also two varieties of tables: transactional and non-transactional. Transactional tables aim for atomic writes to the data and index tables (ACID compliant) and are still a work in progress. So in the next few sections and articles, for all practical purposes, we will talk about non-transactional mutable global secondary indexes.

Here are the steps involved in index maintenance:

1. The client submits an "upsert" RPC to region server 1.
2. The mutation is written to the WAL (making it durable), so if the region server crashes at this point or later, the WAL replay syncs the index table. (If there is a write failure before this step, it is the client that is supposed to retry.)
3. In the preBatchMutate step (part of the Phoenix Indexer box in the diagram above), the Phoenix coprocessor prepares the index update for region server 2 (steps 2 and 3 actually occur together).
4. The mutation is written to the data table.
5. In the postBatchMutate step (also part of the Phoenix Indexer box in the diagram above), the index update is committed on region server 2 via a server-to-server RPC call.

Understanding these steps is very important, because in Part 2 of this article series we will discuss the various issues that appear in index maintenance: indexes going out of sync, indexes getting disabled, queries slowing down, region servers becoming unresponsive, and so on.

References: https://phoenix.apache.org https://issues.apache.org Also See: Part 3, Part 4
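To make the index types above concrete, here is a sketch of the corresponding DDL, run through sqlline. The table, column, and ZooKeeper quorum names are hypothetical:

```shell
/usr/hdp/current/phoenix-client/bin/sqlline.py zk1:2181:/hbase-unsecure <<'EOF'
-- Immutable data table: all indexes created on it default to immutable
CREATE TABLE EVENTS (ID BIGINT NOT NULL PRIMARY KEY, HOST VARCHAR, MSG VARCHAR)
    IMMUTABLE_ROWS=true;
-- Global covered index: HOST becomes the index row key, MSG is copied as a covered column
CREATE INDEX EVENTS_BY_HOST ON EVENTS (HOST) INCLUDE (MSG);
-- Functional index: built on an expression rather than a plain column
CREATE INDEX EVENTS_BY_UPPER_HOST ON EVENTS (UPPER(HOST));
-- Local index: stored alongside the data table, suited to write-heavy workloads
CREATE LOCAL INDEX EVENTS_LOCAL ON EVENTS (MSG);
EOF
```

A large table would use the ASYNC keyword on CREATE INDEX and the IndexTool MapReduce job instead of building the index synchronously.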
- Find more articles tagged with:
- Data Processing
- How-ToTutorial
- index
- Phoenix
- secondary-index
08-21-2018
10:35 PM
Phoenix shipped with HDP does not support import from Sqoop yet.
06-05-2018
08:36 PM
@John How many znodes do you have in ZooKeeper? One of the reasons for KeeperExceptions is a large request/response size between ZooKeeper and its client components.
06-03-2018
09:26 PM
@Rishabh Jain: Have you configured your ODBC driver for SSL? Check "SSL options" and enable SSL as well as the self-signed certificate option.
06-03-2018
09:22 PM
@MB My responses:
1. Yes, more datanodes always help; you can have data replicated across nodes by choosing a replication factor of your choice (the default is 3). Having a DN and NN on the same node, with sufficient resources, is not a bad idea; critical production clusters could have a dedicated NN node to avoid IO overhead from DNs creating potential issues for the NN.
2. Not sure of the context behind Cassandra, but having nodes of different specs in a cluster should not be a problem. For better management of resources, try creating config groups of nodes and allocate/isolate the better-spec nodes to the components which are critical for your use case or which may need more resources.
3. Research more, but I don't think it should be a problem.
4. It should not be.
06-03-2018
09:09 PM
@Satyendra Kumar, please check the datanode logs around the time of the issue. Also, how many VMs have you added as datanodes? Check /etc/hadoop/conf/dfs.exclude to see which nodes are excluded, and find out why from their logs.
05-21-2018
10:31 PM
@Mike Wong
1. Is it a new or an existing cluster? How many total nodes do you have in the cluster?
2. Please provide us the output of the following command from all ZooKeeper server nodes:

echo 'stat' | nc <ZK_HOST> 2181

KeeperExceptions can often be due to a large number of znodes in ZooKeeper across the various services. Also check zoo.cfg on all ZK nodes and verify that the file is identical across nodes, and that the hostnames referring to the ZK nodes are identical as well.
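Alongside the stat output, the mntr four-letter command reports the znode count directly (the hostname below is a placeholder; mntr needs ZooKeeper 3.4 or later):

```shell
# Znode count, live connections, and outstanding requests from each ZK server
echo 'mntr' | nc zk1.example.com 2181 | \
  egrep 'zk_znode_count|zk_num_alive_connections|zk_outstanding_requests'
```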
05-21-2018
10:23 PM
@Greenhorn Techie I think there is no best practice which can be suggested beforehand; what is required here is incremental tuning based on what kind of jobs/workload you are expecting, which queues you consider critical, which queues have predictable workloads, and so on. For example, for a critical department or project's queue which generally runs a heavy-duty reporting job but only triggers once a week, you can consider a high maximum capacity, such as 70 - 80% (of overall capacity), so that if and when required it can utilize "idle" cluster resources beyond its defined capacity. Again, this depends on understanding overall cluster needs and business requirements. Unless required for a mission-critical job, I would not set a very high max capacity for queues in general, as it could prove to be a bottleneck for the remaining queues. With time and experience of how a queue is used and what its needs are, these values should be tuned incrementally.
05-16-2018
07:15 AM
1 Kudo
Sqoop is a popular client for moving datasets between relational databases and Hive/HDFS, for both import and export. It supports importing tables into HBase as well. The most common use case is bulk-importing data from databases such as Oracle for the initial load and then using tools like GoldenGate to continuously replicate data. In this article, we will import 4 million rows from a MySQL table into an HBase table. The steps are as follows:

Create a table in MySQL:

CREATE TABLE sqooptest (id int NOT NULL PRIMARY KEY AUTO_INCREMENT, val int);

Create a procedure in MySQL to generate some data in the columns defined:

DELIMITER $$
CREATE PROCEDURE prepare_data1()
BEGIN
  DECLARE i INT DEFAULT 100;
  WHILE i < 4000000 DO
    INSERT INTO sqooptest (val) VALUES (i);
    SET i = i + 1;
  END WHILE;
END$$
DELIMITER ;

When you call this procedure, you'll have about 4 million records in the MySQL table:

CALL prepare_data1();

Let's create the HBase table from the HBase shell and insert one dummy record:

create 'sqooptest', 'A'
put 'sqooptest', '1', 'A:val', '123'

Now let's run the Sqoop command to import data from MySQL to HBase (this uses regular "put" operations internally; there is another way, bulk loading with Sqoop, which we will discuss shortly):

sqoop import --connect jdbc:mysql://example.com/test --driver com.mysql.jdbc.Driver --username root -P --table sqooptest --hbase-table sqooptest --hbase-row-key id --column-family A --split-by id -m 8

Where:

--driver specifies the database driver.
--connect is the connection string to MySQL.
--username is the database user.
--table is the database table name.
--hbase-table is the table name on the HBase side.
--hbase-row-key is the primary key of the source DB that you would like to use as the HBase row key; it can also be a comma-separated composite key.
--column-family is the family where writes will land in HBase.
--split-by is the column at the source DB used to split the data to be imported across the given number of mappers. So if you have defined 8 mappers for this job, the column specified in this field is divided into equal parts, each served by one mapper.

Some additional useful options:

--fetch-size is the number of rows each mapper should fetch from the source DB in one iteration. Increasing fetch-size in proportion to the mapper container size (considering memory, CPU, and bandwidth) has proved beneficial in making the job faster and more efficient.
--hbase-bulkload enables bulk loading of the data into HBase.

Finally, the output of this job looks like any other regular MapReduce job:

18/05/16 06:42:52 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 32.4191 seconds (0 bytes/sec)
18/05/16 06:42:52 INFO mapreduce.ImportJobBase: Retrieved 3999900 records.
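Putting the additional options above together, a bulk-load variant of the same import might look like the following sketch. The hostname and fetch size are placeholders to adjust for your environment:

```shell
sqoop import \
  --connect jdbc:mysql://example.com/test \
  --driver com.mysql.jdbc.Driver \
  --username root -P \
  --table sqooptest \
  --hbase-table sqooptest \
  --hbase-row-key id \
  --column-family A \
  --split-by id -m 8 \
  --fetch-size 10000 \
  --hbase-bulkload
```

With --hbase-bulkload, Sqoop writes HFiles and loads them directly instead of issuing per-row puts, which is considerably faster for large initial loads.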
References: http://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_importing_data_into_hbase
- Find more articles tagged with:
- HBase
- How-ToTutorial
- import
- Sandbox & Learning
- Sqoop
04-15-2018
12:02 PM
8 Kudos
In this article series, part 1 , part 2 , part 3, part 4 covered various Hbase tuning parameters, scenarios, system side of things etc, in last and part 5 of this series, I will discuss little bit about Phoenix performance parameters and general tips for tuning. I am taking an example of a query which was performing very slow and how we investigated this situation. We will start by reading explain plan of this query. I cannot quote exact query (customer's data) here but it was a select query with some where clause and finally order by conditions. The explain plan of the query is as follows: +--------------------------------------------------------------------------------------------------------------------------------------------------------+
| CLIENT 5853-CHUNK 234762256 ROWS 1671023360974 BYTES PARALLEL 5853-WAY FULL SCAN OVER MESSAGE_LOGS.TEST_MESSAGE | | SERVER FILTER BY ((ROUTING IS NULL OR ROUTING = ‘TEST') AND TESTNAME = 'createTest' AND TESTID = ’TEST’ AND ERRORCODE | | SERVER TOP 20 ROWS SORTED BY [CREATEDTIMESTAMP DESC, TESTID] | | CLIENT MERGE SORT |
+--------------------------------------------------------------------------------------------------------------------------------------------------------+
4 rows selected (0.958 seconds) Firstly, lets learn to dissect an explain plan of a Phoenix query, following are my observations looking at this plan: First statement ”CLIENT” , means this statement shall be executed at client side. “5853-CHUNK”, this means query plan has logically divided the data in about 5853 chunks . And each chunk would utilize one single thread. So for future reference, lets keep it in mind, one chunk == one thread of client thread-pool. “234762256 ROWS”, means these many rows will be processed by this query. Self explanatory. “1671023360974 BYTES”, means about 1.6 TB of data will be processed. “PARALLEL”, so the query processing on these 5853 chunks (5853-WAY) would be done in parallel. “FULL SCAN OVER MESSAGE_LOGS.TEST_MESSAGE“ , this means it will scan entire table , most inefficient way and anti-pattern for Hbase / Phoenix use cases. This table requires secondary index to convert this full scan into a range scan. Second Statement “SERVER” , processing would happen in region servers “FILTER BY” , returns only results that match the expression Third Statement “SERVER” , processing happening on server side, specifically “SORTED BY” Fourth Statement “CLIENT MERGE SORT”, meaning all the SORTED ROWS at server side would be brought back to client node and be merge sorted again. What tuning was done to make query run faster ? 5853 chunks appeared too much for the query specially with a thread-pool having at default value of 128, this was making whole query get slower as only 128 threads would work at a time and rest all tasks would wait in queue. (phoenix.query.queueSize) We decided to bump up thread-pool (phoenix.query.threadPoolSize) from default 128 to about 1000 , but customer did not have enough CPU cores on client side and he feared CPU contention there if we go beyond this number, so we decided to go for another tuning. We increased guidepost width (phoenix.stats.guidepost.width) which are markers to logically distribute data in chunks. 
(from its default of 100 MB to 500 MB). This effectively reduced the number of chunks and hence the threads. Read more about all the tuning parameters, including the ones above, here.

To make this query more effective, we recommended that the customer create a secondary index on top of this data table and include the most frequently used columns in it. Read more about secondary indexes here.

With all these changes in place, the query which earlier took about 5 minutes now took about 15 - 20 seconds.

General tuning recommendations:

- For improving read performance, create a global secondary index. It carries some write penalty, as the data of the columns chosen for the index is duplicated into another table.

- For improving write performance, pre-split the table if you know the key ranges. Also consider a local index, which is stored in the same data table. Local indexes are more stable with HDP 3.0, which carries a lot of bug fixes.

- Choose the most frequently used columns for the primary key. Since all these columns are concatenated to form HBase's "row key", their order of appearance in the row key, as well as its length, matters. Order matters because a range scan is more efficient when the most frequently used column comes first. Length matters because the row key is part of each cell and hence occupies memory and disk.

- Use salt buckets if you have a monotonically increasing row key. Read more about it here. Please note that salting incurs a read penalty, as scans are repeated for each bucket. Don't create too many salt buckets; the rule of thumb is to keep them equal to the number of region servers in your cluster.

Reference:

http://phoenix.apache.org/explainplan.html

https://phoenix.apache.org/tuning.html

https://phoenix.apache.org/tuning_guide.html

Also see : PART 1 , PART 2 , PART 3, PART 4
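The chunk-count arithmetic above can be sketched with some back-of-the-envelope math. This is a simplified illustration, not Phoenix's actual planner logic: real chunk counts also depend on region boundaries, so the numbers will not exactly reproduce the 5853 chunks in the plan.

```python
# Rough sketch of how guidepost width drives chunk count (and thus
# client threads). Illustrative only; Phoenix also cuts chunks at
# region boundaries, so real counts differ.

def estimated_chunks(table_bytes, guidepost_width_bytes):
    """Each guidepost interval becomes roughly one scan chunk,
    and one chunk occupies one client thread-pool thread."""
    return -(-table_bytes // guidepost_width_bytes)  # ceiling division

TABLE_BYTES = 1671023360974          # ~1.6 TB, from the explain plan above
MB = 1024 * 1024

chunks_100mb = estimated_chunks(TABLE_BYTES, 100 * MB)
chunks_500mb = estimated_chunks(TABLE_BYTES, 500 * MB)

# With phoenix.query.threadPoolSize = 128, any chunks beyond 128 wait
# in the queue (phoenix.query.queueSize); fewer chunks means less queueing.
print(chunks_100mb, chunks_500mb)  # 5x wider guideposts -> ~5x fewer chunks
```

The point of the sketch: widening guideposts 5x cuts the number of chunks (and queued client threads) roughly 5x, which is why it helped without adding CPU cores on the client.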
04-15-2018
11:36 AM
11 Kudos
In my articles part 1 and part 2 , I explained various parameters which could be tuned for achieving optimized performance from Hbase. In part 3, we discussed some scenarios and aspects to focus on while investigating performance issues. Continuing this series, part 4 covers some system and network level investigations.

DISKS

Along with investigating potential issues at the HBASE and HDFS layers, we must not ignore the system side of things, like the OS, network and disks. We see several cases every day where severe issues at this layer are identified. A detailed investigation is beyond the scope of this article, but we should know where to point fingers. The triggering factor to look at the system side is messages such as the following in datanode logs at the time of a performance issue:

WARN datanode.DataNode (BlockReceiver.java:receivePacket(694)) - Slow BlockReceiver write data to disk cost:317ms (threshold=300ms)

Following are some of the several tests we could do to ascertain disk performance:

- Run a dd test to check read and write throughput and latencies.

For checking write throughput:

dd bs=1M count=10000 if=/dev/zero of=/data01/test.img conv=fdatasync

For checking read throughput:

dd if=/data01/test.img of=/dev/null bs=1M count=10000

where /data01 is one of your datanode data disks.

- For checking latencies during either read or write, prepend the "time" command to the commands above: you will learn how much time the operation took to complete and also whether the actual delays came from the user side or the system side. Compare these results with the throughput agreed upon with your storage vendor / cloud service provider.

- Another important tool is the Linux "iostat" command, which provides a great deal of advanced information, such as how much time an IO request spent in the IO scheduler queue and the disk controller queue, how many requests were waiting in the queues, and how much time the disk took to complete an IO operation.
This command could very well tell you whether your workload is way beyond your disk capacities, or whether your disks have issues at the hardware or driver / firmware level. Another detailed article could be written to explain each and every field of this command, but that's beyond the scope of this article. Some parameters, though, need to be highlighted:

A. await : the average time, in milliseconds, for I/O requests completed by storage. It covers the time taken through the scheduler, driver, controller, transport (for example a fibre SAN), and storage to complete each IO, including the time spent by requests in the scheduler queue and the time storage spent servicing them.

B. avgqu-sz : the average number of IOs queued within both the IO scheduler queue and the storage controller queue.

C. svctm : the actual service time storage / disk took to serve an IO request, excluding all queue latencies.

D. %util : the percentage utilization of each disk.

- Needless to say, you would always check commands like top / vmstat / mpstat for identifying issues related to CPU / memory / swapping etc.

- Last but most important, to see live streaming of what's happening at your IO layer, use the "iotop" command. It gives you real time details of which command, process and user is actually clogging your disks.

Some general tuning tips :

- Selecting the right IO scheduler is very critical for latency sensitive workloads. The "deadline" scheduler has proven to be the best for such use cases. Check which scheduler your IO is processed with, and correct it if needed:

[root@example.com hbase]# cat /sys/block/sd*/queue/scheduler
noop [deadline] cfq

- Choose the right mount options for your data disks. Options such as "noatime" save a great deal of IO overhead on data disks and in turn improve their performance.

- Check which mode your CPU cores are running in. We recommend they run in performance mode. Virtual machines and cloud instances may not have this file.

echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

- Flushing a large pool of accumulated "dirty pages" to disk has been seen to cause significant IO overhead on systems. Please tune the following kernel parameters controlling this behavior. There is no single number suited for all, and trial and error is the best resort here; but on systems with a large amount of memory we can keep these ratios smaller than their default values, so that we don't end up accumulating a huge pool of dirty pages in memory, eventually to be burst-synced to disks with limited capacities, degrading application performance.

vm.dirty_background_ratio (default is 10)

vm.dirty_ratio (default is 40)

Read more about these parameters here.

NETWORK

Network bandwidth across the nodes of an HDP cluster plays a critical role in heavy read / write use cases. It is all the more critical in distributed computing because any one limping node can degrade the performance of the entire cluster. We are already in the world of gigabit networks and things are generally stable on this front; however, we continue to see issues on this side. Messages such as the following in datanode logs could be the triggering factor to investigate the network side of things:

WARN datanode.DataNode (BlockReceiver.java:receivePacket(571)) - Slow BlockReceiver write packet to mirror took 319ms (threshold=300ms)

Following are some of the tools / commands we can use to find out if something is wrong here:

- "iperf" to test network bandwidth between nodes. See more details about iperf here.
- Use Linux commands like ping / ifconfig and "netstat -s" to find out whether there are any significant packet drops / socket buffer overruns, and whether this number is increasing over time.

- The "ethtool ethX" command tells you the negotiated network bandwidth.

- "ethtool -S" helps collect NIC and driver statistics.

Some general tuning tips:

- Generally, any sort of NIC level receive acceleration does not work well with our use cases and in most scenarios proves to be a performance bottleneck. Disable any acceleration enabled on your NIC cards (of course after consultation with your platform teams):

Check if receive offloading is enabled:

$ grep 'receive-offload' sos_commands/networking/ethtool_-k_eth0 | grep ': on'
generic-receive-offload: on
large-receive-offload: on

Disable them using the following commands:

# ethtool -K eth0 gro off
# ethtool -K eth0 lro off

Reference

- Make sure the MTU size is uniform across all nodes and switches in your network.

- Increase socket buffer sizes if you observe consistent overruns / prunes / collapses of packets as explained above. Consult your network and platform teams on how to tweak these values.

KERNEL / MEMORY

A tuned kernel is a mandatory requirement for the kind of workload you expect a node to process. However, kernel tuning is often ignored at the time such infrastructures are designed. Although it is a very vast topic and beyond the scope of this article, I will mention some important kernel parameters related to memory management which must be tuned on HDP cluster nodes. These configuration parameters live in /etc/sysctl.conf.

- vm.min_free_kbytes : the kernel tries to ensure that this much memory is always available on the system, reclaiming memory to achieve it. Keeping this parameter at about 2 - 5 % of the total memory on the node makes sure your applications do not suffer from prevailing memory fragmentation.
The first symptom of memory fragmentation is the appearance of messages such as "Page Allocation Failure" in /var/log/messages or "dmesg" (the kernel ring buffer), or worse, the kernel killing processes to free up memory via the "OOM Killer".

- vm.swappiness : a parameter reflecting the system's tendency to swap. The default value is 60. We don't want the system to swap at will, so keep this value at about 0 - 5 to keep the system's swap tendency minimal.

- It has been seen that transparent hugepages do not work well with the kind of workload we have. It is thus recommended to disable THP on our cluster nodes:

echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag

- On modern NUMA (read here about NUMA) systems, we strongly recommend disabling zone reclaim mode. This is based on the understanding that the performance penalties incurred by reclaiming pages within a zone are far worse than having the requested page served from another zone. Applications that depend strongly on caching usually prefer this parameter disabled.

vm.zone_reclaim_mode = 0

All these kernel level changes can be made on running systems by editing /etc/sysctl.conf and running the "sysctl -p" command to bring them into effect.

In the last part (PART 5) of this article series, I will discuss Phoenix performance tuning in detail.

Also see : PART 1 , PART 2 , PART 3
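The memory-management settings above can be derived mechanically from a node's total RAM. The sketch below uses the 2 - 5 % band for vm.min_free_kbytes and the 0 - 5 swappiness guideline from the article; the 256 GB node is a hypothetical example, and you should validate any chosen values against your own workload.

```python
# Sketch: derive the kernel memory settings discussed above from total RAM.
# The 2-5 % band for vm.min_free_kbytes is the article's guideline,
# not a universal rule.

def min_free_kbytes(total_ram_kb, pct=0.03):
    """Reserve ~2-5 % of total memory so allocations survive fragmentation."""
    return int(total_ram_kb * pct)

total_kb = 256 * 1024 * 1024        # hypothetical node with 256 GB RAM

print("vm.min_free_kbytes =", min_free_kbytes(total_kb))
print("vm.swappiness = 5")          # keep the system's swap tendency minimal
print("vm.zone_reclaim_mode = 0")   # serve pages from a remote NUMA zone
                                    # rather than reclaiming locally
```

The printed lines are in /etc/sysctl.conf syntax, so the output can be pasted there and applied with "sysctl -p" as described above.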
04-15-2018
10:15 AM
13 Kudos
Having understood the important tuning parameters of Hbase in part 1 and part 2 of this article series, this article focuses on the various areas which should be investigated when handling any Hbase performance issue.

Locality

By locality we mean that the physical HDFS blocks belonging to Hbase Hfiles need to be local to the region server node where the respective region is online. Locality is important because Hbase prefers short circuit reads directly from physical disks, bypassing HDFS. If a region's Hfiles are not local to it, reads go across nodes over the network and add to read latency.

One can monitor the locality of a table / region from the Hbase Master UI, specifically on the table's page after clicking the listed table name. A value of "1" in the locality column means 100% block locality. Overall Hbase locality is visible in the Ambari metrics section of Hbase (in percentage).

Major compaction tries to bring all Hfiles belonging to a region back onto a single region server, thus restoring locality to a great extent. Locality is generally messed up by the HDFS balancer, which tries to balance disk space across datanodes, OR by the Hbase balancer, which moves regions across region server nodes to balance the number of regions on each server.

The Hbase balancer (default is the Stochastic Load Balancer) can be tuned by tweaking the various costs associated with it (region load, table load, data locality, MemStore sizes, store file sizes) so that it runs according to our requirements. For example, to have the balancer prefer locality cost more than anything else, we can add the following parameter in hbase configs and give it a higher value (the default value is 25). (This is an advanced parameter and an expert must be consulted before such an addition.)

hbase.master.balancer.stochastic.localityCost

To undo the locality harm done by the HDFS balancer, we have no solution as of today except running compaction immediately after the HDFS balancer is run.
There are some unfinished JIRAs which, once implemented, would bring in features like block pinning and favored nodes. Once they are available, Hbase can configure its favored nodes, writes would be dedicated only to those nodes, and the HDFS balancer won't be able to touch the respective blocks. (Refer to HBASE-15531 to see all unfinished work on this feature.)

Hotspotting

Hotspotting has been discussed quite a lot, but it is important to mention here as a very crucial aspect to investigate during performance issues. Basically, hotspotting appears when all your write traffic hits only one particular region server. This typically happens because of a row key design which is sequential in nature, so that all writes land on the node which has the hot spotted region online. We can overcome this problem in three ways (I mean I know three ways):

Use random keys - Not an ideal solution, as it would not help in range scans with start and stop keys.

Use salt buckets - If you have Phoenix tables on top of hbase tables, use this feature. Read more about this here.

Use pre-splitting - If you know the start and end keys of your sequential keys, you can pre-split the table by giving split key points beforehand at the time of creation. This distributes empty regions across nodes, and whenever writes come in for a particular key they land on the respective node, eventually distributing the write traffic across nodes. Read more about it here.

HDFS

The HDFS layer is a very important layer: no matter how optimized your Hbase is, if datanodes are not responding as expected, you will not get the performance you expect.
Any time you have latencies on Hbase / Phoenix queries, watch for the following messages appearing in large numbers in region server logs:

2018-03-29 13:49:20,903 INFO [regionserver/example.com/10.9.2.35:16020] wal.FSHLog: Slow sync cost: 18546 ms, current pipeline: [xyz]

OR the following messages in datanode logs at the time of the query run:

2018-04-12 08:22:51,910 WARN datanode.DataNode (BlockReceiver.java:receivePacket(571)) - Slow BlockReceiver write packet to mirror took 34548ms (threshold=300ms)

OR

2018-04-12 09:20:57,423 WARN datanode.DataNode (BlockReceiver.java:receivePacket(703)) - Slow BlockReceiver write data to disk cost:3440 ms (threshold=300ms)

If you see such messages in your logs, it is time to investigate from the HDFS side: whether we have sufficient datanode transfer threads, heap and file descriptors, and checking the logs further for any GC or non-GC pauses. Once confirmed on the HDFS side, we must also look at the underlying infrastructure (network, disk, OS). This is because these messages mostly convey that HDFS is having a hard time receiving / transferring a block from / to another node, or syncing data to disk. We will discuss the system side of things in part 4 of this article series.

BlockCache utilization and hitRatio

When investigating performance issues for read traffic, it is worth checking how helpful your block cache and bucket cache are and whether they are being utilized or not:

2017-06-12 19:01:48,453 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=33.67 MB, freeSize=31.33 GB, max=31.36 GB, blockCount=7, accesses=4626646, hits=4626646, hitRatio=100.00%, cachingAccesses=4625995, cachingHits=4625995, cachingHitsRatio=100.00%, evictions=24749, evicted=662, evictedPerRun=0.026748554781079292
2017-06-12 19:02:07,429 INFO [BucketCacheStatsExecutor] bucket.BucketCache: failedBlockAdditions=0, totalSize=46.00 GB, freeSize=45.90 GB, usedSize=106.77 MB, cacheSize=93.21 MB, accesses=9018587, hits=4350242, IOhitsPerSecond=2, IOTimePerHit=0.03, hitRatio=48.24%, cachingAccesses=4354489, cachingHits=4350242, cachingHitsRatio=99.90%, evictions=0, evicted=234, evictedPerRun=Infinity

Flush queue / compaction queue

During crisis hours when you are facing severe write latencies, it is very important to check what the memstore flush queue and compaction queue look like. Let's discuss a couple of scenarios and some possible remedies (expert consultation needed):

A. Flush queue not reducing

This leads us to three possibilities:

A.1 Flushes have been suspended for some reason. One such reason is a condition called "too many store files", seen further down in the region server logs (dictated by hbase.hstore.blockingStoreFiles). Check my part 2 article to learn more about this parameter and how to tune it. Simply put, this parameter blocks flushing temporarily until minor compaction is completed on existing Hfiles. Increasing this number a few folds at the time of heavy write load should help. We can also help minor compaction by assigning it more threads, so that it finishes compacting these files faster:

hbase.regionserver.thread.compaction.small (default value is 1; we can tune it to, say, 3)

A.2 Another possibility is that the flush operation itself is slow and cannot cope with the write traffic, slowing down flushes.
We can help the flusher by allocating a few more handler threads using:

hbase.hstore.flusher.count (default value is 2; we can bump it to, say, 4)

A.3 Another possibility seen in such cases is a "flush storm". This behavior triggers when the number of write ahead log files reaches its defined limit (hbase.regionserver.maxlogs) and the region server is forced to trigger a flush on all memstores until WAL files are archived and enough room is created to resume write operations. You would see messages like:

2017-09-23 17:43:49,356 INFO [regionserver//10.22.100.5:16020.logRoller] wal.FSHLog: Too many wals: logs=35, maxlogs=32; forcing flush of 20 regions(s): d4kjnfnkf34335666d03cb1f

Such behavior can be controlled by bumping up:

hbase.regionserver.maxlogs (default value is 32; double or triple this number if you know you have a heavy write load)

B. Compaction queue growing

compaction_queue=0:30 (meaning 0 major compactions and 30 minor compactions in queue)

Please note that compaction, whether minor or major, is additional IO overhead on the system. So whenever you try to fix a performance problem by making compactions faster or accommodating more Hfiles in one compaction thread, remember that this medicine has its own side effects. Nevertheless, we can make compaction more efficient by bumping up the following parameters:

Lower bound on the number of files in any minor compaction: hbase.hstore.compactionThreshold (default 3)

Upper bound on the number of files in any minor compaction: hbase.hstore.compaction.max (default 10)

The number of threads to handle minor compactions: hbase.regionserver.thread.compaction.small (default 1)

The number of threads to handle major compactions:
hbase.regionserver.thread.compaction.large (default 1)

JVM metrics

Various performance issues drill down to JVM level issues, most specifically garbage collection STW (stop the world) pauses, which degrade application / region server performance or sometimes bring them to a halt / crash. There are several situations in which you can have long GC pauses, and I won't be able to cover them all here. But the least you can do is have your region server's gc.log file analyzed by one of many online tools such as this; such a tool helps you analyze what's going wrong in your JVM or garbage collection behavior specifically.

Also, if you have a huge number of regions on every region server (500+), consider moving to the G1 GC algorithm. Even though Hortonworks does not officially support it yet, we are in the process, and many of our customers have implemented it successfully. In your spare time, also go through this GC tuning presentation.

Without going into too many details, I would like to mention a few basic rules of thumb:

With the CMS GC algorithm, never set the region server heap greater than 36 - 38 GB; if you have larger requirements, switch to G1GC.

The young generation should never be more than 1/8th or 1/10th of the total region server heap.

Start your first tweak at reducing GC pauses by changing -XX:ParallelGCThreads, which is 8 by default; you can safely take it up to 16 (watch the number of CPU cores you have, though).

Check who contributed to a GC pause: "user", "sys" or "real".

2017-10-11T14:06:17.492+0530: 646872.162: [GC [1 CMS-initial-mark: 17454832K(29458432K)] 20202988K(32871808K), 74.6880980 secs] [Times: user=37.61 sys=1.96, real=74.67 secs]

'real' time is the total elapsed time of the GC event. This is basically the time you see on the clock. 'user' time is the CPU time spent in user mode (outside the kernel). 'sys' time is the amount of CPU time spent in the kernel, i.e. in system calls.
Thus in the above scenario, "sys" time is very small yet "real" time is very high, indicating that GC did not get the CPU cycles it needed, i.e. a system heavily clogged on the CPU side. There is another category of pause which we see regularly:

2017-10-20 14:03:42,327 INFO [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 9056ms
No GCs detected

This category of pause indicates that the JVM was frozen for about 9 seconds without any GC event. It mostly points to a problem on the physical machine side, possibly a memory / CPU issue or some other OS issue causing the whole JVM to freeze momentarily.

In part 4 of this series, I will cover some infrastructure level investigation of performance issues.
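The user / sys / real triage described in the JVM metrics section above can be done mechanically on a gc.log line. This is a minimal sketch: real gc.log formats vary with JVM version and flags, so the regex is tailored only to CMS-style "[Times: ...]" lines like the example above.

```python
import re

# Minimal sketch: pull user/sys/real out of a CMS-style gc.log line and
# apply the triage from the article. Real gc.log formats vary by JVM flags.

def parse_gc_times(line):
    """Extract user/sys/real seconds from a '[Times: ...]' gc.log fragment."""
    m = re.search(r"user=([\d.]+)\s+sys=([\d.]+),\s+real=([\d.]+)", line)
    if m is None:
        return None
    return {k: float(v) for k, v in zip(("user", "sys", "real"), m.groups())}

line = ("[GC [1 CMS-initial-mark: 17454832K(29458432K)] "
        "20202988K(32871808K), 74.6880980 secs] "
        "[Times: user=37.61 sys=1.96, real=74.67 secs]")

t = parse_gc_times(line)
if t["real"] > t["user"] + t["sys"]:
    # GC threads were starved of CPU: suspect system-wide CPU contention,
    # not the JVM heap itself.
    print("likely CPU contention outside the JVM")
```

For the example line, real (74.67 s) far exceeds user + sys (39.57 s), so the sketch flags CPU contention, matching the article's reading of that pause.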
04-12-2018
05:44 PM
For exporting a table to another cluster, a snapshot is the best method. https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_hbase_snapshots_guide/content/ch_hbase_snapshots_chapter.html
04-12-2018
05:31 PM
You can check the hbase root directory in hbase-site.xml, but usually the directory layout is something like -> /apps/hbase/data/data/schema/table/region/column-family-store/store-files
04-12-2018
06:40 AM
12 Kudos
As we understood the basic parameters of Hbase in Part 1, let's try to understand some advanced parameters, which should be tried after consultation with experts or by those who know what they are doing.

hbase.hstore.blockingStoreFiles

The default value of this parameter is 10. We know that each memstore flush creates an Hfile (hbase.hregion.memstore.flush.size). The purpose this parameter serves is to send a message along the write pipeline that unless this many Hfiles are compacted via minor compaction, no more flushes should go ahead. You would see messages in the logs such as "Too many HFiles, delaying flush". But as the message says, it can only delay a flush for so many seconds; if writes continue, the memstore can stretch only up to the size guided by:

hbase.hregion.memstore.flush.size X hbase.hregion.memstore.block.multiplier

Once this limit is reached, the region server accepts no more writes for this region and you will see messages like "org.apache.hadoop.hbase.RegionTooBusyException: Above memstore limit" in the logs. The situation comes back under control and writes resume once minor compaction is over. One can always increase this parameter to avoid such issues during heavy write traffic and in turn keep the write pipeline productive. To help further, one could also increase hbase.hstore.compaction.max so that more Hfiles are covered in each compaction. Let's discuss that below.

hbase.hstore.compaction.max

The default value is 10. As I said above, under heavy write load you can tune this parameter to have minor compaction cover more Hfiles and help stuck write traffic resume. Please note that compaction has its own IO overhead, so keep that in mind when you bump up this number.

hbase.hregion.max.filesize

The default value is 10 GB. This can effectively be used to control the rate of region splitting in Hbase.
Once any one of the stores (column families) within a region reaches this value, the whole region goes for a split. To virtually disable splitting, keep it at a very high number (like 100 GB) and set the split policy to:

hbase.regionserver.region.split.policy = org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy

zookeeper.session.timeout

The default value is 90 seconds. Region servers maintain a session with the zookeeper server to remain active in the cluster; for this, each region server has its own ephemeral znode in zookeeper. As soon as the session gets disconnected or times out for any reason, this znode is deleted and the region server crashes. It is zookeeper.session.timeout which "partly" dictates the session timeout negotiated with the zookeeper server at region server startup. Why "partly"? Because the zookeeper server itself has a minimum and maximum session timeout defined for its clients (like hbase), as per the following formula:

Minimum session timeout : 2 X tick time

Maximum session timeout : 20 X tick time

Now, no matter what session timeout you set in your client configs, if the zookeeper server's maximum is less than that, it will only negotiate that value with the client. For example, the default tick time in the zookeeper configuration is 2 seconds, which means the maximum session timeout cannot be bigger than 40 seconds; so even though Hbase asks for a 90 second timeout, the negotiated value will only be 40 seconds. To increase session timeouts in hbase, so that your region servers can tolerate minor fluctuations coming from GC pauses, the network, or any other transient problem on either the hbase or zookeeper side, consider increasing the tick time. A tick time higher than 4 - 5 seconds is not recommended, as it could impact the health of your zookeeper quorum.

hbase.rpc.timeout

The default value is 300000 ms. This is the timeout a client gets for an RPC call it makes to Hbase.
Set it in proportion to what your client / job / query requires. However, don't keep it so big that clients report issues and subsequently fail only after a long time. When we talk about hbase.rpc.timeout, we talk about two more parameters in the same breath:

hbase.client.scanner.timeout.period

The default value is 300000 ms. This is the time any hbase client scanner gets to perform its task (scan) on hbase tables. It replaces an earlier parameter called "hbase.regionserver.lease.period". The difference is that this timeout applies specifically to RPCs that come from the HBase scanner classes (e.g. ClientScanner), while hbase.rpc.timeout is the default timeout for any RPC call. Please also note that since hbase.regionserver.lease.period is deprecated, this parameter takes care of lease timeouts on scanners as well.

The rule of thumb is to keep hbase.rpc.timeout equal to or higher than hbase.client.scanner.timeout.period. This is because even while a scanner is busy scanning rows, if hbase.rpc.timeout expires in between, the client session expires. This can result in exceptions such as "ScannerTimeoutException" or "UnknownScannerException". However, one more parameter comes into play when we discuss the above timeouts:

hbase.client.scanner.caching

The default value is 100 rows. This is the number of rows a client scanner pulls from Hbase in one round (before "scanner.next" is triggered) and transfers back to the client.
Multiple performance issues can arise (and in fact scanner timeout exceptions as well) if you set this value very high, as the scanner is then burdened with fetching that many rows and transferring them back to the client. Now assume the region server, the underlying HDFS, or the network between client and server is slow for any reason: this can very well lead to the RPC session expiring and the eventual failure of the job. Messages like "ClosedChannelException" are seen when hbase tries to send rows back to a client whose session has already expired. While keeping a small count leaves the cluster and scanner under-utilized, a high number consumes resources such as region server heap and client memory in large quantities. A good number thus depends on how much disk / memory / CPU you have on the cluster nodes as well as how many million rows you need to scan. Go high if demands are high.

Also see : PART 1 , PART 3, PART 4, PART 5 of this series.
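The ZooKeeper session-timeout negotiation described in the zookeeper.session.timeout section above (minimum 2 x tick time, maximum 20 x tick time) can be sketched as a small clamp function. This is an illustration of the rule as stated in the article, not ZooKeeper's actual implementation.

```python
# Sketch of the ZooKeeper session-timeout negotiation described above:
# the server clamps whatever the client requests into
# [2 x tickTime, 20 x tickTime].

def negotiated_timeout_ms(requested_ms, tick_time_ms=2000):
    """Return the session timeout the server would actually grant."""
    lo, hi = 2 * tick_time_ms, 20 * tick_time_ms
    return max(lo, min(requested_ms, hi))

# HBase asks for 90 s, but with the default 2 s tickTime the server
# only grants 40 s:
print(negotiated_timeout_ms(90000))        # -> 40000
# Raising tickTime to 4.5 s makes the full 90 s negotiable:
print(negotiated_timeout_ms(90000, 4500))  # -> 90000
```

This is why bumping zookeeper.session.timeout alone changes nothing: the tick time on the ZooKeeper side caps what can be negotiated, which is the article's reason for tuning tick time (within the 4 - 5 second safety limit) instead.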
04-10-2018
05:39 PM
21 Kudos
Hbase works smoothly in autopilot mode if one knows how to tune the several knobs on its dashboard. It is important to understand not only each knob but also its dependencies on the other knobs. Several parameters require tuning based on your use case or workload to make Hbase work in an optimized way. I will try to explain some of the basic parameters in this article; more advanced parameters are covered in the next article.

1. hbase_master_heapsize

To many people's surprise, a master in Hbase does not do any heavy lifting and hence never requires more than 4 - 8 GB in regular setups. The master is basically responsible for meta operations such as create / delete of tables, keeping a check on region servers' well-being using watchers on zookeeper znodes, and re-distribution of regions during startup (balancer) or when a region server shuts down. Please note that the master's assignment manager keeps track of region states in this memory only, and hence if you have a huge number of tables / regions, you ought to have a proportional amount of heap for the master.

2. hbase_regionserver_heapsize

This is a very crucial parameter for the region server, as most of the data loading / processing happens in the allocated region server heap. This heap accommodates the block cache that makes your reads faster, and it is this heap that holds the region memstores taking all the writes coming from your users (until they get flushed to disk). But what is the best value for this heap size? How do we calculate this number? There is no direct formula, but if you are using the CMS GC algorithm, your hard stop for heap is about 36 - 38 GB; beyond that, long "stop the world" GC pauses would not only make Hbase unusable but also bring a lot of complications w.r.t. the data being stored.
Use your best judgement based on the number of regions hosted currently and your future projections; any number between 16 GB and 36 GB is good. Also, you should have a proper plan to tune this parameter incrementally over time based on cluster usage and the number of regions added to the nodes. With the G1GC algorithm there is no such restriction on heap size. One can always check heap usage from Ambari > Hbase > Master UI > Memory tab; if utilization shoots to about 60 - 70 % of total heap during peak hours, it is time to increase the heap size further (unless it is a case of a memory leak).

3. hbase_regionserver_xmn_max

This parameter sets the upper bound on the region server heap's young generation size. The rule of thumb is to keep it at 1/8th - 1/10th of the total heap, never exceeding 4000 MB.

4. Number of regions on each region server

I discuss this aspect of tuning here because it helps you figure out the best heap size for region servers and the memstore size, and it shows how these parameters all depend on the number of regions, and how the performance of hbase depends on all these numbers. We never recommend more than 200 - 400 regions per region server. One can figure out whether the existing count in a cluster is optimal using the formula below:

(regionserver_memory_size) * (memstore_fraction) / ((memstore_size) * (num_column_families))

For example, assume:
- a region server with 16 GB (16384 MB) of heap
- a memstore fraction of 0.4
- a memstore flush size of 128 MB
- 1 column family per table

The formula for this configuration works out as follows:

(16384 MB * 0.4) / (128 MB * 1) = approximately 51 regions

5. hfile.block.cache.size

This is the portion of total heap used by the block cache to make your reads even faster. Data accessed from disk is loaded into this cache, and the next time any user requests the same data it is served from here, which is far faster than serving it from disk. The caveat: keep this number big only if
a. you have a read-heavy use case, and
b. even in a read-heavy use case, your users request the same data repeatedly.
If both conditions do not hold, you will waste a lot of heap loading data blocks that are never re-read. When they do hold, any value between 20 - 40 % is good. Again, this needs to be tuned by trial and error to find what works best for you.

6. hbase.regionserver.global.memstore.size

The portion of total heap used for all the memstores opened, one per column family per region per table. This is where all edits and mutations land first during a write operation. For write-heavy use cases, any value between 20 - 40 % is good. Also note that the sum of the block cache (point 5 above) and the global memstore size should never exceed 70 - 75 % of total heap, so that enough heap remains for regular HBase operations apart from read and write caching.

7. hbase.hregion.memstore.flush.size

This is the size of each memstore opened for a single column family. During write operations, when this size is fully used, the memstore gets flushed to disk in the form of an HFile. Note also that all memstores for a single region are flushed when any one of them reaches this size.
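The region-count estimate from point 4 can be reproduced as a quick shell calculation; the values below are the example numbers from above, not recommendations:

```shell
# Example numbers from point 4; substitute your own cluster's values.
REGIONSERVER_HEAP_MB=16384        # hbase_regionserver_heapsize
MEMSTORE_FRACTION_PCT=40          # hbase.regionserver.global.memstore.size, as a percent
MEMSTORE_FLUSH_MB=128             # hbase.hregion.memstore.flush.size
NUM_COLUMN_FAMILIES=1

# (heap * memstore fraction) / (flush size * column families)
REGION_COUNT=$(( REGIONSERVER_HEAP_MB * MEMSTORE_FRACTION_PCT / 100 / (MEMSTORE_FLUSH_MB * NUM_COLUMN_FAMILIES) ))
echo "recommended max regions per region server: ${REGION_COUNT}"
```

Integer arithmetic truncates the result, which is fine here; anything close to 51 regions is the right ballpark for these numbers.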
Each flush operation creates an HFile, so the smaller this number, the more frequent the flushes, the more IO overhead, the greater the number of HFiles created, and, subsequently, the sooner compaction is triggered. Compaction involves an additional round of write operations, as it rewrites smaller HFiles into a bigger HFile, and hence proves to be a significant overhead if triggered very frequently. A significantly bigger flush size ensures fewer HFiles and fewer compactions, but the caveat is the total heap size and the number of regions and column families on each region server: if you have too many regions and column families, you cannot afford a bigger flush size under a limited total heap. Ideal values are between 128 MB and 256 MB.

8. hbase.hregion.memstore.block.multiplier

A simple tuning parameter that allows a single memstore to stretch by this multiplier during heavy bursty writes. Once a memstore reaches this size (flush size x multiplier), write operations on that column family are blocked until flushes complete.

9. hbase.regionserver.handler.count

This parameter defines the number of RPC listener threads spun up to answer incoming requests from users. The default value is 30. It is good to keep it higher when more concurrent users access HBase, but the value should also be proportional to the number of CPU cores and region server heap on each node, as each thread consumes some memory and CPU cycles. A rule of thumb is to keep the value low when the payload of each request is large, and high when the payload is small. Start with double the number of cores on the node and increase it further as required.

More advanced parameters and performance-related discussions in my next articles - PART 2, PART 3, PART 4, PART 5
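Putting points 4, 6, and 7 together, a quick sanity check (with hypothetical cluster numbers) shows why a large flush size and a large region count do not mix:

```shell
# Hypothetical cluster numbers, for illustration only.
HEAP_MB=16384                 # region server heap
GLOBAL_MEMSTORE_PCT=40        # hbase.regionserver.global.memstore.size, as a percent
NUM_REGIONS=200
NUM_COLUMN_FAMILIES=1
FLUSH_SIZE_MB=128             # hbase.hregion.memstore.flush.size

GLOBAL_LIMIT_MB=$(( HEAP_MB * GLOBAL_MEMSTORE_PCT / 100 ))
WORST_CASE_MB=$(( NUM_REGIONS * NUM_COLUMN_FAMILIES * FLUSH_SIZE_MB ))
echo "global memstore limit: ${GLOBAL_LIMIT_MB} MB"
echo "worst-case demand if every memstore fills to flush size: ${WORST_CASE_MB} MB"
```

Here the worst case far exceeds the global limit, so most memstores would be force-flushed well before reaching 128 MB, producing many small HFiles and extra compactions.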
02-14-2018
11:14 PM
PROBLEM: There are multiple issues connecting Windows clients, such as the SQuirreL SQL client, to Phoenix / HBase on a Kerberos-enabled cluster.
SOLUTION: Please follow the working steps below.
1. Create a JAAS configuration file, hbase.jaas, on the client machine:
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=false
  useTicketCache=true
  renewTicket=true
  serviceName="zookeeper"
  keytab=""
  principal="";
};
2. Copy the Kerberos config file from the cluster nodes to the local client machine (krb5.conf / krb5.ini).
3. Add the Java options below to the SQuirreL launcher script [/Applications/SQuirreLSQL.app/Contents/MacOS/squirrel-sql.sh]:
-Djava.security.auth.login.config="/Users/gsharma/hbase/hbase.jaas"
-Djava.security.krb5.conf="/Users/gsharma/hbase/krb5.conf"
For example:
$JAVACMD -Xmx256m -cp "$CP" -Djava.security.auth.login.config="/Users/gsharma/hbase/hbase.jaas" -Djava.security.krb5.conf="/Users/gsharma/hbase/krb5.conf" $MACOSX_SQUIRREL_PROPS -splash:"$SQUIRREL_SQL_HOME/icons/splash.jpg" net.sourceforge.squirrel_sql.client.Main --log-config-file "$UNIX_STYLE_HOME"/log4j.properties --squirrel-home "$UNIX_STYLE_HOME" $NATIVE_LAF_PROP $SCRIPT_ARGS
4. Download the Phoenix driver jar file [phoenix-version-client.jar].
5. Download hdfs-site.xml, hbase-site.xml, and core-site.xml from the HBase server to a local client folder.
7. Open the SQuirreL UI and register the Phoenix driver. (Example URL - jdbc:phoenix:1.openstacklocal:2181:/hbase-secure:hbase/5.openstacklocal@EXAMPLE.COM:/Users/gsharma/hbase/hbase.service.keytab)
8. Now create an alias in the SQuirreL UI to connect to HBase using the registered driver.
JDBC URL example: jdbc:phoenix:1.openstacklocal:2181:/hbase-secure:hbase/5.openstacklocal@EXAMPLE.COM:/Users/gsharma/hbase/hbase.service.keytab
Please note that Phoenix does not support Windows-style paths for the keytab file. So if the keytab file is under C:\Users\Hbase\hbase.service.keytab, use "/Users/Hbase/hbase.service.keytab" in the JDBC URL.
9. Check that the connection is successful.
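The secure Phoenix JDBC URL is positional: ZooKeeper quorum, port, znode parent, principal, keytab path. A small sketch that assembles it from its parts (the hostnames, principal, and keytab path below are the example values from the steps above, not real cluster settings):

```shell
# Example values from the steps above; substitute your own cluster's.
ZK_QUORUM="1.openstacklocal"
ZK_PORT="2181"
ZNODE_PARENT="/hbase-secure"
PRINCIPAL="hbase/5.openstacklocal@EXAMPLE.COM"
KEYTAB="/Users/Hbase/hbase.service.keytab"   # forward slashes, even on Windows

JDBC_URL="jdbc:phoenix:${ZK_QUORUM}:${ZK_PORT}:${ZNODE_PARENT}:${PRINCIPAL}:${KEYTAB}"
echo "${JDBC_URL}"
```

Paste the printed URL into the SQuirreL alias dialog.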
02-14-2018
11:01 PM
PROBLEM: A Phoenix table was created on top of an existing HBase table; the Phoenix table is not working as expected or is no longer needed. We need to delete the Phoenix table without deleting the existing HBase table, which holds valid data.
CONSTRAINT: Simply dropping the table from the Phoenix shell would delete the HBase table too.
SOLUTION:
Step 1: Delete the Phoenix table entry from SYSTEM.CATALOG as follows:
delete from SYSTEM.CATALOG where TABLE_NAME='table_name';
Step 2: Restart the region server that hosts SYSTEM.CATALOG to clear the metadata cache, which may otherwise still prevent creating a table with the same name from Phoenix.
Note: This approach may not work well if there are indexes or views on top of this table, as we would end up corrupting the catalog table. You may also try snapshotting the existing HBase table, dropping the table completely from Phoenix, and then cloning / restoring the HBase table from the snapshot.
06-04-2017
09:33 AM
PROBLEM: HDP 2.5.3 with Ambari 2.4.2.0, using Kerberos and Ranger for HBase authorization. We need to grant pretty much ALL permissions on the 'default' namespace to every user so they can connect using sqlline.py.
Error: org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions for user 'abc@NA.EXAMPLE.COM' (action=create)
at org.apache.ranger.authorization.hbase.AuthorizationSession.publishResults(AuthorizationSession.java:261)
at org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor.authorizeAccess(RangerAuthorizationCoprocessor.java:595)
at org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor.requirePermission(RangerAuthorizationCoprocessor.java:664)
at org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor.preCreateTable(RangerAuthorizationCoprocessor.java:769)
at org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor.preCreateTable(RangerAuthorizationCoprocessor.java:496)
at org.apache.hadoop.hbase.master.MasterCoprocessorHost$11.call(MasterCoprocessorHost.java:222)
at org.apache.hadoop.hbase.master.MasterCoprocessorHost.execOperation(MasterCoprocessorHost.java:1146)
at org.apache.hadoop.hbase.master.MasterCoprocessorHost.preCreateTable(MasterCoprocessorHost.java:218)
at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1603)
at org.apache.hadoop.hbase.master.MasterRpcServices.createTable(MasterRpcServices.java:462)
at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:57204)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2127)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
at java.lang.Thread.run(Thread.java:745) (state=08000,code=101)
org.apache.phoenix.exception.PhoenixIOException: org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions for user 'abc@NA.EXAMPLE.COM' (action=create)
EXPECTED BEHAVIOR: Once the Phoenix SYSTEM tables are created, Read permission alone on the 'default' namespace should be enough to allow a user to connect using sqlline.py.
ROOT CAUSE: Phoenix uses HBaseAdmin.getTableDescriptor during most of its checks for a valid Phoenix version, but this function requires CREATE or ADMIN permissions. This is a known issue tracked in PHOENIX-3652. The fix is available in Phoenix 4.8.3 and 4.10.
SOLUTION: To get a hotfix backported to earlier Phoenix versions, please log a case with HWX.
06-04-2017
08:59 AM
STEPS TO REPRODUCE:
1. Create a Phoenix connection.
2. CREATE SCHEMA test
3. DROP SCHEMA test
4. Now create the test schema again. This will throw the exceptions below:
0: jdbc:phoenix:> create schema "test";
Error: ERROR 721 (42M04): Schema with given name already exists schemaName=test (state=42M04,code=721)
org.apache.phoenix.schema.SchemaAlreadyExistsException: ERROR 721 (42M04): Schema with given name already exists schemaName=test
5. Now attempt to drop the test schema again:
0: jdbc:phoenix:> drop schema "test";
Error: ERROR 722 (43M05): Schema does not exists schemaName=test (state=43M05,code=722)
org.apache.phoenix.schema.SchemaNotFoundException: ERROR 722 (43M05): Schema does not exists schemaName=test
If you look in HBase and run list_namespace, you will see the test namespace still exists.
ROOT CAUSE: This is a known issue discussed in Apache JIRA PHOENIX-3694.
SOLUTION: The issue is fixed in Phoenix 4.10. For a backported hotfix, please log a case with HWX.
06-04-2017
08:08 AM
PROBLEM: While creating an external Hive table using the Hive Storage Handler for Phoenix, the table creation fails if a schema name is included in the Phoenix table name.
STEPS TO REPRODUCE:
1. Hive DDL for the table:
create external table test.hive_e_phoenix1 (
scd string,
scddesc string
)
STORED BY 'org.apache.phoenix.hive.PhoenixStorageHandler'
TBLPROPERTIES (
'phoenix.table.name'='myschema.mytbl',
'phoenix.zookeeper.quorum'='node1,node2,node3',
'phoenix.rowkeys'='scd'
);
2. The following exceptions are seen:
2016-07-17 16:02:25,898 ERROR [HiveServer2-Background-Pool: Thread-202]: operation.Operation (SQLOperation.java:run(209)) - Error running hive query:
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Phoenix table bigstats.bigstats_cd doesn't exist)
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:315)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:156)
at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:71)
at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:206)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:218)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Phoenix table myschema.mytbl doesn't exist)
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:720)
at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4135)
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:306)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1653)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1412)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1054)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:154)
... 11 more
Caused by: MetaException(message:Phoenix table myschema.mytbl doesn't exist)
at org.apache.phoenix.hive.PhoenixMetaHook.preCreateTable(PhoenixMetaHook.java:63)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:664)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:657)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
at com.sun.proxy.$Proxy6.createTable(Unknown Source)
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:714)
... 21 more
ROOT CAUSE: This is a known issue with Phoenix 4.8 discussed in Apache JIRA PHOENIX-3078.
SOLUTION: Please log a case with HWX and request a hotfix for HDP 2.5 backported from HDP 2.6.
REFERENCE: https://issues.apache.org/jira/browse/PHOENIX-3078
06-04-2017
12:36 AM
PROBLEM: The Phoenix ODBC driver strips out the "hint" part of SELECT statements. The ODBC driver log looks like this:

Mar 20 16:15:00.601 INFO 6380 Statement::SQLSetStmtAttrW: Attribute: SQL_ATTR_MAX_ROWS (1)
---- Comments: the original query passed in has a "hint"
Mar 20 16:15:00.602 INFO 6380 StatementState::InternalPrepare: Preparing query: select /*+ INDEX(c.more_xref_cad more_xref_acct_idx) */ * from c.more_xref_cad where cad_acct_id = 219980018
Mar 20 16:15:00.602 DEBUG 6380 RESTAction::HMDebugCallback: Infor type: CURLINFO_TEXT
.....
Mar 20 16:15:00.802 DEBUG 6380 RESTAction::HMDebugCallback: Info data: Connected to localhost (127.0.0.1) port 8765 (#1)
Mar 20 16:15:00.802 DEBUG 6380 RESTAction::HMDebugCallback: Infor type: CURLINFO_HEADER_OUT
Mar 20 16:15:00.802 DEBUG 6380 RESTAction::HMDebugCallback: Info data: POST / HTTP/1.1
Host: localhost:8765
Content-Type: application/octet-stream
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
User-Agent: Phoenix ODBC
Connection: keep-alive
Content-Length: 160
Mar 20 16:15:00.803 DEBUG 6380 RESTAction::HMDebugCallback: Infor type: CURLINFO_DATA_OUT
---- Comments: the query generated and submitted to PQS has no "hint" part.
Mar 20 16:15:00.803 DEBUG 6380 RESTAction::HMDebugCallback: Info data: {"request":"prepare","connectionId":"2166b30f-1bf8-1f9d-309e-4009877a1a62","sql":"SELECT * FROM c.more_xref_cad WHERE cad_acct_id = 219980018","maxRowCount":-1}
Mar 20 16:15:00.803 DEBUG 6380 RESTAction::HMDebugCallback: Infor type: CURLINFO_TEXT
Mar 20 16:15:00.803 DEBUG 6380 RESTAction::HMDebugCallback: Info data: upload completely sent off: 160 out of 160 bytes
ROOT CAUSE: As per the Simba team, the ODBC driver contains a component that removes ODBC escape sequences, for example turning {fn SIN(col_1)} into SIN(col_1), because Phoenix does not support such escape sequences while BI tools emit them. The problem is that this component also removes the hint, since the hint is currently treated as a comment.
SOLUTION: Phoenix ODBC Driver GA (v1.0.6.1008) has the fix. Please raise a support case with HWX if you need additional assistance.
06-04-2017
12:09 AM
PROBLEM: The inner query does not recognize the alias for the table and treats it as a column family. The following exceptions are seen:
0: jdbc:phoenix:localhost:2181> SELECT a.tgb_id FROM "APC:TGB_CURRENT_STATUS" a WHERE a.utc_date_epoch in (SELECT b.utc_date_epoch FROM "APC:TGB_STATUS" b WHERE a.tgb_id = b.tgb_id ORDER BY b.utc_date_epoch DESC limit 2)
. . . . . . . . . . . . . . . > ;
Error: ERROR 1001 (42I01): Undefined column family. familyName=A.null (state=42I01,code=1001)
org.apache.phoenix.schema.ColumnFamilyNotFoundException: ERROR 1001 (42I01): Undefined column family. familyName=A.null
at org.apache.phoenix.schema.PTableImpl.getColumnFamily(PTableImpl.java:724)
at org.apache.phoenix.compile.FromCompiler$SingleTableColumnResolver.resolveColumn(FromCompiler.java:365)
STEPS TO REPRODUCE:
1. Create two tables from Phoenix with the schema below:
CREATE TABLE "${namespace}:TGB_STATUS" (
tgb_id integer not null,
utc_date_epoch integer not null,
tgb_name varchar(40),
city_name varchar(25),
tgb_type_description varchar(64),
local_date_key integer,
utc_date_key integer,
enclosure_type integer,
door_tamper_flag boolean,
over_temp_flag boolean,
under_temp_flag boolean,
voltage_fault_flag boolean,
power_fail_flag boolean,
battery_low_flag boolean,
major_rectifier_alarm_flag boolean,
minor_rectifier_alarm_flag boolean,
env_fault_alarm_flag boolean,
fusion_proc_load integer,
fusion_proc_uptime integer,
pcc_installed_flag boolean,
vswr_alarm_flag boolean,
pcc_power_supply_flag boolean,
forward_power integer,
reflected_power integer,
alarm_bits integer,
alarm_bits_text varchar(8),
slot0_temperature integer,
slot0_channel integer,
slot0_active boolean,
slot1_temperature integer,
slot1_channel integer,
slot1_active boolean,
slot2_temperature integer,
slot2_channel integer,
slot2_active boolean,
slot3_temperature integer,
slot3_channel integer,
slot3_active boolean,
slot4_temperature integer,
slot4_channel integer,
slot4_active boolean,
slot5_temperature integer,
slot5_channel integer,
slot5_active boolean,
slot6_temperature integer,
slot6_channel integer,
slot6_active boolean,
slot7_temperature integer,
slot7_channel integer,
slot7_active boolean,
slot8_temperature integer,
slot8_channel integer,
slot8_active boolean,
slot9_temperature integer,
slot9_channel integer,
slot9_active boolean,
CONSTRAINT pk_tgb_status PRIMARY KEY (tgb_id ASC, utc_date_epoch DESC)
)SALT_BUCKETS = ${tgb_status};
and:
CREATE TABLE "${namespace}:TGB_CURRENT_STATUS" (
tgb_id integer not null,
utc_date_epoch integer,
tgb_name varchar(40),
city_name varchar(25),
tgb_type_description varchar(64),
local_date_key integer,
utc_date_key integer,
enclosure_type integer,
door_tamper_flag boolean,
over_temp_flag boolean,
under_temp_flag boolean,
voltage_fault_flag boolean,
power_fail_flag boolean,
battery_low_flag boolean,
major_rectifier_alarm_flag boolean,
minor_rectifier_alarm_flag boolean,
env_fault_alarm_flag boolean,
fusion_proc_load integer,
fusion_proc_uptime integer,
pcc_installed_flag boolean,
vswr_alarm_flag boolean,
pcc_power_supply_flag boolean,
forward_power integer,
reflected_power integer,
alarm_bits integer,
alarm_bits_text varchar(8),
slot0_temperature integer,
slot0_channel integer,
slot0_active boolean,
slot1_temperature integer,
slot1_channel integer,
slot1_active boolean,
slot2_temperature integer,
slot2_channel integer,
slot2_active boolean,
slot3_temperature integer,
slot3_channel integer,
slot3_active boolean,
slot4_temperature integer,
slot4_channel integer,
slot4_active boolean,
slot5_temperature integer,
slot5_channel integer,
slot5_active boolean,
slot6_temperature integer,
slot6_channel integer,
slot6_active boolean,
slot7_temperature integer,
slot7_channel integer,
slot7_active boolean,
slot8_temperature integer,
slot8_channel integer,
slot8_active boolean,
slot9_temperature integer,
slot9_channel integer,
slot9_active boolean,
CONSTRAINT pk_tgb_current_status PRIMARY KEY (tgb_id ASC)
);
2. Run the following query:
SELECT a.tgb_id FROM "APC:TGB_CURRENT_STATUS" a WHERE a.utc_date_epoch in (SELECT b.utc_date_epoch FROM "APC:TGB_STATUS" b WHERE a.tgb_id = b.tgb_id ORDER BY b.utc_date_epoch DESC limit 2)
3. Trying the fully qualified table name in the inner query instead of the alias fails the same way:
SELECT a.tgb_id FROM "APC:TGB_CURRENT_STATUS" a WHERE a.utc_date_epoch in (SELECT b.utc_date_epoch FROM "APC:TGB_STATUS" b WHERE "APC:TGB_CURRENT_STATUS".tgb_id = b.tgb_id ORDER BY b.utc_date_epoch DESC limit 2)
ROOT CAUSE: This is a known issue reported in internal BUG-78975, with no workaround as of now.