Member since: 03-01-2016
Posts: 104
Kudos Received: 97
Solutions: 3

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1670 | 06-03-2018 09:22 PM
 | 28206 | 05-21-2018 10:31 PM
 | 2179 | 10-19-2016 07:13 AM
07-14-2020
02:47 PM
@Thirupathi These articles were written with HDP 2.6.x versions in mind. With HDP 3 and CDH 6 shipping Phoenix 5.0, many of these issues have been resolved, but I cannot comment on a case-by-case basis here. You will need to log a support ticket for a more comprehensive discussion of the specific JIRAs.
07-02-2019
08:10 AM
1 Kudo
In HDP 2.x we had the hbck tool, which was used to identify and fix inconsistencies in HBase tables. With HBase 2.x and HDP 3.x, however, this tool is deprecated and it is strongly advised not to run it on production clusters, as it may lead to data corruption.

HBase 2.0 introduces a completely revamped HBase Assignment Manager built on the Proc V2 framework (a state machine). In HDP 2.x we maintained HBASE_REGION_STATE and TABLE_STATE in ZooKeeper and in HMaster memory (a possibility for inconsistencies), but HDP 3 uses the MasterProcWAL procedure store, which is kept in HDFS. These procedures are scheduled and executed by the HMaster. Given the changes to the internals of the new AssignmentManager, the old hbck fix options no longer work, although hbck can still provide a report about table/region states.

To fix region assignment problems that may still occur, a new HBCK2 tool has been designed and is shipped independently of the HBase build artifacts (and hence is not part of the HDP 3.x distribution). The plan is to let it evolve on its own, so that new fix options for previously unforeseen issues can be added and integrated into the tool without the need for a whole new HBase release. As of now, HBCK2 is not as intuitive or as full-featured as the old hbck, but it helps fix issues related to Proc V2 (introduced in HBase 2.0) and other issues such as regions stuck in transition.

Note: HBCK2 requires HBase 2.0.3 at a minimum, which ships starting with HDP 3.1.

To build hbase-hbck2:

# cd <work directory>
# git clone https://github.com/apache/hbase-operator-tools.git
# cd hbase-operator-tools/
# mvn clean install -DskipTests
# cd hbase-hbck2/target
# Download hbase-hbck2-1.0.0-SNAPSHOT.jar

Running HBCK2

Once HBCK2 has been uploaded to the target cluster, it can be executed by passing its jar path to the "-j" option of the "hbck" command, as below (an illustrative session is also sketched at the end of this post):

# su - hbase
$ hbase hbck -j <jar path>/hbase-hbck2-1.0.0-SNAPSHOT.jar <COMMAND>

You may encounter the following error when running on HDP 3.0:

Exception in thread "main" java.lang.UnsupportedOperationException: Requires 2.0.3 at least.

Use the -s option instead:

$ hbase hbck -j <jar path>/hbase-hbck2-1.0.0-SNAPSHOT.jar -s <COMMAND>

Reference: HBASE-14350, HBASE-19121, HBase-Operator-Tool

Special mention and thanks to Karthik Palanisamy for critical inputs here.
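As an illustration, here is the sort of HBCK2 session one might run to repair a table state or nudge a region stuck in transition. The jar path, table name and encoded region name below are purely hypothetical, and the available commands depend on the HBCK2 version you built, so check the help output first:

# su - hbase
# Print the HBCK2 usage, which lists the commands supported by your build:
$ hbase hbck -j /tmp/hbase-hbck2-1.0.0-SNAPSHOT.jar
# Set a table that is stuck in DISABLING/ENABLING back to a sane state (table name is made up):
$ hbase hbck -j /tmp/hbase-hbck2-1.0.0-SNAPSHOT.jar setTableState MY_TABLE ENABLED
# Queue an assign for a region stuck in transition; the argument is the encoded
# region name taken from the Master UI (the value below is made up):
$ hbase hbck -j /tmp/hbase-hbck2-1.0.0-SNAPSHOT.jar assigns de00010733901a05f5a2a3a382e27dd4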
11-17-2018
01:22 AM
3 Kudos
HBase, Phoenix And Ranger

In Part 1, Part 2 and Part 3 of this article series we discussed the internals of Phoenix index maintenance and the major issues hit around that feature. In this article we will discuss the Phoenix - Ranger relationship: how it works and what had been broken until recently, causing several reported issues.

How native HBase authorization works:

ACLs in HBase are implemented as a coprocessor called AccessController (hbase.security.authorization=true). Users are granted specific permissions such as Read, Write, Execute, Create and Admin against resources such as global, namespaces, tables, cells, or endpoints (all self-explanatory). There is an additional class of user called the "superuser". Superusers can perform any operation available in HBase, on any resource. The user who runs HBase on your cluster is a superuser, as are any principals assigned to the configuration property hbase.superuser in hbase-site.xml. Much more detail on this subject is available here.

How do things change with the Ranger HBase plugin enabled?

Once Ranger is involved, one can create policies for HBase either from the Ranger Policy Manager or via grant/revoke commands from the HBase shell. These grant/revoke commands are mapped to Ranger policies: Ranger intercepts the appropriate commands from the HBase shell and adds or edits Ranger policies according to the user/group and resource information provided in the command. And of course, the user running these commands must be an admin user. It has been seen that using grant/revoke commands mapped to Ranger in this way creates multiple issues, including redundant or conflicting policies. We therefore have an option to disable this feature completely and allow only the Ranger Policy Manager to manage permissions. You can disable the command route by setting the following parameter in the Ranger configs (ranger-hbase-security.xml):

<property>
  <name>xasecure.hbase.update.xapolicies.on.grant.revoke</name>
  <value>false</value>
  <description>Should HBase plugin update Ranger policies for updates to permissions done using GRANT/REVOKE?</description>
</property>

How it works in Phoenix with Ranger:

Simply put, having a Phoenix table implies the existence of an HBase table as well, so any permissions required to access that HBase table are also required for the Phoenix table. But this is not the complete truth: Phoenix has SYSTEM tables which manage table metadata, and users also need sufficient permissions on these system tables to be able to log in to the Phoenix shell and to view, create or delete tables. By design, only the first user ever to connect to Phoenix needs the CREATE permission on all SYSTEM tables; this is a one-time operation so that the system tables get created if they do not already exist. After that, regular users only require READ on the system tables, and users who need to create tables in Phoenix need WRITE as well. This functionality broke due to PHOENIX-3652 (partly fixed in HDP 2.6.1) and other Ranger-level complexities, and because of this Phoenix expected full permissions on the system tables. Users observed either of the following exceptions, either while launching the Phoenix shell or during DDL operations:

Error: org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions (user=test@HWX.COM scope=SYSTEM:CATALOG, family=0:SALT_BUCKETS, params=[table=SYSTEM:CATALOG,family=0:SALT_BUCKETS],action=WRITE)

OR

Error: org.apache.hadoop.hbase.security.AccessDeniedException: org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions for user 'test@EXAMPLE.COM' (action=admin)

To get this working temporarily, users created a policy in Ranger and gave all access to these system tables as follows:
Table: SYSTEM.*
Column Family: *
Column: *
Groups: public
Permissions: Read, Write, Create, Admin

Now this was all good in an ideal world, but in the real world it raises a lot of security concerns: customers do not want users to have full access to these system tables, for the obvious fear of manipulation of user tables and their metadata. To address this concern, our developers started working on PHOENIX-4198 (fix available with HDP 2.6.3), where only RX permissions would be needed on the SYSTEM.CATALOG table and the rest of the authorization would be done by a coprocessor endpoint querying either Ranger or the native HBase ACLs as appropriate. It is important to know that this feature does not yet support working with Ranger (work in progress).

However, the above feature was specifically designed for SYSTEM.CATALOG, and users reported issues with SYSTEM.STATS as well, where write permission was required in order to drop a table. This has been reported in PHOENIX-4753 and the issue is still unresolved. You may see exceptions such as:

org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions (user=user01t01@EXAMPLE.COM, scope=SYSTEM:STATS, family=0:, params=[table=SYSTEM:STATS,family=0:],action=WRITE)

Here again, the workaround would be to give the user or group write permission on SYSTEM.STATS:

grant '@group', 'RWX' , 'SYSTEM:STATS'

Also See: Part 1, Part 2, Part 3
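For reference, here is what the reduced-permission setup described above might look like when issued through the HBase shell against native ACLs. The group names are purely illustrative, and if the Ranger plugin manages authorization on your cluster, the equivalent policies should be created in the Ranger Policy Manager instead:

# su - hbase
$ hbase shell
# Regular Phoenix users only need read/execute on the catalog once PHOENIX-4198 is in place:
hbase> grant '@phoenix_users', 'RX', 'SYSTEM:CATALOG'
# Users who drop tables also need write on SYSTEM:STATS until PHOENIX-4753 is resolved:
hbase> grant '@phoenix_ddl_users', 'RWX', 'SYSTEM:STATS'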
11-17-2018
12:56 AM
4 Kudos
Issues with Global Indexes

In Part 1 and Part 2 of this article series we discussed index internals and some frequently faced issues. In this article we will cover a few more index issues in the form of scenarios.

Scenario 4: Index writes are failing, client retries are exhausted and the handler pool is saturated while index table regions are in transition.

Here the client is trying to write to the data table on server 1, which triggers an index update on server 2 (via a server-to-server RPC). If the index region is stuck in transition, the index update RPC also hangs and eventually times out, and because of this the RPCs between the client and server 1 also get stuck and time out. Since the client makes several retries to write this mutation, this again leads to handler saturation on region server 1, causing another "deadlock"-like situation. This should be fixed in two steps:
1. Fix all index RITs first; without this, no client-side index maintenance or server-side index rebuild will succeed.
2. As a holistic tuning, keep the server-side RPC timeout (hbase.rpc.timeout) relatively smaller than the Phoenix client-side timeout (phoenix.query.timeoutMs), so that server-side RPCs are not left stuck behind hung client-side queries.

Scenario 5: Row count mismatch between a Phoenix data table and its index table when the data table is bulk loaded for existing primary keys.

There is a limitation in CSV BulkLoad for Phoenix tables with a secondary index. When an index update is carried out from the data table server to the index table server, the first step is to retrieve the existing row state from the index table, delete it, and then insert the updated row. CSV BulkLoad, however, does not perform these check-and-delete steps and directly upserts the data into the index table, thus duplicating rows for the same primary key. As of writing this article, the only workaround is to delete the index and build it fresh using IndexTool (the async way; a sketch of this is included at the end of this post).

Scenario 6: Region servers crashing, index table disabled and ZK connections maxing out.

In some cases, region servers crashed due to long GC pauses, index updates to other servers failed with exceptions such as "Unable to create Native Threads", and the index table eventually went into the "disabled" state. It was also observed that ZK connections were maxing out from the region servers ("Too Many Connections" in the ZK log). There could be many intertwined reasons for which issue triggered which, but PHOENIX-4685 was seen to play a part in many of these cases. Essentially, region servers (in an attempt to update the index) create sessions with ZooKeeper in order to do meta lookups, and this connection cache is maintained in the region server heap; it eventually grows large and causes GC pauses leading to server crashes. Once a region server crashes, the index update fails on that server, the index goes into the disabled state, and the vicious circle continues. A careful examination of the situation and detailed log analysis is still required to conclude that this bug is at play.

In Part 4 of this article series, we will talk about the Phoenix - Ranger relationship.

Also See: Part 1, Part 2
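As referenced in Scenario 5 above, here is a sketch of rebuilding a global index the async way. The schema, table and index names are made up, and the IndexTool options follow the Apache Phoenix documentation for this era, so verify them against your version:

# Step 1 - from sqlline, drop the out-of-sync index and recreate it without populating it:
#   DROP INDEX MY_IDX ON MY_SCHEMA.MY_TABLE;
#   CREATE INDEX MY_IDX ON MY_SCHEMA.MY_TABLE (COL1) INCLUDE (COL2) ASYNC;
# Step 2 - populate it with the MapReduce-based IndexTool (run as the hbase/service user):
$ hbase org.apache.phoenix.mapreduce.index.IndexTool --schema MY_SCHEMA --data-table MY_TABLE --index-table MY_IDX --output-path /tmp/MY_IDX_HFILES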
11-17-2018
12:22 AM
5 Kudos
Issues with Global Indexes

In Part 1 of this article series we discussed the internals of index maintenance; in this part we will cover some of the major issues we face during the life cycle of index maintenance. Before we get into the issues, we need to understand the various "states" of an index table, which reflect its health in general:

BUILDING ("b"): Partially rebuilds the index from the last disabled timestamp.
UNUSABLE ("d") / INACTIVE ("i"): The index is no longer considered for use in queries; however, index maintenance continues to be performed.
ACTIVE ("a"): The index is up to date and ready to use.
DISABLE ("x"): No further index maintenance is performed on the index and it is no longer considered for use in queries.
REBUILD ("r"): Completely rebuilds the index and, upon completion, enables it to be used in queries again.

What happens when an index update fails for any reason? The answer is not straightforward, as there are choices of implementation here based on the use case or table type. The two choices we have are:

Choice 1: Block writes to the data table but let the index continue to serve read requests. Maintain a point of "consistency" in the form of a timestamp taken just before the failure occurred, and keep the write block until the index table has been rebuilt in the background and is in sync with the data table again. The properties involved are:

phoenix.index.failure.block.write=true
phoenix.index.failure.handling.rebuild=true

This option is not available in HDP 2 but is available with HDP 3.0.

Choice 2: Writes to the data table are not stopped, but the index table in question is disabled so that it can be detected by rebuilder threads (driven from the server hosting SYSTEM.CATALOG), marked "inactive" and partially rebuilt. In this mode the index table does not serve any client requests. This is the implementation we are using with HDP 2. The properties involved are:

phoenix.index.failure.handling.rebuild=true
phoenix.index.failure.handling.rebuild.interval=10000 (10 seconds; the interval at which the server checks whether any index table needs a partial rebuild)
phoenix.index.failure.handling.rebuild.overlap.time=1 (how far to go back before index_disable_timestamp in order to rebuild from that point)

A few scenarios for troubleshooting issues (we will only talk about Choice 2 above); these give more insight into how index maintenance, updates and failure handling are done in Phoenix.

Scenario 1: The index update is written to the WAL, and before it is written to the data or index table the region server hosting the data table crashes. The WAL is replayed and the index updates are committed via server-to-server RPC.

Scenario 2: The data table is written, but the server-to-server RPC to the index table fails. This is where the state of the index table changes to disabled. A rebuilder thread on the server hosting the SYSTEM.CATALOG table keeps checking these index states; as soon as it detects a "disabled" index table, it starts the rebuild process by first marking the table "inactive", then running a rebuild scan on the data table regions, and finally making index updates via server-to-server RPCs. Client queries during this time refer only to the data table. Here it is good to know about INDEX_DISABLE_TIMESTAMP: it is the timestamp at which the index got disabled. It is 0 if the index is active or was disabled manually by a client, and non-zero if the index was disabled due to write failures. Thus the rebuild only happens after the disable timestamp has been updated.
You can use the following query to check the value of this column:

select TABLE_NAME, cast(INDEX_DISABLE_TIMESTAMP as timestamp) from SYSTEM.CATALOG where index_state is not null limit 10;
+-------------------+----------------------------------------+
| TABLE_NAME        | TO_TIMESTAMP(INDEX_DISABLE_TIMESTAMP)  |
+-------------------+----------------------------------------+
| TEST_INDEX_PERF   | 2018-05-26 10:28:54.079                |
| TEST1_INDEX_PERF  | 2018-05-26 10:28:54.079                |
+-------------------+----------------------------------------+
2 rows selected (0.089 seconds)

Once the rebuild completes in the background, the index table's state changes back to "active". All this while, the data table keeps serving read and write requests.

Scenario 3: The index went into the disabled state, HBase became unresponsive, the handlers are saturated (verified from Grafana), queries are dead slow and nothing is moving.

Let's break this down into a sequence of the most probable events:
1. Multiple clients write to region server 1 (data table), using all of the default handlers.
2. There are now no handlers left on region server 1 to write the index update to region server 2, which hosts the index table regions.
3. Since the index update is not written on RS2, the client RPC on RS1 does not free up (and, if the situation continues, times out after hbase.rpc.timeout).
4. Because the index update failed, the index table goes into the disabled state.
5. Rebuilder threads detect the disabled state of the index and start rebuilding the table, subsequently contending for the same default handler pool and aggravating the situation further.

This is a very common "deadlock" scenario, and users struggle to find what caused all these issues and where to start fixing them. In computer science, this situation is also known as the "dining philosophers problem". The above sequence of events could cause some or all of the following issues:
1. Queries getting hung or timing out
2. Region servers becoming unresponsive
3. Clients unable to log in to the Phoenix shell
4. Long GC pauses (due to the large number of objects created)

Point 4 above would eventually break the session with ZooKeeper and may bring the region server down.

What is the solution to this problem? Since a common pool of default handlers was shared by clients and servers, which is what caused these issues, it was decided to create a dedicated index handler pool with a custom RPC scheduler, and to add a custom RPC controller to the chain of controllers that filters outgoing index RPCs and tags them with a higher priority. The following parameters were expected to be added for this (already part of HDP 2.6):

<property>
  <name>hbase.region.server.rpc.scheduler.factory.class</name>
  <value>org.apache.hadoop.hbase.ipc.PhoenixRpcSchedulerFactory</value>
</property>
<property>
  <name>hbase.rpc.controllerfactory.class</name>
  <value>org.apache.hadoop.hbase.ipc.controller.ServerRpcControllerFactory</value>
</property>

However, these added parameters introduced another issue (PHOENIX-3360, PHOENIX-3994). Since clients shared the same hbase-site.xml containing these additional parameters, they started sending normal requests tagged with index priority. Similarly, index rebuild scans sent their RPCs tagged with index priority and used the index handler pool, which is not what it was designed for. This led many users to another "deadlock" situation, where index writes would fail because most index handlers were busy doing rebuild scans or were being used by clients. The fix for PHOENIX-3994 (part of HDP 2.6.5) removes the dependency on these parameters for index priority, so they are needed neither on the server side nor on the client side. However, Ambari still adds these parameters and they can still create issues. A quick fix is to remove these two properties from all client-side hbase-site.xml files (see the quick check sketched at the end of this post). For clients such as NiFi, which source hbase-site.xml from the phoenix-client jars, it is best to zip the updated hbase-site.xml into the jar itself. If you have many or large index tables that require a substantial number of RPCs, you can also define "phoenix.rpc.index.handler.count" in the custom hbase-site.xml and give it a value proportional to the total handler count you have defined.

We will discuss a couple more scenarios in Part 3 of this article series.

Also See: Part 1, Part 4
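One quick way to check whether a client node still carries the two index-priority factories discussed above is a simple grep of its hbase-site.xml. The path below is the usual HDP client location and may differ in your environment:

$ grep -A1 -E "hbase.region.server.rpc.scheduler.factory.class|hbase.rpc.controllerfactory.class" /etc/hbase/conf/hbase-site.xml
# If they show up on a client running HDP 2.6.5 or later, remove the matching <property> blocks and restart the client application.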
11-17-2018
12:03 AM
6 Kudos
Phoenix secondary index

Phoenix secondary indexes are useful for point lookups or scans directed against non-primary-key columns of Phoenix (that is, non-row-key columns of HBase). This saves the "full scan" of the data table you would otherwise do if you intended to retrieve data based on a non-row-key column. You create a secondary index by choosing existing non-primary-key columns from the data table and making them the index's primary key or covered columns. By covered column, we mean an exact copy of the covered column's data from the data table into the index table.

Types of secondary index:

Functional index: Built on functions/expressions rather than just columns.

Global secondary index: This is the one where we make an exact copy of the covered columns and call it the index table. In simple terms, it is an upsert select of all chosen columns from the data table into the index table. Since a lot of writing is involved during the initial stages of index creation, this type of index works best for read-heavy use cases where data is written only occasionally and read more frequently. There are two ways we can create a global index (illustrative statements for the various index types are sketched just below this overview):

Sync way: The data table is upsert-selected, the rows are transported over to the client, and the client in turn writes to the index table. A very cumbersome and error-prone (timeouts etc.) method.

Async way: The index is written asynchronously via a MapReduce job. Here the index state becomes "building", and each mapper works on one data table region and writes to the corresponding index regions.

For specific commands on creating the various types of indexes, refer here.

Thus, the global index (above) assumes the following:

1. You have plenty of available disk space to create several copies of data columns.
2. You do not worry about the write penalties (across the network!) of maintaining indexes.
3. The query is fully covered, i.e. all queried columns are part of the index. Note that the global index will not be used if the query refers to a column that is not part of the index table (unless we use a hint).

Local index: What if none or only some of the above assumptions hold? That is where a local index becomes useful: it is part of the data table itself, in the form of a shadow column family (eliminating assumption 1), it is the best fit for write-heavy use cases (eliminating assumption 2), and it can be used for partially covered queries, since the data and index co-reside (eliminating assumption 3).

For all practical purposes I will talk about the global index only, as that is the most common use case and the most stable option so far.

How global index maintenance works

To go into the details of index maintenance, we also need to know about two further kinds of global secondary index:

Immutable global secondary index: The index is written once and never updated in place. Only the client that writes to the data table is responsible for writing to the index table as well (at the same time!). Thus it is purely the client's responsibility to keep the data and index tables in sync. Use cases such as time-series data or event logs can take advantage of immutable data and index tables (create the data table with the IMMUTABLE_ROWS=true option and all indexes created on it will default to immutable).

Mutable global secondary index: Here index maintenance is done via server-to-server RPC (remember the network and handler overhead!) between the data table server and the index table server. For simplicity, we can assume that if the client was able to write successfully to the data table, the writes to the index table have also been completed by the data region server. However, many issues exist around this aspect, which we will discuss in Part 2.
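To make the index types above concrete, here is an illustrative set of statements as one might run them from sqlline. The ORDERS table and its columns are made up for the example, so adapt the names to your own schema:

-- Global covered index: CUSTOMER_ID becomes the index row key, TOTAL is copied in as a covered column
CREATE INDEX IDX_ORDERS_CUSTOMER ON ORDERS (CUSTOMER_ID) INCLUDE (TOTAL);

-- Functional index, built on an expression rather than a plain column
CREATE INDEX IDX_ORDERS_UPPER_CITY ON ORDERS (UPPER(CITY));

-- Local index, stored alongside the data table in a shadow column family
CREATE LOCAL INDEX IDX_ORDERS_STATUS ON ORDERS (STATUS);

-- Global index created the async way; populate it afterwards with the MapReduce IndexTool
CREATE INDEX IDX_ORDERS_DATE ON ORDERS (ORDER_DATE) ASYNC;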
There are two more varieties of tables: transactional and non-transactional. Transactional tables are intended to provide atomic (ACID-compliant) writes to the data and index tables and are still a work in progress. So, for all practical purposes, in the next few sections and articles we will talk about non-transactional mutable global secondary indexes.

Here are the steps involved in index maintenance:

1. The client submits an "upsert" RPC to region server 1.
2. The mutation is written to the WAL (making it durable), so if the region server crashes at this point or later, the WAL replay syncs the index table. (If there is a write failure before this step, it is the client that is expected to retry.)
3. In the preBatchMutate step (part of the Phoenix Indexer box in the diagram above), the Phoenix coprocessor prepares the index update to be written to region server 2 (steps 2 and 3 actually occur together).
4. The mutation is written to the data table.
5. In the postBatchMutate step (also part of the Phoenix Indexer box in the diagram above), the index update is committed on region server 2 via a server-to-server RPC call.

Understanding these steps is very important because in Part 2 of this article series we will discuss the various issues that appear in index maintenance: indexes going out of sync, indexes getting disabled, queries slowing down, region servers becoming unresponsive, etc.

References: https://phoenix.apache.org https://issues.apache.org

Also See: Part 3, Part 4
08-21-2018
10:35 PM
Phoenix shipped with HDP does not support import from Sqoop yet.
06-05-2018
08:36 PM
@John How many znodes do you have in ZooKeeper? One of the reasons for KeeperExceptions is a large request/response size between ZooKeeper and its client components.
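If you have not measured it yet, the ZooKeeper "stat" four-letter command gives a rough znode count; replace the host below with one of your ZooKeeper servers:

$ echo stat | nc <zk-host> 2181 | grep -i "node count"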