Member since 05-10-2016 | 184 Posts | 60 Kudos Received | 6 Solutions
01-26-2018
11:01 PM
2 Kudos
Simple illustration of locking in Hive when ACID is enabled

Considerations for illustration
Cluster Version: HDP 2.5.6.0
Hive Version: Hive 1.2.1000
Enabled with the following properties in place:
hive.support.concurrency=true
hive.compactor.initiator.on=true
hive.compactor.worker.threads=1
hive.exec.dynamic.partition.mode=nonstrict
hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager (set manually on a non-Ambari cluster; see the hive-site.xml sketch below)
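On a non-Ambari cluster, these settings belong in hive-site.xml; a minimal sketch with the same values as above:

<!-- hive-site.xml: locking/ACID settings used for this illustration -->
<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
</property>
<property>
  <name>hive.txn.manager</name>
  <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<property>
  <name>hive.compactor.initiator.on</name>
  <value>true</value>
</property>
<property>
  <name>hive.compactor.worker.threads</name>
  <value>1</value>
</property>
<property>
  <name>hive.exec.dynamic.partition.mode</name>
  <value>nonstrict</value>
</property>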
Types of Supported Locks
S = SHARED or SHARED_READ
X = EXCLUSIVE

Tables used for testing
orc_tab (ORC format table with col1 int and col2 string, non-transactional)
orc_tab_bucketed (ORC format table with col1 int and col2 string, transactional)
txt_tab (TEXT format table with col1 int and col2 string, non-transactional, used for loading purposes)
Each table holds close to 5 GB of data on a single-node cluster.

SCENARIO 1 (Non-Transactional Table): SELECT blocks ALTER
SELECT starts first, followed by ALTER.
Session A
beeline -u "jdbc:hive2://localhost:10000/default" -n hive -p '' -e "select col1 from orc_tab order by col1 limit 2"
Session B
beeline -u "jdbc:hive2://localhost:10000/default" -n hive -p '' -e "ALTER TABLE orc_tab ADD COLUMNS (col3 string)"
Session C
0: jdbc:hive2://localhost:10000/default> show locks;
+----------+-----------+----------+------------+-------------+---------------+--------------+-----------------+-----------------+----------------+-------+-------------------------+-----------------------------------------------------------+--+
| lockid | database | table | partition | lock_state | blocked_by | lock_type | transaction_id | last_heartbeat | acquired_at | user | hostname | agent_info |
+----------+-----------+----------+------------+-------------+---------------+--------------+-----------------+-----------------+----------------+-------+-------------------------+-----------------------------------------------------------+--+
| Lock ID | Database | Table | Partition | State | Blocked By | Type | Transaction ID | Last Heartbeat | Acquired At | User | Hostname | Agent Info |
| 31.1 | default | orc_tab | NULL | ACQUIRED | | SHARED_READ | NULL | 1517003062122 | 1517003062122 | hive | xlpatch.openstacklocal | hive_20180126214422_aaeb4b28-5170-4131-b509-ef0213c8b842 |
| 32.1 | default | orc_tab | NULL | WAITING | 31.1 | EXCLUSIVE | NULL | 1517003063314 | NULL | hive | xlpatch.openstacklocal | hive_20180126214422_a65af104-05d1-4c19-ab54-7bb37b4cdbfa |
+----------+-----------+----------+------------+-------------+---------------+--------------+-----------------+-----------------+----------------+-------+-------------------------+-----------------------------------------------------------+--+

SCENARIO 2 (Non-Transactional Table): SELECT blocks INSERT OVERWRITE
SELECT starts first, followed by INSERT OVERWRITE.
Session A
beeline -u "jdbc:hive2://localhost:10000/default" -n hive -p '' -e "select col1 from orc_tab order by col1 limit 2"
Session B
beeline -u "jdbc:hive2://localhost:10000/default" -n hive -p '' -e "INSERT OVERWRITE TABLE orc_tab SELECT col1,col2 from txt_tab"
Session C
0: jdbc:hive2://localhost:10000/default> show locks;
+----------+-----------+----------+------------+-------------+---------------+--------------+-----------------+-----------------+----------------+-------+-------------------------+-----------------------------------------------------------+--+
| lockid | database | table | partition | lock_state | blocked_by | lock_type | transaction_id | last_heartbeat | acquired_at | user | hostname | agent_info |
+----------+-----------+----------+------------+-------------+---------------+--------------+-----------------+-----------------+----------------+-------+-------------------------+-----------------------------------------------------------+--+
| Lock ID | Database | Table | Partition | State | Blocked By | Type | Transaction ID | Last Heartbeat | Acquired At | User | Hostname | Agent Info |
| 36.1 | default | orc_tab | NULL | ACQUIRED | | SHARED_READ | NULL | 1517003567582 | 1517003567582 | hive | xlpatch.openstacklocal | hive_20180126215247_7537e30b-d5bf-4fc8-aa23-8e860efe1ac8 |
| 37.1 | default | txt_tab | NULL | WAITING | | SHARED_READ | NULL | 1517003568897 | NULL | hive | xlpatch.openstacklocal | hive_20180126215248_875685ed-a552-4009-892c-e13c61cf7eb5 |
| 37.2 | default | orc_tab | NULL | WAITING | 36.1 | EXCLUSIVE | NULL | 1517003568897 | NULL | hive | xlpatch.openstacklocal | hive_20180126215248_875685ed-a552-4009-892c-e13c61cf7eb5 |
+----------+-----------+----------+------------+-------------+---------------+--------------+-----------------+-----------------+----------------+-------+-------------------------+-----------------------------------------------------------+--+

SCENARIO 3 (Non-Transactional Table): SELECT blocks INSERT
SELECT starts first, followed by INSERT.
Session A
beeline -u "jdbc:hive2://localhost:10000/default" -n hive -p '' -e "select col1 from orc_tab order by col1 limit 2"
Session B
beeline -u "jdbc:hive2://localhost:10000/default" -n hive -p '' -e "INSERT INTO orc_tab SELECT col1,col2 from txt_tab limit 20"
Session C
0: jdbc:hive2://localhost:10000/default> show locks;
+----------+-----------+----------+------------+-------------+---------------+--------------+-----------------+-----------------+----------------+-------+-------------------------+-----------------------------------------------------------+--+
| lockid | database | table | partition | lock_state | blocked_by | lock_type | transaction_id | last_heartbeat | acquired_at | user | hostname | agent_info |
+----------+-----------+----------+------------+-------------+---------------+--------------+-----------------+-----------------+----------------+-------+-------------------------+-----------------------------------------------------------+--+
| Lock ID | Database | Table | Partition | State | Blocked By | Type | Transaction ID | Last Heartbeat | Acquired At | User | Hostname | Agent Info |
| 38.1 | default | orc_tab | NULL | ACQUIRED | | SHARED_READ | NULL | 1517004119030 | 1517004119030 | hive | xlpatch.openstacklocal | hive_20180126220158_775842e7-5e34-42d0-b574-874076fd5204 |
| 39.1 | default | txt_tab | NULL | WAITING | | SHARED_READ | NULL | 1517004120971 | NULL | hive | xlpatch.openstacklocal | hive_20180126220200_9e9eeb8c-9c32-42fd-8ddf-c96f08699224 |
| 39.2 | default | orc_tab | NULL | WAITING | 38.1 | EXCLUSIVE | NULL | 1517004120971 | NULL | hive | xlpatch.openstacklocal | hive_20180126220200_9e9eeb8c-9c32-42fd-8ddf-c96f08699224 |
+----------+-----------+----------+------------+-------------+---------------+--------------+-----------------+-----------------+----------------+-------+-------------------------+-----------------------------------------------------------+--+
4 rows selected (0.028 seconds)

SCENARIO 4 (Transactional Table): SELECT does not block INSERT
SELECT starts first, followed by INSERT.
Session A
beeline -u "jdbc:hive2://localhost:10000/default" -n hive -p '' -e "select col1 from orc_tab_bucketed order by col1 limit 2"
Session B
beeline -u "jdbc:hive2://localhost:10000/default" -n hive -p '' -e "INSERT INTO orc_tab_bucketed SELECT col1,col2 from txt_tab limit 20"
Session C
0: jdbc:hive2://localhost:10000/default> show locks;
+----------+-----------+-------------------+------------+-------------+---------------+--------------+-----------------+-----------------+----------------+-------+-------------------------+-----------------------------------------------------------+--+
| lockid | database | table | partition | lock_state | blocked_by | lock_type | transaction_id | last_heartbeat | acquired_at | user | hostname | agent_info |
+----------+-----------+-------------------+------------+-------------+---------------+--------------+-----------------+-----------------+----------------+-------+-------------------------+-----------------------------------------------------------+--+
| Lock ID | Database | Table | Partition | State | Blocked By | Type | Transaction ID | Last Heartbeat | Acquired At | User | Hostname | Agent Info |
| 42.1 | default | orc_tab_bucketed | NULL | ACQUIRED | | SHARED_READ | NULL | 1517004495025 | 1517004495025 | hive | xlpatch.openstacklocal | hive_20180126220814_cae3893a-8e97-49eb-8b07-a3a60c4a6dc2 |
| 43.1 | default | txt_tab | NULL | ACQUIRED | | SHARED_READ | 3 | 0 | 1517004495874 | hive | xlpatch.openstacklocal | hive_20180126220815_a335e284-476a-42e0-b758-e181e6ab44e9 |
| 43.2 | default | orc_tab_bucketed | NULL | ACQUIRED | | SHARED_READ | 3 | 0 | 1517004495874 | hive | xlpatch.openstacklocal | hive_20180126220815_a335e284-476a-42e0-b758-e181e6ab44e9 |
+----------+-----------+-------------------+------------+-------------+---------------+--------------+-----------------+-----------------+----------------+-------+-------------------------+-----------------------------------------------------------+--+
4 rows selected (0.02 seconds)

SCENARIO 5 (Transactional Table): SELECT blocks ALTER
SELECT starts first, followed by ALTER.
Session A
beeline -u "jdbc:hive2://localhost:10000/default" -n hive -p '' -e "select col1 from orc_tab_bucketed order by col1 limit 2"
Session B
beeline -u "jdbc:hive2://localhost:10000/default" -n hive -p '' -e "ALTER TABLE orc_tab_bucketed ADD COLUMNS (col3 string)"
Session C
0: jdbc:hive2://localhost:10000/default> show locks;
+----------+-----------+-------------------+------------+-------------+---------------+--------------+-----------------+-----------------+----------------+-------+-------------------------+-----------------------------------------------------------+--+
| lockid | database | table | partition | lock_state | blocked_by | lock_type | transaction_id | last_heartbeat | acquired_at | user | hostname | agent_info |
+----------+-----------+-------------------+------------+-------------+---------------+--------------+-----------------+-----------------+----------------+-------+-------------------------+-----------------------------------------------------------+--+
| Lock ID | Database | Table | Partition | State | Blocked By | Type | Transaction ID | Last Heartbeat | Acquired At | User | Hostname | Agent Info |
| 53.1 | default | orc_tab_bucketed | NULL | ACQUIRED | | SHARED_READ | NULL | 1517005855005 | 1517005855005 | hive | xlpatch.openstacklocal | hive_20180126223053_db2d0054-6cb6-48fb-b732-6ca677007695 |
| 54.1 | default | orc_tab_bucketed | NULL | WAITING | 53.1 | EXCLUSIVE | NULL | 1517005855870 | NULL | hive | xlpatch.openstacklocal | hive_20180126223054_6294af5a-15da-4178-9a83-40f150e08cb1 |
+----------+-----------+-------------------+------------+-------------+---------------+--------------+-----------------+-----------------+----------------+-------+-------------------------+-----------------------------------------------------------+--+
3 rows selected (0.064 seconds)

Synopsis
Without the "transactional" property set to true:
EXCLUSIVE lock (ALTER) waits for SHARED (SELECT)
EXCLUSIVE lock (INSERT OVERWRITE) waits for SHARED (SELECT)
EXCLUSIVE lock (INSERT) waits for SHARED (SELECT)

With "transactional" enabled:
EXCLUSIVE lock (ALTER) waits for SHARED (SELECT)
INSERT and SELECT both take SHARED locks and do not block each other
09-27-2017
10:23 PM
3 Kudos
Goal
Create a new Ambari view for Hive Interactive. Use this link to get detailed information on configuring views: https://docs.hortonworks.com/HDPDocuments/Ambari-2.5.1.0/bk_ambari-views/content/settings_and_cluster_configuration.html

Steps
Navigate to the Ambari page with admin privileges and click on the username dropdown icon
Select the "Views" link to explore all the views available in Ambari
Expand the "Hive" dropdown and click on "Create Instance" to create a new view for LLAP/Interactive
Give the instance a name per your requirement
Ensure that under the "Settings" tab, "Use Interactive Mode" is set to true
If the cluster is Kerberized, use the proper auth method and principal name
Also update the proper JDBC URL with the principal name.
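For illustration, a Kerberized HiveServer2 Interactive JDBC URL might look like the following (the hostname and realm are placeholders, not from this setup; 10500 is the usual HiveServer2 Interactive port in HDP):

jdbc:hive2://llap-host.example.com:10500/default;principal=hive/_HOST@EXAMPLE.COM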
NOTE: If Ranger is enabled, ensure that the user trying to access the database objects has the permissions to browse the contents of the database(s).
08-03-2017
08:41 PM
6 Kudos
Goal: Demonstrate how to change a database location in HDFS and the Metastore

There are circumstances in which we may consider moving a database location. By default, the location for the default and custom databases is defined by the value of hive.metastore.warehouse.dir, which is /apps/hive/warehouse. Here are the illustrated steps to change a custom database location, for instance "dummy.db", along with the contents of the database.

Verify the details of the database we would like to move to a new location:
[hive@xlautomation-2 ~]$ beeline -u "jdbc:hive2://xlautomation-2.h.c:10000/default;principal=hive/xlautomation-2.h.c@H.C"
0: jdbc:hive2://xlautomation-2.h.c:10000/defa> create database dummy;
No rows affected (0.394 seconds)
0: jdbc:hive2://xlautomation-2.h.c:10000/defa> describe database dummy;
+----------+----------+--------------------------------------------------------------+-------------+-------------+-------------+--+
| db_name | comment | location | owner_name | owner_type | parameters |
+----------+----------+--------------------------------------------------------------+-------------+-------------+-------------+--+
| dummy | | hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/dummy.db | hive | USER | |
+----------+----------+--------------------------------------------------------------+-------------+-------------+-------------+--+
1 row selected (0.561 seconds)
NOTE: The output above shows the database location, /apps/hive/warehouse/dummy.db, which needs to be updated.

Create a dummy table so we can later test whether the location update was indeed successful:
0: jdbc:hive2://xlautomation-2.h.c:10000/defa> create table dummy.test123 (col1 string, col2 string) row format delimited fields terminated by ',' stored as textfile;
No rows affected (0.691 seconds)
0: jdbc:hive2://xlautomation-2.h.c:10000/defa> insert into dummy.test123 values (1,1),(2,2),(3,3),(4,4),(5,5),(6,6);
INFO : Session is already open
INFO : Dag name: insert into dummy.tes...3),(4,4),(5,5),(6,6)(Stage-1)
INFO : Status: Running (Executing on YARN cluster with App id application_1499274604190_0034)
INFO : Loading data to table dummy.test123 from hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/dummy.db/test123/.hive-staging_hive_2017-08-03_16-20-11_965_647196527379814552-1/-ext-10000
INFO : Table dummy.test123 stats: [numFiles=1, numRows=6, totalSize=24, rawDataSize=18]
No rows affected (2.47 seconds)
0: jdbc:hive2://xlautomation-2.h.c:10000/defa> select * from dummy.test123;
+---------------+---------------+--+
| test123.col1 | test123.col2 |
+---------------+---------------+--+
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
| 5 | 5 |
| 6 | 6 |
+---------------+---------------+--+
6 rows selected (0.451 seconds)
Create a new storage directory of our choice (we used newdummy.db) and replicate the permissions at the directory level:
[hive@xlautomation-2 ~]$ hdfs dfs -mkdir -p /apps/hive/warehouse/newdummy.db
[hive@xlautomation-2 ~]$ hdfs dfs -chmod 777 /apps/hive/warehouse/newdummy.db
Verify that the DB (directory) level permissions are the same:
[hive@xlautomation-2 ~]$ hdfs dfs -ls /apps/hive/warehouse | egrep dummy.db
drwxrwxrwx - hive hdfs 0 2017-08-03 16:19 /apps/hive/warehouse/dummy.db
drwxrwxrwx - hive hdfs 0 2017-08-03 16:27 /apps/hive/warehouse/newdummy.db
Copy all the underlying contents from /apps/hive/warehouse/dummy.db/ into the new directory:
[hive@xlautomation-2 ~]$ hdfs dfs -cp -f -p /apps/hive/warehouse/dummy.db/* /apps/hive/warehouse/newdummy.db/

Caution: Using "cp" with "-p" to preserve the permissions is prone to the following error:
cp: Access time for hdfs is not configured. Please set dfs.namenode.accesstime.precision configuration parameter.
This is because the value of dfs.namenode.accesstime.precision is set to 0 by default in the Hortonworks HDP distribution. Since this is a client-level configuration, on a non-Ambari-managed cluster it can be changed in the client's hdfs-site.xml, e.g. from 0 to 3600000. We can verify it at the client level by running the following command:
[hive@xlautomation-2 ~]$ hdfs getconf -confKey dfs.namenode.accesstime.precision
3600000
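For reference, the corresponding client-side hdfs-site.xml entry might look like this (a sketch; the value is in milliseconds, so 3600000 = 1 hour):

<property>
  <name>dfs.namenode.accesstime.precision</name>
  <value>3600000</value>
</property>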
Once the change is made, copy the contents of the database folder /dummy.db/* to the new location /newdummy.db/ as the hdfs user, overwriting (-f) any existing files within the new directory and preserving (-p) the permissions:
[hdfs@xlautomation-2 ~]$ hdfs dfs -cp -f -p /apps/hive/warehouse/dummy.db/* /apps/hive/warehouse/newdummy.db/

Check the permissions once the copy is complete:
[hdfs@xlautomation-2 ~]$ hdfs dfs -ls /apps/hive/warehouse/dummy.db/
Found 1 items
drwxrwxrwx - hive hdfs 0 2017-08-03 16:20 /apps/hive/warehouse/dummy.db/test123
[hdfs@xlautomation-2 ~]$
[hdfs@xlautomation-2 ~]$
[hdfs@xlautomation-2 ~]$ hdfs dfs -ls /apps/hive/warehouse/newdummy.db/
Found 1 items
drwxrwxrwx - hive hdfs 0 2017-08-03 16:20 /apps/hive/warehouse/newdummy.db/test123
With privileged user access to the Metastore database (hive in our case), we may need to update three tables, DBS, SDS, and FUNC_RU, as they record the locations for databases, tables, and functions respectively. In our example, since we do not have any functions, we will just update the SDS and DBS tables:
mysql> update SDS set location= replace(location,'hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/dummy.db','hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/newdummy.db') where location like '%dummy.db%';
Query OK, 3 rows affected (0.53 sec)
Rows matched: 3 Changed: 3 Warnings: 0
mysql> update DBS set db_location_uri= replace(db_location_uri,'hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/dummy.db','hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/newdummy.db') where db_location_uri like '%dummy.db%';
Query OK, 1 row affected (0.06 sec)
Rows matched: 1 Changed: 1 Warnings: 0
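If you want to inspect the effect of these UPDATE statements before making them permanent, one approach is to wrap them in a transaction; a minimal sketch (the REPLACE expressions are the same as above, shortened to "..." here for readability):

mysql> BEGIN;
mysql> update SDS set location= replace(location,'...dummy.db','...newdummy.db') where location like '%dummy.db%';
mysql> update DBS set db_location_uri= replace(db_location_uri,'...dummy.db','...newdummy.db') where db_location_uri like '%dummy.db%';
mysql> select db_location_uri from DBS where name='dummy';  -- inspect the result before deciding
mysql> COMMIT;  -- or ROLLBACK; to undo the changes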
NOTE: If you want to test the updates before committing the changes in the Metastore, wrap the UPDATE statements in a transaction as sketched above (BEGIN; before and COMMIT; after, or ROLLBACK; to undo). Each UPDATE statement replaces all occurrences of the specified string within the DBS and SDS tables.

Check that the changes made to the tables took effect; the location should be updated to */newdummy.db:
0: jdbc:hive2://xlautomation-2.h.c:10000/defa> describe database dummy;
+----------+----------+-----------------------------------------------------------------+-------------+-------------+-------------+--+
| db_name | comment | location | owner_name | owner_type | parameters |
+----------+----------+-----------------------------------------------------------------+-------------+-------------+-------------+--+
| dummy | | hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/newdummy.db | hive | USER | |
+----------+----------+-----------------------------------------------------------------+-------------+-------------+-------------+--+
1 row selected (0.444 seconds)
Verify the data from the table and also confirm its location:
0: jdbc:hive2://xlautomation-2.h.c:10000/defa> describe formatted dummy.test123;
+-------------------------------+-------------------------------------------------------------------------+-----------------------------+--+
| col_name | data_type | comment |
+-------------------------------+-------------------------------------------------------------------------+-----------------------------+--+
| # col_name | data_type | comment |
| | NULL | NULL |
| col1 | string | |
| col2 | string | |
| | NULL | NULL |
| # Detailed Table Information | NULL | NULL |
| Database: | dummy | NULL |
| Owner: | hive | NULL |
| CreateTime: | Thu Aug 03 16:19:33 UTC 2017 | NULL |
| LastAccessTime: | UNKNOWN | NULL |
| Protect Mode: | None | NULL |
| Retention: | 0 | NULL |
| Location: | hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/newdummy.db/test123 | NULL |
| Table Type: | MANAGED_TABLE | NULL |
| Table Parameters: | NULL | NULL |
| | COLUMN_STATS_ACCURATE | {\"BASIC_STATS\":\"true\"} |
| | numFiles | 1 |
| | numRows | 6 |
| | rawDataSize | 18 |
| | totalSize | 24 |
| | transient_lastDdlTime | 1501777214 |
| | NULL | NULL |
| # Storage Information | NULL | NULL |
| SerDe Library: | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL |
| InputFormat: | org.apache.hadoop.mapred.TextInputFormat | NULL |
| OutputFormat: | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL |
| Compressed: | No | NULL |
| Num Buckets: | -1 | NULL |
| Bucket Columns: | [] | NULL |
| Sort Columns: | [] | NULL |
| Storage Desc Params: | NULL | NULL |
| | field.delim | , |
| | serialization.format | , |
+-------------------------------+-------------------------------------------------------------------------+-----------------------------+--+
33 rows selected (0.362 seconds)
0: jdbc:hive2://xlautomation-2.h.c:10000/defa> select * from dummy.test123;
+---------------+---------------+--+
| test123.col1 | test123.col2 |
+---------------+---------------+--+
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
| 5 | 5 |
| 6 | 6 |
+---------------+---------------+--+
6 rows selected (0.275 seconds)
0: jdbc:hive2://xlautomation-2.h.c:10000/defa>
Considerations

Remove the old database directory only when you are sure the tables are readable.

To check whether hive or another privileged user has access to modify contents in the Metastore database, log in to mysql and run the following commands (ensure that you are logged on to the node that hosts the Metastore database):
mysql> show grants for hive;
+--------------------------------------------------------------------------------------------------------------+
| Grants for hive@% |
+--------------------------------------------------------------------------------------------------------------+
| GRANT ALL PRIVILEGES ON *.* TO 'hive'@'%' IDENTIFIED BY PASSWORD '*7ACE763ED393514FE0C162B93996ECD195FFC4F5' |
| GRANT ALL PRIVILEGES ON `hive`.* TO 'hive'@'%' |
+--------------------------------------------------------------------------------------------------------------+
2 rows in set (0.02 sec)
mysql> select user,host from user;
+------+--------------------+
| user | host |
+------+--------------------+
| hive | % |
| root | 127.0.0.1 |
| root | localhost |
| root | xlautomation-2.h.c |
+------+--------------------+
4 rows in set (0.00 sec)
All the operations mentioned above were performed on a kerberized cluster.
hive --service metatool -updateLocation did not succeed in updating this database location; it works when changing the NameNode URI to the HA nameservice short name.
Any external tables whose locations are elsewhere should ideally not be affected.
Save the output of "hdfs dfs -ls -R /apps/hive/warehouse/dummy.db" to ensure that you have a copy of the permissions before getting rid of the directory.
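For NameNode URI changes specifically, the metatool takes the new location prefix followed by the old one; a hedged example (the HA nameservice name "mycluster" is a placeholder):

hive --service metatool -updateLocation hdfs://mycluster hdfs://xlautomation-1.h.c:8020

A -dryRun option is also available to preview the changes without applying them.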
05-10-2017
03:17 PM
As simple as adding quotes!
05-04-2017
08:27 PM
GOAL
To change the default log location for HUE from /var/log/hue to elsewhere.

Steps
By default, the HUE service writes its log files to the /var/log/hue directory. This is enforced in the code, which states:

The ``log_dir`` will replace the %LOG_DIR% in log.conf. If not specified, we look for the DESKTOP_LOG_DIR environment variable, and then default to the DEFAULT_LOG_DIR.

However, trying to set either variable (DESKTOP_LOG_DIR or DEFAULT_LOG_DIR) does not seem to work. We can, however, replace %LOG_DIR% with an absolute path within /etc/hue/conf/log.conf. Here is the output of log.conf with the log directory replaced by an absolute path:
#args=('%LOG_DIR%/access.log', 'a', 1000000, 3)
args=('/opt/log/hue/access.log', 'a', 1000000, 3)
--
#args=('%LOG_DIR%/error.log', 'a', 1000000, 3)
args=('/opt/log/hue/error.log', 'a', 1000000, 3)
--
#args=('%LOG_DIR%/%PROC_NAME%.log', 'a', 1000000, 3)
args=('/opt/log/hue/%PROC_NAME%.log', 'a', 1000000, 3)
--
#args=('%LOG_DIR%/shell_output.log', 'a', 1000000, 3)
args=('/opt/log/hue/shell_output.log', 'a', 1000000, 3)
--
#args=('%LOG_DIR%/shell_input.log', 'a', 1000000, 3)
args=('/opt/log/hue/shell_input.log', 'a', 1000000, 3)
NOTE: Ensure that the new log directory exists and is owned by the hue user and group. After the changes, a restart of the HUE service should be enough to route new logs to the new location.
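For example, to create the new log directory with the expected ownership (path as used above):

mkdir -p /opt/log/hue
chown -R hue:hue /opt/log/hue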
05-04-2017
05:44 PM
Error

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 110
at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.planReadPartialDataStreams(RecordReaderImpl.java:914)
at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readPartialDataStreams(RecordReaderImpl.java:958)
at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripe(RecordReaderImpl.java:793)
at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:986)
at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1019)
at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.<init>(RecordReaderImpl.java:205)
at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:598)
at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rows(ReaderImpl.java:585)
at org.apache.hadoop.hive.ql.io.orc.FileDump.printMetaDataImpl(FileDump.java:291)
at org.apache.hadoop.hive.ql.io.orc.FileDump.printMetaData(FileDump.java:261)
at org.apache.hadoop.hive.ql.io.orc.FileDump.main(FileDump.java:127)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
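For context, this stack trace comes from the ORC file dump utility (FileDump in the trace). A typical invocation that reproduces it, with a placeholder file path, is:

hive --orcfiledump /apps/hive/warehouse/mytable/000000_0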
Cause
The issue is a missing enhancement in RecordReaderImpl.java in versions 2.3.4 and lower when reading a stream of data. It lies in the "includedColumns[column]" check, around line 916, where the size of the output exceeds the size of the array variable.

Resolution
This issue was fixed in versions 2.4 and above. It's possible that an intermediate fix is available within one of the versions higher than 2.3.4.7; it is safer to upgrade to 2.4.x, or better, 2.6.
04-14-2017
02:58 PM
2 Kudos
Steps to Create a Table in Hive on S3A with Ranger

Create a bucket with a unique name (I've used "myhivebucket") and do not change any details in the permissions.
Complete the "Create bucket" wizard by clicking on the "create bucket" button.
Make the following entries in custom hdfs-site.xml:
fs.s3a.access.key = <access key>
fs.s3a.secret.key = <access secret>
fs.s3a.impl = org.apache.hadoop.fs.s3a.S3AFileSystem
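As a sketch, the same entries in XML form for hdfs-site.xml (the key and secret values are placeholders):

<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_SECRET_KEY</value>
</property>
<property>
  <name>fs.s3a.impl</name>
  <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>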
To retrieve the values for the access key and secret, follow these steps:
Log in to https://aws.amazon.com/console
Click on the "Sign in to the console" tab and log in with appropriate credentials.
Once logged in, you should see your login name in the top right corner of the AWS page.
Click on the drop-down arrow beside your login name and select "My Security Credentials".
This should take you to a page titled "Your Security Credentials".
On this page, expand the option that says "Access Keys (Access Key ID and Secret Access Key)".
You have to click on "create a new access key" because of the Amazon limitation described here.
This lets you download the key/secret in this format:
AWSAccessKeyId=XXXXXXXXXXXXXXXXXXXXX
AWSSecretKey=XXXXXxxxxxXXXXXxxxxxXXXXX/xxxxx
The value for "fs.s3a.access.key" will be the value of "AWSAccessKeyId"; the value for "fs.s3a.secret.key" will be the value of "AWSSecretKey".

Log in to the Ranger admin interface and create a policy for the hive/desired user to allow the desired permissions.

Now log in to hive via beeline with the Kerberos credentials as required, and create a table, ensuring that the location is on s3a:
[hive@xlnode-standalone ~]$ beeline -u "jdbc:hive2://xlnode-standalone.hwx.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"
WARNING: Use "yarn jar" to launch YARN applications.
Connecting to jdbc:hive2://xlnode-standalone.hwx.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
Connected to: Apache Hive (version 1.2.1000.2.4.3.0-227)
Driver: Hive JDBC (version 1.2.1000.2.4.3.0-227)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1000.2.4.3.0-227 by Apache Hive
0: jdbc:hive2://xlnode-standalone.hwx.com:218> create table mys3test (col1 int, col2 string) row format delimited fields terminated by ',' stored as textfile location 's3a://myhivebucket/test';
No rows affected (12.04 seconds)
0: jdbc:hive2://xlnode-standalone.hwx.com:218>
Now try and insert some rows:
0: jdbc:hive2://xlnode-standalone.hwx.com:218> insert into mys3test values (1,'test'),(2,'test');
Error: Error while compiling statement: FAILED: HiveAccessControlException Permission denied: user [hive] does not have [UPDATE] privilege on [default/mys3test] (state=42000,code=40000)
0: jdbc:hive2://xlnode-standalone.hwx.com:218>

The above error is intentional: since we have not assigned the "UPDATE" privilege via Ranger, we cannot insert the values yet. Allow the permission and INSERT again.

Validate INSERT/UPDATE and SELECT:
0: jdbc:hive2://xlnode-standalone.hwx.com:218> insert into mys3test values (1,'test'),(2,'test');
INFO : Tez session hasn't been created yet. Opening session
INFO : Dag name: insert into mys3test ...1,'test'),(2,'test')(Stage-1)
INFO :
INFO : Status: Running (Executing on YARN cluster with App id application_1492107639289_0002)
INFO : Map 1: -/-
INFO : Map 1: 0/1
INFO : Map 1: 0(+1)/1
INFO : Map 1: 0(+1)/1
INFO : Map 1: 0(+1)/1
INFO : Map 1: 0(+1)/1
INFO : Map 1: 0/1
INFO : Map 1: 1/1
INFO : Loading data to table default.mys3test from s3a://myhivebucket/test/.hive-staging_hive_2017-04-13_19-27-13_226_6105571528298793138-1/-ext-10000
INFO : Table default.mys3test stats: [numFiles=1, numRows=2, totalSize=14, rawDataSize=12]
No rows affected (53.854 seconds)
0: jdbc:hive2://xlnode-standalone.hwx.com:218> select * from mys3test;
+----------------+----------------+--+
| mys3test.col1 | mys3test.col2 |
+----------------+----------------+--+
| 1 | test |
| 2 | test |
+----------------+----------------+--+
2 rows selected (3.554 seconds)
0: jdbc:hive2://xlnode-standalone.hwx.com:218>
02-15-2018
06:06 AM
I had the same issue on RHEL 7, with the error below:

Error: Execution of '/usr/bin/yum -d 0 -e 0 -y install hadoop_2_6_3_0_235-hdfs' returned 1. Error: Package: hadoop_2_6_3_0_235-hdfs-2.7.3.2.6.3.0-235.x86_64 (HDP-2.6-repo-101)
Requires: libtirpc-devel
You could try using --skip-broken to work around the problem

Solution: Ensure that the Red Hat Enterprise Linux Server 7 Optional (RPMs) repository is enabled on all nodes.

Check whether it is enabled or disabled:
# yum repolist all
!rhui-REGION-rhel-server-optional/7Server/x86_64 Red Hat Enterprise Linux Server 7 Optional (RPMs) Disabled:

Enable the optional RPMs repository:
# yum-config-manager --enable rhui-REGION-rhel-server-optional

Cross-verify with the first command that the optional RPMs repository is now enabled:
# yum repolist all
!rhui-REGION-rhel-server-optional/7Server/x86_64 Red Hat Enterprise Linux Server 7 Optional (RPMs) enabled: 13,201
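Once the optional repository is enabled, retrying the failed installation (or installing the missing dependency directly) should succeed, for example:

# yum install -y libtirpc-devel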
03-14-2017
09:23 PM
GOALS
Configure Ranger + Ranger KMS
Create an encryption Zone
OS used is CentOS/Redhat 6.6
At the end, you should be able to create an encryption zone and validate it using Hive
NOTE: This article is a walkthrough: snapshots were taken from the relevant screens to create a step-by-step guide on installing Ranger and Ranger KMS, along with creating and validating encryption zones in Hadoop.
https://www.scribd.com/presentation/341887712/HDFS-Encryption-Zone-Hive-Orig