Member since: 05-10-2016
Posts: 184
Kudos Received: 60
Solutions: 6
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 4072 | 05-06-2017 10:21 PM
 | 4077 | 05-04-2017 08:02 PM
 | 5001 | 12-28-2016 04:49 PM
 | 1239 | 11-11-2016 08:09 PM
 | 3315 | 10-22-2016 03:03 AM
01-19-2018
12:53 AM
@Rodrigo Mendez What is the final result you need? Should the files in the user directories be removed? Why do you want to perform this action from Hive? You could easily do it (especially if you have multiple directories) by scripting HDFS commands.
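For illustration, removing files across several user directories can be sketched as a small shell loop. The user names and path below are hypothetical, and DRY_RUN=1 only prints the commands it would run:

```shell
#!/bin/sh
# Hypothetical sketch: clean up a sub-directory under several HDFS user homes.
# DRY_RUN=1 prints the commands instead of executing them.
DRY_RUN=1
for user in alice bob; do
  cmd="hdfs dfs -rm -r -skipTrash /user/${user}/staging"
  if [ "$DRY_RUN" = "1" ]; then
    echo "$cmd"    # preview only
  else
    $cmd           # actually remove the files
  fi
done
```

With DRY_RUN=1 this just prints the two hdfs commands; flip it to 0 on a real cluster once the list of directories looks right.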
09-27-2017
10:23 PM
3 Kudos
Goal

Create a new Ambari view for Hive Interactive. Use this link for detailed information on configuring views: https://docs.hortonworks.com/HDPDocuments/Ambari-2.5.1.0/bk_ambari-views/content/settings_and_cluster_configuration.html

Steps
Navigate to Ambari page with admin privileges and click on the username dropdown icon
Select the views link to explore all the views available in Ambari
Expand the "Hive" dropdown and click "Create Instance" to create a new view for LLAP/Interactive
Give the name of this instance per your requirement
Ensure that under the "Settings" tab, "Use Interactive Mode" is set to true
If the cluster is kerberized, use proper auth method and principal name
Also update the JDBC URL with the proper principal name.

NOTE: If Ranger is enabled, ensure that the user trying to access the database objects has permission to browse the contents of the database(s).
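As a sketch, on a kerberized cluster the interactive JDBC URL might look like the following (the hostname and realm are placeholders; in HDP, HiveServer2 Interactive commonly listens on port 10500):

```
jdbc:hive2://llap-host.example.com:10500/;principal=hive/_HOST@EXAMPLE.COM
```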
08-03-2017
08:41 PM
6 Kudos
Goal

Demonstrate how to change a database's location in HDFS and the Metastore.

There are circumstances in which we may want to move a database's location. By default, the location for the default and custom databases is defined by the value of hive.metastore.warehouse.dir, which is /apps/hive/warehouse. Below are the steps to change the location of a custom database, "dummy.db", along with its contents.

Verify the details of the database we would like to move:

[hive@xlautomation-2 ~]$ beeline -u "jdbc:hive2://xlautomation-2.h.c:10000/default;principal=hive/xlautomation-2.h.c@H.C"
0: jdbc:hive2://xlautomation-2.h.c:10000/defa> create database dummy;
No rows affected (0.394 seconds)
0: jdbc:hive2://xlautomation-2.h.c:10000/defa> describe database dummy;
+----------+----------+--------------------------------------------------------------+-------------+-------------+-------------+--+
| db_name | comment | location | owner_name | owner_type | parameters |
+----------+----------+--------------------------------------------------------------+-------------+-------------+-------------+--+
| dummy | | hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/dummy.db | hive | USER | |
+----------+----------+--------------------------------------------------------------+-------------+-------------+-------------+--+
1 row selected (0.561 seconds)
NOTE: The output shows the database location, /apps/hive/warehouse/dummy.db, which is what needs to be updated.

Create a dummy table so we can later verify that the location update was indeed successful:

0: jdbc:hive2://xlautomation-2.h.c:10000/defa> create table dummy.test123 (col1 string, col2 string) row format delimited fields terminated by ',' stored as textfile;
No rows affected (0.691 seconds)
0: jdbc:hive2://xlautomation-2.h.c:10000/defa> insert into dummy.test123 values (1,1),(2,2),(3,3),(4,4),(5,5),(6,6);
INFO : Session is already open
INFO : Dag name: insert into dummy.tes...3),(4,4),(5,5),(6,6)(Stage-1)
INFO : Status: Running (Executing on YARN cluster with App id application_1499274604190_0034)
INFO : Loading data to table dummy.test123 from hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/dummy.db/test123/.hive-staging_hive_2017-08-03_16-20-11_965_647196527379814552-1/-ext-10000
INFO : Table dummy.test123 stats: [numFiles=1, numRows=6, totalSize=24, rawDataSize=18]
No rows affected (2.47 seconds)
0: jdbc:hive2://xlautomation-2.h.c:10000/defa> select * from dummy.test123;
+---------------+---------------+--+
| test123.col1 | test123.col2 |
+---------------+---------------+--+
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
| 5 | 5 |
| 6 | 6 |
+---------------+---------------+--+
6 rows selected (0.451 seconds)
Create a new storage directory of our choice (we used newdummy.db) and replicate the permissions at the directory level:

[hive@xlautomation-2 ~]$ hdfs dfs -mkdir -p /apps/hive/warehouse/newdummy.db
[hive@xlautomation-2 ~]$ hdfs dfs -chmod 777 /apps/hive/warehouse/newdummy.db
Verify that the DB (directory) level permissions are the same:

[hive@xlautomation-2 ~]$ hdfs dfs -ls /apps/hive/warehouse | egrep dummy.db
drwxrwxrwx - hive hdfs 0 2017-08-03 16:19 /apps/hive/warehouse/dummy.db
drwxrwxrwx - hive hdfs 0 2017-08-03 16:27 /apps/hive/warehouse/newdummy.db
Copy all the underlying contents of /apps/hive/warehouse/dummy.db/ into the new directory:

[hive@xlautomation-2 ~]$ hdfs dfs -cp -f -p /apps/hive/warehouse/dummy.db/* /apps/hive/warehouse/newdummy.db/

Caution: Using "cp" with "-p" to preserve permissions is prone to the following error:

cp: Access time for hdfs is not configured. Please set dfs.namenode.accesstime.precision configuration parameter.

This is because dfs.namenode.accesstime.precision is set to 0 by default in the Hortonworks HDP distribution. Since this is a client-level configuration, on a non-Ambari-managed cluster it can be changed in the client's hdfs-site.xml, e.g. from 0 to 3600000. We can verify the value at the client level by running:

[hive@xlautomation-2 ~]$ hdfs getconf -confKey dfs.namenode.accesstime.precision
3600000
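On a non-Ambari-managed cluster, the client-side override can be sketched in hdfs-site.xml like this (the value is in milliseconds; 3600000 is one hour):

```
<property>
  <name>dfs.namenode.accesstime.precision</name>
  <value>3600000</value>
</property>
```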
Once the change is made, copy the contents of the database folder /dummy.db/* to the new location /newdummy.db/ as the hdfs user, overwriting (-f) any existing files in the new directory and preserving (-p) the permissions:

[hdfs@xlautomation-2 ~]$ hdfs dfs -cp -f -p /apps/hive/warehouse/dummy.db/* /apps/hive/warehouse/newdummy.db/

Check the permissions once the copy is complete:

[hdfs@xlautomation-2 ~]$ hdfs dfs -ls /apps/hive/warehouse/dummy.db/
Found 1 items
drwxrwxrwx - hive hdfs 0 2017-08-03 16:20 /apps/hive/warehouse/dummy.db/test123
[hdfs@xlautomation-2 ~]$
[hdfs@xlautomation-2 ~]$
[hdfs@xlautomation-2 ~]$ hdfs dfs -ls /apps/hive/warehouse/newdummy.db/
Found 1 items
drwxrwxrwx - hive hdfs 0 2017-08-03 16:20 /apps/hive/warehouse/newdummy.db/test123
With privileged user access to the metastore database (hive in our case), we may need to update three tables, DBS, SDS, and FUNC_RU, as they store the locations for databases, tables, and functions respectively. In our example, since we do not have any functions, we only update the SDS and DBS tables:

mysql> update SDS set location= replace(location,'hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/dummy.db','hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/newdummy.db') where location like '%dummy.db%';
Query OK, 3 rows affected (0.53 sec)
Rows matched: 3 Changed: 3 Warnings: 0
mysql> update DBS set db_location_uri= replace(db_location_uri,'hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/dummy.db','hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/newdummy.db') where db_location_uri like '%dummy.db%';
Query OK, 1 row affected (0.06 sec)
Rows matched: 1 Changed: 1 Warnings: 0
NOTE: If you want to try these changes before committing them in the metastore, run START TRANSACTION; (or BEGIN;) before your UPDATE statements and COMMIT; afterwards (or ROLLBACK; to undo them). These UPDATE statements replace all occurrences of the specified string within the DBS and SDS tables.

Check that the changes to the tables took effect; the location should now point to */newdummy.db:

0: jdbc:hive2://xlautomation-2.h.c:10000/defa> describe database dummy;
+----------+----------+-----------------------------------------------------------------+-------------+-------------+-------------+--+
| db_name | comment | location | owner_name | owner_type | parameters |
+----------+----------+-----------------------------------------------------------------+-------------+-------------+-------------+--+
| dummy | | hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/newdummy.db | hive | USER | |
+----------+----------+-----------------------------------------------------------------+-------------+-------------+-------------+--+
1 row selected (0.444 seconds)
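The NOTE above about trying the edits before committing can be sketched as follows, assuming the MySQL-backed metastore from this example (column names LOCATION, DB_LOCATION_URI, and NAME are from the standard Hive metastore schema; verify them against your metastore version before running anything):

```sql
START TRANSACTION;

UPDATE SDS
   SET LOCATION = REPLACE(LOCATION,
       'hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/dummy.db',
       'hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/newdummy.db')
 WHERE LOCATION LIKE '%dummy.db%';

UPDATE DBS
   SET DB_LOCATION_URI = REPLACE(DB_LOCATION_URI,
       'hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/dummy.db',
       'hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/newdummy.db')
 WHERE DB_LOCATION_URI LIKE '%dummy.db%';

-- Inspect the result before deciding
SELECT NAME, DB_LOCATION_URI FROM DBS WHERE NAME = 'dummy';

COMMIT;      -- keep the changes
-- ROLLBACK; -- or undo them instead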
Verify the data in the table and also confirm its location:

0: jdbc:hive2://xlautomation-2.h.c:10000/defa> describe formatted dummy.test123;
+-------------------------------+-------------------------------------------------------------------------+-----------------------------+--+
| col_name | data_type | comment |
+-------------------------------+-------------------------------------------------------------------------+-----------------------------+--+
| # col_name | data_type | comment |
| | NULL | NULL |
| col1 | string | |
| col2 | string | |
| | NULL | NULL |
| # Detailed Table Information | NULL | NULL |
| Database: | dummy | NULL |
| Owner: | hive | NULL |
| CreateTime: | Thu Aug 03 16:19:33 UTC 2017 | NULL |
| LastAccessTime: | UNKNOWN | NULL |
| Protect Mode: | None | NULL |
| Retention: | 0 | NULL |
| Location: | hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/newdummy.db/test123 | NULL |
| Table Type: | MANAGED_TABLE | NULL |
| Table Parameters: | NULL | NULL |
| | COLUMN_STATS_ACCURATE | {\"BASIC_STATS\":\"true\"} |
| | numFiles | 1 |
| | numRows | 6 |
| | rawDataSize | 18 |
| | totalSize | 24 |
| | transient_lastDdlTime | 1501777214 |
| | NULL | NULL |
| # Storage Information | NULL | NULL |
| SerDe Library: | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL |
| InputFormat: | org.apache.hadoop.mapred.TextInputFormat | NULL |
| OutputFormat: | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL |
| Compressed: | No | NULL |
| Num Buckets: | -1 | NULL |
| Bucket Columns: | [] | NULL |
| Sort Columns: | [] | NULL |
| Storage Desc Params: | NULL | NULL |
| | field.delim | , |
| | serialization.format | , |
+-------------------------------+-------------------------------------------------------------------------+-----------------------------+--+
33 rows selected (0.362 seconds)
0: jdbc:hive2://xlautomation-2.h.c:10000/defa> select * from dummy.test123;
+---------------+---------------+--+
| test123.col1 | test123.col2 |
+---------------+---------------+--+
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
| 5 | 5 |
| 6 | 6 |
+---------------+---------------+--+
6 rows selected (0.275 seconds)
0: jdbc:hive2://xlautomation-2.h.c:10000/defa>
Considerations

Remove the old database directory only when you are sure the tables are readable.

To check whether hive (or another privileged user) has access to modify contents in the metastore database, log in to mysql and run the following commands (ensure that you are logged on to the node that hosts the metastore database):

mysql> show grants for hive;
+--------------------------------------------------------------------------------------------------------------+
| Grants for hive@% |
+--------------------------------------------------------------------------------------------------------------+
| GRANT ALL PRIVILEGES ON *.* TO 'hive'@'%' IDENTIFIED BY PASSWORD '*7ACE763ED393514FE0C162B93996ECD195FFC4F5' |
| GRANT ALL PRIVILEGES ON `hive`.* TO 'hive'@'%' |
+--------------------------------------------------------------------------------------------------------------+
2 rows in set (0.02 sec)
mysql> select user,host from user;
+------+--------------------+
| user | host |
+------+--------------------+
| hive | % |
| root | 127.0.0.1 |
| root | localhost |
| root | xlautomation-2.h.c |
+------+--------------------+
4 rows in set (0.00 sec)
All of the operations mentioned above were performed on a kerberized cluster.
"hive --service metatool -updateLocation" did not succeed in updating the location; it does succeed when changing the NameNode URI to the HA shortname configuration.
External tables whose locations are different should ideally not have their access affected.
Save the output of "hdfs dfs -ls -R /apps/hive/warehouse/dummy.db" so you have a copy of the permissions before getting rid of the directory.
08-02-2017
10:14 PM
2 Kudos
Goal

Understand why statistics are useful in Hive.

Table with stats:

0: jdbc:hive2://xlautomation-2.h.c:10000/defa> select count(*) from mytable;
+---------+--+
| _c0 |
+---------+--+
| 843280 |
+---------+--+
1 row selected (0.332 seconds)
0: jdbc:hive2://xlautomation-2.h.c:10000/defa>
Here the row count is available in the metastore, so there is no need to launch map tasks to count the rows for this query. Look-ups are faster, as reading from the metastore is far quicker than launching map tasks.

Without column statistics:

0: jdbc:hive2://xlautomation-2.h.c:10000/defa> select count(col1) from abcd;
INFO : Session is already open
INFO : Dag name: select count(col1) from abcd(Stage-1)
INFO : Status: Running (Executing on YARN cluster with App id application_1499274604190_0031)
+-------+--+
| _c0 |
+-------+--+
| 1000 |
+-------+--+
1 row selected (2.109 seconds) << Takes about 2 seconds
After updating the statistics:

0: jdbc:hive2://xlautomation-2.h.c:10000/defa> analyze table abcd compute statistics for columns col1;
INFO : Session is already open
INFO : Dag name: analyze table abcd compute statistics...col1(Stage-0)
INFO : Status: Running (Executing on YARN cluster with App id application_1499274604190_0031)
No rows affected (2.61 seconds)
0: jdbc:hive2://xlautomation-2.h.c:10000/defa> select count(col1) from abcd;
+-------+--+
| _c0 |
+-------+--+
| 1000 |
+-------+--+
1 row selected (0.344 seconds) <<< Runs within 1/3 of a second
0: jdbc:hive2://xlautomation-2.h.c:10000/defa>
When to run ANALYZE (gather statistics)

If the variation in the data is large, say 30% or more (what is acceptable depends on your runtimes), we can choose to run ANALYZE. In this example, the change in the dataset is almost 200%:

0: jdbc:hive2://xlautomation-2.h.c:10000/defa> insert into abcd select * from mytable where col1 > 5000 limit 2000;
INFO : Session is already open
INFO : Dag name: insert into abcd select * from mytabl...2000(Stage-1)
INFO : Status: Running (Executing on YARN cluster with App id application_1499274604190_0031)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED 6 6 0 0 0 0
Reducer 2 ...... SUCCEEDED 1 1 0 0 0 0
--------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 16.49 s
--------------------------------------------------------------------------------
INFO : Loading data to table default.abcd from hdfs://xlautomation-1.h.c:8020/apps/hive/warehouse/abcd/.hive-staging_hive_2017-08-02_20-39-39_024_4775642732421672051-1/-ext-10000
INFO : Table default.abcd stats: [numFiles=1, numRows=1000, totalSize=128640, rawDataSize=127640]
No rows affected (17.647 seconds)

Running the same query after this variation in the data, the runtime is worse than before:

0: jdbc:hive2://xlautomation-2.h.c:10000/defa> select count(col1) from abcd;
INFO : Session is already open
INFO : Dag name: select count(col1) from abcd(Stage-1)
INFO : Status: Running (Executing on YARN cluster with App id application_1499274604190_0031)
+-------+--+
| _c0 |
+-------+--+
| 1000 |
+-------+--+
1 row selected (3.284 seconds) <<<<< Time increased
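The 30% rule of thumb above is just arithmetic on row counts. As a sketch, using the counts from this example (2000 rows added to a 1000-row table):

```shell
old_rows=1000    # rows before the insert
added_rows=2000  # rows added by the insert
pct=$(( added_rows * 100 / old_rows ))
echo "data changed by ${pct}%"   # prints: data changed by 200%
```

Anything well past your chosen threshold is a signal to re-run ANALYZE.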
Let's update the statistics:

0: jdbc:hive2://xlautomation-2.h.c:10000/defa> analyze table abcd compute statistics for columns col1;
INFO : Session is already open
INFO : Dag name: analyze table abcd compute statistics...col1(Stage-0)
INFO : Status: Running (Executing on YARN cluster with App id application_1499274604190_0031)
No rows affected (3.374 seconds)
0: jdbc:hive2://xlautomation-2.h.c:10000/defa> select count(col1) from abcd;
+-------+--+
| _c0 |
+-------+--+
| 1000 |
+-------+--+
1 row selected (0.346 seconds) <<<<<< Back to roughly a third of a second to fetch the same count from the metastore.
Which columns to pick

It is not necessary to gather statistics on all columns; we can limit ourselves to the columns actually used in queries. We can verify whether stats have been collected for a column by looking at the explain plan:

0: jdbc:hive2://xlautomation-2.h.c:10000/defa> explain select count(col2) from abcd;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
| Explain |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
| Plan not optimized by CBO. |
| |
| Vertex dependency in root stage |
| Reducer 2 <- Map 1 (SIMPLE_EDGE) |
| |
| Stage-0 |
| Fetch Operator |
| limit:-1 |
| Stage-1 |
| Reducer 2 |
| File Output Operator [FS_192] |
| compressed:false |
| Statistics:Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE |
| table:{"input format:":"org.apache.hadoop.mapred.TextInputFormat","output format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"} |
| Group By Operator [GBY_190] |
| | aggregations:["count(VALUE._col0)"] |
| | outputColumnNames:["_col0"] |
| | Statistics:Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE |
| |<-Map 1 [SIMPLE_EDGE] |
| Reduce Output Operator [RS_189] |
| sort order: |
| Statistics:Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE |
| value expressions:_col0 (type: bigint) |
| Group By Operator [GBY_188] |
| aggregations:["count(col2)"] |
| outputColumnNames:["_col0"] |
| Statistics:Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE |
| Select Operator [SEL_187] |
| outputColumnNames:["col2"] |
| Statistics:Num rows: 1000 Data size: 127640 Basic stats: COMPLETE Column stats: NONE |
| TableScan [TS_186] |
| alias:abcd |
| Statistics:Num rows: 1000 Data size: 127640 Basic stats: COMPLETE Column stats: NONE |
| |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
34 rows selected (0.384 seconds)
Once the stats are gathered, the plan is simplified:

0: jdbc:hive2://xlautomation-2.h.c:10000/defa> explain select count(col2) from abcd;
+-----------------------------+--+
| Explain |
+-----------------------------+--+
| Plan not optimized by CBO. |
| |
| Stage-0 |
| Fetch Operator |
| limit:1 |
| |
+-----------------------------+--+
6 rows selected (0.306 seconds)
Considerations for statistics

Automatic stats gathering is enabled by default; this can be verified using:

0: jdbc:hive2://xlautomation-2.h.c:10000/defa> set hive.stats.autogather;
+------------------------------+--+
| set |
+------------------------------+--+
| hive.stats.autogather=true |
+------------------------------+--+
1 row selected (0.03 seconds)
Stats can be gathered manually using ANALYZE at both the table and column level (one, several, or all columns):

0: jdbc:hive2://xlautomation-2.h.c:10000/defa> analyze table zzzz compute statistics;
INFO : Session is already open
INFO : Dag name: analyze table zzzz compute statistics(Stage-0)
INFO : Tez session was closed. Reopening...
INFO : Session re-established.
INFO : Status: Running (Executing on YARN cluster with App id application_1499274604190_0033)
INFO : Table default.zzzz stats: [numFiles=1, numRows=1000, totalSize=128640, rawDataSize=127640]
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED 1 1 0 0 0 0
--------------------------------------------------------------------------------
VERTICES: 01/01 [==========================>>] 100% ELAPSED TIME: 2.07 s
--------------------------------------------------------------------------------
No rows affected (21.615 seconds)
0: jdbc:hive2://xlautomation-2.h.c:10000/defa> analyze table zzzz compute statistics for columns;
INFO : Session is already open
INFO : Dag name: analyze table zzzz compute statist...columns(Stage-0)
INFO : Status: Running (Executing on YARN cluster with App id application_1499274604190_0033)
No rows affected (4.626 seconds)
0: jdbc:hive2://xlautomation-2.h.c:10000/defa> analyze table zzzz compute statistics for columns col1;
INFO : Session is already open
INFO : Dag name: analyze table zzzz compute statistics...col1(Stage-0)
INFO : Status: Running (Executing on YARN cluster with App id application_1499274604190_0033)
No rows affected (3.299 seconds)
0: jdbc:hive2://xlautomation-2.h.c:10000/defa>
Stats can also be gathered for specific partitions and partition columns:

ANALYZE TABLE zzzz PARTITION (idate='2017-07-29') COMPUTE STATISTICS;

Other parameters include NOSCAN and CACHE METADATA. When NOSCAN is specified, only the number of physical files and their size in bytes are gathered. CACHE METADATA is relevant when HBase is being used to store the temporary metadata.

0: jdbc:hive2://xlautomation-2.h.c:10000/defa> ANALYZE TABLE zzzz compute statistics NOSCAN;
INFO : Table default.zzzz stats: [numFiles=1, numRows=1000, totalSize=128640, rawDataSize=127640]
No rows affected (0.455 seconds)
0: jdbc:hive2://xlautomation-2.h.c:10000/defa>
Reference: https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-StatisticsinHive
06-01-2017
02:25 PM
@kotesh banoth You should also review the application log from YARN for this job and see if it gives you any indication. Also, set hive.execution.engine=mr at the session level and re-run your query to see whether it succeeds or fails at the same Reducer 2.
05-17-2017
04:24 AM
@John Glorioso Can you try ANALYZE TABLE test_table PARTITION (<partition_column>='2016-12-30') COMPUTE STATISTICS; (substituting your actual partition column name) and check the stats again?
05-09-2017
07:46 PM
@Theyaa Matti I tried this with Ambari 2.2; of course I am checking for non-stale configs here, but it does show the service_name. You should upgrade your Ambari version; it's not a major version change.

http://172.26.102.21:8080/api/v1/clusters/hdp234/host_components?HostRoles/stale_configs=false&fields=HostRoles/service_name

Output:

{
"href" : "http://172.26.102.21:8080/api/v1/clusters/hdp234/host_components?HostRoles/stale_configs=false&fields=HostRoles/service_name",
"items" : [
{
"href" : "http://172.26.102.21:8080/api/v1/clusters/hdp234/hosts/xlnode-234.h.c/host_components/APP_TIMELINE_SERVER",
"HostRoles" : {
"cluster_name" : "hdp234",
"component_name" : "APP_TIMELINE_SERVER",
"host_name" : "xlnode-234.h.c",
"service_name" : "YARN",
"stale_configs" : false
},
"host" : {
"href" : "http://172.26.102.21:8080/api/v1/clusters/hdp234/hosts/xlnode-234.h.c"
}
},
{
"href" : "http://172.26.102.21:8080/api/v1/clusters/hdp234/hosts/xlnode-234.h.c/host_components/DATANODE",
"HostRoles" : {
"cluster_name" : "hdp234",
"component_name" : "DATANODE",
"host_name" : "xlnode-234.h.c",
"service_name" : "HDFS",
"stale_configs" : false
},
"host" : {
"href" : "http://172.26.102.21:8080/api/v1/clusters/hdp234/hosts/xlnode-234.h.c"
}
},
....
...
..
05-09-2017
06:32 PM
@Theyaa Matti Can you try this:

http://172.26.100.103:8080/api/v1/clusters/hdp253/host_components?HostRoles/stale_configs=false&fields=HostRoles/service_name

and see if you are able to list all service names? If yes, change the value to "stale_configs=true" and rerun it like this:

http://172.26.100.103:8080/api/v1/clusters/hdp253/host_components?HostRoles/stale_configs=true&fields=HostRoles/service_name

I am using HDP 2.6 and Ambari 2.5, and it works for me. Here is the sample output:

{
"href" : "http://172.26.100.103:8080/api/v1/clusters/hdp253/host_components?HostRoles/stale_configs=false&fields=HostRoles/service_name",
"items" : [
{
"href" : "http://172.26.100.103:8080/api/v1/clusters/hdp253/hosts/xlnode-1.h.c/host_components/ACTIVITY_ANALYZER",
"HostRoles" : {
"cluster_name" : "hdp253",
"component_name" : "ACTIVITY_ANALYZER",
"host_name" : "xlnode-1.h.c",
"service_name" : "SMARTSENSE",
"stale_configs" : false
},
"host" : {
"href" : "http://172.26.100.103:8080/api/v1/clusters/hdp253/hosts/xlnode-1.h.c"
}
},
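Queries like this are easy to script. The sketch below only assembles the query URL (the Ambari host and cluster name are placeholders); on a real cluster you would pass it to curl with basic-auth credentials:

```shell
# Placeholders for illustration; substitute your Ambari host and cluster name.
AMBARI="http://ambari.example.com:8080"
CLUSTER="hdp253"
STALE="true"    # "false" lists components whose configs are up to date
URL="${AMBARI}/api/v1/clusters/${CLUSTER}/host_components?HostRoles/stale_configs=${STALE}&fields=HostRoles/service_name"
echo "$URL"
# On a real cluster: curl -s -u <admin-user>:<password> "$URL"
```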
05-06-2017
10:21 PM
1 Kudo
@Smart Data This link discusses the anonymous user: https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2. As @mqureshi stated, if the value of hive.server2.authentication is set to NONE, then you'd see the anonymous user come into play. Also, since you are using Ranger, I'd guess the whole point is to have more fine-grained control per user, so impersonation should also be enabled.
05-06-2017
05:32 PM
@Nhan Nguyen This looks like an encoding issue with the source/input file. As @Jay SenSharma mentioned:

0: jdbc:hive2://xlnode-2.h.c:2181,xlnode-3.h.> select * from abc_orc;
+---------------+--+
| abc_orc.col1 |
+---------------+--+
| Env�gen |
+---------------+--+
We can check the encoding of this file:

-bash-4.1$ hdfs dfs -get /apps/hive/warehouse/abc/000000_0 .
-bash-4.1$ file 000000_0
000000_0: ISO-8859 text
And as @Umair Khan stated, if we convert the encoding, we can read the file accordingly:

-bash-4.1$ iconv -f ISO-8859-1 -t UTF-8//TRANSLIT 000000_0 -o 000000_1
-bash-4.1$ file 000000_1
000000_1: UTF-8 Unicode text
-bash-4.1$
-bash-4.1$
-bash-4.1$ hdfs dfs -put 000000_1 /apps/hive/warehouse/abc/
-bash-4.1$ beeline -u "jdbc:hive2://xlnode-2.h.c:2181,xlnode-3.h.c:2181,xlnode-1.h.c:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2" -n hive -p ''
Connecting to jdbc:hive2://xlnode-2.h.c:2181,xlnode-3.h.c:2181,xlnode-1.h.c:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
Connected to: Apache Hive (version 1.2.1000.2.6.0.3-8)
Driver: Hive JDBC (version 1.2.1000.2.6.0.3-8)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1000.2.6.0.3-8 by Apache Hive
0: jdbc:hive2://xlnode-2.h.c:2181,xlnode-3.h.> select * from abc;
+-----------+--+
| abc.col1 |
+-----------+--+
| Env�gen |
| Envägen |
+-----------+--+
2 rows selected (0.26 seconds)
0: jdbc:hive2://xlnode-2.h.c:2181,xlnode-3.h.>
Can you try using a different browser? Or, if you are using Chrome, enable support for all encodings and see if that works.