Member since
09-02-2016
523
Posts
89
Kudos Received
42
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2310 | 08-28-2018 02:00 AM
 | 2162 | 07-31-2018 06:55 AM
 | 5075 | 07-26-2018 03:02 AM
 | 2436 | 07-19-2018 02:30 AM
 | 5867 | 05-21-2018 03:42 AM
01-19-2017
07:19 AM
@ski309 This has nothing to do with Impala. If I am correct, the query "create table test as select 1" will not work in most databases, because "select 1" returns the data with the column name '1', and that is not a valid column name:

create table test (1 int); -- invalid column name

(I supplied the 'int' type myself; "select 1" does not return a data type either.) As everyone knows, a column name and a data type are both mandatory to create a table, but "select 1" returns neither a valid column name nor a data type. The query below will work, though, because it gets the column names and data types from the base table:

create table db.table2 as select * from db.table1

Hope this helps! Thanks, Kumar
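As a side note, most engines accept the literal once it is given an explicit column alias, since the alias supplies the mandatory column name. A minimal sketch using SQLite (chosen only because it is easy to run locally; SQLite is more permissive than Hive/Impala, so this only illustrates the aliased form, not the failure case):

```python
import sqlite3

# CTAS with an aliased literal: the alias "one" supplies the column name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test AS SELECT 1 AS one")

# The new table's single column is named by the alias.
cols = [row[1] for row in conn.execute("PRAGMA table_info(test)")]
print(cols)  # ['one']
```

In Hive/Impala the equivalent would be `create table test as select 1 as one`.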
01-17-2017
12:27 PM
@bgooley Increasing the "Java Heap Size of Navigator Metadata Server in Bytes" fixes the "NAVIGATORMETASERVER_SCM_HEALTH has become bad" issue, but the same issue comes back after about a month. Please find below the log we maintain internally about the Java heap size increments:

09/06/16 - changed Java Heap Size of Navigator Metadata Server in Bytes from 1 GiB to 2 GiB due to NAVIGATORMETASERVER_SCM_HEALTH bad health
10/18/16 - changed Java Heap Size of Navigator Metadata Server in Bytes from 2 GiB to 3 GiB due to NAVIGATORMETASERVER_SCM_HEALTH bad health
12/01/16 - changed Java Heap Size of Navigator Metadata Server in Bytes from 3 GiB to 4 GiB due to NAVIGATORMETASERVER_SCM_HEALTH bad health
01/17/17 - changed Java Heap Size of Navigator Metadata Server in Bytes from 4 GiB to 5 GiB due to NAVIGATORMETASERVER_SCM_HEALTH bad health

So my question is: what would be the maximum Java heap size? I know it depends on our configuration, but is there any chart to define/identify the maximum, so that I can make sure not to increase it beyond the recommendation? This is our production environment, and I don't want to break anything else by just continually increasing the Java heap size.
01-16-2017
07:58 AM
@MasterOfPuppets A complex query can be tuned, but a count(*) query on a Hive table with 4 million records returning a result in 15 seconds is not an issue from Hive's point of view. Still, if you need a quicker result, you can log in to impala-shell instead of Hive and run your query there. But please be aware that Impala will use more memory.
01-14-2017
09:25 PM
@MasterOfPuppets Follow the points below one by one.

1. As I mentioned already, if you changed the parameters temporarily via the Hive CLI/Beeline, just exit from Hive and log back in; the settings will revert to their original values. Run the query again and confirm whether the issue you are getting was due to the parameter change.

2. As I mentioned already, you can change the property "as needed", meaning: I don't know your memory capacity. In my example I gave 5120 MB (5 GB), but you have to adjust the numbers based on your memory capacity. Check your memory capacity at CM -> Hosts (menu) -> get the memory capacity for each node.

2.1. To make it easier, get the current memory allocation for Map & Reduce: go to CM -> Yarn -> Configuration -> search for "memory.mb". Then increase it a little based on your memory capacity.

3. Also, the log you posted is not the actual log. Get it from the steps below: Cloudera Manager -> Yarn -> Web UI (menu) -> ResourceManager Web UI -> (it will open the port 8088 window) -> click the Failed link (left) -> click the Application/History link -> get the diagnostics information & log.

If you still need assistance, hide only confidential information and share the complete log and diagnostics information.

Thanks, Kumar
01-14-2017
08:12 PM
@MasterOfPuppets There are many ways to improve performance. In your statement you mentioned indexing enabled for ORC (I hope you are referring to row-group indexes/bloom filters, etc.).

1. In addition to that, you can also create an index on particular columns (the col1, col2 that you mentioned in your example).

2. You can also change the properties as needed. Note: I would recommend setting the parameters below temporarily in the Hive/Beeline CLI before changing them permanently in hive-site.xml/the Cloudera Manager configuration:

set mapreduce.map.memory.mb=5120;
set mapreduce.map.java.opts=-Xmx4g;    -- should be about 80% of mapreduce.map.memory.mb
set mapreduce.reduce.memory.mb=5120;
set mapreduce.reduce.java.opts=-Xmx4g; -- should be about 80% of mapreduce.reduce.memory.mb

Thanks, Kumar
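The 80% rule of thumb can be computed mechanically when sizing containers; a small sketch (the function name `heap_opts` is my own, not part of Hadoop):

```python
def heap_opts(container_mb, fraction=0.8):
    """Return a -Xmx flag sized as a fraction of the YARN container.

    The gap between the container size and the heap leaves room for
    non-heap JVM memory (metaspace, thread stacks, native buffers);
    a task whose total memory exceeds the container limit is killed
    by YARN.
    """
    return "-Xmx{}m".format(int(container_mb * fraction))

print(heap_opts(5120))  # -Xmx4096m, i.e. the -Xmx4g used above
```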
01-12-2017
12:38 PM
Since you have mentioned the words "user role", I want to clarify this. You have to understand the difference between a group, a user, and a role: groups and users need to be created in both Linux (as the root user) and Hue (as an admin user), but roles are created only in Hue.

Example: log in as root in Linux and apply the commands below.

Group:
groupadd hive; groupadd hue; groupadd impala; groupadd analyst; groupadd admin;
# In your case, your groups would be: Auditor, Read-Only, Limited Operator, Operator, Configurator, Cluster Administrator, BDR Administrator, Navigator Administrator, User Administrator, Key Administrator, Full Administrator

User:
useradd kumar;   # a user belongs to groups
usermod -a -G hive,hue,impala,admin,analyst kumar;
passwd kumar;

Role (assigned to a group): now log in to Hue -> Security (menu) -> Sentry Tables -> Add Roles (as the Hive user).
01-12-2017
11:12 AM
2 Kudos
@cplusplus1
1. Log in to Linux: create the required groups & users.
2. Log in to Hue: either sync with LDAP or create the required groups & users manually.
Note 1: You have to log in as an admin user to manage users/groups.
Note 2: Make sure the Linux groups & users exactly match the Hue groups & users.
3. Log in to Hue: create roles for each database/table via Hue -> Security (menu) -> Sentry Tables -> Add Roles.
Note: You have to log in as the Hive user, because the default values under CM -> Sentry -> Configuration -> Admin Groups are hive, impala, solr, and hue.
Thanks, Kumar
01-12-2017
09:13 AM
@cplusplus1 You can find the XML files in the path below, but I would not recommend updating them directly; instead, update your configuration using CM:

/var/run/cloudera-scm-agent/process/*-hive-HIVESERVER2

By default, Sentry requires configuration changes in Hive, Impala, YARN, and Hue (you can add additional services as needed and change their configuration).

Example: you can follow this method: CM -> Hive -> Configuration -> select Scope > HiveServer2 -> select Category > Main -> uncheck the "HiveServer2 Enable Impersonation" checkbox.
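For reference, the CM checkbox "HiveServer2 Enable Impersonation" corresponds to the Hive property `hive.server2.enable.doAs`; the XML equivalent of unchecking it is a fragment like the one below. This is shown only for illustration — as noted above, make the change through CM rather than editing the generated files, which CM overwrites:

```xml
<property>
  <name>hive.server2.enable.doAs</name>
  <value>false</value>
</property>
```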
01-09-2017
02:21 PM
We are getting the following error from YARN:

NodeManager Health is bad: GC Duration: Average time spent in garbage collection was 45.2 second(s) (75.40%) per minute over the previous 5 minute(s). Critical threshold: 60.00%. Average time spent in garbage collection was 30.3 second(s) (50.45%) per minute over the previous 5 minute(s). Warning threshold: 30.00%.

Below is my configuration. Currently we are using the default settings:

CM -> Yarn -> Configuration -> Java Configuration Options for Node Manager:
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled
CM -> Yarn -> Configuration -> nodemanager_gc_duration_window: 5 minute(s)
CM -> Yarn -> Configuration -> nodemanager_gc_duration_thresholds: Warning: 30.0, Critical: 60.0

I went through this link, but it doesn't cover how to fix this issue: https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cm_ht_nodemanager.html

Below are my questions:
1. The environment was fine for more than a year, but we are getting this issue now. Why? Is it due to increased usage?
2. Do we need to clear any old garbage from the environment to fix this issue? If so, how?
3. Do we need to change any configuration to fix this issue? If so, how?
4. Do we need to do both step 2 and step 3, by any chance?
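For context, the percentages in the alert are just GC seconds divided by wall-clock seconds over the window; a quick sketch of that arithmetic (the function name is mine, and the alert's figures are presumably averaged slightly differently over the 5-minute window, so they won't match to the second decimal):

```python
def gc_percentage(gc_seconds_per_minute):
    """Percentage of each minute the JVM spent in garbage collection."""
    return gc_seconds_per_minute / 60.0 * 100.0

# 45.2 s/min of GC is roughly 75.3%, well past the 60% critical threshold;
# 30.3 s/min is roughly 50.5%, past the 30% warning threshold.
print(gc_percentage(45.2) > 60.0, gc_percentage(30.3) > 30.0)
```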
Labels:
- Cloudera Manager
01-06-2017
09:08 AM
FYI: everything is fine with kadmin.local, but kadmin was not working properly. The same issue was raised by someone else on Stack Overflow; I just followed the instructions there, and the issue is fixed now: http://stackoverflow.com/questions/23779468/kerberos-kadmin-not-working-properly