Member since: 09-02-2016
Posts: 523
Kudos Received: 89
Solutions: 42
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1638 | 08-28-2018 02:00 AM |
| | 1247 | 07-31-2018 06:55 AM |
| | 3217 | 07-26-2018 03:02 AM |
| | 1369 | 07-19-2018 02:30 AM |
| | 3766 | 05-21-2018 03:42 AM |
01-25-2017
06:48 PM
@Fawze The answer to your 2nd question first: you have to implement Apache Sentry to restrict user-specific query access on Impala. Work with your Hadoop admin to set it up on your cluster.

Now for the first (generic) question: obviously you have to pass the parameter from the source. Apache Sentry depends on another security tool called Kerberos, which has a concept called a keytab. So when you pass the user and password from the source, you also have to pass the keytab to authenticate on the network. You can achieve your first requirement with this step.

I've shared some high-level security information in the link below; hope this will give you some idea: https://community.cloudera.com/t5/Security-Apache-Sentry/Hadoop-Security-for-beginners/m-p/49876#M247

Thanks
Kumar
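As a rough illustration of the keytab step (a minimal sketch; the principal, keytab path, and host names are placeholders, not values from your environment), the source system would authenticate before connecting to Impala:

kinit -kt /home/etluser/.auth/etluser.keytab etluser@EXAMPLE.COM    # obtain a Kerberos ticket from the keytab
impala-shell -k -i impala-host.example.com -q "show databases"      # -k makes impala-shell authenticate with that Kerberos ticket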
01-25-2017
01:46 PM
1 Kudo
@MasterOfPuppets When you configure Hive, hive-site.xml is populated with the default properties, so you can customize hive-site.xml by adding additional properties. Instead of updating hive-site.xml directly, I would recommend you update it via CM. You can follow the steps below:

CM -> Hive -> Configuration -> Advanced Category -> search for "snippet" -> it will show you the option to add an additional property.
Press the + button, set Name = "hive.prewarm.enabled" and Value = "true" to enable it.
Press the ? button to learn more about each property.

Thanks
Kumar
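Once the change is saved and the client/server configuration is redeployed, you can verify the effective value from beeline (a quick sanity check; the JDBC URL below is a placeholder):

beeline -u "jdbc:hive2://hs2-host.example.com:10000/default" -e "set hive.prewarm.enabled;"    # prints the value HiveServer2 is actually running with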
01-25-2017
08:23 AM
@cplusplus1 I've already answered a similar question; you can refer to this link: https://community.cloudera.com/t5/Cloudera-Manager-Installation/To-generate-reports-on-Impala-daemon-memory-usage-per-user/m-p/49752#M9285
01-24-2017
01:29 PM
1 Kudo
@HarishS Any DML changes will be reflected automatically, but you need to run the replace command below to reflect DDL changes in the view:

CREATE OR REPLACE VIEW view_name AS query;
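A small worked example (the table, column, and view names are made up), run via beeline after the base table's DDL has changed:

beeline -u "jdbc:hive2://hs2-host.example.com:10000/default" \
  -e "CREATE OR REPLACE VIEW sales_v AS SELECT order_id, amount, region FROM sales;"    # re-creates the view so it picks up the newly added 'region' column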
01-23-2017
08:46 AM
@Neelesh You don't need to install CM on two different nodes to enable HA. Let me explain at a very high level. Consider you have 5 nodes (before HA is enabled):

Node1: Master (Cloudera Manager installed), NameNode, DataNode
Node2: Master, Secondary NameNode, DataNode
Node3: Slave, DataNode
Node4: Slave, DataNode
Node5: Slave, DataNode

1. HA does NOT use a Secondary NameNode, so enable HA on Node2 in place of the Secondary NameNode.
2. You will then have two NameNodes (on both Node1 & Node2).
3. But at any point in time only one NameNode is Active and the other is Standby. You can confirm this in CM -> HDFS -> Instances (one is Active and the other is Standby).
4. If the Active NameNode breaks at any point, the Standby becomes Active.

Follow the link provided by @lhebert with the above understanding. Hope this will help you!!

Thanks
Kumar
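As an aside, you can also check the Active/Standby state from the command line (a sketch; the NameNode IDs nn1/nn2 come from dfs.ha.namenodes.&lt;nameservice&gt; and will differ per cluster):

hdfs haadmin -getServiceState nn1    # prints "active" or "standby" for the first NameNode
hdfs haadmin -getServiceState nn2    # prints the state of the second NameNode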
01-20-2017
09:22 AM
@majeedk I am using Linux and have never tried this from Windows, but I still hope it gives some insights.

1. Create a keytab (for the source login) in the environment where you have Kerberos installed, and keep the keytab file somewhere like /home/user/.auth/example.keytab (change the path to the Windows equivalent).
2. Create a shell script to call kinit (change the shell and kinit command to the Windows equivalent): kinit user@REALM -k -t /home/user/.auth/example.keytab
3. Create a cron job (or any other job that suits Windows) to call the above script at a frequent interval. By default, the Kerberos ticket will expire in a week, so you need a job that runs kinit at a regular interval. (See the sketch below.)

Thanks
Kumar
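On the Linux side, the pieces look roughly like this (a sketch; the principal, realm, keytab path, and schedule are placeholders):

#!/bin/bash
# /home/user/.auth/renew_ticket.sh -- re-obtains a Kerberos ticket from the keytab
kinit -k -t /home/user/.auth/example.keytab user@REALM

# crontab -e entry: run the script daily at 01:00 so the ticket is renewed well before it expires
# 0 1 * * * /home/user/.auth/renew_ticket.sh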
01-19-2017
07:47 AM
@dsss You can find all the Impala built-in functions in the link below; I don't find any option for PIVOT (please double-check):
https://www.cloudera.com/documentation/enterprise/5-5-x/topics/impala_functions.html

There are many ways to transpose rows to columns/columns to rows using normal SQL. I would suggest you follow that approach, or create a UDF (User Defined Function):
https://www.cloudera.com/documentation/enterprise/5-8-x/topics/impala_udf.html

Below is the JIRA ticket created with Apache to include a PIVOT option in Hive; you can see the status & comments there. Some links are also provided in the comment section for manually transposing columns to rows/rows to columns:
https://issues.apache.org/jira/browse/HIVE-3776

Thanks
Kumar
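As one common manual approach (a sketch with made-up table and column names), rows can be pivoted into columns with conditional aggregation, which works in plain Impala/Hive SQL:

impala-shell -q "
SELECT product,
       SUM(CASE WHEN quarter = 'Q1' THEN amount ELSE 0 END) AS q1_amount,   -- one output column per pivoted value
       SUM(CASE WHEN quarter = 'Q2' THEN amount ELSE 0 END) AS q2_amount
FROM sales
GROUP BY product;"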
01-19-2017
07:19 AM
@ski309 This has nothing to do with Impala. If I am correct, the query "create table test as select 1" will not work in most databases (at least 95% of them), because "select 1" returns the data with the column name '1', and that is not a valid column name:

create table test (1 int); --This is an invalid column name

Also, I put the data type 'int' in on my own, but "select 1" will not return any data type. As everyone knows, a column name and a data type are mandatory to create any table, but "select 1" returns neither a valid column name nor a data type.

The query below will work, however, because it gets the column names and data types from the base table:

create table db.table2 as select * from db.table1

Hope this will help you!!

Thanks
Kumar
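If you do need a literal in a CTAS, aliasing it to a valid column name should work around the naming problem (a sketch only; I have not verified this on every release):

impala-shell -q "create table test as select 1 as dummy_col;"    # the alias gives the column a valid name and an inferred type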
01-17-2017
12:27 PM
@bgooley Increasing the "Java Heap Size of Navigator Metadata Server in Bytes" fixes the "NAVIGATORMETASERVER_SCM_HEALTH has become bad" issue, but we get the same issue again after about a month. Please find below the log we maintain internally about the Java heap size increments.

09/06/16 - changed Java Heap Size of Navigator Metadata Server in Bytes from 1 GiB to 2 GiB due to NAVIGATORMETASERVER_SCM_HEALTH bad health
10/18/16 - changed Java Heap Size of Navigator Metadata Server in Bytes from 2 GiB to 3 GiB due to NAVIGATORMETASERVER_SCM_HEALTH bad health
12/01/16 - changed Java Heap Size of Navigator Metadata Server in Bytes from 3 GiB to 4 GiB due to NAVIGATORMETASERVER_SCM_HEALTH bad health
01/17/17 - changed Java Heap Size of Navigator Metadata Server in Bytes from 4 GiB to 5 GiB due to NAVIGATORMETASERVER_SCM_HEALTH bad health

So my question is:
1. What would be the maximum Java heap size? I know it depends on our configuration, but is there any chart to define/identify the max, so that I can make sure not to increase it beyond the recommendation? This is our prod and I don't want to break anything else by just continuing to increase the Java heap size.
01-16-2017
07:58 AM
@MasterOfPuppets A complex query can be tuned, but a count(*) query on a Hive table with 4 million records returning a result in 15 seconds is not an issue from the Hive point of view. Still, if you need a quicker result, you can log in to impala-shell instead of Hive and run your query. But please be aware that Impala will use more memory.
01-14-2017
09:25 PM
@MasterOfPuppets Follow the points below one by one.

1. As I mentioned already, if you change the parameter temporarily via the Hive CLI/beeline, just exit from Hive and log back in, so it is set back to its original value. Run the query again and confirm whether the issue you are getting is due to the parameter change.
2. As I mentioned already, you can change the property "as needed", meaning: I don't know your memory capacity. In my example I've given 5120 MB (i.e. 5 GB), but you have to adjust the numbers based on your memory capacity. Check your memory capacity at CM -> Hosts (menu) -> get the memory capacity for each node.
2.1. To make it easier, get the current memory allocation for Map & Reduce: go to CM -> Yarn -> Configuration -> search for "memory.mb", then increase it a little based on your memory capacity.
3. Also, the log you are posting is not the actual log. Get it from the steps below: Cloudera Manager -> Yarn -> Web UI (menu) -> ResourceManager Web UI -> (it will open the 8088 window) -> click the Failed link (left) -> click the Application/History link -> get the Diagnostics information & log.

If you still need assistance, hide only the confidential information and share the complete log and Diagnostics information.

Thanks
Kumar
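You can also pull the same logs from the command line once you know the application ID from the 8088 page (a sketch; the application ID below is a placeholder):

yarn logs -applicationId application_1484300000000_0001 > app.log    # dumps the aggregated container logs for the failed application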
01-14-2017
08:12 PM
@MasterOfPuppets There are many ways to improve performance. In your statement you mentioned that indexing is enabled for ORC (I hope you are referring to row-group indexes/bloom filters, etc.).

1. In addition to that, you can also create an index on particular columns (the col1, col2 you mentioned in your example).
2. You can also change the memory properties as needed. Note: I would recommend setting the parameters below temporarily in the Hive/beeline CLI before changing them permanently in hive-site.xml / the Cloudera Manager configuration:

set mapreduce.map.memory.mb=5120;
set mapreduce.map.java.opts=-Xmx4g;       -- should be ~80% of mapreduce.map.memory.mb
set mapreduce.reduce.memory.mb=5120;
set mapreduce.reduce.java.opts=-Xmx4g;    -- should be ~80% of mapreduce.reduce.memory.mb

Thanks
Kumar
01-12-2017
12:38 PM
Since you have mentioned the words "user role", I want to clarify this: you have to understand the difference between Group, User, and Role. Groups and Users are to be created in both Linux (as the root user) and Hue (as an admin user), but Roles are to be created only in Hue.

Example: log in as root in Linux and apply the commands below.

Group:
groupadd hive; groupadd hue; groupadd impala; groupadd analyst; groupadd admin;
# In your case, your groups would be: Auditor, Read-Only, Limited Operator, Operator, Configurator, Cluster Administrator, BDR Administrator, Navigator Administrator, User Administrator, Key Administrator, Full Administrator

User (a user belongs to groups):
useradd kumar;
usermod -a -G hive,hue,impala,admin,analyst kumar;
passwd kumar;

Role (assigned to a group): now log in to Hue -> Security (menu) -> Sentry Tables -> Add Roles (as the Hive user)
01-12-2017
11:12 AM
2 Kudos
@cplusplus1
1. Log in to Linux: create the required Groups & Users.
2. Log in to Hue: either sync with LDAP or create the required Groups & Users manually.
Note 1: You have to log in as an "admin user" to manage users/groups.
Note 2: Make sure the Linux Groups & Users exactly match the Hue Groups & Users.
3. Log in to Hue: create Roles for each DB/table via Hue -> Security (menu) -> Sentry Tables -> Add Roles.
Note: You have to log in as the "Hive user", because in CM -> Sentry -> Configuration -> Admin Groups the default values are Hive, Impala, Solr, Hue.

Thanks
Kumar
01-12-2017
09:13 AM
@cplusplus1 You can find the xml files in the path below, but I would not recommend updating them directly; instead, update your configuration using CM:

/var/run/cloudera-scm-agent/process/*-hive-HIVESERVER2

By default, Sentry requires configuration changes in Hive, Impala, YARN, and Hue (you can add additional services as needed and change their configuration).

Example: you can follow this method: CM -> Hive -> Configuration -> select Scope > HiveServer2 -> select Category > Main -> uncheck the HiveServer2 Enable Impersonation checkbox.
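If you want to confirm what HiveServer2 is actually running with, the generated config under that process directory can be inspected read-only (a sketch; the exact process directory name changes per restart, and unchecking impersonation in CM corresponds to hive.server2.enable.doAs=false):

grep -A1 "hive.server2.enable.doAs" /var/run/cloudera-scm-agent/process/*-hive-HIVESERVER2/hive-site.xml    # shows the property name and its current value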
01-09-2017
02:21 PM
We are getting the following error from YARN:

NodeManager Health is bad: GC Duration: Average time spent in garbage collection was 45.2 second(s) (75.40%) per minute over the previous 5 minute(s). Critical threshold: 60.00%. Average time spent in garbage collection was 30.3 second(s) (50.45%) per minute over the previous 5 minute(s). Warning threshold: 30.00%.

Below is my configuration. We are currently using the default setting for:

CM -> Yarn -> Configuration -> Java Configuration Options for Node Manager
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled

CM -> Yarn -> Configuration -> nodemanager_gc_duration_window = 5 minute(s)
CM -> Yarn -> Configuration -> nodemanager_gc_duration_thresholds = Warning: 30.0, Critical: 60.0

I went through this link but it doesn't cover how to fix this issue: https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cm_ht_nodemanager.html

Below are my questions:
1. The environment was fine for more than a year but we are getting this issue now. Why? Is it due to more usage?
2. Do we need to clear any old garbage from the environment to fix this issue? If so, how?
3. Do we need to change any configuration to fix this issue? If so, how?
4. Do we need to do both step 2 and step 3 by any chance?
Labels: Cloudera Manager
01-06-2017
09:08 AM
FYI: everything is fine with kadmin.local, but kadmin was not working properly. The same issue was raised by someone else on Stack Overflow; I just followed the instructions there and the issue has been fixed now: http://stackoverflow.com/questions/23779468/kerberos-kadmin-not-working-properly
01-05-2017
05:22 PM
@benassi As we know, "Yarn Aggregate Log Retention" controls only YARN, but /tmp/logs is not limited to YARN. So can you check the YARN log dates using the steps below?

CM -> Yarn -> Web UI -> ResourceManager Web UI -> (it will open the 8088 link) -> click the Finished link (left side) -> scroll down and click the 'Last' button -> check the log dates -> you should see only one day of history data, since you configured retention to 1 day.

Note: Make sure CM -> Yarn -> Configuration -> Enable Log Aggregation = Enabled.

Thanks
Kumar
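You can also look directly at what is sitting under /tmp/logs to see whether the old data actually belongs to YARN's aggregated logs (a sketch; the per-user directory layout can vary by version):

hdfs dfs -du -h /tmp/logs              # size per user directory under the aggregation root
hdfs dfs -ls -R /tmp/logs | head -50   # sample some entries and check their modification dates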
01-04-2017
11:52 AM
@bgooley You are correct, I am getting the privilege error when I use kadmin, but it works fine with kadmin.local. I understand Generate Missing Credentials uses kadmin instead of kadmin.local, so this is what is causing the trouble.

[root@abc]# kadmin
Authenticating as principal root/admin@REALM.COM with password.
Password for root/admin@REALM.COM:
kadmin: addprinc -maxrenewlife "432000 sec" -randkey -pw hadoop1 solr/<<my_IP>>@REALM.COM
WARNING: no policy specified for solr/<<my_IP>>@REALM.COM; defaulting to no policy
add_principal: Operation requires ``add'' privilege while creating "solr/<<my_IP>>@REALM.COM".c
[root@abc]# kadmin.local
kadmin.local: addprinc -maxrenewlife "432000 sec" -randkey -pw hadoop1 solr/<<my_IP>>@REALM.COM
WARNING: no policy specified for solr/<<my_IP>>@REALM.COM; defaulting to no policy
Principal "solr/<<my_IP>>@REALM.COM" created. I tried to Import the credential using CM -> Admin -> Security. It says success message but I list the Kerberos credential, the principal is still missing for only solr Successfully imported KDC Account Manager credentials. so I've deleted the principal that i've added manually using kadmin.local..... How to fix the issue with kadmin? so that I can use Generate Missing Credential option Here I've listed my configuration, do you think any change required on it? cat /var/kerberos/krb5kdc/kadm5.acl
*/admin@REALM.COM *
hive@REALM.COM *
hdfs@REALM.COM
###
kadmin.local: listprincs
HTTP/<<my_ipaddress>>@REALM.COM
K/M@REALM.COM
cloudera-scm/admin@REALM.COM
hdfs/<<my_ipaddress>>@REALM.COM
hdfs@REALM.COM
hive/<<my_ipaddress>>@REALM.COM
hue/<<my_ipaddress>>@REALM.COM
impala/<<my_ipaddress>>@REALM.COM
kadmin/admin@REALM.COM
kadmin/changepw@REALM.COM
kadmin/<<my_ipaddress>>@REALM.COM
krbtgt/REALM.COM@REALM.COM
mapred/<<my_ipaddress>>@REALM.COM
oozie/<<my_ipaddress>>@REALM.COM
root/admin@REALM.COM
root@REALM.COM
sentry/<<my_ipaddress>>@REALM.COM
yarn/<<my_ipaddress>>@REALM.COM
zookeeper/<<my_ipaddress>>@REALM.COM

I've confirmed that my fully qualified domain name (FQDN) is correct in my configuration.

Note: I am using the admin login in Cloudera Manager to generate the new principal, and root/admin@REALM in the CLI to add the new principal.
01-04-2017
09:32 AM
@bgooley Thanks for the quick reply. Let me double-check all the points you have mentioned. In the meantime, I am still not clear on one point: I believe my /var/kerberos/krb5kdc/kadm5.acl and other configurations are fine because, as I mentioned already, all the existing services (HDFS, Hive, Impala, Oozie, Hue, etc.) are working fine. If there were a problem with my configuration, I should get the same error for the existing services, right? Why should I get the error only for a new service? The only differences between the existing and new services are:
1. Existing services were added before enabling Kerberos (everything is OK).
2. I am trying to add new services now, after enabling Kerberos.

Any idea?
01-04-2017
08:52 AM
Hi,

CDH 5.7.x. I used to add new services to our cluster using Cloudera Manager without any issue before enabling Kerberos. We have installed/enabled Kerberos now and everything is good for the existing services, but when I try to add a new service (Solr) I get the following error:

Start Solr: Failed to start service
Execute command Start this Solr Server on role Solr Server: Command failed to run because this role has invalid configuration. Review and correct its configuration. First error: Role is missing Kerberos keytab. Please run the Generate Missing Credentials command on the Kerberos Credentials tab of the Administration -> Security page

I have tried Generate Missing Credentials on the Admin -> Security page, but it ends with the following error:

/usr/share/cmf/bin/gen_credentials.sh failed with exit code 1 and output of <<
+ export PATH=/usr/kerberos/bin:/usr/kerberos/sbin:/usr/lib/mit/sbin:/usr/sbin:/usr/lib/mit/bin:/usr/bin:/sbin:/usr/sbin:/bin:/usr/bin
+ PATH=/usr/kerberos/bin:/usr/kerberos/sbin:/usr/lib/mit/sbin:/usr/sbin:/usr/lib/mit/bin:/usr/bin:/sbin:/usr/sbin:/bin:/usr/bin
+ CMF_REALM=REALM.COM
+ KEYTAB_OUT=/var/run/cloudera-scm-server/cmf6942980384105255302.keytab
+ PRINC=solr/<<my_ipaddress>>@REALM.COM
+ MAX_RENEW_LIFE=432000
+ KADMIN='kadmin -k -t /var/run/cloudera-scm-server/cmf2028852611455413307.keytab -p root/admin@REALM.COM -r REALM.COM'
+ RENEW_ARG=
+ '[' 432000 -gt 0 ']'
+ RENEW_ARG='-maxrenewlife "432000 sec"'
+ '[' -z /var/run/cloudera-scm-server/krb5920427054266466413.conf ']'
+ echo 'Using custom config path '\''/var/run/cloudera-scm-server/krb5920427054266466413.conf'\'', contents below:'
+ cat /var/run/cloudera-scm-server/krb5920427054266466413.conf
+ kadmin -k -t /var/run/cloudera-scm-server/cmf2028852611455413307.keytab -p root/admin@REALM.COM -r REALM.COM -q 'addprinc -maxrenewlife "432000 sec" -randkey solr/<<my_ipaddress>>@REALM.COM'
WARNING: no policy specified for solr/<<my_ipaddress>>@REALM.COM; defaulting to no policy
add_principal: Operation requires ``add'' privilege while creating "solr/<<my_ipaddress>>@REALM.COM".
+ '[' 432000 -gt 0 ']'
++ kadmin -k -t /var/run/cloudera-scm-server/cmf2028852611455413307.keytab -p root/admin@REALM.COM -r REALM.COM -q 'getprinc -terse solr/<<my_ipaddress>>@REALM.COM'
++ tail -1
++ cut -f 12
get_principal: Operation requires ``get'' privilege while retrieving "solr/<<my_ipaddress>>@REALM.COM".
+ RENEW_LIFETIME='Authenticating as principal root/admin@REALM.COM with keytab /var/run/cloudera-scm-server/cmf2028852611455413307.keytab.'
+ '[' Authenticating as principal root/admin@REALM.COM with keytab /var/run/cloudera-scm-server/cmf2028852611455413307.keytab. -eq 0 ']'
/usr/share/cmf/bin/gen_credentials.sh: line 35: [: too many arguments
+ kadmin -k -t /var/run/cloudera-scm-server/cmf2028852611455413307.keytab -p root/admin@REALM.COM -r REALM.COM -q 'xst -k /var/run/cloudera-scm-server/cmf6942980384105255302.keytab solr/<<my_ipaddress>>@REALM.COM'
kadmin: Operation requires ``change-password'' privilege while changing solr/<<my_ipaddress>>@REALM.COM's key
+ chmod 600 /var/run/cloudera-scm-server/cmf6942980384105255302.keytab
chmod: cannot access `/var/run/cloudera-scm-server/cmf6942980384105255302.keytab': No such file or directory
>>

So I've manually added "solr/<<my_ipaddress>>@REALM.COM" using kadmin.local and tried to import it from the Admin -> Security page, but no luck. So now my questions are:
1. Is there any prerequisite for adding a new service in a Kerberized cluster?
2. I cannot simply press "Generate Missing Credentials" on the Admin -> Security page, because how does my cluster know which service I am going to add? It could be Solr or something else. Still, I tried it, but it says there is nothing to generate.

Thanks
Kumar
Labels: Kerberos
12-31-2016
04:56 PM
@cdhnaidu Hope you have enabled Sentry. I can see that you are logged in as the hive user in Hue. Can you log in as admin instead of hive in Hue and try? Note: make sure you have the required users, groups, and roles in both Hadoop and Hue. Also check your admin configuration settings in CM -> Sentry -> Configuration.

A few sample commands:

beeline> CREATE ROLE admin;
-- Grant privileges to the admin role
beeline> GRANT ALL ON SERVER server1 TO ROLE admin WITH GRANT OPTION;
-- Assign the role to a group
beeline> GRANT ROLE admin TO GROUP administrators;

After these steps, all users within the group administrators are allowed to manage Hive privileges.
12-23-2016
12:30 PM
1 Kudo
@zhuw.bigdata I hope you have already imported the KDC Account Manager credentials using the following steps: CM -> Administration -> Settings -> Import KDC Account Manager Credentials. And now you want to change the credentials.

In your CLI, type kadmin.local (if you are on the Kerberos master node) or kadmin (if you are on a client/remote node):

kadmin.local: ?    # Typing ? gives you the help, including how to change credentials

Hope this helps
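For example, changing a principal's password looks roughly like this (a sketch; the principal name is a placeholder, and you would re-import the credentials in CM afterwards):

kadmin.local: cpw cloudera-scm/admin@REALM.COM    # change_password (cpw) prompts for the new password for that principal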
12-20-2016
01:27 PM
From your examples, I assume you are getting this issue for almost all the tables.
1. Can you try to access the table from Impala?
2. Please do not forget to execute "invalidate metadata table".
3. Are you getting this issue only for existing tables? Can you create a new table in Hive and test it? If the new table works, compare the configuration differences between the old & new tables.
12-20-2016
12:21 PM
If you are not aware already, do not get confused by /apps/hive/warehouse/. In Cloudera, Hive metastore databases are usually created under /user/hive/warehouse, whereas in the Hortonworks distribution they are usually under /apps/hive/warehouse/. The bottom line of that link is to make sure the Hive warehouse directory matches in all the places it is referenced.
12-20-2016
12:08 PM
Try this https://community.hortonworks.com/content/supportkb/48759/javalangillegalargumentexception-wrong-fs-running.html
12-20-2016
10:34 AM
Since you updated the nameservice just now, it may require a restart (not sure). Please run the query again after a cluster restart; it may help.
12-20-2016
10:10 AM
1 Kudo
To make it easier: CM -> HDFS -> Configuration -> NameNode Nameservice. I assume it was "nameservice1" before and is "zeus.corpdom.com" now. Also, did you check the location from "describe formatted tablename" in Hive? If the location refers to nameservice1, then change either the NameNode Nameservice name or the Hive table location. If you have very few tables, I would recommend changing the Hive table location to match the nameservice. Hope this helps!
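Changing a table's location would look roughly like this (a sketch with made-up database/table names and a placeholder JDBC URL; run it for each affected table):

beeline -u "jdbc:hive2://hs2-host.example.com:10000/default" \
  -e "ALTER TABLE db.my_table SET LOCATION 'hdfs://zeus.corpdom.com/user/hive/warehouse/db.db/my_table';"
# afterwards, "describe formatted db.my_table" should show the new nameservice in the Location field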