Member since
09-20-2017
49
Posts
3
Kudos Received
3
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1209 | 02-14-2019 12:54 PM |
| | 1823 | 02-13-2019 08:32 AM |
| | 942 | 01-28-2019 11:42 PM |
02-14-2019
01:02 PM
Hello, I have configured a CDH cluster with LDAP integration and CompositeGroupsMapping (ShellBasedUnixGroupsMapping and LdapGroupsMapping) on HDFS. HDFS, Hive and Impala work great with both local user principals and AD users. The problem I have now is with Spark (on YARN), where jobs submitted by local users work, but those submitted by AD users fail:

main : run as user is ldap1
main : requested yarn user is ldap1
User ldap1 not found

If I create user ldap1 on all hosts, then Spark works. What am I missing here? Thank you
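A hedged sketch of why creating the user locally helps: HDFS group mapping (LdapGroupsMapping) is resolved on the NameNode, but YARN's container executor resolves the submitting user with an OS-level lookup on each NodeManager host. If the OS there cannot resolve the AD user (no SSSD or similar in nsswitch), containers fail with "User ... not found". The check below assumes shell access to a worker; "ldap1" is the hypothetical AD user from the post.

```shell
# Sketch: check whether a user is resolvable by the OS on a NodeManager
# host, the same lookup YARN's container executor performs.
check_user() {
  if getent passwd "$1" >/dev/null; then
    echo "$1: resolvable on this host"
  else
    echo "$1: NOT resolvable - YARN containers for this user will fail"
  fi
}
check_user root    # a user that exists on any Linux host
check_user ldap1   # the AD user; needs SSSD/LDAP in nsswitch to resolve
```

If the second lookup fails on the workers while `hdfs groups ldap1` succeeds, the gap is at the OS level, not in the Hadoop group mapping.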
Labels:
- Apache Spark
- Apache YARN
- Cloudera Manager
02-14-2019
12:54 PM
I managed to fix this by configuring CompositeGroupsMapping instead of LdapGroupsMapping.
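For anyone landing here, a hedged sketch of what a CompositeGroupsMapping setup in core-site.xml can look like. The provider aliases (shell4services, ad4users) are placeholder names, and the LDAP bind settings are omitted:

```xml
<property>
  <name>hadoop.security.group.mapping</name>
  <value>org.apache.hadoop.security.CompositeGroupsMapping</value>
</property>
<!-- Providers are tried in order: local shell lookup covers service
     accounts, LDAP covers AD users. The aliases are placeholders. -->
<property>
  <name>hadoop.security.group.mapping.providers</name>
  <value>shell4services,ad4users</value>
</property>
<property>
  <name>hadoop.security.group.mapping.provider.shell4services</name>
  <value>org.apache.hadoop.security.ShellBasedUnixGroupsMapping</value>
</property>
<property>
  <name>hadoop.security.group.mapping.provider.ad4users</name>
  <value>org.apache.hadoop.security.LdapGroupsMapping</value>
</property>
```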
02-14-2019
04:31 AM
Hello,
I have an HDFS-Hive-Impala regression script that works fine on my kerberized & Sentry-protected CDH cluster.
Now, I enabled LDAP authentication on HDFS (LdapGroupsMapping), Hive and Impala and the regression script passes HDFS and Hive but fails on the SELECT-INSERT-CREATE Impala actions:
Failure 1 & 2 (similar error for select and insert):
Query: select * from customer.cons limit 10
ERROR: AnalysisException: Failed to load metadata for table: 'customer.cons'
CAUSED BY: TableLoadingException: Failed to load file metadata for 1 paths for table customer.cons. Table's file metadata could be partially loaded. Check the Catalog server log for more details.
Failure 3:
Query: create table customer.test_141226 (id int)
ERROR: ImpalaRuntimeException: Error making 'createTable' RPC to Hive Metastore:
CAUSED BY: MetaException: Got exception: org.apache.hadoop.security.AccessControlException Permission denied: user=impala, access=WRITE, inode="/user/hive/warehouse/customer.db":hive:hive:drwxrwx--t
Note 1: Hive and Impala share the exact same queries in the regression script. The latter failure looks like an impersonation problem, but why does it appear now and not before LDAP?
Note 2: service principals are local (MIT KDC) while user principals are on AD.
Thank you,
Gerasimos
02-13-2019
08:32 AM
core-site.xml had empty values instead of *. No issue.
02-13-2019
08:22 AM
Hello, in my kerberized and Sentry-protected CDH, I started getting the following errors on the Hive Metastore:

Caused by: org.apache.hadoop.security.authorize.AuthorizationException:
User: hive/master.hadoop.local@HADOOP.LOCAL is not allowed to
impersonate sentry/worker1.hadoop.local@HADOOP.LOCAL

In core-site.xml I have:

hadoop.proxyuser.hive.groups=*
hadoop.proxyuser.hive.users=*

The errors started after I was playing around with LDAP integration, though I rolled back my configuration to the previous no-LDAP state. I am trying to figure out what I missed.
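For reference, a sketch of what these entries look like in core-site.xml property form (per the follow-up in this thread, the actual values had ended up empty rather than *, which produces exactly this impersonation error):

```xml
<property>
  <name>hadoop.proxyuser.hive.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hive.users</name>
  <value>*</value>
</property>
```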
Labels:
- Apache Sentry
- Kerberos
02-13-2019
08:09 AM
Hello, I noticed the following behaviour:
1. I have set the following:

<property>
  <name>hive.metastore.kerberos.keytab.file</name>
  <value>/etc/hive/conf/hive.keytab</value>
</property>

2. I have generated and placed hive.keytab on both the Metastore Server and HS2 nodes.

Now, each time I restart Hive these files are deleted, as the whole /etc/hive/conf directory is re-created. Am I doing something wrong?
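One hedged workaround sketch: since /etc/hive/conf is re-created on every restart (as described above), pointing the keytab at a directory outside that regenerated config tree avoids the deletion. The /etc/security/keytabs path below is an example, not a requirement:

```xml
<property>
  <name>hive.metastore.kerberos.keytab.file</name>
  <!-- Example path outside the regenerated /etc/hive/conf directory -->
  <value>/etc/security/keytabs/hive.keytab</value>
</property>
```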
Labels:
- Apache Hive
02-12-2019
04:15 AM
Hello @bgooley

Cross-realm trust is OK. I can `kinit` principals from both the MIT KDC and AD realms. Hue-LDAP authentication is also OK; however (for now) LDAP users can only perform actions not related to HDFS, Hive and Impala.

My target is to have some users (humans) authenticated against LDAP (for Hue and all CLI Hive/Impala/etc. actions) and some other users (Oozie pipelines), as well as all services, authenticated against the MIT KDC.

Now, I am reading here https://www.cloudera.com/documentation/enterprise/5-15-x/topics/cm_sg_ldap_grp_mappings.html that: "The local user:group accounts must be mapped to LDAP for group mappings in Hadoop. You must create the users and groups for your Hadoop services in LDAP. To integrate the cluster with an LDAP service, the user:group relationships must be contained in the LDAP directory. The admin must create the user accounts and define groups for user:group relationships on each host."

This is confusing, as it is stated (https://www.cloudera.com/documentation/enterprise/5-14-x/topics/sg_auth_overview.html#concept_n5q_5h2_bt__local-mit-to-active-dir-architecture) that only user principals should be configured in AD. My question is whether, in this architecture, I need to define the services' user:group relationships etc. in LDAP. (For user-group mapping I am trying both LdapGroupsMapping and SSSD; neither has worked yet though.)

Thank you, Gerasimos
02-08-2019
08:24 AM
You are right. I had forgotten a dual backend configuration in hue_safety_valve.ini.
02-08-2019
08:06 AM
... but the "Password" fields are now disabled: I edited the HTML page, removed "readonly=true", and managed to create the user.
02-08-2019
08:01 AM
Yeap! Removing the URL auto-removed all the accompanying LDAP parameters (so I will have to re-enter them later to enable LDAP, which I was trying to avoid). Thank you, Gerasimos
02-08-2019
03:00 AM
Hello, I am experimenting with LDAP integration, which I managed to make work in Hue. Now, I switched the 'backend' property back to 'desktop.auth.backend.AllowFirstUserDjangoBackend', restarted Hue, and I can log in with the local user as before LDAP. However, on the User Admin page I still see "Add/Sync LDAP user" and not the local "Add user". Which other option prevents the local functionality from showing up? Thank you, Gerasimos
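For context, a hedged sketch of where this lives: the backend is set in the [[auth]] section of hue.ini (typically via hue_safety_valve.ini when using CM). Per the resolution later in this thread, a leftover second backend entry in the safety valve was the culprit:

```ini
[desktop]
  [[auth]]
  # Only one backend should remain; a leftover second (LDAP) backend
  # entry here keeps the LDAP admin UI active even after switching back.
  backend=desktop.auth.backend.AllowFirstUserDjangoBackend
```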
Labels:
- Cloudera Hue
02-06-2019
05:05 AM
Hello @bgooley This is the error when syncing an existing group:

views WARNING There was a naming conflict while importing group sentryadmins in pattern sentryadmins

More specifically, this line of useradmin/views.py:

group, created = Group.objects.get_or_create(name=ldap_info['name'])

returns group=sentryadmins, created=False. So, I can tell that the group is not created at the Django level when it already exists. Looking closer at the Python code, there is a comment ("# This is a Hue group, and shouldn't be overwritten"), which is right! The group already exists and should not be overwritten, but users should become members of the group during sync, and that is not happening.
02-05-2019
06:40 AM
Assuming that I already had an (OS & Hue) group "sentryadmins", I then get an error on "Add/Sync LDAP group". But if I first delete the Hue group "sentryadmins", the sync functionality works. Any idea why? Sync is supposed to sync existing groups (my case) as well as add new ones.
02-05-2019
01:28 AM
Hello @bgooley

Thanks again. I enabled the logging, saw how the LDAP queries are constructed, and finally got it working. A minor point: I think I first need to do at least one "Add/Sync LDAP Group" for a specific group in order for it to be synced during login of new users. So, what I have managed so far is:
1. Define (new) users in AD.
2. Define the old Hadoop groups in AD as well, and configure users' memberships appropriately (I guess I have to do this to keep Sentry working as before).
3. When a user logs in to Hue, they get the group membership from AD.

I am going further with this now, thank you again. Gerasimos
02-04-2019
12:35 AM
Hello @bgooley

Thank you again for your guided reply. I spent some time on hands-on work, so I have a better view now. I started with the Hue integration, which seemed the most straightforward (before going down to the Hadoop level). I set up an Active Directory 2008 and created some users under the "Users" container. In there, I also defined a "sentryadmins" group and made "user1" a member of this group. I would expect this group (which, by the way, also exists at the Hue and OS levels) to be imported into Hue when user1 logs in (shouldn't I?).

LDAP authentication works great when I log in to Hue as "user1". I can also see that the firstName, lastName and email fields have been imported. However, I have 2 issues with Hue authentication:
1. The "sentryadmins" group is not imported as a "user1" membership. I tried the "sync" functionality and nothing changes.
2. When I press "Sync LDAP users/groups", no users or groups are imported.

Can these be addressed? Also, in case something goes really bad with the LDAP integration, how can I manually switch back to "AllowFirstUserDjangoBackend"? I am using CM for the Hue configuration (and a bit of code in hue_safety_valve.ini).

Thank you, Gerasimos
01-28-2019
11:42 PM
1 Kudo
The problem was in StreamSets, where I had not disabled Kerberos. Now the Enable Kerberos option is active again.
01-28-2019
11:39 PM
Hello, I am trying to totally remove Kerberos and then re-enable it. I have followed all the rollback steps as described in several posts:
https://community.cloudera.com/t5/Cloudera-Manager-Installation/Disabling-Kerberos/td-p/19654
https://stackoverflow.com/questions/29744821/how-to-disable-hadoop-kerberos
http://bigdata-tips.blogspot.com/2017/03/how-to-disable-kerberos-in-cloudera-cdh.html
It seems that Kerberos is indeed disabled, BUT I cannot get the Enable Kerberos button back. What should I do to re-enable it? Thank you, Gerasimos
Labels:
- Cloudera Manager
- Kerberos
01-22-2019
11:27 PM
Hello @bgooley,

Thank you for the detailed explanation. To clarify myself, when I said "kerberos" I meant the MIT KDC implementation, and yes, I do not know much about LDAP and AD. My organization has a Microsoft AD. It also has a CDH cluster that uses MIT Kerberos for Hadoop user and service authentication. CM and Hue have their own users. The task is to review what needs to be done in order for users declared in AD to use the cluster, e.g. for submitting Spark jobs, executing Impala queries, using CM, Hue, etc.

As far as I have understood, I can keep the existing user principals along with the AD users (on different realms). Is this right? For users controlled by AD, will I still need to create them at the OS level? If not, how are HDFS user and group permissions affected?

After your reply, I read the link above again, and I think that the key to this task is to understand this: "A one-way, cross-realm trust must be set up from the local Kerberos realm to the central AD realm containing the user principals that require access to the CDH cluster".

Thank you again for your effort.
01-22-2019
01:59 PM
Hello,
I have a kerberized 5.14 CDH cluster and I want to integrate with LDAP for user authentication (not service authentication). I suppose this is what is described in "Local MIT KDC with Active Directory Integration".
I have the following questions about this change, if I have understood the process right:
1. Users will now be defined in AD and not in Kerberos. This means that the current Kerberos keytab files will no longer be valid. Right?
2. In the case of CLI pipelines, where we first have to do a kinit for the principal, how will authentication work after LDAP?
3. Which services should be configured to work with LDAP?
- Cloudera Manager
- Hue
- Hive
- Impala
- ... ?
4. Will groups and Sentry permissions need to be re-configured after enabling LDAP?
Thank you,
Gerasimos
Labels:
- Cloudera Manager
- Kerberos
- Security
11-28-2018
03:07 AM
I am copying this from the Apache documentation: "A plan can be executed against an operational data node. Disk balancer should not interfere with other processes since it throttles how much data is copied every second." Does "should not" mean "does not" here, or "other processes should not run while the balancer runs"?
11-27-2018
07:24 AM
Hello, I am managing a CDH 5.13 cluster with 4 datanodes. Each datanode had 10 x 2.7 TB disks (~90% used) and we just added another 8 x 3.6 TB disks on each node. I did a "Rebalance" on the HDFS service, which apparently did nothing, as all nodes already have the same total disk usage. Now, I have followed this post in order to intra-node-balance the disks (with the threshold set to 25). After 1 hour of execution, the progress is terribly slow (as you can see from the last disk, /data/18, which receives the data): $ sudo df -h
...
/dev/sdc1 2.8T 2.4T 361G 88% /data/02
/dev/sdk1 2.8T 2.5T 315G 89% /data/10
/dev/sdg1 2.8T 2.5T 308G 89% /data/07
/dev/sdi1 2.8T 2.5T 314G 89% /data/08
/dev/sdj1 2.8T 2.5T 300G 90% /data/09
/dev/sde1 2.8T 2.5T 299G 90% /data/04
/dev/sdf1 2.8T 2.5T 303G 90% /data/06
/dev/sdh1 2.8T 2.4T 353G 88% /data/05
/dev/sdb1 2.8T 806G 2.0T 29% /data/01
/dev/sdd1 2.8T 2.5T 298G 90% /data/03
---#NEW DISKS#---
/dev/sdl1 3.7T 35M 3.7T 1% /data/11
/dev/sdm1 3.7T 36M 3.7T 1% /data/12
/dev/sdn1 3.7T 34M 3.7T 1% /data/13
/dev/sdo1 3.7T 35M 3.7T 1% /data/14
/dev/sdp1 3.7T 34M 3.7T 1% /data/15
/dev/sdq1 3.7T 34M 3.7T 1% /data/16
/dev/sdr1 3.7T 34M 3.7T 1% /data/17
/dev/sds1 3.7T 26G 3.7T 1% /data/18

I would like to ask the following:
1. Currently, there are no pipelines accessing HDFS, but tomorrow morning there will be, and it's obvious from the progress that disk balancing won't have finished. Is it safe to leave this process to finish while having the cluster in production?
2. Is there something I can do to speed things up?
3. How can I terminate this process safely, if required?
Thank you, Gerasimos
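A hedged sketch of the relevant hdfs diskbalancer commands (the hostname and the plan path below are placeholders; the plan file path is printed by the -plan step). The -bandwidth option raises the per-disk copy throttle when generating a plan, -query reports progress, and -cancel stops a running plan:

```shell
# Generate a plan with a higher copy throttle (MB/s per disk; default 10)
hdfs diskbalancer -plan worker1.example.com -bandwidth 50

# Execute the generated plan (path is printed by the -plan step)
hdfs diskbalancer -execute /system/diskbalancer/<date>/worker1.example.com.plan.json

# Check progress on the node
hdfs diskbalancer -query worker1.example.com

# Cancel a running plan safely
hdfs diskbalancer -cancel /system/diskbalancer/<date>/worker1.example.com.plan.json
```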
Labels:
- HDFS
09-18-2018
07:55 AM
1 Kudo
Thank you @Tomas79 I am also searching for architecture designs for Active-Active or Active-Passive DR configurations using 2 clusters. This article has some introductory info on the topic. I was wondering whether more resources are available on it. Best regards, Gerasimos
09-18-2018
12:52 AM
Hello, I have a kerberized + Sentry-protected CDH cluster with: 1 x Edge, 2 x Master, 4 x Worker nodes. I want to set up a secondary cluster for Hive replication purposes.
1. What should be the minimum topology for this task?
2. Should the secondary cluster be Sentry-protected as well?
3. Should the 2 clusters share the same KDC principals? If so, can the secondary cluster use the KDC server currently installed on the Master1 node?
Thank you, Gerasimos
Labels:
- Apache Sentry
- Cloudera Manager
08-24-2018
12:17 AM
Thank you, I added the Cloudera repo to my Gradle configuration; it builds, and the functions run from Impala without problems.
08-22-2018
06:42 AM
Dear @saranvisa

Concerning your suggestion to export the table:

> hive -S -e "export table $schema_file1.$tbl_file1 to '$HDFS_DATA_PATH/$tbl_file1';"

I did export a partition of the table, and I saw that I got:
1. _metadata
2. a copy of the directory structure, as in /user/hive/warehouse/myTable/thePartition

Since step [2] is too time-consuming and generates a replica of the partition's data, is there another way to export only the _metadata and copy the files directly from the Hive warehouse folder? Or am I missing something about the export command?
08-22-2018
12:38 AM
Hello,

Versions:
- Impala Shell v2.8.0-cdh5.11.0
- Cloudera Enterprise 5.14.4 (this should be bundled with Impala 2.11.x; how can I check?)

Coordinators/Executors: there is no special configuration for (all) impalad; they do belong to different Role Groups, but only for resource-allocation purposes (there is no special value in "Impala Command Line Argument Advanced Configuration Snippet (Safety Valve)").
08-21-2018
11:57 PM
That is what I am asking. I have CDH 5.11, which reports:
- hadoop-2.6.0+cdh5.11.2+2429
- hive-1.1.0+cdh5.11.2+1082
Does this mean that I have to use hadoop-core 2.6.0? There is no such version, as the latest is hadoop-core 1.2.1.
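A hedged sketch of how this maps into a Gradle build, following the Cloudera-repo approach mentioned in the accepted reply. The artifact coordinates below are assumptions derived from the parcel versions quoted above; note that hadoop-core was renamed to hadoop-common in Hadoop 2.x, which is why no hadoop-core 2.6.0 exists:

```groovy
// Sketch only: versions inferred from the CDH 5.11.2 parcels above.
repositories {
    maven { url "https://repository.cloudera.com/artifactory/cloudera-repos/" }
}
dependencies {
    // Provided by the cluster at runtime, so compile-time only.
    compileOnly "org.apache.hadoop:hadoop-common:2.6.0-cdh5.11.2"
    compileOnly "org.apache.hive:hive-exec:1.1.0-cdh5.11.2"
}
```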
08-21-2018
09:53 AM
Hello, How should I choose hadoop-core and hive-exec jar versions for Impala UDFs? What should they match in the target CDH installation? For instance, hive-exec is now at version 3.1.0 but most examples I've seen use 0.13.x... Thank you, Gerasimos
Labels:
- Apache Hive
- Apache Impala
08-03-2018
06:16 AM
Hello, I have a Java UDF for Impala that has been working for some months. The CDH cluster I am working on has 4 Impala Daemons. Today, this UDF behaved very strangely after a DROP & CREATE:
1. From Hue (Impala), the CREATE FUNCTION succeeded and SHOW FUNCTIONS reported the right signature.
2. Then, from impala-shell, I tried to call the function but got an ImpalaRuntimeException:

select phoneme_index("test","test","test","test",true);
ImpalaRuntimeException: Unable to find evaluate function with the correct signature: my.UDF.PhonemesIndex.evaluate(STRING, STRING, STRING, STRING, BOOLEAN) UDF contains: public java.lang.Double my.UDF.PhonemesIndex.evaluate(java.lang.String,java.lang.String,java.lang.String,java.lang.String)

3. Then, I logged in to Hue and tested the same SQL query, which returned "1" as expected.

After several tries, I finally figured out that if I connect with impala-shell -i to a different daemon, the function works as expected! Any ideas why this is happening? It is the first time I have come across such an issue.
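A hedged guess at the cause: each impalad coordinator caches function metadata, so a stale cache on some daemons would explain the per-daemon behaviour after a DROP & CREATE. A sketch of statements that may resync them (REFRESH FUNCTIONS requires Impala 2.9 or later; the database name my_udf_db is an assumption):

```sql
-- Lightweight: resync function metadata for one database (Impala 2.9+)
REFRESH FUNCTIONS my_udf_db;

-- Heavyweight fallback: rebuild all cached metadata
INVALIDATE METADATA;
```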
Labels:
- Apache Impala
05-30-2018
03:46 AM
Thank you @saranvisa. 10 TB is the data to move (and delete, to free space); 2 TB is the space currently left on each node.