Member since: 04-03-2019
Posts: 97
Kudos Received: 7
Solutions: 6
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 227 | 01-13-2025 11:17 AM |
| | 3812 | 01-21-2022 04:31 PM |
| | 6179 | 02-25-2020 10:02 AM |
| | 3871 | 02-19-2020 01:29 PM |
| | 2742 | 09-17-2019 06:33 AM |
02-04-2025
03:35 PM
1 Kudo
@ggangadharan Thanks for the advice. After I created user xxx on each data node, the Spark job ran successfully. Regarding user account synchronization from LDAP to the local OS, I had to create the user account on each node manually. Do you mean using SSSD? Regards,
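For reference, a quick way to see whether an LDAP account is already visible to the local OS (which is what an SSSD setup would provide, instead of creating users by hand) is to query NSS directly on a node. A minimal check, assuming xxx stands in for the real account name:
++
# Run on each worker node. If SSSD (or another NSS source) resolves the
# LDAP account, these succeed without a local /etc/passwd entry.
getent passwd xxx    # prints the passwd entry if the user resolves
id xxx               # prints uid/gid and groups, or reports "no such user"
++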
01-30-2025
10:46 AM
The error from my Spark job is
++++
Failing this attempt. Diagnostics: Application application_1738011234567_0014 initialization failed (exitCode=255) with output:
main : command provided 0
main : run as user is xxxx
main : requested yarn user is xxxx
User xxxx not found
++++
I read this post <https://community.cloudera.com/t5/Support-Questions/MapReduce-job-failing-after-kerberos/td-p/160273>. My group mapping configuration is hadoop.security.group.mapping = org.apache.hadoop.security.LdapGroupsMapping. I kinited xxxx before running the job and added the AD user xxxx to an AD group hadoop, but I still got the same error. This online doc might be applicable <https://docs.cloudera.com/cdp-private-cloud-base/7.1.8/security-authorization/topics/cm-security-authorization-ldap-group-mappings.html#ariaid-title3>: I might need to add the flag -Dcom.cloudera.cmf.service.config.emitLdapBindPasswordInClientConfig=true to CMF_JAVA_OPTS. But that documentation is for CDP 7.1.8 and does not exist for 7.1.7, which is my cluster's version. Thank you. Best regards,
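For context, hadoop.security.group.mapping only controls how Hadoop resolves a user's groups; the YARN container executor still looks the user up in the node's local OS, which is where "User xxxx not found" comes from. A minimal sketch of checking the two lookups separately, assuming xxxx stands in for the real account:
++
# On a NodeManager host: does the OS itself know the user?
# This is the lookup the YARN container executor performs at launch.
id xxxx

# From any client host with a valid Kerberos ticket: which groups does
# Hadoop's group mapping (LdapGroupsMapping here) return for the user?
hdfs groups xxxx
++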
Labels:
- Apache Spark
- Kerberos
01-22-2025
05:52 PM
James, Thanks for your help. Your reply that "user is required on the active NN" is right to the point. SSSD is mentioned in various online documents related to enabling Kerberos. In my case, SSSD is a background process and I do not need to configure it, right? Best regards,
01-13-2025
11:17 AM
The issue was resolved after I checked the "Enable HBase Thrift Http Server" property in the HBase configuration. It turned out that the TLS implementation for the Thrift server on CDP HBase is done at the HTTP layer, not at the transport layer.
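To verify the same fix, a simple check is to hit the port with an HTTPS client. A minimal sketch, reusing the host name and port 9191 from this thread and skipping certificate verification for the test:
++
# With HTTP mode enabled, the TLS handshake should complete and an HTTP
# response should come back from the Thrift endpoint.
curl -vk https://mycompany.com:9191/

# Or inspect the presented certificate chain directly.
openssl s_client -connect mycompany.com:9191 </dev/null
++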
01-13-2025
11:07 AM
I use CDP Private Cloud Base 7.1.7 and just enabled Kerberos security. I followed the setup documentation but could not proceed further than this step <https://docs.cloudera.com/cdp-private-cloud-base/7.1.7/security-kerberos-authentication/topics/cm-security-kerberos-enabling-step7-prepare-cluster.html>. In short, I lost "supergroup" access to HDFS. Here are the details.
* I created an AD account mysuperuser@example.com and an AD group mysupergroup@example.com.
* After Kerberos was enabled, I changed dfs.permissions.superusergroup=mysupergroup and restarted the cluster. Certainly, "mysupergroup" and "mysuperuser" do not exist anywhere in HDFS POSIX permission settings.
* I kinited mysuperuser@example.com, but got an HDFS permission denied error. It looks like Kerberos could not resolve the AD groups associated with the kinited account.
* Then I changed dfs.permissions.superusergroup=mysuperuser and restarted all services, but still got a permission denied error.
I intended to use Ranger to manage HDFS resource permissions, but I could not get Ranger properly installed because of the HDFS permission error: Ranger depends on Solr, and Solr uses HDFS. Right now Solr gives me an HDFS access error (Java error) - Caused by: org.apache.hadoop.ipc.RemoteException: Permission denied: user=solr, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x. I am trying to understand how HDFS permissions work after enabling Kerberos but before Ranger is operational. Right now I can only access HDFS by kiniting with the hdfs keytab file, which should only be used as a last resort. Thank you. Best regards,
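One way to narrow this down is to ask the NameNode which groups it actually resolves for the kinited principal and compare that with the configured superuser group. A minimal sketch, reusing the mysuperuser/mysupergroup names from this post:
++
# After kinit mysuperuser@example.com, ask Hadoop how it maps the user.
# If mysupergroup does not appear in this list, the group mapping
# (not Kerberos authentication itself) is what is failing.
hdfs groups mysuperuser

# Confirm which group HDFS currently treats as the superuser group.
hdfs getconf -confKey dfs.permissions.superusergroup
++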
Labels:
- Kerberos
12-19-2024
11:22 AM
Additional connection tests show that port 9191 still accepts unencrypted connections, even though "TLS/SSL for HBase Thrift Server over HTTP" is enabled. Neither the log nor the Cloudera Manager UI gave any warnings or errors.
12-18-2024
10:36 AM
It appeared that the Thrift Server did not start completely, although it has a green light in Cloudera Manager. Inside the log hbase-cmf-hbase-HBASETHRIFTSERVER-mynode.log.out, there is no entry acknowledging the start, like
++
org.eclipse.jetty.server.AbstractConnector: Started ServerConnector@180e6ac4{SSL, (ssl, http/1.1)}{0.0.0.0:9191}
++
But I have no idea why the startup ended up incomplete. There was no warning or error from either the log or the Cloudera Manager UI. Thank you.
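For reference, a quick way to check for that startup line (or confirm its absence) is to grep the role log directly. A minimal sketch, assuming the log file from this post lives under the usual /var/log/hbase directory:
++
# No output means the Jetty connector never finished starting on that port.
grep "Started ServerConnector" /var/log/hbase/hbase-cmf-hbase-HBASETHRIFTSERVER-mynode.log.out
++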
12-17-2024
10:28 PM
1 Kudo
This issue occurred right after I enabled TLS on my CDP Private Cloud Base 7.1.7. The client call to the HBase Thrift API failed at the TLS handshake. Below is the connection test output with the handshake failure.
++
$ openssl s_client -connect mycompany.com:9191
CONNECTED(00000003)
write:errno=0
---
no peer certificate available
---
No client certificate CA names sent
---
SSL handshake has read 0 bytes and written 287 bytes
Verification: OK
---
New, (NONE), Cipher is (NONE)
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
Early data was not sent
Verify return code: 0 (ok)
---
++
My Thrift API port is 9191 (not the default 9090). This port worked well before TLS was enabled. There should be no certificate/CA issue, because the Thrift UI (on the same node) over TLS works just fine. Below is the connection test output showing a successful handshake.
++
$ openssl s_client -connect mycompany.com:9095
CONNECTED(00000003)
depth=2 CN = MYROOTCA
...
---
Certificate chain
...
---
Server certificate
-----BEGIN CERTIFICATE-----
...
++
All my HBase instances have green lights inside Cloudera Manager. I do not know where to look. It looks like something internal in SDX went wrong. Any suggestions? Thank you. Best regards,
Labels:
- Apache HBase
- Security
10-23-2023
03:43 PM
Ezerihun, Thanks for your reply. I repeated my test, which showed that you are correct; I am not sure what happened in my earlier test case. When I dropped the external table, the warehouse path for that table "warehouse/tablespace/external/hive/testdb1.db/table1" remained. Actually, I can even re-create that external table without any error, and files loaded to "warehouse/tablespace/external/hive/testdb1.db/table1" can be read through the re-created table. In other words, although Impala created the path "warehouse/tablespace/external/hive/testdb1.db/table1", Impala does not manage it at all. Thank you.
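For anyone reproducing this, the easiest confirmation is to list the warehouse path before and after the DROP. A minimal sketch, using the path from this thread (the leading /warehouse prefix is assumed from the external warehouse location mentioned below):
++
# The directory and any files loaded into it should still be listed after
# DROP TABLE, since Impala does not manage external table storage.
hdfs dfs -ls /warehouse/tablespace/external/hive/testdb1.db/table1
++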
10-18-2023
04:50 PM
I ran into an interesting situation using an Impala external table. In short, I used a "create external table" statement but ended up with a table that behaves like a managed one. Here are the details.
Step 1: creating an external table
++
create external table testdb1.table1 (
  fld1 STRING,
  fld2 STRING
)
PARTITIONED BY ( loaddate INT )
STORED AS PARQUET
tblproperties('parquet.compress'='SNAPPY','transactional'='false');
++
Step 2: adding partitions and loading data files
++
alter table testdb1.table1 add if not exists partition (loaddate=20231018);
load data inpath '/mytestdata/dir1' into table testdb1.table1 partition (loaddate=20231018);
++
Step 2 shows that this table1 behaves exactly like a managed table. Files at /mytestdata/dir1 are moved to the HDFS warehouse path warehouse/tablespace/external/hive/testdb1.db/table1/loaddate=20231018. If I drop the partition 20231018, the directory at warehouse/tablespace/external/hive/testdb1.db/table1/loaddate=20231018 is removed. So what exactly is the difference between a managed and an external partitioned table, except for the different storage location (/warehouse/tablespace/managed vs /warehouse/tablespace/external)? From what I read, the key difference is that a managed table's storage is managed by Hive/Impala, but an external table's is not. In my case, even though this table1 is created as an external table, its storage is still managed by Impala/Hive. As I understand it, if I add a partition (to an external table) and then add files using "load data inpath", then the storage is managed by Hive. If I add a partition with the location specified, like
++
alter table testdb1.table1 add if not exists partition (loaddate=20231018) location '/mytestdata/dir1';
++
then the storage is NOT managed by Hive. Is this correct?
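One way to see how Impala classifies the table, rather than inferring it from behaviour, is to look at the table metadata. A minimal sketch using impala-shell (the impalad host name is a placeholder): DESCRIBE FORMATTED reports the Table Type (EXTERNAL_TABLE vs MANAGED_TABLE) and the table's Location, and SHOW PARTITIONS shows where each partition's data actually lives.
++
# Check the table type and its warehouse location.
impala-shell -i my-impalad-host -q "DESCRIBE FORMATTED testdb1.table1"

# Check each partition's location, e.g. after adding a partition
# with an explicit LOCATION clause.
impala-shell -i my-impalad-host -q "SHOW PARTITIONS testdb1.table1"
++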
Labels:
- Apache Hive
- Apache Impala