Member since: 01-18-2016
Posts: 169
Kudos Received: 32
Solutions: 21
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 971 | 06-27-2025 06:00 AM |
| | 1113 | 01-14-2025 06:30 PM |
| | 1783 | 04-06-2018 09:24 PM |
| | 1864 | 05-02-2017 10:43 PM |
| | 4948 | 01-24-2017 08:21 PM |
07-10-2025
09:31 AM
Also, if you use the Iceberg table format instead of the default Hive format, it behaves more intuitively: a missing partition value is stored as a true NULL, without the placeholder partition name "__HIVE_DEFAULT_PARTITION__". So with Iceberg you can just do this:

SELECT * FROM aaa WHERE data_dt IS NULL;

Depending on the environment/version you're running, you can use Iceberg this way in any or all of Impala, Spark 3, and Hive.
07-10-2025
09:21 AM
In Hive, each partition corresponds to a physical directory on the file system. Because an empty string ('') or a NULL value cannot be used as a directory name, Hive substitutes the default value from its configuration: the setting hive.exec.default.partition.name, which is __HIVE_DEFAULT_PARTITION__. When you query with a WHERE clause on a partitioned column, Hive performs partition pruning by filtering the directory names before it ever reads the data files. Therefore, to find data in the default partition, you must filter on the literal string name that Hive assigned to the directory.

This will work:

SELECT * FROM aaa WHERE data_dt = '__HIVE_DEFAULT_PARTITION__';

This won't work:

SELECT * FROM aaa WHERE data_dt IS NULL;

The second query fails because the column's value is the string '__HIVE_DEFAULT_PARTITION__', not a true NULL.
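To make the directory-naming behavior concrete, here's a minimal Python sketch of the substitution Hive performs (illustrative only, not Hive's actual code; the column name and values are examples):

```python
# Value of hive.exec.default.partition.name (the Hive default).
DEFAULT_PARTITION_NAME = "__HIVE_DEFAULT_PARTITION__"

def partition_dir(column, value):
    """Return the directory name Hive would use for a partition value.

    NULL (None) and empty-string values cannot be directory names,
    so they are replaced with the configured default partition name.
    """
    if value is None or value == "":
        value = DEFAULT_PARTITION_NAME
    return f"{column}={value}"

print(partition_dir("data_dt", "2025-07-10"))  # data_dt=2025-07-10
print(partition_dir("data_dt", None))          # data_dt=__HIVE_DEFAULT_PARTITION__
```

This is why the IS NULL predicate never matches: by the time the data is laid out on disk, the NULL has already been rewritten into that literal string.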
06-27-2025
06:00 AM
The principal name may not be complete. Check the keytab and use the principal name it returns:

klist -kt trino.keytab

The Trino principal should include the realm (e.g. trino@YOURREALM), and if it's a host-based principal it may include the hostname as well (trino/node1.yourdomain.com@YOURREALM). So you might have something like this, where _HOST is a placeholder: trino/_HOST@EXAMPLE.COM. See this document for more details. You should also use the full path to the keytab. If these don't fix it, it would help to know what error messages you are getting on the Trino side and the Hive side.
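As a quick illustration of what the _HOST placeholder means, here's a hedged Python sketch (the helper name and hostnames are hypothetical; real Hadoop/Trino services do this substitution internally at startup):

```python
import socket

def resolve_principal(principal_template, hostname=None):
    """Replace the _HOST placeholder in a service principal template
    with the node's fully qualified hostname."""
    host = hostname or socket.getfqdn()
    return principal_template.replace("_HOST", host)

print(resolve_principal("trino/_HOST@EXAMPLE.COM", "node1.yourdomain.com"))
# trino/node1.yourdomain.com@EXAMPLE.COM
```

The principal you configure must match, component for component (primary/instance@REALM), one of the entries that klist -kt shows in the keytab.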
03-03-2025
05:43 AM
@Maulz - You can use Knox as a proxy through the cdp-proxy-api topology to connect to Hive or Impala with basic authentication (username/password), like Hue does. Using "cdp-proxy-api" assumes Knox is configured for basic authentication instead of SAML or JWT, etc. If it's not, you can manually create a new topology for basic authentication with Hive.

Here's how to enable Knox to expose Hive with the cdp-proxy-api if it's not configured yet (you may want to select "all" for the transport mode): https://docs.cloudera.com/cdp-private-cloud-base/7.3.1/securing-hive/topics/hive_secure_knox.html

It sounds like you have both AD and an MIT KDC. If the user you want to run Hive queries as is in the MIT KDC realm, not AD, you can use a different krb5.conf file, or you can set up a one-way trust between AD and the MIT KDC. You can create your own krb5.conf configured for the MIT KDC (or even configured for the one-way trust, but the trust has to be established between the AD and MIT KDCs):

export KRB5_CONFIG=/path/to/your/custom_krb5.conf

You can use the default ticket cache file from the krb5.conf (default_ccache_name = FILE:/tmp/krb5cc_%{uid}) or set it as an environment variable (KRB5CCNAME). Of course, you'll need these set for Python as well.
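For the "set for Python as well" part, a minimal sketch (the paths are placeholders from the example above; set these before any Kerberos-aware client library initializes):

```python
import os

# Point the Kerberos libraries at a custom krb5.conf and ticket cache.
# Both paths are examples -- adjust them to your environment.
os.environ["KRB5_CONFIG"] = "/path/to/your/custom_krb5.conf"
os.environ["KRB5CCNAME"] = f"FILE:/tmp/krb5cc_{os.getuid()}"
```

These environment variables are read by the underlying MIT Kerberos libraries, so they affect any Python client (impyla, pyhive, etc.) that authenticates via GSSAPI.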
02-28-2025
11:55 AM
@Maulz - Check this document for how to query Hive from Python 3; you don't need Hue for this: https://docs.cloudera.com/cdsw/1.10.5/import-data/topics/cdsw-accessing-data-from-apache-hive.html The example uses Kerberos, but you can change or remove the authentication settings depending on your authentication requirements.
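As a rough sketch of what that looks like with the impyla package (pip install impyla) — hostname, port, and credentials here are placeholders, and the helper function is just for illustration:

```python
def hive_connect_kwargs(use_kerberos=True):
    """Build connection arguments for impala.dbapi.connect() against
    HiveServer2. Swap use_kerberos=False for username/password auth."""
    kwargs = {"host": "hiveserver2.example.com", "port": 10000}
    if use_kerberos:
        # Requires a valid Kerberos ticket (kinit) in the ticket cache.
        kwargs.update(auth_mechanism="GSSAPI", kerberos_service_name="hive")
    else:
        kwargs.update(auth_mechanism="PLAIN", user="myuser", password="mypassword")
    return kwargs

# With a reachable HiveServer2 you would then do:
# from impala.dbapi import connect
# conn = connect(**hive_connect_kwargs())
# cur = conn.cursor()
# cur.execute("SELECT * FROM aaa LIMIT 10")
```

The connect call itself is commented out since it needs a live cluster; the point is which arguments change between Kerberos and plain/LDAP authentication.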
01-14-2025
06:30 PM
@Seaport, Let's address the Kerberos issue before Ranger. Can you kinit as the hdfs user (on the NameNode, with the hdfs keytab /var/run/cloudera-scm-agent/process/<a_number>-hdfs-NAMENODE/hdfs.keytab)? Once you have an hdfs Kerberos ticket, can you list directories?

Did you properly configure sssd, integrated with AD in the example.com realm, on ALL cluster nodes? For the HDFS issue you're seeing, the user-to-group mapping via sssd is required on the active NameNode, but eventually you need it working on all nodes. If you run the command "id mysuperuser", is the user in mysupergroup?

For the Solr issue, check CM -> Solr -> Configuration -> HDFS Data Directory. It should be something like /solr. If it's correct, select CM -> Solr -> Actions -> Create HDFS Home Dir, then restart Solr. Note that after you install Ranger, the service name, znode, and HDFS home dir will change to something like /solr-infra. If you need Solr for your own data (not the service infrastructure used by Ranger and Atlas), install a separate Solr instance after installing Ranger. Good luck.
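To automate the group-membership check, here's a small Python sketch that parses `id`-style output (the sample line below is made up; on a real node you would capture the output of subprocess.run(["id", "mysuperuser"], capture_output=True, text=True) instead):

```python
import re

# Example `id` output for a hypothetical user; replace with real output.
sample = ("uid=1001(mysuperuser) gid=1001(mysuperuser) "
          "groups=1001(mysuperuser),1002(mysupergroup)")

# Pull the group names out of the groups= section.
groups = re.findall(r"\((\w+)\)", sample.split("groups=")[1])
print("mysupergroup" in groups)  # True
```

If this prints False on the active NameNode, HDFS won't see the user as a superuser regardless of what the group actually contains in AD, because the group mapping resolves through the local sssd lookup.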
11-22-2024
10:25 AM
1 Kudo
@weixin As a test, make sure you have a Kerberos ticket and try curl:

curl -u : --negotiate http://YOURHOST:PORT/jmx

You may need to open a support case for this. I also highly recommend upgrading to CDP 7.1.9.
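Once the curl test works, the /jmx endpoint returns JSON shaped like {"beans": [...]}. Here's a sketch that parses a trimmed sample response (the bean name and value are illustrative; against a real cluster you would fetch the JSON with requests plus a SPNEGO auth handler instead of the hard-coded string):

```python
import json

# Trimmed example of what a Hadoop /jmx endpoint returns.
sample = ('{"beans": [{"name": '
          '"Hadoop:service=NameNode,name=FSNamesystem", '
          '"CapacityUsed": 12345}]}')

beans = json.loads(sample)["beans"]
# Pick a metric out of a specific bean by matching on its name.
used = next(b["CapacityUsed"] for b in beans if "FSNamesystem" in b["name"])
print(used)  # 12345
```

This is handy for confirming whether the problem is the HTTP/Kerberos layer (curl fails) or the consumer parsing the metrics (curl succeeds but the client errors out).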
02-28-2024
09:41 AM
When using two realms, there has to be a trust between the realms, and your krb5.conf has to be configured to handle both realms on both the client and the server. Setting this up isn't super difficult if you've done it once or twice, but it can be hard if it's new to you. The krb5.conf requires proper host- or domain-to-realm mapping.

You can set up a one-way trust (it can also be a two-way trust). Assuming you use an MIT KDC for cluster service principals and AD is the other realm, the MIT KDC has to trust AD, but AD doesn't have to trust the MIT KDC. To set up the trust you need to make configuration changes in both environments. Here's an example: https://community.cloudera.com/t5/Community-Articles/One-Way-Trust-MIT-KDC-to-Active-Directory/ta-p/247638

If the KDC trust isn't the issue, there's probably an issue with the driver configuration. And if this is being done on a Windows computer, you may need to configure the Windows machine to know about the other realm. I also recommend opening a Cloudera support case.
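To show what the domain-to-realm mapping does, here's an illustrative Python sketch of the lookup the [domain_realm] section of krb5.conf performs (realm and domain names are examples, not from the original question):

```python
# Mirrors a krb5.conf [domain_realm] section: domain suffix -> realm.
DOMAIN_REALM = {
    ".cluster.example.com": "MITKDC.EXAMPLE.COM",  # cluster hosts -> MIT KDC
    ".corp.example.com": "AD.EXAMPLE.COM",         # corporate hosts -> AD
}

def realm_for_host(host):
    """Return the Kerberos realm a host maps to, or None if unmapped
    (a real krb5.conf would fall back to default_realm)."""
    for suffix, realm in DOMAIN_REALM.items():
        if host.endswith(suffix):
            return realm
    return None

print(realm_for_host("node1.cluster.example.com"))  # MITKDC.EXAMPLE.COM
```

If a client picks the wrong realm for a service host, it asks the wrong KDC for a service ticket, which is a common symptom of a missing or incomplete [domain_realm] mapping in a two-realm setup.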
02-28-2024
09:25 AM
Hi, Do you have a question? The HDP Sandbox is no longer available or supported.
01-19-2024
04:54 PM
2 Kudos
That's a lot of log output, and some of the error messages you see are normal, so I'm not sure what your issue is. Do you see the Cloudera Management Service below the cluster services in CM (at the very bottom when you click Cloudera Manager, top left)? If so, click Instances and figure out which components/roles are not started. You can also click and start them one by one, then look at the startup logs in the CM UI pop-up after each one starts or fails. Check them in the order STDOUT, STDERR, and lastly ROLE LOG (the log produced after the role is started). You may need to check the Full Log.