Member since: 11-11-2019
Posts: 637
Kudos Received: 34
Solutions: 27
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1041 | 10-09-2025 12:29 AM |
| | 8792 | 02-19-2025 09:43 PM |
| | 2467 | 02-28-2023 09:32 PM |
| | 5178 | 02-27-2023 03:33 AM |
| | 26876 | 12-24-2022 05:56 AM |
08-14-2024
09:08 PM
2 Kudos
@APentyala Please find the answers below:

1. Which data modeling approach is recommended for this domain?
Ans: If you have a large volume of data, we recommend partitioning or multi-level partitioning. You can also implement bucketing if the data inside a partition is large.

2. Are there any sample models available for reference?
Ans: For a reference on partitioning and bucketing, see https://www.linkedin.com/pulse/what-partitioning-vs-bucketing-apache-hive-shrivastava/
You can create a new table from the existing table by performing a CTAS with dynamic partitioning. Reference: https://www.geeksforgeeks.org/overview-of-dynamic-partition-in-hive/

3. What best practices should we follow to ensure data integrity and performance?
Ans: Please follow the best practices below:
a. Partition and bucket the data.
b. Use Iceberg tables, which significantly reduce the load on the Metastore, if you are using CDP Public Cloud or CDP Private Cloud (ECS/OpenShift).
c. Use ORC/Parquet.
d. Use EXTERNAL tables if you do not perform Update/Delete, as reading an external table is faster.

4. How can we efficiently manage large-scale data ingestion and processing?
Ans: The model is as follows:
Kafka/Spark Streaming: Ingestion
Spark: Data Modelling
Hive: Warehousing, where you extract the data
Please be specific about the use case.

5. Are there any specific challenges or pitfalls we should be aware of when implementing a lakehouse in this sector?
Ans: We do not expect any specific challenges. Please share more details about your requirements so we can comment further.
07-30-2024
08:19 AM
@Maicat You cannot typecast an array to a string. There are two options:
1. Select the nth element of the array:
SELECT level5[0] AS first_genre FROM my_table;
where 0 is the index of the first element.
2. Flatten the array:
SELECT column1 FROM my_table LATERAL VIEW explode(level5) genre_table AS level5;
02-23-2024
07:48 AM
If you deactivate and reactivate with "Public Load Balancer" and "Public Executor", this should work.
02-12-2024
11:57 PM
1 Kudo
To use an internal load balancer for Cloudera Data Warehouse (CDW), you must select the option to enable an internal load balancer while activating the Azure environment from the CDW UI. Otherwise, CDW uses the Standard public load balancer that is enabled by default when you provision an AKS cluster. Before activating you should remove. https://docs.cloudera.com/data-warehouse/cloud/azure-environments/topics/dw-azure-enable-internal-aks-lb.html
02-02-2024
12:01 AM
1 Kudo
Can you deactivate the environment, remove the load balancer, then activate the environment again and check? I want to isolate whether the issue is with the load balancer or with Hue itself.
01-30-2024
01:33 AM
1 Kudo
I also see that there is an open Jira, DWX-135, which is not resolved yet. We need to wait until it is fixed.
01-30-2024
01:28 AM
1 Kudo
@andym Did you follow this document? https://docs.cloudera.com/data-warehouse/cloud/azure-environments/topics/dw-azure-enable-internal-aks-lb.html
09-27-2023
06:56 AM
3 Kudos
If you need to connect to HiveServer2 from third-party tools such as DBeaver or Power BI, or from any Java client, you can connect using the Cloudera JDBC driver. The driver can be downloaded from Hive JDBC Connector 2.6.21 for Cloudera Enterprise. This post covers the SSL, ZooKeeper, Kerberos, LDAP, and load balancer scenarios for connecting to HiveServer2 using the driver.

Please note that the Beeline URL format is not the same as the one used by the JDBC driver. The properties supported by the JDBC driver are listed in Cloudera JDBC Driver 2.6.21 for Apache Hive. You need to use the com.cloudera.hive.jdbc.HS2Driver class to connect to HiveServer2.

Let's assume:
Host: c2345.node.cloudera.com
Kerberos Realm: EXAMPLE.COM

Plain Connection:
jdbc:hive2://c2345.node.cloudera.com:10000/default;LogLevel=6;LogPath=/tmp
where LogLevel=6 sets a more verbose level of logging.

Kerberos+SSL+binary:
jdbc:hive2://c2345.node.cloudera.com:10000/default;SSL=1;SSLTrustStore=/home/keystore-cdp/cm-auto-global_truststore.jks;SSLTrustStorePwd=xxxxxxxxxx;LogLevel=6;LogPath=/tmp/logs;KrbRealm=EXAMPLE.COM;KrbHostFQDN=c2345.node.cloudera.com;KrbServiceName=hive;AuthMech=1
If you import the root certificate of HiveServer2 into CACERTS in the JDK, you do not need to specify SSLTrustStore and SSLTrustStorePwd. The driver takes the truststore and password from CACERTS:
jdbc:hive2://c2345.node.cloudera.com:10000/default;SSL=1;LogLevel=6;LogPath=/tmp/logs;KrbRealm=EXAMPLE.COM;KrbHostFQDN=c2345.node.cloudera.com;KrbServiceName=hive;AuthMech=1
where AuthMech=1 uses Kerberos authentication.

LDAP+SSL+binary:
jdbc:hive2://c2345.node.cloudera.com:10000/default;SSL=1;SSLTrustStore=/home/keystore-cdp/cm-auto-global_truststore.jks;SSLTrustStorePwd=xxxx;LogLevel=6;LogPath=/tmp/logs;AuthMech=3;UID=test1;PWD=Password1
where AuthMech=3 uses LDAP authentication, and UID and PWD are the credentials of a user present in LDAP.

LDAP+SSL+HTTP:
jdbc:hive2://c2345.node.cloudera.com:10001/default;SSL=1;SSLTrustStore=/home/keystore-cdp/cm-auto-global_truststore.jks;SSLTrustStorePwd=xxxx;LogLevel=6;LogPath=/tmp/logs;AuthMech=3;UID=test1;PWD=Password1;transportMode=http;httpPath=cliservice
The port has been changed from 10000 to 10001, and transportMode and httpPath have been added.

Kerberos+SSL+HTTP:
jdbc:hive2://c2345.node.cloudera.com:10001/default;SSL=1;SSLTrustStore=/home/keystore-cdp/cm-auto-global_truststore.jks;SSLTrustStorePwd=xxxx;LogLevel=6;LogPath=/tmp/logs;AuthMech=1;KrbRealm=EXAMPLE.COM;KrbHostFQDN=c2345.node.cloudera.com;KrbServiceName=hive;transportMode=http;httpPath=cliservice
The port has been changed from 10000 to 10001, and transportMode and httpPath have been added.

Zookeeper+SSL+LDAP:
You can use ZooKeeper to connect to HiveServer2 for high availability (HA):
jdbc:hive2://zk=c2345.node2.cloudera.com:2181/hiveserver2,c2345.node3.cloudera.com:2181/hiveserver2,c2345.node4.cloudera.com:2181/hiveserver2;SSL=1;SSLTrustStore=/home/keystore-cdp/cm-auto-global_truststore.jks;SSLTrustStorePwd=xxxx;LogLevel=6;LogPath=/tmp/logs;AuthMech=3;UID=test1;PWD=Password1

Zookeeper+SSL+Kerberos:
jdbc:hive2://zk=c2345.node2.cloudera.com:2181/hiveserver2,c2345.node3.cloudera.com:2181/hiveserver2,c2345.node4.cloudera.com:2181/hiveserver2;SSL=1;SSLTrustStore=/home/keystore-cdp/cm-auto-global_truststore.jks;SSLTrustStorePwd=xxx;LogLevel=6;LogPath=/tmp/logs;AuthMech=1;KrbRealm=EXAMPLE.COM;KrbHostFQDN=_HOST;KrbServiceName=hive
KrbHostFQDN=_HOST is used as a placeholder so the client can connect to any HiveServer2 host. Internally, _HOST is replaced by the exact hostname the client connects to.

HA-Proxy+SSL+Kerberos:
Configure HA for HiveServer2 as described in Configuring the HiveServer load balancer, then connect using the URL below:
jdbc:hive2://ha-proxy-host.com:11000/default;SSL=1;SSLTrustStore=/home/keystore-cdp/cm-auto-global_truststore.jks;SSLTrustStorePwd=xxx;LogLevel=6;LogPath=/tmp/logs;AuthMech=1;KrbRealm=EXAMPLE.COM;KrbHostFQDN=_HOST;KrbServiceName=hive
Here too, KrbHostFQDN=_HOST is a placeholder that is replaced internally by the exact hostname of the HiveServer2 instance the client connects to.

HA-Proxy+SSL+LDAP:
jdbc:hive2://ha-proxy-host.com:11000/default;SSL=1;SSLTrustStore=/home/keystore-cdp/cm-auto-global_truststore.jks;SSLTrustStorePwd=xxx;LogLevel=6;LogPath=/tmp/logs;AuthMech=3;UID=test1;PWD=Password1
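For a plain Java client, the URLs above can be assembled and used roughly as follows. This is only a minimal sketch: the class name HiveJdbcUrl and the helper methods build and kerberosSsl are hypothetical names made up for illustration, while the property names (SSL, AuthMech, KrbRealm, KrbHostFQDN, KrbServiceName) and the driver class are the ones used in this post. It assumes the Cloudera JDBC driver JAR is on the classpath, a valid Kerberos ticket is available, and the HiveServer2 root certificate has been imported into the JDK CACERTS (so SSLTrustStore is omitted).

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the Cloudera driver.
final class HiveJdbcUrl {

    // Joins host, port, schema and the driver properties into one
    // semicolon-separated JDBC URL, in the order the properties were added.
    static String build(String host, int port, String schema, Map<String, String> props) {
        StringBuilder url = new StringBuilder("jdbc:hive2://")
                .append(host).append(':').append(port).append('/').append(schema);
        for (Map.Entry<String, String> p : props.entrySet()) {
            url.append(';').append(p.getKey()).append('=').append(p.getValue());
        }
        return url.toString();
    }

    // Properties for the Kerberos+SSL+binary case (AuthMech=1),
    // assuming the truststore comes from the JDK CACERTS.
    static Map<String, String> kerberosSsl(String realm, String hostFqdn) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("SSL", "1");
        props.put("AuthMech", "1");
        props.put("KrbRealm", realm);
        props.put("KrbHostFQDN", hostFqdn);
        props.put("KrbServiceName", "hive");
        return props;
    }

    public static void main(String[] args) {
        String url = build("c2345.node.cloudera.com", 10000, "default",
                kerberosSsl("EXAMPLE.COM", "c2345.node.cloudera.com"));
        System.out.println(url);
        try {
            // Driver class named in this post; requires the driver JAR on the classpath.
            Class.forName("com.cloudera.hive.jdbc.HS2Driver");
            try (Connection conn = DriverManager.getConnection(url)) {
                System.out.println("connected: " + !conn.isClosed());
            }
        } catch (ClassNotFoundException | SQLException e) {
            System.err.println("connection failed: " + e.getMessage());
        }
    }
}
```

The other scenarios differ only in the host, port, and property set, so the same build call should work with, for example, AuthMech=3 plus UID and PWD for LDAP.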
09-20-2023
07:10 AM
1 Kudo
@PetiaLeshiy If I understand correctly, you want to display only the rows where the column is NOT NULL. You can use:
SELECT * FROM table_name WHERE column_name IS NOT NULL;
08-28-2023
05:23 AM
@Shivakuk Did you use Ranger masking for that column? Please provide the output of the command below:
show create table <tablename>