Member since: 02-02-2017
Posts: 106
Kudos Received: 31
Solutions: 15
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2262 | 09-28-2023 11:26 AM
 | 1792 | 07-05-2023 03:34 PM
 | 13230 | 09-22-2018 04:48 AM
 | 3125 | 09-19-2018 09:15 PM
 | 1922 | 09-17-2018 06:52 PM
01-11-2024
02:37 PM
@yusufu If there is nothing in /var/log/hive, can you check whether you have log files inside /tmp/hive or /tmp/hive/hive? That would help debug the issue further.
12-07-2023
11:34 AM
@RS_11 If I understand your question correctly, you are referring to the behavior in older CDH versions (specifically 5.x), where executing an INSERT OVERWRITE that selects 0 rows resulted in the creation of a 0-byte file named 000000_0, whereas in the newer Hive version 3.1.3 you observe that no 0-byte files are generated in such scenarios. Assuming my understanding is correct, may I ask why you want a 0-byte file to be generated? Having 0-byte or small files is considered inefficient. Note that there have been several changes related to 0-byte files in the past, as documented in HIVE-22941 and HIVE-21714.
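For context, a minimal sketch of the behavior under discussion; the connection details and table name are hypothetical placeholders, not from the original thread:

```python
# Hypothetical repro sketch: run an INSERT OVERWRITE whose SELECT returns
# no rows, then inspect the table's HDFS location for a 0-byte 000000_0 file.
from impala.dbapi import connect

conn = connect(host="hs2-host.example.com", port=10000,  # placeholder HS2 host
               auth_mechanism="GSSAPI")                  # adjust for your cluster
cur = conn.cursor()
# WHERE 1=0 guarantees the SELECT returns zero rows (this empties tmp_tbl).
cur.execute("INSERT OVERWRITE TABLE tmp_tbl SELECT * FROM tmp_tbl WHERE 1=0")
# On CDH 5.x this left a 0-byte 000000_0 file in the table directory;
# on Hive 3.1.3 no file is written. Find the LOCATION to check:
cur.execute("SHOW CREATE TABLE tmp_tbl")
print(cur.fetchall())
```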
11-20-2023
08:24 PM
@HadoopHero The answer would vary based on the query you are running. Assuming you have a simple "INSERT ... SELECT */cols FROM table", it is likely a mapper-only job, and you may want to try tuning the settings below.
set tez.grouping.min-size=134217728; -- 128 MB min split
set tez.grouping.max-size=1073741824; -- 1 GB max split
Try setting min-size and max-size to the same value. I would not go below 128 MB.
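As a rough illustration, a minimal sketch of applying these settings per session over HiveServer2 with Impyla; the host, port, auth mechanism, and table names are hypothetical placeholders:

```python
# Hypothetical sketch: apply the Tez grouping settings for one HS2 session,
# then run the insert, so the split-size change applies only to this job.
from impala.dbapi import connect

conn = connect(host="hs2-host.example.com", port=10000,  # placeholder HS2 host
               auth_mechanism="GSSAPI")
cur = conn.cursor()
# 1 GB for both, per the suggestion to keep min-size and max-size equal.
cur.execute("SET tez.grouping.min-size=1073741824")
cur.execute("SET tez.grouping.max-size=1073741824")
cur.execute("INSERT INTO target_tbl SELECT * FROM source_tbl")
```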
09-29-2023
10:27 AM
PyHive is not maintained by Cloudera. Can you use Impyla to connect to HiveServer2? It works with HS2 as well as Impala. Ref: https://github.com/cloudera/impyla
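A minimal connection sketch, assuming a Kerberized HiveServer2 on the default binary port; the hostname is a placeholder:

```python
# Hypothetical sketch: connect Impyla to HiveServer2 instead of PyHive.
from impala.dbapi import connect

conn = connect(
    host="hs2-host.example.com",   # placeholder: your HS2 host
    port=10000,                    # default HiveServer2 binary port
    auth_mechanism="GSSAPI",       # Kerberos; use "PLAIN" for LDAP user/password
    kerberos_service_name="hive",
)
cur = conn.cursor()
cur.execute("SHOW DATABASES")
for row in cur.fetchall():
    print(row)
```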
09-28-2023
11:26 AM
1 Kudo
@jayes I had another look at the problem description, and I see the queries fall into two categories:
1. Column names enclosed in backticks
2. Column names not enclosed in backticks
Below are the root causes for both categories; please find the solution at the end of this comment.

Case 1: These fail because you are running the SQL with the -e option and the column names are enclosed in backticks (`). The shell treats the text inside backticks as a Linux command to execute, rather than as part of the SQL, leading to the compilation error.

Command: sudo -u hive beeline -u "jdbc:hive2://cdp1:2181,cdp2:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_HOST@hdpcluster.test" -n hive --showHeader=false --outputformat=tsv2 -e "SET hive.support.quoted.identifiers=column; use jayesh; create external table testkeyword1( `transform` string);"
Error: Error while compiling statement: FAILED: ParseException line 1:45 cannot recognize input near ')' '<EOF>' '<EOF>' in column type (state=42000,code=40000)

Command: sudo -u hive beeline -u "jdbc:hive2://cdp1:2181,cdp2:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_HOST@hdpcluster.test" -n hive --showHeader=false --outputformat=tsv2 -e "set hive.support.sql11.reserved.keywords=false; set hive.support.quoted.identifiers=column; use jayesh; create external table testunderscore( `_name` string, `_id` string)"
Error:
-bash: _name: command not found
-bash: _id: command not found
Error while compiling statement: FAILED: ParseException line 1:45 cannot recognize input near ',' 'string' ')' in column type (state=42000,code=40000)

Case 2: These fail because you are using the reserved keyword transform as a column name without escaping it.

Command: sudo -u hive beeline -u "jdbc:hive2://cdp1:2181,cdp2:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_HOST@hdpcluster.test" -n hive --showHeader=false --outputformat=tsv2 -e "set hive.support.sql11.reserved.keywords=false; set hive.support.quoted.identifiers=column; use jayesh; create external table testkeyword3( transform string)"
Error: Error while compiling statement: FAILED: ParseException line 1:38 cannot recognize input near 'transform' 'string' ')' in column name or constraint (state=42000,code=40000)

Command: sudo -u hive beeline -u "jdbc:hive2://cdp1:2181,cdp2:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_HOST@hdpcluster.test" -n hive --showHeader=false --outputformat=tsv2 -e "SET hive.support.sql11.reserved.keywords=false; use jayesh; create external table testkeyword2( transform string)"
Error: Error while compiling statement: FAILED: ParseException line 1:38 cannot recognize input near 'transform' 'string' ')' in column name or constraint (state=42000,code=40000)

Solution: Escape the backticks when passing the SQL on the command line, e.g.:
sudo -u hive beeline -u "jdbc:hive2://cdp1:2181,cdp2:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_HOST@hdpcluster.test" -n hive --showHeader=false --outputformat=tsv2 -e "use jayesh; create external table testkeyword1( \`transform\` string);"
You can also pass the SQL to the beeline command as a file, e.g.:
transform.sql:
create external table testkeyword1( `transform` string);
Beeline command:
sudo -u hive beeline -u "jdbc:hive2://cdp1:2181,cdp2:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_HOST@hdpcluster.test" -n hive --showHeader=false --outputformat=tsv2 -f transform.sql
Let me know if this helps fix the issue.
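A third way to sidestep the shell-interpretation problem is to launch beeline without a shell, so backticks are never treated as command substitution. A sketch; the wrapper script itself is an illustration, not part of the original answer:

```python
# Hypothetical sketch: invoke beeline via subprocess with a list of args
# (no shell involved), so backticks in the SQL are passed through verbatim.
import subprocess

sql = ("SET hive.support.quoted.identifiers=column; use jayesh; "
       "create external table testkeyword1( `transform` string);")

subprocess.run([
    "beeline",
    "-u", ("jdbc:hive2://cdp1:2181,cdp2:2181/;serviceDiscoveryMode=zooKeeper;"
           "zooKeeperNamespace=hiveserver2;principal=hive/_HOST@hdpcluster.test"),
    "-n", "hive",
    "--showHeader=false",
    "--outputformat=tsv2",
    "-e", sql,
], check=True)
```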
09-26-2023
08:40 AM
Can you share your krb5.conf file from the local Spark instance? It is likely that you are missing krb5-related configuration. You may want to try the following to enable additional krb5 debug logging before running the Spark job. export HADOOP_CLIENT_OPTS="-Dsun.security.krb5.debug=true"
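If the job is a PySpark application, a sketch of passing the same debug flag to the driver JVM instead; the Spark config route is an assumption, not from the original reply:

```python
# Hypothetical sketch: pass the krb5 debug flag to the Spark driver JVM
# so Kerberos negotiation details appear in the driver log.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("krb5-debug")
    .config("spark.driver.extraJavaOptions", "-Dsun.security.krb5.debug=true")
    .getOrCreate()
)
```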
09-26-2023
08:28 AM
Both of the queries below should work by default; I have checked by running them against CDP 7.2.17.
create external table testkeyword1( `transform` string);
create external table testunderscore1( `_name` string, `_id` string);
What Apache Hive or CDP/CDH/HDP version are you using?
07-15-2023
12:03 PM
Are you sure that you are trying to connect to Hive using Kerberos? It appears to me that you are trying to connect to Hive over Knox, which may have LDAP configured instead of Kerberos. Can you try the below, replacing your username and password? (Note that spark.read.jdbc also needs a table name; <table> is a placeholder.)
df = spark.read \
    .jdbc("jdbc:hive2://p-cdl-knox-prd-svc.hcscint.net:8443/default;transportMode=http;httpPath=gateway/default/hive;ssl=true;sslTrustStore=/Users/u530241/Downloads/gateway_prod_.jks;trustStorePassword=knoxprod;user=<username>;password=<password>",
          table="<table>")
07-14-2023
09:55 AM
1 Kudo
You seem to be using the wrong driver to connect to HiveServer2. The MySQL JDBC driver does not work for connecting to HS2; you need the Apache Hive JDBC driver or the Cloudera Hive JDBC driver. You can download the latest Apache Hive JDBC driver from: https://repo1.maven.org/maven2/org/apache/hive/hive-jdbc/4.0.0-alpha-2/ File: hive-jdbc-4.0.0-alpha-2-standalone.jar Add that jar to your spark.jars.
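For example, a minimal sketch of adding the jar when building the Spark session; the local path is a placeholder:

```python
# Hypothetical sketch: put the standalone Hive JDBC jar on the Spark classpath.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-jdbc")
    .config("spark.jars", "/path/to/hive-jdbc-4.0.0-alpha-2-standalone.jar")
    .getOrCreate()
)
```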
07-05-2023
03:34 PM
2 Kudos
@sevens You might want to use the latest version, 2.6.21. Also, there is no hard requirement to use UGI with the Cloudera driver, unlike the Apache drivers. One option is to use a JAAS file like the one below.
client.jaas:
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="PathToTheKeyTab"
  principal="cloudera@CLOUDERA"
  doNotPrompt=true;
};
Then set the java.security.auth.login.config system property to the location of the JAAS file, e.g.:
java -Djava.security.auth.login.config=/opt/clientconf/client.jaas
Doc ref: https://docs.cloudera.com/documentation/other/connectors/hive-jdbc/2-6-21/Cloudera-JDBC-Connector-for-Apache-Hive-Install-Guide.pdf
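If the Cloudera driver runs inside a Spark application, the same system property can be handed to the driver JVM; a sketch, with the PySpark wiring being an assumption rather than part of the original answer:

```python
# Hypothetical sketch: point the driver JVM at the JAAS file when the
# Cloudera Hive JDBC driver is used from a Spark application.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.driver.extraJavaOptions",
            "-Djava.security.auth.login.config=/opt/clientconf/client.jaas")
    .getOrCreate()
)
```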