Member since: 10-28-2020
Posts: 504
Kudos Received: 31
Solutions: 37
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
| | 187 | 05-05-2024 01:27 PM |
| | 111 | 05-05-2024 01:09 PM |
| | 168 | 03-28-2024 09:51 AM |
| | 265 | 03-20-2024 03:54 AM |
| | 472 | 03-14-2024 06:29 AM |
03-17-2024
09:40 AM
@Hadoop16 Was it working before? Did anything change from a Kerberos point of view? Try regenerating the Hive keytab file and see if it helps.
03-14-2024
11:58 PM
1 Kudo
Hive does use stats from an external table when preparing a query plan. When the stats are accurate, it can estimate the size of intermediate data sets and select efficient join strategies. The only issue I noticed is that the fetch task does not work.
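For reference, stats can be gathered explicitly so the planner has something accurate to work with; a minimal sketch, with a hypothetical table name:

```sql
-- Gather table-level and column-level statistics for the optimizer
-- (table name is hypothetical).
ANALYZE TABLE my_external_table COMPUTE STATISTICS;
ANALYZE TABLE my_external_table COMPUTE STATISTICS FOR COLUMNS;
```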
03-14-2024
06:29 AM
Try this:

```sql
CREATE EXTERNAL TABLE mytable (
  col1 INT,
  col2 STRING,
  col3 STRING,
  col4 STRING,
  col5 INT,
  col6 STRING,
  col7 STRING
  ...
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar" = "\""
)
STORED AS TEXTFILE;
```

It should work.
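One caveat worth noting: OpenCSVSerde exposes every column as STRING at read time, regardless of the declared types, so numeric columns may need an explicit cast in queries. A sketch, reusing the table and column names from above:

```sql
-- OpenCSVSerde returns all columns as STRING; cast where numbers are needed.
SELECT CAST(col1 AS INT) AS col1, col2
FROM mytable
WHERE CAST(col5 AS INT) > 100;
```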
03-12-2024
11:56 PM
1 Kudo
@yashwanth It seems like you want to split columns on the position of the commas. In that case, you can create the table as follows:

```sql
CREATE TABLE my_table (
  col1 STRING,
  col2 INT,
  ...
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',' ...
```
03-12-2024
07:46 AM
@Leopold It is disabled for external tables because the data in HDFS can change without Hive knowing about it. Unfortunately, I do not see a way to force a fetch task for a query with an aggregate function.
03-12-2024
04:00 AM
1 Kudo
@Leopold I just checked, and your observation is correct. For external tables, Hive does not use a fetch task. In the logs, I see the following message:

```
2024-03-12 10:48:37,247 INFO org.apache.hadoop.hive.ql.optimizer.StatsOptimizer: [b226e7aa-9a42-4af3-b99b-be4a6592fb7f HiveServer2-Handler-Pool: Thread-31145]: Table t7 is external. Skip StatsOptimizer.
```

However, enabling "hive.fetch.task.aggr=true" will help avoid the Reducer phase used for final aggregation; the query becomes a Map-only job.
03-12-2024
02:38 AM
@Leopold Provided column stats are available, Hive can use a fetch task to perform a simple aggregation such as max() instead of launching a Map job. Try hive.fetch.task.aggr=true; this property is disabled by default.
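A quick way to try this from a Beeline session; a minimal sketch, with hypothetical table and column names:

```sql
-- Enable fetch-task aggregation for simple aggregates.
SET hive.fetch.task.aggr=true;
-- With column stats available, a simple aggregate like this may be
-- answered without launching a full job.
SELECT MAX(col1) FROM my_table;
```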
02-26-2024
05:21 AM
@frbelotto It does not require any installation; you just need to provide the path to the driver JAR file.
02-24-2024
05:31 PM
1 Kudo
My bad. If you are using the Cloudera JDBC JAR, the driver class should be com.cloudera.hive.jdbc.HS2Driver. Since we are talking about Kerberos authentication, you should obtain a Kerberos ticket on the client machine first, and use a jdbc_url as follows:

```python
jar_file = '/path/to/hive-jdbc.jar'
jdbc_url = 'jdbc:hive2://{server}:{port}/default;principal={principal}'

# Connect to Hive (empty user/password; authentication comes from the Kerberos ticket)
conn = jaydebeapi.connect('com.cloudera.hive.jdbc.HS2Driver', jdbc_url, ['', ''], jar_file)
cursor = conn.cursor()
```
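Filling in the {server}, {port}, and {principal} placeholders is plain string formatting; a minimal sketch, where the host, port, and principal values are made up for illustration:

```python
def build_kerberos_jdbc_url(server: str, port: int, principal: str,
                            database: str = "default") -> str:
    """Build a HiveServer2 JDBC URL for Kerberos authentication."""
    return f"jdbc:hive2://{server}:{port}/{database};principal={principal}"

# Example with made-up values:
url = build_kerberos_jdbc_url("hs2.example.com", 10000,
                              "hive/hs2.example.com@EXAMPLE.COM")
print(url)
# → jdbc:hive2://hs2.example.com:10000/default;principal=hive/hs2.example.com@EXAMPLE.COM
```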
02-18-2024
09:59 AM
1 Kudo
@frbelotto I have not tried pyhive; I think it requires additional modules if you want to connect using a ZooKeeper quorum. But you could use the jaydebeapi Python module to connect to Hive 3. It works with any type of connection string, Knox or ZK. You will need the Hive JDBC driver, which you can download from here. An example of how to use the jaydebeapi module to connect to Hive:

```python
import jaydebeapi

# Connection parameters
jdbc_url = 'jdbc:hive2://knox.host:8443/default'  # JDBC URL for HiveServer2
username = 'your-username'
password = 'your-password'
jar_file = '/path/to/hive-jdbc-driver.jar'  # Path to the Hive JDBC driver JAR file

# Establish connection to Hive
conn = jaydebeapi.connect(
    'org.apache.hive.jdbc.HiveDriver',
    jdbc_url,
    [username, password],
    jar_file
)

# Create cursor
cursor = conn.cursor()

# Execute Hive query
cursor.execute('SELECT * FROM hive_table')

# Fetch results
result = cursor.fetchall()

# Close cursor and connection
cursor.close()
conn.close()
```
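The manual close() calls at the end can be made exception-safe with contextlib.closing, so the cursor and connection are released even if the query raises. A sketch of the pattern using stand-in objects (the real conn and cursor would come from jaydebeapi.connect as above):

```python
from contextlib import closing

class FakeCursor:
    """Stand-in for a DB-API cursor, used only to illustrate the pattern."""
    def __init__(self):
        self.closed = False
    def execute(self, sql):
        self.rows = [("row1",), ("row2",)]
    def fetchall(self):
        return self.rows
    def close(self):
        self.closed = True

class FakeConnection:
    """Stand-in for a jaydebeapi connection."""
    def __init__(self):
        self.closed = False
    def cursor(self):
        return FakeCursor()
    def close(self):
        self.closed = True

def run_query(conn, sql):
    # closing() guarantees close() runs even if execute/fetchall raises.
    with closing(conn) as c, closing(c.cursor()) as cur:
        cur.execute(sql)
        return cur.fetchall()

result = run_query(FakeConnection(), 'SELECT * FROM hive_table')
print(result)  # [('row1',), ('row2',)]
```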