Member since
09-16-2021
330
Posts
52
Kudos Received
23
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
239 | 11-10-2024 11:19 PM | |
370 | 10-25-2024 05:02 AM | |
1941 | 09-10-2024 07:50 AM | |
697 | 09-04-2024 05:35 AM | |
1553 | 08-28-2024 12:40 AM |
10-09-2023
06:27 AM
Could you kindly provide the DDL and a sample dataset to facilitate a more in-depth explanation?
... View more
10-03-2023
02:07 PM
@yucai Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.@
... View more
10-03-2023
02:15 AM
It seems that the query involves dynamic partitioning, but the dynamic partition column is not included in either the select statement or the Common Table Expression (CTE). Please add the dynamic partition column 'date' to the select statement and validate it in Beeline.
... View more
10-03-2023
02:07 AM
It appears that you're currently following a two-step process: writing data to a Parquet table and then using that Parquet table to write to an ORC table. You can streamline this by directly writing the data into the ORC format table, eliminating the need to write the same data to a Parquet table before reading it. Ref - https://spark.apache.org/docs/2.4.0/sql-data-sources-hive-tables.html https://docs.cloudera.com/cdp-private-cloud-base/7.1.9/developing-spark-applications/topics/spark-loading-orc-data-predicate-push-down.html
... View more
10-03-2023
01:58 AM
You can use the following HQL (SQL) query to find the sum of new cases for each continent and identify the country with the highest number of cases in each continent along with the corresponding case count: WITH ContinentSum AS (
SELECT
continent,
SUM(new_cases) AS total_new_cases
FROM
sample_table_test
GROUP BY
continent
),
CountryMaxCases AS (
SELECT
continent,
location AS country,
MAX(total_case) AS max_cases
FROM
sample_table_test
GROUP BY
continent, location
)
SELECT
cs.continent,
cs.total_new_cases,
cm.country,
cm.max_cases
FROM
ContinentSum cs
JOIN
CountryMaxCases cm
ON
cs.continent = cm.continent
AND cs.total_new_cases = cm.max_cases; This query first calculates the sum of new cases for each continent in the ContinentSum CTE (Common Table Expression). Then, it finds the country with the highest total cases in each continent using the CountryMaxCases CTE. Finally, it joins the results from both CTEs to provide the desired output. Sample resultset for the shared data. +---------------+---------------------+--------------+---------------+
| cs.continent | cs.total_new_cases | cm.country | cm.max_cases |
+---------------+---------------------+--------------+---------------+
| Asia | 25.0 | Afghanistan | 25.0 |
+---------------+---------------------+--------------+---------------+
... View more
09-30-2023
11:39 AM
To pinpoint the root cause, kindly provide a few samples of data
... View more
09-29-2023
05:27 AM
To gain a better understanding of the issue, kindly provide the HS2 jstacks at 30-second intervals until the query completes
... View more
09-29-2023
05:25 AM
The stack traces for Error 1 and Error 3 are incomplete. To gain a better understanding of the issue, please provide the complete stack traces. Sharing the complete appLogs will provide a comprehensive view of the situation Regarding error 2, it appears that the job is attempting to create over 2000 dynamic partitions on a single node, which is an unusual behavior. Please review the partition column values for correctness. If everything appears to be in order, you can consider adjusting the following configurations: hive.exec.max.dynamic.partitions hive.exec.max.dynamic.partitions.pernode
... View more
09-29-2023
05:17 AM
It appears that the Hive Metastore (HMS) is unable to establish a connection with the BackendDB, possibly due to an incorrect hostname or BackendDB configuration within the Hive service. Please validate the BackendDB configurations and attempt to start the service again. Exception in thread "main" java.lang.RuntimeException: org.postgresql.util.PSQLException: The connection attempt failed.
at com.cloudera.cmf.service.hive.HiveMetastoreDbUtil.countTables(HiveMetastoreDbUtil.java:203)
at com.cloudera.cmf.service.hive.HiveMetastoreDbUtil.printTableCount(HiveMetastoreDbUtil.java:284)
at com.cloudera.cmf.service.hive.HiveMetastoreDbUtil.main(HiveMetastoreDbUtil.java:354)
Caused by: org.postgresql.util.PSQLException: The connection attempt failed.
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:297)
at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49)
at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:217)
at org.postgresql.Driver.makeConnection(Driver.java:458)
at org.postgresql.Driver.connect(Driver.java:260)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:247)
at com.cloudera.enterprise.dbutil.SqlRunner.open(SqlRunner.java:193)
at com.cloudera.enterprise.dbutil.SqlRunner.getDatabaseName(SqlRunner.java:264)
at com.cloudera.cmf.service.hive.HiveMetastoreDbUtil.countTables(HiveMetastoreDbUtil.java:197)
... 2 more
Caused by: java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:607)
at org.postgresql.core.PGStream.<init>(PGStream.java:81)
at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:93)
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:197)
... View more
09-27-2023
07:43 AM
if the partition data exists like below: <s3:bucket>/<some_location>/<part_column>=<part_value>/<filename> you can create a external table by specifiying above location and run 'msck repair table <table_name> sync partitions' to sync partitions. validate the data by running some sample select statements. Once it's done you can create new external table with another bucket and run insert statement with dynamic partition. Ref - https://cwiki.apache.org/confluence/display/hive/dynamicpartitions
... View more