Member since
09-16-2021
423
Posts
55
Kudos Received
39
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 818 | 10-22-2025 05:48 AM |
| | 827 | 09-05-2025 07:19 AM |
| | 1587 | 07-15-2025 02:22 AM |
| | 2199 | 06-02-2025 06:55 AM |
| | 2412 | 05-22-2025 03:00 AM |
06-24-2025
06:26 AM
Please share the Maven coordinates (groupId and artifactId) for the dependency. On a related note, I recommend we begin transitioning from our current HDP dependencies to the latest versions available on CDP. The HDP dependencies are no longer current, and migrating to CDP will improve our application's stability and supportability.
06-23-2025
04:11 AM
Most likely it will be under hortonworks-repo. Look for the particular dependency you need, for example hadoop-common-repo.
06-02-2025
06:55 AM
It does look like the query failed with a ClassCastException (org.apache.hadoop.hive.serde2.io.HiveDecimalWritable cannot be cast to org.apache.hadoop.io.LongWritable), which indicates a mismatch between the data type Hive expects and the data type it actually encounters while processing the query. From the error trace, Hive treats the value as a DECIMAL (HiveDecimalWritable) while the metadata declares it as a long (LongWritable). One possible reason is a schema mismatch: the Hive table schema defines the column as BIGINT, but the underlying data files (e.g., Parquet, ORC) actually contain DECIMAL values for that column.

To validate:
- Run DESCRIBE FORMATTED <your_table_name>; for the table involved in the failing query and pay close attention to the data types of all columns, especially those that might be involved in the conversion.
- Compare these Hive schema data types with the actual data types in your source data files. For example, if you're using Parquet, use tools like parquet-tools to inspect the schema of the Parquet files; if you're using ORC, use hive --orcfiledump to inspect the schema of the ORC files.
- Also make sure the SerDe points to a valid underlying file format.
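As a rough illustration (the table and column names here are hypothetical), the check and one possible remedy look like this in HiveQL, assuming the files really do contain DECIMAL values:

-- Inspect the column types Hive has registered for the table
DESCRIBE FORMATTED my_table;

-- If the files hold DECIMAL values but the column is declared BIGINT,
-- aligning the declared type with the file schema is one option:
ALTER TABLE my_table CHANGE my_column my_column DECIMAL(18,2);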
05-26-2025
06:45 AM
Yes, your understanding is correct.
05-26-2025
05:06 AM
Impala stats are stored in the backend DB metadata, which gets propagated from HS2 into the Tez DAG plan. That's the reason. The above config helps Hive skip those Impala stats:

hive.plan.mapwork.serialization.skip.properties=impala_intermediate_stats_chunk.*
05-22-2025
03:00 AM
1 Kudo
Validate the explain plan using EXPLAIN EXTENDED <query>. If the plan contains "impala_intermediate_stats_chunk", set the following session-level property and re-run the impacted query:

hive.plan.mapwork.serialization.skip.properties=impala_intermediate_stats_chunk.*

When setting this property, you may encounter the error:

Cannot modify hive.plan.mapwork.serialization.skip.properties at runtime. It is not in the list of parameters that are allowed to be modified at runtime (state=42000, code=1)

To avoid this, whitelist the parameter and restart HS2 before retrying the query:

hive.security.authorization.sqlstd.confwhitelist.append=hive\.plan\.mapwork\.serialization\.skip\.properties
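A minimal sketch of the resulting session, where the query itself is only a placeholder for the impacted one:

-- In a Beeline/HS2 session, after the whitelist change and HS2 restart:
SET hive.plan.mapwork.serialization.skip.properties=impala_intermediate_stats_chunk.*;
-- Re-run the impacted query in the same session, e.g.:
SELECT * FROM my_impacted_table;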
05-19-2025
03:10 AM
I would like to mention a few more recommendations. Hive is fastest with columnar storage and predicate pushdown:

- Store the table as ORC (with Snappy/Zlib compression) if possible. Ref - https://docs.cloudera.com/runtime/7.2.0/hive-performance-tuning/topics/hive_prepare_to_tune_performance.html#:~:text=,vectorized%20by%20examining%20explain%20plans
- Collect statistics and enable predicate pushdown (hive.optimize.ppd=true, the default in recent Hive versions) so that filtering on code skips irrelevant data.
- If the code column has a limited number of distinct values, consider partitioning or bucketing on it: a partitioned ORC table will read only the needed partitions.
- Keep vectorization enabled (hive.vectorized.execution.enabled=true), which processes rows in batches, a big speedup for scans.
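A minimal HiveQL sketch of these recommendations, reusing the mytable columns from the related question (all names are illustrative):

-- Partitioned ORC copy of the table (assumes code has limited distinct values)
CREATE TABLE mytable_orc (surname STRING, name STRING)
PARTITIONED BY (code STRING)
STORED AS ORC
TBLPROPERTIES ('orc.compress'='SNAPPY');

-- Gather table and column statistics for the optimizer
ANALYZE TABLE mytable_orc PARTITION (code) COMPUTE STATISTICS;
ANALYZE TABLE mytable_orc PARTITION (code) COMPUTE STATISTICS FOR COLUMNS;

-- Confirm predicate pushdown and vectorization are on (defaults in recent Hive)
SET hive.optimize.ppd=true;
SET hive.vectorized.execution.enabled=true;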
05-19-2025
03:02 AM
Hive is not a low-latency OLTP database like Oracle; it is designed for batch processing, not fast single-row lookups. Every SELECT you run triggers a full query execution plan. From the code snippet, the queries execute row by row (i.e., executeQuery() is called multiple times), which is expensive. hive.fetch.task.conversion won't help here: it is useful for converting simple SELECTs into client-side fetches, but Hive still builds a full plan behind the scenes.

A better approach is to refactor the loop into a single IN clause:

SELECT code, surname, name
FROM mytable
WHERE code IN ('ABC123', 'FLS163', 'XYZ001', ...)

Then store the results in a map:

// Build a lookup map keyed by code from the single batched result set
Map<String, String> codeToName = new HashMap<>();
while (r.next()) {
    codeToName.put(r.getString("code"), r.getString("surname") + " " + r.getString("name"));
}

Even if you must process the data row by row afterwards, fetching it all in one batch drastically reduces query overhead. If the list is too large for an IN clause, insert the values into a temporary Hive table and join against it:

-- Create a temporary table for the id list
CREATE TEMPORARY TABLE tmp_ids (code STRING);
-- Then insert all your codes into tmp_ids

SELECT a.code, a.surname, a.name
FROM mytable a
JOIN tmp_ids b ON a.code = b.code;

Hive optimizes the join rather than executing multiple separate queries.
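For completeness, a minimal sketch of populating the temporary table with the codes from the IN-clause example above (the values are only illustrations):

-- Populate the temporary id list; small lists can use VALUES,
-- large lists are usually loaded from a file or a staging table
INSERT INTO tmp_ids VALUES ('ABC123'), ('FLS163'), ('XYZ001');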
04-23-2025
05:49 AM
Please try the same in Hive; I haven't validated it from the Impala end.
04-23-2025
12:46 AM
When using the -f flag with Beeline to execute an HQL script file, Beeline follows a fail-fast approach: if any single SQL statement fails, Beeline stops execution immediately and exits with an error. This behavior helps protect data integrity, since it avoids executing dependent or subsequent statements that may rely on a failed operation (e.g., a missing table or a failed insert). Beeline does not support built-in error handling or continuation of script execution after a failed statement within the same script file.

To continue executing the remaining statements even if one fails, use a Bash (or similar) wrapper script to:

- Loop over each statement/block.
- Execute each one using a separate beeline -e "<statement>" call.
- Check the exit code after each call.
- Log errors to a file (e.g., stderr, exit code, or custom messages).
- Proceed to the next statement regardless of the previous outcome.

Sample Bash script:

[root@node4 ~]# cat run_hive_statements.sh
#!/bin/bash
# HQL file containing the SQL-like Hive statements to run
HQL_FILE="sample_hql_success_failure.hql"
# Log file for capturing output and errors, named with the current date
LOG_FILE="error_log_$(date +%Y%m%d).log"
# Drop full-line comments, split the file on semicolons, collapse each
# statement onto a single line, and run every statement in its own beeline call.
sed '/^[[:space:]]*--/d' "$HQL_FILE" | awk '
BEGIN { RS=";" }
{
    gsub(/\n/, " ", $0);                  # Collapse multi-line statements onto one line
    gsub(/^[ \t\r]+|[ \t\r]+$/, "", $0);  # Trim leading/trailing whitespace
    if (length($0)) print $0 ";"
}
' | while read -r STATEMENT; do
    echo "Executing: $STATEMENT"
    beeline -n hive -p hive -e "$STATEMENT" >> "$LOG_FILE" 2>&1
    EXIT_CODE=$?
    if [ "$EXIT_CODE" -ne 0 ]; then
        # Record the failing statement and exit code, then continue with the next one
        echo "Error executing statement:" >> "$LOG_FILE"
        echo "$STATEMENT" >> "$LOG_FILE"
        echo "Beeline exited with code: $EXIT_CODE" >> "$LOG_FILE"
    fi
done
[root@node4 ~]#

Sample HQL file:

[root@node4 ~]# cat sample_hql_success_failure.hql
-- This script demonstrates both successful and failing INSERT statements
-- Successful INSERT
CREATE TABLE IF NOT EXISTS successful_table (id INT, value STRING);
INSERT INTO successful_table VALUES (1, 'success1');
INSERT INTO successful_table VALUES (2, 'success2');
SELECT * FROM successful_table;
-- Intentionally failing INSERT (ParseException)
CREATE TABLE IF NOT EXISTS failing_table (id INT, value INT);
INSERT INTO failing_table VALUES (1, 'this_will_fail);
SELECT * FROM failing_table;
-- Another successful INSERT (will only run if the calling script continues)
CREATE TABLE IF NOT EXISTS another_successful_table (id INT, data STRING);
INSERT INTO another_successful_table VALUES (101, 'continued_data');
SELECT * FROM another_successful_table;
-- Intentionally failing INSERT (table does not exist)
INSERT INTO non_existent_table SELECT * FROM successful_table;
[root@node4 ~]#

Sample error log created as part of the script:

[root@node4 ~]# grep -iw "error" error_log_20250423.log
Error: Error while compiling statement: FAILED: ParseException line 1:54 character '<EOF>' not supported here (state=42000,code=40000)
Error: Error while compiling statement: FAILED: SemanticException [Error 10001]: Line 1:12 Table not found 'non_existent_table' (state=42S02,code=10001)
[root@node4 ~]#