Member since
10-28-2020
511
Posts
33
Kudos Received
38
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 98 | 05-28-2024 11:06 AM |
 | 323 | 05-05-2024 01:27 PM |
 | 217 | 05-05-2024 01:09 PM |
 | 222 | 03-28-2024 09:51 AM |
 | 333 | 03-20-2024 03:54 AM |
03-28-2024
09:51 AM
1 Kudo
@hegdemahendra You may try Cloudera Hive JDBC Driver. The driver class name would be "com.cloudera.hive.jdbc.HS2Driver".
03-18-2024
02:56 PM
3 Kudos
@yashwanth Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.
03-14-2024
11:58 PM
1 Kudo
Hive does use stats from an external table when preparing a query plan. When stats are accurate, it can estimate the size of intermediate data sets and select efficient join strategies. The only thing I noticed is that the fetch task is not working.
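As a hedged sketch (the table name sales_ext is illustrative, not from this thread), stats on an external table can be refreshed so the planner has accurate sizes, and the fetch-task behavior mentioned above is controlled by a separate property:

```sql
-- Illustrative table name; run in a Hive session.
ANALYZE TABLE sales_ext COMPUTE STATISTICS;
ANALYZE TABLE sales_ext COMPUTE STATISTICS FOR COLUMNS;

-- The fetch task (answering simple queries without launching a full
-- Tez/MR job) is governed by this property; "more" is the usual default.
SET hive.fetch.task.conversion=more;
```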
03-06-2024
12:38 AM
It seems like there might be an issue with the way you're using single quotes in the loop. The variable eachline should be expanded, but it won't be if it's enclosed in single quotes. Try using double quotes around the variable and see if that resolves the issue. Here's the corrected loop:

for eachline in "${testarray[@]}"
do
    beeline -f "${eachline}.sql"
done

This way, the value of eachline is correctly expanded when constructing the command. Also, ensure that the SQL files are present in the correct path, or provide the full path if needed. If the issue persists, please share the exact error message or behavior you're seeing for further assistance.
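The quoting difference can be seen with plain echo before wiring the loop into beeline; the array values and .sql names here are made-up placeholders, not taken from the original thread:

```shell
# Minimal sketch: single quotes suppress expansion, double quotes allow it.
# testarray contents are illustrative.
testarray=("daily_report" "weekly_rollup")

for eachline in "${testarray[@]}"
do
    # Single quotes: printed literally as ${eachline}.sql (no expansion)
    echo 'would run ${eachline}.sql'
    # Double quotes: expands, e.g. daily_report.sql on the first pass
    echo "would run ${eachline}.sql"
done
```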
01-24-2024
06:26 PM
Hello Smruti, I tried to clear the usercache dir on all datanodes as suggested, and the issue is resolved. Basically it was an issue with the Tez session; it was happening for all jobs that involve Tez. Thanks, Narasimha.
01-18-2024
10:04 PM
Many thanks.
12-27-2023
09:48 AM
2 Kudos
Hive 3.0 introduced an option to re-attempt a failed Hive query. This only makes sense if whatever caused the previous failure has been addressed. Here we discuss how to configure this once, without having to intervene after each failure event.

The following Hive property enables query re-execution and should be enabled out of the box:

hive.query.reexecution.enabled=true;
hive.query.reexecution.strategies=overlay,reoptimize,recompile_without_cbo,reexecute_lost_am;

Re-execution strategies:

Overlay

With this strategy, we can set a Hive property that is applied only on re-execution. It works by adding a configuration subtree as an overlay to the actual Hive settings (reexec.overlay.*):

set reexec.overlay.{hive_property}=new_value

Every Hive setting with the "reexec.overlay" prefix is applied to all re-executions. For example, if our Hive queries fail with OOM while performing map joins, which can happen when we do not have correct stats for the tables, we could disable hive.auto.convert.join for the next attempt:

set reexec.overlay.hive.auto.convert.join=false;
set hive.query.reexecution.strategies=overlay;

Reoptimize

Throughout the execution of a query, the system monitors the actual number of rows passing through each operator. This recorded information is leveraged during re-planning, potentially producing a more optimized query plan. Instances where this becomes essential include:
- Absence of statistics.
- Inaccurate statistics.
- Scenarios involving numerous joins.

To enable this, use:

set hive.query.reexecution.strategies=overlay,reoptimize;
set hive.query.reexecution.stats.persist.scope=query;

hive.query.reexecution.stats.persist.scope controls the level at which runtime stats are persisted:
- query: only used during the re-execution
- hiveserver2: persisted in HS2 until it is restarted
- metastore: persisted in the metastore and loaded on HiveServer2 startup

Avoid setting it to "metastore" due to the bug discussed in HIVE-26978.

recompile_without_cbo

When CBO fails during the compilation phase, Hive falls back to the legacy optimizer, but in many cases it is unable to correctly recreate the AST. HIVE-25792 allows recompiling the query without CBO when it fails.

reexecute_lost_am

Re-executes the query if it failed because the Tez AM node was decommissioned.

Some relevant configurations:

Configuration | Default | Description |
---|---|---|
hive.query.reexecution.always.collect.operator.stats | false | Enable to gather runtime statistics on all queries. |
hive.query.reexecution.enabled | true | Feature enabler. |
hive.query.reexecution.max.count | 1 | Number of re-executions that may happen. |
hive.query.reexecution.stats.cache.batch.size | -1 | If runtime stats are stored in the metastore, the maximal batch size per round during load. |
hive.query.reexecution.stats.cache.size | 100000 | Size of the runtime statistics cache, in OperatorStat entries; a query plan consists of ~100. |
hive.query.reexecution.stats.persist.scope | query | Where runtime statistics are persisted: query (only used during the re-execution), hiveserver (lifetime of the HiveServer), or metastore (persisted in the metastore and loaded on HiveServer startup). |
hive.query.reexecution.strategies | overlay,reoptimize,recompile_without_cbo,reexecute_lost_am | Re-execution plugins; currently overlay and reoptimize are supported. |
runtime.stats.clean.frequency | 3600s | Frequency at which the timer task runs to remove outdated runtime stat entries. |
runtime.stats.max.age | 3days | Stat entries older than this are removed. |
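Putting the pieces above together, a minimal session-level sketch might look like the following (the values mirror the defaults and recommendations discussed above; adjust to your workload):

```sql
-- Enable re-execution with the overlay and reoptimize strategies,
-- keeping runtime stats at query scope (sidesteps the HIVE-26978 issue).
SET hive.query.reexecution.enabled=true;
SET hive.query.reexecution.strategies=overlay,reoptimize;
SET hive.query.reexecution.max.count=1;
SET hive.query.reexecution.stats.persist.scope=query;

-- Overlay example: disable map-join auto conversion only on the retry.
SET reexec.overlay.hive.auto.convert.join=false;
```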
12-27-2023
06:53 AM
1 Kudo
@wert_1311 It should be in the following location in S3:

s3://<s3_bucket_name>/clusters/<environment_id>/<database_catalog_id>/warehouse/tablespace/external/hive/sys.db/logs/dt=YYYY-MM-DD/ns=<virtual_warehouse_id>/

The HiveServer2 log directory will be named app=hiveserver2/. You could also view HiveServer2 pod logs using kubectl, e.g.:

kubectl logs hiveserver2-0 -n compute-1584641610-f14f -c hiveserver2
12-26-2023
10:49 PM
Hello @smruti, Please let me know if there is any way we can have a call or meeting to review the script. In the meantime, I will try removing the -n option and check whether that works.
12-15-2023
11:09 AM
Cloudera's official statement on this subject can be found here. Cloudera supports various RDBMS options, each of which has multiple possible strategies for implementing HA. Cloudera cannot reasonably test and certify each strategy for each RDBMS. Cloudera expects HA solutions for RDBMS to be transparent to Cloudera software; they are therefore not supported or debugged by Cloudera. It is the responsibility of the customer to provision, configure, and manage the RDBMS HA deployment so that Cloudera software behaves as it would when interfacing with a single, non-HA service.