About jAnshula

jAnshula · ‎02-29-2024

Hi @dave22, the table data is stored in an HDFS location, so you can get this information from HDFS by listing the database directory:
# hdfs dfs -ls /path/to/database/
--> The owner of the path should be the username
--> The subfolders/files are the tables within the database
--> The modification date can be used as the last updated date
--> The size of the table can be picked up using the command below
# hdfs dfs -du -s -h /path/to/database/
Let us know if this helps.
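If you want to collect these fields in a script, here is a minimal sketch that parses one entry line of `hdfs dfs -ls` output. It assumes the usual column order (permissions, replication, owner, group, size, date, time, path); the `LsEntry` and `parse_ls_line` names and the sample line are made up for illustration, and the "Found N items" header line has to be skipped before parsing.

```python
from typing import NamedTuple


class LsEntry(NamedTuple):
    owner: str
    size_bytes: int
    modified: str
    path: str


def parse_ls_line(line: str) -> LsEntry:
    """Parse one entry line of `hdfs dfs -ls` output (hypothetical helper)."""
    # Assumed columns: permissions, replication, owner, group, size, date, time, path.
    # maxsplit=7 keeps a path that contains spaces in one piece.
    _perms, _repl, owner, _group, size, date, time, path = line.split(None, 7)
    return LsEntry(owner, int(size), f"{date} {time}", path.strip())


# Illustrative sample line, not real cluster output.
sample = "drwxr-xr-x   - hive hive          0 2024-02-20 10:15 /path/to/database/table1"
entry = parse_ls_line(sample)
print(entry.owner, entry.modified, entry.path)
```

Directories are reported with size 0 by `-ls`, which is why the size still has to come from `hdfs dfs -du`.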

jAnshula · ‎02-28-2024

Hi @wert_1311, can you "sync the user for the cluster" once and then check whether you are able to access the Web UI in your CDW cluster?

jAnshula · ‎02-27-2024

Hi @mananasaly, refer to the Stack Overflow discussion below; it may help you bypass SSL certificate verification in Python: https://stackoverflow.com/questions/15445981/how-do-i-disable-the-security-certificate-check-in-python-requests Let us know if this helps.
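The linked discussion is about the `requests` library, where the usual approach is passing `verify=False` to the request call. As a rough sketch of the same idea using only the standard library, the snippet below builds an SSL context that skips certificate and hostname checks. The `unverified_context` and `fetch` names are made up for illustration, and this is only reasonable for testing against hosts you trust.

```python
import ssl
import urllib.request


def unverified_context() -> ssl.SSLContext:
    """Return an SSL context that skips certificate and hostname checks."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # must be turned off before verify_mode
    ctx.verify_mode = ssl.CERT_NONE  # do not validate the server certificate
    return ctx


def fetch(url: str) -> bytes:
    """GET a URL without verifying the server certificate (testing only)."""
    with urllib.request.urlopen(url, context=unverified_context()) as resp:
        return resp.read()
```

Disabling verification removes protection against man-in-the-middle attacks, so adding the server's CA certificate to the trust store is the safer long-term fix.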

jAnshula · ‎02-27-2024

@Beat, you may use the latest Cloudera ODBC Connector, version 2.7.0, which supports Windows 11.

jAnshula · ‎02-27-2024

Hi @Iga21207, please find below the definitions of the parameters you mentioned. We don't think that tweaking these parameters will have much impact on the loading of metadata from the catalog to the Impala daemons.
==================================================================
catalog_max_lock_skipped_topic_updates (default: 3): Maximum number of topic updates that can skip a table due to lock contention in catalogd, after which the table must be added to the topic update. This limit only applies to distinct lock operations which block the topic update thread.

topic_update_tbl_max_wait_time_ms (default: 120000): Maximum time (in milliseconds) the catalog's topic update thread will wait to acquire the lock on a table. If the topic update thread cannot acquire a table lock, it skips the table in that topic update and processes it in the next update. However, to prevent starvation, it only skips the table catalog_max_lock_skipped_topic_updates times. After that limit is hit, the topic update thread blocks until it acquires the table lock. A value of 0 disables timeout-based locking, which means the topic update thread will always block until the table lock is acquired.

max_wait_time_for_sync_ddl_s (default: 0): Maximum time (in seconds) that a sync DDL operation will wait for the updated tables to be added to the catalog topic. A value of 0 means the sync DDL operation will wait as long as necessary until the update is propagated to all the coordinators. This flag only takes effect when topic_update_tbl_max_wait_time_ms is enabled. A value greater than 0 means catalogd will wait up to that number of seconds before throwing an error indicating that not all the coordinators might have applied the changes caused by the DDL.
==================================================================
However, you may increase num_metadata_loading_threads, which determines how much parallelism Impala devotes to loading metadata in the background. The default is 64. You might increase this value for systems with huge numbers of databases, tables, or partitions.
NOTE: Increasing this value will cause more CPU utilization.
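To make the interaction between the first two flags easier to follow, here is a simplified model of the skip-then-block behavior described above. It is only an illustration of the description, not Impala's actual code; the function name, its arguments, and the returned strings are made up for this sketch.

```python
def topic_update_action(lock_acquired_in_time: bool,
                        times_skipped: int,
                        max_skips: int = 3,
                        max_wait_ms: int = 120000) -> str:
    """Decide what the topic update thread does with one table (simplified).

    max_skips   models catalog_max_lock_skipped_topic_updates
    max_wait_ms models topic_update_tbl_max_wait_time_ms
    """
    if max_wait_ms == 0:
        # Timeout-based locking disabled: always wait for the table lock.
        return "block"
    if lock_acquired_in_time:
        # Lock obtained within the wait time: table goes into this update.
        return "include"
    if times_skipped < max_skips:
        # Lock is contended: leave the table for the next topic update.
        return "skip"
    # Skip limit reached: wait for the lock to avoid starving the table.
    return "block"


# A table whose lock stays contended across consecutive topic updates.
for skipped in range(5):
    print(skipped, topic_update_action(False, skipped))
```

With the defaults, a table whose lock stays contended is skipped three times and then the thread blocks on it.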

jAnshula · ‎02-27-2024

@skasireddy Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.

DianaTorres · ‎02-26-2024

@echodot Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.

jAnshula · ‎02-23-2024

Hi @hanbsi, this mainly happens when your setuptools is out of date. You can fix it by running the command below:
# pip install --upgrade setuptools

jAnshula · ‎02-23-2024

Hi @yo_leo, as per the documentation, you should have at most 50 executors per coordinator, which is the recommended maximum. However, this may vary with the workload and the complexity of the queries executed in Impala.
https://impala.apache.org/docs/build/asf-site-html/topics/impala_scaling_limits.html
https://impala.apache.org/docs/build/asf-site-html/topics/impala_dedicated_coordinator.html
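As a quick back-of-the-envelope check, the 50-executors-per-coordinator guideline can be turned into a minimum coordinator count. This is only arithmetic on the recommended ratio; the function name is made up for illustration, and real sizing also depends on the workload, as noted above.

```python
import math


def min_coordinators(num_executors: int, per_coordinator: int = 50) -> int:
    """Smallest coordinator count that keeps the executors-per-coordinator
    ratio at or below the given limit (50 is the documented guideline)."""
    if num_executors <= 0 or per_coordinator <= 0:
        raise ValueError("counts must be positive")
    return math.ceil(num_executors / per_coordinator)


print(min_coordinators(120))
```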

jAnshula · ‎02-23-2024

Hi @Kropiciel, to connect the ODBC driver to multiple HiveServers, you can configure High Availability for Hive using a load balancer (LB); refer to the doc below. Then use the LB hostname and port in the connection string.
https://docs.cloudera.com/cdp-private-cloud-base/7.1.9/configuring-apache-hive/topics/hive-ha-loadbalancer.html
Let us know if this helps.

As for the Python connection, use the code below and check how it goes.
CODE:
---------
from sqlalchemy import create_engine, text

# Input information (replace the placeholders with your own values)
host = "hive_hostname"   # HiveServer2 host, or the LB hostname
port = 10000
schema = "schema_name"
table = "table_name"
query = f"SELECT * FROM {table} LIMIT 10"  # example query, replace with your own

# Execution
engine = create_engine(f"hive://{host}:{port}/{schema}")
with engine.connect() as conn:
    rows = conn.execute(text(query)).fetchall()

Member Since	‎09-15-2020 08:21 AM
Last Visited	‎11-14-2024 02:34 PM
Posts	211
Kudos received	16

Cloudera Community

Re: How to deny admin user/group from dropping hiv...

Re: Impala Long running Queries

Re: How to deny admin user/group from dropping hiv...

Re: Getting authentication token of Hue on CDP pub...

Re: Adding new column to iceberg table silently fa...

Re: impala tables to show table size/ user who cre...

Re: Impala WebUI Login Issue

Re: How i can ignore ssl certification on python?

Re: Driver Windows 11 Impala

Re: How topic_update_tbl_max_wait_time_ms and cata...

Re: How to set a yarn jobpriority in oozie?

Re: Please check that it is a valid Parquet file. ...

Re: python problems

Re: What is the maximum number of executors that c...

Re: How to connect to multiple hosts using pyhive ...