About smruti

smruti · ‎10-07-2021

@sat_046 Please refer to the Cloudera doc on Starting and stopping services.

smruti · ‎10-07-2021

@Shab are we talking about the "Metastore canary failed to create a database" error, or the warning messages you have pasted here? If it is the canary error, please check the Service Monitor logs for any related error or warning messages. If you notice a timeout error, you could increase the 'hive.metastore.client.socket.timeout' value to, say, 5 minutes under "Service Monitor Client Config Overrides" in the Hive configuration. If it is a different error, please paste it here.

Service Monitor log file location: /var/log/cloudera-scm-firehose/

See also https://community.cloudera.com/t5/Support-Questions/The-Hive-Metastore-canary-failed-to-create-a-database/td-p/81021, if that helps.
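As a rough illustration of the override mentioned above, the following sketch renders the property as an XML block of the kind Cloudera Manager override fields typically accept. It assumes the value is read as seconds (so 300 is 5 minutes); the helper name `override_snippet` is hypothetical, and you should verify the expected unit and format for your Hive and Cloudera Manager versions.

```python
import xml.etree.ElementTree as ET


def override_snippet(name: str, value: str) -> str:
    """Render one Hadoop-style <property> block as a string (hypothetical helper)."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


# Assumption: the timeout is interpreted in seconds, so 300 means 5 minutes.
snippet = override_snippet("hive.metastore.client.socket.timeout", "300")
print(snippet)
```

The printed block could then be pasted into the override field, after checking it against the documentation for your release.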

smruti · ‎10-04-2021

@enirys You could refer to the following doc for Hive tuning: https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.6.5/bk_hive-performance-tuning/content/ch_connectivity-admission-control.html#guidelines-hiveserver2-heaps

If you have other services running on the same node as HS2, you might want to reduce the Hive heap size or move a service to a different node. Are you expecting as many connections as the above doc describes? If not, you might want to bring down the HS2 heap size. If you do not see too many connections but still notice high heap usage, you might want to take a heap dump, as @asish mentioned, and check for a memory leak.

Whether load is balanced across the HS2 instances depends on how you are accessing Hive. You could use a ZooKeeper-based connection string.
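A minimal sketch of what a ZooKeeper-based connection string looks like, assuming HiveServer2 service discovery is enabled and registered under the `hiveserver2` namespace. The ZooKeeper hostnames and the helper name `zk_jdbc_url` are placeholders, not values from this thread.

```python
def zk_jdbc_url(zk_hosts, namespace="hiveserver2", port=2181):
    """Build a HiveServer2 JDBC URL that uses ZooKeeper service discovery."""
    quorum = ",".join(f"{host}:{port}" for host in zk_hosts)
    return (
        f"jdbc:hive2://{quorum}/;"
        f"serviceDiscoveryMode=zooKeeper;zooKeeperNamespace={namespace}"
    )


# Hypothetical ZooKeeper quorum; replace with your own hosts.
url = zk_jdbc_url(["zk1.example.com", "zk2.example.com", "zk3.example.com"])
print(url)
```

With a URL of this shape, the client asks ZooKeeper for a registered HS2 instance instead of connecting to one fixed host, which is what spreads sessions across the HS2 nodes.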

smruti · ‎09-29-2021

@enirys htop lists every single thread as a separate process, so every individual connection to HS2 and HMS shows up as a different process. You do not need to worry about that. Since you have 3 hiveserver2 nodes, see if you could reduce the heap size of each hiveserver2. You could also lower 'hive.server2.thrift.max.worker.threads' so that a single HS2 node does not spin up too many threads. Make sure that your workload is getting distributed across the HS2 instances.

smruti · ‎09-28-2021

Hi @DamienO As you say you are scanning Hive schemas, this must be putting a lot of load on the Hive Metastore. The driver logs suggest there are issues establishing new connections to the Metastore:

org.apache.hadoop.hive.ql.metadata.HiveException:java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient:33:1

You could review the HS2 logs to see how many concurrent connections there were to the HMS at the time of the issue. Check if there is scope to increase the maximum connections setting in your RDBMS, or increase the Metastore heap size to accommodate more connections.

smruti · ‎09-28-2021

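@Anitauzanna you could do it exactly the same way. Note that substring(str, pos, len) takes a 1-based start position and a length, so a yyyyMMdd value splits as (1,4), (5,2), (7,2). Dynamic partitioning must be enabled here as well (hive.exec.dynamic.partition=true and hive.exec.dynamic.partition.mode=nonstrict). e.g.

CREATE TABLE final_table (col1 int, col3 int) PARTITIONED BY (year int, month int, day int) STORED AS orc;

INSERT OVERWRITE TABLE final_table PARTITION(year, month, day) SELECT col1, col3, substring(col2,1,4), substring(col2,5,2), substring(col2,7,2) FROM orig_table;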
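To make the substring arguments concrete, here is a small sketch that mimics Hive's substring(str, pos, len) for positive positions: a 1-based start followed by a length, not an end index. The helper names `hive_substr` and `split_yyyymmdd` are made up for illustration, and the sample value 20210927 follows the int date format assumed earlier in this thread.

```python
def hive_substr(value, pos, length):
    """Mimic Hive substring(str, pos, len) for positive pos: 1-based start, then a length."""
    text = str(value)
    start = pos - 1
    return text[start:start + length]


def split_yyyymmdd(value):
    """Split a yyyyMMdd value into (year, month, day) strings."""
    return hive_substr(value, 1, 4), hive_substr(value, 5, 2), hive_substr(value, 7, 2)


print(split_yyyymmdd(20210927))
# Treating the third argument as an end index gives the wrong month:
print(hive_substr(20210927, 5, 6))
```

The second call shows why the length matters: asking for 6 characters from position 5 returns the month and the day together.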

smruti · ‎09-28-2021

@Anitauzanna partition columns appear at the end of the table when you query it from the Hive CLI or beeline, but the partition column data is not part of the actual table data files when you check in HDFS. Instead, the partition values appear as directories in the HDFS filesystem. To answer your second question: yes, you could use substring (or regexp_extract) to take part of a column and use it as a partition. Check: https://community.cloudera.com/t5/Support-Questions/Hive-partitions-based-on-date-from-timestamp/td-p/179583
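The directory layout can be sketched as follows: each partition column becomes one `column=value` path level under the table location. The table location `/warehouse/final_table` and the helper name `partition_dir` are hypothetical; your actual warehouse path will differ.

```python
def partition_dir(table_location, partition_spec):
    """Return the HDFS-style directory for one partition (col=value per level)."""
    levels = [f"{col}={val}" for col, val in partition_spec.items()]
    return "/".join([table_location.rstrip("/")] + levels)


# Hypothetical table location; partition columns in declaration order.
path = partition_dir("/warehouse/final_table", {"year": 2021, "month": 9, "day": 27})
print(path)
```

The data files under that directory hold only the non-partition columns; the partition values are recovered from the path.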

smruti · ‎09-27-2021

@Anitauzanna sure, you could do that. I believe you have the date in int format, e.g. 20210927. Correct me if I am wrong. You could create a non-partitioned table and load the dataset into it. Once done, create the final partitioned table, with the 2nd column as the partition, e.g.

CREATE TABLE table_final (col1 int, col3 int) PARTITIONED BY (col2 int) STORED AS orc;

Set the following Hive properties, and then load the data into the new partitioned table:

SET hive.exec.dynamic.partition=true;

SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE table_final PARTITION(col2) SELECT col1,col3,col2 FROM <first table>;

I hope this answers your question. If it does, please accept this as the solution.

smruti · ‎09-25-2021

@mailsmail I am afraid authorisation tools might not be able to help us here. HIVE-18755 is about having two separate catalogs, named hive and spark, under the Hive metastore db for the Hive and Spark services respectively. Here, you could either create databases with different names, or have multiple Metastore instances pointing to different HMS databases. In the latter case, you will need individual HS2 instances (with the help of config groups) connecting to the separate Metastores.

smruti · ‎09-24-2021

@mailsmail If you are planning to use a single metastore, then you cannot create two databases with the same name. So, as you pointed out, users could create separate schemas and, if required, create tables with the same names under those schemas:

user1: create database user1db1 [LOCATION] create table user1db1.tbl1 ...

user2: create database user2db1 [LOCATION] create table user2db1.tbl1 ...

Member Since	‎10-28-2020 05:19 AM
Last Visited	‎12-21-2024 12:27 PM
Posts	554
Kudos received	44

Cloudera Community

Re: ANALYZE command not write data into hive metas...

Re: HBase stores base64 data when data is inserted...

Re: Deleting hive service on CDP Private Base and ...

Re: Not Able to run import command. it fails with ...

Re: Any alternate for org.apache.hive:hive-jdbc ma...

Re: Services Restart Strategy for Cloudera Hadoop ...

Re: The Hive Metastore canary failed to create a d...

Re: Hive memory stuffing

Re: Hive memory stuffing

Re: Hortonworks ODBC Driver Crashes - STATUS_STACK...

Re: Hive partitioning

Re: Hive partitioning

Re: Hive partitioning

Re: Multi-Tenancy in Hive Metastore

Re: Multi-Tenancy in Hive Metastore