Member since: 10-28-2020
Posts: 572
Kudos Received: 46
Solutions: 40
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 557 | 02-17-2025 06:54 AM |
| | 4775 | 07-23-2024 11:49 PM |
| | 795 | 05-28-2024 11:06 AM |
| | 1343 | 05-05-2024 01:27 PM |
| | 846 | 05-05-2024 01:09 PM |
09-28-2021
08:21 AM
Hi @DamienO Since you are scanning Hive schemas, this is likely putting a significant load on the Hive Metastore. The driver logs suggest there are issues establishing new connections to the Metastore:

org.apache.hadoop.hive.ql.metadata.HiveException:java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient:33:1

You could review the HS2 logs to see how many concurrent connections there were to the HMS at the time of the issue. Check whether there is scope to increase the max_connections value in your RDBMS, or increase the Metastore heap size to accommodate a larger number of connections.
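If the backing RDBMS is MySQL (an assumption; adjust the commands for your database), a quick way to inspect and raise the connection limit would be:

-- Sketch for a MySQL-backed Metastore database (values are illustrative):
SHOW VARIABLES LIKE 'max_connections';  -- current limit
SET GLOBAL max_connections = 500;       -- example value; size it to your workload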
09-28-2021
08:11 AM
@Anitauzanna you could do it exactly the same way, e.g.:

CREATE TABLE final_table (col1 int, col3 int) PARTITIONED BY (year int, month int, day int) STORED AS orc;

INSERT OVERWRITE TABLE final_table PARTITION (year, month, day)
SELECT col1, col3, substring(col2,1,4), substring(col2,5,2), substring(col2,7,2)
FROM orig_table;
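To verify the dynamic partitions landed as expected after the INSERT, you could run:

SHOW PARTITIONS final_table;
-- e.g. year=2021/month=09/day=27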
09-28-2021
03:50 AM
@Anitauzanna the partition column will appear at the end of the table if you query it from the Hive CLI or Beeline, but the partition column's data will not be part of the actual table data when you check in HDFS; it will appear as directories in the HDFS filesystem. Answering your second question: yes, you could use substring (or regexp_extract) to take part of a column and use it as a partition. Check: https://community.cloudera.com/t5/Support-Questions/Hive-partitions-based-on-date-from-timestamp/td-p/179583
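For illustration, here is how regexp_extract could pull out the same pieces as the substring approach (assuming the col2/orig_table names from the earlier example, with dates like 20210927):

SELECT regexp_extract(cast(col2 AS string), '^(\\d{4})(\\d{2})(\\d{2})$', 1) AS year_part,
       regexp_extract(cast(col2 AS string), '^(\\d{4})(\\d{2})(\\d{2})$', 2) AS month_part,
       regexp_extract(cast(col2 AS string), '^(\\d{4})(\\d{2})(\\d{2})$', 3) AS day_part
FROM orig_table;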
09-27-2021
02:22 PM
@Anitauzanna sure, you could do that. I believe you have the date in int format, e.g. 20210927; correct me if I am wrong. You could create a non-partitioned table and load the dataset into it. Once done, create the final partitioned table, with the 2nd column as a partition, e.g.:

CREATE TABLE table_final (col1 int, col3 int) PARTITIONED BY (col2 int) STORED AS orc;

Set the following Hive properties, and then load the data into the new partitioned table:

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE table_final PARTITION (col2) SELECT col1, col3, col2 FROM <first table>;

I hope this answers your question. If it does, please accept this as the solution.
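One more thing to keep in mind (an assumption about your data volume, not something from the thread): if col2 produces many distinct partitions, the dynamic-partition guards may also need raising before the INSERT:

SET hive.exec.max.dynamic.partitions=5000;          -- total across the job (default 1000)
SET hive.exec.max.dynamic.partitions.pernode=2000;  -- per mapper/reducer (default 100)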
09-25-2021
11:57 AM
@mailsmail I am afraid authorisation tools might not be able to help us here. HIVE-18755 is about having two separate catalogs named hive and spark under the Hive Metastore DB, for the Hive and Spark services respectively. Here, you could either create databases with different names, or have multiple Metastore instances pointing to different HMS databases. Then you would need individual HS2 instances (with the help of config groups) connecting to the separate Metastores.
09-24-2021
01:45 PM
@mailsmail If you are planning to use a single Metastore, then you could not create two databases with the same name. So, as you pointed out, users could create separate schemas, and if required they could create tables with the same names under those schemas:

user1:
CREATE DATABASE user1db1 [LOCATION ...];
CREATE TABLE user1db1.tbl1 ...;

user2:
CREATE DATABASE user2db1 [LOCATION ...];
CREATE TABLE user2db1.tbl1 ...;
09-24-2021
01:04 PM
@dv_conan Although the Impala docs say you could use a Metastore DB directly without the Hive Metastore service, it is not recommended: https://impala.apache.org/docs/build/html/topics/impala_prereqs.html In Cloudera CDP, you cannot remove the dependency between Hive and Impala. Under Impala Configuration, you will see a greyed-out tick box for Hive, which means it is not optional. https://docs.cloudera.com/cdp-private-cloud-base/7.1.6/installation/topics/cdpdc-service-dependencies.html
09-22-2021
06:19 AM
@Rohan44 Please do test this command before you run it on the actual data. You could also take a backup of the HDFS data, to be safe.
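For instance, a simple way to copy the tree aside before the bulk rename could be (the paths here are illustrative):

# copy the affected directory tree aside; -cp copies directories recursively
hdfs dfs -cp /tmp /tmp_backup_$(date +%F)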
09-22-2021
06:17 AM
@Rohan44 A simple one-liner could help here:

for i in $(hdfs dfs -ls -R /tmp/ | awk '{print $8}' | grep '[A-Z]'); do hdfs dfs -mv "$i" "$(echo "$i" | tr 'A-Z' 'a-z')"; done

In this example I have directories with upper-case names under /tmp, e.g.:

/tmp/MONTH=07/DAY=31/HOUR=00

As I am using a simple mv, when it renames the parent directory it will fail to rename the child directories, so you might see 'no such file or directory' errors. Just run the same command a couple of times, based on the depth of your partition directories. Run it as the hdfs superuser.
09-19-2021
06:03 AM
@Kiddo you either need to get the md5 of all the records, or you could compute it over the concatenated form of all the column names in a table, e.g.:

SELECT md5(concat(col1, col2)) FROM table1;

or:

SELECT md5(concat_ws(',', col_name))
FROM (SELECT collect_list(column_name) AS col_name
      FROM information_schema.columns
      WHERE table_schema = 'db1' AND table_name = 'table1') e;

Hope this answers your question.
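If you need a NULL-safe variant of the per-row hash (a sketch using the example table's columns; cast any non-string columns first):

SELECT md5(concat_ws('|', cast(col1 AS string), cast(col2 AS string))) AS row_hash
FROM table1;
-- concat_ws skips NULLs, whereas concat(col1, col2) returns NULL if any argument is NULL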