Created 09-23-2018 01:47 PM
Hi there,
I am running manual compactions on my table, but when I run
show compactions;
it shows:
Database  Table     Partition        Type   State      Worker  Start Time  Duration(ms)  HadoopJobId
default   adevents  date=2014-07-07  MINOR  initiated  ---     ---         ---           None
default   adevents  date=2010-08-13  MAJOR  initiated  ---     ---         ---           None
Time taken: 0.009 seconds, Fetched: 3 row(s)
There is no worker ID or Hadoop job ID, and the state stays at "initiated".
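A compaction in the "initiated" state is still queued: no compactor Worker has claimed it yet. If there are many partitions, a small script can flag every entry in that state. The sketch below is hypothetical (the function name find_stuck_compactions is made up for illustration). It assumes the whitespace-separated layout shown above: one compaction per line, Database, Table, Partition, Type, State and Worker as the first six columns, "---" or "None" for empty values, and no spaces inside values. Output from other Hive versions or from beeline may need a different split.

```python
from typing import List, NamedTuple

# Placeholders printed for columns that have no value yet
# (assumption based on the SHOW COMPACTIONS output in this thread).
EMPTY_VALUES = {"---", "None", ""}


class Compaction(NamedTuple):
    database: str
    table: str
    partition: str
    ctype: str
    state: str
    worker: str


def find_stuck_compactions(output: str) -> List[Compaction]:
    """Return compactions that are 'initiated' but have no worker assigned.

    Hypothetical helper: assumes whitespace-separated columns in the order
    Database, Table, Partition, Type, State, Worker, ...
    """
    stuck = []
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines, the header row and the "Time taken" footer.
        if len(fields) < 6 or fields[0] in ("Database", "Time"):
            continue
        c = Compaction(*fields[:6])
        if c.state.lower() == "initiated" and c.worker in EMPTY_VALUES:
            stuck.append(c)
    return stuck


if __name__ == "__main__":
    sample = """Database Table Partition Type State Worker Start Time Duration(ms) HadoopJobId
default adevents date=2014-07-07 MINOR initiated --- --- --- None
default adevents date=2010-08-13 MAJOR initiated --- --- --- None
Time taken: 0.009 seconds, Fetched: 3 row(s)"""
    for c in find_stuck_compactions(sample):
        print(c.table, c.partition, c.ctype)
```

On the output from this post, both the MINOR and the MAJOR request are reported, which matches what the thread describes.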
Could anyone please help me solve this?
FYI, I have changed the compactor settings in hive-site.xml:
<property>
  <name>hive.compactor.initiator.on</name>
  <value>true</value>
</property>
<property>
  <name>hive.compactor.worker.threads</name>
  <value>1</value>
</property>
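For reference, the Hive Transactions documentation lists hive.compactor.initiator.on=true and a positive hive.compactor.worker.threads as metastore-side settings. It also lists hive.support.concurrency=true and hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager as client-side settings. The sketch below reads a hive-site.xml with Python's standard xml.etree.ElementTree and reports which of those four are missing or different. The function names are hypothetical. It only inspects the file you give it, so it cannot tell whether that file is the one the metastore service actually loads.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Settings the Hive Transactions wiki lists for ACID tables and compaction.
# hive.compactor.worker.threads is checked separately because it only
# needs to be a positive number.
REQUIRED = {
    "hive.support.concurrency": "true",
    "hive.txn.manager": "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager",
    "hive.compactor.initiator.on": "true",
}


def read_properties(xml_text: str) -> Dict[str, str]:
    """Collect <name>/<value> pairs from a Hadoop-style configuration file."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name:
            props[name] = value
    return props


def _matches(actual: str, expected: str) -> bool:
    # Booleans are compared case-insensitively; class names must match exactly.
    if expected in ("true", "false"):
        return actual.lower() == expected
    return actual == expected


def check_acid_settings(xml_text: str) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    props = read_properties(xml_text)
    problems = []
    for name, expected in REQUIRED.items():
        actual = props.get(name)
        if actual is None:
            problems.append(f"{name} is not set (expected {expected})")
        elif not _matches(actual, expected):
            problems.append(f"{name}={actual} (expected {expected})")
    threads = props.get("hive.compactor.worker.threads")
    if threads is None:
        problems.append("hive.compactor.worker.threads is not set (defaults to 0)")
    elif not threads.isdigit() or int(threads) < 1:
        problems.append(f"hive.compactor.worker.threads={threads} (expected > 0)")
    return problems


if __name__ == "__main__":
    # The two properties from the post above, wrapped in <configuration>.
    sample = """<configuration>
      <property><name>hive.compactor.initiator.on</name><value>true</value></property>
      <property><name>hive.compactor.worker.threads</name><value>1</value></property>
    </configuration>"""
    for problem in check_acid_settings(sample):
        print(problem)
```

If you run it against only the two properties from the post, it reports hive.support.concurrency and hive.txn.manager as not set. On a real cluster those may well be defined elsewhere in hive-site.xml, so treat the output as a checklist rather than a diagnosis.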
Thank you so much!
Created 09-23-2018 04:51 PM
Do you have the standalone metastore running? That is where compaction jobs are actually generated and submitted.
Created 09-23-2018 08:47 PM
I am using Hive on EMR, which is supposed to use Hive's own metastore. Do I need to add an external metastore for Hive, such as the AWS Glue Data Catalog? The problem is that Glue does not support Hive transactions. Do you know how I can solve this? Thank you so much for your reply! @Eugene Koifman
Created 09-23-2018 09:49 PM
The Hive metastore is a service that can run either in embedded mode or in standalone mode. You can have several instances running in your cluster, but all instances must share the same backend RDBMS. This should help: https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin#AdminManualMetastoreAdmi...
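One way to see which mode a Hive client is configured for is the hive.metastore.uris property. When it is empty, the client runs the metastore code in its own process. When it lists one or more thrift://host:port URIs (9083 is the usual default port), the client talks to a standalone metastore service. The sketch below splits that value and tries a plain TCP connection to each endpoint with Python's socket module. The function names are hypothetical. A successful connection only shows that something is listening on the port, not that it is a healthy metastore with compactor threads running.

```python
import socket
from typing import List, Tuple
from urllib.parse import urlparse


def parse_metastore_uris(value: str) -> List[Tuple[str, int]]:
    """Split a comma-separated hive.metastore.uris value into (host, port) pairs.

    An empty result means no remote metastore is configured, so a Hive client
    would run the metastore code in its own process.
    """
    endpoints = []
    for uri in value.split(","):
        uri = uri.strip()
        if not uri:
            continue
        parsed = urlparse(uri)
        # This sketch requires the explicit thrift://host:port form.
        if parsed.scheme != "thrift" or not parsed.hostname or parsed.port is None:
            raise ValueError(f"unexpected metastore URI: {uri!r}")
        endpoints.append((parsed.hostname, parsed.port))
    return endpoints


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical value; on a real cluster, read it from hive-site.xml.
    for host, port in parse_metastore_uris("thrift://localhost:9083"):
        state = "reachable" if is_reachable(host, port, timeout=1.0) else "unreachable"
        print(f"{host}:{port} {state}")
```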
Created 09-24-2018 11:46 PM
Thanks so much for the resources!
Created 09-24-2018 07:32 AM
Hi @Xue Chen!
I have a couple of questions.
1) How long does it stay in the initiated state?
2) Does the compaction ever happen?
In my experience, compaction runs into issues when it is triggered by a user other than hive. The table should also be owned by hive.
If you haven't already, try setting the table owner to hive and running the compactions as the hive user.
Please let me know if that helps you!
Regards,
Megh
Created 09-24-2018 11:59 PM
Thank you for your reply!
The compaction never happens.
I am using Hive on AWS EMR. I created the table from ORC files on S3 using the following command:
CREATE TABLE tablename (id STRING, ts TIMESTAMP, category STRING)
PARTITIONED BY (`date` STRING)
CLUSTERED BY (category) INTO 1 BUCKETS
STORED AS ORC LOCATION 's3://orc-mutation-test/20/'
TBLPROPERTIES ("orc.compress"="SNAPPY", "transactional"="true");
If the table is created this way, its user and owner are hive, right?
I suspect the cause is that I have not set up a remote metastore, so the compaction jobs cannot be generated and submitted.