Created on 02-13-2019 02:39 AM - edited 09-16-2022 07:09 AM
We have taken backup one of the production database data and moved it to development local filesystem.
In development movied data from local mountpoint to hive database hdfs location.
Question1: Hive msck repair in managed partition table failed with below error message.
hive> msck repair table testsb.xxx_bk1;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
What does exception means.
After dropping the table and re-create the table in external type. it worked successfully.
hive> use testsb;
OK
Time taken: 0.032 seconds
hive> msck repair table XXX_bk1;
xxx_bk1:payloc=YYYY/client_key=MISSDC/trxdate=20140109
.
.
Repair: Added partition to metastore xxx_bk1:payloc=0002/client_key=MISSDC/trxdate=20110105
.
.
Time taken: 16347.793 seconds, Fetched: 94156 row(s)
Can you please confirm why it not worked in managed table?
Question:2. Where else select * from table; query able to fetch in non-partition table. Why?
We have done testsb database creation and Table creation with ddl script.
And moved the data from local to hdfs hive table location.
Created 02-13-2019 03:47 AM
Created 02-21-2019 11:49 AM
In non-partition table having multiple files in table location. When select statement triggered it worked.
How it fetch the data where else without running msck repair command?
Created 04-01-2019 01:47 PM
Ans 1: The exception posted is very generic. Need the complete error message that was seen on the terminal upon running MSCK to come to see what could have gone wrong.
Suggestions: By default, Managed tables store their data in HDFS under the path "/user/hive/warehouse/<table_name>" or "/user/hive/warehouse/<db_name>/<table_name>". So if you have created a managed table and loaded the data into some other HDFS path manually i.e., other than "/user/hive/warehouse", the table's metadata will not get refreshed when you do a MSCK REPAIR on it.
This could be one of the reasons, when you created the table as external table, the MSCK REPAIR worked as expected.
Ans 2: For an unpartitioned table, all the data of the table will be stored in a single directory/folder in HDFS. For example, a table T1 in default database with no partitions will have all its data stored in the HDFS path - "/user/hive/warehouse/T1/" . Even when a MSCK is not executed, the queries against this table will work since the metadata already has the HDFS location details from where the files need to be read. On the other hand, a partitioned table will have multiple directories for each and every partition. If a new partition is added manually by creating the directory and keeping the file in HDFS, a MSCK will be needed to refresh the metadata of the table to let it know about the newly added data.
Hope this helps!