Created 06-18-2018 09:09 AM
Hi Team, I am facing this issue repeatedly.
Currently I am working on Hive tables and facing an issue with Hive partitions. We have a script that drops partitions if they exist (based on dynamic values) and adds new partitions as new data comes in.
I uploaded two files with the same partitions but different data, about one hour apart. Both files were processed successfully and the logs show that the partitions were dropped and added properly, but when I checked the table, the data was not there.
After some time I reloaded the same files I had uploaded last; the partitions were placed properly and I was able to see the data in Hive.
Please help us resolve this issue. What might be the root cause? We are facing it repeatedly.
Created 06-18-2018 01:19 PM
Hi @rama
Have you issued this command after creating a partition?
MSCK REPAIR TABLE <tablename>;
Are you working with internal or external tables?
Kind regards,
Paul
Created 06-18-2018 02:04 PM
Hi @Paul Hernandez
I am not triggering MSCK REPAIR TABLE <tablename>. I hope that may work here and I will try it, but a quick question:
until now we have not used MSCK REPAIR, so why would we need it only now?
I am using external tables backed by Azure Blob Storage.
Created 06-18-2018 02:59 PM
Hello @rama
Are you issuing the "alter table [table name] add partition [partition name and value]" command?
When dealing with external tables (or manually adding data to a managed table, i.e. adding a file on HDFS), you need to manually tell Hive that there's a new partition.
Paul's suggestion to run "msck repair table" triggers automatic partition discovery.
I would suggest:
1- When adding a new partition, issue the above alter table statement (see the sketch below).
2- If you forgot to do step #1, issue an MSCK command to recover all missing partitions.
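A minimal sketch of both steps, assuming a hypothetical table partitioned by key and month (the table name, partition values, and location are placeholders, not from this thread):

-- step 1: register the new partition explicitly
ALTER TABLE my_table ADD IF NOT EXISTS PARTITION (key='A', month='03') LOCATION '/data/key=A/month=03';

-- step 2 (fallback): let Hive discover partitions added directly on storage
MSCK REPAIR TABLE my_table;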
Makes sense?
Created 06-19-2018 09:28 AM
Hi @rama
from the official documentation:
An external table describes the metadata / schema on external files. External table files can be accessed and managed by processes outside of Hive. External tables can access data stored in sources such as Azure Storage Volumes (ASV) or remote HDFS locations. If the structure or partitioning of an external table is changed, an MSCK REPAIR TABLE table_name statement can be used to refresh metadata information.
To make the metastore aware of partitions that were added directly to HDFS, you can use the metastore check command (MSCK)
Since you are adding and dropping partitions on an external table, you need to make the metastore aware of it.
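For instance (hypothetical table name), after your script drops and adds partitions you could run the repair and then verify what the metastore actually knows about:

MSCK REPAIR TABLE my_table;
SHOW PARTITIONS my_table;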
Hope that clarifies the issue.
Kind regards,
Paul
Created 06-18-2018 03:58 PM
Hi @rama!
What value is set for hive.exec.max.dynamic.partitions?
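In case it helps: running SET with just the property name in the Hive CLI or Beeline prints its current value, e.g.

SET hive.exec.max.dynamic.partitions;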
BTW, could you check if your Hive log is showing something related to locks?
If it is, try to unlock your tables (syntax e.g. -> UNLOCK TABLE <TABLE_NAME>).
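A quick sketch for inspecting and clearing locks (placeholder table name; note that UNLOCK TABLE works with the legacy ZooKeeper/in-memory lock managers, not with ACID transactional locks):

SHOW LOCKS <TABLE_NAME>;
UNLOCK TABLE <TABLE_NAME>;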
And one last thing, just asking: are you using external tables? And are you running a MSCK REPAIR TABLE <TABLE_NAME> after each batch?
Hope this helps!
Created 06-19-2018 07:26 PM
Thanks @Vinicius Higa Murakami
my cluster hive.exec.max.dynamic.partitions = 5000
I have checked hivemetastore.log but I could not find any locks in the Hive log.
No, I have not been running MSCK REPAIR TABLE after my batch load. This is the second time in a row I am facing this issue. I resolved it by reloading the files; once I reloaded the files, all partitions were properly added. But I am worried about this issue recurring.
Please note one point I forgot to mention: after I load a file, the script checks whether the partitions already exist; if a partition already exists, it drops the old one and adds the new partition. BUT here the complete set of partitions is not visible.
EX: I have partitions based on key and month, with keys A, B, C and months 01, 02, 03. If I load a new file with key=A and month=03, it drops only (key=A, month=03) and adds the new partition, but after the process completes I cannot see data for the entire key=A across all months in the Hive table. After some time, if I reload the file, I can see the entire data.
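For reference, this is roughly the drop-and-add pattern our script runs (the table name and wasb path here are simplified placeholders):

ALTER TABLE my_table DROP IF EXISTS PARTITION (key='A', month='03');
ALTER TABLE my_table ADD PARTITION (key='A', month='03')
LOCATION 'wasb://mycontainer@myaccount.blob.core.windows.net/data/key=A/month=03';

Only the (key='A', month='03') partition should be touched, yet afterwards the whole key=A is missing.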
I have the below error message in my hivemetastore.log. Is this causing the issue? Is it related?
java.io.IOException: Scope named api_alter_partitions is not closed, cannot be opened.
Created 09-10-2018 01:55 PM
Hello @rama.
Hm, that's strange.
By any chance, are the add partition and the insert happening within a short period of each other?
I'm asking because I suspect two things:
- That your partition was added and somehow the table got locked. You can check this by running show locks;
- Check if your HiveMetastore DB (MySQL, Derby, etc.) is healthy.
So, next time you can try the following:
- Enable DEBUG mode for the HiveMetastore logs and check if you find something.
- Log into the DB and check if your partitions have been added properly (see the sketch after this list).
- Log into Hive with verbose mode and run SHOW LOCKS;
- Just to confirm, make sure that you're running msck repair table <TABLE>; after the whole process has ended.
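To check the partitions directly in the backing DB, here is a rough sketch against a MySQL-backed metastore, using the standard metastore schema tables PARTITIONS and TBLS ('my_table' is a placeholder; adjust names for your deployment):

SELECT p.PART_NAME
FROM PARTITIONS p
JOIN TBLS t ON p.TBL_ID = t.TBL_ID
WHERE t.TBL_NAME = 'my_table';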
Hope this helps!
Created 09-04-2018 02:06 PM
Hi @rama: I'm facing a similar issue with Hive partitions.
Could you please post the solution if this is resolved for you? It would be really helpful for us.
Thanks
Yamini
Created 09-04-2018 03:22 PM
Hi @Rama,
I'm also facing exactly the same issue as yours.
Please let me know if your issue is resolved; if yes, it would be great if you could post the solution here.
Thanks
Yamini