Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Hive Partition - Can we add more data?

avatar
Explorer

I am facing few issues with hive. Please provide your inputs. Thanks.

 

I have an hive external table (staging) in text format which points to the location where sqoop imports the file. I have another hive internal table (final) in parquet format with date level partitioning. then load data into final table using  Insert into table final partition (date)

select col1, col2,...dtCol  from staging.

 

oozie job: 

step1 - sqoop to hive external table location

step2  - load to final table using insert SQL above

 

 

1. I noticed that sometimes though oozie job runs successfully and files are available in external table path, select * from staging returns 0 results. If I drop and create the table again, then it is returning data. 

Any idea why? Do I need to run something to refresh the table after files are placed everytime? 

This happens randomly, not every time.

 

2.  I am seeing above issue with final table also. I can see partition folder created and file in it but select * is not returning any values. If I give show partitions, it is displaying none. This is also happening sometimes. If I drop, create and run the job again, it works. But I can do this as for internal table, all data will deleted if I drop the table. 

 

 

3. if I run the rerun job more than once, I see one than one file under the partition folder in hdfs. Files names are like below. If I do select * from final, it it only returning data from first file. Wont hive return duplicate records? How I can add more data/file to the existing partition? 

 

00000_0

00000_0_copy_1

00000_0_copy_2

 

 

4 REPLIES 4

avatar
Expert Contributor

Hello @csegokul ,

 

With read on your environment, I am assuming that you aren't using Impala or Spark for query or inserting for Parquet tables. Your show partitions is showing none because info about the partition isn't been seen by metastore. You may choose to sync it with msck repair table. But please bear in-mind that msck repair table isn't a cheap operation. So, depending on the requirement you can choose to use ALTER TABLE. For e.g. for your regular procedure for partition creation, you can use ALTER TABLE with ADD PARTITION. 

 

Hope that helps.

avatar
Explorer
Yes, I am not using Impala or Spark. Only hive tables and queries. I am using below query to insert into parquet table. This oozie job needs to run daily. So every day when new partition is added, I need also need to run msck repair table?

Insert into table final partition (date)
select col1, col2,...dtCol from staging.

avatar
Explorer
I found the issue for all 3 queries. Hue editor which i was using was not reflecting, I was getting the results when I ran from command line. After logging off and logging back in Hue, I was able to get the results.
Thanks.

avatar
Super Guru
Hi @csegokul

It sounds weird Hue behaves this way, I have never heard such thing that you need to logout and log back into Hue to get Hive results, something is not correct. Did Hue report any issues/errors in the interface?

Cheers
Eric