Member since: 08-17-2017
Posts: 5
Kudos Received: 1
Solutions: 1
My Accepted Solutions
Title | Views | Posted |
---|---|---|
3230 | 09-20-2017 01:46 PM |
04-18-2018
03:57 PM
1 Kudo
Hi Sam,

We recently upgraded our cluster from HDP 2.5.6 to HDP 2.6.4 and I am getting a similar error. In the current version, Hive is stricter about INSERT OVERWRITE TABLE. This probably means you are deleting the data before loading the table but not dropping the partition when you run INSERT OVERWRITE TABLE.

To get around it, either delete the data and drop the partition before running INSERT OVERWRITE TABLE, or don't delete the data or drop the partition for the external table at all and let INSERT OVERWRITE TABLE replace it.

Regards,
Khaja Hussain

Similar error:
Caused by: java.util.concurrent.ExecutionException: org.apache.hadoop.hive.ql.metadata.HiveException: Destination directory hdfs://data_dir/pk_business=bsc/pk_data_source=pos/pk_frequency=bnw/pk_data_state=c13251_ps2111_bre000_pfc00000_spr000_pfs00000/pk_reporttype=BNN/pk_ppweek=2487 has not be cleaned up.
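For the first workaround, the partition to drop can be read off the directory named in the error. The following is only a minimal sketch, not something taken from the cluster above: the helper names and the table name `my_db.my_table` are hypothetical, and it assumes every partition value can be written as a quoted string literal with no embedded quotes. It builds the `ALTER TABLE ... DROP IF EXISTS PARTITION` statement from such a path; it does not run anything against Hive or HDFS.

```python
def partition_spec_from_path(path):
    """Return (column, value) pairs for each 'col=value' segment of a path."""
    pairs = []
    for segment in path.strip("/").split("/"):
        if "=" in segment:
            column, value = segment.split("=", 1)
            pairs.append((column, value))
    return pairs


def drop_partition_sql(table, path):
    """Build a DROP PARTITION statement for the partition directory `path`.

    Assumption: values are quoted as strings and contain no quote characters.
    """
    spec = ", ".join(
        f"{column}='{value}'" for column, value in partition_spec_from_path(path)
    )
    return f"ALTER TABLE {table} DROP IF EXISTS PARTITION ({spec})"


if __name__ == "__main__":
    # Hypothetical table name; shortened example path.
    example = "hdfs://data_dir/pk_business=bsc/pk_ppweek=2487"
    print(drop_partition_sql("my_db.my_table", example))
```

For an external table, dropping the partition only removes the metastore entry, so the data directory still has to be deleted separately before the INSERT OVERWRITE TABLE runs.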
09-20-2017
01:46 PM
Thanks, Rajesh, for the response. I will take this back and make a few adjustments to see if it works. One other thing I want to clarify: we have 3 queues, HIGH, MEDIUM and LOW. If a user submits a workload (for example, data processing for Jan 2017) to the MEDIUM queue, and the same user submits the same kind of workload (Feb 2017) a few seconds later while the first job is still running, I want the two execution times to be very similar. What I am seeing instead is a vast difference in execution time, which is what I am trying to avoid. This is happening within a single queue. I think setting the ordering policy to FAIR should give me similar execution times in this case.
09-19-2017
07:10 PM
Is there a way to set a minimum/maximum number of containers for an application? What I am observing in my cluster is that when YARN schedules a job, it gives that job pretty much all of the available resources, depending on the queue settings. When the next job comes into the queue, say 5 seconds later, YARN checks how many resources are available. At that point all of the resources have been given to the 1st job, which is still running, so the 2nd job is allotted only the minimum allocation configured on the cluster. This creates a large gap between the 2 jobs, even though the same workload was submitted to the cluster both times. The 1st job completed in maybe 10 minutes because it got a lot of resources, but the 2nd job got only the minimum allocation when it came in and took 3 hours to complete. I am trying to avoid such a big gap in execution time.
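The gap described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not a description of how YARN actually schedules: the function name, the numbers, and the rule that the 2nd job picks up every container the 1st job frees are all hypothetical. Work is measured in container-minutes, and `second_share` is the number of containers the 2nd job holds while both jobs are running.

```python
def job_durations(total, work, delay, second_share):
    """Return (duration_job1, duration_job2) in minutes for two equal jobs.

    total        -- containers in the queue (toy number)
    work         -- container-minutes each job needs
    delay        -- minutes between the two submissions
    second_share -- containers job 2 holds while job 1 is still running
    """
    if not 0 <= second_share < total:
        raise ValueError("second_share must be in [0, total)")

    # Job 1 runs alone on every container until job 2 arrives.
    if total * delay >= work:
        alone = work / total
        return alone, alone  # the jobs never overlap

    rate1 = total - second_share
    rate2 = second_share
    rem1 = work - total * delay
    rem2 = work

    if rate2 == 0 or rem1 / rate1 <= rem2 / rate2:
        # Job 1 finishes first; job 2 then takes over the whole queue.
        dt = rem1 / rate1
        end1 = delay + dt
        end2 = end1 + (rem2 - rate2 * dt) / total
    else:
        # Job 2 finishes first; job 1 then takes over the whole queue.
        dt = rem2 / rate2
        end2 = delay + dt
        end1 = end2 + (rem1 - rate1 * dt) / total

    return end1, end2 - delay


if __name__ == "__main__":
    # 100 containers, 1000 container-minutes per job, submitted together.
    print("minimum allocation:", job_durations(100, 1000.0, 0.0, 1))
    print("equal share:       ", job_durations(100, 1000.0, 0.0, 50))
```

In this model an equal split gives both jobs the same duration, while the minimum-allocation case leaves the 2nd job running roughly twice as long as the 1st. The 3-hour run reported above is far worse than that, which suggests the 2nd job did not pick up the containers freed by the 1st, something this sketch does not model.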
Labels:
- Apache Hive
- Apache YARN