alter table add partition took almost an hour
- Labels: Apache Hive
Created 03-03-2018 09:09 PM
I have a job that runs every hour: it puts a CSV file into an HDFS location and then runs an ALTER TABLE to add that new location as a partition. Weirdly, this time it took more than 50 minutes, when it normally takes 5-10 seconds. I'm not sure why. How do I start a root cause analysis on this?
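A minimal sketch of the hourly step described above, for anyone trying to reproduce it. The table name, partition columns, and HDFS path are hypothetical placeholders, and the helper only builds the HiveQL text; how it gets submitted (Hive CLI, Beeline, JDBC) depends on the job. Timing each step separately is a cheap first move for root cause analysis, because it shows whether the time went into the HDFS upload or into the ALTER TABLE itself.

```python
import time
from contextlib import contextmanager


def add_partition_sql(table: str, spec: dict, location: str) -> str:
    """Build an ALTER TABLE ... ADD PARTITION statement (hypothetical helper)."""
    for value in list(spec.values()) + [location]:
        # Keep the sketch simple: refuse values that would need escaping.
        if "'" in str(value):
            raise ValueError(f"unsupported quote in {value!r}")
    parts = ", ".join(f"{col}='{val}'" for col, val in spec.items())
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION ({parts}) LOCATION '{location}'"
    )


@contextmanager
def timed(step: str, log: list):
    """Append (step, seconds taken) to log, even if the step raises."""
    start = time.monotonic()
    try:
        yield
    finally:
        log.append((step, time.monotonic() - start))


if __name__ == "__main__":
    timings = []
    sql = add_partition_sql(
        "mydb.events_csv",                          # hypothetical table
        {"dt": "2018-03-03", "hr": "20"},           # hypothetical partition spec
        "hdfs:///data/events/dt=2018-03-03/hr=20",  # hypothetical location
    )
    with timed("alter_table", timings):
        print(sql)  # placeholder: replace with the job's real submission call
    print(timings)
```

Logging these timings on every hourly run gives a baseline, so a 50-minute outlier can be pinned to one step instead of the whole job.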
Created 03-04-2018 08:18 PM
Is this something related to yarn.scheduler.capacity.maximum-am-resource-percent?
<value>0.2</value>
This is the maximum percent of resources in the cluster that can be used to run ApplicationMasters, i.e., it controls the number of concurrently running applications.
At the time this happened, I had about 5-6 jobs using the same queue, so I guess that is why the AM for this job didn't get resources allocated until the rest of them finished.
So does 0.2 mean 20% per queue, or 20% across all queues combined?
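On the per-queue question: the Capacity Scheduler documentation describes yarn.scheduler.capacity.maximum-am-resource-percent as a default applied to all queues, with each queue's limit proportional to that queue's capacity, and it can be overridden per queue with `yarn.scheduler.capacity.<queue-path>.maximum-am-resource-percent`. The sketch below is a simplified model of that arithmetic, assuming the limit comes from the queue's guaranteed share of cluster memory and ignoring vcores, user limits, and elasticity. The cluster size, queue names, capacities, and AM container size are made-up numbers.

```python
import math


def am_limit_mb(cluster_mb: int, queue_capacity: float, am_percent: float) -> float:
    """Memory a queue may spend on ApplicationMasters (simplified model)."""
    return cluster_mb * queue_capacity * am_percent


def max_concurrent_ams(limit_mb: float, am_container_mb: int) -> int:
    """How many AM containers fit under the limit.

    The scheduler still lets one application start in a queue when the
    limit is smaller than a single AM, so the result never drops below 1.
    """
    return max(1, math.floor(limit_mb / am_container_mb))


if __name__ == "__main__":
    cluster_mb = 102400   # hypothetical: 100 GB of NodeManager memory
    am_mb = 2048          # hypothetical: AM container size
    queues = {"etl": 0.5, "adhoc": 0.25, "reporting": 0.25}  # hypothetical
    for name, capacity in queues.items():
        limit = am_limit_mb(cluster_mb, capacity, 0.2)
        print(name, limit, max_concurrent_ams(limit, am_mb))
```

Under this model, 0.2 is applied to each queue's own share, so a smaller queue gets fewer concurrent AMs even though every queue uses the same percentage.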
Created 03-04-2018 03:29 PM
Created 03-04-2018 04:15 PM
Thanks for the reply. My table is huge, and MSCK just hangs.
Also, I see that although the job started at 20:28, the container didn't launch until 20:55, and I don't see any logs.
What does EXPLAIN EXTENDED do? And how can I use debug mode for a single query without actually changing the configuration?
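On the debug-mode question: with the classic Hive CLI, settings passed with `--hiveconf` apply only to that invocation, so `hive.root.logger=DEBUG,console` turns on verbose client-side logging for one run without touching `hive-site.xml`. `EXPLAIN EXTENDED` prints the plan for a statement, with extra physical detail such as file names, instead of running it. The sketch below only builds the command line; it assumes the `hive` CLI is installed on the host (Beeline takes different options), and the table and path in the usage example are hypothetical.

```python
import shlex
import subprocess


def hive_debug_cmd(statement: str, explain: bool = False) -> list:
    """Build a Hive CLI command that logs at DEBUG level for this run only."""
    if explain:
        # EXPLAIN EXTENDED shows the plan instead of executing the statement.
        statement = "EXPLAIN EXTENDED " + statement
    return [
        "hive",
        "--hiveconf", "hive.root.logger=DEBUG,console",  # per-invocation only
        "-e", statement,
    ]


def run(cmd: list) -> int:
    """Echo the command, then run it and return its exit code."""
    print("running:", shlex.join(cmd))
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # Hypothetical statement; substitute the one the job actually runs.
    stmt = ("ALTER TABLE mydb.events_csv ADD IF NOT EXISTS "
            "PARTITION (dt='2018-03-03', hr='20') "
            "LOCATION '/data/events/dt=2018-03-03/hr=20'")
    print(shlex.join(hive_debug_cmd(stmt)))
    # run(hive_debug_cmd(stmt)) would execute it on a host with the Hive CLI.
```

Printing the command first makes it easy to paste into a shell on a gateway node and compare the debug output of a fast run with a slow one.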
| Field | Value |
|---|---|
| FinalStatus Reported by AM | SUCCEEDED |
| Started | Sat Mar 03 20:21:14 -0500 2018 |
| Elapsed | 34mins, 44sec |

Log Type: launch_container.sh
Log Upload Time: Sat Mar 03 20:55:59 -0500 2018
Log Length: 9051
I don't see anything in the logs. This has happened twice in the last 2 days.
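The YARN timestamps are enough to check where the time went. This sketch, using only the Python standard library, parses the `Started` and log upload times in the format the YARN UI prints them and computes the gap. It comes out at 34 minutes 45 seconds, matching the reported elapsed time to within a second; together with the late container launch, that fits the theory that the run spent most of its life waiting rather than executing.

```python
from datetime import datetime, timedelta

# Timestamp format as shown in the YARN UI, e.g. "Sat Mar 03 20:21:14 -0500 2018".
YARN_TS = "%a %b %d %H:%M:%S %z %Y"


def gap(start: str, end: str) -> timedelta:
    """Wall-clock time between two YARN UI timestamps."""
    return datetime.strptime(end, YARN_TS) - datetime.strptime(start, YARN_TS)


if __name__ == "__main__":
    started = "Sat Mar 03 20:21:14 -0500 2018"
    log_uploaded = "Sat Mar 03 20:55:59 -0500 2018"
    print(gap(started, log_uploaded))
```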
Created 03-05-2018 02:41 AM
Hi @PJ
As you said, the containers didn't launch until 8:55, which means the job wasn't getting the resources it needed to start, and the other jobs already running in the same queue support that as well. Try decreasing the value to 0.1.
Created 03-05-2018 02:33 PM
Yeah, but I thought I had to increase the value so that more resources would be available to launch more AMs.
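The direction of the change can be checked with rough arithmetic. The sketch below assumes a queue with a fixed amount of guaranteed memory and a fixed AM container size (both hypothetical numbers) and counts how many of a batch of submitted jobs could have an AM running at once versus waiting, for a few values of maximum-am-resource-percent. Under this simplified model, raising the value admits more AMs and lowering it admits fewer; the trade-off is that more of the queue can be tied up by AMs, leaving less for the task containers those AMs request. It ignores vcores, user limits, and queue elasticity.

```python
import math


def admitted_and_waiting(queue_mb: int, am_percent: float,
                         am_container_mb: int, submitted: int) -> tuple:
    """Split submitted jobs into (running AMs, waiting) under a simplified AM limit."""
    # At least one application may always start in the queue.
    slots = max(1, math.floor(queue_mb * am_percent / am_container_mb))
    running = min(submitted, slots)
    return running, submitted - running


if __name__ == "__main__":
    queue_mb = 25600   # hypothetical: the queue's guaranteed memory
    am_mb = 2048       # hypothetical: AM container size
    jobs = 6           # roughly the 5-6 concurrent jobs mentioned in the thread
    for pct in (0.1, 0.2, 0.4):
        running, waiting = admitted_and_waiting(queue_mb, pct, am_mb, jobs)
        print(f"am-percent={pct}: {running} running, {waiting} waiting")
```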
Created 03-05-2018 10:16 AM
This could be related to HIVE-13901. Depending on the filesystem, MSCK and ADD PARTITION can be slow.
Can you try setting "hive.fetch.task.conversion=none"?
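A setting like hive.fetch.task.conversion can be applied to a single session with a `SET` statement instead of being changed in `hive-site.xml`. A minimal sketch that prepends session-level `SET` lines to the statement the job runs; the statement is a hypothetical placeholder, and whether this particular setting helps the slow ADD PARTITION is exactly what a test run would show. The resulting script could be saved to a file and run with `hive -f`.

```python
def with_session_settings(statement: str, settings: dict) -> str:
    """Prefix a HiveQL statement with SET commands that apply to this session only."""
    lines = [f"SET {key}={value};" for key, value in settings.items()]
    # Normalise the trailing semicolon so the script has exactly one.
    lines.append(statement.rstrip().rstrip(";") + ";")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical statement; substitute the job's real ALTER TABLE.
    stmt = ("ALTER TABLE mydb.events_csv ADD IF NOT EXISTS "
            "PARTITION (dt='2018-03-03', hr='20') "
            "LOCATION '/data/events/dt=2018-03-03/hr=20'")
    print(with_session_settings(stmt, {"hive.fetch.task.conversion": "none"}))
```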
Created 03-05-2018 02:46 PM
"i put a csv file into hdfs location and do an alter table to add that new location to the partition". Can you please explain this operation?
