set hive.tez.exec.print.summary=true causes odd behavior with hive over S3 queries

Contributor

Can anyone explain exactly what's going on here? When running large Hive queries over S3 with "set hive.tez.exec.print.summary=true;", Hive/Tez prints all the job stats as if the job were complete when it is only about half done. The following is the final line of output (slightly obfuscated), and the copy it reports takes as long as the query itself.

INFO  : Moving data to: s3a://xxxxxxxxxxx/incoming/mha/poc/.hive-staging_hive_2016-09-26_17-49-00_060_4187715327928xxxxxx-3/-ext-10000 from s3a://xxxxxxxxxxxx/incoming/mha/poc/.hive-staging_hive_2016-09-26_17-49-00_060_4187715327928xxxxxx-3/-ext-10002

What is the reason for the data being moved? If the same thing happens with HDFS, it's not noticeable, probably because HDFS is just moving pointers around, but on S3 it seems to actually move the data. (a) Is this true, and (b) why the movement?

1 ACCEPTED SOLUTION

Rising Star

This is not specific to "hive.tez.exec.print.summary=true", which only prints the summary details of the DAG. In this case the DAG itself finished much sooner, and the delay you are observing comes from moving files from S3 to S3 as part of the job's final cleanup.

Hive moves the job output to its final location, and this step is carried out by the Hive client. In S3, rename is a "copy + delete" operation. So even though the rename is done on the AWS side, it takes time in proportion to the amount of data the job produces. In HDFS, rename is a much cheaper operation, which is why you do not observe this delay there. An alternative is to write the data to local HDFS and then move it to S3 via distcp.
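To make the "copy + delete" point concrete, here is a minimal Python sketch of what renaming a "directory" on S3 roughly amounts to. It is not Hive's or S3A's actual code. The function name `rename_prefix` is hypothetical, and `s3` is assumed to be an object that behaves like a boto3 S3 client (`list_objects_v2`, `copy_object`, `delete_object`).

```python
def rename_prefix(s3, bucket, src_prefix, dst_prefix):
    """Emulate a directory rename on S3: copy each object, then delete it.

    `s3` is assumed to behave like a boto3 S3 client (an assumption of this
    sketch). Returns the number of objects moved.
    """
    # Collect all source keys first, following pagination if present.
    keys = []
    kwargs = {"Bucket": bucket, "Prefix": src_prefix}
    while True:
        page = s3.list_objects_v2(**kwargs)
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
        if not page.get("IsTruncated"):
            break
        kwargs["ContinuationToken"] = page["NextContinuationToken"]

    for src_key in keys:
        dst_key = dst_prefix + src_key[len(src_prefix):]
        # Server-side copy: the bytes never reach the client, but the work
        # still scales with object size. A single copy_object call handles
        # objects up to 5 GB; larger objects need a multipart copy.
        s3.copy_object(
            Bucket=bucket,
            Key=dst_key,
            CopySource={"Bucket": bucket, "Key": src_key},
        )
        s3.delete_object(Bucket=bucket, Key=src_key)
    return len(keys)
```

Every output file costs one copy and one delete, so the total time grows with both the number of files and the bytes written, which matches the delay seen after the DAG summary is printed.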

4 REPLIES

Contributor

Thanks. That's what I thought: it's negligible in HDFS but not always trivial in S3 because it's a copy+delete. Interesting idea about using distcp to transfer the data. I'm not sure whether that would actually help with EBS backing HDFS, but it's worth a try.

Contributor

This brings up an issue. When the S3->S3 moves occur, does the data move across the local LAN link, or does this occur entirely within the S3 infrastructure? For example, if you copy a NAS-backed file on a server, it is read in across the LAN and then written out again. S3 isn't a NAS in that sense, but is this what it does, or does S3 move the data around on its own networks when the move is S3-to-S3? This matters because network is probably our limiting resource with our query types.

@Rajesh Balamohan

Expert Contributor

@Peter Coates: There is no local download and upload (distcp does that, which is bad). This makes more sense if you think of S3 as a sharded key-value store instead of a NAS. The filename is the key, so whenever the key changes, the data moves from one shard to another. The command will not return successfully until the KV store has finished moving the data between those shards, which is a data operation and not a metadata operation. This can be pretty fast in scenarios where the change of key does not result in a shard change.

In a filesystem like HDFS, the block IDs of the data are independent of the file name. The name maps to an inode and the inode maps to the blocks, so a rename happens entirely within metadata, thanks to the extra indirection of the inode.
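The difference between the two designs can be illustrated with a toy Python model. This is only a sketch of the mental model described above, not how S3 or HDFS is actually implemented. All class and function names are hypothetical, and the key-value store here always copies on rename, ignoring the case where a key change stays within one shard.

```python
class KeyValueStore:
    """Toy object store: the name is the key that locates the data."""

    def __init__(self):
        self.objects = {}       # key -> bytes
        self.bytes_copied = 0   # data moved by renames so far

    def put(self, key, data):
        self.objects[key] = data

    def rename(self, old, new):
        data = self.objects.pop(old)
        # Changing the key means rewriting the data under the new key.
        self.objects[new] = bytes(data)
        self.bytes_copied += len(data)


class InodeFileSystem:
    """Toy filesystem: name -> inode -> blocks, so names are pure metadata."""

    def __init__(self):
        self.names = {}         # path -> inode id
        self.inodes = {}        # inode id -> list of block ids
        self.blocks = {}        # block id -> bytes
        self.bytes_copied = 0   # stays 0: renames never touch blocks
        self._next_id = 0

    def put(self, path, data):
        ident = self._next_id
        self._next_id += 1
        self.blocks[ident] = data
        self.inodes[ident] = [ident]
        self.names[path] = ident

    def rename(self, old, new):
        # Only the name-to-inode mapping changes; blocks are untouched.
        self.names[new] = self.names.pop(old)


def compare_rename_cost(size):
    """Return (bytes copied by the KV store, bytes copied by the inode FS)."""
    payload = b"x" * size
    kv, fs = KeyValueStore(), InodeFileSystem()
    kv.put("staging/part-0", payload)
    fs.put("staging/part-0", payload)
    kv.rename("staging/part-0", "final/part-0")
    fs.rename("staging/part-0", "final/part-0")
    return kv.bytes_copied, fs.bytes_copied
```

Calling `compare_rename_cost(1_000_000)` returns `(1000000, 0)`: the key-value rename touches every byte, while the inode-based rename updates a single dictionary entry regardless of file size.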