Created 02-01-2016 11:04 AM
Created 02-01-2016 02:27 PM
You can rewrite your query in a way that won't evaluate the time for every row by putting it in a subquery.
SELECT user_id, B.house_id, style, category_id FROM c2.buzz_p B, c2.houses_p H, (select myTimeStamp from (select from_unixtime(unix_timestamp(now() - interval 365 days), 'yyyy-MM-dd HH:mm:ss.S') as myTimeStamp) a) TimeSource WHERE B.house_id = H.house_id AND B.created > myTimeStamp
Created 02-01-2016 10:14 AM
Interesting finding. So is now() the killer or from_unixtimestamp? If it's now() then in theory we can inject 'now' in the query before running it... in any case we probably prefer having a consistant time being used there across all partitions of the query.
regarding it being a regression, I'm guessing the planner just planned this a little different? We're still working on setting up a true testing environment where we can hopefully repro this.
-Sean
Created 02-01-2016 10:23 AM
It looked like unixtimestamp() executes a system call to get the timezone. It doesn't look like this system call changed from 5.4 to 5.5.
I gather you're running queries with the same condition in the where clause under 5.4? It'd be interesting to see the profile to see if there's any difference in the plan. Generally it's an optimisation to push conditions to the scan but this is an exception.
There may be something subtle like the function being treated as a constant expression in one version or not the other.
Created 02-01-2016 10:54 AM
Here's the profile comparison for the same query:
5.4: https://gist.github.com/birdychang/923af04c5db893efc555
5.5: https://gist.github.com/sobrien05/4cdeeb85d314d08d4e2a
It seems like the major difference is in MaterializeTupleTime. However, I don't understand why the HDFS_SCAN_NODE TotalTime in 5.5 is much lower.
Also, I don't think unixtimestamp() is causing the slowness since this was happening for simple table joins as well.
Created 01-18-2016 02:06 PM