I receive some warnings about Impala Assignment Locality falling below the set threshold.
My understanding is that this happens when a query needs to use data from two different sources, which may be stored on different DataNodes.
If I know two tables are regularly going to be used in a join clause, is there a way I can force Hadoop (HDFS) to store the data blocks for these tables on the same DataNode? This doesn't seem like the best approach to me, though.
I guess my question is that I'm unsure how to improve assignment locality, given that I already have impalad running on each DataNode.