Impala query skew issue

Hello CM community,


I'm exploring how Impala queries actually behave on large clusters, and I'm trying to reproduce a skew scenario to better understand that behavior.


I found the following article about one of the common skew causes:


However, I'm having difficulty reproducing the skew the way the article describes. I wonder if the CM community could help me with that. If you could point me to a skewed dataset, that would be really helpful. Or are there other ways of creating the Impala skew problem?