I am researching how to leverage Tableau to perform analytics tasks on our big data hosted on Cloudera.
I have established the connection. Because the data volume is often at the billion-record level, I wasn't sure how to handle it; what I came up with is to optimize the query in advance in Impala so that it retrieves a reasonable amount of data in an acceptable time.
After optimization, the query usually contains several joins, with multiple conditions on partition columns and/or non-partition columns.
I believe I need to put this query into Custom SQL to get the data into Tableau.
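For reference, the query I paste into Custom SQL has roughly this shape (table and column names are anonymized placeholders, not my real schema):

```sql
-- Simplified shape of my optimized Impala query (names anonymized).
-- The fact table is partitioned by event_date, so filtering on it first
-- lets Impala prune partitions before the join.
SELECT f.customer_id,
       d.region,
       SUM(f.amount) AS total_amount
FROM   fact_events f
JOIN   dim_customer d
       ON f.customer_id = d.customer_id
WHERE  f.event_date BETWEEN '2019-01-01' AND '2019-01-31'  -- partition column
  AND  d.region = 'EMEA'                                   -- non-partition column
GROUP BY f.customer_id, d.region
```

(I leave off the trailing semicolon, since Tableau wraps Custom SQL in a subquery.)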
Even though the optimized query returns the final result of 50 rows in Impala in less than a minute, the Custom SQL connection seems to hang with no result for over 10 minutes now (I will let it run until I see an exception or a result).
So my question is: how do I use Tableau for this type of analytical work on big data? What is the best practice? I think my current approach is not the right one.