I have a 70GB file that I copied from the local filesystem to HDFS, and then created a Hive external table over it. The data will continue to grow, and querying this table seems to exhaust the cluster's resources: a SELECT for a single day's worth of data spawns 273 mappers and takes forever to finish.
What's the best way to handle this? The data is in CSV format. Am I doing something wrong here?
For the upload I have tried both copyFromLocal and put.
The table is not partitioned, because the data only has a datetime column and no separate date column. Please suggest how I should approach this.
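For context, here is roughly what I imagine a partitioned layout would look like if I derived a date from the datetime column; table names, column names, and paths below are placeholders, not my actual schema:

```sql
-- Hypothetical partitioned external table; the real table has more columns.
CREATE EXTERNAL TABLE events_part (
  event_time TIMESTAMP,
  payload    STRING
)
PARTITIONED BY (event_date STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/events_part';

-- Dynamic partitioning so the partition key can be derived per row.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Load from the existing unpartitioned table, deriving event_date
-- from the datetime column with to_date().
INSERT OVERWRITE TABLE events_part PARTITION (event_date)
SELECT event_time, payload, to_date(event_time) AS event_date
FROM events_raw;
```

Is a dynamic-partition insert like this the right direction, or is there a better way?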