Hi,
I have a set of questions about compression in Spark that I'm trying to understand:
What is the best compression codec to use in Spark? In Hadoop we avoid gzip compression (except for cold data) because gzip files are not splittable, so input splits are of little use. If we were to choose among the other codecs (LZO, bzip2, Snappy, etc.), what parameters should drive that choice?
Does Spark make use of input splits if the files are compressed?
How does Spark handle compression compared with MapReduce?
Does compression increase the amount of data being shuffled?
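For reference, these are the compression-related configuration knobs I've come across so far (a minimal sketch in spark-defaults.conf form; the values shown are just examples, not recommendations):

```properties
# Codec used internally for shuffle spills, broadcasts, and RDD blocks
# (options include lz4, lzf, snappy, zstd)
spark.io.compression.codec      lz4

# Whether map output files are compressed before shuffling
spark.shuffle.compress          true

# Whether serialized RDD partitions (e.g. MEMORY_ONLY_SER) are compressed
spark.rdd.compress              false

# Codec used when writing Parquet files via Spark SQL
spark.sql.parquet.compression.codec  snappy
```

Am I right in thinking that `spark.io.compression.codec` only affects Spark's internal data (shuffle/broadcast), while the splittability question applies to the codec of the input files themselves?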
Thanks in advance!!