I have 84 files in HDFS, 40 MB each. I read the files from HDFS as a Dataset and parse them, which yields rows of 4-5 fields.
Each file contains roughly 7,000,000 rows. I repartitioned the Dataset into 84 partitions so that each file is parsed in a separate partition.
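For reference, this is roughly what my pipeline looks like (a PySpark sketch; the HDFS path, delimiter, and field names are placeholders, not my actual code):

```python
# Sketch of the read -> parse -> repartition pipeline described above.
# Assumes PySpark; path, delimiter, and column names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parse-hdfs-files").getOrCreate()

# Read all 84 files as lines of text.
raw = spark.read.text("hdfs:///data/input/*")  # placeholder path

# Hypothetical parser: split each line into 4 fields.
parsed = raw.selectExpr(
    "split(value, ',')[0] as f1",
    "split(value, ',')[1] as f2",
    "split(value, ',')[2] as f3",
    "split(value, ',')[3] as f4",
)

# Redistribute the parsed rows across 84 partitions.
# Note: repartition(n) performs a full shuffle and hash-distributes rows;
# it does not pin one input file to one partition.
parsed = parsed.repartition(84)
```

The full shuffle that `repartition(84)` triggers is consistent with the shuffle write I describe next.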
This produces approximately 3.2 GB of shuffle write, and the job then fails: I get an OutOfMemoryError/TimeoutException while adding the list of Rows to the Dataset, preceded by this warning:

TaskMemoryManager: Failed to allocate a page (268435456 bytes), try again.
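To put the numbers in context, here is a quick back-of-envelope calculation (plain Python, using only the figures stated above):

```python
# Back-of-envelope sizing for the job described above.
num_files = 84
file_size_mb = 40
rows_per_file = 7_000_000

total_input_mb = num_files * file_size_mb       # total raw input on HDFS
total_rows = num_files * rows_per_file          # total rows after parsing
page_request_mib = 268_435_456 / (1024 * 1024)  # page size from the warning

print(total_input_mb)    # total MB of raw input
print(total_rows)        # total parsed rows
print(page_request_mib)  # MiB per requested page
```

So each failed allocation is a 256 MiB page request, against ~3.3 GB of raw input spread over 588 million rows.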