About arunkumar_d

Enis · ‎11-03-2016

The TableInputFormat used in HBase will create one map task per table region. The amount of data each map task reads will depend on how big your regions are.
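To make the one-task-per-region rule concrete, here is a minimal sketch in Python. The function name `summarize_map_tasks` and the region sizes are hypothetical, made up for illustration; nothing here calls HBase itself.

```python
def summarize_map_tasks(region_sizes_mb):
    """Estimate the map-task layout, assuming one map task per region."""
    sizes = list(region_sizes_mb)
    return {
        # One map task per region, so the task count equals the region count.
        "map_tasks": len(sizes),
        # The largest region bounds the input of the slowest mapper.
        "largest_task_input_mb": max(sizes) if sizes else 0,
        "total_input_mb": sum(sizes),
    }


if __name__ == "__main__":
    # Hypothetical table with four regions of uneven size (in MB).
    print(summarize_map_tasks([1024, 980, 2048, 512]))
```

With uneven regions like these, one mapper reads four times as much data as the smallest one, which is why region size matters for job run time.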

divakarreddy_a · ‎08-03-2016

Basically, you increased your YARN memory from 32 GB to 64 GB, which means you increased the memory available to all containers. A container is the unit in which YARN allocates CPU and RAM to the jobs it runs. You increased the YARN container size, but what about the Tez container size? Ideally, the Tez container size should be a multiple of the YARN minimum container size. Also, ideally we can allocate two containers per disk and per CPU.
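The sizing rules in this reply can be sketched as simple arithmetic. This is only an illustration under stated assumptions: the function names are hypothetical, the node figures (8 cores, 6 disks, 4096 MB containers) are invented, and "two containers per disk and per CPU" is applied as an upper bound alongside the memory limit.

```python
def containers_per_node(yarn_memory_mb, container_mb, cores, disks):
    """Rough upper bound on concurrent containers on one node."""
    by_memory = yarn_memory_mb // container_mb  # how many containers fit in RAM
    by_cpu = 2 * cores                          # rule of thumb: two per CPU
    by_disk = 2 * disks                         # rule of thumb: two per disk
    return min(by_memory, by_cpu, by_disk)


def is_multiple_of_yarn_container(tez_container_mb, yarn_container_mb):
    """Check that the Tez container size is a whole multiple of the YARN size."""
    return tez_container_mb % yarn_container_mb == 0


if __name__ == "__main__":
    # Hypothetical node: 64 GB for YARN, 4 GB containers, 8 cores, 6 disks.
    print(containers_per_node(64 * 1024, 4096, cores=8, disks=6))
    print(is_multiple_of_yarn_container(8192, 4096))
```

In this made-up case the disks are the limiting factor, so doubling the YARN memory alone would not raise the number of containers running at once.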

bleonhardi · ‎07-22-2016

@Arunkumar Dhanakumar You can simply compress text files before you upload them. Common codecs include gzip, Snappy and LZO. HDFS does not care. All MapReduce/Hive/Pig jobs support these standard codecs and identify them by their file extension. If you use gzip, you just need to make sure that each file is not too big, since it is not splittable, i.e. each gzip file will result in one mapper.

You can also compress the output of jobs. So you could run a Pig job that reads the text files and writes them again. I think you simply need to add an extension, .gz for example, to the output name. Again, you need to understand that each part file is now gzipped and will run in one mapper later. LZO, on the other hand, is splittable once the files are indexed, and Snappy is fast, but neither provides as good a compression as gzip; note that a plain Snappy-compressed text file is not splittable on its own. http://stackoverflow.com/questions/4968843/how-do-i-store-gzipped-files-using-pigstorage-in-apache-pig
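As a rough illustration of compressing a text file before uploading it, here is a minimal sketch using only the Python standard library. The helper name `gzip_file` and the file names are hypothetical; the `.gz` extension is kept because the jobs identify the codec by file extension.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(src, dst=None):
    """Compress src with gzip and return the path of the compressed file."""
    src = Path(src)
    # Default to the same name plus ".gz", e.g. events.txt -> events.txt.gz
    dst = Path(dst) if dst else src.with_name(src.name + ".gz")
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        # Stream in chunks so large files are not loaded into memory at once.
        shutil.copyfileobj(fin, fout)
    return dst
```

The resulting file can then be uploaded as usual, for example with `hdfs dfs -put`. Since each gzip file is read by a single mapper, it may be worth splitting very large inputs into several files before compressing them.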

Member Since	‎12-15-2015 12:02 PM
Last Visited	‎03-12-2019 12:51 PM
Posts	6
Kudos received	1

Cloudera Community

Re: MapReduce performance on the HBase input table...

Re: Yarn memory utilization.

Re: How much actual space required to store 10GB t...