Member since 12-15-2015 | 6 Posts | 1 Kudos Received | 0 Solutions
11-03-2016 05:56 PM | 1 Kudo
HBase's TableInputFormat creates one map task per table region, so the amount of data each mapper processes depends on how big your regions are.
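A simplified model of that behaviour in plain Python (not the HBase API): one split, and therefore one map task, is planned for every region that overlaps the scan's row range, and each task reads roughly that region's data. The `Region` class, the row keys and the sizes are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    start_key: bytes  # b"" means "start of table"
    end_key: bytes    # b"" means "end of table" (end key is exclusive)
    size_mb: int      # hypothetical on-disk size of the region


def plan_map_tasks(regions: List[Region],
                   start_row: bytes = b"",
                   stop_row: bytes = b"") -> List[int]:
    """Return the input size (MB) of each map task: one task per region
    that overlaps the scan range [start_row, stop_row)."""
    tasks = []
    for r in regions:
        # The region begins before the scan stops (or the scan is unbounded)...
        starts_before_stop = stop_row == b"" or r.start_key < stop_row
        # ...and ends after the scan starts (or it is the last region).
        ends_after_start = r.end_key == b"" or start_row < r.end_key
        if starts_before_stop and ends_after_start:
            tasks.append(r.size_mb)
    return tasks
```

With three regions of 512, 2048 and 1024 MB, a full-table scan yields three map tasks, and the one reading the 2048 MB region will usually finish last. More, smaller regions mean more, lighter map tasks.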
08-03-2016 06:37 AM
Basically, you increased the YARN memory from 32 GB to 64 GB, which increases the total memory YARN can hand out to containers. A container is YARN's unit of resource allocation (CPU and RAM) for running a job's tasks. You increased the YARN side, but what about the Tez container size?
--> Ideally, the Tez container size should be a multiple of the YARN minimum container allocation (yarn.scheduler.minimum-allocation-mb).
--> Ideally, allocate roughly two containers per disk and per CPU core.
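A back-of-the-envelope sketch of these two rules in Python. The node shape (12 cores, 12 disks) and the 2048 MB minimum allocation are hypothetical. `containers_per_node` applies the rough "two containers per disk and per CPU core" rule, capped by how many minimum-size containers fit in YARN memory. `round_up_to_multiple` shows why the Tez container size should be a multiple of the minimum allocation: YARN's scheduler typically rounds each request up to that step, so a size in between wastes memory.

```python
def containers_per_node(cores: int, disks: int,
                        yarn_memory_mb: int, min_container_mb: int) -> int:
    """Rough container count per node: about two per CPU core and per disk,
    but never more than the number of minimum-size containers that fit in memory."""
    return min(2 * cores, 2 * disks, yarn_memory_mb // min_container_mb)


def round_up_to_multiple(size_mb: int, min_allocation_mb: int) -> int:
    """Round a requested container size up to the next multiple of the
    minimum allocation (integer ceiling division)."""
    return -(-size_mb // min_allocation_mb) * min_allocation_mb
```

With these hypothetical numbers, going from 32 GB (32768 MB) to 64 GB (65536 MB) raises the estimate from 16 to 24 containers: past that point the node is limited by cores and disks, not memory. Likewise, asking for a 3000 MB Tez container with a 1024 MB minimum allocation effectively reserves 3072 MB.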
07-22-2016 12:14 PM
@Arunkumar Dhanakumar You can simply compress the text files before you upload them. Common codecs include gzip, Snappy and LZO, and HDFS itself does not care what the files contain. MapReduce, Hive and Pig jobs all support these standard codecs and recognize them by their file extension. If you use gzip, just make sure that no single file is too big, because gzip is not splittable: each gzip file will be processed by exactly one mapper.

You can also compress the output of jobs. For example, you could run a Pig job that reads the text files and writes them out again; I think you simply need to give the output a name ending in .gz. Again, keep in mind that each part file is then gzipped and will be read by a single mapper later. LZO (once indexed) and Snappy (inside a container format such as SequenceFile or ORC) can be split, but they do not compress as well as gzip. http://stackoverflow.com/questions/4968843/how-do-i-store-gzipped-files-using-pigstorage-in-apache-pig
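A minimal Python sketch of both points, assuming a 128 MB HDFS block size (the Hadoop 2 default; your cluster may differ). `gzip_file` compresses a local text file before you upload it, and the hypothetical helper `estimate_mappers` gives a rough mapper count: one per gzip file, versus roughly one per block for splittable input. It ignores details such as FileInputFormat's split slop and configured minimum/maximum split sizes.

```python
import gzip
import shutil
from typing import List, Optional

BLOCK_SIZE = 128 * 1024 * 1024  # assumed HDFS block / split size in bytes


def gzip_file(src_path: str, dst_path: Optional[str] = None) -> str:
    """Compress src_path with gzip, writing src_path + '.gz' by default.
    The .gz extension is what lets MapReduce/Hive/Pig pick the codec."""
    dst_path = dst_path or src_path + ".gz"
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dst_path


def estimate_mappers(file_sizes: List[int], splittable: bool,
                     block_size: int = BLOCK_SIZE) -> int:
    """Rough mapper count: one per file if the codec is not splittable
    (e.g. gzip), otherwise about one per block (at least one per file)."""
    if not splittable:
        return len(file_sizes)
    return sum(max(1, -(-size // block_size)) for size in file_sizes)
```

For a single 1 GB file, `estimate_mappers([1024**3], splittable=True)` returns 8, while a gzipped file always counts as 1 mapper no matter how large it is, which is why oversized gzip files hurt parallelism.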