Member since: 03-22-2017
Posts: 26
Kudos Received: 3
Solutions: 0
11-08-2017
09:22 AM
Thanks for your suggestions. I will try to incorporate them and come back to you with more questions!
11-07-2017
12:42 PM
kgautam, thanks for your reply.
1) Currently, I'm not using any combiner. My map-phase output <key,value> pair is <string/Text, string/Text>. Because the value is string/Text, I think it will be difficult to write a combiner. Usually the combiner performs the same function as the reducer, and here I'm not able to think of a combiner for this particular problem.
2) Currently, we tried this compression for the map output: "-D mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.Lz4Codec". Is this configuration enough to compress the map output, or do we have to modify or add some statements in our MapReduce code to use this compression? (I have put a driver sketch after this list to show what I mean.)
3) May I know where you got the rule of thumb "a reducer should process 1 GB of data"?
4) When I have 24 logical cores on one data node, why did you mention 20 * 7? I think it should be 24 * 7.
5) How do I handle a skewed key? Can I handle it using a partitioner (see the sketch after this list), or do we have any other way?
Thanks.
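As mentioned in point 2, here is a driver sketch of what I think the compression setup would look like. It is only my understanding: I am assuming that mapreduce.map.output.compress must also be set to true alongside the codec, and the class and job names are placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.Lz4Codec;
import org.apache.hadoop.mapreduce.Job;

public class Lz4MapOutputDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Turn on compression of the intermediate map output...
        conf.setBoolean("mapreduce.map.output.compress", true);
        // ...and pick LZ4 as the codec (equivalent to the -D option quoted above).
        conf.setClass("mapreduce.map.output.compress.codec",
                      Lz4Codec.class, CompressionCodec.class);
        Job job = Job.getInstance(conf, "lz4-map-output-sketch"); // placeholder job name
        // ... the rest of the job setup (mapper, reducer, paths) stays unchanged ...
    }
}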
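And for point 5, this is the kind of custom partitioner I was imagining. It is just a sketch: "HOT_KEY" is a placeholder for a known skewed key, and since the same key would then land on several reducers, its partial results would need a second aggregation pass.

import java.util.Random;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class SkewAwarePartitioner extends Partitioner<Text, Text> {
    private static final String HOT_KEY = "HOT_KEY"; // placeholder for a known skewed key
    private static final int HOT_KEY_FANOUT = 8;     // how many reducers share the hot key
    private final Random random = new Random();

    @Override
    public int getPartition(Text key, Text value, int numPartitions) {
        if (key.toString().equals(HOT_KEY)) {
            // Spread the skewed key over the first HOT_KEY_FANOUT partitions.
            return random.nextInt(Math.min(HOT_KEY_FANOUT, numPartitions));
        }
        // Everything else falls back to plain hash partitioning.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}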
11-07-2017
11:08 AM
Dear Community, I have a MapReduce job which processes a 1.8 TB data set. My map tasks generate around 2.5 TB of intermediate data, and the number of distinct keys easily crosses a billion. I have set the split size to 128 MB, so the total number of splits generated is approximately 14,000. I have set the number of reducers to 166. My cluster has 8 nodes: 7 are data nodes and 1 is the name node. Each data node has 24 logical cores and 128 GB of RAM.

When the job runs with this configuration, the map phase completes its execution, but my reduce phase gets stuck at 26%. May I know what split size and how many reducers I should use for this particular problem with my current cluster size? Please provide suggestions. Thanks.
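For reference, my own arithmetic, in case I have miscounted: 1.8 TB is about 1.8 x 1024 x 1024 = 1,887,437 MB, and 1,887,437 / 128 gives roughly 14,745 splits, which matches the ~14,000 I see. Likewise, 2.5 TB of intermediate data divided over 166 reducers works out to about 15 GB per reducer.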
Labels: Apache Hadoop
08-08-2017
03:20 PM
Hi Ankit, I'm already using Gzip to compress my reduce task output. But if I use Gzip compression for the map output, I will not be able to split the map output among reducers (correct me if I am wrong!), so I didn't use compression for the map output. How do I update the sort algorithm? Do you have a tutorial for doing that? Also, can you explain how to set the io.sort.mb and io.sort.factor parameters? A sketch of what I would try is below.
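In case it helps to show what I mean, here is how I would try to set them in the driver. This is only a sketch: I am assuming the Hadoop 2.x property names mapreduce.task.io.sort.mb and mapreduce.task.io.sort.factor (which, as I understand it, replaced the older io.sort.mb and io.sort.factor), and the values and class name are placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SortTuningDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Size of the in-memory map-side sort buffer, in MB (placeholder value).
        conf.setInt("mapreduce.task.io.sort.mb", 512);
        // Number of spill segments merged at once during the merge phase (placeholder value).
        conf.setInt("mapreduce.task.io.sort.factor", 50);
        Job job = Job.getInstance(conf, "sort-tuning-sketch"); // hypothetical job name
        // ... the rest of the job setup (mapper, reducer, input/output paths) as usual ...
    }
}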
07-31-2017
01:44 PM
Hello All, I would like to know where a map task's intermediate data is written: does context.write() write the data to the hard disk or to the network immediately after it is generated? Which Hadoop parameters should be tuned when the amount of intermediate data generated by the map tasks is over 45 GB and a huge amount of data (for example, above 50 GB) has to be shuffled over the network in a multi-node cluster? Will I get any performance improvement if I increase the io.sort.mb parameter when the map tasks generate a huge amount of data? Thanks in advance.
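For context, this is the kind of tuning I had in mind (a sketch: myjob.jar, MyDriver, and the paths are placeholders, the values are guesses, and I am assuming the driver uses ToolRunner so that -D options are picked up):

hadoop jar myjob.jar MyDriver \
  -D mapreduce.task.io.sort.mb=512 \
  -D mapreduce.map.sort.spill.percent=0.90 \
  /input /output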
Labels: Apache Hadoop
04-04-2017
11:56 AM
Ok, thanks. Now I understand that by default the amount of memory allotted to an executor is 1 GB, and that this value can be controlled through the --executor-memory option. Now I would like to know: by default, how many executors will be created for an application on a node, and what is the total number of executors created in the cluster? How do I control the number of executors created on a node? Also, by default, how many cores will be allotted to an executor on a node? (I think the number of cores allotted to an executor on a node is unlimited; am I right?)
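To make the question concrete, these are the knobs I have found so far, applied to my existing command. This is only a sketch: the values are examples, and my understanding is that --total-executor-cores applies to standalone mode while --executor-cores sets cores per executor.

spark-submit --class Wordcount --master spark://saravanan:7077 \
  --executor-memory 2g \
  --executor-cores 4 \
  --total-executor-cores 8 \
  /home/hduser/sparkapp/target/scala-2.11/sparkapp_2.11-0.1.jar \
  hdfs://127.0.0.1:9000//inp_wrd hdfs://127.0.0.1:9000//amazon_wrd_count1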
04-04-2017
09:13 AM
Hello All, in Hadoop MapReduce, by default the number of mappers created depends on the number of input splits. For example, if your input file is 192 MB and one block is 64 MB, then the number of input splits will be 3, so the number of mappers will be 3. In the same way, I would like to know: in Spark, if I submit an application to a standalone cluster (a sort of pseudo-distributed setup) to process 750 MB of input data, how many executors will be created?
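By the same block arithmetic, my 750 MB input with 64 MB blocks would be ceil(750 / 64) = 12 splits (11 full blocks plus a 46 MB remainder), so MapReduce would run 12 mappers; I would like to understand the equivalent executor count in Spark.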
Labels: Apache Spark
04-04-2017
07:48 AM
My setup is as follows: 1 laptop where I am running a word-count Scala program through the spark-submit command. The input for my application is a text file placed in HDFS. I'm using Spark's standalone cluster manager, and I'm running my application in a kind of pseudo-distributed mode. While executing the spark-submit command, I don't use the --executor-memory option. I would like to know how much memory will be allotted to an executor by default when --executor-memory is not given. My interface looks like the one in the attached image, and my execution command is:

spark-submit --class Wordcount --master spark://saravanan:7077 /home/hduser/sparkapp/target/scala-2.11/sparkapp_2.11-0.1.jar hdfs://127.0.0.1:9000//inp_wrd hdfs://127.0.0.1:9000//amazon_wrd_count1
Labels: Apache Spark