Member since: 03-03-2017
Posts: 10
Kudos Received: 2
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1844 | 03-08-2017 10:08 AM
 | 2440 | 03-03-2017 10:57 AM
03-08-2017
10:08 AM
You can emit the customer id as the key and the amount as the value from the mapper. Since your data is large, set the reducer class as the combiner so that part of the summing is performed on the map side: j.setCombinerClass(ReducerClass.class); You can also increase the number of reducers with j.setNumReduceTasks(3); // creates 3 reducers. You can use both concepts, combiners and partitioners, in the same program.
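Since the driver calls above only wire things up, here is a plain-Java sketch (not the Hadoop API; the customer data is made up) of what the combiner/reducer pair computes: each combiner partially sums one mapper's (customerId, amount) pairs, and the reducer merges the partial sums.

```java
import java.util.*;

public class CombinerSketch {
    // Combiner step: partial sums over one mapper's (customerId, amount) output.
    static Map<String, Long> combine(List<Map.Entry<String, Long>> mapOutput) {
        Map<String, Long> partial = new HashMap<>();
        for (Map.Entry<String, Long> kv : mapOutput) {
            partial.merge(kv.getKey(), kv.getValue(), Long::sum);
        }
        return partial;
    }

    // Reduce step: merge the partial sums coming from all mappers.
    static Map<String, Long> reduce(List<Map<String, Long>> partials) {
        Map<String, Long> total = new HashMap<>();
        for (Map<String, Long> p : partials) {
            p.forEach((k, v) -> total.merge(k, v, Long::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        // Two hypothetical mappers, each emitting (customerId, amount) pairs.
        List<Map.Entry<String, Long>> m1 =
            List.of(Map.entry("c1", 10L), Map.entry("c1", 5L), Map.entry("c2", 7L));
        List<Map.Entry<String, Long>> m2 =
            List.of(Map.entry("c1", 3L), Map.entry("c2", 1L));
        // Totals: c1 = 18, c2 = 8.
        System.out.println(reduce(List.of(combine(m1), combine(m2))));
    }
}
```

The benefit is that the combiner shrinks the data shuffled to the reducers; it works here because summation is associative and commutative.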
03-04-2017
12:00 PM
@Aruna, delete all files and directories in /home/aruna/hadoop-2.7.3/hadoop2_data/hdfs/datanode and restart the cluster. If you still have the problem, follow these steps:
1. Create the following directories in hadoop-2.7.3:
name
data
2. Add the following lines to hdfs-site.xml (note the three slashes in file:///, otherwise "home" is parsed as a host name):
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/aruna/hadoop-2.7.3/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/aruna/hadoop-2.7.3/data</value>
</property>
3. Delete all log files in hadoop-2.7.3/logs.
4. Format the namenode.
5. Start the cluster.
If you still have the problem, reply with the log files. All the best.
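The steps above can be sketched as shell commands. HADOOP_HOME below defaults to a local directory purely for illustration (the thread's actual path is /home/aruna/hadoop-2.7.3), and the Hadoop-specific commands are left as comments because they require a working install:

```shell
# Assumed install root; override with your real path, e.g. /home/aruna/hadoop-2.7.3
HADOOP_HOME="${HADOOP_HOME:-$PWD/hadoop-2.7.3}"

# Step 1: create the name and data directories.
mkdir -p "$HADOOP_HOME/name" "$HADOOP_HOME/data"

# Step 3: clear old log files (-f makes this a no-op if none exist).
mkdir -p "$HADOOP_HOME/logs"
rm -rf "$HADOOP_HOME/logs"/*

# Steps 4-5 need the Hadoop binaries on PATH, so they are shown as comments:
#   hdfs namenode -format
#   start-dfs.sh
```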
03-04-2017
08:17 AM
Hello, I cleared HDPCD: Java on 10 Feb 2017. What is the validity period for my certification? Should I attempt the new expert-level certification? The expert-level certification has not yet been released; when will it be available? Please clarify my doubts. Thank you.
Labels:
- Certification
03-04-2017
08:09 AM
Hello, I cleared HDPCD: Java on 10 Feb 2017. What is the validity period for my certification? Should I attempt the new expert-level certification? The expert-level certification has not yet been released; when will it be available? Please clarify my doubts. Thank you.
03-03-2017
02:51 PM
Sol 1: Reduce-side join. Create a separate mapper for each of the 4 CSV files; every mapper emits the timestamp as the key, and as the value the remaining fields plus a tag field identifying which file the record came from. Join the tagged records on the reduce side. Sol 2: Map-side join (if 3 of the files are small). Add the 3 small CSV files to the distributed cache and merge them with the large CSV file in the mapper.
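A plain-Java sketch of the reduce-side join idea (not the Hadoop API; the records are made up): each record gets a tag naming its source file and is grouped by the timestamp key, which is exactly the view a reducer would receive.

```java
import java.util.*;

public class ReduceSideJoinSketch {
    // Group tagged records by their timestamp key, as a reducer would see them.
    // record[0] is the timestamp key; the remaining fields become the value,
    // prefixed with a tag "F<n>" identifying the source file.
    @SafeVarargs
    static Map<String, List<String>> join(List<String[]>... files) {
        Map<String, List<String>> grouped = new TreeMap<>();
        int tag = 0;
        for (List<String[]> file : files) {
            for (String[] record : file) {
                String value = "F" + tag + ":"
                    + String.join(",", Arrays.copyOfRange(record, 1, record.length));
                grouped.computeIfAbsent(record[0], k -> new ArrayList<>()).add(value);
            }
            tag++;
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Two hypothetical CSV files sharing the timestamp column.
        List<String[]> a = List.of(new String[]{"2017-03-03T10:00", "temp=21"});
        List<String[]> b = List.of(new String[]{"2017-03-03T10:00", "humidity=40"});
        System.out.println(join(a, b));
    }
}
```

In real Hadoop code the tag would typically live in a custom Writable, and the reduce method would separate the value list by tag before merging the fields.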
03-03-2017
02:39 PM
Are you using a single-node cluster? In that case, if you supply a large file and there is not enough memory, the container might fail to initialize. Also verify the Pig configuration files.
03-03-2017
11:30 AM
You can use combiners in this situation. Increasing the number of reducers is another solution.
03-03-2017
11:00 AM
Set your Hadoop path in .bashrc.
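For example, a minimal sketch of the .bashrc additions, assuming a hypothetical install location (adjust the path to your system):

```shell
# Hypothetical install location -- replace with your actual Hadoop directory.
export HADOOP_HOME=/home/user/hadoop-2.7.3
# Put the hadoop/hdfs commands and the start/stop scripts on PATH.
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
```

After editing, run `source ~/.bashrc` (or open a new shell) so the change takes effect.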
03-03-2017
10:57 AM
1 Kudo
On what basis is a key-value pair generated? It depends on the data set and the required output. In general, key-value pairs are specified in 4 places: map input, map output, reduce input, and reduce output.
Map input: By default, the line offset is the key and the content of the line is the value (as Text). We can change this by using a custom input format.
Map output: The basic responsibility of the map is to filter the data and set up the grouping of data by key.
- Key: the field/text/object on which the data has to be grouped and aggregated on the reduce side. For example, if you want to find the maximum salary for each department, all values of the same department must be grouped and sent to one reduce call, so the department_name or department_id can be selected as the key.
- Value: the fields/text/object to be handled within each individual reduce method. In the example above, each reduce method receives all the salaries for one key as an Iterable, so we can take their maximum because all the values belong to one department.
Reduce input: The same as the map output, because the output of the map is the input of the reduce.
Reduce output: These key-value pairs depend on the required output. In our example, if the required output is "department name <tab> salary", the key for the reducer can be its input key (the department name) and the value the maximum salary computed in the reduce logic. If the required output is "department name - salary", the key can be null and the value the concatenation department_name + "-" + salary.
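The max-salary example above can be sketched in plain Java (not the Hadoop API; the departments and figures are made up), showing just the reduce step applied to values already grouped by key:

```java
import java.util.*;

public class MaxSalarySketch {
    // Reduce step: for one department key, take the maximum of its salary values.
    static long reduceMax(Iterable<Long> salaries) {
        long max = Long.MIN_VALUE;
        for (long s : salaries) {
            max = Math.max(max, s);
        }
        return max;
    }

    public static void main(String[] args) {
        // Simulated map output, already grouped by department key.
        Map<String, List<Long>> grouped = Map.of(
            "sales", List.of(4000L, 5200L),
            "hr", List.of(3900L));
        // Emits "department<TAB>maxSalary" per key, like a reducer would write.
        grouped.forEach((dept, sal) -> System.out.println(dept + "\t" + reduceMax(sal)));
    }
}
```

In real Hadoop code, reduceMax corresponds to the body of reduce(Text key, Iterable<LongWritable> values, Context ctx), with one invocation per department key.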
How do you decide the number of mappers and reducers? In general, one mapper is created for each input split. If your data is less than 128 MB and the split size is 128 MB, one mapper is created; if your data is 200 MB, 2 mappers are created. The number of reducers can be specified by the programmer, based on how many output files should be created and how many partitions the program uses.
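The mapper-count rule above amounts to a ceiling division of input size by split size; a small sketch using the figures from the text:

```java
public class SplitCountSketch {
    // Number of map tasks is roughly ceil(inputBytes / splitBytes),
    // i.e. one mapper per input split.
    static int mappersFor(long inputBytes, long splitBytes) {
        return (int) ((inputBytes + splitBytes - 1) / splitBytes);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        System.out.println(mappersFor(100 * mb, 128 * mb)); // 100 MB file -> 1 mapper
        System.out.println(mappersFor(200 * mb, 128 * mb)); // 200 MB file -> 2 mappers
    }
}
```

This is an approximation: the actual split count also depends on block boundaries and the input format, but it matches the 128 MB / 200 MB reasoning above.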