Member since: 03-03-2017
Posts: 10
Kudos Received: 2
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1844 | 03-08-2017 10:08 AM
 | 2440 | 03-03-2017 10:57 AM
03-08-2017
10:08 AM
You can emit the customer id as the key and the amount as the value from the mapper. Since your data is large, set the reducer class as the combiner so that part of the summing is performed on the map side: j.setCombinerClass(ReducerClass.class); You can also increase the number of reducers with j.setNumReduceTasks(3); // creates 3 reducers. You can use both concepts, combiners and partitioners, in the same program.
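Since the driver calls above only wire things up, here is a plain-Java sketch (not the Hadoop API; the customer data is made up) of what the combiner/reducer pair computes: each combiner partially sums one mapper's (customerId, amount) pairs, and the reducer merges the partial sums.

```java
import java.util.*;

public class CombinerSketch {
    // Combiner step: partial sums over one mapper's (customerId, amount) output.
    static Map<String, Long> combine(List<Map.Entry<String, Long>> mapOutput) {
        Map<String, Long> partial = new HashMap<>();
        for (Map.Entry<String, Long> kv : mapOutput) {
            partial.merge(kv.getKey(), kv.getValue(), Long::sum);
        }
        return partial;
    }

    // Reduce step: merge the partial sums coming from all mappers.
    static Map<String, Long> reduce(List<Map<String, Long>> partials) {
        Map<String, Long> total = new HashMap<>();
        for (Map<String, Long> p : partials) {
            p.forEach((k, v) -> total.merge(k, v, Long::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        // Two hypothetical mappers, each emitting (customerId, amount) pairs.
        List<Map.Entry<String, Long>> m1 =
            List.of(Map.entry("c1", 10L), Map.entry("c1", 5L), Map.entry("c2", 7L));
        List<Map.Entry<String, Long>> m2 =
            List.of(Map.entry("c1", 3L), Map.entry("c2", 1L));
        // Totals: c1 = 18, c2 = 8.
        System.out.println(reduce(List.of(combine(m1), combine(m2))));
    }
}
```

The benefit is that the combiner shrinks the data shuffled to the reducers; it works here because summation is associative and commutative.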
03-04-2017
12:00 PM
@Aruna, delete all files and directories in /home/aruna/hadoop-2.7.3/hadoop2_data/hdfs/datanode and restart the cluster. If you still have the problem, follow these steps:
1. Create the following directories in hadoop-2.7.3:
name
data
2. Add the following lines to hdfs-site.xml (note the three slashes in file:///, otherwise "home" is parsed as a host name):
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/aruna/hadoop-2.7.3/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/aruna/hadoop-2.7.3/data</value>
</property>
3. Delete all log files in hadoop-2.7.3/logs.
4. Format the namenode.
5. Start the cluster.
If you still have the problem, reply with the log files. All the best.
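The steps above can be sketched as shell commands. HADOOP_HOME below defaults to a local directory purely for illustration (the thread's actual path is /home/aruna/hadoop-2.7.3), and the Hadoop-specific commands are left as comments because they require a working install:

```shell
# Assumed install root; override with your real path, e.g. /home/aruna/hadoop-2.7.3
HADOOP_HOME="${HADOOP_HOME:-$PWD/hadoop-2.7.3}"

# Step 1: create the name and data directories.
mkdir -p "$HADOOP_HOME/name" "$HADOOP_HOME/data"

# Step 3: clear old log files (-f makes this a no-op if none exist).
mkdir -p "$HADOOP_HOME/logs"
rm -rf "$HADOOP_HOME/logs"/*

# Steps 4-5 need the Hadoop binaries on PATH, so they are shown as comments:
#   hdfs namenode -format
#   start-dfs.sh
```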
03-04-2017
08:17 AM
Hello, I cleared HDPCD: Java on 10 Feb 2017. What is the validity period for my certification? Should I attempt the new expert-level certification? The expert-level certification has not yet been released; when will it be available? Please clarify my doubts. Thank you.
Labels:
- Certification
03-04-2017
08:09 AM
Hello, I cleared HDPCD: Java on 10 Feb 2017. What is the validity period for my certification? Should I attempt the new expert-level certification? The expert-level certification has not yet been released; when will it be available? Please clarify my doubts. Thank you.
03-03-2017
02:51 PM
Sol 1: Reduce-side join. Create a separate mapper for each of the 4 CSV files; every mapper emits the timestamp as the key, and as the value the remaining fields plus a tag field identifying which file the record came from. Join the tagged records on the reduce side. Sol 2: Map-side join (if 3 of the files are small). Add the 3 small CSV files to the distributed cache and merge them with the large CSV file in the mapper.
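A plain-Java sketch of the reduce-side join idea (not the Hadoop API; the records are made up): each record gets a tag naming its source file and is grouped by the timestamp key, which is exactly the view a reducer would receive.

```java
import java.util.*;

public class ReduceSideJoinSketch {
    // Group tagged records by their timestamp key, as a reducer would see them.
    // record[0] is the timestamp key; the remaining fields become the value,
    // prefixed with a tag "F<n>" identifying the source file.
    @SafeVarargs
    static Map<String, List<String>> join(List<String[]>... files) {
        Map<String, List<String>> grouped = new TreeMap<>();
        int tag = 0;
        for (List<String[]> file : files) {
            for (String[] record : file) {
                String value = "F" + tag + ":"
                    + String.join(",", Arrays.copyOfRange(record, 1, record.length));
                grouped.computeIfAbsent(record[0], k -> new ArrayList<>()).add(value);
            }
            tag++;
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Two hypothetical CSV files sharing the timestamp column.
        List<String[]> a = List.of(new String[]{"2017-03-03T10:00", "temp=21"});
        List<String[]> b = List.of(new String[]{"2017-03-03T10:00", "humidity=40"});
        System.out.println(join(a, b));
    }
}
```

In real Hadoop code the tag would typically live in a custom Writable, and the reduce method would separate the value list by tag before merging the fields.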
03-03-2017
02:39 PM
Are you using a single-node cluster? In that case, if you supply a large file and there is not enough memory, the container might fail to initialize. Also verify the Pig configuration files.
03-03-2017
11:30 AM
You can use combiners in this situation. Increasing the number of reducers is another solution.
03-03-2017
11:00 AM
Set your Hadoop path in .bashrc.
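For example, a minimal sketch of the .bashrc additions, assuming a hypothetical install location (adjust the path to your system):

```shell
# Hypothetical install location -- replace with your actual Hadoop directory.
export HADOOP_HOME=/home/user/hadoop-2.7.3
# Put the hadoop/hdfs commands and the start/stop scripts on PATH.
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
```

After editing, run `source ~/.bashrc` (or open a new shell) so the change takes effect.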
03-03-2017
10:57 AM
1 Kudo
On what basis is a key-value pair generated? It depends on the data set and the required output. In general, key-value pairs are specified in 4 places: map input, map output, reduce input, and reduce output.
Map input: By default, the line offset is the key and the content of the line is the value (as Text). We can change this by using a custom input format.
Map output: The basic responsibility of the map is to filter the data and set up the grouping of data by key.
- Key: the field/text/object on which the data has to be grouped and aggregated on the reduce side. For example, if you want to find the maximum salary for each department, all values of the same department must be grouped and sent to one reduce call, so the department_name or department_id can be selected as the key.
- Value: the fields/text/object to be handled within each individual reduce method. In the example above, each reduce method receives all the salaries for one key as an Iterable, so we can take their maximum because all the values belong to one department.
Reduce input: The same as the map output, because the output of the map is the input of the reduce.
Reduce output: These key-value pairs depend on the required output. In our example, if the required output is "department name <tab> salary", the key for the reducer can be its input key (the department name) and the value the maximum salary computed in the reduce logic. If the required output is "department name - salary", the key can be null and the value the concatenation department_name + "-" + salary.
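The max-salary example above can be sketched in plain Java (not the Hadoop API; the departments and figures are made up), showing just the reduce step applied to values already grouped by key:

```java
import java.util.*;

public class MaxSalarySketch {
    // Reduce step: for one department key, take the maximum of its salary values.
    static long reduceMax(Iterable<Long> salaries) {
        long max = Long.MIN_VALUE;
        for (long s : salaries) {
            max = Math.max(max, s);
        }
        return max;
    }

    public static void main(String[] args) {
        // Simulated map output, already grouped by department key.
        Map<String, List<Long>> grouped = Map.of(
            "sales", List.of(4000L, 5200L),
            "hr", List.of(3900L));
        // Emits "department<TAB>maxSalary" per key, like a reducer would write.
        grouped.forEach((dept, sal) -> System.out.println(dept + "\t" + reduceMax(sal)));
    }
}
```

In real Hadoop code, reduceMax corresponds to the body of reduce(Text key, Iterable<LongWritable> values, Context ctx), with one invocation per department key.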
How do you decide the number of mappers and reducers? In general, one mapper is created for each input split. If your data is less than 128 MB and the split size is 128 MB, one mapper is created; if your data is 200 MB, 2 mappers are created. The number of reducers can be specified by the programmer, based on how many output files should be created and how many partitions the program uses.
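The mapper-count rule above amounts to a ceiling division of input size by split size; a small sketch using the figures from the text:

```java
public class SplitCountSketch {
    // Number of map tasks is roughly ceil(inputBytes / splitBytes),
    // i.e. one mapper per input split.
    static int mappersFor(long inputBytes, long splitBytes) {
        return (int) ((inputBytes + splitBytes - 1) / splitBytes);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        System.out.println(mappersFor(100 * mb, 128 * mb)); // 100 MB file -> 1 mapper
        System.out.println(mappersFor(200 * mb, 128 * mb)); // 200 MB file -> 2 mappers
    }
}
```

This is an approximation: the actual split count also depends on block boundaries and the input format, but it matches the 128 MB / 200 MB reasoning above.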