Member since
11-26-2018
80
Posts
12
Kudos Received
0
Solutions
02-05-2019
03:49 PM
1 Kudo
@Harshali Patel Once a MapReduce program is written, a driver class has to be created and submitted to the cluster. For this we create an object of the JobConf class, and conf.setMapperClass() is used to register your Mapper class with the driver. It tells the framework which class reads the input records and generates the key-value pairs for the map phase. The Mapper class is where you write the code for the map function; the map phase is the first phase of the MapReduce programming model and is responsible for processing the provided input dataset. Mapper is a generic type with four formal type parameters that specify the input key, input value, output key and output value types of the map function. The driver class is what communicates with the Hadoop framework and specifies the configuration elements required to run a MapReduce job: which Mapper and Reducer classes to use, where to find the input data and in what format, and where to place the output data and how to format it. A minimal driver sketch is shown below. Please accept my answer if you found it helpful.
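For illustration, here is a minimal word-count driver sketch using the classic org.apache.hadoop.mapred API with JobConf; the class names, job name and input/output paths are hypothetical placeholders rather than anything from the original question:

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class WordCountDriver {

    // Mapper: the four type parameters are input key, input value, output key, output value
    public static class WordCountMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                output.collect(word, ONE);          // emit (word, 1)
            }
        }
    }

    // Reducer: sums the counts emitted by the mapper for each word
    public static class WordCountReducer extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    // Driver: tells Hadoop which Mapper/Reducer to use and where the data lives
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCountDriver.class);
        conf.setJobName("wordcount");

        conf.setMapperClass(WordCountMapper.class);    // register the Mapper with the driver
        conf.setReducerClass(WordCountReducer.class);  // register the Reducer

        conf.setOutputKeyClass(Text.class);            // key/value types produced by the job
        conf.setOutputValueClass(IntWritable.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));   // where to read the input
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));  // where to write the output

        JobClient.runJob(conf);
    }
}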
01-30-2019
05:44 PM
2 Kudos
Hi @Dukool SHarma The number of map tasks for a given job is driven by the number of input splits, so the number of map tasks equals the number of input splits. A split is a logical division of the data, used when the data is processed by the MapReduce program. Suppose you have a 200 MB file and the default HDFS block size is 128 MB; the file is stored as two blocks and yields two splits, hence two mappers. But if you specify a larger split size (say 200 MB) in your MapReduce program, both blocks will be treated as a single split and only one mapper will be assigned to the job. If you want roughly n mappers, divide the file size by n and use that as the split size, for example: conf.set("mapred.max.split.size", "41943040"); // maximum split size in bytes conf.set("mapred.min.split.size", "20971520"); // minimum split size in bytes. A small worked example of the split-size arithmetic is given below. Please accept my answer if it is found helpful.
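To make the arithmetic above concrete, here is a small stand-alone sketch of the split-size formula used by FileInputFormat in the newer API, max(minSize, min(maxSize, blockSize)); the file and block sizes simply mirror the 200 MB / 128 MB example, and the class name is made up:

public class SplitSizeExample {

    // Mirrors FileInputFormat.computeSplitSize(): max(minSize, min(maxSize, blockSize))
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    static long numSplits(long fileSize, long splitSize) {
        return (long) Math.ceil((double) fileSize / splitSize);
    }

    public static void main(String[] args) {
        long fileSize  = 200L * 1024 * 1024;   // 200 MB file
        long blockSize = 128L * 1024 * 1024;   // default HDFS block size

        // Defaults: split size == block size, so 2 splits and therefore 2 mappers
        long defaultSplit = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);
        System.out.println("default splits = " + numSplits(fileSize, defaultSplit));

        // Minimum split size raised to 200 MB: 1 split, so only 1 mapper
        long bigSplit = computeSplitSize(blockSize, 200L * 1024 * 1024, Long.MAX_VALUE);
        System.out.println("200 MB splits  = " + numSplits(fileSize, bigSplit));
    }
}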
01-23-2019
07:17 AM
1 Kudo
@Michael Bronson Kindly look into the following JIRA: https://issues.apache.org/jira/browse/ZOOKEEPER-2125
01-21-2019
04:29 PM
1 Kudo
Hi @Michael Bronson By default, ZooKeeper's network communication is not encrypted. However, each user and service can leverage the SSL feature and/or a custom authentication implementation in order to use ZooKeeper in secure mode. Kindly refer to the link below. LINK: https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZooKeeper+SSL+User+Guide This feature was added in ZooKeeper 3.5.1 and later. Unfortunately, the HDP version in question (HDP 2.6.4) ships Apache ZooKeeper 3.4.6, so this feature is not available in HDP yet. A rough sketch of the settings involved is given below. Please accept this answer if you found it helpful.
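For reference, this is roughly what the 3.5.x SSL setup in that guide looks like; the port, keystore/truststore paths and passwords below are placeholders, so treat this as a sketch rather than a drop-in configuration:

# zoo.cfg: dedicated secure client port
secureClientPort=2281

# java.env on each ZooKeeper server: use the Netty connection factory and point it at a keystore/truststore
SERVER_JVMFLAGS="-Dzookeeper.serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory \
  -Dzookeeper.ssl.keyStore.location=/path/to/server-keystore.jks \
  -Dzookeeper.ssl.keyStore.password=changeit \
  -Dzookeeper.ssl.trustStore.location=/path/to/truststore.jks \
  -Dzookeeper.ssl.trustStore.password=changeit"

# client side: enable secure mode and the Netty client socket
CLIENT_JVMFLAGS="-Dzookeeper.client.secure=true \
  -Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty \
  -Dzookeeper.ssl.keyStore.location=/path/to/client-keystore.jks \
  -Dzookeeper.ssl.keyStore.password=changeit \
  -Dzookeeper.ssl.trustStore.location=/path/to/truststore.jks \
  -Dzookeeper.ssl.trustStore.password=changeit"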
01-21-2019
09:16 AM
3 Kudos
Hi @Vinay 1) Kindly check whether you have mentioned the path correctly. 2) Try running the query on the host where HiveServer2 is running. If you get a permission error or a "No files found" error, add the property "hive.users.in.admin.role=hive" in Custom hiveserver2-site via Ambari and then run the LOAD DATA query as the hive user (an example statement is shown below). I think this will work for you. Please accept this answer if you found it helpful.
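For example, a typical LOAD DATA statement run from the HiveServer2 host; the path, database and table names here are just placeholders:

-- file already in HDFS
LOAD DATA INPATH '/user/hive/staging/sample.csv' INTO TABLE my_db.my_table;

-- file on the local filesystem of the HS2 host
LOAD DATA LOCAL INPATH '/tmp/sample.csv' INTO TABLE my_db.my_table;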
01-09-2019
05:15 AM
2 Kudos
@Abdul M If the cluster is managed by Ambari, the configuration should be changed via Ambari only; otherwise any manual changes made on the backend to "/etc/hadoop/conf/log4j.properties" will be overwritten during a component restart. Hence, if you want to make any change to log4j.properties, change the logging configuration from the Ambari UI.
Log in to Ambari, go to the Configs tab of the HDFS service, and filter for Advanced hdfs-log4j. LINK: https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.2.2/bk_ambari-operations/content/customizing_log_settings.html A small example of the kind of edit you can make there is shown below. Hope this answer will be helpful to you.
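For example, inside the Advanced hdfs-log4j template you could tweak entries like the following; this is only a sketch, and the exact appender names depend on what is already present in your hdfs-log4j configuration:

# overall HDFS daemon log level and appender (RFA = rolling file appender)
hadoop.root.logger=INFO,RFA

# example: DEBUG logging for a single class only
log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem=DEBUG

# example: keep fewer rolled log files (assumes the RFA appender is in use)
log4j.appender.RFA.MaxBackupIndex=10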
01-08-2019
02:20 PM
1 Kudo
@Nikhil Raina In Hadoop, MapReduce breaks a job into tasks and these tasks run in parallel so that the overall execution time is reduced. If one of those tasks takes much longer than the others, the overall execution time of the job increases. The reason can be anything: a busy node, network congestion, etc. The slow task limits the total execution time of the job, because the system has to wait for it to complete. Such causes can be difficult to diagnose, since the slow task still completes successfully, just later than expected. Hadoop doesn't try to diagnose and fix slow-running tasks; instead, it tries to detect them and launches backup tasks for them, which are preferentially scheduled on faster nodes. This is called "speculative execution" in Hadoop, and the backup tasks are the "speculative tasks". When one copy of a task completes successfully, any duplicate copies still running are killed since they are no longer needed: if the original task finishes first, the speculative task is killed; if the speculative task finishes first, the original is killed. In short, speculative execution is a MapReduce job optimization technique in Hadoop that is enabled by default. To disable it, set "mapred.map.tasks.speculative.execution" to "false" and "mapred.reduce.tasks.speculative.execution" to "false" in "mapred-site.xml", as shown below. Please accept this answer if you found it helpful.
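For example, in mapred-site.xml (these are the classic property names quoted above; on newer Hadoop releases the equivalents are mapreduce.map.speculative and mapreduce.reduce.speculative):

<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>false</value>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
</property>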
01-06-2019
05:48 PM
1 Kudo
@Michael Bronson HDFS 2.x provides a "balancer" utility to help balance blocks across the DataNodes of a cluster. From HDFS 3.x onwards there is also a disk-level balancer, which rebalances data across the multiple disks of a single DataNode. It is useful for correcting the skewed data distribution that is often seen after adding or replacing disks. The Disk Balancer can be enabled by setting dfs.disk.balancer.enabled to true in hdfs-site.xml, and it is invoked by running "hdfs diskbalancer". A typical invocation sequence is sketched below.
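A typical run against a single DataNode looks like this; the hostname is a placeholder, and the plan file path is whatever the -plan step prints out:

# generate a rebalancing plan for one DataNode
hdfs diskbalancer -plan datanode1.example.com

# execute the plan produced by the previous step
hdfs diskbalancer -execute /system/diskbalancer/<date>/datanode1.example.com.plan.json

# check the progress of the running plan
hdfs diskbalancer -query datanode1.example.com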
JIRA: https://issues.apache.org/jira/browse/HDFS-1312 For more detail: https://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-hdfs/HDFSDiskbalancer.html Please accept this answer if you found it helpful.