Member since: 05-26-2016 | Posts: 17 | Kudos Received: 0 | Solutions: 0
06-19-2016 02:39 PM
@Benjamin Leonhardi I am trying to read a CSV file of total size around 50 GB. Around 310 splits get created, but only 3 maps are in running status at a time even though I have four datanodes. Each datanode has 16 GB RAM, one disk, and 2 CPU cores. I am using CSVNLineInputFormat from https://github.com/mvallebr/CSVInputFormat/blob/master/src/main/java/org/apache/hadoop/mapreduce/lib/input/CSVNLineInputFormat.java to read my CSV files.
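For illustration (values are examples, not taken from this cluster): the number of maps running at once is bounded by how many containers fit into each NodeManager's memory, so large map containers on 16 GB nodes allow only a few concurrent maps across the cluster. A minimal sketch of the properties involved, for 16 GB / 2-core nodes:

# yarn-site.xml / mapred-site.xml (editable via Ambari) -- example values only
yarn.nodemanager.resource.memory-mb=12288   # memory YARN may hand out per node, leaving headroom for the OS and daemons
mapreduce.map.memory.mb=2048                # smaller map containers mean more maps can run per node
mapreduce.map.java.opts=-Xmx1638m           # heap kept at roughly 80% of the container size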
06-19-2016 02:32 PM
Thanks @Rajkumar Singh, @Benjamin Leonhardi. Below are my settings in the cluster:
Map Memory: 8192
Sort Allocation Memory: 2047
MR Map Java Heap Size: -Xmx8192m
mapreduce.admin.map.child.java.opts & mapred.child.java.opts: -server -XX:NewRatio=8 -Djava.net.preferIPv4Stack=true -Dhdp.version=${hdp.version}
I haven't found mapred.child.java.opts through Ambari.
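For reference, a common sizing convention (illustrative, not a quote from these settings): the map JVM heap is usually kept at roughly 80% of the container size, since -Xmx8192m inside an 8192 MB container leaves no headroom for JVM and native overhead and invites heavy GC or container kills. A sketch assuming the 8192 MB containers above:

# mapred-site.xml -- illustrative values
mapreduce.map.memory.mb=8192
mapreduce.map.java.opts=-Xmx6554m   # ~80% of 8192 MB, leaving room for non-heap overhead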
06-19-2016 02:12 PM
Hi, when we run a MapReduce job we are getting a "GC overhead limit exceeded" error during the map phase and the job gets terminated. Please let us know how this can be resolved.
Error: GC overhead limit exceeded
16/06/19 17:34:39 INFO mapreduce.Job: map 18% reduce 0%
16/06/19 17:36:42 INFO mapreduce.Job: map 19% reduce 0%
16/06/19 17:37:18 INFO mapreduce.Job: Task Id : attempt_1466342436828_0001_m_000008_2, Status : FAILED
Error: Java heap space
Regards, Venkadesh S
Labels: Apache Hadoop
06-13-2016 08:19 AM
@Joy There aren't many errors or warnings; mostly INFO messages are printed, like the one below. INFO [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version),
06-13-2016 07:06 AM
Hi, we are using the Hadoop version below. The ZooKeeper log file zookeeper.out often grows very large (several GB) and doesn't get rotated. Please let us know the possible solutions.
Hadoop 2.7.1.2.3.2.0-2950
Subversion git@github.com:hortonworks/hadoop.git -r 5cc60e0003e33aa98205f18bccaeaf36cb193c1c
Compiled by jenkins on 2015-09-30T18:08Z
Compiled with protoc 2.5.0
From source with checksum 69a3bf8c667267c2c252a54fbbf23d
This command was run using /usr/hdp/2.3.2.0-2950/hadoop/lib/hadoop-common-2.7.1.2.3.2.0-2950.jar
Regards, Venkadesh S
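For illustration, assuming the stock ZooKeeper conf/log4j.properties layout (property names should be verified against the installed file): zookeeper.out holds the CONSOLE appender's output as redirected by zkServer.sh, so it never rotates; pointing the root logger at a rolling file appender is one possible fix.

# conf/log4j.properties -- a minimal sketch, not verified on this cluster
zookeeper.root.logger=INFO, ROLLINGFILE     # default is INFO, CONSOLE, whose output lands in zookeeper.out
log4j.appender.ROLLINGFILE=org.apache.log4j.RollingFileAppender
log4j.appender.ROLLINGFILE.File=${zookeeper.log.dir}/zookeeper.log
log4j.appender.ROLLINGFILE.MaxFileSize=10MB       # rotate once the file reaches this size
log4j.appender.ROLLINGFILE.MaxBackupIndex=10      # keep this many rotated files
log4j.appender.ROLLINGFILE.layout=org.apache.log4j.PatternLayout
log4j.appender.ROLLINGFILE.layout.ConversionPattern=%d{ISO8601} - %-5p [%t:%C{1}@%L] - %m%n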
06-13-2016 07:03 AM
@nmaillard When the process starts I see YARN memory at 100% on the Ambari homepage. How do we find the time taken to get an ApplicationMaster?
06-12-2016 10:44 AM
Thanks @nmaillard. When the job starts it prints the warnings below, and it takes around ten minutes before the job actually starts. Is this normal or does it need to be looked into?
[root@emrbldbgdapd1 exchange]# yarn jar ExchangeLogsMR.jar /ExchangeLogs/OnPremise/2016 /ExchangeLogs/output1
16/06/12 13:40:33 INFO impl.TimelineClientImpl: Timeline service address: http://emrbldbgdapd2.emaar.ae:8188/ws/v1/timeline/
16/06/12 13:40:33 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
16/06/12 13:40:34 INFO input.FileInputFormat: Total input paths to process : 8
16/06/12 13:40:35 WARN hdfs.DFSClient: DFSInputStream has been closed already
16/06/12 13:42:23 WARN hdfs.DFSClient: DFSInputStream has been closed already
16/06/12 13:44:27 WARN hdfs.DFSClient: DFSInputStream has been closed already
16/06/12 13:46:37 WARN hdfs.DFSClient: DFSInputStream has been closed already
16/06/12 13:48:42 WARN hdfs.DFSClient: DFSInputStream has been closed already
16/06/12 13:50:28 WARN hdfs.DFSClient: DFSInputStream has been closed already
16/06/12 13:50:28 WARN hdfs.DFSClient: DFSInputStream has been closed already
16/06/12 13:50:28 WARN hdfs.DFSClient: DFSInputStream has been closed already
16/06/12 13:50:28 INFO mapreduce.JobSubmitter: number of splits:116
16/06/12 13:50:28 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1465711050466_0004
16/06/12 13:50:28 INFO impl.YarnClientImpl: Submitted application application_1465711050466_0004
06-12-2016 10:11 AM
Hi,
We have written a MapReduce job to process log files. As of now we have around 52 GB of input files, but it takes around an hour to process the data. It creates only one reducer by default. Often we see a timeout error in the reduce task; it then restarts and gets completed. Below are the stats for a successful completion of the job. Kindly let us know how the performance can be improved (one option is sketched after the counters). File System Counters
FILE: Number of bytes read=876100387
FILE: Number of bytes written=1767603407
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=52222279591
HDFS: Number of bytes written=707429882
HDFS: Number of read operations=351
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Failed reduce tasks=1
Launched map tasks=116
Launched reduce tasks=2
Other local map tasks=116
Total time spent by all maps in occupied slots (ms)=9118125
Total time spent by all reduces in occupied slots (ms)=7083783
Total time spent by all map tasks (ms)=3039375
Total time spent by all reduce tasks (ms)=2361261
Total vcore-seconds taken by all map tasks=3039375
Total vcore-seconds taken by all reduce tasks=2361261
Total megabyte-seconds taken by all map tasks=25676640000
Total megabyte-seconds taken by all reduce tasks=20552415744
Map-Reduce Framework
Map input records=49452982
Map output records=5730971
Map output bytes=864140911
Map output materialized bytes=876101077
Input split bytes=13922
Combine input records=0
Combine output records=0
Reduce input groups=1082133
Reduce shuffle bytes=876101077
Reduce input records=5730971
Reduce output records=5730971
Spilled Records=11461942
Shuffled Maps =116
Failed Shuffles=0
Merged Map outputs=116
GC time elapsed (ms)=190633
CPU time spent (ms)=4536110
Physical memory (bytes) snapshot=340458307584
Virtual memory (bytes) snapshot=1082745069568
Total committed heap usage (bytes)=378565820416
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=52222265669
File Output Format Counters
Bytes Written=707429882
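One concrete lever implied by the stats above (Launched reduce tasks=2 with one failure, from the single default reducer): raising the reducer count parallelizes the shuffle/reduce phase. A sketch with an illustrative value, assuming the cluster can host several reduce containers at once:

# per job on the command line (works if the driver uses ToolRunner), or cluster-wide in mapred-site.xml
# yarn jar ExchangeLogsMR.jar -Dmapreduce.job.reduces=8 /ExchangeLogs/OnPremise/2016 /ExchangeLogs/output1
mapreduce.job.reduces=8   # default is 1; pick roughly the number of reduce containers the cluster can run in parallel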
Labels: Apache Hadoop
05-26-2016 12:46 PM
@Ravi Mutyala There is no job running currently, so I believe the files can be removed manually. But does this happen with all failed jobs? Will removing such leftover files be a manual process every time, or is there a process available to remove these files periodically?
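For illustration, the NodeManager does have built-in cleanup knobs; whether they cover these particular leftovers depends on why the files were left behind, so treat the following as a sketch of properties to check rather than a confirmed fix:

# yarn-site.xml -- properties governing NodeManager local-dir cleanup (values shown are the usual defaults)
yarn.nodemanager.delete.debug-delay-sec=0               # >0 keeps finished applications' local files around for debugging
yarn.nodemanager.localizer.cache.target-size-mb=10240   # localized-resource cache is trimmed back toward this size
yarn.nodemanager.localizer.cache.cleanup.interval-ms=600000   # how often the cache cleanup pass runs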
05-26-2016 12:12 PM
@Sagar Shimpi Thanks a lot for your response. Our Hadoop version is Hadoop 2.7.1.2.3.2.0-2950 and all the settings related to log configuration look fine. I have checked the YARN logs but found only the warnings below.
2016-05-25 15:39:55,813 WARN logaggregation.LogAggregationService (LogAggregationService.java:verifyAndCreateRemoteLogDir(195)) - Remote Root Log Dir [/app-logs] already exist, but with incorrect permissions. Expected: [rwxrwxrwt], Found: [rwxrwxrwx]. The cluster may have problems with multiple users.
2016-05-25 15:39:55,813 WARN logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:<init>(190)) - rollingMonitorInterval is set as -1. The log rolling mornitoring interval is disabled. The logs will be aggregated after this application is finished.
We also see some output files in the appcache folder which are taking up a lot of space: /u01/hadoop/yarn/local/usercache/hive/appcache