Member since: 01-25-2017
Posts: 396
Kudos Received: 28
Solutions: 11
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 835 | 10-19-2023 04:36 PM
 | 4366 | 12-08-2018 06:56 PM
 | 5458 | 10-05-2018 06:28 AM
 | 19877 | 04-19-2018 02:27 AM
 | 19899 | 04-18-2018 09:40 AM
02-27-2017
11:01 PM
Yes I did, but since I didn't catch the issue in time I got: "The filesystem under path '/user/dataint/.staging' has 0 CORRUPT files". I will try to catch the issue when it happens. I also suspect a bad disk might be causing it. Why would such a directory have only 1 replica? Is there a default for this? My whole cluster runs with replication factor 3.
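To see how many replicas the staging files actually have, and whether any blocks are under-replicated or corrupt at the moment the job fails, hdfs fsck can report per-file replication. A minimal sketch, assuming the /user/dataint/.staging path from the error message and a client with access to the cluster:

```shell
# Report files, block counts, and average/target replication for the staging dir
hdfs fsck /user/dataint/.staging -files -blocks

# List only the paths of files with corrupt or missing blocks under that directory
hdfs fsck /user/dataint/.staging -list-corruptfileblocks
```

Running the first command while the failing job is still active shows whether job.split was really written with a single replica or whether replicas exist but live on a suspect disk.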
02-27-2017
12:35 PM
The weird thing is that it happens sporadically, with the same error but on a different block each time. When I try to list the file in HDFS I can't find it. I suspect it happens on a specific disk on a specific DataNode, but it only happens with one job: after 3 failures on the same node the job blacklists that node, until it has blacklisted all DataNodes and then fails. On the next run it succeeds.

2017-02-27 13:36:03,460 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1486363199991_135199_m_000014_0: Error: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-76191351-10.160.96.6-1447247246852:blk_1158119244_84380902 file=/user/dataint/.staging/job_1486363199991_135199/job.split
at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:963)
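When the error recurs, fsck can show which DataNodes hold (or are missing) the replicas of the failing file, which would confirm whether one specific disk/node is involved. A sketch, using the job.split path from the log above; it only works while the staging file still exists:

```shell
# Show the block IDs of the split file and the DataNode location of each replica
hdfs fsck /user/dataint/.staging/job_1486363199991_135199/job.split \
  -files -blocks -locations
```

If every listed replica sits on the same DataNode, that matches the single-replica suspicion; if the block is reported missing, the DataNode logs for that host and time window are the next place to look.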
02-27-2017
07:09 AM
1 Kudo
Hi, can anyone help me understand this ERROR? The IP 10.160.96.6 is the standby NN.

2017-02-26 01:35:40,427 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1486363199991_126195_m_000026_3: Error: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-76191351-10.160.96.6-1447247246852:blk_1157585017_83846591 file=/user/dataint/.staging/job_1486363199991_126195/job.split
at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:963)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:610)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:851)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:904)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:704)
at java.io.DataInputStream.readByte(DataInputStream.java:265)
at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
at org.apache.hadoop.io.WritableUtils.readVIntInRange(WritableUtils.java:348)
at org.apache.hadoop.io.Text.readString(Text.java:471)
at org.apache.hadoop.io.Text.readString(Text.java:464)
at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1707)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
2017-02-26 01:35:40,427 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1486363199991_126195_m_000026_3 TaskAttempt Transitioned from RUNNING to FAIL_FINISHING_CONTAINER
2017-02-26 01:35:40,429 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1486363199991_126195_m_000026 Task Transitioned from RUNNING to FAILED
2017-02-26 01:35:40,429 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 20
2017-02-26 01:35:40,429 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Job failed as tasks failed. failedMaps:1 failedReduces:0
2017-02-26 01:35:40,430 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1486363199991_126195Job Transitioned from RUNNING to FAIL_WAIT
2017-02-26 01:35:40,435 ERROR [Thread-53] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Could not deallocate container for task attemptId attempt_1486363199991_126195_r_000000_0
2017-02-26 01:35:40,435 INFO [Thread-53] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Processing the event EventType: CONTAINER_DEALLOCATE
2017-02-26 01:35:40,435 ERROR [Thread-53] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Could not deallocate container for task attemptId attempt_1486363199991_126195_r_000001_0
2017-02-26 01:35:40,435 INFO [Thread-53] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Processing the event EventType: CONTAINER_DEALLOCATE
2017-02-26 01:35:40,435 ERROR [Thread-53] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Could not deallocate container for task attemptId attempt_1486363199991_126195_r_000002_0
Labels:
- MapReduce
02-26-2017
08:07 AM
I managed to solve it by adding a mapred-site.xml on the Oozie server under /etc/hadoop/conf and overriding the submit replication there.
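For reference, the override described above would look roughly like the following property in the Oozie server's mapred-site.xml. This is a sketch, not the exact file used: the property name is the classic MRv1 `mapred.submit.replication`, and the value 2 matches the setting mentioned elsewhere in this thread.

```xml
<!-- /etc/hadoop/conf/mapred-site.xml on the Oozie server (illustrative) -->
<property>
  <name>mapred.submit.replication</name>
  <value>2</value>
</property>
```

Because submit replication is a client-side setting, it has to be present in the configuration of whichever host actually submits the job, which is why placing it on the Oozie server fixed it here.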
02-26-2017
04:18 AM
I want to find where I can disable submitting the job through a specific host. I see that the Oozie launcher for the job submits from slpr-mha01, which is the JT, NN, and Oozie node, but the job itself is submitted through a DN. The jobs are scheduled using Oozie.
02-26-2017
03:04 AM
Hi, can I enforce this at the cluster level? This is the coordinator job configuration for the running job; vlpr-mha01 acts as JT and NN.

<configuration>
  <property>
    <name>jobType</name>
    <value>rm</value>
  </property>
  <property>
    <name>dwhType</name>
    <value>da</value>
  </property>
  <property>
    <name>oozie.coord.application.path</name>
    <value>hdfs://vlpr-mha01:54310/liveperson/code/server_dataaccess_retention/lp-dataaccess-retention-1.0.0.1/sched/</value>
  </property>
  <property>
    <name>recycleBinDir</name>
    <value>hdfs://vlpr-mha01:54310/liveperson/data/server_dataaccess_retention/recycle_bin/</value>
  </property>
  <property>
    <name>freq</name>
    <value>1440</value>
  </property>
  <property>
    <name>workflowAppUri</name>
    <value>hdfs://vlpr-mha01:54310/liveperson/code/server_dataaccess_retention/lp-dataaccess-retention-1.0.0.1/sched/</value>
  </property>
  <property>
    <name>start</name>
    <value>2014-03-02T10:24Z</value>
  </property>
  <property>
    <name>user.name</name>
    <value>dataaccess</value>
  </property>
  <property>
    <name>jobRoot</name>
    <value>hdfs://vlpr-mha01:54310/liveperson/code/server_dataaccess_retention/lp-dataaccess-retention-1.0.0.1</value>
  </property>
  <property>
    <name>workingOnDir</name>
    <value>hdfs://vlpr-mha01:54310/liveperson/data/server_dataaccess_retention/recycle_bin/</value>
  </property>
  <property>
    <name>oozie.libpath</name>
    <value>hdfs://vlpr-mha01:54310/liveperson/code/server_dataaccess_retention/lp-dataaccess-retention-1.0.0.1/lib</value>
  </property>
  <property>
    <name>nameNode</name>
    <value>hdfs://vlpr-mha01:54310</value>
  </property>
  <property>
    <name>end</name>
    <value>2020-01-01T00:00Z</value>
  </property>
  <property>
    <name>jobTracker</name>
    <value>vlpr-mha01:54311</value>
  </property>
</configuration>

This is an old cluster on which I'm trying not to make changes at the job level; it should be dead in 6 months.
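On enforcing a value cluster-wide: Hadoop configuration supports marking a property as final, which makes client- or job-level overrides of that property be ignored when the configuration is loaded. A sketch, assuming `mapred.submit.replication` is the property in question and that it is set in the mapred-site.xml of the hosts that load it (it only binds on hosts where this file is read):

```xml
<!-- mapred-site.xml; <final>true</final> prevents per-job overrides (illustrative) -->
<property>
  <name>mapred.submit.replication</name>
  <value>2</value>
  <final>true</final>
</property>
```

This avoids touching individual coordinator/workflow definitions, which fits the constraint of not changing anything at the job level on a cluster that is near end-of-life.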
02-26-2017
02:05 AM
Hi, I see some jobs in my cluster that were submitted via the JobTracker node. Looking at all the DataNodes, mapred.submit.replication is 2, but in the JobTracker's mapred-site.xml there is no mapred.submit.replication property. I added it manually to the file and restarted the JobTracker, but in the job file for running jobs that have the JobTracker as the Submit Host, mapred.submit.replication is still 10 and not 2.
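To confirm which value a job was actually launched with, the effective configuration of each job is persisted as job.xml in its staging directory on HDFS, so it can be inspected after the fact. A sketch, where the user and job id placeholders are hypothetical and should be replaced with a real staging path and job id:

```shell
# Inspect the submit replication a running/submitted job actually picked up
# (<user> and <job_id> are placeholders, not real values from this cluster)
hdfs dfs -cat /user/<user>/.staging/<job_id>/job.xml | grep -A 1 mapred.submit.replication
```

If this still shows 10 (the usual default for that property) after the restart, the submitting client is reading its configuration from a different directory than the one that was edited.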
Labels:
- MapReduce
02-25-2017
12:35 PM
I'm just thinking of deleting the Spark service and then adding it back. Has anyone experienced the same issue?
02-23-2017
08:18 PM
Hi, is it planned to add this ability to the Express version of Cloudera Manager? Is there anything similar I can do with the Express version?
02-23-2017
12:49 PM
Digging into the cluster, I found that one of the applications running outside the Hadoop cluster has clients that do hdfs dfs -put to the cluster. These clients didn't have an hdfs-site.xml, so they got the cluster's default replication factor. What did I do? I tested hdfs dfs -put from a client server inside the cluster and from the client outside the cluster, and noticed that the client outside the cluster put files with replication factor 3. To solve the issue I added an hdfs-site.xml to each of the clients outside the cluster and overrode the default replication factor in that file.
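The diagnosis and fix above can also be done per command, without distributing hdfs-site.xml. A sketch, assuming a reachable cluster; the file names are hypothetical, /user/dataint is taken from earlier in this thread, and the target replication of 2 is an assumption:

```shell
# Check the replication factor of an existing file (%r prints replication, %n the name)
hdfs dfs -stat "%r %n" /user/dataint/some_file

# Override the client-side default for a single put, without an hdfs-site.xml on the client
hdfs dfs -D dfs.replication=2 -put localfile /user/dataint/

# Fix files that were already written with the wrong factor (-w waits for completion)
hdfs dfs -setrep -w 2 /user/dataint/some_file
```

The hdfs-site.xml approach is still the more robust fix, since it covers every client invocation rather than relying on each caller passing -D.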