Member since: 10-07-2015
Posts: 17
Kudos Received: 7
Solutions: 4
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 2788 | 09-15-2021 03:40 AM |
|  | 4786 | 02-28-2019 03:31 PM |
|  | 56255 | 12-02-2016 03:04 AM |
|  | 5977 | 11-23-2016 08:15 AM |
09-15-2021
03:40 AM
1 Kudo
Often this happens because there is a "hidden" character at the end of the file or folder name, for example a line break (\n, \r, etc.). If you list the files you can get a clue that this is the case, as the output will usually look strange, with an extra line or something similar. You can try running a few commands like the following to see if one of them matches the file:

hdfs dfs -ls $'/path/to/folder\r'
hdfs dfs -ls $'/path/to/folder\n'
hdfs dfs -ls $'/path/to/folder\r\n'

If any of those match, you can then delete the incorrect path with a similar command. If you have no luck with that, pipe the ls output into "od -c" and it will show the special characters:

hdfs dfs -ls /path/to/folder | od -c
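To probe the common variants in one go, here is a minimal bash sketch of the same idea; it assumes bash (for the $'...' escapes) and uses the placeholder path from above:

```bash
# Minimal sketch: try the usual invisible trailing characters one by one.
# /path/to/folder is a placeholder path; replace it with the real one.
for suffix in $'\r' $'\n' $'\r\n' $'\t'; do
  if hdfs dfs -ls "/path/to/folder${suffix}" > /dev/null 2>&1; then
    # %q prints the matching suffix in a readable, quoted form.
    printf 'matched trailing suffix: %q\n' "$suffix"
  fi
done
```

Once a suffix matches, the same quoting works with hdfs dfs -rm (or -rmdir) to remove the bad entry.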
03-04-2019
05:43 PM
Thanks. I successfully rescued the unrecognized blocks. Addressing the underlying issue of missing but finalized blocks will take time. Hopefully upgrading to a later CDH will work.
10-16-2018
09:37 AM
Hi guys, I am facing a similar issue. I have a new installation of Cloudera and I am trying to run the simple MapReduce Pi example. The job gets stuck at map 0% and reduce 0%, as shown below.

[test@spark-1 ~]$ sudo -u hdfs hadoop jar /data/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 100
Number of Maps = 10
Samples per Map = 100
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
18/10/16 12:33:25 INFO input.FileInputFormat: Total input paths to process : 10
18/10/16 12:33:26 INFO mapreduce.JobSubmitter: number of splits:10
18/10/16 12:33:26 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1539705370715_0002
18/10/16 12:33:26 INFO impl.YarnClientImpl: Submitted application application_1539705370715_0002
18/10/16 12:33:26 INFO mapreduce.Job: The url to track the job: http://spark-4:8088/proxy/application_1539705370715_0002/
18/10/16 12:33:26 INFO mapreduce.Job: Running job: job_1539705370715_0002
18/10/16 12:33:31 INFO mapreduce.Job: Job job_1539705370715_0002 running in uber mode : false
18/10/16 12:33:31 INFO mapreduce.Job: map 0% reduce 0%

I made multiple config changes, but cannot find a solution for this. The only error I could trace was in the NodeManager log file:

ERROR org.apache.hadoop.yarn.server.nodemanager.NodeManager: RECEIVED SIGNAL 15: SIGTERM

I tried checking the various properties discussed in this thread, but I still have the issue. Can someone please help me solve it? Please let me know what details I can provide.
02-09-2017
06:59 AM
You can put the S3 credentials in the s3a URI, or you can just pass the parameters on the command line, which is what I prefer, e.g.:

hadoop fs -Dfs.s3a.access.key="" -Dfs.s3a.secret.key="" -ls s3a://bucket-name/

It's also worth knowing that if you run the command as given above, it will override any other settings defined in the cluster config, such as core-site.xml.
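For completeness, here is a hedged sketch of the in-URI form mentioned above. ACCESS_KEY, SECRET_KEY and bucket-name are placeholders, and note that this form is deprecated in newer Hadoop releases because the secret ends up in shell history and logs:

```bash
# Sketch only: credentials embedded directly in the s3a URI.
# ACCESS_KEY / SECRET_KEY / bucket-name are placeholders, not real values.
# Deprecated in newer Hadoop versions; prefer the -D form above.
hadoop fs -ls s3a://ACCESS_KEY:SECRET_KEY@bucket-name/
```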
11-23-2016
08:15 AM
2 Kudos
Hi, if you set the balancer bandwidth in Cloudera Manager, then when the datanodes are started they will have that bandwidth setting for balancing operations. However, using the command line it is possible to change the bandwidth while the datanodes or the balancer are running, without restarting them. If you do this, just remember that if you restart any datanodes, the bandwidth setting will revert to the one set in Cloudera Manager and you will need to run the command again.

To set the balancer bandwidth from the command line, without a restart, you can run the following:

sudo -u hdfs hdfs dfsadmin -setBalancerBandwidth 104857600

If you have HA set up for HDFS, the above command may fail; in that case, check which namenode is active and run the command as follows (substituting the correct hostname for activeNamenode):

sudo -u hdfs hdfs dfsadmin -fs hdfs://activeNamenode:8020/ -setBalancerBandwidth 104857600

To check that the command worked, log entries like the following should appear in the datanode log files:

INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeCommand action: DNA_BALANCERBANDWIDTHUPDATE
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Updating balance throttler bandwidth from 10485760 bytes/s to: 104857600 bytes/s.
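To find which namenode is active, one option is hdfs haadmin; a small sketch, assuming the namenode IDs are namenode1 and namenode2 (yours are listed under dfs.ha.namenodes.<nameservice> in hdfs-site.xml):

```bash
# Sketch: query the HA state of each namenode; one should print "active".
# "namenode1"/"namenode2" are assumed IDs; check hdfs-site.xml for yours.
sudo -u hdfs hdfs haadmin -getServiceState namenode1
sudo -u hdfs hdfs haadmin -getServiceState namenode2
```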