Member since: 10-07-2015
Posts: 17
Kudos Received: 7
Solutions: 4
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 2788 | 09-15-2021 03:40 AM |
|  | 4786 | 02-28-2019 03:31 PM |
|  | 56255 | 12-02-2016 03:04 AM |
|  | 5977 | 11-23-2016 08:15 AM |
09-15-2021
03:40 AM
1 Kudo
Often this happens because there is a "hidden" character at the end of the file or folder name, for example a line break (\n, \r, etc.). If you list the files you can get a clue that this is the case, as the output will usually look strange, with an extra line or something similar. You can try running a few commands like the following to see if one of them matches the file:

hdfs dfs -ls $'/path/to/folder\r'
hdfs dfs -ls $'/path/to/folder\n'
hdfs dfs -ls $'/path/to/folder\r\n'

If any of those match, you can then delete the incorrect path with a similar command. If you have no luck with that, pipe the ls output into "od -c" and it will show the special characters:

hdfs dfs -ls /path/to/folder | od -c
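To probe the common variants in one go, here is a minimal bash sketch of the same idea; it assumes bash (for the $'...' escapes) and uses the placeholder path from above:

```bash
# Minimal sketch: try the usual invisible trailing characters one by one.
# /path/to/folder is a placeholder path; replace it with the real one.
for suffix in $'\r' $'\n' $'\r\n' $'\t'; do
  if hdfs dfs -ls "/path/to/folder${suffix}" > /dev/null 2>&1; then
    # %q prints the matching suffix in a readable, quoted form.
    printf 'matched trailing suffix: %q\n' "$suffix"
  fi
done
```

Once a suffix matches, the same quoting works with hdfs dfs -rm (or -rmdir) to remove the bad entry.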
03-04-2019
05:43 PM
Thanks. I successfully rescued the unrecognized blocks. Addressing the underlying issue of missing but finalized blocks will take time. Hopefully upgrading to a later CDH will work.
10-16-2018
09:37 AM
Hi guys, I am facing a similar issue. I have a new installation of Cloudera and I am trying to run the simple MapReduce Pi example. The job gets stuck at map 0% and reduce 0%, as shown below.

[test@spark-1 ~]$ sudo -u hdfs hadoop jar /data/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 100
Number of Maps = 10
Samples per Map = 100
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
18/10/16 12:33:25 INFO input.FileInputFormat: Total input paths to process : 10
18/10/16 12:33:26 INFO mapreduce.JobSubmitter: number of splits:10
18/10/16 12:33:26 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1539705370715_0002
18/10/16 12:33:26 INFO impl.YarnClientImpl: Submitted application application_1539705370715_0002
18/10/16 12:33:26 INFO mapreduce.Job: The url to track the job: http://spark-4:8088/proxy/application_1539705370715_0002/
18/10/16 12:33:26 INFO mapreduce.Job: Running job: job_1539705370715_0002
18/10/16 12:33:31 INFO mapreduce.Job: Job job_1539705370715_0002 running in uber mode : false
18/10/16 12:33:31 INFO mapreduce.Job: map 0% reduce 0%

I made multiple config changes, but cannot find a solution for this. The only error I could trace was in the NodeManager log file:

ERROR org.apache.hadoop.yarn.server.nodemanager.NodeManager: RECEIVED SIGNAL 15: SIGTERM

I tried checking the various properties discussed in this thread, but I still have the issue. Can someone please help me solve it? Please let me know what details I can provide.
02-09-2017
06:59 AM
You can put the S3 credentials in the s3a URI, or you can just pass the parameters on the command line, which is what I prefer, e.g.:

hadoop fs -Dfs.s3a.access.key="" -Dfs.s3a.secret.key="" -ls s3a://bucket-name/

It's also worth knowing that if you run the command as given above, it will override any other settings defined in the cluster config, such as core-site.xml.
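For completeness, here is a hedged sketch of the in-URI form mentioned above. ACCESS_KEY, SECRET_KEY and bucket-name are placeholders, and note that this form is deprecated in newer Hadoop releases because the secret ends up in shell history and logs:

```bash
# Sketch only: credentials embedded directly in the s3a URI.
# ACCESS_KEY / SECRET_KEY / bucket-name are placeholders, not real values.
# Deprecated in newer Hadoop versions; prefer the -D form above.
hadoop fs -ls s3a://ACCESS_KEY:SECRET_KEY@bucket-name/
```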
11-23-2016
08:15 AM
2 Kudos
Hi, if you set the balancer bandwidth in Cloudera Manager, then when the datanodes are started they will have that bandwidth setting for balancing operations. However, using the command line it is possible to change the bandwidth while the datanodes or the balancer are running, without restarting them. If you do this, just remember that if you restart any datanodes, the bandwidth setting will revert to the one set in Cloudera Manager and you will need to run the command again.

To set the balancer bandwidth from the command line, without a restart, you can run the following:

sudo -u hdfs hdfs dfsadmin -setBalancerBandwidth 104857600

If you have HA set up for HDFS, the above command may fail; in that case, check which namenode is active and run the command as follows (substituting the correct hostname for activeNamenode):

sudo -u hdfs hdfs dfsadmin -fs hdfs://activeNamenode:8020/ -setBalancerBandwidth 104857600

To check that the command worked, log entries like the following should appear in the datanode log files:

INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeCommand action: DNA_BALANCERBANDWIDTHUPDATE
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Updating balance throttler bandwidth from 10485760 bytes/s to: 104857600 bytes/s.
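To find which namenode is active, one option is hdfs haadmin; a small sketch, assuming the namenode IDs are namenode1 and namenode2 (yours are listed under dfs.ha.namenodes.<nameservice> in hdfs-site.xml):

```bash
# Sketch: query the HA state of each namenode; one should print "active".
# "namenode1"/"namenode2" are assumed IDs; check hdfs-site.xml for yours.
sudo -u hdfs hdfs haadmin -getServiceState namenode1
sudo -u hdfs hdfs haadmin -getServiceState namenode2
```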