Member since: 07-25-2016
Posts: 40
Kudos Received: 5
Solutions: 0
05-02-2023
03:51 AM
Adding the zkCli commands to log in to ZooKeeper and remove the znode directory. Hope this helps.
zookeeper-client -server <zookeeper-server-host>:2181 (use sudo, or log in as the HDFS user, if you hit a permission issue)
ls / or ls /hadoop-ha (if you don't see a /hadoop-ha znode in the ZK znode list, skip the step below)
rmr /hadoop-ha/nameservice1
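If you prefer to run the cleanup non-interactively, here is a minimal sketch, assuming your zookeeper-client accepts a command after the -server argument and that nameservice1 really is your HA nameservice ID:
# Confirm the znode exists before removing it
zookeeper-client -server <zookeeper-server-host>:2181 ls /hadoop-ha
# Remove the stale HA state for nameservice1
zookeeper-client -server <zookeeper-server-host>:2181 rmr /hadoop-ha/nameservice1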
09-25-2017
06:41 AM
1 Kudo
Hi @oula.alshiekh@gmail.com alshiekh , The error does not look related to the number of threads running on the datanode; it really looks like a connection problem. It would be very helpful if you could provide a more detailed stack trace. My GUESS is that the next datanode in the pipeline (assigned by the namenode) is down, so the first datanode cannot connect to the next one and throws the exception mentioned above. Since you have 6 datanodes, writes could still succeed with the remaining nodes in the cluster.
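A quick way to check this from the shell, assuming the Hadoop 2.x default data-transfer port 50010 (adjust if you changed dfs.datanode.address):
# List datanodes and their live/dead state as seen by the namenode
hdfs dfsadmin -report
# Check whether the next datanode in the pipeline is reachable on its data-transfer port
nc -zv <datanode-host> 50010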
09-25-2017
06:44 AM
1 Kudo
Hi @oula.alshiekh@gmail.com alshiekh , From the above stack traces it looks like the socket timeouts are set to very low values (300 ms). Hadoop's default values are ReadTimeout=60000 and WriteTimeout=8*60000. Please check the following configurations in the datanode's configs: "dfs.client.socket-timeout" and "dfs.datanode.socket.write.timeout". If the values are set to 300 ms, please increase them and restart the datanodes.
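For illustration, a minimal hdfs-site.xml sketch that restores the default values mentioned above (in milliseconds; please verify against your distribution before applying):
<property>
<name>dfs.client.socket-timeout</name>
<!-- Default read timeout: 60000 ms -->
<value>60000</value>
</property>
<property>
<name>dfs.datanode.socket.write.timeout</name>
<!-- Default write timeout: 8 * 60000 = 480000 ms -->
<value>480000</value>
</property>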
08-15-2017
03:51 PM
The parameter to specify more than one storage path in Hadoop is in hdfs-site.xml. Property: dfs.datanode.data.dir (please verify). The value of dfs.datanode.data.dir can be any directory available on the datanode; it determines where on the local filesystem the datanode stores its blocks. It can be a comma-separated list of directories where disk partitions are mounted, like '/user1/hadoop/data, /user2/hadoop/data', which is useful if you have multiple disk partitions to be used for HDFS. When it has multiple values, data is copied to HDFS in a round-robin fashion; if one directory's disk is full, the round-robin copy continues on the rest of the directories. You can also define storage types for multiple locations in HDFS; please refer to the link below. https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_hdfs_admin_tools/content/configuring_archival_storage.html
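For illustration, a minimal hdfs-site.xml sketch with two data directories; the mount points /user1/hadoop/data and /user2/hadoop/data are the example paths from above, not required names:
<property>
<name>dfs.datanode.data.dir</name>
<!-- Comma-separated list; blocks are written across these directories round-robin -->
<value>/user1/hadoop/data,/user2/hadoop/data</value>
</property>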
04-20-2017
12:13 AM
Hi Team, Similar problem: unable to find this package -- httpfs-3.0.0-alpha2.tar.gz. I am using Hadoop 2.7.3 on CentOS 7. -- Lokesh
04-05-2017
11:29 AM
1 Kudo
@oula.alshiekh@gmail.com alshiekh There are basically two fencing methods which ship with Hadoop: "shell" and "sshfence". The sshfence option SSHes to the target node and uses fuser to kill the process listening on the service's TCP port. In order for this fencing option to work, it must be able to SSH to the target node without providing a passphrase, so one must also configure the dfs.ha.fencing.ssh.private-key-files option, which is a comma-separated list of SSH private key files. You can also define the username/port of your choice, as mentioned below: "sshfence([[username][:port]])"
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence([[username][:port]])</value>
</property>
[1] Reference: https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.html
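A companion sketch for the private-key option mentioned above; the path /home/exampleuser/.ssh/id_rsa is a placeholder for illustration, not a required location:
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<!-- Placeholder path; point this at a key readable by the user running the NameNode -->
<value>/home/exampleuser/.ssh/id_rsa</value>
</property>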
01-06-2017
07:15 PM
Some people, instead of helping, only confuse things. This is a straight answer, right to the point. Congratulations and thank you.
12-08-2016
10:55 AM
Another question, please: what is the benefit of installing the Hive server on a node other than the namenode? If we choose another node for the Hive server rather than the namenode, will Hive commands be handled from that node, or should we install Hive on the namenode first?
12-08-2016
11:42 AM
3 Kudos
@oula.alshiekh@gmail.com alshiekh Add a datanode if you are running out of storage capacity in the cluster; add a computation node when you see a bottleneck in processing. By adding more computation nodes you can launch more MapReduce/Spark tasks. You can also use the node to store data as well as to add more processing capacity (in terms of more MapReduce tasks).