How to increase HDFS disk space

Expert Contributor

I have HDP installed on a server with Ambari.

HDFS disk space reached 100% utilization after I ran a service check. As far as I understand, some folders were created during the process, though my understanding of it is minimal.

There is now a .staging folder in the HDFS directory. Is that because the service check could not complete?

I think the cluster has only 1 GB of disk space. Do I need to increase it? (I guess I do.) If so, how do I increase it, and what would be the ideal amount?

Also, theoretically, there are always multiple data nodes in the cluster. Ambari shows only one. Do I need to create new ones myself?

What are the advantages and how do I create them?

Also, what should be the ideal number of data nodes and why?

1 ACCEPTED SOLUTION

Master Guru

@simran kaur

Too many questions 🙂

There is now a .staging folder in the HDFS directory. Is that because the service check could not complete?

--> YARN requires a staging directory for temporary files created by running jobs.
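
For reference, here is a rough sketch of where that location is configured for MapReduce jobs. The property name `yarn.app.mapreduce.am.staging-dir` is a standard mapred-site.xml setting; the value `/user` shown here is only an assumption about a typical Ambari-managed setup, so check your own config for the actual value. Each submitting user gets a `.staging` folder under `<staging-dir>/<username>/`.

```xml
<!-- mapred-site.xml (sketch; the value below is an assumed example, not taken from this cluster) -->
<property>
   <name>yarn.app.mapreduce.am.staging-dir</name>
   <!-- With /user, job files for user "ambari-qa" would land in /user/ambari-qa/.staging -->
   <value>/user</value>
</property>
```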

I think the cluster has only 1 GB of disk space. Do I need to increase it? (I guess I do.) If so, how do I increase it, and what would be the ideal amount?

--> Please have a look at the property below:

   <property>
      <name>dfs.datanode.data.dir</name>
      <value>/hadoop/hdfs/data</value>
      <final>true</final>
   </property>

This property lists the directories (one per disk) that the DataNode uses for HDFS storage. You can add new disks to your Linux machine and list their mount points here, separated by commas.
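
As a minimal sketch of the comma-separated form, assuming a second disk has been mounted at the hypothetical path `/data2` (that path is an illustration only, not something from this cluster), the property could look like this:

```xml
<!-- hdfs-site.xml (sketch): /data2/hadoop/hdfs/data is a hypothetical directory on a newly added disk -->
<property>
   <name>dfs.datanode.data.dir</name>
   <!-- Comma-separated list, no spaces: the original directory plus the new one -->
   <value>/hadoop/hdfs/data,/data2/hadoop/hdfs/data</value>
   <final>true</final>
</property>
```

On an Ambari-managed cluster this value is normally changed in the HDFS configs in the Ambari UI rather than by editing the file by hand, and the DataNode has to be restarted before it starts using the new directory.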

Have a look at https://community.hortonworks.com/questions/9772/how-to-add-more-disks-to-hdfs.html for more details

Also, theoretically, there are always multiple data nodes in the cluster. Ambari shows only one. Do I need to create new ones myself?

--> Yes. You can spin up one more VM and add it using Ambari. Here is the guide to adding a new node with Ambari: http://hortonworks.com/hadoop-tutorial/using-apache-ambari-add-new-nodes-existing-cluster/

What are the advantages and how do I create them?

--> With multiple nodes, you get more storage capacity and more processing power.

Also, what should be the ideal number of data nodes and why?

--> You can run every component on a single node; the ideal number really depends on your use case.

Hope this information helps!
