Created 11-07-2016 01:06 PM
Hello,
I deployed my first HDP cluster a month ago and it is now used by my whole department.
I want to store various files on the cluster, but I don't know the best practice for doing so. Can I store files on a master node? An edge node? A data node? ...
Examples of files I want to store:
- files for proofs of concept
- jar files for applications like Spark
- files for the Teradata client
- ifexp files
Created 11-07-2016 01:33 PM
1. Never use master or data node local storage
Best practice is definitely not to touch the master nodes or data nodes for local filesystem storage or command-line work. Use the edge node CLI, or your local machine via Ambari Views or integration through the Knox gateway.
2. 3rd party tools
3rd party tools will specify where to locate their files/jars.
3. Edge node
If you need files (typically jars) for a client interface to the cluster, place them on the edge node and run the client there.
If you simply want to archive files (e.g. POC work), you can do so on the edge node's local file system.
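For example, a Spark application jar kept on the edge node can be submitted to the cluster from there. A minimal sketch, assuming YARN as the cluster manager; the class name and jar path are illustrative assumptions:

  # run from the edge node; the application jar lives on its local filesystem
  spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --class com.example.MyApp \
    /home/myuser/jars/my-spark-app.jar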
4. HDFS
If you are archiving files on the edge node but it does not have high availability or backup (e.g. auto-replication of mounts) and you want that protection, putting them into HDFS is a good idea, since each block is replicated 3x by default.
When putting files into HDFS, there is no specifying a name node or data node from the client's perspective: you interact with the NameNode, and it stores the data on the DataNodes. The NameNode is your interface to the DataNodes.
In HDFS, you could define a path like /misc and store these files there. You can also manage read-write permissions on this folder.
You can manage files (make dir, put file, get file) in HDFS through the command line (the edge node is a good host for this) or the Ambari Files view, as sketched below.
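Here is a minimal sketch using the standard hdfs dfs commands; the /misc path and file name are just examples:

  # create the directory and set read-write permissions on it
  hdfs dfs -mkdir /misc
  hdfs dfs -chmod 775 /misc
  # copy a file from the edge node's local filesystem into HDFS
  hdfs dfs -put poc-results.csv /misc/
  # list the directory and retrieve the file later
  hdfs dfs -ls /misc
  hdfs dfs -get /misc/poc-results.csv .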
See: http://hortonworks.com/hadoop-tutorial/using-commandline-manage-files-hdfs/
http://hortonworks.com/blog/best-practices-in-hdfs-authorization-with-apache-ranger/
Created 11-07-2016 01:07 PM
Can I create a Linux file system to store all the files in? And which node can I use it on? Thanks
Created 11-07-2016 01:51 PM
If you do anything with the Linux file system, it should be on the edge node only. See the fuller answer above.
Created 11-07-2016 01:48 PM
Hello Greg,
Thanks for your answers. I'm not talking about data files that I can store in HDFS, but files like application jars (jars for Spark applications) or Teradata-generated files.
Thanks
Created 11-07-2016 01:53 PM
As mentioned in the previous comment, you should only store files on the local file system of the edge node. You should never use the actual cluster (master and data nodes) for local file storage. The fuller answer gives the benefit of HDFS if you are worried about automatic backup of files. (I have seen edge nodes go down and everything lost; so either have automatic backup, or put files you want backed up into HDFS.)
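As a rough sketch of that backup idea (the directory names are assumptions), you could periodically copy an edge node directory into HDFS, e.g. from cron:

  # copy the edge node's archive directory into HDFS, overwriting older copies
  hdfs dfs -mkdir -p /backup/edge-archive
  hdfs dfs -put -f /home/myuser/archive/* /backup/edge-archive/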
Created 11-07-2016 02:16 PM
ok! thanks very much.
Created 11-07-2016 02:26 PM
If you feel you have everything you need, let me know by accepting the answer; otherwise, feel free to wait for additional answers or follow up with more questions.