
Pushing files to HDFS on a cluster, and how to connect to Cloudera from a remote machine that is not part of the cluster

New Contributor

Hi,

I have a Linux box on which I configured an HDFS client; it can connect and put files into the HDFS directories.

Note that the machine is not part of the cluster.

 

1. Is it possible to run Hive queries through the Hive CLI from this client machine? If so, what configuration do I need to do?

2. I need one clarification on pushing a file to HDFS. If I push the file into a Hive table directory at the location below, and it is a clustered environment, will there be any side effect? As of now mine is a single-node cluster, so I am using the commands below.

hdfs dfs -rm -r /user/hive/warehouse/schema.db/table
hdfs dfs -copyFromLocal -f /scratch/mano/table.dat /user/hive/warehouse/schema.db/table

 

Thanks in advance.

Manoj

1 ACCEPTED SOLUTION

Expert Contributor

Hi Manoj,

 

Question 1: Yes, it is possible to install the client packages on a separate machine and connect to the Hadoop cluster. Follow this link (https://docs.cloudera.com/documentation/enterprise/5-3-x/topics/cm_mc_client_config.html), which has detailed information on how to download and deploy the client configuration files manually.
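For example, a minimal sketch of deploying a downloaded client configuration on a gateway machine might look like the following; the archive name, its internal layout, and the target paths are assumptions and will differ per cluster:

# Hedged sketch: deploy a client configuration archive downloaded from
# Cloudera Manager onto a machine that is not part of the cluster.
# Archive name, layout, and paths are assumptions; adjust to your environment.
unzip hive-clientconfig.zip -d /tmp/client-conf
sudo mkdir -p /etc/hadoop/conf /etc/hive/conf
sudo cp /tmp/client-conf/*/core-site.xml /etc/hadoop/conf/
sudo cp /tmp/client-conf/*/hdfs-site.xml /etc/hadoop/conf/
sudo cp /tmp/client-conf/*/hive-site.xml /etc/hive/conf/
# Sanity checks: list HDFS, then start the Hive CLI.
hdfs dfs -ls /
hive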

 

Question 2: If you push a file from local to HDFS, a normal HDFS write operation is performed (the file is split into blocks and replicated according to the configured replication factor). Check out this link (https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html) for detailed information.
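As a quick way to confirm this (a sketch, reusing the paths from your example), you can inspect the replication factor and block locations after the copy:

# Hedged sketch: verify that a file pushed with copyFromLocal went through the
# normal HDFS write path; the paths are taken from the question above.
hdfs dfs -copyFromLocal -f /scratch/mano/table.dat /user/hive/warehouse/schema.db/table
hdfs dfs -stat "replication=%r size=%b" /user/hive/warehouse/schema.db/table/table.dat
hdfs fsck /user/hive/warehouse/schema.db/table -files -blocks -locations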

 

Let's Hadoop

Rajkumar


2 REPLIES


New Contributor

Hi MRajkumar,

I added the machine to the cluster,

but I am interested in knowing: if I don't add the machine to the cluster and only set up the client configuration for HDFS (I can already reach HDFS and update any file), can I also configure Hive so that I can run Hive queries from the command line, like the ones below?

USE schemaName;
LOAD DATA INPATH '${HDFS_LOAD_DIR}/table.csv' OVERWRITE INTO TABLE table;
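One possible route (a sketch only, not something verified here; the HiveServer2 host, port, user, and load directory below are placeholders) would be Beeline from the client machine:

# Hedged sketch: run the same statements from the remote client via Beeline,
# connecting to HiveServer2. Host, port, user, and load directory are placeholders.
export HDFS_LOAD_DIR=/user/mano/load
beeline -u "jdbc:hive2://hiveserver2-host:10000/schemaName" -n mano \
  -e "LOAD DATA INPATH '${HDFS_LOAD_DIR}/table.csv' OVERWRITE INTO TABLE table;"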

 

My 2nd question is: when I used HDFS client commands and placed the data file directly under the Hive warehouse table directory, I can see the same data when I query it in Hue. Note that mine is a single-node cluster, meaning only one DataNode is present. What if it is a multi-node cluster with many DataNodes? In that case, if I run the commands below, will the change be propagated and the file replicated across all the DataNodes?

hdfs dfs -rm -r $TABLE_PATH/*
hdfs dfs -copyFromLocal -f $dat/csvfilepath $TABLE_PATH

 

My $TABLE_PATH is the warehouse directory + schemaname.db + /tablename:

$TABLE_PATH=/user/hive/warehouse/schema1.db/table

Thank you.