02-22-2014 06:34 AM
I have a CDH4.5 cluster, and I want to upload files into it from another server (e.g. a database server).
With vanilla Hadoop and Hive, I can change the configuration files, pointing the namenode and metastore to remote services, and simply run:
dba@db-001$ hadoop fs -copyFromLocal /path/to/export.tsv
dba@db-001$ hive -e "load data local inpath '/path/to/export.tsv' into table test.my_table"
But how do I do this with CDH? Which components should I install on the other servers?
02-27-2014 02:14 PM
I think what you're describing is what we refer to as a "Gateway" machine. On a cluster under Cloudera Manager's control, we allow you to add the "Gateway" role to a machine outside the cluster. This installs the base CDH packages and deploys client configurations to that machine so that it can run regular hadoop commands like you describe and can upload files and run jobs against the cluster.
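Once the Gateway role has been added and the client configuration deployed, a quick check along these lines may help confirm that the machine can actually reach the cluster. This is only a sketch: the function name check_gateway is made up for illustration, and /etc/hadoop/conf is assumed to be where the client configuration lands (adjust HADOOP_CONF_DIR if your setup differs).

```shell
#!/bin/sh
# Sketch: sanity-check a gateway host after client configs are deployed.
# Assumption: client configs live in /etc/hadoop/conf unless
# HADOOP_CONF_DIR says otherwise.
CONF_DIR="${HADOOP_CONF_DIR:-/etc/hadoop/conf}"

# check_gateway is a hypothetical helper, not part of CDH.
check_gateway() {
    # The base CDH packages should put the hadoop client on PATH.
    if ! command -v hadoop >/dev/null 2>&1; then
        echo "hadoop client not found on PATH"
        return 1
    fi
    # The deployed client configuration should include core-site.xml.
    if [ ! -f "$CONF_DIR/core-site.xml" ]; then
        echo "no core-site.xml in $CONF_DIR"
        return 1
    fi
    # Listing the root directory shows the client can talk to the NameNode.
    hadoop fs -ls / >/dev/null && echo "gateway OK"
}
```

If check_gateway prints "gateway OK", the hadoop fs and hive commands from the original post should be usable from that machine as written.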
It sounds like your database server is already able to do this. Can you clarify the question?