HDFS can be accessed from R by pointing at a NameNode host directly (e.g. hdfs://&lt;hostname&gt;:&lt;port&gt;/user/test), but that URI is tied to a single host, so it stops working when the NameNode fails over. Instead, we should set the Hadoop configuration directory in the R environment so that the HDFS client resolves the logical HA nameservice and handles NameNode failover automatically.
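The logical nameservice is defined in hdfs-site.xml inside that configuration directory. For reference, a minimal HA client configuration for the nameservice used in this article's example (HDFS-HA) typically looks like the sketch below; the NameNode hostnames are placeholders, and the real values come from your cluster's config.

<property>
  <name>dfs.nameservices</name>
  <value>HDFS-HA</value>
</property>
<property>
  <name>dfs.ha.namenodes.HDFS-HA</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.HDFS-HA.nn1</name>
  <value>namenode1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.HDFS-HA.nn2</name>
  <value>namenode2.example.com:8020</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.HDFS-HA</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>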

Set the Spark home and the Hadoop config directories in the R environment as below:

# set up SPARK_HOME
Sys.setenv(SPARK_HOME = "/usr/hdp/current/spark-client")
# set up the Hadoop config dirs so the HA nameservice is resolved
Sys.setenv(YARN_CONF_DIR = "/usr/hdp/current/hadoop-client/conf")
Sys.setenv(HADOOP_CONF_DIR = "/usr/hdp/current/hadoop-client/conf")

# load the SparkR package bundled with Spark
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)

# initialize the Spark context (spark-csv is pulled in for CSV reads)
sc <- sparkR.init(sparkPackages = "com.databricks:spark-csv_2.11:1.0.3")
sqlContext <- sparkRSQL.init(sc)

# read a data file through the logical HA nameservice instead of a NameNode host
people <- read.df(sqlContext, "hdfs://HDFS-HA/users/people.json", "json")
head(people)
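Since the context was started with the spark-csv package, CSV files on the HA nameservice can be read the same way. A minimal sketch, assuming a hypothetical file /users/people.csv with a header row:

# read a CSV file via spark-csv; the path is a hypothetical example
people_csv <- read.df(sqlContext, "hdfs://HDFS-HA/users/people.csv",
                      source = "com.databricks.spark.csv",
                      header = "true", inferSchema = "true")
head(people_csv)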