Created on 05-25-2016 11:54 AM
HDFS can be accessed from R by specifying the NameNode host directly (for example, hdfs://<hostname>:/user/test), but this approach breaks when the NameNode fails over in an HA cluster. Instead, point R at the Hadoop configuration directory so it picks up the NameNode and failover settings.
Set SPARK_HOME and the Hadoop configuration directories in the R environment as shown below:
# Set SPARK_HOME so R can locate the SparkR package
Sys.setenv(SPARK_HOME = "/usr/hdp/current/spark-client")

# Point SparkR at the Hadoop/YARN configuration so the HA nameservice
# and NameNode failover settings are resolved from hdfs-site.xml
Sys.setenv(YARN_CONF_DIR = "/usr/hdp/current/hadoop-client/conf")
Sys.setenv(HADOOP_CONF_DIR = "/usr/hdp/current/hadoop-client/conf")

# Load SparkR from the Spark installation
library(SparkR, lib.loc = file.path(Sys.getenv("SPARK_HOME"), "R", "lib"))

# Initialize the Spark context and SQL context (Spark 1.x SparkR API)
sc <- sparkR.init(sparkPackages = "com.databricks:spark-csv_2.11:1.0.3")
sqlContext <- sparkRSQL.init(sc)

# Read the data file via the HA nameservice (logical name, not a single NameNode host)
people <- read.df(sqlContext, "hdfs://HDFS-HA/users/people.json", "json")
head(people)
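Because the spark-csv package is already attached through sparkPackages, the same HA nameservice path also works for CSV files. A minimal sketch, assuming a hypothetical file hdfs://HDFS-HA/users/people.csv with a header row (the path is a placeholder; adjust it to a file that exists in your cluster):

# Read a CSV file through the same HA nameservice using the spark-csv data source
peopleCsv <- read.df(sqlContext, "hdfs://HDFS-HA/users/people.csv",
                     source = "com.databricks.spark.csv", header = "true")
head(peopleCsv)

# Shut down the SparkR context when finished
sparkR.stop()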