How to change SparkR working directory? How to find default SparkR working directory on HDFS?
Labels: Apache Hadoop, Apache Spark
Created ‎06-09-2016 02:54 PM
Hello,
I know these questions seem very basic, but there appears to be a discrepancy between the HDFS structure in my SparkR session and what I see in Ambari. In SparkR, the default working directory is "/usr/hdp/2.4.0.0-169/spark". But in Ambari I don't see /usr, only /user, which does contain a /spark directory; however, that only holds an empty /.sparkStaging directory.
I have tried to change the working directory with setwd(), but if I just pass a directory path as a string, e.g. "/user/", it throws an error: cannot change working directory. The only directory I seem to be able to change to is /tmp.
I could include more details, but I think I am missing something basic here, and clearing it up will probably answer lots of other questions. Help please?
Thanks
Aidan
Created ‎06-13-2016 12:35 PM
Hello,
Thanks everyone. As it turned out, some Ambari features were in maintenance mode, which meant there actually was a discrepancy between the discoverable folder structures. Turning off maintenance mode and rebooting did the trick!
Thanks
Aidan
Created ‎06-09-2016 03:04 PM
I believe the working directory is on the local filesystem, i.e. under /usr. The /user directory, on the other hand, is an HDFS location that holds each user's home directory, and Spark uses it as a staging area. You need to point setwd() at a local path rather than an HDFS path.
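To make the distinction concrete, here is a minimal sketch (the file path is a placeholder, and it assumes a SparkR shell where sc and sqlContext are already initialized): setwd() only changes the local working directory on the node running R, while HDFS paths are passed straight to Spark's read functions.
> setwd("/tmp")                                             # local filesystem path; setwd() works here
> getwd()                                                   # [1] "/tmp"
> df <- read.df(sqlContext, "/user/hdfs/somefile", "text")  # HDFS path (placeholder); read via Spark, not setwd()
> head(df)
In other words, setwd("/user/") fails because /user exists only in HDFS, not on the local disk of the machine running SparkR.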
Created ‎06-11-2016 12:42 PM
Below is an example I used to access HDFS from SparkR.
-bash-4.1$ hadoop fs -ls /user/hdfs/passwd
-rw-r--r--   3 hdfs hdfs       2296 2016-06-09 16:29 /user/hdfs/passwd
-bash-4.1$
-bash-4.1$ SparkR
> sqlContext <- sparkRSQL.init(sc)
> people <- read.df(sqlContext, "/user/hdfs/passwd", "text")
> head(people)
If you created the Hive table in a non-default location, use the command below to see the underlying HDFS location.
hive> desc extended tablename;
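As a rough sketch of how the pieces fit together (the table name, location, and storage format below are placeholders; use whatever desc extended actually reports for your table):
hive> desc extended mytable;
# look for the "location:" field in the output, e.g. hdfs://namenode:8020/data/mydb/mytable

-bash-4.1$ SparkR
> hiveContext <- sparkRHive.init(sc)
> tbl <- read.df(hiveContext, "/data/mydb/mytable", "orc")   # path and format taken from desc extended
> head(tbl)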
To access Hive through SparkR:
-bash-4.1$ SparkR
> hiveContext <- sparkRHive.init(sc)
> sql(hiveContext, "CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
> sql(hiveContext, "LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")
# Queries can be expressed in HiveQL.
> results <- sql(hiveContext, "FROM src SELECT key, value")
# results is now a DataFrame
> head(results)
Created ‎06-13-2016 11:57 AM
Hi @Aidan Condron, did you try the steps above?
Created ‎06-09-2016 03:10 PM
You can change other elements of the default configuration by modifying spark-env.sh. You can change the following (see the example fragment after this list):
- SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports
- SPARK_WORKER_CORES, to set the number of cores to use on this machine
- SPARK_WORKER_MEMORY, to set how much memory to use (for example 1000MB, 2GB)
- SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT, to use non-default worker ports
- SPARK_WORKER_INSTANCES, to set the number of worker processes per node
- SPARK_WORKER_DIR, to set the working directory of worker processes
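For example, a spark-env.sh fragment might look like the following (the values are purely illustrative, not recommendations for your cluster):
# spark-env.sh -- example values only
export SPARK_WORKER_CORES=4               # cores to use on this machine
export SPARK_WORKER_MEMORY=2g             # memory per worker
export SPARK_WORKER_INSTANCES=1           # worker processes per node
export SPARK_WORKER_DIR=/var/spark/work   # working directory of worker processes
The worker processes pick these settings up when they are (re)started.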
Created ‎06-10-2016 11:24 AM
Thanks guys, but those answers aren't quite on point. I suppose the real question is how to access HDFS through SparkR. For example, I know hive tables are accessible, but if they are not in the default /apps/warehouse/ location, how do I find and read them? Thanks a million!
