
R installation on Spark cluster running on Cloudera 5.9 failing


Contributor

Hi,

I have a 3-node cluster running Cloudera 5.9 on CentOS 6.7.

My Spark is running on YARN. I need to install R in the Spark home directory so that I can use SparkR.

I got the EPEL RPM and then tried to install R using yum, but it gives an error. I tried some other RPMs as well, but they give errors too. Using the --skip-broken option does not work either. Please help.

[root@LnxMasterNode01 spark]# rpm -Uvh http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm

Retrieving http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm

warning: /var/tmp/rpm-tmp.XuRVi8: Header V3 RSA/SHA256 Signature, key ID 0608b895: NOKEY

Preparing... ########################################### [100%]

1:epel-release ########################################### [100%]

The last part of the output shows the errors below:

[root@LnxMasterNode01 spark]# yum install R
Loaded plugins: fastestmirror, security
Setting up Install Process
Loading mirror speeds from cached hostfile
 * epel: ftp.osuosl.org
Resolving Dependencies
--> Running transaction check
---> Package R.i686 0:2.13.0-2.el6.rf will be updated
---> Package R.x86_64 0:3.3.2-2.el5 will be an update
--> Processing Dependency: libRmath-devel = 3.3.2-2.el5 for package: R-3.3.2-2.el5.x86_64
--> Processing Dependency: R-devel = 3.3.2-2.el5 for package: R-3.3.2-2.el5.x86_64
--> Running transaction check
.
.
---> Package ppl.x86_64 0:0.10.2-11.el6 will be installed
---> Package texlive-texmf-errata-dvips.noarch 0:2007-7.1.el6 will be installed
---> Package texlive-texmf-errata-fonts.noarch 0:2007-7.1.el6 will be installed
--> Finished Dependency Resolution
Error: Package: R-core-3.3.2-2.el5.x86_64 (epel)
           Requires: libtk8.4.so()(64bit)
Error: Package: R-core-3.3.2-2.el5.x86_64 (epel)
           Requires: libtcl8.4.so()(64bit)
Error: Package: R-core-3.3.2-2.el5.x86_64 (epel)
           Requires: libgssapi.so.2(libgssapi_CITI_2)(64bit)


Error: Package: R-core-3.3.2-2.el5.x86_64 (epel)
           Requires: libRblas.so()(64bit)
Error: Package: libRmath-3.3.2-2.el5.x86_64 (epel)
           Requires: libgssapi.so.2(libgssapi_CITI_2)(64bit)
Error: Package: libRmath-3.3.2-2.el5.x86_64 (epel)
           Requires: libgssapi.so.2()(64bit)
Error: Package: R-core-3.3.2-2.el5.x86_64 (epel)
           Requires: libgssapi.so.2()(64bit)
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest
[root@LnxMasterNode01 spark]# 

I also checked http://hortonworks.com/hadoop-tutorial/a-lap-around-apache-spark/ and http://www.jason-french.com/blog/2013/03/11/installing-r-in-linux/; these two links also suggest the same thing. Am I doing something wrong?

Please suggest.

Thanks,

Shilpa

4 REPLIES

Re: R installation on Spark cluster running on Cloudera 5.9 failing

Contributor

This issue is resolved.

I guess I had previously installed the wrong EPEL release package (an EL5 one) on this machine. So to resolve it, I did:

[root@LnxMasterNode01 spark]# yum clean all

[root@LnxMasterNode01 spark]# yum install epel-release

[root@LnxMasterNode01 spark]# yum install R
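For reference, the dependency errors in the original post were pulling el5 packages onto an el6 box, which is exactly what a mismatched epel-release causes. A minimal sketch of an up-front check (the two strings below are sample values taken from this thread; on a live system they would come from `cat /etc/redhat-release` and `rpm -q epel-release`):

```shell
# Sketch: verify the installed epel-release matches the OS major version
# before installing R. The two sample strings below come from this thread;
# on a real system use the commented commands instead.
os_release="CentOS release 6.7 (Final)"      # normally: $(cat /etc/redhat-release)
epel_pkg="epel-release-5-4.noarch"           # normally: $(rpm -q epel-release)

# First number in the release string is the OS major version.
os_major=$(echo "$os_release" | grep -o '[0-9][0-9]*' | head -1)
# Third dash-separated field of the package name is the EPEL major version.
epel_major=$(echo "$epel_pkg" | cut -d- -f3)

if [ "$os_major" = "$epel_major" ]; then
    echo "epel-release matches EL$os_major"
else
    echo "mismatch: OS is EL$os_major but epel-release is for EL$epel_major"
fi
```

With the sample values above this prints the mismatch branch, matching the el5-on-el6 symptom in the yum log.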

Now I am able to run R, but I cannot see it in my Spark home directory, nor does spark/bin have sparkR.

[root@LnxMasterNode01 spark]# ll
total 36276
drwxr-xr-x 3 root root     4096 Oct 21 05:00 assembly
drwxr-xr-x 2 root root     4096 Oct 21 05:00 bin
drwxr-xr-x 2 root root     4096 Oct 21 05:00 cloudera
lrwxrwxrwx 1 root root       15 Nov 25 16:01 conf -> /etc/spark/conf
-rw-r--r-- 1 root root    12232 Jan  4 16:20 epel-release-5-4.noarch.rpm
drwxr-xr-x 3 root root     4096 Oct 21 05:00 examples
drwxr-xr-x 2 root root     4096 Oct 21 05:08 lib
-rw-r--r-- 1 root root    17352 Oct 21 05:00 LICENSE
drwxr-xr-x 2 root root     4096 Jan  2 18:09 logs
-rw-r--r-- 1 root root    23529 Oct 21 05:00 NOTICE
drwxr-xr-x 6 root root     4096 Oct 21 05:00 python
-rw-r--r-- 1 root root 37053596 Jan  4 17:16 R-2.13.0-2.el6.rf.i686.rpm
-rw-r--r-- 1 root root        0 Oct 21 05:00 RELEASE
drwxr-xr-x 2 root root     4096 Oct 21 05:00 sbin
lrwxrwxrwx 1 root root       19 Nov 25 16:01 work -> /var/run/spark/work
[root@LnxMasterNode01 spark]#

Is it the same as SparkR? Please guide.

[root@LnxMasterNode01 ~]# R
R version 3.3.2 (2016-10-31) -- "Sincere Pumpkin Patch"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-redhat-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> q()
Save workspace image? [y/n/c]: n
[root@LnxMasterNode01 ~]# 

Thanks,

Shilpa

Re: R installation on Spark cluster running on Cloudera 5.9 failing

Rising Star

Hi @shilpa kumar, sparkR is already available in the Spark home directory:

ls -lrt /usr/hdp/current/spark-client/bin
total 36
-rwxr-xr-x 1 root root 1023 Aug 26 02:42 spark-submit
-rwxr-xr-x 1 root root 1044 Aug 26 02:42 spark-sql
-rwxr-xr-x 1 root root 2988 Aug 26 02:42 spark-shell
-rwxr-xr-x 1 root root 1026 Aug 26 02:42 sparkR
-rwxr-xr-x 1 root root 3478 Aug 26 02:42 spark-class
-rwxr-xr-x 1 root root 2333 Aug 26 02:42 run-example
-rwxr-xr-x 1 root root 3431 Aug 26 02:42 pyspark
-rwxr-xr-x 1 root root 2479 Aug 26 02:42 load-spark-env.sh
-rwxr-xr-x 1 root root 1067 Aug 26 02:42 beeline

But to use sparkR you need R installed on all your nodes; otherwise you will get the error below. As you have already installed R, it should work fine now.

./sparkR
env: R: No such file or directory
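One way to get R onto every node is a loop over the hosts running the install over ssh. A dry-run sketch: the slave hostnames are hypothetical placeholders (only LnxMasterNode01 appears in this thread), and the `echo` prints each command instead of executing it, so nothing is changed until you drop the `echo`:

```shell
# Dry-run sketch: print the install command for each node instead of
# executing it. The slave hostnames are hypothetical placeholders;
# replace them with your actual node names.
nodes="LnxMasterNode01 LnxSlaveNode02 LnxSlaveNode03"
for node in $nodes; do
    # Remove the surrounding echo to actually run the install over ssh.
    echo "ssh root@$node 'yum clean all && yum install -y epel-release R'"
done
```

This assumes passwordless root ssh between the nodes; adjust the user and hostnames to your environment.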

Re: R installation on Spark cluster running on Cloudera 5.9 failing

Contributor

Hi @nyadav,

Remember, I am using the Cloudera distribution, and here we don't have SparkR by default. I have installed R on my node but still cannot see sparkR under the $SPARK_HOME/bin directory. However, I can get to the R console. See this and let me know what you think:

[root@LnxMasterNode01 spark]# ll
total 36276
drwxr-xr-x 3 root root     4096 Oct 21 05:00 assembly
drwxr-xr-x 2 root root     4096 Oct 21 05:00 bin
drwxr-xr-x 2 root root     4096 Oct 21 05:00 cloudera
lrwxrwxrwx 1 root root       15 Nov 25 16:01 conf -> /etc/spark/conf
-rw-r--r-- 1 root root    12232 Jan  4 16:20 epel-release-5-4.noarch.rpm
drwxr-xr-x 3 root root     4096 Oct 21 05:00 examples
drwxr-xr-x 2 root root     4096 Oct 21 05:08 lib
-rw-r--r-- 1 root root    17352 Oct 21 05:00 LICENSE
drwxr-xr-x 2 root root     4096 Jan  2 18:09 logs
-rw-r--r-- 1 root root    23529 Oct 21 05:00 NOTICE
drwxr-xr-x 6 root root     4096 Oct 21 05:00 python
-rw-r--r-- 1 root root 37053596 Jan  4 17:16 R-2.13.0-2.el6.rf.i686.rpm
-rw-r--r-- 1 root root        0 Oct 21 05:00 RELEASE
drwxr-xr-x 2 root root     4096 Oct 21 05:00 sbin
lrwxrwxrwx 1 root root       19 Nov 25 16:01 work -> /var/run/spark/work
[root@LnxMasterNode01 spark]# cd bin
[root@LnxMasterNode01 bin]# ll
total 24
-rwxr-xr-x 1 root root 2857 Oct 21 05:00 load-spark-env.sh
-rwxr-xr-x 1 root root 3459 Oct 21 05:00 pyspark
-rwxr-xr-x 1 root root 2384 Oct 21 05:00 run-example
-rwxr-xr-x 1 root root 2858 Oct 21 05:00 spark-class
-rwxr-xr-x 1 root root 3026 Oct 21 05:00 spark-shell
-rwxr-xr-x 1 root root 1050 Oct 21 05:00 spark-submit
[root@LnxMasterNode01 bin]# R

R version 3.3.2 (2016-10-31) -- "Sincere Pumpkin Patch"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-redhat-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> q()
Save workspace image? [y/n/c]: n

[root@LnxMasterNode01 bin]# ./sparkR
-bash: ./sparkR: No such file or directory
[root@LnxMasterNode01 bin]#
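A note on the missing launcher: as this thread says, CDH's Spark packaging (Spark on CDH 5.9) does not ship the sparkR script, so installing R alone will not make it appear in $SPARK_HOME/bin; a stock Apache Spark tarball of the same version does include it. A quick existence check (the default path below is an assumption; point SPARK_HOME at your actual Spark home):

```shell
# Sketch: check whether this Spark installation ships the sparkR launcher.
# The default SPARK_HOME path is an assumption; override it for your install.
SPARK_HOME="${SPARK_HOME:-/usr/lib/spark}"
if [ -x "$SPARK_HOME/bin/sparkR" ]; then
    echo "sparkR launcher present in $SPARK_HOME/bin"
else
    echo "sparkR launcher missing from $SPARK_HOME/bin"
fi
```

If the launcher is missing, one workaround is to download a stock Apache Spark tarball matching your cluster's Spark version and use its bin/sparkR; treat that as an unsupported workaround on CDH, not a packaged feature.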


