How to remotely connect to HDP2.4VM with RStudio using R?

Explorer

Hi,

I have experience using Hive with Ambari. However, I would like to use Hive on the HDP 2.4 VM from RStudio using R. Simply put, I want to connect to the Hortonworks VM remotely using R. If anyone has done this, could you please tell me how, and point me to any online documentation on how to accomplish it? I would also appreciate any tips on setting up the dependencies if you did this using RHive.

Thanks, Heath

14 REPLIES

@Heath Yates

Take a look at the posting below. It lists all the dependencies as well as setup instructions (not all steps will apply to you, though).

http://www.rdatamining.com/big-data/r-hadoop-setup-guide

Explorer

Sure. Will you be around tonight in case I have questions? I'll try this in an hour or two, after my commute home and dinner. I actually got RStudio working on the HDP sandbox. Do you think that will make things simpler? At that point things are local and hopefully simpler. I do hope to be able to access a Hadoop/Hive server remotely eventually, but I hope doing things locally will simplify this problem.

@Heath Yates

Link for the RStudio commercial Pro version:

https://www.rstudio.com/products/rstudio/download-commercial/

Pro will work for 45 days without a license.

Download Server:

https://www.rstudio.com/products/rstudio/download-server/

Documentation:

https://s3.amazonaws.com/rstudio-server/rstudio-server-pro-0.99.903-admin-guide.pdf

URL for connecting remotely:

http://<SandBox IP>:8787/auth-sign-in

Explorer

Going to try HADOOP_HOME and HIVE_HOME paths soon with the tutorial I found. I will let you know if it works. In the meantime, could you please tell me how you found that path information? I am willing to learn and appreciate the time you have taken to reply.

I installed RStudio in my production environment and have been managing it for two years.

Explorer

I am getting the error '/root/RHive/usr/lib/hive/lib does not exist' when I do ant build in the ~/Rhive directory. Please see tutorial here for details. I am stuck on step 4.

Here is the link for RHive

https://github.com/nexr/RHive

Just FYI..RHive & rhdfs both are same.

Explorer

Just kidding, the error has not been resolved. I will mark yours as the answer if I can get ant to build RHive or get Hive working in R. 🙂

Sure. If this is what you wanted, please vote for the response and accept it as the best answer.

Explorer

I am still getting the error for step 4. You mentioned rhdfs and RHive are both the same. Which is newer, and which one should I be using in R then? I am just trying to get Hive functionality in my R scripts. Thanks. 🙂

Explorer

I think we are close? The error states 'BUILD FAILED' at '/root/Rhive/build.xml:39: /root/RHive/usr/hdp/current/hive-server2/lib does not exist'. Shouldn't it be looking in /usr/hdp/current and not in /root/Rhive? I am not sure what build.xml is doing. 😞
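One possible reading of this error, offered as an assumption rather than a confirmed diagnosis: Ant resolves relative paths against the project directory, so if HIVE_HOME was set without a leading slash (usr/hdp/current/hive-server2), the build would look under /root/RHive. The shell sketch below checks that a *_HOME variable holds an absolute path and contains a lib directory. check_home is a hypothetical helper, not part of RHive or HDP, and whether build.xml actually reads these variables has not been verified here.

```shell
#!/bin/sh
# Sketch: sanity-check a *_HOME variable before running ant.
# check_home is a hypothetical helper, not part of RHive or HDP.
check_home() {
    name=$1
    value=$2
    case "$value" in
        /*) ;;  # starts with a slash, so it is an absolute path
        *)  echo "$name is not an absolute path: '$value'"
            return 1 ;;
    esac
    if [ ! -d "$value/lib" ]; then
        echo "$name has no lib directory: $value/lib"
        return 1
    fi
    echo "$name looks usable: $value"
    return 0
}

# Report on both variables; "|| true" keeps going after a failed check.
check_home HIVE_HOME "${HIVE_HOME:-}" || true
check_home HADOOP_HOME "${HADOOP_HOME:-}" || true
```

If the check reports a relative path, re-exporting the variable with a leading slash (for example /usr/hdp/current/hive-server2, the location mentioned in the error) before running ant again would be the thing to try.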

You can try the tar.gz file from the link below if you have an issue with build.xml.

https://cran.r-project.org/src/contrib/Archive/RHive/

R CMD INSTALL RHive_2.0-0.10.tar.gz

Explorer

I'm not sure if this will fix the problem. I will try, but I am curious as to why you suggest this. I think the original question I asked here has been resolved, but this ant build problem merited a separate question, which I have asked here.

Explorer

Here is a broken tutorial. I think we need the updated HADOOP_HOME and HIVE paths. Can someone please help? See tutorial for details.