Created 06-16-2016 02:52 PM
Created 06-16-2016 04:03 PM
There's a really simple example that uses RODBC to query Hive from R. It should work in RStudio just fine, but you might need to adjust some of the steps for your own Hive environment, since the example targets HDInsight.
https://blogs.technet.microsoft.com/meacoex/2014/06/07/connecting-r-to-hdinsight-through-hive/
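For comparison with the RJDBC route, here is a minimal RODBC sketch. It assumes the RODBC package is installed and that a Hive ODBC driver and a DSN are already configured on the machine. The helper name query_hive_odbc and the DSN name "HiveDSN" are hypothetical.

```r
# Minimal RODBC sketch; assumes a Hive ODBC DSN is already configured
query_hive_odbc <- function(dsn, sql, uid = "", pwd = "") {
  channel <- RODBC::odbcConnect(dsn, uid = uid, pwd = pwd)  # open the ODBC channel
  on.exit(RODBC::odbcClose(channel))                       # close it even if the query fails
  RODBC::sqlQuery(channel, sql)  # a data frame on success, error messages otherwise
}

# Usage (needs a reachable HiveServer2 behind the hypothetical DSN):
# sample_08 <- query_hive_odbc("HiveDSN", "SELECT * FROM sample_08 LIMIT 10")
```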
Created 06-16-2016 04:59 PM
You can also use RJDBC as below to connect to Hive:
library("DBI")
library("rJava")
library("RJDBC")
# Put the Hive client and Hadoop jars on the JVM classpath (HDP 2.4 layout)
hive.class.path <- list.files(path = "/usr/hdp/current/hive-client/lib", pattern = "jar", full.names = TRUE)
# As posted, this scans the same directory as hive.class.path
hadoop.lib.path <- list.files(path = "/usr/hdp/current/hive-client/lib", pattern = "jar", full.names = TRUE)
hadoop.class.path <- list.files(path = "/usr/hdp/2.4.0.0-169/hadoop", pattern = "jar", full.names = TRUE)
cp <- c(hive.class.path, hadoop.lib.path, hadoop.class.path, "/usr/hdp/2.4.0.0-169/hadoop-mapreduce/hadoop-mapreduce-client-core.jar")
.jinit(classpath = cp)
# Load the HiveServer2 JDBC driver; HiveQL quotes identifiers with backticks
drv <- JDBC("org.apache.hive.jdbc.HiveDriver", "hive-jdbc.jar", identifier.quote = "`")
url.dbc <- "jdbc:hive2://ironhide.hdp.local:10000/default"
# Connect as user "hive" with password "redhat"
conn <- dbConnect(drv, url.dbc, "hive", "redhat")
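# R prints these log4j warnings when connecting; they only mean log4j has no logging configured and do not affect the connection:
# log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
# log4j:WARN Please initialize the log4j system properly.
# log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
dbListTables(conn)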
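Because the hive.class.path and hadoop.lib.path lines scan the same directory, cp lists every Hive jar twice, and pattern="jar" matches any file name that merely contains "jar". Here is a small sketch that picks only files ending in .jar and drops duplicates. jar_classpath is a hypothetical helper, and the directories you pass depend on your HDP layout.

```r
# Collect .jar files from several directories for .jinit(classpath = ...)
jar_classpath <- function(dirs, extra = character()) {
  jars <- list.files(path = dirs, pattern = "\\.jar$", full.names = TRUE)
  unique(c(jars, extra))  # a directory listed twice adds no new entries
}

# Usage on the HDP 2.4 node from the post:
# cp <- jar_classpath(
#   c("/usr/hdp/current/hive-client/lib", "/usr/hdp/2.4.0.0-169/hadoop"),
#   extra = "/usr/hdp/2.4.0.0-169/hadoop-mapreduce/hadoop-mapreduce-client-core.jar")
# .jinit(classpath = cp)
```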
Thanks and Regards,
Sindhu
Created 07-20-2016 09:45 PM
How would you modify this code to connect to the Hortonworks HDP Sandbox? I know I have to change the url.dbc and conn variables, but I'm not quite sure how. Could you please help?
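Between clusters, usually only the parts of the HiveServer2 URL change (host, port, and database), plus the user name and password passed to dbConnect(). 10000 is the HiveServer2 default port. Here is a sketch that builds url.dbc from those parts. hive_jdbc_url is a hypothetical helper. For a sandbox, pass whatever host name or IP address your sandbox VM is reachable at, along with valid credentials for it.

```r
# Build a HiveServer2 JDBC URL of the form jdbc:hive2://<host>:<port>/<database>
hive_jdbc_url <- function(host, port = 10000L, database = "default") {
  sprintf("jdbc:hive2://%s:%d/%s", host, as.integer(port), database)
}

# The URL from the post above:
hive_jdbc_url("ironhide.hdp.local")
# [1] "jdbc:hive2://ironhide.hdp.local:10000/default"

# For a sandbox (placeholders; substitute your VM's address and credentials):
# url.dbc <- hive_jdbc_url("your-sandbox-host")
# conn <- dbConnect(drv, url.dbc, "your-user", "your-password")
```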
Created 06-16-2016 08:52 PM
I have already made the connection, but I want to analyze the data by plotting graphs. Can you help me with that?
Created 06-17-2016 01:24 PM
Take a look at this list of tutorials. They should move you forward a few more steps.
http://henning.kropponline.de/2014/07/13/hive-r/
First, this R/JDBC tutorial (or @Sindhu's post above) can get you through making a database connection. In that tutorial, a couple of lines pull data from a table and draw a simple plot:
sample_08 <- dbReadTable(conn, "sample_08")
plot(sample_08$sample_08.salary)
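Note the column name sample_08.salary. By default, HiveServer2 returns column names qualified with the table name, controlled by the hive.resultset.use.unique.column.names setting. If you would rather write sample_08$salary, here is a sketch that strips the prefix. strip_table_prefix is a hypothetical helper, and it assumes the table name itself contains no dot. The small demo data frame is a made-up stand-in for a dbReadTable() result.

```r
# Drop a leading "<table>." qualifier from each column name
strip_table_prefix <- function(df) {
  names(df) <- sub("^[^.]+\\.", "", names(df))
  df
}

# Made-up stand-in with Hive-style qualified names
demo <- data.frame(sample_08.code = c("a", "b"), sample_08.salary = c(100, 200))
names(strip_table_prefix(demo))  # returns c("code", "salary")

# With the real table:
# sample_08 <- strip_table_prefix(dbReadTable(conn, "sample_08"))
# plot(sample_08$salary)
```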
You'll probably want to do more sophisticated SQL and plots, though.
The documentation for RJDBC can be found here: https://cran.r-project.org/web/packages/RJDBC/index.html
To run an arbitrary query, use the dbSendQuery() and dbFetch() functions, as in this tutorial: http://www.inside-r.org/packages/cran/DBI/docs/dbGetQuery
res <- dbSendQuery(conn, "SELECT * FROM mtcars WHERE cyl = 4")  # no trailing ';', which the Hive JDBC driver may reject
data <- dbFetch(res)  # retrieve the rows as a data frame
Now 'data' holds the query result as a data frame that you can plot.
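dbSendQuery() leaves a result set open until it is cleared. Here is a sketch that wraps the send and fetch steps and always calls dbClearResult(). run_query is a hypothetical name. DBI's dbGetQuery() performs the same send, fetch, and clear sequence in one call, so dbGetQuery(conn, sql) is a shorter alternative.

```r
# Send a query, fetch every row, and release the result set even on error
run_query <- function(conn, sql) {
  res <- DBI::dbSendQuery(conn, sql)
  on.exit(DBI::dbClearResult(res))
  DBI::dbFetch(res, n = -1)  # n = -1 fetches all remaining rows
}

# Usage with the RJDBC connection from above:
# data <- run_query(conn, "SELECT * FROM mtcars WHERE cyl = 4")
```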
For more sophisticated plots in R, the usual choice is the ggplot2 package, and there are lots of tutorials for it. The link to what you've done with RJDBC is that the 'data' object above is a data frame, which is exactly what ggplot2 builds its charts from. Here's one ggplot2 tutorial: http://tutorials.iq.harvard.edu/R/Rgraphics/Rgraphics.html
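Here is a minimal ggplot2 sketch, assuming ggplot2 is installed. The example query reads a Hive table called mtcars, so R's built-in mtcars data stands in for 'data' here, which lets the sketch run without a cluster. With a real Hive result, the column names may carry the table prefix (for example mtcars.wt).

```r
library(ggplot2)

# Local stand-in for the Hive result: the rows that
# "SELECT * FROM mtcars WHERE cyl = 4" selects from R's built-in mtcars
data <- subset(mtcars, cyl == 4)

# Scatter plot of fuel economy against weight, one point per car
p <- ggplot(data, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(title = "4-cylinder cars", x = "Weight (1000 lbs)", y = "Miles per gallon")
print(p)
```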
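Base R also works for quick looks, for example a histogram of one column (some.value stands in for one of your column names):
hist(data$some.value)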
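On a cluster node without a display (for example when running Rscript over SSH), you can send the plot to a file instead of the screen. Here is a sketch that again uses the built-in mtcars rows as a stand-in for 'data'. The mpg column and the output file name are assumptions, so substitute your own.

```r
# Stand-in for the Hive result; replace with your own 'data'
data <- subset(mtcars, cyl == 4)

out <- file.path(tempdir(), "mpg_hist.pdf")  # hypothetical output location
pdf(out)                                      # open a PDF graphics device
h <- hist(data$mpg, main = "4-cylinder cars", xlab = "Miles per gallon")
invisible(dev.off())                          # close the device and write the file
```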