Can anyone please give me good examples of Hive queries used with RStudio?

Rising Star
 
1 ACCEPTED SOLUTION

Contributor
@khushi kalra

Take a look at the tutorials below. They should get you a few steps further.

http://henning.kropponline.de/2014/07/13/hive-r/

First, the R/JDBC tutorial linked above (or @Sindhu's post in this thread) will get you through making a database connection. It also has a couple of lines where the author pulls data from a table and does a simple plot:

sample_08<-dbReadTable(conn,"sample_08")
plot(sample_08$sample_08.salary)
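As the column name above suggests, the data frame can come back with each column prefixed by its table name. If that gets tedious, a small helper along these lines can strip the prefix. This is only a sketch: `strip_table_prefix` is a made-up name, and the stand-in data frame (with invented values) replaces a real `dbReadTable()` result so the snippet runs without a cluster.

```r
# Hypothetical helper: drop a leading "table." prefix from every column name.
strip_table_prefix <- function(df) {
  names(df) <- sub("^[^.]+\\.", "", names(df))
  df
}

# Stand-in for the result of dbReadTable(conn, "sample_08"); values are invented.
sample_08 <- data.frame(
  sample_08.code = c("00-0000", "11-0000"),
  sample_08.salary = c(40690, 96150)
)

clean <- strip_table_prefix(sample_08)
print(names(clean))

# The plot call then becomes shorter:
plot(clean$salary)
```

The regular expression only removes text up to the first dot, so a column whose own name contains a dot keeps everything after the table prefix.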

You'll probably want to do more sophisticated SQL and plots, though.

The RJDBC documentation is here: https://cran.r-project.org/web/packages/RJDBC/index.html

To run an arbitrary query, use dbSendQuery() and dbFetch(), as in this tutorial: http://www.inside-r.org/packages/cran/DBI/docs/dbGetQuery

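# No trailing semicolon inside the query string; the Hive JDBC driver can reject it.
res <- dbSendQuery(conn, "SELECT * FROM mtcars WHERE cyl = 4")
data <- dbFetch(res)
dbClearResult(res)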

Now 'data' will have the results you can plot.
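For larger tables it is usually cheaper to let Hive do the aggregation and pull back only the summary rows. Here is a rough sketch of that idea, not a definitive recipe. The helper names `build_avg_query` and `plot_top_groups` are hypothetical, and the table and column names passed in at the bottom (`sample_08`, `description`, `salary`) are assumptions about your schema. `DBI::dbGetQuery()` sends the query, fetches the result, and clears it in one call.

```r
# Build a HiveQL aggregation query from caller-supplied names.
# The names are pasted in as-is, so only pass trusted values.
build_avg_query <- function(table, group_col, value_col, limit = 10L) {
  sprintf(
    "SELECT %s AS grp, AVG(%s) AS avg_value FROM %s GROUP BY %s ORDER BY avg_value DESC LIMIT %d",
    group_col, value_col, table, group_col, as.integer(limit)
  )
}

# Run the query on an open connection and draw a bar chart of the result.
# Columns are read by position in case the driver renames them.
plot_top_groups <- function(conn, table, group_col, value_col, limit = 10L) {
  df <- DBI::dbGetQuery(conn, build_avg_query(table, group_col, value_col, limit))
  barplot(df[[2]], names.arg = df[[1]], las = 2,
          ylab = paste("Average", value_col))
  invisible(df)
}

# Inspect the generated query without needing a cluster.
query <- build_avg_query("sample_08", "description", "salary", 5)
print(query)

# With a live connection:
# plot_top_groups(conn, "sample_08", "description", "salary", 5)
```

Keeping the query builder separate from the plotting function makes it easy to check the SQL before sending it to the cluster.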

For more sophisticated plots in R, the usual choice is the 'ggplot2' library, and there are lots of tutorials for it. The connection to what you've done with RJDBC is that the 'data' object above is a data frame, which you can use directly in building your charts. Here's one ggplot2 tutorial: http://tutorials.iq.harvard.edu/R/Rgraphics/Rgraphics.html

hist(data$some.value)
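To make the ggplot2 step concrete, here is a minimal sketch of two charts built from a data frame shaped like a query result. The `results` data frame, its column names, and its values are all invented stand-ins for what `dbFetch()` would return, so the snippet runs without a Hive connection. It assumes the ggplot2 package is installed.

```r
library(ggplot2)

# Stand-in for a data frame returned by dbFetch(); columns and values are invented.
results <- data.frame(
  job = c("A", "B", "C", "D", "E", "F"),
  salary = c(32000, 41000, 45500, 52000, 61000, 98000)
)

# Histogram of one numeric column.
p_hist <- ggplot(results, aes(x = salary)) +
  geom_histogram(bins = 5) +
  labs(x = "Salary", y = "Count")

# Horizontal bar chart, with bars ordered by value.
p_bar <- ggplot(results, aes(x = reorder(job, salary), y = salary)) +
  geom_col() +
  coord_flip() +
  labs(x = "Job", y = "Salary")

print(p_hist)
print(p_bar)
```

In RStudio the plots appear in the Plots pane. To use real data, replace `results` with the data frame fetched from Hive and adjust the column names in `aes()`.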


5 REPLIES

Contributor

There's a really simple example that uses RODBC to query Hive from R. It should work in RStudio just fine, but you might need to adjust some of the instructions, since the example targets HDInsight and your Hive environment may differ.

https://blogs.technet.microsoft.com/meacoex/2014/06/07/connecting-r-to-hdinsight-through-hive/

@khushi kalra

You can also use RJDBC as below to connect to Hive:

library("DBI")
library("rJava")
library("RJDBC")

# Collect the Hive and Hadoop client jars for the Java classpath.
hive.class.path <- list.files(path = "/usr/hdp/current/hive-client/lib", pattern = "jar", full.names = TRUE)
hadoop.lib.path <- list.files(path = "/usr/hdp/current/hive-client/lib", pattern = "jar", full.names = TRUE)
hadoop.class.path <- list.files(path = "/usr/hdp/2.4.0.0-169/hadoop", pattern = "jar", full.names = TRUE)
cp <- c(hive.class.path, hadoop.lib.path, hadoop.class.path, "/usr/hdp/2.4.0.0-169/hadoop-mapreduce/hadoop-mapreduce-client-core.jar")

# Start the JVM with that classpath, then load the Hive JDBC driver.
.jinit(classpath = cp)
drv <- JDBC("org.apache.hive.jdbc.HiveDriver", "hive-jdbc.jar", identifier.quote = "`")

# Connect to HiveServer2. Host, port, database, user, and password are from my cluster; adjust for yours.
url.dbc <- "jdbc:hive2://ironhide.hdp.local:10000/default"
conn <- dbConnect(drv, url.dbc, "hive", "redhat")

# dbConnect() may print log4j warnings like these; they only say that no logging configuration was found:
# log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
# log4j:WARN Please initialize the log4j system properly.
# log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

# List the tables in the connected database to confirm the connection works.
dbListTables(conn)

Thanks and Regards,

Sindhu

Rising Star

How would you modify this code to connect to the Hortonworks HDP Sandbox? I know you have to modify the variables url.dbc and conn, but I'm not quite sure how that would work. Could you please help?
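For a sandbox, usually only the JDBC URL and the credentials need to change. A minimal sketch, assuming HiveServer2 on the sandbox listens on port 10000 and is reachable under the host name used below. The `hive_url` helper is hypothetical, and the host, user, and password are placeholders to replace with the values from your own sandbox setup.

```r
# Hypothetical helper: assemble a HiveServer2 JDBC URL from its parts.
hive_url <- function(host, port = 10000, database = "default") {
  sprintf("jdbc:hive2://%s:%d/%s", host, as.integer(port), database)
}

# Assumed sandbox host; replace with your own host name or IP address.
url.dbc <- hive_url("sandbox.hortonworks.com")
print(url.dbc)

# With `drv` created as in the post above, and your own credentials:
# conn <- dbConnect(drv, url.dbc, "your_user", "your_password")
# dbListTables(conn)
```

If the sandbox runs in a local virtual machine with port forwarding, the host may instead be `localhost` or `127.0.0.1`. The jar paths in the classpath section may also differ from the ones in the post above.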

Rising Star

I have already made the connection, but I want to analyze the data by plotting graphs. Can you help me with that?
