Posted 11-04-2017 09:05 PM
I have many questions, as I have been fiddling with the Sandbox as a Hadoop newbie. Starting with the more basic ones first:
I have seen that from the CLI/shell one can view `/usr/hdp/current/spark2-thriftserver/conf/hive-site.xml` or `/usr/hdp/current/spark2-client/conf/hive-site.xml` and find the Thrift Server port (10016) listed under the port property. Is this the efficient/preferred way to do this?
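For context, a minimal shell sketch of checking that property without opening the whole file (this assumes the standard `hive.server2.thrift.port` property name; the inlined sample file below is only there to make the snippet self-contained — on the sandbox you would point `CONF` at the real `hive-site.xml` path instead):

```shell
# On the sandbox this would be e.g.
# CONF=/usr/hdp/current/spark2-thriftserver/conf/hive-site.xml
# Here we fabricate a tiny hive-site.xml so the snippet runs anywhere.
CONF=$(mktemp)
cat > "$CONF" <<'EOF'
<configuration>
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10016</value>
  </property>
</configuration>
EOF

# Print the line after the matching <name>, then strip it down to the digits.
PORT=$(grep -A1 'hive.server2.thrift.port' "$CONF" | grep '<value>' | grep -o '[0-9]\+')
echo "$PORT"
```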
Further, I am trying this so I can use an ODBC Spark SQL connection to connect a visualization tool, Spotfire.
I have successfully connected to the Hive tables in HiveServer2 from Spotfire on my laptop at port 10000 by downloading the Apache Hive connector; now I am hoping to do the same with the Spark ODBC driver. Any hints or advice?
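To frame the question, my understanding is that the Spark ODBC driver (Simba-based) takes a DSN roughly like the following — every value here is an assumption for the sandbox (host, the 10016 port from above, auth settings) and should be checked against the driver's own install guide; on Windows the same keys are set through the ODBC Data Source Administrator GUI rather than a file:

```
[SparkSandbox]
Driver=/path/to/the/spark/odbc/driver   ; library path or driver name per the install guide
Host=localhost                          ; sandbox IP or forwarded hostname (assumption)
Port=10016                              ; Spark Thrift Server port found in hive-site.xml
SparkServerType=3                       ; value meaning "Spark Thrift Server" per Simba docs
AuthMech=3                              ; user name + password (assumption for the sandbox)
UID=hive
```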
I am a newbie to HDP and just trying to learn to work with data in the Hadoop file system, but frankly I don't know what the reason is to prefer one connector over the other, other than that I'd like to be able to connect with the different methods. (I am an R user and succeeded in getting the Hive tables into R as well with ODBC connectors; anything I can do in R running on my laptop I can use with Spotfire, which is what I currently use for analytics.) A discussion/answer on this point will be much appreciated.
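What the ODBC route from R looks like on my end, roughly (a sketch using the DBI and odbc packages; the DSN name `HiveSandbox` and the table name are hypothetical):

```r
library(DBI)

# Connect through a pre-configured ODBC DSN ("HiveSandbox" is hypothetical).
con <- dbConnect(odbc::odbc(), dsn = "HiveSandbox")

# Hive tables can then be queried like any other DBI source:
df <- dbGetQuery(con, "SELECT * FROM some_table LIMIT 10")  # hypothetical table

dbDisconnect(con)
```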
Then there are some more challenging things I'd like to do. I understand that I can install R on the HDP sandbox and carry out computations there (I have seen the SparkR "predicting airline delays" tutorial), but if I can connect to the data in HDP's HDFS from outside the sandbox, I can start leveraging R's power with the Spotfire client's built-in R engine against data from the Hadoop file system. (Apparently Spotfire Server has many more data access/connectivity options, but I don't have access to Spotfire Server.) With that in mind, some of the things I am trying to get to are:
With SparkR from an R session running on a Windows laptop, how can I use (csv) files in HDFS on HDP to construct a SparkDataFrame, either using a Hive table in HiveServer2 or some other way? I can only think of extracting the data I am interested in from HDP and then making it into a SparkDataFrame to carry out analysis with the SparkR library in a Windows R session. But is there such an option as connecting to a remote Spark cluster in the HDP sandbox?
And "Livy for Spark2 Server": is this something I should be getting familiar with first for my purpose of accessing data from outside the sandbox? Here is a reference from the sparklyr package that alludes to this possibility: https://spark.rstudio.com/deployment.html
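That sparklyr deployment page does describe exactly this case: connecting from a local R session to a remote cluster through Livy, with no local Spark install. A minimal sketch of what I gather it would look like — the hostname, Livy's default port 8999, the Spark version string, and the table name are all assumptions to be checked against the sandbox:

```r
library(sparklyr)

# Connect through Livy rather than a local Spark installation.
# Host/port are assumptions: use whatever "Livy for Spark2 Server"
# actually listens on in the sandbox.
sc <- spark_connect(
  master  = "http://sandbox-hdp.hortonworks.com:8999",
  method  = "livy",
  version = "2.1.0"   # match the sandbox's Spark 2 version
)

# Once connected, Hive tables should be visible as remote data sources:
library(dplyr)
tbl(sc, "some_hive_table") %>%   # hypothetical table name
  head() %>%
  collect()

spark_disconnect(sc)
```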
Thanks. I don't know how naive my questions are, but bear with me; any clarification or attempt at one will be really appreciated.
Best
Labels:
- Apache Hive
- Apache Spark