Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

Newbie questions: changing the Thrift server port; difference between Hive and Thrift server; connecting to data in the HDP file system from my Windows R session.

New Member

I have many questions, as I have been fiddling with the Sandbox as a Hadoop newbie. Starting with the more basic ones first:

I have seen that from the CLI/shell one can view `/usr/hdp/current/spark2-thriftserver/conf/hive-site.xml` or `/usr/hdp/current/spark2-client/conf/hive-site.xml` and find the listed Thrift Server port (10016) under the port property. Is this the efficient/preferred way to do this?

Further, I am doing this so that I can use an ODBC Spark SQL connection to connect to a visualization tool, Spotfire.

I have successfully connected to the Hive tables in HiveServer2 from Spotfire on my laptop at port 10000 by downloading the Apache Hive connector. Now I am hoping to do the same with the Spark ODBC driver; any hints or advice?

I am a newbie to HDP and just trying to learn to work with data in the Hadoop file system, but frankly I don't know what the reason is to prefer one connector over the other, other than that I'd like to be able to connect with the different methods. (I am an R user and also succeeded in getting the Hive tables into R with ODBC connectors; anything I can do in R running on my laptop I can use with Spotfire, which is what I currently use for analytics.) A discussion/answer on this point would be much appreciated.

Then there are some more challenging things I'd like to do. I understand that I can install R on the HDP Sandbox and carry out computations there (I have seen the SparkR airline-delay-prediction tutorial), but if I can connect to the data in HDP HDFS from outside the sandbox, I can start leveraging R's power with the Spotfire client's built-in R engine on data from the Hadoop file system. (Apparently Spotfire Server has many more data access/connectivity options, but I don't have access to Spotfire Server.) With that in mind, some of the things I am trying to get to are:

  1. With SparkR from an R session running on my Windows laptop, how can I use (CSV) files in HDFS on HDP to construct a SparkDataFrame, either via a Hive table in HiveServer2 or some other way? I can only think of extracting the data I am interested in from HDP and then turning it into a SparkDataFrame to analyze with the SparkR library in the Windows R session. But is there an option to connect to a remote Spark cluster in the HDP Sandbox?
  2. Is 'Livy for Spark2 Server' something I should get familiar with first for my purpose of accessing data from outside the sandbox? Here is a reference from the sparklyr package that alludes to this possibility: https://spark.rstudio.com/deployment.html

Thanks. I don't know how naive my questions are, but bear with me; any clarification or attempt at one will be really appreciated.

Best

1 ACCEPTED SOLUTION

Master Mentor

@Balsher Singh

The easiest way is to find the port using the Ambari UI:

Login to Ambari UI --> Spark2 --> Configs (Tab) --> Advanced (Sub Tab) --> Advanced spark2-hive-site-override
(OR)
Login to Ambari UI --> Spark --> Configs (Tab) --> Advanced (Sub Tab) --> Advanced spark-hive-site-override


The default Spark Thrift server port is 10015 (for Spark2 10016). To specify a different port, you can navigate to the hive.server2.thrift.port setting in the "Advanced spark-hive-site-override" category of the Spark configuration section and update the setting with your preferred port number.
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_spark-component-guide/content/config-sts...
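To answer the original question about checking the port from the CLI: you can also read the port straight out of hive-site.xml with standard shell tools. Below is a minimal sketch; the sample file here stands in for `/usr/hdp/current/spark2-thriftserver/conf/hive-site.xml` (the path and port value are taken from the question above), so on a real sandbox you would point the same commands at the actual file.

```shell
# Create a sample hive-site.xml fragment to search; on a real sandbox,
# run the grep/sed against the actual hive-site.xml instead.
cat > /tmp/hive-site-sample.xml <<'EOF'
<configuration>
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10016</value>
  </property>
</configuration>
EOF

# Find the <name> element for the Thrift port and print the <value>
# that follows it on the next line.
grep -A1 '<name>hive.server2.thrift.port</name>' /tmp/hive-site-sample.xml \
  | sed -n 's/.*<value>\([0-9]*\)<\/value>.*/\1/p'
```

This prints 10016 for the sample file.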


You can also use the Ambari API to find the port with a curl call as follows:

# curl -u admin:admin -i -H 'X-Requested-By: ambari' -X GET "http://localhost:8080/api/v1/clusters/Sandbox/configurations?type=spark2-hive-site-override"
(OR)
# curl -u admin:admin -i -H 'X-Requested-By: ambari' -X GET "http://localhost:8080/api/v1/clusters/Sandbox/configurations?type=spark-hive-site-override"


The above command will list the various tags. You need to use the latest tag ID (like "tag=version1509830820763") and then run the command with that tag ID as follows:

# curl -u admin:admin -H 'X-Requested-By: ambari' -X GET "http://localhost:8080/api/v1/clusters/Sandbox/configurations?type=spark2-hive-site-override&tag=version1509830820763"
{
  "href" : "http://localhost:8080/api/v1/clusters/Sandbox/configurations?type=spark2-hive-site-override&tag=version1509830820763",
  "items" : [
    {
      "href" : "http://localhost:8080/api/v1/clusters/Sandbox/configurations?type=spark2-hive-site-override&tag=version1509830820763",
      "tag" : "version1509830820763",
      "type" : "spark2-hive-site-override",
      "version" : 2,
      "Config" : {
        "cluster_name" : "Sandbox",
        "stack_id" : "HDP-2.6"
      },
      "properties" : {
        "hive.metastore.client.connect.retry.delay" : "5",
        "hive.metastore.client.socket.timeout" : "1800",
        "hive.server2.enable.doAs" : "false",
        "hive.server2.thrift.port" : "10016",
        "hive.server2.transport.mode" : "binary"
      }
    }
  ]
}
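If you only want the port number out of that JSON response, you can pipe the curl output through a small sed filter. The sketch below runs the filter against a saved fragment of the response shown above (it assumes no live Ambari server is reachable); on a running sandbox you would pipe the curl command itself into the same sed expression.

```shell
# A fragment of the Ambari API response from above, standing in for live curl output.
response='"properties" : {
  "hive.server2.enable.doAs" : "false",
  "hive.server2.thrift.port" : "10016",
  "hive.server2.transport.mode" : "binary"
}'

# Pull out just the value of hive.server2.thrift.port.
port=$(printf '%s\n' "$response" \
  | sed -n 's/.*"hive\.server2\.thrift\.port" *: *"\([0-9]*\)".*/\1/p')
echo "$port"    # prints 10016
```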


NOTE: Please make sure that you put the whole URL inside quotation marks, as it contains an & symbol.

Another option is to use configs.sh; you can find the port by running the following command on the Ambari Server host:

For Spark2

#  /var/lib/ambari-server/resources/scripts/configs.sh -u admin -p admin get localhost Sandbox spark2-hive-site-override

OUTPUT
--------
USERID=admin
PASSWORD=admin
########## Performing 'GET' on (Site:spark2-hive-site-override, Tag:version1509830820763)
"properties" : {
"hive.metastore.client.connect.retry.delay" : "5",
"hive.metastore.client.socket.timeout" : "1800",
"hive.server2.enable.doAs" : "false",
"hive.server2.thrift.port" : "10017",
"hive.server2.transport.mode" : "binary"
}


For the old Spark:

#  /var/lib/ambari-server/resources/scripts/configs.sh -u admin -p admin get localhost Sandbox spark-hive-site-override

OUTPUT
--------
USERID=admin
PASSWORD=admin
########## Performing 'GET' on (Site:spark-hive-site-override, Tag:INITIAL)
"properties" : {
"hive.metastore.client.connect.retry.delay" : "5",
"hive.metastore.client.socket.timeout" : "1800",
"hive.server2.enable.doAs" : "false",
"hive.server2.thrift.port" : "10015",
"hive.server2.transport.mode" : "binary"
}
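A grep filter works for configs.sh too, so you don't have to scan the whole properties dump by eye. The sketch below filters a saved copy of the output shown above (again assuming no running Ambari server); on a live sandbox you would pipe the configs.sh command itself into the same grep.

```shell
# Saved configs.sh output from above, standing in for a live run, e.g.:
# /var/lib/ambari-server/resources/scripts/configs.sh -u admin -p admin \
#   get localhost Sandbox spark-hive-site-override | grep 'hive.server2.thrift.port'
output='"properties" : {
"hive.metastore.client.connect.retry.delay" : "5",
"hive.server2.thrift.port" : "10015",
"hive.server2.transport.mode" : "binary"
}'

# Show only the Thrift port line.
printf '%s\n' "$output" | grep 'hive.server2.thrift.port'
```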



NOTE: In the above commands, replace "Sandbox" with your HDP cluster name and "localhost" with your Ambari server hostname.

