Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

Newbie questions: changing the Thrift server port; difference between Hive and Thrift server; connecting to data in the HDP file system from my Windows R session.

New Member

I have many questions, as I have been fiddling with the Sandbox as a Hadoop newbie. Starting with the more basic ones first:

I have seen that from the CLI/shell one can view `/usr/hdp/current/spark2-thriftserver/conf/hive-site.xml` or `/usr/hdp/current/spark2-client/conf/hive-site.xml` and find the listed Thrift Server port (10016) under the port property. Is this the efficient/preferred way to do this?

Further, I am doing this so that I can use an ODBC Spark SQL connection to connect to a visualization tool, Spotfire.

I have successfully connected to the Hive tables in HiveServer2 from Spotfire on my laptop at port 10000 by downloading the Apache Hive connector. Now I am hoping to do the same with the Spark ODBC driver; any hints or advice?

I am a newbie to HDP and just trying to learn to work with data in the Hadoop file system, but frankly I don't know what the reason is to prefer one connector over the other, other than that I'd like to be able to connect with the different methods. (I am an R user and also succeeded in getting the Hive tables into R with ODBC connectors; anything I can do in R running on my laptop I can use with Spotfire, which is what I currently use for analytics.) A discussion/answer on this point would be much appreciated.

Then there are some more challenging things I'd like to do. I understand that I can install R on the HDP Sandbox and carry out computations there (I have seen the SparkR airline-delay-prediction tutorial), but if I can connect to the data in HDP HDFS from outside the sandbox, I can start leveraging R's power with the Spotfire client's built-in R engine on data from the Hadoop file system. (Apparently Spotfire Server has many more data access/connectivity options, but I don't have access to Spotfire Server.) With that in mind, some of the things I am trying to get to are:

  1. With SparkR from an R session running on my Windows laptop, how can I use (CSV) files in HDFS on HDP to construct a SparkDataFrame, either via a Hive table in HiveServer2 or some other way? I can only think of extracting the data I am interested in from HDP and then turning it into a SparkDataFrame to analyze with the SparkR library in the Windows R session. But is there an option to connect to a remote Spark cluster in the HDP Sandbox?
  2. Is 'Livy for Spark2 Server' something I should get familiar with first for my purpose of accessing data from outside the sandbox? Here is a reference from the sparklyr package that alludes to this possibility: https://spark.rstudio.com/deployment.html

Thanks. I don't know how naive my questions are, but bear with me; any clarification or attempt at one will be really appreciated.

Best

1 ACCEPTED SOLUTION

Master Mentor

@Balsher Singh

The easiest way is to find the port using the Ambari UI:

Login to Ambari UI --> Spark2 --> Configs (Tab) --> Advanced (Sub Tab) --> Advanced spark2-hive-site-override
(OR)
Login to Ambari UI --> Spark --> Configs (Tab) --> Advanced (Sub Tab) --> Advanced spark-hive-site-override


The default Spark Thrift server port is 10015 (for Spark2 10016). To specify a different port, you can navigate to the hive.server2.thrift.port setting in the "Advanced spark-hive-site-override" category of the Spark configuration section and update the setting with your preferred port number.
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_spark-component-guide/content/config-sts...
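To answer the original question about checking the port from the CLI: you can also read the port straight out of hive-site.xml with standard shell tools. Below is a minimal sketch; the sample file here stands in for `/usr/hdp/current/spark2-thriftserver/conf/hive-site.xml` (the path and port value are taken from the question above), so on a real sandbox you would point the same commands at the actual file.

```shell
# Create a sample hive-site.xml fragment to search; on a real sandbox,
# run the grep/sed against the actual hive-site.xml instead.
cat > /tmp/hive-site-sample.xml <<'EOF'
<configuration>
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10016</value>
  </property>
</configuration>
EOF

# Find the <name> element for the Thrift port and print the <value>
# that follows it on the next line.
grep -A1 '<name>hive.server2.thrift.port</name>' /tmp/hive-site-sample.xml \
  | sed -n 's/.*<value>\([0-9]*\)<\/value>.*/\1/p'
```

This prints 10016 for the sample file.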


You can also use the Ambari API to find the port with a curl call as follows:

# curl -u admin:admin -i -H 'X-Requested-By: ambari' -X GET "http://localhost:8080/api/v1/clusters/Sandbox/configurations?type=spark2-hive-site-override"
(OR)
# curl -u admin:admin -i -H 'X-Requested-By: ambari' -X GET "http://localhost:8080/api/v1/clusters/Sandbox/configurations?type=spark-hive-site-override"


The above command will list the various tags. You need to use the latest tag ID (like "tag=version1509830820763") and then run the command with that tag ID as follows:

# curl -u admin:admin -H 'X-Requested-By: ambari' -X GET "http://localhost:8080/api/v1/clusters/Sandbox/configurations?type=spark2-hive-site-override&tag=version1509830820763"
{
  "href" : "http://localhost:8080/api/v1/clusters/Sandbox/configurations?type=spark2-hive-site-override&tag=version1509830820763",
  "items" : [
    {
      "href" : "http://localhost:8080/api/v1/clusters/Sandbox/configurations?type=spark2-hive-site-override&tag=version1509830820763",
      "tag" : "version1509830820763",
      "type" : "spark2-hive-site-override",
      "version" : 2,
      "Config" : {
        "cluster_name" : "Sandbox",
        "stack_id" : "HDP-2.6"
      },
      "properties" : {
        "hive.metastore.client.connect.retry.delay" : "5",
        "hive.metastore.client.socket.timeout" : "1800",
        "hive.server2.enable.doAs" : "false",
        "hive.server2.thrift.port" : "10016",
        "hive.server2.transport.mode" : "binary"
      }
    }
  ]
}
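If you only want the port number out of that JSON response, you can pipe the curl output through a small sed filter. The sketch below runs the filter against a saved fragment of the response shown above (it assumes no live Ambari server is reachable); on a running sandbox you would pipe the curl command itself into the same sed expression.

```shell
# A fragment of the Ambari API response from above, standing in for live curl output.
response='"properties" : {
  "hive.server2.enable.doAs" : "false",
  "hive.server2.thrift.port" : "10016",
  "hive.server2.transport.mode" : "binary"
}'

# Pull out just the value of hive.server2.thrift.port.
port=$(printf '%s\n' "$response" \
  | sed -n 's/.*"hive\.server2\.thrift\.port" *: *"\([0-9]*\)".*/\1/p')
echo "$port"    # prints 10016
```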


NOTE: Please make sure that you put the whole URL inside quotation marks, as it contains an & symbol.

Another option is to use configs.sh; you can find the port by running the following command on the Ambari Server host:

For Spark2

#  /var/lib/ambari-server/resources/scripts/configs.sh -u admin -p admin get localhost Sandbox spark2-hive-site-override

OUTPUT
--------
USERID=admin
PASSWORD=admin
########## Performing 'GET' on (Site:spark2-hive-site-override, Tag:version1509830820763)
"properties" : {
"hive.metastore.client.connect.retry.delay" : "5",
"hive.metastore.client.socket.timeout" : "1800",
"hive.server2.enable.doAs" : "false",
"hive.server2.thrift.port" : "10017",
"hive.server2.transport.mode" : "binary"
}


For the old Spark:

#  /var/lib/ambari-server/resources/scripts/configs.sh -u admin -p admin get localhost Sandbox spark-hive-site-override

OUTPUT
--------
USERID=admin
PASSWORD=admin
########## Performing 'GET' on (Site:spark-hive-site-override, Tag:INITIAL)
"properties" : {
"hive.metastore.client.connect.retry.delay" : "5",
"hive.metastore.client.socket.timeout" : "1800",
"hive.server2.enable.doAs" : "false",
"hive.server2.thrift.port" : "10015",
"hive.server2.transport.mode" : "binary"
}
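A grep filter works for configs.sh too, so you don't have to scan the whole properties dump by eye. The sketch below filters a saved copy of the output shown above (again assuming no running Ambari server); on a live sandbox you would pipe the configs.sh command itself into the same grep.

```shell
# Saved configs.sh output from above, standing in for a live run, e.g.:
# /var/lib/ambari-server/resources/scripts/configs.sh -u admin -p admin \
#   get localhost Sandbox spark-hive-site-override | grep 'hive.server2.thrift.port'
output='"properties" : {
"hive.metastore.client.connect.retry.delay" : "5",
"hive.server2.thrift.port" : "10015",
"hive.server2.transport.mode" : "binary"
}'

# Show only the Thrift port line.
printf '%s\n' "$output" | grep 'hive.server2.thrift.port'
```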



NOTE: In the above commands, replace "Sandbox" with your HDP cluster name and "localhost" with your Ambari server hostname.

