Member since: 10-09-2015
Posts: 76
Kudos Received: 33
Solutions: 11

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 5030 | 03-09-2017 09:08 PM
 | 5365 | 02-23-2017 08:01 AM
 | 1754 | 02-21-2017 03:04 AM
 | 2127 | 02-16-2017 08:00 AM
 | 1122 | 01-26-2017 06:32 PM
02-15-2017
03:01 AM
A full stack trace would help identify which interaction is causing this. If IDE-based code is being used, you could try not using the spark-assembly jar that is present on HDFS and instead use the local spark-assembly jar from the Spark build you are compiling against. This can be done by overriding the spark.yarn.jar config. It could be that the compile-time Spark dependency in your IDE differs from the runtime dependency on HDFS. Another possibility is a Scala version mismatch.
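For example, a submission that points spark.yarn.jar at the locally built assembly might look like this sketch (the assembly path, class, and jar names below are placeholders for your own build):

```bash
# Sketch only: use the spark-assembly jar from the build you compile against
# instead of the one on HDFS. Path, class, and jar names are placeholders.
spark-submit \
  --master yarn \
  --conf spark.yarn.jar=file:///opt/spark/lib/spark-assembly-1.6.2-hadoop2.7.1.jar \
  --class com.example.MyApp \
  myapp.jar
```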
02-07-2017
06:43 PM
Yes. For Spark 1.6, it is GA in HDP 2.5.3. The documentation is available on the GitHub site for the given SHC release tag; that is the source of truth.
02-01-2017
10:01 PM
1 Kudo
--files will add the files to the working directory of the YARN app master and containers, which means those files (and not jars) will be on the classpath of the app master and containers. But in client-mode jobs the main driver code runs on the client machine, so these --files are not available to the driver. SPARK_CLASSPATH adds entries to the driver classpath; it is an environment variable, so one could set the following. Note that Spark will warn that it is deprecated and cannot be used concurrently with the --driver-class-path option. More information can be found at https://github.com/hortonworks-spark/shc

export SPARK_CLASSPATH=/a/b/c:/d/e/f

where /a/b/c and /d/e/f are the directories containing hbase-site.xml and hive-site.xml (classpath entries must be directories or jars, separated by ':').
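The non-deprecated alternative for the driver side is the --driver-class-path option; a sketch for client mode (the configuration directories and application names are placeholders):

```bash
# Sketch for client mode: ship the config files to the executors with --files
# and put their directories on the driver classpath. Paths are placeholders.
spark-submit \
  --master yarn --deploy-mode client \
  --files /etc/hbase/conf/hbase-site.xml,/etc/hive/conf/hive-site.xml \
  --driver-class-path /etc/hbase/conf:/etc/hive/conf \
  --class com.example.MyApp myapp.jar
```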
01-30-2017
03:49 AM
2 Kudos
Unfortunately, that kind of functionality does not exist for Spark Streaming. A Spark Streaming application runs as a standard YARN job, and YARN commands can be used to start, stop (kill), and re-submit it. A properly written Spark Streaming job should be able to support at-least-once or exactly-once semantics through this lifecycle, but other than that there is no UI or other automation support for it. Zeppelin is designed for interactive analysis, and running Spark Streaming via Zeppelin is not recommended (other than for demos and presentations).
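For example, managing the lifecycle with the YARN CLI could look like this (the application id shown is a placeholder):

```bash
# List running applications, then kill the streaming job by its application id.
yarn application -list -appStates RUNNING
yarn application -kill application_1484000000000_0042
# Re-submitting is just another spark-submit of the same streaming application.
```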
01-26-2017
06:32 PM
This is not available in any distribution since it is a package and can be used independently. The latest 1.6 release is https://github.com/hortonworks-spark/shc/tree/v1.0.1-1.6. You can build it with the HBase version that matches your environment.
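A rough build sketch, assuming Maven and that the pom of the checked-out tag exposes an hbase.version property (verify the property name in that tag's pom.xml; the HBase version shown is a placeholder):

```bash
# Sketch: check out the 1.6 release tag and build against your HBase version.
git clone https://github.com/hortonworks-spark/shc.git
cd shc
git checkout v1.0.1-1.6
mvn clean package -DskipTests -Dhbase.version=1.1.2
```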
01-23-2017
07:54 PM
The flow seems right; that's a good use case for Livy, assuming it goes YourApp -> Livy -> Spark and back. You will need to look at the Livy client logs or the Livy server logs for session id 339. It seems like the client is asking for a session (Livy Spark job) that no longer exists. It may never have started, or it was abandoned or lost.
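As a quick check, you can ask the Livy REST API whether the session still exists (the host below is a placeholder; 8998 is Livy's default port):

```bash
# If this returns 404, the session is gone and the client must create a new one.
curl -s http://livy-host:8998/sessions/339
```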
01-23-2017
07:45 PM
1 Kudo
SHC does not have a notion of listing tables in HBase; it works on the table catalog provided to the data source in the program. Hive will also not list HBase tables because they are not present in the metastore. There is a rudimentary way to add HBase external tables in Hive, but I don't think that is really used (I could be wrong). To list HBase tables, currently the only reliable way is to use the HBase APIs inside the Spark program.
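A minimal sketch of listing tables with the HBase client API from inside a Spark program, assuming hbase-site.xml is on the classpath so the configuration points at your cluster:

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.ConnectionFactory

// Picks up hbase-site.xml from the classpath.
val hbaseConf = HBaseConfiguration.create()
val connection = ConnectionFactory.createConnection(hbaseConf)
try {
  val admin = connection.getAdmin
  // Print every table name known to the HBase cluster.
  admin.listTableNames().foreach(name => println(name.getNameAsString))
} finally {
  connection.close()
}
```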
01-23-2017
01:33 AM
Hive and HiveContext in Spark can only show the tables that are registered in the Hive metastore, and HBase tables are usually not there because the schemas of most HBase tables are not easily defined in the metastore. To read HBase tables from Spark using the DataFrame API, please consider the Spark HBase Connector (SHC).
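A minimal SHC read sketch (the table name and column mappings in the catalog are placeholders; it assumes a SQLContext is in scope, e.g. in spark-shell):

```scala
import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

// Placeholder catalog: map HBase column family/qualifier pairs to DataFrame columns.
val catalog = """{
  "table":{"namespace":"default", "name":"table1"},
  "rowkey":"key",
  "columns":{
    "col0":{"cf":"rowkey", "col":"key", "type":"string"},
    "col1":{"cf":"cf1", "col":"col1", "type":"string"}
  }
}"""

val df = sqlContext.read
  .options(Map(HBaseTableCatalog.tableCatalog -> catalog))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .load()
df.show()
```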
01-23-2017
01:30 AM
1 Kudo
In HDP 2.5, the Zeppelin JDBC interpreter can run one query per paragraph. There is a limit of 10 queries that can be run simultaneously overall.
01-17-2017
08:33 PM
First of all, which Spark version are you using? Apache Spark 2.0 has support for automatically acquiring HBase security tokens correctly for the job and all its executors. Apache Spark 1.6 does not have that feature, but in HDP Spark 1.6 we have backported it, so it can acquire HBase tokens for jobs.

The tokens are acquired automatically if 1) security is enabled, 2) hbase-site.xml is present on the client classpath, and 3) that hbase-site.xml has Kerberos security configured. Then HBase tokens for the HBase master specified in that hbase-site.xml are acquired and used in the job. In order to obtain the tokens, the Spark client needs to use HBase code, so specific HBase jars need to be present on the client classpath. This is documented on the SHC GitHub page; search for "secure" on that page.

To access HBase inside the Spark jobs, the job obviously needs the HBase jars to be present for the driver and/or executors. That would be part of your existing job submission for non-secure clusters, which I assume already works. If this job is going to be long-running and run beyond the token expiry time (typically 7 days), then you need to submit the Spark job with the --keytab and --principal options so that Spark can use the keytab to re-acquire tokens before the current ones expire.
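A submission sketch for a long-running job on a Kerberized cluster (the principal, keytab, config paths, and jar locations are placeholders; use the HBase jars that match your HDP version, per the SHC documentation):

```bash
# Sketch only: principal, keytab, paths, and the jar list are placeholders.
spark-submit \
  --master yarn --deploy-mode cluster \
  --principal user@EXAMPLE.COM \
  --keytab /etc/security/keytabs/user.keytab \
  --files /etc/hbase/conf/hbase-site.xml \
  --jars /usr/hdp/current/hbase-client/lib/hbase-client.jar,/usr/hdp/current/hbase-client/lib/hbase-common.jar,/usr/hdp/current/hbase-client/lib/hbase-server.jar,/usr/hdp/current/hbase-client/lib/hbase-protocol.jar \
  --class com.example.MyHBaseApp myapp.jar
```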