Member since: 10-09-2015
Posts: 76
Kudos Received: 33
Solutions: 11
My Accepted Solutions
Title | Views | Posted |
---|---|---|
| 4929 | 03-09-2017 09:08 PM |
| 5263 | 02-23-2017 08:01 AM |
| 1698 | 02-21-2017 03:04 AM |
| 2052 | 02-16-2017 08:00 AM |
| 1080 | 01-26-2017 06:32 PM |
12-24-2016
07:36 PM
2 Kudos
Are you running the Spark job via YARN? If so, go to the ResourceManager (RM) UI; it runs on your RM machine on port 8088. From there, find the Applications link, which lists all running applications, and navigate to the page for your application. There you will find an Application Master link that connects you to the running application master. If the job has finished, the link will instead be History, which connects you to the Spark History Server and shows the same UI for the completed app. In an HDP cluster, the Spark History Server is always running if the Spark service is installed via Ambari. Once a Spark job is running, you cannot manually change its number of executors or memory.
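If you prefer the command line, here is a minimal sketch of the same lookup using the YARN CLI (the application ID below is a placeholder; substitute your own):

```
# List running YARN applications to find your Spark job's application ID
yarn application -list -appStates RUNNING

# For a finished job, fetch the aggregated logs by application ID
yarn logs -applicationId application_1481234567890_0042
```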
12-24-2016
07:24 PM
Nice! BTW, HDP 2.5 has Livy built in; it can be found under the Spark service in Ambari.
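For anyone curious, a minimal sketch of talking to Livy's REST API with curl, assuming Livy's default port 8998 and a hypothetical host name:

```
# Start an interactive Spark session through Livy
curl -X POST -H "Content-Type: application/json" \
     -d '{"kind": "spark"}' \
     http://livy-host.example.com:8998/sessions

# Poll the state of the session that was created (session id 0 here)
curl http://livy-host.example.com:8998/sessions/0
```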
12-22-2016
10:16 PM
Without the full exception stack trace it's difficult to know what happened. If you are instantiating Hive, you may need to add hive-site.xml and the DataNucleus jars to the job, e.g.: --jars /usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar --files /usr/hdp/current/spark-client/conf/hive-site.xml
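Putting those flags together, a hedged sketch of a full submission (the application class and jar name are placeholders, not from the original question):

```
# Paths are the standard HDP 2.x spark-client locations shown above
spark-submit \
  --class com.example.MyHiveApp \
  --master yarn-client \
  --jars /usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar \
  --files /usr/hdp/current/spark-client/conf/hive-site.xml \
  my-hive-app.jar
```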
12-19-2016
01:58 AM
I am not sure whether the HBase filters that SHC provides would help here, or whether this points to more feature work needed in SHC. Could you please elaborate with some code samples?
12-15-2016
08:02 PM
As mentioned in the answer, the command line to add the package to your job is $SPARK_HOME/bin/spark-shell --packages com.databricks:spark-xml_2.10:0.4.1
To write your project code, you will also need to add this package as a dependency in your project's Maven POM (see the sketch below). If you build an uber jar for your project that includes this package, then you don't need to change your command line for submission. There are many Spark packages you can browse at spark-packages.org.
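For the Maven side, the POM dependency would look like the following, with the coordinates taken from the --packages string above:

```
<dependency>
    <groupId>com.databricks</groupId>
    <artifactId>spark-xml_2.10</artifactId>
    <version>0.4.1</version>
</dependency>
```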
12-15-2016
02:54 AM
We do not recommend writing data into HBase through SHC using SHC's default internal custom format. That format is not well defined, is mainly used for testing, and can change without remaining compatible. For storing data via SHC into HBase, please use a standard, robust format like Avro. Currently SHC supports Avro, and we plan to support others such as Phoenix types.
12-13-2016
07:24 PM
1 Kudo
If the AM timed out, then in the AM log you will find "Session timed out". If the AM crashed, you will find an exception in the AM log or some error in the AM stderr/stdout.
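Assuming YARN log aggregation is enabled, a quick way to check (the application ID is a placeholder):

```
# Pull the AM container logs and search for the timeout message
yarn logs -applicationId application_1481234567890_0042 | grep -i "Session timed out"
```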
12-12-2016
10:43 PM
Sorry about the bad builds; we are working through the automation process that builds different versions of SHC. My comment was mainly about the configuration section in the SHC README for secure clusters. That part is independent of SHC; it's just instructions on how to set up Spark to access HBase tokens.
12-12-2016
10:14 PM
1 Kudo
Please see the steps outlined here for accessing HBase securely via Spark. No code change should be needed in your app for typical use cases.
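As a rough sketch of what those steps typically amount to on the submit side (keytab path, principal, and jar name are placeholders; consult the linked steps for your HDP version):

```
# Submit with Kerberos credentials so Spark can obtain HBase delegation tokens,
# and ship hbase-site.xml so executors can reach the secure HBase cluster
spark-submit \
  --master yarn-cluster \
  --keytab /etc/security/keytabs/myuser.keytab \
  --principal myuser@EXAMPLE.COM \
  --files /etc/hbase/conf/hbase-site.xml \
  my-hbase-app.jar
```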