Member since
09-27-2016
73
Posts
9
Kudos Received
2
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1227 | 09-15-2017 01:37 PM |
 | 2271 | 09-14-2017 10:08 AM |
01-31-2018
09:58 AM
Thanks for this article. Everything works fine, except that my Thrift server stops behaving properly once the hbase user's Kerberos ticket expires (after 10 hours in my case). Is there a way to automatically refresh or renew the ticket so that my Thrift server can run indefinitely? Thanks
11-06-2017
03:19 PM
Also, should I keep using the "--files" option with hbase-site.xml on the command line or not?
11-06-2017
08:11 AM
Thanks for your help! Just one additional question: did you have to manually copy hbase-site.xml into the $SPARK_HOME/conf folder on ALL nodes of the cluster?
10-23-2017
02:51 PM
I am in the exact same situation (no way to reach the internet from our cluster...). Did you find any other way to make it run?
10-19-2017
03:23 PM
Hi, I'm trying to run Python code with SHC (the Spark HBase connector) to connect to HBase from a Python Spark script. Here is a simple example to illustrate:
# readExample.py
from pyspark import SparkContext
from pyspark.sql import SQLContext
sc = SparkContext()
sqlc = SQLContext(sc)
data_source_format = 'org.apache.spark.sql.execution.datasources.hbase'
catalog = ''.join("""{
"table":{"namespace":"default", "name":"firsttable"},
"rowkey":"key",
"columns":{
"firstcol":{"cf":"rowkey", "col":"key", "type":"string"},
"secondcol":{"cf":"d", "col":"colname", "type":"string"}
}
}""".split())
df = sqlc.read\
.options(catalog=catalog)\
.format(data_source_format)\
.load()
df.select("secondcol").show()
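A side note on the catalog string: `''.join(... .split())` strips every whitespace character, including any that might sit inside a JSON value. A minimal sketch of an alternative, assuming the same table and column names as in the example above, builds the catalog from a dict with `json.dumps` (the name `catalog_dict` is only illustrative):

```python
import json

# Same catalog as in readExample.py, expressed as a Python dict.
catalog_dict = {
    "table": {"namespace": "default", "name": "firsttable"},
    "rowkey": "key",
    "columns": {
        "firstcol": {"cf": "rowkey", "col": "key", "type": "string"},
        "secondcol": {"cf": "d", "col": "colname", "type": "string"},
    },
}

# json.dumps always yields valid JSON and leaves whitespace inside values untouched.
catalog = json.dumps(catalog_dict)
```

The resulting `catalog` string can be passed to `.options(catalog=catalog)` exactly as before.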
To run this, I successfully used the following command line:
spark-submit --packages com.hortonworks:shc:1.0.0-1.6-s_2.10 --repositories http://repo.hortonworks.com/content/groups/public/ --files /etc/hbase/conf/hbase-site.xml readExample.py
Great 🙂 Now I would like to run this exact same example from my Jupyter notebook. After a while, I figured out how to pass the required package to Spark by adding the following cell at the beginning of my notebook:
import os
import findspark
os.environ["SPARK_HOME"] = '/usr/hdp/current/spark-client'
findspark.init('/usr/hdp/current/spark-client')
os.environ['PYSPARK_SUBMIT_ARGS'] = (
    "--repositories http://repo.hortonworks.com/content/groups/public/ "
    "--packages com.hortonworks:shc-core:1.1.1-1.6-s_2.10 "
    " pyspark-shell")
But when I ran all the cells of my notebook, I got the following exception:
Py4JJavaError: An error occurred while calling o50.showString.
: org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the locations
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:312)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:151)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:59)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)
at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:295)
at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:160)
at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:155)
at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:821)
at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:193)
at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:89)
at org.apache.hadoop.hbase.client.MetaScanner.listTableRegionLocations(MetaScanner.java:343)
at org.apache.hadoop.hbase.client.HRegionLocator.listRegionLocations(HRegionLocator.java:142)
at org.apache.hadoop.hbase.client.HRegionLocator.getStartEndKeys(HRegionLocator.java:118)
at org.apache.spark.sql.execution.datasources.hbase.RegionResource$$anonfun$1.apply(HBaseResources.scala:109)
at org.apache.spark.sql.execution.datasources.hbase.RegionResource$$anonfun$1.apply(HBaseResources.scala:108)
at org.apache.spark.sql.execution.datasources.hbase.ReferencedResource$class.releaseOnException(HBaseResources.scala:77)
at org.apache.spark.sql.execution.datasources.hbase.RegionResource.releaseOnException(HBaseResources.scala:88)
at org.apache.spark.sql.execution.datasources.hbase.RegionResource.<init>(HBaseResources.scala:108)
at org.apache.spark.sql.execution.datasources.hbase.HBaseTableScanRDD.getPartitions(HBaseTableScan.scala:61)
From what I understand, this exception probably occurs because the HBase client could not find the right hbase-site.xml (which defines the ZooKeeper quorum, among other things). I tried adding "--files /etc/hbase/conf/hbase-site.xml" to the PYSPARK_SUBMIT_ARGS environment variable, but this did not change anything. Any idea how to pass hbase-site.xml properly? Thanks for your help
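To check which hbase-site.xml is present where, and which quorum it declares, a small diagnostic can be run from the notebook. This is only a sketch: the helper names `find_hbase_site` and `read_hadoop_property` are hypothetical, and the candidate directories are assumptions taken from the paths mentioned in this thread.

```python
import os
import xml.etree.ElementTree as ET


def find_hbase_site(candidate_dirs):
    """Return the path of the first hbase-site.xml found in candidate_dirs, or None."""
    for directory in candidate_dirs:
        path = os.path.join(directory, "hbase-site.xml")
        if os.path.isfile(path):
            return path
    return None


def read_hadoop_property(site_xml_path, name):
    """Return the value of property `name` from a Hadoop-style *-site.xml, or None."""
    root = ET.parse(site_xml_path).getroot()
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


if __name__ == "__main__":
    # Assumed locations: the Spark client conf folder and the HBase conf folder.
    site = find_hbase_site(["/usr/hdp/current/spark-client/conf", "/etc/hbase/conf"])
    if site is None:
        print("no hbase-site.xml found in the candidate directories")
    else:
        print(site, read_hadoop_property(site, "hbase.zookeeper.quorum"))
```

This does not fix the problem by itself; it only shows whether a configuration file is visible in those folders and what ZooKeeper quorum it contains.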
Labels: Apache HBase, Apache Spark
09-15-2017
01:37 PM
In fact, I realized that I had to set these properties in the "Custom spark-defaults" section in Ambari. This way, they are written to the spark-defaults.conf configuration file and things work fine.
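One way to confirm that a property really ended up in spark-defaults.conf is to read the file back. The sketch below is a simplified reader that only handles the "key, whitespace, value" layout; the function name `read_spark_defaults` is hypothetical, and the file location in the usage note is an assumption based on the SPARK_HOME used elsewhere in this thread.

```python
def read_spark_defaults(path):
    """Parse 'key<whitespace>value' lines of a spark-defaults.conf file into a dict.

    Simplified on purpose: blank lines and '#' comments are skipped,
    and 'key=value' style lines are not handled.
    """
    props = {}
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)  # split on the first run of whitespace
            props[parts[0]] = parts[1] if len(parts) > 1 else ""
    return props
```

For example, `read_spark_defaults("/usr/hdp/current/spark-client/conf/spark-defaults.conf").get("spark.driver.extraJavaOptions")` would return the configured value, or `None` if the property was not written to that file.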
09-14-2017
10:18 AM
Hi, I'm trying to set a default value for the "spark.driver.extraJavaOptions" configuration property from Ambari, so that my users do not all have to define it in their command-line arguments. I tried to define this property in the "Custom spark-javaopts-properties" section of the Ambari UI, but it didn't work (the property does not seem to be used anywhere), and even worse, I cannot find out where this property ends up. I thought it would be written to a Spark configuration file (spark-defaults.conf or another one), but I couldn't find the property anywhere. Does anyone know whether I picked the right place to define the property, and where it goes in the configuration files? Thanks for your help
Labels: Apache Ambari, Apache Spark
09-14-2017
10:08 AM
I figured out what was wrong: my class has to extend Configured and implement Tool in order to parse the configuration properties from the command line. Works fine now! I even figured out that I could set the property in Ambari: the label "MR Map Java Heap Size" actually maps to the "mapreduce.map.java.opts" property, which is pretty confusing.
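To illustrate why this matters, here is a toy Python model of the idea, not Hadoop's actual parser: when the generic -D options are separated from the positional arguments, the application sees the input and output paths in the expected positions. The function name `split_generic_options` is hypothetical, and only the -D option is modelled.

```python
def split_generic_options(args):
    """Toy model: separate -D key=value options from positional arguments."""
    conf, remaining = {}, []
    i = 0
    while i < len(args):
        arg = args[i]
        if arg == "-D" and i + 1 < len(args):
            # Form: -D key=value (two separate tokens)
            key, _, value = args[i + 1].partition("=")
            conf[key] = value
            i += 2
        elif arg.startswith("-D") and "=" in arg:
            # Form: -Dkey=value (single token); split on the first '=' only
            key, _, value = arg[2:].partition("=")
            conf[key] = value
            i += 1
        else:
            remaining.append(arg)
            i += 1
    return conf, remaining


# Arguments as the shell hands them to the program (quotes already removed).
argv = [
    "-Dmapreduce.map.java.opts=-Djavax.net.ssl.trustStore=/my/custom/path/cacerts "
    "-Djavax.net.ssl.trustStorePassword=mypassword",
    "wordcount_in",
    "wordcount_out",
]
conf, remaining = split_generic_options(argv)
```

Without this separation, a program that reads its output path from the second raw argument picks up `wordcount_in`, which matches the "Output directory wordcount_in already exists" error in the original question.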
09-13-2017
02:01 PM
Hi, I'm currently struggling with MapReduce configuration. I'm trying to implement the common "wordcount" example, but I modified the implementation so that the mappers call an HTTPS web service to track overall progress (just for the sake of demonstration).
I have to provide the mappers' JVM with a custom truststore that contains the certificate of the CA that issued the web server's certificate, and I tried the following syntax:
hadoop jar mycustommr.jar TestHttpsMR -Dmapreduce.map.java.opts="-Djavax.net.ssl.trustStore=/my/custom/path/cacerts -Djavax.net.ssl.trustStorePassword=mypassword" wordcount_in wordcount_out
But I systematically hit the following error:
"Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory wordcount_in already exists"
This indicates that the arguments are not parsed properly: it seems that -Dmapreduce.map.java.opts="-Djavax.net.ssl.trustStore=/my/custom/path/cacerts -Djavax.net.ssl.trustStorePassword=mypassword" is interpreted as an application argument (the first one) instead of being passed to the mappers' JVM. What's wrong with this syntax? How can I override the mapreduce.map.java.opts property without disturbing the application parameters? Thanks for your help
Labels: Apache Hadoop
09-12-2017
02:39 PM
If I understand correctly, this configuration is used by Spark to secure data exchanges between the nodes, but my use case is slightly different: my executor runs custom Java code that calls an HTTPS server, and in that context the SSL handshake relies on the JVM's default truststore instead of the one I configured with my own CA certificate. Maybe that's not possible, and the only way to achieve this is to use the properties I mentioned previously. Thanks for your help