Member since: 09-27-2016
Posts: 73
Kudos Received: 9
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 670 | 09-15-2017 01:37 PM
 | 1233 | 09-14-2017 10:08 AM
02-07-2018
09:57 AM
Hi,

On my HDP 2.6.3 cluster, I'm trying to decommission an HBase region server from Ambari (v2.5.2), but I get an error:

resource_management.core.exceptions.ExecutionFailed: Execution of '/usr/bin/kinit -kt /etc/security/keytabs/hbase.service.keytab hbase/fr0-datalab-p23.bdata.corp@BDATA.CORP; /usr/hdp/current/hbase-master/bin/hbase --config /usr/hdp/current/hbase-master/conf -Djava.security.auth.login.config=/usr/hdp/current/hbase-master/conf/hbase_master_jaas.conf org.jruby.Main /usr/hdp/current/hbase-master/bin/draining_servers.rb add fr0-datalab-p53.bdata.corp' returned 1. Error: Could not find or load main class org.jruby.Main

After a quick analysis, I realized that the jruby jar file has been moved to the $HBASE_HOME/lib/ruby/ folder (it was located in the $HBASE_HOME/lib/ folder in the HDP 2.5.3 release). While trying to figure out how to fix this issue, I understood that the hbase script invokes the hbase.distro script, which is supposed to add the jruby jar to the classpath "only when required", meaning only when it receives "shell" or "org.jruby.Main" as the $1 argument (after the --config one). When debugging its execution, I could see that it treats "-Djava.security.auth.login.config=/usr/hdp/current/hbase-master/conf/hbase_master_jaas.conf" as $1... and therefore does not add the jruby jar to the classpath.

If I remove the -Djava... argument and manually execute "/usr/hdp/current/hbase-master/bin/hbase --config /usr/hdp/current/hbase-master/conf org.jruby.Main /usr/hdp/current/hbase-master/bin/draining_servers.rb add fr0-datalab-p39.bdata.corp", it seems to work properly (at least I cannot see any error in the terminal).

My question is pretty simple: what is the best way to fix this problem?

1. Change the way Ambari builds the command line so that the -Djava option is removed (but I'm not sure this won't break something else),
2. Update the hbase script to systematically add the jruby jar file (from its new location) to HBASE_CLASSPATH,
3. Update hbase.distro so that it checks $2 instead of $1 when deciding whether jruby is required.

...

Thanks for your advice,
Sebastien
Labels:
- Apache Ambari
- Apache HBase
01-31-2018
09:58 AM
Thanks for this article. Everything works fine, except that my Thrift server fails to behave properly after the hbase user's Kerberos ticket expires (10h in my case). Is there a way to automatically refresh/renew the ticket so that my Thrift server runs indefinitely? Thanks
01-31-2018
09:56 AM
I finally figured out what happened: if the hbase user has a valid Kerberos ticket when it starts the Thrift server, everything works fine during the ticket lifetime (10h in my case), but the Thrift server fails to behave properly after this ticket has expired... Is there a way to configure something on the Thrift server side to automatically refresh/renew the Kerberos ticket? At the moment, I must stop/restart the Thrift server every 10h... Thanks for your advice. Sebastien
01-25-2018
04:01 PM
Hi,
I use a kerberized HDP 2.6.3 cluster comprising 3 HBase master nodes (named p21, p22 and p23). On one of these master nodes (p23), I started a Thrift server following the explanations from: https://community.hortonworks.com/articles/87655/start-and-test-hbase-thrift-server-in-a-kerberised.html

Everything went fine, and when I launch the DemoClient (as the "hbase" user) from one of the 3 master nodes ("hbase org.apache.hadoop.hbase.thrift.DemoClient p23 9090 true"), it executes successfully (even when launched from the 2 master nodes that don't host the Thrift server).

However, when I try to execute some client code from an edge node (and as another, non-hbase user) to access my Thrift server, it fails to execute properly. I can see many instances of the following error in the Thrift server logs:

2018-01-25 14:12:40,051 INFO [thrift-worker-0] client.RpcRetryingCaller: Call exception, tries=10, retries=35, started=48576 ms ago, cancelled=false, msg=com.google.protobuf.ServiceException: org.apache.hadoop.hbase.exceptions.ConnectionClosingException: Call to p21/10.XXX.XXX.XXX:16000 failed on local exception: org.apache.hadoop.hbase.exceptions.ConnectionClosingException: Connection to p21/10.XXX.XXX.XXX:16000 is closing. Call id=10, waitTime=7

...and after a while, my client fails and returns the same kind of error. It's worth saying that I can successfully execute "hbase shell" queries from this edge node, so I guess my hbase-site.xml file is correct. Any idea why this error happens only for Thrift-based queries?

Thanks for your help,
Sebastien
Labels:
- Apache HBase
11-06-2017
03:19 PM
And also, should I keep using the "--files" option with hbase-site.xml on the command line, or not?
11-06-2017
08:11 AM
Thanks for your help! Just an additional question: did you have to manually copy hbase-site.xml into the $SPARK_HOME/conf folder on ALL nodes of the cluster?
10-23-2017
02:51 PM
I am in exactly the same situation (no way to reach the internet from our cluster...). Did you find any other option to make it run?
10-23-2017
02:44 PM
Hi, I'm trying to execute PySpark code with SHC (the Spark HBase connector) to read data from HBase on a secured (Kerberos) cluster. Here is a simple example I can provide to illustrate:

# readExample.py
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext()
sqlc = SQLContext(sc)

data_source_format = 'org.apache.spark.sql.execution.datasources.hbase'

catalog = ''.join("""{
    "table":{"namespace":"default", "name":"firsttable"},
    "rowkey":"key",
    "columns":{
        "firstcol":{"cf":"rowkey", "col":"key", "type":"string"},
        "secondcol":{"cf":"d", "col":"colname", "type":"string"}
    }
}""".split())

df = sqlc.read \
    .options(catalog=catalog) \
    .format(data_source_format) \
    .load()

df.select("secondcol").show()
In order to execute this properly, I successfully ran the following command line:

spark-submit --packages com.hortonworks:shc:1.0.0-1.6-s_2.10 --repositories http://repo.hortonworks.com/content/groups/public/ --files /etc/hbase/conf/hbase-site.xml --keytab=/path/to/my/keytab/myuser.keytab --principal=myuser@DOMAIN.CORP readExample.py

which I guess is strictly equivalent to:

spark-submit --master local[*] --deploy-mode client --packages com.hortonworks:shc:1.0.0-1.6-s_2.10 --repositories http://repo.hortonworks.com/content/groups/public/ --files /etc/hbase/conf/hbase-site.xml --keytab=/path/to/my/keytab/myuser.keytab --principal=myuser@DOMAIN.CORP readExample1.py

This is fine, but now I would like to execute the same thing on the cluster. I tried the following options, but both failed:

1 - client mode

spark-submit --master yarn --deploy-mode client --packages com.hortonworks:shc:1.0.0-1.6-s_2.10 --repositories http://repo.hortonworks.com/content/groups/public/ --files /etc/hbase/conf/hbase-site.xml --keytab=/path/to/my/keytab/myuser.keytab --principal=myuser@DOMAIN.CORP readExample1.py

Driver execution hangs waiting for executors, which fail to connect to HBase. Logs from an executor:

17/10/23 16:28:10 ERROR AbstractRpcClient: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:642)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$600(RpcClientImpl.java:166)
2 - cluster mode

spark-submit --master yarn --deploy-mode cluster --packages com.hortonworks:shc:1.0.0-1.6-s_2.10 --repositories http://repo.hortonworks.com/content/groups/public/ --files /etc/hbase/conf/hbase-site.xml --keytab=/path/to/my/keytab/myuser.keytab --principal=myuser@DOMAIN.CORP readExample1.py

Here, even the driver (which runs on the cluster) fails to connect to HBase. Logs from the driver:

17/10/23 14:02:18 ERROR AbstractRpcClient: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:642)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$600(RpcClientImpl.java:166)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:769)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:766)
My questions are rather simple:
- Is it possible for my driver and executors to successfully connect to HBase?
- What should I do, in addition to passing them my Kerberos keytab/principal, to make this work?

Thanks for your help
Labels:
- Apache HBase
- Apache Spark
- Apache YARN
10-19-2017
03:23 PM
Hi, I'm trying to execute Python code with SHC (the Spark HBase connector) to connect to HBase from a Python Spark-based script. Here is a simple example I can provide to illustrate:

# readExample.py
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext()
sqlc = SQLContext(sc)

data_source_format = 'org.apache.spark.sql.execution.datasources.hbase'

catalog = ''.join("""{
    "table":{"namespace":"default", "name":"firsttable"},
    "rowkey":"key",
    "columns":{
        "firstcol":{"cf":"rowkey", "col":"key", "type":"string"},
        "secondcol":{"cf":"d", "col":"colname", "type":"string"}
    }
}""".split())

df = sqlc.read \
    .options(catalog=catalog) \
    .format(data_source_format) \
    .load()

df.select("secondcol").show()
In order to execute this properly, I successfully ran the following command line:

spark-submit --packages com.hortonworks:shc:1.0.0-1.6-s_2.10 --repositories http://repo.hortonworks.com/content/groups/public/ --files /etc/hbase/conf/hbase-site.xml readExample.py

Great 🙂 Now, I would like to run this exact same example from my Jupyter notebook. After a while, I finally figured out how to pass the required "package" to Spark by adding the following cell at the beginning of my notebook:

import os
import findspark

os.environ["SPARK_HOME"] = '/usr/hdp/current/spark-client'
findspark.init('/usr/hdp/current/spark-client')
os.environ['PYSPARK_SUBMIT_ARGS'] = ("--repositories http://repo.hortonworks.com/content/groups/public/ "
                                     "--packages com.hortonworks:shc-core:1.1.1-1.6-s_2.10 "
                                     " pyspark-shell")

...But when I ran all the cells from my notebook, I got the following exception:

Py4JJavaError: An error occurred while calling o50.showString.
: org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the locations
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:312)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:151)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:59)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)
at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:295)
at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:160)
at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:155)
at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:821)
at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:193)
at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:89)
at org.apache.hadoop.hbase.client.MetaScanner.listTableRegionLocations(MetaScanner.java:343)
at org.apache.hadoop.hbase.client.HRegionLocator.listRegionLocations(HRegionLocator.java:142)
at org.apache.hadoop.hbase.client.HRegionLocator.getStartEndKeys(HRegionLocator.java:118)
at org.apache.spark.sql.execution.datasources.hbase.RegionResource$$anonfun$1.apply(HBaseResources.scala:109)
at org.apache.spark.sql.execution.datasources.hbase.RegionResource$$anonfun$1.apply(HBaseResources.scala:108)
at org.apache.spark.sql.execution.datasources.hbase.ReferencedResource$class.releaseOnException(HBaseResources.scala:77)
at org.apache.spark.sql.execution.datasources.hbase.RegionResource.releaseOnException(HBaseResources.scala:88)
at org.apache.spark.sql.execution.datasources.hbase.RegionResource.<init>(HBaseResources.scala:108)
at org.apache.spark.sql.execution.datasources.hbase.HBaseTableScanRDD.getPartitions(HBaseTableScan.scala:61)
From what I understood, this exception probably comes up because the HBase client component cannot use the right hbase-site.xml (which defines the ZooKeeper quorum, etc.). I tried to add "--files /etc/hbase/conf/hbase-site.xml" to the content of the PYSPARK_SUBMIT_ARGS environment variable, but this did not change anything... Any idea how to pass the hbase-site.xml properly? Thanks for your help
Labels:
- Apache HBase
- Apache Spark
09-15-2017
01:37 PM
In fact, I realized that I had to set these properties in the "Custom spark-defaults" section in Ambari. This way, they are written to the spark-defaults.conf configuration file and everything works fine.
09-14-2017
10:18 AM
Hi, I'm trying to set a default value for the "spark.driver.extraJavaOptions" configuration property from Ambari, in order to avoid all my users having to define it in their command-line arguments. I tried to define this property in the "Custom spark-javaopts-properties" section of the Ambari UI, but it didn't work (the property does not seem to be used anywhere) and, even worse, I am not able to find out where this property ends up. I thought it would be written to a Spark configuration file (spark-defaults.conf or another one), but I couldn't find the property anywhere... Does anyone know whether I picked the right place to define the property, and where it goes in the configuration files? Thanks for your help
Labels:
- Apache Ambari
- Apache Spark
09-14-2017
10:08 AM
I figured out what was wrong: in fact, my class has to extend Configured and implement Tool in order to parse the configuration properties from the command line. It works fine now! I even figured out that I could set the property in Ambari: the label "MR Map Java Heap Size" actually maps to the "mapreduce.map.java.opts" property, which is pretty confusing...
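For reference, here is a minimal sketch of what this looks like; the job wiring below is an assumption (the mapper/reducer setup is omitted), but it shows how ToolRunner consumes the -D options and leaves only the input/output paths in args:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class TestHttpsMR extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already contains the -D properties parsed by ToolRunner,
        // e.g. -Dmapreduce.map.java.opts="-Djavax.net.ssl.trustStore=..."
        Job job = Job.getInstance(getConf(), "wordcount-with-https-tracking");
        job.setJarByClass(TestHttpsMR.class);
        // job.setMapperClass(...); job.setReducerClass(...);  // omitted in this sketch

        // After ToolRunner has consumed the generic options, args only holds
        // the application arguments: the input and output directories.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new TestHttpsMR(), args));
    }
}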
09-13-2017
02:01 PM
Hi, I'm currently struggling with MapReduce configuration. I'm trying to implement the common "wordcount" example, but I modified the implementation so that the mappers call an HTTPS web service to track overall progress (just for the sake of demonstration).
I have to provide the mappers' JVM with a custom truststore that contains the certificate of the CA that issued the web server's certificate, and I tried to use the following syntax:

hadoop jar mycustommr.jar TestHttpsMR -Dmapreduce.map.java.opts="-Djavax.net.ssl.trustStore=/my/custom/path/cacerts -Djavax.net.ssl.trustStorePassword=mypassword" wordcount_in wordcount_out

But I systematically hit the following error: "Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory wordcount_in already exists", which indicates that the arguments are not properly parsed: it seems that -Dmapreduce.map.java.opts="-Djavax.net.ssl.trustStore=/my/custom/path/cacerts -Djavax.net.ssl.trustStorePassword=mypassword" is interpreted as an application argument (the first one) instead of being passed to the mappers' JVM.

What's wrong with this syntax? How could I override the mapreduce.map.java.opts property without disturbing the application parameters? Thanks for your help
Labels:
- Apache Hadoop
09-12-2017
02:39 PM
If I understand properly, this configuration is used by Spark to secure data exchanges between the nodes, but my use case is slightly different: my executor runs custom Java code that performs a call to an HTTPS server, and in that context the SSL handshake relies on the default truststore of the JVM instead of the one I configured with my own CA certificate... Maybe that's not possible, and the only way to achieve this is to use the properties I mentioned previously... Thanks for your help
09-11-2017
03:06 PM
1 Kudo
Hi,
I'm trying to run a Spark job in which all executors have to call a secured (HTTPS) web service on a dedicated server. During the SSL handshake, this server returns a certificate that has been signed by a private (company-specific) CA. The certificate of this CA has been added to a custom truststore (cacert) that I would like to point to in the Spark configuration, so that the executors can validate the server's certificate without any extra configuration. I know that I can pass the following option to my spark-submit command line:

--conf "spark.executor.extraJavaOptions=-Djavax.net.ssl.trustStore=<MyCaCert> -Djavax.net.ssl.trustStorePassword=<MyPassword>"

...but I would like to avoid asking all our users to do this (because they are not supposed to know where this truststore is located, nor its password). I tried to use the "ssl.client.truststore.location" property as described in https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_security/content/ch_wire-webhdfs-mr-yarn.html but it didn't change anything; apparently Spark does not use this configuration. Do you guys know how the default truststore used by Spark executors is configured? Any help will be highly appreciated 🙂 Thanks
Labels:
- Apache Spark
08-09-2017
07:35 AM
1 Kudo
End of the story: in fact, the problem was related to https://issues.apache.org/jira/browse/HADOOP-10786. I moved to hadoop-common 2.6.1 and used the AuthUtil class (http://hbase.apache.org/1.2/devapidocs/org/apache/hadoop/hbase/AuthUtil.html), and everything started to work fine 🙂 Thanks for your help
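For reference, here is a minimal sketch of how AuthUtil can be wired up; the keytab path, principal and configuration keys below are assumptions to adapt to your own environment, not the exact code I used:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.AuthUtil;
import org.apache.hadoop.hbase.ChoreService;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.ScheduledChore;

public class HBaseKerberosLogin {

    public static void startCredentialRenewal() throws IOException {
        Configuration conf = HBaseConfiguration.create();
        // Placeholder keytab/principal: adapt these to your environment.
        conf.set("hbase.client.keytab.file", "/local/home/myuser/myuser.keytab");
        conf.set("hbase.client.kerberos.principal", "myuser@mydomain.com");

        // AuthUtil builds a chore that logs in from the keytab and keeps
        // re-logging in before the ticket expires.
        ScheduledChore authChore = AuthUtil.getAuthChore(conf);
        if (authChore != null) {
            ChoreService choreService = new ChoreService("hbase-auth-renewal");
            choreService.scheduleChore(authChore);
        }
    }
}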
08-07-2017
06:19 AM
Hi Josh, thanks for your help. Unfortunately, I'm still stuck with this issue, which seems related to HBase only, not a pure Kerberos/Hadoop problem, if I understand properly. I gave a try to a "non-HBase" web service that simply displays the content of an HDFS folder, with exactly the same idea (log in to the cluster at application startup + background thread that periodically renews), and it works like a charm: I invoke the WS, which properly displays the files in the HDFS folder, then I can wait for several days without any other activity on the web application and call it again successfully. Perfect.

Then, back to my HBase example: my web service logs in at startup, creates an HBase connection and displays the name of one table. But if I wait longer than the ticket lifetime, when I invoke the web service again I face the previously mentioned warnings. According to your answer, I guess I can ignore the first ones, but the last one is probably the reason why my web service ends with a socket timeout error:

17/08/01 16:02:01 WARN ipc.AbstractRpcClient: Couldn't setup connection for myuser@mydomain.com to hbase/myserver.mydomain.com@mydomain.com ...

As you were wondering what would occur next, I waited for a couple of minutes (>10) and got the same warning sequence again and again during this period, leading to a socket timeout error on the client side (which is not acceptable...). Finally, I took a look at your last suggestion, but when I try to proceed with 'kinit -R', I get the following:

kinit: KDC can't fulfill requested option while renewing credentials

And my ticket expiration time is not updated by this command... Could it be the root cause of my problem? Thanks again
08-03-2017
07:11 AM
Hi,
I'm trying to set up web services that interact with my Hadoop/HBase kerberized cluster.
My application is deployed in a Tomcat server, and I would like to avoid recreating a new HBase connection each and every time I have to access HBase.
Similarly, I want my application to be self-sufficient, i.e. I don't want to have to run 'kinit' commands before starting up my Tomcat server.
Thus, I would like to implement a utility class in charge of managing the login operation on the cluster and the connection to HBase, but I'm struggling with "ticket expiration" issues. The first time my GetHbaseConnection() method is invoked, it properly connects to the cluster using the provided keytab and principal (via the UserGroupInformation.loginUserFromKeytab(user, keyTabPath) method) and creates a brand new HBase connection (ConnectionFactory.createConnection(conf)) => perfect. By default, the obtained ticket has a 10h lifetime (default value from the /etc/krb5.conf file), so everything seems to work fine during the first 10-hour period. Unfortunately, after this ticket has expired, my code fails with the following exception:

17/08/01 07:40:52 http-nio-8443-exec-4 WARN AbstractRpcClient:699 - Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
17/08/01 07:40:52 http-nio-8443-exec-4 ERROR AbstractRpcClient:709 - SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'. javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]

=> So I had to set up a dedicated thread that invokes the UserGroupInformation.checkTGTAndReloginFromKeytab() method on a regular basis in order to refresh the ticket.
Anyway, after a long period of inactivity (typically a whole night), when I try to invoke my web service, I can see the following warnings in my Tomcat logs:

17/08/03 08:25:28 hconnection-0x51b0ea6-shared--pool1-t51 WARN UserGroupInformation:1113 - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
17/08/03 08:25:29 hconnection-0x51b0ea6-shared--pool1-t51 WARN UserGroupInformation:1113 - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
17/08/03 08:25:30 hconnection-0x51b0ea6-shared--pool1-t51 WARN UserGroupInformation:1113 - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
17/08/03 08:25:31 hconnection-0x51b0ea6-shared--pool1-t51 WARN UserGroupInformation:1113 - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
17/08/03 08:25:35 hconnection-0x51b0ea6-shared--pool1-t51 WARN AbstractRpcClient:695 - Couldn't setup connection for myuser@mydomain.com to hbase/myserver.mydomain.com@mydomain.com
...And my call to the web service finally fails with a SocketTimeoutException... To reproduce the issue quickly, I wrote a simple Java application (outside of Tomcat) and removed the code that logs the user in to the cluster, delegating this part to an external/manual kinit operation:

1. Proceed with a 'kinit' operation outside of my Java application. This way I am able to get a "short-life" (1 minute) ticket using a custom krb5.conf file:

env KRB5_CONFIG=/local/home/myuser/mykrb5.conf kinit -kt /local/home/myuser/myuser.keytab myuser@mydomain.com

2. Then I execute my standalone Java application, which displays the name of one table in HBase on a regular basis (every 10 seconds). Note that I create a new HBase connection for every iteration; I don't try to reuse the connection at the moment:

public static void main(String[] args) throws IOException, InterruptedException {
    System.setProperty("sun.security.krb5.debug", "true");
    Configuration configuration = HBaseConfiguration.create();
    while (true) {
        // A fresh connection is created on every iteration (and never closed here).
        Connection conn = ConnectionFactory.createConnection(configuration);
        Admin admin = conn.getAdmin();
        TableName[] tableNames = admin.listTableNames();
        System.out.println(tableNames[0].getNameWithNamespaceInclAsString());
        Thread.currentThread().sleep(10000);
    }
}
It works perfectly for 1 minute, but then I face endless warnings and my code does not execute properly:

17/08/01 16:01:55 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
17/08/01 16:01:57 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
17/08/01 16:01:59 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
17/08/01 16:02:00 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
17/08/01 16:02:01 WARN ipc.AbstractRpcClient: Couldn't setup connection for myuser@mydomain.com to hbase/myserver.mydomain.com@mydomain.com ...
I don't understand how Kerberos ticket expiration and the HBase connection work together; could anyone help on this topic? In other words, I would like my application to log in to the cluster when it starts up and create an HBase connection that I can keep "forever". Is that possible? What did I miss? Thanks for your help
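For completeness, here is a minimal sketch of the approach I describe above (keytab login at startup plus a background relogin task); the keytab path, principal and renewal period are placeholder assumptions:

import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.security.UserGroupInformation;

public class HBaseConnectionHolder {

    public static Connection createLongLivedConnection() throws IOException {
        Configuration conf = HBaseConfiguration.create();

        // Initial login from the keytab (placeholder principal/path).
        UserGroupInformation.setConfiguration(conf);
        UserGroupInformation.loginUserFromKeytab("myuser@mydomain.com",
                "/local/home/myuser/myuser.keytab");

        // Background task that re-logs in from the keytab before the TGT expires.
        // The 5-minute period is an arbitrary illustration value.
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }, 5, 5, TimeUnit.MINUTES);

        // The connection is created once and can be reused by callers.
        return ConnectionFactory.createConnection(conf);
    }
}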
Labels:
- Apache HBase
07-06-2017
03:01 PM
Hi, I just started to configure the Knox gateway in order to enforce the HTTPS protocol and require authentication to access our cluster UIs and services.
I gave a try to the YARN UI, following the https://knox.apache.org/books/knox-0-9-0/user-guide.html#Yarn+UI documentation, but faced several problems related to wrong URLs.

First, the "main" UI works as expected using the following URL:
https://knoxserver:8443/gateway/{cluster}/yarn => it properly displays the list of all the YARN applications, and if I click on one application in this screen, it displays the details about this application (the application type is "MAPREDUCE").

Next, if I click on the "Logs" link for the first attempt of the application, my browser complains about the URL, because this URL is something like "http://https//knoxserver:8443/gateway/pam/yarn/nodemanager/node/containerlogs/container_e70_1499097971447_0001_01_000001/...".

Similarly, if I click on the "History" link (next to the "Tracking URL:" label), my browser displays the details about the M/R job, but all the links in this page are wrong because they are missing "gateway/{cluster}/yarn" in their URL. As an example, here is the URL behind the "1" link for successful Map operations: https://knoxserver:8443/jobhistory/attempts/job_1499097971447_0001/m/SUCCESSFUL

It seems to be a problem with the way these pages are built? Is there a way to fix or work around this problem with "rewrite" rules in the Knox service configuration? I don't think so, but as I'm a beginner with Knox, I'd rather ask you guys.
Thanks for your help
Labels:
- Apache Knox
- Apache YARN
07-06-2017
06:25 AM
Hi Josh, you are right. I tried using another principal for my client, to match the realm of the principal used by PQS, and it works fine now... Thanks a lot for your help
07-05-2017
10:02 AM
I'm trying to use Phoenix Query Server on my kerberized cluster. I tried to connect to it with the provided thin client tool, without any success:

/sqlline-thin.py http://myserver.fqdn:8765

This results in the following error:
...
17/07/05 11:49:51 DEBUG auth.HttpAuthenticator: Authentication succeeded
17/07/05 11:49:51 DEBUG conn.DefaultManagedHttpClientConnection: http-outgoing-0: Close connection
17/07/05 11:49:51 DEBUG execchain.MainClientExec: Connection discarded
17/07/05 11:49:51 DEBUG conn.PoolingHttpClientConnectionManager: Connection released: [id: 0][route: {}->http://fr0-datalab-p31.bdata.corp:8765][total kept alive: 0; route allocated: 0 of 25; total allocated: 0 of 100]
java.lang.RuntimeException: Failed to execute HTTP Request, got HTTP/403
at org.apache.calcite.avatica.remote.AvaticaCommonsHttpClientSpnegoImpl.send(AvaticaCommonsHttpClientSpnegoImpl.java:148)
at org.apache.calcite.avatica.remote.RemoteProtobufService._apply(RemoteProtobufService.java:44)
at org.apache.calcite.avatica.remote.ProtobufService.apply(ProtobufService.java:81)
at org.apache.calcite.avatica.remote.Driver.connect(Driver.java:175)
at sqlline.DatabaseConnection.connect(DatabaseConnection.java:157)
at sqlline.DatabaseConnection.getConnection(DatabaseConnection.java:203)
However, it ends up at the CLI prompt: 0: jdbc:phoenix:thin:url=http://myserver> ...but without any valid connection: as soon as I try to run a "select" statement, I get a "No current connection" message. To solve this, I tried to execute:

!connect myserver.fqdn myuser mypassword

but faced: No known driver to handle "myserver.fqdn"

It's worth saying that:
- I performed a successful "kinit" beforehand.
- PQS runs fine on myserver.fqdn and listens on port 8765 (the default one).

Any idea how I could investigate this issue further? Thanks for your help
Labels:
- Apache HBase
- Apache Phoenix
07-04-2017
01:59 PM
Hi, I'm facing a strange issue with my Knox configuration but cannot figure out what's wrong:
We have 2 instances of knox on our cluster (let's say on server1 and server2), and we have configured them for webhdfs HA. Extract of our topology file:

<topology>
  <gateway>
    ...
    <provider>
      <role>ha</role>
      <name>HaProvider</name>
      <enabled>true</enabled>
      <param>
        <name>WEBHDFS</name>
        <value>maxFailoverAttempts=3;failoverSleep=1000;maxRetryAttempts=300;retrySleep=1000;enabled=true</value>
      </param>
      ...
    </provider>
  </gateway>
  ...
  <service>
    <role>WEBHDFS</role>
    <url>http://namenode1:50070/webhdfs</url>
    <url>http://namenode2:50070/webhdfs</url>
  </service>
</topology>
The strange thing is that if we perform a WebHDFS operation through Knox on server1 (with curl):

curl -s -i -k -H "Authorization: Basic dGEtMXQ3Ny1iZGF0YS1zY2g6QW50b2luZTg3MSE=" -X GET 'https://server1:8443/gateway/pam/webhdfs/v1//user/myuser/myFile.txt?op=OPEN'

=> we get a redirect to an https URL on server1.

But if we send the same request to the server2 gateway:

curl -s -i -k -H "Authorization: Basic dGEtMXQ3Ny1iZGF0YS1zY2g6QW50b2luZTg3MSE=" -X GET 'https://server2:8443/gateway/pam/webhdfs/v1//user/myuser/myFile.txt?op=OPEN'

=> we get a redirect to an http URL on one datanode (port 1022).

I cannot find any difference between the server1 and server2 Knox configurations, so where should I look to understand how Knox redirects incoming requests to the WebHDFS service? Any help will be greatly appreciated!
Labels:
- Apache Hadoop
- Apache Knox
06-30-2017
08:23 AM
@Jay SenSharma Hi Jay, I ran your sample requests successfully, but I'm wondering whether it is possible to get all the values of the dummy metric with the REST API. In your example, you post 2 values for your dummy metric, but the GET request only returns the latest value. The documentation states that the Ambari collector only returns the latest value if startTime and endTime are not specified in the request, but even by adding those 2 parameters, I couldn't get the 2 original values... Any idea? Thanks again for your help
06-30-2017
07:20 AM
Well, nobody knows, apparently... I will give another component a try, because the Ambari Metrics documentation is definitely not at the right maturity level.
06-27-2017
02:40 PM
In other words, I would like to figure out whether all values are stored somewhere and AMS aggregates them when a client requests some metrics, or whether there is an aggregation step when AMS receives new values for a metric. Is it possible to retrieve several values for a metric in a single REST GET call? I tried to run exactly the same requests as described in https://cwiki.apache.org/confluence/display/AMBARI/Metrics+Collector+API+Specification (2 values separated by a 1s interval), but I never got 2 values back when sending my GET request... Did someone run this example successfully?
06-27-2017
01:58 PM
Hi, I'm trying to understand the Ambari Metrics API, but cannot understand what data is stored/returned by the server in the following scenario:
First, I post new metric data with 2 values, at t0 and t0 + 2s:

curl -H "Content-Type: application/json" -X POST -d '{"metrics":
[{"metricname": "MyMetric", "appid": "amssmoketestfake", "hostname":
"sandbox.hortonworks.com", "timestamp": 1498569374758, "starttime":
1498569374758, "metrics": {"1498569374758": 0.963781711428,
"1498569376758": 1432075898000}}]}'
"http://127.0.0.1:6188/ws/v1/timeline/metrics"
If I try to request this metric with a GET request, here is what AMS returns to me:

curl -H "Content-Type: application/json" -X GET
"http://127.0.0.1:6188/ws/v1/timeline/metrics?metricNames=MyMetric&appId=amssmoketestfake&hostname=sandbox.hortonworks.com"&startTime=1498569373758&endTime=1498569377758 Result :
{"metrics":[{"timestamp":1498569214725,"metadata":{},"metricname":"MyMetric","appid":"amssmoketestfake","hostname":"sandbox.hortonworks.com","starttime":1498569374758,"metrics":{"1498569376758":1.432075898E12}}]}
Why does AMS not return the 2 values for MyMetric when I explicitly define the timeframe in the GET request? I would expect to get all the values for the timestamps between the boundaries... Thanks for your help...
Labels:
- Apache Ambari
06-21-2017
12:01 PM
@Aravindan Vijayan Thanks a lot for these helpful pointers. Actually, I was not aware of this time boundary, and as soon as I changed the timestamps in my JSON data, it started to work fine!
06-20-2017
09:48 AM
I finally figured out how to connect to the Ambari Metrics tables with Phoenix: by default, sqlline.py points to the "main" HBase configuration, not the AMS embedded instance. By defining the HBASE_CONF_DIR env variable, I got it working:

export HBASE_CONF_DIR=/etc/ambari-metrics-collector/conf
/usr/hdp/current/phoenix-client/bin/sqlline.py fr0-datalab-p09.bdata.corp:61181:/ams-hbase-secure

I guess there is something similar when trying to connect to ZooKeeper, to point to the embedded instance instead of the "main" ZooKeeper of the cluster, but I couldn't solve this at the moment...
06-20-2017
07:47 AM
Hi, I've been struggling with Ambari Metrics for a couple of days and cannot figure out how to investigate further.

Basically, I have a secured (kerberized) HDP 2.5 cluster, and I would like to post custom metrics into Ambari Metrics. It's worth saying that the timeline.metrics.service.operation.mode property has the "embedded" value, which means (if I understood properly) that AMS has an embedded HBase instance with its own ZooKeeper. Let's name the server running the Ambari Metrics collector server1.mydomain.com.

I gave a try to the following request:

curl -H "Content-Type: application/json" -X POST -d '{"metrics": [{"metricname": "AMBARI_METRICS.SmokeTest.FakeMetric", "appid": "amssmoketestfake", "hostname": "server1.mydomain.com", "timestamp": 1432075898000, "starttime": 1432075898000, "metrics": {"1432075898000": 0.963781711428, "1432075899000": 1432075898000}}]}' "http://server1.mydomain.com:6188/ws/v1/timeline/metrics"

=> Returned an HTTP 200 code, with the following JSON data: {"errors":[]}

Then, I tried to retrieve this dummy metric with the following request:

curl -H "Content-Type: application/json" -X GET "http://server1.mydomain.com:6188/ws/v1/timeline/metrics?metricNames=AMBARI_METRICS.SmokeTest.FakeMetric&appId=amssmoketestfake&hostname=server1.mydomain.com"

=> Returned an HTTP 200 code, with the following JSON data: {"metrics":[]}

While trying to figure out why my metrics don't come up in this GET request, I'm facing security concerns.

First, I want to connect to Phoenix/HBase to check whether the metrics were properly stored. I checked the following properties in the /etc/ambari-metrics-collector/conf/hbase-site.xml file:
- hbase.zookeeper.property.clientPort: 61181
- zookeeper.znode.parent: /ams-hbase-secure

So I gave a try to the following command:

/usr/hdp/current/phoenix-client/bin/sqlline.py server1.mydomain.com:61181:/ams-hbase-secure

I receive the following warning every 15 seconds, and the connection never succeeds:

17/06/20 08:46:04 WARN ipc.AbstractRpcClient: Couldn't setup connection for myuser@mydomain.com to hbase/server1.mydomain.com@mydomain.com

=> Should I use a particular user to execute this command (ams, hbase, ...)? Is it even possible to connect to the embedded Phoenix instance like this?

I also tried to connect to the embedded HBase's ZooKeeper instance with the following command:

zookeeper-client -server server1.mydomain.com:61181

But I couldn't connect and received the following errors:

2017-06-20 09:30:24,306 - ERROR [main-SendThread(server1.mydomain.com:61181):ZooKeeperSaslClient@388] - An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7))]) occurred when evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state.
2017-06-20 09:30:24,306 - ERROR [main-SendThread(server1.mydomain.com:61181):ClientCnxn$SendThread@1059] - SASL authentication with Zookeeper Quorum member failed: javax.security.sasl.SaslException: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7))]) occurred when evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state.

What's wrong with this?
I performed a kinit operation beforehand, but it seems that my ticket is not granted sufficient permissions... Should I try to connect as a specific user in order to read the ZooKeeper content? Thanks for your help
Labels:
- Apache Ambari
- Apache HBase
06-19-2017
12:05 PM
@Jay SenSharma I just gave your suggestion a try, but I still cannot get the same result as you:

1 - POST the dummy metric:

curl -H "Content-Type: application/json" -X POST -d '{"metrics": [{"metricname": "AMBARI_METRICS.SmokeTest.FakeMetric", "appid": "amssmoketestfake", "hostname": "sandbox.hortonworks.com", "timestamp": 1432075898000, "starttime": 1432075898000, "metrics": {"1432075898000": 0.963781711428, "1432075899000": 1432075898000}}]}' "http://sandbox.hortonworks.com:6188/ws/v1/timeline/metrics"

Output: {"errors":[]}

2 - GET the dummy metric:

curl -H "Content-Type: application/json" -X GET "http://sandbox.hortonworks.com:6188/ws/v1/timeline/metrics?metricNames=AMBARI_METRICS.SmokeTest.FakeMetric&appId=amssmoketestfake&hostname=sandbox.hortonworks.com"

Output: {"metrics":[]}

I cannot understand why the posted metrics don't show up here...