Member since 01-21-2016 | 66 Posts | 44 Kudos Received | 5 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1698 | 03-29-2017 11:14 AM |
| | 1476 | 03-27-2017 10:01 AM |
| | 2477 | 02-29-2016 10:00 AM |
| | 9077 | 01-28-2016 08:26 AM |
| | 2985 | 01-22-2016 03:55 PM |
03-21-2016
03:56 PM
1 Kudo
@Rahul Pathak - yes, I have. As I said, they don't seem to work...
03-21-2016
03:35 PM
1 Kudo
Hello, I am using the HDP 2.3.4 sandbox, which has been Kerberized. I have loaded some data with struct and array types into Hive, where the schema looks like this:

exampletable
|-- listOfPeople: array (nullable = false)
| |-- element: struct (containsNull = true)
| | |-- Name: string (nullable = false)
| | |-- id: integer (nullable = false)
| | |-- Email: string (nullable = false)
| | |-- holiday: array (nullable = false)
| | | |-- element: integer (containsNull = true)
|-- departmentName: string (nullable = false)
I am trying to run this query in Hive View:

SELECT explode(listofpeople.name) AS name FROM exampletable;

with these Hive View settings:

However, I am getting this:

INFO : Tez session hasn't been created yet. Opening session
ERROR : Failed to execute tez graph.
org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. Application application_1458555995218_0002 failed 2 times due to AM Container for appattempt_1458555995218_0002_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://sandbox.hortonworks.com:8088/cluster/app/application_1458555995218_0002Then, click on links to logs of each attempt.
Diagnostics: Application application_1458555995218_0002 initialization failed (exitCode=255) with output: main : command provided 0
main : run as user is hive
main : requested yarn user is hive
Requested user hive is not whitelisted and has id 504,which is below the minimum allowed 1000
Failing this attempt. Failing the application.
at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:726)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:217)
at org.apache.hadoop.hive.ql.exec.tez.TezTask.updateSession(TezTask.java:271)
at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:151)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1703)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1460)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1237)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1101)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1096)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:154)
at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:71)
at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:206)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:218)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
This thread more or less says that this is the desired behaviour, and Google suggests changing allowed.system.users in yarn-site (but that doesn't seem to work). If I just want to run the query successfully on the sandbox, what needs to be done? Or what is the best-practice solution for this? Thank you.
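The rejection in the log above can be modelled in a few lines of Java. This is only a sketch of the rule that the error message describes, not the actual YARN container-executor code: the class `LaunchUserCheck`, the method `isAllowed`, and its parameter names are hypothetical, while the values 504 and 1000 come from the log. In stock Hadoop the two settings involved are normally `min.user.id` and `allowed.system.users` in container-executor.cfg; whether Ambari on this sandbox exposes them the same way is an assumption to verify.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the check described by the error message; not Hadoop code.
final class LaunchUserCheck {

    // A user may launch containers if its uid is at or above the minimum,
    // or if its name is explicitly whitelisted.
    static boolean isAllowed(String user, int uid, int minUserId, Set<String> allowedSystemUsers) {
        return uid >= minUserId || allowedSystemUsers.contains(user);
    }

    public static void main(String[] args) {
        Set<String> none = new HashSet<>();
        Set<String> withHive = new HashSet<>(Arrays.asList("hive"));

        // Situation from the log: hive has id 504, minimum is 1000, no whitelist entry.
        System.out.println(isAllowed("hive", 504, 1000, none));     // prints false
        // Whitelisting the user lets it through without lowering the minimum.
        System.out.println(isAllowed("hive", 504, 1000, withHive)); // prints true
        // Lowering the minimum below 504 also lets it through.
        System.out.println(isAllowed("hive", 504, 500, none));      // prints true
    }
}
```

Under this model, either whitelisting hive or lowering the minimum id would satisfy the check; whitelisting is the narrower change of the two.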
Labels: Apache Ambari, Apache Hive, Apache Tez
03-18-2016
08:44 AM
1 Kudo
Can you add this to the VM options:

-Dsun.security.krb5.debug=true

and also enable a bit more logging, for example with this in log4j or equivalent:

log4j.logger.org.apache.hadoop=DEBUG

Hopefully you can learn a bit more from the enhanced log. This is what I get on a successful login:

DEBUG 2016-03-18 08:39:57,465 6788 org.apache.hadoop.hbase.ipc.AbstractRpcClient [hconnection-0x2da3fac3-metaLookup-shared--pool3-t1] RPC Server Kerberos principal name for service=ClientService is hbase/sandbox.hortonworks.com@KRB.HDP
DEBUG 2016-03-18 08:39:57,465 6788 org.apache.hadoop.hbase.ipc.AbstractRpcClient [hconnection-0x1efd2e3d-metaLookup-shared--pool4-t1] RPC Server Kerberos principal name for service=ClientService is hbase/sandbox.hortonworks.com@KRB.HDP
DEBUG 2016-03-18 08:39:57,466 6789 org.apache.hadoop.hbase.ipc.AbstractRpcClient [hconnection-0x2da3fac3-metaLookup-shared--pool3-t1] Use KERBEROS authentication for service ClientService, sasl=true
DEBUG 2016-03-18 08:39:57,466 6789 org.apache.hadoop.hbase.ipc.AbstractRpcClient [hconnection-0x1efd2e3d-metaLookup-shared--pool4-t1] Use KERBEROS authentication for service ClientService, sasl=true
DEBUG 2016-03-18 08:39:57,484 6807 org.apache.hadoop.hbase.ipc.AbstractRpcClient [hconnection-0x2da3fac3-metaLookup-shared--pool3-t1] Connecting to sandbox.hortonworks.com/10.184.26.82:16020
DEBUG 2016-03-18 08:39:57,484 6807 org.apache.hadoop.hbase.ipc.AbstractRpcClient [hconnection-0x1efd2e3d-metaLookup-shared--pool4-t1] Connecting to sandbox.hortonworks.com/10.184.26.82:16020
DEBUG 2016-03-18 08:39:57,491 6814 org.apache.hadoop.security.UserGroupInformation [hconnection-0x2da3fac3-metaLookup-shared--pool3-t1] PrivilegedAction as:spark-Sandbox@KRB.HDP (auth:KERBEROS) from:org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:734)
DEBUG 2016-03-18 08:39:57,491 6814 org.apache.hadoop.security.UserGroupInformation [hconnection-0x1efd2e3d-metaLookup-shared--pool4-t1] PrivilegedAction as:spark-Sandbox@KRB.HDP (auth:KERBEROS) from:org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:734)
DEBUG 2016-03-18 08:39:57,495 6818 org.apache.hadoop.hbase.security.HBaseSaslRpcClient [hconnection-0x2da3fac3-metaLookup-shared--pool3-t1] Creating SASL GSSAPI client. Server's Kerberos principal name is hbase/sandbox.hortonworks.com@KRB.HDP
DEBUG 2016-03-18 08:39:57,495 6818 org.apache.hadoop.hbase.security.HBaseSaslRpcClient [hconnection-0x1efd2e3d-metaLookup-shared--pool4-t1] Creating SASL GSSAPI client. Server's Kerberos principal name is hbase/sandbox.hortonworks.com@KRB.HDP
Found ticket for spark-Sandbox@KRB.HDP to go to krbtgt/KRB.HDP@KRB.HDP expiring on Sat Mar 19 08:39:55 GMT 2016
Found ticket for spark-Sandbox@KRB.HDP to go to krbtgt/KRB.HDP@KRB.HDP expiring on Sat Mar 19 08:39:55 GMT 2016
Entered Krb5Context.initSecContext with state=STATE_NEW
Entered Krb5Context.initSecContext with state=STATE_NEW
Found ticket for spark-Sandbox@KRB.HDP to go to krbtgt/KRB.HDP@KRB.HDP expiring on Sat Mar 19 08:39:55 GMT 2016
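If changing the VM options is awkward (for example inside an IDE run configuration), the same debug flag can be set from code. The sketch below assumes it runs at the very start of the program, before any Kerberos class has been loaded, because the JDK appears to read the flag only once; the class name `KerberosDebug` is hypothetical.

```java
// Hypothetical helper: turns on the JDK's Kerberos debug output from code.
final class KerberosDebug {

    // Equivalent to passing -Dsun.security.krb5.debug=true on the command line,
    // provided it is called before the first Kerberos operation.
    static void enable() {
        System.setProperty("sun.security.krb5.debug", "true");
    }

    public static void main(String[] args) {
        enable();
        System.out.println(System.getProperty("sun.security.krb5.debug")); // prints true
    }
}
```

The log4j level still has to be raised separately; this only covers the JDK-side Kerberos trace lines such as "Found ticket for ...".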
03-16-2016
02:14 PM
@Jitendra Yadav thanks, I just had a look at the JIRA. I think in this case I will need to wait until we upgrade to Spark 1.6. Thanks!
03-16-2016
01:00 PM
@Jitendra Yadav thanks for your reply. I believe these are for YARN, whereas I am trying to run with master = local[*], similar to spark-shell on the sandbox. I am using Spark 1.5.2 on HDP 2.3.4.
03-15-2016
05:27 PM
2 Kudos
Hello, I need to run a Spark (1.5.2) job in a Kerberized environment (I am currently testing on the HDP 2.3.4 sandbox). The job needs to be able to read from and write to Hive (I am using HiveContext). I am also using master = local[*], which is similar to spark-shell. I am able to do this in Spark by running kinit beforehand. However, is there any other way to authenticate programmatically within the Spark job? For example, I am able to read and write in Kerberized HDFS by running the following before the Spark code, without kinit. Is there something similar I can do for Hive?
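// The following works for HDFS, but not for Hive.
// Needs: import org.apache.hadoop.conf.Configuration;
// SERVER_PRINCIPAL_KEY and SERVER_KEYTAB_KEY are String constants (defined elsewhere)
// naming the configuration keys that hold the principal and the keytab path.
System.setProperty("java.security.krb5.conf", krb5ConfPath);
final Configuration newConf = new Configuration();
newConf.set(SERVER_PRINCIPAL_KEY, "spark-Sandbox@KRB.HDP");
newConf.set(SERVER_KEYTAB_KEY, keyTabPath);
LOG.info("Logging in now... ******************* THIS REPLACES kinit **************************");
// SecurityUtil.login looks up both values in newConf under the given keys
org.apache.hadoop.security.SecurityUtil.login(newConf, SERVER_KEYTAB_KEY, SERVER_PRINCIPAL_KEY, "sandbox.hortonworks.com");
LOG.info("Logged in !!! ******************* THIS REPLACES kinit **************************");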
Thanks in advance.

UPDATE: I have enabled a lot of logging and tracked it down to the following differences in the log. With kinit I get:

DEBUG 2016-03-16 11:12:09,557 6889 org.apache.hadoop.security.Groups [main] Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
>>> KrbCreds found the default ticket granting ticket in credential cache.
>>> Obtained TGT from LSA: Credentials:
client=spark-Sandbox@KRB.HDP
server=krbtgt/KRB.HDP@KRB.HDP
authTime=20160316111142Z
endTime=20160317111142Z
renewTill=null
flags=FORWARDABLE;INITIAL
EType (skey)=17
(tkt key)=18
DEBUG 2016-03-16 11:12:09,560 6892 org.apache.hadoop.security.UserGroupInformation [main] hadoop login
DEBUG 2016-03-16 11:12:09,561 6893 org.apache.hadoop.security.UserGroupInformation [main] hadoop login commit
DEBUG 2016-03-16 11:12:09,562 6894 org.apache.hadoop.security.UserGroupInformation [main] using kerberos user:spark-Sandbox@KRB.HDP
DEBUG 2016-03-16 11:12:09,562 6894 org.apache.hadoop.security.UserGroupInformation [main] Using user: "spark-Sandbox@KRB.HDP" with name spark-Sandbox@KRB.HDP
DEBUG 2016-03-16 11:12:09,562 6894 org.apache.hadoop.security.UserGroupInformation [main] User entry: "spark-Sandbox@KRB.HDP"
DEBUG 2016-03-16 11:12:09,565 6897 org.apache.hadoop.security.UserGroupInformation [main] UGI loginUser:spark-Sandbox@KRB.HDP (auth:KERBEROS)
DEBUG 2016-03-16 11:12:09,567 6899 org.apache.hadoop.security.UserGroupInformation [TGT Renewer for spark-Sandbox@KRB.HDP] Found tgt Ticket (hex) =
whereas logging in from code (and NO kinit) currently gives me this:

DEBUG 2016-03-16 11:09:58,902 7194 org.apache.hadoop.security.Groups [main] Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
>>>KinitOptions cache name is C:\Users\davidtam\krb5cc_davidtam
>> Acquire default native Credentials
Using builtin default etypes for default_tkt_enctypes
default etypes for default_tkt_enctypes: 17 16 23.
>>> Found no TGT's in LSA
DEBUG 2016-03-16 11:09:58,910 7202 org.apache.hadoop.security.UserGroupInformation [main] hadoop login
DEBUG 2016-03-16 11:09:58,910 7202 org.apache.hadoop.security.UserGroupInformation [main] hadoop login commit
DEBUG 2016-03-16 11:09:58,911 7203 org.apache.hadoop.security.UserGroupInformation [main] using kerberos user:null
DEBUG 2016-03-16 11:09:58,912 7204 org.apache.hadoop.security.UserGroupInformation [main] using local user:NTUserPrincipal: davidtam
DEBUG 2016-03-16 11:09:58,912 7204 org.apache.hadoop.security.UserGroupInformation [main] Using user: "NTUserPrincipal: davidtam" with name davidtam
DEBUG 2016-03-16 11:09:58,912 7204 org.apache.hadoop.security.UserGroupInformation [main] User entry: "davidtam"
DEBUG 2016-03-16 11:09:58,914 7206 org.apache.hadoop.security.UserGroupInformation [main] UGI loginUser:davidtam (auth:KERBEROS)
INFO 2016-03-16 11:09:58,931 7223 hive.metastore [main] Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083
DEBUG 2016-03-16 11:09:58,963 7255 org.apache.hadoop.security.UserGroupInformation [main] PrivilegedAction as:c009003 (auth:KERBEROS) from:org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
DEBUG 2016-03-16 11:09:58,963 7255 org.apache.thrift.transport.TSaslTransport [main] opening transport org.apache.thrift.transport.TSaslClientTransport@7c206b14
>>>KinitOptions cache name is C:\Users\davidtam\krb5cc_davidtam
>> Acquire default native Credentials
Using builtin default etypes for default_tkt_enctypes
default etypes for default_tkt_enctypes: 17 16 23.
>>> Found no TGT's in LSA
I am running on Windows, connecting to the sandbox.
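The "Found no TGT's in LSA" lines show the JVM looking for an existing ticket rather than using the keytab. For reference, this is roughly what a keytab-only login looks like at the plain JAAS level, using only JDK classes. It is a sketch under assumptions: the class `KeytabJaasConfig` and the keytab path are hypothetical, and it is not confirmed that HiveContext in Spark 1.5.2 would pick up a login done this way. On the Hadoop side, `UserGroupInformation.loginUserFromKeytab(principal, keytabPath)` is the usual call for the same purpose, but whether it solves this particular Hive case is likewise an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.AppConfigurationEntry.LoginModuleControlFlag;
import javax.security.auth.login.Configuration;

// Hypothetical helper: an in-memory JAAS configuration for a keytab login.
final class KeytabJaasConfig extends Configuration {
    private final String principal;
    private final String keytabPath;

    KeytabJaasConfig(String principal, String keytabPath) {
        this.principal = principal;
        this.keytabPath = keytabPath;
    }

    // Options understood by the JDK's Krb5LoginModule.
    static Map<String, String> options(String principal, String keytabPath) {
        Map<String, String> opts = new HashMap<>();
        opts.put("useKeyTab", "true");       // take the key from a keytab file
        opts.put("keyTab", keytabPath);
        opts.put("principal", principal);
        opts.put("storeKey", "true");
        opts.put("doNotPrompt", "true");     // never ask for a password
        opts.put("useTicketCache", "false"); // do not depend on a prior kinit
        return opts;
    }

    // Returns the same single entry for any application name.
    @Override
    public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
        return new AppConfigurationEntry[] {
            new AppConfigurationEntry(
                "com.sun.security.auth.module.Krb5LoginModule",
                LoginModuleControlFlag.REQUIRED,
                options(principal, keytabPath))
        };
    }

    public static void main(String[] args) {
        // "/path/to/spark.keytab" is a placeholder, not a real path.
        Configuration conf = new KeytabJaasConfig("spark-Sandbox@KRB.HDP", "/path/to/spark.keytab");
        AppConfigurationEntry entry = conf.getAppConfigurationEntry("anyName")[0];
        System.out.println(entry.getLoginModuleName());
        System.out.println(entry.getOptions().get("useKeyTab"));
    }
}
```

The example only builds and prints the configuration, so it runs without a KDC. An actual login would pass it to `new LoginContext("anyName", null, null, conf).login()`, which does contact the KDC.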
Labels: Apache Hive, Apache Spark
02-29-2016
10:00 AM
3 Kudos
OK, in the end I found a way to both read from and write to Phoenix in a Java Spark app:

// read
// using JDBC, which isn't the best way of doing this as there is no push-down optimization...
DataFrame dfFromHbase = SPARK_MANAGED_RESOURCE.getSparkSqlContext().read().format("jdbc")
        .options(ImmutableMap.of(
                "driver", "org.apache.phoenix.jdbc.PhoenixDriver",
                "url", "jdbc:phoenix:sandbox.hortonworks.com:2181:/hbase-unsecure",
                "dbtable", tableName))
        .load();

// write
// no column family is specified; it uses whatever has been linked up in the Phoenix table
dfICreated.write().format("org.apache.phoenix.spark")
        .mode(SaveMode.Overwrite)
        .options(ImmutableMap.of(
                "zkUrl", "sandbox:2181:/hbase-unsecure",
                "table", tableName))
        .save();
These are for the HDP 2.3.4 sandbox. I hope Hortonworks will upgrade to the latest Phoenix (4.6 or 4.7?) soon, as the read would then support push-down queries, which I don't think the JDBC driver is doing at the moment...
02-24-2016
09:06 AM
1 Kudo
I spent a few days getting this to work back then. What you need is:

jdbc:phoenix:sandbox.hortonworks.com:2181:/hbase-unsecure

No username or password is needed. And in your hosts file you need (I presume you run SQuirreL on Windows):

127.0.0.1 sandbox.hortonworks.com
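The URL above has three parts after the `jdbc:phoenix:` prefix: the ZooKeeper host, the ZooKeeper port, and the HBase root znode. A small sketch that assembles it from those parts, in case the same values need to be reused for other connections; the class `PhoenixUrl` and its method `build` are hypothetical names, not part of Phoenix.

```java
// Hypothetical helper: assembles a Phoenix JDBC URL from its three parts.
final class PhoenixUrl {

    // zkQuorum: ZooKeeper host, zkPort: ZooKeeper client port,
    // znodeParent: HBase root znode (/hbase-unsecure on the unsecured sandbox).
    static String build(String zkQuorum, int zkPort, String znodeParent) {
        return "jdbc:phoenix:" + zkQuorum + ":" + zkPort + ":" + znodeParent;
    }

    public static void main(String[] args) {
        System.out.println(build("sandbox.hortonworks.com", 2181, "/hbase-unsecure"));
        // prints jdbc:phoenix:sandbox.hortonworks.com:2181:/hbase-unsecure
    }
}
```

With the Phoenix client jar on the classpath, the resulting string is what would be handed to `java.sql.DriverManager.getConnection(url)`; that step is left out here so the example runs without Phoenix.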
02-19-2016
02:48 PM
2 Kudos
Hello, I have a couple of questions regarding phoenix-spark on HBase. I am on HDP 2.3.4, and therefore on Phoenix 4.4.0.2.3.4.0-3485 and Spark 1.5.2.

First question, regarding reads: I am trying out this very nice example here, but I am getting the following (from spark-shell, but I also got the same in Java):

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5.0 failed 4 times, most recent failure: Lost task 0.3 in stage 5.0 (TID 14, sandbox.hortonworks.com): java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GenericMutableRow cannot be cast to org.apache.spark.sql.Row
at org.apache.spark.sql.SQLContext$$anonfun$7.apply(SQLContext.scala:445)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:215)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:215)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
This seems to be an issue with the particular Spark + Phoenix combination on HDP 2.3.4 according to PHOENIX-2287, and it is fixed in Phoenix 4.5.3+. Is there any other way to get around this, or do I have to wait until Hortonworks does an upgrade?

Secondly, due to a decision made high up in my organization not to use Scala, I can only use Java, and it seems that this example from Phoenix (in particular the saveToPhoenix method):

sc.parallelize(dataSet)
.saveToPhoenix(
"OUTPUT_TEST_TABLE",
Seq("ID","COL1","COL2"),
zkUrl = Some("phoenix-server:2181")
)
is not available to Java according to this thread on SO. Is this true?

Anyway, I tried with Java by first creating this simple table in Phoenix:

CREATE TABLE EXAMPLE1 (id BIGINT NOT NULL PRIMARY KEY, COLUMN1 VARCHAR)

and then running the following Java code to write the DataFrame:

DataFrame writeDF = df.withColumnRenamed("Key", "id")
.withColumnRenamed("somecolumn", "COLUMN1")
.selectExpr(new String[]{"id", "COLUMN1"})
// doesnt work even if I renamed with prefix "0." with any of the following:
// .withColumnRenamed("COLUMN1", "0.COLUMN1")
// .withColumnRenamed("COLUMN1", "`0.COLUMN1`")
;
writeDF.write()
.format("org.apache.phoenix.spark")
.options( ImmutableMap.of("table" , "EXAMPLE1",
"zkUrl", "sandbox:2181:/hbase-unsecure"))
.mode(SaveMode.Overwrite)
.save();
But I am getting this:

org.apache.spark.sql.AnalysisException: cannot resolve '0.COLUMN1' given input columns id, 0.COLUMN1;
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:56)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:53)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:293)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:293)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51)
In any case, is there a way, or an example, to read and write a DataFrame via Phoenix for these specific versions of HDP and Phoenix using Java? Thank you in advance!
02-18-2016
08:28 AM
Thanks all for the input. The phoenix-spark example looks very close to what we need. I am not sure whether people on my team would be happy with Phoenix, but I will bring this up and see. Meanwhile, I think I will also follow the HBase JIRA and hope that it will be out soon. Thank you!