About David_Tam

David_Tam · ‎03-21-2016

@Rahul Pathak - yes, I have; as I said, they don't seem to work...

David_Tam · ‎03-21-2016

Hello, I am using the HDP 2.3.4 sandbox, and the sandbox has been Kerberized. I have loaded some struct and array type data into Hive, where the schema looks like this:

    exampletable
     |-- listOfPeople: array (nullable = false)
     |    |-- element: struct (containsNull = true)
     |    |    |-- Name: string (nullable = false)
     |    |    |-- id: integer (nullable = false)
     |    |    |-- Email: string (nullable = false)
     |    |    |-- holiday: array (nullable = false)
     |    |    |    |-- element: integer (containsNull = true)
     |-- departmentName: string (nullable = false)

I am trying to run this query in Hive View:

    SELECT explode(listofpeople.name) AS name from exampletable;

with these Hive View settings:

However, I am getting this:

    INFO : Tez session hasn't been created yet. Opening session
    ERROR : Failed to execute tez graph.
    org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. Application application_1458555995218_0002 failed 2 times due to AM Container for appattempt_1458555995218_0002_000002 exited with exitCode: -1000
    For more detailed output, check application tracking page:http://sandbox.hortonworks.com:8088/cluster/app/application_1458555995218_0002Then, click on links to logs of each attempt.
    Diagnostics: Application application_1458555995218_0002 initialization failed (exitCode=255) with output: main : command provided 0
    main : run as user is hive
    main : requested yarn user is hive
    Requested user hive is not whitelisted and has id 504,which is below the minimum allowed 1000
    Failing this attempt. Failing the application.
        at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:726)
        at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:217)
        at org.apache.hadoop.hive.ql.exec.tez.TezTask.updateSession(TezTask.java:271)
        at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:151)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1703)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1460)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1237)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1101)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1096)
        at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:154)
        at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:71)
        at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:206)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:218)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

This thread more or less says this is the desired behaviour, and Google suggests changing allowed.system.users in yarn-site (but that doesn't seem to work). If I just want to run the query successfully on the sandbox, what needs to be done? Or what is the best-practice solution for this? Thank you.
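For readers hitting the same message: it comes from YARN's container-executor, which on a Kerberized cluster launches containers as the requesting user and rejects accounts whose UID is below a configured minimum unless they are whitelisted. To the best of my knowledge, the relevant keys (min.user.id, allowed.system.users, banned.users) are read from container-executor.cfg rather than yarn-site.xml, which may be why editing yarn-site appeared to have no effect. The Java sketch below is only a simplified model of that decision, not Hadoop code; the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative model (NOT Hadoop code) of the user check behind the error above. */
final class ContainerUserCheck {
    private final int minUserId;                  // models min.user.id
    private final Set<String> allowedSystemUsers; // models allowed.system.users
    private final Set<String> bannedUsers;        // models banned.users

    ContainerUserCheck(int minUserId, String allowedSystemUsers, String bannedUsers) {
        this.minUserId = minUserId;
        this.allowedSystemUsers = splitCsv(allowedSystemUsers);
        this.bannedUsers = splitCsv(bannedUsers);
    }

    /** Splits a comma-separated value into trimmed, non-empty names. */
    private static Set<String> splitCsv(String csv) {
        Set<String> out = new HashSet<>();
        for (String part : csv.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                out.add(name);
            }
        }
        return out;
    }

    /** Simplified rule: banned users never pass; low UIDs pass only if whitelisted. */
    boolean mayRunContainers(String user, int uid) {
        if (bannedUsers.contains(user)) {
            return false;
        }
        return uid >= minUserId || allowedSystemUsers.contains(user);
    }
}
```

Under this model, the hive account with UID 504 is rejected when the minimum is 1000 and the whitelist is empty, and accepted once hive is whitelisted or the minimum is lowered to 500 or below. Which of the two is appropriate outside a sandbox is a policy decision this sketch does not answer.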

David_Tam · ‎03-18-2016

Can you add this to the VM options:

    -Dsun.security.krb5.debug=true

and also enable a bit more logging, maybe with this in log4j or equivalent:

    log4j.logger.org.apache.hadoop=DEBUG

Hopefully you can learn a bit more from the enhanced log. This is what I get on a successful login:

    DEBUG 2016-03-18 08:39:57,465 6788 org.apache.hadoop.hbase.ipc.AbstractRpcClient [hconnection-0x2da3fac3-metaLookup-shared--pool3-t1] RPC Server Kerberos principal name for service=ClientService is hbase/sandbox.hortonworks.com@KRB.HDP
    DEBUG 2016-03-18 08:39:57,465 6788 org.apache.hadoop.hbase.ipc.AbstractRpcClient [hconnection-0x1efd2e3d-metaLookup-shared--pool4-t1] RPC Server Kerberos principal name for service=ClientService is hbase/sandbox.hortonworks.com@KRB.HDP
    DEBUG 2016-03-18 08:39:57,466 6789 org.apache.hadoop.hbase.ipc.AbstractRpcClient [hconnection-0x2da3fac3-metaLookup-shared--pool3-t1] Use KERBEROS authentication for service ClientService, sasl=true
    DEBUG 2016-03-18 08:39:57,466 6789 org.apache.hadoop.hbase.ipc.AbstractRpcClient [hconnection-0x1efd2e3d-metaLookup-shared--pool4-t1] Use KERBEROS authentication for service ClientService, sasl=true
    DEBUG 2016-03-18 08:39:57,484 6807 org.apache.hadoop.hbase.ipc.AbstractRpcClient [hconnection-0x2da3fac3-metaLookup-shared--pool3-t1] Connecting to sandbox.hortonworks.com/10.184.26.82:16020
    DEBUG 2016-03-18 08:39:57,484 6807 org.apache.hadoop.hbase.ipc.AbstractRpcClient [hconnection-0x1efd2e3d-metaLookup-shared--pool4-t1] Connecting to sandbox.hortonworks.com/10.184.26.82:16020
    DEBUG 2016-03-18 08:39:57,491 6814 org.apache.hadoop.security.UserGroupInformation [hconnection-0x2da3fac3-metaLookup-shared--pool3-t1] PrivilegedAction as:spark-Sandbox@KRB.HDP (auth:KERBEROS) from:org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:734)
    DEBUG 2016-03-18 08:39:57,491 6814 org.apache.hadoop.security.UserGroupInformation [hconnection-0x1efd2e3d-metaLookup-shared--pool4-t1] PrivilegedAction as:spark-Sandbox@KRB.HDP (auth:KERBEROS) from:org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:734)
    DEBUG 2016-03-18 08:39:57,495 6818 org.apache.hadoop.hbase.security.HBaseSaslRpcClient [hconnection-0x2da3fac3-metaLookup-shared--pool3-t1] Creating SASL GSSAPI client. Server's Kerberos principal name is hbase/sandbox.hortonworks.com@KRB.HDP
    DEBUG 2016-03-18 08:39:57,495 6818 org.apache.hadoop.hbase.security.HBaseSaslRpcClient [hconnection-0x1efd2e3d-metaLookup-shared--pool4-t1] Creating SASL GSSAPI client. Server's Kerberos principal name is hbase/sandbox.hortonworks.com@KRB.HDP
    Found ticket for spark-Sandbox@KRB.HDP to go to krbtgt/KRB.HDP@KRB.HDP expiring on Sat Mar 19 08:39:55 GMT 2016
    Found ticket for spark-Sandbox@KRB.HDP to go to krbtgt/KRB.HDP@KRB.HDP expiring on Sat Mar 19 08:39:55 GMT 2016
    Entered Krb5Context.initSecContext with state=STATE_NEW
    Entered Krb5Context.initSecContext with state=STATE_NEW
    Found ticket for spark-Sandbox@KRB.HDP to go to krbtgt/KRB.HDP@KRB.HDP expiring on Sat Mar 19 08:39:55 GMT 2016
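If editing the VM options is awkward (for example when the job is launched from an IDE), the same switch can be set from code. The sketch below assumes it runs before any Kerberos or JGSS class is loaded, since as far as I know the JDK reads these flags only once; the class name KerberosDebug is hypothetical, and both properties are specific to Sun/Oracle/OpenJDK runtimes.

```java
/** Hypothetical helper: turns on the JDK's Kerberos tracing from code. */
final class KerberosDebug {
    private KerberosDebug() {
    }

    /** Call as early as possible, e.g. on the first line of main(). */
    static void enable() {
        // Same effect as passing -Dsun.security.krb5.debug=true on the command line
        System.setProperty("sun.security.krb5.debug", "true");
        // Optional: also trace the JGSS layer that sits on top of Kerberos
        System.setProperty("sun.security.jgss.debug", "true");
    }
}
```

The log4j setting still has to be made in the logging configuration; this helper only covers the JVM-level trace lines such as "Found ticket for ...".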

David_Tam · ‎03-16-2016

@Jitendra Yadav thanks, I just had a look at the JIRA. I think in this case I will need to wait until we upgrade to Spark 1.6. Thanks!

David_Tam · ‎03-16-2016

@Jitendra Yadav thanks for your reply. I believe these are for YARN, whereas I am trying to run with master = local[*], similar to spark-shell on the sandbox. I am using Spark 1.5.2 on HDP 2.3.4.

David_Tam · ‎03-15-2016

Hello, I need to run a Spark (1.5.2) job in a Kerberized environment (I am currently testing on the HDP 2.3.4 sandbox). The job needs to be able to read from and write to Hive (I am using HiveContext). I am also using master = local[*], which is similar to spark-shell. I am able to do this in Spark by running kinit beforehand. However, is there any other way to authenticate programmatically within the Spark job? For example, I am able to read and write Kerberized HDFS by running the following before the Spark code, without kinit. Is there something similar I can do for Hive?

    // following works for HDFS, but not for Hive
    System.setProperty("java.security.krb5.conf", krb5ConfPath);
    final Configuration newConf = new Configuration();
    newConf.set(SERVER_PRINCIPAL_KEY, "spark-Sandbox@KRB.HDP");
    newConf.set(SERVER_KEYTAB_KEY, keyTabPath);
    LOG.info("Logging in now... ******************* THIS REPLACE kinit **************************");
    org.apache.hadoop.security.SecurityUtil.login(newConf, SERVER_KEYTAB_KEY, SERVER_PRINCIPAL_KEY, "sandbox.hortonworks.com");
    LOG.info("Logged in !!! ******************* THIS REPLACE kinit **************************");

Thanks in advance.

UPDATE: I have enabled lots of logging and tracked it down to the following differences in the log.

With kinit I get:

    DEBUG 2016-03-16 11:12:09,557 6889 org.apache.hadoop.security.Groups [main] Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
    >>> KrbCreds found the default ticket granting ticket in credential cache.
    >>> Obtained TGT from LSA: Credentials: client=spark-Sandbox@KRB.HDP server=krbtgt/KRB.HDP@KRB.HDP authTime=20160316111142Z endTime=20160317111142Z renewTill=null flags=FORWARDABLE;INITIAL EType (skey)=17 (tkt key)=18
    DEBUG 2016-03-16 11:12:09,560 6892 org.apache.hadoop.security.UserGroupInformation [main] hadoop login
    DEBUG 2016-03-16 11:12:09,561 6893 org.apache.hadoop.security.UserGroupInformation [main] hadoop login commit
    DEBUG 2016-03-16 11:12:09,562 6894 org.apache.hadoop.security.UserGroupInformation [main] using kerberos user:spark-Sandbox@KRB.HDP
    DEBUG 2016-03-16 11:12:09,562 6894 org.apache.hadoop.security.UserGroupInformation [main] Using user: "spark-Sandbox@KRB.HDP" with name spark-Sandbox@KRB.HDP
    DEBUG 2016-03-16 11:12:09,562 6894 org.apache.hadoop.security.UserGroupInformation [main] User entry: "spark-Sandbox@KRB.HDP"
    DEBUG 2016-03-16 11:12:09,565 6897 org.apache.hadoop.security.UserGroupInformation [main] UGI loginUser:spark-Sandbox@KRB.HDP (auth:KERBEROS)
    DEBUG 2016-03-16 11:12:09,567 6899 org.apache.hadoop.security.UserGroupInformation [TGT Renewer for spark-Sandbox@KRB.HDP] Found tgt Ticket (hex) =

whereas at the moment, logging in with code (and NO kinit) gets me this:

    DEBUG 2016-03-16 11:09:58,902 7194 org.apache.hadoop.security.Groups [main] Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
    >>>KinitOptions cache name is C:\Users\davidtam\krb5cc_davidtam
    >> Acquire default native Credentials
    Using builtin default etypes for default_tkt_enctypes
    default etypes for default_tkt_enctypes: 17 16 23.
    >>> Found no TGT's in LSA
    DEBUG 2016-03-16 11:09:58,910 7202 org.apache.hadoop.security.UserGroupInformation [main] hadoop login
    DEBUG 2016-03-16 11:09:58,910 7202 org.apache.hadoop.security.UserGroupInformation [main] hadoop login commit
    DEBUG 2016-03-16 11:09:58,911 7203 org.apache.hadoop.security.UserGroupInformation [main] using kerberos user:null
    DEBUG 2016-03-16 11:09:58,912 7204 org.apache.hadoop.security.UserGroupInformation [main] using local user:NTUserPrincipal: davidtam
    DEBUG 2016-03-16 11:09:58,912 7204 org.apache.hadoop.security.UserGroupInformation [main] Using user: "NTUserPrincipal: davidtam" with name davidtam
    DEBUG 2016-03-16 11:09:58,912 7204 org.apache.hadoop.security.UserGroupInformation [main] User entry: "davidtam"
    DEBUG 2016-03-16 11:09:58,914 7206 org.apache.hadoop.security.UserGroupInformation [main] UGI loginUser:davidtam (auth:KERBEROS)
    INFO 2016-03-16 11:09:58,931 7223 hive.metastore [main] Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083
    DEBUG 2016-03-16 11:09:58,963 7255 org.apache.hadoop.security.UserGroupInformation [main] PrivilegedAction as:c009003 (auth:KERBEROS) from:org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
    DEBUG 2016-03-16 11:09:58,963 7255 org.apache.thrift.transport.TSaslTransport [main] opening transport org.apache.thrift.transport.TSaslClientTransport@7c206b14
    >>>KinitOptions cache name is C:\Users\davidtam\krb5cc_davidtam
    >> Acquire default native Credentials
    Using builtin default etypes for default_tkt_enctypes
    default etypes for default_tkt_enctypes: 17 16 23.
    >>> Found no TGT's in LSA

I am running on Windows, connecting to the sandbox.
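The second log shows the login falling back to the local Windows user ("using kerberos user:null"), so no Kerberos credentials were found at that point. One JDK-only way to describe a keytab login without kinit is a programmatic JAAS configuration for Krb5LoginModule, sketched below. The class name KeytabJaasConfig and the keytab path are hypothetical, and whether HiveContext on Spark 1.5.2 honours a login done this way is not confirmed by this thread. At the Hadoop level, the call usually used for the same purpose is UserGroupInformation.loginUserFromKeytab(principal, keytabPath).

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.AppConfigurationEntry.LoginModuleControlFlag;
import javax.security.auth.login.Configuration;

/** Hypothetical JAAS configuration that logs in from a keytab instead of a ticket cache. */
final class KeytabJaasConfig extends Configuration {
    private final String principal;
    private final String keytabPath;

    KeytabJaasConfig(String principal, String keytabPath) {
        this.principal = principal;
        this.keytabPath = keytabPath;
    }

    @Override
    public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
        // Options understood by the JDK's Krb5LoginModule
        Map<String, String> options = new HashMap<>();
        options.put("useKeyTab", "true");       // read the key from a keytab file
        options.put("keyTab", keytabPath);
        options.put("principal", principal);
        options.put("storeKey", "true");        // keep the key in the Subject
        options.put("doNotPrompt", "true");     // never ask for a password
        options.put("useTicketCache", "false"); // do not depend on a prior kinit

        return new AppConfigurationEntry[] {
            new AppConfigurationEntry(
                "com.sun.security.auth.module.Krb5LoginModule",
                LoginModuleControlFlag.REQUIRED,
                options)
        };
    }
}
```

A login would then look like new LoginContext("client", null, null, new KeytabJaasConfig(principal, keytabPath)).login(), where "client" is an arbitrary entry name. The same returned entry is used for every name here, which keeps the sketch short but would be too coarse for an application with several JAAS contexts.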

David_Tam · ‎02-29-2016

OK, in the end I found a way to both read from and write to Phoenix in a Java Spark app:

    // read
    // using jdbc - which isnt the best way of doing this as there is no push-down optimization...
    DataFrame dfFromHbase = SPARK_MANAGED_RESOURCE.getSparkSqlContext().read().format("jdbc")
        .options(ImmutableMap.of(
            "driver", "org.apache.phoenix.jdbc.PhoenixDriver",
            "url", "jdbc:phoenix:sandbox.hortonworks.com:2181:/hbase-unsecure",
            "dbtable", tableName)).load();

    // write
    // there is no column family specify - it uses whatever that has been linked up in the phoenix table
    dfICreated.write().format("org.apache.phoenix.spark")
        .mode(SaveMode.Overwrite)
        .options(ImmutableMap.of(
            "zkUrl", "sandbox:2181:/hbase-unsecure",
            "table", tableName)).save();

These are for sandbox 2.3.4. I hope Hortonworks will upgrade to the latest Phoenix (4.6 or 4.7?) soon, as the read would then provide push-down queries, which I don't think the JDBC driver is doing at the moment...

David_Tam · ‎02-24-2016

I spent a few days getting this to work a while back. What you need is:

    jdbc:phoenix:sandbox.hortonworks.com:2181:/hbase-unsecure

No username or password is needed. And in your hosts file you need (I presume you run SQuirreL on Windows):

    127.0.0.1 sandbox.hortonworks.com
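The URL above follows the pattern jdbc:phoenix:<ZooKeeper quorum>:<port>:<HBase znode parent>. A small sketch of assembling and using it from Java is below; the class PhoenixUrl and its methods are hypothetical helpers, and the open method assumes the Phoenix client jar is on the classpath. The /hbase-unsecure znode matches the un-Kerberized sandbox in this post; a secured cluster would normally use a different znode parent.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

/** Hypothetical helper for building a Phoenix thick-driver JDBC URL. */
final class PhoenixUrl {
    private PhoenixUrl() {
    }

    /** Joins quorum, port and znode parent into jdbc:phoenix:quorum:port:znode. */
    static String jdbcUrl(String zkQuorum, int zkPort, String znodeParent) {
        return "jdbc:phoenix:" + zkQuorum + ":" + zkPort + ":" + znodeParent;
    }

    /** Opens a connection; needs the Phoenix client jar at runtime. No credentials are passed. */
    static Connection open(String zkQuorum, int zkPort, String znodeParent) throws SQLException {
        return DriverManager.getConnection(jdbcUrl(zkQuorum, zkPort, znodeParent));
    }
}
```

For the sandbox, PhoenixUrl.jdbcUrl("sandbox.hortonworks.com", 2181, "/hbase-unsecure") yields the URL quoted in the post.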

David_Tam · ‎02-19-2016

Hello, I actually have a couple of questions regarding phoenix-spark on HBase. I am on HDP 2.3.4, and therefore on Phoenix 4.4.0.2.3.4.0-3485 and Spark 1.5.2.

The first question is about reading. I am trying out this very nice example here, but I am getting the following (from spark-shell, but I also got the same in Java):

    org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5.0 failed 4 times, most recent failure: Lost task 0.3 in stage 5.0 (TID 14, sandbox.hortonworks.com): java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GenericMutableRow cannot be cast to org.apache.spark.sql.Row
        at org.apache.spark.sql.SQLContext$$anonfun$7.apply(SQLContext.scala:445)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
        at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
        at scala.collection.AbstractIterator.to(Iterator.scala:1157)
        at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
        at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
        at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
        at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:215)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:215)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

This seems to be an issue with the particular Spark + Phoenix combination on HDP 2.3.4 according to PHOENIX-2287, and it is fixed in Phoenix 4.5.3+. Is there any other way to get round this, or do I have to wait until Hortonworks does an upgrade?

Secondly, due to a decision made high up in my organization not to use Scala, I can only use Java, and it seems that this example from Phoenix (in particular the saveToPhoenix method):

    sc.parallelize(dataSet)
      .saveToPhoenix(
        "OUTPUT_TEST_TABLE",
        Seq("ID","COL1","COL2"),
        zkUrl = Some("phoenix-server:2181")
      )

is not available to Java, according to this thread on SO. Is this true?

Anyway, I tried with Java by first creating this simple table in Phoenix:

    CREATE TABLE EXAMPLE1 (id BIGINT NOT NULL PRIMARY KEY, COLUMN1 VARCHAR)

and then running the following Java code to write the DataFrame:

    DataFrame writeDF = df.withColumnRenamed("Key", "id")
        .withColumnRenamed("somecolumn", "COLUMN1")
        .selectExpr(new String[]{"id", "COLUMN1"})
        // doesnt work even if I renamed with prefix "0." with any of the following:
        // .withColumnRenamed("COLUMN1", "0.COLUMN1")
        // .withColumnRenamed("COLUMN1", "`0.COLUMN1`")
        ;
    df.write()
        .format("org.apache.phoenix.spark")
        .options(ImmutableMap.of("table", "EXAMPLE1", "zkUrl", "sandbox:2181:/hbase-unsecure"))
        .mode(SaveMode.Overwrite)
        .save();

But I am getting this:

    org.apache.spark.sql.AnalysisException: cannot resolve '0.COLUMN1' given input columns id, 0.COLUMN1;
        at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:56)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:53)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:293)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:293)
        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51)

In any case, is there a way, or an example, to read and write a DataFrame via Phoenix for these specific versions of HDP / Phoenix using Java? Thank you in advance!

David_Tam · ‎02-18-2016

Thanks all for the input. The phoenix-spark example looks very close to what we need. I am not sure whether people on my team would be happy with Phoenix, but I will bring it up and see. Meanwhile, I think I will also follow the HBase JIRA and hope that it will be out soon. Thank you!

Last Visited	‎02-04-2019 10:23 PM

Member Since	‎01-21-2016 11:27 AM
Posts	66
Kudos received	44

Cloudera Community
