Member since: 01-21-2016
Posts: 66
Kudos Received: 44
Solutions: 5
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
| | 1672 | 03-29-2017 11:14 AM |
| | 1454 | 03-27-2017 10:01 AM |
| | 2438 | 02-29-2016 10:00 AM |
| | 8940 | 01-28-2016 08:26 AM |
| | 2942 | 01-22-2016 03:55 PM |
03-18-2020
04:41 PM
Thanks for this great tutorial; I got it mostly working. However, the Python workers all failed with the following error. I am not sure whether it is because the cluster I am working with is kerberized, but it looks related to authentication and authorization (a small diagnostic sketch follows the log below). The failure happens at the comparison ["PYTHON_WORKER_FACTORY_SECRET"] == client_secret:
File "/data12/yarn/nm/usercache/yolo/appcache/application_1579645850066_329429/container_e40_1579645850066_329429_02_000002/PY_ENV/py36yarn/lib/python3.6/os.py", line 669, in __getitem__
raise KeyError(key) from None
KeyError: 'PYTHON_WORKER_FACTORY_SECRET'
20/03/18 19:25:06 ERROR executor.Executor: Exception in task 2.2 in stage 0.0 (TID 4)
org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:230)
at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166)
... 11 more
20/03/18 19:25:06 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 5
20/03/18 19:25:06 INFO executor.Executor: Running task 2.3 in stage 0.0 (TID 5)
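In case it helps with diagnosing: below is a minimal PySpark sketch (just my guess at a useful check, not part of the tutorial) that reports which environment variable names the Python workers actually see, without shipping any secret values back to the driver. The app name is arbitrary; PYTHON_WORKER_FACTORY_SECRET is the variable from the error above.
import os
from pyspark import SparkContext

sc = SparkContext(appName="worker-env-check")

def env_keys(_):
    # Collect only the variable names, never the values.
    yield sorted(os.environ.keys())

# Run one tiny task per partition and look at the environment of the first worker.
keys = sc.parallelize(range(4), 4).mapPartitions(env_keys).first()
print("PYTHON_WORKER_FACTORY_SECRET visible to workers:",
      "PYTHON_WORKER_FACTORY_SECRET" in keys)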
08-21-2018
12:22 PM
According to this Spark JIRA, this is only available (or planned) in Spark 2.4. @jzhang could you confirm?
05-23-2017
08:20 AM
Hello @Robert Levas, thank you for your reply. I am pretty sure the conf file is being picked up. If I comment out this line:
livy.server.auth.type = kerberos
then the server starts up fine and requests are served fine, just without authentication. Adding or removing the following has no effect:
livy.server.kerberos.keytab = /etc/security/keytabs/livy.headless.keytab
even though the log kind of suggests it is looking for the keytab when kerberos is switched on.
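For reference, this is roughly how I understand the kerberos-related block in livy.conf is meant to look. The principal values and the auth keytab path below are placeholders, and the exact key names (livy.server.auth.kerberos.* / livy.server.launch.kerberos.* versus the shorter livy.server.kerberos.*) seem to vary between Livy builds, so treat this as an assumption rather than a verified config:
livy.server.auth.type = kerberos
# SPNEGO identity used to authenticate incoming HTTP requests (values are placeholders)
livy.server.auth.kerberos.principal = HTTP/_HOST@EXAMPLE.COM
livy.server.auth.kerberos.keytab = /etc/security/keytabs/spnego.service.keytab
# identity Livy itself uses to launch sessions (values are placeholders)
livy.server.launch.kerberos.principal = livy/_HOST@EXAMPLE.COM
livy.server.launch.kerberos.keytab = /etc/security/keytabs/livy.headless.keytab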
08-13-2017
09:20 PM
This feature behaves unexpectedly when the table is migrated from another HBase cluster. In this case, the table creation time can be much later than the row timestamps of all its data. A flashback query meant to select an earlier subset of data will return the following failure instead: scala> df.count
2017-08-11 20:12:40,550 INFO [main] mapreduce.PhoenixInputFormat: UseSelectColumns=true, selectColumnList.size()=3, selectColumnList=TIMESTR,DBID,OPTION
2017-08-11 20:12:40,550 INFO [main] mapreduce.PhoenixInputFormat: Select Statement: SELECT "TIMESTR","DBID","OPTION" FROM NS.USAGES
2017-08-11 20:12:40,558 ERROR [main] mapreduce.PhoenixInputFormat: Failed to get the query plan with error [ERROR 1012 (42M03): Table undefined. tableName=NS.USAGES]
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=[count#13L])
+- TungstenExchange SinglePartition, None
+- TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=[count#16L])
+- Project
+- Scan ExistingRDD[TIMESTR#10,DBID#11,OPTION#12]
at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:49)
at org.apache.spark.sql.execution.aggregate.TungstenAggregate.doExecute(TungstenAggregate.scala:80)
... which apparently means that Phoenix considers the table nonexistent at that point in time. I tested the same approach in sqlline and, sure enough, the table is missing from "!tables". Any workaround?
03-27-2017
10:01 AM
OK thanks, I have finally got it working. It can be run like this, for example: sqlline.py "sandbox:2181/hbase-secure;currentSCN=1490372958713"
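In case it helps anyone else: as far as I understand, the currentSCN value is just an HBase-style timestamp in milliseconds since the epoch, so a value for a given point in time can be worked out with plain Python (illustration only, not part of Phoenix):
import time
import datetime

# An SCN for "now": milliseconds since the epoch, the same scale HBase cell timestamps use.
scn_now = int(time.time() * 1000)
print(scn_now)

# The SCN from the command above decodes back to a wall-clock time (roughly 2017-03-24 UTC).
print(datetime.datetime.utcfromtimestamp(1490372958713 / 1000.0))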
12-07-2018
12:24 PM
spark.executor.extraLibraryPath did the job. Thanks! @David Tam and @Jitendra Yadav
06-07-2016
12:54 PM
@Jitendra Yadav - yes this works, thanks! This is what the process looks like when I run ps: root 17484 1 99 13:47 pts/0 00:00:59 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.65-0.b17.el6_7.x86_64/bin/java -server -XX:NewRatio=3 -XX:+UseConcMarkSweepGC -XX:-UseGCOverheadLimit -XX:CMSInitiatingOccupancyFraction=60 -Dsun.zip.disableMemoryMapping=true -Xms512m -Xmx2048m -Djava.security.auth.login.config=/etc/ambari-server/conf/krb5JAASLogin.conf -Djava.security.krb5.conf=/etc/krb5.conf -Djavax.security.auth.useSubjectCredsOnly=false -Djava.library.path=/usr/hdp/current/hadoop-client/lib/native -cp /etc/ambari-server/conf:/usr/lib/ambari-server/*:/usr/share/java/postgresql-jdbc.jar org.apache.ambari.server.controller.AmbariServer It seems setting -Djava.library.path is the only thing required - I have subsequently removed the snappy link in /usr/lib/ambari-server/ and can confirm it still works.
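For anyone wanting to make that flag stick across restarts: on my installs the server's JVM arguments live in /var/lib/ambari-server/ambari-env.sh under AMBARI_JVM_ARGS, so something like the line below should do it; the path and variable name are assumptions from my own setup, so adjust to yours:
# /var/lib/ambari-server/ambari-env.sh  (path and variable name assumed; adjust to your install)
export AMBARI_JVM_ARGS="$AMBARI_JVM_ARGS -Djava.library.path=/usr/hdp/current/hadoop-client/lib/native"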
05-16-2016
11:53 AM
Hello, I had forgotten about this, but in the end I actually got it to work. What needed to be done was to use kadmin to create a new keytab and add the principal ambari-server@KRB.HDP to it. It also needs a full restart of the sandbox. See Setup Kerberos for Ambari Server. Thanks to @Geoffrey Shelton Okot for pointing me in the right direction.
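Roughly, the kadmin commands for that look like the following; the keytab path is just an example, and the addprinc step is only needed if the principal does not already exist:
# create the principal if it does not already exist (skip otherwise)
kadmin.local -q "addprinc -randkey ambari-server@KRB.HDP"
# export it into a keytab (example path; note that xst re-keys the principal)
kadmin.local -q "xst -k /etc/security/keytabs/ambari.server.keytab ambari-server@KRB.HDP"
The keytab can then be wired into Ambari via ambari-server setup-security, as the linked guide describes.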
03-16-2016
02:14 PM
@Jitendra Yadav thanks, just had a look at the JIRA. I think in this case I will need to wait until we upgrade to Spark 1.6 then. Thanks!
03-11-2018
08:32 PM
For me, adding the line below to spark-defaults.conf helped, based on the packages installed on my test cluster: spark.executor.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native/:/usr/hdp/current/share/lzo/0.6.0/lib/native/Linux-amd64-64/
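The same setting can also be passed per job instead of cluster-wide, e.g. on spark-submit (paths as above, from my test cluster; the application name is a placeholder):
spark-submit \
  --conf "spark.executor.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native/:/usr/hdp/current/share/lzo/0.6.0/lib/native/Linux-amd64-64/" \
  your_app.py   # placeholder for the actual job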