Member since: 01-21-2016
Posts: 66
Kudos Received: 44
Solutions: 5
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
| | 1672 | 03-29-2017 11:14 AM |
| | 1454 | 03-27-2017 10:01 AM |
| | 2438 | 02-29-2016 10:00 AM |
| | 8940 | 01-28-2016 08:26 AM |
| | 2942 | 01-22-2016 03:55 PM |
03-18-2020
04:41 PM
Thanks for this great tutorial; I got it mostly working. However, the Python workers all failed with the following error. I am not sure whether it is because the cluster I am working with is kerberized, but it looks related to authentication and authorization (a small diagnostic sketch follows the log below). The failure happens at the comparison ["PYTHON_WORKER_FACTORY_SECRET"] == client_secret:
File "/data12/yarn/nm/usercache/yolo/appcache/application_1579645850066_329429/container_e40_1579645850066_329429_02_000002/PY_ENV/py36yarn/lib/python3.6/os.py", line 669, in __getitem__
raise KeyError(key) from None
KeyError: 'PYTHON_WORKER_FACTORY_SECRET'
20/03/18 19:25:06 ERROR executor.Executor: Exception in task 2.2 in stage 0.0 (TID 4)
org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:230)
at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166)
... 11 more
20/03/18 19:25:06 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 5
20/03/18 19:25:06 INFO executor.Executor: Running task 2.3 in stage 0.0 (TID 5)
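In case it helps with diagnosing: below is a minimal PySpark sketch (just my guess at a useful check, not part of the tutorial) that reports which environment variable names the Python workers actually see, without shipping any secret values back to the driver. The app name is arbitrary; PYTHON_WORKER_FACTORY_SECRET is the variable from the error above.
import os
from pyspark import SparkContext

sc = SparkContext(appName="worker-env-check")

def env_keys(_):
    # Collect only the variable names, never the values.
    yield sorted(os.environ.keys())

# Run one tiny task per partition and look at the environment of the first worker.
keys = sc.parallelize(range(4), 4).mapPartitions(env_keys).first()
print("PYTHON_WORKER_FACTORY_SECRET visible to workers:",
      "PYTHON_WORKER_FACTORY_SECRET" in keys)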
08-21-2018
12:22 PM
According to this Spark JIRA, this is only available (or planned) in Spark 2.4. @jzhang could you confirm?
05-23-2017
08:20 AM
Hello @Robert Levas, thank you for your reply. I am pretty sure the conf file is being picked up. If I comment out this line:
livy.server.auth.type = kerberos
then the server starts up fine and requests are served fine, just without authentication. Adding or removing the following has no effect:
livy.server.kerberos.keytab = /etc/security/keytabs/livy.headless.keytab
even though the log kind of suggests it is looking for the keytab when kerberos is switched on.
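For reference, this is roughly how I understand the kerberos-related block in livy.conf is meant to look. The principal values and the auth keytab path below are placeholders, and the exact key names (livy.server.auth.kerberos.* / livy.server.launch.kerberos.* versus the shorter livy.server.kerberos.*) seem to vary between Livy builds, so treat this as an assumption rather than a verified config:
livy.server.auth.type = kerberos
# SPNEGO identity used to authenticate incoming HTTP requests (values are placeholders)
livy.server.auth.kerberos.principal = HTTP/_HOST@EXAMPLE.COM
livy.server.auth.kerberos.keytab = /etc/security/keytabs/spnego.service.keytab
# identity Livy itself uses to launch sessions (values are placeholders)
livy.server.launch.kerberos.principal = livy/_HOST@EXAMPLE.COM
livy.server.launch.kerberos.keytab = /etc/security/keytabs/livy.headless.keytab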
08-13-2017
09:20 PM
This feature behaves unexpectedly when the table is migrated from another HBase cluster. In this case, the table creation time can be much later than the row timestamps of all its data. A flashback query meant to select an earlier subset of data will return the following failure instead: scala> df.count
2017-08-11 20:12:40,550 INFO [main] mapreduce.PhoenixInputFormat: UseSelectColumns=true, selectColumnList.size()=3, selectColumnList=TIMESTR,DBID,OPTION
2017-08-11 20:12:40,550 INFO [main] mapreduce.PhoenixInputFormat: Select Statement: SELECT "TIMESTR","DBID","OPTION" FROM NS.USAGES
2017-08-11 20:12:40,558 ERROR [main] mapreduce.PhoenixInputFormat: Failed to get the query plan with error [ERROR 1012 (42M03): Table undefined. tableName=NS.USAGES]
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=[count#13L])
+- TungstenExchange SinglePartition, None
+- TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=[count#16L])
+- Project
+- Scan ExistingRDD[TIMESTR#10,DBID#11,OPTION#12]
at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:49)
at org.apache.spark.sql.execution.aggregate.TungstenAggregate.doExecute(TungstenAggregate.scala:80)
... which apparently means that Phoenix considers the table nonexistent at that point in time. I tested the same approach in sqlline and, sure enough, the table is missing from "!tables". Any workaround?
03-27-2017
10:01 AM
OK thanks, I have finally got it working. It can be run like this, for example: sqlline.py "sandbox:2181/hbase-secure;currentSCN=1490372958713"
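In case it helps anyone else: as far as I understand, the currentSCN value is just an HBase-style timestamp in milliseconds since the epoch, so a value for a given point in time can be worked out with plain Python (illustration only, not part of Phoenix):
import time
import datetime

# An SCN for "now": milliseconds since the epoch, the same scale HBase cell timestamps use.
scn_now = int(time.time() * 1000)
print(scn_now)

# The SCN from the command above decodes back to a wall-clock time (roughly 2017-03-24 UTC).
print(datetime.datetime.utcfromtimestamp(1490372958713 / 1000.0))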
12-07-2018
12:24 PM
spark.executor.extraLibraryPath did the job. Thanks! @David Tam and @Jitendra Yadav
06-07-2016
12:54 PM
@Jitendra Yadav - yes this works, thanks! This is what the process looks like when I run ps: root 17484 1 99 13:47 pts/0 00:00:59 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.65-0.b17.el6_7.x86_64/bin/java -server -XX:NewRatio=3 -XX:+UseConcMarkSweepGC -XX:-UseGCOverheadLimit -XX:CMSInitiatingOccupancyFraction=60 -Dsun.zip.disableMemoryMapping=true -Xms512m -Xmx2048m -Djava.security.auth.login.config=/etc/ambari-server/conf/krb5JAASLogin.conf -Djava.security.krb5.conf=/etc/krb5.conf -Djavax.security.auth.useSubjectCredsOnly=false -Djava.library.path=/usr/hdp/current/hadoop-client/lib/native -cp /etc/ambari-server/conf:/usr/lib/ambari-server/*:/usr/share/java/postgresql-jdbc.jar org.apache.ambari.server.controller.AmbariServer It seems setting -Djava.library.path is the only thing required - I have subsequently removed the snappy link in /usr/lib/ambari-server/ and can confirm it still works.
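For anyone wanting to make that flag stick across restarts: on my installs the server's JVM arguments live in /var/lib/ambari-server/ambari-env.sh under AMBARI_JVM_ARGS, so something like the line below should do it; the path and variable name are assumptions from my own setup, so adjust to yours:
# /var/lib/ambari-server/ambari-env.sh  (path and variable name assumed; adjust to your install)
export AMBARI_JVM_ARGS="$AMBARI_JVM_ARGS -Djava.library.path=/usr/hdp/current/hadoop-client/lib/native"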
05-16-2016
11:53 AM
Hello, I had forgotten about this, but in the end I actually got it to work. What needed to be done was to use kadmin to create a new keytab and add the principal ambari-server@KRB.HDP to it. It also needs a full restart of the sandbox. See Setup Kerberos for Ambari Server. Thanks to @Geoffrey Shelton Okot for pointing me in the right direction.
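Roughly, the kadmin commands for that look like the following; the keytab path is just an example, and the addprinc step is only needed if the principal does not already exist:
# create the principal if it does not already exist (skip otherwise)
kadmin.local -q "addprinc -randkey ambari-server@KRB.HDP"
# export it into a keytab (example path; note that xst re-keys the principal)
kadmin.local -q "xst -k /etc/security/keytabs/ambari.server.keytab ambari-server@KRB.HDP"
The keytab can then be wired into Ambari via ambari-server setup-security, as the linked guide describes.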
03-16-2016
02:14 PM
@Jitendra Yadav thanks, just had a look at the JIRA. I think in this case I will need to wait until we upgrade to Spark 1.6 then. Thanks!
03-11-2018
08:32 PM
For me, adding the line below to spark-defaults.conf helped, based on the packages installed on my test cluster: spark.executor.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native/:/usr/hdp/current/share/lzo/0.6.0/lib/native/Linux-amd64-64/
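The same setting can also be passed per job instead of cluster-wide, e.g. on spark-submit (paths as above, from my test cluster; the application name is a placeholder):
spark-submit \
  --conf "spark.executor.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native/:/usr/hdp/current/share/lzo/0.6.0/lib/native/Linux-amd64-64/" \
  your_app.py   # placeholder for the actual job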