03-18-2020 04:41 PM
Thanks for this great tutorial; I got it mostly working. However, the Python workers all failed with the error message below. I'm not sure if it's because the cluster I am working with is kerberized, but it looks related to authentication and authorization:

["PYTHON_WORKER_FACTORY_SECRET"] == client_secret:
File "/data12/yarn/nm/usercache/yolo/appcache/application_1579645850066_329429/container_e40_1579645850066_329429_02_000002/PY_ENV/py36yarn/lib/python3.6/os.py", line 669, in __getitem__
raise KeyError(key) from None
KeyError: 'PYTHON_WORKER_FACTORY_SECRET'
20/03/18 19:25:06 ERROR executor.Executor: Exception in task 2.2 in stage 0.0 (TID 4)
org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:230)
at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166)
... 11 more
20/03/18 19:25:06 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 5
20/03/18 19:25:06 INFO executor.Executor: Running task 2.3 in stage 0.0 (TID 5)
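
For reference, the Python-level failure in the traceback is the standard one: os.environ["X"] raises KeyError when X was never exported to the process, so the secret apparently never reached the worker's environment. A minimal sketch of a defensive check (the surrounding Spark internals are paraphrased, not the actual source, and the error wording is mine):

import os

# os.environ[...] raises KeyError for an unset variable, which matches the
# traceback above: the variable never reached the worker's environment.
secret = os.environ.get("PYTHON_WORKER_FACTORY_SECRET")
if secret is None:
    raise RuntimeError(
        "PYTHON_WORKER_FACTORY_SECRET is not set; the launcher did not "
        "propagate the auth secret to this Python worker"
    )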
04-19-2019 02:27 AM
The hard part here is that Hive returns STRUCT columns as JSON strings, so even if we can parse the JSON, we've lost the type information. It may be possible to retrieve it from the metadata and, if so, create a nested record from the results. Please feel free to file a Jira for this enhancement.
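
As a hypothetical illustration of the type-loss problem (the column contents below are made up), parsing the JSON string does recover a nested record, but only with JSON's generic types, so the original Hive types inside the STRUCT can no longer be distinguished:

import json

# Made-up example of a STRUCT column delivered as a JSON string.
struct_as_string = '{"city": "Austin", "founded": "1839-12-27", "population": 964254}'

record = json.loads(struct_as_string)  # nested dict, but only JSON's type system
print(type(record["founded"]))     # <class 'str'>  -- was this DATE or STRING in Hive?
print(type(record["population"]))  # <class 'int'>  -- was this INT, BIGINT, or DECIMAL?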
01-24-2019 12:24 PM
Then use several ReplaceText processors in a chain (a plain-Python sketch of the chain's effect follows after the steps):

1st processor
replace: "additionl_information" :
with: (empty string)

2nd processor
replace: =
with: " : "

3rd processor
replace: ;
with: "(newline)

Note: for the (newline) you should hit Shift+Enter to get an actual new line in the NiFi processor's replacement value.
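
To make the chain concrete, here is a plain-Python sketch of the three replacements applied in order to a hypothetical sample line (the field name matches the one above; the keys and values are made up):

sample = '"additionl_information" : key1=value1;key2=value2;'

step1 = sample.replace('"additionl_information" :', '')  # 1st processor
step2 = step1.replace('=', '" : "')                      # 2nd processor
step3 = step2.replace(';', '"\n')                        # 3rd processor

print(step3)
# prints (the leading space survives from step 1):
#  key1" : "value1"
# key2" : "value2"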