03-18-2020 04:41 PM
Thanks for this great tutorial; I got it mostly working. However, the Python workers all failed with the error message below. I'm not sure if it's because the cluster I am working with is kerberized, but it looks related to authentication and authorization:

["PYTHON_WORKER_FACTORY_SECRET"] == client_secret:
File "/data12/yarn/nm/usercache/yolo/appcache/application_1579645850066_329429/container_e40_1579645850066_329429_02_000002/PY_ENV/py36yarn/lib/python3.6/os.py", line 669, in __getitem__
raise KeyError(key) from None
KeyError: 'PYTHON_WORKER_FACTORY_SECRET'
20/03/18 19:25:06 ERROR executor.Executor: Exception in task 2.2 in stage 0.0 (TID 4)
org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:230)
at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166)
... 11 more
20/03/18 19:25:06 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 5
20/03/18 19:25:06 INFO executor.Executor: Running task 2.3 in stage 0.0 (TID 5)
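
For reference, the Python-level failure in the traceback is the standard one: os.environ["X"] raises KeyError when X was never exported to the process, so the secret apparently never reached the worker's environment. A minimal sketch of a defensive check (the surrounding Spark internals are paraphrased, not the actual source, and the error wording is mine):

import os

# os.environ[...] raises KeyError for an unset variable, which matches the
# traceback above: the variable never reached the worker's environment.
secret = os.environ.get("PYTHON_WORKER_FACTORY_SECRET")
if secret is None:
    raise RuntimeError(
        "PYTHON_WORKER_FACTORY_SECRET is not set; the launcher did not "
        "propagate the auth secret to this Python worker"
    )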
04-19-2019 02:27 AM
The hard part here is that Hive returns STRUCT columns as JSON strings, so even if we can parse the JSON, we've lost the type information. It may be possible to retrieve it from the metadata and, if so, create a nested record from the results. Please feel free to file a Jira for this enhancement.
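
As a hypothetical illustration of the type-loss problem (the column contents below are made up), parsing the JSON string does recover a nested record, but only with JSON's generic types, so the original Hive types inside the STRUCT can no longer be distinguished:

import json

# Made-up example of a STRUCT column delivered as a JSON string.
struct_as_string = '{"city": "Austin", "founded": "1839-12-27", "population": 964254}'

record = json.loads(struct_as_string)  # nested dict, but only JSON's type system
print(type(record["founded"]))     # <class 'str'>  -- was this DATE or STRING in Hive?
print(type(record["population"]))  # <class 'int'>  -- was this INT, BIGINT, or DECIMAL?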
01-24-2019 12:24 PM
Then use several ReplaceText processors in a chain (a plain-Python sketch of the chain's effect follows after the steps):

1st processor
replace: "additionl_information" :
with: (empty string)

2nd processor
replace: =
with: " : "

3rd processor
replace: ;
with: "(newline)

Note: for the (newline) you should hit Shift+Enter to get an actual new line in the NiFi processor's replacement value.
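
To make the chain concrete, here is a plain-Python sketch of the three replacements applied in order to a hypothetical sample line (the field name matches the one above; the keys and values are made up):

sample = '"additionl_information" : key1=value1;key2=value2;'

step1 = sample.replace('"additionl_information" :', '')  # 1st processor
step2 = step1.replace('=', '" : "')                      # 2nd processor
step3 = step2.replace(';', '"\n')                        # 3rd processor

print(step3)
# prints (the leading space survives from step 1):
#  key1" : "value1"
# key2" : "value2"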