02-11-2019
02:31 PM
Hi @hkropp, I followed your post, but when I trigger my spark-submit and the application is accepted by the YARN cluster, I get the following error. Do you know what might be the reason?

In STDOUT:
./CONDA_TEST/test_env3/bin/python: ./CONDA_TEST/test_env3/bin/python: cannot execute binary file

In STDERR:
19/02/11 10:48:43 INFO ApplicationMaster: Preparing Local resources
19/02/11 10:48:44 INFO ApplicationMaster: ApplicationAttemptId: appattempt_1548016788605_1234_000002
19/02/11 10:48:44 INFO SecurityManager: Changing view acls to: mbilling
19/02/11 10:48:44 INFO SecurityManager: Changing modify acls to: mbilling
19/02/11 10:48:44 INFO SecurityManager: Changing view acls groups to:
19/02/11 10:48:44 INFO SecurityManager: Changing modify acls groups to:
19/02/11 10:48:44 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(mbilling); groups with view permissions: Set(); users with modify permissions: Set(mbilling); groups with modify permissions: Set()
19/02/11 10:48:44 INFO ApplicationMaster: Starting the user application in a separate Thread
19/02/11 10:48:44 INFO ApplicationMaster: Waiting for spark context initialization...
19/02/11 10:48:44 ERROR ApplicationMaster: User application exited with status 126
19/02/11 10:48:44 INFO ApplicationMaster: Final app status: FAILED, exitCode: 126, (reason: User application exited with status 126)
19/02/11 10:48:44 ERROR ApplicationMaster: Uncaught exception:
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:194)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:428)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:281)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:783)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:67)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:66)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:781)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by: org.apache.spark.SparkUserAppException: User application exited with 126
at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:104)
at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:654)
19/02/11 10:48:44 INFO ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: User application exited with status 126)
19/02/11 10:48:44 INFO ApplicationMaster: Deleting staging directory hdfs://pmids01/user/mbilling/.sparkStaging/application_1548016788605_1234
19/02/11 10:48:44 INFO ShutdownHookManager: Shutdown hook called

I'm running it like this:

export SPARK_MAJOR_VERSION=2; \
export PYSPARK_PYTHON=./CONDA_TEST/test_env3/bin/python; \
export PYSPARK_DRIVER_PYTHON=./CONDA_TEST/test_env3/bin/python; \
export PYSPARK_DRIVER_PYTHON_OPTS=''; \
cd deploy;\
spark-submit --master yarn --deploy-mode cluster \
--verbose \
--num-executors 1 --driver-memory 1g --executor-memory 1g \
--files /usr/hdp/current/spark2-client/conf/hive-site.xml \
--jars /usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar \
--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./CONDA_TEST/test_env3/bin/python \
--archives test_env3.zip#CONDA_TEST \
--py-files main.py main.py
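If I understand exit code 126 correctly, "cannot execute binary file" means the shell could not run the shipped python binary at all (for example, an env packed on a different OS/architecture, or execute bits lost when zipping), i.e. the failure happens before any Python code starts. A minimal probe like this (a sketch, not my actual main.py) would at least confirm which interpreter YARN launches once the binary itself runs:

# probe.py -- hypothetical sanity-check script, not the real main.py.
# Reports which interpreter and platform YARN actually launched; with exit
# code 126 the interpreter never starts, so any output here means the
# packed env's python is runnable on the worker nodes.
import os
import platform
import sys

print("executable:", sys.executable)
print("version   :", sys.version)
print("platform  :", platform.platform())
print("cwd       :", os.getcwd())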
01-24-2019
12:24 PM
Then use several ReplaceText processors in a chain:

1st processor
replace: "additionl_information" :
with: (empty string)

2nd processor
replace: =
with: " : "

3rd processor
replace: ;
with: "(newline)

Note: for the (newline) you actually have to hit Shift+Enter to get the new line in the NiFi processor.
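To make the effect of the chain concrete, here is a rough Python equivalent (the sample input is made up; the literal search/replace strings are the ones from the processors above):

# Rough Python mirror of the three ReplaceText steps; the sample input
# is hypothetical, the literal strings match the processor settings above.
sample = '"additionl_information" : k1=v1;k2=v2;'

step1 = sample.replace('"additionl_information" :', '')  # 1st processor
step2 = step1.replace('=', '" : "')                      # 2nd processor
step3 = step2.replace(';', '"\n')                        # 3rd processor

print(step3)
# prints (modulo surrounding whitespace):
# k1" : "v1"
# k2" : "v2"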
01-23-2019
02:26 PM
You have to use a ReplaceText processor and configure it in the following way. The ReplaceText processor replaces content in the flowfile based on a regular expression match: everything that matches the regular expression gets replaced. The key to the regular expression is the use of parentheses (), in the sense that the content of the first pair of parentheses can be referenced later as $1 and the content of the second pair as $2. Based on your example, this is the content of the incoming flowfile, and this is the result after the ReplaceText processor with the above configuration.
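The same mechanism in a quick Python sketch (NiFi's $1/$2 correspond to \1/\2 in Python's re module; the pattern and input are made-up stand-ins, since the actual configuration was shown in the screenshot):

import re

# Back-referencing parenthesized capture groups: group 1 and group 2 of
# the match are reused in the replacement, as with $1/$2 in ReplaceText.
text = "key=value"
print(re.sub(r"(\w+)=(\w+)", r'"\1" : "\2"', text))  # -> "key" : "value"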
01-23-2019
02:22 PM
@Yoel Barsheshet
You can use a ReplaceText processor with a regular expression, as in this example I did for you: an example input flowfile, and the result after the ReplaceText processor with the above configuration.
01-23-2019
10:23 AM
Dear community, I have executed a SELECT * to fetch all the records from a Hive table using the NiFi 1.6 SelectHiveQL processor. The problem I have is that the source table has a column (satellite_metadata) of type struct<record_source:string,load_time:timestamp,checksum:string,device_hash:string>. However, in the flowfile returned by SelectHiveQL, the type of the satellite_metadata column is string.

After fetching the data, I convert AVRO to ORC, store the file in HDFS, extract the Hive DDL from the flowfile, and create the Hive table. The content of the satellite_metadata column for a single record looks like this:

{"record_source":"RAB","load_time":"2019-01-18 03:16:26.93","checksum":"11396be4b6cfe13542d3d6708546a4a4","device_hash":"2eac97fce07480194301e482680fe05e"}

I tried to define the correct structure at CREATE TABLE time and also afterwards using ALTER TABLE, but I get the following error when I try to do a SELECT *:

ORC does not support type conversion from file type string (14) to reader type struct

Any ideas how I can set the proper type for this struct column? I've also tried to store the Avro file in HDFS, but then I don't know how to create an external Hive table that can read the schema from the .avro file.
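As a side note, the string form of the column does parse cleanly as JSON. A minimal Spark sketch (illustration only, not part of the NiFi flow; the field names are taken verbatim from the table above) shows the struct shape I am after:

# Spark sketch (not the NiFi/Hive pipeline itself): parse the JSON string
# form of satellite_metadata against the intended struct schema.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.getOrCreate()

schema = StructType([
    StructField("record_source", StringType()),
    StructField("load_time", TimestampType()),
    StructField("checksum", StringType()),
    StructField("device_hash", StringType()),
])

raw = ('{"record_source":"RAB","load_time":"2019-01-18 03:16:26.93",'
       '"checksum":"11396be4b6cfe13542d3d6708546a4a4",'
       '"device_hash":"2eac97fce07480194301e482680fe05e"}')

df = spark.createDataFrame([(raw,)], ["satellite_metadata"])
df.select(from_json(col("satellite_metadata"), schema)
          .alias("satellite_metadata")).printSchema()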
Labels:
- Apache Hive
- Apache NiFi