02-11-2019
02:31 PM
Hi @hkropp, I followed your post, but when I trigger my spark-submit and the application is accepted by the YARN cluster, I get the following error. Do you know what might be the reason?

In STDOUT:
./CONDA_TEST/test_env3/bin/python: ./CONDA_TEST/test_env3/bin/python: cannot execute binary file

In STDERR:
19/02/11 10:48:43 INFO ApplicationMaster: Preparing Local resources
19/02/11 10:48:44 INFO ApplicationMaster: ApplicationAttemptId: appattempt_1548016788605_1234_000002
19/02/11 10:48:44 INFO SecurityManager: Changing view acls to: mbilling
19/02/11 10:48:44 INFO SecurityManager: Changing modify acls to: mbilling
19/02/11 10:48:44 INFO SecurityManager: Changing view acls groups to:
19/02/11 10:48:44 INFO SecurityManager: Changing modify acls groups to:
19/02/11 10:48:44 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(mbilling); groups with view permissions: Set(); users with modify permissions: Set(mbilling); groups with modify permissions: Set()
19/02/11 10:48:44 INFO ApplicationMaster: Starting the user application in a separate Thread
19/02/11 10:48:44 INFO ApplicationMaster: Waiting for spark context initialization...
19/02/11 10:48:44 ERROR ApplicationMaster: User application exited with status 126
19/02/11 10:48:44 INFO ApplicationMaster: Final app status: FAILED, exitCode: 126, (reason: User application exited with status 126)
19/02/11 10:48:44 ERROR ApplicationMaster: Uncaught exception:
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:194)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:428)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:281)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:783)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:67)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:66)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:781)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by: org.apache.spark.SparkUserAppException: User application exited with 126
at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:104)
at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:654)
19/02/11 10:48:44 INFO ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: User application exited with status 126)
19/02/11 10:48:44 INFO ApplicationMaster: Deleting staging directory hdfs://pmids01/user/mbilling/.sparkStaging/application_1548016788605_1234
19/02/11 10:48:44 INFO ShutdownHookManager: Shutdown hook called

I'm running it like this:

export SPARK_MAJOR_VERSION=2; \
export PYSPARK_PYTHON=./CONDA_TEST/test_env3/bin/python; \
export PYSPARK_DRIVER_PYTHON=./CONDA_TEST/test_env3/bin/python; \
export PYSPARK_DRIVER_PYTHON_OPTS=''; \
cd deploy;\
spark-submit --master yarn --deploy-mode cluster \
--verbose \
--num-executors 1 --driver-memory 1g --executor-memory 1g \
--files /usr/hdp/current/spark2-client/conf/hive-site.xml \
--jars /usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar \
--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./CONDA_TEST/test_env3/bin/python \
--archives test_env3.zip#CONDA_TEST \
--py-files main.py main.py
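If I understand exit code 126 correctly, "cannot execute binary file" means the shell could not run the shipped python binary at all (for example, an env packed on a different OS/architecture, or execute bits lost when zipping), i.e. the failure happens before any Python code starts. A minimal probe like this (a sketch, not my actual main.py) would at least confirm which interpreter YARN launches once the binary itself runs:

# probe.py -- hypothetical sanity-check script, not the real main.py.
# Reports which interpreter and platform YARN actually launched; with exit
# code 126 the interpreter never starts, so any output here means the
# packed env's python is runnable on the worker nodes.
import os
import platform
import sys

print("executable:", sys.executable)
print("version   :", sys.version)
print("platform  :", platform.platform())
print("cwd       :", os.getcwd())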
01-24-2019
12:24 PM
Then use several ReplaceText processors in a chain:

1st processor
replace: "additionl_information" :
with: (empty string)

2nd processor
replace: =
with: " : "

3rd processor
replace: ;
with: "(newline)

Note: for the (newline) you actually have to hit Shift+Enter to get the new line in the NiFi processor.
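To make the effect of the chain concrete, here is a rough Python equivalent (the sample input is made up; the literal search/replace strings are the ones from the processors above):

# Rough Python mirror of the three ReplaceText steps; the sample input
# is hypothetical, the literal strings match the processor settings above.
sample = '"additionl_information" : k1=v1;k2=v2;'

step1 = sample.replace('"additionl_information" :', '')  # 1st processor
step2 = step1.replace('=', '" : "')                      # 2nd processor
step3 = step2.replace(';', '"\n')                        # 3rd processor

print(step3)
# prints (modulo surrounding whitespace):
# k1" : "v1"
# k2" : "v2"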
01-23-2019
02:26 PM
You have to use a ReplaceText processor and configure it in the following way. The ReplaceText processor replaces content in the flowfile based on a regular expression match: everything that matches the regular expression gets replaced. The key to the regular expression is the use of parentheses (), in the sense that the content of the first pair of parentheses can be referenced later as $1 and the content of the second pair as $2. Based on your example, this is the content of the incoming flowfile, and this is the result after the ReplaceText processor with the above configuration.
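The same mechanism in a quick Python sketch (NiFi's $1/$2 correspond to \1/\2 in Python's re module; the pattern and input are made-up stand-ins, since the actual configuration was shown in the screenshot):

import re

# Back-referencing parenthesized capture groups: group 1 and group 2 of
# the match are reused in the replacement, as with $1/$2 in ReplaceText.
text = "key=value"
print(re.sub(r"(\w+)=(\w+)", r'"\1" : "\2"', text))  # -> "key" : "value"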
01-23-2019
02:22 PM
@Yoel Barsheshet
You can use a ReplaceText processor with a regular expression, as in this example I did for you: an example input flowfile, and the result after the ReplaceText processor with the above configuration.
01-23-2019
10:23 AM
Dear community, I have executed a SELECT * to fetch all the records from a Hive table using the NiFi 1.6 SelectHiveQL processor. The problem I have is that the source table has a column (satellite_metadata) of type struct<record_source:string,load_time:timestamp,checksum:string,device_hash:string>. However, in the flowfile returned by SelectHiveQL, the type of the satellite_metadata column is string.

After fetching the data, I convert AVRO to ORC, store the file in HDFS, extract the Hive DDL from the flowfile, and create the Hive table. The content of the satellite_metadata column for a single record looks like this:

{"record_source":"RAB","load_time":"2019-01-18 03:16:26.93","checksum":"11396be4b6cfe13542d3d6708546a4a4","device_hash":"2eac97fce07480194301e482680fe05e"}

I tried to define the correct structure at CREATE TABLE time and also afterwards using ALTER TABLE, but I get the following error when I try to do a SELECT *:

ORC does not support type conversion from file type string (14) to reader type struct

Any ideas how I can set the proper type for this struct column? I've also tried to store the Avro file in HDFS, but then I don't know how to create an external Hive table that can read the schema from the .avro file.
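As a side note, the string form of the column does parse cleanly as JSON. A minimal Spark sketch (illustration only, not part of the NiFi flow; the field names are taken verbatim from the table above) shows the struct shape I am after:

# Spark sketch (not the NiFi/Hive pipeline itself): parse the JSON string
# form of satellite_metadata against the intended struct schema.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.getOrCreate()

schema = StructType([
    StructField("record_source", StringType()),
    StructField("load_time", TimestampType()),
    StructField("checksum", StringType()),
    StructField("device_hash", StringType()),
])

raw = ('{"record_source":"RAB","load_time":"2019-01-18 03:16:26.93",'
       '"checksum":"11396be4b6cfe13542d3d6708546a4a4",'
       '"device_hash":"2eac97fce07480194301e482680fe05e"}')

df = spark.createDataFrame([(raw,)], ["satellite_metadata"])
df.select(from_json(col("satellite_metadata"), schema)
          .alias("satellite_metadata")).printSchema()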
Labels:
- Apache Hive
- Apache NiFi