Created 05-14-2017 11:08 PM
Hello,
I am trying to convert an incoming HTML file (from an InvokeHTTP processor) to an XML file with Python, using the ExecuteScript processor. The content of the flowfile is the HTML page. I used the following Python code:
import sys
import traceback
from java.nio.charset import StandardCharsets
from org.apache.commons.io import IOUtils
from org.apache.nifi.processor.io import StreamCallback
from org.python.core.util import StringUtil
from lxml import html, etree

class TransformCallback(StreamCallback):
    def __init__(self):
        pass

    def process(self, inputStream, outputStream):
        try:
            # Read input FlowFile content
            input_text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
            # Write output content
            html_file = html.fromstring(input_text)
            xml_file = etree.tostring(html_file)
            outputStream.write(StringUtil.toBytes(xml_file))
        except:
            traceback.print_exc(file=sys.stdout)
            raise

flowFile = session.get()
if flowFile != None:
    flowFile = session.write(flowFile, TransformCallback())
    # Finish by transferring the FlowFile to an output relationship
    session.transfer(flowFile, REL_SUCCESS)
The processor does not show any incoming file and throws the following exception:
2017-05-15 00:56:12,422 WARN [StandardProcessScheduler Thread-6] o.a.n.controller.StandardProcessorNode Timed out while waiting for OnScheduled of 'ExecuteScript' processor to finish. An attempt is made to cancel the task via Thread.interrupt(). However it does not guarantee that the task will be canceled since the code inside current OnScheduled operation may have been written to ignore interrupts which may result in a runaway thread. This could lead to more issues, eventually requiring NiFi to be restarted. This is usually a bug in the target Processor 'ExecuteScript[id=015c113e-74cc-13ad-e411-f861a69dbeec]' that needs to be documented, reported and eventually fixed.
2017-05-15 00:56:12,422 ERROR [StandardProcessScheduler Thread-6] o.a.nifi.processors.script.ExecuteScript ExecuteScript[id=015c113e-74cc-13ad-e411-f861a69dbeec] ExecuteScript[id=015c113e-74cc-13ad-e411-f861a69dbeec] failed to invoke @OnScheduled method due to java.lang.RuntimeException: Timed out while executing one of processor's OnScheduled task.; processor will not be scheduled to run for 30 seconds: java.lang.RuntimeException: Timed out while executing one of processor's OnScheduled task.
2017-05-15 00:56:12,424 ERROR [StandardProcessScheduler Thread-6] o.a.nifi.processors.script.ExecuteScript java.lang.RuntimeException: Timed out while executing one of processor's OnScheduled task.
    at org.apache.nifi.controller.StandardProcessorNode.invokeTaskAsCancelableFuture(StandardProcessorNode.java:1447) ~[na:na]
    at org.apache.nifi.controller.StandardProcessorNode.access$100(StandardProcessorNode.java:100) ~[na:na]
    at org.apache.nifi.controller.StandardProcessorNode$1.run(StandardProcessorNode.java:1275) ~[na:na]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_121]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_121]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_121]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.8.0_121]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_121]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_121]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]
Caused by: java.util.concurrent.TimeoutException: null
    at java.util.concurrent.FutureTask.get(FutureTask.java:205) [na:1.8.0_121]
    at org.apache.nifi.controller.StandardProcessorNode.invokeTaskAsCancelableFuture(StandardProcessorNode.java:1432) ~[na:na]
    ... 9 common frames omitted
2017-05-15 00:56:12,424 ERROR [StandardProcessScheduler Thread-6] o.a.n.controller.StandardProcessorNode Failed to invoke @OnScheduled method due to java.lang.RuntimeException: Timed out while executing one of processor's OnScheduled task.
java.lang.RuntimeException: Timed out while executing one of processor's OnScheduled task.
    at org.apache.nifi.controller.StandardProcessorNode.invokeTaskAsCancelableFuture(StandardProcessorNode.java:1447) ~[na:na]
    at org.apache.nifi.controller.StandardProcessorNode.access$100(StandardProcessorNode.java:100) ~[na:na]
    at org.apache.nifi.controller.StandardProcessorNode$1.run(StandardProcessorNode.java:1275) ~[na:na]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_121]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_121]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_121]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.8.0_121]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_121]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_121]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]
Caused by: java.util.concurrent.TimeoutException: null
    at java.util.concurrent.FutureTask.get(FutureTask.java:205) [na:1.8.0_121]
    at org.apache.nifi.controller.StandardProcessorNode.invokeTaskAsCancelableFuture(StandardProcessorNode.java:1432) ~[na:na]
    ... 9 common frames omitted
Do you have an idea?
Thank you in advance!
Kind Regards,
Jan
Created 05-15-2017 01:40 PM
I am having trouble importing the "etree" module. I have tried with brew-installed Python 2.7 and Anaconda 2.7 (where I believe the etree submodule is part of "xml", not "lxml"). Do I need any additional configuration?
Looking in the lxml package, I see some native libraries (.so files, for example). If lxml is a native library, Jython (the "python" script engine in ExecuteScript) will not be able to load or execute it. All imported modules (and their dependencies) must be pure Python, with no native code such as CPython C extensions, for Jython to execute the script successfully. Perhaps there is a different library you can use?
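As one possible pure-Python alternative, here is a minimal sketch that builds XML from HTML using only standard-library modules (HTMLParser and xml.etree.ElementTree). The names HtmlToXml, html_to_xml, and the "document" wrapper element are made up for illustration and are not part of NiFi or lxml. I have not run this under NiFi's Jython engine, so treat it as a starting point under the assumption that the bundled Jython ships these modules. It is also far less forgiving than lxml with badly broken markup.

```python
# Sketch only: pure-Python HTML -> XML conversion with the standard library.
try:
    from html.parser import HTMLParser   # Python 3
except ImportError:
    from HTMLParser import HTMLParser    # Python 2.7 / Jython 2.7

import xml.etree.ElementTree as ET

# HTML elements that never take a closing tag
VOID_TAGS = frozenset([
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "param", "source", "track", "wbr",
])


class HtmlToXml(HTMLParser):
    """Hypothetical helper that collects HTML events into an ElementTree."""

    def __init__(self):
        HTMLParser.__init__(self)
        # Synthetic wrapper so the output always has a single root element
        self.root = ET.Element("document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. <input disabled>) arrive as None
        attrib = dict((k, v if v is not None else "") for k, v in attrs)
        element = ET.SubElement(self.stack[-1], tag, attrib)
        if tag not in VOID_TAGS:
            self.stack.append(element)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Note: on Python 2 / Jython, references such as &amp; are sent to
        # handle_entityref/handle_charref, which this sketch does not implement.
        parent = self.stack[-1]
        if len(parent):
            # Text that follows a child element belongs to that child's tail
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def html_to_xml(html_text):
    """Return the parsed HTML serialized as UTF-8 encoded XML."""
    parser = HtmlToXml()
    parser.feed(html_text)
    parser.close()
    return ET.tostring(parser.root, encoding="utf-8")
```

In the script from the question, the html.fromstring / etree.tostring pair would be replaced by a single call to html_to_xml(input_text), and the lxml import would be dropped.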
If you are not required to use Jython/Python, consider using JavaScript, Groovy, or Clojure instead. With those engines, the Module Directory property allows you to use third-party Java libraries to accomplish this conversion, such as NekoHTML, JTidy, or JSoup.