Created on 09-21-2017 02:05 PM - edited 08-17-2019 11:01 PM
I am using ConvertExcelToCSVProcessor in NiFi to convert an .xlsx file to CSV. However, the processor is throwing the following error. I have also attached an image of my flow.
87accf18ba] ConvertExcelToCSVProcessor[id=ba4c3f67-dd21-1af9-95a3-1887accf18ba] failed to process session due to org.apache.nifi.processor.exception.FlowFileHandlingException: StandardFlowFileRecord[uuid=9ef1d06a-c2a4-4f7a-826f-ebfc33f3eef0,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1506002021780-183, container=default, section=183], offset=0, length=912278],offset=0,name=4289436879924664,size=912278] transfer relationship not specified: {}
org.apache.nifi.processor.exception.FlowFileHandlingException: StandardFlowFileRecord[uuid=9ef1d06a-c2a4-4f7a-826f-ebfc33f3eef0,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1506002021780-183, container=default, section=183], offset=0, length=912278],offset=0,name=4289436879924664,size=912278] transfer relationship not specified
    at org.apache.nifi.controller.repository.StandardProcessSession.checkpoint(StandardProcessSession.java:248)
    at org.apache.nifi.controller.repository.StandardProcessSession.commit(StandardProcessSession.java:318)
    at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:28)
    at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1120)
    at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:147)
    at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
    at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:132)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
Created 09-21-2017 04:11 PM
Hi @Lovelesh Chawla,
It looks like someone has encountered a similar issue (https://stackoverflow.com/questions/45792912/nifi-convertexceltocsvprocessor-error). Can you provide the full stack trace of the error from /logs/nifi-app.log? Also, have you confirmed that the data going into the ConvertExcelToCSV processor is in the proper format, that is, .xlsx (XSSF, 2007 OOXML file format) Excel documents and not older .xls (HSSF, '97(-2007) file format) documents?
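One quick way to confirm the format, independent of the file extension, is to look at the first bytes of the file. The following is only a minimal sketch: the function names are hypothetical, and it assumes a real .xlsx starts with the usual ZIP local-file signature while a legacy .xls starts with the OLE2 compound-file signature.

```python
# Signatures found at the start of the two container formats.
ZIP_MAGIC = b"PK\x03\x04"                        # ZIP archive, the container for .xlsx (OOXML)
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")   # OLE2 compound file, used by legacy .xls


def sniff_excel_container(header: bytes) -> str:
    """Classify a file by its leading bytes: 'zip', 'ole2', or 'unknown'."""
    if header.startswith(ZIP_MAGIC):
        return "zip"
    if header.startswith(OLE2_MAGIC):
        return "ole2"
    return "unknown"


def sniff_excel_file(path: str) -> str:
    """Read the first 8 bytes of the file at `path` and classify them."""
    with open(path, "rb") as f:
        return sniff_excel_container(f.read(8))
```

Calling sniff_excel_file on the vendor file should return "zip" if it is a genuine .xlsx container. A result of "ole2" would suggest an .xls file that was merely renamed. Note that "zip" only confirms the container type, not that the package inside is valid.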
Created 09-21-2017 04:53 PM
I saw this post and one other; however, neither of them has a resolution. I am using an .xlsx file. Please see the attached jax-shipment-profile-report-monday-18-september-20.zip sample file.
Created 09-21-2017 05:36 PM
I was able to reproduce the issue using the sample file you provided. If I save that .xlsx file (without making any modifications) using my Excel (Microsoft Excel for Mac Version 15.18) and use that file instead, the ConvertExcelToCSV processor has no errors. Please see attached file: jax-shipment-profile-report-monday-18-september-20.zip
I am still trying to determine what difference between the two files is causing the error.
Created 09-21-2017 06:04 PM
Looking more closely at nifi-app.log, I see the following errors:
2017-09-21 13:58:36,314 ERROR [Timer-Driven Process Thread-9] o.a.n.p.poi.ConvertExcelToCSVProcessor ConvertExcelToCSVProcessor[id=a4cfc1b5-015e-1000-b59d-535f6969973d] Failed to process incoming Excel document: java.lang.UnsupportedOperationException: Only .xlsx Excel 2007 OOXML files are supported
java.lang.UnsupportedOperationException: Only .xlsx Excel 2007 OOXML files are supported
    at org.apache.nifi.processors.poi.ConvertExcelToCSVProcessor$1.process(ConvertExcelToCSVProcessor.java:195)
    at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2136)
    at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2106)
    at org.apache.nifi.processors.poi.ConvertExcelToCSVProcessor.onTrigger(ConvertExcelToCSVProcessor.java:151)
    at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
    at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1120)
    at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:147)
    at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
    at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:132)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.poi.openxml4j.exceptions.InvalidFormatException: Package should contain a content type part [M1.13]
    at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:197)
    at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:696)
    at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:280)
    at org.apache.nifi.processors.poi.ConvertExcelToCSVProcessor$1.process(ConvertExcelToCSVProcessor.java:159)
    ... 15 common frames omitted
2017-09-21 13:58:36,430 ERROR [Timer-Driven Process Thread-9] o.a.n.p.poi.ConvertExcelToCSVProcessor ConvertExcelToCSVProcessor[id=a4cfc1b5-015e-1000-b59d-535f6969973d] Failed to process incoming Excel document: java.lang.NullPointerException
java.lang.NullPointerException: null
    at org.apache.nifi.processors.poi.ConvertExcelToCSVProcessor.handleExcelSheet(ConvertExcelToCSVProcessor.java:249)
    at org.apache.nifi.processors.poi.ConvertExcelToCSVProcessor.access$000(ConvertExcelToCSVProcessor.java:72)
    at org.apache.nifi.processors.poi.ConvertExcelToCSVProcessor$1.process(ConvertExcelToCSVProcessor.java:190)
    at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2136)
    at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2106)
    at org.apache.nifi.processors.poi.ConvertExcelToCSVProcessor.onTrigger(ConvertExcelToCSVProcessor.java:151)
    at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
    at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1120)
    at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:147)
    at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
    at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:132)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Created 09-21-2017 06:17 PM
Were you able to see what the difference between the files is?
That file is automatically generated by a vendor app. I need to be able to convert it to CSV so I can use it in Hive.
Created 09-22-2017 03:40 PM
I changed the extensions on both .xlsx files to .zip. Unzipping them reveals the folder structure of those files. Going through the included XML files, there were some differences but nothing that stood out to cause these errors.
Do you know how the vendor generates the Excel files? Is it possible these files are really .xls files but just have the .xlsx file extension? Do you know what version of Excel they use?
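The unzip comparison described above can also be scripted, which makes it easier to diff the entry names of the two files. This is only a rough sketch with hypothetical function names. It assumes that the "Package should contain a content type part [M1.13]" error means the reader could not find a [Content_Types].xml entry at the root of the package, so the check looks for that entry name, ignoring case.

```python
import zipfile

CONTENT_TYPES_NAME = "[Content_Types].xml"  # part expected at the root of an OOXML package


def list_package_parts(source):
    """Return the raw entry names stored in the ZIP container.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as zf:
        return zf.namelist()


def has_content_types_part(names):
    """True if some entry is named [Content_Types].xml (case-insensitive)."""
    return any(name.lower() == CONTENT_TYPES_NAME.lower() for name in names)


def diff_part_names(names_a, names_b):
    """Return (only_in_a, only_in_b) as sorted lists of entry names."""
    a, b = set(names_a), set(names_b)
    return sorted(a - b), sorted(b - a)
```

Running list_package_parts on both the vendor file and the re-saved file, then passing the two lists to diff_part_names, shows any entries whose names differ, for example in path separators, a leading slash, or letter case. Such differences are easy to miss when browsing the unzipped folders by hand.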
Created 09-22-2017 07:22 PM
We do not have access to the vendor code to see how the file is really generated. There are options to generate either an .xls or an .xlsx file, so I believe it should be .xlsx. I do not know what version of Excel they use either. The application is a black box.