Convert Excel to CSV Processor returns error

avatar

I am using ConvertExcelToCSVProcessor in NiFi to convert an .xlsx file to CSV. However, the processor is throwing the following error. I have also attached an image of my flow.

87accf18ba] ConvertExcelToCSVProcessor[id=ba4c3f67-dd21-1af9-95a3-1887accf18ba] failed to process session due to org.apache.nifi.processor.exception.FlowFileHandlingException: StandardFlowFileRecord[uuid=9ef1d06a-c2a4-4f7a-826f-ebfc33f3eef0,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1506002021780-183, container=default, section=183], offset=0, length=912278],offset=0,name=4289436879924664,size=912278] transfer relationship not specified: {}
org.apache.nifi.processor.exception.FlowFileHandlingException: StandardFlowFileRecord[uuid=9ef1d06a-c2a4-4f7a-826f-ebfc33f3eef0,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1506002021780-183, container=default, section=183], offset=0, length=912278],offset=0,name=4289436879924664,size=912278] transfer relationship not specified
at org.apache.nifi.controller.repository.StandardProcessSession.checkpoint(StandardProcessSession.java:248)
at org.apache.nifi.controller.repository.StandardProcessSession.commit(StandardProcessSession.java:318)
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:28)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1120)
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:147)
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:132)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)

40459-flowimg.png

7 REPLIES

avatar
Guru

Hi @Lovelesh Chawla,

It looks like someone has encountered a similar issue (https://stackoverflow.com/questions/45792912/nifi-convertexceltocsvprocessor-error). Can you provide the full stack trace of the error from /logs/nifi-app.log? Have you confirmed that the data going into the ConvertExcelToCSV processor is in the proper format, that is, .xlsx (XSSF, 2007 OOXML file format) Excel documents and not older .xls (HSSF, '97(-2007) file format) documents?

avatar

I saw this post and one other; however, neither of them has a resolution. I am using an .xlsx file. Please see the attached sample file, jax-shipment-profile-report-monday-18-september-20.zip.

avatar
Guru

I was able to reproduce the issue using the sample file you provided. If I save that .xlsx file (without making any modifications) using my Excel (Microsoft Excel for Mac Version 15.18) and use that file instead, the ConvertExcelToCSV processor has no errors. Please see attached file: jax-shipment-profile-report-monday-18-september-20.zip

I am trying to determine what difference is causing the error.

avatar
Guru

Looking more closely at nifi-app.log, I see the following errors:

2017-09-21 13:58:36,314 ERROR [Timer-Driven Process Thread-9] o.a.n.p.poi.ConvertExcelToCSVProcessor ConvertExcelToCSVProcessor[id=a4cfc1b5-015e-1000-b59d-535f6969973d] Failed to process incoming Excel document: java.lang.UnsupportedOperationException: Only .xlsx Excel 2007 OOXML files are supported
java.lang.UnsupportedOperationException: Only .xlsx Excel 2007 OOXML files are supported
at org.apache.nifi.processors.poi.ConvertExcelToCSVProcessor$1.process(ConvertExcelToCSVProcessor.java:195)
at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2136)
at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2106)
at org.apache.nifi.processors.poi.ConvertExcelToCSVProcessor.onTrigger(ConvertExcelToCSVProcessor.java:151)
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1120)
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:147)
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:132)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.poi.openxml4j.exceptions.InvalidFormatException: Package should contain a content type part [M1.13]
at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:197)
at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:696)
at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:280)
at org.apache.nifi.processors.poi.ConvertExcelToCSVProcessor$1.process(ConvertExcelToCSVProcessor.java:159)
... 15 common frames omitted
2017-09-21 13:58:36,430 ERROR [Timer-Driven Process Thread-9] o.a.n.p.poi.ConvertExcelToCSVProcessor ConvertExcelToCSVProcessor[id=a4cfc1b5-015e-1000-b59d-535f6969973d] Failed to process incoming Excel document: java.lang.NullPointerException
java.lang.NullPointerException: null
at org.apache.nifi.processors.poi.ConvertExcelToCSVProcessor.handleExcelSheet(ConvertExcelToCSVProcessor.java:249)
at org.apache.nifi.processors.poi.ConvertExcelToCSVProcessor.access$000(ConvertExcelToCSVProcessor.java:72)
at org.apache.nifi.processors.poi.ConvertExcelToCSVProcessor$1.process(ConvertExcelToCSVProcessor.java:190)
at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2136)
at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2106)
at org.apache.nifi.processors.poi.ConvertExcelToCSVProcessor.onTrigger(ConvertExcelToCSVProcessor.java:151)
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1120)
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:147)
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:132)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
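
The "Caused by" line above says POI could not find a content type part. In an OOXML package, that part is a ZIP entry named [Content_Types].xml at the root of the archive. As a rough diagnostic outside NiFi, the following Python sketch lists the entries of a file and reports whether an entry with exactly that name is present. The function name describe_package is hypothetical, and the exact-name comparison is an assumption; POI's own matching rules may differ, so treat a negative result only as a hint.

```python
import zipfile

# Name of the content type part required at the root of an OOXML package.
CONTENT_TYPES = "[Content_Types].xml"


def describe_package(source):
    """Return (is_zip, entry_names, has_content_types).

    `source` may be a file path or a seekable binary file object.
    This is a hypothetical helper for illustration, not part of NiFi or POI.
    """
    if not zipfile.is_zipfile(source):
        # Not a ZIP container at all, so it cannot be an .xlsx package.
        return False, [], False
    with zipfile.ZipFile(source) as archive:
        names = archive.namelist()
    # Assumption: an exact, case-sensitive match on the entry name.
    return True, names, CONTENT_TYPES in names
```

Running describe_package on both the vendor file and the re-saved file, and comparing the returned entry names, may show whether the vendor archive stores that entry under an unexpected name or path.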

avatar

Were you able to see what the difference between the files is?

That file is automatically generated by a vendor app. I need to be able to convert it to CSV so I can use it in Hive.

avatar
Guru

I changed the extensions on both .xlsx files to .zip. Unzipping them reveals the folder structure of those files. Going through the included XML files, there were some differences but nothing that stood out to cause these errors.
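
The manual comparison described above can be roughly automated. This is only a sketch in Python using the standard zipfile module: it compares the entry names and uncompressed sizes of two archives and does not look inside the XML. The function names entry_sizes and compare_archives are hypothetical.

```python
import zipfile


def entry_sizes(source):
    """Map each entry name in a ZIP archive to its uncompressed size."""
    with zipfile.ZipFile(source) as archive:
        return {info.filename: info.file_size for info in archive.infolist()}


def compare_archives(first, second):
    """Return (only_in_first, only_in_second, size_changed) as sorted lists.

    Both arguments may be file paths or seekable binary file objects.
    Entries with equal sizes can still differ in content; this is a coarse
    first pass, not a full diff.
    """
    a = entry_sizes(first)
    b = entry_sizes(second)
    only_in_first = sorted(set(a) - set(b))
    only_in_second = sorted(set(b) - set(a))
    size_changed = sorted(name for name in set(a) & set(b) if a[name] != b[name])
    return only_in_first, only_in_second, size_changed
```

Because zipfile reads the .xlsx container directly, there is no need to rename the files to .zip first. Entry names that appear in only one archive, or that differ only in case or path separators, would be the first thing to look at.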

Do you know how the vendor generates the Excel files? Is it possible these files are really .xls files but just have the .xlsx file extension? Do you know what version of Excel they use?
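
One way to check whether a file is really an .xls with an .xlsx extension, without relying on the extension, is to look at its first bytes. The sketch below is Python and assumes only the well-known signatures: legacy .xls files are OLE2 compound documents, while .xlsx files are ZIP archives. The function names are hypothetical, and a "zip" result only says the container is a ZIP, not that it is a valid OOXML package.

```python
# First eight bytes of an OLE2 compound file (legacy .xls, HSSF).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Local file header signature at the start of a typical ZIP archive (.xlsx).
ZIP_MAGIC = b"PK\x03\x04"


def sniff_excel_container(data):
    """Classify leading bytes as 'ole2', 'zip', or 'unknown'."""
    if data.startswith(OLE2_MAGIC):
        return "ole2"
    if data.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


def sniff_path(path):
    """Read the first eight bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return sniff_excel_container(handle.read(8))
```

Since the sample file could be unzipped after renaming it, it most likely reports "zip"; an "ole2" result would point to a mislabeled .xls file instead.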

avatar

We do not have access to the vendor code to see how the file is really generated. There are options to generate an .xls or an .xlsx file, so I believe it should be .xlsx. I do not know what version of Excel they use either. The application is a black box.