Support Questions

Find answers, ask questions, and share your expertise

ExcelReader Exception

avatar
New Contributor

Hi everyone
I try to split to records an Excel sheet, using ExcelReader. But get this error:

Spoiler
SplitRecord[id=01921045-9bc9-1353-b2a2-96dfb4f1e8af] Failed to split FlowFile[filename=test.xlsx]: org.apache.nifi.processor.exception.ProcessException: IOException thrown from SplitRecord[id=01921045-9bc9-1353-b2a2-96dfb4f1e8af]: java.io.IOException: org.apache.nifi.serialization.MalformedRecordException: Read next Record from Excel XLSX failed on row 546 in sheet reportdata1
- Caused by: java.io.IOException: org.apache.nifi.serialization.MalformedRecordException: Read next Record from Excel XLSX failed on row 546 in sheet reportdata1
- Caused by: org.apache.nifi.serialization.MalformedRecordException: Read next Record from Excel XLSX failed on row 546 in sheet reportdata1
- Caused by: java.lang.NumberFormatException: For input string: ""


Indeed, there are 545 filled lines on the sheet, but why ExcelReader doesn't stop, when find empty line?

2 REPLIES 2

avatar
Community Manager

@Mikhai, Welcome to our community! To help you get the best possible answer, I have tagged in our NiFi experts @SAMSAL @MattWho  who may be able to assist you further.

Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.



Regards,

Vidya Sargur,
Community Manager


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community:

avatar
Super Guru

Hi @Mikhai ,

Its hard to say what is going on without looking at the data itself or seeing the ExcelReader Configuration. I know providing the data is not easy but if you can replicate the issue using dummy data then please share. Also if you can provide more details on how you configured the ExcelReader, for example are you using custom schema or infering the schema?

I would try the following:

1- Try to find table boundary in excel and delete empty rows. If you cant then for sake of testing copy the table with the rows you need into new excel and see if that works.

2- If ExcelReader works with 545 rows , then I will try and provide custom schema - if not provided - and try to set some of the fields where there should be a value to not allow null. Maybe by doing so it will help the ExcelReader not to import empty rows.

I tried to use ExcelReader before but ran into issues when the excel has some formula columns because of a bug in the reader itself. Im not sure if those issues were addressed but as workaround I used Python Extension to develop custom processor that takes excel input and convert into Json using Pandas library. This might be an option to consider if you are still having problems with the ExcelReader service but you have to use Nifi 2.0 version in order to use python extension.

If that helps please accept the solution,

Thanks