Created on 11-26-2024 05:17 AM - edited 11-26-2024 05:20 AM
Hello,
I'm trying to read a parquet file using the ConvertRecord processor and I'm getting this error:
ConvertRecord[id=e599cd8f-9a1d-3134-4daf-af7bc91cdd57] Failed to process FlowFile[filename=objectTable_tract_5074_DC2_2_2i_runs_DP0_2_v23_0_1_PREOPS-905_step3_31_20220314T212509Z-part0_output.parquet]; will route to failure: org.apache.avro.SchemaParseException: Illegal initial character: 0
In my file the columns are numeric and the first one starts with 0 (zero).
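That column name is exactly what triggers the exception: Avro requires a field name to match `[A-Za-z_][A-Za-z0-9_]*`, so a column literally named `0` fails at schema-parse time. A minimal illustration of that rule (the regex mirrors the Avro specification; this is not NiFi's actual validation code):

```python
import re

# Avro field names must start with a letter or underscore,
# followed only by letters, digits, or underscores.
AVRO_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def is_valid_avro_name(name: str) -> bool:
    """Return True if `name` is a legal Avro record field name."""
    return bool(AVRO_NAME.match(name))

print(is_valid_avro_name("0"))      # False: purely numeric names are rejected
print(is_valid_avro_name("col_0"))  # True: a letter/underscore prefix fixes it
```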
Created 11-26-2024 07:11 AM
@alecssander Welcome to the Cloudera Community!
To help you get the best possible solution, I have tagged our NiFi experts @MattWho @SAMSAL who may be able to assist you further.
Please keep us updated on your post, and we hope you find a satisfactory solution to your query.
Regards,
Diana Torres
Created 11-26-2024 10:52 AM
Hi,
Can you provide more explanation/screenshots of your dataflow and the configuration of each processor/controller service? Also, if you can share sample data that can be converted to parquet and reproduces the error, that would be helpful as well.
Thanks
Created on 11-26-2024 12:36 PM - edited 11-26-2024 12:37 PM
The process is simple: I take a parquet file from a bucket and try to insert it into a PostgreSQL database:
My file has 301 columns, named 0 through 300, with more than 280 rows:
Created on 11-26-2024 02:03 PM - edited 11-26-2024 02:04 PM
It seems that the parquet reader/writer services rely on an Avro schema, presumably to make sense of the data when passing it along to downstream processors (like PutDatabaseRecord), since parquet is a binary format. The problem is that Avro restricts how fields can be named: a field name must begin with a letter or underscore ([A-Za-z_]). This is actually reported as a bug in Jira, but it doesn't seem to have been resolved. Given that, you'll need a workaround, since NiFi doesn't provide a solution out of the box. You can check my answer to this post as an option: basically, you use Python to read the parquet content and convert it to another format (CSV, for example), then pass the CSV to PutDatabaseRecord. This should work, as I have tested it. Since you appear to be using NiFi 2.0, you can develop a Python extension processor for this instead of the ExecuteStreamCommand approach mentioned in that post.
Hope that helps. If it does, please accept the solution.
Thanks
Created 11-27-2024 09:20 AM
Thanks for the support
Created 11-27-2024 11:12 AM
Sure. If you come up with a solution different from what I suggested, please do post about it so it can help others who might run into a similar situation. Good luck!