Problem when trying to convert parquet file
Labels: Apache NiFi
Created on 11-26-2024 05:17 AM - edited 11-26-2024 05:20 AM
Hello,
I'm trying to read a parquet file using the ConvertRecord processor and I'm getting the error:
ConvertRecord[id=e599cd8f-9a1d-3134-4daf-af7bc91cdd57] Failed to process FlowFile[filename=objectTable_tract_5074_DC2_2_2i_runs_DP0_2_v23_0_1_PREOPS-905_step3_31_20220314T212509Z-part0_output.parquet]; will route to failure: org.apache.avro.SchemaParseException: Illegal initial character: 0
In my file the column names are numeric, and the first one starts with 0 (zero).
Created 11-26-2024 07:11 AM
@alecssander Welcome to the Cloudera Community!
To help you get the best possible solution, I have tagged our NiFi experts @MattWho and @SAMSAL, who may be able to assist you further.
Please keep us updated on your post, and we hope you find a satisfactory solution to your query.
Regards,
Diana Torres, Community Moderator
Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community:
Created 11-26-2024 10:52 AM
Hi,
Can you provide more explanation or a screenshot of your dataflow and the configuration set on each processor and controller service? It would also help if you could provide sample data that can be converted to Parquet and then used to reproduce the error.
Thanks
Created on 11-26-2024 12:36 PM - edited 11-26-2024 12:37 PM
The process is simple: I take a Parquet file from a bucket and try to insert it into a PostgreSQL database.
My file has 301 columns, named 0 to 300, and more than 280 rows.
Created on 11-26-2024 02:03 PM - edited 11-26-2024 02:04 PM
It seems that the Parquet reader and writer services rely on an Avro schema, possibly to make sense of the data when passing it along to the target processors (like PutDatabaseRecord), since Parquet is a binary format. The problem is that Avro restricts how fields can be named. This is actually reported as a bug in Jira, but it doesn't seem to have been resolved. According to the ticket, Avro field names may only start with one of the characters [A-Za-z_].
Given this, it seems you will need a workaround, since NiFi doesn't provide a solution out of the box. You can check my answer to this post as an option. Basically, you can use Python to read the Parquet content and convert it to another format (CSV, for example), then pass the CSV to PutDatabaseRecord. This should work, as I have tested it. Since you seem to be using NiFi 2.0, you can develop a Python extension processor for this instead of the ExecuteStreamCommand mentioned in that post.
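As a rough illustration of the Python route, here is a minimal sketch (not the exact script from the linked post). It assumes the third-party `pyarrow` package is available to the interpreter NiFi calls. The names `avro_safe_name`, `parquet_to_csv` and the `col_` prefix are hypothetical, chosen for this example. Renaming the columns is an extra step beyond the plain CSV conversion described above; it may not be required for the CSV route, but it keeps the names valid if any later step builds an Avro schema.

```python
import re

# Avro names must start with [A-Za-z_] and continue with [A-Za-z0-9_].
_AVRO_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def avro_safe_name(name, prefix="col_"):
    """Return a version of `name` that satisfies Avro's naming rule.

    `prefix` is an arbitrary choice for this sketch. Distinct inputs can
    collapse to the same output (e.g. "a b" and "a_b"), so check for
    duplicates if your column names are not purely numeric.
    """
    # Replace every character Avro does not allow with an underscore.
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", str(name))
    # Names such as "0" still fail on the first character, so prefix them.
    if not _AVRO_NAME.fullmatch(cleaned):
        cleaned = prefix + cleaned
    return cleaned


def parquet_to_csv(parquet_path, csv_path):
    """Read a Parquet file, rename its columns, and write it out as CSV."""
    # Imported here so the renaming helper works without pyarrow installed.
    import pyarrow.csv as pacsv
    import pyarrow.parquet as pq

    table = pq.read_table(parquet_path)
    new_names = [avro_safe_name(c) for c in table.column_names]
    table = table.rename_columns(new_names)
    # write_csv emits a header row by default.
    pacsv.write_csv(table, csv_path)
```

With this sketch, a column named `0` becomes `col_0`, so the target table's column names would have to match the renamed headers (or you could skip the renaming and only do the format conversion).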
Hope that helps. If it does, please accept the solution.
Thanks
Created 11-27-2024 09:20 AM
Thanks for the support
Created 11-27-2024 11:12 AM
Sure. If you come up with a solution different from what I suggested, please do post about it so it can help others who might run into a similar situation. Good luck!
Created 12-09-2024 05:02 PM
@alecssander Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.
Regards,
Diana Torres, Community Moderator
