Member since: 11-21-2023
Posts: 4
Kudos received: 0
Solutions: 0
11-30-2023 10:29 AM
Thanks for your answer, SAMSAL. I was hoping to use a processor directly to add my schema, but if that's not possible I'll use a script. As well as renaming several columns, I also need to change the type of some of them: some are of type "large_string" and one is of type "bool". For example, I got this error when I added the schema (retrieved with Python code from my Parquet file) to the ConvertRecord processor:

'schema-text' validated against '{ "type": "record", "name": "de_train", "fields": [ { "name": "cell_type", "type": "string" }, { "name": "sm_name", "type": "string" }, { "name": "sm_lincs_id", "type": "string" }, { "name": "SMILES", "type": "string" }, { "name": "control", "type": "bool" }, { "name": "A1BG", "type": "double" }, { "name": "A1BG_AS1", "type": "double" }, { "name": "A2M", "type": "double" }, { "name": "A2M_AS1", "type": "double" }, { "name": "A2MP1", "type": "double" } ] }' is invalid because Not a valid Avro Schema: "bool" is not a defined name. The type of the "control" field must be a defined name or a {"type": ...} expression.

I had to change "large_string" to "string" and "bool" to "boolean" before the AvroSchemaRegistry stopped reporting errors. So how do I change the types in a Parquet file? Can I do it from the dataframe, the same way as for the names?
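One point worth noting: "large_string" and "bool" are pyarrow's type names, while Avro's primitives are called "string" and "boolean", so the Parquet file's types don't necessarily need changing at all; only the generated schema text does. A minimal sketch of such a script, assuming pandas and pyarrow are installed and using placeholder file names (de_train.parquet / de_train_clean.parquet are assumptions): it renames the columns in the dataframe, writes the file back, then builds Avro-compatible schema text by mapping pyarrow type names to Avro primitives:

import json
import pandas as pd
import pyarrow.parquet as pq

# Hypothetical file names; adjust to the real files.
df = pd.read_parquet("de_train.parquet")

# Avro field names may only contain [A-Za-z0-9_], so replace "-" with "_".
df = df.rename(columns=lambda c: c.replace("-", "_"))
df.to_parquet("de_train_clean.parquet", index=False)

# Map pyarrow type names to valid Avro primitive types.
AVRO_TYPES = {
    "large_string": "string", "string": "string",
    "bool": "boolean", "double": "double",
    "float": "float", "int32": "int", "int64": "long",
}

schema = pq.read_schema("de_train_clean.parquet")
avro_schema = {
    "type": "record",
    "name": "de_train",
    "fields": [{"name": f.name, "type": AVRO_TYPES[str(f.type)]} for f in schema],
}
# Paste the printed JSON into the AvroSchemaRegistry property.
print(json.dumps(avro_schema))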
11-25-2023 09:05 AM
Sorry, I haven't had much time to visit the site in the last few days. 🙂
11-25-2023 09:02 AM
Thanks for the video! 🙂 It solved one of my problems: since I have a list of items to insert, I need to use PutDynamoDBRecord rather than PutDynamoDB. With that, I can insert data after converting one of my Parquet files. But I still have a problem with another file. Here's the error:

UTC ERROR ConvertRecord[id=92018f18-018b-1000-fd6f-0a3466abe069] Failed to process FlowFile[filename=mini_de_train.parquet]; will route to failure: org.apache.avro.SchemaParseException: Illegal character in: A1BG-AS1

Some characters are not accepted (like the "-" in "A1BG-AS1"), so I changed them all in the schema, the beginning of which is shown in the screenshot below (there are more than 18,000 columns). I then tried to add it via an UpdateAttribute processor before the ConvertRecord, where I set the name of the schema (de_train), together with an AvroSchemaRegistry used by my JsonRecordSetWriter, which references that schema. But after these modifications I still get the same error. What am I doing wrong?
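One possible cause (an assumption, not confirmed in the thread): the ParquetReader builds its Avro schema from the column names embedded in the Parquet file itself, so fixing the names only in the registry schema isn't enough; the hyphens also have to be removed from the file. A minimal sketch with pyarrow, using placeholder file names:

import pyarrow.parquet as pq

table = pq.read_table("mini_de_train.parquet")
# Replace the illegal "-" in column names with "_" so the schema
# embedded in the Parquet file converts to a valid Avro schema.
table = table.rename_columns([c.replace("-", "_") for c in table.column_names])
pq.write_table(table, "mini_de_train_fixed.parquet")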
11-21-2023 07:10 AM
Hello, I want to load data into DynamoDB from Parquet files using NiFi (which I run in a Docker container). I fetch my files from AWS S3 with the ListS3 and FetchS3Object processors and then, as I understand it, convert the files to JSON with ConvertRecord and send the data with PutDynamoDB. I've tried configuring the AvroSchemaRegistry, ParquetReader and JsonRecordSetWriter controller services, but I'm obviously doing it wrong... I've also tried an UpdateAttribute processor, but nothing works. I don't really understand whether I have to add the schema, and where to add it. Thanks to anyone who can help me!
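For reference, one common way to wire this up (a sketch, not necessarily the poster's setup; the schema name de_train is an assumption):

UpdateAttribute (before ConvertRecord): add a property schema.name = de_train
AvroSchemaRegistry: add a dynamic property named de_train whose value is the Avro schema text
ParquetReader: needs no schema of its own; it reads the schema embedded in the Parquet file
JsonRecordSetWriter: Schema Access Strategy = Use 'Schema Name' Property, Schema Registry = the AvroSchemaRegistry above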
Labels:
- Apache NiFi