
Unable to upload JSON file using PutBigQuery

New Contributor

So I'm making a flow that extracts data from Elasticsearch using the SearchElasticsearch processor and dumps it into my table in BigQuery using the PutBigQuery processor.
The data extracted from Elasticsearch is newline-delimited JSON, like this:

{"_index":"twitter","_id":"123", "_source":{"message":"bla bla bla", "type":"tweet"}}
{"_index":"twitter","_id":"124", "_source":{"message":"blalalala", "type":"tweet"}}

Then I do some cleaning to rename some columns, merge all of the hits into a single JSON array, and write it out as pretty-printed JSON like this:

[
  {"_index": "twitter",
   "_id": "123",
   "_source":
      {"tweet": "bla bla bla",
       "type": "tweet"}},
  {"_index": "twitter",
   "_id": "124",
   "_source":
      {"tweet": "blalalala",
       "type": "tweet"}}
]

Then I try to convert the flowfile into JSON with my defined schema, so I used UpdateAttribute to give the flowfile an attribute holding my schema name.

Fahmihamzah84_0-1679135749564.png
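
In text form, that UpdateAttribute step just adds a single attribute carrying the schema name, along these lines (the attribute name follows the usual schema.name convention and the value here is only a placeholder, not necessarily my real schema name):

schema.name = tweets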

Then I used ConvertRecord so that every record uses the same Avro schema (because the data retrieved from Elasticsearch doesn't have a fixed set of columns: some fields appear in one document but not in others). Here's the configuration:

Fahmihamzah84_1-1679135979938.png
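
For reference, the general shape of an Avro schema that tolerates missing fields looks roughly like this, with each optional field declared as a union with null. This is only a sketch using the field names from the example above; my actual schema is attached below:

{
  "type": "record",
  "name": "tweet",
  "fields": [
    {"name": "_index", "type": ["null", "string"], "default": null},
    {"name": "_id", "type": ["null", "string"], "default": null},
    {"name": "_source", "type": ["null", {
      "type": "record",
      "name": "source",
      "fields": [
        {"name": "tweet", "type": ["null", "string"], "default": null},
        {"name": "type", "type": ["null", "string"], "default": null}
      ]
    }], "default": null}
  ]
}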

Next, I used UpdateRecord and applied the escapeXML() function to the "message" field.

The final processor in this flow is PutBigQuery:

Fahmihamzah84_2-1679136655721.png
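
For comparison, the destination table's schema in BigQuery's JSON format would look roughly like this, with every column NULLABLE and _source as a nested RECORD (again just a sketch with placeholder names; the real schema is linked below):

[
  {"name": "_index", "type": "STRING", "mode": "NULLABLE"},
  {"name": "_id", "type": "STRING", "mode": "NULLABLE"},
  {"name": "_source", "type": "RECORD", "mode": "NULLABLE", "fields": [
    {"name": "tweet", "type": "STRING", "mode": "NULLABLE"},
    {"name": "type", "type": "STRING", "mode": "NULLABLE"}
  ]}
]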

And when I run this processor, it raises an error with this message:

Fahmihamzah84_3-1679136956034.png


What do you guys think is incorrect about this entire process?
Here are the schemas and an example of the data:
Avro schema
BigQuery schema
Flowfile example before PutBigQuery

I sincerely appreciate all comments, and if more explanation is required, just leave a comment below.

1 REPLY


@Fahmihamzah84 This appears to be an issue with your schema. The BigQuery error suggests a problem trying to cast a string into a collection (array/list/etc.). It's hard to tell which array is causing the issue, as there are many. My suggestion is to set the processor's log level to DEBUG and see if you can get a more verbose error; that will help you figure out which field or fields are the culprit. Keep in mind it could be one of the empty arrays too.

I do not suggest the following as a solution, just as a path to figuring out where the problem is. Sometimes when I have issues with type casting, I temporarily make everything a string for development. If you do this carefully, one field at a time, then when the error goes away you can determine which field it is. This also helps you identify a working state for your flow, and lets you work from that operational base toward a solution with the schema in the format you need.
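
As a rough illustration of that approach (field names here are placeholders, not taken from your actual schema): suppose the real Avro schema declares a suspect field as an array, e.g.

{"name": "hashtags", "type": ["null", {"type": "array", "items": "string"}], "default": null}

While debugging, you would temporarily swap it to a plain string,

{"name": "hashtags", "type": ["null", "string"], "default": null}

and re-run the flow; when the PutBigQuery error disappears, the last field you changed is the one whose type doesn't line up with the BigQuery table.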