<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Unable to upload JSON file using PutBigQuery in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Unable-to-upload-JSON-file-using-PutBigQuery/m-p/366509#M239596</link>
    <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/104090"&gt;@Fahmihamzah84&lt;/a&gt;&amp;nbsp;This appears to be an issue with your schema. The BigQuery error suggests a problem casting a string into a collection (array, list, etc.). It's hard to tell which array is causing the issue, as there are many. My suggestion is to set the processor's log level to DEBUG and see if you can get a more verbose error. This will help you figure out which field or fields are the culprit. Keep in mind it could be one of the empty arrays too. I do not suggest the following as a solution, just as a path to figuring out where the problem is. Sometimes when I have issues with type casting, I temporarily make everything a string during development. If you do this carefully, one field at a time, then when the error goes away you can determine which field it is. This also gives you a working state for your flow, and you can work from that operational base toward a solution where the end schema is in the format you need.&lt;/P&gt;</description>
    <pubDate>Mon, 20 Mar 2023 12:48:31 GMT</pubDate>
    <dc:creator>steven-matison</dc:creator>
    <dc:date>2023-03-20T12:48:31Z</dc:date>
    <item>
      <title>Unable to upload JSON file using PutBigQuery</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Unable-to-upload-JSON-file-using-PutBigQuery/m-p/366406#M239571</link>
      <description>&lt;P&gt;I'm building a flow that extracts data from Elasticsearch with the SearchElasticsearch processor and loads it into a BigQuery table with the PutBigQuery processor.&lt;BR /&gt;The data extracted from Elasticsearch is newline-delimited JSON, like this:&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;{"_index":"twitter","_id":"123", "_source":{"message":"bla bla bla", "type":"tweet"}}
{"_index":"twitter","_id":"124", "_source":{"message":"blalalala", "type":"tweet"}}&lt;/LI-CODE&gt;
&lt;P&gt;Then I do some cleanup: I rename some fields, combine all the hits into a single JSON array, and write it out as pretty-printed JSON, like this:&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;[
  {
    "_index": "twitter",
    "_id": "123",
    "_source": {
      "tweet": "bla bla bla",
      "type": "tweet"
    }
  },
  {
    "_index": "twitter",
    "_id": "124",
    "_source": {
      "tweet": "blalalala",
      "type": "tweet"
    }
  }
]&lt;/LI-CODE&gt;
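&lt;P&gt;As a rough illustration of the cleanup step described above, here is a minimal Python sketch that parses newline-delimited hits, renames "message" to "tweet" inside "_source", and prints one pretty JSON array. It is not the NiFi processor chain used in this flow; the function name hits_to_array and the sample input are assumptions made for the example.&lt;/P&gt;

```python
import json

# Hypothetical sample input: one Elasticsearch hit per line (newline-delimited JSON).
NDJSON = "\n".join([
    '{"_index":"twitter","_id":"123","_source":{"message":"bla bla bla","type":"tweet"}}',
    '{"_index":"twitter","_id":"124","_source":{"message":"blalalala","type":"tweet"}}',
])


def hits_to_array(ndjson_text, renames):
    """Parse newline-delimited JSON and rename keys inside each hit's _source."""
    hits = []
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue  # skip blank lines between hits
        hit = json.loads(line)
        source = hit.get("_source", {})
        # Rebuild _source, renaming keys listed in `renames` and keeping the rest.
        hit["_source"] = {renames.get(key, key): value for key, value in source.items()}
        hits.append(hit)
    return hits


hits = hits_to_array(NDJSON, {"message": "tweet"})
print(json.dumps(hits, indent=2))
```

&lt;P&gt;Because every line goes through json.loads, a malformed hit (for example an unquoted key) fails at this step instead of further down the flow.&lt;/P&gt;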
&lt;P&gt;Next, I want to convert the flowfile to JSON that follows my defined schema, so I used UpdateAttribute to add my schema name to the flowfile as an attribute.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Fahmihamzah84_0-1679135749564.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/37018i327029303A224907/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Fahmihamzah84_0-1679135749564.png" alt="Fahmihamzah84_0-1679135749564.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;Then I used ConvertRecord so that every record uses the same Avro schema (&lt;SPAN class="Y2IQFc"&gt;the data retrieved from Elasticsearch has varying columns: some fields are present in some records and absent from others&lt;/SPAN&gt;). Here's the configuration:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Fahmihamzah84_1-1679135979938.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/37019i84A228DFBBCE37FE/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Fahmihamzah84_1-1679135979938.png" alt="Fahmihamzah84_1-1679135979938.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;Next, I used UpdateRecord and applied the escapeXML() function to the "message" field.&lt;/P&gt;
&lt;P&gt;The final processor in this flow is PutBigQuery:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Fahmihamzah84_2-1679136655721.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/37020i07565A3813D6A723/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Fahmihamzah84_2-1679136655721.png" alt="Fahmihamzah84_2-1679136655721.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;When I run this processor, it raises an error with this message:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Fahmihamzah84_3-1679136956034.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/37021i58C08E76B6F2879D/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Fahmihamzah84_3-1679136956034.png" alt="Fahmihamzah84_3-1679136956034.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;What do you guys think is incorrect about this entire process?&lt;BR /&gt;Here are the schemas and some example data:&lt;BR /&gt;&lt;A href="https://cl1p.net/myschema" target="_blank" rel="noopener"&gt;avro schema&lt;/A&gt;&lt;BR /&gt;&lt;A href="https://cl1p.net/myschema" target="_blank" rel="noopener"&gt;BigQuery schema&lt;/A&gt;&lt;BR /&gt;&lt;A href="https://cl1p.net/example_datas" target="_blank" rel="noopener"&gt;Example flowfile before PutBigQuery&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;I sincerely appreciate all comments. If more explanation is required, just leave a comment below.&lt;/P&gt;</description>
      <pubDate>Mon, 20 Mar 2023 18:58:16 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Unable-to-upload-JSON-file-using-PutBigQuery/m-p/366406#M239571</guid>
      <dc:creator>Fahmihamzah84</dc:creator>
      <dc:date>2023-03-20T18:58:16Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to upload JSON file using PutBigQuery</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Unable-to-upload-JSON-file-using-PutBigQuery/m-p/366509#M239596</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/104090"&gt;@Fahmihamzah84&lt;/a&gt;&amp;nbsp;This appears to be an issue with your schema. The BigQuery error suggests a problem casting a string into a collection (array, list, etc.). It's hard to tell which array is causing the issue, as there are many. My suggestion is to set the processor's log level to DEBUG and see if you can get a more verbose error. This will help you figure out which field or fields are the culprit. Keep in mind it could be one of the empty arrays too. I do not suggest the following as a solution, just as a path to figuring out where the problem is. Sometimes when I have issues with type casting, I temporarily make everything a string during development. If you do this carefully, one field at a time, then when the error goes away you can determine which field it is. This also gives you a working state for your flow, and you can work from that operational base toward a solution where the end schema is in the format you need.&lt;/P&gt;</description>
      <pubDate>Mon, 20 Mar 2023 12:48:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Unable-to-upload-JSON-file-using-PutBigQuery/m-p/366509#M239596</guid>
      <dc:creator>steven-matison</dc:creator>
      <dc:date>2023-03-20T12:48:31Z</dc:date>
    </item>
  </channel>
</rss>

