<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Need Help inferring an avro schema for a json file in NiFi in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Need-Help-infering-an-avro-schema-for-a-json-file-in-NiFi/m-p/179975#M61486</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I am trying to create a flow in NiFi that takes a valid JSON file and writes it directly into a Hive table using the PutHiveStreaming processor. My JSON looks something like the following:&lt;/P&gt;&lt;PRE&gt;{
 "Raw_Json": {
  "SystemInfo": {
   "Id": "a string ID",
   "TM": null,
   "CountID": "a string ID",
   "Topic": null,
   "AccountID": "some number",
   "StationID": "some number",
   "STime": "some Timestamp",
   "ETime": "some Timestamp"
  },
  "Profile": {
   "ID": "ID number",
   "ProductID": "Some Number",
   "City": "City Name",
   "State": "State Name",
   "Number": "XXX-XXX-XXXX",
   "ExtNumber": null,
   "Unit": null,
   "Name": "Person Name",
   "Service": "Purchase",
   "AddrID": "00000000",
   "Products": {
    "Product": [{
     "Code": "CODE",
     "Description": "some description"
     
    },
    {
     "Code": "CODE",
     "Description": "some description"
     
    },
    {
     "Code": "CODE",
     "Description": "some description"
     
    },
    {
     "Code": "CODE",
     "Description": "some description"
    
    },
    {
     "Code": "CODE",
     "Description": "some description"
     
    },
    {
     "Code": "CODE",
     "Description": "some description"
     
    },
    {
     "Code": "CODE",
     "Description": "some description"
     
    },
    {
     "Code": "CODE",
     "Description": "some description"
     
    },
    {
     "Code": "CODE",
     "Description": "some description"
     
    },
    {
     "Code": "CODE",
     "Description": "some description"
    }]
   }
  },
  "Total": {
   "Amount": "some amount",
   "Delivery": "some address",
   "Estimate": "some amount",
   "Tax": null,
   "Delivery_Type": null
   
  }
  
 },
 "partition_date":"2017-05-19"
}
&lt;/PRE&gt;&lt;P&gt;I fetch the JSON, run it through the InferAvroSchema processor, convert the JSON to Avro using the inferred schema, and send the result to the PutHiveStreaming processor. My flow looks something like this:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="15577-flowexample.jpg" style="width: 1032px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/19341iA1C3EBC758CE51C2/image-size/medium?v=v2&amp;amp;px=400" role="button" title="15577-flowexample.jpg" alt="15577-flowexample.jpg" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;My goal is to store the entire "Raw_Json" object in a single column of the Hive table, with the table partitioned by "partition_date", which will be the second column of the table. The problem is that NiFi has trouble inferring the nested JSON under "Raw_Json" and writes it to the table as null, as shown below:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="15578-tableexample.jpg" style="width: 925px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/19342iCE322F2EE7F60F23/image-size/medium?v=v2&amp;amp;px=400" role="button" title="15578-tableexample.jpg" alt="15578-tableexample.jpg" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Does anyone know how I could make NiFi read the entire nested JSON under "Raw_Json" as a single string column and send it to the Hive table? How could I write my own Avro schema to do this? Any insight or ideas on how to fix this issue would be greatly appreciated!&lt;/P&gt;</description>
    <pubDate>Sun, 18 Aug 2019 09:15:43 GMT</pubDate>
    <dc:creator>Adda_Fuentes2</dc:creator>
    <dc:date>2019-08-18T09:15:43Z</dc:date>
    <item>
      <title>Need Help inferring an avro schema for a json file in NiFi</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Need-Help-infering-an-avro-schema-for-a-json-file-in-NiFi/m-p/179975#M61486</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I am trying to create a flow in NiFi that takes a valid JSON file and writes it directly into a Hive table using the PutHiveStreaming processor. My JSON looks something like the following:&lt;/P&gt;&lt;PRE&gt;{
 "Raw_Json": {
  "SystemInfo": {
   "Id": "a string ID",
   "TM": null,
   "CountID": "a string ID",
   "Topic": null,
   "AccountID": "some number",
   "StationID": "some number",
   "STime": "some Timestamp",
   "ETime": "some Timestamp"
  },
  "Profile": {
   "ID": "ID number",
   "ProductID": "Some Number",
   "City": "City Name",
   "State": "State Name",
   "Number": "XXX-XXX-XXXX",
   "ExtNumber": null,
   "Unit": null,
   "Name": "Person Name",
   "Service": "Purchase",
   "AddrID": "00000000",
   "Products": {
    "Product": [{
     "Code": "CODE",
     "Description": "some description"
     
    },
    {
     "Code": "CODE",
     "Description": "some description"
     
    },
    {
     "Code": "CODE",
     "Description": "some description"
     
    },
    {
     "Code": "CODE",
     "Description": "some description"
    
    },
    {
     "Code": "CODE",
     "Description": "some description"
     
    },
    {
     "Code": "CODE",
     "Description": "some description"
     
    },
    {
     "Code": "CODE",
     "Description": "some description"
     
    },
    {
     "Code": "CODE",
     "Description": "some description"
     
    },
    {
     "Code": "CODE",
     "Description": "some description"
     
    },
    {
     "Code": "CODE",
     "Description": "some description"
    }]
   }
  },
  "Total": {
   "Amount": "some amount",
   "Delivery": "some address",
   "Estimate": "some amount",
   "Tax": null,
   "Delivery_Type": null
   
  }
  
 },
 "partition_date":"2017-05-19"
}
&lt;/PRE&gt;&lt;P&gt;I fetch the JSON, run it through the InferAvroSchema processor, convert the JSON to Avro using the inferred schema, and send the result to the PutHiveStreaming processor. My flow looks something like this:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="15577-flowexample.jpg" style="width: 1032px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/19341iA1C3EBC758CE51C2/image-size/medium?v=v2&amp;amp;px=400" role="button" title="15577-flowexample.jpg" alt="15577-flowexample.jpg" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;My goal is to store the entire "Raw_Json" object in a single column of the Hive table, with the table partitioned by "partition_date", which will be the second column of the table. The problem is that NiFi has trouble inferring the nested JSON under "Raw_Json" and writes it to the table as null, as shown below:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="15578-tableexample.jpg" style="width: 925px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/19342iCE322F2EE7F60F23/image-size/medium?v=v2&amp;amp;px=400" role="button" title="15578-tableexample.jpg" alt="15578-tableexample.jpg" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Does anyone know how I could make NiFi read the entire nested JSON under "Raw_Json" as a single string column and send it to the Hive table? How could I write my own Avro schema to do this? Any insight or ideas on how to fix this issue would be greatly appreciated!&lt;/P&gt;</description>
      <pubDate>Sun, 18 Aug 2019 09:15:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Need-Help-infering-an-avro-schema-for-a-json-file-in-NiFi/m-p/179975#M61486</guid>
      <dc:creator>Adda_Fuentes2</dc:creator>
      <dc:date>2019-08-18T09:15:43Z</dc:date>
    </item>
    <item>
      <title>Re: Need Help inferring an avro schema for a json file in NiFi</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Need-Help-infering-an-avro-schema-for-a-json-file-in-NiFi/m-p/179976#M61487</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/13399/addafuentes2.html" nodeid="13399"&gt;@Adda Fuentes&lt;/A&gt;&lt;P&gt;When you infer the schema, do you store it in the flowfile content (the default) or write it to the "inferred.avro.schema" attribute? Can you try writing the inferred schema to the attribute? Also, set the input content type explicitly to JSON if it is not already.&lt;/P&gt;</description>
      <pubDate>Tue, 23 May 2017 03:02:52 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Need-Help-infering-an-avro-schema-for-a-json-file-in-NiFi/m-p/179976#M61487</guid>
      <dc:creator>mqureshi</dc:creator>
      <dc:date>2017-05-23T03:02:52Z</dc:date>
    </item>
    <item>
      <title>Re: Need Help inferring an avro schema for a json file in NiFi</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Need-Help-infering-an-avro-schema-for-a-json-file-in-NiFi/m-p/179977#M61488</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/10969/mqureshi.html" nodeid="10969"&gt;@mqureshi&lt;/A&gt; I was sending "inferred.avro.schema" as an attribute, and the input content type was set to JSON.&lt;/P&gt;</description>
      <pubDate>Tue, 23 May 2017 03:15:21 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Need-Help-infering-an-avro-schema-for-a-json-file-in-NiFi/m-p/179977#M61488</guid>
      <dc:creator>Adda_Fuentes2</dc:creator>
      <dc:date>2017-05-23T03:15:21Z</dc:date>
    </item>
    <item>
      <title>Re: Need Help inferring an avro schema for a json file in NiFi</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Need-Help-infering-an-avro-schema-for-a-json-file-in-NiFi/m-p/179978#M61489</link>
      <description>&lt;P&gt;I was able to figure it out. I used the EvaluateJsonPath processor to extract the 'Raw_Json' and 'partition_date' fields into attributes, then used the AttributesToJSON processor to turn those two attributes into a new JSON document. After that, the InferAvroSchema processor was able to infer the 'Raw_Json' column as a string, and the flow now writes it to the Hive table correctly via PutHiveStreaming.&lt;/P&gt;</description>
      <pubDate>Tue, 23 May 2017 03:19:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Need-Help-infering-an-avro-schema-for-a-json-file-in-NiFi/m-p/179978#M61489</guid>
      <dc:creator>Adda_Fuentes2</dc:creator>
      <dc:date>2017-05-23T03:19:31Z</dc:date>
    </item>
  </channel>
</rss>

