
Kite dataset for nested JSON document

Expert Contributor

I have a nested JSON document of the following form that I would like to transform and store in a dataset created by Kite:

{
 "uid": 29153333,
 "somefield": "somevalue",
 "options": [
   {
     "item1_lvl2": "a",
     "item2_lvl2": [
       {
         "item1_lvl3": "x1",
         "item2_lvl3": "y1"
       },
       {
         "item1_lvl3": "x2",
         "item2_lvl3": "y2"
       }
     ]
   }
 ]
}

How does one go about storing and querying these types of documents?

I'm planning to ingest using Flume's Kite Dataset sink and will rely on the extractJsonPaths and toAvro morphline commands to transform the JSON documents. Is creating a dataset based on an Avro schema that uses complex types supported?

Thanks!
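
For reference, an Avro schema using complex types (nested records and arrays) that matches the document above might look like the following; the record names Event, Option, and Item are just placeholders:

{
  "type": "record",
  "name": "Event",
  "fields": [
    {"name": "uid", "type": "long"},
    {"name": "somefield", "type": "string"},
    {"name": "options", "type": {
      "type": "array",
      "items": {
        "type": "record",
        "name": "Option",
        "fields": [
          {"name": "item1_lvl2", "type": "string"},
          {"name": "item2_lvl2", "type": {
            "type": "array",
            "items": {
              "type": "record",
              "name": "Item",
              "fields": [
                {"name": "item1_lvl3", "type": "string"},
                {"name": "item2_lvl3", "type": "string"}
              ]
            }
          }}
        ]
      }
    }}
  ]
}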

6 REPLIES

Re: Kite dataset for nested JSON document

Explorer
buntu: did you ever come up with a solution for this?

Re: Kite dataset for nested JSON document

Expert Contributor

Currently I'm relying on the Kite CLI to generate the Avro schema and then doing a JSON import, providing the JSON file along with the generated schema:

   http://kitesdk.org/docs/current/cli-reference.html#json-schema

   http://kitesdk.org/docs/current/cli-reference.html#json-import
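
For example, the workflow looks roughly like this (the dataset URI and file names here are just placeholders; see the CLI reference above for the exact options):

# infer an Avro schema from a sample JSON file
kite-dataset json-schema sample.json -o event.avsc

# create the dataset from that schema, then import the JSON
kite-dataset create dataset:hdfs:/data/events --schema event.avsc
kite-dataset json-import sample.json dataset:hdfs:/data/events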

 

Let me know if there are any other ways to handle the ingestion and/or querying of the dataset.

 

 

Re: Kite dataset for nested JSON document

Explorer

Specifically, we have JSON with nested records (and thus Avro schemas that reflect that nesting), and we can't figure out how to use readJson + extractJsonPaths + toAvro + writeAvroToByteArray to process this data, because toAvro appears to NOT support nested records.

 

Re: Kite dataset for nested JSON document

Expert Contributor

Yes, it doesn't seem to support nested JSON. So the ingestion process writes the raw JSON records to HDFS, and then I schedule a periodic job to import the files into the Kite dataset.

 

A few other options:

- Read the JSON using Apache Spark, write it out as Parquet, and operate on that data (see the sketch after this list)

- Apache NiFi is another option that was suggested, but I haven't had a chance to play around with it
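
A minimal sketch of the Spark option, assuming Spark 2.x or later is available; the HDFS paths and class name are just placeholders:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class JsonToParquet {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("json-to-parquet").getOrCreate();
    // Spark infers a nested schema (structs and arrays) from the JSON documents
    Dataset<Row> events = spark.read().json("hdfs:///data/raw/events");
    // Parquet preserves the nested structure and can then be queried with Spark SQL, Hive, or Impala
    events.write().parquet("hdfs:///data/parquet/events");
    spark.stop();
  }
}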

Re: Kite dataset for nested JSON document

Explorer
Thanks again for the responses.

Re: Kite dataset for nested JSON document

New Contributor

This is my solution. I hope it will be helpful.

morphlines: [
  {
    id: convertJsonToAvro
    importCommands: [ "org.kitesdk.**" ]
    commands: [
      # read the JSON blob into a Jackson JsonNode on the record's attachment body
      { readJson: {} }

      # java command: parse the JSON string into a Map and copy its top-level
      # entries onto the morphline record, so that toAvro can map them to the
      # schema (nested values come through as Maps and Lists)
      {
        java {
          imports : """
            import com.fasterxml.jackson.databind.ObjectMapper;
            import org.kitesdk.morphline.base.Fields;
            import java.io.IOException;
            import java.util.Map;
          """

          code : """
            String jsonStr = record.getFirstValue(Fields.ATTACHMENT_BODY).toString();
            ObjectMapper mapper = new ObjectMapper();
            Map<String, Object> map;
            try {
              map = (Map<String, Object>) mapper.readValue(jsonStr, Map.class);
            } catch (IOException e) {
              e.printStackTrace();
              return false; // signal failure instead of hitting an NPE below
            }
            for (String key : map.keySet()) {
              record.put(key, map.get(key));
            }
            return child.process(record);
          """
        }
      }

      # convert the extracted fields to an Avro object
      # described by the schema in this file
      { toAvro {
        schemaFile: /etc/flume/conf/a1/like_user_event_realtime.avsc
      } }

      # { logInfo { format : "loginfo: {}", args : ["@{}"] } }

      # serialize the object as Avro
      { writeAvroToByteArray: {
        format: containerlessBinary
      } }
    ]
  }
]