Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

How to repeat a json to maintain uniqueness for PutHbaseJson processor

Solved Go to solution
Highlighted

How to repeat a json to maintain uniqueness for PutHbaseJson processor

I have flattened a json using jolt since the input was json array it has resulted in list for the keys. But to use PutHBaseJson processor I need scalar values to define rowkey. Is there a way to repeat the whole json again and keep one value at a time? So that the uniqueness is maintained.

Below are my input, transformation and output

Input

{
  "resource": {
    "id": "1234",
    "name": "Resourse_Name"
  },
  "data": [
    {
      "measurement": {
        "key": "5678",
        "value": "status"
      },
      "timestamp": 1517784040000,
      "value": 1
    },
    {
      "measurement": {
        "key": "91011",
        "value": "location"
      },
      "timestamp": 1519984070000,
      "value": 0
    }
  ]
}



Transformation

[
  {
    "operation": "shift",
    "spec": {
      "resource": {
        "id": "resource_id",
        "name": "resource_name"
      },
      //
      // Turn all the SecondaryRatings into prefixed data
      // like "rating-Design" : 4
      "data": {
        "*": {
          // the "&" in "rating-&" means go up the tree 0 levels,
          // grab what is ther and subtitute it in
          "measurement": {
            "*": "measurement_&"
          },
          "timestamp": "measurement_timestamp",
          "value": "value"
        }
      }
    }
  }
]

Output

{
  "resource_id" : "1234",
  "resource_name" : "Resourse_Name",
  "measurement_key" : [ "5678", "91011" ],
  "measurement_value" : [ "status", "location" ],
  "measurement_timestamp" : [ 1517784040000, 1519984070000 ],
  "value" : [ 1, 0 ]
}


Expected output

{
  "resource_id" : "1234",
  "resource_name" : "Resourse_Name",
  "measurement_key" : "5678",
  "measurement_value" : "status",
  "measurement_timestamp" : 1517784040000, ,
  "value" :  1
},
{
  "resource_id" : "1234",
  "resource_name" : "Resourse_Name",
  "measurement_key" :  "91011" ,
  "measurement_value" :  "location" ,
  "measurement_timestamp" :  1519984070000 ,
  "value" :  0 
}




1 ACCEPTED SOLUTION

Accepted Solutions
Highlighted

Re: How to repeat a json to maintain uniqueness for PutHbaseJson processor

Super Guru
@vivek jain

Without using JOLT tranform you can achieve the same expected output with Evaluate and Split Json processors.

Example:-

72741-flow.png

i have used your input json in generate flowfile processor to test out this flow.

EvaluateJsonPath Configs:-

Change the below property value
Destination

flowfile-attribute

Add new properties to the processor
resource_id

$.resource.id

resource_name

$.resource.name

now we are extracting id,name values and assigning to resource_id,resource_name attributes.

72743-evaljson1.png

SplitJson Configs:-

As you are having data array split the array using split json processor.

Configure the processor as below

JsonPath Expression

$.data

EvaluateJsonPath Configs:-

Once we have splitted the data array we are going to have 2 flowfiles having same resource_id,resource_name attributes.

Change the below property value
Destination

flowfile-attribute
Add new propety as

measurement_key

$.measurement.key

measurement_timestamp

$.timestamp

measurement_value

$.measurement.value

value

$.value

Now we are going to extract all the values and keep them as attributes to the flowfile.

72742-evaljson.png

AttributesToJSON processor:-

Use this processor to prepare the required message

Attributes List

resource_id,resource_name,measurement_key,measurement_value,measurement_timestamp,value

Now we are going to have 2 flowfiles then you can feed those flowfile to PutHBaseJson processor because puthbasejson processor expects one json message at a time, use some unique field(combination of attributes (or) ${UUID()}..etc) as rowkey value so that You are not going to overwrite the existing data in HBase.
if you want to merge them into 1 then use merge content processor with defragment as merge strategy and prepare the a valid json array of two messages in it so that you can use PutHBaseRecord processor to process chunks of messages at a time.

I have attached my sample flow.xml save/upload and make changes as per your needs

flatten-json-191073.xml

-

If the Answer addressed your question, Click on Accept button below to accept the answer, That would be great help to Community users to find solution quickly for these kind of issues.

View solution in original post

2 REPLIES 2
Highlighted

Re: How to repeat a json to maintain uniqueness for PutHbaseJson processor

Super Guru
@vivek jain

Without using JOLT tranform you can achieve the same expected output with Evaluate and Split Json processors.

Example:-

72741-flow.png

i have used your input json in generate flowfile processor to test out this flow.

EvaluateJsonPath Configs:-

Change the below property value
Destination

flowfile-attribute

Add new properties to the processor
resource_id

$.resource.id

resource_name

$.resource.name

now we are extracting id,name values and assigning to resource_id,resource_name attributes.

72743-evaljson1.png

SplitJson Configs:-

As you are having data array split the array using split json processor.

Configure the processor as below

JsonPath Expression

$.data

EvaluateJsonPath Configs:-

Once we have splitted the data array we are going to have 2 flowfiles having same resource_id,resource_name attributes.

Change the below property value
Destination

flowfile-attribute
Add new propety as

measurement_key

$.measurement.key

measurement_timestamp

$.timestamp

measurement_value

$.measurement.value

value

$.value

Now we are going to extract all the values and keep them as attributes to the flowfile.

72742-evaljson.png

AttributesToJSON processor:-

Use this processor to prepare the required message

Attributes List

resource_id,resource_name,measurement_key,measurement_value,measurement_timestamp,value

Now we are going to have 2 flowfiles then you can feed those flowfile to PutHBaseJson processor because puthbasejson processor expects one json message at a time, use some unique field(combination of attributes (or) ${UUID()}..etc) as rowkey value so that You are not going to overwrite the existing data in HBase.
if you want to merge them into 1 then use merge content processor with defragment as merge strategy and prepare the a valid json array of two messages in it so that you can use PutHBaseRecord processor to process chunks of messages at a time.

I have attached my sample flow.xml save/upload and make changes as per your needs

flatten-json-191073.xml

-

If the Answer addressed your question, Click on Accept button below to accept the answer, That would be great help to Community users to find solution quickly for these kind of issues.

View solution in original post

Highlighted

Re: How to repeat a json to maintain uniqueness for PutHbaseJson processor

@Shu you are awesome. Thanks for the help. It does resolve my problem. I wanted to make rowkey using the combination of json attributes, in order to achieve I used updateAttribute processor and declared the key there using the combination of resource id, metric id and timestamp. Then included this rowkey in AttributesToJson.

Don't have an account?
Coming from Hortonworks? Activate your account here