Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Format JSON file using schema

Solved Go to solution

Format JSON file using schema

New Contributor

Hello,


Receiving from an embedded micro over Ethernet a JSON file of the form:


{

"id" : 123456,

"ip" : "*",

"t" : -12.9,

"T" : -23.8,

"variables": [

"user1",

0,

12.97,

23

]

}


and knowing what each variable represents in the "variables" array as:


"username" : "user1",

"active" : 0,

"temperature" : 12.97,

"age" : 23


How can I update the JSON file with the key name of the variables such that the end result is:


{

"id" : 123456,

"ip" : "*",

"t" : -12.9,

"T" : -23.8,

"variables": [

"username" : "user1",

"active" : 0,

"temperature" : 12.97,

"age" : 23

]

}


and then be able to access these variables with their name?


Thank you.

1 ACCEPTED SOLUTION

Accepted Solutions

Re: Format JSON file using schema

Since you want to change the array of values into key/value pairs, you'll need to put them in an object inside the variables array, so I'm guessing you want a single-element array "variables" containing an object that has the key value pairs. If that's correct, you can use JoltTransformJSON with the following spec, it adds keys for each value in the array based on its order:

[
  {
    "operation": "shift",
    "spec": {
      "variables": {
        "0": "variables[0].username",
        "1": "variables[0].active",
        "2": "variables[0].temperature",
        "3": "variables[0].age"
      },
      "*": "&"
    }
  }
]


This gave me the following output:

{
  "id" : 123456,
  "ip" : "*",
  "t" : -12.9,
  "T" : -23.8,
  "variables" : [ {
    "username" : "user1",
    "active" : 0,
    "temperature" : 12.97,
    "age" : 23
  } ]
}
6 REPLIES 6

Re: Format JSON file using schema

Since you want to change the array of values into key/value pairs, you'll need to put them in an object inside the variables array, so I'm guessing you want a single-element array "variables" containing an object that has the key value pairs. If that's correct, you can use JoltTransformJSON with the following spec, it adds keys for each value in the array based on its order:

[
  {
    "operation": "shift",
    "spec": {
      "variables": {
        "0": "variables[0].username",
        "1": "variables[0].active",
        "2": "variables[0].temperature",
        "3": "variables[0].age"
      },
      "*": "&"
    }
  }
]


This gave me the following output:

{
  "id" : 123456,
  "ip" : "*",
  "t" : -12.9,
  "T" : -23.8,
  "variables" : [ {
    "username" : "user1",
    "active" : 0,
    "temperature" : 12.97,
    "age" : 23
  } ]
}

Re: Format JSON file using schema

New Contributor

Yes, excellent! Thank you @Matt Burgess!!!

On this topic, I tried reading the jolt spec from a file using the GetFile processor and then extracting the text (jolt spec) using ExtractText and assigning it to a new property such that I can pass the spec to the JoltTransformJSON processor but it doesn't work. How can this be done? That is, how can the jolt spec be read from a file instead of configuring the Jolt processor?

Thanking you in advance!

Re: Format JSON file using schema

This is an open issue (NIFI-4957), I pinged the original author/contributor to get the status of it, if he is not actively working it I can take a look at adding the capability.

Re: Format JSON file using schema

New Contributor

That's great Matt.
Looks like it will become a reality. Thank you!

Re: Format JSON file using schema

New Contributor

@Matt Burgess I have noticed that there hasn't been progress on this open issue (NIFI-4957). Nevertheless I tried placing the Jolt spec in a processor group variable (e.g. jspec) and then using this variable in the JoltTransformJSON processor ( Jolt Specification = ${jspec} ) and it transformed the JSON FlowFile successfully. Hence an intermediate solution (until NIFI-4957 is resolved) could be to update the processor group variable at the cost of resetting NiFi.
I will also investigate the process:
1.- Get Jolt Spec from file ( periodically with a CRON )
2.- Put FlowFile ( Jolt Spec ) in Distributed Map Cache with e.g. Cache Entry Identifier = jspec
3.- Fetch Distributed Map Cache (as and when required to transform the JSON FlowFile) and put cache value in attribute jspec
4.- Assign jspec to Jolt Specification of JoltTransformJSON processor

Re: Format JSON file using schema

New Contributor

In continuation to the above using Distributed Map Cache to hold a Jolt Spec read from a file, I can confirm that it works.

Flow A: GetFile (periodically scheduled by CRON) --> PutDistributedMapCache --> LogAttribute:

Jolt spec file contents ( jspec.json ) on disk:

[{"operation":"shift","spec":{"variables":{"0":"variables[0].username","1":"variables[0].active","2":"variables[0].temperature","3":"variables[0].age"},"*":"&"}}]

Note: Jolt spec in compact form!!

GetFile processor:

File Filter <-- jspec.json


PutDistributedMapCache processor:

Cache Entry Identifier <-- joltspec


Flow B: GenerateFlowFile --> FetchDistributedMapCache --> JoltTransformJSON


GenerateFlowFile processor:

Custom Text <---

{

"id" : 123456,

"ip" : "*",

"t" : -12.9,

"T" : -23.8,

"variables": [

"user1",

0,

12.97,

23

]

}

FetchDistributedMapCache processor:

Cache Entry Identifier <-- joltspec
Put Cache Value in Attribute <-- jspec

JoltTransformJSON processor:

Jolt Transformation DSL <-- Chain
Jolt Specification <-- ${jspec}

109486-capture-001-21062019-074630.jpg

Don't have an account?
Coming from Hortonworks? Activate your account here