Support Questions

Find answers, ask questions, and share your expertise
Announcements
Check out our newest addition to the community, the Cloudera Data Analytics (CDA) group hub.

Convert json to multiple insert sql format

Hi All,

 

Greetings!

 

Assume that I have a data flow, and the final result is inserting to the SQL database (Microsoft), but the step before! I have the following JSON format for each flow file content as following:

 

 

{
  "sbrActivityType" : {
    "1" : "ACT+A",
    "3" : "ACT+A",
    "2" : "ACT+X",
    "5" : "ACT+X",
    "4" : "ACT+X"
  },
  "sbrDateTime" : "Sun Mar 19 13:45:15 AST 2023"
}

 

Then, my desired output format look like the following:

INSERT INTO sbrTestNifiData (sbrActivityType, sbrDateTime) VALUES (?, ?)

 

So, it does not insert correctly to my database table which is looks like the following:

 

CREATE TABLE [dbo].[sbrTestNifiData](
	[sbrId] [int] IDENTITY(1,1) NOT NULL primary key,
	[sbrActivityType] [varchar](50) NULL,
	[sbrDateTime] [varchar](50) NULL
);

 

 

So, it should inserted to my database table as a five records:

sbrIdsbrActivityTypesbrDateTime
1ACT+ASun Mar 19 11:56:40 AST 2023
2ACT+ASun Mar 19 11:56:40 AST 2023
3ACT+XSun Mar 19 11:56:40 AST 2023
4ACT+XSun Mar 19 11:56:40 AST 2023
5ACT+XSun Mar 19 11:56:40 AST 2023
2 ACCEPTED SOLUTIONS

Super Collaborator

Hi ,

My guess is that you are trying to use ConvertJsonToSQL & PutSQL to transpose the json into sql table. For this to happen you cant use the input json in the given format , you need to transform it using JoltTransformJSON processor to create an array of records where each record consist of  "sbrActivityType" & "sbrDateTime" columns. The json keys have to match the table column names for this to work. The jolt spec to use can look like this:

 

 

[
  {
    "operation": "shift",
    "spec": {
      "sbrActivityType": {
        // convert every activity into an array element
        "*": "data[#1].sbrActivityType"
      },
      "sbrDateTime": "sbrDateTime"
    }
  },
  {
    "operation": "modify-default-beta", 
    "spec": { 
      "data": {
        "*": {
          // create new sbrDateTime key under each array element from
          // above and set its value to the initial sbrDateTime
          "sbrDateTime": "@(3,sbrDateTime)"
        }
        
      }
    }
  }
]

 

you set the Jolt Specification property with the spec above in the JoltTransformJSON 

 

SAMSAL_0-1679238106574.png

 

This should give you new json flowfile as follows:

 

 

{
  "data" : [ {
    "sbrActivityType" : "ACT+A",
    "sbrDateTime" : "Sun Mar 19 13:45:15 AST 2023"
  }, {
    "sbrActivityType" : "ACT+X",
    "sbrDateTime" : "Sun Mar 19 13:45:15 AST 2023"
  }, {
    "sbrActivityType" : "ACT+A",
    "sbrDateTime" : "Sun Mar 19 13:45:15 AST 2023"
  }, {
    "sbrActivityType" : "ACT+X",
    "sbrDateTime" : "Sun Mar 19 13:45:15 AST 2023"
  }, {
    "sbrActivityType" : "ACT+X",
    "sbrDateTime" : "Sun Mar 19 13:45:15 AST 2023"
  } ],
  "sbrDateTime" : "Sun Mar 19 13:45:15 AST 2023"
}

 

 

 

After that, use SplitJosn to split each record into its own json flowfile. The split JsonPath Expression is set to $.data.*:

 

SAMSAL_3-1679238260659.png

Then you use ConvertJSONToSQL as follows:

 

SAMSAL_4-1679238363872.png

 

And Finally you use the PUTSQL processor as follows:

 

SAMSAL_5-1679238426041.png

 

This should populate the table accordingly:

SAMSAL_6-1679238490140.png

 

If this helps please accept solution.

Thanks

 

 

 

 

 

View solution in original post

Super Collaborator

Hi,

I noticed that you have an "Original" relationship going from SplitJson back to itself. This where your duplicates are coming from. The "Original" relationship means the original input flowfile that is passed to the splitJson from JoltTransformation processor. You need to terminate this relationship and not pass it to anything  unless you are using Wait-Notify processors which I dont think in your case. You can terminate relationship by going to splitJson Configuration then go to the Relationship tab and check the terminate box under the Original relationship.

 

 

 

 

View solution in original post

8 REPLIES 8

Super Collaborator

Hi ,

My guess is that you are trying to use ConvertJsonToSQL & PutSQL to transpose the json into sql table. For this to happen you cant use the input json in the given format , you need to transform it using JoltTransformJSON processor to create an array of records where each record consist of  "sbrActivityType" & "sbrDateTime" columns. The json keys have to match the table column names for this to work. The jolt spec to use can look like this:

 

 

[
  {
    "operation": "shift",
    "spec": {
      "sbrActivityType": {
        // convert every activity into an array element
        "*": "data[#1].sbrActivityType"
      },
      "sbrDateTime": "sbrDateTime"
    }
  },
  {
    "operation": "modify-default-beta", 
    "spec": { 
      "data": {
        "*": {
          // create new sbrDateTime key under each array element from
          // above and set its value to the initial sbrDateTime
          "sbrDateTime": "@(3,sbrDateTime)"
        }
        
      }
    }
  }
]

 

you set the Jolt Specification property with the spec above in the JoltTransformJSON 

 

SAMSAL_0-1679238106574.png

 

This should give you new json flowfile as follows:

 

 

{
  "data" : [ {
    "sbrActivityType" : "ACT+A",
    "sbrDateTime" : "Sun Mar 19 13:45:15 AST 2023"
  }, {
    "sbrActivityType" : "ACT+X",
    "sbrDateTime" : "Sun Mar 19 13:45:15 AST 2023"
  }, {
    "sbrActivityType" : "ACT+A",
    "sbrDateTime" : "Sun Mar 19 13:45:15 AST 2023"
  }, {
    "sbrActivityType" : "ACT+X",
    "sbrDateTime" : "Sun Mar 19 13:45:15 AST 2023"
  }, {
    "sbrActivityType" : "ACT+X",
    "sbrDateTime" : "Sun Mar 19 13:45:15 AST 2023"
  } ],
  "sbrDateTime" : "Sun Mar 19 13:45:15 AST 2023"
}

 

 

 

After that, use SplitJosn to split each record into its own json flowfile. The split JsonPath Expression is set to $.data.*:

 

SAMSAL_3-1679238260659.png

Then you use ConvertJSONToSQL as follows:

 

SAMSAL_4-1679238363872.png

 

And Finally you use the PUTSQL processor as follows:

 

SAMSAL_5-1679238426041.png

 

This should populate the table accordingly:

SAMSAL_6-1679238490140.png

 

If this helps please accept solution.

Thanks

 

 

 

 

 

Hi,

I think the jolt spec is incorrect because one of parentheses are more or less at 5th line.

If you can help me on this I appreciate it.

Ignore my statement, it is ok now, but I did not finish the pipeline yet, I will let you know back again once I done

Once I d SplitJson processor, it give so many flow files, how to limit to the number of input flow file. For example, in my example here: we have 5 ACTs (means we have 5 records in the target database table)

Super Collaborator

What do you mean by it gave you "so many flowfiles". If you follow the same jolt spec and split processors configuration as I provided it should give you the exact amount flowfiles similar to the number of ACT in the original input. Can you provide more details?

Hi @SAMSAL , it gives the following number of flow files (forever), although the source flow file as provided in the question around 3 ACT.As and 2 ACT.Xs.

 

ahmedalsaidi_0-1679816694822.png

 

I hope my explanation was fine and ok.

Super Collaborator

Hi,

I noticed that you have an "Original" relationship going from SplitJson back to itself. This where your duplicates are coming from. The "Original" relationship means the original input flowfile that is passed to the splitJson from JoltTransformation processor. You need to terminate this relationship and not pass it to anything  unless you are using Wait-Notify processors which I dont think in your case. You can terminate relationship by going to splitJson Configuration then go to the Relationship tab and check the terminate box under the Original relationship.

 

 

 

 

Thanks @SAMSAL , everything it works fine....

Take a Tour of the Community
Don't have an account?
Your experience may be limited. Sign in to explore more.