Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Csv to json conversion error

avatar
Contributor

I have a csv file of the format -

1512340821, 26,576.09, 39824, 989459.009.. and so on 35 total fields

Each of these columns is a long or double format in avro format. Now I have used the Convert Record processor in nifi, which forst converts or uses avro schema and then produces the json format data.

My goal is to have data coming out of json format like the following

{"timestamp":"1512312024","tupvalues":[112.5,0.0,1872296.0,134760.0,0.0,7134.0,19.0,0.0,0.0,3.8136152E7,13703.0,18.0,111.0,37.0,1.38252288E8,1.91762842E9,5.9564032E7,4055040.0,0.0,1.41528269E9,0.0,8.0539648E7,9.5470592E7,0.0,2.0,0.76,0.44,0.0,0.0,2.0,2.0,0.0,0.0,0.0,0.0]}

My original data does not have headers, but I expect the data to be of the format {key, value} where key is the timestamp and values are the other column numbers.

Here is the avro schema that i put in the avro registry

{
"type": "record",
"namespace": "testavro.schema",
"name": "test",
"fields": [
{
"type": "double",
"name": "timestamp"
},
{
"name": "tupvalues",
"type" : {
"type": "array",
"items": "double"
}
}

]
}

I used this website -- "

https://json-schema-validator.herokuapp.com/avro.jsp" to check the conversion and it reads success. But when applied in the avro registry, the data is not picked. I get an error of the following order-- "Cannot create value [26] of type java.lang.string to object array for field tupvalues.

Any sort of help is appreciated. I am a newbie to avro schema writing, I have a feeling thats where I am wrong.

1 ACCEPTED SOLUTION

avatar
Master Guru
hide-solution

This problem has been solved!

Want to get a detailed solution you have to login/registered on the community

Register/Login
4 REPLIES 4

avatar
Master Guru
hide-solution

This problem has been solved!

Want to get a detailed solution you have to login/registered on the community

Register/Login

avatar
Contributor

@Matt Burgess I was able to do this and worked perfectly. However, there is just one small request. The data that I finally receive in PUTfile, is all in one line. I tried to insert newlines after each record ends. However in PublishKafka_0_11 there is message demarcator where Shift+Ctrl is also not helping my situation.
I figured that it is because
[{"timestamp":1512.., "tupvalues":[1,2,3,4...]},

{[{"timestamp":1512.., "tupvalues":[1,2,3,4...]},

[{"timestamp":1512.., "tupvalues":[1,2,3,4...]},

[{"timestamp":1512.., "tupvalues":[1,2,3,4...]}.....]

The square bracket is right at the very end. Whereas the required output is somewhat like this:

{"timestamp":"1512312021","tupvalues":[0.8,0.0,18244.0,3176.0,0.0,122.0,11.0,0.0,0.0,100052.0,1783.0,4.0,59.0,1.0,3252224.0,1.8681856E7,2777088.0,999424.0,0.0,524288.0,0.0,487424.0,740352.0,0.0,1.0,0.04,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0]} {"timestamp":"1512312022","tupvalues":[207.8,0.2,3778460.0,309000.0,0.0,22342.0,27.0,0.0,0.0,1.06732936E8,25623.0,36.0,749.0,110.0,3.19459328E8,3.87224371E9,1.17956608E8,7110656.0,0.0,2.87654298E9,0.0,2.0957184E8,2.46372352E8,0.0,3.0,1.95,1.23,0.0,0.0,3.0,6.0,0.0,0.0,0.0,0.0]}

any suggestions? Do u think split json or split record should be now be introduced?

avatar
Master Guru

In NiFi 1.7.0 (via NIFI-4456) you will be able to select that your JsonRecordSetWriter issue one JSON object per line, to give you the output you want. In this case with JoltTransformJSON you'd need a ConvertRecord after the fact, since it will issue an array if you pass in an array. As a workaround you might be able to use ReplaceText to remove the first and last square brackets, and change },{ to },\n{ to put each value on the same line.

avatar
Contributor

Hi @Matt Burgess, the output is still in one line instead of multiple lines. Even though I have tried using what you mentioned above. I have used Replace text and in place of regex : (\[)(\{\"timestamp\"\:15123[0-9]+),(\"tupvalues\"\:\[([0-9]+\.[a-zA-Z0-9-]+),([0-9]+\.[a-zA-Z0-9-]+)([0-9]+\.[a-zA-Z0-9-]+)([0-9]+\.[a-zA-Z0-9-]+)([0-9]+\.[a-zA-Z0-9-]+)([0-9]+\.[a-zA-Z0-9-]+)([0-9]+\.[a-zA-Z0-9-]+)([0-9]+\.[a-zA-Z0-9-]+)([0-9]+\.[a-zA-Z0-9-]+)([0-9]+\.[a-zA-Z0-9-]+)([0-9]+\.[a-zA-Z0-9-]+)([0-9]+\.[a-zA-Z0-9-]+)([0-9]+\.[a-zA-Z0-9-]+)([0-9]+\.[a-zA-Z0-9-]+)([0-9]+\.[a-zA-Z0-9-]+)([0-9]+\.[a-zA-Z0-9-]+)([0-9]+\.[a-zA-Z0-9-]+)([0-9]+\.[a-zA-Z0-9-]+)([0-9]+\.[a-zA-Z0-9-]+)([0-9]+\.[a-zA-Z0-9-]+)([0-9]+\.[a-zA-Z0-9-]+)([0-9]+\.[a-zA-Z0-9-]+)([0-9]+\.[a-zA-Z0-9-]+)([0-9]+\.[a-zA-Z0-9-]+)([0-9]+\.[a-zA-Z0-9-]+)([0-9]+\.[a-zA-Z0-9-]+)([0-9]+\.[a-zA-Z0-9-]+)([0-9]+\.[a-zA-Z0-9-]+)([0-9]+\.[a-zA-Z0-9-]+)([0-9]+\.[a-zA-Z0-9-]+)([0-9]+\.[a-zA-Z0-9-]+)([0-9]+\.[a-zA-Z0-9-]+)([0-9]+\.[a-zA-Z0-9-]+)\]\})(\,)

For replacement values $2, $3. Followed by Split text in order to split line by line. But the output is still the same.
I even tried the solution given by you on post https://community.hortonworks.com/questions/109064/nifi-replace-text-how-t0-replace-string-with.html

And tried substituting the expression [\[\]](\{|\}) but this gives me an output which has no square brackets in the beginning and inside the array. I know its been a week almost, but still have not got a hang of it.