
Not able to parse text file to json format

New Contributor

Hi folks, I have a problem parsing a JSON text file.

I sent a text file with a JSON-like structure to PutMongoRecord so that it would be parsed as JSON, but it is not getting parsed.

I have attached my configuration.

Please help me with this problem.

[Attachments: schema, JsonTreeReader configuration, PutMongoRecord configuration, the error we got, the text file sent]

Accepted Solution

The FlowFile is not valid JSON.

Try something like a basic JSON example found on the internet:

{"results":[{"term":"term1"},{"term":"term2"}..]}

Notice that results holds the array of values, and that every key and string value is "quoted".

Something like this based on your sample image:

{ "world_rank": "1", "country": "China", "population": "1388232694", "World": "0.185" }
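To see why the reader rejects the file, you can try parsing the content locally before sending it through NiFi. A minimal sketch using Python's standard json module; the key: value text below is a hypothetical stand-in for the sample image, not the actual file:

```python
import json

def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

# Hypothetical content shaped like the sample image: plain text, not JSON
plain_text = "world_rank: 1 country: China population: 1388232694 world: 0.185"
# The suggested JSON object: every key and string value is quoted
json_text = '{ "world_rank": "1", "country": "China", "population": "1388232694", "World": "0.185" }'

print(is_valid_json(plain_text))  # False
print(is_valid_json(json_text))   # True
```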

  

If you have to keep the format shown in the sample image, you will need ExtractText (link here for examples of regex to match values) to get the values for world_rank, country, population, and world.

Add one property per value (the property name becomes the attribute name), each with a regex like this:

.*world_rank: (.*?) .*
.*country: (.*?) .*
.*population: (.*?) .*
.*world: (.*?) .*
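To check the regexes before wiring them into ExtractText, you can run them against a sample line outside NiFi. A rough Python approximation, not NiFi's Java regex engine; the single-line content is hypothetical, and note that these patterns expect a space after every value, including the last one:

```python
import re

# Hypothetical single-line content shaped like the sample image.
# The trailing space matters: each pattern requires a space after the value.
content = "world_rank: 1 country: China population: 1388232694 world: 0.185 "

# One pattern per attribute, mirroring the ExtractText regexes above
patterns = {
    "world_rank": r".*world_rank: (.*?) .*",
    "country": r".*country: (.*?) .*",
    "population": r".*population: (.*?) .*",
    "world": r".*world: (.*?) .*",
}

attributes = {}
for name, pattern in patterns.items():
    match = re.search(pattern, content)
    if match:
        # Like ExtractText, keep the first capture group under the property name
        attributes[name] = match.group(1)

print(attributes)
```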

 

 

Then use AttributesToJSON (with Destination set to flowfile-content) to build a JSON object from the attributes defined above, which you can then send to the record reader and ultimately to MongoDB.
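Conceptually, this step turns selected FlowFile attributes into a flat JSON object with string values. A minimal Python sketch of that idea, not NiFi's implementation; the attribute values and the filename are hypothetical, and the list of names plays the role of AttributesToJSON's Attributes List property:

```python
import json

# Hypothetical FlowFile attributes after ExtractText; core attributes such
# as filename sit alongside the extracted ones.
flowfile_attributes = {
    "filename": "countries.txt",
    "world_rank": "1",
    "country": "China",
    "population": "1388232694",
    "world": "0.185",
}

# Only these names end up in the JSON object
attributes_list = ["world_rank", "country", "population", "world"]

def attributes_to_json(attrs, names):
    """Build a flat JSON object from the selected attributes (values stay strings)."""
    return json.dumps({name: attrs[name] for name in names if name in attrs})

content = attributes_to_json(flowfile_attributes, attributes_list)
print(content)
```

Because every value stays a string, the result matches the quoted style of the example object suggested earlier.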

 

 

If my reply helps you solve your Use Case, please accept it as a solution to close the topic.  


Replies

Explorer

You can find the names and positions of the header columns with re.finditer from the re module, then use those positions to slice the remaining lines:

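import re
import json

records = []
with open("file.txt") as thefile:
    header = thefile.readline()
    # Each header name plus its trailing padding marks that column's span
    columns = [(m.group(0).strip(), m.start(0), m.end(0))
               for m in re.finditer(r"\w+\s+", header)]
    # Let the last column run to the end of each line
    if columns:
        name, start, _ = columns[-1]
        columns[-1] = (name, start, None)
    for line in thefile:
        if not line.strip():
            continue  # skip blank lines
        # Slice each fixed-width field and strip its padding
        records.append({name: line[start:end].strip()
                        for name, start, end in columns})

print(json.dumps(records))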


New Contributor

Hello,

I need to send

{"value" : [{"id1":"10"}]}

What should the schema be for this data so that it is stored in the database as

id1=10

Thanks in advance!

New Contributor

How should I configure this? I get "could not parse to json".

In the schema, remove the second row in fields, so it's just id1.

Now do a test: use GenerateFlowFile with your sample JSON ({"value" : [{"id1":"10"}]}) as its content (Custom Text). Then add EvaluateJsonPath, set its Destination to flowfile-attribute, click +, and add a property named id1. Its value is the path that fetches the real value: $.value[0].id1 (value is an array, so index its first element). Then route the EvaluateJsonPath relationships to an output port and run the flow. Inspect the FlowFile in the queue and confirm you see the attribute id1.

This is a simple example to help you understand how JSON works, how to get the data out of the value object, how to confirm the schema is correct, and how to unit test by inspecting the FlowFile in the queue.
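Because value in the sample is an array, a path has to index into it (in JsonPath, $.value[0].id1) before it reaches id1. A rough Python analogue of that lookup, using plain dictionary and list indexing rather than NiFi's JsonPath engine:

```python
import json

# The sample content from the question
content = '{"value" : [{"id1":"10"}]}'
data = json.loads(content)

# "value" holds a list, so looking up "id1" directly on it fails...
try:
    data["value"]["id1"]
except TypeError as err:
    print("value is a list:", err)

# ...while indexing the first element first mirrors $.value[0].id1
id1 = data["value"][0]["id1"]
print("id1 =", id1)
```

To pick up id1 from every element rather than just the first, JsonPath's wildcard form $.value[*].id1 is the usual choice.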