Not able to parse text file to json format

New Contributor

Hi folks, I have a problem parsing a JSON text file.

I sent a text file with a JSON-like structure to PutMongoRecord to parse it into JSON, but it is not being parsed.

I have attached my configuration.

Please help me with this problem.

[Attached screenshots: schema, JsonTreeReader, PutMongo's configuration, the error which we got, text file sent]

Accepted Solution

Re: Not able to parse text file to json format

The FlowFile content is not valid JSON.

Try a basic JSON example like those found on the internet:

{"results":[{"term":"term1"},{"term":"term2"}..]}

Notice that the results object is an array of values and that everything is "quoted".

Something like this, based on your sample image:

{ "world_rank": "1", "country": "China", "population": "1388232694", "World": "0.185" }
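One quick way to check whether a file's content is valid JSON before sending it to the record reader is to try parsing it outside NiFi. This is only a minimal Python sketch: the helper name is_valid_json and the "bad" sample string are made up for illustration, since the original file was only shown as an image.

```python
import json

def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

# The quoted example from this reply.
good = '{ "world_rank": "1", "country": "China", "population": "1388232694", "World": "0.185" }'
# Hypothetical unquoted text (an assumption, not the actual attached file).
bad = 'world_rank :1 country: China population: 1388232694 world: 0.185'

print(is_valid_json(good))
print(is_valid_json(bad))
```

If the check fails for your file, the record reader will most likely reject it as well, so fix the content first.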

  

If you have to use what you have in the sample image, then you will need to use ExtractText (link here for examples of regex to match values) to get the values for world_rank, country, population, and world.

Each value gets its own attribute, with a regex like this:

 

 

.*world_rank :(.*?) .*
.*country: (.*?) .*
.*population: (.*?) .*
.*world: (.*?) .*

 

 

Then use AttributesToJSON to build the JSON object from the attributes you defined above, which you can then send to the reader and ultimately to MongoDB.
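To try the regexes before wiring up the processors, the extract-then-build-JSON idea can be sketched in plain Python. This only roughly mimics the flow and is not how NiFi runs it internally. The sample line is hypothetical (the real file was only shown as an image), and the names patterns and extract_attributes are made up for this sketch.

```python
import json
import re

# Hypothetical single-line sample; note the trailing space the regexes rely on.
sample = "world_rank :1 country: China population: 1388232694 world: 0.185 "

# Attribute name -> regex, copied from the patterns listed in this reply.
patterns = {
    "world_rank": r".*world_rank :(.*?) .*",
    "country": r".*country: (.*?) .*",
    "population": r".*population: (.*?) .*",
    "world": r".*world: (.*?) .*",
}

def extract_attributes(text):
    """Apply each regex and keep capture group 1 as the attribute value."""
    attrs = {}
    for name, pattern in patterns.items():
        match = re.match(pattern, text)
        if match:
            attrs[name] = match.group(1)
    return attrs

attributes = extract_attributes(sample)
# Build one JSON object from the extracted attributes.
json_text = json.dumps(attributes)
print(json_text)
```

Each pattern expects a space after the value, so adjust the spacing in the regexes to match your actual file.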

 

 

If my reply helps you solve your use case, please accept it as a solution to close the topic.

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your use case, please create a separate topic and feel free to tag me in your post.

Thanks,

Steven

Re: Not able to parse text file to json format

Explorer

You can find the column names and their positions in the header line with re.finditer. Then use those positions to slice the rest of the lines:

import re
import json

with open("file.txt") as thefile:
    # Record each header column as (name, start offset, end offset).
    header = thefile.readline()
    columns = [(m.group(0).strip(), m.start(0), m.end(0))
               for m in re.finditer(r"\w+\s+", header)]
    records = []
    for line in thefile:
        if not line.strip():
            continue  # skip blank lines
        # Slice each fixed-width field out of the row using the header offsets.
        record = {name: line[start:end].strip() for name, start, end in columns}
        records.append(record)

print(json.dumps(records))

Re: Not able to parse text file to json format

New Contributor

Hello,

I need to send

{"value" : [{"id1":"10"}]}

What schema should I use for the data above so that it is stored in the database like this?

id1=10

Thanks in advance!

 

 

Re: Not able to parse text file to json format

New Contributor

How do I configure this? I get "could not parse to json".

Re: Not able to parse text file to json format

In the schema, remove the second row in fields, so it's just id1.

Now do a test: use GenerateFlowFile with the schema in its contents. Then add EvaluateJsonPath (with Destination set to flowfile-attribute). In this processor click +, and create a property named id1. The value for id1 is then $.value.id1 (this is the JsonPath expression that gets the real value; if value is an array, as in {"value" : [{"id1":"10"}]}, use $.value[0].id1 instead). Then route EvaluateJsonPath to an output port. Run. Inspect the item in the queue and confirm you see the attribute id1.

This is a simple example to help you understand how JSON works, how to get the data from the value object, how to confirm the schema is correct, and how to unit test by looking at the FlowFile in the queue.
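To see what the path lookup does outside NiFi, here is a minimal Python sketch. It assumes the content is the sample {"value" : [{"id1":"10"}]} from the question, where value is an array holding one object; the variable names are only for illustration.

```python
import json

# Sample content from the question above.
content = '{"value" : [{"id1":"10"}]}'
data = json.loads(content)

# "value" is a list with one object, so index it before reading id1.
id1 = data["value"][0]["id1"]
print("id1=" + id1)
```

If the lookup raises a KeyError or IndexError on your real content, the structure differs from what the path expects, which is the same thing the queue inspection would reveal.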

 


 




 


Thanks,



Steven
