Is there a way in hadoop to convert multiline JSON file to single line at row level JSON file

New Contributor

I am trying to ingest a JSON file into Hive. If the JSON file is a single line, or has one JSON record per line, I am able to handle it through get_json_object or lateral view. Is there any other tool or way to convert a multi-line JSON file into a single-line JSON file? Could you explain or provide a link?

4 REPLIES

Re: Is there a way in hadoop to convert multiline JSON file to single line at row level JSON file

Contributor

@Bala Vignesh Nagamuthu Venkatesan You could use Apache NiFi to handle both the ingestion and the transformation of the JSON file. For example, suppose you have a multi-line JSON file like this:

{
   "book": [
 
      {
         "_key":"01",
         "language": "Java",
         "edition": "third",
         "author": "Herbert Schildt"
      },
 
      {
         "_key":"07",
         "language": "C++",
         "edition": "second",
         "author": "E.Balagurusamy"
      }
   ]
}

You could use a flow like the following to split the json, and then merge it back.

[Screenshot: NiFi flow that splits the JSON and merges it back (10434-screen-shot-2016-12-18-at-9.png)]

The output of the flow above, using the sample multi-line JSON provided, is a single-line JSON entry. Note that the output below is actually a single line, although the online formatting makes it look like two.

{"_key":"01","language":"Java","edition":"third","author":"Herbert Schildt"}{"_key":"07","language":"C++","edition":"second","author":"E.Balagurusamy"}
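For anyone who wants to check the expected result without building the flow, here is a minimal Python sketch of the same split-and-merge idea. It is not what NiFi runs internally; it assumes the records sit under a top-level "book" array as in the sample above, and the function name flatten_records is made up for illustration.

```python
import json

# Sample multi-line document, copied from the example above.
SAMPLE = """
{
   "book": [
      {
         "_key":"01",
         "language": "Java",
         "edition": "third",
         "author": "Herbert Schildt"
      },
      {
         "_key":"07",
         "language": "C++",
         "edition": "second",
         "author": "E.Balagurusamy"
      }
   ]
}
"""


def flatten_records(text, array_key="book"):
    """Split the array under `array_key` and merge the records onto one line.

    `array_key` is an assumption about the document layout; change it to
    match the real structure of your file.
    """
    doc = json.loads(text)
    # separators=(",", ":") drops the optional whitespace json.dumps adds.
    return "".join(json.dumps(rec, separators=(",", ":")) for rec in doc[array_key])


if __name__ == "__main__":
    print(flatten_records(SAMPLE))
```

Like the flow output, the result is two JSON objects concatenated on one line with no delimiter between them.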

You could do a little more transformation on this using some of the other JSON processors available, or using the JSON-to-JSON Jolt transformation. There is a great explanation by @Yolanda M. Davis here: https://community.hortonworks.com/articles/44726/json-to-json-simplified-with-apache-nifi-and-jolt.h...

If you post the structure of your JSON, I'm sure we can get it to work for you using either Hive/Hadoop or NiFi.

Re: Is there a way in hadoop to convert multiline JSON file to single line at row level JSON file

Guru

:) And of course you could then write the result to HDFS using PutHDFS.

Re: Is there a way in hadoop to convert multiline JSON file to single line at row level JSON file

New Contributor

Instead of writing everything to a single line of JSON, can this be broken down so that each record is written on its own line, such as:

{"_key":"01","language":"Java","edition":"third","author":"Herbert Schildt"}

{"_key":"07","language":"C++","edition":"second","author":"E.Balagurusamy"}

I tried using a ReplaceText processor with append('\n'), but that didn't seem to work.
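As a fallback outside NiFi, the concatenated output can be post-processed into one record per line. The sketch below is only an illustration under assumptions: the input is a string of JSON objects written back to back, and to_json_lines is a hypothetical helper name. It relies on the standard library's json.JSONDecoder.raw_decode, which parses one value and reports where it ended.

```python
import json


def to_json_lines(concatenated):
    """Turn back-to-back JSON values into newline-delimited JSON.

    Assumes `concatenated` contains only JSON values, optionally separated
    by whitespace (for example the single-line output shown earlier).
    """
    decoder = json.JSONDecoder()
    pos, end = 0, len(concatenated)
    lines = []
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < end and concatenated[pos].isspace():
            pos += 1
        if pos >= end:
            break
        obj, pos = decoder.raw_decode(concatenated, pos)
        lines.append(json.dumps(obj, separators=(",", ":")))
    # One record per line, each terminated by a newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    merged = (
        '{"_key":"01","language":"Java","edition":"third","author":"Herbert Schildt"}'
        '{"_key":"07","language":"C++","edition":"second","author":"E.Balagurusamy"}'
    )
    print(to_json_lines(merged), end="")
```

A file in this shape has one JSON record per row, which is the form the original question says get_json_object and lateral view can already handle.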

Re: Is there a way in hadoop to convert multiline JSON file to single line at row level JSON file

Use the Jolt processor to parse out the JSON to a single line. Here is an example NiFi DataFlow showing how to configure the Jolt processor to parse out JSON fields:

https://community.hortonworks.com/articles/67178/nifi-for-clickstream-log-ingestion-into-hbase-phoe....
