
Is there a way in Hadoop to convert a multi-line JSON file into a JSON file with one record per line?

New Contributor

I am trying to ingest a JSON file into Hive. If the JSON file is on a single line, or has one record per line, I'm able to handle it through get_json_object or lateral view. Is there any other tool or way to convert a multi-line JSON file into a single-line JSON file? Could you explain or provide a link?


Contributor

@Bala Vignesh Nagamuthu Venkatesan You could use Apache NiFi to handle both the ingestion of the JSON file and its transformation. For example, if you have a multi-line JSON file like this:

{
   "book": [
 
      {
         "_key":"01",
         "language": "Java",
         "edition": "third",
         "author": "Herbert Schildt"
      },
 
      {
         "_key":"07",
         "language": "C++",
         "edition": "second",
         "author": "E.Balagurusamy"
      }
   ]
}

You could use a flow like the following to split the JSON and then merge it back together.

[Screenshot of the NiFi flow: 10434-screen-shot-2016-12-18-at-9.png]

Running the sample multi-line JSON through the flow above produces the following single-line JSON entry. Note that the output below really is a single line, even though the page may wrap it so that it looks like two.

{"_key":"01","language":"Java","edition":"third","author":"Herbert Schildt"}{"_key":"07","language":"C++","edition":"second","author":"E.Balagurusamy"}
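To see what the split-and-merge step does to the sample, it can be approximated outside NiFi. This is a minimal Python sketch, not NiFi's actual processors: it assumes the records live under the top-level `book` key, and the helper name `split_and_merge` is hypothetical. It splits the array into individual records, serializes each compactly, and concatenates them with no delimiter, which reproduces the output shown above.

```python
import json

# The multi-line sample document from the post above.
SAMPLE = """
{
   "book": [
      {"_key": "01", "language": "Java", "edition": "third", "author": "Herbert Schildt"},
      {"_key": "07", "language": "C++", "edition": "second", "author": "E.Balagurusamy"}
   ]
}
"""


def split_and_merge(text: str, array_key: str = "book", demarcator: str = "") -> str:
    """Hypothetical helper: split the array under `array_key` into records,
    then concatenate them, separated by `demarcator` (empty by default)."""
    records = json.loads(text)[array_key]
    # Compact separators drop the spaces json.dumps inserts by default,
    # so each record comes out as one tight line of JSON.
    return demarcator.join(json.dumps(r, separators=(",", ":")) for r in records)


if __name__ == "__main__":
    print(split_and_merge(SAMPLE))
```

Because Python dictionaries keep insertion order, the keys come out in the same order as in the source file.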

You could transform this further using some of NiFi's other JSON processors, or with the JSON-to-JSON Jolt transformation. @Yolanda M. Davis gives a great explanation here: https://community.hortonworks.com/articles/44726/json-to-json-simplified-with-apache-nifi-and-jolt.h...

If you post the structure of your JSON, I'm sure we can get it working for you with either Hive/Hadoop or NiFi.

Guru

🙂 And of course you could then write the result to HDFS using the PutHDFS processor.

Instead of writing everything to a single line of JSON, can this be broken down so that each record is written on its own line, such as:

{"_key":"01","language":"Java","edition":"third","author":"Herbert Schildt"}

{"_key":"07","language":"C++","edition":"second","author":"E.Balagurusamy"}

I tried using a ReplaceText processor with append('\n'), but that didn't seem to work.
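The one-record-per-line layout asked for here is commonly called JSON Lines (newline-delimited JSON): the same split records as before, but each one terminated by a newline instead of being concatenated directly. This layout suits Hive text tables, which read input line by line. Below is a minimal Python sketch outside NiFi, assuming the records sit under the `book` key as in the sample. `to_json_lines` is a hypothetical helper. Inside NiFi, the corresponding change is probably to set MergeContent's Delimiter Strategy to Text and its Demarcator to a newline, but treat that as an assumption and check it against your NiFi version's documentation.

```python
import json

# Same sample as in the thread, already loaded as text.
SAMPLE = """{
   "book": [
      {"_key": "01", "language": "Java", "edition": "third", "author": "Herbert Schildt"},
      {"_key": "07", "language": "C++", "edition": "second", "author": "E.Balagurusamy"}
   ]
}"""


def to_json_lines(text: str, array_key: str = "book") -> str:
    """Hypothetical helper: emit one compact JSON object per line."""
    records = json.loads(text)[array_key]
    lines = [json.dumps(r, separators=(",", ":")) for r in records]
    # Terminate every record, including the last, with a newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    out = to_json_lines(SAMPLE)
    print(out, end="")
    # Each line parses on its own, which is what a line-oriented reader needs.
    for line in out.splitlines():
        json.loads(line)
```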

Use the Jolt processor (JoltTransformJSON) to parse the JSON out into single lines. Here is an example NiFi data flow showing how to configure the Jolt processor to parse out JSON fields:

https://community.hortonworks.com/articles/67178/nifi-for-clickstream-log-ingestion-into-hbase-phoe....