Repo Description

Normally Hadoop cannot merge lines together, because the underlying tools all use a LineRecordReader that treats every text line as a separate record. A record whose quoted field contains a newline is therefore split in two. However, Hadoop can be configured to use a different LineReader, and handling this inside Hadoop is more scalable than preprocessing the files with Perl or Python scripts. In this case I used a modified QuotationLineReader. To make this change, the project has to copy much of the code of the standard TextInputFormat. More usage tips are in the README.
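The core idea of a quote-aware reader can be sketched in plain Java, outside Hadoop. This is a minimal illustration under stated assumptions, not the QuotationLineReader from the repo: the class name QuoteAwareSplitter is hypothetical, it assumes `"` is the quote character and Unix `\n` line endings, and it replaces an embedded newline with a space (a hypothetical choice in the spirit of "NewLineRemover"). A newline only ends a record when it appears outside quotes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the repo's QuotationLineReader.
// Assumes '"' is the quote character and '\n' line endings.
public class QuoteAwareSplitter {

    public static List<String> split(String text) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '"') {
                // A doubled quote ("") toggles twice, so escaped quotes keep the state.
                inQuotes = !inQuotes;
                current.append(c);
            } else if (c == '\n') {
                if (inQuotes) {
                    // Newline inside a quoted field: merge the lines (hypothetical: use a space).
                    current.append(' ');
                } else {
                    // Newline outside quotes: this is a real record boundary.
                    records.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            records.add(current.toString()); // last record without a trailing newline
        }
        return records;
    }

    public static void main(String[] args) {
        String data = "1,\"first line\nsecond line\",x\n2,\"single\",y\n";
        for (String record : split(data)) {
            System.out.println(record);
        }
    }
}
```

Running `main` prints two records, `1,"first line second line",x` and `2,"single",y`, where a plain line-based reader would produce three. A real Hadoop RecordReader additionally has to cope with input splits that start in the middle of a quoted field, which this sketch ignores.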

Repo Info
Github Repo URL
Github account name benleon
Repo name NewLineRemover

@Benjamin Leonhardi

Hive's regexp_replace could also be used for this.

I don't think so. The standard TextInputFormat, which is the basis for "STORED AS TEXTFILE", turns every line (a string followed by a newline character) into a separate record, or row. So the record is already broken apart before you could apply regexp_replace to it. If you have a way, let me know though :-)
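The ordering problem described in the reply can be illustrated with a small Java sketch. It is only an analogy, assuming that a line-oriented reader behaves like splitting on `\n` while ignoring quotes; the class LineSplitDemo is hypothetical and is not Hive or Hadoop code. Any per-row replacement, like a regexp_replace applied to each row, runs after the split and sees only fragments.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical analogy: every '\n' ends a row and quotes are ignored,
// as with a plain line-based reader.
public class LineSplitDemo {

    public static List<String> rowsPerLine(String text) {
        return Arrays.asList(text.split("\n"));
    }

    public static void main(String[] args) {
        String data = "1,\"first line\nsecond line\",x\n2,\"single\",y\n";
        List<String> rows = rowsPerLine(data);
        // The per-row replacement runs only after the split, so it cannot rejoin
        // the two halves of the first record.
        for (String row : rows) {
            System.out.println(row.replaceAll("\"", ""));
        }
        System.out.println(rows.size() + " rows");
    }
}
```

This prints three rows instead of two, because the first logical record was cut at its embedded newline before the replacement ran.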

Last update: 03-11-2016 10:52 AM