
How can we read many small files and have each record be defined by an arbitrary length?


I have many small files that I want to process as if they were one large file (this sounds like a use case for a sequence file?). I do not want to read the files line by line. Instead, I want each record to be defined by an arbitrary length (a fixed number of characters, regardless of line breaks), and I also want to track which file each record came from.

Example:

file1.txt

01234567890123456789

012345

file2.txt

01234

01234567

Arbitrary length: 10

Desired output (key -> value):

file1.txt -> 0123456789

file1.txt -> 0123456789

file1.txt -> 012345

file2.txt -> 0123401234

file2.txt -> 567
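
The splitting described above can be sketched in plain Python. This is only an illustration of the desired record boundaries, not Hadoop or Spark code. The function name `fixed_length_records` and the in-memory `files` dict are hypothetical stand-ins for the real files, and the sketch assumes that line breaks are dropped before chunking and that a record never spans two files, as in the example.

```python
from typing import Iterator, Tuple


def fixed_length_records(name: str, text: str, length: int) -> Iterator[Tuple[str, str]]:
    """Yield (file name, chunk) pairs of at most `length` characters.

    Assumption: line breaks are not part of the data, so they are
    removed before the text is cut into chunks.
    """
    if length <= 0:
        raise ValueError("length must be a positive integer")
    data = "".join(text.splitlines())  # drop line breaks only
    for start in range(0, len(data), length):
        # The last chunk of a file may be shorter than `length`.
        yield name, data[start:start + length]


# Hypothetical stand-in for the two files in the example.
files = {
    "file1.txt": "01234567890123456789\n012345\n",
    "file2.txt": "01234\n01234567\n",
}

records = [
    record
    for name, text in files.items()
    for record in fixed_length_records(name, text, 10)
]

for key, value in records:
    print(f"{key} -> {value}")
```

Each file is chunked independently, so the file name can serve as the key for every record it produces. In a cluster setting the same per-file function could presumably be applied to (file name, content) pairs, but how best to produce those pairs from many small files is the open question here.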
