Member since
09-20-2019
2
Posts
0
Kudos Received
0
Solutions
09-20-2019
02:09 PM
I have lots of JSONL data in S3 files.
We are adding indexes into other systems, but need to access the full record after we've found what we're looking for.
We would like to reference the external table records by filename and line number. They have no unique identifiers of their own. We could reprocess every file and add a record_id field, but that would be a lot of extra work.
If we create external partitions with the filename and point them at the correct external S3 file that part works.
The part that does not, is we cannot query for a record by line number: SELECT * FROM s3_external_table WHERE external_filename = "foo.txt" AND __LINE__=44;
It feels like the loader should be able to inject LINENO into the record as it queries for it. Are there special flags or anything for a feature like this?
Has anyone else attempted something like this?
... View more
Labels:
- Labels:
-
Apache Hadoop