Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

External Table LINE number

External Table LINE number

New Contributor

I have lots of JSONL data in S3 files.

 

We are adding indexes into other systems, but need to access the full record after we've found what we're looking for.

 

We would like to reference the external table records by filename and line number.  They have no unique identifiers of their own.  We could reprocess every file and add a record_id field, but that would be a lot of extra work.

 

If we create external partitions with the filename and point them at the correct external S3 file that part works.


The part that does not, is we cannot query for a record by line number:

SELECT * FROM s3_external_table WHERE external_filename = "foo.txt" AND __LINE__=44;

 

It feels like the loader should be able to inject LINENO into the record as it queries for it.  Are there special flags or anything for a feature like this?

 

Has anyone else attempted something like this?

2 REPLIES 2

Re: External Table LINE number

Contributor

Why don't you use the row_number function?

Just CTAS !

( Create new_Table As Select row_number and other columns )

http://dwgeek.com/apache-hive-rownum-pseudo-column-equivalent.html/

 

 

Re: External Table LINE number

New Contributor

Thank you much. I think that might work - I'll look into it.

 

Don't have an account?
Coming from Hortonworks? Activate your account here