Support Questions

Find answers, ask questions, and share your expertise
Announcements
Check out our newest addition to the community, the Cloudera Data Analytics (CDA) group hub.

External Table LINE number

New Contributor

I have lots of JSONL data in S3 files.

 

We are adding indexes into other systems, but need to access the full record after we've found what we're looking for.

 

We would like to reference the external table records by filename and line number.  They have no unique identifiers of their own.  We could reprocess every file and add a record_id field, but that would be a lot of extra work.

 

If we create external partitions with the filename and point them at the correct external S3 file that part works.


The part that does not, is we cannot query for a record by line number:

SELECT * FROM s3_external_table WHERE external_filename = "foo.txt" AND __LINE__=44;

 

It feels like the loader should be able to inject LINENO into the record as it queries for it.  Are there special flags or anything for a feature like this?

 

Has anyone else attempted something like this?

2 REPLIES 2

Rising Star

Why don't you use the row_number function?

Just CTAS !

( Create new_Table As Select row_number and other columns )

http://dwgeek.com/apache-hive-rownum-pseudo-column-equivalent.html/

 

 

New Contributor

Thank you much. I think that might work - I'll look into it.

 

Take a Tour of the Community
Don't have an account?
Your experience may be limited. Sign in to explore more.