Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Guru
The intermediate_access_log table is not intended to be queried directly,
especially not from Impala. In that tutorial step you're actually using Hive
to run an ETL (extract, transform, load) job. The Apache logs are in a format
that is hard to query directly through SQL, so we use one of Hive's
extensions to apply a regular expression that splits each line into explicit
fields. After this step, the intermediate table is no longer useful. It's the
second table you create (tokenized_access_logs) that should be queried from
Impala.
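
To make the regular-expression step more concrete, here is a rough Python sketch of the same idea: one pattern with capture groups turns a raw Apache log line into named fields. This is not the tutorial's Hive statement. The pattern, the field names, and the sample log line are all made up for illustration, and the regex the tutorial actually uses may differ.

```python
import re

# Hypothetical pattern for an Apache combined-format log line.
# The tutorial's actual Hive regex may differ; this is illustration only.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<date>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<http_version>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def tokenize(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# Made-up sample line, not taken from the tutorial's data set.
sample = (
    '10.0.0.1 - - [14/Jun/2014:10:30:13 -0400] '
    '"GET /department/apparel HTTP/1.1" 200 1024 '
    '"-" "Mozilla/5.0"'
)

fields = tokenize(sample)
print(fields["ip"], fields["url"], fields["status"])
```

Each capture group plays the role of one column, which is roughly what the tokenized table gives you: separate, typed-looking fields that are easy to filter and aggregate in SQL, instead of one opaque line of text.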
