Created on 01-26-2016 07:42 AM - edited 09-16-2022 03:00 AM
Tutorial Exercise 2
Error while running the following query.
CREATE EXTERNAL TABLE intermediate_access_logs ( ip STRING, date STRING, method STRING, url STRING, http_version STRING, code1 STRING, code2 STRING, dash STRING, user_agent STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' WITH SERDEPROPERTIES ( 'input.regex' = '([^ ]*) - - \\[([^\\]]*)\\] "([^\ ]*) ([^\ ]*) ([^\ ]*)" (\\d*) (\\d*) "([^"]*)" "([^"]*)"', 'output.format.string' = "%1$$s %2$$s %3$$s %4$$s %5$$s %6$$s %7$$s %8$$s %9$$s") LOCATION '/user/hive/warehouse/original_access_logs'; CREATE EXTERNAL TABLE tokenized_access_logs ( ip STRING, date STRING, method STRING, url STRING, http_version STRING, code1 STRING, code2 STRING, dash STRING, user_agent STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION '/user/hive/warehouse/tokenized_access_logs'; ADD JAR /usr/lib/hive/lib/hive-contrib.jar; INSERT OVERWRITE TABLE tokenized_access_logs SELECT * FROM intermediate_access_logs;
AnalysisException: Syntax error in line 11: ROW FORMAT SERDE 'org.apache.hadoop.hiv... ^ Encountered: IDENTIFIER Expected: DELIMITED CAUSED BY: Exception: Syntax error.
Any ideas to resolve the syntax error.
Created 02-24-2016 12:20 PM
Use Hive Query Editor. It should work fine.
Created 03-13-2016 12:45 PM
I have experienced the same issue when tried to pass the cloudera tutorials.
By the way, I used the Live VM that cloudera provides and I suppose that tutorials (http://www.cloudera.com/developers/get-started-with-hadoop-tutorial/exercise-2.html) were tested using the same environments.
It would be nice if somebody will explain the root cause and also provide a workaround.
Thank you in advance
Created 03-13-2016 07:58 PM
Created 03-13-2016 10:59 PM
Created 09-22-2017 05:02 PM
I was able to create the table with Impala in Hue using this syntax:
CREATE EXTERNAL TABLE intermediate_access_logs (
ip STRING,
`date` STRING,
method STRING,
url STRING,
http_version STRING,
code1 STRING,
code2 STRING,
dash STRING,
user_agent STRING)
WITH SERDEPROPERTIES (
'input.regex' = '([^ ]*) - - \\[([^\\]]*)\\] "([^\ ]*) ([^\ ]*) ([^\ ]*)" (\\d*) (\\d*) "([^"]*)" "([^"]*)"',
'output.format.string' = "%1$$s %2$$s %3$$s %4$$s %5$$s %6$$s %7$$s %8$$s %9$$s")
LOCATION '/user/hive/warehouse/original_access_logs';
I was using the backquote (backward apostrophe) ` on line 3
Also I have removed the row format clause since it is not needed in Impala (check the doc: https://www.cloudera.com/documentation/enterprise/5-5-x/topics/impala_create_table.html#create_table).
Created on 09-23-2017 12:19 PM - edited 09-23-2017 12:20 PM
We should use Hive Query Editor since some of the Examples syntax is not compatible with Impala and the tables are not created correctly.
Created 03-01-2018 07:24 AM
This is pretty well laided out in the documantion...need to stop and read though it though.
Now you can build a table in Hive and query the data via Impala and Hue. You'll build this table in 2 steps. First, you'll take advantage of Hive's flexible SerDes (serializers / deserializers) to parse the logs into individual fields using a regular expression. Second, you'll transfer the data from this intermediate table to one that does not require any special SerDe. Once the data is in this table, you can query it much faster and more interactively using Impala.
We'll use the Hive Query Editor app in Hue to execute the following queries:
)