Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Query result doesn't match with tutorial

Query result doesn't match with tutorial

New Contributor

Hi,

 

I am not getting the same result for the query as mentioned in tutorial.

Below is the query:

 

select count(*),url from tokenized_access_logs
where url like '%\/product\/%'
group by url order by count(*) desc;

 

I am getting the following result:

01926/department/apparel/category/cleats/product/Perfect%20Fitness%20Perfect%20Rip%20Deck
11793/department/apparel/category/featured%20shops/product/adidas%20Kids'%20RG%20III%20Mid%20Football%20Cleat
21780/department/golf/category/women's%20apparel/product/Nike%20Men's%20Dri-FIT%20Victory%20Golf%20Polo
31757/department/apparel/category/men's%20footwear/product/Nike%20Men's%20CJ%20Elite%202%20TD%20Football%20Cleat
41104/department/fan%20shop/category/water%20sports/product/Pelican%20Sunstream%20100%20Kayak
51084/department/fan%20shop/category/indoor/outdoor%20games/product/O'Brien%20Men's%20Neoprene%20Life%20Vest

 

I got the following result for beeline query:

 

beeline -u jdbc:hive2://quickstart:10000/default -n admin -d org.apache.hive.jdbc.HiveDriver
Connecting to jdbc:hive2://quickstart:10000/default
Connected to: Apache Hive (version 1.1.0-cdh5.4.2)
Driver: Hive JDBC (version 1.1.0-cdh5.4.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.1.0-cdh5.4.2 by Apache Hive
0: jdbc:hive2://quickstart:10000/default>
0: jdbc:hive2://quickstart:10000/default> CREATE EXTERNAL TABLE intermediate_access_logs (
0: jdbc:hive2://quickstart:10000/default>
0: jdbc:hive2://quickstart:10000/default> ip STRING,
0: jdbc:hive2://quickstart:10000/default>
0: jdbc:hive2://quickstart:10000/default> date STRING,
0: jdbc:hive2://quickstart:10000/default>
0: jdbc:hive2://quickstart:10000/default> method STRING,
0: jdbc:hive2://quickstart:10000/default>
0: jdbc:hive2://quickstart:10000/default> url STRING,
0: jdbc:hive2://quickstart:10000/default>
0: jdbc:hive2://quickstart:10000/default> http_version STRING,
0: jdbc:hive2://quickstart:10000/default>
0: jdbc:hive2://quickstart:10000/default> code1 STRING,
0: jdbc:hive2://quickstart:10000/default>
0: jdbc:hive2://quickstart:10000/default> code2 STRING,
0: jdbc:hive2://quickstart:10000/default>
0: jdbc:hive2://quickstart:10000/default> dash STRING,
0: jdbc:hive2://quickstart:10000/default>
0: jdbc:hive2://quickstart:10000/default> user_agent STRING)
0: jdbc:hive2://quickstart:10000/default>
0: jdbc:hive2://quickstart:10000/default> ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
0: jdbc:hive2://quickstart:10000/default>
0: jdbc:hive2://quickstart:10000/default> WITH SERDEPROPERTIES (
0: jdbc:hive2://quickstart:10000/default>
0: jdbc:hive2://quickstart:10000/default> 'input.regex' = '([^ ]*) - - \\[([^\\]]*)\\] "([^\ ]*) ([^\ ]*) ([^\ ]*)" (\\d*) (\\d*) "([^"]*)" "([^"]*)"',
0: jdbc:hive2://quickstart:10000/default>
0: jdbc:hive2://quickstart:10000/default> 'output.format.string' = '%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s'
0: jdbc:hive2://quickstart:10000/default>
0: jdbc:hive2://quickstart:10000/default> )
0: jdbc:hive2://quickstart:10000/default>
0: jdbc:hive2://quickstart:10000/default> LOCATION '/user/hive/warehouse/original_access_logs';
No rows affected (4.807 seconds)
0: jdbc:hive2://quickstart:10000/default>
0: jdbc:hive2://quickstart:10000/default>
0: jdbc:hive2://quickstart:10000/default>
0: jdbc:hive2://quickstart:10000/default> CREATE EXTERNAL TABLE tokenized_access_logs (
0: jdbc:hive2://quickstart:10000/default>
0: jdbc:hive2://quickstart:10000/default> ip STRING,
0: jdbc:hive2://quickstart:10000/default>
0: jdbc:hive2://quickstart:10000/default> date STRING,
0: jdbc:hive2://quickstart:10000/default>
0: jdbc:hive2://quickstart:10000/default> method STRING,
0: jdbc:hive2://quickstart:10000/default>
0: jdbc:hive2://quickstart:10000/default> url STRING,
0: jdbc:hive2://quickstart:10000/default>
0: jdbc:hive2://quickstart:10000/default> http_version STRING,
0: jdbc:hive2://quickstart:10000/default>
0: jdbc:hive2://quickstart:10000/default> code1 STRING,
0: jdbc:hive2://quickstart:10000/default>
0: jdbc:hive2://quickstart:10000/default> code2 STRING,
0: jdbc:hive2://quickstart:10000/default>
0: jdbc:hive2://quickstart:10000/default> dash STRING,
0: jdbc:hive2://quickstart:10000/default>
0: jdbc:hive2://quickstart:10000/default> user_agent STRING)
0: jdbc:hive2://quickstart:10000/default>
0: jdbc:hive2://quickstart:10000/default> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
0: jdbc:hive2://quickstart:10000/default>
0: jdbc:hive2://quickstart:10000/default> LOCATION '/user/hive/warehouse/tokenized_access_logs';
No rows affected (0.566 seconds)
0: jdbc:hive2://quickstart:10000/default>
0: jdbc:hive2://quickstart:10000/default>
0: jdbc:hive2://quickstart:10000/default>
0: jdbc:hive2://quickstart:10000/default> ADD JAR /usr/lib/hive/lib/hive-contrib.jar;
No rows affected (0.004 seconds)
0: jdbc:hive2://quickstart:10000/default>
0: jdbc:hive2://quickstart:10000/default>
0: jdbc:hive2://quickstart:10000/default>
0: jdbc:hive2://quickstart:10000/default> INSERT OVERWRITE TABLE tokenized_access_logs SELECT * FROM intermediate_access_logs;
INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
WARN  : Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
INFO  : Starting Job = job_1451417832419_0001, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1451417832419_0001/
INFO  : Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1451417832419_0001
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
INFO  : 2015-12-29 20:02:03,348 Stage-1 map = 0%,  reduce = 0%
INFO  : 2015-12-29 20:02:29,483 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 6.73 sec
INFO  : MapReduce Total cumulative CPU time: 6 seconds 730 msec
INFO  : Ended Job = job_1451417832419_0001
INFO  : Stage-4 is selected by condition resolver.
INFO  : Stage-3 is filtered out by condition resolver.
INFO  : Stage-5 is filtered out by condition resolver.
INFO  : Moving data to: hdfs://quickstart.cloudera:8020/user/hive/warehouse/tokenized_access_logs/.hive-staging_hive_2015-12-29_20-01-22_533_8211099068125295799-1/-ext-10000 from hdfs://quickstart.cloudera:8020/user/hive/warehouse/tokenized_access_logs/.hive-staging_hive_2015-12-29_20-01-22_533_8211099068125295799-1/-ext-10002
INFO  : Loading data to table default.tokenized_access_logs from hdfs://quickstart.cloudera:8020/user/hive/warehouse/tokenized_access_logs/.hive-staging_hive_2015-12-29_20-01-22_533_8211099068125295799-1/-ext-10000
INFO  : Table default.tokenized_access_logs stats: [numFiles=1, numRows=180000, totalSize=37433868, rawDataSize=37253868]
No rows affected (70.497 seconds)
0: jdbc:hive2://quickstart:10000/default>
0: jdbc:hive2://quickstart:10000/default>
0: jdbc:hive2://quickstart:10000/default>
0: jdbc:hive2://quickstart:10000/default> !quit
Closing: 0: jdbc:hive2://quickstart:10000/default