Created on 10-17-2017 11:05 AM - edited 09-16-2022 05:24 AM
Hi, while running a Hive SQL group by query on few million rows I get a message in the results mixed with the output:
error_message : You have exceeded your daily reque
(xxxxxx row(s) affected)
results : []
this is mixed with some actual data output. Anyone has seen this error message?
Created 10-18-2017 07:44 PM
Created 10-18-2017 04:12 AM
Created 10-18-2017 05:40 AM
Created 10-18-2017 03:18 PM
Created on 10-18-2017 04:55 PM - edited 10-18-2017 05:30 PM
All clients giving same error so maybe something in Hive? I also restarted Hive in CMgr and tried but got same error. See below full output with some values masked. Included some log output also for the job.
Last login: Wed Oct 18 19:20:32 2017 from 10.82.159.226
/root>beeline
Beeline version 1.1.0-cdh5.12.1 by Apache Hive
beeline> !connect jdbc:hive2://localhost:10000/wxyz_data
scan complete in 2ms
Connecting to jdbc:hive2://localhost:10000/wxyz_data
Enter username for jdbc:hive2://localhost:10000/wxyz_data: huexxxx
Enter password for jdbc:hive2://localhost:10000/wxyz_data: ************
Connected to: Apache Hive (version 1.1.0-cdh5.12.1)
Driver: Hive JDBC (version 1.1.0-cdh5.12.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000/wxyz_data> select t11.colxyz colxyz, count(t11.colabcd) abcd
. . . . . . . . . . . . . . . . . . . . .> from wxyz_data.tableabcd t11
. . . . . . . . . . . . . . . . . . . . .> group by t11.colxyz;
INFO : Compiling command(queryId=hive_20171018192727_c6648051-2064-4444-849e-677310cfc334): select t11.colxyz colxyz, count(t11.colabcd) abcd
from wxyz_data.tableabcd t11
group by t11.colxyz
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:colxyz, type:string, comment:null), FieldSchema(name:abcd, type:bigint, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20171018192727_c6648051-2064-4444-849e-677310cfc334); Time taken: 0.097 seconds
INFO : Executing command(queryId=hive_20171018192727_c6648051-2064-4444-849e-677310cfc334): select t11.colxyz colxyz, count(t11.colabcd) abcd
from wxyz_data.tableabcd t11
group by t11.colxyz
INFO : Query ID = hive_20171018192727_c6648051-2064-4444-849e-677310cfc334
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-1:MAPRED] in serial mode
INFO : Number of reduce tasks not specified. Estimated from input data size: 26
INFO : In order to change the average load for a reducer (in bytes):
INFO : set hive.exec.reducers.bytes.per.reducer=<number>
INFO : In order to limit the maximum number of reducers:
INFO : set hive.exec.reducers.max=<number>
INFO : In order to set a constant number of reducers:
INFO : set mapreduce.job.reduces=<number>
INFO : number of splits:7
INFO : Submitting tokens for job: job_1504880501407_0347
INFO : The url to track the job: http://HADOOPDB1.xyz.org:8088/proxy/application_1504880501407_0347/
INFO : Starting Job = job_1504880501407_0347, Tracking URL = http://HADOOPDB1.xyz.org:8088/proxy/application_1504880501407_0347/
INFO : Kill Command = /opt/cloudera/parcels/CDH-5.12.1-1.cdh5.12.1.p0.3/lib/hadoop/bin/hadoop job -kill job_1504880501407_0347
INFO : Hadoop job information for Stage-1: number of mappers: 7; number of reducers: 26
INFO : 2017-10-18 19:27:29,276 Stage-1 map = 0%, reduce = 0%
INFO : 2017-10-18 19:27:35,484 Stage-1 map = 14%, reduce = 0%, Cumulative CPU 1.97 sec
INFO : 2017-10-18 19:27:36,519 Stage-1 map = 86%, reduce = 0%, Cumulative CPU 17.47 sec
INFO : 2017-10-18 19:27:37,554 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 20.59 sec
INFO : 2017-10-18 19:27:44,827 Stage-1 map = 100%, reduce = 38%, Cumulative CPU 31.93 sec
INFO : 2017-10-18 19:27:45,874 Stage-1 map = 100%, reduce = 54%, Cumulative CPU 38.01 sec
INFO : 2017-10-18 19:27:50,042 Stage-1 map = 100%, reduce = 77%, Cumulative CPU 44.47 sec
INFO : 2017-10-18 19:27:51,079 Stage-1 map = 100%, reduce = 85%, Cumulative CPU 48.63 sec
INFO : 2017-10-18 19:27:52,107 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 53.12 sec
INFO : MapReduce Total cumulative CPU time: 53 seconds 120 msec
INFO : Ended Job = job_1504880501407_0347
INFO : MapReduce Jobs Launched:
INFO : Stage-Stage-1: Map: 7 Reduce: 26 Cumulative CPU: 53.12 sec HDFS Read: 3935419 HDFS Write: 148 SUCCESS
INFO : Total MapReduce CPU Time Spent: 53 seconds 120 msec
INFO : Completed executing command(queryId=hive_20171018192727_c6648051-2064-4444-849e-677310cfc334); Time taken: 29.54 seconds
INFO : OK
+----------------------------------------------------+----------+--+
| colxyz | abcd |
+----------------------------------------------------+----------+--+
| | 0 |
| results : [] | 0 |
| TUV-AB | 88633 |
| " | 0 |
| (7256823 row(s) affected) | 0 |
| DEF | 73165 |
| error_message : You have exceeded your daily reque | 0 |
| PQR | 7095025 |
+----------------------------------------------------+----------+--+
8 rows selected (29.959 seconds)
0: jdbc:hive2://localhost:10000/wxyz_data>
----------------------------------------------------------------------------------
There were two errors in the syslog:
2017-10-18 19:27:49,009 INFO [Socket Reader #1 for port 41800] SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for job_1504880501407_0347 (auth:SIMPLE) 2017-10-18 19:27:49,016 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_1504880501407_0347_01_000035 2017-10-18 19:27:49,016 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container complete event for unknown container container_1504880501407_0347_01_000035 2017-10-18 19:27:49,016 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_1504880501407_0347_01_000036 2017-10-18 19:27:49,017 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container complete event for unknown container container_1504880501407_0347_01_000036 2017-10-18 19:27:49,017 INFO [IPC Server handler 27 on 41800] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1504880501407_0347_r_000017_0 is : 0.0 2017-10-18 19:27:49,020 INFO [IPC Server handler 25 on 41800] org.apache.hadoop.mapred.TaskAttemptListenerImpl: MapCompletionEvents request from attempt_1504880501407_0347_r_000025_0. startIndex 0 maxEvents 10000
-----------------------------------------------------------------------
Also following message in the stderr log:
Log Type: stderr Log Upload Time: Wed Oct 18 19:27:59 -0400 2017 Log Length: 2275 Oct 18, 2017 7:27:40 PM com.google.inject.servlet.InternalServletModule$BackwardsCompatibleServletContextProvider get WARNING: You are attempting to use a deprecated API (specifically, attempting to @Inject ServletContext inside an eagerly created singleton. While we allow this for backwards compatibility, be warned that this MAY have unexpected behavior if you have more than one injector (with ServletModule) running in the same JVM. Please consult the Guice documentation at http://code.google.com/p/google-guice/wiki/Servlets for more information. Oct 18, 2017 7:27:40 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver as a provider class Oct 18, 2017 7:27:40 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register INFO: Registering org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a provider class Oct 18, 2017 7:27:40 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices as a root resource class Oct 18, 2017 7:27:40 PM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate INFO: Initiating Jersey application, version 'Jersey: 1.9 09/02/2011 11:17 AM' Oct 18, 2017 7:27:40 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver to GuiceManagedComponentProvider with the scope "Singleton" Oct 18, 2017 7:27:41 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider INFO: Binding org.apache.hadoop.yarn.webapp.GenericExceptionHandler to GuiceManagedComponentProvider with the scope "Singleton" Oct 18, 2017 7:27:41 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices to GuiceManagedComponentProvider with the scope "PerRequest" log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster). log4j:WARN Please initialize the log4j system properly. log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Created 10-18-2017 05:34 PM
Created 10-18-2017 05:48 PM
It is not the data as I searched for those strings in the data and didnt find. Also if these were in the data the Group By will give count at least 1 not 0 like it shows in the output. This is the query giving errors for now other queries seem fine.
Created 10-18-2017 06:27 PM
ok you may be right it may be in the data. Now I ran a
select * from xxxx where yyyy in ('(7256823 row(s) affected)') ;
I got one row with all nulls in the other columns strange.
-----------------+------------------+-------------------+-----------------+-----------------+------------------+---------------+--+
| (7256823 row(s) affected) | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
+----------------------------+---------------+---------------+------------------+-------------+--------------+----------------+----------+----------------+-------------+------------+------------------+----------------
Also I see two empty directories in the Hive parquet table directory. Maybe these are causing some issues:
/user/hive/warehouse/xxxx.db/yyyyyy/.hive-staging_hive_2017-09-27_12-11-02_786_1438903386909470651-1
/user/hive/warehouse/xxxx.db/yyyyy/.hive-staging_hive_2017-09-27_12-14-17_278_5978738369450390063-1
000000_0
000001_0
000002_0
000003_0
000004_0
Created 10-18-2017 06:43 PM
Created 10-18-2017 07:04 PM
Yes added order by and seems to be in the data. But my guess is maybe the two empty directories below are creating some junk in the parquet table query. Maybe I should delete these two directories manually and see.
/user/hive/warehouse/xxxx.db/yyyyyy/.hive-staging_hive_2017-09-27_12-11-02_786_1438903386909470651-1
/user/hive/warehouse/xxxx.db/yyyyy/.hive-staging_hive_2017-09-27_12-14-17_278_5978738369450390063-1
+----------------------------------------------------+----------+--+
| triptype | abcd |
+----------------------------------------------------+----------+--+
| " | 0 |
| error_message : You have exceeded your daily reque | 0 |
| (7256823 row(s) affected) | 0 |
| results : [] | 0 |
| | 0 |
| XYZ | 73165 |
| ABC-DE | 88633 |
| PQR | 7095025 |
+----------------------------------------------------+----------+--+
8 rows selected (47.02 seconds)