Support Questions

ebeb · ‎10-17-2017

Hi, while running a Hive SQL group by query on few million rows I get a message in the results mixed with the output:

error_message : You have exceeded your daily reque

(xxxxxx row(s) affected)

results : []

this is mixed with some actual data output. Anyone has seen this error message?

EricL · ‎10-18-2017

If you are using CDH, you should already have parquet-tools command line available. If not, you can get it from here:
https://github.com/apache/parquet-mr/tree/master/parquet-tools

You just need to run:

parquet-tools cat /path/to/parquet-file.parq

Please note that the path is on local file system, not in HDFS.

And then you can grep the output to search for string "error_message".

View solution in original post

EricL · ‎10-18-2017

hmm, never seem this before.

Are you running Hive CLI or beeline through HS2?

If through HS2, can you find the same error message in HS2 server log?

Can you copy and paste the full error message in the thread?

Does it only happen to query returns a few million rows?

ebeb · ‎10-18-2017

This is cdh 5.12.x . the query ran on 7million rows but has group by to
return on 10 rows in result set. Ran using hue and toad for hadoop with
same error. I think microstrategy odbc query also giving same error. Tried
both hive.execution.engine=spark and mr same error. Will try and collect
the error logs but i think was not seen in the logs. Thanks.

EricL · ‎10-18-2017

Can you try to run it in beeline and paste the full output to the thread here?

We need to isolate if the message is from ODBC driver or Hive itself.

ebeb · ‎10-18-2017

All clients giving same error so maybe something in Hive? I also restarted Hive in CMgr and tried but got same error. See below full output with some values masked. Included some log output also for the job.

Last login: Wed Oct 18 19:20:32 2017 from 10.82.159.226
/root>beeline
Beeline version 1.1.0-cdh5.12.1 by Apache Hive
beeline> !connect jdbc:hive2://localhost:10000/wxyz_data
scan complete in 2ms
Connecting to jdbc:hive2://localhost:10000/wxyz_data
Enter username for jdbc:hive2://localhost:10000/wxyz_data: huexxxx
Enter password for jdbc:hive2://localhost:10000/wxyz_data: ************
Connected to: Apache Hive (version 1.1.0-cdh5.12.1)
Driver: Hive JDBC (version 1.1.0-cdh5.12.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000/wxyz_data> select t11.colxyz colxyz, count(t11.colabcd) abcd
. . . . . . . . . . . . . . . . . . . . .> from wxyz_data.tableabcd t11
. . . . . . . . . . . . . . . . . . . . .> group by t11.colxyz;
INFO : Compiling command(queryId=hive_20171018192727_c6648051-2064-4444-849e-677310cfc334): select t11.colxyz colxyz, count(t11.colabcd) abcd
from wxyz_data.tableabcd t11
group by t11.colxyz
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:colxyz, type:string, comment:null), FieldSchema(name:abcd, type:bigint, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20171018192727_c6648051-2064-4444-849e-677310cfc334); Time taken: 0.097 seconds
INFO : Executing command(queryId=hive_20171018192727_c6648051-2064-4444-849e-677310cfc334): select t11.colxyz colxyz, count(t11.colabcd) abcd
from wxyz_data.tableabcd t11
group by t11.colxyz
INFO : Query ID = hive_20171018192727_c6648051-2064-4444-849e-677310cfc334
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-1:MAPRED] in serial mode
INFO : Number of reduce tasks not specified. Estimated from input data size: 26
INFO : In order to change the average load for a reducer (in bytes):
INFO : set hive.exec.reducers.bytes.per.reducer=<number>
INFO : In order to limit the maximum number of reducers:
INFO : set hive.exec.reducers.max=<number>
INFO : In order to set a constant number of reducers:
INFO : set mapreduce.job.reduces=<number>
INFO : number of splits:7
INFO : Submitting tokens for job: job_1504880501407_0347
INFO : The url to track the job: http://HADOOPDB1.xyz.org:8088/proxy/application_1504880501407_0347/
INFO : Starting Job = job_1504880501407_0347, Tracking URL = http://HADOOPDB1.xyz.org:8088/proxy/application_1504880501407_0347/
INFO : Kill Command = /opt/cloudera/parcels/CDH-5.12.1-1.cdh5.12.1.p0.3/lib/hadoop/bin/hadoop job -kill job_1504880501407_0347
INFO : Hadoop job information for Stage-1: number of mappers: 7; number of reducers: 26
INFO : 2017-10-18 19:27:29,276 Stage-1 map = 0%, reduce = 0%
INFO : 2017-10-18 19:27:35,484 Stage-1 map = 14%, reduce = 0%, Cumulative CPU 1.97 sec
INFO : 2017-10-18 19:27:36,519 Stage-1 map = 86%, reduce = 0%, Cumulative CPU 17.47 sec
INFO : 2017-10-18 19:27:37,554 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 20.59 sec
INFO : 2017-10-18 19:27:44,827 Stage-1 map = 100%, reduce = 38%, Cumulative CPU 31.93 sec
INFO : 2017-10-18 19:27:45,874 Stage-1 map = 100%, reduce = 54%, Cumulative CPU 38.01 sec
INFO : 2017-10-18 19:27:50,042 Stage-1 map = 100%, reduce = 77%, Cumulative CPU 44.47 sec
INFO : 2017-10-18 19:27:51,079 Stage-1 map = 100%, reduce = 85%, Cumulative CPU 48.63 sec
INFO : 2017-10-18 19:27:52,107 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 53.12 sec
INFO : MapReduce Total cumulative CPU time: 53 seconds 120 msec
INFO : Ended Job = job_1504880501407_0347
INFO : MapReduce Jobs Launched:
INFO : Stage-Stage-1: Map: 7 Reduce: 26 Cumulative CPU: 53.12 sec HDFS Read: 3935419 HDFS Write: 148 SUCCESS
INFO : Total MapReduce CPU Time Spent: 53 seconds 120 msec
INFO : Completed executing command(queryId=hive_20171018192727_c6648051-2064-4444-849e-677310cfc334); Time taken: 29.54 seconds
INFO : OK
+----------------------------------------------------+----------+--+
| colxyz | abcd |
+----------------------------------------------------+----------+--+
| | 0 |
| results : [] | 0 |
| TUV-AB | 88633 |
| " | 0 |
| (7256823 row(s) affected) | 0 |
| DEF | 73165 |
| error_message : You have exceeded your daily reque | 0 |
| PQR | 7095025 |
+----------------------------------------------------+----------+--+
8 rows selected (29.959 seconds)
0: jdbc:hive2://localhost:10000/wxyz_data>

----------------------------------------------------------------------------------

There were two errors in the syslog:

2017-10-18 19:27:49,009 INFO [Socket Reader #1 for port 41800] SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for job_1504880501407_0347 (auth:SIMPLE)
2017-10-18 19:27:49,016 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_1504880501407_0347_01_000035
2017-10-18 19:27:49,016 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container complete event for unknown container container_1504880501407_0347_01_000035
2017-10-18 19:27:49,016 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_1504880501407_0347_01_000036
2017-10-18 19:27:49,017 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container complete event for unknown container container_1504880501407_0347_01_000036
2017-10-18 19:27:49,017 INFO [IPC Server handler 27 on 41800] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1504880501407_0347_r_000017_0 is : 0.0
2017-10-18 19:27:49,020 INFO [IPC Server handler 25 on 41800] org.apache.hadoop.mapred.TaskAttemptListenerImpl: MapCompletionEvents request from attempt_1504880501407_0347_r_000025_0. startIndex 0 maxEvents 10000

-----------------------------------------------------------------------
Also following message in the stderr log:

        
            Log Type: stderr
          
            Log Upload Time: Wed Oct 18 19:27:59 -0400 2017
          
            Log Length: 2275
          Oct 18, 2017 7:27:40 PM com.google.inject.servlet.InternalServletModule$BackwardsCompatibleServletContextProvider get
WARNING: You are attempting to use a deprecated API (specifically, attempting to @Inject ServletContext inside an eagerly created singleton. While we allow this for backwards compatibility, be warned that this MAY have unexpected behavior if you have more than one injector (with ServletModule) running in the same JVM. Please consult the Guice documentation at http://code.google.com/p/google-guice/wiki/Servlets for more information.
Oct 18, 2017 7:27:40 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver as a provider class
Oct 18, 2017 7:27:40 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a provider class
Oct 18, 2017 7:27:40 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices as a root resource class
Oct 18, 2017 7:27:40 PM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
INFO: Initiating Jersey application, version 'Jersey: 1.9 09/02/2011 11:17 AM'
Oct 18, 2017 7:27:40 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver to GuiceManagedComponentProvider with the scope "Singleton"
Oct 18, 2017 7:27:41 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.yarn.webapp.GenericExceptionHandler to GuiceManagedComponentProvider with the scope "Singleton"
Oct 18, 2017 7:27:41 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices to GuiceManagedComponentProvider with the scope "PerRequest"
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

EricL · ‎10-18-2017

Hi,

That looks like it is from the data itself, as it is displayed inside the table with "|" delimiter. It does not seem like a error message to me.

Do you get the same output for other tables?

ebeb · ‎10-18-2017

It is not the data as I searched for those strings in the data and didnt find. Also if these were in the data the Group By will give count at least 1 not 0 like it shows in the output. This is the query giving errors for now other queries seem fine.

ebeb · ‎10-18-2017

ok you may be right it may be in the data. Now I ran a

select * from xxxx where yyyy in ('(7256823 row(s) affected)') ;

I got one row with all nulls in the other columns strange.

Also I see two empty directories in the Hive parquet table directory. Maybe these are causing some issues:

/user/hive/warehouse/xxxx.db/yyyyyy/.hive-staging_hive_2017-09-27_12-11-02_786_1438903386909470651-1

/user/hive/warehouse/xxxx.db/yyyyy/.hive-staging_hive_2017-09-27_12-14-17_278_5978738369450390063-1

000000_0

000001_0

000002_0

000003_0

000004_0

EricL · ‎10-18-2017

Try to add "ORDER BY abcd" to the query and see if the output changes the order, that will also help to confirm if the text is from the data, not a message produced on Hive side.

ebeb · ‎10-18-2017

Yes added order by and seems to be in the data. But my guess is maybe the two empty directories below are creating some junk in the parquet table query. Maybe I should delete these two directories manually and see.

/user/hive/warehouse/xxxx.db/yyyyyy/.hive-staging_hive_2017-09-27_12-11-02_786_1438903386909470651-1

/user/hive/warehouse/xxxx.db/yyyyy/.hive-staging_hive_2017-09-27_12-14-17_278_5978738369450390063-1

+----------------------------------------------------+----------+--+
| triptype | abcd |
+----------------------------------------------------+----------+--+
| " | 0 |
| error_message : You have exceeded your daily reque | 0 |
| (7256823 row(s) affected) | 0 |
| results : [] | 0 |
| | 0 |
| XYZ | 73165 |
| ABC-DE | 88633 |
| PQR | 7095025 |
+----------------------------------------------------+----------+--+
8 rows selected (47.02 seconds)