How to identify the source of thrift API calls (ImpalaService.ImpalaHiveServer2Service.Client)

Hi,

 

I hope this is the right place to put this query.

 

I am a Hadoop admin, and we have either a user or a workflow regularly executing INVALIDATE METADATA statements on a large set of tables within a database. These commands are being issued in the tens of thousands per day, and I am trying to track their origin.

 

They are issued under a service account whose password a previous admin handed out, so we can't use the account name to identify the culprit.

 

The Hue runcpserver.log shows lots of the following:

 

[27/Jun/2019 12:06:56 +0100] thrift_util  INFO     SLOW: 2.42 - Thrift call: <class 'ImpalaService.ImpalaHiveServer2Service.Client'>.ExecuteStatement(args=(TExecuteStatementReq(confOverlay={'impala.resultset.cache.size': '100000', 'QUERY_TIMEOUT_S': '600'}, sessionHandle=TSessionHandle(sessionId=THandleIdentifier(secret=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, guid=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)), runAsync=True, statement='INVALIDATE METADATA `database`.`table`'),), kwargs={}) returned in 2422ms: TExecuteStatementResp(status=TStatus(errorCode=None, errorMessage=None, sqlState=None, infoMessages=None, statusCode=0), operationHandle=TOperationHandle(hasResultSet=False, modifiedRowCount=None, operationType=0, operationId=THandleIdentifier(secret=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, guid=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)))

This leads me to believe that a Thrift API call is responsible.
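
For anyone trying to narrow this down, here is a minimal sketch that summarizes entries like the one above by minute and by table. It assumes every relevant line has the same shape as the masked entry quoted in this post (a bracketed timestamp followed by an ExecuteStatement call containing statement='...'); the names LINE_RE, parse_line and summarize are hypothetical helpers, not part of Hue or Impala.

```python
import re
from collections import Counter
from datetime import datetime

# Assumption: lines look like the runcpserver.log entry quoted above.
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\].*?ExecuteStatement\(.*?"
    r"statement='(?P<stmt>INVALIDATE METADATA [^']*)'",
    re.IGNORECASE,
)

PREFIX_LEN = len("INVALIDATE METADATA ")

# Shortened copy of the log entry from the post, used as sample input.
SAMPLE = (
    "[27/Jun/2019 12:06:56 +0100] thrift_util  INFO     SLOW: 2.42 - Thrift call: "
    "<class 'ImpalaService.ImpalaHiveServer2Service.Client'>.ExecuteStatement("
    "args=(TExecuteStatementReq(confOverlay={'QUERY_TIMEOUT_S': '600'}, "
    "runAsync=True, statement='INVALIDATE METADATA `database`.`table`'),), "
    "kwargs={}) returned in 2422ms"
)


def parse_line(line):
    """Return (timestamp, 'db.table') for an INVALIDATE METADATA entry, else None."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    ts = datetime.strptime(match.group("ts"), "%d/%b/%Y %H:%M:%S %z")
    # Drop the statement keyword and the backticks around the identifiers.
    target = match.group("stmt")[PREFIX_LEN:].replace("`", "").strip()
    return ts, target


def summarize(lines):
    """Count matching statements per minute and per target table."""
    per_minute = Counter()
    per_table = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # not an INVALIDATE METADATA Thrift call
        ts, target = parsed
        per_minute[ts.strftime("%Y-%m-%d %H:%M")] += 1
        per_table[target] += 1
    return per_minute, per_table


if __name__ == "__main__":
    minutes, tables = summarize([SAMPLE])
    print(minutes.most_common(5))
    print(tables.most_common(5))
```

Passing an open file handle for runcpserver.log to summarize() instead of the sample list should show whether the statements arrive in bursts at fixed times, which would point to a scheduled job rather than an interactive user.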

 

Is there any logging of the Thrift calls that could help identify the source?

 

I know which node acted as the query coordinator, but a lot of trawling through the Impala logs has not yielded anything.

 

Thanks,

Tom

 

Below is an image of the queries in Cloudera Manager:

 

impala queries.png

Re: How to identify the source of thrift API calls (ImpalaService.ImpalaHiveServer2Service.Client)

Hi Thomas,

Can you check in the PROFILE of those queries what the value of "Session Type" is, to confirm whether they come from impala-shell, Hue, or another Impala client?

I noticed that they were all submitted at the same time and finished at the same time, which is odd. Were all those queries for the same table? Can you see all those queries in the Hue log to confirm that it was from Hue?

Do you use a load balancer for Impala? If not, does the coordinator used for those queries match the one set up for Hue?

Cheers
Eric

Re: How to identify the source of thrift API calls (ImpalaService.ImpalaHiveServer2Service.Client)

Hi Eric,

 

Thanks for the reply. The session type is HIVESERVER2.

 

Yeah, I thought the timing was curious too. The queries are all for tables within the same database, but not for the same table.

 

We have a load-balanced Hue with 3 instances, each pointing to a specific datanode via [impala] server_host = servername in hue_safety_valve_server.ini. For each of the queries, the coordinator is the datanode referenced in that safety valve for one specific Hue instance.

 

I can see the calls in the Hue runcpserver.log but I can't in the access.log. Are any of the other logs worth checking?

 

Cheers,

Tom

 

The runcpserver log entries look like this:

 

[27/Jun/2019 12:06:56 +0100] thrift_util INFO SLOW: 2.42 - Thrift call: <class 'ImpalaService.ImpalaHiveServer2Service.Client'>.ExecuteStatement(args=(TExecuteStatementReq(confOverlay={'impala.resultset.cache.size': '100000', 'QUERY_TIMEOUT_S': '600'}, sessionHandle=TSessionHandle(sessionId=THandleIdentifier(secret=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, guid=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)), runAsync=True, statement='INVALIDATE METADATA `database`.`table`'),), kwargs={}) returned in 2422ms: TExecuteStatementResp(status=TStatus(errorCode=None, errorMessage=None, sqlState=None, infoMessages=None, statusCode=0), operationHandle=TOperationHandle(hasResultSet=False, modifiedRowCount=None, operationType=0, operationId=THandleIdentifier(secret=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, guid=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)))