New Contributor
Posts: 2
Registered: ‎07-10-2017

Hive operation logs are not released by hive instance


Hi guys,

 

 

CDH: 5.7.0

Deployment: packages without cloudera manager

 

After a few days of uptime, the number of open file descriptors (FDs) held by Hive becomes very large, and it keeps growing every day.

After some investigation, I found that Hive does not release the open FDs for its operation log files.

 

 

grep -A1 -i log /etc/hive/conf/hive-site.xml

Result:

<name>hive.server2.logging.operation.log.location</name>
<value>/tmp/${user.name}/operation_logs</value>
</property>
--
<name>hive.querylog.enable.plan.progress</name>
<value>true</value>
--
<name>hive.server2.logging.operation.enabled</name>
<value>true</value>

 

 

lsof -u hive | wc -l

Result:

2624

 

 

lsof -u hive | grep operation_logs | wc -l

Result:

1512
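
To confirm that the count keeps growing rather than just being large, the two counts above can be sampled periodically. A minimal sketch, assuming `lsof` output is piped in on stdin; the function name `fd_sample` and the sample log path are made up for illustration.

```shell
# Hypothetical helper: read `lsof -u hive` output on stdin and print
# "<epoch seconds> <total lines> <lines mentioning operation_logs>".
# Like the `wc -l` commands above, the total includes the lsof header line.
fd_sample() {
    awk -v now="$(date +%s)" '
        { total++ }
        /operation_logs/ { oplog++ }
        END { printf "%s %d %d\n", now, total, oplog }
    '
}

# Example: append one sample per run (e.g. from cron) to a log file.
# lsof -u hive | fd_sample >> /tmp/hive_fd_samples.log
```

Comparing the third column across samples shows whether the operation log descriptors account for the growth.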

 

lsof sample output:

 

lsof -u hive | grep operation_logs | head -n10

Result:

java 21165 hive 422r REG 9,1 3361 27131932 /tmp/hive/operation_logs/985035d7-7b53-44aa-a58b-cd8fca4eeb2b/81c51fc6-ef79-437f-917c-6f5aa9e7280a
java 21165 hive 423w REG 9,1 3361 27131932 /tmp/hive/operation_logs/985035d7-7b53-44aa-a58b-cd8fca4eeb2b/81c51fc6-ef79-437f-917c-6f5aa9e7280a
java 21165 hive 425r REG 9,1 3528 27131925 /tmp/hive/operation_logs/985035d7-7b53-44aa-a58b-cd8fca4eeb2b/927ce0b8-bccf-4912-aaf1-4fb81d3b3983
java 21165 hive 450r REG 9,1 3514 27131964 /tmp/hive/operation_logs/985035d7-7b53-44aa-a58b-cd8fca4eeb2b/ffce062b-92f4-4dd3-ad52-f7cb7aee0773
java 21165 hive 451r REG 9,1 23987 27131937 /tmp/hive/operation_logs/4415b94b-797d-44a7-a530-b64c79178bdb/4ffdcb11-9525-4e88-a321-7eba05749b4c
java 21165 hive 452r REG 9,1 13878 27132436 /tmp/hive/operation_logs/a1fbefec-770b-44fc-a8f8-aa0e10a398a4/62215f1f-6b9c-471d-a74e-e32ac4ece9a2
java 21165 hive 453r REG 9,1 0 27002920 /tmp/hive/operation_logs/defd27de-ec20-4e15-9164-94979f87b3a7/1b64c8e4-0ed2-45d6-a5b6-fc64e1ae0908
java 21165 hive 454r REG 9,1 3463 27131924 /tmp/hive/operation_logs/985035d7-7b53-44aa-a58b-cd8fca4eeb2b/9dea5c8b-96c0-4a80-b190-f50f97cdc179
java 21165 hive 455w REG 9,1 3528 27131925 /tmp/hive/operation_logs/985035d7-7b53-44aa-a58b-cd8fca4eeb2b/927ce0b8-bccf-4912-aaf1-4fb81d3b3983
java 21165 hive 456r REG 9,1 7215 27133671 /tmp/hive/operation_logs/985035d7-7b53-44aa-a58b-cd8fca4eeb2b/583f7c95-96a4-4870-a582-4d10f5931f7f
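
In this sample, 7 of the 10 descriptors belong to a single session directory. Grouping the full output by session directory may show whether the leak is tied to a few long-lived sessions. A minimal sketch, assuming the path is the last field of each `lsof` line, as in the sample above; the function name `fds_per_session` is made up for illustration.

```shell
# Hypothetical helper: read lsof output on stdin and print
# "<count> <session directory>" for each operation_logs session,
# busiest first. Assumes the file path is the last field on the line.
fds_per_session() {
    awk '
        $NF ~ /\/operation_logs\// {
            n = split($NF, parts, "/")
            # parts[n] is the operation log file, parts[n-1] its session directory
            count[parts[n-1]]++
        }
        END { for (s in count) printf "%d %s\n", count[s], s }
    ' | sort -k1,1nr -k2,2
}

# Example:
# lsof -u hive | fds_per_session | head -n10
```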

 

My Hive session config is hive.server2.idle.session.timeout=0ms, so the idle session timeout is disabled. I will try later to see whether setting the session timeout to a positive value avoids the problem. But I think that would only be a workaround, just like raising the open files limit for the HiveServer2 instance.
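
A sketch of what that experiment might look like in hive-site.xml. The values are illustrative assumptions, not recommendations, and whether closing idle sessions actually releases the operation log descriptors is untested here. The property names are the HiveServer2 idle timeout settings as far as I know; check them against the documentation for your Hive version.

```xml
<!-- Illustrative values only (milliseconds); tune for your workload. -->

<!-- Close sessions that have been idle for 1 hour. -->
<property>
  <name>hive.server2.idle.session.timeout</name>
  <value>3600000</value>
</property>

<!-- How often HiveServer2 checks for idle sessions and operations. -->
<property>
  <name>hive.server2.session.check.interval</name>
  <value>300000</value>
</property>

<!-- Close operations that have been idle for 30 minutes. -->
<property>
  <name>hive.server2.idle.operation.timeout</name>
  <value>1800000</value>
</property>
```

HiveServer2 needs a restart to pick up these changes.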


I tried to find an appropriate ticket in the Apache JIRA tracker, but there does not seem to be a similar one. I found the same problem described on the HDP community site, but no solution was provided. I think this is an HS2 bug.

 

Can you help me with this problem?

 

 

Posts: 352
Topics: 11
Kudos: 54
Solutions: 30
Registered: ‎09-02-2016

Re: Hive operation logs are not released by hive instance

@gbezrukikh

 

The default path for "HiveServer2 Log Directory" is /var/log/hive; I am not sure why you keep it under /tmp/username.

 

OK, for your problem, you can set the maximum log size and also limit the number of log file backups, as follows:

Ex: 

HiveServer2 Max Log Size to 200 MiB

HiveServer2 Maximum Log File Backups to 10

 

This will help to limit the log growth.

 

Note: These are the Cloudera names; please use the corresponding XML property names.

Champion
Posts: 463
Registered: ‎05-16-2016

Re: Hive operation logs are not released by hive instance


As suggested by @saranvisa, increasing the log file size should help.

If you are managing your cluster with Cloudera Manager, follow the path below and change the parameters accordingly:

 

Hive -> Configuration -> under Scope -> click HiveServer2
Change the properties below accordingly:

HiveServer2 Max Log Size
HiveServer2 Maximum Log File Backups

 

Posts: 614
Topics: 3
Kudos: 93
Solutions: 61
Registered: ‎08-16-2016

Re: Hive operation logs are not released by hive instance

@csguna @saranvisa I don't know whether those settings will affect the operation logs.

I did find this JIRA but it isn't for the operations logs. I only skimmed through it though.

https://issues.apache.org/jira/browse/HIVE-4500

It definitely sounds like you have an FD leak in HS2. You could just disable the operation logs to alleviate the issue while you dig into it further. For what it is worth, I am running CDH 5.8.2 in production and don't see this issue with HS2.
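
For a package-based install without Cloudera Manager, disabling the operation logs would mean flipping the property already shown in the original post's hive-site.xml. A minimal sketch; note that, as far as I understand, clients that fetch operation logs from HS2 (for example to show query progress) would no longer receive them.

```xml
<!-- Turn off per-operation log files in HiveServer2 (was "true" in the original post). -->
<property>
  <name>hive.server2.logging.operation.enabled</name>
  <value>false</value>
</property>
```

HiveServer2 needs a restart for the change to take effect.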

Champion
Posts: 463
Registered: ‎05-16-2016

Re: Hive operation logs are not released by hive instance

@mbigelow I saw the JIRA. It looks like the issue was fixed in 0.11.0, but @gbezrukikh is running CDH 5.7.0, which ships Hive 1.1.0, so I am confused.

I was also able to find another bug related to this that occurs when a UDF is used.

Not sure whether @gbezrukikh has any UDFs running.

 

jira  - https://issues.apache.org/jira/browse/HIVE-10970

Posts: 614
Topics: 3
Kudos: 93
Solutions: 61
Registered: ‎08-16-2016

Re: Hive operation logs are not released by hive instance

@csguna that JIRA mentions a different log and has been fixed.  So it is likely not the issue reported here.  I saw the JIRA for the FD leak for UDFs.  With that one, the FDs would be to the UDF jar file, which is not what is seen here.

Champion
Posts: 463
Registered: ‎05-16-2016

Re: Hive operation logs are not released by hive instance

@mbigelow I am lost, but I agree that this is a different issue.

New Contributor
Posts: 2
Registered: ‎07-10-2017

Re: Hive operation logs are not released by hive instance

Yes, I saw the JIRA tickets you mentioned. As you said, one of them was closed in 2013 and the second is about UDFs. Neither is our case, so I decided to create this thread.

BTW, does anyone know the proper way to escalate such bugs to the Apache JIRA tracker when Cloudera packages are installed? My Hive version is 1.1, but it was built by Cloudera with many patches applied, so it has a lot of extra features and bug fixes. It is not correct to call it Apache Hive 1.1, because the two differ in many places. Which version should I mention in the Apache Hive JIRA tracker if I want to file a bug about Hive in CDH 5.7.0?
Posts: 614
Topics: 3
Kudos: 93
Solutions: 61
Registered: ‎08-16-2016

Re: Hive operation logs are not released by hive instance

I would say Cloudera Support, if you have that for your cluster. They can vet it against existing bugs and the patches backported to your version. They can also tell you whether the bug is known and, if a fix exists, when and in which version it will be available. And failing all of that, they can open a new JIRA.

You can also open a JIRA account and create a ticket yourself, providing the CDH version, and ask the community how to proceed. They should have some guidelines as well, although I do not know them or have them handy.