
HDP 2.4 - Hive compaction not happening on Hive streaming ingest

Explorer

I have written a utility to periodically load data into Hive tables using the Hive streaming API as described in [1]. Compaction is not happening in Hive even after a large number of delta directories have been created. The configuration parameters are set as described in [2]. See the listing below; there is no directory with a name starting with base_. Is this normal?

-rw-r--r--  3 hive hdfs  4 2016-08-25 10:14 /apps/hive/warehouse/perf_log/month=2016-04/_orc_acid_version
drwxrwxrwx  - hive hdfs  0 2016-08-25 10:15 /apps/hive/warehouse/perf_log/month=2016-04/delta_2380427_2380429
-rw-r--r--  3 hive hdfs  1684381 2016-08-25 10:15 /apps/hive/warehouse/perf_log/month=2016-04/delta_2380427_2380429/bucket_00002
drwxrwxrwx  - hive hdfs  0 2016-08-25 10:15 /apps/hive/warehouse/perf_log/month=2016-04/delta_2380430_2380432
-rw-r--r--  3 hive hdfs  1088592 2016-08-25 10:15 /apps/hive/warehouse/perf_log/month=2016-04/delta_2380430_2380432/bucket_00000
drwxrwxrwx  - hive hdfs  0 2016-08-25 10:15 /apps/hive/warehouse/perf_log/month=2016-04/delta_2380433_2380435
-rw-r--r--  3 hive hdfs  1681985 2016-08-25 10:15 /apps/hive/warehouse/perf_log/month=2016-04/delta_2380433_2380435/bucket_00004

[1] - https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest

[2] - https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Compactor
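For reference, per [2] the compactor only runs if it is enabled on the metastore and the table is transactional. A minimal hive-site.xml fragment with the relevant settings (values here are illustrative, not my exact configuration) looks like:

```xml
<!-- Enable the compaction initiator thread on the metastore -->
<property>
  <name>hive.compactor.initiator.on</name>
  <value>true</value>
</property>
<!-- At least one worker thread is required for compactions to execute -->
<property>
  <name>hive.compactor.worker.threads</name>
  <value>1</value>
</property>
<!-- Required for ACID / streaming tables -->
<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
</property>
<property>
  <name>hive.txn.manager</name>
  <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
```

The target table must also be bucketed, stored as ORC, and created with 'transactional'='true' for the compactor to pick it up.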

3 REPLIES

Re: HDP 2.4 - Hive compaction not happening on Hive streaming ingest

Super Guru
@Vinuraj M

Can you please share your settings and the exact command/code you are submitting?

Re: HDP 2.4 - Hive compaction not happening on Hive streaming ingest

Explorer

I have shared the code and configuration below. Please reply.

Re: HDP 2.4 - Hive compaction not happening on Hive streaming ingest

Explorer
@mqureshi,

Hive configuration is attached. The code is as below:

String endPointKey = getEndPointKey(config.getUrl(), config.getDatabase(), tableName, partitionKeys);
HiveEndPoint hep = getHiveEndPoint(tableName, endPointKey, partitionKeys);
StreamingConnection connection = getStreamingConnection(hep, endPointKey);
DelimitedInputWriter inWriter = new DelimitedInputWriter(
        config.getColumnsAsArray(tableName), config.getFieldDelimiter(), hep);
TransactionBatch txnBatch = connection.fetchTransactionBatch(config.getTrxnNumBatches(), inWriter);
txnBatch.beginNextTransaction();
for (String record : records) {
  if (txnBatch.remainingTransactions() == 0) {
    // Commit and close the exhausted batch before fetching a new one,
    // otherwise the writes buffered in the open transaction are lost.
    txnBatch.commit();
    txnBatch.close();
    txnBatch = connection.fetchTransactionBatch(config.getTrxnNumBatches(), inWriter);
    txnBatch.beginNextTransaction();
  }
  txnBatch.write(record.getBytes());
}
txnBatch.commit();
txnBatch.close();
connection.close();
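As a side note, once the compactor is enabled its activity can be checked from the Hive CLI. A sketch of the relevant statements (the partition spec here matches the listing above; adjust to your table):

```sql
-- List queued, working, and finished compactions with their state
SHOW COMPACTIONS;

-- Manually request a major compaction on the affected partition
ALTER TABLE perf_log PARTITION (month = '2016-04') COMPACT 'major';
```

If SHOW COMPACTIONS never lists the partition, that usually points to the initiator or worker threads not being enabled on the metastore rather than a problem in the ingest code.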