Hive table with DruidStorageHandler not getting incremented

Hi,

I tried to create a table in Hive with DruidStorageHandler using the following command:

CREATE TABLE druid_table (`__time` timestamp, `userid` string, `num_l` float)
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.segment.granularity" = "DAY", "druid.query.granularity" = "DAY")
AS SELECT * FROM poc.test;

It failed with the below error:

        at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:489)
        at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:397)
        ... 18 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException: Data source name is null
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:564)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:664)
        at org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:101)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
        at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:145)
        at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:478)
        ... 19 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException: Data source name is null
        at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:272)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:609)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:553)
        ... 24 more
Caused by: java.lang.NullPointerException: Data source name is null
        at org.apache.hive.druid.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:229)
        at org.apache.hadoop.hive.druid.io.DruidOutputFormat.getHiveRecordWriter(DruidOutputFormat.java:187)
        at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:284)
        at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:269)
        ... 26 more
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1525394716493_0623_23_02 [Reducer 3] killed/failed due to:OWN_TASK_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0 (state=08S01,code=2)

I added the druid.datasource property to TBLPROPERTIES and ran it again:

CREATE TABLE druid_table (`__time` timestamp, `userid` string, `num_l` float)
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.segment.granularity" = "DAY", "druid.query.granularity" = "DAY", "druid.datasource" = "dummy")
AS SELECT * FROM poc.test;

This was successful. Now I want to append new data to this table:

insert into table druid_table select * from poc.test2;

This fails with the following error:

Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:INSERT INTO statement is not allowed by druid storage handler) (state=08S01,code=1)

The documentation says that INSERT INTO statements are supported. Am I missing something here?

Please let me know in case any additional details are required.

Many Thanks.

Hi @Megh Vidani

I'm not sure what your poc.test2 table looks like, but I created a Druid-backed Hive table from the wikiticker data and inserted into it, and it worked fine.

Note that the Hive-Druid integration requires Hive Interactive and HDP 2.6+.

I did my testing on HDP 2.6.4.

Below are the queries I executed:

CREATE EXTERNAL TABLE druid_table_1
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.datasource" = "wikiticker");

CREATE TABLE druid_table_2
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
AS SELECT `__time`, channel, countryname, regionname FROM druid_table_1;

INSERT INTO TABLE druid_table_2
SELECT `__time`, channel, countryname, regionname FROM druid_table_1
WHERE channel = '#ca.wikipedia';
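
If INSERT INTO is still rejected on your cluster, note that the message in your error appears to come from the storage handler itself, which may mean that your build only accepts overwriting writes. I have not verified this on older HDP releases, so treat the following only as a sketch under that assumption: rebuild the whole datasource with INSERT OVERWRITE from both Hive source tables. It also assumes that poc.test and poc.test2 have the same columns in the same order as druid_table, and it replaces the existing data rather than appending to it.

```sql
-- Assumption: this build accepts INSERT OVERWRITE on Druid-backed tables
-- even though it rejects INSERT INTO (not verified).
-- Assumption: poc.test and poc.test2 have identical column layouts.
INSERT OVERWRITE TABLE druid_table
SELECT *
FROM (
  SELECT * FROM poc.test
  UNION ALL
  SELECT * FROM poc.test2
) combined;
```

Because this rewrites every segment, it is only practical while the source tables are small enough to reload in full.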

Hi @Rahul Pathak, I have deployed HDP 2.6.1 and am running the query through HiveServer2 Interactive. The table poc.test2 is in Parquet format.

@Rahul Pathak could you please let me know whether Hive can also delete or drop a particular segment from a table that uses DruidStorageHandler? Since we don't have the latest HDP stack (2.6.4) with the latest Druid version, we are unable to test this ourselves. I would really appreciate your help with this.