
Hive table with DruidStorageHandler not getting incremented

Hi,

I tried to create a table in Hive with DruidStorageHandler using the following command:

CREATE TABLE druid_table (`__time` timestamp, `userid` string, `num_l` float)
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.segment.granularity" = "DAY", "druid.query.granularity" = "DAY")
as select * from poc.test;

It failed with the below error:

        at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:489)
        at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:397)
        ... 18 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException: Data source name is null
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:564)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:664)
        at org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:101)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
        at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:145)
        at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:478)
        ... 19 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException: Data source name is null
        at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:272)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:609)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:553)
        ... 24 more
Caused by: java.lang.NullPointerException: Data source name is null
        at org.apache.hive.druid.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:229)
        at org.apache.hadoop.hive.druid.io.DruidOutputFormat.getHiveRecordWriter(DruidOutputFormat.java:187)
        at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:284)
        at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:269)
        ... 26 more
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1525394716493_0623_23_02 [Reducer 3] killed/failed due to:OWN_TASK_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0 (state=08S01,code=2)

I added the druid.datasource property to TBLPROPERTIES and ran it again:

CREATE TABLE druid_table (`__time` timestamp, `userid` string, `num_l` float)
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.segment.granularity" = "DAY", "druid.query.granularity" = "DAY", "druid.datasource" = "dummy")
as select * from poc.test;

This succeeded. Now I want to append new data to this table:

insert into table druid_table select * from poc.test2;

This fails with the following error:

Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:INSERT INTO statement is not allowed by druid storage handler) (state=08S01,code=1)

The documentation says that INSERT INTO statements are supported. Am I missing something here?

Please let me know if any additional details are required.

Many thanks.
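Whether INSERT INTO works on a Druid-backed table appears to depend on the Hive/HDP release in use (later in this thread the working test is on HDP 2.6.4, while the failing setup is HDP 2.6.1). A minimal sketch of queries for collecting the relevant details, assuming a Beeline session against HiveServer2 Interactive: `version()` is a built-in Hive UDF available from Hive 2.1 onward, so it may be missing on older releases.

```sql
-- Report the Hive version the session is actually running against
SELECT version();

-- Show the table's properties (druid.datasource, granularities, storage handler)
SHOW TBLPROPERTIES druid_table;

-- Full table metadata, including the storage handler class and table type
DESCRIBE FORMATTED druid_table;
```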

Re: Hive table with DruidStorageHandler not getting incremented

Hi @Megh Vidani

I'm not sure what your poc.test2 table looks like, but I created and inserted into a Hive Druid-backed table using the wikiticker data, and it worked fine.

Note that the Hive-Druid integration requires Hive Interactive and HDP 2.6+.

I did my testing on HDP 2.6.4.

Below are the queries I executed:

CREATE EXTERNAL TABLE druid_table_1
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.datasource" = "wikiticker");
CREATE TABLE druid_table_2
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
as select `__time`, channel, countryname, regionname from druid_table_1;
insert into table druid_table_2
select `__time`, channel, countryname, regionname from druid_table_1
where channel='#ca.wikipedia';
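To confirm that the INSERT INTO actually appended data rather than silently doing nothing, one simple check (not part of the original reply) is to run the same count before and after the insert and compare the results; the count is expected to be higher afterwards if new rows were written.

```sql
-- Run once before and once after the INSERT INTO above, then compare the counts
SELECT COUNT(*) FROM druid_table_2 WHERE channel = '#ca.wikipedia';
```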

Re: Hive table with DruidStorageHandler not getting incremented

Hi @Rahul Pathak, I have deployed HDP 2.6.1 and am running the query using HiveServer2 Interactive. The table poc.test2 is in Parquet format.

Re: Hive table with DruidStorageHandler not getting incremented

@Rahul Pathak, can you please let me know whether Hive also provides the capability to delete/drop a particular segment from a table that uses DruidStorageHandler? Since we don't have the latest HDP stack (2.6.4) with the latest Druid version, we're unable to test this ourselves. I would really appreciate your help here.