Member since: 04-28-2018
Posts: 14
Kudos Received: 0
Solutions: 0
05-10-2018 03:04 PM
Hi Erkan, can you please provide more details on how you resolved the issue? I am facing the same issue while installing HDP 2.6.4. Thanks in advance.
05-07-2018 02:13 PM
When we create a table as follows:
CREATE TABLE druid_table (`__time` timestamp, `userid` string, `num_l` float)
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.segment.granularity" = "DAY");
we get the following error:
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: java.io.FileNotFoundException: File /tmp/workingDirectory/.staging-hive_20180507130925_227f2e48-d049-464e-b2cd-43009b3398b3/segmentsDescriptorDir does not exist. (state=08S01,code=1)
Can you please help?
05-07-2018 07:50 AM
Hi, I am creating a table using the following DDL:
CREATE TABLE poc_db.poc_druid_v2_07may
(`__time` timestamp, col1 string, col2 string, metric1 double, trans_count int)
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.segment.granularity" = "DAY", "druid.query.granularity" = "DAY");
My objective is to create a Hive table with the Druid storage handler and then insert data into it. When I execute the above DDL, I get the following error:
0: jdbc:hive2://HYDHADDAT01:10500> CREATE TABLE poc_db.poc_druid_v2_07may (`__time` timestamp, col1 string, col2 string, metric1 double, trans_count int) STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler' TBLPROPERTIES ( "druid.segment.granularity" = "DAY", "druid.query.granularity" = "DAY");
INFO : Compiling command(queryId=hive_20180507130925_227f2e48-d049-464e-b2cd-43009b3398b3): CREATE TABLE poc_db.poc_druid_v2_07may (`__time` timestamp, col1 string, col2 string, metric1 double, trans_count int) STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler' TBLPROPERTIES ( "druid.segment.granularity" = "DAY", "druid.query.granularity" = "DAY")
INFO : We are setting the hadoop caller context from HIVE_SSN_ID:e2b2c4e9-d858-4d84-8556-445a01e70657 to hive_20180507130925_227f2e48-d049-464e-b2cd-43009b3398b3
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=hive_20180507130925_227f2e48-d049-464e-b2cd-43009b3398b3); Time taken: 0.004 seconds
INFO : We are resetting the hadoop caller context to HIVE_SSN_ID:e2b2c4e9-d858-4d84-8556-445a01e70657
INFO : Setting caller context to query id hive_20180507130925_227f2e48-d049-464e-b2cd-43009b3398b3
INFO : Executing command(queryId=hive_20180507130925_227f2e48-d049-464e-b2cd-43009b3398b3): CREATE TABLE poc_db.poc_druid_v2_07may (`__time` timestamp, col1 string, col2 string, metric1 double, trans_count int) STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler' TBLPROPERTIES ( "druid.segment.granularity" = "DAY", "druid.query.granularity" = "DAY")
INFO : Starting task [Stage-0:DDL] in serial mode
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: java.io.FileNotFoundException: File /tmp/workingDirectory/.staging-hive_20180507130925_227f2e48-d049-464e-b2cd-43009b3398b3/segmentsDescriptorDir does not exist.
INFO : Resetting the caller context to HIVE_SSN_ID:e2b2c4e9-d858-4d84-8556-445a01e70657
INFO : Completed executing command(queryId=hive_20180507130925_227f2e48-d049-464e-b2cd-43009b3398b3); Time taken: 0.385 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: java.io.FileNotFoundException: File /tmp/workingDirectory/.staging-hive_20180507130925_227f2e48-d049-464e-b2cd-43009b3398b3/segmentsDescriptorDir does not exist. (state=08S01,code=1)
I think the file Hive is trying to create under /tmp/workingDirectory is needed when subsequent SQL statements execute. Is there a way for me to configure the location of this directory instead of /tmp? How can I fix the above error? I have enough space in /tmp. Please help.
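One thing I plan to try, assuming the staging path comes from Hive's hive.druid.working.directory property, whose default (/tmp/workingDirectory) matches the path in the error; I have not confirmed the property name on my HDP 2.6 build, and the target directory below is only an example:
SET hive.druid.working.directory=/user/hive/druid-staging;  -- a location the hive user can create and write on HDFS
-- then re-run the CREATE TABLE ... STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler' statement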
Labels: Apache Hive
05-04-2018 02:00 PM
I tried the following, and it works:
df3.write.mode("append").insertInto("my_druid_table")
05-03-2018 01:02 PM
Hi Slim, thanks for your response. For my benefit, I am posting my understanding; can you please confirm? At the end, I am posting a follow-up question.
Case 1:
In the Hive table (only the timestamp differs for each customer):
trans_timestamp,customer,product,store,qty,amount
2017-01-01 14:01:01,c1,p1,s1,1,10
2017-01-01 14:10:01,c1,p1,s1,2,20
2017-01-01 14:20:01,c1,p1,s1,3,30
2017-01-01 14:01:01,c2,p1,s1,4,40
2017-01-01 14:10:01,c2,p1,s1,5,50
2017-01-01 14:20:01,c2,p1,s1,6,60
Config: segment.granularity=DAY and query.granularity=HOUR
In Druid segments: one segment per day, so only 1 segment, as we have only 1 day of data. The 6 rows are rolled up into 2 rows:
trans_timestamp,customer,product,store,qty,amount
2017-01-01 14:00:00,c1,p1,s1,6,60
2017-01-01 14:00:00,c2,p1,s1,15,150
Case 2:
In the Hive table:
trans_timestamp,customer,product,store,qty,amount
2017-01-01 00:00:00,c1,p1,s1,1,10
2017-02-01 00:00:00,c1,p2,s1,2,20
2017-03-01 00:00:00,c1,p3,s1,3,30
2017-01-01 00:00:00,c2,p1,s1,4,40
2017-02-01 00:00:00,c2,p2,s1,5,50
2017-03-01 00:00:00,c2,p3,s1,6,60
Config: segment.granularity=MONTH and query.granularity=DAY
In Druid segments: one segment per month, so 3 segments in total, as we have 3 months of data.
trans_timestamp,customer,product,store,qty,amount
segment1:
2017-01-01 00:00:00,c1,p1,s1,1,10
2017-01-01 00:00:00,c2,p1,s1,4,40
segment2:
2017-02-01 00:00:00,c1,p2,s1,2,20
2017-02-01 00:00:00,c2,p2,s1,5,50
segment3:
2017-03-01 00:00:00,c1,p3,s1,3,30
2017-03-01 00:00:00,c2,p3,s1,6,60
Question: my daily data volume is around 10 million rows, so my monthly volume is ~300 million. If I use segment granularity DAY, I will have 1000+ segments (assuming 3 years of data), and if I want to query by month (group by month, as in the sketch below), a lot of segments need to be processed. On the other hand, if I set segment granularity to MONTH, each segment would be huge. Does segment size have an impact on performance? Based on your experience, what is recommended, and what are people usually using? In my case, the granularity of the data is day (thankfully no intra-day timestamp involved). Please suggest. Thanks in advance.
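For concreteness, the month-level query I have in mind is shaped like this (a sketch; the table name is hypothetical, and I am assuming Hive's trunc() accepts a timestamp with the 'MM' format here):
SELECT trunc(`__time`, 'MM') AS trans_month,
       SUM(qty) AS total_qty,
       SUM(amount) AS total_amount
FROM my_druid_table
GROUP BY trunc(`__time`, 'MM');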
05-02-2018 12:32 PM
Hi, I have 6 historical nodes, each with 18 cores and 64 GB RAM. When I load around 7 million records (8 dimensions, 2 metrics), Druid takes around 30 minutes to do the indexing. I am using the following command via beeline:
CREATE TABLE test_druid
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.datasource" = "test_druid", "druid.segment.granularity" = "MONTH", "druid.query.granularity" = "DAY")
AS SELECT cast(trans_date AS timestamp) AS `__time`, col1, col2, col3 FROM testdb.test_hive_Table WHERE to_date(trans_Date) = '2018-01-01';
Is it expected to take so much time? What are the recommended configurations for better performance? Please suggest. Thanks in advance.
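These are the knobs I am planning to experiment with, assuming my Hive build exposes them (the property names are from Hive's Druid integration as I understand it, and the defaults in the comments are what I believe them to be, not confirmed on HDP):
SET hive.druid.indexer.partition.size.max=1000000;  -- max rows per indexing partition (default 5000000, I believe)
SET hive.druid.indexer.memory.rownum.max=100000;  -- rows buffered in memory during indexing before persisting (default 75000, I believe)
-- then re-run the CREATE TABLE ... AS SELECT above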
Labels: Apache Hive
05-02-2018 11:23 AM
Hi, I am creating a table as follows. The data up to, say, 10-Apr-2018 is loaded. How do I load the data from 11-Apr up to the latest day? If I do an INSERT INTO TABLE test_druid (sketched below), it fails. Do I need to drop the month segment (Apr-18) and reload the data for the entire month? If so, can you please give the steps for doing this from Hive? I am using beeline for all my operations.
CREATE TABLE test_druid
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.datasource" = "test_druid", "druid.segment.granularity" = "MONTH", "druid.query.granularity" = "DAY")
AS SELECT cast(trans_date AS timestamp) AS `__time`, col1, col2, col3 FROM testdb.test_hive_Table WHERE to_date(trans_Date) >= '2018-01-01';
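For reference, the failing append statement is shaped like this (a sketch; the date filter is illustrative):
INSERT INTO TABLE test_druid
SELECT cast(trans_date AS timestamp) AS `__time`, col1, col2, col3
FROM testdb.test_hive_Table
WHERE to_date(trans_Date) > '2018-04-10';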
Labels: Apache Hive
05-02-2018 11:19 AM
Hi,
I am creating a table in Hive (LLAP) from an existing table, as follows:
CREATE TABLE test_druid STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.datasource" = "test_druid", "druid.segment.granularity" = "MONTH", "druid.query.granularity" = "DAY")
AS SELECT cast(trans_date AS timestamp) AS `__time`, col1, col2, col3 FROM testdb.test_hive_Table WHERE to_date(trans_Date) >= '2018-01-01';
I am getting the following error. Nowhere did I mention "/druid/segments" as the Druid storage location, and I don't know where Hive is picking it up from. In Ambari, I have set druid.storage.storageDirectory=/user/druid/data. Not sure what is causing the issue.
Please help.
Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.hadoop.security.AccessControlException: Permission denied: user=hive, access=WRITE, inode="/druid/segments/ff7e385a0fbf4d84973be64d88f31c02":hdfs:hdfs:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:292)
at org.apache.hadoop.hdfs.server.namenode.FS
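What I plan to try, on the assumption that the path comes from Hive's side of the integration: I believe Hive has its own hive.druid.storage.storageDirectory property defaulting to /druid/segments, separate from Druid's druid.storage.storageDirectory set in Ambari (not confirmed on my build). From beeline:
SET hive.druid.storage.storageDirectory=/user/druid/data;  -- align Hive's Druid deep-storage path with the Ambari setting
-- or, as a cruder hypothetical workaround, open up the default path (run as the hdfs superuser):
-- hdfs dfs -chmod -R 777 /druid/segments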
Labels: Apache Hive