Member since: 04-28-2018 · Posts: 14 · Kudos Received: 0 · Solutions: 0
11-06-2018
07:34 PM
This is a known bug that is triggered when you try to create an empty Druid-backed table. A quick fix is to create the table and insert data at the same time using a CREATE TABLE AS SELECT (CTAS) statement, or upgrade to a version that includes the actual bug fix: https://issues.apache.org/jira/browse/HIVE-16677
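A minimal CTAS sketch along these lines, assuming the Druid storage handler is configured; the table and column names (druid_sales, src_table, etc.) are made up for illustration:

```sql
-- Hypothetical example: create the Druid-backed table and load it in one step,
-- avoiding the empty-table code path that triggers the bug.
CREATE TABLE druid_sales
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.segment.granularity" = "DAY",
               "druid.query.granularity" = "HOUR")
AS
SELECT CAST(trans_timestamp AS timestamp) AS `__time`,
       customer, product, store, qty, amount
FROM src_table;
```

Note that Druid-backed tables need a `__time` timestamp column, so the source timestamp is aliased to that name in the SELECT.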
05-03-2018
01:02 PM
Hi Slim, thanks for your response. For my benefit, I am posting my understanding; can you please confirm it? At the end, I have a follow-up question.

Case 1:
In the Hive table (only the timestamp differs between the customers):

trans_timestamp,customer,product,store,qty,amount
2017-01-01 14:01:01,c1,p1,s1,1,10
2017-01-01 14:10:01,c1,p1,s1,2,20
2017-01-01 14:20:01,c1,p1,s1,3,30
2017-01-01 14:01:01,c2,p1,s1,4,40
2017-01-01 14:10:01,c2,p1,s1,5,50
2017-01-01 14:20:01,c2,p1,s1,6,60

Config: segment.granularity=DAY and query.granularity=HOUR

In Druid segments: one segment per day, so only 1 segment, since we have only 1 day of data. The 6 rows are rolled up into 2 rows:

trans_timestamp,customer,product,store,qty,amount
2017-01-01 14:00:00,c1,p1,s1,6,60
2017-01-01 14:00:00,c2,p1,s1,15,150

Case 2:
In the Hive table:

trans_timestamp,customer,product,store,qty,amount
2017-01-01 00:00:00,c1,p1,s1,1,10
2017-02-01 00:00:00,c1,p2,s1,2,20
2017-03-01 00:00:00,c1,p3,s1,3,30
2017-01-01 00:00:00,c2,p1,s1,4,40
2017-02-01 00:00:00,c2,p2,s1,5,50
2017-03-01 00:00:00,c2,p3,s1,6,60

Config: segment.granularity=MONTH and query.granularity=DAY

In Druid segments: one segment per month, so 3 segments in total, since we have 3 months of data.

trans_timestamp,customer,product,store,qty,amount
segment1:
2017-01-01 00:00:00,c1,p1,s1,1,10
2017-01-01 00:00:00,c2,p1,s1,4,40
segment2:
2017-02-01 00:00:00,c1,p2,s1,2,20
2017-02-01 00:00:00,c2,p2,s1,5,50
segment3:
2017-03-01 00:00:00,c1,p3,s1,3,30
2017-03-01 00:00:00,c2,p3,s1,6,60

Question: My daily data volume is around 10 million rows, so my monthly volume is ~300 million. If I use a segment granularity of DAY, I will have 1000+ segments (assuming 3 years of data), and a query that groups by month would have to process a lot of segments. On the other hand, if I set segment granularity to MONTH, each segment would be huge. Does segment size have an impact on performance? Based on your experience, what is recommended, and what are people usually using? In my case the granularity of the data is a day (thankfully no timestamps involved). Please suggest. Thanks in advance.
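As an illustrative aside (plain Python, not Druid code), the Case 1 roll-up to HOUR query granularity can be sketched like this: rows whose timestamp falls in the same hour and whose dimension values (customer, product, store) match are collapsed into one row, summing the metrics.

```python
from collections import defaultdict
from datetime import datetime

# The six Case 1 input rows: (timestamp, customer, product, store, qty, amount).
rows = [
    ("2017-01-01 14:01:01", "c1", "p1", "s1", 1, 10),
    ("2017-01-01 14:10:01", "c1", "p1", "s1", 2, 20),
    ("2017-01-01 14:20:01", "c1", "p1", "s1", 3, 30),
    ("2017-01-01 14:01:01", "c2", "p1", "s1", 4, 40),
    ("2017-01-01 14:10:01", "c2", "p1", "s1", 5, 50),
    ("2017-01-01 14:20:01", "c2", "p1", "s1", 6, 60),
]

# Truncate each timestamp to the hour, then sum metrics per (hour, dimensions) key.
rolled = defaultdict(lambda: [0, 0])
for ts, customer, product, store, qty, amount in rows:
    hour = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(minute=0, second=0)
    key = (hour.strftime("%Y-%m-%d %H:%M:%S"), customer, product, store)
    rolled[key][0] += qty
    rolled[key][1] += amount

for key, (qty, amount) in sorted(rolled.items()):
    print(*key, qty, amount)
# → 2017-01-01 14:00:00 c1 p1 s1 6 60
# → 2017-01-01 14:00:00 c2 p1 s1 15 150
```

The two printed rows match the rolled-up segment contents described in Case 1.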
07-05-2018
03:42 PM
Hello, when I try to create a table from beeline using the Druid storage handler, I get the error below. Could you please guide me on how to proceed?

Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.skife.jdbi.v2.exceptions.UnableToObtainConnectionException: java.sql.SQLException: Cannot create PoolableConnectionFactory (Access denied for user)
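The "Access denied for user" in that stack trace comes from the JDBC connection Hive opens to the Druid metadata store (commonly MySQL), which suggests the metadata-store credentials are missing or wrong on the Hive side. A hedged sketch of the relevant hive-site.xml entries, with placeholder host, database, and credential values:

```xml
<!-- hive-site.xml: connection to the Druid metadata store (values are placeholders) -->
<property>
  <name>hive.druid.metadata.uri</name>
  <value>jdbc:mysql://metadata-host:3306/druid</value>
</property>
<property>
  <name>hive.druid.metadata.username</name>
  <value>druid</value>
</property>
<property>
  <name>hive.druid.metadata.password</name>
  <value>druid_password</value>
</property>
```

The username and password here must match a database account that the Druid metadata database actually accepts connections from, for the host the HiveServer2 node connects from.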
05-02-2018
09:17 AM
Hi Slim, thanks for your response. That worked. I have a few more questions; I will open a new thread.
05-29-2018
01:26 PM
Hello, please, could you @Johann Voppichler describe the effort needed to create a Hive table with the Druid storage handler? We tried many options to integrate Hive 1.1.0 with Druid 0.12 via hive-druid-handler-2.3.0.jar (or 3.0.0), hive-metastore-2.3.0.jar (or 3.0.0), and hive-exec-2.3.0.jar (or 3.0.0). The only output when creating a table via beeline is a missing-class error. All the classes named in the errors are present in the supplied jars, so there is probably a conflict between them. Is it possible to integrate Hive and Druid with our versions? What combinations of versions are supported? Thank you!

Errors:
java.lang.NoSuchMethodError: org.apache.hadoop.hive.ql.session.SessionState$LogHelper.<init>(Lorg/slf4j/Logger;)V (state=,code=0)
java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hive.druid.DruidStorageHandler
java.lang.NoClassDefFoundError: org/apache/hadoop/hive/ql/metadata/StorageHandlerInfo
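NoSuchMethodError/NoClassDefFoundError at class-initialization time usually means the handler jar was compiled against internal Hive APIs from a different Hive version than the one running, rather than a jar being absent. A hedged diagnostic sketch for checking which jar on the classpath (if any) actually ships a class named in the error; the lib path is illustrative and should be pointed at your real Hive lib directory:

```shell
# Which jar contains the class from the NoClassDefFoundError? (path is illustrative)
CLASS='org/apache/hadoop/hive/ql/metadata/StorageHandlerInfo.class'
for j in /usr/lib/hive/lib/*.jar; do
  if unzip -l "$j" 2>/dev/null | grep -q "$CLASS"; then
    echo "found in $j"
  fi
done
echo "scan complete"
```

If the class is found only in a jar from a newer Hive line than the runtime (or not found at all), that points to a version mismatch rather than a packaging problem.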
05-22-2018
01:59 PM
I can understand Lukas' issue with the "*-2"-named .repo files. My install is erroring out and giving me no clues, no breadcrumbs to follow. All my /var/lib/ambari-agent/data/errors* log files are either 0 bytes or 86 bytes long, the latter containing only: "Server considered task failed and automatically aborted it." This is on CentOS 7.4 with Ambari 2.6.1.5. When I installed with an ambari-hdp.repo, Ambari complained and duplicated it as ambari-hdp-1.repo. Justin