Member since
09-25-2015
112
Posts
88
Kudos Received
12
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
9370 | 03-15-2017 03:17 PM | |
5840 | 02-17-2017 01:03 PM | |
1692 | 01-02-2017 10:47 PM | |
2508 | 11-16-2016 07:03 PM | |
1019 | 10-07-2016 05:24 PM |
01-17-2017
06:49 PM
Hi @Naren Reddy. @Eugene Koifman is correct - ACID tables are not designed for frequent OLTP-like transactions... It is much more optimal for tables that get their deletes/updates applied every 15 minutes or longer.
... View more
01-16-2017
03:26 PM
Hi @Rohit Sharma. Did you use the 'EXTERNAL' keyword when you created the table? If you don't specify 'EXTERNAL' then it is an internal table and the data will be deleted, regardless of what location you specify for the data...
... View more
01-04-2017
07:23 PM
Hi @Bala Vignesh N V. Lester Martin has an excellent Pig script to do this type of work. It is not an external table solution but a good way to do this type of work... https://martin.atlassian.net/wiki/pages/viewpage.action?pageId=21299205
... View more
01-03-2017
12:03 PM
1 Kudo
Hi @Kaliyug Antagonist. This is a bit of a philosophical question. The issue above occurs because the IN statement appears to do an implicit conversion of 0552094 to a numeric datatype and the IN statement does not find the row for 0552094. This implicit conversion is not what you want the IN statement to do. By explicitly quoting the numeric value, we do not allow the IN statement to do an Implicit conversion. IMHO I would recommend that you *never* allow implicit conversions to take place - whether in the RDBMS world (SQL Server, Oracle) or in Hive. By always quoting string/date values (OR using CAST function to ensure the correct datatype) you will get the correct/optimal results and you will never be affected by an implicit conversion. In the RDBMS world there are good of discussions about avoiding implicit conversions. RDBMSs do a much more thorough job of handling conversions, but even they are far from perfect when doing implicit conversions. An example of this - please see the Green/Yellow/Red chart of allowable conversions in this blog post by Jonathan Keyahias... https://www.sqlskills.com/blogs/jonathan/implicit-conversions-that-cause-index-scans/
... View more
01-02-2017
10:47 PM
5 Kudos
Hi @Kaliyug Antagonist. I couldn't help but notice that the value in question had a leading zero. Check the datatype used for the column called equipmentNumber and adjust the IN clause accordingly. It looks like equipmentNumber is defined as a String and thus you will have to quote the values inside the "IN" clause. Here are some examples & results: 1. Using INT as the datatype for equipmentNumber: create table if not exists fact_rtc_se
(equipmentnumber int, dimensionid string, datemessage date) ;
insert into fact_rtc_se values (0552094, 36081, '2016-02-29') ;
insert into fact_rtc_se values (0552094, 18943, '2016-02-29') ;
insert into fact_rtc_se values (1857547, 27956, '2016-01-08') ;
insert into fact_rtc_se values (1857547, 749597, '2016-01-15') ;
select equipmentnumber, dimensionid, datemessage from fact_rtc_se
where equipmentnumber in (0552094,1857547) and datemessage < '2016-03-01';
552094 36081 2016-02-29
552094 18943 2016-02-29
1857547 27956 2016-01-08
1857547 749597 2016-01-15
select equipmentnumber, dimensionid, datemessage from fact_rtc_se
where equipmentnumber in ('0552094','1857547') and datemessage < '2016-03-01';
1857547 27956 2016-01-08
1857547 749597 2016-01-15
2. Using STRING as the datatype for equipmentNumber: create table if not exists fact_rtc_se
(equipmentnumber string, dimensionid string, datemessage date) ;
insert into fact_rtc_se values ('0552094', 36081, '2016-02-29') ;
insert into fact_rtc_se values ('0552094', 18943, '2016-02-29') ;
insert into fact_rtc_se values ('1857547', 27956, '2016-01-08') ;
insert into fact_rtc_se values ('1857547', 749597, '2016-01-15') ;
select equipmentnumber, dimensionid, datemessage from fact_rtc_se
where equipmentnumber in (0552094,1857547) and datemessage < '2016-03-01';
1857547 27956 2016-01-08
1857547 749597 2016-01-15
select equipmentnumber, dimensionid, datemessage from fact_rtc_se
where equipmentnumber in ('0552094','1857547') and datemessage < '2016-03-01';
0552094 36081 2016-02-29
0552094 18943 2016-02-29
1857547 27956 2016-01-08
1857547 749597 2016-01-15
I hope this helps.
... View more
12-19-2016
12:22 AM
Thanks for the clarification @Vijayandra Mahadik. You are correct... It was implemented for Hive 0.14. My comments were out of date. I will correct my original statement above. https://issues.apache.org/jira/browse/HIVE-5760
... View more
11-18-2016
12:11 PM
Yikes! Have never seen that happen before, but I certainly have no reason to doubt you. Does it happen with hive.execution.engine=tez? If you could grab & sanitize your query/config details & post that as a Hive bug in Jira it would be greatly appreciated... we don't want that problem to bite anyone else.
... View more
11-16-2016
10:42 PM
Edited the post to fix syntax error. Now it runs just fine. 😉
... View more
11-16-2016
07:03 PM
Hi @Zack Riesland. Here's how I normally do this. It's not specifically a subquery but accomplishes what you're looking for. insert into table daily_counts
select count(*), 'table_a' from table_a
UNION
select count(*), 'table_b' from table_b
UNION
select count(*), 'table_c' from table_c
...
;
... View more
11-08-2016
09:56 PM
Hi @vamsi valiveti. The example above uses the exact same source file in the exact same location for both external tables. Both test_csv_serde_using_CSV_Serde_reader and test_csv_serde tables read an external file(s) stored in the directory called '/user/<uname>/elt/test_csvserde/'. The file I used was pipe delimited and contains 62,000,000 rows - so I didn't attach it . 😉 It would look like Option 2 above, but of course with 4 columns: 121|Hello World|4567|34345
232|Text|5678|78678
343|More Text|6789|342134
... View more