About bpreachuk

bpreachuk · ‎01-17-2017

Hi @Naren Reddy. @Eugene Koifman is correct - ACID tables are not designed for frequent OLTP-like transactions... They are much better suited to tables that get their deletes/updates applied every 15 minutes or longer.

bpreachuk · ‎01-16-2017

Hi @Rohit Sharma. Did you use the 'EXTERNAL' keyword when you created the table? If you don't specify 'EXTERNAL' then it is an internal (managed) table and the data will be deleted when the table is dropped, regardless of what location you specify for the data...
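A minimal sketch of the difference, for illustration only. The table names, columns, and HDFS paths below are hypothetical and are not taken from the original thread:

```sql
-- Hypothetical table, columns, and HDFS paths, for illustration only.

-- EXTERNAL: Hive tracks only the metadata.
-- DROP TABLE leaves the files under LOCATION in place.
CREATE EXTERNAL TABLE IF NOT EXISTS sales_ext (
  id     INT,
  amount DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/sales_ext';

-- No EXTERNAL keyword: the table is managed (internal),
-- even though an explicit LOCATION is given.
-- DROP TABLE removes the data along with the metadata.
CREATE TABLE IF NOT EXISTS sales_managed (
  id     INT,
  amount DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/sales_managed';

-- The "Table Type" field in the output shows
-- EXTERNAL_TABLE or MANAGED_TABLE.
DESCRIBE FORMATTED sales_ext;
```

Running DESCRIBE FORMATTED on an existing table is a quick way to check which kind it is before dropping it.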

bpreachuk · ‎01-04-2017

Hi @Bala Vignesh N V. Lester Martin has an excellent Pig script to do this type of work. It is not an external table solution but a good way to do this type of work... https://martin.atlassian.net/wiki/pages/viewpage.action?pageId=21299205

bpreachuk · ‎01-03-2017

Hi @Kaliyug Antagonist. This is a bit of a philosophical question. The issue above occurs because the IN clause appears to do an implicit conversion of 0552094 to a numeric datatype, so it does not find the row for 0552094. This implicit conversion is not what you want the IN clause to do. By explicitly quoting the numeric value, we do not allow the IN clause to do an implicit conversion. IMHO I would recommend that you *never* allow implicit conversions to take place - whether in the RDBMS world (SQL Server, Oracle) or in Hive. By always quoting string/date values (OR using the CAST function to ensure the correct datatype) you will get the correct/optimal results and you will never be affected by an implicit conversion. In the RDBMS world there are plenty of discussions about avoiding implicit conversions. RDBMSs do a much more thorough job of handling conversions than Hive, but even they are far from perfect when doing implicit conversions. As an example, please see the Green/Yellow/Red chart of allowable conversions in this blog post by Jonathan Kehayias... https://www.sqlskills.com/blogs/jonathan/implicit-conversions-that-cause-index-scans/
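A short sketch of the two explicit options mentioned here, assuming the fact_rtc_se table from this thread with equipmentnumber declared as STRING. The queries are illustrative rather than a tested fix:

```sql
-- Assumes equipmentnumber is a STRING column, as in the thread's example.

-- Option 1: quote the literals so both sides of the comparison are strings
-- and the leading zero is preserved.
select equipmentnumber, dimensionid, datemessage
from fact_rtc_se
where equipmentnumber in ('0552094', '1857547')
  and datemessage < cast('2016-03-01' as date);

-- Option 2: make the conversion explicit with CAST.
-- Casting the column to INT drops the leading zero on purpose,
-- so the literals are written as plain integers.
select equipmentnumber, dimensionid, datemessage
from fact_rtc_se
where cast(equipmentnumber as int) in (552094, 1857547)
  and datemessage < cast('2016-03-01' as date);
```

Either way, the datatype used in the comparison is stated in the query instead of being left to the engine. Note that in Hive a CAST of a non-numeric string to INT returns NULL, so Option 2 silently skips such rows.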

bpreachuk · ‎01-02-2017

Hi @Kaliyug Antagonist. I couldn't help but notice that the value in question had a leading zero. Check the datatype used for the column called equipmentNumber and adjust the IN clause accordingly. It looks like equipmentNumber is defined as a String and thus you will have to quote the values inside the "IN" clause. Here are some examples & results:

1. Using INT as the datatype for equipmentNumber:

create table if not exists fact_rtc_se (equipmentnumber int, dimensionid string, datemessage date) ;
insert into fact_rtc_se values (0552094, 36081, '2016-02-29') ;
insert into fact_rtc_se values (0552094, 18943, '2016-02-29') ;
insert into fact_rtc_se values (1857547, 27956, '2016-01-08') ;
insert into fact_rtc_se values (1857547, 749597, '2016-01-15') ;

select equipmentnumber, dimensionid, datemessage from fact_rtc_se where equipmentnumber in (0552094,1857547) and datemessage < '2016-03-01';

552094 36081 2016-02-29
552094 18943 2016-02-29
1857547 27956 2016-01-08
1857547 749597 2016-01-15

select equipmentnumber, dimensionid, datemessage from fact_rtc_se where equipmentnumber in ('0552094','1857547') and datemessage < '2016-03-01';

1857547 27956 2016-01-08
1857547 749597 2016-01-15

2. Using STRING as the datatype for equipmentNumber:

create table if not exists fact_rtc_se (equipmentnumber string, dimensionid string, datemessage date) ;
insert into fact_rtc_se values ('0552094', 36081, '2016-02-29') ;
insert into fact_rtc_se values ('0552094', 18943, '2016-02-29') ;
insert into fact_rtc_se values ('1857547', 27956, '2016-01-08') ;
insert into fact_rtc_se values ('1857547', 749597, '2016-01-15') ;

select equipmentnumber, dimensionid, datemessage from fact_rtc_se where equipmentnumber in (0552094,1857547) and datemessage < '2016-03-01';

1857547 27956 2016-01-08
1857547 749597 2016-01-15

select equipmentnumber, dimensionid, datemessage from fact_rtc_se where equipmentnumber in ('0552094','1857547') and datemessage < '2016-03-01';

0552094 36081 2016-02-29
0552094 18943 2016-02-29
1857547 27956 2016-01-08
1857547 749597 2016-01-15

I hope this helps.

bpreachuk · ‎12-19-2016

Thanks for the clarification @Vijayandra Mahadik. You are correct... It was implemented for Hive 0.14. My comments were out of date. I will correct my original statement above. https://issues.apache.org/jira/browse/HIVE-5760

bpreachuk · ‎11-18-2016

Yikes! Have never seen that happen before, but I certainly have no reason to doubt you. Does it happen with hive.execution.engine=tez? If you could grab & sanitize your query/config details & post that as a Hive bug in Jira it would be greatly appreciated... we don't want that problem to bite anyone else.

bpreachuk · ‎11-16-2016

Edited the post to fix syntax error. Now it runs just fine. 😉

bpreachuk · ‎11-16-2016

Hi @Zack Riesland. Here's how I normally do this. It's not specifically a subquery but accomplishes what you're looking for.

insert into table daily_counts
select count(*), 'table_a' from table_a
UNION
select count(*), 'table_b' from table_b
UNION
select count(*), 'table_c' from table_c
... ;

bpreachuk · ‎11-08-2016

Hi @vamsi valiveti. The example above uses the exact same source file in the exact same location for both external tables. Both the test_csv_serde_using_CSV_Serde_reader and test_csv_serde tables read the external file(s) stored in the directory called '/user/<uname>/elt/test_csvserde/'. The file I used was pipe delimited and contains 62,000,000 rows - so I didn't attach it. 😉 It would look like Option 2 above, but of course with 4 columns:

121|Hello World|4567|34345
232|Text|5678|78678
343|More Text|6789|342134
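For readers without the original thread at hand, two external tables over the same directory might be declared roughly as follows. This is a sketch under assumptions: the column names and types are hypothetical, and the original DDL is not shown in this post. Only the table names and the location come from the text above.

```sql
-- Column names and types are hypothetical; only the table names
-- and the LOCATION come from the post.

-- Plain delimited-text reader over the pipe-delimited file(s).
CREATE EXTERNAL TABLE IF NOT EXISTS test_csv_serde (
  c1 INT,
  c2 STRING,
  c3 INT,
  c4 INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION '/user/<uname>/elt/test_csvserde/';

-- CSV SerDe reader over the same directory.
-- OpenCSVSerde exposes every column as STRING.
CREATE EXTERNAL TABLE IF NOT EXISTS test_csv_serde_using_CSV_Serde_reader (
  c1 STRING,
  c2 STRING,
  c3 STRING,
  c4 STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = "|")
STORED AS TEXTFILE
LOCATION '/user/<uname>/elt/test_csvserde/';
```

Because both tables are EXTERNAL and share one LOCATION, dropping either table leaves the source file untouched for the other.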

Member Since	‎09-25-2015 05:26 PM
Last Visited	‎04-26-2019 11:02 AM
Posts	112
Kudos received	85

Cloudera Community

Re: Does Hortonworks have EOL dates?

Re: I have multiple tables which i need to join an...

Re: Difference between WHERE ...OR & WHERE ... IN

Re: How to insert individual rows into hive based ...

Re: Why is there data limitation with LLAP?

Re: ACID Transactions DELETE & UPDATE Issues ... S...

Re: The Directory in HDFS goes in Trash when i dro...

Re: Can we create external hive table on top of Fi...

Re: Hive STRING vs VARCHAR Performance

Re: When to Use Hive CSVSerde