Member since: 05-02-2017
Posts: 360
Kudos Received: 65
Solutions: 22
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 13344 | 02-20-2018 12:33 PM |
| | 1501 | 02-19-2018 05:12 AM |
| | 1859 | 12-28-2017 06:13 AM |
| | 7135 | 09-28-2017 09:25 AM |
| | 12164 | 09-25-2017 11:19 AM |
03-09-2017
06:08 AM
Hi @Bala Vignesh N V, please consider accepting my answer to help us manage answered questions. Thanks!
03-08-2018
11:34 AM
@Binu Mathew Hey, in my case a lot of mappers are launched when I run a SELECT query on an ORC file. Also, are there any particular Hive settings that need to be turned on so that read operations on ORC use predicate pushdown (PPD)? I have tried a lot, but almost all my queries read the same amount of data as the size of my ORC table, which means the reader is scanning the whole ORC file. I run Hive 0.13.
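For reference, these are the kind of settings I am asking about; a minimal sketch against a hypothetical ORC-backed table (exact behavior on Hive 0.13 may differ):

```sql
-- Settings commonly associated with ORC predicate pushdown; table and column
-- names below are made up for illustration.
SET hive.optimize.ppd=true;           -- push predicates into the query plan
SET hive.optimize.ppd.storage=true;   -- push predicates down to the storage layer
SET hive.optimize.index.filter=true;  -- let the ORC reader use its built-in indexes

-- With a selective filter, the reader should be able to skip stripes/row groups
-- instead of scanning the whole file.
SELECT col_a, col_b
FROM   my_orc_table
WHERE  id = 12345;
```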
03-20-2017
07:14 PM
Hi Reddy. Choose a delimiter that is unlikely to appear in the data. Use a Unicode control character such as \u0001 as the delimiter and it will solve your issue, since real data almost never contains it (ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u0001'). In your case, export the data with '\u0001' as the delimiter and then insert it into a Hive table whose delimiter is '|'.
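A rough sketch of that flow, with hypothetical table and column names (\001 is the same SOH control character as \u0001):

```sql
-- Staging table: data exported with the SOH control character as the delimiter.
CREATE TABLE staging_export (
  id      STRING,
  payload STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
STORED AS TEXTFILE;

-- Final table: uses '|' as its delimiter.
CREATE TABLE final_pipe_delimited (
  id      STRING,
  payload STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;

-- Hive rewrites the rows with the target table's delimiter on insert.
INSERT INTO TABLE final_pipe_delimited
SELECT id, payload FROM staging_export;
```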
06-07-2017
02:31 AM
Did you find an answer for this? I am facing the same issue. When I applied compression on an external table in text format I could see the change in compression ratio, but when I applied the same on Avro, by setting the following attributes in hive-site.xml and creating the table with "avro.compress"="snappy" in TBLPROPERTIES, the compression ratio stayed the same. I am not sure whether compression is actually being applied to this table. Is there any way to validate whether it is compressed or not?
"hive.exec.compress.output": "true"
"hive.exec.compress.intermediate": "true"
"avro.output.codec": "snappy"
11-03-2017
10:32 AM
@bpreachuk Can we not load the data into a single column of a staging table, and then use the split() function to divide the string into multiple columns? Does this approach have any performance issues that would make us choose the Pig / SerDe approach instead? Please share if you see any issues with this approach.
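Something like the following, as a rough sketch with made-up table and column names (parsed_target is assumed to already exist with three string columns):

```sql
-- Staging table holds each raw line in a single column.
CREATE TABLE raw_stage (line STRING);

-- split() takes a regex, so a pipe delimiter has to be escaped.
INSERT INTO TABLE parsed_target
SELECT split(line, '\\|')[0] AS col_a,
       split(line, '\\|')[1] AS col_b,
       split(line, '\\|')[2] AS col_c
FROM   raw_stage;
```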
11-30-2017
07:55 AM
On HDP 2.3 with Hive 1.2, hive.enforce.bucketing defaults to true. What is the need to set it explicitly?
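To be concrete, this is the kind of statement I mean; a rough sketch with made-up table names:

```sql
SET hive.enforce.bucketing=true;  -- ensure inserts produce one file per bucket

CREATE TABLE users_bucketed (
  id   INT,
  name STRING
)
CLUSTERED BY (id) INTO 8 BUCKETS
STORED AS ORC;

INSERT INTO TABLE users_bucketed
SELECT id, name FROM users_raw;   -- users_raw is a made-up source table
```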
09-26-2016
05:52 PM
4 Kudos
@Bala Vignesh N V If your table is an actual Hive managed table (not an external table), it is ACID-enabled (which requires the ORC file format), Hive/Tez is enabled globally for parallelism, and you write those SQL statements as separate jobs, then YES. The assumption is that you run one of the versions of Hive capable of ACID, which is most likely the case if you use anything released in the last 1.5-2 years.
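As an illustration only, a minimal sketch of an ACID-enabled table under those assumptions (names and bucket count are made up; the exact transaction settings depend on your Hive version and are often configured in hive-site.xml instead):

```sql
-- Session-level transaction settings (often already set cluster-wide).
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

-- ACID tables require ORC and, on older Hive versions, bucketing.
CREATE TABLE orders_acid (
  id     INT,
  status STRING
)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');
```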
09-09-2016
12:58 PM
I'm totally in agreement with @Constantin Stanca that in SQL, 'IN' is for static lists and 'EXISTS' is for dynamic data sets. His point that they are different use cases is solid.

EXISTS will not short-circuit in Hive (with the default MapReduce engine); MapReduce jobs don't have a short-circuit mechanism. That said, if you use a different engine under the hood for Hive, such as Tez or Spark, many of those engines do use optimizations/partitions/strategies to pull back only the required data.

I did test a small query on a small data set, and 'IN' and 'EXISTS' ran in exactly the same time (on MapReduce). When I looked at the execution plan for both queries (using a subquery for both EXISTS and IN), they were functionally equivalent. Meaning that, for MapReduce, it doesn't matter which you use from a speed perspective (if you are using a subquery). To me this lends more strength to @Constantin Stanca's statement: follow the SQL convention.
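The two query shapes I compared looked roughly like this (table and column names are hypothetical):

```sql
-- IN with a subquery
SELECT o.id
FROM   orders o
WHERE  o.customer_id IN (SELECT c.id FROM customers c WHERE c.active = true);

-- EXISTS with a correlated subquery
SELECT o.id
FROM   orders o
WHERE  EXISTS (SELECT 1 FROM customers c
               WHERE  c.id = o.customer_id AND c.active = true);
```

Running EXPLAIN on both produced functionally equivalent plans on MapReduce.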
02-15-2017
04:51 PM
How would you go about the install procedure for Apache Tika? I have the same situation.
08-11-2016
10:44 AM
Yes, it would be the same.