Member since: 05-02-2017
Posts: 360
Kudos Received: 65
Solutions: 22
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 13344 | 02-20-2018 12:33 PM |
| | 1501 | 02-19-2018 05:12 AM |
| | 1859 | 12-28-2017 06:13 AM |
| | 7135 | 09-28-2017 09:25 AM |
| | 12164 | 09-25-2017 11:19 AM |
03-09-2017
06:08 AM
Hi @Bala Vignesh N V, please consider accepting my answer to help us manage answered questions. Thanks!
03-08-2018
11:34 AM
@Binu Mathew Hey, in my case a lot of mappers are launched when I run a SELECT query on an ORC file. Also, are there any particular Hive settings that need to be turned on so that read operations on ORC use predicate pushdown (PPD)? I have tried a lot, but almost all my queries read the same amount of data as the size of my ORC table, which means the reader is scanning the whole ORC file. I run Hive 0.13.
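For reference, these are the kind of settings I am asking about; a minimal sketch against a hypothetical ORC-backed table (exact behavior on Hive 0.13 may differ):

```sql
-- Settings commonly associated with ORC predicate pushdown; table and column
-- names below are made up for illustration.
SET hive.optimize.ppd=true;           -- push predicates into the query plan
SET hive.optimize.ppd.storage=true;   -- push predicates down to the storage layer
SET hive.optimize.index.filter=true;  -- let the ORC reader use its built-in indexes

-- With a selective filter, the reader should be able to skip stripes/row groups
-- instead of scanning the whole file.
SELECT col_a, col_b
FROM   my_orc_table
WHERE  id = 12345;
```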
03-20-2017
07:14 PM
Hi Reddy. Choose a delimiter that is unlikely to appear in the data. Use a Unicode control character such as \u0001 as the delimiter and it will solve your issue, since real data almost never contains it (ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u0001'). In your case, export the data with '\u0001' as the delimiter and then insert it into a Hive table whose delimiter is '|'.
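A rough sketch of that flow, with hypothetical table and column names (\001 is the same SOH control character as \u0001):

```sql
-- Staging table: data exported with the SOH control character as the delimiter.
CREATE TABLE staging_export (
  id      STRING,
  payload STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
STORED AS TEXTFILE;

-- Final table: uses '|' as its delimiter.
CREATE TABLE final_pipe_delimited (
  id      STRING,
  payload STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;

-- Hive rewrites the rows with the target table's delimiter on insert.
INSERT INTO TABLE final_pipe_delimited
SELECT id, payload FROM staging_export;
```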
06-07-2017
02:31 AM
Did you find an answer for this? I am facing the same issue. When I applied compression on an external table in text format I could see the change in compression ratio, but when I applied the same on Avro, by setting the following attributes in hive-site.xml and creating the table with "avro.compress"="snappy" in TBLPROPERTIES, the compression ratio stayed the same. I am not sure whether compression is actually being applied to this table. Is there any way to validate whether it is compressed or not?
"hive.exec.compress.output": "true"
"hive.exec.compress.intermediate": "true"
"avro.output.codec": "snappy"
11-03-2017
10:32 AM
@bpreachuk Can we not load the data into a single column of a staging table, and then use the split() function to divide the string into multiple columns? Does this approach have any performance issues that would make us choose the Pig / SerDe approach instead? Please share if you see any issues with this approach.
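Something like the following, as a rough sketch with made-up table and column names (parsed_target is assumed to already exist with three string columns):

```sql
-- Staging table holds each raw line in a single column.
CREATE TABLE raw_stage (line STRING);

-- split() takes a regex, so a pipe delimiter has to be escaped.
INSERT INTO TABLE parsed_target
SELECT split(line, '\\|')[0] AS col_a,
       split(line, '\\|')[1] AS col_b,
       split(line, '\\|')[2] AS col_c
FROM   raw_stage;
```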
11-30-2017
07:55 AM
On HDP 2.3 with Hive 1.2, hive.enforce.bucketing defaults to true. What is the need to set it explicitly?
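To be concrete, this is the kind of statement I mean; a rough sketch with made-up table names:

```sql
SET hive.enforce.bucketing=true;  -- ensure inserts produce one file per bucket

CREATE TABLE users_bucketed (
  id   INT,
  name STRING
)
CLUSTERED BY (id) INTO 8 BUCKETS
STORED AS ORC;

INSERT INTO TABLE users_bucketed
SELECT id, name FROM users_raw;   -- users_raw is a made-up source table
```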
09-26-2016
05:52 PM
4 Kudos
@Bala Vignesh N V If your table is an actual Hive managed table (not an external table), it is ACID-enabled (which requires the ORC file format), Hive/Tez is enabled globally for parallelism, and you write those SQL statements as separate jobs, then YES. The assumption is that you run one of the versions of Hive capable of ACID, which is most likely the case if you use anything released in the last 1.5-2 years.
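As an illustration only, a minimal sketch of an ACID-enabled table under those assumptions (names and bucket count are made up; the exact transaction settings depend on your Hive version and are often configured in hive-site.xml instead):

```sql
-- Session-level transaction settings (often already set cluster-wide).
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

-- ACID tables require ORC and, on older Hive versions, bucketing.
CREATE TABLE orders_acid (
  id     INT,
  status STRING
)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');
```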
09-09-2016
12:58 PM
I'm totally in agreement with @Constantin Stanca that in SQL, 'IN' is for static lists and 'EXISTS' is for dynamic data sets. His point that they are different use cases is solid.

EXISTS will not short-circuit in Hive (with the default MapReduce engine); MapReduce jobs don't have a short-circuit mechanism. That said, if you use a different engine under the hood for Hive, such as Tez or Spark, many of those engines do use optimizations/partitions/strategies to pull back only the required data.

I did test a small query on a small data set, and 'IN' and 'EXISTS' ran in exactly the same time (on MapReduce). When I looked at the execution plan for both queries (using a subquery for both EXISTS and IN), they were functionally equivalent. Meaning that, for MapReduce, it doesn't matter which you use from a speed perspective (if you are using a subquery). To me this lends more strength to @Constantin Stanca's statement: follow the SQL convention.
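The two query shapes I compared looked roughly like this (table and column names are hypothetical):

```sql
-- IN with a subquery
SELECT o.id
FROM   orders o
WHERE  o.customer_id IN (SELECT c.id FROM customers c WHERE c.active = true);

-- EXISTS with a correlated subquery
SELECT o.id
FROM   orders o
WHERE  EXISTS (SELECT 1 FROM customers c
               WHERE  c.id = o.customer_id AND c.active = true);
```

Running EXPLAIN on both produced functionally equivalent plans on MapReduce.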
02-15-2017
04:51 PM
How would you go about the install procedure for Apache Tika? I have the same situation.
08-11-2016
10:44 AM
Yes, it would be the same.