About balavignesh_nag

balavignesh_nag · 12-19-2017

I have an ORC table in Hive, and I'm using Spark SQL to query it from Spark. The table is partitioned and has two partitions: one partition has data and the other doesn't have any. I understand that there is a known bug in Spark's handling of zero-byte files in Hive tables stored as ORC, but I just wanted to know whether any workaround is available for this issue. Upgrading Spark is not an option.

balavignesh_nag · 12-15-2017

@Ravi teja In my experience, GROUP BY will be faster than DISTINCT. GROUP BY is similar to segregating keys and values, which MapReduce handles with ease. I would say it's better to go with GROUP BY.
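As a point of reference, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for Hive (an assumption; it says nothing about Hive or MapReduce performance). It only shows that SELECT DISTINCT and the equivalent GROUP BY return the same set of keys, so the choice between them comes down to how the engine executes the query, not to the result. The table `events` and its columns are hypothetical.

```python
import sqlite3


def distinct_vs_group_by():
    """Run DISTINCT and GROUP BY over the same hypothetical table and return both results."""
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for a Hive table
    conn.execute("CREATE TABLE events (user_id INTEGER, action TEXT)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [(1, "click"), (2, "view"), (1, "view"), (3, "click"), (2, "view")],
    )
    # DISTINCT: deduplicate the projected column directly
    distinct_rows = conn.execute(
        "SELECT DISTINCT user_id FROM events ORDER BY user_id"
    ).fetchall()
    # GROUP BY with no aggregate: each group of equal keys yields one row
    grouped_rows = conn.execute(
        "SELECT user_id FROM events GROUP BY user_id ORDER BY user_id"
    ).fetchall()
    conn.close()
    return distinct_rows, grouped_rows


if __name__ == "__main__":
    d, g = distinct_vs_group_by()
    print(d)  # [(1,), (2,), (3,)]
    print(g)  # [(1,), (2,), (3,)]
```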

balavignesh_nag · 11-16-2017

Thanks @Geoffrey Shelton Okot. This is what I'm looking for.

balavignesh_nag · 11-16-2017

What does hadoop fs -test do? What other options can be used with -test, such as hadoop fs -test -d?
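For context, hadoop fs -test checks one condition on a path and reports the result through its exit code (0 if the condition holds, non-zero otherwise) instead of printing anything. The Hadoop 2.x FileSystem shell documents the flags -d (path is a directory), -e (path exists), -f (path is a file), -s (path is not empty) and -z (file is zero length); later releases add further flags. Below is a minimal local-filesystem analogue in Python, for illustration only: the function name fs_test is hypothetical, it works on local paths rather than HDFS, and it evaluates -s and -z only for regular files as a simplification.

```python
import os


def fs_test(flag: str, path: str) -> int:
    """Hypothetical local analogue of `hadoop fs -test -<flag> <path>`.

    Returns 0 when the condition holds and 1 otherwise, mirroring a shell exit code.
    Simplification: -s and -z are only evaluated for regular files.
    """
    if flag == "-e":
        ok = os.path.exists(path)
    elif flag == "-d":
        ok = os.path.isdir(path)
    elif flag == "-f":
        ok = os.path.isfile(path)
    elif flag == "-s":
        ok = os.path.isfile(path) and os.path.getsize(path) > 0
    elif flag == "-z":
        ok = os.path.isfile(path) and os.path.getsize(path) == 0
    else:
        raise ValueError(f"unsupported flag: {flag}")
    return 0 if ok else 1
```

In a shell script the exit code of the real command is usually consumed directly, for example hadoop fs -test -e /tmp/some_path && echo exists (the path here is hypothetical).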

balavignesh_nag · 11-13-2017

Thanks @Chris Cotter. I understand that an external table is not maintained by HCatalog. But the folder mapping is done properly; I'm even able to see the new partition value when I query the table. The only issue is that the partition value in the folder name doesn't change, and I understand the reason for that. But how come I can see the new partition value when I query the table, when the underlying folder name doesn't change?

balavignesh_nag · 11-10-2017

Thanks. I googled that already. I'm looking for a more detailed explanation.

balavignesh_nag · 11-10-2017

Could someone explain this parameter?

balavignesh_nag · 11-10-2017

I have an external table created as TEXTFILE and partitioned on load_date. I have inserted data for one partition; for example, the table has the partition (load_date='2017-11-09'). I then wanted to rename the partition, which I did with ALTER TABLE tbl_name PARTITION (load_date='2017-11-09') RENAME TO PARTITION (load_date='2017-11-10'); After this operation, when I query the table I can see the new value for the partition. However, HDFS still shows the old partition path: the subdirectory is still /hive/warehouse/default/tbl_name/load_date=2017-11-09. Is this a known issue?
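The behaviour described in this post (the new partition value shows up in queries while the HDFS directory keeps its old name) is consistent with Hive reading partition values from the metastore rather than from directory names. The toy Python model below is an illustration under that assumption, not Hive's implementation. The Metastore class and its methods are hypothetical. It keeps a mapping from partition value to storage directory and, for an external table, changes only the value on rename while leaving the directory untouched.

```python
from dataclasses import dataclass, field


@dataclass
class Metastore:
    """Hypothetical, simplified model of partition metadata for one external table."""
    table_path: str
    # partition value (load_date) -> storage directory
    partitions: dict = field(default_factory=dict)

    def add_partition(self, load_date: str) -> None:
        # Default directory layout: <table_path>/load_date=<value>
        self.partitions[load_date] = f"{self.table_path}/load_date={load_date}"

    def rename_partition(self, old: str, new: str) -> None:
        # External table in this model: only the partition value changes;
        # the directory is neither moved nor renamed.
        self.partitions[new] = self.partitions.pop(old)

    def query_partition_values(self) -> list:
        # Partition column values come from the metadata, not from directory names.
        return sorted(self.partitions)


if __name__ == "__main__":
    ms = Metastore("/hive/warehouse/default/tbl_name")
    ms.add_partition("2017-11-09")
    ms.rename_partition("2017-11-09", "2017-11-10")
    print(ms.query_partition_values())  # ['2017-11-10']
    print(ms.partitions["2017-11-10"])  # /hive/warehouse/default/tbl_name/load_date=2017-11-09
```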

balavignesh_nag · 11-08-2017

@PremKumar Karunakaran In Spark you will not be able to modify the data. Its data is immutable and cannot be altered or modified. If you need to modify the DDL, that's also not supported in Spark, at least as of now. You have to do it through the Hive CLI, but definitely not through Spark. Hope it helps!!

balavignesh_nag · 11-07-2017

I have an external table created with partitions and 256 buckets. Now I have to reduce the number of buckets. As it is an external table, I dropped and recreated it without affecting the data. However, the underlying HDFS location still has 256 files under each partition. Is there any way to change those 256 files to match the number of buckets used in the new DDL? I know I can achieve this by re-processing the data, but I just wanted to know whether I can achieve it by enabling some property.
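One way to see why the file count does not follow the new DDL: the bucket files are produced when the data is written, with each row routed to a file by hashing the bucketing column modulo the bucket count in effect at write time. Changing the bucket count in the DDL afterwards does not rewrite files that already exist, which is why re-processing the data regroups it. A simplified Python sketch, assuming an integer bucketing column whose hash equals its value (Hive's actual hash functions vary by type); assign_buckets is a hypothetical helper.

```python
from collections import defaultdict


def assign_buckets(keys, num_buckets):
    """Route each integer key to a bucket index (simplified: hash(key) == key)."""
    buckets = defaultdict(list)
    for key in keys:
        buckets[key % num_buckets].append(key)
    return dict(buckets)


if __name__ == "__main__":
    keys = list(range(1000))
    written = assign_buckets(keys, 256)  # grouping produced under the original DDL
    print(len(written))                  # 256
    # Declaring 32 buckets in a new DDL does not touch `written`;
    # only writing the data again regroups it:
    rewritten = assign_buckets(keys, 32)
    print(len(rewritten))                # 32
```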

Last Visited	10-03-2019 09:01 AM

Member Since	05-02-2017 01:47 PM
Posts	360
Kudos received	64

Cloudera Community

Re: what is the best way to get ftp file to hdfs c...

Re: when yarn communicates with the namenodes when...

Re: [TEZ] are partition, sort and shuffle built-in...

Re: CASE statement Error in Beeline HIVE

Re: hive query to display Week of the timestamp an...

Is there any fix or work around available for Null...

Re: distinct vs group by

Re: Hadoop filesystem Commands

Hadoop filesystem Commands

Re: Partition rename in Hive & HDFS path

Re: hive.warehouse.subdir.inherit.perms=false

hive.warehouse.subdir.inherit.perms=false

Partition rename in Hive & HDFS path

Re: Spark 2.1 Alter table - Alternatives / Work A...

Updating the bucketted Hive table