Member since: 05-02-2017
Posts: 360
Kudos Received: 65
Solutions: 22
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 13322 | 02-20-2018 12:33 PM |
| | 1498 | 02-19-2018 05:12 AM |
| | 1858 | 12-28-2017 06:13 AM |
| | 7135 | 09-28-2017 09:25 AM |
| | 12155 | 09-25-2017 11:19 AM |
12-19-2017 11:21 AM
I have an ORC table in Hive, and I'm using Spark SQL to query it from Spark. The table is partitioned and has two partitions: one contains data and the other does not. I understand there is an existing bug in Spark when it handles a zero-byte file in a Hive table stored as ORC. Is there any workaround available for this issue? Upgrading the Spark version is not an option.
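One way to narrow this down is to first locate the zero-byte files under the table location. The sketch below parses the text output of `hadoop fs -ls -R` and returns the zero-length file paths. The function name, the sample listing, and its paths are hypothetical, and whether removing such files actually avoids the Spark error on a given version is an assumption that has not been verified here.

```python
def zero_byte_files(ls_output):
    """Return paths of zero-length files found in `hadoop fs -ls -R` text output."""
    paths = []
    for line in ls_output.splitlines():
        # A listing line has 8 columns:
        # permissions, replicas, owner, group, size, date, time, path
        fields = line.split(None, 7)
        # Files start with '-'; directories start with 'd' and are skipped.
        if len(fields) == 8 and fields[0].startswith("-") and fields[4] == "0":
            paths.append(fields[7])
    return paths


# Hypothetical listing, for illustration only.
SAMPLE = """\
drwxr-xr-x   - hive hadoop          0 2017-12-19 11:00 /hive/warehouse/tbl/part=a
-rw-r--r--   3 hive hadoop       4096 2017-12-19 11:00 /hive/warehouse/tbl/part=a/000000_0
drwxr-xr-x   - hive hadoop          0 2017-12-19 11:05 /hive/warehouse/tbl/part=b
-rw-r--r--   3 hive hadoop          0 2017-12-19 11:05 /hive/warehouse/tbl/part=b/000000_0
"""

if __name__ == "__main__":
    for path in zero_byte_files(SAMPLE):
        print(path)
```

In practice the input would come from running `hadoop fs -ls -R` on the table directory and passing its output to `zero_byte_files`.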
Labels:
- Apache Hadoop
- Apache Hive
- Apache Spark
12-15-2017 05:06 AM
@Ravi teja In my experience, GROUP BY is faster than DISTINCT. GROUP BY works by segregating rows into key/value groups, which MapReduce handles with ease. I would recommend going with GROUP BY.
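To illustrate the "segregating by key" idea, here is a minimal Python sketch of computing distinct values in a map/shuffle/reduce style, next to a plain set-based version. The function names and sample rows are made up for illustration. It only shows that both approaches return the same values; it says nothing about which one Hive runs faster.

```python
from collections import defaultdict


def distinct_via_group_by(rows, key):
    """Distinct values of `key`, computed the way a GROUP BY would on MapReduce."""
    groups = defaultdict(list)
    for row in rows:
        # Map + shuffle: route each row to the bucket for its key.
        groups[row[key]].append(row)
    # Reduce: emit each key exactly once.
    return sorted(groups)


def distinct_via_set(rows, key):
    """Distinct values of `key`, computed by collecting values into one set."""
    return sorted({row[key] for row in rows})


# Hypothetical sample rows.
ROWS = [
    {"city": "Pune", "amount": 10},
    {"city": "Delhi", "amount": 5},
    {"city": "Pune", "amount": 7},
]

if __name__ == "__main__":
    print(distinct_via_group_by(ROWS, "city"))
    print(distinct_via_set(ROWS, "city"))
```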
11-16-2017 11:23 AM
What does hadoop fs -test do? What other options can be used with -test, such as hadoop fs -test -d?
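For context, `hadoop fs -test` checks a path and reports the result through its exit status (0 when the check passes). A minimal Python sketch of wrapping it is shown below. The helper names `build_test_command` and `hdfs_test` are hypothetical, the flag list covers only `-d`, `-e`, `-f`, `-s` and `-z` (newer Hadoop releases may offer more), and `hdfs_test` assumes a `hadoop` binary is on the PATH.

```python
import subprocess

# Flags of `hadoop fs -test` covered by this sketch; each check exits 0 when true.
TEST_FLAGS = {
    "-d": "path is a directory",
    "-e": "path exists",
    "-f": "path is a file",
    "-s": "path is not empty",
    "-z": "file is zero length",
}


def build_test_command(flag, path):
    """Build the argument list for `hadoop fs -test <flag> <path>`."""
    if flag not in TEST_FLAGS:
        raise ValueError("unsupported -test flag: " + flag)
    return ["hadoop", "fs", "-test", flag, path]


def hdfs_test(flag, path):
    """Run the check and return True when the command exits with status 0."""
    return subprocess.run(build_test_command(flag, path)).returncode == 0
```

For example, `hdfs_test("-d", "/tmp")` would return True if `/tmp` is a directory on the cluster's file system.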
Labels:
- Apache Hadoop
11-13-2017 06:06 AM
Thanks @Chris Cotter. I understand that an external table is not maintained by HCatalog. But the folder mapping is done properly, and I can even see the new partition value when I query the table. The only issue is that the partition folder name has not changed, and I understand the reason for that. How is it that querying the table shows the new partition value when the underlying folder name has not changed?
11-10-2017 10:03 AM
Thanks. I googled that already. I'm looking for a more detailed explanation.
11-10-2017 09:39 AM
Could someone explain this parameter?
Labels:
- Apache Hadoop
- Apache Hive
11-10-2017 09:01 AM
I have an external table created as TEXTFILE and partitioned on load_date. I inserted data for one partition, for example (load_date='2017-11-09'). I then renamed the partition using ALTER TABLE tbl_name PARTITION (load_date='2017-11-09') RENAME TO PARTITION (load_date='2017-11-10'); After this operation, when I query the table I can see the new value for the partition. However, the underlying HDFS still shows the old partition path, and its sub-directory still points to /hive/warehouse/default/tbl_name/load_date=2017-11-09. Is this a known issue?
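The observed behavior can be pictured with a toy model in which the metastore keeps a mapping from partition value to a stored location, and queries read the partition value from the metastore rather than from the folder name. The `ToyMetastore` class below is purely hypothetical and is not Hive's implementation. It assumes that a rename on an external table updates only the partition value, while a managed table also gets its directory renamed.

```python
class ToyMetastore:
    """Toy model only: maps a partition value to the location stored for it."""

    def __init__(self):
        self.partitions = {}

    def add_partition(self, value, location):
        self.partitions[value] = location

    def rename_partition(self, old, new, external=True):
        location = self.partitions.pop(old)
        if not external:
            # Assumption: for a managed table the directory is renamed as well.
            location = location.rsplit("=", 1)[0] + "=" + new
        # For an external table only the key changes; the stored path stays put.
        self.partitions[new] = location


if __name__ == "__main__":
    ms = ToyMetastore()
    ms.add_partition(
        "2017-11-09", "/hive/warehouse/default/tbl_name/load_date=2017-11-09"
    )
    ms.rename_partition("2017-11-09", "2017-11-10", external=True)
    print(ms.partitions)
```

On a real cluster, the location recorded for the renamed partition can be inspected with DESCRIBE FORMATTED tbl_name PARTITION (load_date='2017-11-10').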
Labels:
- Apache Hive
11-08-2017 09:23 AM
@PremKumar Karunakaran In Spark you will not be able to modify the data. Its data is immutable and cannot be altered or modified. Modifying the DDL is also not supported in Spark, at least as of now. You have to do it through the Hive CLI, not through Spark. Hope it helps!
11-07-2017 12:35 PM
I have an external table that was created with partitions and buckets (256). I now have to reduce the number of partitions. Since it is an external table, I can drop and recreate it without affecting the data. However, the underlying HDFS location will still have 256 files under each partition. Is there any way to change those 256 files to match the number of buckets used in the new DDL? I know I can achieve this by reprocessing the data, but I wanted to know whether it can be done by enabling a property.
Labels:
- Apache Hadoop
- Apache Hive