Member since: 02-11-2019
Posts: 78
Kudos Received: 1
Solutions: 0
05-03-2019
01:02 PM
Hi, we have an existing external Hive table containing millions of rows, partitioned by columnA of type string. We want to partition it by columnB of type timestamp instead. What's the most efficient way to go about this, considering that all these rows are already stored in the existing partition structure?
Labels:
- Apache Hive
04-16-2019
01:37 PM
Hi, can I create a parameterized view in Impala, something like the pseudocode below?
CREATE VIEW MyView AS SELECT col1, col2, col3 FROM table_one WHERE startdate = ${date1} AND enddate = ${date2} ...
Labels:
- Apache Impala
04-07-2019
04:41 AM
We need to partition our Hive table by date (year/month/day). Is it better to use int or string for the partition column types? For example:
CREATE EXTERNAL TABLE partition (id string, event timestamp and so on) PARTITIONED BY (year INT, month INT, day INT) STORED AS PARQUET
vs.
CREATE EXTERNAL TABLE partition (id string, event timestamp and so on) PARTITIONED BY (year STRING, month STRING, day STRING) STORED AS PARQUET
We noticed that we couldn't run queries like "... where day > 10" with the string option.
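To illustrate why a range filter on string-typed partition values can give surprising results, here is a minimal Python sketch. It only shows the case where values are compared as text (for example `day > '10'`); whether a given engine compares as text, casts implicitly, or rejects a mixed string/int comparison depends on the engine and version. The day values are made up for the example.

```python
# Hypothetical partition values for the `day` column, stored as strings.
days_as_strings = ["2", "9", "10", "11", "28"]
days_as_ints = [int(d) for d in days_as_strings]

# Text comparison is lexicographic: "2" > "10" because "2" sorts after "1".
string_matches = [d for d in days_as_strings if d > "10"]

# Integer comparison is numeric, which is what "day > 10" usually means.
int_matches = [d for d in days_as_ints if d > 10]

print(string_matches)  # ['2', '9', '11', '28']
print(int_matches)     # [11, 28]
```

If string partitions are kept, zero-padded values of equal width ("02", "09", "10") sort the same way lexicographically and numerically, so text comparisons between them behave as expected.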
Labels:
- Apache Hive
04-02-2019
06:20 AM
Hi all. We need to generate unique IDs in our Hadoop cluster during data ingestion. We have parallel processes ingesting data from different sources into Hive tables, and we'd like a unique ID for each row inserted. I understand ZooKeeper offers unique ID generation for distributed scenarios. Please help with how to do this; I can't find samples or documentation. Also, please let me know if there is a better distributed unique ID generator in the Cloudera environment. Thanks
Labels:
- Apache Hive
- Apache Zookeeper
03-27-2019
07:36 AM
Can we take advantage of Hive table partitions when querying with Impala? Are there any issues or problems we might run into in this scenario? We currently have partitioned Hive tables. Will we be missing anything if we don't convert them to Impala tables?
Labels:
- Apache Hive
- Apache Impala
03-25-2019
02:47 PM
Hi, I need to retrieve only the last entry in a given partition when it contains multiple entries. Assume I create an external table partitioned by date:
create external table test_lb (field1 string, field2 string, field3 string) partitioned by (year string, month string, day string, host string) row format delimited fields terminated by ','
Then I insert multiple records into the same partition, i.e. the same year, month, and day:
insert into test_lb partition (year="2013", month="07", day="28") values ("foo1", "FOO2", "FOO3");
insert into test_lb partition (year="2013", month="07", day="28") values ("foo4", "FOO5", "FOO6");
How do I retrieve just the most recent entry via a query? Is there a built-in way to get only the latest values?
Labels:
- Apache Impala
- Apache Spark
03-01-2019
01:45 PM
1 Kudo
Looks like a Netgear switch was causing the problem. I switched to a Wi-Fi connection between the workstation and the ISP router, and all is well. Thanks
03-01-2019
01:40 PM
I have a little unmanaged Netgear switch that I'm using to extend my local LAN. It looks like that switch was causing the problem, though the docs insist it's a simple pass-through. Anyway, I connected to the ISP router over Wi-Fi and all is well. Thanks