Member since 11-24-2015
223 Posts
10 Kudos Received
0 Solutions
11-30-2018
04:28 PM
I tried the below, but it didn't work:
hive> create view testview as select * from test1 where id = "{$hiveconf:id}";
OK
Time taken: 0.13 seconds
hive> set id=1;
hive> select * from testview;
The above query did not return any rows.
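For reference, Hive's variable substitution syntax is `${hiveconf:id}` (dollar sign before the brace). Written as `{$hiveconf:id}` inside quotes, the text is most likely compared as a plain string literal, which would explain the empty result. A minimal sketch of the working syntax in a direct query, reusing the `test1` table from the post:

```sql
-- Define the variable for the current session.
set id=1;

-- ${hiveconf:id} is replaced with 1 before the statement is parsed.
select * from test1 where id = ${hiveconf:id};
```

As far as I know, substitution is applied to the statement text before parsing, so a variable used inside `create view` would be resolved once, at creation time, rather than on each query. Filtering on the view at query time is the usual workaround.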
11-30-2018
04:22 PM
I have two tables, test1 and test2.
hive> select * from test1;
OK
id year
1 2017
2 2017
hive> select * from test2;
OK
no year
2 2017
3 2017
Query:
select id, year from test1 where id > 1 union all select no, year from test2 where no > 1
Question 1: If I put the above query in a view, can I pass a parameter to it for use in the where clauses (for id and no)?
Question 2: Can I frame the above query without the union all?
Appreciate the feedback.
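One way to approach question 1 is to leave the filter out of the view and apply it when querying the view. The sketch below assumes that approach; the view name `test_union_view` and the variable name `min_id` are made up for illustration. The union is wrapped in a subquery because older Hive versions only accept `union all` in that position.

```sql
-- View exposes both tables under common column names; no filter inside.
create view test_union_view as
select u.id, u.year
from (
  select id, year from test1
  union all
  select no as id, year from test2
) u;

-- The "parameter" is supplied at query time through a session variable.
set min_id=1;
select id, year from test_union_view where id > ${hiveconf:min_id};
```

With the sample data shown above, the filter `id > 1` keeps one row from test1 and two rows from test2. Hive can generally push such a predicate down into both branches of the union, so the result should match the original query.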
Labels: Apache Hive
11-16-2018
06:18 PM
OK, this can be done simply as: partitioned by (yr string, mth string). Thanks.
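A minimal sketch of the two-level partitioning described here, assuming the source table `real_table` and its `create_dt` column from the earlier post; the target table name `test_part_tbl2` is hypothetical.

```sql
-- Allow all partition values to be derived from the select list.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

create table test_part_tbl2 (id string, cd string, dttm string)
partitioned by (yr string, mth string);

-- Partition columns come last in the select list, in declaration order.
insert into table test_part_tbl2 partition (yr, mth)
select id, cd, create_dt, year(create_dt), month(create_dt)
from real_table;
```

Each (yr, mth) combination gets its own directory under the table location, for example `yr=2018/mth=10`, which gives the year-then-month layout asked about.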
11-16-2018
06:11 PM
I tried this, but it wouldn't work:
create table test_part_bkt_tbl (id string, cd string, dttm string) partitioned by (yr string) clustered by (month(dttm)) into 12 buckets;
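As far as I know, `clustered by` accepts only column names, not expressions such as `month(dttm)`, which is probably why the statement is rejected. A sketch of a statement that should parse, assuming bucketing on the existing `id` column is acceptable:

```sql
-- Bucketing hashes the values of a real column into a fixed number of files.
create table test_part_bkt_tbl (id string, cd string, dttm string)
partitioned by (yr string)
clustered by (id) into 12 buckets;
```

Note that buckets are files inside each partition directory, assigned by hash, so this does not produce one directory per month. A second partition column is the closer fit for that layout.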
11-16-2018
05:54 PM
If I partition a table by year, can I further bucket it by month? The idea is that the year will be the top level and the months will be at a level beneath it, so the directory structure would be:
2018 -> 1, 2, 3 ... 12
2019 -> 1, 2, 3 ... 12
Is this what bucketing is about? Or should I be doing this some way with partitions alone? Appreciate the insights.
11-15-2018
05:53 PM
Actually, I am working with Cloudera now, and I don't see hive.exec.parallel as a configurable option in Cloudera Manager.
11-15-2018
04:42 PM
Is the Hive performance parameter below usually set within a MapReduce program at execution time:
SET hive.exec.parallel=true
Or can it be set at the global level in Ambari? Appreciate the feedback.
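For the session-level option, a short sketch of how the setting can be inspected and changed from the Hive prompt or at the top of a script. The thread count shown is only an example value.

```sql
-- Print the value currently in effect for this session.
set hive.exec.parallel;

-- Let independent stages of a query run in parallel for this session only.
set hive.exec.parallel=true;

-- Optional: cap how many stages may run at once (example value).
set hive.exec.parallel.thread.number=8;
```

A session-level `set` overrides whatever default comes from hive-site.xml, which is the file a cluster manager would typically maintain for a global setting.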
11-14-2018
07:13 PM
I have a table with two string columns and one datetime column (which is also defined as a string). I want to partition the table on a monthly basis, i.e. by month(the datetime column). So I did the below:
create table test_part_tbl (id string, cd string, dttm string) partitioned by (mth string);
insert into test_part_tbl partition(mth) select id, cd, create_dt, month(create_dt) from real_table;
hive> select * from test_part_tbl;
OK
test_part_tbl.id test_part_tbl.cd test_part_tbl.dttm test_part_tbl.mth
id1 cd1 2018-10-24 10
id2 cd1 2018-10-24 10
Time taken: 0.13 seconds, Fetched: 2 row(s)
So the month, i.e. "10", is actually appearing as part of the table data. Is that correct? Is it possible to partition the table as above and not have the partition column/value as part of the table data? That is, when querying, can't I use "month(dttm)" and search based on month? Appreciate the insights.
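In Hive, a partition column is not written into the data files; its value lives in the partition directory name and is shown as an extra column by `select *`. A minimal sketch of how the table from this post could be queried, assuming the goal is to hide `mth` from the output while still benefiting from partition pruning:

```sql
-- List the partition directories that exist, e.g. mth=10.
show partitions test_part_tbl;

-- Name the columns explicitly to leave mth out of the result,
-- and filter on mth so only the matching partition is read.
select id, cd, dttm
from test_part_tbl
where mth = '10';
```

A filter such as `where month(dttm) = 10` should return the same rows, but since it does not reference the partition column, Hive would likely have to scan every partition to evaluate it.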
Labels: Apache Hive
11-05-2018
05:01 PM
I have the below query:
select c1 as "col1", c2 as "col2", c3 as "col3", c4 as "col4", "xxx" as "col5" from tbl1
Here "xxx" is a static value to be shown for all resulting rows under the column name "col5", but Hive gives an error. How do I code this correctly? BTW, we need this for use in a "union" clause with tables that have differing columns. Appreciate the feedback.
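The likely cause is the double-quoted aliases: Hive treats double quotes as string literals, while identifiers are either left unquoted or wrapped in backticks. A sketch of the corrected query, followed by a union with a second table; `tbl2` and its column `c5` are hypothetical and only illustrate the padding idea.

```sql
-- Aliases unquoted (or in backticks); the constant in single quotes.
select c1 as col1, c2 as col2, c3 as col3, c4 as col4, 'xxx' as col5
from tbl1;

-- Padding the table that lacks a column so both sides line up.
select u.col1, u.col2, u.col3, u.col4, u.col5
from (
  select c1 as col1, c2 as col2, c3 as col3, c4 as col4, 'xxx' as col5 from tbl1
  union all
  select c1 as col1, c2 as col2, c3 as col3, c4 as col4, c5 as col5 from tbl2
) u;
```

Both sides of a `union all` need the same number of columns with compatible types, so the constant should match the type of the real column it stands in for.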
Labels: Apache Hive
09-13-2018
12:54 PM
@Mahesh Balakrishnan @Naresh P R @Shu Can I have feedback on how the data should be formatted so that it can be loaded (with the load inpath command) into a table created as:
CREATE EXTERNAL TABLE staging3
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/tmp/staging'
TBLPROPERTIES ('avro.schema.url'='hdfs:///tmp/avroschemas/testtbl.json');
The schema is the same as described earlier, but no delimiter is specified. I would appreciate a sample piece of data for this.
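An Avro-backed table reads binary Avro container files, so there is no delimiter to specify, and `load data inpath` would only work with files that are already Avro-encoded. One common route for delimited input is to go through a text staging table and let Hive write the Avro files. The sketch below assumes that route; the table `staging3_text`, the columns `col_a` and `col_b`, and the input path are all placeholders, and the real column list must mirror the fields in testtbl.json.

```sql
-- Hypothetical text table; replace the columns with those from the Avro schema.
create table staging3_text (col_a string, col_b string)
row format delimited fields terminated by ','
stored as textfile;

-- Hypothetical HDFS path to a comma-delimited file.
load data inpath '/tmp/input/sample.csv' into table staging3_text;

-- Hive serializes the rows as Avro when writing into staging3.
insert into table staging3
select col_a, col_b from staging3_text;
```

The select list has to match the Avro schema in column order and compatible types, otherwise the insert may fail or misalign fields.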