Member since: 09-26-2017
Posts: 24
Kudos Received: 0
Solutions: 0
04-04-2018
07:26 AM
Hi @Joy Ndjama, Awesome! Exactly what I was expecting. Even if it is quite expensive, it is an elegant way to get a true sample. Thanks @Scott Shaw as well; TABLESAMPLE is a very interesting feature too.
03-27-2018
05:47 PM
Hi, I was wondering whether there is a way to perform a "local limit" in a Hive query. To explain: consider a query that uses DISTRIBUTE BY on a partition column "X". This column has 30 distinct values and I want exactly 100 rows per value. When we use LIMIT, it generally stops the sink operation at the n-th row, so usually only one partition is represented in the result. For building samples, I think it would be very helpful if reducers (or mappers) could be "limited" locally. I hope this is clear 🙂 Thanks for your replies. SF
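To make the requested behaviour concrete, here is a minimal sketch of a "limit per value" in plain Python. It only illustrates the intended semantics (at most n randomly chosen rows for each distinct value of a key); it is not Hive code, and the function name `sample_per_key` and the toy rows are assumptions made up for this illustration.

```python
import random
from collections import defaultdict


def sample_per_key(rows, key, n, seed=None):
    """Return at most n randomly chosen rows for each distinct value of `key`.

    Hypothetical helper for illustration only; not part of Hive.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)  # group rows by the key value

    sampled = []
    for value in sorted(groups):
        members = groups[value]
        rng.shuffle(members)         # random order within the group
        sampled.extend(members[:n])  # the "local limit" for this value
    return sampled


# Toy data: 3 distinct values of "x", 10 rows each.
rows = [{"x": i % 3, "id": i} for i in range(30)]
sample = sample_per_key(rows, key="x", n=4, seed=0)
```

In SQL engines that support window functions, the same idea is usually written by numbering the rows inside each group (for example with ROW_NUMBER() over a PARTITION BY on the key) and keeping only the rows whose number is at most n. The shuffle of every group is what makes this kind of sampling costly on large tables.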
Labels: Apache Hive
01-06-2018
08:39 AM
Hi @Gunther Hagleitner, thanks, your explanations make it very clear.
01-03-2018
02:36 PM
Hi, I know that Tez avoids storing intermediate results in HDFS (unlike MapReduce, which does), but I was wondering where they are stored instead. I have read "in memory" and "on local disk", but what if the task that emits the intermediate results is not on the same node as the task that receives them? Is it then just network I/O, streaming data from memory and/or local disk, instead of HDFS reads and writes? Thanks for your help 🙂
Labels: Apache Hadoop, Apache Tez
01-01-2018
11:52 AM
Thanks @Bala Vignesh N V, it helps 🙂
12-27-2017
04:37 PM
Hello, While the concept of MapReduce is pretty clear in my mind, I can't say the same for Tez. MapReduce does its work through Map > Partition, Sort, Shuffle > Reduce, and I know each of these phases well. But how does it work in Tez, more precisely between two vertices (say a Map vertex and a Reduce vertex)? Is there a built-in "partition, sort, shuffle" as in MR? Or is it up to us to manage this internal logic (I read a word count example and it seems so, but I prefer to be sure)? Thanks!
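For reference, the Map > Partition, Sort, Shuffle > Reduce sequence mentioned above can be sketched as a toy word count in plain Python. This is only a conceptual model of the MapReduce phases running in a single process; it does not use or describe the Tez API, and all function names here are made up for the illustration.

```python
import zlib
from itertools import groupby


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word."""
    for line in lines:
        for word in line.split():
            yield word, 1


def partition(pairs, num_reducers):
    """Partition: route each pair to a reducer bucket by a stable hash of its key."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        index = zlib.crc32(key.encode("utf-8")) % num_reducers
        buckets[index].append((key, value))
    return buckets


def reduce_phase(bucket):
    """Sort the bucket by key, then sum the values of each key."""
    ordered = sorted(bucket, key=lambda kv: kv[0])
    return {
        key: sum(value for _, value in group)
        for key, group in groupby(ordered, key=lambda kv: kv[0])
    }


lines = ["a b a", "b c"]
# In a real cluster the buckets would be shuffled over the network to reducers.
buckets = partition(map_phase(lines), num_reducers=2)
counts = {}
for bucket in buckets:
    counts.update(reduce_phase(bucket))
```

Because every occurrence of a key lands in the same bucket, each reducer can sum its keys independently of the others.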
Labels: Apache Hadoop, Apache Tez
12-17-2017
04:43 PM
Thanks a lot @bkosaraju, it really helps me 🙂
12-11-2017
06:24 PM
Hi, I would like to query a partitioned table (partitioned by YEAR, MONTH, DAY), but instead of writing "WHERE YEAR='2017' AND MONTH='12' AND DAY='11'", I would like to join this table to a table that contains the fields YEAR, MONTH and DAY: SELECT * FROM mypartitionedtable t1 INNER JOIN currentpartitiontable t2 ON t1.YEAR=t2.YEAR etc. But when I run EXPLAIN EXTENDED, I see that the analyzer will fetch every partition. Is there something I missed? Thanks 🙂
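One possible workaround, sketched here under the assumption that the planner only prunes partitions when it sees literal predicates, is to read the current partition values first and substitute them into the query as literals. The helper `partition_predicate` and the `current` mapping below are hypothetical; in practice the values would come from a first query on currentpartitiontable.

```python
def partition_predicate(partition):
    """Build a literal WHERE clause from a mapping of partition column -> value.

    Hypothetical helper; values are assumed to be trusted (no escaping is done).
    """
    return " AND ".join(f"{col}='{val}'" for col, val in partition.items())


# Assumed result of a first query against currentpartitiontable.
current = {"YEAR": "2017", "MONTH": "12", "DAY": "11"}
query = f"SELECT * FROM mypartitionedtable WHERE {partition_predicate(current)}"
```

The generated statement is the same static filter quoted in the post, so it costs one extra round trip but leaves the partition filter visible at planning time.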
Labels: Apache Hive
11-09-2017
09:55 AM
Thanks, it helps.
Before OVERWRITE:
$ hdfs dfs -ls /apps/hive/warehouse/xyz.db/table_tmp
Found 1 items
718 2017-11-09 10:18 /apps/hive/warehouse/xyz.db/table_tmp/000000_0
During OVERWRITE:
$ hdfs dfs -ls /apps/hive/warehouse/xyz.db/table_tmp
Found 2 items
0 2017-11-09 10:35 /apps/hive/warehouse/xyz.db/table_tmp/.hive-staging_hive_2017-11-09_10-35-38_682_2619781700846007196-1
718 2017-11-09 10:18 /apps/hive/warehouse/xyz.db/table_tmp/000000_0
After OVERWRITE:
$ hdfs dfs -ls /apps/hive/warehouse/xyz.db/table_tmp
Found 1 items
718 2017-11-09 10:35 /apps/hive/warehouse/xyz.db/table_tmp/000000_0
What I understand is that a query that uses this file and has been running since, say, 10:15 and is still executing at 10:35 is not guaranteed to run correctly (although I presume that the file, especially since it is small here, will already have been processed in an early stage of the M/R job). Is that so? I am wondering whether OVERWRITE is a good way to build an intermediate table in this case. Without the LOCK functionality enabled, can you suggest a better way?
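The listings above suggest a write-to-staging-then-swap pattern: the new data first appears under a .hive-staging directory and the final file changes only at the end. As a rough local-filesystem analogy (not Hive's actual implementation, and with no claim about HDFS semantics), the pattern can be sketched as follows; `overwrite_via_staging` is a hypothetical helper.

```python
import os
import tempfile


def overwrite_via_staging(target_path, data):
    """Write data to a staging file next to the target, then swap it into place.

    Hypothetical helper illustrating the pattern on a local filesystem only.
    """
    directory = os.path.dirname(os.path.abspath(target_path))
    fd, staging_path = tempfile.mkstemp(dir=directory, prefix=".staging_")
    try:
        with os.fdopen(fd, "w") as staging:
            staging.write(data)
        # The target keeps its old content until this single rename.
        os.replace(staging_path, target_path)
    except BaseException:
        if os.path.exists(staging_path):
            os.remove(staging_path)  # do not leave a half-written staging file
        raise


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "000000_0")
overwrite_via_staging(target, "old\n")
overwrite_via_staging(target, "new\n")
with open(target) as f:
    content = f.read()
```

With this pattern a reader sees either the old file or the new one, never a partial write. A reader that started on the old file may still find it replaced underneath it, which is the concern raised in the post.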
11-07-2017
10:24 AM
Hi, In my organization, Hive is used with hive.support.concurrency set to false. I am wondering what the consequences are of inserting data during a select (and vice versa). For an insert, I think the table's metadata is updated at the very end of the Map/Reduce job. Thus a select should not be disturbed, because I think the files involved in the select are determined at the very beginning of the M/R job. For an insert overwrite, I think it is pretty similar, but I could not find confirmation during my research. Could you validate (or not ;)) my thoughts? Thanks 🙂
Labels: Apache Hive