Member since: 12-10-2015
Posts: 10
Kudos received: 3
Solutions: 0
04-30-2018
09:01 PM
I am having a similar error. I'm trying to run the HDP sandbox using Docker for the first time and got the same error. Neither of the files below is present; /var/log/ doesn't even have a folder named ambari-server:
Server out at: /var/log/ambari-server/ambari-server.out
Server log at: /var/log/ambari-server/ambari-server.log
04-27-2018
11:03 PM
Repairing the table might help: MSCK REPAIR TABLE registers in the metastore any partitions that exist as directories in HDFS but are missing from the metastore. Try this:
MSCK REPAIR TABLE parti;
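A minimal sketch of checking the result, assuming `parti` is a partitioned table whose partition directories were written to HDFS directly. The partition column `dt`, its value, and the path in the `ALTER TABLE` line are hypothetical. Note that MSCK only discovers directories that follow the `column=value` naming convention.

```sql
-- Register every partition directory found under the table location.
MSCK REPAIR TABLE parti;

-- Verify that the metastore now lists the partitions.
SHOW PARTITIONS parti;

-- Alternative for a single known partition
-- (hypothetical partition column dt and hypothetical path):
ALTER TABLE parti ADD IF NOT EXISTS PARTITION (dt='2018-04-27')
  LOCATION '/hypothetical/path/parti/dt=2018-04-27';
```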
03-12-2018
06:44 PM
I am joining a views table with a temp table, with the following parameters intentionally enabled:
hive.auto.convert.join=true;
hive.execution.engine=tez;
The code snippet is:
CREATE TABLE STG_CONVERSION AS
SELECT CONV.CONVERSION_ID,
CONV.USER_ID,
TP.TIME,
CONV.TIME AS ACTIVITY_TIME,
TP.MULTI_DIM_ID,
CONV.CONV_TYPE_ID,
TP.SV1
FROM VIEWS TP
JOIN SCU_TMP CONV ON TP.USER_ID = CONV.USER_ID
WHERE TP.TIME <= CONV.TIME;
Normally, both tables can have any number of records, but the SCU_TMP table is expected to have only 10-50 records per user ID. In some cases, however, a couple of user IDs come with around 10k-20k records in SCU_TMP, which creates a cross-product effect. In such cases the query runs forever, waiting on a single mapper to complete. Is there any way to optimize this so that it runs gracefully?
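One common technique for spreading a skewed join key across more tasks is salting. A sketch, assuming the join runs as a shuffle join (with `hive.auto.convert.join=true`, Hive may instead choose a map join, in which case salting has no effect), and with a hypothetical salt factor of 4: each VIEWS row gets one random salt in 0-3, and each SCU_TMP row is replicated once per salt value, so every matching pair still meets exactly once while a hot USER_ID is split across 4 tasks. Salting does not shrink the output; if one user genuinely produces a huge number of rows, writing that output remains the cost.

```sql
-- Salted join sketch (hypothetical salt factor of 4).
CREATE TABLE STG_CONVERSION AS
SELECT CONV.CONVERSION_ID,
       CONV.USER_ID,
       TP.TIME,
       CONV.TIME AS ACTIVITY_TIME,
       TP.MULTI_DIM_ID,
       CONV.CONV_TYPE_ID,
       TP.SV1
FROM (
  -- Each VIEWS row is assigned exactly one salt value in 0..3.
  SELECT V.USER_ID, V.TIME, V.MULTI_DIM_ID, V.SV1,
         CAST(FLOOR(RAND() * 4) AS INT) AS SALT
  FROM VIEWS V
) TP
JOIN (
  -- Each SCU_TMP row is replicated once per salt value 0..3.
  SELECT C.CONVERSION_ID, C.USER_ID, C.TIME, C.CONV_TYPE_ID, S.SALT
  FROM SCU_TMP C
  LATERAL VIEW EXPLODE(ARRAY(0, 1, 2, 3)) S AS SALT
) CONV
  ON TP.USER_ID = CONV.USER_ID AND TP.SALT = CONV.SALT
WHERE TP.TIME <= CONV.TIME;
```

`RAND()` is not deterministic across task retries; a hash of other VIEWS columns, e.g. `PMOD(HASH(V.TIME, V.SV1), 4)`, is a deterministic alternative for the salt.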
Labels:
- Apache Hive
- Apache Tez
08-27-2016
09:06 PM
That's perfect! But I'm still curious how it works fine when the string is taken from a table with "mm" in the format.
08-26-2016
01:16 PM
2 Kudos
The query below fetches the UNIX_TIMESTAMP of the same time string twice: once hardcoded and once fetched from a table.
select distinct UNIX_TIMESTAMP(TIME, 'yyyy-mm-dd HH:mm:ss'),UNIX_TIMESTAMP('2015-08-22 00:00:32', 'yyyy-mm-dd HH:mm:ss') from clicks where time='2015-08-22 00:00:32';
Both fields should give the same result, since the time string is the same. But the output is:
_c0          _c1
1440201632   1421884832
Is there any reason why they differ? Is there any workaround?
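A likely cause is the pattern itself: UNIX_TIMESTAMP(string, pattern) follows Java SimpleDateFormat letters, where `mm` means minutes and `MM` means month. With `yyyy-mm-dd`, the month is never set (it defaults to January) and the minutes field is parsed twice. This matches the hardcoded result: 1421884832 is 2015-01-22 00:00:32 UTC. A sketch of the same query with the corrected pattern; both columns should then agree, with the exact value depending on the session's time zone:

```sql
-- 'MM' = month, 'mm' = minutes (Java SimpleDateFormat letters).
select distinct UNIX_TIMESTAMP(TIME, 'yyyy-MM-dd HH:mm:ss'),
       UNIX_TIMESTAMP('2015-08-22 00:00:32', 'yyyy-MM-dd HH:mm:ss')
from clicks
where time = '2015-08-22 00:00:32';
```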
Labels:
- Apache Hive
- Apache Tez
12-26-2015
08:28 PM
The table is in ORC, and I haven't tried an SMB (sort-merge-bucket) join, but the process gets stuck at the last reducer and all of the processing runs on a single node while the other nodes are idle. Can you explain what the "hive.map.aggr" parameter does?
12-26-2015
08:25 PM
Table STAGE_SOURCE is already in ORC format.
12-17-2015
07:33 PM
@Neeraj Sabharwal @Jean-Philippe Player The explain plan file is attached.
12-13-2015
08:23 PM
1 Kudo
The query below takes a long time to execute. It runs on the Tez execution engine.
SELECT STG.EMP_TYPE,DEPT,COUNT(DISTINCT EMP_ID) AS COUNT, A.TOTAL_COUNT
FROM STAGE_SOURCE STG
LEFT OUTER JOIN
(SELECT EMP_TYPE,COUNT(DISTINCT EMP_ID) AS TOTAL_COUNT FROM STAGE_SOURCE GROUP BY EMP_TYPE) A
ON STG.EMP_TYPE = A.EMP_TYPE
GROUP BY STG.EMP_TYPE,DEPT,A.TOTAL_COUNT;
Is there any rewrite or optimization strategy that could improve the query's performance? The subquery with alias "A" alone takes 2-3 hours to execute. I'm attaching the explain plan of just subquery "A": explain-plan.txt.
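One rewrite often tried for COUNT(DISTINCT) is to de-duplicate with GROUP BY first and then count, so the distinct work is spread across reducers by key instead of concentrating in a few. A sketch, assuming Hive 0.13+ for the WITH clause; the CTE names `DEDUP` and `TOTALS` are hypothetical, and whether it is faster depends on the data distribution. `COUNT(EMP_ID)` (not `COUNT(*)`) keeps the original semantics of ignoring NULL EMP_IDs.

```sql
WITH DEDUP AS (
  -- One row per distinct (EMP_TYPE, DEPT, EMP_ID).
  SELECT EMP_TYPE, DEPT, EMP_ID
  FROM STAGE_SOURCE
  GROUP BY EMP_TYPE, DEPT, EMP_ID
),
TOTALS AS (
  -- Distinct employees per EMP_TYPE, counted after de-duplication.
  SELECT EMP_TYPE, COUNT(EMP_ID) AS TOTAL_COUNT
  FROM (SELECT EMP_TYPE, EMP_ID
        FROM STAGE_SOURCE
        GROUP BY EMP_TYPE, EMP_ID) T
  GROUP BY EMP_TYPE
)
SELECT D.EMP_TYPE, D.DEPT, COUNT(D.EMP_ID) AS `COUNT`, T.TOTAL_COUNT
FROM DEDUP D
LEFT OUTER JOIN TOTALS T ON D.EMP_TYPE = T.EMP_TYPE
GROUP BY D.EMP_TYPE, D.DEPT, T.TOTAL_COUNT;
```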
Labels:
- Apache Hive
- Apache Tez