About sooraj_antony

sooraj_antony · 04-30-2018

I am having a similar error. I am trying to run the HDP sandbox using Docker for the first time. Neither of the files below exists, and /var/log/ doesn't even have an ambari-server folder:
Server out at: /var/log/ambari-server/ambari-server.out
Server log at: /var/log/ambari-server/ambari-server.log

sooraj_antony · 04-27-2018

Repairing the table might help, because it registers the partition directories that exist in HDFS with the metastore. Try this:
MSCK REPAIR TABLE parti;
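A minimal Python sketch of the idea, assuming the default behaviour in which MSCK REPAIR TABLE only adds partitions that exist as directories under the table location but are missing from the metastore. The directory names and the helper partitions_to_add are hypothetical; this models the bookkeeping only, not Hive's implementation.

```python
# Hypothetical model of what MSCK REPAIR TABLE adds (default "add" behaviour):
# partition directories on HDFS that the metastore does not know about yet.

def partitions_to_add(hdfs_dirs, metastore_partitions):
    """Return partition specs present on HDFS but missing from the metastore."""
    return sorted(set(hdfs_dirs) - set(metastore_partitions))

# Made-up example state, not taken from a real cluster.
hdfs_dirs = ["dt=2018-04-25", "dt=2018-04-26", "dt=2018-04-27"]  # dirs under the table location
metastore = ["dt=2018-04-25"]                                     # partitions Hive already knows

missing = partitions_to_add(hdfs_dirs, metastore)
print(missing)  # ['dt=2018-04-26', 'dt=2018-04-27']
```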

sooraj_antony · 03-12-2018

I am joining a VIEWS table with a temp table, with the parameters below intentionally enabled:
hive.auto.convert.join=true;
hive.execution.engine=tez;
The code snippet is:
CREATE TABLE STG_CONVERSION AS
SELECT CONV.CONVERSION_ID, CONV.USER_ID, TP.TIME, CONV.TIME AS ACTIVITY_TIME,
       TP.MULTI_DIM_ID, CONV.CONV_TYPE_ID, TP.SV1
FROM VIEWS TP
JOIN SCU_TMP CONV ON TP.USER_ID = CONV.USER_ID
WHERE TP.TIME <= CONV.TIME;
Normally, both tables can have any number of records, but in the SCU_TMP table only 10-50 records are expected per user ID. In some cases, however, a couple of user IDs come with around 10k-20k records in the SCU_TMP table, which creates a cross-product effect. In such cases the query runs forever, waiting on just 1 mapper to complete. Is there any way to optimise this so it runs gracefully?
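To illustrate the cross-product effect, here is a small Python sketch (join_pair_counts is a hypothetical helper and the counts are made up) of how many row pairs an inner equi-join on USER_ID produces per key: the product of the matching row counts on each side. In this model, the TP.TIME <= CONV.TIME condition can only discard pairs after they have been formed, so it does not reduce the pairing work.

```python
from collections import Counter

def join_pair_counts(view_user_ids, conv_user_ids):
    """Pairs produced per USER_ID by an inner equi-join, before any
    non-equi filter such as TP.TIME <= CONV.TIME is applied."""
    views = Counter(view_user_ids)
    convs = Counter(conv_user_ids)
    # Only keys present on both sides produce output rows.
    return {uid: views[uid] * convs[uid] for uid in views.keys() & convs.keys()}

# Hypothetical data: "u1" is a typical user, "u2" is one of the skewed IDs.
views = ["u1"] * 1000 + ["u2"] * 1000
convs = ["u1"] * 20 + ["u2"] * 15000

pairs = join_pair_counts(views, convs)
print(pairs["u1"], pairs["u2"])  # 20000 15000000
```

With these made-up numbers the single skewed ID yields 750 times as many pairs as the typical one, and the tasks that process that key's rows have to generate all of them.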

sooraj_antony · 08-27-2016

That's perfect! But I'm still curious how it works fine when the string is taken from a table and the format uses "mm".

sooraj_antony · 08-26-2016

The query below fetches UNIX_TIMESTAMP of the same time string twice, but one is hardcoded and the other is fetched from a table:
select distinct UNIX_TIMESTAMP(TIME, 'yyyy-mm-dd HH:mm:ss'), UNIX_TIMESTAMP('2015-08-22 00:00:32', 'yyyy-mm-dd HH:mm:ss')
from clicks
where time='2015-08-22 00:00:32';
Both fields should give the same result, since the time string is the same. But the output is:
_c0	_c1
1440201632	1421884832
Is there any reason why they differ? Is there any workaround?
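A likely explanation for the hardcoded value: UNIX_TIMESTAMP(string, pattern) uses java.text.SimpleDateFormat pattern letters, where "mm" is minute-of-hour and "MM" is month. With 'yyyy-mm-dd HH:mm:ss', the "08" is read as minutes, then overwritten by the second "mm", and the month is never set, so it falls back to January. The toy Python parser below (lenient_parse and to_unix_utc are hypothetical helpers that handle only fixed-width numeric fields and assume a UTC session time zone) mimics that behaviour; it is not Hive's code, but it reproduces both values in the output.

```python
import calendar
import re

# Subset of java.text.SimpleDateFormat letters: "MM" is month, "mm" is minute.
FIELDS = {"yyyy": "year", "MM": "month", "dd": "day",
          "HH": "hour", "mm": "minute", "ss": "second"}

def lenient_parse(text, pattern):
    """Toy parser for fixed-width numeric patterns.

    Each field token reads the digits at the same offset in `text`; a later
    token for the same field overwrites an earlier one, and fields that are
    never set keep epoch defaults (month 1 = January).
    """
    values = {"year": 1970, "month": 1, "day": 1,
              "hour": 0, "minute": 0, "second": 0}
    pos = 0
    for token in re.findall(r"yyyy|MM|dd|HH|mm|ss|.", pattern):
        if token in FIELDS:
            values[FIELDS[token]] = int(text[pos:pos + len(token)])
            pos += len(token)
        else:
            pos += 1  # literal character such as '-', ' ' or ':'
    return values

def to_unix_utc(v):
    """Seconds since the epoch, interpreting the parsed fields as UTC."""
    return calendar.timegm((v["year"], v["month"], v["day"],
                            v["hour"], v["minute"], v["second"]))

s = "2015-08-22 00:00:32"
print(to_unix_utc(lenient_parse(s, "yyyy-mm-dd HH:mm:ss")))  # 1421884832 (2015-01-22 00:00:32)
print(to_unix_utc(lenient_parse(s, "yyyy-MM-dd HH:mm:ss")))  # 1440201632 (2015-08-22 00:00:32)
```

Under this reading, writing the month as "MM" ('yyyy-MM-dd HH:mm:ss') should make both columns agree.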

sooraj_antony · 12-26-2015

The table is in ORC format and I haven't tried an SMB join, but the process gets stuck at the last reducer and the whole processing runs on a single node while the other nodes are idle. Can you explain the function of the "hive.map.aggr" parameter?

sooraj_antony · 12-26-2015

Table STAGE_SOURCE is already in ORC format.

sooraj_antony · 12-17-2015

@Neeraj Sabharwal @Jean-Philippe Player The explain plan file is attached.

sooraj_antony · 12-13-2015

The query below takes a long time to execute. It runs with the Tez execution engine.
SELECT STG.EMP_TYPE, DEPT, COUNT(DISTINCT EMP_ID) AS COUNT, A.TOTAL_COUNT
FROM STAGE_SOURCE STG
LEFT OUTER JOIN (SELECT EMP_TYPE, COUNT(DISTINCT EMP_ID) AS TOTAL_COUNT
                 FROM STAGE_SOURCE
                 GROUP BY EMP_TYPE) A
ON STG.EMP_TYPE = A.EMP_TYPE
GROUP BY STG.EMP_TYPE, DEPT, A.TOTAL_COUNT;
Is there any rewrite option or optimization strategy that can improve the query performance? The subquery with alias "A" alone takes 2-3 hours to execute. I have attached the explain plan of just the join subquery "A" (explain-plan.txt).
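For reference, a small Python model of what the query returns (query_result is a hypothetical helper, and it assumes EMP_TYPE, DEPT and EMP_ID are never NULL, since the NULL handling of COUNT(DISTINCT) and the join is not modelled). Because subquery A has exactly one row per EMP_TYPE, the join attaches the per-type distinct count to every row without duplicating rows, so each output group carries two distinct counts: one per (EMP_TYPE, DEPT) and one per EMP_TYPE.

```python
from collections import defaultdict

def query_result(rows):
    """rows: iterable of (emp_type, dept, emp_id) tuples, no NULLs assumed.

    Returns {(emp_type, dept): (COUNT, TOTAL_COUNT)}, where COUNT is the number
    of distinct emp_ids in that (emp_type, dept) group and TOTAL_COUNT is the
    number of distinct emp_ids across the whole emp_type (subquery "A").
    """
    per_group = defaultdict(set)
    per_type = defaultdict(set)
    for emp_type, dept, emp_id in rows:  # one pass feeds both levels
        per_group[(emp_type, dept)].add(emp_id)
        per_type[emp_type].add(emp_id)
    return {key: (len(ids), len(per_type[key[0]]))
            for key, ids in per_group.items()}

# Made-up sample rows; employee 1 appears in two departments of type "FT".
rows = [("FT", "HR", 1), ("FT", "HR", 1), ("FT", "IT", 1),
        ("FT", "IT", 2), ("PT", "HR", 3)]
print(query_result(rows))
# {('FT', 'HR'): (1, 2), ('FT', 'IT'): (2, 2), ('PT', 'HR'): (1, 1)}
```

Note that TOTAL_COUNT is not the sum of the per-department counts (1 + 2 = 3 for "FT", versus 2) when an employee appears in more than one department, so it cannot simply be summed from the grouped result.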

Last Visited	08-01-2018 11:20 AM

Member Since	12-10-2015 10:14 AM
Posts	10
Kudos received	3

Cloudera Community

Re: REASON: Ambari Server java process has stopped...

Re: hive partition table issue while copying into ...

Hive + Tez :: A join query stuck at last 2 mappers...

Re: UNIX_TIMESTAMP function returns different valu...

UNIX_TIMESTAMP function returns different values w...

Re: Optimize a long running hive query - has a joi...

Re: Optimize a long running hive query - has a joi...

Re: Optimize a long running hive query - has a joi...

Optimize a long running hive query - has a join wi...