Member since: 05-23-2017
Posts: 28
Kudos Received: 10
Solutions: 1
My Accepted Solutions
Title | Views | Posted |
---|---|---|
26878 | 06-16-2017 12:14 PM |
06-19-2017
07:20 AM
@Sonny Heer So we definitely need a subquery in either case (GROUP BY or windowing). And yes, windowing is much faster than GROUP BY. The simple logic: say you have 1 million rows. GROUP BY will first sort the data and then group by the key, whereas windowing will just sort and give you the first entry. However, if your dataset is not large, you can live with GROUP BY; it will hardly make any difference. Can you please run both queries (windowing and GROUP BY) and check a couple of things:
1. The number of map and reduce tasks for each query.
2. Whether the time difference between the two queries is more than 2 minutes, or almost the same.
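To make the two query shapes concrete, here is a minimal sketch that runs both forms against the same data and confirms they return the same row per key. It uses Python's built-in sqlite3 as a stand-in for Hive, so it says nothing about Hive performance or map/reduce task counts. The table B, the columns k (the key) and b, and the sample rows are all made up for illustration.

```python
import sqlite3

# Assumption: SQLite (3.25+ for window functions) stands in for Hive here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE B (k INTEGER, b TEXT)")
conn.executemany(
    "INSERT INTO B VALUES (?, ?)",
    [(1, "x"), (1, "y"), (2, "z"), (2, "w"), (3, "q")],
)

# GROUP BY form: one row per key, keeping the smallest b.
group_by = conn.execute(
    "SELECT k, MIN(b) AS b FROM B GROUP BY k ORDER BY k"
).fetchall()

# Windowing form: number the rows inside each key, keep the first one.
windowed = conn.execute(
    """
    SELECT k, b FROM (
        SELECT k, b,
               ROW_NUMBER() OVER (PARTITION BY k ORDER BY b) AS row_num
        FROM B
    ) t
    WHERE row_num = 1
    ORDER BY k
    """
).fetchall()

print(group_by)
print(windowed)
```

Both forms need a subquery once other columns are involved, which is the point made above; which one is faster on a real cluster is something only the timing test can show.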
06-16-2017
12:14 PM
Hi @Sonny Heer, So what I understand from your question is that you've got multiple tables, say A, B, C, D, etc., and you're running a query that joins A left join B left join C, etc., and there are multiple entries in tables B, C, D for the key matching A. If this is the case, I would suggest using a windowing function.
Select A.a, B.b, C.c
from A left join
(Select * from
( Select B.b, B.key, ROW_NUMBER() OVER (partition by key) AS row_num from B) t
where row_num=1) B
on A.key = B.key
and so on..
Without an ORDER BY inside the OVER clause, which row gets row_num=1 is arbitrary, so add one if you need a specific row per key.
Try this out and let me know if it was helpful. Cheers, Sagar
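As a rough illustration of the effect, the sketch below compares a plain left join with the deduplicated one. It runs on Python's built-in sqlite3 rather than Hive, and the tables A and B, the column k (standing in for key), the alias b_first, and the sample rows are all hypothetical.

```python
import sqlite3

# Assumption: SQLite (3.25+ for window functions) stands in for Hive here.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE A (k INTEGER, a TEXT);
    CREATE TABLE B (k INTEGER, b TEXT);
    INSERT INTO A VALUES (1, 'a1'), (2, 'a2'), (3, 'a3');
    INSERT INTO B VALUES (1, 'b1'), (1, 'b1-dup'), (2, 'b2');
    """
)

# Plain left join: key 1 matches twice in B, so its A row is duplicated.
plain_join = conn.execute(
    "SELECT A.a, B.b FROM A LEFT JOIN B ON A.k = B.k"
).fetchall()

# Deduplicated join: keep only the first B row per key before joining.
dedup_join = conn.execute(
    """
    SELECT A.a, b_first.b
    FROM A
    LEFT JOIN (
        SELECT k, b FROM (
            SELECT k, b,
                   ROW_NUMBER() OVER (PARTITION BY k ORDER BY b) AS row_num
            FROM B
        ) t
        WHERE row_num = 1
    ) b_first ON A.k = b_first.k
    ORDER BY A.a
    """
).fetchall()

print(len(plain_join), len(dedup_join))
print(dedup_join)
```

The ORDER BY b inside the window is only there to make the kept row predictable in this toy data.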
06-16-2017
08:07 AM
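Hi @Simran Kaur, If you want to use this within the script, you can do the following.
set hivevar:DATE=current_date;
INSERT OVERWRITE DIRECTORY '/user/xyz/reports/oos_table_sales/${DATE}' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' SELECT * FROM outputs.oos_table_sale;
Note that Hive substitutes ${DATE} as plain text, so the variable has to hold the actual date string rather than the function name current_date. One way is to pass the value in when launching the script, for example hive --hivevar DATE=$(date +%Y-%m-%d) -f your_script.hql (your_script.hql is a placeholder).
Cheers, Sagar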
06-12-2017
09:35 PM
Hi @rama, The query below should run faster.
insert into table dropme_master_6
select a.* from dropme_master_5 a
left outer join dropme_master_6 b
on a.consumer_sequence_id = b.consumer_sequence_id
where b.consumer_sequence_id is null;
06-12-2017
01:48 PM
Hi @rama, Please change your query to this:
insert into table dropme_master_6
select a.* from dropme_master_5 a
left outer join dropme_master_6 b
on a.consumer_sequence_id = b.consumer_sequence_id
where b.consumer_sequence_id is null;
I am pretty confident this will improve your performance. Please let me know if it works and give me a thumbs up. 🙂 Regards, Sagar Morakhia
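For anyone who wants to see the pattern in isolation, here is a small sketch of the left-join anti-join: insert only the rows whose key is not yet in the target table. It runs on Python's built-in sqlite3 as a stand-in for Hive; the table names master_5 and master_6, the payload column, and the sample rows are hypothetical. It selects only the left table's columns (a.*) so the column count matches the target.

```python
import sqlite3

# Assumption: SQLite stands in for Hive; tables and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE master_5 (consumer_sequence_id INTEGER, payload TEXT);
    CREATE TABLE master_6 (consumer_sequence_id INTEGER, payload TEXT);
    INSERT INTO master_5 VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO master_6 VALUES (1, 'one');
    """
)

def insert_missing(connection):
    """Copy rows from master_5 whose key has no match in master_6."""
    connection.execute(
        """
        INSERT INTO master_6
        SELECT a.* FROM master_5 a
        LEFT OUTER JOIN master_6 b
          ON a.consumer_sequence_id = b.consumer_sequence_id
        WHERE b.consumer_sequence_id IS NULL
        """
    )

insert_missing(conn)
insert_missing(conn)  # second run finds nothing new to insert

rows = conn.execute(
    "SELECT * FROM master_6 ORDER BY consumer_sequence_id"
).fetchall()
print(rows)
```

Running the insert twice leaves the table unchanged the second time, since every key already has a match.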
06-12-2017
01:48 PM
@Ravi Chinni
insert into some_test_table
select 'c1_val',named_struct('c2_a',array(cast (null as string)),'c2_c',cast (null as string)),array(named_struct('c3_a',cast (null as string) ,'c3_b',cast (null as string))) from z_dummy;
This will work for you. Needless to say, please upvote if the answer was useful. 🙂
06-11-2017
08:48 AM
Can you provide a sample input entry?
06-10-2017
09:53 AM
Hi @rama, First of all, I would suggest killing such jobs after 2 to 2.5 hours, especially when the job finishes in half an hour on a normal day. One probable cause is that another job is using 90+% of the CPU and slowing down your job. If you can provide your entire query, I may be able to suggest a few set parameters that will help it run faster. Cheers,
Sagar
06-10-2017
09:48 AM
Perfect answer!
05-25-2017
12:49 PM
Hi, Nice article. I found a faster way of doing the same in the official Oozie documentation: https://oozie.apache.org/docs/4.1.0/DG_WorkflowReRun.html
Rerunning Oozie jobs is generally an ad-hoc task, and you may not want to create an XML file just to rerun a job. The command-line form is:
oozie job -oozie http://localhost:11000/oozie -rerun 14-20090525161321-oozie-joe -Doozie.wf.rerun.skip.nodes=<>
For example:
oozie job -oozie http://localhost:11000/oozie -rerun 14-20090525161321-oozie-joe -Doozie.wf.rerun.skip.nodes=action1,action2,action3
where
http://localhost:11000/oozie --> the URL where Oozie is running
14-20090525161321-oozie-joe --> your Oozie job ID
action1,action2,action3 --> the steps that you want to skip
It eventually does the same thing as described in the article, but this way we don't have to create the config file. Cheers, Sagar