Member since
05-23-2017
28
Posts
9
Kudos Received
1
Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 16871 | 06-16-2017 12:14 PM |
08-29-2018
11:38 AM
mysql -u root -p hortonworks1
08-17-2018
10:35 AM
Hi @sudhir reddy, I have the same use case and am currently stuck at the same point. Were you able to fix your issue? If yes, could you please reply back?
10-09-2017
11:15 AM
Thanks so much! Worked like a charm!
10-05-2017
07:35 PM
Run Desc <Table_name>; and you'll see that your table has columns like _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, and so on. What you need to do is add aliases to the query that builds your table, so that your columns get their actual names instead of the default _col0, _col1, etc.
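As a minimal sketch of what I mean (table and column names here are made up for illustration), aliasing each expression in the SELECT keeps the real column names in the created table:

```sql
-- Without aliases, Hive names computed columns _col0, _col1, ...
-- With aliases, the created table gets the real names.
CREATE TABLE sales_summary AS
SELECT
  region      AS region,        -- plain columns keep their names
  SUM(amount) AS total_amount,  -- without the alias this would be _col1
  COUNT(*)    AS order_count    -- without the alias this would be _col2
FROM sales
GROUP BY region;
```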
07-07-2017
07:48 AM
@prsingh Hi, you're my lifesaver. I tried checking the logs on the scheduler website but just couldn't find it. Then I used this YARN log command, saved the output into a log.txt file, and there you go: found the exact error I was looking for. Thank you so much. 🙂
07-01-2017
12:53 PM
Try changing the execution engine to MR and run your query. If it works fine, change the execution engine back to Tez. This is how you do it:
set hive.execution.engine=mr;
set hive.execution.engine=tez;
06-21-2017
12:44 PM
@Sonny Heer Apologies, but I didn't get your question. Can you provide me with an example?
06-20-2017
06:25 AM
@Sonny Heer I think you can do that. Instead of this:
Select B.b, B.key, ROW_NUMBER() OVER (partition by key) AS row_num from B
you can use a standard aggregate in the OVER clause, for example:
Select B.b, B.key, COUNT(*) OVER (partition by key) AS key_count from B
Though I am not very sure, the Hive documentation says you can use standard aggregates in the OVER function. Check the link below: Hive Documentation. Cheers, Sagar
06-20-2017
06:19 AM
@Bala Vignesh N V: I have never heard of this. Windowing functions work on only 1 reducer? That's only for Order By, if I am not wrong. Can you share any documentation for this, if you have it?
06-19-2017
09:04 AM
2 Kudos
@Sateesh Mandumula You cannot have transactional properties on top of an external table, as Hive just reads the files present in HDFS; it's not supposed to change the content of external files. If you really want to use transactional properties, create an internal table, load it with select * from the external table, and use transactions on that internal table.
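A minimal sketch of that workflow (table and column names are hypothetical, and it assumes ACID transactions are enabled on the cluster):

```sql
-- Copy the external table's data into a managed (internal) ORC table
-- that supports ACID transactions.
CREATE TABLE events_internal (id INT, payload STRING)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

INSERT INTO TABLE events_internal
SELECT id, payload FROM events_external;

-- Updates and deletes now work on the internal table.
UPDATE events_internal SET payload = 'fixed' WHERE id = 42;
```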
06-19-2017
07:20 AM
@Sonny Heer So we definitely need to use a sub-query in either case (Group By or Windowing). And yes, Windowing is much faster than Group By. The simple logic: say you have 1 million rows; Group By will first sort the data and then group by the key, whereas Windowing will just sort and give you the first entry. However, if your dataset is not large, you can live with Group By; it will hardly make any difference. Can you please try running both queries (Windowing and Group By) and check a couple of things: 1. The number of map tasks / reduce tasks in both queries. 2. Whether the time difference between the two queries is more than 2 minutes, or almost the same.
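For concreteness, here is a hedged sketch of the two variants being compared (table and column names are hypothetical), each picking the earliest row per key:

```sql
-- Variant 1: Group By -- aggregate to find the earliest timestamp per key,
-- then join back to recover the full row.
SELECT e.*
FROM events e
JOIN (SELECT key, MIN(ts) AS min_ts FROM events GROUP BY key) m
  ON e.key = m.key AND e.ts = m.min_ts;

-- Variant 2: Windowing -- same result with ROW_NUMBER in a sub-query.
SELECT key, ts, val
FROM (
  SELECT key, ts, val,
         ROW_NUMBER() OVER (PARTITION BY key ORDER BY ts) AS row_num
  FROM events
) t
WHERE t.row_num = 1;
```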
06-16-2017
12:14 PM
Hi @Sonny Heer, What I understand from your query: you've got multiple tables, say A, B, C, D, etc., and you're running a query joining A left join B left join C, etc., and there are multiple entries in tables B, C, D for the key matching with A. If this is the case, what I would suggest is to use a windowing function:
Select A.a, B.b, C.c
from A left join
(Select * from
( Select B.b, B.key, ROW_NUMBER() OVER (partition by key) AS row_num from B ) t
where t.row_num = 1) B
on A.key = B.key
and so on.
Try this out and let me know if it was helpful. Cheers, Sagar
06-16-2017
08:07 AM
Hi @Simran Kaur If you want to use this within the script, you can do the following:
set hivevar:DATE=current_date;
INSERT OVERWRITE DIRECTORY '/user/xyz/reports/oos_table_sales/${DATE}'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT * FROM outputs.oos_table_sale;
Cheers, Sagar
06-12-2017
09:35 PM
Hi @rama, The query below should make yours run faster. insert into table dropme_master_6
select * from dropme_master_5 a
left outer join dropme_master_6 b
on a.consumer_sequence_id = b.consumer_sequence_id
where b.consumer_sequence_id is null;
06-12-2017
01:48 PM
Hi @rama , Please change your query to this: insert into table dropme_master_6
select * from dropme_master_5 a
left outer join dropme_master_6 b
on a.consumer_sequence_id = b.consumer_sequence_id
where b.consumer_sequence_id is null;
I am pretty confident this will improve your performance. Please let me know if it works and give me a thumbs up. 🙂 Regards, Sagar Morakhia
06-12-2017
01:48 PM
@Ravi Chinni insert into some_test_table
select 'c1_val',named_struct('c2_a',array(cast (null as string)),'c2_c',cast (null as string)),array(named_struct('c3_a',cast (null as string) ,'c3_b',cast (null as string))) from z_dummy;
This will work for you. Needless to say, please upvote if the answer was useful. 🙂
06-12-2017
01:08 PM
Same here, I am having the same issue. 😞
06-11-2017
08:48 AM
Can you provide a sample input entry?
06-10-2017
09:53 AM
Hi @rama, First of all, I would suggest killing such jobs after 2-2.5 hours, especially when your job finishes in half an hour on a normal day. One probable cause could be another job utilizing 90+% of the CPU, slowing down your job. If you can provide your entire query, I may be able to suggest a few set parameters which will help the query run faster. Cheers,
Sagar
06-10-2017
09:48 AM
Perfect answer!
05-25-2017
01:22 PM
@bkosaraju Will try this and update you. Thanks 🙂
05-25-2017
01:02 PM
@Timothy Spann That's something I am already doing, but my problem is with the transfer speed.
05-25-2017
01:00 PM
Thanks @Christophe Vico, I'll try the 1st option and update you.
05-25-2017
12:49 PM
Hi, Nice article. I found a faster way of doing the same in the official Oozie documentation: https://oozie.apache.org/docs/4.1.0/DG_WorkflowReRun.html Generally, rerunning Oozie jobs is an ad-hoc task, and you may not want to create an XML file just for re-running a job. The command-line version goes as below:
oozie job -oozie http://localhost:11000/oozie -rerun 14-20090525161321-oozie-joe -Doozie.wf.rerun.skip.nodes=<>
Example:
oozie job -oozie http://localhost:11000/oozie -rerun 14-20090525161321-oozie-joe -Doozie.wf.rerun.skip.nodes=action1,action2,action3
where:
http://localhost:11000/oozie --> host where Oozie is running
14-20090525161321-oozie-joe --> your Oozie job ID
action1,action2,action3 --> the actions you want to skip
It eventually does the same thing as mentioned in the article, but this way we don't have to create the config file. Cheers, Sagar
05-23-2017
09:27 AM
1 Kudo
Hi All, I am trying to move a SQL Server table to Hive using a Spark program. The table in SQL Server is around 100GB, so it's taking a lot of time (around 2 hours) for the table to be created in Hive. Is there any way we can compress the data while/before fetching the table from SQL Server and decompress it once it reaches Hive? That way, the complete 100GB need not be transferred; only the compressed data (probably 30-40GB) would be transferred, and maybe the job would run faster. Thanks in advance. Regards, Sagar
Labels:
- Apache Hive
- Apache Spark