Member since: 01-12-2016
Posts: 123
Kudos Received: 12
Solutions: 1
My Accepted Solutions
Title | Views | Posted |
---|---|---|
1501 | 12-12-2016 08:59 AM |
01-01-2017 04:58 AM
Hi @Greg Keys, Happy New Year. Could you please help with the two clarifications below?
Clarification 1:
Let us say my input is:
1;(7287026502032012,18);{(706),(707)};{(101200010),(101200011)};{(17286),(17287)};{(oz),(oz1)};2.5
The expression for data_flattened is the same as before. In that case, is my understanding correct, and is the output below correct?
Output:
1;7287026502032012,18;706,707;101200010,101200011;17286,17287;oz,oz1;2.5
Clarification 2:
Let us say my input is:
1;(7287026502032012,18);{(706),(707)};{(101200010),(101200011)};{(17286),(17287)};{(oz),(oz1)};2.5
data_flattened_1 = FOREACH data GENERATE
$0,
FLATTEN($1),
FLATTEN($2),
FLATTEN($3),
FLATTEN($4),
FLATTEN($5),
$6;
With the expression for data_flattened_1 shown above, is my understanding correct, and is the output below correct?
Output:
1;7287026502032012,18;706;101200010;17286;oz;2.5
1;7287026502032012,18;707;101200011;17287;oz1;2.5
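For reference, here is a minimal Python sketch of how FLATTEN behaves as far as the Pig documentation describes it: a flattened tuple is spread into columns, a flattened bag yields one output row per element, and several flattened bags in the same GENERATE are crossed with each other. The helper name flatten_generate and the list/tuple encoding of bags and tuples are assumptions made for illustration only. Under that reading, the record above would expand to 16 rows (2 x 2 x 2 x 2) rather than 2.

```python
from itertools import product

def flatten_generate(fields, flatten_idx):
    """Emulate FOREACH ... GENERATE with FLATTEN on the given positions.

    Encoding (an assumption of this sketch): a Pig tuple is a Python tuple,
    a Pig bag is a Python list of tuples, anything else is a scalar.
    """
    choices = []
    for i, field in enumerate(fields):
        if i in flatten_idx and isinstance(field, list):
            # Flattened bag: one alternative per tuple in the bag.
            choices.append([tuple(t) for t in field])
        elif i in flatten_idx and isinstance(field, tuple):
            # Flattened tuple: its elements become columns of a single row.
            choices.append([field])
        else:
            # Not flattened: keep the field as one column.
            choices.append([(field,)])
    # Multiple flattened bags combine as a cross product.
    return [tuple(v for part in combo for v in part)
            for combo in product(*choices)]

record = (
    1,
    (7287026502032012, 18),
    [(706,), (707,)],
    [(101200010,), (101200011,)],
    [(17286,), (17287,)],
    [("oz",), ("oz1",)],
    2.5,
)
rows = flatten_generate(record, {1, 2, 3, 4, 5})
for row in rows:
    print(";".join(str(v) for v in row))
```

Pig itself does not guarantee the order of the resulting rows, so only the set of rows is meaningful here.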
12-31-2016 05:49 AM
a) Could you please provide the source for this link? It is really useful. b) What about the two steps below? Where do they come in the diagram?
grouping
shuffle and merge: each reducer takes one partition from all map tasks and merges them together
12-30-2016 01:18 PM
1 Kudo
What is the order of execution for a MapReduce job? Is the order below correct? Please correct me if I am wrong.
mapper
partition each mapper output
sorting within each partition based on key
grouping
shuffle and merge: each reducer takes one partition from all map tasks and merges them together
combiner
reducer
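To make the stages concrete, here is a minimal Python sketch of a word count run through the ordering Hadoop is generally described as using: map, partition, sort within each partition, an optional combiner on the map side, then shuffle and merge on the reduce side, grouping by key, and reduce. Note that this places the combiner before the shuffle, not after it. The function name run_job and the toy partitioner are assumptions for illustration; this is a simulation, not Hadoop code.

```python
from collections import defaultdict
from itertools import groupby
import heapq

def key_of(kv):
    return kv[0]

def run_job(splits, num_reducers=2):
    map_outputs = []
    for split in splits:
        # 1. Map: emit (word, 1) for every word in this input split.
        pairs = [(word, 1) for line in split for word in line.split()]
        # 2. Partition: assign each pair to a reducer (toy deterministic hash).
        parts = defaultdict(list)
        for key, value in pairs:
            parts[sum(map(ord, key)) % num_reducers].append((key, value))
        combined = {}
        for p, kvs in parts.items():
            # 3. Sort within each partition by key.
            kvs.sort()
            # 4. Combiner (optional, map side): pre-aggregate per key.
            combined[p] = [(k, sum(v for _, v in grp))
                           for k, grp in groupby(kvs, key=key_of)]
        map_outputs.append(combined)

    results = {}
    for r in range(num_reducers):
        # 5. Shuffle: reducer r fetches partition r from every map task.
        fetched = [m.get(r, []) for m in map_outputs]
        # 6. Merge: merge the already sorted runs into one sorted stream.
        merged = heapq.merge(*fetched)
        # 7. Grouping + 8. Reduce: one reduce call per distinct key.
        for k, grp in groupby(merged, key=key_of):
            results[k] = sum(v for _, v in grp)
    return results

counts = run_job([["a b a"], ["b c"]])
print(counts)
```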
Labels:
- Apache Hadoop
12-29-2016 03:07 PM
Hi @Michael M, regarding the first option: in production, can I place the command below in a shell script and schedule that script with crontab so that Flume runs continuously? In our production environment we are not allowed to run any command manually on the gateway node. Please correct me if I am wrong.
nohup <my_command> &
12-29-2016 02:28 PM
Hi @Michael M, thanks a lot for your time. One small clarification: you mentioned that a good approach is to keep Flume running all the time and to schedule Oozie jobs to process the data whenever needed.
Clarification 1: How do I keep Flume running all the time? Currently I am using the command below on my gateway node.
flume-ng agent --conf $FLUME_CONF_DIR --conf-file $FLUME_CONF_DIR/flume.conf --name Agent7
12-27-2016 03:19 PM
a) I start the Flume agent with the command below. In production, how should this command be triggered? Currently I run it manually at the Unix command prompt, and I also want to create a dependency with Hive. b) Can I place the command below in a Unix shell script and call it from a shell action in Oozie?
flume-ng agent --conf $FLUME_CONF_DIR --conf-file $FLUME_CONF_DIR/flume.conf --name Agent7
12-27-2016 11:52 AM
1 Kudo
We are using Flume to get data into HDFS. After that we run Pig and Hive for data transformation. I am not sure how to trigger Flume from Oozie.
Labels:
- Apache Flume
- Apache Oozie
12-16-2016 03:19 PM
Hi @Eyad Garelnabi, thanks, that answers my question. However, the Oozie textbook says current(0) can be calculated with the formula current(0) = dsII + dsF * (0 + (caNT - dsII) / dsF). What is wrong with my calculation? I am not able to get 2014-10-18T06:00Z with the formula.
12-16-2016 12:21 PM
Hi @Eyad Garelnabi, thanks for the input regarding current(0). I need one clarification. The Oozie textbook gives the following formula:
current(n) = dsII + dsF * (n + (caNT - dsII) / dsF)
current(0) = dsII + dsF * (0 + (caNT - dsII) / dsF)
= 2014-10-06T06:00Z + 3 days * (0 + (2014-10-19T06:00Z - 2014-10-06T06:00Z) / 3 days)
= 2014-10-06T06:00Z + 3 days * (13 days / 3 days)
= 2014-10-06T06:00Z + 13 days = 2014-10-19T06:00Z
However, page 127 of the textbook gives 2014-10-18T06:00Z. I am not sure what I am missing.
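The textbook value is consistent with reading the division in the formula as integer division, that is, counting only whole dataset intervals between dsII and caNT. As far as I know, this is how the Oozie coordinator specification defines current(n). Below is a minimal Python sketch under that assumption; the function name current and the snake_case parameter names are illustrative only, and the values are the ones from the calculation above.

```python
from datetime import datetime, timedelta

def current(n, ds_ii, ds_f, ca_nt):
    """current(n) = dsII + dsF * (n + (caNT - dsII) div dsF).

    The division is integer division: only whole dataset intervals
    that have elapsed since the initial instance are counted.
    """
    whole_intervals = (ca_nt - ds_ii) // ds_f  # timedelta // timedelta -> int
    return ds_ii + ds_f * (n + whole_intervals)

ds_ii = datetime(2014, 10, 6, 6, 0)   # dataset initial instance (dsII)
ds_f = timedelta(days=3)              # dataset frequency (dsF)
ca_nt = datetime(2014, 10, 19, 6, 0)  # coordinator action nominal time (caNT)

# 13 days elapsed, 13 div 3 = 4 whole intervals, 4 * 3 days = 12 days
print(current(0, ds_ii, ds_f, ca_nt).isoformat())   # 2014-10-18T06:00:00
print(current(-1, ds_ii, ds_f, ca_nt).isoformat())  # 2014-10-15T06:00:00
```

With plain division, 3 days * (13 / 3) gives 13 days and lands on 2014-10-19T06:00Z. With integer division, the result snaps back to the most recent dataset instance at or before the nominal time.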