Member since: 10-01-2015
Posts: 3933
Kudos Received: 1150
Solutions: 374
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3485 | 05-03-2017 05:13 PM |
| | 2867 | 05-02-2017 08:38 AM |
| | 3123 | 05-02-2017 08:13 AM |
| | 3089 | 04-10-2017 10:51 PM |
| | 1578 | 03-28-2017 02:27 AM |
02-02-2017
08:04 AM
Yes, all clients are installed on the host.
02-09-2017
03:49 PM
Thanks, Artem. I downloaded the VMware image, uploaded it to S3, created the AMI, and then spun up the cluster from that AMI.
03-06-2017
06:50 AM
I had a similar requirement to submit the job to a particular queue. The only change needed is in one place, the Oozie workflow settings: add one Hadoop property there named "mapred.job.queue.name" ("mapreduce.job.queuename" on Hadoop 2) with the value "YOUR_QUEUE_NAME". This worked for me; my Oozie workflow is now being submitted to the defined queue. Cheers.
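As a rough illustration of where such a property could sit in a hand-written workflow.xml, here is a minimal sketch of an Oozie map-reduce action. It assumes the Hadoop 2 property name mapreduce.job.queuename; the action name mr-node, the transition targets, and the ${jobTracker}, ${nameNode}, and ${queueName} parameters are placeholders for illustration and are not taken from the original workflow.

```xml
<!-- Sketch only: action name, transitions, and ${...} parameters are hypothetical. -->
<action name="mr-node">
    <map-reduce>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <!-- Route the launched job to a specific YARN/MapReduce queue. -->
            <property>
                <name>mapreduce.job.queuename</name>
                <value>${queueName}</value>
            </property>
            <!-- Mapper, reducer, and input/output properties omitted. -->
        </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
</action>
```

With this layout, queueName would typically be supplied in job.properties (for example queueName=YOUR_QUEUE_NAME), so the queue can change without editing the workflow itself.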
01-17-2017
02:58 PM
1. Only once.
2. Use a decision node; see https://www.infoq.com/articles/oozieexample/
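For readers unfamiliar with it, a decision node in an Oozie workflow looks roughly like the sketch below. The original question is not shown here, so the condition is only an assumption: the node name check-input, the targets process-node and end, and the inputDir parameter are all hypothetical.

```xml
<!-- Sketch only: node names and the inputDir parameter are hypothetical. -->
<decision name="check-input">
    <switch>
        <!-- Take this branch when the EL predicate evaluates to true. -->
        <case to="process-node">${fs:exists(inputDir)}</case>
        <!-- Otherwise fall through to the default transition. -->
        <default to="end"/>
    </switch>
</decision>
```

Cases are evaluated in order and the first one that evaluates to true wins, so the default transition acts as the fallback.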
02-01-2017
03:21 AM
1 Kudo
@Vaibhav Kumar
The recommendations from my colleagues are valid: the header row of your CSV document contains strings. You can certainly filter by some known entity, but there is a more advanced version of the CSV Pig loader called CSVExcelStorage. It is part of the Piggybank library that comes bundled with HDP, hence the register command. You can pass different control parameters to it. The Mortar blog is an excellent source of information on working with Pig: http://help.mortardata.com/technologies/pig/csv
grunt> register /usr/hdp/current/pig-client/piggybank.jar;
grunt> a = load 'BJsales.csv' using org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'NO_MULTILINE', 'NOCHANGE', 'SKIP_INPUT_HEADER') as (Num:int,time:int,BJsales:float);
grunt> describe a;
a: {Num: int,time: int,BJsales: float}
grunt> b = limit a 5;
grunt> dump b;
Output:
(1,1,200.1)
(2,2,199.5)
(3,3,199.4)
(4,4,198.9)
(5,5,199.0)
Notice that I am not filtering any relation; I am telling the loader to skip the header outright. It saves a few keystrokes and doesn't waste any cycles processing anything extra.
12-11-2017
03:50 PM
What should the URL or command be when we need to access Hadoop jobs for a specified time duration?
06-08-2018
09:58 AM
@Qi Wang Could you please mention the version of Atlas in which the fix was provided? I am using Atlas 0.8.0 and HDP 2.6.4, and I am facing the same issue. Could you please help?