Member since: 01-12-2016
Posts: 123
Kudos Received: 12
Solutions: 1

My Accepted Solutions

Title | Views | Posted |
---|---|---|
 | 426 | 12-12-2016 08:59 AM |
11-28-2016 04:07 PM
I want to get the distinct exchanges for each symbol. I already have a Pig script for that, but I have some clarifications, which are mentioned in the original post.
11-28-2016 03:55 PM
Hi @Karthik Narayanan, thanks for the input. See my clarifications in the original post; I am looking for inputs on those points.
11-28-2016 01:24 PM
I am new to Pig and any input is really appreciated.
Source file (Exchange,Symbol,date,open,high,low,close,volume,adj_close):
NASDAQ,JDAS,2010-01-29,26.91,27.53,26.02,26.21,883100,26.21
NASDAQ,JDAS,2010-01-28,29.86,27.97,26.84,26.88,1272600,26.88
NASDAQ,JDAS,2010-01-27,27.48,27.93,27.20,27.68,560100,27.68
ICICI,JDAS,2010-02-08,25.41,26.59,25.15,26.46,488900,26.46
ICICI,JDAS,2010-01-29,26.91,27.53,26.02,26.21,883100,26.21
ICICI,JDAS,2010-01-28,27.86,27.97,26.84,26.88,1272600,26.88
NASDAQ,JDAS,2010-01-29,26.91,27.53,26.02,26.21,883100,26.21
NASDAQ,JDAS,2010-01-28,27.86,27.97,26.84,26.88,1272600,26.88
NASDAQ,JDAS,2010-01-27,27.48,27.93,27.20,27.68,560100,27.68
NASDAQ,JDAS,2010-02-08,25.41,26.59,25.15,26.46,488900,26.46
NASDAQ,JDAS,2010-02-05,25.42,25.84,24.94,25.49,1121700,25.49
NASDAQ,JDAS,2010-02-04,26.53,26.61,25.46,25.46,574900,25.46
NASDAQ,JDAS,2009-12-31,25.97,26.13,25.47,25.47,283600,25.47
NASDAQ,JDAS,2009-12-30,25.74,26.25,25.61,26.05,236300,26.05
NASDAQ,JDAS,2009-12-29,25.98,25.98,25.52,25.76,238600,25.76
NASDAQ,JDAS,2009-11-30,23.39,23.65,22.78,23.48,522000,23.48
NASDAQ,JDAS,2009-11-27,23.12,23.71,23.10,23.54,144900,23.54
NASDAQ,JDAS,2009-11-25,23.96,24.00,23.59,23.82,220400,23.82
NASDAQ,JOEZ,2010-01-29,1.68,1.69,1.60,1.60,158900,1.60
NASDAQ,JOEZ,2010-01-28,1.64,1.70,1.61,1.62,250700,1.62
NASDAQ,JOEZ,2010-01-27,1.73,1.76,1.63,1.64,329200,1.64
NASDAQ,JOEZ,2010-01-26,1.70,1.76,1.66,1.70,509100,1.70
NASDAQ,JOEZ,2010-01-25,1.64,1.68,1.60,1.68,169600,1.68
NASDAQ,JOEZ,2010-02-08,1.80,2.04,1.76,1.93,1712200,1.93
NASDAQ,JOEZ,2010-02-05,1.84,1.88,1.70,1.80,1044700,1.80
NASDAQ,JOEZ,2010-02-04,1.96,1.97,1.74,1.88,3758600,1.88
NASDAQ,JOEZ,2010-02-03,1.73,1.79,1.68,1.72,1211700,1.72
NASDAQ,JOEZ,2010-02-02,1.59,1.72,1.51,1.70,909400,1.70
NASDAQ,JOEZ,2009-07-15,1.00,1.05,0.75,0.81,1215200,0.81
NASDAQ,JOEZ,2009-07-14,0.80,0.95,0.80,0.93,580000,0.93
NASDAQ,JOEZ,2009-07-13,0.80,0.83,0.75,0.79,148100,0.79
NASDAQ,JOEZ,2009-05-06,0.56,0.67,0.55,0.58,83800,0.58
NASDAQ,JOEZ,2009-05-05,0.63,0.63,0.58,0.58,68700,0.58
NASDAQ,JOEZ,2009-05-04,0.62,0.68,0.60,0.63,134400,0.63
x = LOAD '/home/prime23/source.txt' USING PigStorage(',') AS (exchange:chararray, symbol:chararray, date:chararray, open:double, high:double, low:double, close:double, volume:long, adj_close:double);
Query: for each symbol, get all distinct exchanges:
y = GROUP x BY symbol;
z1 = FOREACH y {
    t = DISTINCT x.exchange;
    GENERATE group, t;
};
Clarifications:
1) Here we have two symbols (JOEZ, JDAS), so the nested FOREACH will iterate twice. Please correct me if I am wrong.
2) How do I get the schema of the relation t? describe is not working.
3) The last statement is not clear: the relation y contains only the fields (group, x). How can we select the field t, which is not present in y?
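On question 1, the grouping and nested DISTINCT can be simulated in plain Python to see what Pig computes per symbol. This is only a sketch of the semantics, not Pig itself; the rows below are a hand-picked subset of the source file:

```python
from collections import defaultdict

# (exchange, symbol) pairs taken from a few rows of the source file.
rows = [
    ("NASDAQ", "JDAS"), ("ICICI", "JDAS"), ("NASDAQ", "JDAS"),
    ("NASDAQ", "JOEZ"), ("NASDAQ", "JOEZ"),
]

# GROUP x BY symbol: one bag of rows per symbol.
groups = defaultdict(list)
for exchange, symbol in rows:
    groups[symbol].append(exchange)

# Nested FOREACH: DISTINCT x.exchange inside each group's bag.
# The loop body runs once per group (here: twice, for JDAS and JOEZ).
distinct_exchanges = {sym: sorted(set(exs)) for sym, exs in groups.items()}

print(distinct_exchanges)
```

So yes, with two symbols the nested block executes twice, once per group, and each execution sees only that group's bag.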
11-25-2016 10:34 AM
grunt> uniqcnt_1 = foreach uniq_sym_1 generate COUNT(uniq_sym_1.symbol);
2016-11-25 00:35:49,902 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse:
<line 10, column 46> Invalid scalar projection: uniq_sym_1
Details at logfile: /home/naresh/Work1/hadoop-1.2.1/bin/pig_1480051207097.log
grunt> describe uniq_sym_1;
uniq_sym_1: {{(symbol: bytearray)}}
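For context, the intended computation (count the distinct symbols) can be sketched in plain Python. This only illustrates the semantics with made-up sample lines; it is not a fix for the Pig parse error itself:

```python
# Sample CSV lines in the same column order as the source file.
lines = [
    "NASDAQ,JDAS,2010-01-29,26.91",
    "NASDAQ,JOEZ,2010-01-29,1.68",
    "ICICI,JDAS,2010-02-08,25.41",
]

# Equivalent of DISTINCT on the symbol column followed by COUNT:
unique_symbols = {line.split(",")[1] for line in lines}
count = len(unique_symbols)
print(count)  # two distinct symbols: JDAS and JOEZ
```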
11-23-2016 08:49 AM
I have two questions on FLATTEN and COGROUP.
Doubt 1: student_details.txt
001,Rajiv,Reddy,21,9848022337,Hyderabad
002,siddarth,Battacharya,22,9848022338,Kolkata
003,Rajesh,Khanna,22,9848022339,Delhi
004,Preethi,Agarwal,21,9848022330,Pune
005,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar
006,Archana,Mishra,23,9848022335,Chennai
007,Komal,Nayak,24,9848022334,trivendram
008,Bharathi,Nambiayar,24,9848022333,Chennai
employee_details.txt
001,Robin,22,newyork
002,BOB,23,Kolkata
003,Maya,23,Tokyo
004,Sara,25,London
005,David,23,Bhuwaneshwar
006,Maggy,22,Chennai
grunt> cogroup_data = COGROUP student_details1 by age, employee_details by age;
grunt> dump cogroup_data;
(21,{(1,Rajiv,Reddy,21,9848022337,Hyderabad),(4,Preethi,Agarwal,21,9848022330,Pune)},{})
(22,{(2,siddarth,Battacharya,22,9848022338,Kolkata),(3,Rajesh,Khanna,22,9848022339,Delhi)},{(1,Robin,22,newyork ),(6,Maggy,22,Chennai)})
(23,{(5,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar),(6,Archana,Mishra,23,9848022335,Chennai)},{(2,BOB,23,Kolkata ),(3,Maya,23,Tokyo ),(5,David,23,Bhuwaneshwar )})
(24,{(7,Komal,Nayak,24,9848022334,trivendram),(8,Bharathi,Nambiayar,24,9848022333,Chennai)},{})
(25,{},{(4,Sara,25,London )})
join_cogroup = FOREACH cogroup_data GENERATE group, FLATTEN(student_details1);
I also need the record (25,{},{(4,Sara,25,London )}) in the output of join_cogroup. How do I get that record?
Doubt 2: grunt> customer_orders = JOIN customers BY id, orders BY customer_id;
I want to do the same thing using COGROUP + FLATTEN. As per the Pig textbook, "cogroup plus foreach, where each bag is flattened, is equivalent to a join, as long as there are no null values in the keys." I tried the statement below but am not getting the required output:
join_cogroup = FOREACH cogroup_data GENERATE group, FLATTEN(student_details1);
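The textbook equivalence, and why the age-25 record disappears, can be seen in a small Python simulation (a sketch over simplified rows, not Pig): flattening pairs every tuple of one bag with every tuple of the other, and an empty bag therefore contributes no rows, which is exactly inner-join behaviour.

```python
from collections import defaultdict

# Simplified (id, name, age) rows from the two files.
students = [(2, "siddarth", 22), (3, "Rajesh", 22), (7, "Komal", 24)]
employees = [(1, "Robin", 22), (6, "Maggy", 22), (4, "Sara", 25)]

# COGROUP both relations by age: one pair of bags per key.
cogrouped = defaultdict(lambda: ([], []))
for row in students:
    cogrouped[row[2]][0].append(row)
for row in employees:
    cogrouped[row[2]][1].append(row)

# FOREACH ... GENERATE FLATTEN(bag1), FLATTEN(bag2):
# the cross product of the two bags per key; an empty bag yields nothing.
joined = [
    (key, s, e)
    for key, (sbag, ebag) in cogrouped.items()
    for s in sbag
    for e in ebag
]

# Age 24 (students only) and age 25 (employees only) produce no output,
# matching JOIN's inner-join behaviour.
print(sorted(joined))
```

So to keep (25,{},{(4,Sara,25,London )}) in the output, an inner join (flattening both bags) cannot produce it; that record only survives under outer-join-style handling of empty bags.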
11-21-2016 10:29 AM
Hi @Greg Keys. After adding USING PigStorage() AS (str:chararray); the issue is resolved. Thanks for your valuable time.
11-18-2016 05:20 PM
Hi @Greg Keys, thanks for the input; it is always appreciated. One clarification: I would expect the warning during the FILTER statement below, but why did I get the warning during the LOAD statement? In the LOAD statement I am not converting bytearray to chararray, so why the warning there?
filter_records = FILTER ya BY $0 MATCHES '.*Hadoop.*';
11-18-2016 09:44 AM
Below is my source in HDFS (directory /abc/):
Hadoop is an open source
MR is to process data in hadoop. Hadoop has a good eco system.
I want to do the below operation:
filter_records = FILTER ya BY $0 MATCHES '.*Hadoop.*';
but the LOAD command is unsuccessful. Could anybody provide input on the LOAD statement?
grunt> ya = load '/abc/' USING TextLoader();
2016-11-17 21:00:14,470 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 1 time(s).
grunt> yab = load '/abc/';
2016-11-17 21:00:50,199 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 1 time(s).
grunt>
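Separately from the LOAD warning, it may help that Pig's MATCHES performs a whole-string, case-sensitive regex match (Java String.matches semantics, as I understand the Pig docs). A Python sketch of what the filter above would keep, using the two sample lines and re.fullmatch as the closest Python analogue:

```python
import re

lines = [
    "Hadoop is an open source",
    "MR is to process data in hadoop. Hadoop has a good eco system.",
]

# '.*Hadoop.*' must match the whole line, and the pattern is case-sensitive:
# a line containing only lowercase "hadoop" would not qualify.
kept = [line for line in lines if re.fullmatch(r".*Hadoop.*", line)]
print(kept)  # both sample lines contain a capitalized "Hadoop"
```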
11-17-2016 08:00 AM
Hi @Josh Elser. I got the answer with the Unix command mentioned below. Command:
echo "scan 'emp'" | $HBASE_HOME/bin/hbase shell | awk -F'=' '{print $2}' | awk -F ':' '{print $2}' | awk -F ',' '{print $1}'
11-16-2016 01:13 PM
How do we get the complete list of columns that exist in a column family? Let us assume I have 5 columns in the 'personal data' column family and 4 columns in the 'professional data' column family:
create 'emp', 'personal data', 'professional data'
11-16-2016 09:51 AM
I tried the below command but it is not changing the permissions to 777; it is changing them to rw-rw-rw-:
hadoop fs -chmod 777 -R /vamsi/part-m-00003
11-15-2016 02:01 PM
OK. In that case, is the remaining 768 MB (256*3) - 30 MB = 738 MB used for other files, or is it wasted memory?
11-15-2016 01:23 PM
If a file of size 10 MB is copied onto HDFS with a block size of 256 MB, how much storage will be allocated to the file on HDFS?
Let us assume the replication factor is 3, so it will occupy (256 MB)*3 on HDFS. Please correct me if I am wrong.
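As I understand HDFS, blocks are not padded: a block only occupies as much disk as the data written into it, and the 256 MB block size is just an upper bound on block length. So a 10 MB file stores 10 MB per replica, i.e. 10 MB x 3 of disk, not 256 MB x 3. A quick sketch of both calculations:

```python
file_mb = 10
block_mb = 256
replication = 3

# A block only consumes as much disk as the data actually written into it.
actual_disk_mb = min(file_mb, block_mb) * replication

# The (incorrect) assumption that a whole block is reserved per replica:
assumed_disk_mb = block_mb * replication

print(actual_disk_mb)   # 30 MB actually used on disk
print(assumed_disk_mb)  # 768 MB under the whole-block assumption
```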
11-14-2016 09:05 AM
part-m-00003 is a file, not a directory, and -R is to change all files in a directory. Please correct me if I am wrong.
hadoop fs -chmod 777 -R /vamsi/part-m-00003
11-14-2016 08:36 AM
1 Kudo
a) I want to change the permissions of the file part-m-00001 to 777. The owner of this file is naresh. The first two commands with sudo show "command not found", whereas the command hadoop fs -chmod 777 /vamsi/part-m-00003 changes the permissions to rw-rw-rw-, but I want to change them to 777 (rwxrwxrwx).
naresh@ubuntu:~/Work1/hadoop-1.2.1/bin$ sudo -u root hadoop fs -chmod 777 /vamsi/part-m-00001
sudo: hadoop: command not found
naresh@ubuntu:~/Work1/hadoop-1.2.1/bin$ sudo -u naresh hadoop fs -chmod 777 /vamsi/part-m-00001
sudo: hadoop: command not found
naresh@ubuntu:~/Work1/hadoop-1.2.1/bin$ hadoop fs -chmod 777 /vamsi/part-m-00003
Warning: $HADOOP_HOME is deprecated.
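For reference, 777 is an octal mode (rwx for user, group, and other), while rw-rw-rw- is 666; the two differ only in the three execute bits. A small Python sketch on a local scratch file illustrates the POSIX mode arithmetic (this is about permission bits generally, not about HDFS behaviour):

```python
import os
import stat
import tempfile

# 0o777 = rwxrwxrwx, 0o666 = rw-rw-rw-: they differ by the three execute bits.
assert 0o777 - 0o666 == 0o111

# Create a throwaway local file and set mode 777 on it.
with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name

os.chmod(path, 0o777)
mode = stat.S_IMODE(os.stat(path).st_mode)
print(oct(mode))  # expect 0o777 on a typical local filesystem
os.remove(path)
```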
11-10-2016 03:50 PM
Thanks for the input, but I still need clarification on the point below. The manual says: "A lateral view first applies the UDTF to each row of base table and then joins resulting output rows to the input rows to form a virtual table having the supplied table alias." Applying the UDTF to each row means it will return 1, 2, 3 and 3, 4, 5, but the part about joining the resulting output rows to the input rows is not clear. How does it join? How will 1 match with [1,2,3]?
Source: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView
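As I read that sentence, the "join" is positional rather than key-based: each output row of the UDTF is paired back with the exact input row that produced it, so 1 is never matched against [1,2,3] by equality; it is simply emitted alongside its own source row. A Python sketch of explode plus lateral view, with rows shaped like the wiki's pageAds example (column names and values here are illustrative):

```python
rows = [
    {"pageid": "front_page", "adid_list": [1, 2, 3]},
    {"pageid": "contact_page", "adid_list": [3, 4, 5]},
]

# LATERAL VIEW explode(adid_list) adTable AS adid:
# each input row is repeated once per element of its own array.
virtual_table = [
    {"pageid": row["pageid"], "adid": adid}
    for row in rows
    for adid in row["adid_list"]
]

for rec in virtual_table:
    print(rec["pageid"], rec["adid"])
# front_page appears with adids 1, 2, 3; contact_page with 3, 4, 5.
```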
11-10-2016 08:24 AM
Hi @Scott Shaw. 1) I can get the collapsed version using the above URL; it is working, thanks. But how will we do this in a prod environment? It is manual work; is there any way to automate it? 2) Regarding https://github.com/rcongiu/Hive-JSON-Serde: initially I did not see any target folder in json-serde or json-udf. I ran all the steps as you mentioned previously, and now many jar files are present in the target directories of json-serde and json-udf. I am not sure which one to use; both screenshots are attached with this comment. Could you please guide me on which jar file to use? Please correct me if I missed anything. json-serde target: json-udf target:
11-09-2016 03:21 PM
1) Thanks for the inputs; I will try and let you know. 2) One more clarification: how do I get the collapsed version? For a small JSON document I can do it by hand, but let us assume a big JSON document is present. How do I get the collapsed version if the source file is in the below format?
{
"DocId": "ABC",
"User": {
"Id": 1235,
"Username": "fred1235",
"Name": "Fred",
"ShippingAddress": {
"Address1": "456 Main St.",
"Address2": "",
"City": "Durham",
"State": "NC"
}
}
}
{
"DocId": "ABC",
"User": {
"Id": 1236,
"Username": "larry1234",
"Name": "Larry",
"ShippingAddress": {
"Address1": "789 Main St.",
"Address2": "",
"City": "Durham",
"State": "NC",
"PostalCode": "27713"
},
"Orders": [
{
"ItemId": 1111,
"OrderDate": "11/11/2012"
},
{
"ItemId": 2222,
"OrderDate": "12/12/2012"
}
]
}
}
Collapsed version: {"DocId":"ABC","User":{"Id":1235,"Username":"fred1235","Name":"Fred","ShippingAddress":{"Address1":"456 Main St.","Address2":"","City":"Durham","State":"NC"}}}
{"DocId":"ABC","User":{"Id":1236,"Username":"larry1234","Name":"Larry","ShippingAddress":{"Address1":"789 Main St.","Address2":"","City":"Durham","State":"NC","PostalCode":"27713"},"Orders":[{"ItemId":1111,"OrderDate":"11/11/2012"},{"ItemId":2222,"OrderDate":"12/12/2012"}]}}
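Collapsing can be automated rather than done by hand. A Python sketch, assuming the input is a stream of concatenated pretty-printed JSON documents like the file above: json.JSONDecoder.raw_decode consumes one document at a time, and json.dumps with compact separators re-emits each on a single line (the two small sample documents here are abbreviated for illustration).

```python
import json

pretty = """
{
  "DocId": "ABC",
  "User": {
    "Id": 1235,
    "Name": "Fred"
  }
}
{
  "DocId": "ABC",
  "User": {
    "Id": 1236,
    "Name": "Larry"
  }
}
"""

def collapse(text):
    """Yield each top-level JSON document re-serialized on one line."""
    decoder = json.JSONDecoder()
    text = text.strip()
    idx = 0
    out = []
    while idx < len(text):
        obj, end = decoder.raw_decode(text, idx)
        out.append(json.dumps(obj, separators=(",", ":")))
        # Skip whitespace between concatenated documents.
        while end < len(text) and text[end].isspace():
            end += 1
        idx = end
    return out

for line in collapse(pretty):
    print(line)
```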
11-09-2016 02:01 PM
1 Kudo
I am trying JSON with Hive using the link below: http://thornydev.blogspot.in/2013/07/querying-json-records-via-hive.html But I do not find any jar file at this link: https://github.com/rcongiu/Hive-JSON-Serde
- Tags:
- Data Processing
- Hive
11-08-2016 12:27 PM
a) What is the source data for the tables test_csv_serde_using_CSV_Serde_reader and test_csv_serde? Which one should I consider? b) Is it the same for both tables?
Option 1 :
col1 col2 col3
----------------------
121 Hello World 4567
232 Text 5678
343 More Text 6789
Option 2 :
121|Hello World|4567|
232|Text|5678|
343|More Text|6789|
Option 3 :
121 'Hello World' 4567
232 Text 5678
343 'More Text' 6789
11-07-2016 03:58 PM
2 Kudos
a) I am trying to understand the below query from https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView
SELECT pageid, adid FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid;
I know about explode, but I did not understand the output of the above query. Could anybody elaborate on LATERAL VIEW here? The manual says: "A lateral view first applies the UDTF to each row of base table and then joins resulting output rows to the input rows to form a virtual table having the supplied table alias." Applying the UDTF to each row means it will return 1, 2, 3 and 3, 4, 5, and then it joins the resulting output rows to the input rows to form a virtual table? That part is not clear.
11-04-2016 01:34 PM
Where can I get the jar file for com.ibm.spss.hive.serde2.xml.XmlSerDe?
11-03-2016 11:38 AM
Thanks for the quick response.
0: jdbc:hive2://hdp224.local:10000/default> !sh hdfs dfs -ls /
is how to run the HDFS commands. How do I run Linux shell commands from beeline? I mean, how would I run the below command in beeline?
hive> !pwd;
11-03-2016 11:16 AM
1) What is the difference between these two commands? To get the table names in a database, which one should I use?
0: jdbc:hive2://> show tables;
or
0: jdbc:hive2://> !tables
2) How do I run HDFS and Unix commands in beeline?
11-03-2016 10:54 AM
a) Thanks for the input. When I use the below SQL statement on table t2, it will display the data present in /apps/hive/warehouse/default.db/t2. Please correct me if I am wrong.
select * from t2;
b) When I use the below SQL statement on table t4, it will display the data present in /apps/hive/warehouse/default.db/t4/b=1. Please correct me if I am wrong.
select * from t4 where b='1';
If the answer is yes for both questions, how do I load the data into the above paths?