Member since: 08-03-2019
Posts: 186
Kudos Received: 34
Solutions: 26
My Accepted Solutions
Views | Posted |
---|---|
1908 | 04-25-2018 08:37 PM |
5832 | 04-01-2018 09:37 PM |
1545 | 03-29-2018 05:15 PM |
6656 | 03-27-2018 07:22 PM |
1947 | 03-27-2018 06:14 PM |
03-17-2018 05:37 AM
@bernie zhang Can you try logging in on the Ambari login page with raj_ops as both the username and the password?
03-15-2018 05:48 PM
This may be a silly question, but do you have any data at all in ngmss.company? Are you able to see it via Hive?
03-15-2018 05:47 PM
When you say you want to connect to the MySQL service, do you mean you want to log in to the MySQL shell, or are you trying to connect to MySQL over HTTP?
03-14-2018 04:30 PM (1 Kudo)
There are a couple of issues I can see with your script.

Your first statement reads the data from the file:

my_data = LOAD 'customers.txt' using PigStorage() as (name:chararray, age:int, eye_color:chararray, height:int);

You used PigStorage() without any parameter. If you don't pass a parameter, it uses TAB as the delimiter, but your data file uses a comma. So your LOAD statement should look like this:

my_data = LOAD 'customers.txt' using PigStorage(',') as (name:chararray, age:int, eye_color:chararray, height:int);

This is not actually the problem you are facing, though. In your last statement, where you create the final_data relation, you referred to your columns as:

SUM(brown_eyes) as num_brown_eyes, SUM(blue_eyes) as num_blue_eyes, SUM(green_eyes) as num_green_eyes

This is incorrect. A describe statement explains the schema for you:

grunt> describe by_age;
by_age: {group: int,my_data: {(name: chararray,age: int,eye_color: chararray,height: int)}}

You can see that all the columns are nested inside the my_data bag, so references to them should be made like this:

SUM(my_data.brown_eyes) as num_brown_eyes, SUM(my_data.blue_eyes) as num_blue_eyes, SUM(my_data.green_eyes) as num_green_eyes

This is the same way you already used my_data.height in your code. So your final GENERATE statement should look like this:

final_data = FOREACH by_age GENERATE group as age, COUNT(my_data) as num_people, AVG(my_data.height) as avg_height, SUM(my_data.brown_eyes) as num_brown_eyes, SUM(my_data.blue_eyes) as num_blue_eyes, SUM(my_data.green_eyes) as num_green_eyes;

All in all, your complete script should look like this:

my_data = LOAD 'customers.txt' using PigStorage(',') as (name:chararray, age:int, eye_color:chararray, height:int);
my_data = FOREACH my_data GENERATE name, age, height, (eye_color == 'brown' ? 1 : 0) AS brown_eyes, (eye_color == 'blue' ? 1 : 0) AS blue_eyes, (eye_color == 'green' ? 1 : 0) AS green_eyes;
by_age = group my_data by age;
final_data = FOREACH by_age GENERATE group as age, COUNT(my_data) as num_people, AVG(my_data.height) as avg_height, SUM(my_data.brown_eyes) as num_brown_eyes, SUM(my_data.blue_eyes) as num_blue_eyes, SUM(my_data.green_eyes) as num_green_eyes;

Now that you know what the issues were, you will be able to run your script and avoid those "typos" in the future! Happy coding!
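To see why both the delimiter and the bag references matter, here is a rough Python emulation of what the corrected script computes per age group. It is a sketch under stated assumptions, not Pig itself: the sample rows are made up, and Pig's cast and null handling is only approximated (a field that fails to parse, or is missing, becomes None).

```python
from collections import defaultdict
from typing import Optional

FIELDS = 4  # name, age, eye_color, height, as in the LOAD schema


def to_int(value: Optional[str]) -> Optional[int]:
    """Cast to int, returning None on failure (rough stand-in for Pig's null-on-bad-cast)."""
    if value is None:
        return None
    try:
        return int(value)
    except ValueError:
        return None


def load(lines, delimiter):
    """Approximates: LOAD ... using PigStorage(delimiter) as (name, age, eye_color, height)."""
    rows = []
    for line in lines:
        parts = line.split(delimiter)
        parts += [None] * (FIELDS - len(parts))  # missing fields become null
        name, age, eye_color, height = parts[:FIELDS]
        rows.append((name, to_int(age), eye_color, to_int(height)))
    return rows


def summarize(rows):
    """Group by age and compute the same columns as final_data in the Pig script."""
    groups = defaultdict(list)  # plays the role of: by_age = group my_data by age
    for row in rows:
        groups[row[1]].append(row)
    result = {}
    for age, members in groups.items():
        heights = [h for _, _, _, h in members if h is not None]  # AVG skips nulls
        eyes = [e for _, _, e, _ in members]
        result[age] = {
            "num_people": len(members),
            "avg_height": sum(heights) / len(heights) if heights else None,
            # Same 1/0 flags as the bincond expressions, then summed per group.
            "num_brown_eyes": sum(1 if e == "brown" else 0 for e in eyes),
            "num_blue_eyes": sum(1 if e == "blue" else 0 for e in eyes),
            "num_green_eyes": sum(1 if e == "green" else 0 for e in eyes),
        }
    return result
```

With a comma delimiter, rows such as "alice,30,brown,165" split into four fields as intended. With a tab delimiter, the whole line lands in name while age, eye_color and height come back null, so everything collapses into a single null-age group with zero counts for every eye color.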
03-13-2018 05:57 PM
Hi Jasper, you can use the JoltTransformJSON processor to get this done. Here is how your processor configuration should look. Here is the complete Jolt specification:
[
  {
    "operation": "shift",
    "spec": {
      "agent-submit-time": "agent_submit_time",
      "agent-end-time": "agent_end_time",
      "agent-name": "agent_name",
      "*": {
        "@": "&"
      }
    }
  }
]
Here is a snippet of the output I got using your input. Hope that helps.
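If it helps to reason about what the shift spec is meant to do, here is a minimal Python sketch of its intended effect. It assumes the input is a single flat JSON object and that the "*" entry is meant to copy every other top-level key through under its original name; the function names and the sample record are hypothetical, since your input isn't reproduced here.

```python
import json

# Renames taken from the shift spec above.
RENAMES = {
    "agent-submit-time": "agent_submit_time",
    "agent-end-time": "agent_end_time",
    "agent-name": "agent_name",
}


def shift(record: dict) -> dict:
    """Rename the three hyphenated keys; pass every other key through unchanged (assumed)."""
    return {RENAMES.get(key, key): value for key, value in record.items()}


def shift_json(text: str) -> str:
    """Same transformation on JSON text, the way a flow file's content would arrive."""
    return json.dumps(shift(json.loads(text)))
```

For example, {"agent-name": "a1", "status": "ok"} becomes {"agent_name": "a1", "status": "ok"}.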
03-13-2018 02:43 PM
I guess you need to drop these expressions one at a time, using multiple ReplaceText processors. For example, for the first pattern, you can configure ReplaceText as follows. Similarly, you can set the "Search Value" property of each processor to one of the following expressions:
(?s)(\\\"\[)
(?s)(\\)
Hope that helps!
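To sanity-check the two expressions before wiring up the processors, here is a minimal Python sketch that applies them in order, one re.sub call per ReplaceText processor. It assumes the Replacement Value is an empty string (the screenshot with the actual value isn't shown), and it relies on these particular constructs (the (?s) flag and escaped backslash, quote and bracket) behaving the same in Python's re module as in Java regex. The function name and sample string are hypothetical.

```python
import re

# The two Search Value expressions, in the order the processors would run.
PATTERNS = [
    r'(?s)(\\\"\[)',  # a backslash, a double quote and an opening bracket: \"[
    r'(?s)(\\)',      # any remaining single backslash
]


def drop_patterns(text: str) -> str:
    """Remove each pattern in turn, assuming an empty Replacement Value."""
    for pattern in PATTERNS:
        text = re.sub(pattern, "", text)
    return text
```

Order matters here: if the backslash-only pattern ran first, it would strip the backslash the first pattern depends on, and \"[ would no longer match.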
03-07-2018 10:18 PM
This is not straightforward to implement, though workarounds are available. Have a look at this article on the community, which describes using a stored procedure to do it. This may cost you some performance, but it will get the job done.
03-07-2018 02:32 PM
@Dmitro Vasilenko The only thing your log shows is GC information. Can you please share more details about your query, along with the job logs from YARN?
03-07-2018 02:30 PM
You may want to look at this answer.
03-05-2018 02:46 PM
"i put a csv file into hdfs location and do an alter table to add that new location to the partition". Can you please explain this operation?