Member since 03-07-2018
9 Posts, 0 Kudos Received, 0 Solutions
06-19-2018
01:36 AM
Hi Experts,

Can anyone please help me with the error below? I am writing a small piece of code to understand how UDFs work. The Java code is as follows:

    package UDF;

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    @SuppressWarnings("deprecation")
    public class MyUpper extends UDF {
        public Text evaluate(final Text s) {
            if (s == null) {
                return null;
            }
            return new Text(s.toString().toUpperCase());
        }
    }

    // Path of the above class is: /Hive/src/UDF/MyUpper.java

I created a jar file for this class, test.jar, and added the jar in Hive:

    hive> ADD JAR test.jar;
    Added [test.jar] to class path
    Added resources: [test.jar]

Now I am trying to create a temporary function, upper, from the Java class MyUpper. My command is:

    create temporary function upper as 'Hive/src/UDF/MyUpper.java';

I have also tried the command below:

    create temporary function upper as 'Hive/src/UDF/MyUpper.java';

Both commands give the error below:

    FAILED: Class Hive.src.UDF.MyUpper.java not found
    FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask
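A likely cause, going by the error text: the string after "as" in create temporary function is expected to be a fully qualified Java class name (package plus class), not the path of a source file. With the package declared as UDF, that name would be UDF.MyUpper, so a statement of the form create temporary function my_upper as 'UDF.MyUpper'; is what one would try. Here my_upper is a hypothetical function name, picked to avoid clashing with the built-in upper, and this assumes test.jar contains the compiled UDF/MyUpper.class. The sketch below is plain Java with no Hive dependency; ClassNameDemo and canLoad are illustrative names. It only shows how the JVM's by-name class lookup treats the two kinds of string.

```java
final class ClassNameDemo {

    /** Returns true if a class with the given name can be loaded by name. */
    static boolean canLoad(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // The name does not resolve to a class on the classpath.
            return false;
        }
    }

    public static void main(String[] args) {
        // A fully qualified class name (package.Class) resolves.
        System.out.println(canLoad("java.lang.String"));
        // A source file path is not a class name, in either spelling.
        System.out.println(canLoad("Hive/src/UDF/MyUpper.java"));
        System.out.println(canLoad("Hive.src.UDF.MyUpper.java"));
    }
}
```

The same reasoning would apply to the UDF: the lookup name follows the package statement in the source file, not the directory layout of the project.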
Labels: Apache Hive
03-15-2018
06:52 PM
I still have to check how to redirect the output to a file. The output shows only the airport count; it does not show the country because of data issues.
03-15-2018
06:01 PM
output.jpg
03-15-2018
05:23 PM
Thanks guys for the help. I used the script below. The only modification I made was removing the alias d for the airports table. Can anyone please advise whether keeping the alias gives any advantage, given that I am getting the same result? @Aditya Sirna @Yassine @Nde Gerald Awa

    select c.countryName, c.airportcount
    from (select country as countryName, count(*) as airportcount
          from airports
          group by country) c
    where c.airportcount IN (select max(f.airportcount)
                             from (select count(*) as airportcount
                                   from airports cnt
                                   group by cnt.country) f);
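The query above works in three steps: count airports per country, find the largest count, then keep every country whose count equals it, so tied countries are all returned. As a rough cross-check of that logic outside Hive, here is a plain Java sketch. MaxAirports, countriesWithMostAirports and the sample values are made up for illustration and are not taken from the airports table.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

final class MaxAirports {

    /** Counts rows per country, then keeps every country whose count equals the maximum. */
    static Map<String, Long> countriesWithMostAirports(List<String> countryColumn) {
        // Step 1: select country, count(*) ... group by country
        Map<String, Long> counts = countryColumn.stream()
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
        if (counts.isEmpty()) {
            return counts;
        }
        // Step 2: select max(airportcount)
        long max = Collections.max(counts.values());
        // Step 3: where airportcount IN (max); ties are kept
        return counts.entrySet().stream()
                .filter(e -> e.getValue() == max)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue, (a, b) -> a, TreeMap::new));
    }

    public static void main(String[] args) {
        // Hypothetical values of the country column, one entry per airport row.
        List<String> sample = Arrays.asList("A", "B", "A", "C", "B", "A");
        System.out.println(countriesWithMostAirports(sample));
    }
}
```

On the alias question: when a subquery reads from a single table, an alias on that table is optional, and dropping it should not change the result. It mainly helps readability once several tables or self-joins are involved.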
03-15-2018
05:27 AM
Hi,

I have a table, "airports", in Hive with the schema below:

    airport_id  string
    name        string
    city        string
    country     string
    iata_faa    string
    icao        string
    latitude    string
    longitude   string
    altitude    string
    timezone    string
    dst         string
    tz          string

I want to find the country with the maximum number of airports. I wrote the script below, which gives me the maximum number of airports in a country, but I am not able to print the country name along with it. Can I get some help, please?

    select max(c.airportcount)
    from (select count(airport_id) as airportcount, country
          from airports
          group by country) c;
- Tags:
- Hadoop Core
- Hive
Labels: Apache Hive
03-14-2018
11:34 PM
Thanks a lot Aditya! I executed the corrected code and got the desired output!

    (30,1,6.0,1,0,0)
    (34,2,5.5,2,0,0)
    (35,3,5.333333333333333,1,0,2)
    (38,1,6.0,0,1,0)

Thanks again for explaining the mistake I was making and correcting it!
03-14-2018
08:48 PM
Thanks a lot Aditya and Rahul! I executed the corrected code and got the desired output!

    (30,1,6.0,1,0,0)
    (34,2,5.5,2,0,0)
    (35,3,5.333333333333333,1,0,2)
    (38,1,6.0,0,1,0)

Thanks again for explaining the mistake I was making and correcting it!
03-14-2018
01:19 AM
I have a dataset as below:

    abhi,34,brown,5
    john,35,green,6
    amy,30,brown,6
    Steve,38,blue,6
    Brett,35,brown,6
    Andy,34,brown,6

The layout of the above dataset is: name, age, eye color, height.

For each age group, I want a result that shows how many people there are in total, the average height of all people in that group, and how many people have brown eyes, green eyes and blue eyes. The result should look like this:

    34,2,5.5,2,0,0
    35,2,6.0,1,1,0

and so on. The format of the above result set is: <age>, total no of people in that age, avg height in that age group, no of brown eyes in that age group, no of green eyes in the age group, no of blue eyes in the age group.

My script is as below:

    grunt> my_data = LOAD 'customers.txt' using PigStorage()
    >> as (name:chararray, age:int, eye_color:chararray, height:int);
    grunt> my_data = FOREACH my_data
    >> GENERATE name, age, height,
    >> (eye_color == 'brown' ? 1 : 0) AS brown_eyes,
    >> (eye_color == 'blue' ? 1 : 0) AS blue_eyes,
    >> (eye_color == 'green' ? 1 : 0 ) AS green_eyes;
    grunt> by_age = group my_data by age;
    grunt> final_data = FOREACH by_age GENERATE
    >> group as age,
    >> COUNT(my_data) as num_people,
    >> AVG(my_data.height) as avg_height,
    >> SUM(brown_eyes) as num_brown_eyes,
    >> SUM(blue_eyes) as num_blue_eyes,
    >> SUM(green_eyes) as num_green_eyes;

I get the error below after the last line of the script is executed:

    2018-03-14 00:44:54,181 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025: <line 22, column 8> Invalid field projection. Projected field [brown_eyes] does not exist in schema: group:int,my_data:bag{:tuple(name:chararray,age:int,height:int,brown_eyes:int,blue_eyes:int,green_eyes:int)}.

The schema of the by_age relation clearly shows that it contains the field brown_eyes, so why am I still getting this error, and how can I resolve it, please?
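On the error itself: after GROUP, each row of by_age has only two fields, group and the bag my_data, as the schema in the message shows. brown_eyes lives inside the bag, so in Pig it would normally be referenced through the bag name, for example SUM(my_data.brown_eyes), the same way the script already writes AVG(my_data.height). Separately, PigStorage() with no argument splits on tabs, so a comma-separated file would need PigStorage(','). To make the intended aggregation concrete, here is a plain Java sketch over the six rows posted above. AgeGroupStats, Person and Stats are illustrative names, and this is not Pig code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class AgeGroupStats {

    /** One input row: name, age, eye color, height. */
    static final class Person {
        final String name;
        final int age;
        final String eyeColor;
        final int height;

        Person(String name, int age, String eyeColor, int height) {
            this.name = name;
            this.age = age;
            this.eyeColor = eyeColor;
            this.height = height;
        }
    }

    /** Running totals for one age group. */
    static final class Stats {
        long people;
        long heightSum;
        long brown;
        long blue;
        long green;

        double avgHeight() {
            return (double) heightSum / people;
        }
    }

    /** The six rows listed in the post. */
    static List<Person> sampleRows() {
        return Arrays.asList(
                new Person("abhi", 34, "brown", 5),
                new Person("john", 35, "green", 6),
                new Person("amy", 30, "brown", 6),
                new Person("Steve", 38, "blue", 6),
                new Person("Brett", 35, "brown", 6),
                new Person("Andy", 34, "brown", 6));
    }

    /** Groups rows by age and accumulates count, height sum and eye-color counts. */
    static Map<Integer, Stats> byAge(List<Person> rows) {
        Map<Integer, Stats> groups = new TreeMap<>();
        for (Person p : rows) {
            Stats s = groups.computeIfAbsent(p.age, k -> new Stats());
            s.people++;
            s.heightSum += p.height;
            switch (p.eyeColor) {
                case "brown": s.brown++; break;
                case "blue": s.blue++; break;
                case "green": s.green++; break;
                default: break; // other colors are only counted in the total
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Column order follows the script: age, count, avg height, brown, blue, green.
        for (Map.Entry<Integer, Stats> e : byAge(sampleRows()).entrySet()) {
            Stats s = e.getValue();
            System.out.println("(" + e.getKey() + "," + s.people + "," + s.avgHeight() + ","
                    + s.brown + "," + s.blue + "," + s.green + ")");
        }
    }
}
```

The per-color counters play the role of the 1/0 flags in the script: summing a flag over a group is the same as counting the rows where it is 1.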
- Tags:
- Hadoop Core
- Pig
Labels: Apache Pig