Member since: 03-20-2016
Posts: 21
Kudos Received: 5
Solutions: 0
03-26-2016
05:40 PM
Thanks for the quick response. I am going to use PuTTY to SSH in, so is it correct to type this into the Host Name field in PuTTY: my public DNS for sandbox:8080? I have done that, logged in as root when prompted, and then entered "hadoop" as the password. I am now getting an access denied message. Any ideas? Thanks!
03-26-2016
05:09 PM
Hi Artem - Apologies as this question might seem elementary but I am very new to Sandbox. Where can I access the terminal? Thank you.
03-26-2016
04:17 PM
Hi all, I have been using Hive on Sandbox for the past few days. It was working fine up until yesterday when I noticed that my queries were taking an unusually long time to run or, more annoyingly, not running at all. On further investigation, I checked the 'History' tab and noticed that there are a large number of queries which are still running. I have been trying to terminate/kill the sessions without success (It will say "stopping" but never turns to killed). I have also tried rebooting and redeploying my VM. Does anyone know how I can stop all running processes in Hive? Thanks in advance.
Labels: Apache Hive
03-24-2016
10:07 PM
Hi - I am having the same problem and have restarted my VM without success. My tables have still not reappeared even though it has been some time. Can anyone help? Thanks
03-24-2016
08:33 PM
Hi there, I am new to HiveQL and am trying to solve the following problem. I have a set of data with user IDs, each with a corresponding score. An example of the kind of data I have is below:

stackdata_clean.owneruserid    stackdata_clean.score
1    5
2    6
3    5
1    4
2    4

I want to find the top 10 users by score. In other words, I want code to make a table like the one below and then pick the 10 users with the highest aggregate score from it:

stackdata_clean.owneruserid    stackdata_clean.score
2    10
1    9
3    5

My table name is stackdata_clean and the code I am trying to use is:

SELECT stackdata_clean.owneruserid,
SUM(stackdata_clean.score) over(PARTITION BY stackdata_clean.owneruserid)
FROM stackdata_clean
GROUP BY stackdata_clean.owneruserid
ORDER BY sum(stackdata_clean.score)DESC LIMIT 10;

I am being returned the following error:

Error while compiling statement: FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies.
Underlying error: org.apache.hadoop.hive.ql.parse.SemanticException: Line 2:20 Invalid column reference 'score' [ERROR_STATUS]

Can anyone help solve this problem? Any help is greatly appreciated! Thanks in advance 🙂
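A possible reading of the error, offered as a sketch and not a confirmed fix: the query mixes a window function (SUM ... OVER) with GROUP BY, and once the rows are grouped by owneruserid the raw score column is no longer available to the window. A plain aggregate may be all that is needed. The snippet below checks that logic on the five sample rows using Python's built-in sqlite3 module as a stand-in for Hive. The alias total_score is introduced here for illustration, and whether the same statement runs unchanged on the Sandbox's Hive version is an assumption that has not been tested.

```python
import sqlite3

# Stand-in for the Hive table, filled with the five sample rows from the post.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stackdata_clean (owneruserid INTEGER, score INTEGER)")
conn.executemany(
    "INSERT INTO stackdata_clean VALUES (?, ?)",
    [(1, 5), (2, 6), (3, 5), (1, 4), (2, 4)],
)

# Plain aggregate: GROUP BY collapses each user to one row, so no OVER clause
# is used. total_score is a hypothetical alias, not a column in the original table.
TOP_USERS_SQL = """
SELECT owneruserid, SUM(score) AS total_score
FROM stackdata_clean
GROUP BY owneruserid
ORDER BY total_score DESC
LIMIT 10
"""

top_users = conn.execute(TOP_USERS_SQL).fetchall()
for user_id, total in top_users:
    print(user_id, total)
```

On the sample data this prints the users in the order 2, 1, 3 with totals 10, 9 and 5, which matches the table the post is asking for.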
Labels: Apache Hive
03-21-2016
07:04 PM
1 Kudo
Ah - understood now. This worked! Thank you 🙂
03-21-2016
06:16 PM
1 Kudo
Brilliant - that works. Thanks!
03-21-2016
11:14 AM
1 Kudo
Hi Artem - thanks for the response. Can you please explain "As long as you use HDP and you have pig client installed on your edgenode"? Is this something additional I need to do/install? I cannot locate the folder "/usr" on HDFS. Thanks for your help, Maeve
03-21-2016
12:14 AM
Hi all, I am new to Sandbox and am trying to run Pig on Microsoft Azure. To load one of my tables, I need to use the piggybank jar. I have downloaded this and saved it to HDFS in the path tmp/stackexchange. Here is the code I am trying to run:

REGISTER /tmp/stackexchange/piggybank.jar
RAW_LOGS1 = LOAD Query_1-50000.csv USING org.apache.pig.piggybank.storage.CSVExcelStorage(',', YES_MULTILINE) as (Id:Long, PostTypeID:chararray, AcceptedAnswerID:chararray, ParentID:chararray, CreationDate:chararray, DeletionDate:chararray, Score:long, ViewCount:long, Body:chararray, OwnerUserID:chararray, OwnerDisplayName:chararray, LastEditorUserId:chararray, LastEditorDisplayName:chararray, LastEditDate:chararray, LastActivityDate:chararray, Title:chararray, Tags:chararray, AnswerCount:int, CommentCount:int, FavoriteCount:int, ClosedDate:chararray, CommunityOwnedDate:chararray);

However, I am being returned the error message:

2016-03-20 17:22:48,506 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 101: file '/tmp/stackexchange/piggybank.jar' does not exist.

Does anyone know what could be wrong? Am I missing a step required to register the piggybank file perhaps? Any help is greatly appreciated - thanks in advance.
Labels: Apache Pig