About maeve_ryan226

maeve_ryan226 · ‎03-26-2016

Thanks for the quick response. I am going to use PuTTY to SSH in - so is it correct to enter my public DNS for the sandbox followed by :8080 in the Host Name field in PuTTY? I have done that, logged in as root when prompted, and then entered "hadoop" as the password. I am now getting an "access denied" message. Any ideas? Thanks!

maeve_ryan226 · ‎03-26-2016

Hi Artem - Apologies as this question might seem elementary but I am very new to Sandbox. Where can I access the terminal? Thank you.

maeve_ryan226 · ‎03-26-2016

Hi all, I have been using Hive on Sandbox for the past few days. It was working fine until yesterday, when I noticed that my queries were taking an unusually long time to run or, more annoyingly, not running at all. On further investigation, I checked the 'History' tab and noticed a large number of queries that are still running. I have been trying to terminate/kill the sessions without success (the status says "stopping" but never changes to "killed"). I have also tried rebooting and redeploying my VM. Does anyone know how I can stop all running processes in Hive? Thanks in advance.

maeve_ryan226 · ‎03-26-2016

That worked - thanks a lot!

maeve_ryan226 · ‎03-24-2016

Hi - I am having the same problem and have restarted my VM without success. My tables still have not reappeared, even though it has been some time. Can anyone help? Thanks

maeve_ryan226 · ‎03-24-2016

Hi there, I am new to Hive QL and am trying to solve the following problem. I have a set of data with user IDs, each with a corresponding score. An example of the kind of data I have is below:

stackdata_clean.owneruserid    stackdata_clean.score
1                              5
2                              6
3                              5
1                              4
2                              4

I want to find the top 10 users by score. In other words, I want code to make a table like the one below and then pick the top 10 users with the highest aggregate score from it:

stackdata_clean.owneruserid    stackdata_clean.score
2                              10
1                              9
3                              5

My table name is stackdata_clean and the code I am trying to use is:

SELECT stackdata_clean.owneruserid,
SUM(stackdata_clean.score) over(PARTITION BY stackdata_clean.owneruserid)
FROM stackdata_clean
GROUP BY stackdata_clean.owneruserid
ORDER BY sum(stackdata_clean.score)DESC
LIMIT 10;

I am being returned the following error:

Error while compiling statement: FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies. Underlying error: org.apache.hadoop.hive.ql.parse.SemanticException: Line 2:20 Invalid column reference 'score' [ERROR_STATUS]

Can anyone help solve this problem? Any help is greatly appreciated! Thanks in advance 🙂
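For reference, the aggregation described in this post can usually be written as a plain GROUP BY with SUM, with no window function (the error suggests the OVER clause is clashing with the GROUP BY). The sketch below runs that query against the sample rows from the post, using Python's built-in sqlite3 module purely as a stand-in for Hive. That substitution is an assumption, and the helper name top_users_by_score and the alias total_score are made up for illustration. It has not been run on the Sandbox itself.

```python
import sqlite3


def top_users_by_score(rows, limit=10):
    """Return (owneruserid, total score) pairs, highest total first.

    SQLite is used here only as a stand-in for Hive (an assumption);
    the table and column names follow the post.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stackdata_clean (owneruserid INTEGER, score INTEGER)"
    )
    conn.executemany("INSERT INTO stackdata_clean VALUES (?, ?)", rows)

    # Plain aggregate: one output row per user, no OVER(...) clause.
    query = """
        SELECT owneruserid, SUM(score) AS total_score
        FROM stackdata_clean
        GROUP BY owneruserid
        ORDER BY total_score DESC
        LIMIT ?
    """
    result = conn.execute(query, (limit,)).fetchall()
    conn.close()
    return result


# Sample rows taken from the post.
sample = [(1, 5), (2, 6), (3, 5), (1, 4), (2, 4)]
print(top_users_by_score(sample))  # [(2, 10), (1, 9), (3, 5)]
```

The SELECT statement inside the helper is ordinary SQL, so the same shape (GROUP BY the user ID, ORDER BY the summed score, LIMIT 10) is a reasonable starting point to try in Hive.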

maeve_ryan226 · ‎03-21-2016

Ah - understood now. This worked! Thank you 🙂

maeve_ryan226 · ‎03-21-2016

Brilliant - that works. Thanks!

maeve_ryan226 · ‎03-21-2016

Hi Artem - thanks for the response. Can you please explain "As long as you use HDP and you have pig client installed on your edgenode"? Is this something additional I need to do/install? I cannot locate the folder "/usr" on hdfs. Thanks for your help, Maeve

maeve_ryan226 · ‎03-21-2016

Hi all, I am new to Sandbox and am trying to run Pig on Microsoft Azure. To load one of my tables, I need to use the piggybank jar. I have downloaded this and saved it to hdfs in the path tmp/stackexchange. Here is the code I am trying to run:

REGISTER /tmp/stackexchange/piggybank.jar
RAW_LOGS1 = LOAD Query_1-50000.csv USING org.apache.pig.piggybank.storage.CSVExcelStorage(',', YES_MULTILINE) as (Id:Long, PostTypeID:chararray, AcceptedAnswerID:chararray, ParentID:chararray, CreationDate:chararray, DeletionDate:chararray, Score:long, ViewCount:long, Body:chararray, OwnerUserID:chararray, OwnerDisplayName:chararray, LastEditorUserId:chararray, LastEditorDisplayName:chararray, LastEditDate:chararray, LastActivityDate:chararray, Title:chararray, Tags:chararray, AnswerCount:int, CommentCount:int, FavoriteCount:int, ClosedDate:chararray, CommunityOwnedDate:chararray);

However, I am being returned the error message:

2016-03-20 17:22:48,506 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 101: file '/tmp/stackexchange/piggybank.jar' does not exist.

Does anyone know what could be wrong? Am I missing a step required to register the piggybank file perhaps? Any help is greatly appreciated - thanks in advance.

Last Visited	‎04-05-2017 08:47 AM

Member Since	‎03-20-2016 08:58 PM
Posts	21
Kudos received	5

Cloudera Community

Re: How to shut down/kill all queries in Hive

Re: How to shut down/kill all queries in Hive

How to shut down/kill all queries in Hive

Re: Hive QL - Aggregating within a group

Re: H100 Unable to submit statement show databases...

Hive QL - Aggregating within a group

Re: piggybank jar file does not exist

Re: piggybank jar file does not exist

Re: piggybank jar file does not exist

piggybank jar file does not exist