Member since 03-20-2016 · 21 Posts · 5 Kudos Received · 0 Solutions
04-04-2017 05:47 PM
@amankumbare Please see the attached screenshot. As you can see, 18.8 GB is in use - the same amount as before I deleted the table. I would have expected this to decrease by about 9 GB once I dropped the table and removed it from the trash.
... View more
04-04-2017 02:54 PM
Hi all, I have been using Hive on the sandbox for a college project. Everything was working fine up until yesterday, when I noticed that disk space was running out because the data I am using is rather large. To free up space I dropped a table I no longer need, then went into Files_View/user/admin/.Trash/<table name> and deleted the table from the trash folder. However, even after doing so, HDFS usage is still full and has not reduced at all. I also checked Files_View/Apps/hive/warehouse/<database I'm using>/<table_name> to confirm the table was deleted, and it is gone. Does anyone know how I can permanently delete the table so that the space is actually freed in HDFS? Thanks in advance.
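A minimal sketch of how the space is usually reclaimed, assuming a hypothetical table name (my_large_table); PURGE needs Hive 0.14 or later, and the dfs lines run HDFS shell commands from inside the Hive CLI:

-- Drop the table and bypass the trash entirely (Hive 0.14+).
DROP TABLE IF EXISTS my_large_table PURGE;

-- Checkpoint the trash and permanently remove expired checkpoints;
-- files in .Trash only disappear for good once their checkpoint expires.
dfs -expunge;

-- Confirm that warehouse usage has actually gone down.
dfs -du -h /apps/hive/warehouse;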
04-17-2016 06:59 PM
@Benjamin Leonhardi - This was indeed part of the reason. Thank you very much for your help!
04-17-2016 06:58 PM
@Jitendra Yadav Thanks very much for your help. This issue has been resolved.
04-14-2016 03:54 PM
@Jitendra Yadav Thanks for the quick response! Are these things I can do from the Hortonworks console, or do I need to ssh into the instance? I am new to Hadoop, so apologies if the above question seems elementary!
04-14-2016 03:32 PM
Hi all, I have a question about space allocation within HDFS. I am currently trying to run a large query in Hive (a word split on a large file). However, I am unable to complete it because I keep running out of disk space. I have deleted any unnecessary files from HDFS and reduced my starting disk usage to 38%. However, I am wondering what "non-DFS" usage is, as it appears to be taking up the majority of my disk space. How can I go about reducing the disk space that non-DFS usage takes up? Any help is greatly appreciated. Thanks in advance.
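For context, a hedged sketch of where to look (shown via the Hive CLI's dfs built-in; hdfs dfs from a shell works the same): "non-DFS used" is roughly configured capacity minus DFS used minus DFS remaining, i.e. local disk on the node consumed by files that live outside HDFS (logs, OS files, and so on), so reducing it means cleaning up the local filesystem rather than HDFS:

-- Overall filesystem capacity and usage; non-DFS used is the gap
-- between capacity and (used + available).
dfs -df -h /;

-- Which HDFS directories are actually holding the data.
dfs -du -h /;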
03-29-2016 09:21 AM
Thanks a lot, Benjamin - I did realise after posting the above that I needed a UDF to use the rank function on its own. It's working now, so thank you.
03-28-2016 01:10 AM · 1 Kudo
Hi all, I have a table with the fields user_id and value. I want to order the values in descending order within each user_id and then emit only the top 100 records for each user_id. This is the code I am attempting to use:

DROP TABLE IF EXISTS mytable2
CREATE TABLE mytable2 AS
SELECT * FROM
  (SELECT *, rank(user_id) AS rank
   FROM
     (SELECT * FROM mytable
      DISTRIBUTE BY user_id
      SORT BY value DESC) a) b
WHERE rank < 101
ORDER BY rank;

However, when I run this query, I get the following error:

Error while compiling statement: FAILED: SemanticException [Error 10247]: Missing over clause for function : rank [ERROR_STATUS]

Can anyone help? Thanks in advance.
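For reference, a hedged sketch of a version that compiles (same hypothetical table names as above): rank is a windowing function in Hive, so it needs an OVER clause naming the partitioning and ordering. The DISTRIBUTE BY / SORT BY subquery then becomes unnecessary, since the OVER clause handles both:

-- rank() must be given its partition and ordering in an OVER clause.
DROP TABLE IF EXISTS mytable2;

CREATE TABLE mytable2 AS
SELECT user_id, value
FROM (
  SELECT user_id, value,
         rank() OVER (PARTITION BY user_id ORDER BY value DESC) AS rnk
  FROM mytable
) ranked
WHERE rnk <= 100;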
03-27-2016 10:08 PM
Thanks Scott - my problem is sorted now!
03-27-2016 09:37 PM
Issue is sorted now that I finally know how to use PuTTY! Thanks, all.
03-26-2016 05:40 PM
Thanks for the quick response. I am going to use PuTTY to ssh in - so is it correct to type my public DNS for the sandbox, followed by :8080, into the Host Name field in PuTTY? I have done that, logged in as root when prompted, and then entered "hadoop" as the password. I am now getting an access denied message. Any ideas? Thanks!
03-26-2016 05:09 PM
Hi Artem - Apologies as this question might seem elementary but I am very new to Sandbox. Where can I access the terminal? Thank you.
03-26-2016 04:17 PM
Hi all, I have been using Hive on Sandbox for the past few days. It was working fine up until yesterday, when I noticed that my queries were taking an unusually long time to run or, more annoyingly, not running at all. On further investigation, I checked the 'History' tab and noticed a large number of queries that are still running. I have been trying to terminate/kill the sessions without success (it says "Stopping" but never changes to "Killed"). I have also tried rebooting and redeploying my VM. Does anyone know how I can stop all running processes in Hive? Thanks in advance.
03-26-2016 09:43 AM · 1 Kudo
Hi all, I am trying to perform a version of the word count function in Hive. I have the following fields: Owner_key and Post. I want to split each post into its individual words and then group by Owner_key, giving a count of each word. For example, say this was my data:

Owner_key  Post
1          apple orange apple
2          melon kiwi

I would like the following output:

Owner_key  word    count
1          apple   2
1          orange  1
2          melon   1
2          kiwi    1

The code I have attempted is below. Hive is not necessarily giving me an error message; however, it never shows me any results, even when the status is at 100%. Can anyone help? Thanks in advance.

SELECT owner_key, word, count(*)
FROM stackdata_updtd
LATERAL VIEW explode(split(lower(post), '\\W+')) t1 AS word
GROUP BY owner_key, word;
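For what it's worth, a hedged variant for sanity-checking the query (same table and column names as above): split() on '\\W+' can emit empty strings (for example, when a post starts with punctuation), and a LIMIT keeps the first run cheap:

-- Filter empty tokens and cap output while testing.
SELECT owner_key, word, count(*) AS cnt
FROM stackdata_updtd
LATERAL VIEW explode(split(lower(post), '\\W+')) t1 AS word
WHERE word != ''
GROUP BY owner_key, word
LIMIT 20;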
03-24-2016 10:07 PM
Hi - I am having the same problem and have restarted my VM without success. My tables have still not reappeared, even though it has been some time. Can anyone help? Thanks
03-24-2016 08:33 PM
Hi there, I am new to the Hive QL language and am trying to solve the following problem. I have a set of data with user IDs, each with a corresponding score. An example of the kind of data I have is below:

stackdata_clean.owneruserid  stackdata_clean.score
1                            5
2                            6
3                            5
1                            4
2                            4

I want to find the top 10 users by score. In other words, I want code to build a table like the one below and then pick the top 10 users with the highest aggregate score from it:

stackdata_clean.owneruserid  stackdata_clean.score
2                            10
1                            9
3                            5

My table name is stackdata_clean and the code I am trying to use is:

SELECT stackdata_clean.owneruserid,
       SUM(stackdata_clean.score) OVER (PARTITION BY stackdata_clean.owneruserid)
FROM stackdata_clean
GROUP BY stackdata_clean.owneruserid
ORDER BY SUM(stackdata_clean.score) DESC LIMIT 10;

I am being returned the following error:

Error while compiling statement: FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies. Underlying error: org.apache.hadoop.hive.ql.parse.SemanticException: Line 2:20 Invalid column reference 'score' [ERROR_STATUS]

Can anyone help solve this problem? Any help is greatly appreciated! Thanks in advance 🙂
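For reference, a hedged sketch of a version that compiles (same table and column names as above): the windowed SUM returns one row per input row, which conflicts with the GROUP BY and triggers the windowing error; since one row per user is wanted, a plain aggregate is enough:

-- A plain GROUP BY aggregate; no OVER clause needed.
SELECT owneruserid, SUM(score) AS total_score
FROM stackdata_clean
GROUP BY owneruserid
ORDER BY total_score DESC
LIMIT 10;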
03-21-2016 07:04 PM · 1 Kudo
Ah - understood now. This worked! Thank you 🙂
03-21-2016 06:16 PM · 1 Kudo
Brilliant - that works. Thanks!
03-21-2016 11:14 AM · 1 Kudo
Hi Artem - thanks for the response. Can you please explain "As long as you use HDP and you have pig client installed on your edgenode"? Is this something additional I need to do/install? I cannot locate the folder "/usr" on HDFS. Thanks for your help, Maeve
03-21-2016 12:14 AM
Hi all, I am new to Sandbox and am trying to run Pig on Microsoft Azure. To load one of my tables, I need to use the piggybank jar. I have downloaded it and saved it to HDFS at the path /tmp/stackexchange. Here is the code I am trying to run:

REGISTER /tmp/stackexchange/piggybank.jar;

RAW_LOGS1 = LOAD 'Query_1-50000.csv'
  USING org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'YES_MULTILINE')
  AS (Id:long, PostTypeID:chararray, AcceptedAnswerID:chararray, ParentID:chararray,
      CreationDate:chararray, DeletionDate:chararray, Score:long, ViewCount:long,
      Body:chararray, OwnerUserID:chararray, OwnerDisplayName:chararray,
      LastEditorUserId:chararray, LastEditorDisplayName:chararray,
      LastEditDate:chararray, LastActivityDate:chararray, Title:chararray,
      Tags:chararray, AnswerCount:int, CommentCount:int, FavoriteCount:int,
      ClosedDate:chararray, CommunityOwnedDate:chararray);

However, I am being returned the error message:

2016-03-20 17:22:48,506 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 101: file '/tmp/stackexchange/piggybank.jar' does not exist.

Does anyone know what could be wrong? Am I missing a step required to register the piggybank file, perhaps? Any help is greatly appreciated - thanks in advance.
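A quick check, sketched with the same Hive-CLI dfs built-in used in the examples above (any HDFS client works): confirm the jar actually landed at that HDFS path. Note also, as a hedged possibility, that a bare REGISTER path may be resolved against the local filesystem rather than HDFS, in which case registering the jar with an explicit hdfs:// URI, or pointing REGISTER at a local copy of piggybank.jar, are common workarounds:

-- Verify the jar exists at the expected HDFS path.
dfs -ls /tmp/stackexchange;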