Hi,
If I run a query in Hue that returns a huge number of rows, is it possible to download them through the UI? I tried it with a Hive query and .csv; the download was successful, but it turned out the file had exactly 100000001 rows, while the actual result should be bigger. Is 100 million some kind of limit, and if so, could it be lifted?
I was also thinking about storing the results in HDFS and downloading them through the file browser, but the problem is that when you click "save in HDFS", the whole query runs again from scratch, so effectively you need to run it twice (and I haven't checked whether the result would be stored as one file and whether Hue could download it).
In short, is such a use case possible in Hue?
Created 01-28-2015 07:35 AM
Created 01-28-2015 05:36 AM
Erratum: the file had only 1 million lines, not 100 million.
Created 01-28-2015 06:51 AM
Created 01-28-2015 07:05 AM
But I don't need to see that data in a browser, I just want to download it to my PC...
Created 01-28-2015 07:14 AM
Created 01-28-2015 07:28 AM
I can download gigs of data from google drive or file hosting websites using my browser, why wouldn't it be possible here?
This means my only alternative is to tell users to install Hive and run something like
beeline -u jdbc:hive2://bla:10000 -n user -p password -f yourscript.q > yourresults.txt
which is a bit crap... (not to mention that until Hive 13 beeline doesn't report any progress on the operation). Or let them log in to my server directly and wreak havoc there 😕
All that Hue gives you already is awesome, but it needs to do more!
Created 01-28-2015 07:35 AM
Created 01-28-2015 11:23 AM
I see. Maybe there should also be an option like "execute and save to HDFS", where Hue doesn't dump results to the browser but puts them in one file in HDFS directly, so the user can get it by other means? I recently managed to store results in HDFS and then download a 600 MB csv file using Hue, and it kind of worked (9 million lines, new record). Although a few minutes later the service went down (not sure if because of that, or because I had just started presenting Hue to my boss), so I'm not sure if this would work.
I guess we're going to instruct users to always use a LIMIT clause in their queries, telling them that this is to avoid overloading our servers (which is technically true).
Thanks for your help!
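For anyone landing here with the same "execute and save to HDFS" wish: outside Hue, something similar can be sketched with Hive's INSERT OVERWRITE DIRECTORY plus hdfs dfs -getmerge. This is only a rough sketch under assumptions: the function names, URL, user, password, paths and query are all placeholders, and ROW FORMAT on a directory insert needs a reasonably recent Hive (0.11 or later, as far as I know).

```shell
#!/usr/bin/env bash
# Sketch: write a query result straight to HDFS, then merge the part files
# into one local file. All names and paths here are placeholders (assumptions).

# Both tools can be overridden, e.g. to point at a specific install.
BEELINE="${BEELINE:-beeline}"
HDFS="${HDFS:-hdfs}"

save_to_hdfs() {
  local url="$1" user="$2" pass="$3" hdfs_dir="$4" query="$5"
  # INSERT OVERWRITE DIRECTORY makes Hive write the result as files under
  # hdfs_dir instead of streaming every row back to the client.
  "$BEELINE" -u "$url" -n "$user" -p "$pass" -e \
    "INSERT OVERWRITE DIRECTORY '$hdfs_dir' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' $query"
}

fetch_merged() {
  local hdfs_dir="$1" local_file="$2"
  # The result usually lands as several part files; getmerge concatenates
  # everything in the directory into a single local file.
  "$HDFS" dfs -getmerge "$hdfs_dir" "$local_file"
}

# Usage (commented out so sourcing this file has no side effects):
# save_to_hdfs "jdbc:hive2://hname:10000" bla bla /tmp/export "SELECT * FROM some_table"
# fetch_merged /tmp/export results.csv
```

The query runs only once this way, and the browser never has to hold the rows. Note that OVERWRITE replaces whatever is already in the target directory, so point it at a dedicated path.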
Created 01-28-2015 02:49 PM
Created 02-06-2015 01:40 PM
Got it. We will go this way. Ironically, it turned out that due to some regulatory stuff, downloading raw data from our system shouldn't be too easy, so... we are going for the good old 'it's not a bug, it's a feature' 😉
FYI, I also tried this:
beeline -u jdbc:hive2://hname:10000 -n bla -p bla -f query.q > results.txt
but it didn't do much, it just hung. Maybe hive2 (or beeline?) isn't powerful enough either.
Thanks for all the clarifications!
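A possible follow-up for the beeline attempt above, offered as a sketch rather than a confirmed fix: in many Hive versions beeline buffers the whole result set before printing unless --incremental=true is passed, which could look like a hang on a large result (that this is the cause here is an assumption). The wrapper below also switches the output to plain CSV. The function name, host, user, password and file names are placeholders; csv2 needs Hive 0.14 or later, older versions would have to use csv.

```shell
#!/usr/bin/env bash
# Sketch: export a Hive query result to a local CSV file with beeline.
# Host, credentials and file names are placeholders (assumptions).

# BEELINE can be overridden, e.g. to point at a specific install.
BEELINE="${BEELINE:-beeline}"

export_query() {
  local url="$1" user="$2" pass="$3" script="$4" out="$5"
  # --outputformat=csv2 : plain CSV instead of the ASCII table (Hive 0.14+)
  # --incremental=true  : print rows as they arrive instead of buffering them
  # --silent=true       : keep informational messages out of the output
  "$BEELINE" -u "$url" -n "$user" -p "$pass" \
    --outputformat=csv2 --incremental=true --silent=true \
    -f "$script" > "$out"
}

# Usage (commented out so sourcing this file has no side effects):
# export_query "jdbc:hive2://hname:10000" bla bla query.q results.csv
```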