Member since: 01-19-2015
Posts: 25
Kudos Received: 0
Solutions: 0
12-08-2017
04:37 AM
Hi, AFAIK the `yarn logs` command can be used to view aggregated logs of finished YARN applications. For applications that haven't finished yet, you had to use either the YARN UI or ssh into the node managers. However, on the Hortonworks page I see that their `yarn logs` already works for running apps as well: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_yarn-resource-management/content/ch_yarn_cli_view_running_applications.html Is there any plan/way to make this work on Cloudera as well? Thanks!
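For reference, this is the kind of workflow I mean (the application and container ids below are just placeholders):

# works today, but only after the application has finished and
# log aggregation has completed:
yarn logs -applicationId application_1234567890123_0001
# per the HDP 2.6 docs, the same command can also fetch logs of a
# still-running application, optionally per container:
yarn logs -applicationId application_1234567890123_0001 -containerId container_1234567890123_0001_01_000001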
Labels: Apache YARN
10-08-2017
05:22 PM
Thanks for the info. From my point of view there would be two ways of approaching this problem: 1) provide "read-only" access on a per-service basis, or 2) create some sort of "monitoring" user that can only see non-data-related stuff, i.e. metrics and statuses. As noted above, it looks like the current version can hide Impala queries, but now we are blocked by being able to see Hive queries in the logs (I know they can be redacted, but that's beside the point). I will be looking forward to the granular access features then, which hopefully will solve all of the above - is there a predicted date for the release?
10-05-2017
08:49 AM
Hi, I am trying to figure out how to make Hadoop stream metrics to InfluxDB using the metrics2 framework, but the docs are extremely unclear on how to do it. From what I read about Ganglia, I need to use an Advanced Configuration Snippet to add extra properties to each service I want metrics for (I will put mine in the YARN RM and HDFS NN). Is that it? I believe I also need to put the InfluxDB sink jar (https://github.com/arnobroekhof/hadoop-metrics-influxdb) somewhere on some classpath - any idea how I do that?
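In case it helps to be concrete, I imagine the snippet would carry the usual hadoop-metrics2.properties sink wiring, something like the sketch below - the sink class name and the influxdb.* keys are my guesses, so that project's README should be treated as the authority:

# hypothetical hadoop-metrics2.properties snippet for the NameNode;
# class name and influxdb.* keys are assumptions, check the README
namenode.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
namenode.sink.influxdb.influxdb.url=http://influxhost:8086
namenode.sink.influxdb.influxdb.database=hadoop
*.period=10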
Labels: Apache YARN, Cloudera Manager, HDFS
10-05-2017
06:43 AM
Got quite excited by this, but the customer's devs pointed out that a read-only user can still see Hive logs (redacting them is on their roadmap) and possibly Hive queries, so it's still a no-go 😞 Sigh, looks like next week's project is implementing our own metrics monitor for Hadoop...
10-03-2017
08:12 AM
Hello, our client has a Cloudera Enterprise physical cluster. Our software depends on the cluster to perform work for our client's users. To be able to properly monitor the whole platform and better debug performance issues, we wanted to ask the client to give us read-only access to the Cloudera Manager interface. Unfortunately there is a catch - the data itself is confidential and we shouldn't be able to see it. The read-only user role in CM would do, as all we really want are performance metrics and monitoring. *Unfortunately a read-only user is also able to see the Impala queries being executed* - which can themselves contain sensitive data, e.g. inside literals. That's why we can't be granted it. Would it be possible to restrict read-only users to only a subset of services? In our case we don't care about monitoring Impala at all; we would just want to be able to monitor Hosts, HDFS, YARN and Spark.
Labels: Apache Impala, Cloudera Manager
02-09-2015
09:37 AM
Good read! "Administrators are sometimes surprised that modifying /etc/hadoop/conf and then restarting HDFS has no effect" - oh yes. OK, I think I understand now where the server-side configuration comes from. I can find it in CM, although I still have a bit of a problem finding it on the filesystem. When I go to my namenode I see this: root@node9:/var/run/cloudera-scm-agent/process# ls -qlrt | grep NAME
drwxr-x--x 3 hdfs hdfs 420 Sep 28 12:40 4035-hdfs-NAMENODE
drwxr-x--x 3 hdfs hdfs 420 Sep 28 13:17 4091-hdfs-NAMENODE-refresh
drwxr-x--x 3 hdfs hdfs 420 Sep 28 13:18 4092-hdfs-NAMENODE-monitor-decommissioning
drwxr-x--x 3 hdfs hdfs 420 Sep 28 13:24 4097-hdfs-NAMENODE-refresh
drwxr-x--x 3 hdfs hdfs 420 Sep 28 13:25 4098-hdfs-NAMENODE-monitor-decommissioning
drwxr-x--x 3 hdfs hdfs 420 Sep 28 13:25 4100-hdfs-NAMENODE-refresh
drwxr-x--x 3 hdfs hdfs 420 Sep 28 13:30 4149-hdfs-NAMENODE-refresh
drwxr-x--x 3 hdfs hdfs 420 Sep 28 13:30 4150-hdfs-NAMENODE-monitor-decommissioning
drwxr-x--x 3 hdfs hdfs 420 Sep 28 13:30 4152-hdfs-NAMENODE-refresh
drwxr-x--x 3 hdfs hdfs 420 Sep 28 13:46 4167-hdfs-NAMENODE-createdir
drwxr-x--x 3 hdfs hdfs 420 Sep 28 13:50 4185-hdfs-NAMENODE
drwxr-x--x 3 hdfs hdfs 420 Jan 26 16:19 4785-hdfs-NAMENODE-refresh
drwxr-x--x 3 hdfs hdfs 420 Jan 26 16:19 4787-hdfs-NAMENODE-monitor-decommissioning
So, which of these contains the hdfs-site.xml of my currently running NameNode?
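I guess one way to narrow it down would be to ask the running process itself - a sketch, assuming the NameNode's main class shows up on its command line and that the agent launches it from inside its process directory:

# find the live NameNode's PID via its main class
pid=$(pgrep -f 'org.apache.hadoop.hdfs.server.namenode.NameNode')
# if the process was started from its process dir, this resolves to the
# active NNNN-hdfs-NAMENODE directory
readlink /proc/$pid/cwd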
02-06-2015
01:40 PM
Got it. We will go this way; ironically, it turned out that due to some regulatory stuff, downloading raw data from our system shouldn't be too easy, so... we are going for the good old 'it's not a bug, it's a feature' 😉 FYI, I also tried this: beeline -u jdbc:hive2://hname:10000 -n bla -p bla -f query.q > results.txt but it didn't do much, it just hung. Maybe hive2 (or beeline?) isn't powerful enough either. Thanks for all the clarifications!
02-06-2015
01:08 PM
While I'm here: you could also bold the subjects of unread messages in the inbox (or mark them somehow). I had "2 unread messages", but had no idea which ones they were...
02-06-2015
01:06 PM
Well, this is embarrassing - I just saw in my cluster's Cloudera Manager that permission checking (dfs.permissions) is set to false... That explains everything. HOWEVER: the reason I was confused is that I couldn't see this property set in any of the conf files (grep dfs.permissions /etc/hadoop/conf/*.xml). And according to the documentation the default value is true. Could anyone please let me know where this property gets overridden?
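I guess I could at least check what the running NameNode actually uses, since CM seems to generate per-process configs under the agent's process directory:

# search the generated NameNode configs for the property
grep -r dfs.permissions /var/run/cloudera-scm-agent/process/*-hdfs-NAMENODE/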
01-28-2015
11:23 AM
I see. Maybe then there should also be some option like "execute and save to HDFS", where Hue doesn't dump the results to the browser but puts them in one file in HDFS directly, so the user can get them by other means? I recently managed to store results and then download a 600 MB csv file from HDFS using Hue, and it kind of worked (9 million lines, a new record). Although a few minutes later the service went down (not sure if because of that, or because I had just started presenting Hue to my boss), so I'm not sure this would work reliably. I guess we are going to instruct users to always use a LIMIT clause on their queries, telling them that this is to avoid overloading our servers (which is technically true). Thanks for your help!
01-28-2015
11:13 AM
How? I see there is some "options" and an arrow next to it, but when I click it, it just scrolls me to the top of the page and nothing happens.
01-28-2015
07:28 AM
I can download gigs of data from Google Drive or file hosting websites using my browser, so why wouldn't it be possible here? This means my only alternative is to tell users to install Hive and run something like beeline -u jdbc:hive2://bla:10000 -n user -p password -f yourscript.q > yourresults.txt which is a bit crap... (not to mention that until Hive 13 beeline doesn't report any progress on the operation). Or let them log in to my server directly and wreak havoc there 😕 All that Hue gives you already is awesome, but it needs to do more!
01-28-2015
07:05 AM
But I don't need to see that data in a browser, I just want to download it to my PC...
01-28-2015
05:59 AM
I often make dypos, and then there is no way to correct them other than writing another post....
01-28-2015
05:58 AM
I read the documentation about permissions in HDFS and it says: 1) by default Hadoop checks the output of the Linux "groups" command for a user, and 2) "supergroup" is the default superuser group. My root directory looks like this: drwxr-xr-x - hdfs supergroup 0 2015-01-27 23:08 / So I would assume only hdfs and users belonging to supergroup would be able to create a directory under it. But there is no "supergroup" group on any of my boxes! There is only a "hadoop" one, which contains hdfs, yarn and mapred. And basically any user that I create can execute hdfs commands, like hdfs dfs -mkdir /blabla, and do whatever he wants. The files created will have him set as the owner, and supergroup as the group - even though he belongs neither to "supergroup" nor to "hadoop". How does it work then? And is there some simple way to prevent it and make it work as in the docs, i.e. make Hadoop respect the Linux permissions? (The only access to the cluster is through a box managed by us anyway, so this would be enough.)
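I guess one way to compare the two views of group membership would be something like this, where "someuser" is just a placeholder account:

# what HDFS thinks the user's groups are (by default it shells out to the OS)
hdfs groups someuser
# the Linux-side view on the same host
id -Gn someuser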
Labels: HDFS
01-28-2015
05:34 AM
Hi,
If I run a query in Hue that returns a huge number of rows, is it possible to download them through the UI? I tried it using a Hive query and .csv; the download was successful, but it turned out the file had exactly 100000001 rows, while the actual result should be bigger. Is 100 million some kind of limit - and if so, could it be lifted?
I was also thinking about storing the results in HDFS and downloading them through the file browser, but the problem is that when you click "save in HDFS", the whole query runs again from scratch, so effectively you need to run it twice (and I haven't checked whether the result would be stored as one file and whether Hue could download it).
In short, is such a use case possible in Hue?
Labels: Cloudera Hue
01-27-2015
05:46 AM
Ah, the tarballs section. Thanks!
01-26-2015
01:37 PM
Hi, I want to check which versions of each Hadoop component (i.e. which version of Hive, HDFS, HBase etc.) come with the newest CDH, but I can't seem to find it. I looked in the release notes, but no luck - is it written down anywhere?
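For what it's worth, individual components on a running cluster will report their own versions, but I'd still like the full list up front:

# per-component version checks on a cluster node
hadoop version
hive --version
hbase version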
01-26-2015
06:52 AM
I see. I was asking because in 0.14 they fixed a bug which was apparently introduced in 0.12 or earlier (https://issues.apache.org/jira/browse/PIG-3985) and which I spent a whole day fighting with. I tried to upgrade myself and built 0.14 with the flag for Hadoop 2, but got a lot of warnings and then ant test wasn't passing. Therefore, for now we will stick to the Cloudera-approved 0.12 and just use the workaround described in that JIRA. Thanks for your help!
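For completeness, this is roughly the build I attempted - as far as I understand, -Dhadoopversion=23 is the switch that selects the Hadoop 2 build in Pig's ant script:

# build Pig from the 0.14 source tree against Hadoop 2
ant clean jar -Dhadoopversion=23
# and the test run that didn't pass for me
ant test -Dhadoopversion=23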
01-19-2015
07:06 AM
I need Pig installed on my Ubuntu server; currently it uses the Cloudera distribution: pig_0.12.0+cdh5.3.0+46-1.cdh5.3.0.p0.24~precise-cdh5.3.0_all.deb However I need to upgrade to Pig 0.14, and there doesn't seem to be one there. Any chance of getting it from Cloudera, or do I need to set it up myself? Doing the latter wouldn't be that hard, but I would prefer to keep my environment consistent and easier to maintain.
Labels: Apache Pig