Member since: 09-11-2015
Posts: 115
Kudos Received: 126
Solutions: 15
My Accepted Solutions
Views | Posted
---|---
1059 | 08-15-2016 05:48 PM
759 | 05-31-2016 06:19 PM
868 | 05-11-2016 03:10 PM
522 | 05-10-2016 07:06 PM
1985 | 05-02-2016 06:25 PM
11-12-2015
04:03 PM
The openweathermap example in the Knox Dev Guide looks great as a reference for extending Knox yourself. Do you know where some existing community extensions, like the Falcon or NN/RM UIs, can be found? I checked the Hortonworks Gallery with no luck.
11-12-2015
02:46 PM
1 Kudo
I figured something like haproxy or nginx would work. Ideally I'm looking for an example config, or better yet an example from anyone who has extended Knox with a custom provider.
... View more
11-11-2015
05:05 PM
1 Kudo
Knox 0.6.0 has built-in support for these 7 services:
WebHDFS, WebHCat, Oozie, HBase, Hive, YARN, Storm

Is there a recommended approach to expose other services from the gateway host? Particularly web UIs, such as Ambari & Ranger.
11-11-2015
04:16 PM
3 Kudos
SYMPTOM: For a Capacity Scheduler queue that specifies some groups in its acl_submit_applications property, a user who is not a member of any of those groups is still able to submit jobs to the queue.

ROOT CAUSE: By default the root queue is allow-all, which results in all child queues defaulting to allow-all. The acl_submit_applications property is described as: "The ACL which controls who can submit applications to the given queue. If the given user/group has necessary ACLs on the given queue or one of the parent queues in the hierarchy they can submit applications. ACLs for this property are inherited from the parent queue if not specified."

SOLUTION: Set the root queue to deny-all by entering a single space for the value, then specify who to allow in the ACL for each child queue. For example:

yarn.scheduler.capacity.root.acl_submit_applications=
yarn.scheduler.capacity.root.default.acl_administer_jobs=appdev
yarn.scheduler.capacity.root.default.acl_submit_applications=appdev
yarn.scheduler.capacity.root.system.acl_administer_jobs=dbadmin
yarn.scheduler.capacity.root.system.acl_submit_applications=dbadmin
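To confirm the effective ACLs after refreshing the queues, one option is to run the following as the user in question ('someuser' is a placeholder):

# Run as the user whose access you want to verify; lists the operations
# (e.g. SUBMIT_APPLICATIONS, ADMINISTER_QUEUE) that user may perform per queue
su - someuser -c 'mapred queue -showacls'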
11-11-2015
03:14 PM
Now that the Garbage-First garbage collector is fully supported by Oracle, have we seen anyone using it for production clusters? Is it officially supported by Hortonworks when using Java 8?
11-10-2015
04:56 AM
3 Kudos
SYMPTOM: Attempting to submit a Pig job via WebHCat that defines parameter(s) for substitution as command-line arguments results in an "incorrect usage" message, and the job does not run. Doing the same through Hue results in an "undefined parameter" message, and the job does not run.

ROOT CAUSE: If the parameter is passed to curl as a single argument (-d 'arg=-param paramName=paramValue'), it is interpreted incorrectly by Pig. Submitting the parameter via Hue as a single argument has the same unwanted effect.

WORKAROUND: Pass the parameter as two arguments:

curl -d file=myScript.pig -d 'arg=-param' -d 'arg=paramName=paramValue' 'http://<server>:50111/templeton/v1/pig'

To achieve the same using Hue, pass two arguments in sequence (refer to attached image for an example).

RESOLUTION: WebHCat works as designed. This issue is a limitation of curl. The Hue workaround is good for a single parameter; however, multiple parameters may not work.
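For multiple substitution parameters via curl, repeating the two-argument pattern once per parameter should work (the parameter names and values below are hypothetical):

# Each substitution parameter becomes its own '-param' / 'name=value' pair
curl -d file=myScript.pig \
     -d 'arg=-param' -d 'arg=runDate=20151110' \
     -d 'arg=-param' -d 'arg=region=east' \
     'http://<server>:50111/templeton/v1/pig'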
11-10-2015
02:58 AM
I was mistakenly using the HDP 2.3.0 Sandbox, which uses Ambari 2.1.0. Your advice worked perfectly in the latest version. Thanks!
11-10-2015
02:56 AM
Ambari attempts to determine whether the demo LDAP server supports paged results, which it does not, so it responds with UNAVAILABLE_CRITICAL_EXTENSION. The demo LDAP server in Knox 0.6.0 (HDP 2.3.0) is based on ApacheDS 2.0.0-M15. Support for paged results was added in version 2.0.0-M13 (DIRSERVER-434), so I'm not sure why this wouldn't work. It's unlikely to be solved by configuration though.
11-10-2015
02:56 AM
4 Kudos
Here's a complete guide, thanks to @Paul Codding's advice to disable pagination. Requires HDP Sandbox 2.3.2 or later (Ambari 2.1.1+).

1. In Ambari, start the demo LDAP server (the Knox gateway is not required):
Knox > Service Actions > Start Demo LDAP

2. Follow the Ambari Security Guide to enable LDAP (press Enter for blank values)...

[root@sandbox ~]# ambari-server setup-ldap
Using python /usr/bin/python2.6
Setting up LDAP properties...
Primary URL* {host:port} : sandbox.hortonworks.com:33389
Secondary URL {host:port} :
Use SSL* [true/false] (false): false
User object class* (posixAccount): person
User name attribute* (uid): uid
Group object class* (posixGroup): groupofnames
Group name attribute* (cn): cn
Group member attribute* (memberUid): member
Distinguished name attribute* (dn): dn
Base DN* : dc=hadoop,dc=apache,dc=org
Referral method [follow/ignore] :
Bind anonymously* [true/false] (false): false
Manager DN* : uid=guest,ou=people,dc=hadoop,dc=apache,dc=org
Enter Manager Password* : guest-password
Re-enter password: guest-password
====================
Review Settings
====================
authentication.ldap.managerDn: uid=guest,ou=people,dc=hadoop,dc=apache,dc=org
authentication.ldap.managerPassword: *****
Save settings [y/n] (y)? y
Saving...done
Ambari Server 'setup-ldap' completed successfully.
3. Configure Ambari to disable pagination, then restart Ambari Server:

[root@sandbox ~]# echo "authentication.ldap.pagination.enabled=false" >> /etc/ambari-server/conf/ambari.properties
[root@sandbox ~]# ambari-server restart
4. When Ambari startup completes, the objects in /etc/knox/conf/users.ldif are available in Ambari. Here's a quick reference:

admin / admin-password
guest / guest-password
sam / sam-password
tom / tom-password

Note: LDAP accounts with the same names as local accounts will replace the local accounts. The admin password will now be 'admin-password' instead of 'admin'.

5. To customize the demo LDAP directory:
a. In Ambari: Knox > Service Actions > Stop Demo LDAP
b. Edit /etc/knox/conf/users.ldif (see the example entry below)
c. Start the LDAP server manually (Ambari will overwrite users.ldif):

nohup su - knox -c 'java -jar /usr/hdp/current/knox-server/bin/ldap.jar /usr/hdp/current/knox-server/conf' &
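For reference, a new user entry appended to users.ldif might look like the following (the uid and password are made up for illustration; the attribute layout mirrors the existing demo entries):

# hypothetical user added under ou=people in the demo directory
dn: uid=maria,ou=people,dc=hadoop,dc=apache,dc=org
objectclass: top
objectclass: person
objectclass: organizationalPerson
objectclass: inetOrgPerson
cn: maria
sn: maria
uid: maria
userPassword: maria-password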
6. Synchronize LDAP Users & Groups (see console output below):

[root@sandbox ~]# ambari-server sync-ldap --all
Using python /usr/bin/python2.6
Syncing with LDAP...
Enter Ambari Admin login: admin
Enter Ambari Admin password: admin-password
Syncing all...
Completed LDAP Sync.
Summary:
memberships:
removed = 0
created = 2
users:
updated = 0
removed = 1
created = 3
groups:
updated = 2
removed = 0
created = 0
Ambari Server 'sync-ldap' completed successfully.
11-09-2015
06:13 PM
1 Kudo
Unfortunately config groups are only applicable when the HS2 instances are on different hosts.
11-09-2015
06:12 PM
You can manually start up the second HS2 instance and use --hiveconf to override some of the properties from the standard config.
11-09-2015
04:19 PM
1 Kudo
@Kent Baxley did you have a chance to try using screen before uninstalling Accumulo client? Based on your discovery that redirecting output (sqoop.sh &> /dev/null &) was successful, I would think using screen would also work.
11-06-2015
05:15 PM
Wow, good catch. Unfortunately I'm still getting the same error with pagination disabled, so maybe it's a different feature that ApacheDS doesn't support: REASON: Caught exception running LDAP sync. [LDAP: error code 12 - Unsupport critical control: 1.2.840.113556.1.4.319]; nested exception is javax.naming.OperationNotSupportedException: [LDAP: error code 12 - Unsupport critical control: 1.2.840.113556.1.4.319]; remaining name 'dc=hadoop,dc=apache,dc=org'
11-04-2015
04:59 AM
2 Kudos
After studying the basics of Java GC, it seems like the Serial (default) GC would be best for YARN containers (low core:task ratio), and CMS or G1 would be best for long-running services that occupy more memory (master services and some edge servers).

- Are these assumptions valid?
- What is recommended for worker services?
- Is there any situation in the HDP ecosystem where it's recommended to start with ParallelGC or ParallelOldGC?
- I still hear of people using CMS, but it looks like it is being superseded by G1 as of Java 7+. Is there any reason to choose CMS over G1 when the latter is available?
- Are there additional garbage collectors worth learning about, beyond Serial, Parallel, ParallelOld, CMS, and G1?
11-04-2015
04:14 AM
Are the Tez jobs being submitted to the same queue as the MR jobs? (hive.server2.tez.default.queues, hive.server2.tez.sessions.per.default.queue) How do the Tez container settings compare with the general YARN container settings? (tez.am.resource.memory.mb, tez.am.java.opts, hive.tez.container.size, hive.tez.java.opts)
11-02-2015
08:08 PM
OK, across AWS Regions I understand, but it seems like AZs should have minimal performance impact (latency isn't much higher) and would provide redundancy for HA. Either way, I'm glad to hear feedback from what is seen in the field and from other providers.
11-02-2015
03:32 PM
Oops, #2 is answered on the wiki: "Flag content that is not appropriate. Replying to abusive, off-topic, or inappropriate content only encourages it – whereas flagging allows removal without providing undue attention. To flag a question or answer, click the 'Report' option next to the post. In the dialogue box, select the reason for the flag."
11-02-2015
03:30 PM
Two additional questions regarding best practices for this community:
1. When/how are "available votes" replenished? It looks like I started with 20, and after I spend them they slowly increase over time. This falls under best practices because it should influence how people spend their votes.
2. Is there a tag we're using to indicate "attention needed" before AH becomes publicly visible? More like "let's get a second set of eyes to see if this needs scrubbing" as opposed to an aggressive "Violation!"
11-01-2015
01:04 AM
2 Kudos
Excellent tips, thank you. Is there a guideline for when to add another pair of ZK servers? Cluster size, number of services that use ZK, any services that are particularly demanding, etc?
10-30-2015
06:22 PM
Thanks, hopefully it will save someone the hassle down the road. In the future, please leave this kind of reply as a comment rather than a separate answer.
10-30-2015
06:21 PM
Fixed, thank you
10-30-2015
05:51 PM
Now that you mention my omission of HIVE_CONF_DIR, I realize it's simpler to override the hive.server2 properties rather than duplicate the entire config directory. "Two HS2 instances on a single host" has been updated with this change.
10-30-2015
05:34 PM
Good catch, thanks. It's corrected now.
10-30-2015
05:42 AM
2 Kudos
Glad to hear that HIVE-5312 will allow a single HS2 instance to run both modes simultaneously. In the meantime you have a couple of options:

1. Two HS2 instances on a single host, different modes on different ports
2. Two HS2 instances on different hosts, different modes on different or the same port

Two HS2 instances on a single host
Note: the second instance will not be managed by Ambari.
Start HS2 manually, and override the transport mode and port properties:

su - hive
/usr/hdp/current/hive-server2/bin/hiveserver2 \
  -hiveconf hive.metastore.uris=' ' \
  -hiveconf hive.server2.transport.mode=http \
  -hiveconf hive.server2.thrift.http.port=10001 \
  > /var/log/hive/hiveserver2.out 2> /var/log/hive/hiveserver2.log &

Alternatively, you may duplicate the config directory[1] and set the environment variable HIVE_CONF_DIR instead of overriding the hive.server2 properties with -hiveconf.
[1] HDP 2.3+: /etc/hive/conf/conf.server | HDP < 2.3: /etc/hive/conf.server

Two HS2 instances on different hosts
Note: using Ambari is preferable; however, you can apply the manual steps from the previous section for clusters managed by Ambari 1.x or without Ambari.

1. Add an HS2 instance to the desired host using Ambari
2. Add a new Hive config group for the host where the new HS2 instance was deployed
3. Modify the config group properties: hive.server2.transport.mode & hive.server2.thrift.http.port
4. Manage the new HS2 component using Ambari

Standard values:
hive.server2.transport.mode=binary & hive.server2.thrift.port=10000
hive.server2.transport.mode=http & hive.server2.thrift.http.port=10001
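To verify the HTTP-mode instance is up, a quick beeline check along these lines should work (this assumes the default httpPath of cliservice; adjust the host and credentials to your environment):

# Connect to the second HS2 instance over HTTP transport
beeline -u 'jdbc:hive2://localhost:10001/default;transportMode=http;httpPath=cliservice' -n hive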
10-30-2015
04:42 AM
2 Kudos
What other services are best to colocate on a host with Zookeeper, and how does this change as the number of hosts increases? Does it make sense not to run it on a host with HA services, since those are what it protects? If running on a NodeManager, what adjustments should be made to the memory available for YARN containers?
10-30-2015
04:27 AM
6 Kudos
Authorization Models applicable to the Hive CLI
Hive provides a few different authorization models plus Apache Ranger, as described in the Hive Authorization section of the HDP System Administration Guide. Hive CLI is subject to the following two models:

- Hive default (Insecure) - Any user can run GRANT statements - DO NOT USE
- Storage-based (Secure) - Authorization at the level of databases/tables/partitions, based on HDFS permissions (and ACLs in HDP 2.2.0+)
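For context, storage-based authorization is enabled on the Hive Metastore with settings along these lines (a minimal sketch; consult the HDP guide for the authoritative list):

<!-- hive-site.xml (metastore side): enforce HDFS permissions via storage-based authorization -->
<property>
  <name>hive.security.metastore.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
</property>
<property>
  <name>hive.security.metastore.authenticator.manager</name>
  <value>org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator</value>
</property>
<property>
  <name>hive.metastore.pre.event.listeners</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener</value>
</property>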
Frequently Asked Questions about Hive CLI Security
Q: Can I set restrictive permissions on the hive executable (shell wrapper script) and hive-cli jar?
A: No, components such as Sqoop and Oozie may fail. Additionally, a user can run their own copy of the hive client from anywhere they can set execution privileges. To avoid this limitation, migrate to the Beeline CLI and utilize HiveServer2, and restrict access to the cluster through a gateway such as Knox.

Q: Can Ranger be used to enforce permissions for Hive CLI users?
A: HDFS policies can be created in Ranger, and the Hive Metastore Server can enforce HDFS permissions (and ACLs in HDP 2.2+) using storage-based authorization. However, the user executing hive-cli can bypass authorization mechanisms by overriding properties on the command line, so the Ranger Hive plugin does not enforce permissions for Hive CLI users.
Related Tutorials:
- Secure JDBC and ODBC Clients' Access to HiveServer2 using Apache Knox
- Manage Security Policy for Hive & HBase with Knox & Ranger
10-30-2015
04:13 AM
5 Kudos
A datanode is considered stale when:

dfs.namenode.stale.datanode.interval < last contact < (2 * dfs.namenode.heartbeat.recheck-interval)

In the NameNode UI Datanodes tab, a stale datanode will stand out due to having a larger value for Last contact among live datanodes (also available in JMX output). When a datanode is stale, it will be given the lowest priority for reads and writes.

Using default values, the namenode will consider a datanode stale when its heartbeat is absent for 30 seconds. After another 10 minutes without a heartbeat (10.5 minutes total), a datanode is considered dead.

Relevant properties include:
dfs.heartbeat.interval - default: 3 seconds
dfs.namenode.stale.datanode.interval - default: 30 seconds
dfs.namenode.heartbeat.recheck-interval - default: 5 minutes
dfs.namenode.avoid.read.stale.datanode - default: true
dfs.namenode.avoid.write.stale.datanode - default: true

This feature was introduced by HDFS-3703.
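To check last-contact values programmatically, the same data surfaces in the NameNode JMX output; for example (port 50070 assumes a default non-HTTPS setup, and the exact field layout may vary by version):

# LiveNodes includes a lastContact value (in seconds) per datanode
curl -s 'http://<namenode-host>:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo'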
10-30-2015
03:02 AM
1 Kudo
This bug is fixed in all HDP releases after (but not including) HDP 2.2.8, and in HDP 2.3.0 and later.
10-29-2015
09:10 PM
4 Kudos
container-executor.cfg

YARN containers in a secure cluster use operating system facilities to offer execution isolation for containers. Secure containers execute under the credentials of the job user, and the operating system enforces access restrictions for the container. Since the container must run as the user that submitted the application, it is recommended to never submit jobs from a superuser account (HDFS or Linux) when the LinuxContainerExecutor is used.

To prevent superusers from submitting jobs, the container executor configuration (/etc/hadoop/conf/container-executor.cfg) includes the properties banned.users and min.user.id. Attempting to submit a job that violates either of these settings will result in an error indicating the AM container failed to launch:

INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
Application application_1234567890123_4567 failed 2 times due to AM
Container for appattempt_1234567890123_4567_000002 exited with exitCode: -1000

Followed by one of these two diagnostic messages:

Diagnostics: Application application_1234567890123_4567 initialization failed (exitCode=255) with output:
Requested user hdfs is not whitelisted and has id 507,which is below the minimum allowed 1000

Diagnostics: Application application_1234567890123_4567 initialization failed (exitCode=255) with output:
Requested user hdfs is banned

Although it is possible to modify these properties, leaving the default values is recommended for security reasons.

yarn-site.xml

yarn.nodemanager.linux-container-executor.group - A special group (e.g. hadoop) with executable permissions for the container executor, of which the NodeManager Unix user is a member and no ordinary application user is. If any application user belongs to this special group, security will be compromised. This special group name should be specified for this configuration property.

Learn more about YARN Secure Containers from the Apache Hadoop docs.
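For reference, a container-executor.cfg with typical HDP-style defaults might look like this (the values are illustrative; keep whatever your distribution ships unless you have a specific reason to change them):

# group with executable permissions for the container executor
yarn.nodemanager.linux-container-executor.group=hadoop
# superuser accounts that may never submit jobs
banned.users=hdfs,yarn,mapred,bin
# reject users with a uid below this threshold (system accounts)
min.user.id=1000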