Member since: 09-11-2015
Posts: 41
Kudos Received: 45
Solutions: 14

My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 632 | 02-03-2017 09:39 AM |
 | 591 | 01-31-2017 12:41 PM |
 | 880 | 01-20-2017 12:38 PM |
 | 1626 | 01-18-2017 01:26 PM |
 | 2013 | 01-11-2017 02:35 PM |
07-02-2018
09:52 PM
Hi @Ivan Diaz Unfortunately, as of Ambari 2.6.2.2 / HDP 2.6.5.0, whilst both Ambari and Hive support PAM authentication, the Ambari Hive View does not support this authentication scheme. It is a feature that is being looked at for a future release, but there is no timescale at present.
09-27-2017
08:47 AM
7 Kudos
Since Ranger 0.5 it has been possible to summarize audit events that differ only by timestamp, to reduce the number of events logged on a busy system. When enabled, if a Ranger plugin logs consecutive audit events that differ only by timestamp it will coalesce all such events into a single event, setting 'event_count' to the number of events logged and 'event_dur_ms' to the time difference in milliseconds between the first and last event. To enable this feature you must set the following properties in the Ranger plugin's configuration:

Configuration name | Notes |
---|---|
xasecure.audit.provider.summary.enabled | Set this to true to enable summarization; audit messages are then summarized before they are sent to the various sinks. By default it is false, i.e. audit summarization is disabled. |
xasecure.audit.provider.queue.size | If unspecified this defaults to 1048576, i.e. the queue is sized to store 1M (1024 * 1024) messages. Note that, despite the difference in property name, this controls the size of the summary queue. |
xasecure.audit.provider.summary.interval.ms | The maximum time interval over which messages are summarized. If unspecified it defaults to 5000, i.e. 5 seconds. |
Summarization batch size | Regardless of the time interval above, at most 100k messages at a time are considered for aggregation while summarizing. If more than 100k messages are logged during the interval, similar messages can therefore show up as multiple summarized audit messages even though they were logged within the configured time interval. This value of 100k is not currently user-configurable; it is mentioned here for a better understanding of the summarization logic. |

More details can be found here: Ranger 0.5 Audit log summarization
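For example, to turn summarization on for a plugin while keeping the default queue size and interval, the properties (set in that plugin's Ranger audit configuration, e.g. via Ambari) would look something like the following - a minimal sketch, with the values taken from the defaults above:
xasecure.audit.provider.summary.enabled=true
xasecure.audit.provider.queue.size=1048576
xasecure.audit.provider.summary.interval.ms=5000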
Tags: administration, audit, FAQ, Ranger, ranger-audit, Security
09-19-2017
10:29 PM
Technically, step 3 and step 4 are mutually exclusive. If you're using the Java cacerts then you don't need to set up a separate truststore for Ranger, and vice versa. If doing step 3, make sure you update the correct Java cacerts: the Ranger JVM is started with just the command 'java' (not the full path to java), so if you have both OpenJDK and Oracle JDK installed and your Hadoop JAVA_HOME is set to the Oracle JDK, Ranger will actually be started with OpenJDK if /etc/alternatives has not been updated. Also, 'rangertruststore' should probably be called 'rangertruststore.jks' for consistency.
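A quick way to check which JDK a bare 'java' command resolves to, and to import the Ranger certificate into that JDK's cacerts, is something like the following sketch (the alias, certificate file name and default 'changeit' store password are illustrative assumptions):
# show which JDK the bare 'java' command actually resolves to
readlink -f $(which java)
# import the Ranger Admin certificate into that JDK's cacerts (paths and alias are examples)
keytool -importcert -alias rangeradmin -file /tmp/ranger-admin.crt -keystore <path-to-that-jdk>/jre/lib/security/cacerts -storepass changeit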
04-13-2017
12:13 PM
3 Kudos
When trying to add a policy that has many resource paths to Ranger using the API, it can fail with the error:
Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.5.2.v20140319-9ad6abd): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: com.mysql.jdbc.MysqlDataTruncation: Data truncation: Out of range value for column 'sort_order' at row 1
Error Code: 1264
Call: INSERT INTO x_policy_resource_map (ADDED_BY_ID, CREATE_TIME, sort_order, resource_id, UPDATE_TIME, UPD_BY_ID, value) VALUES (?, ?, ?, ?, ?, ?, ?)
bind => [7 parameters bound]
Query: InsertObjectQuery(XXPolicyResourceMap [XXDBBase={createTime={Thu Apr 13 11:42:38 UTC 2017} updateTime={Thu Apr 13 11:42:39 UTC 2017} addedByUserId={1} updatedByUserId={1} } id=null, resourceId=43, value=/tmp/129, order=128])
This is caused by a limit in Ranger: a single policy can contain a maximum of 128 resource paths. The work-around is to split the policy into two or more policies, each containing at most 128 resource paths.
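As a rough sketch of the work-around (the file name paths.txt and the chunk prefix are placeholders), a long list of resource paths, one per line, can be chunked into groups of at most 128 before building each policy via the Ranger REST API:
# split the path list into files of at most 128 lines each
split -l 128 paths.txt policy_chunk_
# each resulting policy_chunk_* file then becomes the resource path list for its own policy
ls policy_chunk_*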
02-22-2017
11:41 AM
@Nigel Jones Ranger is not normally available to install using the cluster install wizard. You have to install the cluster first, then add Ranger once the cluster is up and running. I'm not sure of the exact reason, but I suspect it is because Ranger would not be set up properly (users and groups not synced, etc.) and the set-up of services would then fail because of permission issues.
02-10-2017
01:25 PM
3 Kudos
@Maher Hattabi The Sandbox is intended for learning HDP and the tools it provides; it isn't intended for production use. It is a pre-built single-node version of HDP with all the same features and tools. You would use it to explore HDP and perhaps test your ideas out, then use HDP to install a full multi-node cluster based on that experience.
02-03-2017
09:39 AM
3 Kudos
@rbailey No, technically they don't need a group associated with them. They also don't need to be able to log in to any systems. As long as there is a principal in Kerberos for them and they can authenticate against the KDC, you should be okay. As per the answer in the other article you linked to, I usually just create a single 'rangerlookup' user and principal to be used by all the services.
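For example, on an MIT KDC the shared lookup principal and a keytab for it can be created with something like the following (the realm and keytab path are placeholders):
kadmin.local -q "addprinc -randkey rangerlookup@EXAMPLE.COM"
kadmin.local -q "ktadd -k /etc/security/keytabs/rangerlookup.keytab rangerlookup@EXAMPLE.COM"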
01-31-2017
12:41 PM
1 Kudo
@Dinesh Das Try running the chmod command as user 'hdfs':
su - hdfs -c 'hdfs dfs -chmod -R 700 /user/hive'
In HDFS, 'root' doesn't have any special access, but the user 'hdfs' is considered a superuser and so can read/write any file.
01-23-2017
10:39 AM
2 Kudos
@shashi kumar The URL looks okay - try doing a curl directly to the ResourceManager (i.e. without Knox) to verify that it is working as expected. This will eliminate YARN as the issue. The error 'SSL23_GET_SERVER_HELLO:unknown protocol' looks like there is an issue establishing an SSL connection to Knox so I think this is the source of your issue. Check that the Knox server is set up correctly and all the certificates are working properly.
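For example, a direct check against the ResourceManager REST API (assuming the default RM web port of 8088 and HTTP rather than HTTPS on the RM UI) would be something like:
curl -s "http://<resourcemanager-host>:8088/ws/v1/cluster/info"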
01-23-2017
09:26 AM
@sreehari takkelapati Further to apappu's answer, if you're using an HDP version prior to 2.5.0 then the table you want in Ranger's database is xa_access_audit, but as this is now deprecated and no longer used I wouldn't build any processes around it. Instead you will find that, provided your system is configured correctly, Ranger audit logs will be written to HDFS (under /ranger/audit/<component name>) and/or Solr (in Ambari Infra). The Solr copy is easy to query to get the results you want, provided you know how to write Solr queries, but it only indexes the last 30 days of audit records. The HDFS copy stores all audit events unless you explicitly delete them.

The audit events are stored in JSON format and the fields are fairly self-explanatory. This is an example from Hiveserver2:

{"repoType":3,"repo":"hdp250_hive","reqUser":"usera","evtTime":"2016-11-24 04:08:10.179","access":"UPDATE","resource":"z_ssbi_hive_tdzm/imei_table","resType":"@table","action":"update","result":1,"policy":19,"enforcer":"ranger-acl","sess":"b87d8c0e-920f-4a62-8c44-82d7521a1b96","cliType":"HIVESERVER2","cliIP":"10.0.2.36","reqData":"INSERT INTO z_ssbi_hive_tdzm.imei_table PARTITION (partkey\u003d\u00271\u0027)\nSELECT COUNT(*) FROM default.imei_staging_table \nUNION ALL \nSELECT COUNT(*) FROM default.imei_staging_table","agentHost":"hdp250.local","logType":"RangerAudit","id":"d27e1496-08cc-4dad-a6ba-f87736b44a13-26","seq_num":53,"event_count":1,"event_dur_ms":0,"tags":[],"additional_info":"{\"remote-ip-address\":10.0.2.36, \"forwarded-ip-addresses\":[]"}

You will need to read these in, parse the JSON and total up the accesses using a script. It should be fairly easy to write this in something like Perl or Python.
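As a very rough sketch of such a script - using jq against the HDFS copy here instead of Perl or Python, and assuming the audit directory layout described above and a modest data volume - a per-user access count could be pulled with:
hdfs dfs -cat '/ranger/audit/hiveserver2/*/*' | jq -r '.reqUser' | sort | uniq -c | sort -rn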
01-20-2017
07:24 PM
1 Kudo
@Dinesh Das The HDP Sandbox 2.5 comes with Knox, which includes a demo LDAP server that should be sufficient for testing purposes. You can start and stop this from Ambari under Knox > Service Actions. In the Knox configuration there is a section called 'Advanced users-ldif' which contains the LDIF data loaded by the demo LDAP server. You can add users and groups to this LDIF, save the configuration and then restart the demo LDAP server. If you're not familiar with LDIF, the template to add a user is something like:
dn: uid=<username>,ou=people,dc=hadoop,dc=apache,dc=org
objectclass: top
objectclass: person
objectclass: organizationalPerson
objectclass: inetOrgPerson
cn: <common name, e.g. Joe Bloggs>
sn: <surname, e.g. Bloggs>
uid: <username>
userPassword: <password>
Replace <username> with the username you want to add, <common name, e.g. Joe Bloggs> with the full name of the user, <surname, e.g. Bloggs> with the surname of the user, and <password> with the password you want. Similarly for groups, use the template:
dn: cn=<groupname>,ou=groups,dc=hadoop,dc=apache,dc=org
objectclass: top
objectclass: groupofnames
cn: <groupname>
member: uid=<username>,ou=people,dc=hadoop,dc=apache,dc=org
Replace <groupname> with the group name you want, and add as many member: lines as you need to add users to the group, e.g.
member: uid=user_a,ou=people,dc=hadoop,dc=apache,dc=org
member: uid=user_b,ou=people,dc=hadoop,dc=apache,dc=org
member: uid=user_c,ou=people,dc=hadoop,dc=apache,dc=org
Configuring your OS to read these users and groups from the demo LDAP server is quite complex - you'll need a lot more information in the LDIF file to support this, plus PAM/NSS configured to talk to the LDAP server - so for your purposes I'd stick to using 'adduser' and 'addgroup' to add all the users and groups you want to the OS manually. Once you've added the users and groups you want and started the demo LDAP, you can use the instructions here to connect Ambari up with the demo LDAP server: https://community.hortonworks.com/questions/2838/has-anyone-integrated-for-demo-purposes-only-the-k.html

For Ranger, you should also leave it syncing users from the OS (the default configuration): as you will have used 'adduser' and 'addgroup' to add all the users to the OS, Ranger will automatically sync these for you. If you really want to sync the users from the demo LDAP server then you'll need to set the following properties for Ranger Admin and Ranger Usersync. Note that I haven't tried this so it may not work and you may need to experiment with some of the settings; there is also a quick ldapsearch check after the property lists below.

Ranger:
ranger.ldap.base.dn=dc=hadoop,dc=apache,dc=org
ranger.ldap.bind.dn=uid=admin,ou=people,dc=hadoop,dc=apache,dc=org
ranger.ldap.bind.password=admin-password
ranger.ldap.group.roleattribute=cn
ranger.ldap.group.searchbase=ou=groups,dc=hadoop,dc=apache,dc=org
ranger.ldap.group.searchfilter=(member=uid={0},ou=people,dc=hadoop,dc=apache,dc=org)
ranger.ldap.referral=follow
ranger.ldap.url=ldap://localhost:33389
ranger.ldap.user.dnpattern=uid={0},ou=people,dc=hadoop,dc=apache,dc=org
ranger.ldap.user.searchfilter=(uid={0})

UserSync:
ranger.usersync.group.memberattributename=member
ranger.usersync.group.nameattribute=cn
ranger.usersync.group.objectclass=groupofnames
ranger.usersync.group.search.first.enabled=false
ranger.usersync.group.searchbase=ou=groups,dc=hadoop,dc=apache,dc=org
ranger.usersync.group.searchenabled=true
ranger.usersync.group.searchfilter=
ranger.usersync.group.searchscope=sub
ranger.usersync.group.usermapsyncenabled=true
ranger.usersync.ldap.binddn=uid=admin,ou=people,dc=hadoop,dc=apache,dc=org
ranger.usersync.ldap.groupname.caseconversion=none
ranger.usersync.ldap.ldapbindpassword=admin-password
ranger.usersync.ldap.referral=follow
ranger.usersync.ldap.searchBase=dc=hadoop,dc=apache,dc=org
ranger.usersync.ldap.url=ldap://localhost:33389
ranger.usersync.ldap.user.groupnameattribute=memberof,ismemberof
ranger.usersync.ldap.user.nameattribute=uid
ranger.usersync.ldap.user.objectclass=person
ranger.usersync.ldap.user.searchbase=ou=people,dc=hadoop,dc=apache,dc=org
ranger.usersync.ldap.user.searchfilter=
ranger.usersync.ldap.user.searchscope=sub
ranger.usersync.ldap.username.caseconversion=none
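To check that the demo LDAP server is up and that the bind DN and password used above work, a quick test with ldapsearch (assuming the OpenLDAP client tools are installed) would be something like:
ldapsearch -x -H ldap://localhost:33389 -D "uid=admin,ou=people,dc=hadoop,dc=apache,dc=org" -w admin-password -b "ou=people,dc=hadoop,dc=apache,dc=org" "(uid=*)"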
01-20-2017
12:38 PM
1 Kudo
@Dinesh Das In Ambari you can add users and groups manually - click on the 'admin' button at the top right, select 'Manage Ambari', then click on either Users or Groups and then the 'Create Local User/Group' button. These users only exist in Ambari, not in the OS or in Ranger. Alternatively you can configure Ambari to pull users and groups from LDAP/Active Directory - see Configuring Ambari for LDAP or Active Directory Authentication. If you want to be able to 'su' to the user in the OS then you'll need to configure your OS to also read the users from LDAP/Active Directory, or manually add them to your OS using 'adduser' and 'addgroup'. Ranger can synchronize your users and groups either from the OS or from LDAP/Active Directory - see Advanced Usersync Settings. The best choice is to sync all three - OS, Ambari and Ranger - from LDAP/Active Directory. That way you ensure that all users and groups exist in all three components.
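For reference (not part of the original answer), the Ambari LDAP integration mentioned above is configured on the Ambari server host with the interactive setup command, followed by a restart:
ambari-server setup-ldap
ambari-server restart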
01-19-2017
01:43 PM
@Sankar T Also check the Ranger audit logs, if you have Ranger installed and the HDFS plugin enabled. In general, if you're worried about who does what on your system then you should consider using at least Ranger, and possibly Atlas as well.
01-18-2017
01:26 PM
@Baruch AMOUSSOU DJANGBAN Currently this is not possible. HADOOP-10019 is the community JIRA to add this functionality to HDFS.
01-11-2017
02:35 PM
3 Kudos
@sagar pavan This will happen when there are not enough resources (memory) to run the ApplicationMaster container needed to control the Tez job. In YARN's capacity-scheduler.xml there is a property, yarn.scheduler.capacity.maximum-am-resource-percent, which controls the percentage of total cluster memory that can be used by AM containers. If you have several jobs running then each AM will consume the memory required for one container. If this exceeds the given percentage of total cluster memory, the next AM to run will wait until there are free resources for it. You'll need to increase yarn.scheduler.capacity.maximum-am-resource-percent to get the AM to run.
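For example, raising the limit from a typical default of 0.2 (20% of cluster memory) to 0.4 in capacity-scheduler.xml (via Ambari's YARN configuration) would look like this - the exact value depends on your cluster:
yarn.scheduler.capacity.maximum-am-resource-percent=0.4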
01-10-2017
01:32 PM
@Timo Burmeister I don't believe there are any recommendations written down. Knox itself is more or less just a proxy server. It uses Jetty internally and will comfortably run in 8GB of RAM on most production systems. As Knox is multi-threaded, having more CPU threads will allow you to process more simultaneous requests. I'm not aware of any performance testing having been done, so you'd need to experiment based on your expected load to find out how many CPU threads work best, but in general 8 CPU threads should be a good starting point. The number of NICs needed also depends on the workload you expect. If you're pushing large volumes of data through Knox then obviously you'll need to think about 10GbE or multiple bonded 1GbE NICs. You should probably also have a separate NIC for the external network and the internal cluster network, unless you're using VLANs or virtual IPs on a single NIC. For most starting configurations a single 1GbE NIC should be sufficient.
12-22-2016
11:43 AM
@Tony Hake Which version of Ambari are you using? This should be fixed in Ambari 2.4.0. If you're still seeing it then it could be indicative of a configuration problem, for instance issues with Kerberos if you have it enabled.
12-22-2016
11:20 AM
1 Kudo
@Sagar Shimpi Probably worth pointing out that this will be fixed in Knox 0.10 by the looks of it: KNOX-644
12-21-2016
11:48 AM
3 Kudos
@Jay SenSharma It's worth pointing out that unless you're using Ranger 0.4 or below, that API is obsolete. You should be using the v2 API linked to by @mvaradkar: https://cwiki.apache.org/confluence/display/RANGER/REST+APIs+for+Service+Definition%2C+Service+and+Policy+Management
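For reference, listing the existing policies with the v2 API looks something like this (assuming Ranger Admin on its default port 6080 and admin credentials):
curl -u admin:admin 'http://<ranger-admin-host>:6080/service/public/v2/api/policy'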
12-12-2016
05:25 PM
@Davide Isoardi Only the Spark UI is supported in Knox 0.9, which is included in HDP 2.5.x. Earlier versions of Knox do not support Spark at all. There is a community JIRA to add this support in the future (AMBARI-18610) but there has been no movement on it so far.
12-06-2016
09:36 PM
1 Kudo
@Sami Ahmad These messages are the Ranger plugins for HDFS and Hive connecting to Ranger Admin to check that they have the latest policies. If you want to stop these messages then you'll need to turn the Ranger Admin logging down to WARN, but these messages are normal. If you look at the frequency, you'll see they occur every 30 seconds, which is the default polling interval for the plugins.
12-01-2016
04:42 PM
1 Kudo
@Arpan Rajani Yes, you can use a wildcard certificate - see https://en.wikipedia.org/wiki/Wildcard_certificate If you're using a public CA then most will generate wildcard certificates for you. If you're using an internal CA or self-signed certificates then this link shows you how: https://serversforhackers.com/self-signed-ssl-certificates In terms of using it for Hadoop, it is used in the same way as a regular certificate except you only have one certificate for all the services. The main security issue is that if someone gets hold of the certificate (and its private key) they can install it on any host in your network whose DNS name matches the wildcard (for example *.example.com) and it will be accepted as valid on that host.
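As a rough sketch (the domain, key size and validity period are placeholders), a self-signed wildcard certificate can be generated with OpenSSL like this:
openssl req -x509 -newkey rsa:2048 -nodes -days 365 -subj "/CN=*.example.com" -keyout wildcard.key -out wildcard.crt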
11-30-2016
05:39 PM
@Pradheep Shan Currently modifying the Grafana dashboards is not supported in Ambari.
11-30-2016
10:26 AM
1 Kudo
@Gerd Koenig xasecure.audit.destination.hdfs.dir should be the base path for all audit logs for all plugins. The plugins themselves automatically append their own name - 'hdfs' in this case, Hiveserver2 adds 'hiveserver2', HBase adds 'hbase', etc. - and a daily datestamp. This behaviour is hard-coded and I don't think there's any way to change it. You should just set it to 'hdfs://<Nameservice ID>/ranger/audit'
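With that setting, the audit files end up under per-plugin, per-day directories that look something like this (the datestamp placeholder is illustrative):
hdfs://<Nameservice ID>/ranger/audit/hdfs/<yyyymmdd>/...
hdfs://<Nameservice ID>/ranger/audit/hiveserver2/<yyyymmdd>/...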
11-24-2016
09:29 AM
1 Kudo
@Chris L The Ambari Agent is a Python process, so it uses Python's logging facility. To change the pattern, go to /etc/ambari-agent/conf and copy logging.conf.sample to logging.conf, then edit it and look for the lines:
[formatter_logfileformatter]
format=%(levelname)s %(asctime)s %(filename)s:%(lineno)d - %(message)s
Change the 'format' line to suit your needs. The format string syntax and the LogRecord attributes that can be written from each log record are described in the Python logging module documentation.
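For example (purely illustrative), adding the process ID to each line would be a format line like:
format=%(asctime)s %(levelname)s [%(process)d] %(filename)s:%(lineno)d - %(message)s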
11-11-2016
09:10 PM
Yes, you should copy the id_rsa file from the sandbox to your Windows host. Alternatively you can copy and paste the contents of id_rsa into the edit box that says 'ssh private key', as in the screenshot below.
11-10-2016
11:16 AM
@A. Karray You can specify JARs to use with Livy jobs using livy.spark.jars in the Livy interpreter conf. This should be a comma-separated list of JAR locations, which must be stored on HDFS. Currently local files cannot be used (i.e. they won't be localized on the cluster when the job runs). It is a global setting, so all JARs listed will be available to all Livy jobs run by all users.
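For example (the HDFS paths are placeholders), the interpreter setting would look something like:
livy.spark.jars=hdfs:///libs/my-udfs.jar,hdfs:///libs/extra-lib.jar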
11-03-2016
11:26 AM
1 Kudo
@vamsi valiveti 1) 'show tables;' is the standard SQL way of getting table names. '!tables' is specific to Beeline, so use 'show tables;' to make sure your SQL is portable to other SQL clients. 2) Use '!sh <command>' to run a shell command, e.g.
0: jdbc:hive2://hdp224.local:10000/default> !sh hdfs dfs -ls /
Found 9 items
drwxrwxrwx - yarn hadoop 0 2016-11-01 14:07 /app-logs
drwxr-xr-x - hdfs hdfs 0 2016-11-01 12:41 /apps
drwxr-xr-x - yarn hadoop 0 2016-11-01 15:55 /ats
drwxr-xr-x - usera users 0 2016-11-01 14:29 /data
drwxr-xr-x - hdfs hdfs 0 2016-11-01 12:38 /hdp
drwxr-xr-x - mapred hdfs 0 2016-11-01 12:38 /mapred
drwxrwxrwx - mapred hadoop 0 2016-11-01 12:38 /mr-history
drwxrwxrwx - hdfs hdfs 0 2016-11-01 15:56 /tmp
drwxr-xr-x - hdfs hdfs 0 2016-11-01 14:06 /user
10-31-2016
04:41 PM
1 Kudo
@Roger Young Assuming you are running Ambari as 'root', it will be in ~root/.ssh/id_rsa. If you're running Ambari as a non-root user you will need to set up passwordless SSH for that user, so the file will be ~<username>/.ssh/id_rsa.
10-26-2016
09:47 AM
2 Kudos
@Pooja Kamle Ranger policies are not applied to the Hive CLI, which is older technology and may be phased out in the future. You should be using Beeline or JDBC/ODBC to connect to HiveServer2.
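For example, a Beeline connection to HiveServer2 looks something like this (host, port and user are placeholders; add SSL/Kerberos options as appropriate for your cluster):
beeline -u 'jdbc:hive2://<hiveserver2-host>:10000/default' -n <username>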