Member since: 06-20-2016
Posts: 251
Kudos Received: 196
Solutions: 36
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 9635 | 11-08-2017 02:53 PM
 | 2049 | 08-24-2017 03:09 PM
 | 7794 | 05-11-2017 02:55 PM
 | 6389 | 05-08-2017 04:16 PM
 | 1930 | 04-27-2017 08:05 PM
12-08-2016
02:50 PM
You could use NiFi to read from the first, raw Kafka topic, enrich the messages with the appropriate values and perform other simple transformations, and then push to a second, final Kafka topic.
12-07-2016
11:10 PM
1 Kudo
The 2.5 Sandbox runs within a Docker container inside the VM. When you are in the VirtualBox console you are in the VM, but when you are in the shell web client, you are in the Docker container running within the VM. SSH'ing to port 2122 logs you into the VM, whereas SSH'ing to port 2222 logs you into the Docker container.
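For example, assuming you are connecting from your host machine to the sandbox on localhost:
# port 2122 lands in the VM itself
ssh -p 2122 root@localhost
# port 2222 lands in the Docker container running the sandbox
ssh -p 2222 root@localhost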
12-07-2016
10:42 PM
@Sami Ahmad you are receiving this error because you do not have a valid TGT for the hdfs user. You need to kinit with the keytab for the hdfs principal. You should be able to see the principal by running klist -kte /etc/security/keytabs/hdfs.headless.keytab; this hdfs Kerberos principal was created when you kerberized the cluster. You can then get a TGT from the KDC by running kinit -kt /etc/security/keytabs/hdfs.headless.keytab <hdfs-principal>. You can run dfsadmin commands as any user that belongs to the dfs.permissions.supergroup group, but it is better practice to use an admin principal that belongs to this group.
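Putting those steps together (the principal name and the dfsadmin subcommand below are illustrative; use the principal that klist reports):
# list the principals stored in the keytab
klist -kte /etc/security/keytabs/hdfs.headless.keytab
# obtain a TGT for the hdfs principal (replace with the principal shown by klist)
kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs-mycluster@EXAMPLE.COM
# verify the ticket, then run the dfsadmin command
klist
hdfs dfsadmin -report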
12-07-2016
10:34 PM
@Dmitry Otblesk scp expects the new password, i.e., the value to which you changed the root user's password using passwd. Can you SSH successfully as root on port 2222 using the new password? Running scp -v will return more verbose output that could provide a clue. I think it is worth trying with the hosts entry change in my first answer as well.
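For example, something along these lines, where the file and destination path are illustrative and port 2222 targets the sandbox's Docker container:
# -v prints verbose debugging output; -P sets the SSH port for scp
scp -v -P 2222 myfile.txt root@sandbox.hortonworks.com:/root/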
12-07-2016
08:17 PM
Please use beeline, which is the CLI for HiveServer2; the hive CLI (and HiveServer1) is deprecated. The syntax to connect to kerberized Hive is beeline -u "jdbc:hive2://<HIVE_HOST_FQDN>:10000;principal=hive/<HIVE_HOST_FQDN>@YOUR.REALM" (quote the URL so the shell does not interpret the semicolon). You must kinit first: you will authenticate to HiveServer2 using the credentials associated with the Kerberos principal that you use to request a TGT from your KDC. Users that can authenticate to HS2 don't need to be part of the hdfs group; they need to exist as Kerberos principals in the KDC for the realm you used to kerberize the cluster, or in some other realm trusted by it.
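A minimal sequence, assuming a hypothetical user principal in YOUR.REALM and substituting your HiveServer2 host:
# obtain a TGT for your own user principal (the name is illustrative)
kinit alice@YOUR.REALM
# connect to the kerberized HiveServer2; quote the URL because of the semicolon
beeline -u "jdbc:hive2://<HIVE_HOST_FQDN>:10000;principal=hive/<HIVE_HOST_FQDN>@YOUR.REALM"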
12-07-2016
05:56 PM
You may want to try adding "127.0.0.1 sandbox.hortonworks.com" to /etc/hosts and then retrying the scp command with sandbox.hortonworks.com instead of localhost.
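For instance, on the machine from which you are copying (the file, destination path, and port are illustrative):
# map the sandbox hostname to the loopback address (may require sudo)
echo "127.0.0.1 sandbox.hortonworks.com" >> /etc/hosts
# retry the copy using the sandbox hostname instead of localhost
scp -P 2222 myfile.txt root@sandbox.hortonworks.com:/root/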
11-28-2016
03:59 AM
3 Kudos
1) In many cases, the edge device would run MiNiFi and push the data to a NiFi processing cluster using the site-to-site protocol.
2) Perhaps, but the core issue is arriving at the right design pattern and the separation of concerns between dataflow management and streaming analytics. Many streaming use cases require functionality that is rooted in mediation and exchange of data, as well as complex event processing and computation on streams. NiFi is the natural home for the former concerns.
3) See 1).
11-17-2016
09:27 PM
9 Kudos
SmartSense 1.3 includes Activity Explorer, which hosts prebuilt notebooks that visualize cluster utilization data related to user, queue, job duration, and job resource consumption, including an HDFS Dashboard notebook. This dashboard helps operators better understand how HDFS is being used and which users and jobs are consuming the most resources within the file system.

It's important to note that the source data for ACTIVITY.HDFS_USER_FILE_SUMMARY comes from fsimage, which does not contain file- and directory-level access information. Many operators are also interested in more fine-grained analytics regarding cluster data use, which can drive decisions such as storage tiering using HDFS heterogeneous storage. Since these data are not available in fsimage, we will use the Ranger audit data for HDFS, which the Ranger plugin writes during authorization events. The best practice is for the plugin to write these data both to Solr (for short-term use, driving performance in the Ranger UI) and to HDFS for long-term storage. Please note that the principal used in the GetHDFS processor will need read access to the HDFS directory storing the Ranger audit data.

The audit data, after some formatting for readability, looks like:

We will create a NiFi dataflow (ranger-audit-analytics.xml) to shred this JSON data into a Hive table; please see the steps below and the attached template. We first use GetHDFS to pull the audit data file, and then split the flowfile by line, as each line contains a JSON fragment. EvaluateJsonPath is used to pull out particular attributes that are valuable for analytics:

We use ReplaceText to construct the INSERT statements that populate our Hive table:

And finally, we use PutHiveQL to execute these INSERT statements. Once we've loaded these data into Hive, we're ready to use Zeppelin to explore and visualize them. For instance, let's take a look at the most frequently accessed directories:

As another example, we can see the last time a particular resource was accessed:

These visualizations can be combined with the HDFS Dashboard ones for a more robust picture of HDFS-related activity on a multi-tenant cluster.

Hive Table Schema:

create external table audit
(reqUser string,
evtTime timestamp,
access string,
resource string,
action string,
cliIP string
)
ROW FORMAT DELIMITED
STORED AS ORC
LOCATION '/user/nifi/audit';
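As an illustration of the kind of query behind the "most frequently accessed" and "last accessed" visualizations, here is a sketch run through beeline against the audit table above (the JDBC URL and the resource path are placeholders, and you could just as easily run the same SQL from a Zeppelin paragraph):
# top ten most frequently accessed resources
beeline -u "jdbc:hive2://<HIVE_HOST_FQDN>:10000" -e "SELECT resource, COUNT(*) AS events FROM audit GROUP BY resource ORDER BY events DESC LIMIT 10;"
# last time a particular resource was accessed (the path is illustrative)
beeline -u "jdbc:hive2://<HIVE_HOST_FQDN>:10000" -e "SELECT MAX(evtTime) AS last_accessed FROM audit WHERE resource = '/apps/hive/warehouse/sample_table';"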
11-15-2016
12:37 AM
1 Kudo
Ranger supports row-level filtering for Hive as of HDP 2.5, and accomplishes this by dynamically rewriting the query. I believe LLAP is a dependency for row-level filtering in SparkSQL.
11-11-2016
09:44 PM
2 Kudos
Please see the working shiro.ini below.
Uncommenting the sessionManager and securityManager lines, as shown below, and ensuring that the [roles] block is included, resolved the issue I had with interpreters not displaying in the UI after enabling authentication.
sessionManager = org.apache.shiro.web.session.mgt.DefaultWebSessionManager
securityManager.sessionManager = $sessionManager
securityManager.sessionManager.globalSessionTimeout = 86400000
shiro.ini:
[users]
# List of users with their password allowed to access Zeppelin.
# To use a different strategy (LDAP / Database / ...) check the shiro doc at http://shiro.apache.org/configuration.html#Configuration-INISections
admin = password1, admin
#user1 = password2, role1, role2
#user2 = password3, role3
#user3 = password4, role2
# Sample LDAP configuration, for user Authentication, currently tested for single Realm
[main]
#ldapRealm = org.apache.shiro.realm.ldap.JndiLdapRealm
#ldapRealm.userDnTemplate = CN={0},OU=standard,OU=Users,ou=enterprise,dc=vds,dc=logon
#ldapRealm.contextFactory.url = ldaps://ProdIZvds.8389.corporate.ge.com:636
#ldapRealm.contextFactory.authenticationMechanism = SIMPLE
sessionManager = org.apache.shiro.web.session.mgt.DefaultWebSessionManager
securityManager.sessionManager = $sessionManager
# 86,400,000 milliseconds = 24 hour
securityManager.sessionManager.globalSessionTimeout = 86400000
shiro.loginUrl = /api/login
[roles]
role1 = *
role2 = *
role3 = *
admin = *
[urls]
# anon means the access is anonymous.
# authcBasic means Basic Auth Security
# To enforce security, comment the line below and uncomment the next one
#/api/version = anon
#/** = authc
/**=authc
Thanks to @Ancil McBarnett for his guidance.