Member since 06-20-2016, 251 Posts, 196 Kudos Received, 36 Solutions
My Accepted Solutions
Views | Posted |
---|---|
9640 | 11-08-2017 02:53 PM |
2050 | 08-24-2017 03:09 PM |
7798 | 05-11-2017 02:55 PM |
6399 | 05-08-2017 04:16 PM |
1932 | 04-27-2017 08:05 PM |
12-08-2016
02:50 PM
You could use NiFi to read from the first, raw Kafka topic, enrich the messages with the appropriate values and perform other simple transformations, and then push to a second, final Kafka topic.
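In NiFi the per-message logic would sit in processors between the step that consumes from the raw topic and the step that publishes to the final topic (for example ConsumeKafka and PublishKafka). The sketch below only illustrates that enrich-and-transform step in plain Python. It assumes JSON messages with a hypothetical `device_id` field and a hypothetical `DEVICE_SITES` lookup table; neither comes from the original question.

```python
import json

# Hypothetical reference data: in a real flow this might come from a lookup
# service or a cached table; here it is a plain dict keyed by device ID.
DEVICE_SITES = {"dev-001": "plant-a", "dev-002": "plant-b"}


def enrich(raw: bytes, lookup: dict[str, str]) -> bytes:
    """Parse one raw JSON message, add looked-up values, and re-serialize it."""
    record = json.loads(raw)
    # Enrichment: attach the site for this device, or "unknown" if it is not mapped.
    record["site"] = lookup.get(record.get("device_id"), "unknown")
    # A simple transformation: normalize the reading from a string to a float.
    if "reading" in record:
        record["reading"] = float(record["reading"])
    return json.dumps(record, sort_keys=True).encode("utf-8")


if __name__ == "__main__":
    raw_msg = b'{"device_id": "dev-001", "reading": "42.5"}'
    print(enrich(raw_msg, DEVICE_SITES).decode())
```

Keeping the transform a pure bytes-in, bytes-out function mirrors how a flowfile's content is replaced in place, so the same logic could be tested independently of Kafka.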
12-07-2016
11:10 PM
1 Kudo
The HDP 2.5 Sandbox runs inside a Docker container, which in turn runs inside the VM. When you are in the VirtualBox console you are in the VM, but when you are in the shell web client you are in the Docker container running within the VM. SSH'ing to port 2122 logs you into the VM, whereas SSH'ing to port 2222 logs you into the Docker container.
12-07-2016
10:42 PM
@Sami Ahmad you are receiving this error because you do not have a valid TGT for the hdfs user. You need to kinit with the keytab for the hdfs principal. You should be able to see the principal by running `klist -kte /etc/security/keytabs/hdfs.headless.keytab`. This hdfs Kerberos principal was created when you kerberized the cluster. You can then get a TGT from the KDC by running `kinit -kt /etc/security/keytabs/hdfs.headless.keytab <hdfs-principal>`. You can run dfsadmin commands as any user that belongs to the group configured in dfs.permissions.superusergroup; it would be better practice to run them using an admin principal that belongs to this group.
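If you need to do this from a script rather than interactively, the same two steps can be wrapped as below. This is a minimal Python sketch that assumes the MIT Kerberos `klist`/`kinit` tools and the `hdfs` client are on the PATH; the helper names are hypothetical.

```python
import shlex
import subprocess

# Keytab path from the answer above; the principal name differs per cluster.
HDFS_KEYTAB = "/etc/security/keytabs/hdfs.headless.keytab"


def klist_keytab_argv(keytab: str) -> list[str]:
    """Command that lists the principals (with timestamps and enctypes) in a keytab."""
    return ["klist", "-kte", keytab]


def kinit_argv(keytab: str, principal: str) -> list[str]:
    """Command that requests a TGT for `principal` using the keytab instead of a password."""
    return ["kinit", "-kt", keytab, principal]


def run(argv: list[str]) -> str:
    """Echo and run a command; raises CalledProcessError on a non-zero exit status."""
    print("+", shlex.join(argv))
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout


def report_as_hdfs(principal: str) -> str:
    """Obtain a TGT as the hdfs principal, then run `hdfs dfsadmin -report`."""
    run(kinit_argv(HDFS_KEYTAB, principal))
    return run(["hdfs", "dfsadmin", "-report"])
```

On a cluster node you would first print `run(klist_keytab_argv(HDFS_KEYTAB))` to find the principal, then call `report_as_hdfs` with that exact name.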
12-07-2016
10:34 PM
@Dmitry Otblesk scp expects the new password, i.e., the value to which you changed the root user's password using passwd. Can you SSH successfully as root on port 2222 using the new password? Running scp -v will return more verbose output that could provide a clue. I think it is worth trying with the hosts entry change in my first answer as well.
12-07-2016
08:17 PM
Please use Beeline, the CLI for HiveServer2; the Hive CLI (and HiveServer1) is deprecated. The syntax to connect to a kerberized HiveServer2 is `beeline -u "jdbc:hive2://<HIVE_HOST_FQDN>:10000/default;principal=hive/<hive_host_fqdn>@YOUR.REALM"`; quote the URL so the shell does not treat the `;` as a command separator. You must kinit first: you will authenticate to HiveServer2 using the credentials associated with the Kerberos principal you used to request a TGT from your KDC. Users that can authenticate to HS2 don't need to be part of the hdfs group. They need to exist as Kerberos principals in the KDC for the realm you used to kerberize the cluster, or in another realm trusted by that one.
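The quoting detail is easy to get wrong when the command is generated by a script. A minimal Python sketch, assuming the HS2 service principal follows the `hive/<host>@<REALM>` pattern for the same host and that you connect to the `default` database; the function names are hypothetical.

```python
import shlex


def hive2_kerberos_url(host: str, realm: str, port: int = 10000, db: str = "default") -> str:
    """Build a HiveServer2 JDBC URL that authenticates with Kerberos."""
    return f"jdbc:hive2://{host}:{port}/{db};principal=hive/{host}@{realm}"


def beeline_command(url: str) -> str:
    """Shell-quote the URL so ';' is not treated as a command separator."""
    return shlex.join(["beeline", "-u", url])
```

For example, `beeline_command(hive2_kerberos_url("hs2.example.com", "EXAMPLE.COM"))` produces a command line with the whole URL in single quotes, ready to paste after running kinit.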
12-07-2016
05:56 PM
You may want to try adding `127.0.0.1 sandbox.hortonworks.com` to /etc/hosts and retrying the scp command with sandbox.hortonworks.com instead of localhost.
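Editing /etc/hosts by hand is enough here; if you script your sandbox setup, a minimal Python sketch of an idempotent edit could look like this. `add_hosts_entry` and `update_hosts_file` are hypothetical helper names, and writing /etc/hosts requires root.

```python
from pathlib import Path


def add_hosts_entry(hosts_text: str, ip: str, hostname: str) -> str:
    """Return hosts_text with 'ip hostname' appended, unless that mapping exists."""
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if fields and fields[0] == ip and hostname in fields[1:]:
            return hosts_text  # already mapped; leave the text unchanged
    if hosts_text and not hosts_text.endswith("\n"):
        hosts_text += "\n"
    return f"{hosts_text}{ip} {hostname}\n"


def update_hosts_file(path: Path, ip: str, hostname: str) -> None:
    """Rewrite the file only if the entry was actually missing."""
    old = path.read_text()
    new = add_hosts_entry(old, ip, hostname)
    if new != old:
        path.write_text(new)
```

After the entry is in place, note that scp takes the port with an uppercase `-P`, e.g. `scp -P 2222 <file> root@sandbox.hortonworks.com:<dest>`.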
11-28-2016
03:59 AM
3 Kudos
1) In many cases, the edge device would run MiNiFi and push the data to a NiFi processing cluster using the site-to-site protocol.
2) Perhaps, but the core issue is arriving at the right design pattern and the right separation of concerns between dataflow management and streaming analytics. Many streaming use cases require both functionality rooted in the mediation and exchange of data and complex event processing and computation on streams. NiFi is the natural home for the former concerns.
3) See 1).
11-17-2016
09:27 PM
9 Kudos
SmartSense 1.3 includes Activity Explorer, which hosts prebuilt notebooks that visualize cluster utilization data by user, queue, job duration, and job resource consumption, including an HDFS Dashboard notebook. This dashboard helps operators understand how HDFS is being used and which users and jobs are consuming the most resources within the file system.

It's important to note that the source data for ACTIVITY.HDFS_USER_FILE_SUMMARY comes from fsimage, which does not record file- and directory-level access activity. Many operators are also interested in more fine-grained analytics on how cluster data is used, which can drive decisions such as storage tiering with HDFS heterogeneous storage. Since these data are not available in fsimage, we will use the Ranger audit data for HDFS, which the Ranger plugin writes during authorization events. The best practice is for the plugin to write these data both to Solr (for short-term use, driving performance in the UI) and to HDFS (for long-term storage). Note that the principal used in the GetHDFS processor needs read access to the HDFS directory storing the Ranger audit data.

The audit data are written as JSON, one record per line. We will create a NiFi dataflow (ranger-audit-analytics.xml, attached as a template) to shred this JSON data into a Hive table. We first use GetHDFS to pull the audit data file, and then split the flowfile by line, since each line contains a JSON fragment. EvaluateJsonPath pulls out the particular attributes that are valuable for analytics. ReplaceText then builds the DML (INSERT) statements that populate our Hive table, and finally PutHiveQL executes these INSERT statements.

Once we've loaded these data into Hive, we're ready to use Zeppelin to explore and visualize them. For instance, we can look at the most frequently accessed directories, or see the last time a particular resource was accessed. These visualizations can be combined with the HDFS Dashboard ones for a more complete picture of HDFS-related activity on a multi-tenant cluster.

Hive table schema:
create external table audit
(reqUser string,
evtTime timestamp,
access string,
resource string,
action string,
cliIP string
)
ROW FORMAT DELIMITED
STORED AS ORC
LOCATION '/user/nifi/audit';
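To make the shredding step concrete outside NiFi, here is a minimal Python sketch of what the split, EvaluateJsonPath, and ReplaceText steps do together: one JSON record per line in, one INSERT statement per record out. It assumes the JSON keys have the same names as the table columns above and that any other keys can be ignored; the function names are hypothetical, and in the actual flow PutHiveQL would execute the statements.

```python
import json
from typing import Optional

# Columns of the `audit` table; assumed to match the Ranger audit JSON keys.
COLUMNS = ("reqUser", "evtTime", "access", "resource", "action", "cliIP")


def hive_literal(value) -> str:
    """Render a value as a single-quoted HiveQL string literal (or NULL)."""
    if value is None:
        return "NULL"
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{text}'"


def audit_line_to_insert(line: str, table: str = "audit") -> Optional[str]:
    """Turn one JSON audit line into an INSERT statement (None for blank lines)."""
    if not line.strip():
        return None
    record = json.loads(line)  # one JSON fragment per line
    values = ", ".join(hive_literal(record.get(col)) for col in COLUMNS)
    return f"INSERT INTO TABLE {table} VALUES ({values})"


def audit_file_to_inserts(text: str) -> list[str]:
    """Split the file by line and build one statement per audit record."""
    statements = (audit_line_to_insert(line) for line in text.splitlines())
    return [s for s in statements if s is not None]
```

One statement per record mirrors the flow described above; batching several rows into a single multi-row `VALUES` list would reduce the number of Hive round trips.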
11-15-2016
12:37 AM
1 Kudo
Ranger supports row-level filtering for Hive in HDP 2.5, and it accomplishes this by dynamically rewriting the query. I believe LLAP is a dependency for row-level filtering in SparkSQL.
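To illustrate the idea of query rewriting, here is a deliberately naive Python sketch; it is not Ranger's or Hive's actual code, which works on the parsed query rather than on text. Each reference to a table that has a row-filter policy for the current user is replaced by a subquery that applies the filter. The `ROW_FILTERS` policy map and the function name are hypothetical, and the regex only handles unaliased `FROM <table>` / `JOIN <table>` references.

```python
import re

# Hypothetical policy store: table name -> {user: row-filter expression}.
ROW_FILTERS = {"sales": {"alice": "region = 'EMEA'"}}


def apply_row_filter(query: str, user: str, filters: dict) -> str:
    """Wrap each filtered table reference in a subquery that applies the filter."""

    def rewrite(match: re.Match) -> str:
        keyword, table = match.group(1), match.group(2)
        expr = filters.get(table.lower(), {}).get(user)
        if expr is None:
            return match.group(0)  # no policy for this user/table: leave as-is
        return f"{keyword} (SELECT * FROM {table} WHERE {expr}) {table}"

    return re.sub(r"\b(FROM|JOIN)\s+(\w+)\b", rewrite, query, flags=re.IGNORECASE)
```

Because the subquery keeps the original table name as its alias, the rest of the user's query (column references, joins) does not need to change, which is what makes the rewrite transparent to the user.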
11-11-2016
09:44 PM
2 Kudos
Please see the working shiro.ini below.
After enabling authentication, the interpreters were not displaying in the UI. Uncommenting the sessionManager and securityManager lines shown below, and making sure the [roles] block was included, resolved the issue for me.
sessionManager = org.apache.shiro.web.session.mgt.DefaultWebSessionManager
securityManager.sessionManager = $sessionManager
securityManager.sessionManager.globalSessionTimeout = 86400000
shiro.ini:
[users]
# List of users with their password allowed to access Zeppelin.
# To use a different strategy (LDAP / Database / ...) check the shiro doc at http://shiro.apache.org/configuration.html#Configuration-INISections
admin = password1, admin
#user1 = password2, role1, role2
#user2 = password3, role3
#user3 = password4, role2
# Sample LDAP configuration, for user Authentication, currently tested for single Realm
[main]
#ldapRealm = org.apache.shiro.realm.ldap.JndiLdapRealm
#ldapRealm.userDnTemplate = CN={0},OU=standard,OU=Users,ou=enterprise,dc=vds,dc=logon
#ldapRealm.contextFactory.url = ldaps://ProdIZvds.8389.corporate.ge.com:636
#ldapRealm.contextFactory.authenticationMechanism = SIMPLE
sessionManager = org.apache.shiro.web.session.mgt.DefaultWebSessionManager
securityManager.sessionManager = $sessionManager
# 86,400,000 milliseconds = 24 hours
securityManager.sessionManager.globalSessionTimeout = 86400000
shiro.loginUrl = /api/login
[roles]
role1 = *
role2 = *
role3 = *
admin = *
[urls]
# anon means the access is anonymous.
# authcBasic means Basic Auth Security
# To enforce security, comment the line below and uncomment the next one
#/api/version = anon
#/** = authc
/**=authc
Thanks to @Ancil McBarnett for his guidance.
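Since the fix came down to which lines were commented out, a quick check of the file can save a restart cycle. This is a minimal Python sketch, assuming the file stays within the simple `key = value` subset that Python's configparser can read (Shiro's own INI parser is more permissive); the function name is hypothetical.

```python
import configparser


def check_shiro_ini(text: str) -> list[str]:
    """Return problems found in a Zeppelin shiro.ini (empty list if none)."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep key case, e.g. sessionManager
    parser.read_string(text)
    problems = []
    if not parser.has_section("roles"):
        problems.append("missing [roles] section")
    # Commented-out lines are ignored by the parser, so they count as missing.
    for key in ("sessionManager", "securityManager.sessionManager"):
        if not parser.has_option("main", key):
            problems.append(f"[main] {key} is missing or commented out")
    return problems
```

Usage: `check_shiro_ini(Path("conf/shiro.ini").read_text())` (path assumed) should return an empty list for the configuration above.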