Member since: 07-30-2019
Posts: 181
Kudos Received: 205
Solutions: 51
My Accepted Solutions
Views | Posted
---|---
2412 | 10-19-2017 09:11 PM
821 | 12-27-2016 06:46 PM
557 | 09-01-2016 08:08 PM
609 | 08-29-2016 04:40 PM
1144 | 08-24-2016 02:26 PM
06-08-2016
02:22 PM
1 Kudo
@chennuri gouri shankar HDInsight does not include the full stack of HDP components. If you'd like to use Ranger and other components not included with HDI (e.g. Spark, Kafka, Storm), then you should look at using HDP on the Azure Marketplace. You can stand up a cluster quickly and use the full HDP stack.
... View more
06-07-2016
06:30 PM
1 Kudo
@R M When you create the SSH action, you can give Oozie the username and hostname to execute the command:
<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
...
<action name="[NODE-NAME]">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>[USER]@[HOST]</host>
<command>[SHELL]</command>
<args>[ARGUMENTS]</args>
...
<capture-output/>
</ssh>
<ok to="[NODE-NAME]"/>
<error to="[NODE-NAME]"/>
</action>
...
</workflow-app>
The command needs to exist on the node you specify, and it will run as the user given in the action definition, from that user's home directory. The Oozie SSH action page of the docs has some additional information as well.
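For illustration, here's what that action might look like with the placeholders filled in (the user, host, script path, and node names below are hypothetical):
<action name="run-backup">
    <ssh xmlns="uri:oozie:ssh-action:0.1">
        <!-- hypothetical user and host; the script must already exist on edge01 -->
        <host>etluser@edge01.example.com</host>
        <command>/home/etluser/scripts/backup.sh</command>
        <args>/data/incoming</args>
        <capture-output/>
    </ssh>
    <ok to="end"/>
    <error to="fail"/>
</action>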
... View more
06-07-2016
05:26 PM
@Kaliyug Antagonist The permissions granted to the Ambari agent user include those required to create service accounts, install packages, run commands as the service accounts, and start/stop all of the services. Once the sudo rules are in place, you can install, start, stop, etc., all of the various services in the HDP stack.
... View more
06-07-2016
05:21 PM
1 Kudo
@Mohana Murali Gurunathan Ranger does not currently support authorization in Cassandra. RANGER-925 has been opened to add this functionality; you can track the progress there.
... View more
06-07-2016
04:01 PM
2 Kudos
@Kaliyug Antagonist There is a way to install Ambari and the HDP stack as a non-root user, but someone with root privileges will need to set up the access on the systems first. Certain sudo privileges must be assigned to the user you're going to run the Ambari agent as, and only a root user can add those sudo configurations for you. The Ambari Security guide has a section on setting up a non-root Ambari installation. One of the privileges given to the ambari user is the ability to install packages (via yum, zypper, etc.). Once these rules are in place, you can use them to install the ambari-server and ambari-agent packages on all of your nodes and proceed with the installation.
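As a rough, hypothetical sketch (assuming the agent runs as a user named ambari; the authoritative and complete command list is in the Ambari Security guide), the sudoers entries look something like this:
# Core system commands: package installation, directory and ownership management
ambari ALL=(ALL) NOPASSWD:SETENV: /usr/bin/yum, /usr/bin/zypper, /usr/bin/apt-get, /bin/mkdir, /bin/cp, /bin/chmod, /bin/chown
# Run commands as the Hadoop service accounts
ambari ALL=(ALL) NOPASSWD:SETENV: /bin/su hdfs *, /bin/su yarn *, /bin/su hive *
A root user adds these with visudo (or a file under /etc/sudoers.d), after which the ambari user can run the privileged installation steps.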
... View more
06-07-2016
03:49 PM
@sbhat I've not seen a comprehensive document that details which commands are available for which services, but the Ambari CWiki page has some great usage scenarios and FAQs that may help.
... View more
06-06-2016
09:03 PM
3 Kudos
@Timothy Spann Compression can improve the performance of Hive queries by decreasing the amount of data that has to be read from disk (reading compressed vs. uncompressed). Just because a query returns only a small number of rows doesn't mean the processing behind it won't read lots of data. For small data sets there is an inflection point where reading the uncompressed data costs less than decompressing it, and you might want to skip compression. When larger amounts of data need to be read to satisfy a query, however, compressing the data can provide real performance gains.
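As a hypothetical illustration (the table name and warehouse paths are assumptions), you could store a copy of a table as compressed ORC and compare the on-disk footprint:
# create a ZLIB-compressed ORC copy of a table
hive -e "CREATE TABLE sales_orc STORED AS ORC TBLPROPERTIES ('orc.compress'='ZLIB') AS SELECT * FROM sales;"
# compare the on-disk size of the original and compressed tables
hdfs dfs -du -h /apps/hive/warehouse/sales /apps/hive/warehouse/sales_orc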
... View more
06-06-2016
08:57 PM
If the username in AD is the same (e.g. ambari@EXAMPLE.COM), then SSSD integration will use the AD account instead of the local account. Ideally, you'd already have SSSD set up before doing the Ambari installation. If you're using customized service account names (e.g. my_hive, somecustomuser), then you'd need to modify the sudo entries for the "Customizable Users" to account for this.
... View more
06-06-2016
08:50 PM
1 Kudo
@Scott Shaw One of the commands that you grant to the ambari user via sudo is the adduser command. This allows the ambari user to create the service accounts on each node of the cluster. All you need to do is install and start the ambari agent on each node (which you can do as the ambari user once the sudo rules are in place).
... View more
06-03-2016
06:33 PM
1 Kudo
@Pardeep Gorla Typically, to set up an SMTP proxy, you'll run your own mail server that sends email to the external SMTP server on behalf of the domain. You'd install something like Postfix to handle the mail. Once your local email server is successfully forwarding email to the main server, point the email server setting in Ambari Alerts at the local email server. This article has some information to get you started on configuring a Postfix email proxy.
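As a rough sketch (the hostname, port, and credential file are placeholders), the relay-related settings in /etc/postfix/main.cf would look something like:
# forward all outbound mail to the external SMTP server
relayhost = [smtp.example.com]:587
# authenticate to the relay and use TLS
smtp_sasl_auth_enable = yes
smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
smtp_sasl_security_options = noanonymous
smtp_use_tls = yes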
... View more
06-03-2016
03:19 PM
@PJ Moutrie Have you verified that the firewall is open on the NiFi nodes?
... View more
06-01-2016
07:45 PM
1 Kudo
@Sri Bandaru If all you need to do is automate grabbing the ticket, you can create a keytab file and have the login script kinit automatically when the user logs in, with something similar to the following:
> ktutil
ktutil: addent -password -p username@DOMAIN.COM -k 1 -e rc4-hmac
Password for username@DOMAIN.COM: [enter your password]
ktutil: addent -password -p username@DOMAIN.COM -k 1 -e aes256-cts
Password for username@DOMAIN.COM: [enter your password]
ktutil: wkt username.keytab
ktutil: quit
> mkdir /home/username/keytabs
> chmod 700 /home/username/keytabs
> mv username.keytab /home/username/keytabs
> chmod 600 /home/username/keytabs/username.keytab
> echo "kinit -kt /home/username/keytabs/username.keytab username@DOMAIN.COM" >> /home/username/.bash_profile
This will create a keytab for the user, move it into a secure directory, and automatically obtain a ticket when the user logs in with a bash shell. If you are trying to automate the use of a ticket from the desktop, you can use a similar method. You will have to install something like the Oracle JDK to get a kinit tool, but you can create the keytab on a Linux machine and copy it to the Windows system. Whatever tool you are trying to use (SAS, etc.) will need to be able to pass the Kerberos ticket to the cluster for authentication.
... View more
05-31-2016
07:46 PM
1 Kudo
@Davide Ferrari If you are referring to the database Ambari uses to store its configuration info, then you'll need to re-run Ambari setup to point at the new MySQL address:
[root@sandbox nifi_demo]# ambari-server setup
Using python /usr/bin/python2
Setup ambari-server
Checking SELinux...
SELinux status is 'disabled'
Customize user account for ambari-server daemon [y/n] (n)? n
Adjusting ambari-server permissions and ownership...
Checking firewall status...
Checking JDK...
Do you want to change Oracle JDK [y/n] (n)? n
Completing setup...
Configuring database...
Enter advanced database configuration [y/n] (n)? y
Configuring database...
==============================================================================
Choose one of the following options:
[1] - PostgreSQL (Embedded)
[2] - Oracle
[3] - MySQL
[4] - PostgreSQL
[5] - Microsoft SQL Server (Tech Preview)
[6] - SQL Anywhere
==============================================================================
Enter choice (1): 3
Hostname (localhost):
The MySQL database that is restarted by Ambari is most likely the Hive Metastore database. You will need to change the configuration for the Hive database to an "Existing MySQL" database. Shut down Hive and repoint the database to the new HA configuration.
... View more
05-18-2016
09:53 PM
1 Kudo
@Nicola Marangoni What user is your Ambari view server running as? It looks like "ambari-server" in your configs, is that correct? If so, then you need to add the following parameters to the custom core-site.xml in HDFS configs:
hadoop.proxyuser.ambari-server.groups=*
hadoop.proxyuser.ambari-server.hosts=*
This assumes that you have run ambari-server setup and changed the user that the server runs as to "ambari-server". If you haven't done that, then Ambari is still running as the root user and your proxy user settings will not work. If your Ambari view server runs as root, then you need to change your settings to: auth=KERBEROS;proxyuser=root
... View more
05-17-2016
01:05 PM
8 Kudos
Virtual memory swapping can have a large impact on the performance of a Hadoop system. Because of the memory requirements of YARN containers and the processes running on the nodes in a cluster, swapping processes out of memory to disk can cause serious performance limitations. As such, the historical recommendation for swappiness (the propensity to swap a process out) on a Hadoop system has been to disable swap altogether. With newer versions of the Linux kernel, however, a swappiness of 0 makes Out Of Memory (OOM) situations more likely to indiscriminately kill important processes in order to reclaim physical memory. To keep the system from swapping processes too frequently while still allowing emergency swapping (instead of killing processes), the recommendation is now to set swappiness to 1 on Linux systems. This still allows swapping, but with the least possible aggressiveness (for comparison, the default value for swappiness is 60).
To change the swappiness on a running machine, use the following command:
echo "1" > /proc/sys/vm/swappiness
To ensure the swappiness is set appropriately on reboot, use the following command:
echo "vm.swappiness=1" >> /etc/sysctl.conf
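To confirm the setting took effect, you can read the value back:
# both should print 1 after the change
sysctl vm.swappiness
cat /proc/sys/vm/swappiness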
... View more
05-15-2016
05:07 PM
@ida ida There are a couple of ways to accomplish this; I'd recommend starting with Sqoop. It is a tool designed specifically to extract data from an RDBMS and load it into Hadoop. This tutorial should help you get started.
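As a hypothetical example (the connection string, credentials, table, and target directory are placeholders), a basic Sqoop import from MySQL into HDFS looks like this:
sqoop import \
  --connect jdbc:mysql://dbhost.example.com:3306/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /user/etl_user/orders \
  --num-mappers 4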
... View more
05-13-2016
06:26 PM
@devers If you mean a way to decrypt a file that has been encrypted with HDFS encryption, then no. Encryption and decryption with HDFS at-rest encryption is more complex: the EDEK is stored with the file, and you have to talk to the KMS to get the decrypted key, etc. You can use HDFS encryption with Hive and Spark and let it take care of this for you. If you want to generate a key pair and use it for both Hive and Spark to encrypt/decrypt data, that can be done, but it would be part of loading and working with the data. You'd need to define a UDF for Hive to use for decryption so you could reference it in a select statement, and you'd need to use libraries in Scala or Python for Spark to decrypt the data. Both would have to have access to the keys for decryption, though, and that may be difficult to architect in a secure fashion.
... View more
05-12-2016
08:59 PM
1 Kudo
@Ovidiu Petridean Per the release notes, HDP 2.4.2 includes Kafka 0.9.0.1. The packages are named something like kafka_2_4_2_0_258-0.9.0.2.4.2.0-258.el6.noarch; however, you'll notice that all of the 2.4.2 packages end with 2.4.2.0-258 (the HDP 2.4.2 build version). The Kafka version in the packages is actually 0.9.0.1.
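One way to confirm this on a node, assuming the standard HDP layout under /usr/hdp/current, is to list the broker jars and read the version out of the file name:
# e.g. kafka_2.10-0.9.0.2.4.2.0-258.jar -> Scala 2.10 build of Kafka 0.9.0.x
ls /usr/hdp/current/kafka-broker/libs/kafka_*.jar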
... View more
05-10-2016
08:23 PM
@Ash Pad You need to provide additional privileges to the user via keyadmin. The user will need "Get Keys", "Get Metadata", and "Decrypt EEK" privileges on the key to read files in the encryption zone.
... View more
05-09-2016
02:07 PM
@Neeraj Sabharwal 3rd-party KMS solutions are not supported yet. This is on the roadmap, though.
... View more
05-04-2016
04:20 PM
@kavitha velaga The audit logs are created on the respective nodes within the cluster where the services run (not on the edge node). For example, if you are looking for the Hive audit logs, look in /var/log/hive on the node in your cluster where Hiveserver2 runs. Alternatively, you can view access and admin audit information through the Ranger UI on the "Audit" tab.
... View more
05-04-2016
03:49 PM
@Alexander Feldman You should be able to export policies via the REST API from the 0.4 system and import them into the 0.5 system. Here's a link to the Ranger REST API docs.
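As a hedged sketch (the host, port, and credentials are placeholders, and the exact endpoint can vary between Ranger versions, so check the docs linked above), the export side looks roughly like this:
# dump all policies from the Ranger 0.4 instance as JSON
curl -u admin:admin -o ranger-policies.json "http://ranger04-host.example.com:6080/service/public/api/policy"
The import side is then a matter of POSTing each policy document back to the equivalent endpoint on the 0.5 instance, adjusting repository/service names as needed.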
... View more
05-04-2016
01:34 PM
@Kashif Khan Make sure your Ambari Metrics service is running and that it was running for the time period for which you are trying to request data. On my Sandbox, the Metrics service was shut down. When I started it, the query still didn't return anything because there was no data available to return. When there is data, you will still get the header that you are getting plus the data points. Here is what I get on a different cluster:
{
"href" : "https://localhost:8443/api/v1/clusters/SMESecurityTEST?fields=metrics/load[1462233600,1462406399]&_=1462350240",
"Clusters" : {
"cluster_name" : "SMESecurityTEST",
"version" : "HDP-2.4"
},
"metrics" : {
"load" : {
"1-min" : [
[
0.0,
1462233600
],
[
0.3094347587719297,
1462234576
],
[
0.32168695175438616,
1462238176
],
[
0.433827850877193,
1462241776
],
---truncated----
... View more
05-04-2016
12:46 PM
1 Kudo
@Raghu Ramamoorthi Have you ensured that SELinux is disabled on all nodes? This can wreak havoc as ACLs are not set up on the system for the installation. If SELinux is disabled, can you post the text of the error from /var/log/ambari-server/ambari-server.log?
... View more
05-02-2016
09:53 PM
1 Kudo
@sujitha snake Those instructions start with "Download the Sandbox VM." They are intended to be run on a Sandbox. You could modify the demo for your own cluster, but you'll need to change any references to sandbox.hortonworks.com to fit with your cluster.
... View more
04-29-2016
05:38 PM
1 Kudo
@Radhakrishna kaligotla On the Sandbox Download Page, there's a "Hortonworks Sandbox Archive" expandable list under the "Sandbox in the Cloud" section. You can find the 2.3 version of the Sandbox there.
... View more
04-27-2016
08:47 PM
@Roberto Sancho
Pig is a good tool for ETL and data-warehouse-style processing of your data. It provides an abstraction layer over the underlying processing engine (MapReduce or Tez), and you can use Tez as the execution engine to speed up processing. This Pig Tutorial has additional information.
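For example (the script path is hypothetical), switching the same ETL script from MapReduce to Tez is just a matter of the execution mode flag:
# run the script on MapReduce, then on Tez
pig -x mapreduce /home/etluser/clean_orders.pig
pig -x tez /home/etluser/clean_orders.pig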
... View more
04-26-2016
03:55 PM
@David Lays The two main options for replicating the HDFS structure are Falcon and distcp. The distcp command is not very feature-rich: you give it a path in the HDFS structure and a destination cluster, and it copies everything to the same path on the destination. If the copy fails, you need to start it again, and so on. Falcon is the other method for maintaining a replica of your HDFS structure; it offers more data-movement options, and you can manage the lifecycle of the data on both sides more effectively. If you're moving Hive table structures, there is some added complexity in making sure the tables are created on the DR side, but moving the actual files works the same way.
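As a hypothetical sketch (the NameNode hostnames and the path are placeholders), a basic distcp replication run looks like this:
# copy the warehouse directory to the same path on the DR cluster,
# skipping files that are already up to date and preserving attributes
hadoop distcp -update -p hdfs://prod-nn.example.com:8020/apps/hive/warehouse hdfs://dr-nn.example.com:8020/apps/hive/warehouse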
... View more
04-22-2016
04:19 PM
2 Kudos
@Hefei Li The data is stored encrypted, with a copy of the encrypted data encryption key (EDEK) attached to the file. No user will be able to access the contents of the OS-level files unless the KMS provides them with the decrypted data encryption key (DEK). The EDEK is stored with the file so that, once the policy checks for access to the file have passed, the KMS can determine which version of the key was used to encrypt the file and hand back the appropriate DEK. At the HDFS layer, the user has to have policy access to the KMS key to decrypt the file; the file cannot be decrypted unless this policy check passes. If you uninstall Ranger and the KMS, you will start seeing errors in the HDFS logs when you try to access files in an encryption zone, because the NameNode will no longer be able to reach the KMS for keys or Ranger for the key-access policies.
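As a hypothetical illustration (the key name and paths are assumptions), this is roughly how a zone is created and how you can see that a file carries EDEK metadata:
# create a key in the KMS, then make a directory an encryption zone using that key
hadoop key create warehouse_key
hdfs crypto -createZone -keyName warehouse_key -path /secure/warehouse
# show the encryption info (including EDEK details) attached to a file in the zone
hdfs crypto -getFileEncryptionInfo -path /secure/warehouse/part-00000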
... View more
04-21-2016
06:37 PM
4 Kudos
@Artem Ervits This can definitely be done, but you'll need a different "database" (MySQL parlance) or "schema" (Oracle, DB2 parlance) for each Ambari cluster. For example, you might create an "ambari-Prod1" database or schema for the Prod1 HDP cluster and an "ambari-Test2" database/schema for the Test2 HDP cluster.
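As a hypothetical sketch (database names, user, and password are placeholders), the MySQL side might look like this, with each cluster's ambari-server setup then pointed at its own schema:
mysql -u root -p <<'SQL'
CREATE DATABASE ambari_prod1;
CREATE DATABASE ambari_test2;
CREATE USER 'ambari'@'%' IDENTIFIED BY 'ambari_password';
GRANT ALL PRIVILEGES ON ambari_prod1.* TO 'ambari'@'%';
GRANT ALL PRIVILEGES ON ambari_test2.* TO 'ambari'@'%';
SQL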
... View more