Member since: 07-30-2019
Posts: 181
Kudos Received: 205
Solutions: 51

My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
| | 4959 | 10-19-2017 09:11 PM |
| | 1591 | 12-27-2016 06:46 PM |
| | 1237 | 09-01-2016 08:08 PM |
| | 1179 | 08-29-2016 04:40 PM |
| | 3013 | 08-24-2016 02:26 PM |
06-06-2016
09:03 PM
3 Kudos
@Timothy Spann Compression can improve the performance of Hive queries by reducing the amount of data that has to be read from disk. Even if a query returns only a small number of rows, it may still have to scan a large amount of data to produce them. There is an inflection point, typically with small data sets, where reading uncompressed data costs less than decompressing it; there you might want to skip compression. When larger amounts of data must be read to satisfy a query, though, compression can provide real performance gains.
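As a rough sketch, one way to get compressed storage is ORC with ZLIB compression at table-creation time (the table names sales_orc and sales_raw here are hypothetical):

hive -e "CREATE TABLE sales_orc
  STORED AS ORC TBLPROPERTIES ('orc.compress'='ZLIB')
  AS SELECT * FROM sales_raw;"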
06-06-2016
08:57 PM
If the username in AD is the same (e.g. ambari@EXAMPLE.COM), then SSSD integration will use the AD account instead of the local account. Ideally, you'd already have SSSD set up before doing the Ambari installation. If you're using customized service account names (e.g. my_hive, somecustomuser), then you'd need to modify the sudo entries for the "Customizable Users" to account for this.
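For example, if Hive runs as my_hive, the corresponding "Customizable Users" sudo rule on each node would need to reference the custom name (a minimal sketch, abbreviated to the two example account names from above):

# Ambari Customizable Users
ambari ALL=(ALL) NOPASSWD:SETENV: /bin/su my_hive *, /bin/su somecustomuser *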
06-06-2016
08:50 PM
1 Kudo
@Scott Shaw One of the commands that you grant to the ambari user via sudo is the adduser command. This allows the ambari user to create the service accounts on each node of the cluster. All you need to do is install and start the ambari agent on each node (which you can do as the ambari user once the sudo rules are in place).
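A sketch of that last step, assuming RHEL/CentOS nodes with the Ambari repo already configured:

# run on each cluster node as the ambari user, once the sudo rules are in place
sudo yum install -y ambari-agent
sudo ambari-agent start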
06-03-2016
06:33 PM
1 Kudo
@Pardeep Gorla Typically, in order to set up an SMTP proxy, you'll need to set up your own mail server to send emails to the external SMTP server on behalf of the domain. You would need to install something like Postfix to handle the mail. Once your local mail server is successfully forwarding email to the main server, point the mail server setting in Ambari Alerts at the local server. This article has some information to get you started on configuring a Postfix email proxy.
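A minimal Postfix relay sketch (smtp.example.com is a placeholder for your external SMTP server):

# /etc/postfix/main.cf -- forward all outbound mail to the external server
relayhost = [smtp.example.com]:25

# apply the change
service postfix restart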
06-03-2016
03:19 PM
@PJ Moutrie Have you verified that the firewall is open on the NiFi nodes?
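A quick way to check on a node, assuming firewalld and NiFi's default HTTP port of 8080 (your port may differ):

# list the ports the firewall currently allows
firewall-cmd --list-ports

# test reachability of the NiFi port from another host (hostname is a placeholder)
nc -zv nifi-node.example.com 8080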
06-01-2016
07:45 PM
1 Kudo
@Sri Bandaru If all you need to do is automate grabbing the ticket, then you can set up a keytab file and have the user's login script run kinit automatically, with something similar to the following:

> ktutil
ktutil: addent -password -p username@DOMAIN.COM -k 1 -e rc4-hmac
Password for username@DOMAIN.COM: [enter your password]
ktutil: addent -password -p username@DOMAIN.COM -k 1 -e aes256-cts
Password for username@DOMAIN.COM: [enter your password]
ktutil: wkt username.keytab
ktutil: quit
> mkdir /home/username/keytabs
> chmod 700 /home/username/keytabs
> mv username.keytab /home/username/keytabs
> chmod 600 /home/username/keytabs/username.keytab
> echo "kinit -kt /home/username/keytabs/username.keytab username@DOMAIN.COM" >> /home/username/.bash_profile This will create a keytab for the user, move it into a secure directory, and automatically get a ticket when the user logs in with a bash shell. If you are trying to automate the use of a ticket from the desktop, then you can use a similar method. You will have to install something like the Oracle JDK to get a kinit tool, but you can create the keytab on a Linux machine and copy it to the windows system. Obviously, whatever tool you are trying to use (SAS, etc.) will need to be able to pass the Kerberos ticket to the cluster for authentication.
05-31-2016
07:46 PM
1 Kudo
@Davide Ferrari If you are referring to the database Ambari uses to store its configuration info, then you'll need to re-run Ambari setup to point at the new MySQL address:

[root@sandbox nifi_demo]# ambari-server setup
Using python /usr/bin/python2
Setup ambari-server
Checking SELinux...
SELinux status is 'disabled'
Customize user account for ambari-server daemon [y/n] (n)? n
Adjusting ambari-server permissions and ownership...
Checking firewall status...
Checking JDK...
Do you want to change Oracle JDK [y/n] (n)? n
Completing setup...
Configuring database...
Enter advanced database configuration [y/n] (n)? y
Configuring database...
==============================================================================
Choose one of the following options:
[1] - PostgreSQL (Embedded)
[2] - Oracle
[3] - MySQL
[4] - PostgreSQL
[5] - Microsoft SQL Server (Tech Preview)
[6] - SQL Anywhere
==============================================================================
Enter choice (1): 3
Hostname (localhost):
The MySQL database that is restarted by Ambari is most likely the Hive Metastore database. You will need to change the configuration for the Hive database to an "Existing MySQL" database. Shutdown Hive and repoint the database to the new HA configuration.
05-18-2016
09:53 PM
1 Kudo
@Nicola Marangoni What user is your Ambari view server running as? It looks like "ambari-server" in your configs, is that correct? If so, then you need to add the following parameters to the custom core-site.xml in HDFS configs:

hadoop.proxyuser.ambari-server.groups=*
hadoop.proxyuser.ambari-server.hosts=*

This assumes that you have run ambari-server setup and changed the user that the server runs as to "ambari-server". If you haven't done that, then Ambari is still running as the root user and your proxy user settings will not work. If your Ambari view server runs as root, then you need to change your settings to:

auth=KERBEROS;proxyuser=root
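In that case, the matching core-site.xml entries would reference root instead (same pattern, different user):

hadoop.proxyuser.root.groups=*
hadoop.proxyuser.root.hosts=*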
05-17-2016
01:05 PM
8 Kudos
Virtual memory swapping can have a large impact on the performance of a Hadoop system. Because of the memory requirements of YARN containers and the processes running on the nodes in a cluster, swapping processes out of memory to disk can cause serious performance problems. As such, the historical recommendation for a Hadoop system has been to disable swap altogether by setting the swappiness (the kernel's propensity to swap out a process) to 0. With newer versions of the Linux kernel, however, a swappiness of 0 makes Out Of Memory (OOM) situations more likely, in which the kernel indiscriminately kills important processes to reclaim physical memory. To keep the system from swapping processes too frequently, while still allowing emergency swapping instead of process kills, the recommendation is now to set swappiness to 1 on Linux systems. This still allows swapping, but with the least possible aggressiveness (for comparison, the default swappiness is 60).

To change the swappiness on a running machine:

echo "1" > /proc/sys/vm/swappiness

To ensure the swappiness is set appropriately after a reboot:

echo "vm.swappiness=1" >> /etc/sysctl.conf
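To confirm both changes took effect (a quick check, nothing Hadoop-specific):

# running value -- should print 1
cat /proc/sys/vm/swappiness

# persisted value for the next boot
grep vm.swappiness /etc/sysctl.conf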
05-15-2016
05:07 PM
@ida ida There are a couple of ways to accomplish this; I'd recommend starting with Sqoop. It is a tool designed specifically to extract data from an RDBMS and load it into Hadoop. This tutorial should help you get started.
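A minimal Sqoop import sketch (the JDBC URL, credentials, and table name are all placeholders):

sqoop import \
  --connect jdbc:mysql://dbhost.example.com/mydb \
  --username myuser -P \
  --table customers \
  --target-dir /user/ida/customers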