Member since: 06-28-2017
Posts: 279
Kudos Received: 43
Solutions: 24
My Accepted Solutions
| Views | Posted |
|---|---|
| 2020 | 12-24-2018 08:34 AM |
| 5400 | 12-24-2018 08:21 AM |
| 2252 | 08-23-2018 07:09 AM |
| 9813 | 08-21-2018 05:50 PM |
| 5192 | 08-20-2018 10:59 AM |
08-07-2018
06:56 AM
1 Kudo
I found this: https://community.hortonworks.com/questions/91550/hive-execution-engine-set-to-spark-is-recommended.html Maybe @Jay Kumar SenSharma can provide a hint on whether anything has changed in the Spark support in HDP since last year. Otherwise, the documentation on how to set up Spark as the execution engine for Hive is here: https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
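Per the Getting Started page linked above, the switch itself is a single property; as a sketch, you would set it per session (or cluster-wide in hive-site.xml):

```sql
-- In a Hive session: run subsequent queries on Spark instead of MR/Tez.
-- To make it permanent, set hive.execution.engine in hive-site.xml.
set hive.execution.engine=spark;
```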
08-07-2018
06:50 AM
1 Kudo
Looks like you have an issue with the SSH login to the machines. Based on your /etc/hosts file, root@slave1 actually is root@10.0.3.68, and given your config in /usr/local/hbase/conf/regionservers, that is exactly what you should expect: the start script tries to connect to the names configured in regionservers and resolves each name to an IP address via DNS, or /etc/hosts in your case. So running ssh slave1 on a console should get you to the login prompt of 10.0.3.68; that part is fine.

Your actual issue is that the logins happen in parallel, so you never get a real chance to enter the password. My recommendation is to set up SSH key authentication so that no password is prompted. How to do this is described here: https://www.ssh.com/ssh/keygen/ In short, use ssh-keygen and ssh-copy-id (you can also copy the key manually). In your case, create the key pair on your master system (apparently for the root user) and copy the public key to the slave machines. If you don't want to be prompted at all, don't set a passphrase, but then you must keep the private key protected at all times.

If you really want to enter a password, you could change the start script so that the second server login waits for the first to finish. That works for 3 machines, but will take quite a long time on a setup with many nodes.
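As a sketch, assuming you run this as root on the master and the hostnames slave1/slave2 come from your regionservers file:

```shell
# 1. Create a key pair with an empty passphrase (-N "").
#    Skip this step if ~/.ssh/id_rsa already exists.
ssh-keygen -t rsa -b 4096 -N "" -f "$HOME/.ssh/id_rsa"

# 2. Install the public key on each regionserver; you will be asked
#    for the root password one last time per host.
for host in slave1 slave2; do
  ssh-copy-id "root@$host"
done

# 3. Verify: this should print the hostname without a password prompt.
ssh root@slave1 hostname
```

After that, start-hbase.sh can open all the parallel connections without prompting you.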
07-31-2018
10:52 AM
Is there any specific reason why you need a load balancer? Kafka is designed to work without a separate load balancer and to spread the load across the cluster itself. With a load balancer in between, clients may fail to connect when a broker redirects the connection to another listener. The brokers list in a client's config is only used to fetch the actual connection parameters (the listeners) from the cluster; you typically provide multiple brokers so the client can still connect if the first broker happens to be down. If you change the listener configs on the Kafka brokers (or in ZooKeeper) to point at the load balancer, I think you will actually break the cluster, since the brokers use those listeners to communicate with each other as well.
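To make the "multiple brokers" point concrete, a client config sketch (hostnames are placeholders):

```properties
# List several brokers so the initial metadata fetch still succeeds
# if one of them is down; the client then talks to whichever brokers
# the cluster's listener metadata points it to.
bootstrap.servers=broker1:9092,broker2:9092,broker3:9092
```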
07-17-2018
10:53 AM
1 Kudo
Please check here for the config of the PutHDFS processor to write to Azure: https://community.hortonworks.com/content/kbentry/71916/connecting-to-azure-data-lake-from-a-nifi-dataflow.html
06-14-2018
08:05 AM
It's quite common for DB servers to be protected from internet access, and if your Hadoop cluster is not inside the company network it counts as internet, so your security team will not allow access. You can discuss with your network security team whether a VPN tunnel from your Hadoop cluster to the SQL Server is possible, but in most cases they will then require all internal security standards to be applied to your Hadoop cluster as well; otherwise it is not considered trustworthy. Another option: if an SSH connection is allowed from the MS SQL Server to your Hadoop cluster, you could tunnel port 1433 through it and use it from Hadoop. A common solution to this situation is also to migrate the Hadoop cluster from the cloud to an on-premise installation.
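As a sketch of the tunnel option, assuming SSH is allowed from the SQL Server side towards the cluster ("tunneluser" and "hadoop-edge" are placeholder names):

```shell
# Run ON the SQL Server host (or a machine next to it): open a reverse
# tunnel so that port 1433 on the Hadoop edge node forwards back to the
# local SQL Server. JDBC/Sqoop on the edge node then connects to
# localhost:1433. -N means no remote command, just the forwarding.
ssh -N -R 1433:localhost:1433 tunneluser@hadoop-edge
```

Note that by default the forwarded port only listens on the edge node's loopback interface, so other cluster nodes would still need to route through the edge node.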
06-07-2018
10:26 AM
Some hints are given here: http://hbase.apache.org/0.94/book/secondary.indexes.html In most cases you'll have to create a second index table.
06-07-2018
07:18 AM
1 Kudo
Within Ambari, you can go to the config via Ambari UI --> Ambari Metrics --> Configs --> Advanced ams-log4j. There you should find this property; set to something like 31 in combination with a daily rotation, it keeps only one month of logs:

log4j.appender.file.MaxBackupIndex=31

If you configure a monthly rotation instead, set the number to 1. Note that this doesn't delete already-existing older log files; you'll have to clean those up manually.
05-26-2018
06:09 AM
Yes, the resource shortage is on the worker machine where the container is executed. If you don't see OOM kills from the kernel on the worker machine (they would be reported via dmesg), there are of course other possible causes; it could be the JVM settings as well. Do you know which jobs get killed? Always Hive jobs, or always Spark jobs?
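A quick way to check for kernel OOM kills, run on the worker node (reading the kernel log may require root on some systems):

```shell
# Search the kernel ring buffer for OOM killer entries; the matching
# lines name the killed process and its memory usage at the time.
dmesg 2>/dev/null | grep -iE 'out of memory|killed process' \
  || echo "no OOM kill messages found"
```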
05-25-2018
05:29 AM
Exit code 137 indicates a resource issue, in most cases RAM. You can try setting the yarn.scheduler.minimum-allocation-mb property to ensure a minimum amount of RAM is available before YARN starts the job. If that doesn't help, check dmesg for the kernel messages, which should indicate why your job got killed. https://github.com/moby/moby/issues/22211
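As a sketch, the property goes into yarn-site.xml (the 1024 MB value here is just an example; tune it to your node sizes):

```xml
<!-- yarn-site.xml: smallest container YARN will hand out.
     Requests below this are rounded up to this value. -->
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
```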
05-16-2018
06:12 AM
I think this solution is what you need: https://community.hortonworks.com/articles/29900/zookeeper-using-superdigest-to-gain-full-access-to.html