Member since: 06-28-2017
Posts: 279
Kudos Received: 43
Solutions: 24
My Accepted Solutions
| Views | Posted |
|---|---|
| 2020 | 12-24-2018 08:34 AM |
| 5400 | 12-24-2018 08:21 AM |
| 2252 | 08-23-2018 07:09 AM |
| 9813 | 08-21-2018 05:50 PM |
| 5192 | 08-20-2018 10:59 AM |
08-07-2018
06:56 AM
1 Kudo
I found this: https://community.hortonworks.com/questions/91550/hive-execution-engine-set-to-spark-is-recommended.html Maybe @Jay Kumar SenSharma can provide a hint on whether anything has changed in the Spark support in HDP since last year. Otherwise, the documentation on how to set up Spark as the execution engine for Hive is here: https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
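Per the Getting Started page linked above, the switch itself is a single property; as a sketch, you would set it per session (or cluster-wide in hive-site.xml):

```sql
-- In a Hive session: run subsequent queries on Spark instead of MR/Tez.
-- To make it permanent, set hive.execution.engine in hive-site.xml.
set hive.execution.engine=spark;
```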
08-07-2018
06:50 AM
1 Kudo
Looks like you have an issue with the SSH login to the machines. Based on your /etc/hosts file, root@slave1 actually is root@10.0.3.68, and given your config in /usr/local/hbase/conf/regionservers, that is exactly what you should expect: the start script tries to connect to the names configured in regionservers and resolves each name to an IP address via DNS, or /etc/hosts in your case. So running ssh slave1 on a console should get you to the login prompt of 10.0.3.68; that part is fine.

Your actual issue is that the logins happen in parallel, so you never get a real chance to enter the password. My recommendation is to set up SSH key authentication so that no password is prompted. How to do this is described here: https://www.ssh.com/ssh/keygen/ In short, use ssh-keygen and ssh-copy-id (you can also copy the key manually). In your case, create the key pair on your master system (apparently for the root user) and copy the public key to the slave machines. If you don't want to be prompted at all, don't set a passphrase, but then you must keep the private key protected at all times.

If you really want to enter a password, you could change the start script so that the second server login waits for the first to finish. That works for 3 machines, but will take quite a long time on a setup with many nodes.
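As a sketch, assuming you run this as root on the master and the hostnames slave1/slave2 come from your regionservers file:

```shell
# 1. Create a key pair with an empty passphrase (-N "").
#    Skip this step if ~/.ssh/id_rsa already exists.
ssh-keygen -t rsa -b 4096 -N "" -f "$HOME/.ssh/id_rsa"

# 2. Install the public key on each regionserver; you will be asked
#    for the root password one last time per host.
for host in slave1 slave2; do
  ssh-copy-id "root@$host"
done

# 3. Verify: this should print the hostname without a password prompt.
ssh root@slave1 hostname
```

After that, start-hbase.sh can open all the parallel connections without prompting you.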
07-31-2018
10:52 AM
Is there any specific reason why you need a load balancer? Kafka is designed to work without a separate load balancer and to spread the load across the cluster itself. With a load balancer in between, clients may fail to connect when a broker redirects the connection to another listener. The brokers list in a client's config is only used to fetch the actual connection parameters (the listeners) from the cluster; you typically provide multiple brokers so the client can still connect if the first broker happens to be down. If you change the listener configs on the Kafka brokers (or in ZooKeeper) to point at the load balancer, I think you will actually break the cluster, since the brokers use those listeners to communicate with each other as well.
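To make the "multiple brokers" point concrete, a client config sketch (hostnames are placeholders):

```properties
# List several brokers so the initial metadata fetch still succeeds
# if one of them is down; the client then talks to whichever brokers
# the cluster's listener metadata points it to.
bootstrap.servers=broker1:9092,broker2:9092,broker3:9092
```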
07-17-2018
10:53 AM
1 Kudo
Please check here for the config of the PutHDFS processor to write to Azure: https://community.hortonworks.com/content/kbentry/71916/connecting-to-azure-data-lake-from-a-nifi-dataflow.html
06-14-2018
08:05 AM
It's quite common for DB servers to be protected from internet access, and if your Hadoop cluster is not inside the company network it counts as internet, so your security team will not allow access. You can discuss with your network security team whether a VPN tunnel from your Hadoop cluster to the SQL Server is possible, but in most cases they will then require all internal security standards to be applied to your Hadoop cluster as well; otherwise it is not considered trustworthy. Another option: if an SSH connection is allowed from the MS SQL Server to your Hadoop cluster, you could tunnel port 1433 through it and use it from Hadoop. A common solution to this situation is also to migrate the Hadoop cluster from the cloud to an on-premise installation.
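As a sketch of the tunnel option, assuming SSH is allowed from the SQL Server side towards the cluster ("tunneluser" and "hadoop-edge" are placeholder names):

```shell
# Run ON the SQL Server host (or a machine next to it): open a reverse
# tunnel so that port 1433 on the Hadoop edge node forwards back to the
# local SQL Server. JDBC/Sqoop on the edge node then connects to
# localhost:1433. -N means no remote command, just the forwarding.
ssh -N -R 1433:localhost:1433 tunneluser@hadoop-edge
```

Note that by default the forwarded port only listens on the edge node's loopback interface, so other cluster nodes would still need to route through the edge node.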
06-07-2018
10:26 AM
Some hints are given here: http://hbase.apache.org/0.94/book/secondary.indexes.html In most cases you'll have to create a second index table.
06-07-2018
07:18 AM
1 Kudo
Within Ambari, you can go to the config via Ambari UI --> Ambari Metrics --> Configs --> Advanced ams-log4j. There you should find this property; set to something like 31 in combination with a daily rotation, it keeps only one month of logs:

log4j.appender.file.MaxBackupIndex=31

If you configure a monthly rotation instead, set the number to 1. Note that this doesn't delete already-existing older log files; you'll have to clean those up manually.
05-26-2018
06:09 AM
Yes, the resource shortage is on the worker machine where the container is executed. If you don't see OOM kills from the kernel on the worker machine (they would be reported via dmesg), there are of course other possible causes; it could be the JVM settings as well. Do you know which jobs get killed? Always Hive jobs, or always Spark jobs?
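A quick way to check for kernel OOM kills, run on the worker node (reading the kernel log may require root on some systems):

```shell
# Search the kernel ring buffer for OOM killer entries; the matching
# lines name the killed process and its memory usage at the time.
dmesg 2>/dev/null | grep -iE 'out of memory|killed process' \
  || echo "no OOM kill messages found"
```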
05-25-2018
05:29 AM
Exit code 137 indicates a resource issue, in most cases RAM. You can try setting the yarn.scheduler.minimum-allocation-mb property to ensure a minimum amount of RAM is available before YARN starts the job. If that doesn't help, check dmesg for the kernel messages, which should indicate why your job got killed. https://github.com/moby/moby/issues/22211
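As a sketch, the property goes into yarn-site.xml (the 1024 MB value here is just an example; tune it to your node sizes):

```xml
<!-- yarn-site.xml: smallest container YARN will hand out.
     Requests below this are rounded up to this value. -->
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
```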
05-16-2018
06:12 AM
I think this solution is what you need: https://community.hortonworks.com/articles/29900/zookeeper-using-superdigest-to-gain-full-access-to.html