Member since: 09-29-2015
Posts: 286
Kudos Received: 601
Solutions: 60
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 11520 | 03-21-2017 07:34 PM |
 | 2916 | 11-16-2016 04:18 AM |
 | 1629 | 10-18-2016 03:57 PM |
 | 4301 | 09-12-2016 03:36 PM |
 | 6312 | 08-25-2016 09:01 PM |
01-21-2016
04:59 PM
See also https://community.hortonworks.com/questions/1848/python-errors-or-script-does-not-exist-while-insta.html
01-21-2016
04:02 PM
5 Kudos
dfs.datanode.max.xcievers / dfs.datanode.max.transfer.threads = 4096 (use 16k if running HBase)
dfs.datanode.balance.max.concurrent.moves = 500 (can go to 1000 if needed)
/* Each DataNode has limited bandwidth for rebalancing. The default is 5 MB/s (dfs.datanode.balance.bandwidthPerSec = 5242880). In the worst case, each data transfer is limited to 1 MB/s. */
dfs.datanode.balance.bandwidthPerSec = 104857600 /* 100 MB/s */
hdfs balancer -Dfs.defaultFS=hdfs://<NN_HOSTNAME>:8020 -Ddfs.balancer.movedWinWidth=5400000 -Ddfs.balancer.moverThreads=1000 -Ddfs.balancer.dispatcherThreads=200 -Ddfs.datanode.balance.max.concurrent.moves=5 -Ddfs.balance.bandwidthPerSec=100000000 -Ddfs.balancer.max-size-to-move=10737418240 -threshold 5
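The byte values above are easier to check when derived from their units. The following is a minimal Python sketch, not part of the original answer: the helper name `balancer_command` and the host `nn.example.com` are hypothetical, and the option names and values are copied from the command above without endorsing them. Note that the datanode setting uses 104857600 (100 x 1024 x 1024), while the command line passes 100000000 (decimal 100 MB); both appear in the post as written.

```python
import shlex

MIB = 1024 * 1024          # bytes in one mebibyte
GIB = 1024 * MIB           # bytes in one gibibyte

# Byte values quoted in the post, re-derived from their units.
DEFAULT_BANDWIDTH = 5 * MIB        # 5242880
TUNED_BANDWIDTH = 100 * MIB        # 104857600
MAX_SIZE_TO_MOVE = 10 * GIB        # 10737418240


def balancer_command(namenode_host, threshold=5):
    """Build the balancer invocation from the post as an argument list.

    Hypothetical helper: it only assembles arguments and does not run anything.
    """
    options = {
        "fs.defaultFS": f"hdfs://{namenode_host}:8020",
        "dfs.balancer.movedWinWidth": 5400000,
        "dfs.balancer.moverThreads": 1000,
        "dfs.balancer.dispatcherThreads": 200,
        "dfs.datanode.balance.max.concurrent.moves": 5,
        "dfs.balance.bandwidthPerSec": 100000000,
        "dfs.balancer.max-size-to-move": MAX_SIZE_TO_MOVE,
    }
    args = ["hdfs", "balancer"]
    args += [f"-D{key}={value}" for key, value in options.items()]
    args += ["-threshold", str(threshold)]
    return args


if __name__ == "__main__":
    # nn.example.com is a placeholder for <NN_HOSTNAME>.
    print(shlex.join(balancer_command("nn.example.com")))
```

The list returned by `balancer_command` can be passed to `subprocess.run` on a host where the `hdfs` client is installed.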
01-20-2016
07:47 PM
1 Kudo
@Peter Young After support was contacted, someone posted the answer about stale alerts here: https://community.hortonworks.com/questions/9762/how-to-get-rid-of-stale-alerts-in-ambari.html
01-20-2016
07:17 PM
1 Kudo
@surender nath reddy kudumula This may not answer the image size question, but it may give you some ideas for your POC. The images are stored in HBase and processed there. See "A Non Standard Use Case of Hadoop High Scale Image Processing and Analysis by TrueCar". The slides are at "Hadoop Image Processing Pipeline".
01-20-2016
05:02 PM
@Wes Floyd Yes, but it does not cover deployment from a cluster topology point of view (except for the discussion of ZooKeeper).
01-20-2016
04:56 PM
3 Kudos
What are the best practices for deploying Storm components on a cluster for scalability and growth? We are thinking of having dedicated nodes for Storm on YARN. Also, would anything go on an edge node?

For example, the thought is to have three dedicated Storm nodes (S1, S2, S3) in the cluster with the following allocations:

Storm Nimbus: Choose S1 as the Storm master to deploy Storm Nimbus. Or is it probably a best practice not to co-locate Nimbus with any worker node? The S1 node will also have the Storm UI.
Storm Supervisors/Workers: Choose S1, S2, S3 to deploy Storm Supervisors.
ZooKeeper cluster: Since Kafka is usually used with Storm, have a separate ZooKeeper cluster for Kafka and Storm. DON'T put the ZooKeeper cluster on the Kafka nodes (K1, K2, K3). Put ZooKeeper on the Storm nodes (S1, S2, S3).
Storm UI: Will be on the same node as Nimbus: S1 or the edge node.
DRPC Server: What is the best practice for placing this?

So in summary, if we have three dedicated nodes for Storm, the thinking is to allocate as follows:

S1 node: Storm Nimbus/Storm UI (maybe it is not a best practice to put Storm Nimbus on a worker node, and it should go on the edge node instead?), Storm Supervisor, ZooKeeper
S2 node: Storm Supervisor, ZooKeeper
S3 node: Storm Supervisor, ZooKeeper
Edge node: Storm Nimbus/Storm UI (if they should not be on a worker node)

Finally, would the DRPC server go on the Nimbus node? Any thoughts on this? Am I on the right track? Would anything go on an edge node?
Labels: Apache Ambari, Apache Storm
01-20-2016
03:08 PM
So true. OpenSSL can also be the issue. See this link for more info: https://community.hortonworks.com/questions/145/openssl-error-upon-host-registration.html
01-20-2016
05:41 AM
2 Kudos
@Kyriakos Spyropoulos Maybe your security group is preventing your node from connecting back to the Ambari Server. Log on to one of the other nodes and try to telnet back to the Ambari server on port 8670.

Here is an example of a simple configuration to allow the internal ports to be open to each other. "You will obviously need to replace “[Your Public IP]/32” with your IP address and subnet mask you wish to access the cluster from. In this example the /32 denotes a single host that we will access the cluster from. Also replace “[Your Security Group]” with the id of your actual security group. You will notice that for brevity sake we have just opened all ports internally to the security group. They are not accessible to the outside world but only between the nodes in the cluster. In a production environment you would probably want to be very specific with regards to the ports opened internally as well."

See also this blog: "Deploying Hadoop Cluster Amazon ec2 Hortonworks". See also the answers to this question: "Looking for Steps to Install HDP on AWS".
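If telnet is not installed on the node, the same reachability check can be done with a few lines of Python. This is only a sketch: the function name `can_connect` and the host name in the usage comment are hypothetical, and port 8670 is simply the port mentioned above.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host and opens a TCP connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example usage, run from one of the other cluster nodes
# (ambari.example.internal is a placeholder host name):
#   can_connect("ambari.example.internal", 8670)
```

A `False` result from inside the cluster while the server process is running points toward a security group or firewall rule rather than Ambari itself.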
01-20-2016
03:57 AM
1 Kudo
@Mehdi TAZI The best architectures for real-time analytics with Hadoop usually use Hadoop as much as possible for its distributed storage and distributed compute capabilities, rather than sitting outside of it. You can be guided by looking at architectures that use Hadoop for distributed compute, not just distributed storage. Example alternative architectures:
Instead of BI ON Hadoop (such as Tableau), you can achieve BI IN Hadoop, e.g. with Arcadia Data.
Instead of MPP outside Hadoop, you can use an MPP solution for Hadoop, such as Apache HAWQ or Actian Vector, for SQL analytics and get as much SQL functionality as possible.
And yes, you can use HBase as a NoSQL solution.
Finally, Hive is also making progress toward near-real-time capabilities, so it is worth watching.
01-20-2016
03:38 AM
1 Kudo
See if the answer to this question also helps: https://community.hortonworks.com/questions/10149/installed-sandbox-but-cant-get-the-welcome-hdp-pag.html#answer-10156