Member since: 07-31-2019
Posts: 346 | Kudos Received: 259 | Solutions: 62

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2937 | 08-22-2018 06:02 PM
 | 1691 | 03-26-2018 11:48 AM
 | 4215 | 03-15-2018 01:25 PM
 | 5085 | 03-01-2018 08:13 PM
 | 1433 | 02-20-2018 01:05 PM
08-01-2016
04:14 PM
Building out a cluster is a bit of a puzzle, and it gets especially hairy when the cluster is small, say fewer than 12 nodes. For better or worse, this is how I tend to generalize my approach:

1. There are master services (NameNode, ResourceManager) and there are client services (Spark, Hive). Think HA and redundancy for master services. It's best not to co-locate multiple master services, since that could create a single point of failure. Do not co-locate master and worker (HDFS) services.
2. Services such as Storm, HBase, and Solr do better on dedicated servers because of their high resource requirements. Not required, of course, but be cognizant of the trade-offs.
3. Spark is memory bound, Kafka is IO bound, and Storm is CPU bound. When looking at co-locating services, try to mix and match: don't put 2 memory-bound services on a single server.
4. I prefer to have a small, dedicated Ambari server. It seems cleaner to me, but your mileage may vary.
5. Try to use existing database infrastructure (e.g. Oracle) for all your metastores.
6. Never use a SAN.
7. Think about virtualizing master services, edge nodes, and dev environments.

This list is by no means exhaustive, and every architect will have additional details (e.g. placing the Spark History Server on the same server as HiveServer2). When it really comes down to it, you can plan for the worst and hope for the best. Your cluster WILL change over time...guaranteed. Of course, you could just deploy in Azure HDInsight and be done with it.... 😉
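The mix-and-match rule in point 3 can be sketched as a quick placement check. This is a minimal, hypothetical sketch: the service-to-resource-profile mapping and the node layouts are illustrative assumptions, not a real planner.

```shell
#!/bin/sh
# Flag nodes that are assigned two services with the same dominant resource
# (memory / IO / CPU). Mapping and layouts below are illustrative only.

profile() {
  case "$1" in
    spark|hbase) echo memory ;;
    kafka|hdfs)  echo io ;;
    storm)       echo cpu ;;
    *)           echo other ;;
  esac
}

check_node() {  # usage: check_node <node> <service> [<service> ...]
  node="$1"; shift
  seen=""
  for svc in "$@"; do
    p=$(profile "$svc")
    case " $seen " in
      *" $p "*) echo "$node: conflict - two ${p}-bound services" ;;
    esac
    seen="$seen $p"
  done
}

check_node worker1 spark kafka   # memory + IO: a fine mix, prints nothing
check_node worker2 spark hbase   # two memory-bound services: flagged
```

Running this prints a conflict only for worker2, matching the rule that two memory-bound services should not share a server.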
08-01-2016
03:23 PM
2 Kudos
Hi @Christopher Amatulli. I'd strongly advise against siloing your cluster into separate storage, processing, and services tiers. That goes against the idea of a cluster and moves you back into traditional application silos. Think of it instead as a single cluster with distributed, shared storage and processing. You may want to assign certain servers to certain services based on high-availability requirements or IO/CPU/memory profiles, but the cluster as a whole sits under a single operations and management service (Ambari) as well as a single resource layer (YARN). For small clusters you may have 2 master servers, an edge node, and n data nodes. You should review our cluster planning guide http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_cluster-planning-guide/content/ch_hardware-recommendations_chapter.html as well as any number of good design articles on HCC. Hope this helps.
07-31-2016
01:05 PM
1 Kudo
Hi @Jon Maestas. Executing the following should resolve the issue.

# Set file-max: max open files for a single user
sudo sh -c 'echo "* soft nofile 200000" >> /etc/security/limits.conf'
sudo sh -c 'echo "* hard nofile 200000" >> /etc/security/limits.conf'
sudo sh -c 'echo "200000" > /proc/sys/fs/file-max'
sudo sh -c 'echo "fs.file-max=200000" >> /etc/sysctl.conf'

# Set process-max
sudo sh -c 'echo "* soft nproc 8192" >> /etc/security/limits.conf'
sudo sh -c 'echo "* hard nproc 16384" >> /etc/security/limits.conf'
sudo sh -c 'echo "* soft nproc 16384" >> /etc/security/limits.d/90-nproc.conf'

# Per-service ulimit adjustments
sudo sh -c 'echo "hdfs - nofile 32768" >> /etc/security/limits.conf'
sudo sh -c 'echo "mapred - nofile 32768" >> /etc/security/limits.conf'
sudo sh -c 'echo "hbase - nofile 32768" >> /etc/security/limits.conf'
sudo sh -c 'echo "hdfs - nproc 32768" >> /etc/security/limits.conf'
sudo sh -c 'echo "mapred - nproc 32768" >> /etc/security/limits.conf'
sudo sh -c 'echo "hbase - nproc 32768" >> /etc/security/limits.conf'
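After applying the settings, a quick read-only check confirms what the system actually sees. Note that limits.conf changes only take effect for NEW login sessions, so log out and back in (or restart the affected service) before checking. This is a minimal sketch; the /proc path is Linux-specific.

```shell
# Read-only sanity checks (Linux) after applying the limits above.
ulimit -n                                                   # open-file limit for the current shell
[ -r /proc/sys/fs/file-max ] && cat /proc/sys/fs/file-max   # system-wide open-file ceiling
true
```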
07-12-2016
02:16 PM
1 Kudo
@Sunit Gupta Here are some good resources:
http://thornydev.blogspot.com/2013/07/querying-json-records-via-hive.html
http://engineering.skybettingandgaming.com/2015/01/20/parsing-json-in-hive/
07-11-2016
06:32 PM
@mrizvi could you please attach your code? Thanks.
07-11-2016
12:36 PM
1 Kudo
@Anurag Setia This doesn't directly answer your question, but I would advise against installing HDP on Windows; that option will be deprecated. If you'd like to get familiar with the HDP platform, I'd suggest installing the sandbox: http://hortonworks.com/products/sandbox/#downloads
06-29-2016
04:33 PM
3 Kudos
@Rahul Mishra You may want to start with this documentation: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_cluster-planning-guide/content/ch_hardware-recommendations_chapter.html. For small clusters like yours, where HA isn't a concern, you are basically dealing with only 2 types of nodes: master and worker nodes. I certainly wouldn't over-architect it. For an 8-node cluster you would have your Ambari server (which can also host your client services), 2 master nodes, and finally 5 worker nodes. In a homogeneous cluster like yours, where each node has limited resources, your primary concern is avoiding co-locating services that require the same type of resource. For example, it would be fine to have an in-memory service like Spark coexist with a more IO-intensive service, but not 2 memory-intensive services on the same node. In your case you'll just have to build it out, monitor it, and be aware that running certain operations together may cause performance issues. The good thing about HDP is its ability to scale, so you are never really "locked in" to a particular architecture.
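Before laying out services, it's worth confirming the cluster really is homogeneous. A minimal inventory sketch (Linux-only; run on each candidate node) might look like this:

```shell
# Report this node's CPU and RAM so you can compare nodes for homogeneity.
cores=$(nproc)
mem_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
mem_gb=$((mem_kb / 1024 / 1024))
echo "$(hostname): ${cores} cores, ${mem_gb} GB RAM"
```

Collecting one line per node makes resource mismatches obvious at a glance.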
06-14-2016
07:52 PM
@charan tej At one point you could use Microsoft System Center to monitor HDP on Windows: https://cwiki.apache.org/confluence/display/AMBARI/Ambari+SCOM+Management+Pack. It looks like it hasn't been updated in a while. Because of the lack of Ambari and Kerberos support, we recommend not running HDP on Windows.
06-13-2016
12:53 PM
@sankar rao It indicates there is no server component. Many services consist of a server component and a client component. For example, MapReduce has a History Server component, Hive has HiveServer2, etc. Services such as Tez, Sqoop, and Pig do not have any server component. You can see this by clicking on the service and noting that only a client component is running. This is important when considering management and operations, especially stopping and starting services and understanding where each service runs. Many server components run on different nodes than the client components; clients run on all data nodes. Hope this helps.
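A quick way to see the distinction from a node's shell is to check whether a server daemon is actually running there. This is a hedged sketch: the component names passed in are just examples, and a daemon may simply live on a different node.

```shell
# Report whether a named daemon process is running on THIS node.
component_type() {
  if pgrep -f "$1" >/dev/null 2>&1; then
    echo "$1: server component running on this node"
  else
    echo "$1: no server process here (client-only, or it runs on another node)"
  fi
}

component_type HiveServer2   # Hive has a server component somewhere
component_type Pig           # Pig is client-only, so this never finds a daemon
```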
06-13-2016
11:38 AM
2 Kudos
@charan tej You're correct: HDP on Windows does not support Ambari. You could try using a third-party tool such as SQuirreL for Hive, or access Hive using SQL Server 2016 PolyBase; HiveServer2 accepts ODBC connections.