Member since 03-16-2016
707 Posts
1753 Kudos Received
203 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 6986 | 09-21-2018 09:54 PM |
| | 8749 | 03-31-2018 03:59 AM |
| | 2627 | 03-31-2018 03:55 AM |
| | 2758 | 03-31-2018 03:31 AM |
| | 6187 | 03-27-2018 03:46 PM |
09-30-2016
05:28 PM
5 Kudos
@Vijaya Narayana Reddy Bhoomi Reddy Edge nodes, while they may sit in the same subnet as your HDP clusters, are not actually part of the clusters, so there is no HDP configuration trick to redirect traffic through the edge nodes and the Private Link. If you wish to use the 10 GB Private Link, it is simply a matter of working with your network team to have the HDP clusters communicate over that Private Link instead of the firewall-channeled network (though I doubt they will want to do that). You did not put a number next to that "Firewall" line, but I assume its bandwidth is much smaller, since you want to use the other path. Perhaps the network team needs to upgrade the firewall-channeled network to meet the SLA. That is the correct approach, rather than using some trick to route traffic over the Private Link between edge nodes: it would meet your SLA and also keep the network team happy by leaving the firewall function in place. The network team may be able to peer the clusters so traffic is redirected through the Private Link directly, without going through the edge nodes and bypassing the firewall-channeled network, but I am pretty sure that would break their network design principles. The best approach is to upgrade the firewall-channeled network to meet your needs.
09-30-2016
05:13 PM
3 Kudos
@hitaay Snapshots do not create extra copies of blocks on the file system. Snapshots are stored along with the NameNode's file system namespace. What do you mean by "huge size of snapshots and restoring the backups"? The entire point of snapshots is to avoid creating extra copies of blocks on the file system, while letting you restore a specific file, or everything, to a point in time. a) There are always many ways to skin a cat, but what test did you do with HDFS snapshots that failed for you? Could you elaborate a little? That would help. b) "Point in Time Recovery" - that is a question for WANdisco. We endorse HDFS snapshots first for this function; WANdisco or another tool is your option.
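For reference, the basic HDFS snapshot workflow looks like the sketch below. The directory path, snapshot names, and file name are hypothetical placeholders; adjust them to your cluster.

```shell
# Mark a directory as snapshottable (requires HDFS admin rights)
hdfs dfsadmin -allowSnapshot /data/warehouse

# Create a named snapshot -- only namespace metadata is recorded, no blocks are copied
hdfs dfs -createSnapshot /data/warehouse snap-2016-09-30

# List snapshottable directories and diff two snapshots
hdfs lsSnapshottableDir
hdfs snapshotDiff /data/warehouse snap-2016-09-30 snap-2016-10-01

# Restore a single file by copying it back out of the read-only .snapshot path
hdfs dfs -cp /data/warehouse/.snapshot/snap-2016-09-30/part-00000 /data/warehouse/
```

Because a snapshot only freezes the namespace, it is near-instant to create; space is consumed only as the live copy of the data diverges from the snapshotted state.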
09-30-2016
04:56 PM
3 Kudos
@ed day Can you telnet to that port? You just need to check that the port is accessible. Are you using the sandbox or pointing to an actual cluster? If the sandbox, and you could not telnet, then you need to add a forwarding rule for the wanted port. If the latter, then you may be in a different subnet and may have to work with your network admin.
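A quick connectivity check looks like this; the hostname and port below are hypothetical, so substitute your own:

```shell
# Interactive check -- a successful connection prints "Connected to ..."
telnet sandbox.hortonworks.com 10000

# Or, where nc (netcat) is available, a non-interactive scan of the same port
nc -vz sandbox.hortonworks.com 10000
```

If the connection is refused or times out, the issue is the network path (port forwarding, firewall, or subnet routing) rather than the service itself.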
09-26-2016
06:21 PM
4 Kudos
@Brenden Cobb No. Stop services in the proper order for clean bookmarking (stop the consumers, then the Kafka service/brokers, then ZooKeeper, in that order), make your changes (as you already stated them), and restart the services in the proper order (the reverse of the order in which you stopped them). For your own sake, test this in your dev or test environment first.
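The stop/start sequence above can be sketched as follows. The paths assume an HDP-style layout and the scripts shipped with the Kafka and ZooKeeper distributions; normally you would drive this through Ambari instead, so treat the paths as illustrative.

```shell
# 1. Stop your consumer applications first, so offsets are committed cleanly.

# 2. Stop the Kafka broker(s)
/usr/hdp/current/kafka-broker/bin/kafka-server-stop.sh

# 3. Stop ZooKeeper last
/usr/hdp/current/zookeeper-server/bin/zkServer.sh stop

# ... make your configuration changes here ...

# 4. Restart in reverse order: ZooKeeper, then the broker(s), then consumers
/usr/hdp/current/zookeeper-server/bin/zkServer.sh start
/usr/hdp/current/kafka-broker/bin/kafka-server-start.sh -daemon \
    /usr/hdp/current/kafka-broker/config/server.properties
```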
09-26-2016
06:15 PM
5 Kudos
@Sami Ahmad Is the following missing from your import statements? `import org.apache.spark.sql.DataFrame`
09-26-2016
06:05 PM
6 Kudos
@Arkaprova Saha It depends on how you feel about yourself and your future. If you consider yourself a software engineer with a solid Java background who wants to deliver highly optimized and scalable software products based on Spark, then you may want to focus more on Scala. If you are more focused on data wrangling, discovery, and analysis, on short-term, use-focused studies, or on resolving business problems as quickly as possible, then Python is awesome. Python has such a large community, with plenty of code snippets, applications, etc. Don't get me wrong: Python can also be used to deliver enterprise-level applications, but it is more common to use Java and Scala for highly optimized ones. Python has some drawbacks, which we will not debate here. Anyhow, I would say that Python is kind of a MUST HAVE and Scala is NICE TO HAVE. Obviously, this is my 2c, and I would be amazed if any of the responses in this thread were THE answer.
09-26-2016
05:52 PM
4 Kudos
@Bala Vignesh N V If your table is an actual Hive table (not an external table), it is ACID-enabled (which requires the ORC file format), Hive on Tez is enabled globally for parallelism, and you submit those SQL statements as separate jobs, then YES. The assumption is that you run one of the versions of Hive capable of ACID, which you most likely do if you use anything released in the last 1.5-2 years.
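A minimal sketch of creating such an ACID-enabled table, assuming a HiveServer2 endpoint; the JDBC URL, table name, and columns are hypothetical placeholders:

```shell
# ACID in this era of Hive requires ORC storage, bucketing, and the
# transactional table property.
beeline -u jdbc:hive2://hiveserver:10000 -e "
CREATE TABLE events (id INT, payload STRING)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');"
```

Remember that the cluster-side settings (hive.support.concurrency, the transaction manager, and compactor threads) must also be enabled for ACID operations to work.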
09-26-2016
05:43 PM
1 Kudo
@Anas A @sbhat is correct. I would like to add to her response that the HDP stack is 100% open source, based on Apache. It is a tested platform, so the tools from the ecosystem work together and deliver enterprise-level quality; taking the tools straight from Apache does not ensure that they will work smoothly together. There is no concept of a license associated with HDP. You can use the distribution as-is; however, enterprises elect to purchase paid support so that they receive 24x7 support and get the chance to influence the roadmap or receive special attention on critical issues. Hortonworks engineers are actively involved in the development of Hadoop ecosystem tools, and they can help with addressing bugs or including features that the community would like to have added. The best start for you would be to download the sandbox, as @sbhat suggested. Good luck!
09-24-2016
02:12 AM
@Shankar P Following up on @cduby's answer, you can always create a single-node cluster, like a sandbox.
09-24-2016
01:24 AM
1 Kudo
@Vinay Sharma Assuming your reason is that your user does not have yum privileges, you can try the local repository approach if there is no internet access; the yum steps should then be optional. You could start by installing Ambari: http://docs.hortonworks.com/HDPDocuments/Ambari-2.4.1.0/bk_ambari-installation/content/setting_up_a_local_repository.html
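The local repository setup boils down to serving the downloaded repo tarball over HTTP and pointing yum at it. This is only a sketch; the tarball name, web root, and base URL below are hypothetical, so follow the linked Hortonworks doc for the exact artifacts for your OS and version.

```shell
# Extract the downloaded Ambari repository tarball into the web server's root
tar -xzvf ambari-2.4.1.0-centos7.tar.gz -C /var/www/html/

# Point yum at the local repository instead of the public one
cat > /etc/yum.repos.d/ambari.repo <<'EOF'
[ambari-2.4.1.0]
name=Ambari 2.4.1.0 local repository
baseurl=http://webserver.example.com/ambari/centos7/2.4.1.0
gpgcheck=0
enabled=1
EOF

# Install from the local repository
yum install -y ambari-server
```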