Member since: 03-16-2016
707 Posts
1753 Kudos Received
203 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 6986 | 09-21-2018 09:54 PM |
| | 8751 | 03-31-2018 03:59 AM |
| | 2628 | 03-31-2018 03:55 AM |
| | 2758 | 03-31-2018 03:31 AM |
| | 6188 | 03-27-2018 03:46 PM |
09-21-2016
02:36 PM
@Sandeep Nemuri Could you answer @Mats Johansson? I am interested in what your question meant ... The thread seems abandoned, and the community needs to understand both the question and the answer.
09-21-2016
02:32 PM
4 Kudos
@Kumar Veerappan 1.3.1 is the Spark version supported by HDP 2.3.0. Is it possible that someone installed a newer version of Spark outside of Ambari and then uninstalled it, and Ambari is somehow still caching that version? Did you restart the Ambari server and check again?
09-21-2016
02:28 PM
@RAMESH K If the response was helpful, please vote and accept it as the best answer.
09-21-2016
02:24 PM
4 Kudos
@henryon wen As you already know, Spark 1.6.1 is part of HDP 2.4.2. While it is technically possible to upgrade to 1.6.2, it is not supported by Hortonworks. There may also be implications for Zeppelin and other tools in the ecosystem, depending on how your applications are built and executed. If you have paid support, make sure you contact support before proceeding.
09-19-2016
05:32 PM
2 Kudos
@Shiva Nagesh I agree with @hkropp. While you can, it does not mean you should as-is. You need to account for shortcomings, both architecturally and in resource management, not to mention the security concerns of bringing more services onto the edge nodes than is usually manageable. I get that you have spare capacity on those edge nodes and would like to use it as a burst option in case of need. You could consider Docker containers on your edge servers: that way you can separate the true edge nodes from on-demand workers. Those Docker containers would use a worker template and could be spun up quickly as additional nodes, similar to what you would do in a cloud.
09-19-2016
05:20 PM
2 Kudos
@srinivasa rao I guess you read that when you perform a "select * from <tablename>", Hive fetches the whole data set from the file as a FetchTask rather than a MapReduce job; it just dumps the data as-is without doing anything to it, similar to "hadoop dfs -text <filename>". However, that does not take advantage of true parallelism. In your case, with 1 GB, it will not make a difference, but imagine a 100 TB table read by a single-threaded task in a cluster with 1000 nodes. A FetchTask is not a good use of parallelism. Tez provides options to split the data set and allow true parallelism: tez.grouping.max-size and tez.grouping.min-size are the split parameters. Ref: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_installing_manually_book/content/ref-ffec9e6b-41f4-47de-b5cd-1403b4c4a7c8.1.html If any of the responses was helpful, please don't forget to vote/accept the answer.
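As a rough sketch of how those settings can be applied per session, here is a hedged Hive JDBC example; the HiveServer2 host, credentials, table name, and byte values are placeholders, not tuned recommendations:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class TezSplitExample {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // Placeholder connection details -- replace with your HiveServer2 host and user.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://hs2-host:10000/default", "hive", "");
             Statement stmt = conn.createStatement()) {
            // Disable the single-threaded FetchTask conversion so the scan
            // runs as a real Tez job instead of streaming through one thread.
            stmt.execute("SET hive.fetch.task.conversion=none");
            // Bound the Tez split grouping: smaller groups -> more parallel tasks.
            stmt.execute("SET tez.grouping.min-size=16777216");   // 16 MB (example value)
            stmt.execute("SET tez.grouping.max-size=134217728");  // 128 MB (example value)
            try (ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM my_table")) {
                while (rs.next()) {
                    System.out.println("rows: " + rs.getLong(1));
                }
            }
        }
    }
}
```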
09-17-2016
01:46 AM
5 Kudos
@Rahul Reddy Kamuru Yes, you can install and integrate them. However, you can also keep your web and application servers on their own dedicated infrastructure and have them access services from the Hadoop ecosystem via JDBC, ODBC, or REST APIs. In Hadoop terminology, they would reside on the edge nodes and act as Hadoop clients. While installing them on the Hadoop cluster is possible, it means that operations on the Hadoop cluster also impact those web and application servers, complicating everything, including upgrades. Separation of concerns produces a clean architecture. That said, it is not unusual to have Tomcat installed on the cluster to support development of services that access data stored in HDFS, Hive, HBase, etc., or even submit jobs to Spark. These are mainly "glue" services that help build data pipelines or expose data services to BI tools residing outside the cluster, e.g. Tableau, MicroStrategy, ZoomData, etc. If this response helped, please vote and accept it as the best answer.
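As an illustration of the REST option, below is a minimal sketch of an application server reading a file over WebHDFS; the NameNode host, file path, and user name are hypothetical:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class WebHdfsReadExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical NameNode host and file path -- adjust for your cluster.
        // OPEN first answers with a 307 redirect to a DataNode, which
        // HttpURLConnection follows automatically for http -> http.
        URL url = new URL(
            "http://namenode.example.com:50070/webhdfs/v1/tmp/sample.txt"
            + "?op=OPEN&user.name=hdfs");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```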
09-16-2016
11:35 PM
@Jitendra Yadav Good question. It shouldn't. Occasionally, a feature may slip in. As long as it is not a major change, it is probably tolerated. It could also be a Tech Preview, which is probably the case with Grafana, and that is probably fine. Did that feature break any existing functionality?
09-16-2016
09:01 PM
1 Kudo
@P D The Ambari repo includes only sources right now: http://www.apache.org/dist/ambari/ambari-2.4.1/ As soon as the binaries are posted, you can find them at that Ambari link or in the HDP Ambari public repo. It is a matter of days before they are published; Ambari 2.4.1 was just released this week.
09-16-2016
08:42 PM
@srivatsan chakravarti You can also read all the messages that are within the retention period for your topic. That way you don't have to run your producer while you test your consumer: you can consume as many times as you want from what was produced, as long as it is still retained (7 days by default). You would have to use the low-level SimpleConsumer API to implement Java code that emulates what you can do from the CLI with: bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
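For reference, here is a minimal sketch of that SimpleConsumer approach; the broker host, topic, and partition are assumptions, and real code would also discover the partition leader and check the fetch response for errors:

```java
import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;

import kafka.api.FetchRequest;
import kafka.api.FetchRequestBuilder;
import kafka.javaapi.FetchResponse;
import kafka.javaapi.consumer.SimpleConsumer;
import kafka.message.MessageAndOffset;

public class FromBeginningExample {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // Assumed broker/topic/partition; production code should find the
        // partition leader first and iterate over all partitions.
        SimpleConsumer consumer = new SimpleConsumer(
                "localhost", 9092, 100000, 64 * 1024, "from-beginning-client");
        FetchRequest req = new FetchRequestBuilder()
                .clientId("from-beginning-client")
                .addFetch("test", 0, 0L, 100000)  // offset 0 == from the beginning
                .build();
        FetchResponse resp = consumer.fetch(req);  // should also check resp.hasError()
        for (MessageAndOffset mo : resp.messageSet("test", 0)) {
            ByteBuffer payload = mo.message().payload();
            byte[] bytes = new byte[payload.limit()];
            payload.get(bytes);
            System.out.println(mo.offset() + ": " + new String(bytes, "UTF-8"));
        }
        consumer.close();
    }
}
```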