Member since: 09-23-2015
Posts: 14
Kudos Received: 21
Solutions: 1
My Accepted Solutions
Title | Views | Posted |
---|---|---|
2050 | 09-29-2015 06:31 PM |
02-20-2018
05:30 PM
It could be many things:
1. What volume of data is under consideration in the Hive queries?
2. What file format is the data stored in?
3. How was the data prepared and loaded (sorting, partitioning, etc.)?
There isn't enough information in your question to give anyone a single answer that will help you. You may have to explore a bit and provide more details...
Yes, a single node has limitations.
It's not that it is intentionally degrading performance; the system is designed to scale through parallelism, and with just a single node you are limiting the software's ability to scale (if that is what is needed). Sandbox is meant for tutorials and exploration of simple capabilities on small data. If you want to try the actual HDP software on real data, you can install a small multi-node cluster using the HDP installation processes documented at docs.hortonworks.com.
08-14-2017
03:22 AM
@Muji, is this single setting "use.hive.interactive.mode=true" a new feature of the Hive View 2.0 as of HDP-2.6? Do you know when this became available? Your answer worked immediately for me on 2.6.1. I think the Hive View 2.0 version is "2.0.0".
04-26-2016
06:04 PM
1 Kudo
I would look at the alternate answers to this question, which are simpler and allow the package manager to gracefully downgrade the snappy library without breaking dependencies or unnecessarily removing other packages.
04-26-2016
06:01 PM
1 Kudo
I would never remove and reinstall a package (which is the more complex answer listed as the 'Best Answer' above) when a simple downgrade or upgrade will work, as the RPM package manager will retain the dependencies and prevent unnecessary uninstallation of related packages. Kirk's answer here looks legitimately easier and better.
12-09-2015
10:36 PM
Hi Piotr, great idea to share this repo, but I'm wondering if there is a way to expand/edit this post by putting a brief description of the use-case, i.e. if I don't know what Tika is, what would cause this post to be found in my search? (essentially "processing and analysis of binary documents" is more vague than describing how Tika would assist with: OCR, Full-text, text scan, recognition, imagery, etc.) Just my thoughts... Thanks!
12-09-2015
10:30 PM
1 Kudo
Vinod, this is a great FAQ article. Is this the "latest" certification? Will this be limited to HDP 2.2 and HDP 2.3? How do we maintain this info in this post so it stays current over the years as multiple certifications are done over many versions?
12-07-2015
09:24 PM
I am working with a lot of geospatial data and big data sets that need analysis and interpretation. Can you point me to best practices for doing this?
Labels: Apache Spark
09-29-2015
06:31 PM
3 Kudos
The best practice is to avoid the use of active Anti-Virus (AV) systems that monitor access to the underlying disk systems being used for metadata storage by the following processes:
- Apache Hadoop (HDFS Namenode, HDFS Datanode, YARN Resource Manager, YARN Node Manager)
- Apache Accumulo
- Apache Flume
- Apache HBase
- Apache Kafka
- Apache ZooKeeper
These processes store data structures only, and nothing they store is executable by the underlying OS. As these processes can be very active, potentially performing continuous writes against large files, the best performance requires direct, unimpeded access to the underlying filesystem, and any AV system that traps filesystem calls will have a negative impact on Hadoop system performance.
Some sites choose to implement AV "scans" that run periodically (like a weekly scan) on clients, gateway and "edge node" systems where users & developers connect and run local processes. These scans do not interfere with cluster performance, but are important to safeguard the edge-connected systems that are the main clients of the cluster.
09-25-2015
11:46 PM
7 Kudos
Some helpful ideas to keep content organized in the community forums.

Topic Tags: Pay attention to misspelled tags and avoid conjoined tags.
The ability to group forum posts together by a topic is diminished when there are multiple, differently-spelled tags for the same topic. Having all similar posts tagged in common helps with grouping and ranking of content. Tags will add value for integration with the Hortonworks Gallery and other public sites, so having clean, consistent usage of topic tags, especially for standard HDP components, will really help things look good.
As an example: If some posts are tagged as 'NiFi' and some as 'dataflow' but they are referring to the same thing, then it will be hard for the AnswerHub system to rank the one overall topic by popularity, or for users to click on a tag and see all relevant questions.
Best-practices:
- When posting a question, idea, or article, scan the topics page and search for the proper tag, as it may already exist, before you create a new one.
- If you see multiple tags with the same meaning, consolidate to the correct tag.
- If you see a topic that is misspelled, you can edit and correct that tag.
- Use each topic in an individual tag. Having a conjoined tag such as [ambari with kerberos] or [nifi dataflow] is not useful compared to the standard of using separate individual tags, like [Ambari], [kerberos] or [nifi], [dataflow].

Comments vs. Replies: Pay attention to the difference between a comment and a reply.
A comment to a question is used if requesting clarification or validation of the question. Comments cannot be accepted as a valid answer by the requester. Comments cannot be liked or shared by anyone, and do not contribute to the overall 'Number of replies' count for the post.
A reply is used to provide an answer. Replies can be accepted as a valid answer, liked, rewarded and shared, so there is much more value to the community if your answer is placed into a proper reply. If you reply and it is accepted, it helps to increase your reputation as well as the value score of the question.
Best-practices:
- If you are providing any intrinsic value such as a technical answer, solution architecture, suggested design, alternative idea, etc., you should use the reply function so your information provides context.
- If you see that someone has provided 'comment-level' text in a reply, click on the gear-menu and convert the reply into a comment by selecting the menu option 'Convert Answer to Comment' (limited to track moderators).
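
For anyone cleaning up tags in bulk, the tag guidance above can be sketched roughly like this in Python. This is only an illustration, not anything AnswerHub provides: the filler-word list, the alias table (including the misspelling 'kerbros'), and the choice to lower-case every tag are all assumptions made for the example.

```python
# Hypothetical filler words that join topics inside a conjoined tag.
FILLER_WORDS = {"with", "for", "and"}

# Hypothetical alias table mapping known misspellings to the correct tag.
ALIASES = {"kerbros": "kerberos"}


def split_conjoined_tag(tag):
    """Split a conjoined tag into individual, lower-cased, corrected tags."""
    words = tag.lower().split()
    return [ALIASES.get(word, word) for word in words if word not in FILLER_WORDS]


def consolidate(tags):
    """Return the unique individual tags, in first-seen order."""
    seen = []
    for tag in tags:
        for part in split_conjoined_tag(tag):
            if part not in seen:
                seen.append(part)
    return seen


print(consolidate(["ambari with kerberos", "nifi dataflow", "Ambari", "kerbros"]))
# prints ['ambari', 'kerberos', 'nifi', 'dataflow']
```

A real cleanup would still need a person to decide which spelling is the "correct" tag; the sketch only shows the splitting and de-duplication steps.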
09-25-2015
11:12 PM
1 Kudo
@sraghavan@hortonworks.com
I think the point of tags should be to separate the various components/keywords. Your tag 'nifi for rabitmq and couchbase' isn't really a single tag. I have re-tagged the post with 3 separate single-word tags: nifi, rabbitmq, and couchbase. Thanks.