About cstanca

cstanca · 03-17-2017

@Vipul Choudhary 1) Please give the actual data sizes and performance numbers you encountered. 2) Clarify which test beds you are referring to and how you used them. 3) Clarify what type of test case you executed. This matters because some tests are disk I/O intensive and others are memory intensive. After clarifying all of the above, we may well find that riding a bike is sometimes faster than driving a Ferrari, because the bike is better suited to niche cases where there is little room for a car to pass (narrow roads, etc.). I would not generalize so easily. I am wary of anything stated as "always better"; there is always an exception. In any case, you can set the desired engine, MR or Tez, at the session level. So, for the cases where MR performs better, use it. It is a session setting, not something you have to code into each Hive query you execute.
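A minimal sketch of that session-level switch: `hive.execution.engine` is the Hive property that selects the engine (`mr`, `tez`, or `spark` where Hive on Spark is configured). The Python helpers below are hypothetical and only build the statement text; sending it to HiveServer2 (Beeline, JDBC, or another client) is left to whatever you already use, and `big_table` is a made-up name.

```python
# Values accepted by hive.execution.engine; "spark" only applies if Hive on Spark is set up.
VALID_ENGINES = ("mr", "tez", "spark")


def engine_statement(engine: str) -> str:
    """Build the session-level SET statement that selects the Hive execution engine.

    Hypothetical helper: it only produces text and does not talk to Hive.
    """
    engine = engine.strip().lower()
    if engine not in VALID_ENGINES:
        raise ValueError(f"unsupported engine: {engine!r}")
    return f"SET hive.execution.engine={engine};"


def session_script(engine: str, query: str) -> str:
    """Prefix a query with the engine setting; the query text itself is unchanged."""
    return engine_statement(engine) + "\n" + query.strip()


# Usage: run the same (hypothetical) query on MR for a workload where MR performed better.
# print(session_script("mr", "SELECT count(*) FROM big_table;"))
```

Because the setting is scoped to the session, other sessions keep the cluster default, so one workload can use MR while the rest stay on Tez.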

cstanca · 03-17-2017

@viswanath kammula The traceback probably shows only a symptom. The best place to start is the HiveServer2 logs: grep them for errors. Most likely your LLAP settings are not appropriate, but until you mine the logs, that is just an assumption. Please provide an excerpt from the logs showing an error at the same timestamp.
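A minimal sketch of that log search in Python, assuming a log4j-style layout where each line starts with a timestamp and level (e.g. `2017-03-17 10:15:02,123 ERROR [...] ...`). The log location varies by installation; `/var/log/hive/hiveserver2.log` is only an assumed example path, and `errors_near` is a hypothetical helper.

```python
import re
from typing import Iterable, List

# Assumed log4j-style prefix: "YYYY-MM-DD HH:MM:SS,mmm LEVEL ..."
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})[,.]\d{3}\s+(\w+)\b")


def errors_near(lines: Iterable[str], time_prefix: str) -> List[str]:
    """Return ERROR/FATAL lines whose timestamp starts with time_prefix.

    time_prefix is e.g. "2017-03-17 10:15" to match the minute in which the
    client traceback appeared. Lines without a timestamp (such as stack-trace
    continuation lines) are skipped.
    """
    hits = []
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group(2) in ("ERROR", "FATAL") and m.group(1).startswith(time_prefix):
            hits.append(line.rstrip("\n"))
    return hits


# Usage (assumed path; adjust to where your HiveServer2 writes its log):
# with open("/var/log/hive/hiveserver2.log", encoding="utf-8", errors="replace") as f:
#     for hit in errors_near(f, "2017-03-17 10:15"):
#         print(hit)
```

Since continuation lines are skipped, open the log at the reported lines to read the full stack trace before posting the excerpt.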

cstanca · 03-16-2017

@dvt isoft This was introduced in Hive 2.2.0 via HIVE-14558. Your HDP 2.4 does not include that version. +++ Could you please vote for and accept this response, and also the other one regarding French characters? Even if they don't meet your immediate expectation, that is due to the state of the software you are testing with; they are still the best answers. Thanks.

cstanca · 03-16-2017

@Andi Sonde See this unresolved issue: https://issues.apache.org/jira/browse/STORM-1898 +++ If it helped, please vote for/accept the response.

cstanca · 03-15-2017

@Artur Dushelyubov "Attempted" means that the Initiator attempted to schedule a compaction but failed. As such, there is no compaction associated with those requests. Other than being visually unpleasant, they are nothing to worry about. They are part of the metastore log, and entries are shown only up to the retention threshold for attempted compactions (see hive.compactor.history.retention.attempted below); beyond that number they are no longer displayed.

Compaction History (all set on the Metastore):
- hive.compactor.history.retention.succeeded (default: 3): number of successful compaction entries to retain in history (per partition).
- hive.compactor.history.retention.failed (default: 3): number of failed compaction entries to retain in history (per partition).
- hive.compactor.history.retention.attempted (default: 2): number of attempted compaction entries to retain in history (per partition).
- hive.compactor.initiator.failed.compacts.threshold (default: 2): number of consecutive failed compactions for a given partition after which the Initiator will stop attempting to schedule compactions automatically. It is still possible to use ALTER TABLE to initiate compaction. Once a manually initiated compaction succeeds, auto-initiated compactions will resume. Note that this must be less than hive.compactor.history.retention.failed.
- hive.compactor.history.reaper.interval (default: 2m): controls how often the process that purges historical compaction records runs.

If this was helpful, please vote/accept best answer.

++++++ A little bit of theory below for others who may have a similar question.

SHOW COMPACTIONS returns a list of all tables and partitions currently being compacted or scheduled for compaction when Hive transactions are being used, including this information:
- database name
- table name
- partition name (if the table is partitioned)
- whether it is a major or minor compaction
- the state the compaction is in, which can be:
  - "initiated": waiting in the queue to be compacted
  - "working": being compacted
  - "ready for cleaning": the compaction has been done and the old files are scheduled to be cleaned
  - "failed": the job failed. The metastore log will have more detail.
  - "succeeded": A-ok
  - "attempted": the Initiator attempted to schedule a compaction but failed. The metastore log will have more information.
- thread ID of the worker thread doing the compaction (only if in working state)
- the time at which the compaction started (only if in working or ready for cleaning state)

Compactions are initiated automatically, but can also be initiated manually with an ALTER TABLE COMPACT statement.
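To make the per-partition retention concrete, here is a simplified, hypothetical Python model of what the history reaper keeps: for each partition, the newest N entries per state, with N taken from the defaults quoted in this answer. It is an illustration only, not Hive's actual implementation; the `Entry` type, partition naming, and function names are made up. It also encodes the documented rule that the failed-compactions threshold must be below the failed-history retention.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Entry(NamedTuple):
    partition: str  # hypothetical naming, e.g. "db.tbl/ds=2017-03-15"
    state: str      # "succeeded", "failed" or "attempted"
    ts: int         # when the entry was recorded; larger is newer


# Defaults of hive.compactor.history.retention.{succeeded,failed,attempted}.
RETENTION = {"succeeded": 3, "failed": 3, "attempted": 2}
# Default of hive.compactor.initiator.failed.compacts.threshold.
FAILED_COMPACTS_THRESHOLD = 2


def check_settings(retention: Dict[str, int], threshold: int) -> None:
    """Enforce the documented rule: threshold < hive.compactor.history.retention.failed."""
    if threshold >= retention["failed"]:
        raise ValueError("failed.compacts.threshold must be < history.retention.failed")


def reap(history: List[Entry], retention: Dict[str, int] = RETENTION) -> List[Entry]:
    """Keep only the newest retention[state] entries per (partition, state), oldest first."""
    groups: Dict[Tuple[str, str], List[Entry]] = defaultdict(list)
    for e in history:
        groups[(e.partition, e.state)].append(e)
    kept: List[Entry] = []
    for (_, state), entries in groups.items():
        entries.sort(key=lambda e: e.ts, reverse=True)  # newest first
        kept.extend(entries[: retention[state]])
    return sorted(kept, key=lambda e: e.ts)
```

In this model, five "attempted" entries for one partition shrink to the two newest, which matches why older attempted rows stop appearing after the reaper runs.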

cstanca · 03-15-2017

@Kelvin Mitchell Your QueryDatabaseTable processor queries your table continuously and generates a FlowFile with each query result. Right-click the processor, choose Configure, and open the Scheduling tab: by default, Run Schedule is set to 0 sec, which means the processor fires query after query. You can increase the interval so that the next query fires later. Keep in mind that your query can also be smart and pick up only new records. Is your query doing that? +++ If this helped, please vote/accept best answer.
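A "smart" query usually means tracking a high-water mark: remember the largest value of an increasing column and only fetch rows beyond it on the next run (QueryDatabaseTable does this through its Maximum-value Columns property). The Python sketch below shows only the idea, using an in-memory SQLite database; the `events` table and its `id` column are hypothetical, and this is not how NiFi implements it internally.

```python
import sqlite3
from typing import List, Optional, Tuple


def fetch_new_rows(
    conn: sqlite3.Connection, last_id: Optional[int]
) -> Tuple[List[tuple], Optional[int]]:
    """Return rows with id greater than last_id, plus the new high-water mark.

    On the first run (last_id is None) every row is returned. Later runs only
    return rows added since, so repeated runs do not emit duplicates.
    """
    if last_id is None:
        rows = conn.execute("SELECT id, payload FROM events ORDER BY id").fetchall()
    else:
        rows = conn.execute(
            "SELECT id, payload FROM events WHERE id > ? ORDER BY id", (last_id,)
        ).fetchall()
    new_last = rows[-1][0] if rows else last_id  # keep the old mark if nothing is new
    return rows, new_last
```

Usage: call it once per scheduled run and persist the returned mark between runs; a second call with no new inserts returns an empty list instead of the whole table again.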

cstanca · 03-15-2017

@dvt isoft Your question is how to store and retrieve French characters in a non-default encoding in the table's data definition, specifically in table properties. Hive expects UTF-8 by default, both in the data definition and in the stored data. I am not aware of an option to apply a different encoding to the data definition. For the stored data, you can encode/decode using a special SerDe, as described above by @Boris Demerov.
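To see why the encoding must match on both sides, here is a small Python illustration (not Hive code): French text written as ISO-8859-1 (Latin-1) and read back as UTF-8, Hive's default, loses its accented letters, while UTF-8 bytes read as Latin-1 turn into mojibake. The sample string is arbitrary and `roundtrip` is a hypothetical helper.

```python
def roundtrip(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one charset, then decode the bytes with another.

    errors="replace" turns undecodable bytes into U+FFFD instead of raising.
    """
    return text.encode(written_as).decode(read_as, errors="replace")


sample = "Société générale, façade, élève"

# Same charset on both sides: the text survives unchanged.
same = roundtrip(sample, "utf-8", "utf-8")

# Written as Latin-1, read as UTF-8: each accented letter becomes U+FFFD.
garbled = roundtrip(sample, "latin-1", "utf-8")

# Written as UTF-8, read as Latin-1: each accented letter becomes two characters.
mojibake = roundtrip("é", "utf-8", "latin-1")  # "Ã©"
```

The fix is always to tell the reader which charset the bytes were written in, which is what an encoding-aware SerDe does for stored data.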

cstanca · 03-15-2017

@learninghuman I assume that you are asking about a production environment. Hadoop is a scale-out, shared-nothing architecture. SAN/NAS storage is not recommended for I/O-sensitive and CPU-bound jobs, to avoid bottlenecks while reading data from disk or over the network, or while processing data. It is possible to use it, but I haven't seen an implementation deliver good performance, so I would not recommend it for production. For a dev environment, maybe. Maybe 5% of companies using Hadoop run it on Isilon, mostly those in a close relationship with EMC. There are reference deployments using storage arrays like Isilon, and Hortonworks supports it. Performance is lower than with internal JBOD disks, but it works. Yes, it has been tested. You may want to look at EMC's published articles on Isilon. I can't provide confidential data that is not in the public domain; if you need it, check with your company's EMC/Dell or Hortonworks account manager. Take a look at https://community.hortonworks.com/questions/15332/san-vs-dasjbod-on-data-node.html, which shows the differences between SAN and DAS/JBOD when used with Hadoop. ++++ Hopefully, this helps and you can vote/accept best answer.

cstanca · 03-15-2017

@ripunjay godhani If your application has direct access to the Hadoop cluster, then that application server is your "edge" node. However, the fact that you don't need a separate edge node in your particular case does not mean it is not a good practice; being on the same network is not the explanation. @SBandaru's explanation is valid and is a best practice for the cases he mentioned.

cstanca · 03-13-2017

@Subramaniyam KMV Go to https://mosquitto.org/download/ and follow the CentOS 6 instructions. If this helps, please vote/accept best answer. Time spent helping should be appreciated.

Last Visited	03-22-2019 03:12 AM

Member Since	03-16-2016 04:06 PM
Posts	707
Kudos received	1728

Cloudera Community
