Member since: 06-07-2016
Posts: 923
Kudos Received: 322
Solutions: 115

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4125 | 10-18-2017 10:19 PM |
| | 4359 | 10-18-2017 09:51 PM |
| | 14901 | 09-21-2017 01:35 PM |
| | 1859 | 08-04-2017 02:00 PM |
| | 2431 | 07-31-2017 03:02 PM |
09-14-2016
06:14 AM
It says the job went from submitted to scheduled, which suggests it is sitting in the YARN queue. What else do you have running on the cluster? Can you check the capacity allocated to your YARN queue and the credentials of the user running the job (likely sqoop)? Does this user have enough resources allocated to run YARN jobs? This looks like a queue issue.
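If it helps, here is a minimal sketch of checking the queue and what else is occupying it; the queue name "default" is a placeholder for whichever queue the Sqoop job submits to.

```bash
# Show capacity, used capacity, and state for the queue (queue name is a placeholder).
yarn queue -status default

# List applications currently running or waiting, to see what else is holding resources.
yarn application -list -appStates RUNNING,ACCEPTED
```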
09-14-2016
05:58 AM
2 Kudos
@Saikrishna Tarapareddy The type and size of hardware needed for NiFi really depend on your load. NiFi stores data on disk while processing it, so you need sufficient disk capacity for your content repository, flow file repository, and provenance (data lineage) repository. Have you enabled archiving? (I am assuming yes.) Then, for how long do you archive your data? You need space for that as well. To your question about whether NiFi is memory intensive or processor intensive: the answer is processor. Unless you are doing bulk loads, which I think you should not, you mainly want to make sure you have enough processing power. Please see the following link for performance expectations. http://docs.hortonworks.com/HDPDocuments/HDF1/HDF-1.2/bk_Overview/content/performance-expectations-and-characteristics-of-nifi.html
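As a rough sketch for sizing the disks, you can check where each repository lives and how much space it uses today; the config and repository paths below assume a default HDF layout and will differ on your install.

```bash
# Repository locations are defined in nifi.properties (path assumed; adjust for your install).
grep -E 'nifi\.(content|flowfile|provenance)\.repository\.directory' /etc/nifi/conf/nifi.properties

# Current disk usage of each repository (paths assumed; use the ones reported above).
du -sh /var/lib/nifi/content_repository \
       /var/lib/nifi/flowfile_repository \
       /var/lib/nifi/provenance_repository
```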
09-14-2016
05:35 AM
2 Kudos
@Gaurab D Can you please share the logs for the job? There should be more info either in Ambari or in the /var/log/sqoop folder, or maybe /var/log/yarn.
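If the job made it to YARN, something like the sketch below can pull the full application logs; the application id is a placeholder you would take from `yarn application -list`.

```bash
# Find the application id of the failed job, then fetch its aggregated logs.
yarn application -list -appStates ALL | grep -i sqoop
yarn logs -applicationId application_1473800000000_0001 > sqoop_job.log

# The local log directories mentioned above are also worth a look.
ls -ltr /var/log/sqoop /var/log/yarn
```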
09-08-2016
08:25 PM
1 Kudo
@Eric Periard I think what I understand from your question is that your manager wants file blocks compressed at a level lower than HDFS (such as the Linux level). Is that right? If not, please elaborate on your question. When you enable compression for Hadoop using LZO, you are compressing the files going into HDFS. Remember that HDFS splits files into blocks and places those blocks on different nodes (after all, it's a distributed file system). LZO is one of the compression codecs that supports splittable compression, so a compressed file can still be split into blocks across machines, and it provides a good balance between read/write speed and compression ratio. You would have to compress all your files either upon ingestion or later on. At the Hadoop level, to enable compression for the output written by your MapReduce jobs, see the following link. https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_hdfs_admin_tools/content/ch04.html
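For the MapReduce side, here is a sketch of enabling LZO output compression at job-submission time; the jar, class, and paths are placeholders, and it assumes the hadoop-lzo libraries are installed on the cluster and the job uses ToolRunner so the -D options are picked up.

```bash
# Enable LZO compression for job output via -D properties (jar/class/paths are placeholders).
hadoop jar my-job.jar com.example.MyJob \
  -Dmapreduce.output.fileoutputformat.compress=true \
  -Dmapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec \
  /input/path /output/path
```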
09-08-2016
05:49 PM
2 Kudos
@Alex Raj I am a little confused by the following statement: "We have HBase tables where the data is in Binary Avro format." HBase stores data in HFiles, which is HBase's own format, not Avro. Maybe what you mean is that you are exporting data from HBase into Avro and using Hive to read that data. If so, you can continue to do that, as there are some advantages to this approach, but if you want to keep the data in HBase without moving it, you can simply use Phoenix on top of HBase to read it in place. In fact, you can also use Hive to read data in HBase; it's slow compared to Phoenix, but it will do the job, and maybe that's what you are doing right now. With Phoenix on top of HBase, you can read HBase tables using SQL, and again you don't have to export the data. Here is a link to the Phoenix quick start. The point is that Avro doesn't come into play here, which is why it's a little confusing that you are asking about the Avro format. Between Phoenix and Drill, I would recommend Phoenix because it was created solely for HBase and has better features and support compared to Drill.
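As a minimal sketch of reading an existing HBase table from Phoenix without moving anything, you can create a Phoenix view over it; the table name, row key, and column family/qualifier below are hypothetical and need to be mapped to your actual HBase schema, and the ZooKeeper connect string and client path assume a default HDP layout.

```bash
# Map an existing HBase table into Phoenix as a view and query it with SQL (no data is copied).
cat > map_hbase_table.sql <<'SQL'
CREATE VIEW "my_hbase_table" (pk VARCHAR PRIMARY KEY, "cf"."col1" VARCHAR);
SELECT * FROM "my_hbase_table" LIMIT 10;
SQL
/usr/hdp/current/phoenix-client/bin/sqlline.py localhost:2181:/hbase-unsecure map_hbase_table.sql
```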
09-08-2016
04:37 PM
1 Kudo
Hi @Michael Gregory Livy is a REST server that acts as a Spark client, so you need to open nothing more than the normal Spark ports, listed here. I would highly recommend going over slides 14-31 at this link. They are just images, so it should be quick and easy to go through, but it will give you a better understanding of Livy and more confidence in what you are trying to do.
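As a small sketch of what talking to Livy looks like, you can submit a batch job over REST; the host, jar path, and class name are placeholders, and this assumes Livy is listening on its default port 8998.

```bash
# Submit a Spark application as a Livy batch over HTTP (host/jar/class are placeholders).
curl -s -X POST \
  -H 'Content-Type: application/json' \
  -d '{"file": "hdfs:///jars/my-spark-app.jar", "className": "com.example.MySparkApp"}' \
  http://livy-host:8998/batches

# Check the state of submitted batches.
curl -s http://livy-host:8998/batches
```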
09-08-2016
04:18 PM
2 Kudos
@Zack Riesland Yes, it is safe to remove these folders and do a cleanup; there are actually cleanup scripts for this already. Basically, when a client runs a query through HiveServer2, Hive first creates these temporary folders to store intermediate data. For most queries this is cleaned up at the end of the query, but sometimes, due to issues with the query, the files are left hanging around and you have to clean them up manually. Check this link for more details. https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration#AdminManualConfiguration-TemporaryFolders The following link might also give you some ideas on how to clean up. https://community.hortonworks.com/questions/19204/do-we-have-any-script-which-we-can-use-to-clean-tm.html
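A rough sketch of a manual cleanup, assuming the default HDFS scratch location /tmp/hive (check hive.exec.scratchdir in hive-site.xml first) and that no queries are currently running; the user and session directory names are placeholders.

```bash
# Confirm the scratch directory Hive is actually configured to use.
grep -A1 'hive.exec.scratchdir' /etc/hive/conf/hive-site.xml

# Inspect what is lying around, then remove only directories you have verified are stale.
hdfs dfs -ls /tmp/hive
hdfs dfs -rm -r -skipTrash /tmp/hive/some_user/stale_session_dir
```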
09-08-2016
04:45 AM
Great. If you are using Kafka, then have a separate ZooKeeper quorum for Kafka; I would never recommend sharing the same ZooKeeper with Kafka. Separate that out and the "funkiness" you have seen should go away :).
09-07-2016
09:53 PM
5 Kudos
@Kyle Travis To start with, please check the following link for some basics, which I assume you already know, but I think it's still a good starting point. https://community.hortonworks.com/questions/55201/number-of-zookeepers-in-a-3-rack-cluster-with-data.html

Now, to the question of whether you should have separate ZooKeeper quorums. You mention "a few hundred nodes". I am assuming it's an HBase cluster and Storm is writing to HBase. Let me make some assumptions about your application:

1. High read/write throughput.
2. Time sensitive: HBase latency is important.
3. Other Hadoop components are also running, like Hive or maybe Phoenix.
4. Kafka is not being used.

If my assumptions are reasonably close to what you really have, then having a separate ZooKeeper for HBase might give you some benefits. ZooKeeper is very sensitive to timeouts, and when it serves multiple components at this scale it can make sense to give HBase its own quorum (see the sketch below). That would ensure better HBase operation and stability compared to sharing ZooKeeper across everything, although someone might argue that you shouldn't run into issues with a single quorum either. Have you seen any issues in your testing? For the rest of the cluster, one ZooKeeper quorum is fine.

However, if you were using Kafka, I would have one quorum for everything including HBase and a separate ZooKeeper just for Kafka. Kafka is very fragile with respect to ZooKeeper, and in my experience it is better for Kafka to have its own. That brings us to the last point of your question: two ZooKeeper quorums are currently not supported by Ambari. There is a JIRA open, but no support yet. https://issues.apache.org/jira/browse/AMBARI-14714
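For reference, pointing HBase at a dedicated ensemble comes down to its ZooKeeper quorum settings; the hostnames below are placeholders, and in practice you would make this change through Ambari's hbase-site configuration rather than editing files by hand.

```bash
# Illustrative hbase-site.xml snippet for a dedicated HBase ZooKeeper ensemble (hostnames are placeholders).
cat <<'XML'
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>zk-hbase-1.example.com,zk-hbase-2.example.com,zk-hbase-3.example.com</value>
</property>
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
</property>
XML
```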
09-06-2016
10:23 PM
5 Kudos
Well, this really depends on your tolerance for failure. ZooKeeper requires a quorum of servers to be up at any time, and it uses a majority quorum to make decisions: the ensemble is up as long as a majority, i.e. floor(N/2) + 1 of the N servers, is up. With 3 ZooKeeper nodes you can tolerate one failure; with 5 you can tolerate up to 2 failures. The reason I would recommend 5 ZooKeeper nodes in your case is that you have a 100-node cluster. To ensure business continuity and confidently tolerate a couple of failures, it's better to go with 5 ZooKeepers. Also, think about planned maintenance: with five ZooKeepers you can take one out for maintenance and still tolerate one more failure, whereas with three ZooKeepers maintenance becomes a challenge. That being said, now that you know the implications of going with 3 vs 5 ZooKeepers, you can decide to go with three, knowing that if one ZooKeeper fails you have a limited window to bring it back up, because one more failure puts the business at risk.
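A quick way to sanity-check the quorum math for different ensemble sizes:

```bash
# Majority quorum = floor(N/2) + 1; tolerable failures = N - quorum.
for N in 3 5 7; do
  quorum=$(( N / 2 + 1 ))
  echo "ensemble=$N quorum=$quorum tolerable_failures=$(( N - quorum ))"
done
```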