Member since: 06-07-2016
Posts: 923
Kudos Received: 322
Solutions: 115
My Accepted Solutions
Title | Views | Posted
---|---|---
| 3318 | 10-18-2017 10:19 PM
| 3673 | 10-18-2017 09:51 PM
| 13376 | 09-21-2017 01:35 PM
| 1367 | 08-04-2017 02:00 PM
| 1794 | 07-31-2017 03:02 PM
11-05-2016 02:10 AM
You cannot do this. It is not supported, as @Artem Ervits has already stated. Imagine what would happen to writes when a cluster spans multiple data centers: networks are assumed to be unreliable and insecure. Now, I hate to say this, and please don't do it since it is unsupported, but Amazon offers VPC, which makes AWS an extension of your own network over a VPN.
10-31-2016 05:38 AM
@Hoang Le If you have 5 nodes, you can use 2 as master nodes and 3 as worker nodes (slave nodes); see the sketch below.
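One possible service layout, as a rough sketch only (the exact placement depends on which components you install; hostnames are placeholders, and one worker hosts the third ZooKeeper so the quorum has an odd number of members):

```
master1:  NameNode, ZooKeeper
master2:  ResourceManager, SecondaryNameNode, ZooKeeper
worker1:  DataNode, NodeManager, ZooKeeper
worker2:  DataNode, NodeManager
worker3:  DataNode, NodeManager
```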
04-06-2017 12:07 PM
1 Kudo
This problem has been happening on our side for many months as well, with both Spark 1 and Spark 2, and both when running jobs in the shell and in Python notebooks. It is very easy to reproduce: just open a notebook and let it run for a couple of hours, or run some simple DataFrame operations in an infinite loop. There seems to be something fundamentally wrong with the timeout configuration in the core of Spark. We will open a case for this, because no matter what configurations we have tried, the problem persists.
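A minimal sketch of the reproduction described above, assuming a PySpark shell or notebook where `spark` is the active SparkSession:

```python
import time

# A trivial DataFrame; any small dataset will do.
df = spark.range(0, 1000000)

while True:
    # Any lightweight action works; the point is to keep the session
    # alive for hours and watch for executor/driver timeout errors.
    df.selectExpr("id * 2 AS doubled").count()
    time.sleep(60)  # idle between actions, as in a long-running notebook
```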
10-12-2016 02:24 PM
1 Kudo
@Saikrishna Tarapareddy The FlowFile repo will never get close to 1.2 TB in size; that is a lot of wasted money on hardware. You should ask your vendor about splitting that RAID into multiple logical volumes, so you can allocate a large portion of it to other things. Logical volumes are also a safe way to protect the RAID1 where your OS lives: if some error condition results in a lot of logging, the application logs may eat up all your disk space and affect your OS. With logical volumes you can protect your root disk. If that is not possible, I would recommend changing your setup to a bunch of RAID1 arrays. With the 16 x 600 GB hard drives you have allocated above, you could create 8 RAID1 disk arrays:
- 1 for root + software install + database repo + logs (make sure you have monitoring set up for disk usage on this RAID if logical volumes cannot be supported)
- 1 for the FlowFile repo
- 3 for the content repo
- 3 for the provenance repo
Thanks, Matt
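As a sketch of how those arrays could be wired up in nifi.properties (the mount paths are placeholders; NiFi accepts multiple content and provenance repository directories via additional suffixed keys):

```
# nifi.properties sketch -- mount paths are assumed, not prescriptive
nifi.flowfile.repository.directory=/mnt/raid1/flowfile_repository

# Multiple content repository directories (one per RAID1 array)
nifi.content.repository.directory.default=/mnt/raid2/content_repository
nifi.content.repository.directory.content1=/mnt/raid3/content_repository
nifi.content.repository.directory.content2=/mnt/raid4/content_repository

# Multiple provenance repository directories
nifi.provenance.repository.directory.default=/mnt/raid5/provenance_repository
nifi.provenance.repository.directory.provenance1=/mnt/raid6/provenance_repository
nifi.provenance.repository.directory.provenance2=/mnt/raid7/provenance_repository
```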
09-08-2016 06:30 AM
The more nodes in a ZK ensemble (quorum), the faster the reads but the slower the writes. That's because a read can be served by any node, but a write is not complete until a majority of nodes have acknowledged it. On top of that, early versions of Kafka (0.8.2 and older) keep Kafka offsets in ZK. Therefore, as already suggested by @mqureshi, it's best to start by creating a dedicated ZK for Kafka (I'd go for 3 nodes) and keep the 5-node ZK for everything else. Beefing up the number of ZK nodes to 7 or more is a resounding no. Regarding the installation and management of the new Kafka ZK, it's pretty straightforward to install manually; just follow the steps in one of the "Non-Ambari cluster installation guides" like this one. You can also try creating a cluster composed of only Kafka and ZK and managing it with its own Ambari instance.
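A minimal zoo.cfg sketch for the dedicated 3-node Kafka ensemble (hostnames and dataDir are placeholders):

```
# zoo.cfg -- 3-node ensemble dedicated to Kafka
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk-kafka-1:2888:3888
server.2=zk-kafka-2:2888:3888
server.3=zk-kafka-3:2888:3888
```

Each node also needs a `myid` file in dataDir containing its server number (1, 2, or 3).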
07-17-2018 08:31 AM
@mb I am facing the same issue. Could you please advise how to work around or troubleshoot this problem? Thanks, Nam
08-31-2016 02:42 AM
1 Kudo
@Cameron Warren You need to first scp your file to Azure. Once that's done, you can use "copyFromLocal" to copy the file to your HDFS: hdfs dfs -copyFromLocal /path/to/file /dest/path/on/hdfs
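Putting the two steps together as a sketch (the VM hostname, user, and paths are placeholders):

```
# 1. Copy the file from your local machine to the Azure VM
scp /path/to/file user@azure-vm:/tmp/file

# 2. On the VM, load it into HDFS
hdfs dfs -copyFromLocal /tmp/file /dest/path/on/hdfs
```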
08-30-2016 01:03 PM
Found the issue! The LinkedHashSet library is probably missing in Hive.
08-24-2016 08:21 PM
Yes, I am able to connect. This host is the Ambari Server.