Member since: 06-07-2016
Posts: 923
Kudos Received: 322
Solutions: 115
My Accepted Solutions
Title | Views | Posted
---|---|---
| 3318 | 10-18-2017 10:19 PM
| 3673 | 10-18-2017 09:51 PM
| 13376 | 09-21-2017 01:35 PM
| 1367 | 08-04-2017 02:00 PM
| 1794 | 07-31-2017 03:02 PM
11-05-2016 02:10 AM
You cannot do this. It is not supported, as @Artem Ervits has already stated. Imagine what would happen to writes when a cluster spans multiple data centers: networks are assumed to be unreliable and insecure. Now, I hate to say this, and please don't do it since it is unsupported, but Amazon offers VPC, which makes AWS an extension of your own network over a VPN.
10-31-2016 05:38 AM
@Hoang Le If you have 5 nodes, you can use 2 as master nodes and 3 as worker nodes (slave nodes); see the sketch below.
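One possible service layout, as a rough sketch only (the exact placement depends on which components you install; hostnames are placeholders, and one worker hosts the third ZooKeeper so the quorum has an odd number of members):

```
master1:  NameNode, ZooKeeper
master2:  ResourceManager, SecondaryNameNode, ZooKeeper
worker1:  DataNode, NodeManager, ZooKeeper
worker2:  DataNode, NodeManager
worker3:  DataNode, NodeManager
```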
04-06-2017 12:07 PM
1 Kudo
This problem has been happening on our side for many months as well, with both Spark 1 and Spark 2, and both when running jobs in the shell and in Python notebooks. It is very easy to reproduce: just open a notebook and let it run for a couple of hours, or run some simple DataFrame operations in an infinite loop. There seems to be something fundamentally wrong with the timeout configuration in the core of Spark. We will open a case for this, because no matter what configurations we have tried, the problem persists.
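A minimal sketch of the reproduction described above, assuming a PySpark shell or notebook where `spark` is the active SparkSession:

```python
import time

# A trivial DataFrame; any small dataset will do.
df = spark.range(0, 1000000)

while True:
    # Any lightweight action works; the point is to keep the session
    # alive for hours and watch for executor/driver timeout errors.
    df.selectExpr("id * 2 AS doubled").count()
    time.sleep(60)  # idle between actions, as in a long-running notebook
```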
10-12-2016 02:24 PM
1 Kudo
@Saikrishna Tarapareddy The FlowFile repo will never get close to 1.2 TB in size; that is a lot of wasted money on hardware. You should ask your vendor about splitting that RAID into multiple logical volumes, so you can allocate a large portion of it to other things. Logical volumes are also a safe way to protect the RAID1 where your OS lives: if some error condition results in a lot of logging, the application logs may eat up all your disk space and affect your OS. With logical volumes you can protect your root disk. If that is not possible, I would recommend changing your setup to a bunch of RAID1 arrays. With the 16 x 600 GB hard drives you have allocated above, you could create 8 RAID1 disk arrays:
- 1 for root + software install + database repo + logs (make sure you have monitoring set up for disk usage on this RAID if logical volumes cannot be supported)
- 1 for the FlowFile repo
- 3 for the content repo
- 3 for the provenance repo
Thanks, Matt
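As a sketch of how those arrays could be wired up in nifi.properties (the mount paths are placeholders; NiFi accepts multiple content and provenance repository directories via additional suffixed keys):

```
# nifi.properties sketch -- mount paths are assumed, not prescriptive
nifi.flowfile.repository.directory=/mnt/raid1/flowfile_repository

# Multiple content repository directories (one per RAID1 array)
nifi.content.repository.directory.default=/mnt/raid2/content_repository
nifi.content.repository.directory.content1=/mnt/raid3/content_repository
nifi.content.repository.directory.content2=/mnt/raid4/content_repository

# Multiple provenance repository directories
nifi.provenance.repository.directory.default=/mnt/raid5/provenance_repository
nifi.provenance.repository.directory.provenance1=/mnt/raid6/provenance_repository
nifi.provenance.repository.directory.provenance2=/mnt/raid7/provenance_repository
```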
09-08-2016 06:30 AM
The more nodes in a ZK ensemble (quorum), the faster the reads but the slower the writes. That's because a read can be served by any node, but a write is not complete until a majority of nodes have acknowledged it. On top of that, early versions of Kafka (0.8.2 and older) keep Kafka offsets in ZK. Therefore, as already suggested by @mqureshi, it's best to start by creating a dedicated ZK for Kafka (I'd go for 3 nodes) and keep the 5-node ZK for everything else. Beefing up the number of ZK nodes to 7 or more is a resounding no. Regarding the installation and management of the new Kafka ZK, it's pretty straightforward to install manually; just follow the steps in one of the "Non-Ambari cluster installation guides" like this one. You can also try creating a cluster composed of only Kafka and ZK and managing it with its own Ambari instance.
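A minimal zoo.cfg sketch for the dedicated 3-node Kafka ensemble (hostnames and dataDir are placeholders):

```
# zoo.cfg -- 3-node ensemble dedicated to Kafka
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk-kafka-1:2888:3888
server.2=zk-kafka-2:2888:3888
server.3=zk-kafka-3:2888:3888
```

Each node also needs a `myid` file in dataDir containing its server number (1, 2, or 3).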
07-17-2018 08:31 AM
@mb I am facing the same issue. Could you please advise how to work around or troubleshoot this problem? Thanks, Nam
08-31-2016 02:42 AM
1 Kudo
@Cameron Warren You need to first scp your file to Azure. Once that's done, you can use "copyFromLocal" to copy the file to your HDFS: hdfs dfs -copyFromLocal /path/to/file /dest/path/on/hdfs
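Putting the two steps together as a sketch (the VM hostname, user, and paths are placeholders):

```
# 1. Copy the file from your local machine to the Azure VM
scp /path/to/file user@azure-vm:/tmp/file

# 2. On the VM, load it into HDFS
hdfs dfs -copyFromLocal /tmp/file /dest/path/on/hdfs
```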
08-30-2016 01:03 PM
Found the issue! The LinkedHashSet library is probably missing in Hive.
08-24-2016 08:21 PM
Yes, I am able to connect. This host is the Ambari Server.