About dorio

dorio · 12-15-2015

@Pardeep Yes, we had to ask Microsoft Azure Support to increase the limits for both cores and storage accounts. See this link: https://azure.microsoft.com/en-us/blog/azure-limits-quotas-increase-requests/

dorio · 12-11-2015

@jramakrishnan Do you plan to support Sqoop2 in the near future?

dorio · 12-11-2015

We have some mainframes running DB2, and we need to bring every new record written to DB2 into Hadoop (a Hive ORC table) as fast as possible. Is NiFi capable of doing this?

dorio · 12-11-2015

@Ancil McBarnett Thanks! We need to keep the indexes on HDFS, but we also need to index about 500,000 files stored on HDFS (PDF, EML and P7F). Following your suggestion, could we deploy Solr on all DataNodes and also on the two master nodes? @azeltov So is it correct to say that any Solr instance can serve requests on HTTP port 8983 (both Solr and Banana)? Do you have any suggestions about the load balancer? Thanks a lot!

dorio · 12-11-2015

We had some performance issues with a low-profile configuration (4 vCores, 8 GB RAM), especially with Oozie. Right now we recommend at least 4 vCores and 24 GB of RAM. If you're planning an IaaS deployment on Azure with SQL Azure as the metadata repository, start directly with an S2/S3 instance: if you use Oozie, that is the minimum requirement. Also pay attention to Ranger: a shared database is fine for Ranger admin and user data, but for audits you need to watch the database size carefully, because it can grow very quickly. Use a script to truncate the audit table, or use a separate instance for audits.
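A minimal sketch of the kind of cleanup script mentioned above, assuming the Ranger audits land in a table named `xa_access_audit` with an `event_time` column (both names are assumptions; check the schema of your Ranger version first). Instead of a full TRUNCATE, it deletes only rows older than a retention window. `sqlite3` stands in for SQL Azure so the example runs anywhere; against SQL Azure you would swap in your own driver and connection.

```python
import sqlite3
from datetime import datetime, timedelta

# Assumed names: verify the audit table and timestamp column in your Ranger DB.
AUDIT_TABLE = "xa_access_audit"
TIME_COLUMN = "event_time"
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def purge_old_audits(conn, retention_days, now=None):
    """Delete audit rows older than retention_days and return how many were removed."""
    now = now or datetime.now()
    cutoff = (now - timedelta(days=retention_days)).strftime(TS_FORMAT)
    # Table/column names are trusted constants; only the cutoff is a bound parameter.
    cur = conn.execute(f"DELETE FROM {AUDIT_TABLE} WHERE {TIME_COLUMN} < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    # In-memory stand-in for the audit database.
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {AUDIT_TABLE} (id INTEGER PRIMARY KEY, {TIME_COLUMN} TEXT)")
    now = datetime(2015, 12, 11)
    for days_old in (1, 10, 45, 90):
        ts = (now - timedelta(days=days_old)).strftime(TS_FORMAT)
        conn.execute(f"INSERT INTO {AUDIT_TABLE} ({TIME_COLUMN}) VALUES (?)", (ts,))
    print(purge_old_audits(conn, retention_days=30, now=now))  # 2
```

Run on a schedule (e.g. cron), a retention-based delete keeps the most recent audits queryable while still capping the table size.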

dorio · 12-10-2015

I think it's also a good starting point to use availability sets for the master nodes and the worker nodes. Another good practice is to use one storage account for every node in the cluster, to avoid the IOPS limits that apply when multiple VMs share the same storage account. You can also try Azure Data Lake Store (with adl://) to check the performance of the new Azure service. Also keep in mind the maintenance windows of each Azure region relative to where your customers are: some regions can be a good choice for early availability of new services (e.g. East US 2) but not from a maintenance point of view (especially for European customers). We also measured large differences between IaaS performance and PaaS (HDInsight) performance, due to the low read/write performance of Blob Storage: with IaaS (configured correctly) you can achieve the best performance.
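The reasoning behind one storage account per node can be put into numbers. This sketch assumes the Standard storage targets published around 2015 (about 20,000 IOPS per storage account and 500 IOPS per standard-tier data disk); treat both values as assumptions to verify against current Azure documentation. The account-naming scheme is hypothetical.

```python
import math

# Assumed limits (Standard storage, circa 2015); verify against current Azure docs.
ACCOUNT_IOPS_LIMIT = 20_000  # approximate request-rate target per storage account
DISK_IOPS = 500              # approximate max IOPS per standard-tier data disk


def disks_per_account(account_limit=ACCOUNT_IOPS_LIMIT, disk_iops=DISK_IOPS):
    """How many fully loaded disks one account can serve before it throttles."""
    return account_limit // disk_iops


def accounts_needed(nodes, disks_per_node, account_limit=ACCOUNT_IOPS_LIMIT, disk_iops=DISK_IOPS):
    """Minimum number of accounts that keeps aggregate disk IOPS under the per-account limit."""
    total_disks = nodes * disks_per_node
    return math.ceil(total_disks / disks_per_account(account_limit, disk_iops))


def one_account_per_node(node_names, prefix="hdpstore"):
    """Map each node to its own storage account (hypothetical naming scheme)."""
    return {node: f"{prefix}{i:02d}" for i, node in enumerate(node_names, start=1)}


if __name__ == "__main__":
    print(disks_per_account())       # 40
    print(accounts_needed(13, 8))    # 3
    print(one_account_per_node(["worker01", "worker02"]))
```

For example, 13 worker nodes with 8 data disks each means 104 disks; at 40 fully loaded disks per account, even a shared layout needs at least 3 accounts, while one account per node keeps each VM (8 × 500 = 4,000 IOPS) far below the per-account target.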

dorio · 12-10-2015

We just deployed Waterlinedata (http://www.waterlinedata.com/) on a production cluster (HDP 2.2.8): it's a great tool for metadata enrichment, data dictionary, data lineage and auto-discovery of HDFS and Hive data, and it runs on top of Hadoop (YARN). It's ready to "speak" with Atlas through its API, and it has a great web UI. One of the coolest features is the ability to create an external Hive table from data on HDFS, through the web UI, in two clicks.
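To give an idea of what those two clicks produce, here is a sketch that builds the HiveQL needed to expose delimited text files on HDFS as an external Hive table. It is not Waterlinedata's implementation; the table name, columns, delimiter and HDFS path in the example are hypothetical.

```python
def external_table_ddl(table, columns, hdfs_location, delimiter=","):
    """Build a CREATE EXTERNAL TABLE statement over delimited text files on HDFS.

    columns is a list of (name, hive_type) pairs.
    """
    cols = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{hdfs_location}'"
    )


if __name__ == "__main__":
    # Hypothetical table over an existing HDFS directory.
    print(external_table_ddl("web_logs", [("ts", "STRING"), ("status", "INT")],
                             "/data/raw/web_logs"))
```

Because the table is EXTERNAL, dropping it later removes only the Hive metadata and leaves the files on HDFS untouched, which is what makes this a low-risk way to explore newly discovered data.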

dorio · 12-10-2015

While storage space is absolutely critical, as @Neeraj Sabharwal and @Ali Bajwa wrote in their post, we just "discovered" that CPU is also a key point. When HWX released AMS, we began deploying Ambari and AMS on the same machine, but we soon understood that for a production environment it is good practice to use one VM for Ambari and another VM for AMS, so that the very high computational load of AMS doesn't affect Ambari (sometimes, during the aggregation phase, we saw 16 CPUs at 90% for 10 to 15 minutes).
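A hypothetical helper for spotting the kind of sustained saturation described above (CPU at 90% or more for 10 or more minutes) in a series of per-host CPU samples. Where the samples come from (e.g. `sar` or your monitoring system) and the 60-second sampling interval in the example are assumptions.

```python
from typing import List, Tuple


def sustained_high_cpu(samples: List[float], interval_s: int, threshold: float = 90.0,
                       min_duration_s: int = 600) -> List[Tuple[int, int]]:
    """Return (start, end_exclusive) index runs where CPU% >= threshold
    for at least min_duration_s seconds."""
    runs = []
    start = None
    for i, value in enumerate(samples):
        if value >= threshold:
            if start is None:
                start = i  # a high-CPU run begins
        else:
            # The run (if any) ends here; keep it only if it lasted long enough.
            if start is not None and (i - start) * interval_s >= min_duration_s:
                runs.append((start, i))
            start = None
    # Close a run that is still open at the end of the series.
    if start is not None and (len(samples) - start) * interval_s >= min_duration_s:
        runs.append((start, len(samples)))
    return runs


if __name__ == "__main__":
    # One sample per minute: 12 minutes above 90%, then a short 5-minute spike.
    samples = [20.0] * 2 + [95.0] * 12 + [30.0] * 3 + [93.0] * 5
    print(sustained_high_cpu(samples, interval_s=60))  # [(2, 14)]
```

If runs like this show up regularly on a host that also runs Ambari Server, that is a reasonable signal to move the Metrics Collector to its own VM.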

dorio · 12-10-2015

We need to deploy Solr 5.2.1 on HDP 2.3.2 in a production environment (3 master nodes with HA for HDFS, YARN and Hive; 13 worker nodes; 2 edge nodes; 2 support nodes; 2 security nodes). Is there a "best practice" for production? This is a multi-purpose cluster on which Hive, Pig, HOYA and Spark jobs are currently running.

dorio · 12-10-2015

Do you think Solr on YARN is ready for a PoC?

Member Since	12-09-2015 08:30 PM
Last Visited	04-29-2017 08:21 PM
Posts	37
Kudos received	27

Cloudera Community

Re: Networking and edge nodes

Re: Database Sizing and recommendation for Ambari ...

Re: Recommendations for Microsoft Azure HDP Deploy...

Re: Ambari View: How many users per server

Re: Recommendations for Microsoft Azure HDP Deploy...

Re: Can NiFi be used to pipe the data from Oracle ...

Change data capture from DB2 using NiFi

Re: Solr architecture for a production environment

Re: Database Sizing and recommendation for Ambari ...

Re: Recommendations for Microsoft Azure HDP Deploy...

Re: metadata enrichment, data dictionary and data ...

Re: Storage space consideration for deploying AMS

Solr architecture for a production environment

Re: Installing Solr on yarn using Slider