Member since: 12-09-2015
Posts: 37
Kudos Received: 28
Solutions: 4
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 5087 | 12-15-2015 07:47 AM |
| | 2590 | 12-11-2015 07:51 PM |
| | 3570 | 12-10-2015 10:06 PM |
| | 2572 | 12-09-2015 09:17 PM |
12-15-2015
07:28 AM
2 Kudos
@Pardeep Yes, we had to ask Microsoft Azure Support to increase the limits for both cores and storage accounts. See this link: https://azure.microsoft.com/en-us/blog/azure-limits-quotas-increase-requests/
12-11-2015
08:25 PM
@jramakrishnan Do you plan to support Sqoop2 in the near future?
12-11-2015
08:22 PM
We have some mainframes running DB2, and we need to bring every new record written to DB2 into Hadoop (a Hive ORC table) as fast as possible. Is NiFi capable of doing this?
Labels: Apache NiFi
12-11-2015
08:16 PM
@Ancil McBarnett Thanks! We need to keep the indexes on HDFS, but we also need to index files stored on HDFS (about 500,000 PDF, EML and P7F files). Following your suggestion, could we deploy Solr on all DataNodes and also on two master nodes? @azeltov So is it correct to say that any Solr node could service requests on HTTP port 8983 (both Solr and Banana)? Do you have any suggestions about the load balancer? Thanks a lot!
12-11-2015
07:51 PM
1 Kudo
We had some performance issues with a low-profile configuration (4 vCores, 8 GB RAM), especially with Oozie. Right now we recommend at least 4 vCores and 24 GB of RAM. If you're planning an IaaS deployment on Azure that uses SQL Azure as the metadata repository, start directly with an S2/S3 instance: if you use Oozie, that's the minimum requirement. Also pay attention to Ranger: SQL Azure is fine for Ranger admin and users, but for audits you need to watch the database size carefully, because it can grow very quickly. Use a script to truncate the audit table, or use a separate database instance for audits.
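As a rough illustration of the "script to truncate the table" idea, here is a minimal sketch that deletes audit rows older than a retention window rather than truncating everything. It runs against an in-memory SQLite database as a stand-in. The table name `xa_access_audit`, the `event_time` column, and the 30-day retention are assumptions for illustration, so check them against your own Ranger audit schema and adapt the connection to your actual database.

```python
import sqlite3
from datetime import datetime, timedelta

# Timestamp format assumed for the audit table (sorts correctly as text).
FMT = "%Y-%m-%d %H:%M:%S"


def purge_old_audits(conn, retention_days, now, table="xa_access_audit"):
    """Delete audit rows older than retention_days and return how many were removed.

    The table name and the event_time column are assumptions about the schema.
    The table name is interpolated into the SQL, so it must come from trusted config.
    """
    cutoff = (now - timedelta(days=retention_days)).strftime(FMT)
    cur = conn.execute(f"DELETE FROM {table} WHERE event_time < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


def demo():
    """Build a tiny stand-in audit table and purge rows older than 30 days."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE xa_access_audit (id INTEGER PRIMARY KEY, event_time TEXT)"
    )
    now = datetime(2015, 12, 11, 20, 0, 0)
    ages_in_days = (1, 10, 45, 90)
    conn.executemany(
        "INSERT INTO xa_access_audit (event_time) VALUES (?)",
        [((now - timedelta(days=d)).strftime(FMT),) for d in ages_in_days],
    )
    deleted = purge_old_audits(conn, retention_days=30, now=now)
    remaining = conn.execute("SELECT COUNT(*) FROM xa_access_audit").fetchone()[0]
    conn.close()
    return deleted, remaining


if __name__ == "__main__":
    print(demo())
```

Scheduling something like this daily keeps the audit table bounded. A full truncate is simpler but discards recent audit history as well.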
12-10-2015
10:06 PM
3 Kudos
I think a good starting point is also to use Availability Sets for master nodes and worker nodes. Another good practice is to use one storage account for each node in the cluster, so that you avoid hitting the IOPS limits that apply when multiple VMs share the same storage account. You can also try Azure Data Lake Store (with adl://) to check the performance of the new Azure service. You also need to keep in mind the maintenance windows of each Azure region with respect to your customers: some regions could be a good choice for new service availability (e.g. US East 2) but not from a maintenance point of view (especially for European customers). We also measured large differences between IaaS performance and PaaS (HDInsight) performance, due to the low read/write performance of Blob Storage: with IaaS, configured correctly, you can achieve the best performance.
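To make the storage account point concrete, here is a small back-of-the-envelope sketch. The figures are assumptions based on Azure's published limits for standard storage (20,000 IOPS per storage account and 500 IOPS per standard data disk), and the node and disk counts are hypothetical. Check the current Azure limits before sizing a real cluster.

```python
import math

# Assumed limits for standard storage; verify against current Azure documentation.
ACCOUNT_IOPS_LIMIT = 20_000
IOPS_PER_DISK = 500


def disks_per_account(account_limit=ACCOUNT_IOPS_LIMIT, per_disk=IOPS_PER_DISK):
    """Number of fully utilized disks one storage account can sustain."""
    return account_limit // per_disk


def accounts_needed(nodes, disks_per_node):
    """Minimum number of storage accounts so that no account is oversubscribed."""
    total_disks = nodes * disks_per_node
    return math.ceil(total_disks / disks_per_account())


def shared_account_oversubscribed(nodes, disks_per_node):
    """True if putting every disk in a single account exceeds its IOPS limit."""
    return nodes * disks_per_node * IOPS_PER_DISK > ACCOUNT_IOPS_LIMIT


if __name__ == "__main__":
    # Hypothetical cluster: 13 worker nodes with 8 data disks each.
    print(disks_per_account())
    print(accounts_needed(13, 8))
    print(shared_account_oversubscribed(13, 8))
```

Under these assumptions a single shared account is oversubscribed well before a cluster of this size is fully busy, while one account per node leaves each account far below its limit.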
12-10-2015
09:53 PM
We just released Waterlinedata (http://www.waterlinedata.com/) on a production cluster (HDP 2.2.8): it's a great tool for metadata enrichment, data dictionary, data lineage and autodiscovery for HDFS and Hive data, and it runs on top of Hadoop (YARN). It's ready to "speak" with Atlas through its API and has a great Web UI. One of the coolest features is the ability to create, through the Web UI, an external Hive table from HDFS in 2 clicks.
12-10-2015
09:46 PM
1 Kudo
While storage space is absolutely critical, as @Neeraj Sabharwal and @Ali Bajwa wrote in their posts, we just "discovered" that CPU is also a key point. When HWX released AMS we began deploying Ambari and AMS on the same machine, but we soon understood that for a production environment it is good practice to use one VM for Ambari and another VM for AMS, so that the very high computational load of AMS doesn't affect Ambari (sometimes, during the aggregation phase, we saw 16 CPUs at 90% for 10-15 minutes).
12-10-2015
09:38 PM
1 Kudo
We need to deploy Solr 5.2.1 on HDP 2.3.2 in a production environment (3 master nodes with HA on HDFS, YARN and Hive, 13 worker nodes, 2 edge, 2 support and 2 security nodes). Is there a "best practice" for production? This is a multi-purpose cluster on which Hive, Pig, HOYA and Spark jobs are currently running.
Labels: Apache Solr