About wfloyd

wfloyd · ‎03-11-2016

Do we have any experience with typical disk usage ratios for each of the NiFi repositories (flow file, content, and provenance)? E.g. if content requires 200 GB of storage, would the provenance and flow file repositories require 20 GB (for typical flows)? I am trying to use this information to decide how best to divide up a NiFi server that has 12 local drives, e.g. 8 drives allocated for content, 2 for flow file, and 2 for provenance. Appreciate any thoughts!

wfloyd · ‎02-12-2016

Very helpful guys. Appreciated!

wfloyd · ‎02-09-2016

Customer has "Cluster A" (a 20-node standard Hadoop cluster: HDFS, YARN, Hive, etc., but no HBase). Customer is adding "Cluster B" (6 nodes dedicated to HBase use). Cluster A and Cluster B are on neighboring racks in the same datacenter, same VLAN, etc. Is it technically safe/possible to install the RegionServers in "Cluster B" but point them to the HDFS instance in "Cluster A"? If this is possible, what compromises would we make in terms of HBase performance? Would certain scans be slower because the RegionServers load remote HFiles into memory? Would writes be slower because no DataNode service runs alongside the HBase servers in Cluster B? Thanks!

wfloyd · ‎02-05-2016

Found additional information in the article by @mclark here: https://community.hortonworks.com/articles/8607/how-to-create-nifi-fault-tolerance-using-multiple.html

wfloyd · ‎02-03-2016

@Artem Ervits - this question has not yet been completely answered.

wfloyd · ‎02-03-2016

Thank you Pardeep. This helps me understand how to see Table level statistics. Do you have a solution for Column level stats also?

wfloyd · ‎02-01-2016

What is the best way to access, via REST API, the Flume metrics data shown on the Ambari Flume service page (image attached)? I tried to access this information via the standard Ambari REST API, but it only gave me high-level information about the Flume service:

curl --user admin:admin http://sandbox234:8080/api/v1/clusters/Sandbox/services/FLUME
curl --user admin:admin http://sandbox234:8080/api/v1/clusters/Sandbox/hosts/sandbox.hortonworks.com/host_components/FLUME_HANDLER

Should this information be available via the Ambari Metrics REST API instead? Thank you
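For anyone scripting these calls, here is a minimal Python sketch that builds the same host-component request as the second curl command and adds a `fields` query parameter. As far as I understand, `fields` is Ambari's way of requesting a partial response, but whether the Flume metrics actually appear under `metrics` for this component is an assumption that should be checked against the cluster. The helper name `build_metrics_request` is hypothetical.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def build_metrics_request(base_url, cluster, host, component,
                          user, password, fields="metrics"):
    """Build (but do not send) a GET request for one host component.

    `fields="metrics"` is an assumption: it asks Ambari to return only
    the metrics subtree, if one exists for this component.
    """
    path = f"/api/v1/clusters/{cluster}/hosts/{host}/host_components/{component}"
    # safe="/" keeps nested field paths readable instead of percent-encoded
    query = urlencode({"fields": fields}, safe="/")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return Request(f"{base_url}{path}?{query}",
                   headers={"Authorization": f"Basic {token}"})


# Same host, cluster, and credentials as the curl commands in the question
req = build_metrics_request(
    "http://sandbox234:8080", "Sandbox", "sandbox.hortonworks.com",
    "FLUME_HANDLER", "admin", "admin",
)
print(req.full_url)
```

To send the request, pass `req` to `urllib.request.urlopen` and parse the JSON body; the sketch stops short of that so it can be run without a live Ambari server.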

wfloyd · ‎02-01-2016

Does anyone have experience with how Pig handles error tuples during the LOAD function? E.g. if we LOAD 10 comma-delimited lines using PigStorage(',') but the 9th line of the input data is pipe-delimited, what controls do we have over how these tuples are parsed and which relation (variable) they are assigned to? Ideally, I'd like to have one relation loaded with the successfully parsed rows and another relation holding the rows that were not parsed properly.
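To make the desired outcome concrete, here is a small Python sketch of the routing logic only. It is not Pig code. It reads each line raw, splits on the expected delimiter, and sends rows with the wrong field count to a separate "bad" collection. The function name, the sample data, and the three-field schema are all hypothetical. In Pig, one possible equivalent might be to load each line as a single field (e.g. with TextLoader) and then use SPLIT, but that mapping is an assumption and not something confirmed in this thread.

```python
def split_by_field_count(lines, delimiter=",", expected_fields=3):
    """Partition raw lines into parsed rows and rejected lines.

    A line is "good" when splitting on `delimiter` yields exactly
    `expected_fields` fields; otherwise the raw line is kept as "bad".
    """
    good, bad = [], []
    for line in lines:
        raw = line.rstrip("\n")
        fields = raw.split(delimiter)
        if len(fields) == expected_fields:
            good.append(tuple(fields))
        else:
            bad.append(raw)
    return good, bad


# Hypothetical sample: the last line is pipe-delimited, like the 9th line in the question
sample = ["1,alice,NY", "2,bob,CA", "3|carol|TX"]
good, bad = split_by_field_count(sample)
print(good)
print(bad)
```

Counting fields is only one possible validity test. A stricter check could also validate types per column before accepting a row.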

wfloyd · ‎01-26-2016

Great point, @Guilherme Braccialli! I'll investigate and offer this to the customer.

wfloyd · ‎01-26-2016

We have a customer who wants to enable HBase to use multiple WAL codecs at the same time. Is this possible? E.g. the Phoenix configuration instructions ask the user to update the value of hbase.regionserver.wal.codec. However, the customer also wants to use the NGData HBase/Solr Indexer, which also requires a custom value for hbase.regionserver.wal.codec. Can the HBase RegionServer configuration accommodate two WAL codecs in parallel?

Member Since	‎09-23-2015 09:15 PM
Last Visited	‎04-24-2017 02:32 PM
Posts	88
Kudos received	109

Cloudera Community

Re: Is there is any workaround to map csv columns ...

NiFi Repository - Typical Disk Usage Ratios among ...

Re: Do HBase and HDFS need to be co-located on the...

Do HBase and HDFS need to be co-located on the sam...

Re: NiFi Clustering: One NCM to manage multiple se...

Re: Hive User Concurrency - Reconciling YARN Capac...

Re: Viewing Hive Column or Table level Statistics

Best API to pull Flume Metrics from Ambari

Error Handling during Pig LOAD Function

Re: Can we configure HBase to use multiple WAL Cod...

Can we configure HBase to use multiple WAL Codecs?