Member since: 09-23-2015
Posts: 88
Kudos Received: 109
Solutions: 1

My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 7034 | 08-24-2016 09:13 PM |
09-03-2019
01:28 PM
Hi @nshawa, I am getting the following error on the PutHiveStreaming processor after running the template you provided. Any idea how to fix this?
12-07-2016
03:38 PM
Issues:

1) In your table definition ("create table ...") you do not specify the LOCATION attribute of the table, so Hive defaults to looking for the files under the default warehouse directory path. The location in your screenshot is under /user/admin/. You can run "show create table ..." to see where Hive thinks the table's files are located. By default Hive creates managed tables, where files, metadata and statistics are managed by internal Hive processes. A managed table is stored under the hive.metastore.warehouse.dir path property, by default in a folder path similar to /apps/hive/warehouse/databasename.db/tablename/. The default location can be overridden with the LOCATION clause during table creation.

2) You are specifying the format using hive.default.fileformat. I would avoid using this property; instead, simply use "STORED AS TEXTFILE" or "STORED AS ORC" in your table definition.

Please change the above, retest, and let us know how that works.
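As a sketch of what such a definition could look like (the table name, columns, delimiter, and HDFS path below are placeholders, not taken from the original question):

```sql
-- Hypothetical definition with an explicit LOCATION and file format
CREATE TABLE mydb.my_table (
  id   INT,
  name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/admin/my_table';

-- Verify where Hive thinks the table's files live
SHOW CREATE TABLE mydb.my_table;
```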
10-12-2016
06:04 PM
The MergeContent processor simply bins and merges the FlowFiles it sees on an incoming connection at run time. In your case you want each bin to have a minimum of 100 FlowFiles before merging, so you will need to specify that in the "Minimum Number of Entries" property. I never recommend setting any minimum value without also setting the "Max Bin Age" property. Let's say you only ever get 99 FlowFiles, or the amount of time it takes to get to 100 exceeds the useful age of the data being held; those FlowFiles will sit in a bin indefinitely or for an excessive amount of time unless that exit age has been set. Also keep in mind that if you have more than one connection feeding your MergeContent processor, on each run it looks at the FlowFiles on only one connection, moving in round-robin fashion from connection to connection. NiFi provides a "funnel" which allows you to merge FlowFiles from many connections into a single connection. Matt
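As an illustrative sketch, the MergeContent property settings for this scenario might look like the following (the maximum-entries and 5-minute bin age values are examples, not recommendations from the post above):

```
Merge Strategy            = Bin-Packing Algorithm
Minimum Number of Entries = 100
Maximum Number of Entries = 1000
Max Bin Age               = 5 min
```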
12-15-2016
12:57 PM
Thanks for the update!
03-17-2016
10:06 PM
1 Kudo
Please find below a potential design for disk & RAID configuration for a typical 12-disk server running NiFi. This design is intended for a simple log-ingestion use case, where the customer needs very few provenance records but would also like reliability at the storage layer.

FlowFile repo: 2 drives set up as RAID 1
Provenance repo: 2 drives set up as RAID 1
Content repo set up as either:
4 drives (RAID 10) /cont_repo1
4 drives (RAID 10) /cont_repo2
or
2 drives (RAID 1) /cont_repo1
2 drives (RAID 1) /cont_repo2
2 drives (RAID 1) /cont_repo3
2 drives (RAID 1) /cont_repo4

Thanks @mpayne & @Andrew Grande for guidance!
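As a rough sketch, this layout would be wired into nifi.properties along these lines (the /flowfile_repo and /prov_repo mount names are hypothetical; only the cont_repo paths come from the layout above):

```
# FlowFile repository on its own RAID 1 mount (hypothetical mount name)
nifi.flowfile.repository.directory=/flowfile_repo

# Provenance repository on its own RAID 1 mount (hypothetical mount name)
nifi.provenance.repository.directory.default=/prov_repo

# One content repository per RAID set (here, the two RAID 10 option)
nifi.content.repository.directory.cont_repo1=/cont_repo1
nifi.content.repository.directory.cont_repo2=/cont_repo2
```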
03-15-2016
07:33 PM
1 Kudo
For your scenario with 12 disks (assuming all disks are 200 GB): you can specify/define multiple Content repos and multiple Provenance repos; however, you can only define one FlowFile repository and one database repository. A possible breakdown follows (see also the nifi.properties sketch after the list):
- 8 disks for Content repos:
- /cont_repo1 <-- 200 GB
- /cont_repo2 <-- 200 GB
- /cont_repo3 <-- 200 GB
- /cont_repo4 <-- 200 GB
- /cont_repo5 <-- 200 GB
- /cont_repo6 <-- 200 GB
- /cont_repo7 <-- 200 GB
- /cont_repo8 <-- 200 GB
- 2 disks for Provenance repos:
- /prov_repo1 <-- 200 GB
- /prov_repo2 <-- 200 GB
- 1 disk split into multiple partitions for:
- /var/log/nifi-logs/ <-- 100 GB
- OS partitions <-- split amongst the other standard OS partitions (/tmp, /, etc...)
- 1 disk split into multiple partitions for:
- /opt/nifi <-- 50 GB
- /flowfile_repo/ <-- 50 GB
- /database_repo/ <-- 25 GB
- /opt/configuration-resources <-- 25 GB (this will hold any certs, config files, and extras your NiFi processors/dataflows may need).
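A sketch of how this layout could be reflected in nifi.properties (the repository name suffixes such as repo1 and prov1 are arbitrary; the paths are the mount points listed above):

```
# Only one FlowFile repository and one database repository can be defined
nifi.flowfile.repository.directory=/flowfile_repo
nifi.database.directory=/database_repo

# Two provenance repositories, one per disk
nifi.provenance.repository.directory.prov1=/prov_repo1
nifi.provenance.repository.directory.prov2=/prov_repo2

# Eight content repositories, one per disk
nifi.content.repository.directory.repo1=/cont_repo1
nifi.content.repository.directory.repo2=/cont_repo2
# ... repo3 through repo7 follow the same pattern ...
nifi.content.repository.directory.repo8=/cont_repo8
```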
02-12-2016
01:34 AM
1 Kudo
Very helpful guys. Appreciated!
02-01-2016
08:56 PM
2 Kudos
Wes - I know you are asking for a REST API in your question, but it seems to me that this information would be better pulled from Flume's JMX MBeans. It sounds to me like you are looking for lower-level metrics like memory used, CPU, etc.
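As a sketch of how those MBeans can be exposed when starting the Flume agent (the port numbers and agent/config names are placeholders):

```
# In flume-env.sh: expose the agent's MBeans over JMX (port is a placeholder)
export JAVA_OPTS="$JAVA_OPTS -Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=5445 \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcom.sun.management.jmxremote.ssl=false"

# Alternatively, Flume can report the same counters as JSON over HTTP:
#   flume-ng agent -n agent1 -c conf -f flume.conf \
#     -Dflume.monitoring.type=http -Dflume.monitoring.port=34545
```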
02-01-2016
05:24 PM
It would be really convenient if the PigStorage SerDe existed as a Pig function as well. Then one could load the record as a string, check whether it is valid with SPLIT, and then parse it into a tuple. Something like: A = LOAD 'myfile'; B = SPLIT IF PigStorage_valid($0) GOODDATA, OTHERWISE BADDATA; C = FOREACH B GENERATE PigStorage_parse($0) ... But since this doesn't exist, I think the only options are to write these functions yourself or, as Artem says, use a regex, filter, etc. to verify correctness, write it out, and load it again with PigStorage.
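As a rough sketch of that regex/filter workaround in plain Pig Latin, assuming hypothetical tab-delimited records with three fields (the regex, paths, and field names are placeholders):

```
-- Load each record as one raw line (TextLoader keeps the whole line as a single field)
raw = LOAD 'myfile' USING TextLoader() AS (line:chararray);

-- Route records by a validity regex; three tab-separated fields is a placeholder layout
SPLIT raw INTO good IF (line MATCHES '[^\\t]+\\t[^\\t]+\\t[^\\t]+'), bad OTHERWISE;

-- Parse the valid lines into fields and keep the rejects for inspection
parsed = FOREACH good GENERATE FLATTEN(STRSPLIT(line, '\\t', 3))
         AS (f1:chararray, f2:chararray, f3:chararray);
STORE parsed INTO 'gooddata' USING PigStorage();
STORE bad INTO 'baddata' USING PigStorage();
```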
01-26-2016
03:30 PM
Great point @Guilherme Braccialli ! I'll investigate and offer this to the customer.