Member since: 09-23-2015
Posts: 88
Kudos Received: 109
Solutions: 1

My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 7034 | 08-24-2016 09:13 PM |
09-03-2019
01:28 PM
Hi @nshawa, I am getting the following error on the PutHiveStreaming processor after running the template you provided. Any idea how to fix this?
12-07-2016
03:38 PM
Issues:

1) In your table definition ("create table ...") you do not specify the LOCATION attribute of the table, so Hive defaults to looking for the files under the default warehouse directory path. The location in your screenshot is under /user/admin/. You can run "show create table ..." to see where Hive thinks the table's files are located. By default Hive creates managed tables, where files, metadata and statistics are managed by internal Hive processes. A managed table is stored under the hive.metastore.warehouse.dir path property, by default in a folder path similar to /apps/hive/warehouse/databasename.db/tablename/. The default location can be overridden with the LOCATION clause during table creation.

2) You are specifying the format using hive.default.fileformat. I would avoid using this property; instead, simply use "STORED AS TEXTFILE" or "STORED AS ORC" in your table definition.

Please change the above, retest, and let us know how that works.
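As a sketch of what such a definition could look like (the table name, columns, delimiter, and HDFS path below are placeholders, not taken from the original question):

```sql
-- Hypothetical definition with an explicit LOCATION and file format
CREATE TABLE mydb.my_table (
  id   INT,
  name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/admin/my_table';

-- Verify where Hive thinks the table's files live
SHOW CREATE TABLE mydb.my_table;
```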
10-12-2016
06:04 PM
The MergeContent processor simply bins and merges the FlowFiles it sees on an incoming connection at run time. In your case you want each bin to have a minimum of 100 FlowFiles before merging, so you will need to specify that in the "Minimum Number of Entries" property. I never recommend setting any minimum value without also setting the "Max Bin Age" property. Let's say you only ever get 99 FlowFiles, or the amount of time it takes to get to 100 exceeds the useful age of the data being held; those FlowFiles will sit in a bin indefinitely or for an excessive amount of time unless that exit age has been set. Also keep in mind that if you have more than one connection feeding your MergeContent processor, on each run it looks at the FlowFiles on only one connection, moving in round-robin fashion from connection to connection. NiFi provides a "funnel" which allows you to merge FlowFiles from many connections into a single connection. Matt
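As an illustrative sketch, the MergeContent property settings for this scenario might look like the following (the maximum-entries and 5-minute bin age values are examples, not recommendations from the post above):

```
Merge Strategy            = Bin-Packing Algorithm
Minimum Number of Entries = 100
Maximum Number of Entries = 1000
Max Bin Age               = 5 min
```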
12-15-2016
12:57 PM
Thanks for the update!
03-17-2016
10:06 PM
1 Kudo
Please find below a potential design for disk & RAID configuration for a typical 12-disk server running NiFi. This design is intended for a simple log-ingestion use case, where the customer needs very few provenance records but would also like reliability at the storage layer.

FlowFile repo: 2 drives set up as RAID 1
Provenance repo: 2 drives set up as RAID 1
Content repo set up as either:
4 drives (RAID 10) /cont_repo1
4 drives (RAID 10) /cont_repo2
or
2 drives (RAID 1) /cont_repo1
2 drives (RAID 1) /cont_repo2
2 drives (RAID 1) /cont_repo3
2 drives (RAID 1) /cont_repo4

Thanks @mpayne & @Andrew Grande for guidance!
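As a rough sketch, this layout would be wired into nifi.properties along these lines (the /flowfile_repo and /prov_repo mount names are hypothetical; only the cont_repo paths come from the layout above):

```
# FlowFile repository on its own RAID 1 mount (hypothetical mount name)
nifi.flowfile.repository.directory=/flowfile_repo

# Provenance repository on its own RAID 1 mount (hypothetical mount name)
nifi.provenance.repository.directory.default=/prov_repo

# One content repository per RAID set (here, the two RAID 10 option)
nifi.content.repository.directory.cont_repo1=/cont_repo1
nifi.content.repository.directory.cont_repo2=/cont_repo2
```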
03-15-2016
07:33 PM
1 Kudo
For your scenario with 12 disks (assuming all disks are 200 GB): you can specify/define multiple Content repos and multiple Provenance repos; however, you can only define one FlowFile repository and one database repository. A possible breakdown follows (see also the nifi.properties sketch after the list):
- 8 disks for Content repos:
- /cont_repo1 <-- 200 GB
- /cont_repo2 <-- 200 GB
- /cont_repo3 <-- 200 GB
- /cont_repo4 <-- 200 GB
- /cont_repo5 <-- 200 GB
- /cont_repo6 <-- 200 GB
- /cont_repo7 <-- 200 GB
- /cont_repo8 <-- 200 GB
- 2 disks for Provenance repos:
- /prov_repo1 <-- 200 GB
- /prov_repo2 <-- 200 GB
- 1 disk split into multiple partitions for:
- /var/log/nifi-logs/ <-- 100 GB
- OS partitions <-- split amongst the other standard OS partitions (/tmp, /, etc...)
- 1 disk split into multiple partitions for:
- /opt/nifi <-- 50 GB
- /flowfile_repo/ <-- 50 GB
- /database_repo/ <-- 25 GB
- /opt/configuration-resources <-- 25 GB (this will hold any certs, config files, and extras your NiFi processors/dataflows may need).
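A sketch of how this layout could be reflected in nifi.properties (the repository name suffixes such as repo1 and prov1 are arbitrary; the paths are the mount points listed above):

```
# Only one FlowFile repository and one database repository can be defined
nifi.flowfile.repository.directory=/flowfile_repo
nifi.database.directory=/database_repo

# Two provenance repositories, one per disk
nifi.provenance.repository.directory.prov1=/prov_repo1
nifi.provenance.repository.directory.prov2=/prov_repo2

# Eight content repositories, one per disk
nifi.content.repository.directory.repo1=/cont_repo1
nifi.content.repository.directory.repo2=/cont_repo2
# ... repo3 through repo7 follow the same pattern ...
nifi.content.repository.directory.repo8=/cont_repo8
```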
02-12-2016
01:34 AM
1 Kudo
Very helpful guys. Appreciated!
02-01-2016
08:56 PM
2 Kudos
Wes - I know you are asking for a REST API in your question, but it seems to me that this information would be better pulled from Flume's JMX MBeans. It sounds to me like you are looking for lower-level metrics like memory used, CPU, etc.
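As a sketch of how those MBeans can be exposed when starting the Flume agent (the port numbers and agent/config names are placeholders):

```
# In flume-env.sh: expose the agent's MBeans over JMX (port is a placeholder)
export JAVA_OPTS="$JAVA_OPTS -Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=5445 \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcom.sun.management.jmxremote.ssl=false"

# Alternatively, Flume can report the same counters as JSON over HTTP:
#   flume-ng agent -n agent1 -c conf -f flume.conf \
#     -Dflume.monitoring.type=http -Dflume.monitoring.port=34545
```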
02-01-2016
05:24 PM
It would be really convenient if the PigStorage SerDe existed as a Pig function as well. Then one could load the record as a string, check whether it is valid with SPLIT, and then parse it into a tuple. Something like: A = LOAD 'myfile'; B = SPLIT IF PigStorage_valid($0) GOODDATA, OTHERWISE BADDATA; C = FOREACH B GENERATE PigStorage_parse($0) ... But since this doesn't exist, I think the only options are to write these functions yourself or, as Artem says, use a regex, filter, etc. to verify correctness, write it out, and load it again with PigStorage.
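As a rough sketch of that regex/filter workaround in plain Pig Latin, assuming hypothetical tab-delimited records with three fields (the regex, paths, and field names are placeholders):

```
-- Load each record as one raw line (TextLoader keeps the whole line as a single field)
raw = LOAD 'myfile' USING TextLoader() AS (line:chararray);

-- Route records by a validity regex; three tab-separated fields is a placeholder layout
SPLIT raw INTO good IF (line MATCHES '[^\\t]+\\t[^\\t]+\\t[^\\t]+'), bad OTHERWISE;

-- Parse the valid lines into fields and keep the rejects for inspection
parsed = FOREACH good GENERATE FLATTEN(STRSPLIT(line, '\\t', 3))
         AS (f1:chararray, f2:chararray, f3:chararray);
STORE parsed INTO 'gooddata' USING PigStorage();
STORE bad INTO 'baddata' USING PigStorage();
```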
01-26-2016
03:30 PM
Great point @Guilherme Braccialli ! I'll investigate and offer this to the customer.