Member since: 09-29-2015
Posts: 871
Kudos Received: 723
Solutions: 255
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 3348 | 12-03-2018 02:26 PM |
 | 2302 | 10-16-2018 01:37 PM |
 | 3615 | 10-03-2018 06:34 PM |
 | 2392 | 09-05-2018 07:44 PM |
 | 1814 | 09-05-2018 07:31 PM |
08-14-2018
02:33 PM
The Parquet data itself carries the schema, so your writer should be configured with a Schema Access Strategy of "Inherit Record Schema". This will produce a flow file with many records. If you need one record per flow file, you can use SplitRecord after this; however, generally it is better to keep many records together.
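As a sketch, the relevant writer settings (property names as shown in the NiFi UI; the JsonRecordSetWriter here is just an example writer) would be:

```
# Record writer controller service (e.g. JsonRecordSetWriter)
Schema Access Strategy: Inherit Record Schema
```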
08-14-2018
01:37 PM
FetchParquet has a property for a record writer... when it fetches the Parquet file, it reads it record by record using Parquet's Avro reader and passes each record to the configured writer. So if you configure it with a JSON record writer, the resulting flow file will contain JSON. If you want to fetch the raw Parquet bytes unmodified, you wouldn't use FetchParquet; you would just use FetchHDFS instead.
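As a rough illustration of that record-by-record conversion (plain Python with hypothetical records, not NiFi's actual reader/writer classes):

```python
import json

# Hypothetical records, as the Avro-based Parquet reader might produce them.
records = [
    {"id": 1, "name": "alice"},
    {"id": 2, "name": "bob"},
]

# A JSON record writer serializes each record; the fetched flow file's
# content would then be JSON rather than raw Parquet bytes.
flow_file_content = json.dumps(records)
```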
08-07-2018
01:37 PM
1 Kudo
https://pierrevillard.com/2017/01/24/integration-of-nifi-with-ldap/
https://ijokarumawak.github.io/nifi/2016/11/15/nifi-auth/
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#ldap_login_identity_provider
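For reference, a login-identity-providers.xml entry along the lines described in the admin guide might look like this (all DNs, URLs, and passwords below are placeholder examples):

```xml
<provider>
    <identifier>ldap-provider</identifier>
    <class>org.apache.nifi.ldap.LdapProvider</class>
    <property name="Authentication Strategy">SIMPLE</property>
    <property name="Manager DN">cn=admin,dc=example,dc=org</property>
    <property name="Manager Password">password</property>
    <property name="Url">ldap://localhost:389</property>
    <property name="User Search Base">ou=users,dc=example,dc=org</property>
    <property name="User Search Filter">uid={0}</property>
    <property name="Authentication Expiration">12 hours</property>
</provider>
```

You would then point nifi.security.user.login.identity.provider in nifi.properties at the ldap-provider identifier.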
08-01-2018
01:24 PM
1 Kudo
How do you plan to determine the schema from your JSON? Are you saying you want to infer a schema based on the data? Typically this approach doesn't work that well, because it is hard to guess the correct type for a given field. Imagine the first record has a field "id" with the value "1234", so it looks like a number, but the second record has "id" as "abcd"; if a number is guessed based on the first record, then the second record will fail because it's not a number. There is a processor that attempts to do this though, InferAvroSchema... you could probably do something like InferAvroSchema -> ConvertJSONToAvro -> PutParquet with an Avro reader.
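A toy sketch (plain Python, not NiFi's InferAvroSchema) of why inferring the type from the first record is fragile:

```python
def guess_type(value):
    """Naively guess a field's type from a single value."""
    try:
        int(value)
        return "int"
    except ValueError:
        return "string"

records = [{"id": "1234"}, {"id": "abcd"}]

# Inferring from the first record alone guesses "int"...
inferred = guess_type(records[0]["id"])

# ...which the second record then violates.
violations = [r for r in records if guess_type(r["id"]) != inferred]
```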
07-31-2018
04:43 PM
The response of the POST should be the process group entity with the id populated, and in addition there should be a Location header containing the URI of the created process group.
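As a small sketch of pulling the id out of that response (the body below is a hand-made stand-in shaped roughly like a ProcessGroupEntity, not a real API response):

```python
import json

# Illustrative response body from the create-process-group POST.
response_body = '{"id": "1234-abcd", "component": {"id": "1234-abcd", "name": "my-group"}}'

entity = json.loads(response_body)
group_id = entity["id"]  # id of the newly created process group
```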
07-26-2018
01:26 PM
Just wanted to add some more info... The Parquet Java API only allows reading and writing through Hadoop's Filesystem API; this is why NiFi currently can't provide a standard record reader and writer, because those require reading from and writing to Java's InputStream and OutputStream, which Parquet doesn't support. So PutParquet can be configured with a record reader to handle any incoming data, and then converts it to Parquet and writes to HDFS; essentially it has a record writer encapsulated in it that can only write to HDFS. FetchParquet does the reverse: it reads Parquet files from HDFS and can be configured with a record writer to write them out in any form, in your case CSV. You can always create a core-site.xml with a local filesystem to trick the Parquet processors into using local disk instead of HDFS.
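For the core-site.xml trick at the end, a minimal file pointed at by the processors' "Hadoop Configuration Resources" property could look like this (file:/// selects the local filesystem):

```xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>file:///</value>
    </property>
</configuration>
```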
07-03-2018
05:07 PM
If ListFTP is showing an error in the UI, then that error message is in nifi-app.log somewhere; please provide the full stack trace that goes with that error.
07-02-2018
04:52 PM
When you say it is "not working", what exactly is happening? There can only really be three possible outcomes:
a) it fetched successfully
b) it did not fetch successfully, and there is a red error on the processor in the UI and an error in nifi-app.log
c) it is stuck connecting, and there is a little number icon in the top-right of the processor which shows that 1 thread is still running trying to execute
Which of these choices describes the result?
07-02-2018
02:24 PM
1 Kudo
The remote input host is the hostname that the current node will advertise when another NiFi instance asks for information about the cluster, so it can only be one value. As an example, take NiFi cluster #1 sending data to NiFi cluster #2: the remote input host is only relevant on cluster #2 here. There will be a Remote Process Group (RPG) on cluster #1 with a URL (or comma-separated list of URLs) for cluster #2. It will ask cluster #2 for cluster information, and cluster #2 will respond with the hostnames of the nodes in cluster #2, which are based on the value of remote input host on each node in cluster #2.
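Concretely, on each node of cluster #2 the relevant nifi.properties entries would be along these lines (the hostname and port below are examples):

```
# nifi.properties on a node in cluster #2
nifi.remote.input.host=node1.cluster2.example.com
nifi.remote.input.socket.port=10443
nifi.remote.input.secure=true
```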