Member since: 07-19-2018
Posts: 613
Kudos Received: 99
Solutions: 117
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2041 | 01-11-2021 05:54 AM |
| | 1436 | 01-11-2021 05:52 AM |
| | 3240 | 01-08-2021 05:23 AM |
| | 3255 | 01-04-2021 04:08 AM |
| | 14751 | 12-18-2020 05:42 AM |
07-16-2021
04:59 AM
Understood, had to make sure! Next, for good measure, make sure the flow works for just "name". This will show you whether the issue is with the entire setup or just with "location.city". Then look at the configurations for the Reader/Writer and share those with us in case they are not default configs, etc. I believe the particular error, SQLException: Error while preparing statement, occurs when there is a schema conflict or the flowfile differs from the expected schema.
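For illustration only (the field names are assumptions based on this thread), a reader schema that treats location.city as a nested record would look roughly like this; if the incoming flowfile is flat while the configured schema expects nesting (or the reverse), that mismatch can produce exactly this kind of error:
{
  "type" : "record",
  "name" : "Person",
  "fields" : [
    { "name" : "name", "type" : ["null", "string"] },
    { "name" : "location", "type" : ["null", {
        "type" : "record",
        "name" : "Location",
        "fields" : [ { "name" : "city", "type" : ["null", "string"] } ]
      } ] }
  ]
}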
07-16-2021
04:48 AM
1 Kudo
I understand. First, a NiFi dev recommendation I always suggest: do not route success and failure to the same destination. Keep them separate. You need to know if the flowfile goes to failure. Also, if you are ignoring certain relationships (failure, retry, others), make a habit of routing all of them to an output port so you can see where the flowfile goes. This concept will help you know where a flowfile went after you push play. One of my dev flows looks like: Once I am satisfied the flow works, and my success flowfile is on the bottom, I can auto-terminate those failures. However, based on your flow, you may want to do something different with a failure, like log it or send an email. Next, I think if you do the above and run your flow, you might see the flowfile NOT go to PutCassandraRecord. If it does make it, update the post with the content of the flowfile and any errors from PutCassandraRecord. We need to see those errors and what content you are delivering to the processor.
07-15-2021
07:57 AM
Your sample query has no "quotes" but the configured one does. Just wanted to make sure: have you tried without those quotes?
07-15-2021
07:53 AM
1 Kudo
I am working a lot with NiFi and Cassandra. Please update your post with the incoming flowfile format, the CSV reader configuration, and any errors when you run your flow. These will help me or others provide a more precise reply and, hopefully, a solution.
01-19-2021
04:30 AM
@singyik Yes. I believe that is the last free public repo. Who knows how long it will remain. If you are using it, I would recommend fully copying it and using the copy.
01-13-2021
04:42 AM
@dzbeda In a previous lifetime I accomplished getting Windows log data and Windows metrics using Elastic Beats. There is Winlogbeat, which is great. Even using regular Filebeat you can make a custom listener. This leverages the ELK stack (Elasticsearch, Logstash, Kibana, Beats), but it is an interesting approach, with NiFi connecting to the ELK indexes holding that log data. The other method I have used is MiNiFi, as suggested to @ashinde, but this is a technical challenge with some difficult hurdles to get a data flow working on Windows and wired up to NiFi. If you take this route I would challenge you to create an article here in the community to share your solution. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven
01-11-2021
05:54 AM
You must have the reader incorrectly configured for your CSV schema.
01-11-2021
05:52 AM
2 Kudos
@Lallagreta You should be able to define the filename, or change the filename to whatever you want. That said, the filename doesn't dictate the type, so you can have Parquet saved as .txt. One recommendation I have is to use the parquet command-line tools while testing your use case. This is the best way to validate that files look right, have the right schema, and contain the right results. https://pypi.org/project/parquet-tools/ I apologize I do not have any exact samples, but from my recollection of a year ago, you should be able to get a simple command to check the schema of a file and another command to show the data. You may have to copy your HDFS files to the local file system to inspect them from the command line. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven
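From memory, the commands look something like the following (paths are placeholders, and the exact subcommands can differ between parquet-tools versions, so double-check with parquet-tools --help):
# copy the file out of HDFS so it can be inspected locally
hdfs dfs -copyToLocal /path/to/output/part-00000.parquet /tmp/part-00000.parquet
# print the metadata/schema, then print the rows (pip-installed parquet-tools)
parquet-tools inspect /tmp/part-00000.parquet
parquet-tools show /tmp/part-00000.parquet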
01-08-2021
09:54 AM
1 Kudo
@Lallagreta The solution you are looking for is to leverage the NiFi Parquet processors with the Parquet Record Reader/Writer. Some useful links: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-parquet-nar/1.11.4/org.apache.nifi.parquet.ParquetRecordSetWriter/index.html https://community.cloudera.com/t5/Community-Articles/Apache-NiFi-1-10-Support-for-Parquet-RecordReader/ta-p/282390 The Parquet processors are part of NiFi 1.10 and up, but you can also install the NARs into older NiFi versions: https://community.cloudera.com/t5/Support-Questions/Can-I-put-the-NiFi-1-10-Parquet-Record-Reader-in-NiFi-1-9/m-p/286465 If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven
01-08-2021
05:23 AM
2 Kudos
@murali2425 The solution you are looking for is QueryRecord configured with a CSV Record Reader and Record Writer. You also have UpdateRecord and ConvertRecord, which can use the same Readers/Writers. This method is preferred over splitting the file and adds some nice functionality: it allows you to provide a schema for both the inbound CSV (reader) and the downstream CSV (writer). Using QueryRecord you should be able to split the file and set the filename attribute to the value of column1. At the end of the flow you should be able to leverage that filename attribute to re-save the new file. You can find some specific examples and configuration screenshots here: https://community.cloudera.com/t5/Community-Articles/Running-SQL-on-FlowFiles-using-QueryRecord-Processor-Apache/ta-p/246671 If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven
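As a rough sketch of the QueryRecord side (the property name and column name here are made up): you add a dynamic property, for example one named dept_a, whose value is SQL run against the records in the flowfile:
SELECT * FROM FLOWFILE WHERE column1 = 'DEPT_A'
Each dynamic property becomes its own outbound relationship carrying only the matching records. PartitionRecord is another option if you want the column value written to a flowfile attribute automatically.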
01-04-2021
05:07 AM
@schnell Glad you were able to find the remnant that blocked the re-install. Here is my Stack Overflow reply, which gives some details about how to completely remove HDP and its components from a node's filesystem... With Ambari, any service deleted through the UI will still exist on the original node(s) the service was installed on. You need to manually remove it from those node(s). This process is hard to find documentation on, but it basically goes as follows: delete the application from filesystem locations such as /etc/, /var/, /opt/, etc., and remove the service's user accounts/groups. You can find more details in the blog post below, which goes into some of the detail for completely removing HDP. Just follow the steps for a single service. https://henning.kropponline.de/2016/04/24/completely-uninstall-remove-hdp-nodes/ https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.3.2/bk_installing_manually_book/content/ch_uninstalling_hdp_chapter.html https://gist.github.com/hourback/085500397bb2588964c5 If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven
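As an illustration for a single service (the service name and paths below are examples only; adjust to whatever you deleted):
# remove leftover files for the deleted service
rm -rf /etc/zookeeper /var/log/zookeeper /var/lib/zookeeper /usr/hdp/current/zookeeper*
# remove the service account (and its group, only if nothing else still uses it)
userdel zookeeper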
01-04-2021
04:08 AM
@pacman You would do better to create your own new question with the same errors and any additional information you can provide. It's unlikely the original poster will respond. Just in case he does, and just in case this helps: I think the solution you are looking for is to add a DistributedMapCacheServer. The screenshot above is just the client. The client needs a server running/enabled in order to be operational. Another suggestion is to make sure there is connectivity between NiFi, MySQL, and the map cache port. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven
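A quick way to check that connectivity (the host is a placeholder; 4557 is the DistributedMapCacheServer default port, so adjust if you changed it):
telnet nifi-node.example.com 4557
# or, if telnet is not installed
nc -vz nifi-node.example.com 4557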
12-22-2020
11:42 AM
Amazing work here sir!
12-21-2020
05:03 AM
@nhakhoaparis I often see questions like this, and although there are many solutions, the one I prefer is to leave the source table alone and create another table with the modified schema. In one form of this concept, you leave the old table behind and INSERT INTO new_table SELECT * FROM old_table. This immutable-table approach is also a common concept for Parquet: you do not modify columns in place; you read, change, then re-write. One way to complete the above in Hive query language: select the Parquet data into a non-Parquet table, do your work to modify the new table (update the new column, etc.), then select back into a new Parquet table with the new schema. You can also do some of the above with Spark or other programming languages. Many options, but in summary: leave the source table alone and create new tables. Along the way I like to call these staging tables, and I sometimes keep the new names, or drop the original table and rename the new table to the old name. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven
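A minimal HiveQL sketch of that staging-table pattern (table and column names are hypothetical):
-- 1. stage the Parquet data into a plain text table
CREATE TABLE stg_events STORED AS TEXTFILE AS SELECT * FROM events;
-- 2. make the schema changes on the staging table
ALTER TABLE stg_events ADD COLUMNS (load_date STRING);
-- 3. select back into a new Parquet table with the new schema
CREATE TABLE events_v2 STORED AS PARQUET AS SELECT * FROM stg_events;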
12-18-2020
05:42 AM
1 Kudo
Excellent news. Can you accept the first 2 responses to close this solution?
12-18-2020
05:40 AM
1 Kudo
@hakansan The error is stating that your disk is full: could not write to file "pg_logical/replorigin_checkpoint.tmp": No space left on device. The solution is to clean out some files to free up space, expand the disk, etc. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven
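A quick way to see which filesystem is full and what is eating the space (the directory below is just an example):
df -h
du -sh /var/lib/pgsql/* | sort -h | tail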
12-17-2020
01:33 PM
That's weird... did you scroll the attributes list down (just in case it's that simple)? I can't see the full modal so I'm not sure. The config of each attribute looks okay, and so does the JSON.
12-17-2020
05:58 AM
1 Kudo
@justenji Great response! @GMAN I have some templates that may help you get a head start: https://github.com/steven-matison/NiFi-Templates There are two InvokeHttp examples.
12-15-2020
05:06 AM
@jainN Great-looking flow. The modification you need is simply to remove the json route that is combined with csv, and connect the json route from Notify to FetchFile. You may need to adjust the Wait/Notify so that the csv is released when you want. Wait/Notify is often tricky, so I would recommend experimenting with it until you understand its behavior. Here is a good article: https://community.cloudera.com/t5/Community-Articles/Trigger-based-Serial-Data-processing-in-NiFi-using-Wait-and/ta-p/248308 You may find other articles/posts here if you do some deeper research on Wait/Notify. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven
12-14-2020
01:12 PM
@jainN If you are looking to route flowfiles that end in json versus those that do not, check out RouteOnAttribute with something similar to json => ${filename:endsWith('.json')}. You would use this after your method of choice to list/fetch the files, which provides a filename attribute for every flowfile. With this json property added to RouteOnAttribute you can drag the json route to a triggering flow, and send everything else (not json: unmatched) to a holding flow. NiFi Wait/Notify should be able to provide the trigger logic, but there are many other ways to do it without Wait/Notify by using another datastore, map cache, etc. For example, your non-json flow could simply write to a new location and finish; then your json flow can process that new location some known amount of time later. The logic there is your use case, of course; the point is to use RouteOnAttribute to split your flow. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven
12-14-2020
07:05 AM
@toutou From your HDFS cluster you need hdfs-site.xml and the correct configuration for PutHDFS. You may also need to create a user with permissions on the HDFS location. Please share your PutHDFS processor configuration and error information so community members can respond with the specific feedback required to solve your issue. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven
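As a rough sketch of the PutHDFS properties involved (paths are placeholders, and you will typically want core-site.xml alongside hdfs-site.xml):
Hadoop Configuration Resources: /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
Directory: /user/nifi/landing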
12-04-2020
05:16 AM
@SandeepG01 Ah, no fun with bad filenames. Spaces in filenames are strongly discouraged these days. That said, a solution you might try is to escape the space with a \ (backslash), especially in the context of passing the filename in flowfile attributes. If you still need to allow spaces and cannot resolve it upstream (by not using spaces), I might suggest submitting your experience over on the NiFi Jira as a bug: https://issues.apache.org/jira/projects/NIFI/issues If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven
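One rough sketch of that idea, using UpdateAttribute with Expression Language to escape the space before the filename is passed downstream (the double backslash is the EL escape for a single literal backslash; whether the receiving processor honors the escape is something you will need to test):
filename = ${filename:replace(' ', '\\ ')}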
12-01-2020
08:28 AM
1 Kudo
The problem is that you need something to store the dynamic schemas in. That is where the Schema Registry comes in, as it provides a UI and API to add/update/delete schemas, which can then be referenced from NiFi. The AvroSchemaRegistry controller service lets you do something similar, minus the UI/API. So you would need to create your schema in your flow, as an attribute, and send that to an AvroReader configured against the AvroSchemaRegistry. You could use some other data store to hold these schemas, but you would need to pull them out into an attribute with the same name configured in the Reader and Registry. https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-registry-nar/1.12.1/org.apache.nifi.schemaregistry.services.AvroSchemaRegistry/index.html The latter method does not give you a way to manage all the schemas, which is why I reference the Hortonworks Schema Registry, which does include the ability to manage and version the actual schemas.
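A rough sketch of the AvroSchemaRegistry route (the schema name and fields are made up): add a dynamic property on the AvroSchemaRegistry controller service whose name is the schema name and whose value is the Avro schema text, then point the reader's schema access strategy at that name (for example via the schema.name attribute):
Property name: customer_event
Property value:
{ "type" : "record", "name" : "CustomerEvent",
  "fields" : [
    { "name" : "id", "type" : ["null", "string"] },
    { "name" : "city", "type" : ["null", "string"] }
  ] }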
12-01-2020
07:49 AM
You can leverage "Attributes to Send", or if you stop the processor and click +, you can add custom properties right at the bottom of the processor config. If you are not getting anything out of any response relationship (failure, retry, no retry, etc.) then you definitely have a connectivity issue from NiFi outbound...
12-01-2020
05:09 AM
@Ksad Excellent work showing us completely what you have, and excellent work testing and confirming your requests work in Postman first. This is always one of the first things I do to make sure I have a valid test connection and all the settings to connect to the API before attempting it with InvokeHttp. When you take this route and cannot get a response, it indicates a networking issue from NiFi to [salesforce domain]. You should test from the command line on the NiFi node to [salesforce domain] using curl, wget, telnet, etc. Next, if you can confirm connectivity, try adjusting the processor timeouts; some systems need longer than the defaults. For example, I sometimes set them to 50 and 150 by just adding a 0 to the two values (connection and read timeout). If it did time out, it should throw an error. You can also set the processor log level to DEBUG to expose more verbose output in the NiFi UI. Last but not least, tail the nifi-app.log file while doing all NiFi flow debugging; sometimes more useful information is found there. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven
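Rough examples of the command-line checks from the NiFi node (substitute the actual Salesforce domain from your InvokeHttp URL):
curl -v https://<salesforce domain>
telnet <salesforce domain> 443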
12-01-2020
04:59 AM
@Vamshi245 Yes, HandleHttpRequest and HandleHttpResponse are used in tandem. Behind the processors is an HTTP context map which holds the connection/session between the request and response processors. If your flowfile (JSON) coming out of your custom HandleHttpRequest is delivered to a stock HandleHttpResponse, it will send the JSON back to the original connecting client. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven
11-30-2020
11:57 AM
As suggested above, update your post with your processor and its reader and writer settings. It sounds like you have something misconfigured. If possible, show us a screenshot of your flow too.
11-30-2020
11:51 AM
@Chigoz Your issue with that sandbox cluster is likely too many services trying to run on too small an instance/node. You will need to strategically turn on the components you need individually, starting with HDFS. If you have issues specific to the sandbox, or with certain components starting, you should open a post with those specific errors. To install Hue, check out my management pack: https://github.com/steven-matison/dfhz_hue_mpack A local search for "hue install" topics includes articles referencing the above Hue mpack: https://community.cloudera.com/t5/forums/searchpage/tab/message?advanced=false&allow_punctuation=false&q=install%20hue If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven
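For reference, Ambari management packs install from the Ambari CLI along these lines (the archive name is a placeholder; see the repo README for the exact file and steps):
ambari-server install-mpack --mpack=/path/to/hue_mpack.tar.gz
ambari-server restart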
11-30-2020
05:02 AM
@naga_satish Yes, what you are looking for is the Schema Registry: https://docs.cloudera.com/HDPDocuments/HDF3/HDF-3.0.0/bk_schema-registry-user-guide/content/ch_integrating-schema-registry.html The Schema Registry can be configured in NiFi; the schemas you create there are then available in the NiFi Record Readers and Writers. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven
11-30-2020
04:59 AM
You will need to define your schema in Avro format and drop that into the readers/writers. Here is an example:
{
"type" : "record",
"name" : "DailyCSV",
"fields" : [
{ "name" : "DepartmentName" , "type" : ["string", "null"] },
{ "name" : "AccountName" , "type" : ["string", "null"] },
{ "name" : "AccountOwnerId", "type" : ["string", "null"] },
{ "name" : "AdditionalInfo", "type" : [ "null", "string" ] }
]
}