Posts: 1935
Kudos Received: 1198
Solutions: 119
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1615 | 06-03-2021 07:11 AM |
 | 917 | 06-01-2021 10:05 AM |
 | 867 | 05-24-2021 11:58 AM |
 | 925 | 02-23-2021 07:12 AM |
 | 1181 | 02-22-2021 06:53 AM |
07-23-2021
05:42 AM
If you read a binary file, it should pass into NiFi with no issue.
07-22-2021
10:00 AM
You can use a QueryDatabaseTableRecord processor to watch for changes and have that trigger your process. You may also want to try Debezium with Cloudera Kafka or with Cloudera Flink SQL (see the SQL Server CDC sketch after the links below).
https://dev.to/tspannhw/simple-change-data-capture-cdc-with-sql-selects-via-apache-nifi-flank-19m4
See: https://github.com/tspannhw/EverythingApacheNiFi
https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/replicate-track-change-data-capture-always-on-availability?view=sql-server-ver15
https://debezium.io/documentation/reference/connectors/sqlserver.html
https://sandeepkattepogu.medium.com/streaming-data-from-microsoft-sql-server-into-apache-kafka-2fb53282115f
https://www.linkedin.com/pulse/achieving-incremental-fetch-change-data-capture-via-apache-rajpal/
https://www.datainmotion.dev/2021/02/using-apache-nifi-in-openshift-and.html
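A hedged sketch of the prerequisite for the Debezium SQL Server connector: change data capture has to be enabled on the source database and on each captured table. The database TestDB and table dbo.orders below are hypothetical placeholders.

-- Hypothetical: enable SQL Server CDC so Debezium (or incremental fetch) can read changes
USE TestDB;
EXEC sys.sp_cdc_enable_db;

-- Enable CDC for one table; @role_name = NULL means access is not gated by a database role
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'orders',
    @role_name     = NULL;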
07-21-2021
01:59 PM
Livy and the Spark interactive connector aren't stable at this point; they only work with Scala code and a JAR, and it's hacky. I recommend you call Cloudera's CDE environment instead.
06-16-2021
09:50 AM
How are you ingesting the RPM? You need to get it into a flowfile as binary and then send it as the body.
06-03-2021
07:11 AM
Regex can remove those bad characters:
https://community.cloudera.com/t5/Support-Questions/nifi-regex-replace-special-characters/td-p/103404
https://community.cloudera.com/t5/Support-Questions/How-do-I-enter-an-unprintable-byte-into-a-nifi-property-that/td-p/203936
https://community.cloudera.com/t5/Support-Questions/Remove-from-a-flow-file-in-Nifi/td-p/109503
https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html
http://apache-nifi-users-list.2361937.n4.nabble.com/ReplaceText-and-special-characters-td480.html
https://community.cloudera.com/t5/Support-Questions/ReplaceText-quot-processor-does-not-replace-special/td-p/171544
https://community.cloudera.com/t5/Support-Questions/remove-special-characters-from-xml-text-node-using-nifi/td-p/241008
https://community.cloudera.com/t5/Support-Questions/Regex-Special-Character-Escape/m-p/239556#M201365
You could also use UpdateRecord on JSON with schema inference, using replace or replaceRegex:
https://nifi.apache.org/docs/nifi-docs/html/record-path-guide.html
06-01-2021
02:29 PM
1 Kudo
CountText will count lines (\r\n). QueryRecord will count the number of records, even if there are two records on one line.
06-01-2021
10:12 AM
1 Kudo
It's only for new saves. You probably need to export those with the NiFi CLI and re-import them after adding Git, then restart.
06-01-2021
10:07 AM
1 Kudo
If it's CSV, use the QueryRecord processor with a CSV reader, then just do SELECT COUNT(*) FROM FLOWFILE.
06-01-2021
10:05 AM
This is an Azure storage error: https://social.msdn.microsoft.com/Forums/en-US/a9225a04-7a7c-46f6-ae7b-45168119c0a4/not-able-to-listen-on-azure-eventhostprocessor?forum=servbus
05-27-2021
07:55 AM
What version of NiFi are you using? There are some bugs with InvokeHTTP in older versions. Can you access that URL from that machine via curl? There may be a networking or firewall issue.
https://www.datainmotion.dev/2021/01/flank-real-time-transit-information-for.html
https://www.datainmotion.dev/2021/03/using-cloudera-flow-management-powered.html
https://community.cloudera.com/t5/Community-Articles/Real-Time-Stock-Processing-With-Apache-NiFi-and-Apache-Kafka/ta-p/249221
https://community.cloudera.com/t5/Community-Articles/Smart-Stocks-with-FLaNK-NiFi-Kafka-Flink-SQL/ta-p/308223
05-24-2021
12:02 PM
Wrapping your SQL in a view, procedure, function, or other database-native grouping of statements is smartest (a sketch follows the links below). For running multiple SQL statements you may want to use Cloudera CDE, Cloudera Machine Learning jobs, YARN Spark jobs, or Airflow.
https://www.datainmotion.dev/2019/10/migrating-apache-flume-flows-to-apache_15.html
https://www.datainmotion.dev/2020/12/simple-change-data-capture-cdc-with-sql.html
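As a minimal sketch of the wrapping idea, assuming hypothetical tables tbl_sales and tbl_stores: push the multi-statement logic into a view on the database side so NiFi only has to issue one simple SELECT.

-- Hypothetical view that hides joins and aggregation behind one name
CREATE VIEW daily_sales_summary AS
SELECT s.store_id,
       s.sale_date,
       SUM(s.amount) AS total_amount
FROM tbl_sales s
JOIN tbl_stores st ON st.store_id = s.store_id
GROUP BY s.store_id, s.sale_date;

-- The NiFi side (e.g. ExecuteSQL or ExecuteSQLRecord) then only needs:
SELECT * FROM daily_sales_summary;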
05-24-2021
11:58 AM
1 Kudo
Use JDK 8 or JDK 11; JDK 9 is not supported. JDK 9 issues: https://github.com/graphhopper/graphhopper/issues/1391
05-24-2021
11:56 AM
Some examples:
https://community.cloudera.com/t5/Support-Questions/How-Extract-text-from-a-multiline-flow-and-create-only-one/td-p/104706
https://nathanlabadie.com/recombining-multiline-logs-with/
https://github.com/tspannhw/EverythingApacheNiFi/blob/main/README.md
05-21-2021
07:15 AM
Use UpdateRecord (CSV reader, JSON writer) to change the date, then PutDatabaseRecord to save to Oracle.
https://www.datainmotion.dev/2020/12/simple-change-data-capture-cdc-with-sql.html
https://www.datainmotion.dev/2019/10/migrating-apache-flume-flows-to-apache_15.html
https://www.datainmotion.dev/2021/01/flank-real-time-transit-information-for.html
https://www.datainmotion.dev/2020/12/smart-stocks-with-flank-nifi-kafka.html
https://github.com/tspannhw/EverythingApacheNiFi
https://www.datainmotion.dev/2021/03/processing-fixed-width-and-complex-files.html
https://www.datainmotion.dev/2020/06/no-more-spaghetti-flows.html
05-11-2021
05:58 AM
VMs are not optimal. Run microservices in Kafka Connect, NiFi Stateless, Flink, Spark, or Python in CML or Jupyter Notebooks; SQL Stream Builder is also a good option.
05-10-2021
09:42 AM
2 Kudos
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-prometheus-nar/1.11.1/org.apache.nifi.reporting.prometheus.PrometheusReportingTask/
You will need to open a JIRA. You should not run three NiFi instances on the same machine unless it's in Kubernetes; NiFi needs a lot of RAM and cores.
04-30-2021
12:05 PM
What version of NiFi? You can tweak the log level of each processor in the properties. You can also query the stats with reporting tasks, or access stats via the REST API.
https://github.com/tspannhw/EverythingApacheNiFi
04-19-2021
04:40 AM
QueryRecord grabs fields from the content (see the sketch below); you can also add attributes there, or via UpdateRecord.
https://www.datainmotion.dev/2020/12/smart-stocks-with-flank-nifi-kafka.html
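A small sketch of projecting specific fields with QueryRecord; the SQL goes in a user-added property whose name becomes an outbound relationship, and the field names symbol, price, and volume are hypothetical.

-- Hypothetical QueryRecord query: keep only the fields you care about
SELECT symbol,
       price,
       volume
FROM FLOWFILE
WHERE price > 0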
04-15-2021
06:28 AM
Check disk space. http://apache-nifi-users-list.2361937.n4.nabble.com/Clarifications-on-getting-flowfiles-using-FlowFileFilters-td7333.html
04-15-2021
05:54 AM
1 Kudo
Unfortunately, variables cannot be used inside other dynamic parameters; the value has already been rendered by then. What you can do is store this part, select * from tbl_sales where load_date= , in an attribute (here called result), then in UpdateAttribute build the query as ${result:append("'"):append(${load_date}):append("'")}
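For illustration, assuming a hypothetical load_date of 2021-04-15, the query attribute built that way would come out as:

select * from tbl_sales where load_date='2021-04-15'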
04-12-2021
04:39 PM
Add more RAM and CPU to your NiFi server, or add more NiFi servers to your cluster. How much is HUGE DATA? You will need some amount of RAM to process it. Make sure the minimum JVM RAM on each node is a good size for big workloads: at least 32 GB.
04-12-2021
04:37 PM
Put both tables in Kafka topics and have SQL Stream Builder join them with a simple SQL join (a sketch follows the links below), or see:
https://community.cloudera.com/t5/Support-Questions/Nifi-how-to-sql-join-two-flowfiles/td-p/298227
http://apache-nifi-users-list.2361937.n4.nabble.com/Joining-two-or-more-flow-files-and-merging-the-content-td10543.html
https://medium.com/@surajnagendra/merge-csv-files-apache-nifi-21ba44e1b719
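A rough sketch of the join as it would look in SQL Stream Builder (Flink SQL); the topic-backed tables orders and customers and their columns are hypothetical placeholders.

-- Hypothetical continuous join over two Kafka-backed tables
SELECT o.order_id,
       o.amount,
       c.customer_name
FROM orders o
JOIN customers c
  ON o.customer_id = c.customer_id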
04-12-2021
04:34 PM
Use Stateless NiFi: https://medium.com/@tspann_38871/exploring-apache-nifi-1-10-parameters-and-stateless-engine-b0815e924938
04-01-2021
03:51 PM
Move all events to Kafka
03-18-2021
02:06 PM
NiFi versions are tied to Hive versions, so you need a compatible one. Check with your Cloudera team to get the correct version. Using PutHive3Streaming will be faster; so is just PutORC, PutParquet, or PutHDFS.
03-18-2021
12:23 PM
OK, that's pretty old. PutHiveQL is not the best option, but it will do for now; see https://issues.apache.org/jira/browse/NIFI-4684. PutHiveStreaming, or PutHDFS/PutORC with an external table on top (a sketch follows), or PutDatabaseRecord over JDBC would be better. I would highly recommend updating to HDF 3.5.2 and HDP 3.1, or to CDP, as these versions are going to be out of support soon.
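A minimal sketch of the external-table approach: land ORC files in HDFS with PutORC or PutHDFS, then define a Hive external table over that directory. The table name, columns, and HDFS path below are hypothetical.

-- Hypothetical external table over ORC files written by NiFi
CREATE EXTERNAL TABLE IF NOT EXISTS staging_events (
  event_id   STRING,
  event_time TIMESTAMP,
  payload    STRING
)
STORED AS ORC
LOCATION '/data/staging/events';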
03-18-2021
08:18 AM
https://docs.cloudera.com/cdf-datahub/7.2.7/nifi-hive-ingest/topics/cdf-datahub-nifi-hive-ingest.html
PutHive3Streaming is faster and better.
https://docs.cloudera.com/cdf-datahub/7.2.7/nifi-hive-ingest/topics/cdf-datahub-hive-ingest-data-target.html
What version of Hive? Is this CDH? HDP?
You can also do PutORC, or convert to ORC and push to HDFS, or push to HDFS as Parquet.
https://www.datainmotion.dev/2019/10/migrating-apache-flume-flows-to-apache.html
Use record processors; they are easier and MUCH faster, and you won't need a split then.
https://www.datainmotion.dev/2020/12/simple-change-data-capture-cdc-with-sql.html
I recommend using CFM NiFi version 1.11.4 or newer.
03-15-2021
04:48 PM
NiFi for XML / RSS / REST Feed Ingest

I want to retrieve the status from various cloud providers and services, including Cloudera, AWS, Azure, and Google. I have found many of the available status APIs will return XML/RSS. We love that format for Apache NiFi, so let's do that.

Note: If you are doing development in a non-production environment, try the new NiFi 1.13.1. If you need to run your flows in production on-premise, in a private cloud, or in the public cloud, then use Cloudera Flow Management.

I have separated the processing module "Status" from the input, so I can pass in the input any way I want. When I move this to a Kubernetes environment, this will become a parameter that I pass in. Stay tuned to Cloudera releases.

The flow is pretty simple to process RSS status data. We call the status URL and in the next step easily convert RSS into JSON for easier processing. I split these records and grab just the fields I like. I can easily add additional fields from my metadata for unique id, timestamp, company name, and service name. PutKudu will store my JSON records as Kudu fields at high speed. If something goes wrong, we will try again. Sometimes, the internet is down! But, without this app, how will we know???

We can run a QueryRecord processor to query live fields from the status messages, and I will send Spark-related ones to my Slack channel (a sketch of such a query follows this post). I can add as many ANSI SQL92 Calcite queries as I wish. It's easy.

We were easily able to insert all the status messages into our 'cloudstatus' table. Now, we can query it and use it in reports, dashboards, and visual applications. I don't want to have to go to external sites to get the status alerts, so I will post key ones to a Slack channel. I want to store my status reads in a table for fast analytics and permanent storage, so I will store it in a Kudu table with Impala on top for fast queries.

CREATE TABLE cloudstatus (
  `uuid` STRING,
  `ts` TIMESTAMP,
  `companyname` STRING,
  `servicename` STRING,
  `title` STRING,
  `description` STRING,
  `pubdate` STRING,
  `link` STRING,
  `guid` STRING,
  PRIMARY KEY (`uuid`, `ts`)
)
PARTITION BY HASH PARTITIONS 4
STORED AS KUDU
TBLPROPERTIES ('kudu.num_tablet_replicas' = '1');

My source code is available here. In the next step, I can write a real-time dashboard with Cloudera Visual Apps, add fast queries on Kafka with Flink SQL, or write some machine learning in Cloudera Machine Learning to finish the application. Join my next live video broadcast to suggest what we do with this data next. Thanks for reading!
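As an illustrative footnote, a hedged sketch of the kind of QueryRecord query that could route Spark-related status messages to Slack, plus an Impala query over the cloudstatus table defined above; the LIKE patterns, property name, and LIMIT are assumptions.

-- Hypothetical QueryRecord query; add it as a dynamic property (e.g. named 'spark')
-- and route that relationship on to Slack
SELECT *
FROM FLOWFILE
WHERE title LIKE '%Spark%'
   OR description LIKE '%Spark%'

-- Hypothetical Impala query over the Kudu-backed table for a dashboard or report
SELECT companyname, servicename, title, pubdate
FROM cloudstatus
ORDER BY ts DESC
LIMIT 100;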