Member since: 07-19-2018
Posts: 613
Kudos Received: 99
Solutions: 116
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1566 | 01-11-2021 05:54 AM
 | 1004 | 01-11-2021 05:52 AM
 | 1956 | 01-08-2021 05:23 AM
 | 2257 | 01-04-2021 04:08 AM
 | 9657 | 12-18-2020 05:42 AM
09-28-2020
03:41 PM
@surajnag This is a great question. Here are some of the ideas I have used in development and production nifi flows.

1. During development, route all flow exceptions to an output port. I call these EOL, or End of Line, and number them EOL1, EOL2, EOL3, and so on. I use these to hold failure, retry, original, no-retry, etc. type outbound connections during testing. As I move to production, some of these may be auto terminated, and some may remain.

2. In production, I route ALL major points of failure which are not auto terminated to an output port called ERROR. In my flows I sometimes have multiple ERROR output ports for different purposes. Depending on the use case, the error is sent to some type of event notification, for example an email or a post to a Slack channel. In other cases the errors are routed to a process group that is in a stopped/disabled state. Based on the error, I may make some change in the flow, and then enable/start that process group as a method to REPLAY the flowfile.

3. For the successful end of the flow, I may just auto terminate, as I want to assume the flow is always on and everything has succeeded if it is not routed as above in #1 or #2. In other cases the bottom or end of the flow is also routed to some type of event notification. However, be careful here, as you could create tons of notifications depending on your flow.

I hope some of these ideas are helpful. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create separate topic and feel free to tag me in your post. Thanks, Steven
09-25-2020
05:27 AM
@Tokolosk JAVA_HOME is not required for nifi to operate:

[root@c7401 nifi-1.12.0]# ./bin/nifi.sh start
nifi.sh: JAVA_HOME not set; results may vary
Java home:
NiFi home: /root/nifi-1.12.0
Bootstrap Config File: /root/nifi-1.12.0/conf/bootstrap.conf

The only commands run on my node were (not ubuntu, but it should be similar):

yum install java java-devel
wget http://apache.mirrors.hoobly.com/nifi/1.12.0/nifi-1.12.0-bin.zip
unzip nifi-1.12.0-bin.zip
cd nifi-1.12.0
./bin/nifi.sh start
tail -f logs/nifi-app.log
....
2020-09-25 12:15:14,828 INFO [main] org.apache.nifi.web.server.JettyServer NiFi has started. The UI is available at the following URLs:
2020-09-25 12:15:14,828 INFO [main] org.apache.nifi.web.server.JettyServer http://192.168.74.101:8080/nifi
2020-09-25 12:15:14,828 INFO [main] org.apache.nifi.web.server.JettyServer http://10.0.2.15:8080/nifi
2020-09-25 12:15:14,828 INFO [main] org.apache.nifi.web.server.JettyServer http://127.0.0.1:8080/nifi

If you are having problems, be sure to tail logs/nifi-app.log and look for errors. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create separate topic and feel free to tag me in your post. Thanks, Steven
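If you do want to silence the "JAVA_HOME not set; results may vary" warning, you can export JAVA_HOME before starting NiFi. A minimal sketch, assuming a Linux box where java is already on the PATH and readlink -f is available (the resolved path is simply wherever your package manager installed the JDK):

# resolve JAVA_HOME from the java binary on the PATH (directory layout may differ per distro)
export JAVA_HOME=$(dirname $(dirname $(readlink -f $(which java))))
./bin/nifi.sh start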
09-24-2020
06:34 AM
1 Kudo
@MKS_AWS There are a few ways to break up JSON within a flowfile (SplitJson, QueryRecord). However, if it is just one giant blob of JSON you may not find that very useful. Perhaps you can share some sample JSON to that effect. Check out this library for sending SNS payloads up to 2 GB: https://github.com/awslabs/amazon-sns-java-extended-client-lib If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create separate topic and feel free to tag me in your post. Thanks, Steven
09-24-2020
06:16 AM
Check that the user executing sqoop is able to write to the target directory. You may need to create a service user and directory with proper permissions since the hdfs user cannot write to the intended directory. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create separate topic and feel free to tag me in your post. Thanks, Steven
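For example, a minimal sketch of setting up a dedicated target directory in HDFS; the service user svc_sqoop, the group hadoop, and the path are all placeholders for whatever fits your environment:

# create the target directory as the hdfs superuser and hand ownership to the service user
sudo -u hdfs hdfs dfs -mkdir -p /user/svc_sqoop/imports
sudo -u hdfs hdfs dfs -chown -R svc_sqoop:hadoop /user/svc_sqoop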
09-24-2020
06:01 AM
@wenfeng Check that the user hue is able to write to the target directory. You may need to create a hue user directory with proper permissions if the hue user cannot write to the hive user directory. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create separate topic and feel free to tag me in your post. Thanks, Steven
09-24-2020
05:56 AM
@ravi_sh_DS This gets a bit high level, so forgive me, as I am not sure how you know which ID to change and what to change it to. That said, your approach could be to use QueryRecord to find the match you want, then update that match with UpdateRecord. You can also split the json image array with SplitJson, then use UpdateRecord as suggested above. In either method, depending on your Use Case, when you split the records and process the splits separately you may need to rejoin them downstream. Some older methods useful here are SplitJson, EvaluateJsonPath, UpdateAttribute, and AttributesToJSON, but the QueryRecord/UpdateRecord processors are now preferred as they make it possible to do things more dynamically.
09-23-2020
04:50 PM
@ravi_sh_DS The solution you are looking for is UpdateRecord. Here is a great article with full info: https://community.cloudera.com/t5/Community-Articles/Update-the-Contents-of-FlowFile-by-using-UpdateRecord/ta-p/248267 If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create separate topic and feel free to tag me in your post. Thanks, Steven
09-23-2020
04:46 PM
@alex15 I suspect the issue you have is that the nifi user is not able to execute the script. Make sure the user ownership of the file is correct, and also confirm the read/write permissions. In unix/linux these are set with chown and chmod. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create separate topic and feel free to tag me in your post. Thanks, Steven
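For example, a quick sketch of fixing ownership and permissions; the script path and the nifi service account are placeholders for your own setup:

# give the nifi service account ownership plus read/execute rights on the script
chown nifi:nifi /opt/nifi/scripts/myscript.sh
chmod 750 /opt/nifi/scripts/myscript.sh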
09-19-2020
05:10 AM
1 Kudo
@Kilynn The solution you are looking for here is QueryRecord with a JSON record reader (for example JsonTreeReader). With the QueryRecord processor and record reader/writer configured, you can click + in QueryRecord to create a new key => value property. In the value you can write a SQL-like query against the contents of the flowfile. For example:

DATA_MESSAGE => SELECT logEvents FROM FLOWFILE

This will create all of your message objects as individual flowfiles directed away from QueryRecord for the relationship "DATA_MESSAGE". From here you can process each message according to your use case. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
09-16-2020
04:58 AM
1. Check the user permissions of the jar file you added to your class path, and make sure the jar file is in the right place.
2. When you add the driver manually, also add this to the sqoop command: --driver org.postgresql.Driver
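For reference, a hedged example of an import that passes the driver explicitly; the connection string, credentials, table, and target directory are placeholders:

# without --driver, sqoop tries to infer the driver from the connect string alone
sqoop import \
  --connect jdbc:postgresql://dbhost:5432/mydb \
  --driver org.postgresql.Driver \
  --username myuser -P \
  --table mytable \
  --target-dir /user/myuser/mytable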
09-15-2020
05:53 AM
@lukolas The example you provide seems to be REGEX, not expression language. You would need to test some kind of expression language in that Topic Name(s) property. Another suggestion would be to have a file, or for example GenerateFlowFile, which contains a list of topics. Then split/extract that list into attributes, and send that attribute to the topic name. Having a ton of topics going to a single processor can become a bottleneck which creates tons of downstream flowfiles from each topic, so be careful with tuning and concurrency in reference to the number of topics and messages per topic. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
09-15-2020
05:22 AM
@CSRabbit Installing the PostgreSQL JDBC Driver: download the PostgreSQL JDBC driver from http://jdbc.postgresql.org/download.html and copy it to the /var/lib/sqoop/ directory. For example:

$ curl -L 'http://jdbc.postgresql.org/download/postgresql-9.2-1002.jdbc4.jar' -o postgresql-9.2-1002.jdbc4.jar
$ sudo cp postgresql-9.2-1002.jdbc4.jar /var/lib/sqoop/

Snippet from (reference) for Cloudera 5.9.x, but it should be similar for other versions, ambari/hdp, or native sqoop. Just make sure you have the right download url and path for sqoop. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
09-14-2020
01:18 PM
I suspect you have not completed a step, or are missing something. The cacerts approach works for me in all cases where the cert is publicly trusted (a standard public cert from a public CA), which it should be. You should share info on the configurations you tried and what, if any, errors you got from that. The bare minimum settings you need are the keystore (file location), password, keystore type (JKS), and TLS version. Assuming you copied your java cacerts file to all nodes as /nifi/ssl/cacerts, the controller service properties should point at that file.

If cacerts doesn't work, then you must create keystores and/or truststores with the public cert. Use the openssl command to get the cert; that command looks like:

openssl s_client -connect secure.domain.com:443

You can also get it from the browser when you visit the ELK interface, for example cluster health or indexes. Click the certificate lock icon in the browser, then use the browser's interface to view/download the public certificate. You need the .cer or .crt file. Then you use the cert to create the keystore with keytool commands. An example is:

keytool -import -trustcacerts -alias ambari -file cert.cer -keystore keystore.jks

Once you have created a keystore/truststore file you need to copy it to all nifi nodes, ensure the correct ownership, and make sure all the details are correct in the SSL Context Service. Lastly, you may need to adjust the TLS version until testing works. Here is a working example of getting the cert and using it with keytool from a recent use case:

echo -n | openssl s_client -connect secure.domain.com:443 | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > publiccert.crt
keytool -import -file publiccert.crt -alias astra -keystore keyStore.jks -storepass password -noprompt
keytool -import -file publiccert.crt -alias astra -keystore trustStore.jks -storepass password -noprompt
mkdir -p /etc/nifi/ssl/
cp *.jks /etc/nifi/ssl
chown -R nifi:nifi /etc/nifi/ssl/
09-11-2020
12:24 PM
@Gubbi use this: ListFile -> FetchFile -> ConvertRecord
09-11-2020
09:25 AM
@HarshR You need to configure your SSLContextService with a keystore/truststore built with the cert you get from the Elasticsearch cluster. You can also try the cacerts file that is included with your Java, which is usually easier to do. More details here for cacerts: https://community.cloudera.com/t5/Support-Questions/Connecting-to-DataSift-HTTPS-API-using-NiFi-GetHTTP/td-p/102276 If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
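If you do end up importing the Elasticsearch cert into cacerts, a minimal sketch looks like the following; the alias, the exported es.crt file, and the Java path are placeholders for your environment, and the default cacerts password is "changeit" unless it has been changed:

# import the exported Elasticsearch certificate into the JVM-wide truststore
keytool -import -trustcacerts -alias elastic -file es.crt \
  -keystore /usr/lib/jvm/java/jre/lib/security/cacerts -storepass changeit -noprompt

Then point the SSLContextService truststore at that cacerts path (type JKS) on every NiFi node.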
09-11-2020
04:45 AM
@Francesco_Fa If you do not want to use the embedded sqlite database, the options available are MySQL, PostgreSQL, or Oracle: https://docs.gethue.com/administrator/administration/database/ If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
09-11-2020
04:42 AM
@Gubbi The solution you are looking for is ConvertRecord with a ParquetReader and a CSVRecordSetWriter. The ParquetReader is included in NiFi 1.10 and up. If you have an older nifi, here is a post where I talk about adding the required jar files to nifi 1.9 (the older version doesn't have parquet): https://community.cloudera.com/t5/Support-Questions/Can-I-put-the-NiFi-1-10-Parquet-Record-Reader-in-NiFi-1-9/td-p/286465 Another suggestion: if you are working with nifi and hadoop/hdfs/hive, you could store the raw parquet, create an external hive table over the parquet, then select the results and insert them into a similar table stored in csv format. Then you select the csv table results and create the csv file. Also, in order to validate/inspect your parquet, or to read the schema (if you need it for controller services), you can use parquet tools: https://community.cloudera.com/t5/Community-Articles/Build-and-use-Parquet-tools-to-read-parquet-files/ta-p/248629 If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
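To illustrate the hive route above, here is a hedged sketch; the table names, columns, HDFS locations, and warehouse path are all hypothetical placeholders for your own layout:

# create an external table over the raw parquet, then materialize a CSV (text) copy of it
hive -e "
CREATE EXTERNAL TABLE raw_parquet (id INT, name STRING)
STORED AS PARQUET
LOCATION '/data/landing/parquet';

CREATE TABLE export_csv
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
AS SELECT * FROM raw_parquet;
"
# the CSV files land under the export_csv table location, e.g. /user/hive/warehouse/export_csv
hdfs dfs -get /user/hive/warehouse/export_csv /tmp/export_csv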
09-10-2020
06:39 AM
I think the most straightforward approach would be to drop the infer schema components into your version of NiFi. The procedure is not that hard, you just have to be surgically careful. The process is explained a bit here, in reference to adding the parquet jars from a newer version into an older version. Be sure to read all the comments: https://community.cloudera.com/t5/Support-Questions/Can-I-put-the-NiFi-1-10-Parquet-Record-Reader-in-NiFi-1-9/td-p/286465
09-10-2020
05:55 AM
@SashankRamaraju In the most recent versions of NiFi some of the older methods (infer schema) have been left behind. You can certainly add them back in manually (PM me if you want specifics). However, the current tools to manage record conversion are definitely preferred and are bundled into NiFi out of the box on purpose. To solve your constantly changing csv, I would push back as to why the csv contents are changing. If there was nothing I could do about it upstream, I would create a flow that splits the different csvs up based on known schemas. I would process the ones I have a schema for and create a holder process group for those that fail. I would monitor failures and create flow branches for new schemas (teaching my flow to be smarter over time). After this kind of evaluation, I would have a clear idea of how much the csv is actually changing. I could then do some upstream actions on each csv to converge them into a single schema before I start processing them in Nifi. For example, if some fields are missing, I could do the work to add them (as empty values) before reading them with a single schema reader. This gets a bit kludgy, but I wanted to explain the thought process and evaluation of how to converge the schemas into a single reader. I would likely not do the latter, and would just split the flow for each csv difference. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
09-10-2020
05:43 AM
1 Kudo
@mike_bronson7 Oddly enough, the top post recommendation is this same question you asked 2 years ago: https://community.cloudera.com/t5/Support-Questions/how-to-change-hostnames-on-all-machines-in-ambari-cluster/m-p/217984#M179885 To confirm, the hostname and the domain name are one and the same. The "domain.com" is just part of the hostname; the first part is the subdomain. So the procedure to change "domain.com" is the same as for the subdomain. HOWEVER, I would highly recommend testing this in a dev cluster before production, especially if you have ssl/kerberos. This is a major change that affects ambari and all components. If the ambari command against the new json mapping fails, or there is any incomplete action within any agent/component, you are going to bomb the whole cluster. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
09-10-2020
05:37 AM
@Francesco_Fa You will need to configure hue with credentials and information for hbase:

[hbase]
# Comma-separated list of HBase Thrift servers for clusters in the format of '(name|host:port)'.
# Use full hostname. If hbase.thrift.ssl.enabled in hbase-site is set to true, https will be used otherwise it will use http
# If using Kerberos we assume GSSAPI SASL, not PLAIN.
## hbase_clusters=(Cluster|localhost:9090)
# HBase configuration directory, where hbase-site.xml is located.
## hbase_conf_dir=/etc/hbase/conf
# Hard limit of rows or columns per row fetched before truncating.
## truncate_limit = 500
# Should come from hbase-site.xml, do not set. 'framed' is used to chunk up responses, used with the nonblocking server in Thrift but is not supported in Hue.
# 'buffered' used to be the default of the HBase Thrift Server. Default is buffered when not set in hbase-site.xml.
## thrift_transport=buffered
# Choose whether Hue should validate certificates received from the server.
## ssl_cert_ca_verify=true

Once you have the configuration working, you should be able to view hbase within the UI. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
09-04-2020
06:25 AM
@P_Rat98 You need parquet tools to read parquet files from the command line. There is no built-in way to view parquet content in nifi. https://pypi.org/project/parquet-tools/
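For example, a hedged sketch using the pip package linked above; the file path is a placeholder, and it is worth confirming the available subcommands with parquet-tools --help for your installed version:

pip install parquet-tools
parquet-tools show /tmp/data.parquet      # print the rows as a table
parquet-tools inspect /tmp/data.parquet   # print the file's schema and metadata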
09-04-2020
06:20 AM
@DanMcCray1 Once you have the content from Kafka as a flowfile, your options are not just limited to ExecuteScript. Depending on the type of content you can use the following ideas:

EvaluateJsonPath - if the content is a single json object and you need one or more values inside it, this is an easy way to get those values into attributes.

ExtractText - if the content is text or some raw format, ExtractText allows you to regex match against the content to get values into attributes.

QueryRecord with Record Readers & Record Writers - this is the most recommended method. Assuming your data has structure (text, csv, json, etc.) and/or multiple rows/objects, you can define a reader with a schema and an output format (record writer), and query the results very effectively.

If you indeed want to work with ExecuteScript you should start here:

https://community.cloudera.com/t5/Community-Articles/ExecuteScript-Cookbook-part-1/ta-p/248922
https://community.cloudera.com/t5/Community-Articles/ExecuteScript-Cookbook-part-2/ta-p/249018
https://community.cloudera.com/t5/Community-Articles/ExecuteScript-Cookbook-part-3/ta-p/249148

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
09-02-2020
01:37 PM
1 Kudo
@Tokolosk The solution you need is:

${date_posted:format('MM-dd-yyyy hh:mm:ss')}
${date_posted:multiply(1000):format('MM-dd-yyyy hh:mm:ss')}

Of course you can experiment with different formats. I created a test in a template I use (Working With Timestamps) where I set date_posted to your string value, then ran the two conversion tests above. If you are getting empty values, then I suspect you have an issue with ${date_posted}, not with the expression language. Maybe also take a look at using a different attribute name for the formatted timestamp. Hope this helps... If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
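As a quick sanity check outside NiFi, you can confirm whether the stored value is epoch seconds or milliseconds with GNU date; the epoch value below is just a made-up example:

# prints a readable timestamp if the value is epoch seconds; if it is already in
# milliseconds, drop the multiply(1000) from the expression above
date -d @1599058620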
08-28-2020
08:52 AM
@P_Rat98 The error above is saying there is an issue with the Schema Name in your record reader or writer. When inside the properties for Convert Record, click the --> arrow through to the reader/writer and make sure they are configured correctly. You will need to provide the correct schema name (if it is already an existing attribute) or provide the schema text. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
08-28-2020
08:49 AM
@ujay The solution you are looking for is the DetectDuplicate processor: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.5.0/org.apache.nifi.processors.standard.DetectDuplicate/index.html This processor is used with Distributed Map Cache Client and Server (Controller Services) to deduplicate a flow based on your criteria. I have a template demo here: https://github.com/steven-matison/NiFi-Templates/blob/master/DetectDuplicate_DistributedMapCache_Demo.xml If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
08-28-2020
08:47 AM
@P_Rat98 You need to set the filename (Object Key) of each parquet file uniquely to save separate S3 objects. If that processor is configured to just ${filename} then it will overwrite on additional executions. For the second option, if you have a split in your data flow, the split parts should have key/value pairs for the split index and total splits. Inspect your queue and list the attributes on the split flowfiles to find them. You use these attributes with MergeContent to remerge everything back together into a single flowfile. You need to do this before converting to parquet, not after. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
08-27-2020
07:14 AM
Yes. Auto-termination is how you drop the whole flowfile and all of its attributes. If, for example, you have a giant attribute holding a SCHEMA or JSON, UpdateAttribute can be used to empty that value. However, if you do not even need the flowfile anymore, auto-terminate it rather than using UpdateAttribute and retaining the flowfile.
08-27-2020
06:56 AM
@Jarinek Yes, this is totally possible in NiFi. Traditionally, one method to do this could be something like EvaluateJsonPath to get some payload json values into attributes, and then AttributesToJSON to form a new json object from all the attributes you have created. In newer versions of NiFi you have some other options with QueryRecord, UpdateRecord and the JSON record readers/writers. Other options are a custom ExecuteScript or JoltTransformJSON. My suggestion would be to research NiFi + JSON and begin with some simple examples. Once you have some experience with basic examples, begin to transform them to suit your use case. I also suggest that you try it in different ways before deciding on one. For example, you may build a flow and get it working (EvaluateJsonPath), but improve it over time based on new nifi versions and their capabilities (JSON record readers/writers). If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
08-27-2020
06:45 AM
@Mike in Austin Thanks for sharing that link. I had lost the one I bookmarked last year. Since you are both Cloudera employees, can you comment on enterprise customers with HDP 2.6.5 which cannot, or will not be able to, upgrade or replace HDP for many years? That December 2020 date is right around the corner. Is Cloudera going to truly stop providing support, or are these customers just going to get support elsewhere? I know some customers on 2.6.5 that could still be on that platform as legacy 5+ years from now, due to just how long it takes in the government sector to replace/upgrade technology.