Member since: 07-19-2018
Posts: 613
Kudos Received: 101
Solutions: 117
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 4901 | 01-11-2021 05:54 AM
 | 3337 | 01-11-2021 05:52 AM
 | 8643 | 01-08-2021 05:23 AM
 | 8158 | 01-04-2021 04:08 AM
 | 36037 | 12-18-2020 05:42 AM
09-15-2020
05:53 AM
@lukolas The example you provided appears to be regex, not Expression Language. You would need to use actual Expression Language in the Topic Name(s) property. Another suggestion would be to keep a list of topics in a file, or generate one with GenerateFlowFile, then split/extract that list into attributes and reference the attribute in the topic name (a rough sketch is below). Be aware that sending a ton of topics through a single processor can become a bottleneck, since it creates tons of downstream flowfiles from each topic, so be careful with tuning and concurrency relative to the number of topics and the messages per topic. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
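A rough sketch of that attribute-driven approach (the processors are standard NiFi processors; the attribute name and regex are assumptions for illustration only):

```
# GenerateFlowFile -> emits a flowfile whose content is one topic name per line
# SplitText        -> Line Split Count: 1   (one flowfile per topic name)
# ExtractText      -> add property  topic.name : (?s)(.*)   to capture the line as an attribute
# PublishKafka_2_0 -> Topic Name: ${topic.name}
```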
09-15-2020
05:22 AM
@CSRabbit Installing the PostgreSQL JDBC Driver: download the PostgreSQL JDBC driver from http://jdbc.postgresql.org/download.html and copy it to the /var/lib/sqoop/ directory. For example:

$ curl -L 'http://jdbc.postgresql.org/download/postgresql-9.2-1002.jdbc4.jar' -o postgresql-9.2-1002.jdbc4.jar
$ sudo cp postgresql-9.2-1002.jdbc4.jar /var/lib/sqoop/

This snippet is from the Cloudera 5.9.x documentation (reference), but the steps should be similar for other versions, Ambari/HDP, or native Sqoop; just make sure you use the right download URL and the correct path for your Sqoop installation. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
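If it helps, a quick way to confirm Sqoop can see the driver is to list tables using a PostgreSQL connection string (host, database, and user below are placeholders, not from the original question):

```
$ sqoop list-tables \
    --connect jdbc:postgresql://dbhost.example.com:5432/mydb \
    --username sqoop_user -P
```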
09-14-2020
01:18 PM
I suspect you have not completed a step or are missing something. Using the Java cacerts file works for me in all cases where the certificate is publicly trusted (a standard cert from a public CA), which yours should be. Please share the configurations you tried and any errors you received. The bare minimum settings you need are the keystore/truststore file location, its password, the store type (JKS), and the TLS version. Assuming you copied your Java cacerts file to all nodes as /nifi/ssl/cacerts, the controller service properties should point at that file (a sketch of typical values follows below).

If cacerts doesn't work, then you must create a keystore and/or truststore containing the service's public certificate. Use the openssl command to retrieve the cert:

openssl s_client -connect secure.domain.com:443

You can also get it from the browser when you visit the ELK interface (for example the cluster health or indexes page): click the lock icon in the address bar and use the browser's certificate viewer to view/download the public certificate. You need the .cer or .crt file. Then use keytool to create the keystore from that cert. An example is:

keytool -import -trustcacerts -alias ambari -file cert.cer -keystore keystore.jks

Once you have created a keystore/truststore file, copy it to all NiFi nodes, ensure the correct ownership, and make sure all the details are correct in the SSL Context Service. Lastly, you may need to adjust the TLS protocol version until testing succeeds. Here is a working example of getting the cert and using it with keytool from a recent use case:

echo -n | openssl s_client -connect secure.domain.com:443 | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > publiccert.crt
keytool -import -file publiccert.crt -alias astra -keystore keyStore.jks -storepass password -noprompt
keytool -import -file publiccert.crt -alias astra -keystore trustStore.jks -storepass password -noprompt
mkdir -p /etc/nifi/ssl/
cp *.jks /etc/nifi/ssl
chown -R nifi:nifi /etc/nifi/ssl/
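As a minimal sketch, assuming the cacerts copy described above and the JVM's default truststore password (an assumption; yours may differ), the StandardSSLContextService would be configured roughly like this:

```
# StandardSSLContextService -- truststore-only configuration (hedged example)
# Truststore Filename : /nifi/ssl/cacerts
# Truststore Password : changeit      # JVM default; an assumption
# Truststore Type     : JKS
# TLS Protocol        : TLS
```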
09-11-2020
12:24 PM
@Gubbi Use this flow: ListFile -> FetchFile -> ConvertRecord (a configuration sketch is below).
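A hedged sketch of how that flow might be configured, assuming the Parquet-to-CSV use case from the related post further down (the reader/writer services and the input path are assumptions):

```
# ListFile      -> Input Directory: /data/incoming                 (placeholder path)
# FetchFile     -> File to Fetch:   ${absolute.path}/${filename}   (default value)
# ConvertRecord -> Record Reader:   ParquetReader
#                  Record Writer:   CSVRecordSetWriter
```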
09-11-2020
09:25 AM
@HarshR You need to configure your SSLContextService with a keystore/truststore built from the certificate you get from the Elasticsearch cluster. You can also try the cacerts file that is included with your Java installation, which is usually easier. More details on the cacerts approach here: https://community.cloudera.com/t5/Support-Questions/Connecting-to-DataSift-HTTPS-API-using-NiFi-GetHTTP/td-p/102276 If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
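A hedged example of importing the Elasticsearch public certificate into the JVM cacerts on each NiFi node (the alias, file name, cacerts path, and the default "changeit" password are assumptions; the cacerts location also varies by Java version):

```
keytool -import -trustcacerts -alias elasticsearch -file es-public.crt \
  -keystore $JAVA_HOME/jre/lib/security/cacerts -storepass changeit -noprompt
```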
09-11-2020
04:45 AM
@Francesco_Fa If you do not want to use the embedded SQLite database, the available options are MySQL, PostgreSQL, or Oracle: https://docs.gethue.com/administrator/administration/database/ If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
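For example, pointing Hue at an external PostgreSQL database is done in the [[database]] section of hue.ini (the host, credentials, and database name below are placeholders, not from the original question):

```
[desktop]
  [[database]]
    engine=postgresql_psycopg2
    host=dbhost.example.com
    port=5432
    user=hue
    password=secret
    name=hue
```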
09-11-2020
04:42 AM
@Gubbi The solution you are looking for is ConvertRecord with a ParquetRecordReader and a CSVRecordSetWriter. The ParquetRecordReader is included in NiFi 1.10 and up; if you have an older NiFi, here is a post where I talk about adding the required jar files to NiFi 1.9 (the older version doesn't ship with Parquet support): https://community.cloudera.com/t5/Support-Questions/Can-I-put-the-NiFi-1-10-Parquet-Record-Reader-in-NiFi-1-9/td-p/286465 Another suggestion: if you are working with NiFi and Hadoop/HDFS/Hive, you could store the raw Parquet, create an external Hive table over it, select the results and insert them into a similar table stored in CSV format, and then select from the CSV table to produce the CSV file (a sketch of that approach is below). Also, in order to validate/inspect your Parquet files, or to read the schema (if you need it for your controller services), you can use parquet-tools: https://community.cloudera.com/t5/Community-Articles/Build-and-use-Parquet-tools-to-read-parquet-files/ta-p/248629 If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
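A hedged sketch of that Hive detour (table names, columns, and HDFS locations are placeholders, not from the original question):

```
-- External table over the raw Parquet files landed by NiFi
CREATE EXTERNAL TABLE raw_parquet (id BIGINT, name STRING, amount DOUBLE)
STORED AS PARQUET
LOCATION '/data/raw/parquet/';

-- Same columns, stored as comma-delimited text
CREATE TABLE export_csv (id BIGINT, name STRING, amount DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Copy the Parquet rows into the CSV-formatted table
INSERT INTO TABLE export_csv SELECT * FROM raw_parquet;
```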
09-10-2020
06:39 AM
I think the most straightforward approach would be to drop the infer-schema jar into your version of NiFi. The procedure is not that hard; you just have to be surgically careful. The process is explained a bit in this post, in the context of adding the Parquet jars from a newer version to an older one; be sure to read all the comments: https://community.cloudera.com/t5/Support-Questions/Can-I-put-the-NiFi-1-10-Parquet-Record-Reader-in-NiFi-1-9/td-p/286465
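Roughly, the surgery looks like the following, with placeholder paths and file names (the actual jars/nars to copy are identified in the linked post, not here):

```
# Stop NiFi before touching the lib directory
/opt/nifi/bin/nifi.sh stop

# Copy the nar taken from the newer NiFi release into the older install (placeholder name)
cp nifi-example-nar-1.10.0.nar /opt/nifi/lib/
chown nifi:nifi /opt/nifi/lib/nifi-example-nar-1.10.0.nar

# Start NiFi again and watch the log for load errors
/opt/nifi/bin/nifi.sh start
tail -f /opt/nifi/logs/nifi-app.log
```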
09-10-2020
05:55 AM
@SashankRamaraju In the most recent versions of NiFi, some of the older methods (infer schema) have been left behind. You can certainly add them back in manually (PM me if you want specifics). However, the current tools for managing record conversion are definitely preferred and are bundled into NiFi out of the box on purpose. To solve your constantly changing CSV, I would first push back on why the CSV contents are changing. If there was nothing I could do about it upstream, I would create a flow that splits the different CSVs up based on known schemas: process the ones I have a schema for, and create a holding process group for those that fail. I would monitor the failures and create flow branches for new schemas, teaching my flow to be smarter over time. After this kind of evaluation, I would have a clear idea of how much the CSV is actually changing, and I could then do some upstream work on each CSV to converge them into a single schema before processing them in NiFi. For example, if some fields are missing, I could add them (as empty values) before reading everything with a single schema reader. This gets a bit kludgy, but I wanted to explain the thought process and how to evaluate converging the schemas into a single reader. I would likely not do the latter, and would just split the flow for each CSV difference. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
09-10-2020
05:43 AM
1 Kudo
@mike_bronson7 Oddly enough, the top post recommendation is this same question you asked 2 years ago: https://community.cloudera.com/t5/Support-Questions/how-to-change-hostnames-on-all-machines-in-ambari-cluster/m-p/217984#M179885 To confirm, the hostname and the domain name are one and the same: "domain.com" is just part of the hostname, and the first part is the subdomain. So the procedure to change "domain.com" is the same as changing the subdomain (a sketch of the mapping and command is below). HOWEVER, I would highly recommend testing this in a dev cluster before production, especially if you have SSL/Kerberos. This is a major change that affects Ambari and all components. If the Ambari command against the new JSON host mapping fails, or there is any incomplete action within any agent/component, you are going to bomb the whole cluster. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
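For reference, a hedged sketch of the mapping file and command involved (cluster and host names are placeholders; verify the exact procedure against the Ambari documentation for your version, and run it only with services and agents stopped):

```
# host_names_changes.json -- maps each old FQDN to its new FQDN, per cluster
# {
#   "MyCluster": {
#     "node1.olddomain.com": "node1.newdomain.com",
#     "node2.olddomain.com": "node2.newdomain.com"
#   }
# }

# Run on the Ambari server host
ambari-server update-host-names host_names_changes.json
```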