Member since: 07-19-2018
Posts: 613
Kudos Received: 101
Solutions: 117
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 5687 | 01-11-2021 05:54 AM |
| | 3812 | 01-11-2021 05:52 AM |
| | 9487 | 01-08-2021 05:23 AM |
| | 9288 | 01-04-2021 04:08 AM |
| | 38601 | 12-18-2020 05:42 AM |
09-10-2020
06:39 AM
I think the most straightforward approach would be to drop the infer-schema bundle into your version of NiFi. The procedure is not that hard; you just have to be surgically careful. The process is explained a bit here, in reference to adding the Parquet jars from a newer version into an older one (a sketch of the copy step is included below). Be sure to read all the comments: https://community.cloudera.com/t5/Support-Questions/Can-I-put-the-NiFi-1-10-Parquet-Record-Reader-in-NiFi-1-9/td-p/286465
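As an illustration of the copy step (my own sketch, not from the thread): the paths and the nifi-kite-nar file name below are hypothetical, so adjust them to the releases you actually have, and restart the target NiFi afterwards.

```python
import shutil
from pathlib import Path

# Hypothetical locations: a NiFi release that still ships the bundle you need,
# and the lib directory of the release that is missing it.
source_nar = Path("/opt/nifi-1.9.2/lib/nifi-kite-nar-1.9.2.nar")
target_lib = Path("/opt/nifi-1.12.1/lib")

# Copy the NAR across; NiFi loads it from lib on the next restart.
shutil.copy2(source_nar, target_lib / source_nar.name)
print(f"Copied {source_nar.name} into {target_lib}; restart NiFi to load it.")
```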
09-10-2020
05:55 AM
@SashankRamaraju In the most recent versions of NiFi, some of the older methods (infer schema) have been left behind. You can certainly add them back in manually (PM me if you want specifics). However, the current record-based tools for managing conversions are definitely preferred, and they are bundled into NiFi out of the box on purpose.

To solve your constantly changing CSV, I would first push back upstream on why the CSV contents are changing. If there was nothing I could do about it upstream, I would create a flow that splits the different CSVs up based on known schemas: process the ones I have a schema for, and create a holding process group for those that fail. I would monitor failures and create flow branches for new schemas, teaching my flow to get smarter over time.

After this kind of evaluation, I would have a clear idea of how much the CSV is actually changing. I could then do some upstream work on each CSV to converge them into a single schema before processing them in NiFi. For example, if some fields are missing, I could add them (as empty values) before reading them with a single schema reader; a sketch of that normalization step is included below. This gets a bit kludgy, but I wanted to explain the thought process and evaluation of how to converge the schemas into a single reader. I would likely not do the latter, and would just split the flow for each CSV difference.

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
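As an illustration of the "add missing fields as empty values" step mentioned above (my own sketch, not part of the original answer; the column names are hypothetical), a small Python pass that pads every incoming CSV to one canonical schema before a single schema reader touches it:

```python
import csv

# Hypothetical canonical schema every incoming CSV should converge to.
CANONICAL_FIELDS = ["id", "name", "email", "created_at"]

def normalize_csv(in_path: str, out_path: str) -> None:
    """Rewrite a CSV so it has exactly the canonical columns, filling gaps with empty strings."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=CANONICAL_FIELDS)
        writer.writeheader()
        for row in reader:
            writer.writerow({field: row.get(field, "") for field in CANONICAL_FIELDS})

normalize_csv("incoming.csv", "normalized.csv")
```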
09-10-2020
05:43 AM
1 Kudo
@mike_bronson7 Oddly enough, the top post recommendation is this same question you asked two years ago: https://community.cloudera.com/t5/Support-Questions/how-to-change-hostnames-on-all-machines-in-ambari-cluster/m-p/217984#M179885

To confirm, the hostname and the domain name are one and the same: "domain.com" is just part of the hostname, and the first part is the subdomain. So the procedure to change "domain.com" is the same as changing the subdomain.

HOWEVER, I would highly recommend testing this in a dev cluster before production, especially if you have SSL/Kerberos. This is a major change that affects Ambari and all components. If the Ambari command against the new JSON mapping fails (an example mapping is sketched below), or there is any incomplete action within any agent/component, you are going to bomb the whole cluster.

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
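For reference, a minimal sketch of my own showing what the JSON host-name mapping consumed by Ambari's update-host-names step typically looks like; the cluster and host names are hypothetical, and you should verify the exact command and file format against your Ambari version's documentation:

```python
import json

# Hypothetical old-to-new hostname mapping, keyed by cluster name.
host_name_changes = {
    "my_cluster": {
        "node1.olddomain.com": "node1.newdomain.com",
        "node2.olddomain.com": "node2.newdomain.com",
    }
}

with open("host_names_changes.json", "w") as f:
    json.dump(host_name_changes, f, indent=2)

# Then, with Ambari server and all agents stopped, run (verify against your docs):
#   ambari-server update-host-names host_names_changes.json
```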
09-04-2020
06:25 AM
@P_Rat98 You need parquet-tools to read Parquet files from the command line; there is no way to view Parquet content directly in NiFi. https://pypi.org/project/parquet-tools/
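If you would rather inspect the file from Python than from the parquet-tools CLI, a minimal sketch (my suggestion, assuming pyarrow is installed; the file name is hypothetical):

```python
import pyarrow.parquet as pq

# Read the Parquet file, then print its schema and the first few rows.
table = pq.read_table("example.parquet")
print(table.schema)
print(table.slice(0, 5).to_pydict())
```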
09-02-2020
01:37 PM
1 Kudo
@Tokolosk The solution you need is:

${date_posted:format('MM-dd-yyyy hh:mm:ss')}
${date_posted:multiply(1000):format('MM-dd-yyyy hh:mm:ss')}

Of course you can experiment with different formats. I created a test in a template I use (Working With Timestamps) where I set date_posted to your string value and ran the two conversion tests above. If you are getting empty values, then I suspect the issue is with ${date_posted} itself, not the Expression Language. Maybe also take a look at using a different attribute name for the formatted timestamp. Hope this helps.

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
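As a side note from me (not part of the original reply): the multiply(1000) variant matters when date_posted holds epoch seconds, because format() interprets the number as epoch milliseconds. A rough Python illustration, with a hypothetical value:

```python
from datetime import datetime, timezone

date_posted = 1599052800  # hypothetical attribute value, in epoch *seconds*

# Treating a seconds value as milliseconds lands shortly after the epoch start,
# which is what an un-multiplied format() call would effectively do.
wrong = datetime.fromtimestamp(date_posted / 1000, tz=timezone.utc)
print(wrong.strftime("%m-%d-%Y %I:%M:%S"))  # somewhere in January 1970

# Scaling to milliseconds first (the multiply(1000) step) gives the intended date.
epoch_millis = date_posted * 1000
right = datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)
print(right.strftime("%m-%d-%Y %I:%M:%S"))  # the intended 2020 date
```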
08-27-2020
07:14 AM
Yes. Auto-termination is how you drop the whole flowfile and all of its attributes. If, for example, you have a giant SCHEMA or JSON attribute, UpdateAttribute can be used to empty that value. However, if you do not need the flowfile anymore at all, auto-terminate it rather than using UpdateAttribute and retaining the flowfile.
08-27-2020
06:56 AM
@Jarinek Yes, this is totally possible in NiFi. Traditionally, one way to do this would be EvaluateJsonPath to pull some payload JSON values into attributes, and then AttributesToJSON to re-form a new JSON object from the attributes you have created. In newer versions of NiFi you have other options with QueryRecord, UpdateRecord, and the JSON record readers and writers. Further options are a custom ExecuteScript or JoltTransformJSON.

My suggestion would be to research NiFi + JSON and begin with some simple examples. Once you have some experience with the basics, start transforming them to suit your use case. I also suggest trying it a few different ways before settling on one. For example, you may build a flow and get it working with EvaluateJsonPath, then improve it over time as newer NiFi versions add capabilities (moving to the JSON record readers/writers).

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
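For illustration only (not from the original answer), the reshaping described above, pulling selected values out of an incoming JSON payload and emitting a new JSON object built from them, looks roughly like this in plain Python; the field names are made up:

```python
import json

incoming = json.loads(
    '{"user": {"id": 42, "name": "Ada"}, "event": {"type": "login", "ts": "2020-08-27"}}'
)

# Mirror EvaluateJsonPath: pull selected values out of the payload...
extracted = {
    "user_id": incoming["user"]["id"],
    "event_type": incoming["event"]["type"],
}

# ...then mirror AttributesToJSON: re-form a new JSON object from those values.
print(json.dumps(extracted))
```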
08-27-2020
06:35 AM
@Jayavardhini Yes, it is a best practice to auto-terminate successfully completed flowfiles at the bottom of your flow branches. You do not want them to remain, as they will continue to hold resources.

A good NiFi developer will build a flow during development where all bottom branches are visible, including routing all processor relationships, even ones that will eventually be auto-terminated. This gives visibility during testing and flow creation if something unexpected happens. I use stopped output ports for this purpose.

In some of my production flows I create capture points for exceptions: bottom-branch process groups or queues where the flowfiles remain until someone inspects them, makes a change, reviews the provenance, and maybe even reroutes them back into the flow again. This is the only case where I keep flowfiles in my flow; in all other cases I auto-terminate and the flowfiles are gone from my flows.

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
08-27-2020
06:18 AM
@Muffex My recommendation for ingestion processes is to always use staging/temporary tables which are managed separately from the master table the data needs to arrive in. This lets you operate on the staging tables before or after those results are added to the master table, without affecting the master table. In your use case, your ingestion process would Sqoop into a temp table, insert from temp into the master table, then drop the temp location; a rough sketch of those steps follows below.

In some of my past implementations of this pattern, the temp tables were organized hourly and stayed active for at least seven days before a decoupled cleanup job removed anything seven days old. That was done for auditing purposes; normally I would create and destroy the staging data during the ingestion procedure itself.

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
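To make the pattern concrete, here is a rough sketch of my own (the table names and the hourly suffix are hypothetical, and the Sqoop import into the staging table is assumed to have already run) showing the statements the ingestion job would issue against Hive:

```python
# Hypothetical names; the staging table is suffixed with the load hour for auditing.
staging_table = "ingest_staging_2020082706"
master_table = "sales_master"

hive_steps = [
    # Promote staged rows into the master table.
    f"INSERT INTO TABLE {master_table} SELECT * FROM {staging_table}",
    # Drop (or hand off to a decoupled cleanup job) the staging table afterwards.
    f"DROP TABLE IF EXISTS {staging_table}",
]

for statement in hive_steps:
    print(statement + ";")
```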
08-27-2020
06:08 AM
1 Kudo
@P_Rat98 Creating an API with NiFi using HandleHttpRequest and HandleHttpResponse is something I have done quite a few times for Hortonworks and NiFi customers. This is a great use case for NiFi: sending/receiving JSON, processing it, and completing actions downstream is super easy.

I have created a basic template for you which includes HandleHttpRequest (inbound call on port 80), a process group for doing something with the JSON, and HandleHttpResponse (provides a 200 response code) to respond to the inbound call. This is an API in its simplest form with NiFi. Depending on your use case you can build out the Process Api Request process group to suit your needs. Out of the box you should be able to import the template, add and start the StandardHttpContextMap controller service, start the flow, send a call to http://yourhost:80, and have JSON sitting in the Success queue at the bottom of the flow; a quick test call is sketched below. You can find the template here: https://github.com/steven-matison/NiFi-Templates/blob/master/NiFi_API_with_HandleHttpRequest_Demo.xml

Some API suggestions:

- Be sure to take a look at both HandleHttp processors for the properties you can configure: ports, hostname, acceptable methods, SSL, authentication, and more.
- If your API call does not care whether the Process Api Request group finishes, you can put HandleHttpResponse right after HandleHttpRequest and let all the downstream work happen after the request/response is completed. This is common when I expect my API to only receive inbound data and the caller doesn't care what the response is (other than a 200 to know it was received). In that case I accept the payload, return 200, and the rest of the flow is decoupled from the connection. If my processing time is lengthy I usually do this so the system initiating the API call is not left waiting.
- Once you have the basic framework built, consider handling errors and/or returning different status codes as a variable (created before the response) in the Status Code property of HandleHttpResponse. Sometimes I even have a different HandleHttpResponse at the end of different flow branches; for example, if someone sends invalid JSON, I return maybe a 302 or 404 with the error as the content body.

Have fun with it.

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
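As a quick smoke test of the flow above (my own sketch; it assumes the Python requests library is installed and that the flow is listening on http://yourhost:80 as described):

```python
import requests

# Post a small JSON payload to the HandleHttpRequest endpoint and
# confirm the HandleHttpResponse side returns the expected 200.
payload = {"message": "hello nifi"}
response = requests.post("http://yourhost:80", json=payload, timeout=10)
print(response.status_code, response.text)
```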