About stevenmatison

stevenmatison · 12-31-2019

@mervezeybel this error is caused by sending the mpack command to Ambari without a valid mpack file. I am not sure what full command you ran, but I have seen this happen before when the URL I use is wrong. A couple of things you can do:

Try to wget the URL to a local file, then adjust the command to use the local file.
If the mpack URL you are using still does not work, get it again directly from my GitHub: https://github.com/steven-dfheinz/dfhz_elk_mpack There is a newer version there as well.

Make sure you use the correct /raw/ link if using the GitHub links (see sample below). If you take the /blob/ link straight from the page, you will get the same error you have now (not a gzip file).

ambari-server install-mpack --mpack=https://github.com/steven-dfheinz/dfhz_elk_mpack/raw/master/elasticsearch_mpack-3.4.0.0-0.tar.gz --verbose
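The /blob/ versus /raw/ point can be checked before the file ever reaches ambari-server. Here is a minimal Python sketch, not part of Ambari or GitHub tooling: the helper names to_raw_url and looks_like_gzip are hypothetical. The first rewrites a GitHub /blob/ link into its /raw/ form, and the second tests for the two-byte gzip magic number at the start of whatever was downloaded.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def to_raw_url(url: str) -> str:
    """Rewrite a GitHub /blob/ page link into the /raw/ download link (hypothetical helper)."""
    return url.replace("/blob/", "/raw/", 1)


def looks_like_gzip(data: bytes) -> bool:
    """Return True if the payload starts with the gzip magic number (hypothetical helper)."""
    return data[:2] == GZIP_MAGIC


blob = ("https://github.com/steven-dfheinz/dfhz_elk_mpack/blob/master/"
        "elasticsearch_mpack-3.4.0.0-0.tar.gz")
raw = to_raw_url(blob)

# Stand-ins for downloaded bytes: an HTML page (what a /blob/ link serves)
# and genuine gzip data produced locally.
html_page = b"<!DOCTYPE html><html>...</html>"
real_gzip = gzip.compress(b"mpack contents")

print(raw)
print(looks_like_gzip(html_page), looks_like_gzip(real_gzip))
```

To use it on a file fetched with wget, read the first two bytes of the local file and pass them to looks_like_gzip before pointing --mpack at it.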

stevenmatison · 12-30-2019

Regex always gives me a hard time. Knowing whether you have a typo or are completely missing what works and what doesn't is always a big pain. Try working with something like this regex tester to get your regex string set up: https://regex101.com

stevenmatison · 12-30-2019

To get it into a table, you need something like this:

<table><tr><td> ${'$1':replace(',','</td><td>'):replace('\n','</td></tr>\n<tr><td>')} </td></tr></table>

and it should output an HTML table like this:

<table><tr><td> Release Cause</td><td>Previous Count</td><td>Current Count</td><td>Change_Ratio</td><td>SEVERITY</td> </tr><tr><td>SIP: [487] Request Terminated</td><td>173</td><td>393</td><td>1.271676300578034682</td><td>Critical </td></tr></table>

You may have to adjust the last </td></tr> depending on whether the file ends with an empty line. You may also have to change \n to \r\n, depending on the actual line-ending characters in the source file (they are hidden, so you cannot see them).
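To see what the two chained replace calls do, here is a minimal Python emulation of the same string logic. It is not NiFi Expression Language, and the function name csv_to_html_table is made up for illustration. It also shows the empty-row effect a trailing newline produces.

```python
def csv_to_html_table(csv_text: str) -> str:
    """Emulate the two chained replace() calls: commas become cell
    boundaries, newlines become row boundaries (hypothetical helper)."""
    body = csv_text.replace(",", "</td><td>").replace("\n", "</td></tr>\n<tr><td>")
    return "<table><tr><td>" + body + "</td></tr></table>"


sample = "Release Cause,Previous Count\nSIP: [487] Request Terminated,173"
print(csv_to_html_table(sample))

# A trailing newline in the source leaves one empty row at the bottom,
# which is why the closing </td></tr> may need adjusting.
with_trailing_newline = csv_to_html_table(sample + "\n")
print(with_trailing_newline.endswith("<tr><td></td></tr></table>"))
```

If the source uses Windows line endings, the same sketch would need "\r\n" in place of "\n" in the second replace.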

stevenmatison · 12-30-2019

I understand you are not able to figure out the exact ReplaceText syntax that you need. I quickly created a flow and validated that the following works. I had to use \t to match your string. The expression you need is:

${'$1':replace(',','\t')}

Table formatting is something entirely different and is not part of "csv" parsing....

stevenmatison · 12-30-2019

In this article, I am going to explain how you can work with the Schema Registry directly in your NiFi Data Flow. In my previous article, Using the Schema Registry API, I talk about the work required to expose the API methods needed to create a Schema Registry Entity and update that Entity with an Avro Schema. In this article, I am going to take it one step further and complete both operations directly in my NiFi Data Flow. I will also include the Use Case I am working on.

This flow accepts a CSV Parameter List which contains the columns and data types found within the fields object of the Avro Schema, processes the contents of the file, builds the required Avro Schema, and saves it to the Schema Registry Entity I created. At the time of writing this article I have not handled data types, so in these examples the schema data types are all strings. In future updates to my flow, I will map the different types to appropriate data types for Hive queries.

For now, this article and the NiFi template include two main parts:

1. Create a Schema Registry Entity via NiFi using the InvokeHTTP processor
2. Update the created Schema Registry Entity with an Avro Schema created from a Parameter List

The NiFi Template

https://github.com/steven-dfheinz/NiFi-Templates/raw/master/Schema_Registry_Demo.xml

Create a Schema Registry Entity via NiFi using the InvokeHTTP processor

GenerateFlowFile→UpdateAttribute→AttributesToJSON→InvokeHTTP

**Note: notice my use of an output port to route failures during testing. While working on my flows I route all relationships like this until I decide it is appropriate to auto-terminate them or route them to an error handling process group.

GenerateFlowFile - Very simple here. This proc just starts the flow. I have the Run Schedule at 1 minute so I can start and stop it to send only a single flow file through the flow for testing.
UpdateAttribute - Manually creates the attributes required for Schema Creation: name, type, schemaGroup, description, evolve, compatibility
AttributesToJSON - Writes the attributes above to the flow-file contents.
InvokeHTTP - Executes the API POST to the Schema Registry URL.

**Note: to see the specific configuration, please download the template, add it to your workspace, and inspect the procs.

Update the created Schema Registry Entity with an Avro Schema created from a Parameter List

HandleHttpRequest→HandleHttpResponse→UpdateAttribute→ConvertCSVToAvro→ConvertAvroToJSON→ExtractText→AttributesToJSON→ReplaceText→UpdateAttribute→AttributesToJSON→InvokeHTTP

**Note: notice my use of an output port to route failures during testing. While working on my flows I route all relationships like this until I decide it is appropriate to auto-terminate them or route them to an error handling process group.

HandleHttpRequest - API endpoint in the NiFi flow, necessary to accept the POST of the Parameter List.
HandleHttpResponse - Sends API response 200 and closes the API connection.
UpdateAttribute - Manually sets some attributes required for the Avro Schema: type, name
ConvertCSVToAvro - Converts the CSV contents to Avro Schema.
ConvertAvroToJSON - Converts Avro to JSON.
ExtractText - Grabs the content of the flow-file into the fields attribute.
AttributesToJSON - Writes the attributes above to the flow-file contents.
ReplaceText - Handles some JSON object formatting requirements: ${'$1':unescapeJson():replace('"[','['):replace(']"',']')}
UpdateAttribute - Sets the attributes required for Schema Update: schemaText, description
AttributesToJSON - Writes the attributes above to the flow-file contents.
InvokeHTTP - Executes the API POST to the Schema Registry URL.

**Note: to see the specific configuration, please download the template, add it to your workspace, and inspect the procs.

As always, if you have any questions or comments, feel free to add them below or send me a message. If you need more help getting this template to work, I would be more than happy to help out.
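The second flow's end result, an all-strings Avro Schema wrapped in a schemaText attribute, can be sketched outside NiFi. The Python below only illustrates that data shape and is not the template's processor logic. It assumes the Parameter List is a single comma-separated line of column names, and the function names build_avro_schema and build_version_payload are hypothetical.

```python
import json


def build_avro_schema(name: str, columns: list) -> dict:
    """Build an Avro record schema in which every column is a string
    (hypothetical helper; data types are not handled, as in the article)."""
    return {
        "name": name,
        "type": "record",
        "fields": [{"name": column, "type": "string"} for column in columns],
    }


def build_version_payload(schema: dict, description: str) -> str:
    """Wrap the schema as an escaped JSON string under schemaText,
    next to a description (hypothetical helper)."""
    return json.dumps({"schemaText": json.dumps(schema), "description": description})


# Assumed input format: one line of comma-separated column names.
columns = "column0,column1,column2".split(",")
schema = build_avro_schema("test", columns)
payload = build_version_payload(schema, "Testing creating a schema from API")
print(payload)
```

Calling json.dumps twice is what makes the inner schema a string-escaped value rather than a nested object, which is the job the ReplaceText and AttributesToJSON steps share in the flow.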

stevenmatison · 12-30-2019

In this article I am going to explain how to use the Schema Registry API to create and delete a Schema Entity. I am working on a Use Case for creating schemas with over 500 columns on the fly within a NiFi Data Flow. In order to complete this task I needed to figure out how to create and update the schemas from outside of the Schema Registry UI. While testing I also needed to delete the test schemas, since there is no way to delete them from within the UI.

First, to get the artifacts I need (API URLs and POST formats), I use the Schema Registry UI in Chrome with my Developer Tools open to capture what happens when the UI buttons are clicked. When I add a test schema, I notice 2 POST calls:

1. Creates the Schema Entity (without an actual Schema)
2. Posts the Schema to the newly created Entity

Using the information in Developer Tools, I then create 2 Postman API calls to duplicate, debug, and validate that they will work externally.

** Note: I am working in a single node sandbox (hdf.cloudera.com) HDF cluster without any authorization. If your cluster is secured you will need to satisfy access, authorization, and SSL requirements.

Create Schema Entity

POST: http://hdf.cloudera.com:7788/api/v1/schemaregistry/schemas
BODY: {"name":"test","type":"avro","schemaGroup":"test","description":"Testing creating a schema from API","evolve":true,"compatibility":"BACKWARD"}

Add Schema To Entity

POST: http://hdf.cloudera.com:7788/api/v1/schemaregistry/schemas/test/versions?branch=MASTER
BODY: {"schemaText":"{ \"name\" : \"test\", \"type\" : \"record\", \"fields\" : [ { \"name\" : \"column0\", \"type\" : \"string\" }, { \"name\" : \"column1\", \"type\" : \"string\" }, { \"name\" : \"column2\", \"type\" : \"string\" } ]}","description":"Testing creating a schema from API"}

** Note: the string-escaped format above is required. If your schema has line breaks, those should also be replaced with \n.

Delete Schema

DELETE: http://hdf.cloudera.com:7788/api/v1/schemaregistry/schemas/test/

** Thanks to @mvalleavila on [ This Thread ] for exposing the simple Delete.

The information above should be enough to get you started using the Schema Registry API via command line, curl, or any other method you prefer. In my next article, I will go into deeper detail on how I fit these concepts into my NiFi Flow.
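The three calls above can also be expressed with Python's standard library. This is a rough sketch, assuming the same unsecured sandbox host. The json_request helper is hypothetical, and the Content-Type header is my assumption, since the article does not list request headers. The requests are only built and printed here, not sent.

```python
import json
import urllib.request

# Sandbox URL from the article; replace with your own Schema Registry host.
BASE = "http://hdf.cloudera.com:7788/api/v1/schemaregistry/schemas"


def json_request(method: str, url: str, body=None) -> urllib.request.Request:
    """Build (but do not send) a request with an optional JSON body.
    Hypothetical helper; the Content-Type header is an assumption."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )


schema = {"name": "test", "type": "record",
          "fields": [{"name": "column0", "type": "string"}]}

create = json_request("POST", BASE, {
    "name": "test", "type": "avro", "schemaGroup": "test",
    "description": "Testing creating a schema from API",
    "evolve": True, "compatibility": "BACKWARD",
})
add_version = json_request("POST", f"{BASE}/test/versions?branch=MASTER", {
    "schemaText": json.dumps(schema),  # schema must be a string-escaped value
    "description": "Testing creating a schema from API",
})
delete = json_request("DELETE", f"{BASE}/test/")

for request in (create, add_version, delete):
    print(request.get_method(), request.full_url)
```

Passing any of these objects to urllib.request.urlopen would send it. A secured cluster would additionally need authentication and SSL handling that this sketch leaves out.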

stevenmatison · 12-30-2019

If the output is still comma separated, then ReplaceText did not work. The solution above is how you do it, but you may need to experiment with other forms of the replacement value, for example \t instead of [tab]. Here is a similar post that includes some of the other forms of "tab": https://community.cloudera.com/t5/Support-Questions/How-best-to-replace-all-TABs-t-by-COMMAs-in-the-content-of-a/m-p/225115

stevenmatison · 12-30-2019

The alert is just indicating you need to enter a value where one is missing. Scroll down inside of the page (not the whole page), and you should be able to find the missing value. Another fast way to get where you want is to search/filter for nifi.toolkit.tls.token which will take you directly to it. Once the alert/error is satisfied the NEXT button on the bottom of the page will work again. If this reply answers your questions please mark this reply as Solution.

stevenmatison · 12-30-2019

A ReplaceText processor can change the content from comma separated to tab separated. This would be the easiest option. Configure it with [,] in Search Value and [tab] in Replacement Value.
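For comparison, here is the same comma-to-tab conversion as a small Python sketch, with hypothetical function names. The first function mirrors a plain search-and-replace. The second uses the csv module, which matters only if fields can contain quoted commas, something the thread does not mention.

```python
import csv
import io


def commas_to_tabs(text: str) -> str:
    """Plain search-and-replace: every comma becomes a tab (hypothetical helper)."""
    return text.replace(",", "\t")


def csv_to_tsv(text: str) -> str:
    """Parse as CSV, then write tab-delimited, so quoted commas survive
    (hypothetical helper)."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(csv.reader(io.StringIO(text)))
    return out.getvalue()


line = 'a,"b,c",d\n'
print(repr(commas_to_tabs(line)))  # the comma inside the quotes is replaced too
print(repr(csv_to_tsv(line)))      # the quoted field stays in one piece
```

For simple files with no quoted fields, both functions give the same columns, and the plain replace is all the processor configuration needs to do.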

stevenmatison · 12-30-2019

Since I am not working in your Use Case, I tried to show as much of mine as I could, and I think you understand the concept. The next trick is just getting the regex matched to your string. Try a tool here: https://regex101.com

In this tool I quickly matched a regex to the 4th column:

.*?,.*?,.*?,(.*?),.*

against:

test,test,test,25.0,test,test,test

For your example, the above should work. It is not necessary to parse past the column you need, so the .* on the end should pick up the entire rest of the line.
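The same pattern can be tried in Python's re module as a quick check outside the regex tester. The function name fourth_column is made up for this sketch.

```python
import re

# Three lazy "anything up to a comma" groups skip columns 1-3,
# the capture group grabs column 4, and the final .* takes the rest of the line.
FOURTH_COLUMN = re.compile(r".*?,.*?,.*?,(.*?),.*")


def fourth_column(line: str):
    """Return the 4th comma-separated value, or None if the line does not match
    (hypothetical helper)."""
    match = FOURTH_COLUMN.match(line)
    return match.group(1) if match else None


print(fourth_column("test,test,test,25.0,test,test,test"))  # 25.0
```

Note that the pattern needs at least one comma after the 4th column, so a line with only four values will not match as written.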

Name	Steven Matison
Location	Florida
Member Since	07-19-2018 04:45 PM
Last Visited	06-01-2022 03:47 PM
Posts	613
Kudos received	101
