Member since: 07-19-2018
Posts: 613
Kudos Received: 101
Solutions: 117
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 5092 | 01-11-2021 05:54 AM |
| | 3421 | 01-11-2021 05:52 AM |
| | 8788 | 01-08-2021 05:23 AM |
| | 8383 | 01-04-2021 04:08 AM |
| | 36679 | 12-18-2020 05:42 AM |
12-31-2019
04:25 AM
@mervezeybel this error occurs when the mpack command is sent to Ambari without a valid mpack file. I am not sure what full command you ran, but I have seen this happen before when the URL I used was wrong. A couple of things you can do:
Try to wget the URL to a local file, then adjust the command to use the local file.
If the mpack URL you are using is still not working, try to get it again from my GitHub directly: https://github.com/steven-dfheinz/dfhz_elk_mpack There is a newer version there as well.
Make sure you use the correct /raw/ link if using the GitHub links (see sample below). If you take the /blob/ link straight from the page, it will result in the same error you have (not a gzip file).
ambari-server install-mpack --mpack=https://github.com/steven-dfheinz/dfhz_elk_mpack/raw/master/elasticsearch_mpack-3.4.0.0-0.tar.gz --verbose
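If you want to check a link or a downloaded file before handing it to ambari-server, here is a minimal Python sketch. It is not part of Ambari or the mpack; the helper names are made up for illustration. It rewrites a /blob/ link into a /raw/ link and checks whether a file starts with the two gzip magic bytes, which an HTML page saved from a /blob/ link will not.

```python
# Hypothetical helpers, not part of Ambari or the mpack itself.
GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def blob_to_raw(url: str) -> str:
    """Rewrite a GitHub /blob/ page link into the matching /raw/ download link."""
    return url.replace("/blob/", "/raw/", 1)


def looks_like_gzip(first_bytes: bytes) -> bool:
    """Return True if the given leading bytes start with the gzip magic number."""
    return first_bytes[:2] == GZIP_MAGIC


def file_is_gzip(path: str) -> bool:
    """Read the first two bytes of a local file and test them."""
    with open(path, "rb") as handle:
        return looks_like_gzip(handle.read(2))
```

For example, after a wget to a local file, file_is_gzip("elasticsearch_mpack-3.4.0.0-0.tar.gz") returning False would suggest you saved the GitHub web page rather than the archive.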
12-30-2019
01:15 PM
Regex always gives me a go. Knowing whether you have a typo or are completely missing what works and what doesn't is always a big pain. Try working with something like this regex tester to get your regex string set up: https://regex101.com
12-30-2019
09:50 AM
1 Kudo
To get it into a table, you need something like this:
<table><tr><td> ${'$1':replace(',','</td><td>'):replace('\n','</td></tr>\n<tr><td>')} </td></tr></table>
and it should output an HTML table like this:
<table><tr><td> Release Cause</td><td>Previous Count</td><td>Current Count</td><td>Change_Ratio</td><td>SEVERITY</td> </tr><tr><td>SIP: [487] Request Terminated</td><td>173</td><td>393</td><td>1.271676300578034682</td><td>Critical </td></tr></table>
You may have to adjust the last </td></tr> depending on whether the file ends with an empty line. You may also have to change \n to \r\n, depending on the line-ending characters actually used in the source file (they are hidden, so I have not seen them).
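To see what the two chained replace calls do outside of NiFi, here is a rough Python sketch of the same idea. The function name is made up, and I left out the spaces around the expression, so treat it as an illustration of the technique rather than an exact copy of the processor output.

```python
def csv_to_html_table(csv_text: str, newline: str = "\n") -> str:
    """Mimic the two chained replace() calls: commas become cell breaks,
    line endings become row breaks, and the result is wrapped in a table."""
    body = csv_text.replace(",", "</td><td>")
    body = body.replace(newline, "</td></tr>\n<tr><td>")
    return "<table><tr><td>" + body + "</td></tr></table>"


sample = "Release Cause,Previous Count\nSIP: [487] Request Terminated,173"
html = csv_to_html_table(sample)
```

If the input ends with a line ending, the same code produces one extra empty row, which is the case where the closing tags need adjusting. Pass newline="\r\n" if the source file uses Windows line endings.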
12-30-2019
09:29 AM
I understand you are not able to figure out the exact ReplaceText syntax that you need. I quickly created a flow and validated that the following works. I had to use \t to match your string. The expression you need is:
${'$1':replace(',','\t')}
Table formatting is something entirely different and is not part of "csv" parsing....
12-30-2019
08:44 AM
1 Kudo
In this article, I am going to explain how you can work with the Schema Registry directly in your NiFi Data Flow. In my previous article Using the Schema Registry API I talk about the work required to expose the API methods needed to Create a Schema Registry Entity and update that Entity with an Avro Schema. In this article, I am going to take it one step further and complete both operations directly in my NiFi Data Flow. I will also include the Use Case I am working on. This flow accepts a CSV Parameter List which contains the columns and data types found within the fields object of the Avro Schema, processes the contents of the file, builds the required Avro Schema and saves it to the Schema Registry Entity I created.
At the time of writing this article I have not handled data types, so in these examples the schema data types are all strings. In future updates to my flow, I will map different types to appropriate data types for Hive queries. For now, this article and the NiFi template will include two main parts:
Create a Schema Registry Entity Via NiFi using InvokeHTTP Processor
Update the Created Schema Registry with an Avro Schema created from a Parameter List
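To make the use case concrete, here is a minimal Python sketch of the end result the flow is after: turning a Parameter List into an Avro Schema. This is not how the NiFi template does it. The "name,type" per line layout of the list and the function name are assumptions for illustration, and the type column is ignored because every field is written as a string for now.

```python
import csv
import io


def build_avro_schema(parameter_csv: str, record_name: str) -> dict:
    """Build an Avro record schema from an assumed 'name,type' CSV list.
    The type column is ignored: all fields are emitted as strings."""
    fields = []
    for row in csv.reader(io.StringIO(parameter_csv)):
        if not row or not row[0].strip():
            continue  # skip blank lines
        fields.append({"name": row[0].strip(), "type": "string"})
    return {"name": record_name, "type": "record", "fields": fields}


schema = build_avro_schema("column0,int\ncolumn1,string\n", "test")
```

The resulting dict has the same shape as the schema posted to the Schema Registry: a record with a name and a fields array.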
The NiFi Template
https://github.com/steven-dfheinz/NiFi-Templates/raw/master/Schema_Registry_Demo.xml
Create a Schema Registry Entity Via NiFi using InvokeHTTP Processor
GenerateFlowFile→UpdateAttribute→AttributesToJSON→InvokeHTTP
**Note: notice my use of an output port to route failures during testing. While working on my flows I route all relationships like this until a time I decide it is appropriate to auto-terminate or route to error handling process group.
GenerateFlowFile - Very simple here. This proc just starts the flow. I have the Run Schedule at 1 minute so I can start & stop to create only a single flow file through the flow for testing.
UpdateAttribute - Manually create attributes required for Schema Creation:
name
type
schemaGroup
description
evolve
compatibility
AttributesToJSON - Writes Attributes above to flow-file contents.
InvokeHTTP - Executes API Post to Schema Registry URL
**Note: to see specific configuration, please download the template, add to your workspace, and inspect procs.
Update the Created Schema Registry with an Avro Schema created from a Parameter List
HandleHttpRequest→HandleHttpResponse→UpdateAttribute→ConvertCSVToAvro→ConvertAvroToJSON→ExtractText→AttributesToJSON→ReplaceText→UpdateAttribute→AttributesToJSON→InvokeHTTP
**Note: notice my use of an output port to route failures during testing. While working on my flows I route all relationships like this until a time I decide it is appropriate to auto-terminate or route to error handling process group.
HandleHttpRequest - API Endpoint in NiFi Flow necessary to accept POST of Parameter List.
HandleHttpResponse - Sends API Response 200 and closes API Connection.
UpdateAttribute - Manually set some Attributes Required for Avro Schema
type
name
ConvertCSVToAvro - Converts the CSV contents to Avro Schema
ConvertAvroToJSON - Converts Avro to JSON
ExtractText - Grabs content of flow-file into fields attribute
AttributesToJSON - Writes Attributes above to flow-file contents.
ReplaceText - Handles some JSON object formatting requirements
${'$1':unescapeJson():replace('"[','['):replace(']"',']')}
UpdateAttribute - Attributes required for Schema Update:
schemaText
description
AttributesToJSON - Writes Attributes above to flow-file contents.
InvokeHTTP - Executes API Post to Schema Registry URL
**Note: to see specific configuration, please download the template, add to your workspace, and inspect procs.
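The ReplaceText step in the list above is the least obvious one, so here is a rough Python sketch of what it accomplishes. I am assuming the flow-file content at that point is a JSON object whose fields value is a string holding an escaped JSON array. The sketch parses and re-serializes the JSON instead of doing the string replacement the NiFi expression does, so it shows the goal, not the same technique.

```python
import json


def inline_fields_array(flowfile_json: str) -> str:
    """Turn the string-encoded 'fields' value into a real JSON array."""
    record = json.loads(flowfile_json)
    record["fields"] = json.loads(record["fields"])
    return json.dumps(record)


# Assumed shape of the content before ReplaceText: 'fields' is an escaped string.
sample = json.dumps({
    "type": "record",
    "name": "test",
    "fields": json.dumps([{"name": "column0", "type": "string"}]),
})
result = inline_fields_array(sample)
```

After this step the fields value is an array instead of a quoted string, which is what the Avro Schema needs.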
As always, if you have any questions, or comments feel free to add them below or send me a message. If you need more help getting this template to work, I would be more than happy to help out.
12-30-2019
08:36 AM
In this article I am going to explain how to use the Schema Registry API to Create and Delete a Schema Entity.
I am working on a Use Case for creating schemas with over 500 columns on the fly within a NiFi Data Flow. In order to complete this task I needed to figure out how to create and update the schemas from outside of the Schema Registry UI. While testing I also needed to delete the test schemas, since there is no way to delete them from within the UI.
First, to get the artifacts I need (API URLs and POST formats), I use the Schema Registry UI in Chrome with my Developer Tools open to capture what happens when the UI buttons are clicked. When I add a test schema, I notice 2 POST calls.
Creates the Schema Entity (without an actual Schema)
Posts the Schema to the newly created Entity
Using the information in Developer Tools, I then create 2 Postman API calls to duplicate, debug, and validate that they will work externally.
** Note: I am working in a single node sandbox HDF cluster (hdf.cloudera.com) without any authorization. If your cluster is secured you will need to satisfy access, authorization, and SSL requirements.
Create Schema Entity
POST: http://hdf.cloudera.com:7788/api/v1/schemaregistry/schemas
BODY:
{"name":"test","type":"avro","schemaGroup":"test","description":"Testing creating a schema from API","evolve":true,"compatibility":"BACKWARD"}
Add Schema To Entity
POST: http://hdf.cloudera.com:7788/api/v1/schemaregistry/schemas/test/versions?branch=MASTER
BODY:
{"schemaText":"{ \"name\" : \"test\", \"type\" : \"record\", \"fields\" : [ { \"name\" : \"column0\", \"type\" : \"string\" }, { \"name\" : \"column1\", \"type\" : \"string\" }, { \"name\" : \"column2\", \"type\" : \"string\" } ]}","description":"Testing creating a schema from API"}
** Note: the string escaped format above is required. If your schema contains line breaks, those should also be replaced with \n.
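If you build the body in code instead of by hand, the escaping takes care of itself when you serialize twice. Here is a minimal Python sketch; the function name is made up. The schema is dumped to a string first, then that string is placed in the outer JSON object, which escapes the inner quotes and any line breaks.

```python
import json


def build_version_body(schema: dict, description: str) -> str:
    """Serialize the schema to text, then embed that text as a JSON string."""
    schema_text = json.dumps(schema)  # inner document, becomes a plain string
    return json.dumps({"schemaText": schema_text, "description": description})


body = build_version_body(
    {
        "name": "test",
        "type": "record",
        "fields": [{"name": "column0", "type": "string"}],
    },
    "Testing creating a schema from API",
)
```

The resulting body has the same shape as the BODY shown above: schemaText is a single escaped string, not a nested object.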
Delete Schema
DELETE: http://hdf.cloudera.com:7788/api/v1/schemaregistry/schemas/test/
** Thanks to @mvalleavila on this thread for exposing the simple Delete.
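For anyone who prefers a script over Postman, here is a rough Python sketch of the three calls using only the standard library. The URLs and bodies are the ones from this article; the function names and the Content-Type header are my assumptions, and there is no handling for secured clusters. Nothing is sent until you pass a request to send().

```python
import json
import urllib.request

# Sandbox URL from this article; change it for your own cluster.
BASE_URL = "http://hdf.cloudera.com:7788/api/v1/schemaregistry"


def _json_request(method, url, payload=None):
    """Build (but do not send) a request; Content-Type header is an assumption."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def create_entity_request(name, description):
    payload = {"name": name, "type": "avro", "schemaGroup": name,
               "description": description, "evolve": True,
               "compatibility": "BACKWARD"}
    return _json_request("POST", f"{BASE_URL}/schemas", payload)


def add_version_request(name, schema, description):
    payload = {"schemaText": json.dumps(schema), "description": description}
    url = f"{BASE_URL}/schemas/{name}/versions?branch=MASTER"
    return _json_request("POST", url, payload)


def delete_schema_request(name):
    return _json_request("DELETE", f"{BASE_URL}/schemas/{name}/")


def send(request):
    """Execute a prepared request and return the response body as text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

A typical test cycle would be send(create_entity_request(...)), then send(add_version_request(...)), then send(delete_schema_request(...)) to clean up.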
The information above should be enough to get you started using the Schema Registry API via command line, curl, or any other method you prefer. In my next article, I will go into deeper detail in how I fit these concepts into my NiFi Flow.
12-30-2019
08:21 AM
If the output is still comma separated, then replace text did not work. The solution above is how you do it. You may need to experiment with other replacements. For example maybe \t instead of [tab]. Again, the method above is what you need. Here is a similar post that includes some of the other forms of "tab"... https://community.cloudera.com/t5/Support-Questions/How-best-to-replace-all-TABs-t-by-COMMAs-in-the-content-of-a/m-p/225115
12-30-2019
06:43 AM
2 Kudos
The alert is just indicating you need to enter a value where one is missing. Scroll down inside of the page (not the whole page), and you should be able to find the missing value. Another fast way to get where you want is to search/filter for nifi.toolkit.tls.token which will take you directly to it. Once the alert/error is satisfied the NEXT button on the bottom of the page will work again. If this reply answers your questions please mark this reply as Solution.
12-30-2019
06:30 AM
A ReplaceText processor can change from Comma Separated to Tab Separated. This would be the easiest option. Configure it with [,] in Search Value and [tab] in Replacement Value.
12-30-2019
06:23 AM
1 Kudo
Since I am not working in your Use Case, I tried to show as much of mine as I could; I think you understand the concept. So the next trick is just getting the regex matched to your string. Try a tool here: https://regex101.com In this tool I quickly matched a regex to the 4th column:
.*?,.*?,.*?,(.*?),.*
to:
test,test,test,25.0,test,test,test
For your example, the above should work. It is not necessary to parse past the column you need, so the .* on the end should pick up the entire rest of the line.
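If you want to try the same pattern outside of regex101, here is a small Python sketch. NiFi uses Java regular expressions, but this particular pattern behaves the same way in Python's re module. The helper name is made up for illustration.

```python
import re

# Three lazy "anything up to a comma" groups, then capture the 4th column.
FOURTH_COLUMN = re.compile(r".*?,.*?,.*?,(.*?),.*")


def fourth_column(line: str):
    """Return the 4th comma-separated value, or None if the line is too short."""
    match = FOURTH_COLUMN.match(line)
    return match.group(1) if match else None


value = fourth_column("test,test,test,25.0,test,test,test")
```

Here value is "25.0". A line with fewer than five columns returns None, because the pattern needs a comma after the captured column.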