10-14-2016 02:39 PM · 1 Kudo
All the slides are here: http://hadoopsummit.org/melbourne/agenda/ and the sessions are on the YouTube channel: https://www.youtube.com/channel/UCAPa-K_rhylDZAUHVxqqsRA
10-20-2016 07:53 PM
It worked when I changed my ReplaceText processor to the format below. I think this has to do with the urlencoded MIME type: remember, we cannot send special characters like @ literally, so I had to send it as %40. This is how the ReplaceText processor looks: grant_type=password&client_id=6e880286&client_secret=d12f0f6d41cfe81fcfc122e3fc17a833&username=Saikrishna.Tarapareddy%40purina.nestle.com&password=7heStuhuwa I also had mime.type = application/x-www-form-urlencoded in my UpdateAttribute processor. Thank you all.
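The same percent-encoding can be reproduced outside NiFi as a sanity check. A minimal Python sketch (with placeholder credentials, not the real ones above) showing how @ becomes %40 in an application/x-www-form-urlencoded body:

```python
from urllib.parse import urlencode

# Placeholder credentials for illustration only.
params = {
    "grant_type": "password",
    "client_id": "my-client-id",
    "client_secret": "my-client-secret",
    "username": "first.last@example.com",
    "password": "my-password",
}

# urlencode percent-encodes reserved characters such as @ (-> %40),
# which is exactly what the ReplaceText body must contain.
body = urlencode(params)
print(body)
```

Anything that builds the body this way will match what the token endpoint expects for this MIME type.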
04-10-2017 12:07 PM
@glupu: I tried all the network settings. I changed to NAT, to Host Only, and to Bridged. None of them works 😞
08-12-2019 03:23 AM
Nifi_AutoDeploymentScript/ is really helpful in workflow deployment. However, I am looking for more details on:
1. controller services
2. reading the variables of the source process group and deploying only those variables per environment
3. reading JSON attributes
10-12-2016 03:33 PM · 3 Kudos
Often lines of business, individual users, or shared teams use online Google Sheets to share spreadsheet and tabular data among teams or with outside vendors. It's quick and easy to add sheets and store your data in Google Drive as spreadsheets. Often you will want to consolidate, federate, analyze, enrich, and use this data for reporting and dashboards throughout your organization. An easy way to do that is to read the data in using Google's Sheets API, a standard SSL HTTP REST API that returns clean JSON. I created a simple Google Sheet to test ingesting a Google Sheet with HDF.

You will need to enable the Google Sheets API in the Google APIs Console. You must be logged into Google and have a Google Account (use the one where you created your Google Spreadsheets).

Google Documentation

Google provides a few quickstarts that you can use to ingest this data: https://developers.google.com/sheets/quickstart/js or https://developers.google.com/sheets/quickstart/python. I chose the easiest way to ingest this data: a simple REST call from NiFi.

Testing Your Queries in Google's API Explorer

To test your queries and get your exact URL, go to Google's API Explorer: https://developers.google.com/apis-explorer/#p/sheets/v4/

GET https://sheets.googleapis.com/v4/spreadsheets/1sbMyDocID?includeGridData=true&key=MYKEYISFROMGOOGLE

Here 1sb… is the document ID that appears in your Google Sheet's page URL, like so: https://docs.google.com/spreadsheets/d/1UMyDocumentId/edit#g

Calling the API From HDF 2.0

The one thing you will need is to set up a StandardSSLContextService to read HTTPS data. You will need to grab the truststore file cacerts for the JRE that NiFi is using to run. By default the truststore password is changeit; you really should change it. Once you have an SSL configuration set up, you can use a GetHTTP processor with the Sheets Google API URL that includes the sheet ID. I also set the User Agent, the Accept Content-Type, and Follow Redirects = true.

Now that we have SSL enabled, we can make our call to Google. The flow below is pretty simple. Once I have ingested the Google Sheet, I can store it as JSON in my data lake. You could process this in HDF in many ways, including taking out fields, enriching with other data sources, converting to Avro or ORC, or storing it in a Hive table, Phoenix, or HBase.

You have now ingested Google Sheet data. Deciding what to do with it and parsing out the JSON is a fun exercise. You can use an EvaluateJsonPath processor in Apache NiFi to pull out the fields you want: inside that processor you add a property and a JsonPath value such as $.entities.media[0].media_url.

HDF 2.0 Diagram Overview

Reference:
https://community.hortonworks.com/articles/59349/hdf-20-flow-for-ingesting-real-time-tweets-from-st.html
http://jsonpath.com/
https://blogs.apache.org/nifi/entry/indexing_tweets_with_nifi_and
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.EvaluateJsonPath/
https://community.hortonworks.com/questions/21011/how-i-extract-attribute-from-json-file-using-nifi.html
https://jsonpath.curiousconcept.com/
https://developers.google.com/sheets/guides/authorizing
https://codelabs.developers.google.com/codelabs/sheets-api/#0
https://developers.google.com/sheets/samples/
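As a sanity check outside NiFi, the GET URL that GetHTTP will call can be sketched in Python. The document ID and API key below are placeholders; substitute your own:

```python
from urllib.parse import urlencode

def sheets_url(doc_id: str, api_key: str) -> str:
    """Build the Sheets v4 REST URL that the GetHTTP processor would call."""
    query = urlencode({"includeGridData": "true", "key": api_key})
    return f"https://sheets.googleapis.com/v4/spreadsheets/{doc_id}?{query}"

# Placeholder values for illustration.
print(sheets_url("1sbMyDocID", "MYKEYISFROMGOOGLE"))
# -> https://sheets.googleapis.com/v4/spreadsheets/1sbMyDocID?includeGridData=true&key=MYKEYISFROMGOOGLE
```

Fetching this URL from Python (for example with urllib.request) would hit the same SSL requirement NiFi does, which is why the StandardSSLContextService with the JRE's cacerts truststore is needed.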
04-06-2017 12:07 PM · 1 Kudo
This problem has been happening on our side for many months as well, with both Spark 1 and Spark 2, both while running jobs in the shell and in Python notebooks. It is very easy to reproduce: just open a notebook and let it run for a couple of hours, or do some simple DataFrame operations in an infinite loop. There seems to be something fundamentally wrong with the timeout configuration in the core of Spark. We will open a case for it, because the problem persists no matter what configurations we have tried.
10-07-2016 03:39 PM · 3 Kudos
With NiFi 1.0.0 I ingested a lot of image data from drones, mostly to get metadata such as geolocation. I also ingested a resized version of each image in case I wanted to use it, and I found a use for it: I am serving it very simply with Spring on a plain HTML page. So I wrote a quick Java program to pull out the fields I had stored in Phoenix (from the metadata) and display the image. I could have streamed it out of HDFS using the HDFS libraries to read the file and then stream it.

sql = "select datekey, fileName, gPSAltitude, gPSLatitude, gPSLongitude, orientation, geolat, geolong, inception from dronedata1 order by datekey asc";
out.append(STATIC_HEADER);
PreparedStatement ps = connection.prepareStatement(sql);
ResultSet res = ps.executeQuery();
while (res.next()) {
    try {
        out.append("<br><br>\n<table width=100%><tr><td valign=top><img src=\"");
        out.append("http://tspanndev10.field.hortonworks.com:50070/webhdfs/v1/drone/")
           .append(res.getString("fileName")).append("?op=OPEN\"></td>");
        out.append("<td valign=top>Date: ").append(res.getString("datekey"));
        out.append("\n<br>Lat: ").append(res.getString("geolat"));
        out.append("\n<br>Long: ").append(res.getString("geolong"));
        out.append("\n<br><br>\n</td></tr></table>\n");
    } catch (Exception e) {
        e.printStackTrace();
    }
}

It was a lot easier to use the built-in WebHDFS to display an image: wrapping the Web API call to the image file in an HTML IMG SRC tag loads our image. http://node1:50070/webhdfs/v1/drone/Bebop2_20160920083655-0400.jpg?op=OPEN It's pretty simple, and you can use this with a MEAN application, Python Flask, or your non-JVM front end of choice. Now you have a solid distributed host for your images. I recommend this only for internal sites and public images; exposing this data publicly on the cloud is dangerous!
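The WebHDFS URL pattern above is easy to generate from any front end, not just Java. A minimal Python sketch (the namenode hostname and HDFS path are placeholders):

```python
def webhdfs_img_tag(namenode: str, hdfs_path: str) -> str:
    """Wrap a WebHDFS OPEN call in an HTML img tag, as the Java code above does.

    WebHDFS serves the file bytes directly for ?op=OPEN, so the browser
    can load the image straight from the namenode's REST endpoint.
    """
    url = f"http://{namenode}:50070/webhdfs/v1{hdfs_path}?op=OPEN"
    return f'<img src="{url}">'

# Placeholder host and path for illustration.
print(webhdfs_img_tag("node1", "/drone/Bebop2_20160920083655-0400.jpg"))
# -> <img src="http://node1:50070/webhdfs/v1/drone/Bebop2_20160920083655-0400.jpg?op=OPEN">
```

From Flask or a MEAN app you would emit this tag into the page exactly as the Java StringBuilder loop does.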
12-07-2016 10:09 PM · 1 Kudo
https://github.com/apsaltis/nifi-soap was updated 7 hours ago with the comment "Updating to use NiFi 1.1.0". Now it runs without the error: nifi-app.log: java.lang.NoSuchMethodError: org.apache.nifi.processors.soap.GetSOAP.getLogger()Lorg/apache/nifi/logging/ProcessorLog;
10-05-2016 01:39 PM · 5 Kudos
1. Acquire an EDI file (GetFile, GetFTP, GetHTTP, GetSFTP, Fetch...)
2. Install the open-source nifi-edireader on NiFi 1.0.0:
Download https://github.com/BerryWorksSoftware/edireader
Maven-install BerryWorks EDIReader
Download https://github.com/mrcsparker/nifi-edireader-bundle
Maven-package nifi-edireader (requires Maven 3.3 or newer; you may have to download and install it separately from the standard Linux package)
Copy nifi-edireader-nar/target/nifi-edireader-nar-0.0.1.nar to your NIFI/lib
Restart the NiFi service
3. Add the EdiXML processor and connect it from the EDI file input
4. Add extra processing, conversion, or routing (TransformXML with XSLT, or EvaluateXPath) to convert to JSON
5. Land the data in HDFS (PutHDFS)
6. Use the web form linked below to generate a test EDI file:
ISA*00* *00* *ZZ*SENDER ID *ZZ*RECEIVER ID *010101*0101*U*00401*000000001*0*T*!
GS*IN*SENDER ID*APP RECEIVER*01010101*01010101*1*X*004010
ST*810*0001
BIG*20021208*00001**A999
N1*ST*Timothy Spann*9*122334455
N3*115 xxx ave
N4*xxxtown*nj*08520
N1*BT*Hortonworks*9*122334455
N3*5470 GREAT AMERICA PARKWAY
N4*santa clara*CA*95054
ITD*01*3*2**30**30*****60
FOB*PP
IT1**1*EA*200**UA*EAN
PID*F****Lamp
IT1**4*EA*50**UA*EAN
PID*F****Chair
TDS*2000
CAD*****Routing
ISS*30*CA
CTT*50
SE*19*0001
GE*1*1
IEA*1*000000001
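To get a feel for the structure the nifi-edireader bundle is parsing, here is a minimal sketch (my own illustration, not part of the bundle) that splits the X12 sample above into segments and elements, assuming newline-terminated segments and * as the element separator:

```python
def parse_x12(text: str) -> list[list[str]]:
    """Split newline-terminated X12 segments into lists of elements.

    Each segment (e.g. BIG*20021208*00001**A999) becomes a list whose
    first element is the segment ID; empty elements are preserved.
    """
    segments = []
    for line in text.strip().splitlines():
        if line:
            segments.append(line.split("*"))
    return segments

sample = """ST*810*0001
BIG*20021208*00001**A999
TDS*2000"""

for seg in parse_x12(sample):
    print(seg[0], seg[1:])
# ST ['810', '0001']
# BIG ['20021208', '00001', '', 'A999']
# TDS ['2000']
```

This is only the flat segment level; the loop/hierarchy grouping (N1, IT1, PID loops) visible in the XML below is what EDIReader adds on top.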
7. The EDI file converted to XML:
<?xml version="1.0" encoding="UTF-8"?>
<ediroot>
<interchange Standard="ANSI X.12"
AuthorizationQual="00"
Authorization=" "
SecurityQual="00"
Security=" "
Date="010101"
Time="0101"
StandardsId="U"
Version="00401"
Control="000000001"
AckRequest="0"
TestIndicator="T">
<sender>
<address Id="SENDER ID " Qual="ZZ"/>
</sender>
<receiver>
<address Id="RECEIVER ID " Qual="ZZ"/>
</receiver>
<group GroupType="IN"
ApplSender="SENDER ID"
ApplReceiver="APP RECEIVER"
Date="01010101"
Time="01010101"
Control="1"
StandardCode="X"
StandardVersion="004010">
<transaction DocType="810" Name="Invoice" Control="0001">
<segment Id="BIG">
<element Id="BIG01">20021208</element>
<element Id="BIG02">00001</element>
<element Id="BIG04">A999</element>
</segment>
<loop Id="N1">
<segment Id="N1">
<element Id="N101">ST</element>
<element Id="N102">Timothy Spann</element>
<element Id="N103">9</element>
<element Id="N104">122334455</element>
</segment>
<segment Id="N3">
<element Id="N301">115 xxx ave</element>
</segment>
<segment Id="N4">
<element Id="N401">xxxstown</element>
<element Id="N402">nj</element>
<element Id="N403">08520</element>
</segment>
</loop>
<loop Id="N1">
<segment Id="N1">
<element Id="N101">BT</element>
<element Id="N102">Hortonworks</element>
<element Id="N103">9</element>
<element Id="N104">122334455</element>
</segment>
<segment Id="N3">
<element Id="N301">5470 GREAT AMERICA PARKWAY</element>
</segment>
<segment Id="N4">
<element Id="N401">santa clara</element>
<element Id="N402">CA</element>
<element Id="N403">95054</element>
</segment>
</loop>
<segment Id="ITD">
<element Id="ITD01">01</element>
<element Id="ITD02">3</element>
<element Id="ITD03">2</element>
<element Id="ITD05">30</element>
<element Id="ITD07">30</element>
<element Id="ITD12">60</element>
</segment>
<segment Id="FOB">
<element Id="FOB01">PP</element>
</segment>
<loop Id="IT1">
<segment Id="IT1">
<element Id="IT102">1</element>
<element Id="IT103">EA</element>
<element Id="IT104">200</element>
<element Id="IT106">UA</element>
<element Id="IT107">EAN</element>
</segment>
<loop Id="PID">
<segment Id="PID">
<element Id="PID01">F</element>
<element Id="PID05">Lamp</element>
</segment>
</loop>
</loop>
<loop Id="IT1">
<segment Id="IT1">
<element Id="IT102">4</element>
<element Id="IT103">EA</element>
<element Id="IT104">50</element>
<element Id="IT106">UA</element>
<element Id="IT107">EAN</element>
</segment>
<loop Id="PID">
<segment Id="PID">
<element Id="PID01">F</element>
<element Id="PID05">Chair</element>
</segment>
</loop>
</loop>
<segment Id="TDS">
<element Id="TDS01">2000</element>
</segment>
<segment Id="CAD">
<element Id="CAD05">Routing</element>
</segment>
<loop Id="ISS">
<segment Id="ISS">
<element Id="ISS01">30</element>
<element Id="ISS02">CA</element>
</segment>
</loop>
<segment Id="CTT">
<element Id="CTT01">50</element>
</segment>
</transaction>
</group>
</interchange>
</ediroot>
Resources
https://github.com/mrcsparker/nifi-edireader-bundle
https://github.com/BerryWorksSoftware/edireader
https://en.wikipedia.org/wiki/Electronic_data_interchange
https://en.wikipedia.org/wiki/EDIFACT
https://en.wikipedia.org/wiki/FORTRAS
http://databene.org/edifatto.html
https://sourceforge.net/projects/edifatto/
https://secure.edidev.net/edidev-ca/samples/vbNetGen/WebFrmNetGen.aspx (Generate example EDI)