Posts: 1973
Kudos Received: 1225
Solutions: 124
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1999 | 04-03-2024 06:39 AM |
| | 3168 | 01-12-2024 08:19 AM |
| | 1725 | 12-07-2023 01:49 PM |
| | 2505 | 08-02-2023 07:30 AM |
| | 3517 | 03-29-2023 01:22 PM |
10-14-2016
02:39 PM
1 Kudo
All the slides are here: http://hadoopsummit.org/melbourne/agenda/ and the YouTube channel is here: https://www.youtube.com/channel/UCAPa-K_rhylDZAUHVxqqsRA
10-14-2016
01:24 PM
2 Kudos
Take a look at NiFi Deploy: https://github.com/aperepel/nifi-api-deploy. You can also have a git script check in every XML file you see in the conf/archive directory, or manually save templates when your work is done at the end of the day and check in those XML files.
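A minimal sketch of such a check-in script, in Python. The function names, the `*.xml*` pattern (chosen so that compressed archives such as `.xml.gz` are also picked up) and the idea of copying into a separate repository directory are all assumptions for illustration, not something NiFi or nifi-api-deploy provides.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def stage_archived_flows(archive_dir: Path, repo_dir: Path,
                         pattern: str = "*.xml*") -> List[Path]:
    """Copy matching files from NiFi's conf/archive into repo_dir.

    The default pattern is an assumption; adjust it to the file names
    you actually see in your archive directory.
    """
    copied = []
    for flow_file in sorted(archive_dir.glob(pattern)):
        target = repo_dir / flow_file.name
        shutil.copy2(flow_file, target)  # keeps the original timestamps
        copied.append(target)
    return copied


def commit_flows(repo_dir: Path, message: str) -> None:
    """Stage and commit everything in repo_dir.

    Assumes repo_dir is already a git repository and git is on the PATH.
    """
    subprocess.run(["git", "add", "--all"], cwd=repo_dir, check=True)
    # `git diff --cached --quiet` exits non-zero when something is staged.
    staged = subprocess.run(["git", "diff", "--cached", "--quiet"], cwd=repo_dir)
    if staged.returncode != 0:
        subprocess.run(["git", "commit", "-m", message], cwd=repo_dir, check=True)
```

You could call `stage_archived_flows(...)` followed by `commit_flows(...)` from cron or at the end of the day, pointing `archive_dir` at your own conf/archive path.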
10-14-2016
01:21 PM
We recommend more than 8 GB. When did you download the sandbox? Getting the newest one may help.
10-14-2016
01:20 PM
Does NiFi support listening to remote syslogs and not just the one running on its local machine? I know I could install NiFi or MiNiFi on all the machines I want to listen to, or write something to forward the syslog via REST or TCP, but does it support rsyslog over a network?
Labels: Apache MiNiFi, Apache NiFi
10-13-2016
11:48 PM
You need to create an SSL context. See how to set up SSL here: https://community.hortonworks.com/articles/61180/streaming-ingest-of-google-sheets-into-a-connected.html
10-13-2016
12:34 AM
Check your YARN resources; I think YARN does not have enough resources to run the job. Spark may have to wait for Zeppelin and other YARN apps to finish. Note: the Zeppelin YARN app keeps running. How did you run spark-submit? --master yarn is needed. Is this a Scala job? Check out: https://community.hortonworks.com/content/idea/29810/spark-configuration-best-practices.html. Also check the Spark logs, the YARN UI and the Spark History UI.
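As a rough illustration of a submit that targets YARN, here is a small Python helper that assembles the command line. The jar name, class name and resource values are hypothetical placeholders; the options used (`--master`, `--deploy-mode`, `--executor-memory`, `--num-executors`, `--class`) are standard spark-submit options, but the right values depend on what your YARN queue can actually grant.

```python
import shlex
from typing import List, Optional


def build_spark_submit(app: str,
                       main_class: Optional[str] = None,
                       deploy_mode: str = "client",
                       executor_memory: str = "2g",
                       num_executors: int = 2) -> List[str]:
    """Assemble a spark-submit command that runs on YARN."""
    cmd = [
        "spark-submit",
        "--master", "yarn",  # without this the job does not go to YARN
        "--deploy-mode", deploy_mode,
        "--executor-memory", executor_memory,
        "--num-executors", str(num_executors),
    ]
    if main_class:
        # Only needed for JVM (Scala/Java) jobs.
        cmd += ["--class", main_class]
    cmd.append(app)
    return cmd


if __name__ == "__main__":
    # Hypothetical jar and class, shown only to print the resulting command.
    print(shlex.join(build_spark_submit("my-job.jar", main_class="com.example.MyJob")))
```

Keeping the executor count and memory small makes it easier to tell whether the job is simply waiting behind Zeppelin for resources.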
10-13-2016
12:34 AM
Hit refresh and look at data provenance. You can see the numbers in the queues if things are still processing.
10-12-2016
08:09 PM
3 Kudos
You could use the DistributedMapCache.

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.PutDistributedMapCache/index.html
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.distributed.cache.server.map.DistributedMapCacheServer/index.html
https://community.hortonworks.com/questions/35223/distributedmapcacheclientservice-nifi-wecrawlerxml.html
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.distributed.cache.client.DistributedMapCacheClientService/index.html
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.FetchDistributedMapCache/index.html

That's pretty easy since you are just using a date.

http://funnifi.blogspot.com/2016/04/inspecting-your-nifi.html

I also like storing that in HBase, an RDBMS, or a small in-memory database like Redis, Ignite or Geode, but that's more work and another step.
10-12-2016
03:33 PM
3 Kudos
Often lines of business, individual users or shared teams will use online Google Sheets to share spreadsheet and tabular data among teams or with outside vendors. It's quick and easy to add sheets and store your data in Google Drive as spreadsheets. Often you will want to consolidate, federate, analyze, enrich and use this data for reporting and dashboards throughout your organization. An easy way to do that is to read in the data using Google's Sheets API. This is a standard HTTPS REST API that returns clean JSON data.

I created a simple Google Sheet to test ingesting a Google Sheet with HDF. You will need to enable the Google Sheets API in the Google APIs Console. You must be logged into Google and have a Google Account (use the one where you created your Google Spreadsheets).

Google Documentation

Google provides a few quickstarts that you can use to ingest this data: https://developers.google.com/sheets/quickstart/js or https://developers.google.com/sheets/quickstart/python. I chose to ingest this data the easiest way, with a simple REST call from NiFi.

Testing Your Queries in Google's API Explorer

To test your queries and get your exact URL, go to Google's API Explorer: https://developers.google.com/apis-explorer/#p/sheets/v4/

GET https://sheets.googleapis.com/v4/spreadsheets/1sbMyDocID?includeGridData=true&key=MYKEYISFROMGOOGLE

Where 1sb… is the document ID, which comes from the address you see on your Google Sheet page, like so: https://docs.google.com/spreadsheets/d/1UMyDocumentId/edit#g.

Calling the API From HDF 2.0

The one thing you will need is to set up a StandardSSLContextService to read in HTTPS data. You will need to grab the truststore file cacerts for the JRE that NiFi is using to run. By default the truststore password is changeit. You really should change it.

Once you have an SSL configuration set up, you can do a GetHTTP. You add in the Google Sheets API URL that includes the Sheet ID. I also set the User Agent, Accept Content-type and Follow Redirects = True.

Now that we have SSL enabled, we can make our call to Google. The flow is pretty simple.

Now that I have ingested the Google Sheet, I can store it as JSON in my data lake. You could process this in HDF in many ways, including taking out fields, enriching with other data sources, converting to Avro or ORC, or storing in a Hive table, Phoenix or HBase.

You have now ingested Google Sheet data. Determining what you want to do with it and parsing out the JSON is a fun exercise. You can use an EvaluateJsonPath processor in Apache NiFi to pull out the fields you want. Inside that processor you add a field and then a JsonPath expression as its value, like so: $.entities.media[0].media_url

HDF 2.0 Diagram Overview

Reference:
https://community.hortonworks.com/articles/59349/hdf-20-flow-for-ingesting-real-time-tweets-from-st.html
http://jsonpath.com/
https://blogs.apache.org/nifi/entry/indexing_tweets_with_nifi_and
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.EvaluateJsonPath/
https://community.hortonworks.com/questions/21011/how-i-extract-attribute-from-json-file-using-nifi.html
https://jsonpath.curiousconcept.com/
https://developers.google.com/sheets/guides/authorizing
https://codelabs.developers.google.com/codelabs/sheets-api/#0
https://developers.google.com/sheets/samples/
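If you want to try the same REST call outside of NiFi before building the flow, it can be sketched in plain Python. The function names are made up for illustration, and the response layout assumed in `rows_from_grid` (sheets, then data, then rowData, then values, with the text in formattedValue) is my reading of the v4 response with includeGridData=true, so check it against the JSON you actually get back.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://sheets.googleapis.com/v4/spreadsheets"


def build_sheet_url(spreadsheet_id, api_key):
    """Build the GET URL shown in the article for one spreadsheet."""
    query = urllib.parse.urlencode({"includeGridData": "true", "key": api_key})
    return f"{API_ROOT}/{urllib.parse.quote(spreadsheet_id)}?{query}"


def fetch_sheet(spreadsheet_id, api_key):
    """Call the API over HTTPS and return the parsed JSON document."""
    request = urllib.request.Request(
        build_sheet_url(spreadsheet_id, api_key),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def rows_from_grid(spreadsheet):
    """Flatten the grid data into a list of rows of cell text.

    Assumed layout: sheets[].data[].rowData[].values[].formattedValue.
    Cells without a formattedValue come back as None.
    """
    rows = []
    for sheet in spreadsheet.get("sheets", []):
        for grid in sheet.get("data", []):
            for row in grid.get("rowData", []):
                cells = row.get("values", [])
                rows.append([cell.get("formattedValue") for cell in cells])
    return rows
```

Calling `rows_from_grid(fetch_sheet(doc_id, key))` with your own document ID and API key should give you the cell text row by row, which is roughly what you would otherwise pull out with EvaluateJsonPath.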
10-11-2016
09:55 PM
The default JKS/TLS password is changeit.