03-11-2018
01:43 AM
3 Kudos
Big Data DevOps: Part 2: Schemas, Schemas, Schemas. Know Your Records, Know Your DataTypes, Know Your Fields, Know Your Data. Since we can process records in Apache NiFi, Streaming Analytics Manager, Apache Kafka, and any other tool that can work with a schema, we have a real need for a Schema Registry. I have mentioned them before. One important thing is being able to automate the management of schemas. Today we will list and export them for backup and migration purposes. We will also cover how to upload new schemas and new versions of existing schemas. Backing up schemas with Apache NiFi 1.5+ is easy.

Backup All Schemas
The flow uses these steps:

GetHTTP: get the list of schemas from the Schema Registry via GET
SplitJson: turn the list into individual records
EvaluateJsonPath: get the schema name
InvokeHTTP: get the schema body
EvaluateJsonPath: turn the schema text into a separate flow file
Rename and save both the full JSON record from the registry and the schema text only

(Screenshots in the original post: the NiFi flow, the initial call to list all schemas, getting the schema name, an example schema with text, an example of JSON schema text, building a new flow file from the schema text JSON, getting the latest version of the schema text by name, the list returned, and the Swagger documentation for SR. Example flow: backup-schema.xml)

Schema List JSON Formatting

"entities" : [ {
    "schemaMetadata" : {
      "type" : "avro",
      "schemaGroup" : "Kafka",
      "name" : "adsb",
      "description" : "adsb",
      "compatibility" : "BACKWARD",
      "validationLevel" : "ALL",
      "evolve" : true
    },
    "id" : 3,
    "timestamp" : 1520460239420
} ]

Get Schema List REST URL (GET)

http://server:7788/api/v1/schemaregistry/schemas

Get Schema Body REST URL (GET)

http://server:7788/api/v1/schemaregistry/schemas/${schema}/versions/latest?branch=MASTER

See: https://community.hortonworks.com/articles/177301/big-data-devops-apache-nifi-flow-versioning-and-au.html

If you wish, you can use the Confluent-style API against this Schema Registry and against the Confluent Schema Registry. It is slightly different, but it is easy to change our REST calls to process it.

Swagger Docs

http://YourHWXRegistry:7788/swagger#!/4._Confluent_Schema_Registry_compatible_API/getSubjects

Hortonworks Schema Registry from HDF 3.1: https://community.hortonworks.com/articles/171893/hdf-31-executing-apache-spark-via-executesparkinte-1.html
03-09-2018
09:20 PM
I will be adding an article on using the Schema Registry's REST API. If enough people are interested in this feature (https://github.com/Chaffelson/nipyapi/issues/54), it's possible Dan will add it. Also, if anyone in the community is interested, this is a good one to start with.
03-09-2018
03:17 PM
12 Kudos
A number of people have asked at the Future of Data Meetup (https://www.meetup.com/futureofdata-princeton/) how they can automate deployment, versioning, and other operations around Apache NiFi. Fortunately, there is a project that is current, very easy to use, and now available in version 0.8. The tool you need is here: https://github.com/Chaffelson/nipyapi

NiPyApi does it all. As you can see in the screenshot above, you can even run it in Apache Zeppelin if it's installed on your HDP cluster.

Installation and Setup

This is very simple. I have installed it on HDP and HDF nodes on CentOS 7 and on my Mac, in both Python 2.7 and 3.x. You will need Python and pip to get started. Most people can quickly install it via:

pip install nipyapi

If not, you can clone the repo and then:

pip install -r requirements.txt && pip install -r requirements_dev.txt

To quote @Dan Chaffelson: "NiPyApi v0.8.0 released! The #Python client for @apachenifi now supports secured environments, versioning import/export, better templates, better documentation, more demos, and NiFi version backtesting." Release notes: http://nipyapi.readthedocs.io/en/latest/history.html#id1

In my opinion, the best thing to come out of the UK since Dr. Who. If you have been using an older version, there are some big changes now, so buckle up, update, and try out the new demos.

Set up some configuration:
import nipyapi
nipyapi.config.nifi_config.host = 'http://localhost:8080/nifi-api'
nipyapi.config.registry_config.host = 'http://localhost:18080/nifi-registry-api'
from nipyapi.demo.console import *
print(nipyapi.system.get_system_diagnostics())
You can now automate all the things! Script it all. You will be tempted to call these scripts from Apache NiFi, as I am, but perhaps then we have gone too far. This may cause some mental recursion that I am not prepared for. I installed and updated this in both Python 2 and Python 3. Let's update on CentOS 7:

pip3 install nipyapi -U
Collecting nipyapi
Downloading nipyapi-0.8.0-py2.py3-none-any.whl (733kB)
100% |████████████████████████████████| 737kB 1.1MB/s
Collecting docker (from nipyapi)
Downloading docker-3.1.1-py2.py3-none-any.whl (121kB)
100% |████████████████████████████████| 122kB 5.9MB/s
Collecting lxml (from nipyapi)
Downloading lxml-4.1.1-cp36-cp36m-manylinux1_x86_64.whl (5.6MB)
100% |████████████████████████████████| 5.6MB 178kB/s
Collecting deepdiff (from nipyapi)
Downloading deepdiff-3.3.0-py3-none-any.whl
Collecting ruamel.yaml==0.14.12 (from nipyapi)
Downloading ruamel.yaml-0.14.12-cp36-cp36m-manylinux1_x86_64.whl (542kB)
100% |████████████████████████████████| 552kB 1.5MB/s
Requirement already up-to-date: requests[security] in /usr/lib/python3.6/site-packages (from nipyapi)
Requirement already up-to-date: urllib3 in /usr/lib/python3.6/site-packages (from nipyapi)
Collecting six (from nipyapi)
Using cached six-1.11.0-py2.py3-none-any.whl
Collecting docker-pycreds>=0.2.2 (from docker->nipyapi)
Downloading docker_pycreds-0.2.2-py2.py3-none-any.whl
Collecting websocket-client>=0.32.0 (from docker->nipyapi)
Downloading websocket_client-0.47.0-py2.py3-none-any.whl (200kB)
100% |████████████████████████████████| 204kB 3.0MB/s
Collecting jsonpickle (from deepdiff->nipyapi)
Downloading jsonpickle-0.9.6.tar.gz (67kB)
100% |████████████████████████████████| 71kB 8.3MB/s
Requirement already up-to-date: chardet<3.1.0,>=3.0.2 in /usr/lib/python3.6/site-packages (from requests[security]->nipyapi)
Requirement already up-to-date: certifi>=2017.4.17 in /usr/lib/python3.6/site-packages (from requests[security]->nipyapi)
Requirement already up-to-date: idna<2.7,>=2.5 in /usr/lib/python3.6/site-packages (from requests[security]->nipyapi)
Collecting pyOpenSSL>=0.14; extra == "security" (from requests[security]->nipyapi)
Downloading pyOpenSSL-17.5.0-py2.py3-none-any.whl (53kB)
100% |████████████████████████████████| 61kB 5.2MB/s
Collecting cryptography>=1.3.4; extra == "security" (from requests[security]->nipyapi)
Downloading cryptography-2.1.4-cp36-cp36m-manylinux1_x86_64.whl (2.2MB)
100% |████████████████████████████████| 2.2MB 409kB/s
Collecting cffi>=1.7; platform_python_implementation != "PyPy" (from cryptography>=1.3.4; extra == "security"->requests[security]->nipyapi)
Downloading cffi-1.11.5-cp36-cp36m-manylinux1_x86_64.whl (421kB)
100% |████████████████████████████████| 430kB 2.5MB/s
Collecting asn1crypto>=0.21.0 (from cryptography>=1.3.4; extra == "security"->requests[security]->nipyapi)
Downloading asn1crypto-0.24.0-py2.py3-none-any.whl (101kB)
100% |████████████████████████████████| 102kB 7.8MB/s
Collecting pycparser (from cffi>=1.7; platform_python_implementation != "PyPy"->cryptography>=1.3.4; extra == "security"->requests[security]->nipyapi)
Downloading pycparser-2.18.tar.gz (245kB)
100% |████████████████████████████████| 256kB 2.5MB/s
Building wheels for collected packages: jsonpickle, pycparser
Running setup.py bdist_wheel for jsonpickle ... done
Stored in directory: /root/.cache/pip/wheels/c8/c5/88/0975a9ef0ae87799d3a4ae54244ca8f76eaaf03395f48a5129
Running setup.py bdist_wheel for pycparser ... done
Stored in directory: /root/.cache/pip/wheels/95/14/9a/5e7b9024459d2a6600aaa64e0ba485325aff7a9ac7489db1b6
Successfully built jsonpickle pycparser
Installing collected packages: six, docker-pycreds, websocket-client, docker, lxml, jsonpickle, deepdiff, ruamel.yaml, nipyapi, pycparser, cffi, asn1crypto, cryptography, pyOpenSSL
Successfully installed asn1crypto-0.24.0 cffi-1.11.5 cryptography-2.1.4 deepdiff-3.3.0 docker-3.1.1 docker-pycreds-0.2.2 jsonpickle-0.9.6 lxml-4.1.1 nipyapi-0.8.0 pyOpenSSL-17.5.0 pycparser-2.18 ruamel.yaml-0.14.12 six-1.11.0 websocket-client-0.47.0
pip2 install nipyapi -U
Collecting nipyapi
Using cached nipyapi-0.8.0-py2.py3-none-any.whl
Collecting lxml (from nipyapi)
Downloading lxml-4.1.1-cp27-cp27mu-manylinux1_x86_64.whl (5.6MB)
100% |████████████████████████████████| 5.6MB 155kB/s
Requirement already up-to-date: requests[security] in /usr/lib/python2.7/site-packages (from nipyapi)
Requirement already up-to-date: urllib3 in /usr/lib/python2.7/site-packages (from nipyapi)
Requirement already up-to-date: deepdiff in /usr/lib/python2.7/site-packages/deepdiff-3.3.0-py2.7.egg (from nipyapi)
Collecting ruamel.yaml==0.14.12 (from nipyapi)
Downloading ruamel.yaml-0.14.12-cp27-cp27mu-manylinux1_x86_64.whl (519kB)
100% |████████████████████████████████| 522kB 1.3MB/s
Requirement already up-to-date: six in /usr/lib/python2.7/site-packages (from nipyapi)
Requirement already up-to-date: docker in /usr/lib/python2.7/site-packages/docker-3.1.1-py2.7.egg (from nipyapi)
Requirement already up-to-date: certifi>=2017.4.17 in /usr/lib/python2.7/site-packages (from requests[security]->nipyapi)
Requirement already up-to-date: chardet<3.1.0,>=3.0.2 in /usr/lib/python2.7/site-packages (from requests[security]->nipyapi)
Requirement already up-to-date: idna<2.7,>=2.5 in /usr/lib/python2.7/site-packages (from requests[security]->nipyapi)
Collecting cryptography>=1.3.4; extra == "security" (from requests[security]->nipyapi)
Downloading cryptography-2.1.4-cp27-cp27mu-manylinux1_x86_64.whl (2.2MB)
100% |████████████████████████████████| 2.2MB 406kB/s
Collecting pyOpenSSL>=0.14; extra == "security" (from requests[security]->nipyapi)
Using cached pyOpenSSL-17.5.0-py2.py3-none-any.whl
Requirement already up-to-date: jsonpickle in /usr/lib/python2.7/site-packages/jsonpickle-0.9.6-py2.7.egg (from deepdiff->nipyapi)
Requirement already up-to-date: ruamel.ordereddict in /usr/lib/python2.7/site-packages/ruamel.ordereddict-0.4.13-py2.7-linux-x86_64.egg (from ruamel.yaml==0.14.12->nipyapi)
Collecting backports.ssl-match-hostname>=3.5 (from docker->nipyapi)
Downloading backports.ssl_match_hostname-3.5.0.1.tar.gz
Requirement already up-to-date: docker-pycreds>=0.2.2 in /usr/lib/python2.7/site-packages/docker_pycreds-0.2.2-py2.7.egg (from docker->nipyapi)
Requirement already up-to-date: ipaddress>=1.0.16 in /usr/lib/python2.7/site-packages/ipaddress-1.0.19-py2.7.egg (from docker->nipyapi)
Requirement already up-to-date: websocket-client>=0.32.0 in /usr/lib/python2.7/site-packages/websocket_client-0.47.0-py2.7.egg (from docker->nipyapi)
Collecting cffi>=1.7; platform_python_implementation != "PyPy" (from cryptography>=1.3.4; extra == "security"->requests[security]->nipyapi)
Downloading cffi-1.11.5-cp27-cp27mu-manylinux1_x86_64.whl (407kB)
100% |████████████████████████████████| 409kB 1.7MB/s
Collecting enum34; python_version < "3" (from cryptography>=1.3.4; extra == "security"->requests[security]->nipyapi)
Downloading enum34-1.1.6-py2-none-any.whl
Collecting asn1crypto>=0.21.0 (from cryptography>=1.3.4; extra == "security"->requests[security]->nipyapi)
Using cached asn1crypto-0.24.0-py2.py3-none-any.whl
Collecting pycparser (from cffi>=1.7; platform_python_implementation != "PyPy"->cryptography>=1.3.4; extra == "security"->requests[security]->nipyapi)
Building wheels for collected packages: backports.ssl-match-hostname
Running setup.py bdist_wheel for backports.ssl-match-hostname ... done
Stored in directory: /root/.cache/pip/wheels/5d/72/36/b2a31507b613967b728edc33378a5ff2ada0f62855b93c5ae1
Successfully built backports.ssl-match-hostname
Installing collected packages: lxml, ruamel.yaml, nipyapi, pycparser, cffi, enum34, asn1crypto, cryptography, pyOpenSSL, backports.ssl-match-hostname
Found existing installation: lxml 3.2.1
Uninstalling lxml-3.2.1:
Successfully uninstalled lxml-3.2.1
Found existing installation: ruamel.yaml 0.15.35
Uninstalling ruamel.yaml-0.15.35:
Successfully uninstalled ruamel.yaml-0.15.35
Found existing installation: nipyapi 0.7.0
Uninstalling nipyapi-0.7.0:
Successfully uninstalled nipyapi-0.7.0
Found existing installation: backports.ssl-match-hostname 3.4.0.2
Uninstalling backports.ssl-match-hostname-3.4.0.2:
Successfully uninstalled backports.ssl-match-hostname-3.4.0.2
Successfully installed asn1crypto-0.24.0 backports.ssl-match-hostname-3.5.0.1 cffi-1.11.5 cryptography-2.1.4 enum34-1.1.6 lxml-4.1.1 nipyapi-0.8.0 pyOpenSSL-17.5.0 pycparser-2.18 ruamel.yaml-0.14.12
I ran this on an HDF server on CentOS 7 and got a pretty nice example demo run.

python3.6 conso.py
0.8.0
{'system_diagnostics': {'aggregate_snapshot': {'available_processors': 32,
'content_repository_storage_usage': [{'free_space': '226.77 '
'GB',
'free_space_bytes': 243491807232,
'identifier': 'default',
'total_space': '249.99 '
'GB',
'total_space_bytes': 268420476928,
'used_space': '23.22 '
'GB',
'used_space_bytes': 24928669696,
'utilization': '9.0%'}],
'daemon_threads': 141,
'flow_file_repository_storage_usage': {'free_space': '226.77 '
'GB',
'free_space_bytes': 243491807232,
'identifier': None,
'total_space': '249.99 '
'GB',
'total_space_bytes': 268420476928,
'used_space': '23.22 '
'GB',
'used_space_bytes': 24928669696,
'utilization': '9.0%'},
'free_heap': '218.43 MB',
'free_heap_bytes': 229037208,
'free_non_heap': '28.15 MB',
'free_non_heap_bytes': 29513496,
'garbage_collection': [{'collection_count': 0,
'collection_millis': 0,
'collection_time': '00:00:00.000',
'name': 'G1 '
'Old '
'Generation'},
{'collection_count': 5295,
'collection_millis': 362298,
'collection_time': '00:06:02.298',
'name': 'G1 '
'Young '
'Generation'}],
'heap_utilization': '57.0%',
'max_heap': '512 MB',
'max_heap_bytes': 536870912,
'max_non_heap': '-1 bytes',
'max_non_heap_bytes': -1,
'non_heap_utilization': None,
'processor_load_average': 0.54,
'provenance_repository_storage_usage': [{'free_space': '226.77 '
'GB',
'free_space_bytes': 243491807232,
'identifier': 'default',
'total_space': '249.99 '
'GB',
'total_space_bytes': 268420476928,
'used_space': '23.22 '
'GB',
'used_space_bytes': 24928669696,
'utilization': '9.0%'}],
'stats_last_refreshed': '16:08:54 '
'UTC',
'total_heap': '512 MB',
'total_heap_bytes': 536870912,
'total_non_heap': '494.65 MB',
'total_non_heap_bytes': 518676480,
'total_threads': 201,
'uptime': '87:41:10.346',
'used_heap': '293.57 MB',
'used_heap_bytes': 307833704,
'used_non_heap': '466.5 MB',
'used_non_heap_bytes': 489162984,
'version_info': {'build_branch': 'UNKNOWN',
'build_revision': 'd5f3eef',
'build_tag': 'nifi-1.5.0-RC1',
'build_timestamp': '01/30/2018 '
'23:17:15 '
'UTC',
'java_vendor': 'Oracle '
'Corporation',
'java_version': '1.8.0_112',
'ni_fi_version': '1.5.0.3.1.0.0-564',
'os_architecture': 'amd64',
'os_name': 'Linux',
'os_version': '3.10.0-693.17.1.el7.x86_64'}},
'node_snapshots': None}}
INFO:nipyapi.demo.console:Setting up NiPyApi Demo Console
INFO:nipyapi.demo.console:Cleaning up old NiPyApi Console Process Groups
INFO:nipyapi.demo.console:Creating process_group_0 as an empty process group name nipyapi_console_process_group_0
INFO:nipyapi.demo.console:Cleaning up old NiPyApi Console Processors
INFO:nipyapi.demo.console:Creating processor_0 as a new GenerateFlowFile named nipyapi_console_processor_0 in the previous ProcessGroup
INFO:nipyapi.demo.console:Creating reg_client_0 as NiFi Registry Client
INFO:nipyapi.demo.console:Cleaning up old NiPyApi Console Registry Buckets
INFO:nipyapi.demo.console:Creating bucket_0 as new a Registry Bucket named nipyapi_console_bucket_0
INFO:nipyapi.demo.console:Creating bucket_1 as new a Registry Bucket named nipyapi_console_bucket_1
INFO:nipyapi.demo.console:Saving nipyapi_console_process_group_0 as a new Versioned Flow named nipyapi_console_ver_flow_0 in Bucket nipyapi_console_bucket_0, and saving as variable ver_flow_info_0
INFO:nipyapi.demo.console:Creating ver_flow_0 as a copy of the new Versioned Flow object
INFO:nipyapi.demo.console:Creating ver_flow_snapshot_0 as a copy of the new Versioned FlowSnapshot
INFO:nipyapi.demo.console:Creating ver_flow_1 as an empty Versioned Flow stub named nipyapi_console_ver_flow_1 in nipyapi_console_bucket_1
INFO:nipyapi.demo.console:Creating ver_flow_snapshot_1 by cloning the first snapshot nipyapi_console_ver_flow_0 into the new Versioned Flow Stub nipyapi_console_ver_flow_1
INFO:nipyapi.demo.console:Creating ver_flow_raw_0 as a raw Json export of nipyapi_console_ver_flow_0
INFO:nipyapi.demo.console:Creating ver_flow_json_0 as a sorted pretty Json export of nipyapi_console_ver_flow_0
INFO:nipyapi.demo.console:Creating ver_flow_yaml_0 as a sorted pretty Yaml export of nipyapi_console_ver_flow_0
INFO:nipyapi.demo.console:Creating flow_template_0 as a new Template from Process Group nipyapi_console_process_group_0
Demo Console deployed!
Let's do a few activities that might be handy in a typical enterprise Apache NiFi 1.5 (HDF 3.1) environment.

DevOps Activities

(See: http://nipyapi.readthedocs.io/en/latest/nipyapi-docs/nipyapi.html#module-nipyapi.templates)

List All Templates

import nipyapi
nipyapi.config.nifi_config.host = 'http://server:9090/nifi-api'
nipyapi.config.registry_config.host = 'http://server:61080/nifi-registry-api'
print(nipyapi.templates.list_all_templates())

Example Output

{'bulletins': None,
'id': '21e95277-98ae-38ce-9235-bbf90772cef3',
'permissions': {'can_read': True, 'can_write': True},
'position': None,
'revision': None,
'template': {'description': '',
'encoding_version': None,
'group_id': '7c84501d-d10c-407c-b9f3-1d80e38fe36a',
'id': '21e95277-98ae-38ce-9235-bbf90772cef3',
'name': 'ApacheMxNet Local Processing',
'snippet': None,
'timestamp': '03/07/2018 17:48:06 UTC',
'uri': 'http://princeton1.field.hortonworks.com:9090/nifi-api/templates/21e95277-98ae-38ce-9235-bbf90772cef3'},
'uri': None}]}

Upload a Template

nipyapi.templates.upload_template(pg_id, template_file)

Create a Template From a Process Group

nipyapi.templates.create_template(pg_id, name, desc='')

Delete a Process Group

nipyapi.canvas.delete_process_group(process_group, force=False, refresh=True)

List All Process Groups

nipyapi.canvas.list_all_process_groups()

Create a Processor in a Process Group

nipyapi.canvas.create_processor(parent_pg, processor, location, name=None, config=None)

Delete a Processor From a Process Group

nipyapi.canvas.delete_processor(processor, refresh=True, force=False)

Start or Stop a Processor (If Possible)

nipyapi.canvas.schedule_processor(processor, scheduled, refresh=True)

Update a Key-Value Pair in Any Variable Registry

nipyapi.canvas.update_variable_registry(process_group, update)

You Can Create a Registry Client on the NiFi Server

nipyapi.versioning.create_registry_client(name, uri, description)

Check Existing Registry Clients

(The more common use case: an admin has already connected one or more.)

print(nipyapi.versioning.list_registry_clients())

Example Output

{'registries': [{'bulletins': None,
'component': {'description': 'NiPyApi Demo Console',
'id': 'adc0435b-3eb5-1e1a-ffff-fffff521cba3',
'name': 'nipyapi_console_reg_client_0',
'uri': 'http://princeton1.field.hortonworks.com:61080/'},
'id': 'adc0435b-3eb5-1e1a-ffff-fffff521cba3',
'permissions': {'can_read': True, 'can_write': True},
'position': None,
'revision': {'client_id': 'adc0439f-3eb5-1e1a-6ca0-baadd98fcdf6',
'last_modifier': None,
'version': 2},
'uri': 'http://princeton1.field.hortonworks.com:9090/nifi-api/controller/registry-clients/adc0435b-3eb5-1e1a-ffff-fffff521cba3'}]}

Get Version Registry Buckets

print(nipyapi.versioning.list_registry_buckets())

Create a New Version Registry Bucket

nipyapi.versioning.create_registry_bucket(name)

Save a Flow Version to the Registry

rgcid = 'nipyapi_console_reg_client_0'
registry_client = nipyapi.versioning.get_registry_client(rgcid, identifier_type='name')
identifier = 'nipyapi_console_process_group_0'
process_group = nipyapi.canvas.get_process_group(identifier, identifier_type='name')
bucketid = 'meetup'
bucket = nipyapi.versioning.get_registry_bucket(bucketid, identifier_type='name')
print(nipyapi.versioning.save_flow_ver(process_group, registry_client, bucket, flow_name='built by python', flow_id=None, comment='automated', desc='automated', refresh=True))

You grab a process group, pick up a bucket, and save your version there.

Example Output

{'process_group_revision': {'client_id': 'adc045ac-3eb5-1e1a-9861-68c932e967e9',
'last_modifier': None,
'version': 3},
'version_control_information': {'bucket_id': 'd80946af-64be-471a-baf8-d85fa98d0a19',
'bucket_name': 'meetup',
'flow_description': 'automated',
'flow_id': '64f37030-6c0f-4272-b16c-cb8a0119e776',
'flow_name': 'built by python',
'group_id': 'e2794288-daa1-173a-ffff-ffffdadbf3f8',
'registry_id': 'adc0435b-3eb5-1e1a-ffff-fffff521cba3',
'registry_name': 'nipyapi_console_reg_client_0',
'state': 'UP_TO_DATE',
'state_explanation': 'Flow version is current',
'version': 1}}

Export a Registry Flow Version

nipyapi.versioning.export_flow_version(bucket_id, flow_id, version=None, file_path=None, mode='json')

Get the Variable Registry From a Process Group

import nipyapi
nipyapi.config.nifi_config.host = 'http://server:9090/nifi-api'
nipyapi.config.registry_config.host = 'http://server:61080/nifi-registry-api'
identifier = 'nipyapi_console_process_group_0'
process_group = nipyapi.canvas.get_process_group(identifier, identifier_type='name')
print(nipyapi.canvas.get_variable_registry(process_group, ancestors=True))

Example Output

{'process_group_revision': {'client_id': 'adc045ac-3eb5-1e1a-9861-68c932e967e9',
'last_modifier': None,
'version': 2},
'variable_registry': {'process_group_id': 'e2794288-daa1-173a-ffff-ffffdadbf3f8',
'variables': [{'can_write': True,
'variable': {'affected_components': [],
'name': 'testsqlconnection',
'process_group_id': 'e2794288-daa1-173a-ffff-ffffdadbf3f8',
'value': 'jdbc=fun:8080'}}]}}

Export a Template as a String

t_id = '21e95277-98ae-38ce-9235-bbf90772cef3'
print(nipyapi.templates.export_template(t_id, output='string'))

You can also export as a file. If you combined this with some Python manipulation magic and some deep learning, you could take a template, back it up, change it, add to it, upload it, deploy it, and version it. You can also upload templates, move them to and from the Apache NiFi Registry, and do other useful ops.

Get Apache NiFi Version Info

print(nipyapi.system.get_nifi_version_info())

Example Output

{'build_branch': 'UNKNOWN',
'build_revision': 'd5f3eef',
'build_tag': 'nifi-1.5.0-RC1',
'build_timestamp': u'01/30/2018 23:17:15 UTC',
'java_vendor': 'Oracle Corporation',
'java_version': '1.8.0_112',
'ni_fi_version': '1.5.0.3.1.0.0-564',
'os_architecture': 'amd64',
'os_name': 'Linux',
'os_version': '3.10.0-693.17.1.el7.x86_64'}

Get Cluster Info (Experimental)

print(nipyapi.system.get_cluster())

Example Output

{'cluster': {'generated': '17:57:41 UTC',
'nodes': [{'active_thread_count': 5,
'address': 'princeton1.field.hortonworks.com',
'api_port': 9090,
'connection_requested': None,
'events': [{'category': 'INFO',
'message': 'Received first heartbeat from connecting node. Node connected.',
'timestamp': '03/06/2018 00:33:58 UTC'},
{'category': 'INFO',
'message': 'Received heartbeat but ignoring because it was reported before the node was last asked to reconnect.',
'timestamp': '03/06/2018 00:33:53 UTC'},
{'category': 'INFO',
'message': 'Connection requested from existing node. Setting status to connecting.',
'timestamp': '03/06/2018 00:33:48 UTC'},
{'category': 'INFO',
'message': 'Requesting that node connect to cluster',
'timestamp': '03/06/2018 00:33:48 UTC'},
{'category': 'INFO',
'message': 'Received heartbeat from node previously disconnected due to Has Not Yet Connected to Cluster. Issuing reconnection request.',
'timestamp': '03/06/2018 00:33:48 UTC'}],
'heartbeat': '03/09/2018 17:57:38 UTC',
'node_id': '1b6eb3aa-add8-476d-a91a-5b500f6515f1',
'node_start_time': '03/06/2018 00:28:21 UTC',
'queued': '3,865 / 12.9 MB',
'roles': ['Primary Node', 'Cluster Coordinator'],
'status': 'CONNECTED'}]}}

For the Really Brave: Purge It!

nipyapi.canvas.purge_connection(con_id)
nipyapi.canvas.purge_process_group(process_group, stop=False)
nipyapi.canvas.delete_process_group(process_group, force=True, refresh=True)

There is a large security section you should check out for your secure environment: http://nipyapi.readthedocs.io/en/latest/nipyapi-docs/nipyapi.html#module-nipyapi.security

One of the cool things that almost goes without saying is that all data is returned as JSON, which is of course easy to work with in Python as well as in Apache NiFi and most other tools.

Resources

http://nipyapi.readthedocs.io/en/latest/readme.html
https://community.hortonworks.com/articles/167364/nifi-sdlc-automation-in-python-with-nipyapi-part-1.html
http://nipyapi.readthedocs.io/en/latest/nipyapi-docs/nipyapi.html#module-nipyapi.templates

Other Apache NiFi DevOps Options

https://github.com/pvillard31/nifi-api-client-python
https://github.com/jdye64/nifi-shell

This one is new and I will be investigating it soon: https://github.com/apache/nifi/tree/master/nifi-toolkit/nifi-toolkit-cli
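Putting the template calls above together, here is a sketch of a back-up-everything script. The `safe_filename`/`write_templates` helpers and the filename scheme are my own, and the `.template.name`/`.template.id` attribute access is inferred from the example output above, so verify it against your nipyapi version.

```python
# Sketch: back up every NiFi template to disk as individual XML files.
import os
import re

def safe_filename(name):
    """Turn a template name into a filesystem-safe .xml filename."""
    return re.sub(r'[^A-Za-z0-9._-]+', '_', name) + '.xml'

def write_templates(named_xml, out_dir):
    """named_xml: iterable of (template_name, xml_string) pairs.
    Writes one file per template and returns the paths written."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for name, xml in named_xml:
        path = os.path.join(out_dir, safe_filename(name))
        with open(path, 'w') as f:
            f.write(xml)
        paths.append(path)
    return paths

# With nipyapi configured as above, you could feed it like (untested sketch):
#   pairs = [(t.template.name,
#             nipyapi.templates.export_template(t.template.id, output='string'))
#            for t in nipyapi.templates.list_all_templates().templates]
#   write_templates(pairs, 'nifi_template_backup')
```

The helper keeps the nipyapi calls separate from the file I/O, so you can reuse it for templates pulled from any source.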
03-09-2018
02:55 PM
We could enhance this API to allow for inserts and updates. We could also use https://projects.spring.io/spring-data-rest/
03-09-2018
12:37 PM
There you go: MiNiFi is not able to connect to NiFi (ConnectException: Connection refused).

Is 192.168.2.3 the correct server? Is that the correct IP and port? Do you have NiFi set up to receive remote connections? What is the URL you access NiFi with? https://community.hortonworks.com/articles/88473/site-to-site-communication-between-secured-https-a.html

With NiFi 1.5 you can only have one domain or IP to point to. Make sure you follow the setup directions here: https://nifi.apache.org/minifi/getting-started.html

Configuring NiFi to Receive Data

First, check that $NIFI_HOME/conf/nifi.properties has a Site-to-Site input socket port specified:

# Site to Site properties
nifi.remote.input.host=
nifi.remote.input.secure=false
nifi.remote.input.socket.port=1026
nifi.remote.input.http.enabled=true
nifi.remote.input.http.transaction.ttl=30 sec

Restart NiFi so the changes take effect.

https://community.hortonworks.com/articles/56341/getting-started-with-minifi.html
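A quick way to answer the "is this the correct IP and port?" questions above is to test the TCP connection directly. This is a minimal sketch; the host and port are just the values from the question, so substitute your own.

```python
# Sanity check: can this machine reach the NiFi host on the
# Site-to-Site input socket port at all?
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. can_connect('192.168.2.3', 1026) -- False with "Connection refused"
# means nothing is listening on that port, so fix nifi.properties first.
```

If this returns False, the problem is networking or NiFi configuration, not MiNiFi.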
03-09-2018
02:32 AM
2 Kudos
When I first joined Hortonworks I wrote an article on integrating Apache Hive and Spring Boot, since I came from Pivotal and was a huge fan of Spring Boot. It's been almost 1.5 years, Spring Boot is now at 2.0, and Apache Hive now has LLAP and ACID transactional tables. So it's time for a remix!

It is crazy easy to start your project: just head to https://start.spring.io/, build a bomb POM, and let's begin. You will need to modify the generated POM.XML (Maven) to include some Hadoop and Hive references. No big deal.

Clone it, then own it:

git clone https://github.com/tspannhw/hivereader.git

You can build the code via build.sh:

mvn install

Then run it via run.sh (change the HDP version to match yours):

java -Xms512m -Xmx2048m -Dhdp.version=2.6.4 -Djava.net.preferIPv4Stack=true -jar target/hivereader-0.0.1-SNAPSHOT.jar

You must have the Java JDK, Git, and Maven installed on your machine. (If you don't, why are you running Java programs?) Obviously this can be done inside a Docker container or virtual environment.

Maven Build Script

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.dataflowdeveloper</groupId>
<artifactId>hivereader</artifactId>
<version>0.0.1-SNAPSHOT</version>
<packaging>jar</packaging>
<name>hivereader</name>
<description>Spring Boot 2.0 Apache Hive ACID Reader</description>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>2.0.0.RELEASE</version>
<relativePath/> <!-- lookup parent from repository -->
</parent>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
<java.version>1.8</java.version>
</properties>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-rest</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-jdbc</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.kafka</groupId>
<artifactId>spring-kafka</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.springframework.restdocs</groupId>
<artifactId>spring-restdocs-mockmvc</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>1.2.1</version>
<exclusions>
<exclusion>
<groupId>org.eclipse.jetty.aggregate</groupId>
<artifactId>*</artifactId>
</exclusion>
<exclusion>
<artifactId>slf4j-log4j12</artifactId>
<groupId>org.slf4j</groupId>
</exclusion>
<exclusion>
<artifactId>log4j</artifactId>
<groupId>log4j</groupId>
</exclusion>
<exclusion>
<artifactId>servlet-api</artifactId>
<groupId>javax.servlet</groupId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-jdbc</artifactId>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.7.3</version>
<type>jar</type>
<exclusions>
<exclusion>
<artifactId>slf4j-log4j12</artifactId>
<groupId>org.slf4j</groupId>
</exclusion>
<exclusion>
<artifactId>log4j</artifactId>
<groupId>log4j</groupId>
</exclusion>
<exclusion>
<artifactId>servlet-api</artifactId>
<groupId>javax.servlet</groupId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.7.3</version>
<exclusions>
<exclusion>
<artifactId>servlet-api</artifactId>
<groupId>javax.servlet</groupId>
</exclusion>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>1.2.1</version>
<type>jar</type>
<exclusions>
<exclusion>
<artifactId>servlet-api</artifactId>
<groupId>javax.servlet</groupId>
</exclusion>
<exclusion>
<artifactId>slf4j-log4j12</artifactId>
<groupId>org.slf4j</groupId>
</exclusion>
<exclusion>
<artifactId>log4j</artifactId>
<groupId>log4j</groupId>
</exclusion>
</exclusions>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
</plugin>
</plugins>
</build>
</project>

You can update java.version to 1.9 or 1.10 if you are running those. I have it set to hive-jdbc 1.2.1, Spring Boot 2.0.0.RELEASE, and hadoop-client 2.7.3. When newer versions are available, update these and rebuild.

(Screenshots in the original post: running the Spring Boot microservice JAR, example JSON returned by a REST call, calling the Spring Boot 2.0 REST API, building a schema for this data, examining the data in Apache Zeppelin, querying the records with Apache Calcite, the main Apache NiFi flow to ingest the Spring Boot REST API, using the QueryRecord processor to keep only records with a good Top 1 percent, data provenance, and creating a table on ACID. Add your own next step.)

application.properties

server.port=9999
querylimit=25
hiveuri=jdbc:hive2://server:10000/default
hiveusername=root
hivepassword=

If you have Kerberos, there are some funky things to do; I leave this to the experts. Change the Hive URI to match your server, port, and database. server.port is the port you want the Spring Boot microservice to run on. Obviously you can run this through Apache YARN, Dockerized containers, or Kubernetes. In the sequel, I will show this running Dockerized on Apache Hadoop 3.0 on YARN.

Inception Bean

package com.dataflowdeveloper.hivereader;
/** inception **/
public class Inception {

    private String top1pct;
    private String top1;
    private String top2pct;
    private String top2;
    private String top3pct;
    private String top3;
    private String top4pct;
    private String top4;
    private String top5pct;
    private String top5;
    private String imagefilename;

    public String getTop1pct() { return top1pct; }
    public void setTop1pct(String top1pct) { this.top1pct = top1pct; }
    public String getTop1() { return top1; }
    public void setTop1(String top1) { this.top1 = top1; }
    public String getTop2pct() { return top2pct; }
    public void setTop2pct(String top2pct) { this.top2pct = top2pct; }
    public String getTop2() { return top2; }
    public void setTop2(String top2) { this.top2 = top2; }
    public String getTop3pct() { return top3pct; }
    public void setTop3pct(String top3pct) { this.top3pct = top3pct; }
    public String getTop3() { return top3; }
    public void setTop3(String top3) { this.top3 = top3; }
    public String getTop4pct() { return top4pct; }
    public void setTop4pct(String top4pct) { this.top4pct = top4pct; }
    public String getTop4() { return top4; }
    public void setTop4(String top4) { this.top4 = top4; }
    public String getTop5pct() { return top5pct; }
    public void setTop5pct(String top5pct) { this.top5pct = top5pct; }
    public String getTop5() { return top5; }
    public void setTop5(String top5) { this.top5 = top5; }
    public String getImagefilename() { return imagefilename; }
    public void setImagefilename(String imagefilename) { this.imagefilename = imagefilename; }
}
DataController

package com.dataflowdeveloper.hivereader;

import java.util.List;

import javax.servlet.http.HttpServletRequest;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.util.Assert;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.context.request.RequestAttributes;
import org.springframework.web.context.request.RequestContextHolder;
import org.springframework.web.context.request.ServletRequestAttributes;

/**
 * @author tspann
 */
@RestController
public class DataController {

    public static HttpServletRequest getCurrentRequest() {
        RequestAttributes requestAttributes = RequestContextHolder.getRequestAttributes();
        Assert.state(requestAttributes != null, "Could not find current request via RequestContextHolder");
        Assert.isInstanceOf(ServletRequestAttributes.class, requestAttributes);
        HttpServletRequest servletRequest = ((ServletRequestAttributes) requestAttributes).getRequest();
        Assert.state(servletRequest != null, "Could not find current HttpServletRequest");
        return servletRequest;
    }

    Logger logger = LoggerFactory.getLogger(DataController.class);

    @Autowired
    private DataSourceService dataSourceService;

    @RequestMapping("/query/{query}")
    public List<Inception> query(@PathVariable(value = "query") String query) {
        List<Inception> value = dataSourceService.search(query);
        // log who called us and from where
        final String userIpAddress = getCurrentRequest().getRemoteAddr();
        final String userAgent = getCurrentRequest().getHeader("user-agent");
        final String userDisplay = String.format("Query:%s,IP:%s Browser:%s", query, userIpAddress, userAgent);
        logger.error(userDisplay);
        return value;
    }
}
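Before looking at the service layer, here is a hypothetical sketch of how the search SQL that the service runs could be assembled: the table name, column names, and the QueryBuilder/buildSearch helper are my illustration for this article, not the exact code from the repo (see the GitHub link at the end for that). It matches the description below — the search term is set in several places against the text columns and the configured limit is appended.

```java
// Hypothetical helper illustrating the query assembly; "inception" and the
// top1..top5 column names are assumptions mirroring the JSON output shown later.
public class QueryBuilder {

    public static String buildSearch(String term, int limit) {
        // naive sanitization for the sketch: strip single quotes from user input
        String like = "'%" + term.replace("'", "") + "%'";
        // search every top-N text column for the term, cap the rows returned
        return "SELECT * FROM inception WHERE top1 LIKE " + like
             + " OR top2 LIKE " + like
             + " OR top3 LIKE " + like
             + " OR top4 LIKE " + like
             + " OR top5 LIKE " + like
             + " LIMIT " + limit;
    }

    public static void main(String[] args) {
        System.out.println(buildSearch("ice", 25));
    }
}
```

In real code you would use a PreparedStatement with bound parameters instead of string concatenation, but the shape of the query is the same.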
HivereaderApplication

We need a class to set up the Hive datasource and initialize the Spring application. This is the only boilerplate and it's pretty minimal. Great job, Spring people!

DataSourceService

Get the datasource and the query limit, then our one search method. It takes a query parameter received from the REST API, sets it in a few places to query all of our text results, and adds the limit. We add each row to our handy object and put it in a list. There are tons of helpers and shortcuts to make this easier, but this works for me. Close everything up and return the list. Spring Boot returns it as a JSON array of clean data. This is very easy to work with in Apache NiFi. This makes for a nice combination.

Running the Raw REST API

http://localhost:9999/time/now
{"time":"1520548296121"} http://localhost:9999/query/ice [{"top1pct":"16.5000006557","top1":"n04270147 spatula","top2pct":"12.3999997973","top2":"n03494278 harmonica, mouth organ, harp, mouth harp","top3pct":"3.79999987781","top3":"n07615774 ice lolly, lolly, lollipop, popsicle","top4pct":"2.89999991655","top4":"n04370456 sweatshirt","top5pct":"2.60000005364","top5":"n03891332 parking meter","imagefilename":"images/tx1_image_img_20180308170314.jpg"},{"top1pct":"8.50000008941","top1":"n04270147 spatula","top2pct":"6.80000036955","top2":"n02823428 beer bottle","top3pct":"4.50000017881","top3":"n03902125 pay-phone, pay-station","top4pct":"3.90000008047","top4":"n07614500 ice cream, icecream","top5pct":"3.59999984503","top5":"n03657121 lens cap, lens cover","imagefilename":"images/tx1_image_img_20180308170832.jpg"},{"top1pct":"11.2000003457","top1":"n04270147 spatula","top2pct":"5.20000010729","top2":"n03814639 neck brace","top3pct":"4.69999983907","top3":"n07615774 ice lolly, lolly, lollipop, popsicle","top4pct":"4.10000011325","top4":"n03494278 harmonica, mouth organ, harp, mouth harp","top5pct":"3.79999987781","top5":"n02823428 beer bottle","imagefilename":"images/tx1_image_img_20180308165958.jpg"},{"top1pct":"8.90000015497","top1":"n02823428 beer bottle","top2pct":"5.70000000298","top2":"n03902125 pay-phone, pay-station","top3pct":"5.20000010729","top3":"n04525305 vending machine","top4pct":"4.30000014603","top4":"n04200800 shoe shop, shoe-shop, shoe store","top5pct":"4.30000014603","top5":"n07614500 ice cream, icecream","imagefilename":"images/tx1_image_img_20180308171043.jpg"},{"top1pct":"14.8000001907","top1":"n04270147 spatula","top2pct":"7.50000029802","top2":"n04579432 whistle","top3pct":"4.80000004172","top3":"n07614500 ice cream, icecream","top4pct":"4.60000000894","top4":"n02883205 bow tie, bow-tie, bowtie","top5pct":"4.50000017881","top5":"n03494278 harmonica, mouth organ, harp, mouth 
harp","imagefilename":"images/tx1_image_img_20180308173619.jpg"},{"top1pct":"16.5999993682","top1":"n04111531 rotisserie","top2pct":"8.69999974966","top2":"n04270147 spatula","top3pct":"3.59999984503","top3":"n07614500 ice cream, icecream","top4pct":"3.5000000149","top4":"n03666591 lighter, light, igniter, ignitor","top5pct":"2.70000007004","top5":"n03902125 pay-phone, pay-station","imagefilename":"images/tx1_image_img_20180308152145.jpg"},{"top1pct":"24.0999996662","top1":"n02667093 abaya","top2pct":"4.50000017881","top2":"n07614500 ice cream, icecream","top3pct":"3.29999998212","top3":"n02823750 beer glass","top4pct":"2.89999991655","top4":"n02883205 bow tie, bow-tie, bowtie","top5pct":"2.40000002086","top5":"n04584207 wig","imagefilename":"images/tx1_image_img_20180307214015.jpg"},{"top1pct":"6.40000030398","top1":"n04370456 sweatshirt","top2pct":"5.4999999702","top2":"n07614500 ice cream, icecream","top3pct":"5.40000014007","top3":"n04229816 ski mask","top4pct":"4.69999983907","top4":"n03724870 mask","top5pct":"4.39999997616","top5":"n04270147 spatula","imagefilename":"images/tx1_image_img_20180308151828.jpg"},{"top1pct":"8.60000029206","top1":"n04579432 whistle","top2pct":"6.19999989867","top2":"n04270147 spatula","top3pct":"5.99999986589","top3":"n07614500 ice cream, icecream","top4pct":"5.99999986589","top4":"n03494278 harmonica, mouth organ, harp, mouth harp","top5pct":"5.90000003576","top5":"n07880968 burrito","imagefilename":"images/tx1_image_img_20180308151620.jpg"},{"top1pct":"22.6999998093","top1":"n02667093 abaya","top2pct":"5.29999993742","top2":"n03347037 fire screen, fireguard","top3pct":"3.20000015199","top3":"n04070727 refrigerator, icebox","top4pct":"3.20000015199","top4":"n04179913 sewing machine","top5pct":"3.09999994934","top5":"n04532106 vestment","imagefilename":"images/tx1_image_img_20180307214952.jpg"},{"top1pct":"7.99999982119","top1":"n02667093 abaya","top2pct":"5.60000017285","top2":"n02823428 beer 
bottle","top3pct":"3.59999984503","top3":"n07614500 ice cream, icecream","top4pct":"3.09999994934","top4":"n02883205 bow tie, bow-tie, bowtie","top5pct":"3.09999994934","top5":"n04584207 wig","imagefilename":"images/tx1_image_img_20180307213659.jpg"},{"top1pct":"83.0999970436","top1":"n02667093 abaya","top2pct":"2.80000008643","top2":"n03045698 cloak","top3pct":"1.30000002682","top3":"n02977058 cash machine, cash dispenser, automated teller machine, automatic teller machine, automated teller, automatic teller, ATM","top4pct":"1.20000001043","top4":"n04243546 slot, one-armed bandit","top5pct":"0.600000005215","top5":"n04070727 refrigerator, icebox","imagefilename":"images/tx1_image_img_20180308141123.jpg"},{"top1pct":"9.00000035763","top1":"n07614500 ice cream, icecream","top2pct":"7.69999995828","top2":"n04376876 syringe","top3pct":"5.79999983311","top3":"n04270147 spatula","top4pct":"3.20000015199","top4":"n07880968 burrito","top5pct":"2.89999991655","top5":"n07695742 pretzel","imagefilename":"images/tx1_image_img_20180307213805.jpg"},{"top1pct":"25.9000003338","top1":"n02667093 abaya","top2pct":"4.80000004172","top2":"n02883205 bow tie, bow-tie, bowtie","top3pct":"3.90000008047","top3":"n02786058 Band Aid","top4pct":"3.20000015199","top4":"n07614500 ice cream, icecream","top5pct":"2.60000005364","top5":"n04229816 ski mask","imagefilename":"images/tx1_image_img_20180307213911.jpg"},{"top1pct":"9.20000001788","top1":"n02786058 Band Aid","top2pct":"7.59999975562","top2":"n04229816 ski mask","top3pct":"5.79999983311","top3":"n02883205 bow tie, bow-tie, bowtie","top4pct":"5.70000000298","top4":"n04370456 sweatshirt","top5pct":"4.69999983907","top5":"n07614500 ice cream, icecream","imagefilename":"images/tx1_image_img_20180308163321.jpg"},{"top1pct":"9.7000002861","top1":"n04270147 spatula","top2pct":"5.40000014007","top2":"n02786058 Band Aid","top3pct":"4.80000004172","top3":"n04356056 sunglasses, dark glasses, shades","top4pct":"4.39999997616","top4":"n03250847 
drumstick","top5pct":"3.79999987781","top5":"n07615774 ice lolly, lolly, lollipop, popsicle","imagefilename":"images/tx1_image_img_20180308161932.jpg"},{"top1pct":"5.60000017285","top1":"n07614500 ice cream, icecream","top2pct":"5.00000007451","top2":"n02786058 Band Aid","top3pct":"3.5000000149","top3":"n04270147 spatula","top4pct":"3.40000018477","top4":"n01984695 spiny lobster, langouste, rock lobster, crawfish, crayfish, sea crawfish","top5pct":"3.20000015199","top5":"n02992529 cellular telephone, cellular phone, cellphone, cell, mobile phone","imagefilename":"images/tx1_image_img_20180308142106.jpg"},{"top1pct":"6.10000006855","top1":"n04270147 spatula","top2pct":"5.90000003576","top2":"n03250847 drumstick","top3pct":"3.40000018477","top3":"n02883205 bow tie, bow-tie, bowtie","top4pct":"3.29999998212","top4":"n03249569 drum, membranophone, tympan","top5pct":"3.09999994934","top5":"n07614500 ice cream, icecream","imagefilename":"images/tx1_image_img_20180308162554.jpg"},{"top1pct":"13.0999997258","top1":"n02787622 banjo","top2pct":"8.60000029206","top2":"n03447721 gong, tam-tam","top3pct":"5.79999983311","top3":"n03250847 drumstick","top4pct":"3.70000004768","top4":"n07614500 ice cream, icecream","top5pct":"3.59999984503","top5":"n07695742 pretzel","imagefilename":"images/tx1_image_img_20180308141540.jpg"},{"top1pct":"24.699999392","top1":"n04270147 spatula","top2pct":"9.39999967813","top2":"n07615774 ice lolly, lolly, lollipop, popsicle","top3pct":"5.60000017285","top3":"n03724870 mask","top4pct":"2.70000007004","top4":"n03494278 harmonica, mouth organ, harp, mouth harp","top5pct":"2.40000002086","top5":"n04229816 ski mask","imagefilename":"images/tx1_image_img_20180308145223.jpg"},{"top1pct":"15.5000001192","top1":"n02667093 abaya","top2pct":"5.40000014007","top2":"n01985128 crayfish, crawfish, crawdad, crawdaddy","top3pct":"4.60000000894","top3":"n07614500 ice cream, icecream","top4pct":"4.50000017881","top4":"n03980874 
poncho","top5pct":"3.79999987781","top5":"n03720891 maraca","imagefilename":"images/tx1_image_img_20180307213449.jpg"},{"top1pct":"31.7999988794","top1":"n02667093 abaya","top2pct":"3.29999998212","top2":"n03045698 cloak","top3pct":"2.89999991655","top3":"n04070727 refrigerator, icebox","top4pct":"2.80000008643","top4":"n02977058 cash machine, cash dispenser, automated teller machine, automatic teller machine, automated teller, automatic teller, ATM","top5pct":"2.80000008643","top5":"n04591713 wine bottle","imagefilename":"images/tx1_image_img_20180307214328.jpg"},{"top1pct":"24.1999998689","top1":"n02667093 abaya","top2pct":"3.40000018477","top2":"n03045698 cloak","top3pct":"3.20000015199","top3":"n02977058 cash machine, cash dispenser, automated teller machine, automatic teller machine, automated teller, automatic teller, ATM","top4pct":"2.89999991655","top4":"n04070727 refrigerator, icebox","top5pct":"2.50000003725","top5":"n04179913 sewing machine","imagefilename":"images/tx1_image_img_20180307214848.jpg"},{"top1pct":"23.1000006199","top1":"n03721384 marimba, xylophone","top2pct":"5.40000014007","top2":"n04370456 sweatshirt","top3pct":"4.89999987185","top3":"n01704323 triceratops","top4pct":"3.29999998212","top4":"n03980874 poncho","top5pct":"2.80000008643","top5":"n03944341 pinwheel","imagefilename":"images/tx1_image_img_20180308152459.jpg"}] Pro Tip: When doing PutHiveStreaming
Caused by: org.apache.hadoop.ipc.RemoteException: Permission denied: user=nifi, access=WRITE, inode="/apps/hive/warehouse/inception":admin:hdfs:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:399)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:255)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:193)

hdfs dfs -chmod -R 777 /apps/hive/warehouse/inception

Make sure the nifi user has write permissions to the file.

Github Source: https://github.com/tspannhw/hivereader

Original Github Source: https://github.com/tspannhw/hive

Release: https://github.com/tspannhw/hivereader/releases/tag/1.0

If you don't want to install Maven and the JDK, you can download and run the JAR file with just the Java JVM runtime (and a copy of the application.properties pointing to your stuff).

Original Article: https://community.hortonworks.com/articles/53629/writing-a-spring-boot-microservices-to-access-hive.html

Resources: https://projects.spring.io/spring-boot/
03-09-2018
02:13 AM
Check further back in the logs for a SQL error.
03-08-2018
09:27 PM
It should be Avro with a schema. Get a document or example code from an engineer.
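For reference, an Avro schema is just a JSON document. This minimal, illustrative record (the name and fields are invented for the example, not taken from any particular system) shows the shape of the artifact to ask for:

```json
{
  "type": "record",
  "name": "sensorReading",
  "fields": [
    { "name": "id", "type": "string" },
    { "name": "value", "type": "double" },
    { "name": "ts", "type": "long" }
  ]
}
```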
03-08-2018
09:27 PM
What is the Hive table DDL? It needs to be a bucketed, transactional table stored as ORC. See my example: https://community.hortonworks.com/articles/174538/apache-deep-learning-101-using-apache-mxnet-with-h.html
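To make the requirement concrete, here is a sketch of Hive DDL that PutHiveStreaming can write to. The table and column names are illustrative (they mirror the Inception example from my article); the essential parts are CLUSTERED BY ... INTO n BUCKETS, STORED AS ORC, and transactional=true:

```sql
CREATE TABLE inception (
  top1pct STRING,
  top1 STRING,
  imagefilename STRING
)
CLUSTERED BY (imagefilename) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');
```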
03-08-2018
08:18 PM
2 Kudos
This is for people preparing to attend my talk on Deep Learning at DataWorks Summit Berlin 2018 (https://dataworkssummit.com/berlin-2018/#agenda) on Thursday April 19, 2018 at 11:50AM Berlin time. This example requires Apache NiFi 1.5 or newer. This is part 2 of https://community.hortonworks.com/articles/155435/using-the-new-mxnet-model-server.html

Our flow that receives the JSON files from the server does some minimal processing. We add some metadata fields and infer an Avro schema from the JSON file (we only need to do this once in development, and then you can delete that box from your flow). As you can see, I can easily push that data to HDFS as a Parquet file. This approach is for when you do not wish to install Apache MXNet on your HDF, HDP or related nodes: you can install Apache MXNet plus MMS on a cloud or edge server and call it via HTTP from Apache NiFi for processing.

Local Apache NiFi Flow To Call Our SSD Predict and Squeeze Net Predict REST Services

Cluster Receiving The Two Remote Ports

Server Apache NiFi Flow

Example Squeeze Net JSON Data Processed by Apache NiFi

Set the Schema and Mime Type

Storage Settings For Apache Parquet Files on HDFS

SSD MMS Logs

Squeeze Net MMS Logs

Schemas Used

An Example Prediction returned; as you can see, you get the coordinates for drawing a box.

To Store Apache Parquet Files:

hdfs dfs -mkdir /ssdpredict
hdfs dfs -chmod 755 /ssdpredict

Inside one of the files stored by Apache NiFi in HDFS, as you can see, there is an embedded Apache Avro schema in JSON format built by the Avro Parquet MR tool version 1.8.2.
parquet.avro.schema: {"type":"record","name":"ssdpredict","fields":[{"name":"prediction","type":{"type":"array","items":{"type":"array","items":["string","int"]}},"doc":"Type inferred from '[[\"person\",385,329,466,498],[\"bicycle\",96,386,274,498]]'"}]}
writer.model.name: avro
parquet-mr version 1.8.2 (build c6522788629e590a53eb79874b95f6c3ff11f16c)

Example File

-rw-r--r-- 3 nifi hdfs 688 2018-03-08 18:32 /ssdpredict/201801081202602.jpg.parquet.avro

Apache NiFi Flow File: apache-mxnet-cluster-processing.xml

Reference: http://parquet.apache.org/documentation/latest/
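The prediction payload shown in the schema's doc string, e.g. [["person",385,329,466,498],["bicycle",96,386,274,498]], is just a nested array of a label plus four box coordinates. As a quick illustration (not part of the flow), here is a stdlib-only Java sketch that pulls the boxes out of exactly that shape — it is a toy parser for this payload, not a general JSON reader:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SsdBoxes {

    // Parse a payload like [["person",385,329,466,498],["bicycle",96,386,274,498]]
    // into "label x1 y1 x2 y2" strings. Toy parser for exactly this shape only.
    public static List<String> parse(String json) {
        List<String> boxes = new ArrayList<>();
        String body = json.substring(2, json.length() - 2); // drop leading [[ and trailing ]]
        for (String entry : body.split("\\],\\[")) {        // split between inner arrays
            String[] parts = entry.split(",");
            String label = parts[0].replace("\"", "");      // strip quotes from the label
            boxes.add(label + " " + String.join(" ",
                    Arrays.copyOfRange(parts, 1, parts.length)));
        }
        return boxes;
    }

    public static void main(String[] args) {
        String payload = "[[\"person\",385,329,466,498],[\"bicycle\",96,386,274,498]]";
        System.out.println(parse(payload));
        // prints: [person 385 329 466 498, bicycle 96 386 274 498]
    }
}
```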