Posts: 1971
Kudos Received: 1224
Solutions: 124

My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 79 | 04-03-2024 06:39 AM |
|  | 450 | 01-12-2024 08:19 AM |
|  | 285 | 12-07-2023 01:49 PM |
|  | 634 | 08-02-2023 07:30 AM |
|  | 1079 | 03-29-2023 01:22 PM |
04-12-2017 03:27 PM
2017-04-12 15:25:12,604 INFO [NiFi logging handler] org.apache.nifi.StdOut Error occurred during initialization of VM
2017-04-12 15:25:12,604 INFO [NiFi logging handler] org.apache.nifi.StdOut Initial heap size set to a larger value than the maximum heap size
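Those two StdOut lines mean the JVM refused to start because the initial heap (-Xms) was set higher than the maximum heap (-Xmx). A minimal sketch of consistent settings in conf/bootstrap.conf (the property keys match NiFi's stock bootstrap.conf; the 512m/1g sizes are placeholders, not values from this post):

# conf/bootstrap.conf -- initial heap (Xms) must not exceed max heap (Xmx)
# sizes below are placeholders; tune for your hardware
java.arg.2=-Xms512m
java.arg.3=-Xmx1g

After setting Xms at or below Xmx, restart NiFi.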
Labels:
- Apache NiFi
- Cloudera DataFlow (CDF)
04-11-2017 06:26 PM
You should stick with the JDK 8 version; new features require JDK 8, and JDK 7 is very old at this point.
04-10-2017 09:06 PM
Metastore on princeton10.field.hortonworks.com failed (Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_metastore.py", line 200, in execute
timeout_kill_strategy=TerminateStrategy.KILL_PROCESS_TREE,
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run
tries=self.resource.tries, try_sleep=self.resource.try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
result = function(command, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
File "/usr/lib/p ython2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
raise ExecutionFailed(err_msg, code, out, err)
ExecutionFailed: Execution of 'export HIVE_CONF_DIR='/usr/hdp/current/hive-metastore/conf' ; hive --hiveconf hive.metastore.uris=thrift://princeton10.field.hortonworks.com:9083 --hiveconf hive.metastore.client.connect.retry.delay=1 --hiveconf hive.metastore.failure.retries=1 --hiveconf hive.metastore.connect.retries=1 --hiveconf hive.metastore.client.socket.timeout=14 --hiveconf hive.execution.engine=mr -e 'show databases;'' returned 12. log4j:WARN No such property [maxFileSize] in org.apache.log4j.DailyRollingFileAppender.
Logging initialized using configuration in file:/etc/hive/2.6.0.3-8/0/hive-log4j.properties
hive.exec.post.hooks Class not found:org.apache.atlas.hive.hook.HiveHook
FAILED: Hive Internal Error: java.lang.ClassNotFoundException(org.apache.atlas.hive.hook.HiveHook)
java.lang.ClassNotFoundException: org.apache.atlas.hive.hook.HiveHook
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.hive.ql.hooks.HookUtils.getHooks(HookUtils.java:60)
at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1386)
at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1370)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1598)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1291)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1158)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1148)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:217)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:169)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:380)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:315)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:712)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:685)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
)
This host-level alert is triggered if the Hive Metastore process cannot be determined to be up and listening on the network.
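The root cause in the output above is that hive.exec.post.hooks references org.apache.atlas.hive.hook.HiveHook while the Atlas Hive hook jars are not on Hive's classpath. A hedged sketch of the workaround, assuming Atlas lineage is not needed on this cluster (otherwise, install the Atlas Hive hook instead so the class resolves):

<property>
  <name>hive.exec.post.hooks</name>
  <!-- assumption: previously org.apache.atlas.hive.hook.HiveHook; cleared because the hook jar is missing -->
  <value></value>
</property>

On an Ambari-managed cluster, make this change through the Hive configuration screen and restart the affected Hive services.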
Labels:
- Apache Hive
04-07-2017 06:53 PM
Apache has good documentation for this; you need to make sure you have the correct Kerberos versions.
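A quick, hedged way to check which Kerberos client version is installed (commands are standard MIT Kerberos and RHEL tooling; adjust for your distro):

# print the MIT Kerberos client version
klist -V
# on RHEL/CentOS, check the installed client package
rpm -q krb5-workstation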
04-04-2017 08:33 PM
Please post the full exception logs, the SQL string, the flow file, and its attributes. Is your DB connection working? For further debugging, see: https://dzone.com/articles/finding-nifi-errors
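To capture the full stack trace, tail NiFi's main log while reproducing the error (the path is relative to the NiFi install directory and assumes default logback settings):

tail -f logs/nifi-app.log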
04-02-2017 09:16 PM
2 Kudos
The QueryDatabaseTable processor can easily ingest data from a table based on an incrementing key. A sequence ID or primary key that is autogenerated, as PostgreSQL and MariaDB do, is ideal. You can also use an incrementing date or an Oracle sequence ID; as long as the value increases whenever a new row arrives, you can set it as the maximum-value column. If your tables don't have this, you could write a trigger or procedure in your database that copies rows into a transaction table with such an autogenerated ID, and NiFi will grab that. Clearly, real CDC involves reading write-ahead logs or transaction logs at a deep level and grabbing all changes; that is coming, and can already be done with tools like Attunity + NiFi. For the use cases I have, I just need to grab new rows when they are added to a table, and I control the ID.

I convert from Avro to JSON so I can extract attributes, since I want to do some routing based on column values: based on one field in the table, I determine where I land the data. It can be sent to HBase (and Phoenix), HDFS, or Hive. I split my records for easy processing. One thing I highly recommend for SQL safety and to prevent errors: set your SQL attributes, as shown below.

Example SQL for CDC (Phoenix upsert):

upsert into trials (trialid, trialdescription, fileName) values (1, 'FENTANYL', '5ab2d068-dd53-4674-bcf8-17f7d80d0553');

Hive table over ORC:

CREATE EXTERNAL TABLE IF NOT EXISTS trials2 (trialid INT, trialdescription STRING, trialtype STRING)
STORED AS ORC
LOCATION '/hiveorc';

Phoenix table:

CREATE TABLE trials (trialid INTEGER NOT NULL PRIMARY KEY, trialdescription VARCHAR, filename VARCHAR);

Set your SQL attributes for SQL safety. The types are the numeric values for JDBC types: 12 is VARCHAR (String) and -5 is BIGINT. Then your SQL is standard JDBC syntax with ?'s as placeholders. I also used the Google Location API, called via a NiFi REST call, to enhance some data and get latitude and longitude from a vague location; this kind of thing happens with Twitter data all the time.
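A minimal sketch of those SQL attributes as consumed by NiFi's PutSQL processor (the sql.args.N.type / sql.args.N.value attribute names are PutSQL's documented convention; the values below are illustrative, not from the original flow):

# flow file attributes feeding a parameterized statement
sql.args.1.type  = 4          # JDBC INTEGER, for trialid
sql.args.1.value = 1
sql.args.2.type  = 12         # JDBC VARCHAR, for trialdescription
sql.args.2.value = FENTANYL

# flow file content (the statement PutSQL executes)
upsert into trials (trialid, trialdescription) values (?, ?)

Because the values travel as typed parameters instead of being concatenated into the SQL string, quoting problems and injection-style errors are avoided.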
Reference:
https://www.mockaroo.com/
https://community.hortonworks.com/articles/51902/incremental-fetch-in-nifi-with-querydatabasetable.html
04-02-2017 08:58 PM
6 Kudos
Monitoring Apache NiFi

It's really important to pick some Reporting Tasks to let you know what's happening in your Apache NiFi servers. The Ambari reporting task will send metrics to your HDF Ambari, which will show the results in nice Grafana graphs, charts, and tables. You can also monitor disk usage and memory, and send metrics to DataDog, Ganglia, and other servers. It's also easy to write your own Reporting Task if you need a different one.

One of the ways to monitor your Apache NiFi data flows is to use the MonitorActivity processor, which will create messages that can be sent to your operations dashboard, console, or elsewhere. For people doing ChatOps, you can easily push these messages to Slack with the PutSlack processor. You could also send a REST call to HipChat or other chat tools; it's pretty easy to wrap that up in a custom processor as well.

Other things to monitor:

REST endpoints: server:port/nifi-api/system-diagnostics (a curl sketch follows below). See: https://nifi.apache.org/docs/nifi-docs/rest-api/

Logs: .../nifi/logs/nifi-app.log and .../nifi/logs/nifi-user.log. These can be ingested with Apache NiFi itself for detailed log processing; you can filter and send some messages to SumoLogic or elsewhere via Apache NiFi. See: https://community.hortonworks.com/content/kbentry/67309/routing-logs-through-apache-nifi-to-phoenix-hdfs-a.html
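The system-diagnostics endpoint is easy to poll from the command line; a sketch for an unsecured node (host and port are placeholders):

# heap, thread, and repository-usage stats as JSON
curl -s http://nifi-host:8080/nifi-api/system-diagnostics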
03-31-2017 09:39 PM
6 Kudos
FlowFile Continuation
Sometimes you need to back up your currently running flow, let that flow run at a later date, or make a backup of what is in process now. You want this in permanent storage and want to reconstitute it later, like orange juice, and add it back into the flow or restart it. This could be due to failures, for integration testing, for testing new versions of components, as a checkpoint, or for many other purposes. You don't always want to reprocess the original source or files (they may be gone).

Option 1: Save the raw data that came in originally to local files or HDFS, then read it out of there later.

Option 2 (preferred): MergeContent to FlowFileV3, then reload with Get* to IdentifyMimeType to UnpackContent. Use MergeContent with the FlowFileV3 option; after that step you can PutFile, PutS3Object, PutHDFS, or use other file-saving options, or perhaps send it to an FTP or SFTP server for storage elsewhere. Now you have a pkg file:

cat /opt/demo/flow/904381478117605.pkg
NiFiFF3+tempf73.02sql.args.2.value29.7sql.args.11.type3roll353.9306742667328
mqtt.brokertcp://m13.cloudmqtt.com:14162sql.args.4.type3uuid$9f2f8b6f-2870-40a3-a460-49427cddf9a8
mqtt.topicsensorsql.args.7.type3sql.args.7.value353.9306742667328path./sql.args.4.value33.9sql.args.9.value-0.0sql.args.1.type1humidity29.7pitch14.015266431562901
nf.file.path.mqtt.qos0sql.args.8.type3temp33.9sql.args.1.value34sql.args.2.type3sql.args.10.type3sql.args.8.value128.4983979122009sql.args.5.type3sql.args.6.value14.015266431562901sql.args.3.value1011.1sql.args.10.value-0.0mqtt.isDuplicatefalspressure1011.1mqtt.isRetainedfalseyaw128.4983979122009cputemp3filename904381478117605sql.args.11.value1.0sql.args.9.type3x-0.0y-0.0z1.0sql.args.6.type3
nf.file.name904381478117605sql.args.5.value73.02sql.args.3.type3�[{"tempf": 73.02, "pressure": 1011.1, "pitch": 14.015266431562901, "temp": 33.9, "yaw": 128.4983979122009, "humidity": 29.7, "cputemp": "34", "y": -0.0, "x": -0.0, "z": 1.0, "roll": 353.9306742667328}]%
You can now reload that FlowFileV3 at any time: send it to IdentifyMimeType (so NiFi knows it's a FlowFileV3) and then use UnpackContent to reconstitute the original flow file. You can use it as if it never stopped and was written to disk, which gives you an unlimited queue for storing pre- or partially-processed files, saving time. You could run really expensive processes once, save the preprocessed items, files, or models, and reuse them everywhere. In MergeContent, choose: FlowFile Stream, v3. Thanks to Joe Witt for the explanation of the process. A compact sketch of the full flow follows the references.

Reference:
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.UnpackContent/
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.MergeContent/
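The two halves of the flow described above, sketched compactly (the processor names are real NiFi processors; the directory is the one from the cat example):

Save:   ... -> MergeContent (Merge Format: FlowFile Stream, v3) -> PutFile (/opt/demo/flow)
Reload: GetFile (/opt/demo/flow) -> IdentifyMimeType -> UnpackContent -> ...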
03-27-2017 03:02 PM
You can assemble it in NiFi and then store it to ORC. I recommend breaking your JSON down into simpler structures, since you will have to query it and use it with other data. Can you make it a wide table? Duplicate data is not a big deal for Hadoop.
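As a hedged sketch of the wide-table idea (table and column names here are invented for illustration, not taken from the thread):

CREATE EXTERNAL TABLE IF NOT EXISTS readings_wide (
  event_id BIGINT,
  device_name STRING,      -- flattened from a nested JSON device object
  device_city STRING,      -- duplicated per row rather than joined at query time
  reading_value DOUBLE
)
STORED AS ORC
LOCATION '/data/readings_wide';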
03-24-2017 05:57 PM
Have you connected to the data from SQL?