1973 Posts
1225 Kudos Received
124 Solutions

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 1924 | 04-03-2024 06:39 AM |
|  | 3018 | 01-12-2024 08:19 AM |
|  | 1652 | 12-07-2023 01:49 PM |
|  | 2424 | 08-02-2023 07:30 AM |
|  | 3371 | 03-29-2023 01:22 PM |
01-23-2017
09:28 PM
https://community.hortonworks.com/questions/63249/how-to-debug-each-nifi-processor.html
https://community.hortonworks.com/questions/71001/debugging-nifi-flow.html
A useful article: https://community.hortonworks.com/articles/71839/decyphering-error-messages-in-apache-nifi.html
01-20-2017
09:47 PM
3 Kudos
You can run the attack library for OS X or Linux from an edge node or from outside the cluster. I ran it from my OS X laptop against a cluster I had network access to. You should try scanning from inside your network, from an edge node, and from a remote site on the Internet. You will need Python 2.7 or Python 3.x installed first.
```
git clone git@github.com:CERT-W/hadoop-attack-library.git
pip install requests lxml
```
You may need root or sudo access to install these on your machine. One of the scanners hits the WebHDFS link that you may have seen a warning about.
```
python hdfsbrowser.py timscluster
Beginning to test services accessibility using default ports ...
Testing service WebHDFS
[+] Service WebHDFS is available
Testing service HttpFS
[-] Exception during requesting the service
[+] Sucessfully retrieved 1 services
drwxrwxrwx hdfs:hdfs 2017-01-15T05:50:27+0000 /
drwxrwxrwx yarn:hadoop 2017-01-11T19:25:26+0000 app-logs /app-logs
drwxrwxrwx hdfs:hdfs 2016-12-21T23:12:40+0000 apps /apps
drwxrwxrwx yarn:hadoop 2016-09-15T21:02:30+0000 ats /ats
drwxrwxrwx root:hdfs 2016-12-21T23:08:34+0000 avroresults /avroresults
drwxrwxrwx hdfs:hdfs 2016-12-13T03:42:55+0000 banking /banking
```
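For context, the WebHDFS check above boils down to a single unauthenticated REST call. A minimal sketch with requests, assuming the default Hadoop 2.x NameNode web port on the same hypothetical host:
```python
# Sketch: list the HDFS root over WebHDFS, the same API hdfsbrowser.py probes.
# "timscluster" and port 50070 are assumptions carried over from the scan above.
import requests

url = "http://timscluster:50070/webhdfs/v1/?op=LISTSTATUS"
resp = requests.get(url, timeout=10)
resp.raise_for_status()

# Print permission, owner:group, and name for each root entry, like the output above
for entry in resp.json()["FileStatuses"]["FileStatus"]:
    print("%s %s:%s %s" % (entry["permission"], entry["owner"],
                           entry["group"], entry["pathSuffix"]))
```
If this returns anything without credentials, your cluster is browsable by anyone with network access.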
To see how much of your Hadoop configuration is exposed, you can use Hadoop Snooper, found under Tools Techniques and Procedures/Getting the target environment configuration.
```
python hadoopsnooper.py timscluster -o test
Specified destination path does not exist, do you want to create it ? [y/N]y
[+] Creating configuration directory
[+] core-site.xml successfully created
[+] mapred-site.xml successfully created
[+] yarn-site.xml successfully created
```
This downloads all of those configuration files to a directory named test. They were not the full configuration files, but they point to the correct internal servers and give an attacker more information.
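Hadoop Snooper is essentially automating the /conf servlet that Hadoop daemons expose on their web ports. A minimal sketch of the same fetch with requests (the host name and the assumption that the NameNode web port is open come from the scan above):
```python
# Sketch: pull the live configuration XML from the NameNode's /conf servlet.
# The ResourceManager exposes the same endpoint on its own web port.
import requests

resp = requests.get("http://timscluster:50070/conf", timeout=10)
resp.raise_for_status()
with open("conf-dump.xml", "w") as out:
    out.write(resp.text)
```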
Another scan worth running is sqlmap, a tool that lets you check the various SQL engines in the system. sqlmap requires Python 2.6 or 2.7.
```
➜ projects git clone https://github.com/sqlmapproject/sqlmap.git sqlmap-dev
Cloning into 'sqlmap-dev'...
remote: Counting objects: 55560, done.
remote: Compressing objects: 100% (41/41), done.
remote: Total 55560 (delta 22), reused 0 (delta 0), pack-reused 55519
Receiving objects: 100% (55560/55560), 47.25 MiB | 2.28 MiB/s, done.
Resolving deltas: 100% (42960/42960), done.
Checking connectivity... done.
➜ projects python sqlmap.py --update
➜ projects cd sqlmap-dev
➜ sqlmap-dev git:(master) python sqlmap.py --update
___
__H__
___ ___[.]_____ ___ ___ {1.1.1.14#dev}
|_ -| . [)] | .'| . |
|___|_ [']_|_|_|__,| _|
|_|V |_| http://sqlmap.org
[!] legal disclaimer: Usage of sqlmap for attacking targets without prior mutual consent is illegal. It is the end user's responsibility to obey all applicable local, state and federal laws. Developers assume no liability and are not responsible for any misuse or damage caused by this program
[*] starting at 16:49:13
[16:49:13] [INFO] updating sqlmap to the latest development version from the GitHub repository
[16:49:13] [INFO] update in progress .
[16:49:14] [INFO] already at the latest revision 'f542e82'
[*] shutting down at 16:49:14
```
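The run above only updates sqlmap itself; to actually probe something you point it at a parameterized URL. A hedged sketch, driven from Python to match the rest of the tooling here — the target URL is hypothetical, while -u, --batch, and --banner are standard sqlmap flags:
```python
# Sketch: run sqlmap non-interactively against a hypothetical URL you own.
import subprocess

subprocess.call([
    "python", "sqlmap.py",
    "-u", "http://timscluster:8888/app?id=1",  # hypothetical parameterized endpoint
    "--batch",   # accept default answers instead of prompting
    "--banner",  # just try to pull the DBMS banner as a quick check
])
```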
References:
- http://sqlmap.org/
- http://www.slideshare.net/bunkertor/hadoop-security-54483815
- http://tools.kali.org/
- https://github.com/savio-code/hexorbase
- https://community.hortonworks.com/articles/73035/running-dns-and-domain-scanning-tools-from-apache.html
01-20-2017
02:36 AM
```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
val df = sqlContext.table("tablename")
df.select("location").show(5)
```
This fails with:
```
java.lang.AssertionError: assertion failed
at scala.Predef$.assert(Predef.scala:165)
at org.apache.spark.sql.execution.datasources.LogicalRelation$anonfun$1.apply(LogicalRelation.scala:39)
at org.apache.spark.sql.execution.datasources.LogicalRelation$anonfun$1.apply(LogicalRelation.scala:38)
at scala.Option.map(Option.scala:145)
at org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:38)
at org.apache.spark.sql.execution.datasources.LogicalRelation.copy(LogicalRelation.scala:31)
at org.apache.spark.sql.hive.HiveMetastoreCatalog.org$apache$spark$sql$hive$HiveMetastoreCatalog$convertToOrcRelation(HiveMetastoreCatalog.scala:588)
at org.apache.spark.sql.hive.HiveMetastoreCatalog$OrcConversions$anonfun$apply$2.applyOrElse(HiveMetastoreCatalog.scala:647)
at org.apache.spark.sql.hive.HiveMetastoreCatalog$OrcConversions$anonfun$apply$2.applyOrElse(HiveMetastoreCatalog.scala:643)
at org.apache.spark.sql.catalyst.trees.TreeNode$anonfun$transformUp$1.apply(TreeNode.scala:335)
at org.apache.spark.sql.catalyst.trees.TreeNode$anonfun$transformUp$1.apply(TreeNode.scala:335)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:334)
at org.apache.spark.sql.hive.HiveMetastoreCatalog$OrcConversions$.apply(HiveMetastoreCatalog.scala:643)
at org.apache.spark.sql.hive.HiveMetastoreCatalog$OrcConversions$.apply(HiveMetastoreCatalog.scala:637)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$anonfun$execute$1$anonfun$apply$1.apply(RuleExecutor.scala:83)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$anonfun$execute$1$anonfun$apply$1.apply(RuleExecutor.scala:80)
at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111)
at scala.collection.immutable.List.foldLeft(List.scala:84)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$anonfun$execute$1.apply(RuleExecutor.scala:80)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$anonfun$execute$1.apply(RuleExecutor.scala:72)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:72)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:36)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:36)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:133)
at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
at org.apache.spark.sql.SQLContext.table(SQLContext.scala:831)
at org.apache.spark.sql.SQLContext.table(SQLContext.scala:827)
```
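Not in the original post, but for anyone hitting this: the trace goes through HiveMetastoreCatalog$OrcConversions, so one thing worth trying is disabling Spark's metastore ORC conversion. A minimal PySpark sketch of that idea, assuming Spark 1.6 with Hive support and an ORC-backed table; the property spark.sql.hive.convertMetastoreOrc is a real Spark setting, but that it resolves this particular assertion is an assumption:
```python
# Sketch only: assumes Spark 1.6 with Hive support and an ORC-backed Hive table.
# The stack trace runs through HiveMetastoreCatalog$OrcConversions, so skipping
# that conversion path is a plausible, untested workaround.
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="orc-conversion-workaround")
sqlContext = HiveContext(sc)

# Keep Spark from converting metastore ORC tables to its native ORC relation
sqlContext.setConf("spark.sql.hive.convertMetastoreOrc", "false")

df = sqlContext.table("tablename")
df.select("location").show(5)
```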
Labels:
- Apache Spark
01-18-2017
01:09 AM
Are you using the out-of-the-box Zeppelin installed through Ambari, version 0.6.0? How much RAM do you have? What version of HDP? Ambari? JDK? Does your cluster have Spark? Any logs?
01-17-2017
06:45 PM
This is the latest HDF 2.x. It sometimes happens in ExecuteStreamCommand; if I have a flow running continuously for a few days, there will be a few per day:
```
2017-01-17 18:42:46,932 ERROR [Thread-617894] o.a.n.p.standard.ExecuteStreamCommand
java.io.IOException: Broken pipe
at java.io.FileOutputStream.writeBytes(Native Method) ~[na:1.8.0_77]
at java.io.FileOutputStream.write(FileOutputStream.java:326) ~[na:1.8.0_77]
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122) ~[na:1.8.0_77]
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122) ~[na:1.8.0_77]
at org.apache.nifi.stream.io.StreamUtils.copy(StreamUtils.java:36) ~[nifi-utils-1.1.0.2.1.1.0-2.jar:1.1.0.2.1.1.0-2]
at org.apache.nifi.processors.standard.ExecuteStreamCommand$2.run(ExecuteStreamCommand.java:489) ~[nifi-standard-processors-1.1.0.2.1.1.0-2.jar:1.1.0.2.1.1.0-2]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77]
```
With PutHiveQL, it usually happens any time it first starts up:
```
2017-01-17 18:42:37,563 ERROR [Timer-Driven Process Thread-10] o.apache.nifi.processors.hive.PutHiveQL PutHiveQL[id=71b732e9-f140-109c-a315-47f0af695760] Failed to update Hive for StandardFlowFileRecord[uuid=d95fadbd-d349-4efe-88da-cb76c3f2aca8,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1484678529444-3613, container=default, section=541], offset=596859, length=425],offset=0,name=1653249247047835.json.orc,size=425] due to java.sql.SQLException: org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe; it is possible that retrying the operation will succeed, so routing to retry: java.sql.SQLException: org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe
2017-01-17 18:42:37,568 ERROR [Timer-Driven Process Thread-10] o.apache.nifi.processors.hive.PutHiveQL
java.sql.SQLException: org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe
at org.apache.hive.jdbc.HiveStatement.runAsyncOnServer(HiveStatement.java:305) ~[hive-jdbc-1.2.1000.2.5.0.0-1245.jar:1.2.1000.2.5.0.0-1245]
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:238) ~[hive-jdbc-1.2.1000.2.5.0.0-1245.jar:1.2.1000.2.5.0.0-1245]
at org.apache.hive.jdbc.HivePreparedStatement.execute(HivePreparedStatement.java:98) ~[hive-jdbc-1.2.1000.2.5.0.0-1245.jar:1.2.1000.2.5.0.0-1245]
at org.apache.commons.dbcp.DelegatingPreparedStatement.execute(DelegatingPreparedStatement.java:172) ~[commons-dbcp-1.4.jar:1.4]
at org.apache.commons.dbcp.DelegatingPreparedStatement.execute(DelegatingPreparedStatement.java:172) ~[commons-dbcp-1.4.jar:1.4]
at org.apache.nifi.processors.hive.PutHiveQL.onTrigger(PutHiveQL.java:161) ~[nifi-hive-processors-1.1.0.2.1.1.0-2.jar:1.1.0.2.1.1.0-2]
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27) [nifi-api-1.1.0.2.1.1.0-2.jar:1.1.0.2.1.1.0-2]
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1099) [nifi-framework-core-1.1.0.2.1.1.0-2.jar:1.1.0.2.1.1.0-2]
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:136) [nifi-framework-core-1.1.0.2.1.1.0-2.jar:1.1.0.2.1.1.0-2]
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47) [nifi-framework-core-1.1.0.2.1.1.0-2.jar:1.1.0.2.1.1.0-2]
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:132) [nifi-framework-core-1.1.0.2.1.1.0-2.jar:1.1.0.2.1.1.0-2]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_77]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_77]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_77]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_77]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_77]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_77]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77]
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe
at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:147) ~[hive-exec-1.2.1000.2.5.0.0-1245.jar:1.2.1000.2.5.0.0-1245]
at org.apache.thrift.transport.TTransport.write(TTransport.java:107) ~[hive-exec-1.2.1000.2.5.0.0-1245.jar:1.2.1000.2.5.0.0-1245]
at org.apache.thrift.transport.TSaslTransport.writeLength(TSaslTransport.java:391) ~[hive-exec-1.2.1000.2.5.0.0-1245.jar:1.2.1000.2.5.0.0-1245]
at org.apache.thrift.transport.TSaslTransport.flush(TSaslTransport.java:499) ~[hive-exec-1.2.1000.2.5.0.0-1245.jar:1.2.1000.2.5.0.0-1245]
at org.apache.thrift.transport.TSaslClientTransport.flush(TSaslClientTransport.java:37) ~[hive-exec-1.2.1000.2.5.0.0-1245.jar:1.2.1000.2.5.0.0-1245]
at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:73) ~[hive-exec-1.2.1000.2.5.0.0-1245.jar:1.2.1000.2.5.0.0-1245]
at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:62) ~[hive-exec-1.2.1000.2.5.0.0-1245.jar:1.2.1000.2.5.0.0-1245]
at org.apache.hive.service.cli.thrift.TCLIService$Client.send_ExecuteStatement(TCLIService.java:223) ~[hive-service-1.2.1000.2.5.0.0-1245.jar:1.2.1000.2.5.0.0-1245]
at org.apache.hive.service.cli.thrift.TCLIService$Client.ExecuteStatement(TCLIService.java:215) ~[hive-service-1.2.1000.2.5.0.0-1245.jar:1.2.1000.2.5.0.0-1245]
at sun.reflect.GeneratedMethodAccessor1087.invoke(Unknown Source) ~[na:na]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_77]
at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_77]
at org.apache.hive.jdbc.HiveConnection$SynchronizedHandler.invoke(HiveConnection.java:1363) ~[hive-jdbc-1.2.1000.2.5.0.0-1245.jar:1.2.1000.2.5.0.0-1245]
at com.sun.proxy.$Proxy211.ExecuteStatement(Unknown Source) ~[na:na]
at org.apache.hive.jdbc.HiveStatement.runAsyncOnServer(HiveStatement.java:296) ~[hive-jdbc-1.2.1000.2.5.0.0-1245.jar:1.2.1000.2.5.0.0-1245]
... 17 common frames omitted
Caused by: java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method) ~[na:1.8.0_77]
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109) ~[na:1.8.0_77]
at java.net.SocketOutputStream.write(SocketOutputStream.java:153) ~[na:1.8.0_77]
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) ~[na:1.8.0_77]
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126) ~[na:1.8.0_77]
at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:145) ~[hive-exec-1.2.1000.2.5.0.0-1245.jar:1.2.1000.2.5.0.0-1245]
... 31 common frames omitted
```
Labels:
- Apache NiFi
01-16-2017
10:23 PM
You install both, not even HDP Client
01-15-2017
05:42 PM
2 Kudos
Raspberry Pis and other small devices often have cameras or can have cameras attached; the Raspberry Pi has cheap camera add-ons that can capture still images and video (https://www.raspberrypi.org/products/camera-module/). Using a simple Python script, we can capture images and ingest them into our central Hadoop data lake. This is a nice, simple use case for Connected Data Platforms, with both Data in Motion and Data at Rest. The data can be processed in-line with deep learning libraries like TensorFlow for image recognition and assessment, and with OpenCV and other tools we can process it in motion, looking for issues like security breaches, leaks, and other events.

The most difficult part is the Python code, which reads from the camera, adds a watermark, converts the image to bytes, sends it to MQTT, and then FTPs it to an FTP server. I do both since networking is always tricky. You could also add a fallback: if the script fails to connect to either, store the image to a directory on a mapped USB drive and send it out once the network returns; that would be easy to do with MiNiFi reading that directory. Once the file lands in the MQTT broker or FTP server, NiFi pulls it and brings it into the flow. I first store to HDFS, our Data at Rest permanent storage, for future deep learning processing. I also run three processors to extract image metadata, and then call jp2a to convert the image into an ASCII picture.

(Figures: ExecuteStreamCommand for running jp2a; the output ASCII; the HDFS directory of uploaded files; metadata extracted from the image; an example imported image; other extracted metadata; a JPG converted to ASCII.)

Running jp2a on images stored in HDFS via the WebHDFS REST API:
```
/opt/demo/jp2a-master/src/jp2a "http://hdfsnode:50070/webhdfs/v1/images/$@?op=OPEN"
```
Python on the RPi:
```
#!/usr/bin/python
import os
import datetime
import ftplib
import traceback
import math
import random, string
import base64
import json
import paho.mqtt.client as mqtt
import picamera
from time import sleep
from time import gmtime, strftime
packet_size=3000
def randomword(length):
    return ''.join(random.choice(string.lowercase) for i in range(length))
# Create unique image name
img_name = 'pi_image_{0}_{1}.jpg'.format(randomword(3),strftime("%Y%m%d%H%M%S",gmtime()))
# Capture Image from Pi Camera
camera = picamera.PiCamera()
try:
    camera.annotate_text = " Stored with Apache NiFi "
    camera.capture(img_name, resize=(500, 281))
finally:
    # always release the camera, even if the capture fails
    camera.close()
# MQTT
client = mqtt.Client()
client.username_pw_set("CloudMqttUserName","!MakeSureYouHaveAV@5&L0N6Pa55W0$4!")
client.connect("cloudmqttiothoster", 14162, 60)
# Read the image as binary and base64-encode it so the JSON payload is valid text
f = open(img_name, 'rb')
fileContent = f.read()
f.close()
message = json.dumps({"image": {"bytearray": base64.b64encode(fileContent)}})
print client.publish("image", payload=message, qos=1, retain=False)
client.disconnect()
# FTP upload of the same image, as a second delivery path
ftp = ftplib.FTP()
ftp.connect("ftpserver", 21)
try:
    ftp.login("reallyLongUserName", "FTP PASSWORDS SHOULD BE HARD")
    ftp.storbinary('STOR ' + img_name, open(img_name, 'rb'))
finally:
    ftp.quit()
# clean up sent file
os.remove(img_name)
```
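For testing, a matching subscriber for the payload above (not from the original post) can decode the base64 field back into a JPEG. A minimal sketch, assuming the same paho-mqtt library and the same CloudMQTT broker details as the publisher:
```python
#!/usr/bin/python
# Sketch: subscribe to the "image" topic and write each message back out as a JPEG.
import base64
import json
import paho.mqtt.client as mqtt

def on_connect(client, userdata, flags, rc):
    client.subscribe("image", qos=1)

def on_message(client, userdata, msg):
    payload = json.loads(msg.payload)
    data = base64.b64decode(payload["image"]["bytearray"])
    with open("received.jpg", "wb") as out:
        out.write(data)

client = mqtt.Client()
client.username_pw_set("CloudMqttUserName", "!MakeSureYouHaveAV@5&L0N6Pa55W0$4!")
client.on_connect = on_connect
client.on_message = on_message
client.connect("cloudmqttiothoster", 14162, 60)
client.loop_forever()
```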
References:
- https://community.hortonworks.com/repos/77987/rpi-picamera-mqtt-nifi.html
- https://github.com/bikash/RTNiFiStreamProcessors
- http://stackoverflow.com/questions/37499739/how-can-i-send-a-image-by-using-mosquitto
- https://www.raspberrypi.org/learning/getting-started-with-picamera/
- https://www.raspberrypi.org/learning/getting-started-with-picamera/worksheet/
- https://www.cloudmqtt.com/
- https://developer.ibm.com/recipes/tutorials/sending-and-receiving-pictures-from-a-raspberry-pi-via-mqtt/
- https://developer.ibm.com/recipes/tutorials/displaying-image-from-raspberry-pi-in-nodered-ui-hosted-on-bluemix/
- https://github.com/jpmens/twitter2mqtt
- http://www.ev3dev.org/docs/tutorials/sending-and-receiving-messages-with-mqtt/
- https://github.com/njh/mqtt-http-bridge
- https://www.raspberrypi.org/learning/parent-detector/worksheet/
- http://picamera.readthedocs.io/en/release-1.10/recipes1.html
- http://picamera.readthedocs.io/en/release-1.10/recipes1.html#capturing-to-an-opencv-object
- http://picamera.readthedocs.io/en/release-1.10/faq.html
- http://www.eclipse.org/paho/
- https://github.com/cslarsen/jp2a
- https://www.raspberrypi.org/learning/tweeting-babbage/worksheet/
- https://csl.name/jp2a/