07-28-2016
08:34 PM
5 Kudos
There are a lot of excellent talks from the summit.

Deep Learning
- Apache Spark with Machine Learning like TensorFlow
- Distributed Deep Learning on Hadoop Clusters (Yahoo)
- Apache Spark Big Data Heterogeneous Mixture Learning on Spark
- Integrating Apache Spark and NiFi for Data Lakes (ThinkBig)

Operations
- Zero Downtime App Deployment Using Hadoop (Hortonworks)
- Debugging YARN Cluster in Production (Hortonworks)
- Yahoo's Experience Running Pig on Tez at Scale
- The DAP: Where Yarn, HBase, Kafka and Spark go to Production (Cask)
- Extend Governance in Hadoop with Atlas Ecosystem (Hortonworks)
- Cost and Resource Tracking for Hadoop (Yahoo)
- Managing Hadoop, HBase and Storm Clusters at Yahoo Scale
- Operating and Supporting Apache HBase: Best Practices and Improvements (Hortonworks)
- Scheduling Policies in Yarn (Slides)

Future of Data
- Arun Murthy, Hortonworks - Hadoop Summit 2016 San Jose - #HS16SJ - #theCUBE
- The Future of Hadoop: An Enterprise View (Slides)

Streaming
- The Future of Storm (Hortonworks)
- Streaming ETL for All: Embeddable Data Transformation for Real Time Streams
- Real-time, Streaming Advanced Analytics, Approximations, and Recommendations using Apache Spark ML/GraphX, Kafka, Stanford CoreNLP, and Twitter Algebird (Chris Fregly, IBM)
- Fighting Fraud in Real Time by Processing 1M+ TPS Using Storm on Slider/YARN (Rocketfuel)
- Lambda-less Stream Processing at Scale in LinkedIn
- Make Streaming Analytics Work For You: The Devil is in the Details (Hortonworks)
- Lego-Like Building Blocks of Storm and Spark Streaming Pipelines for Rapid IOT and Streaming (StreamAnalytix)
- Performance Comparison of Streaming Big Data Platforms

Machine Learning
- Prescient Keeps Travelers Safe with Natural Language Processing and Geospatial Analytics

IoAT (Internet Of Things)
- What about Data Storage (Hortonworks)

YAF (Yet Another Framework)
- Apache Beam: A Unified Model for Batch and Streaming Data Processing (Google)
- Turning the Stream Processor into a Database: Building Online Applications on Streams (Flink / DataArtisans)
- The Next Generation of Data Processing OSS (Google)
- Next Gen Big Data Analytics with Apache Apex

SQL and Friends
- How We Re-Engineered Phoenix with a Cost-Based Optimizer Based on Calcite (Intel and Hortonworks)
- Hive HBase Metastore: Improving Hive with a Big Data Metadata Storage (Hortonworks)
- Phoenix + HBase: An Enterprise-Grade Data Warehouse Appliance for Interactive Analytics (Hortonworks)
- Presto: What's New in SQL on Hadoop and Beyond (Facebook, Teradata)

DataFlow
- Scalable Optical Character Recognition with Apache NiFi and Tesseract (Hortonworks)
- Building a Smarter Home with NiFi and Spark

General
- It's Time: Launching Your Advanced Analytics Program for Success in a Mature Industry Like Oil and Gas (ConocoPhillips)
- Instilling Confidence and Trust: Big Data Security Governance (Mastercard)
- Hadoop in the Cloud: The What, Why and How from the Experts (Microsoft)
- War on Stealth Cyberattacks that Target Unknown Vulnerabilities
- Hadoop and Cloud Storage: Object Store Integration in Production (Hortonworks)
- There is a New Ranger in Town: End-to-End Security and Auditing in a Big Data as a Service Deployment
- Building a Scalable Data Science Platform with R (Microsoft)
- A Data Lake and a Data Lab to Optimize Operations and Safety Within a Nuclear Fleet
- Reliable and Scalable Data Ingestion at Airbnb
05-22-2018
12:55 PM
I'm collecting Facebook data using NiFi. Which processors (and configurations) should I use, and how should I modify the query to get the next feeds from the Graph API response? I'm getting the first 100 posts and then a link to the next 100 posts. How do I manage a dynamic process to get a continual dataflow from Facebook?
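The pagination pattern in question (each Graph API response carries a paging.next URL pointing at the next batch) can be sketched outside NiFi. This is a minimal sketch, not the Graph API client itself: the fetch callable and URLs are placeholders, and a real call would need a valid access token.

```python
import json

def next_page_url(body):
    """Pull the 'paging.next' link out of a Graph API-style JSON response."""
    return json.loads(body).get("paging", {}).get("next")

def collect_pages(fetch, first_url, max_pages=50):
    """Follow next-links until exhausted.

    fetch(url) returns the raw JSON body; it would wrap a real HTTP
    call in practice, but is kept abstract here so the pagination
    logic stands on its own.
    """
    posts, url, pages = [], first_url, 0
    while url and pages < max_pages:
        body = fetch(url)
        posts.extend(json.loads(body).get("data", []))
        url = next_page_url(body)  # None on the last page ends the loop
        pages += 1
    return posts
```

In NiFi terms, one common way to build this loop is InvokeHTTP to call the API, EvaluateJsonPath to extract $.paging.next into an attribute, and a route that feeds FlowFiles with a non-empty next link back into InvokeHTTP until no link remains.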
08-26-2016
09:53 AM
Hi @Andread B, why do you want to run NiFi on the NameNode? If you are ingesting a lot of data, I would recommend running NiFi on a dedicated host, or at least on an edge node. Also, if a single NiFi instance will ingest a lot of data, you can use GenerateTableFetch (coming in NiFi 1.0) to divide your import into several chunks and distribute them across several NiFi nodes. This processor generates several FlowFiles based on the Partition Size property, where each FlowFile is a query that fetches one part of the data. You can try this by downloading the NiFi 1.0 beta: https://nifi.apache.org/download.html
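To illustrate the chunking idea, here is a hand-rolled sketch of how an import splits into per-partition queries. This is only an illustration of the concept, not the SQL that GenerateTableFetch actually emits; the table and column names are placeholders.

```python
def partition_queries(table, key_column, row_count, partition_size):
    """Split a table import into range queries of at most
    partition_size rows each -- a sketch of the idea behind
    GenerateTableFetch, not its real output."""
    return [
        f"SELECT * FROM {table} ORDER BY {key_column} "
        f"LIMIT {partition_size} OFFSET {offset}"
        for offset in range(0, row_count, partition_size)
    ]
```

Each resulting query corresponds to one FlowFile, so the queries can be load-balanced across the nodes of a NiFi cluster and executed in parallel.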
05-22-2017
01:26 PM
@Timothy Spann Were you able to configure TOAD with a kerberized cluster?
09-07-2017
08:05 PM
I am running spark version 1.6.3 under HDP 2.5.6. What version of magellan should I use to run with this version?
07-22-2016
09:17 PM
Thanks for the analysis. Does anyone have similar sizings for Google and Azure?
07-21-2016
10:13 PM
2 Kudos
Using the GetHTTP processor, we grab random images from DigitalOcean's free image site Unsplash.it. I give each image a random file name so we can save it uniquely in HDFS. The entire data flow runs from GetHTTP to final HDFS storage of the image and its metadata as JSON, with the ExtractMediaMetadata processor pulling the metadata along the way. The final results:

hdfs dfs -cat /mediametadata/random1469112881039.json
{
  "Number of Components": "3",
  "Resolution Units": "none",
  "Image Height": "200 pixels",
  "File Name": "apache-tika-3181704319795384377.tmp",
  "Data Precision": "8 bits",
  "File Modified Date": "Thu Jul 21 14:54:43 UTC 2016",
  "tiff:BitsPerSample": "8",
  "Compression Type": "Progressive, Huffman",
  "X-Parsed-By": "org.apache.tika.parser.DefaultParser, org.apache.tika.parser.jpeg.JpegParser",
  "Component 1": "Y component: Quantization table 0, Sampling factors 2 horiz/2 vert",
  "Component 2": "Cb component: Quantization table 1, Sampling factors 1 horiz/1 vert",
  "Component 3": "Cr component: Quantization table 1, Sampling factors 1 horiz/1 vert",
  "tiff:ImageLength": "200",
  "tiff:ImageWidth": "200",
  "mime.type": "image/jpeg",
  "gethttp.remote.source": "unsplash.it",
  "X Resolution": "1 dot",
  "Y Resolution": "1 dot",
  "File Size": "4701 bytes",
  "path": "./",
  "filename": "random1469112881039.jpg",
  "Image Width": "200 pixels",
  "uuid": "8b7c4f9f-9436-4ccb-b06e-9a720c91f6e0",
  "Content-Type": "image/jpeg"
}
We can grab as many images as we want. Using the Unsplash.it parameters, I picked a fixed image width of 200; you can customize that. Below is the image downloaded with the above metadata.
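The unique name seen in the metadata above (random1469112881039.jpg) is an epoch-milliseconds timestamp. The post doesn't show the exact NiFi expression used, so here is an illustrative equivalent of that naming scheme; the prefix and extension are just the defaults from the example.

```python
import time

def unique_image_name(prefix="random", ext="jpg"):
    """Build a name like random1469112881039.jpg from epoch
    milliseconds, mirroring the naming seen in the flow's output."""
    return f"{prefix}{int(time.time() * 1000)}.{ext}"
```

In the NiFi flow itself, an UpdateAttribute processor setting the filename attribute with an expression along the lines of random${now():toNumber()}.jpg would produce the same kind of name.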
11-27-2017
01:43 PM
1 Kudo
Hello, great post! One part needs a correction:

sudo wget http://download.opensuse.org/repositories/home:/oojah:/mqtt/CentOS_CentOS-6/home:oojah:mqtt.repo
sudo cp *.repo /etc/yum.repos.d/
sudo yum -y update
sudo yum -y install mosquitto

Steps 1 and 2 are fused together in the article. Regards