07-28-2016
08:34 PM
5 Kudos
There are a lot of excellent talks from the summit.

Deep Learning
- Apache Spark with Machine Learning like TensorFlow
- Distributed Deep Learning on Hadoop Clusters (Yahoo)
- Apache Spark Big Data Heterogeneous Mixture Learning on Spark
- Integrating Apache Spark and NiFi for Data Lakes (ThinkBig)

Operations
- Zero Downtime App Deployment Using Hadoop (Hortonworks)
- Debugging YARN Cluster in Production (Hortonworks)
- Yahoo's Experience Running Pig on Tez at Scale
- The DAP: Where Yarn, HBase, Kafka and Spark go to Production (Cask)
- Extend Governance in Hadoop with Atlas Ecosystem (Hortonworks)
- Cost and Resource Tracking for Hadoop (Yahoo)
- Managing Hadoop, HBase and Storm Clusters at Yahoo Scale
- Operating and Supporting Apache HBase: Best Practices and Improvements (Hortonworks)
- Scheduling Policies in Yarn (Slides)

Future of Data
- Arun Murthy, Hortonworks - Hadoop Summit 2016 San Jose - #HS16SJ - #theCUBE
- The Future of Hadoop: An Enterprise View (Slides)

Streaming
- The Future of Storm (Hortonworks)
- Streaming ETL for All: Embeddable Data Transformation for Real Time Streams
- Real-time, Streaming Advanced Analytics, Approximations, and Recommendations using Apache Spark ML/GraphX, Kafka, Stanford CoreNLP, and Twitter Algebird (Chris Fregly, IBM)
- Fighting Fraud in Real Time by Processing 1M+ TPS Using Storm on Slider/YARN (Rocketfuel)
- Lambda-less Stream Processing at Scale in LinkedIn
- Make Streaming Analytics Work For You: The Devil is in the Details (Hortonworks)
- Lego-Like Building Blocks of Storm and Spark Streaming Pipelines for Rapid IOT and Streaming (StreamAnalytix)
- Performance Comparison of Streaming Big Data Platforms

Machine Learning
- Prescient Keeps Travelers Safe with Natural Language Processing and Geospatial Analytics

IoAT (Internet Of Things)
- What about Data Storage (Hortonworks)

YAF (Yet Another Framework)
- Apache Beam: A Unified Model for Batch and Streaming Data Processing (Google)
- Turning the Stream Processor into a Database: Building Online Applications on Streams (Flink / DataArtisans)
- The Next Generation of Data Processing OSS (Google)
- Next Gen Big Data Analytics with Apache Apex

SQL and Friends
- How We Re-Engineered Phoenix with a Cost-Based Optimizer Based on Calcite (Intel and Hortonworks)
- Hive HBase Metastore: Improving Hive with a Big Data Metadata Storage (Hortonworks)
- Phoenix + HBase: An Enterprise-Grade Data Warehouse Appliance for Interactive Analytics (Hortonworks)
- Presto: What's New in SQL on Hadoop and Beyond (Facebook, Teradata)

DataFlow
- Scalable Optical Character Recognition with Apache NiFi and Tesseract (Hortonworks)
- Building a Smarter Home with NiFi and Spark

General
- It's Time: Launching Your Advanced Analytics Program for Success in a Mature Industry Like Oil and Gas (ConocoPhillips)
- Instilling Confidence and Trust: Big Data Security Governance (Mastercard)
- Hadoop in the Cloud: The What, Why and How from the Experts (Microsoft)
- War on Stealth Cyberattacks that Target Unknown Vulnerabilities
- Hadoop and Cloud Storage: Object Store Integration in Production (Hortonworks)
- There is a New Ranger in Town: End-to-End Security and Auditing in a Big Data as a Service Deployment
- Building a Scalable Data Science Platform with R (Microsoft)
- A Data Lake and a Data Lab to Optimize Operations and Safety Within a Nuclear Fleet
- Reliable and Scalable Data Ingestion at Airbnb
05-22-2018
12:55 PM
I'm collecting Facebook data using NiFi. Which processors (and configurations) should I use, and how should I modify the query to get the next feeds from the Graph API response? I'm getting the first 100 posts and then a link to the next 100 posts. How do I manage a dynamic process to get a continual dataflow from Facebook?
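The pagination pattern in question (each Graph API response carries a paging.next URL pointing at the next batch) can be sketched outside NiFi. This is a minimal sketch, not the Graph API client itself: the fetch callable and URLs are placeholders, and a real call would need a valid access token.

```python
import json

def next_page_url(body):
    """Pull the 'paging.next' link out of a Graph API-style JSON response."""
    return json.loads(body).get("paging", {}).get("next")

def collect_pages(fetch, first_url, max_pages=50):
    """Follow next-links until exhausted.

    fetch(url) returns the raw JSON body; it would wrap a real HTTP
    call in practice, but is kept abstract here so the pagination
    logic stands on its own.
    """
    posts, url, pages = [], first_url, 0
    while url and pages < max_pages:
        body = fetch(url)
        posts.extend(json.loads(body).get("data", []))
        url = next_page_url(body)  # None on the last page ends the loop
        pages += 1
    return posts
```

In NiFi terms, one common way to build this loop is InvokeHTTP to call the API, EvaluateJsonPath to extract $.paging.next into an attribute, and a route that feeds FlowFiles with a non-empty next link back into InvokeHTTP until no link remains.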
08-26-2016
09:53 AM
Hi @Andread B, why do you want to run NiFi on the NameNode? If you are ingesting a lot of data, I would recommend running NiFi on a dedicated host, or at least on an edge node. Also, if a single NiFi instance will ingest a lot of data, you can use GenerateTableFetch (coming in NiFi 1.0) to divide your import into several chunks and distribute them across several NiFi nodes. This processor generates several FlowFiles based on the Partition Size property, where each FlowFile is a query that fetches one part of the data. You can try this by downloading the NiFi 1.0 beta: https://nifi.apache.org/download.html
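To illustrate the chunking idea, here is a hand-rolled sketch of how an import splits into per-partition queries. This is only an illustration of the concept, not the SQL that GenerateTableFetch actually emits; the table and column names are placeholders.

```python
def partition_queries(table, key_column, row_count, partition_size):
    """Split a table import into range queries of at most
    partition_size rows each -- a sketch of the idea behind
    GenerateTableFetch, not its real output."""
    return [
        f"SELECT * FROM {table} ORDER BY {key_column} "
        f"LIMIT {partition_size} OFFSET {offset}"
        for offset in range(0, row_count, partition_size)
    ]
```

Each resulting query corresponds to one FlowFile, so the queries can be load-balanced across the nodes of a NiFi cluster and executed in parallel.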
05-22-2017
01:26 PM
@Timothy Spann Were you able to configure TOAD with a kerberized cluster?
09-07-2017
08:05 PM
I am running spark version 1.6.3 under HDP 2.5.6. What version of magellan should I use to run with this version?
07-22-2016
09:17 PM
Thanks for the analysis. Does anyone have similar sizings for Google and Azure?
07-21-2016
10:13 PM
2 Kudos
Using the GetHTTP processor, we grab random images from DigitalOcean's free image site Unsplash.it. I give each image a random file name so we can save it uniquely in HDFS. The entire data flow runs from GetHTTP to final HDFS storage of the image and its metadata as JSON, with the ExtractMediaMetadata processor pulling the metadata along the way. The final results:

hdfs dfs -cat /mediametadata/random1469112881039.json
{
  "Number of Components": "3",
  "Resolution Units": "none",
  "Image Height": "200 pixels",
  "File Name": "apache-tika-3181704319795384377.tmp",
  "Data Precision": "8 bits",
  "File Modified Date": "Thu Jul 21 14:54:43 UTC 2016",
  "tiff:BitsPerSample": "8",
  "Compression Type": "Progressive, Huffman",
  "X-Parsed-By": "org.apache.tika.parser.DefaultParser, org.apache.tika.parser.jpeg.JpegParser",
  "Component 1": "Y component: Quantization table 0, Sampling factors 2 horiz/2 vert",
  "Component 2": "Cb component: Quantization table 1, Sampling factors 1 horiz/1 vert",
  "Component 3": "Cr component: Quantization table 1, Sampling factors 1 horiz/1 vert",
  "tiff:ImageLength": "200",
  "tiff:ImageWidth": "200",
  "mime.type": "image/jpeg",
  "gethttp.remote.source": "unsplash.it",
  "X Resolution": "1 dot",
  "Y Resolution": "1 dot",
  "File Size": "4701 bytes",
  "path": "./",
  "filename": "random1469112881039.jpg",
  "Image Width": "200 pixels",
  "uuid": "8b7c4f9f-9436-4ccb-b06e-9a720c91f6e0",
  "Content-Type": "image/jpeg"
}
We can grab as many images as we want. Using the Unsplash.it parameters, I picked a fixed image width of 200; you can customize that. Below is the image downloaded with the above metadata.
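The unique name seen in the metadata above (random1469112881039.jpg) is an epoch-milliseconds timestamp. The post doesn't show the exact NiFi expression used, so here is an illustrative equivalent of that naming scheme; the prefix and extension are just the defaults from the example.

```python
import time

def unique_image_name(prefix="random", ext="jpg"):
    """Build a name like random1469112881039.jpg from epoch
    milliseconds, mirroring the naming seen in the flow's output."""
    return f"{prefix}{int(time.time() * 1000)}.{ext}"
```

In the NiFi flow itself, an UpdateAttribute processor setting the filename attribute with an expression along the lines of random${now():toNumber()}.jpg would produce the same kind of name.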
11-27-2017
01:43 PM
1 Kudo
Hello, great post! One part needs a correction:

sudo wget http://download.opensuse.org/repositories/home:/oojah:/mqtt/CentOS_CentOS-6/home:oojah:mqtt.repo
sudo cp *.repo /etc/yum.repos.d/
sudo yum -y update
sudo yum -y install mosquitto

Steps 1 and 2 are fused together in the article. Regards