Posts: 1973 · Kudos Received: 1225 · Solutions: 124
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1841 | 04-03-2024 06:39 AM |
| | 2859 | 01-12-2024 08:19 AM |
| | 1581 | 12-07-2023 01:49 PM |
| | 2344 | 08-02-2023 07:30 AM |
| | 3231 | 03-29-2023 01:22 PM |
07-22-2016
09:17 PM
Thanks for the analysis. Does anyone have similar sizings for Google and Azure?
07-22-2016
02:32 PM
Are you on a sandbox? Do you have access to/from the box? Did you get an error? It should show up in the UI. Is it always a timeout? Can NiFi access anything?
07-22-2016
01:54 PM
1 Kudo
What are the best instance types for HDP nodes (master, data, edge)? I found a number of instance types that may work. Looking at:
- TCO: https://awstcocalculator.com/
- Rough pricing: https://calculator.s3.amazonaws.com/index.html
- Amazon instance types: http://www.ec2instances.info/
- Amazon EC2 instance types: https://aws.amazon.com/ec2/instance-types/
Labels:
- Hortonworks Data Platform (HDP)
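As a rough sanity check alongside those calculators, on-demand monthly cost is approximately hourly rate × 730 hours. A minimal sketch, using hypothetical node counts and per-hour rates (not current AWS pricing; check the links above for real numbers):

```python
# Rough monthly cost estimate for an HDP cluster layout.
# Node counts and rates are hypothetical placeholders.
HOURS_PER_MONTH = 730

nodes = {
    # role: (count, hourly_rate_usd)
    "master": (3, 0.50),
    "data":   (8, 1.00),
    "edge":   (1, 0.25),
}

def monthly_cost(nodes):
    return sum(count * rate * HOURS_PER_MONTH
               for count, rate in nodes.values())

print(f"${monthly_cost(nodes):,.2f}/month")  # $7,117.50/month for these rates
```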
07-21-2016
10:13 PM
2 Kudos
Using the GetHTTP processor, we grab random images from DigitalOcean's Unsplash.it free image site. I give each image a random file name so we can save it uniquely in HDFS. The entire data flow runs from GetHTTP to final HDFS storage of the image and its metadata as JSON, via the ExtractMediaMetadata processor. The final results: hdfs dfs -cat /mediametadata/random1469112881039.json
{
  "Number of Components": "3",
  "Resolution Units": "none",
  "Image Height": "200 pixels",
  "File Name": "apache-tika-3181704319795384377.tmp",
  "Data Precision": "8 bits",
  "File Modified Date": "Thu Jul 21 14:54:43 UTC 2016",
  "tiff:BitsPerSample": "8",
  "Compression Type": "Progressive, Huffman",
  "X-Parsed-By": "org.apache.tika.parser.DefaultParser, org.apache.tika.parser.jpeg.JpegParser",
  "Component 1": "Y component: Quantization table 0, Sampling factors 2 horiz/2 vert",
  "Component 2": "Cb component: Quantization table 1, Sampling factors 1 horiz/1 vert",
  "tiff:ImageLength": "200",
  "mime.type": "image/jpeg",
  "gethttp.remote.source": "unsplash.it",
  "Component 3": "Cr component: Quantization table 1, Sampling factors 1 horiz/1 vert",
  "X Resolution": "1 dot",
  "File Size": "4701 bytes",
  "tiff:ImageWidth": "200",
  "path": "./",
  "filename": "random1469112881039.jpg",
  "Image Width": "200 pixels",
  "uuid": "8b7c4f9f-9436-4ccb-b06e-9a720c91f6e0",
  "Content-Type": "image/jpeg",
  "Y Resolution": "1 dot"
}
We can grab as many images as we want. Using the Unsplash.it parameters, I fixed the image width at 200; you can customize that. Below is the image downloaded with the above metadata.
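The random-file-name trick can be sketched outside NiFi too. A minimal sketch, assuming the epoch-milliseconds suffix pattern seen in random1469112881039.jpg above:

```python
import time

def random_image_name(prefix="random", ext="jpg"):
    # Unique name from epoch milliseconds, matching the pattern
    # seen above (e.g. random1469112881039.jpg).
    millis = int(time.time() * 1000)
    return f"{prefix}{millis}.{ext}"

print(random_image_name())  # e.g. random1469112881039.jpg (millis will differ)
```

NiFi's UpdateAttribute with the expression language can produce the same pattern inside the flow itself.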
07-21-2016
02:18 AM
8 Kudos
In Apache NiFi 1.2, there are processors to get data from and put data to an MQTT broker, a protocol popular in IoT because of its small footprint and speed. MQTT is supported by Eclipse and IBM. I created an example on HDP 2.6, where I downloaded and installed the latest Apache NiFi 1.2 as well as an example MQTT broker, Mosquitto (http://mosquitto.org/).
To install Mosquitto on HDP 2.6 (CentOS 7.x):
sudo wget http://download.opensuse.org/repositories/home:/oojah:/mqtt/CentOS_CentOS-6/home:oojah:mqtt.repo
sudo cp *.repo /etc/yum.repos.d/
sudo yum -y update
sudo yum -y install mosquitto
To verify the settings and prepare logs:
[root@sandbox opt]# cat /etc/mosquitto/mosquitto.conf
# Place your local configuration in /etc/mosquitto/conf.d/
pid_file /var/run/mosquitto.pid
persistence true
persistence_location /var/lib/mosquitto/
#log_dest file /var/log/mosquitto/mosquitto.log
include_dir /etc/mosquitto/conf.d
[root@sandbox opt]# vi /etc/mosquitto/mosquitto.conf
[root@sandbox opt]# mkdir -p /var/log/mosquitto
[root@sandbox opt]# chmod 777 /var/log/mosquitto/
[root@sandbox opt]# touch /var/log/mosquitto/mosquitto.log
Run the MQTT broker server:
mosquitto -d
The default port for MQTT and Mosquitto is 1883. Make sure that port is not blocked by firewalls or antivirus software, and if on the sandbox, that it is exposed. (Screenshots: Running Mosquitto on Sandbox; NiFi PublishMQTT; NiFi ConsumeMQTT)
After running:
[root@sandbox demo]# hdfs dfs -ls /mqtt
root hdfs 2783 2016-07-20 14:56 /mqtt/37115929161818
root hdfs 2805 2016-07-20 14:56 /mqtt/37115930927495
(Screenshots: ConsumeMQTT; PublishMQTT)
Resources:
- http://mosquitto.org/man/mosquitto-8.html
- http://ceit.uq.edu.au/content/mqtt-and-growl
- http://growl.info/
- http://www.eclipse.org/paho/
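ConsumeMQTT subscribes with a topic filter, and MQTT filters support the + (single-level) and # (multi-level) wildcards. A minimal sketch of that matching logic, just to illustrate the semantics (not how Mosquitto implements it internally):

```python
def topic_matches(filter_, topic):
    """MQTT-style topic filter matching with + and # wildcards."""
    f_parts = filter_.split("/")
    t_parts = topic.split("/")
    for i, f in enumerate(f_parts):
        if f == "#":                 # matches this level and everything below
            return True
        if i >= len(t_parts):        # filter is longer than the topic
            return False
        if f != "+" and f != t_parts[i]:
            return False
    return len(f_parts) == len(t_parts)

print(topic_matches("sensors/+/temp", "sensors/kitchen/temp"))  # True
print(topic_matches("sensors/#", "sensors/kitchen/temp"))       # True
print(topic_matches("sensors/+/temp", "sensors/kitchen/hum"))   # False
```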
07-20-2016
06:13 PM
2 Kudos
I am looking for the best option for in-memory computing and fast data. The most recent data we have (current, 5 minutes, 1 hour, < 1 day) needs to be accessible as fast as possible. It's probably 500 GB or less. Something like Pivotal's Butterfly Architecture. What will work best for keeping some of this fast data? I have been looking at Apache Geode, Apache Ignite, Alluxio, SnappyData, Redis, HDFS RAM disk data nodes, HBase in-memory column families, Kafka, and Spark Streaming. Are there any baked solutions out there that work with HDP?
Labels:
- Apache Hadoop
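Whatever store ends up fitting, the access pattern above is essentially a time-windowed cache. A minimal sketch of TTL-based expiry, purely to illustrate the tiering idea (not a substitute for Geode, Ignite, or Redis):

```python
import time

class TTLCache:
    """Keep only 'fast data' younger than ttl_seconds."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (timestamp, value)

    def put(self, key, value):
        self.store[key] = (time.time(), value)

    def get(self, key):
        item = self.store.get(key)
        if item is None:
            return None
        ts, value = item
        if time.time() - ts > self.ttl:
            del self.store[key]   # expired: evict lazily on read
            return None
        return value

cache = TTLCache(ttl_seconds=300)  # the "5 minutes" tier
cache.put("sensor:42", {"temp": 21.5})
print(cache.get("sensor:42"))  # {'temp': 21.5}
```

A real deployment would layer several of these tiers (5 min, 1 hour, 1 day) over colder HDFS storage.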
07-20-2016
03:44 PM
I had the catalog and schema names set, and then left them off; I tried a few options. twitter is a table in the default Hive database. SelectHiveQL is working fine.
07-20-2016
03:29 PM
I set unmatched columns to ignore. I tried both true and false for field names.
07-20-2016
03:11 PM
1 Kudo
Is there anything special to get this to work? Hive table:
create table
twitter(
id int,
handle string,
hashtags string,
msg string,
time string,
user_name string,
tweet_id string,
unixtime string,
uuid string
) stored as orc
tblproperties ("orc.compress"="ZLIB");
The data is a pared-down tweet: {
"user_name" : "Tweet Person",
"time" : "Wed Jul 20 15:09:42 +0000 2016",
"unixtime" : "1469027382664",
"handle" : "SomeTweeter",
"tweet_id" : "755781737674932224",
"hashtags" : "",
"msg" : "RT some stuff"
}
Labels:
- Apache Hive
- Apache NiFi
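In NiFi, ConvertJSONToSQL (feeding PutSQL/PutHiveQL) performs roughly this JSON-key-to-column mapping. A rough sketch of the translation, assuming string-typed columns as in the DDL above; note the JSON is missing the id and uuid columns, which is why the unmatched-column behavior matters:

```python
import json

# Columns from the twitter DDL above.
columns = ["id", "handle", "hashtags", "msg", "time",
           "user_name", "tweet_id", "unixtime", "uuid"]

tweet = json.loads("""{
  "user_name": "Tweet Person",
  "time": "Wed Jul 20 15:09:42 +0000 2016",
  "unixtime": "1469027382664",
  "handle": "SomeTweeter",
  "tweet_id": "755781737674932224",
  "hashtags": "",
  "msg": "RT some stuff"
}""")

def to_insert(table, row, columns):
    # Only columns present in the JSON are emitted; missing ones
    # (id, uuid here) are dropped, mimicking "ignore unmatched".
    present = [c for c in columns if c in row]
    vals = ", ".join("'%s'" % str(row[c]).replace("'", "''") for c in present)
    return "INSERT INTO %s (%s) VALUES (%s)" % (table, ", ".join(present), vals)

print(to_insert("twitter", tweet, columns))
```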
07-20-2016
03:45 AM
"true" is spelled wrong.