Posts: 1973
Kudos Received: 1225
Solutions: 124

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1770 | 04-03-2024 06:39 AM
 | 2756 | 01-12-2024 08:19 AM
 | 1529 | 12-07-2023 01:49 PM
 | 2283 | 08-02-2023 07:30 AM
 | 3121 | 03-29-2023 01:22 PM
10-01-2016
11:13 PM
2 Kudos
I ran the same flow myself and examined the Avro file in HDFS using the Avro command-line tools. Even though I didn't specify Snappy compression, it was there in the file:

java -jar avro-tools-1.8.0.jar getmeta 23568764174290.avro
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
avro.codec    snappy
avro.schema   {"type":"record","name":"people","doc":"Schema generated by Kite","fields":[{"name":"id","type":"long","doc":"Type inferred from '2'"},{"name":"first_name","type":"string","doc":"Type inferred from 'Gregory'"},{"name":"last_name","type":"string","doc":"Type inferred from 'Vasquez'"},{"name":"email","type":"string","doc":"Type inferred from 'gvasquez1@pcworld.com'"},{"name":"gender","type":"string","doc":"Type inferred from 'Male'"},{"name":"ip_address","type":"string","doc":"Type inferred from '32.8.254.252'"},{"name":"company_name","type":"string","doc":"Type inferred from 'Janyx'"},{"name":"domain_name","type":"string","doc":"Type inferred from 'free.fr'"},{"name":"file_name","type":"string","doc":"Type inferred from 'NonMauris.xls'"},{"name":"mac_address","type":"string","doc":"Type inferred from '03-FB-66-0F-20-A3'"},{"name":"user_agent","type":"string","doc":"Type inferred from '\"Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_7;'"},{"name":"lat","type":"string","doc":"Type inferred from ' like Gecko) Version/5.0.4 Safari/533.20.27\"'"},{"name":"long","type":"double","doc":"Type inferred from '26.98829'"}]}

The Snappy codec is hard-coded in NiFi: https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-kite-bundle/nifi-kite-processors/src/main/java/org/apache/nifi/processors/kite/ConvertCSVToAvro.java

The processor always adds Snappy compression to every Avro file it writes; there is no option to turn it off:

writer.setCodec(CodecFactory.snappyCodec());

Make sure you have a schema set on the processor:

Record schema: ${inferred.avro.schema}

If you can make everything strings and convert to other types later, you will be happier.

References:
https://www.linkedin.com/pulse/converting-csv-avro-apache-nifi-jeremy-dyer
https://community.hortonworks.com/questions/44063/nifi-avro-to-csv-or-json-to-csvnifi-convert-avro-t.html
https://community.hortonworks.com/articles/28341/converting-csv-to-avro-with-apache-nifi.html
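For context, here is a minimal, self-contained sketch (not the NiFi processor source itself) of how a plain Avro DataFileWriter produces the avro.codec=snappy metadata shown above when setCodec is called the same way; the schema, record values, and output file name are illustrative assumptions:

```java
// Minimal sketch: writing an Avro data file with an explicit Snappy codec,
// the same call ConvertCSVToAvro makes internally. Schema, values, and the
// output file name here are illustrative assumptions, not the processor's.
import org.apache.avro.Schema;
import org.apache.avro.file.CodecFactory;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

import java.io.File;
import java.io.IOException;

public class SnappyAvroExample {
    public static void main(String[] args) throws IOException {
        // Tiny schema for illustration; ConvertCSVToAvro infers its schema from the CSV.
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"people\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"long\"},"
            + "{\"name\":\"first_name\",\"type\":\"string\"}]}");

        GenericRecord record = new GenericData.Record(schema);
        record.put("id", 2L);
        record.put("first_name", "Gregory");

        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
            // This is the line that puts avro.codec=snappy into the file metadata.
            writer.setCodec(CodecFactory.snappyCodec());
            writer.create(schema, new File("people.avro"));
            writer.append(record);
        }
    }
}
```

Running getmeta against the resulting people.avro should show avro.codec snappy, just like the file written by the processor.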
09-30-2016
10:40 PM
Here's the simple Zeppelin notebook file: twitter-from-strata-hadoop-processing.txt
Rename it as .json. For security reasons, uploading or downloading .js or .json files doesn't work here, hence the .txt extension.
03-02-2017
03:38 PM
Hi, I have 5 separate queues for 5 different processors. Right now I go to each processor and clear each queue individually, which takes a lot of time. Is there any way to clear all the queues at the same time? Please help me with this. Thanks, Ravi
09-28-2016
02:20 AM
@Timothy Spann Yes, that code was written for Storm 0.10. Now I am trying to test it against 1.0.1. I updated the POM with the necessary Storm and Kafka versions and added the Guava dependency as suggested in the link above, but I am still getting build errors.
08-02-2019
02:25 PM
@Riccardo Iacomini Thank you for the great post! This is very helpful. I am wondering how you batch things together, i.e. end up with many CSV rows per FlowFile instead of one. If we want to batch CSV rows together we would normally use the MergeContent processor, but you also mention that MergeContent is costly. So how is batch processing supposed to work in NiFi?
09-26-2016
06:05 PM
6 Kudos
@Arkaprova Saha It depends on how you see yourself and your future. If you consider yourself a software engineer with a solid Java background who wants to deliver highly optimized, scalable software products based on Spark, then you may want to focus more on Scala. If you are more focused on data wrangling, discovery and analysis, short-term focused studies, or resolving business problems as quickly as possible, then Python is awesome. Python has such a large community, with code snippets, applications, etc. Don't get me wrong, Python can also be used to deliver enterprise-level applications, but Java and Scala are used more often when things need to be highly optimized. Python has some drawbacks, which we will not debate here. Anyhow, I would say that Python is kind of a MUST HAVE and Scala is NICE TO HAVE. Obviously, this is my 2c, and I would be amazed if any single response in this thread turned out to be THE answer.
09-26-2016
06:24 PM
Thanks Andy. I clearly understand the concern around security confidence levels, and I'm not putting this out as a solution, rather as a workaround to let the devs move forward. This isn't an official solution by any means, and everyone reading the thread should understand that.
10-11-2016
08:41 PM
TensorFlow 0.11 is out:

export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.11.0rc0-cp27-none-linux_x86_64.whl
09-22-2016
09:45 AM
Hi @Timothy Spann and @Jasper, I found the cause of the issue. I was not putting a colon (:) between the port (2181) and the HBase znode (/hbase-unsecure) in the JDBC URL while loading the table from spark-shell.

Earlier I was loading the table as below, which gave me the "no table found" error:

val jdbcDF = sqlContext.read.format("jdbc").options(Map(
  "driver" -> "org.apache.phoenix.jdbc.PhoenixDriver",
  "url" -> "jdbc:phoenix:<host>:2181/hbase-unsecure",
  "dbtable" -> "TEST_TABLE2")).load()

After adding the colon (:) between the port (2181) and the HBase znode (/hbase-unsecure), I am able to load the table:

val jdbcDF = sqlContext.read.format("jdbc").options(Map(
  "driver" -> "org.apache.phoenix.jdbc.PhoenixDriver",
  "url" -> "jdbc:phoenix:<host>:2181:/hbase-unsecure",
  "dbtable" -> "TEST_TABLE2")).load()
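For anyone hitting the same error outside spark-shell, here is a minimal plain-JDBC sketch of the same Phoenix URL format; the ZooKeeper host, table, and query below are illustrative assumptions, and the key detail is the colon before the znode path:

```java
// Minimal plain-JDBC sketch of the Phoenix URL format
// jdbc:phoenix:<zookeeper-host>:<port>:<znode>.
// Host and table names here are illustrative assumptions.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixUrlExample {
    public static void main(String[] args) throws Exception {
        // Note the colon before the znode (/hbase-unsecure); omitting it
        // leads to the "no table found" error described above.
        String url = "jdbc:phoenix:zk-host.example.com:2181:/hbase-unsecure";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM TEST_TABLE2 LIMIT 10")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```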
02-04-2017
08:06 PM
This solved my issue. In my case, the Ambari database is a PostgreSQL database.