Member since: 07-08-2013
Posts: 26
Kudos Received: 8
Solutions: 3
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 14792 | 01-09-2015 07:09 AM |
| | 10063 | 12-29-2014 09:57 AM |
| | 3617 | 12-23-2014 05:55 PM |
01-15-2015 01:24 PM
I'm not aware of an option to get it added to the classpath first. In the past, when I've had to deploy a patched jar to a core component, I've replaced the jar file in the lib directory.
01-09-2015 07:28 AM
Keep in mind that with the MemoryChannel you lose any records in the channel if Flume crashes or the system reboots.
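If durability matters, the file channel is the usual alternative to the memory channel. Here's a minimal sketch of that swap, assuming an agent named a1 and illustrative directory paths (the paths are my assumptions, not from the original post):

# File channel: persists events to disk so they survive an agent crash or reboot.
a1.channels = ch-1
a1.channels.ch-1.type = file
# Illustrative paths; point these at directories the Flume user can write to.
a1.channels.ch-1.checkpointDir = /var/lib/flume/checkpoint
a1.channels.ch-1.dataDirs = /var/lib/flume/data

The trade-off is lower throughput than the memory channel, but events already in the channel are not lost on restart.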
01-09-2015 07:09 AM
You probably need to adjust the maxFileSize and minimumRequiredSpace settings on the file channel [1]. FWIW, transferring large files with Flume is an anti-pattern: Flume is designed for event/log transport, not large file transport. You might want to check out a new Apache project called Apache NiFi [2] that is better suited to large file transfer. There's a quick how-to blog post available here to get you started: http://ingest.tips/2014/12/22/getting-started-with-apache-nifi/

-Joey

[1] http://flume.apache.org/FlumeUserGuide.html#file-channel
[2] http://nifi.incubator.apache.org
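For reference, a hedged sketch of the two file channel settings mentioned above, on a hypothetical channel ch-1 of agent a1 (the values shown are just examples, not tuning advice from the original post):

# Hypothetical file channel on agent a1.
a1.channels = ch-1
a1.channels.ch-1.type = file
# Maximum size of a single channel data file on disk, in bytes (example value).
a1.channels.ch-1.maxFileSize = 2146435071
# Refuse new events when free disk space falls below this many bytes (example value).
a1.channels.ch-1.minimumRequiredSpace = 524288000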
12-29-2014 09:57 AM
1 Kudo
If you want each file to remain whole, you can use the BlobDeserializer [1] for the deserializer parameter of the SpoolingDirectorySource [2]:

a1.channels = c1
a1.sources = src-1
a1.sources.src-1.type = spooldir
a1.sources.src-1.channels = c1
a1.sources.src-1.spoolDir = /var/log/apache/flumeSpool
a1.sources.src-1.fileHeader = true
a1.sources.src-1.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder

If you need to, set deserializer.maxBlobLength to the maximum file size you'll be picking up; the default is 100 million bytes. This won't work for very large files, as the entire file contents get buffered into RAM. The file channel is the best option for reliable data flow.

If you want the output file to have the same name as the input file, you can set the basenameHeader parameter to true. This will set a header in the Flume event called basename. You can customize the name of the header by setting basenameHeaderKey. Then, in your sink configuration, you can refer to the header value in the filePrefix with something like this:

a1.channels = c1
a1.sinks = k1
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/events/
a1.sinks.k1.hdfs.filePrefix = %{basename}-
a1.sinks.k1.hdfs.fileType = DataStream

HTH,
-Joey

[1] http://flume.apache.org/FlumeUserGuide.html#blobdeserializer
[2] http://flume.apache.org/FlumeUserGuide.html#spooling-directory-source
12-23-2014 05:55 PM
1 Kudo
When you tell Kite to delete the dataset, it uses the Hive API to drop the table, and Hive should take care of dropping the table's data. Can you check the log of your Hive Metastore server to see if there was an error on that side? To get past the error, you can remove the directory by hand.