How can you use Sequence Files that Flume Wrote to HDFS?

I have recently become familiar with sequence files and have written code to create some. Typically, a sequence file implies that there is a Java class, implementing the Writable interface, that is used to parse/serialize each record. I noticed that Flume's HDFS sink writes records to HDFS in SequenceFile format by default.

 

My question is this: how can we use those sequence files? We don't have a Java class to unmarshal the records in the sequence file. What Java class does Flume use to marshal its records into a sequence file? Is it simply the Text class? If so, what are the key and value?


Re: How can you use Sequence Files that Flume Wrote to HDFS?

Master Guru
Flume uses the LongWritable and BytesWritable classes to serialise the SequenceFile data when the HDFS sink's format type is SequenceFile. This can be seen at
https://github.com/cloudera/flume-ng/blob/cdh4.5.0-release/flume-ng-sinks/flume-hdfs-sink/src/main/j...
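Given those key/value classes, a minimal Java sketch for reading such a file with the Hadoop client API might look like the one below. This is only a sketch: the input path is supplied by the caller, the Hadoop client libraries are assumed to be on the classpath, and the key is assumed to be the event timestamp that Flume's Writable serializer writes.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;

public class FlumeSeqFileReader {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Path to a Flume-written sequence file, passed on the command line
        Path path = new Path(args[0]);
        try (SequenceFile.Reader reader = new SequenceFile.Reader(
                conf, SequenceFile.Reader.file(path))) {
            LongWritable key = new LongWritable();     // event timestamp
            BytesWritable value = new BytesWritable(); // raw event body
            while (reader.next(key, value)) {
                // Event bodies are raw bytes; here we assume they are UTF-8 text
                String body = new String(value.copyBytes(), "UTF-8");
                System.out.println(key.get() + "\t" + body);
            }
        }
    }
}
```

Because the values are plain BytesWritable, interpreting the body (UTF-8 text here) is up to the reader.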


Re: How can you use Sequence Files that Flume Wrote to HDFS?

Explorer

Hi Harsh,

How can I read this using Hive? I get the error "org.apache.hadoop.io.BytesWritable cannot be cast to org.apache.hadoop.io.Text".

Hive assumes the values are Text rather than BytesWritable. Do you have a SerDe for this?


@Harsh J wrote:
Flume uses the LongWritable and BytesWritable classes to serialise the SequenceFile data when the HDFS sink's format type is SequenceFile. This can be seen at
https://github.com/cloudera/flume-ng/blob/cdh4.5.0-release/flume-ng-sinks/flume-hdfs-sink/src/main/j...


 


Re: How can you use Sequence Files that Flume Wrote to HDFS?

Explorer

Hi Michael,

How can I read this using Hive? I get the error "org.apache.hadoop.io.BytesWritable cannot be cast to org.apache.hadoop.io.Text".

Hive assumes the values are Text rather than BytesWritable. Do you have a SerDe for this?


Re: How can you use Sequence Files that Flume Wrote to HDFS?

Super Collaborator
Can you please provide the Flume configuration you are using? That may shed some light on whether an incorrect configuration is causing the issue.

-pd

Re: How can you use Sequence Files that Flume Wrote to HDFS?

Explorer

# Define an SQS source called sqs-source-mobile-createcache on mccagent.
# Connect it to channel ch-mobile-createcache.
mccagent.sources.sqs-source-mobile-createcache.channels = ch-mobile-createcache
mccagent.sources.sqs-source-mobile-createcache.type = com.plumbee.flume.source.sqs.SQSSource
mccagent.sources.sqs-source-mobile-createcache.url = http://10.5.194.79:9876/SQSActiveMQ/queue/mobile-createcache-hdfs
mccagent.sources.sqs-source-mobile-createcache.recvBatchSize = 1
mccagent.sources.sqs-source-mobile-createcache.deleteBatchSize = 1
mccagent.sources.sqs-source-mobile-createcache.nbThreads = 2
mccagent.sources.sqs-source-mobile-createcache.awsAccessKeyId = x
mccagent.sources.sqs-source-mobile-createcache.awsSecretKey = x
mccagent.sources.sqs-source-mobile-createcache.batchSize = 1
mccagent.sources.sqs-source-mobile-createcache.flushInterval = 100
mccagent.sources.sqs-source-mobile-createcache.maxBackOffSleep = 99

mccagent.sources.sqs-source-mobile-createcache.interceptors = addHost addTimestamp
mccagent.sources.sqs-source-mobile-createcache.interceptors.addHost.type = org.apache.flume.interceptor.HostInterceptor$Builder
mccagent.sources.sqs-source-mobile-createcache.interceptors.addHost.preserveExisting = false
mccagent.sources.sqs-source-mobile-createcache.interceptors.addHost.useIP = false
mccagent.sources.sqs-source-mobile-createcache.interceptors.addHost.hostHeader = host
mccagent.sources.sqs-source-mobile-createcache.interceptors.addTimestamp.type = org.apache.flume.interceptor.TimestampInterceptor$Builder


# Define a file channel called ch-mobile-createcache on mccagent.

mccagent.channels.ch-mobile-createcache.type = file
mccagent.channels.ch-mobile-createcache.dataDirs = /home/leaprun/flume_channel/mobile-createcache/data
mccagent.channels.ch-mobile-createcache.checkpointDir = /home/leaprun/flume_channel/mobile-createcache/checkpoint


# Define an HDFS sink that writes all events it receives to HDFS
# and connect it to the other end of the same channel.

mccagent.sinks.sink-mobile-createcache1.channel = ch-mobile-createcache
mccagent.sinks.sink-mobile-createcache1.type = hdfs
mccagent.sinks.sink-mobile-createcache1.hdfs.path = hdfs://namenode-ha/mobility/network/leap/edwdev_base/mobile-createcache/gen_date=%Y-%m-%d/
mccagent.sinks.sink-mobile-createcache1.hdfs.filePrefix = dart-records-mobile-createcache
# Number of seconds to wait before rolling current file (0 = never roll based on time interval)
mccagent.sinks.sink-mobile-createcache1.hdfs.rollInterval = 0
# File size to trigger roll, in bytes (0 = never roll based on file size)
mccagent.sinks.sink-mobile-createcache1.hdfs.rollSize = 134217728
mccagent.sinks.sink-mobile-createcache1.hdfs.batchSize = 1
# Number of events written to file before it is rolled (0 = never roll based on number of events)
mccagent.sinks.sink-mobile-createcache1.hdfs.rollCount = 0
# Timeout in seconds after which inactive files get closed (0 = disable automatic closing of idle files)
mccagent.sinks.sink-mobile-createcache1.hdfs.idleTimeout = 120
mccagent.sinks.sink-mobile-createcache1.hdfs.fileType = SequenceFile
#mccagent.sinks.sink-mobile-createcache1.hdfs.writeFormat=Text
mccagent.sinks.sink-mobile-createcache1.hdfs.codeC = bzip2
#mccagent.sinks.sink-mobile-createcache1.hdfs.callTimeout = 1200000

# Finally, now that we've defined all of our components, tell
# agent which ones we want to activate.

mccagent.sources = sqs-source-mobile-createcache
mccagent.sinks=sink-mobile-createcache1
mccagent.channels = ch-mobile-createcache

 

Please note that I commented out writeFormat = Text deliberately; I want it that way. You can drop me a mail at my personal id m.aikansh@gmail.com and I can share the output file with you as well.


Re: How can you use Sequence Files that Flume Wrote to HDFS?

Super Collaborator

Can you please provide your Hive table definition? Are you using STORED AS SEQUENCEFILE in the creation?

 

-pd
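For reference, a hedged sketch of what such a table declaration would typically look like over Flume's sequence-file output (the table name, column, and location are made up for illustration):

```sql
-- Hypothetical external table over the Flume output directory.
-- STORED AS SEQUENCEFILE makes Hive read the files with
-- SequenceFileInputFormat, but Hive's default LazySimpleSerDe still
-- expects Text values, so the BytesWritable values that Flume's
-- default Writable format produces trigger the cast error reported
-- in this thread.
CREATE EXTERNAL TABLE mobile_createcache_raw (body STRING)
PARTITIONED BY (gen_date STRING)
STORED AS SEQUENCEFILE
LOCATION '/mobility/network/leap/edwdev_base/mobile-createcache/';
```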


Re: How can you use Sequence Files that Flume Wrote to HDFS?

Explorer

 

I created my own SerDe to read it.
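For readers in the same situation, a custom SerDe along these lines can expose the BytesWritable event bodies as a single string column. This is only a sketch against a Hive 1.x-era API, not the poster's actual code; the class name, package, and column name are hypothetical, and error handling is omitted.

```java
package com.example.hive; // hypothetical package

import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.serde2.AbstractSerDe;
import org.apache.hadoop.hive.serde2.SerDeException;
import org.apache.hadoop.hive.serde2.SerDeStats;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Writable;

public class BytesWritableSerDe extends AbstractSerDe {
    private ObjectInspector inspector;

    @Override
    public void initialize(Configuration conf, Properties tbl) {
        // Expose one string column, "body", holding the decoded event body
        inspector = ObjectInspectorFactory.getStandardStructObjectInspector(
            Arrays.asList("body"),
            Arrays.<ObjectInspector>asList(
                PrimitiveObjectInspectorFactory.javaStringObjectInspector));
    }

    @Override
    public Object deserialize(Writable blob) throws SerDeException {
        // Flume's Writable format yields BytesWritable values;
        // decode the raw bytes as UTF-8 text (an assumption)
        BytesWritable bw = (BytesWritable) blob;
        String body = new String(bw.copyBytes(), StandardCharsets.UTF_8);
        return Arrays.asList((Object) body);
    }

    @Override
    public ObjectInspector getObjectInspector() { return inspector; }

    @Override
    public SerDeStats getSerDeStats() { return null; }

    @Override
    public Class<? extends Writable> getSerializedClass() { return BytesWritable.class; }

    @Override
    public Writable serialize(Object obj, ObjectInspector oi) throws SerDeException {
        throw new SerDeException("Write path not supported");
    }
}
```

The table would then be declared with ROW FORMAT SERDE 'com.example.hive.BytesWritableSerDe' in addition to STORED AS SEQUENCEFILE.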
