Explorer
Posts: 16
Registered: ‎09-25-2014

Flume HDFS sink error: "unknown protocol: hdfs"

Hi, I am using Flume with an HDFS sink, but I am getting this error: "unknown protocol: hdfs".

4:24:09.423 PM    INFO    org.apache.flume.sink.hdfs.BucketWriter    Creating /user/888/datasets/actions_timeseries/year=2014/month=05/day=05/hour=15/minute=00/FlumeData.1411662249337.tmp
4:24:10.282 PM    WARN    org.apache.hadoop.security.UserGroupInformation    PriviledgedActionException as:hdfs (auth:PROXY) via flume (auth:SIMPLE) cause:java.net.MalformedURLException: unknown protocol: hdfs
4:24:10.282 PM    WARN    org.apache.flume.sink.hdfs.BucketWriter    Caught IOException writing to HDFSWriter (unknown protocol: hdfs). Closing file (/user/888/datasets/actions_timeseries/year=2014/month=05/day=05/hour=15/minute=00/FlumeData.1411662249337.tmp) and rethrowing exception.

It does create a file in HDFS but no data is written to it.

I am using CDH-5.0.2-1.cdh5.0.2.p0.13, which ships Flume 1.4.0-cdh5.0.2.

This used to work on my CDH4 cluster.

Any help would be great,

Thanks,

James

My full agent config is:

streamingingest.channels = mem-channel
streamingingest.sources = listener
streamingingest.sinks = user-dataset

streamingingest.channels.mem-channel.type = memory
streamingingest.channels.mem-channel.capacity = 10000000
streamingingest.channels.mem-channel.transactionCapacity = 1000

streamingingest.sources.listener.type = avro
streamingingest.sources.listener.channels = mem-channel
streamingingest.sources.listener.bind = 0.0.0.0
streamingingest.sources.listener.port = 41415

# attach the schema to the record, convert it to avro
streamingingest.sources.listener.interceptors = attach-schema morphline

# add the schema for our record sink
streamingingest.sources.listener.interceptors.attach-schema.type = static
streamingingest.sources.listener.interceptors.attach-schema.key = flume.avro.schema.url
streamingingest.sources.listener.interceptors.attach-schema.value = hdfs:/user/888/datasets/actions_timeseries/.metadata/schema.avsc

# morphline interceptor config
streamingingest.sources.listener.interceptors.morphline.type = org.apache.flume.sink.solr.morphline.MorphlineInterceptor$Builder
streamingingest.sources.listener.interceptors.morphline.morphlineFile = /etc/flume-ng/conf/user/888/morphline.conf
streamingingest.sources.listener.interceptors.morphline.morphlineId = SomeCommand

# store the something in the Dataset
streamingingest.sinks.user-dataset.type = hdfs
streamingingest.sinks.user-dataset.channel = mem-channel
# the partitioned directories must match the dataset's partition strategy
streamingingest.sinks.user-dataset.hdfs.path = /user/888/datasets/actions_timeseries/year=%{kite.partition.year}/month=%{kite.partition.month}/day=%{kite.partition.day}/hour=%{kite.partition.hour}/minute=%{kite.partition.minute}

streamingingest.sinks.user-dataset.hdfs.batchSize = 500
streamingingest.sinks.user-dataset.hdfs.fileType = DataStream
streamingingest.sinks.user-dataset.hdfs.proxyUser = hdfs
streamingingest.sinks.user-dataset.serializer = org.apache.flume.sink.hdfs.AvroEventSerializer$Builder
streamingingest.sinks.user-dataset.serializer.compressionCodec = snappy


Re: Flume HDFS sink error: "unknown protocol: hdfs"

It was this bit:

streamingingest.sources.listener.interceptors.attach-schema.value = hdfs:/user/888/datasets/actions_timeseries/.metadata/schema.avsc

I changed it to file:/etc/flume-ng/conf/user/888/streaming/schema.avsc and it works again. Something must have changed since CDH4, where the hdfs: URL worked.

Does anyone know whether these files can be loaded from HDFS, instead of having to copy them onto each agent's local disk?

Cheers

James
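
For anyone hitting the same message: the java.net.MalformedURLException in the log suggests the schema URL is being handed to a plain java.net.URL, which only knows schemes that have a registered handler. The sketch below is a minimal illustration under that assumption. The class name SchemaUrlCheck and the method parseError are hypothetical, and no Flume or Hadoop classes are involved; it only shows how a stock JVM treats the two URL strings from this thread.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class SchemaUrlCheck {

    // Returns null when java.net.URL accepts the string,
    // otherwise the message of the MalformedURLException it throws.
    public static String parseError(String spec) {
        try {
            new URL(spec);
            return null;
        } catch (MalformedURLException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // The two values tried for flume.avro.schema.url in this thread.
        String[] candidates = {
            "hdfs:/user/888/datasets/actions_timeseries/.metadata/schema.avsc",
            "file:/etc/flume-ng/conf/user/888/streaming/schema.avsc"
        };
        for (String spec : candidates) {
            String err = parseError(spec);
            // Only parsing is checked here; the file itself is never opened.
            System.out.println(spec + " -> " + (err == null ? "OK" : err));
        }
    }
}
```

On a JVM with no extra protocol handlers, the hdfs: string is rejected with "unknown protocol: hdfs" while the file: string parses. Hadoop does provide org.apache.hadoop.fs.FsUrlStreamHandlerFactory, which can be registered through URL.setURLStreamHandlerFactory to teach java.net.URL the hdfs scheme, but nothing in this thread establishes whether the Flume agent does so.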
