Explorer
Posts: 78
Registered: ‎11-12-2015

Flume Hive Sink

Hello, I'm trying to use the Hive Sink, but it fails with a parse error:

2016-02-08 11:52:25,165 WARN org.apache.hive.hcatalog.data.JsonSerDe: Error [java.io.IOException: Field name expected] parsing json text [{"client":"cyt","product":"sepsyslog","type":"syslog","host":"datos01","path":"/datos/logs/clientes/cyt/antivirus/syslog/sep_cyt_syslog.log","logsource":"SymantecServer","sepm_name":"192.168.200.130","syslog_severity_code":5,"syslog_facility_code":1,"syslog_severity":"notice","facility_label":"user-level","tags":["scm_system"],"site_name":"sepmanager","sepm_server":"192.168.200.130","domain_sepm":"Default","event_description":"The client has downloaded GUP list","computer":"mjeria","user_name":"mjeria","domain_name":"CYT.CONCHAYTORO.CL","date":"2016-01-11T12:01:28.000Z","version":"1"}].
2016-02-08 11:52:25,165 INFO org.apache.flume.sink.hive.HiveWriter: Parse failed : Unable to convert byte[] record into Object  : {"client":"cyt","product":"sepsyslog","type":"syslog","host":"datos01","path":"/datos/logs/clientes/cyt/antivirus/syslog/sep_cyt_syslog.log","logsource":"SymantecServer","sepm_name":"192.168.200.130","syslog_severity_code":5,"syslog_facility_code":1,"syslog_severity":"notice","facility_label":"user-level","tags":["scm_system"],"site_name":"sepmanager","sepm_server":"192.168.200.130","domain_sepm":"Default","event_description":"The client has downloaded GUP list","computer":"mjeria","user_name":"mjeria","domain_name":"CYT.CONCHAYTORO.CL","date":"2016-01-11T12:01:28.000Z","version":"1"}

I'm using the JSON serializer. This is my sink configuration:

flume1.sinks.hdfs-sink-1.type = hive
flume1.sinks.hdfs-sink-1.hive.metastore = thrift://master2:9083
flume1.sinks.hdfs-sink-1.hive.database = sepsyslog
flume1.sinks.hdfs-sink-1.hive.partition = date
flume1.sinks.hdfs-sink-1.hive.table = sepsyslog_cyt_2016_01
flume1.sinks.hdfs-sink-1.useLocalTimeStamp = false
flume1.sinks.hdfs-sink-1.serializer = JSON

The Hive table always has more fields than the JSON message: the message fields vary from event to event, so the table defines every possible field.

 

Regards, 

Posts: 1,508
Kudos: 260
Solutions: 230
Registered: ‎07-31-2013

Re: Flume Hive Sink

What column datatype did you define for the tags column in the Hive table? Could you share the full DESCRIBE output for sepsyslog_cyt_2016_01?
Backline Customer Operations Engineer
Explorer
Posts: 78
Registered: ‎11-12-2015

Re: Flume Hive Sink

All the columns are defined as strings. I solved the problem by restructuring the JSON logs: I add an empty string for every field that exists in the table but is missing from the log.
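For anyone who wants to try the same workaround, here is a minimal sketch of the padding step in Python. It assumes the padding happens before the events reach Flume. The column list is a hypothetical subset (the full DESCRIBE output was never posted), and pad_record is just an illustrative helper name, not part of Flume or Hive.

```python
import json

# Hypothetical subset of the Hive table's columns; the real table has more.
TABLE_COLUMNS = ["client", "product", "host", "computer", "user_name"]

def pad_record(line, columns, default=""):
    """Return a JSON string with a key for every column in `columns`.

    Keys already present in the event keep their values; missing ones
    are filled with `default` (an empty string, as described above).
    """
    record = json.loads(line)
    for col in columns:
        record.setdefault(col, default)
    return json.dumps(record)

# Shortened example event; "computer" and "user_name" are missing here.
raw = '{"client":"cyt","product":"sepsyslog","host":"datos01"}'
padded = pad_record(raw, TABLE_COLUMNS)
print(padded)
```

Fields that are in the event but not in the column list are left untouched, so this only adds keys and never drops data.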

New Contributor
Posts: 1
Registered: ‎05-17-2016

Re: Flume Hive Sink

Hi everyone,

I have a problem with the Flume Hive Sink on CDH 5.7.1.

 

If I use this sink configuration in Flume:

# Use a Hive Sink
a1.sinks.k1.type = hive
a1.sinks.k1.hive.metastore = thrift://localhost:7432
a1.sinks.k1.hive.database = hive_flume
a1.sinks.k1.hive.table = test
a1.sinks.k1.serializer = DELIMITED
a1.sinks.k1.serializer.delimiter = "\t"

Flume doesn't start and gives me this error:

2016-09-01 12:14:53,254 INFO org.apache.flume.sink.DefaultSinkFactory: Creating instance of sink: k1, type: hive
2016-09-01 12:14:53,270 ERROR org.apache.flume.node.PollingPropertiesFileConfigurationProvider: Failed to start agent because dependencies were not found in classpath. Error follows.
java.lang.NoClassDefFoundError: org/apache/hive/hcatalog/streaming/RecordWriter
       	at org.apache.flume.sink.hive.HiveSink.createSerializer(HiveSink.java:220)
       	at org.apache.flume.sink.hive.HiveSink.configure(HiveSink.java:203)
       	at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
       	at org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:413)
       	at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:98)
       	at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
       	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
       	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
       	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
       	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
       	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
       	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
       	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.hive.hcatalog.streaming.RecordWriter
       	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
       	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
       	at java.security.AccessController.doPrivileged(Native Method)
       	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
       	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
       	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
       	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
       	... 13 more

It seems to be a classpath problem, so I added this setting in Cloudera Manager:

-Djava.library.path=/opt/cloudera/parcels/CDH/lib/hive-hcatalog/share/hcatalog:/opt/cloudera/parcels/CDH/lib/hive/lib

 

but it did not help.

Do you have any ideas for my problem?

 

Thanks!

New Contributor
Posts: 1
Registered: ‎09-09-2016

Re: Flume Hive Sink

Hi,

 

I had the same issue. This may be a bit of a hack, but in CM I added the following lines to the agent's "Agent Environment Advanced Configuration Snippet (Safety Valve)":

 

HCAT_HOME=/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hive-hcatalog
HIVE_HOME=/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hive

 

For upgrade purposes, it might be better to use the version-independent parcel path:

 

HCAT_HOME=/opt/cloudera/parcels/CDH/lib/hive-hcatalog
HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive

 

Hope this helps.
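A side note on the earlier attempt: java.library.path tells the JVM where to look for native libraries, not Java classes, which would explain why that setting made no difference. To confirm that the directories behind HCAT_HOME and HIVE_HOME really contain the missing class, something like the following rough sketch may help. It scans the jars under the given directories for the RecordWriter class from the stack trace. The search directories are the parcel paths quoted in this thread and may differ on your cluster, and jars_containing is just an illustrative helper name.

```python
import zipfile
from pathlib import Path

# Class reported missing in the stack trace, written as a jar entry path.
CLASS_ENTRY = "org/apache/hive/hcatalog/streaming/RecordWriter.class"

def jars_containing(class_entry, directories):
    """Return paths of .jar files under `directories` that contain `class_entry`."""
    hits = []
    for directory in directories:
        root = Path(directory)
        if not root.is_dir():
            continue  # skip directories that do not exist on this host
        for jar in sorted(root.rglob("*.jar")):
            try:
                with zipfile.ZipFile(jar) as zf:
                    if class_entry in zf.namelist():
                        hits.append(str(jar))
            except (zipfile.BadZipFile, OSError):
                continue  # unreadable jar or broken symlink
    return hits

if __name__ == "__main__":
    # Assumed locations, taken from the HCAT_HOME/HIVE_HOME values above.
    search_dirs = [
        "/opt/cloudera/parcels/CDH/lib/hive-hcatalog",
        "/opt/cloudera/parcels/CDH/lib/hive",
    ]
    for hit in jars_containing(CLASS_ENTRY, search_dirs):
        print(hit)
```

If nothing is printed, the class is not in any jar under those directories, and pointing HCAT_HOME there is unlikely to fix the NoClassDefFoundError.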

New Contributor
Posts: 2
Registered: ‎07-20-2017

Re: Flume Hive Sink

Hi, I also want to use the Flume Hive Sink, but I have encountered errors. Could you share your Hive table definition and Flume configuration? Thanks a lot!
