Support Questions
Find answers, ask questions, and share your expertise

A weird problem about HDFS+flume+HIVE

Rising Star

We installed HDP2.5 and using flume to extract data from ORACLE to HDFS, and then using HIVE EXTERNAL table to query. It seems run no problem. But today we encounter a weird problem:

When we query in HIVE table, it only returned 11 rows while there are 11157 rows in the file.

    > select * from humep.BAR_RLTD_INF;
OK
2102350FHT10GA000140,02350FHT,21021310068NG6000452,,02131006,,,2016-10-11 00:00:00.0    NULL    NULL    NULL    NULL    NULL    NULLNULL
2102350FHT10GA000140,02350FHT,21021310068NG6000802,,02131006,,,2016-10-11 00:00:00.0    NULL    NULL    NULL    NULL    NULL    NULLNULL
2102350FHT10GA000140,02350FHT,090405G02N1Y16715A10,,,,,2016-10-11 00:00:00.0    NULL    NULL    NULL    NULL    NULL    NULL    NULL
2102350MXF10GA000008,02350MXF,031UWV10GA000317,,,,,2016-10-11 00:00:00.0        NULL    NULL    NULL    NULL    NULL    NULL    NULL
2102350MXF10GA000010,02350MXF,031UWV10GA000292,,,,,2016-10-11 00:00:00.0        NULL    NULL    NULL    NULL    NULL    NULL    NULL
210305587010GA000007,03055870,NW900CG,,,,,2016-10-10 00:00:00.0 NULL    NULL    NULL    NULL    NULL    NULL    NULL
210305679210GA000012,03056792,031SKB10G9000172,,,,,2016-10-10 00:00:00.0        NULL    NULL    NULL    NULL    NULL    NULL    NULL
210305679210GA000012,03056792,09340607791H16808011,,,,,2016-10-10 00:00:00.0    NULL    NULL    NULL    NULL    NULL    NULL    NULL
2102350FHT10GA000106,02350FHT,203DB23EB5B9,,,,,2016-10-09 00:00:00.0    NULL    NULL    NULL    NULL    NULL    NULL    NULL
2102350FHT10GA000142,02350FHT,2102350HMT10GA000142,,02350HMT,,,2016-10-11 00:00:00.0    NULL    NULL    NULL    NULL    NULL    NULLNULL
2102350FHT10GA000142,02350FHT,031URL10G9000698,,,,,2016-10-11 00:00:00.0        NULL    NULL    NULL    NULL    NULL    NULL    NULL
Time taken: 0.312 seconds, Fetched: 11 row(s)

And I also found the size of the file only 761 bytes, but in local it is 899533 bytes, and real have 11157 lines

[hadoop@insightcluster137 ~]$ hdfs dfs -ls /user/hadoop/BAR_RLTD_INF     
Found 1 items
-rw-r--r--   3 hadoop hdfs        761 2016-10-13 14:05 /user/hadoop/BAR_RLTD_INF/bar_rltd_inf.1476338750863.tmp
[hadoop@insightcluster137 ~]$ hdfs dfs -du  /user/hadoop/BAR_RLTD_INF
761  /user/hadoop/BAR_RLTD_INF/bar_rltd_inf.1476338750863.tmp
[hadoop@insightcluster137 ~]$ hdfs dfs -get /user/hadoop/BAR_RLTD_INF/bar_rltd_inf.1476338750863.tmp bar_rltd_inf.1476338750863.tmp
[hadoop@insightcluster137 ~]$ hdfs dfs -get /user/hadoop/BAR_RLTD_INF/bar_rltd_inf.1476338750863.tmp bar_rltd_inf.1476338750863.tmp^C
[hadoop@insightcluster137 ~]$ ls -l bar_rltd_inf.1476338750863.tmp
-rw-r--r-- 1 hadoop hadoop 899533 Oct 13 14:26 bar_rltd_inf.1476338750863.tmp
[hadoop@insightcluster137 ~]$ wc -l bar_rltd_inf.1476338750863.tmp   
11157 bar_rltd_inf.1476338750863.tmp

Some flume conf are below:

agent.sources.sqlSource.hibernate.connection.provider_class = org.hibernate.connection.C3P0ConnectionProvider
agent.sources.sqlSource.hibernate.c3p0.min_size=1
agent.sources.sqlSource.hibernate.c3p0.max_size=12
agent.sinks.hdfssink.type = hdfs
agent.sinks.hdfssink.channel = ch8
agent.sinks.hdfssink.hdfs.path = hdfs://insightcluster132.huawei.com:8020/user/hadoop/BAR_RLTD_INF
agent.sinks.hdfssink.hdfs.fileType = DataStream
agent.sinks.hdfssink.hdfs.filePrefix = bar_rltd_inf
agent.sinks.hdfssink.hdfs.rollInterval = 0
agent.sinks.hdfssink.hdfs.rollSize = 0
agent.sinks.hdfssink.hdfs.rollCount = 0
agent.sinks.hdfssink.hdfs.threadsPoolSize = 18
agent.sinks.hdfssink.hdfs.batchSize = 10


I think the problem might in HDFS or FLUME, anyone can help?

1 ACCEPTED SOLUTION

Accepted Solutions

Rising Star

We solved it by adding following in flume conf

agent.sinks.hdfssink.hdfs.idleTimeout = 300

View solution in original post

1 REPLY 1

Rising Star

We solved it by adding following in flume conf

agent.sinks.hdfssink.hdfs.idleTimeout = 300

View solution in original post