Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

A weird problem about HDFS+flume+HIVE

avatar
Expert Contributor

We installed HDP2.5 and using flume to extract data from ORACLE to HDFS, and then using HIVE EXTERNAL table to query. It seems run no problem. But today we encounter a weird problem:

When we query in HIVE table, it only returned 11 rows while there are 11157 rows in the file.

    > select * from humep.BAR_RLTD_INF;
OK
2102350FHT10GA000140,02350FHT,21021310068NG6000452,,02131006,,,2016-10-11 00:00:00.0    NULL    NULL    NULL    NULL    NULL    NULLNULL
2102350FHT10GA000140,02350FHT,21021310068NG6000802,,02131006,,,2016-10-11 00:00:00.0    NULL    NULL    NULL    NULL    NULL    NULLNULL
2102350FHT10GA000140,02350FHT,090405G02N1Y16715A10,,,,,2016-10-11 00:00:00.0    NULL    NULL    NULL    NULL    NULL    NULL    NULL
2102350MXF10GA000008,02350MXF,031UWV10GA000317,,,,,2016-10-11 00:00:00.0        NULL    NULL    NULL    NULL    NULL    NULL    NULL
2102350MXF10GA000010,02350MXF,031UWV10GA000292,,,,,2016-10-11 00:00:00.0        NULL    NULL    NULL    NULL    NULL    NULL    NULL
210305587010GA000007,03055870,NW900CG,,,,,2016-10-10 00:00:00.0 NULL    NULL    NULL    NULL    NULL    NULL    NULL
210305679210GA000012,03056792,031SKB10G9000172,,,,,2016-10-10 00:00:00.0        NULL    NULL    NULL    NULL    NULL    NULL    NULL
210305679210GA000012,03056792,09340607791H16808011,,,,,2016-10-10 00:00:00.0    NULL    NULL    NULL    NULL    NULL    NULL    NULL
2102350FHT10GA000106,02350FHT,203DB23EB5B9,,,,,2016-10-09 00:00:00.0    NULL    NULL    NULL    NULL    NULL    NULL    NULL
2102350FHT10GA000142,02350FHT,2102350HMT10GA000142,,02350HMT,,,2016-10-11 00:00:00.0    NULL    NULL    NULL    NULL    NULL    NULLNULL
2102350FHT10GA000142,02350FHT,031URL10G9000698,,,,,2016-10-11 00:00:00.0        NULL    NULL    NULL    NULL    NULL    NULL    NULL
Time taken: 0.312 seconds, Fetched: 11 row(s)

And I also found the size of the file only 761 bytes, but in local it is 899533 bytes, and real have 11157 lines

[hadoop@insightcluster137 ~]$ hdfs dfs -ls /user/hadoop/BAR_RLTD_INF     
Found 1 items
-rw-r--r--   3 hadoop hdfs        761 2016-10-13 14:05 /user/hadoop/BAR_RLTD_INF/bar_rltd_inf.1476338750863.tmp
[hadoop@insightcluster137 ~]$ hdfs dfs -du  /user/hadoop/BAR_RLTD_INF
761  /user/hadoop/BAR_RLTD_INF/bar_rltd_inf.1476338750863.tmp
[hadoop@insightcluster137 ~]$ hdfs dfs -get /user/hadoop/BAR_RLTD_INF/bar_rltd_inf.1476338750863.tmp bar_rltd_inf.1476338750863.tmp
[hadoop@insightcluster137 ~]$ hdfs dfs -get /user/hadoop/BAR_RLTD_INF/bar_rltd_inf.1476338750863.tmp bar_rltd_inf.1476338750863.tmp^C
[hadoop@insightcluster137 ~]$ ls -l bar_rltd_inf.1476338750863.tmp
-rw-r--r-- 1 hadoop hadoop 899533 Oct 13 14:26 bar_rltd_inf.1476338750863.tmp
[hadoop@insightcluster137 ~]$ wc -l bar_rltd_inf.1476338750863.tmp   
11157 bar_rltd_inf.1476338750863.tmp

Some flume conf are below:

agent.sources.sqlSource.hibernate.connection.provider_class = org.hibernate.connection.C3P0ConnectionProvider
agent.sources.sqlSource.hibernate.c3p0.min_size=1
agent.sources.sqlSource.hibernate.c3p0.max_size=12
agent.sinks.hdfssink.type = hdfs
agent.sinks.hdfssink.channel = ch8
agent.sinks.hdfssink.hdfs.path = hdfs://insightcluster132.huawei.com:8020/user/hadoop/BAR_RLTD_INF
agent.sinks.hdfssink.hdfs.fileType = DataStream
agent.sinks.hdfssink.hdfs.filePrefix = bar_rltd_inf
agent.sinks.hdfssink.hdfs.rollInterval = 0
agent.sinks.hdfssink.hdfs.rollSize = 0
agent.sinks.hdfssink.hdfs.rollCount = 0
agent.sinks.hdfssink.hdfs.threadsPoolSize = 18
agent.sinks.hdfssink.hdfs.batchSize = 10


I think the problem might in HDFS or FLUME, anyone can help?

1 ACCEPTED SOLUTION

avatar
Expert Contributor

We solved it by adding following in flume conf

agent.sinks.hdfssink.hdfs.idleTimeout = 300

View solution in original post

1 REPLY 1

avatar
Expert Contributor

We solved it by adding following in flume conf

agent.sinks.hdfssink.hdfs.idleTimeout = 300