Created 10-13-2016 07:25 AM
We installed HDP2.5 and using flume to extract data from ORACLE to HDFS, and then using HIVE EXTERNAL table to query. It seems run no problem. But today we encounter a weird problem:
When we query in HIVE table, it only returned 11 rows while there are 11157 rows in the file.
> select * from humep.BAR_RLTD_INF; OK 2102350FHT10GA000140,02350FHT,21021310068NG6000452,,02131006,,,2016-10-11 00:00:00.0 NULL NULL NULL NULL NULL NULLNULL 2102350FHT10GA000140,02350FHT,21021310068NG6000802,,02131006,,,2016-10-11 00:00:00.0 NULL NULL NULL NULL NULL NULLNULL 2102350FHT10GA000140,02350FHT,090405G02N1Y16715A10,,,,,2016-10-11 00:00:00.0 NULL NULL NULL NULL NULL NULL NULL 2102350MXF10GA000008,02350MXF,031UWV10GA000317,,,,,2016-10-11 00:00:00.0 NULL NULL NULL NULL NULL NULL NULL 2102350MXF10GA000010,02350MXF,031UWV10GA000292,,,,,2016-10-11 00:00:00.0 NULL NULL NULL NULL NULL NULL NULL 210305587010GA000007,03055870,NW900CG,,,,,2016-10-10 00:00:00.0 NULL NULL NULL NULL NULL NULL NULL 210305679210GA000012,03056792,031SKB10G9000172,,,,,2016-10-10 00:00:00.0 NULL NULL NULL NULL NULL NULL NULL 210305679210GA000012,03056792,09340607791H16808011,,,,,2016-10-10 00:00:00.0 NULL NULL NULL NULL NULL NULL NULL 2102350FHT10GA000106,02350FHT,203DB23EB5B9,,,,,2016-10-09 00:00:00.0 NULL NULL NULL NULL NULL NULL NULL 2102350FHT10GA000142,02350FHT,2102350HMT10GA000142,,02350HMT,,,2016-10-11 00:00:00.0 NULL NULL NULL NULL NULL NULLNULL 2102350FHT10GA000142,02350FHT,031URL10G9000698,,,,,2016-10-11 00:00:00.0 NULL NULL NULL NULL NULL NULL NULL Time taken: 0.312 seconds, Fetched: 11 row(s)
And I also found the size of the file only 761 bytes, but in local it is 899533 bytes, and real have 11157 lines
[hadoop@insightcluster137 ~]$ hdfs dfs -ls /user/hadoop/BAR_RLTD_INF Found 1 items -rw-r--r-- 3 hadoop hdfs 761 2016-10-13 14:05 /user/hadoop/BAR_RLTD_INF/bar_rltd_inf.1476338750863.tmp [hadoop@insightcluster137 ~]$ hdfs dfs -du /user/hadoop/BAR_RLTD_INF 761 /user/hadoop/BAR_RLTD_INF/bar_rltd_inf.1476338750863.tmp [hadoop@insightcluster137 ~]$ hdfs dfs -get /user/hadoop/BAR_RLTD_INF/bar_rltd_inf.1476338750863.tmp bar_rltd_inf.1476338750863.tmp [hadoop@insightcluster137 ~]$ hdfs dfs -get /user/hadoop/BAR_RLTD_INF/bar_rltd_inf.1476338750863.tmp bar_rltd_inf.1476338750863.tmp^C [hadoop@insightcluster137 ~]$ ls -l bar_rltd_inf.1476338750863.tmp -rw-r--r-- 1 hadoop hadoop 899533 Oct 13 14:26 bar_rltd_inf.1476338750863.tmp [hadoop@insightcluster137 ~]$ wc -l bar_rltd_inf.1476338750863.tmp 11157 bar_rltd_inf.1476338750863.tmp
Some flume conf are below:
agent.sources.sqlSource.hibernate.connection.provider_class = org.hibernate.connection.C3P0ConnectionProvider agent.sources.sqlSource.hibernate.c3p0.min_size=1 agent.sources.sqlSource.hibernate.c3p0.max_size=12 agent.sinks.hdfssink.type = hdfs agent.sinks.hdfssink.channel = ch8 agent.sinks.hdfssink.hdfs.path = hdfs://insightcluster132.huawei.com:8020/user/hadoop/BAR_RLTD_INF agent.sinks.hdfssink.hdfs.fileType = DataStream agent.sinks.hdfssink.hdfs.filePrefix = bar_rltd_inf agent.sinks.hdfssink.hdfs.rollInterval = 0 agent.sinks.hdfssink.hdfs.rollSize = 0 agent.sinks.hdfssink.hdfs.rollCount = 0 agent.sinks.hdfssink.hdfs.threadsPoolSize = 18 agent.sinks.hdfssink.hdfs.batchSize = 10
I think the problem might in HDFS or FLUME, anyone can help?
Created 10-13-2016 01:55 PM
We solved it by adding following in flume conf
agent.sinks.hdfssink.hdfs.idleTimeout = 300
Created 10-13-2016 01:55 PM
We solved it by adding following in flume conf
agent.sinks.hdfssink.hdfs.idleTimeout = 300