<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question A weird problem about HDFS+flume+HIVE in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/A-weird-problem-about-HDFS-flume-HIVE/m-p/128316#M43400</link>
    <description>&lt;P&gt;We installed HDP 2.5 and use Flume to extract data from Oracle into HDFS, then query it through a Hive EXTERNAL table. This had been running without problems, but today we hit a weird problem:&lt;/P&gt;&lt;P&gt;When we query the Hive table, it returns only 11 rows, while the file contains 11157 rows.&lt;/P&gt;&lt;PRE&gt;    &amp;gt; select * from humep.BAR_RLTD_INF;
OK
2102350FHT10GA000140,02350FHT,21021310068NG6000452,,02131006,,,2016-10-11 00:00:00.0    NULL    NULL    NULL    NULL    NULL    NULLNULL
2102350FHT10GA000140,02350FHT,21021310068NG6000802,,02131006,,,2016-10-11 00:00:00.0    NULL    NULL    NULL    NULL    NULL    NULLNULL
2102350FHT10GA000140,02350FHT,090405G02N1Y16715A10,,,,,2016-10-11 00:00:00.0    NULL    NULL    NULL    NULL    NULL    NULL    NULL
2102350MXF10GA000008,02350MXF,031UWV10GA000317,,,,,2016-10-11 00:00:00.0        NULL    NULL    NULL    NULL    NULL    NULL    NULL
2102350MXF10GA000010,02350MXF,031UWV10GA000292,,,,,2016-10-11 00:00:00.0        NULL    NULL    NULL    NULL    NULL    NULL    NULL
210305587010GA000007,03055870,NW900CG,,,,,2016-10-10 00:00:00.0 NULL    NULL    NULL    NULL    NULL    NULL    NULL
210305679210GA000012,03056792,031SKB10G9000172,,,,,2016-10-10 00:00:00.0        NULL    NULL    NULL    NULL    NULL    NULL    NULL
210305679210GA000012,03056792,09340607791H16808011,,,,,2016-10-10 00:00:00.0    NULL    NULL    NULL    NULL    NULL    NULL    NULL
2102350FHT10GA000106,02350FHT,203DB23EB5B9,,,,,2016-10-09 00:00:00.0    NULL    NULL    NULL    NULL    NULL    NULL    NULL
2102350FHT10GA000142,02350FHT,2102350HMT10GA000142,,02350HMT,,,2016-10-11 00:00:00.0    NULL    NULL    NULL    NULL    NULL    NULLNULL
2102350FHT10GA000142,02350FHT,031URL10G9000698,,,,,2016-10-11 00:00:00.0        NULL    NULL    NULL    NULL    NULL    NULL    NULL
Time taken: 0.312 seconds, Fetched: 11 row(s)&lt;/PRE&gt;&lt;P&gt;I also found that HDFS reports the file size as only 761 bytes, but after copying the file to local disk it is 899533 bytes and really does have 11157 lines:&lt;/P&gt;&lt;PRE&gt;[hadoop@insightcluster137 ~]$ hdfs dfs -ls /user/hadoop/BAR_RLTD_INF
Found 1 items
-rw-r--r--   3 hadoop hdfs        761 2016-10-13 14:05 /user/hadoop/BAR_RLTD_INF/bar_rltd_inf.1476338750863.tmp
[hadoop@insightcluster137 ~]$ hdfs dfs -du  /user/hadoop/BAR_RLTD_INF
761  /user/hadoop/BAR_RLTD_INF/bar_rltd_inf.1476338750863.tmp
[hadoop@insightcluster137 ~]$ hdfs dfs -get /user/hadoop/BAR_RLTD_INF/bar_rltd_inf.1476338750863.tmp bar_rltd_inf.1476338750863.tmp
[hadoop@insightcluster137 ~]$ hdfs dfs -get /user/hadoop/BAR_RLTD_INF/bar_rltd_inf.1476338750863.tmp bar_rltd_inf.1476338750863.tmp^C
[hadoop@insightcluster137 ~]$ ls -l bar_rltd_inf.1476338750863.tmp
-rw-r--r-- 1 hadoop hadoop 899533 Oct 13 14:26 bar_rltd_inf.1476338750863.tmp
[hadoop@insightcluster137 ~]$ wc -l bar_rltd_inf.1476338750863.tmp   
11157 bar_rltd_inf.1476338750863.tmp&lt;/PRE&gt;&lt;P&gt;Part of the Flume configuration is below:&lt;/P&gt;&lt;PRE&gt;agent.sources.sqlSource.hibernate.connection.provider_class = org.hibernate.connection.C3P0ConnectionProvider
agent.sources.sqlSource.hibernate.c3p0.min_size=1
agent.sources.sqlSource.hibernate.c3p0.max_size=12
agent.sinks.hdfssink.type = hdfs
agent.sinks.hdfssink.channel = ch8
agent.sinks.hdfssink.hdfs.path = hdfs://insightcluster132.huawei.com:8020/user/hadoop/BAR_RLTD_INF
agent.sinks.hdfssink.hdfs.fileType = DataStream
agent.sinks.hdfssink.hdfs.filePrefix = bar_rltd_inf
agent.sinks.hdfssink.hdfs.rollInterval = 0
agent.sinks.hdfssink.hdfs.rollSize = 0
agent.sinks.hdfssink.hdfs.rollCount = 0
agent.sinks.hdfssink.hdfs.threadsPoolSize = 18
agent.sinks.hdfssink.hdfs.batchSize = 10


&lt;/PRE&gt;&lt;P&gt;I think the problem might be in HDFS or Flume. Can anyone help?&lt;/P&gt;</description>
    <pubDate>Thu, 13 Oct 2016 14:25:53 GMT</pubDate>
    <dc:creator>diablo2</dc:creator>
    <dc:date>2016-10-13T14:25:53Z</dc:date>
    <item>
      <title>A weird problem about HDFS+flume+HIVE</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/A-weird-problem-about-HDFS-flume-HIVE/m-p/128316#M43400</link>
      <description>&lt;P&gt;We installed HDP 2.5 and use Flume to extract data from Oracle into HDFS, then query it through a Hive EXTERNAL table. This had been running without problems, but today we hit a weird problem:&lt;/P&gt;&lt;P&gt;When we query the Hive table, it returns only 11 rows, while the file contains 11157 rows.&lt;/P&gt;&lt;PRE&gt;    &amp;gt; select * from humep.BAR_RLTD_INF;
OK
2102350FHT10GA000140,02350FHT,21021310068NG6000452,,02131006,,,2016-10-11 00:00:00.0    NULL    NULL    NULL    NULL    NULL    NULLNULL
2102350FHT10GA000140,02350FHT,21021310068NG6000802,,02131006,,,2016-10-11 00:00:00.0    NULL    NULL    NULL    NULL    NULL    NULLNULL
2102350FHT10GA000140,02350FHT,090405G02N1Y16715A10,,,,,2016-10-11 00:00:00.0    NULL    NULL    NULL    NULL    NULL    NULL    NULL
2102350MXF10GA000008,02350MXF,031UWV10GA000317,,,,,2016-10-11 00:00:00.0        NULL    NULL    NULL    NULL    NULL    NULL    NULL
2102350MXF10GA000010,02350MXF,031UWV10GA000292,,,,,2016-10-11 00:00:00.0        NULL    NULL    NULL    NULL    NULL    NULL    NULL
210305587010GA000007,03055870,NW900CG,,,,,2016-10-10 00:00:00.0 NULL    NULL    NULL    NULL    NULL    NULL    NULL
210305679210GA000012,03056792,031SKB10G9000172,,,,,2016-10-10 00:00:00.0        NULL    NULL    NULL    NULL    NULL    NULL    NULL
210305679210GA000012,03056792,09340607791H16808011,,,,,2016-10-10 00:00:00.0    NULL    NULL    NULL    NULL    NULL    NULL    NULL
2102350FHT10GA000106,02350FHT,203DB23EB5B9,,,,,2016-10-09 00:00:00.0    NULL    NULL    NULL    NULL    NULL    NULL    NULL
2102350FHT10GA000142,02350FHT,2102350HMT10GA000142,,02350HMT,,,2016-10-11 00:00:00.0    NULL    NULL    NULL    NULL    NULL    NULLNULL
2102350FHT10GA000142,02350FHT,031URL10G9000698,,,,,2016-10-11 00:00:00.0        NULL    NULL    NULL    NULL    NULL    NULL    NULL
Time taken: 0.312 seconds, Fetched: 11 row(s)&lt;/PRE&gt;&lt;P&gt;I also found that HDFS reports the file size as only 761 bytes, but after copying the file to local disk it is 899533 bytes and really does have 11157 lines:&lt;/P&gt;&lt;PRE&gt;[hadoop@insightcluster137 ~]$ hdfs dfs -ls /user/hadoop/BAR_RLTD_INF
Found 1 items
-rw-r--r--   3 hadoop hdfs        761 2016-10-13 14:05 /user/hadoop/BAR_RLTD_INF/bar_rltd_inf.1476338750863.tmp
[hadoop@insightcluster137 ~]$ hdfs dfs -du  /user/hadoop/BAR_RLTD_INF
761  /user/hadoop/BAR_RLTD_INF/bar_rltd_inf.1476338750863.tmp
[hadoop@insightcluster137 ~]$ hdfs dfs -get /user/hadoop/BAR_RLTD_INF/bar_rltd_inf.1476338750863.tmp bar_rltd_inf.1476338750863.tmp
[hadoop@insightcluster137 ~]$ hdfs dfs -get /user/hadoop/BAR_RLTD_INF/bar_rltd_inf.1476338750863.tmp bar_rltd_inf.1476338750863.tmp^C
[hadoop@insightcluster137 ~]$ ls -l bar_rltd_inf.1476338750863.tmp
-rw-r--r-- 1 hadoop hadoop 899533 Oct 13 14:26 bar_rltd_inf.1476338750863.tmp
[hadoop@insightcluster137 ~]$ wc -l bar_rltd_inf.1476338750863.tmp   
11157 bar_rltd_inf.1476338750863.tmp&lt;/PRE&gt;&lt;P&gt;Part of the Flume configuration is below:&lt;/P&gt;&lt;PRE&gt;agent.sources.sqlSource.hibernate.connection.provider_class = org.hibernate.connection.C3P0ConnectionProvider
agent.sources.sqlSource.hibernate.c3p0.min_size=1
agent.sources.sqlSource.hibernate.c3p0.max_size=12
agent.sinks.hdfssink.type = hdfs
agent.sinks.hdfssink.channel = ch8
agent.sinks.hdfssink.hdfs.path = hdfs://insightcluster132.huawei.com:8020/user/hadoop/BAR_RLTD_INF
agent.sinks.hdfssink.hdfs.fileType = DataStream
agent.sinks.hdfssink.hdfs.filePrefix = bar_rltd_inf
agent.sinks.hdfssink.hdfs.rollInterval = 0
agent.sinks.hdfssink.hdfs.rollSize = 0
agent.sinks.hdfssink.hdfs.rollCount = 0
agent.sinks.hdfssink.hdfs.threadsPoolSize = 18
agent.sinks.hdfssink.hdfs.batchSize = 10


&lt;/PRE&gt;&lt;P&gt;I think the problem might be in HDFS or Flume. Can anyone help?&lt;/P&gt;</description>
      <pubDate>Thu, 13 Oct 2016 14:25:53 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/A-weird-problem-about-HDFS-flume-HIVE/m-p/128316#M43400</guid>
      <dc:creator>diablo2</dc:creator>
      <dc:date>2016-10-13T14:25:53Z</dc:date>
    </item>
    <item>
      <title>Re: A weird problem about HDFS+flume+HIVE</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/A-weird-problem-about-HDFS-flume-HIVE/m-p/128317#M43401</link>
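      <description>&lt;P&gt;We solved it by adding the following to the Flume configuration:&lt;/P&gt;&lt;PRE&gt;agent.sinks.hdfssink.hdfs.idleTimeout = 300&lt;/PRE&gt;&lt;P&gt;With rollInterval, rollSize, and rollCount all set to 0, the sink never rolled or closed the .tmp file, so HDFS presumably kept reporting a stale length for the still-open file. With idleTimeout set, the sink closes the file after 300 seconds without writes.&lt;/P&gt;</description>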
      <pubDate>Thu, 13 Oct 2016 20:55:40 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/A-weird-problem-about-HDFS-flume-HIVE/m-p/128317#M43401</guid>
      <dc:creator>diablo2</dc:creator>
      <dc:date>2016-10-13T20:55:40Z</dc:date>
    </item>
  </channel>
</rss>

