Member since: 06-01-2017
Posts: 87
Kudos Received: 11
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
| 996 | 11-24-2016 12:23 PM
| 872 | 10-13-2016 01:55 PM
10-31-2016 12:50 PM
1 Kudo
I have found that sometimes, in our cluster, a single Tez query can use up all of the resources. Can I limit the containers for a Tez query so that free resources are kept for other sessions?
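(For reference, a minimal sketch of one common approach: cap a dedicated YARN Capacity Scheduler queue and route Hive/Tez sessions to it. The property names come from the stock capacity-scheduler.xml; the queue layout and percentages here are hypothetical examples, not a confirmed recipe.)
# Capacity Scheduler, key=value form as edited in Ambari
yarn.scheduler.capacity.root.queues=default,hive
yarn.scheduler.capacity.root.default.capacity=60
# guaranteed share for the hive queue
yarn.scheduler.capacity.root.hive.capacity=40
# hard cap: sessions in this queue can never exceed 60% of the cluster
yarn.scheduler.capacity.root.hive.maximum-capacity=60
A session can then be routed to the capped queue with set tez.queue.name=hive; before running the query.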
Labels:
- Apache Hive
- Apache Tez
10-27-2016 07:34 AM
Hi Artem, thanks for your quick response. Currently we are considering adding some disks to the Type 2 hosts (still fewer than on the Type 1 hosts) and then running DataNodes on the Type 2 hosts as well. Would this bring the compute closer to the data? And with HDFS running on both host types with different-sized local filesystems, is that acceptable?
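(A hedged note on the mixed-capacity part: HDFS generally tolerates DataNodes with different total capacities, and the balancer keeps each node's utilization percentage close to the cluster average, for example:)
# run periodically so the smaller Type 2 DataNodes do not fill up first;
# -threshold is the allowed deviation, in percent, from average utilization
hdfs balancer -threshold 10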
10-25-2016 12:07 PM
We have two types of hosts. Type 1: 64 GB memory, 11 x 900 GB disks. Type 2: 1 TB memory, 2 x 500 GB disks. The interconnect network is 10GbE. Is the following deployment architecture reasonable? Are there any potential risks? Any recommendations? Thanks.
- DataNode only on Type 1 hosts
- RegionServer also only on Type 1 hosts
- NodeManager only on Type 2 hosts
- other masters also on Type 2 hosts
Labels:
- Apache Hadoop
10-13-2016 01:55 PM
1 Kudo
We solved it by adding the following to the Flume configuration:
agent.sinks.hdfssink.hdfs.idleTimeout = 300
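(For context: with rollInterval, rollSize, and rollCount all set to 0, every roll trigger is disabled, so the sink never closes its .tmp file; idleTimeout, in seconds, forces a close after a period with no writes, which finalizes the file and makes its full length visible. A sketch using the stock Flume HDFS sink property names:)
# all time/size/count-based rolling disabled
agent.sinks.hdfssink.hdfs.rollInterval = 0
agent.sinks.hdfssink.hdfs.rollSize = 0
agent.sinks.hdfssink.hdfs.rollCount = 0
# close (and finalize) the open file after 300 seconds of inactivity
agent.sinks.hdfssink.hdfs.idleTimeout = 300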
10-13-2016 07:25 AM
1 Kudo
We installed HDP 2.5 and are using Flume to extract data from Oracle to HDFS, then querying it through a Hive external table. This seemed to run with no problems, but today we encountered a weird problem: when we query the Hive table, it returns only 11 rows, while there are 11157 rows in the file.
> select * from humep.BAR_RLTD_INF;
OK
2102350FHT10GA000140,02350FHT,21021310068NG6000452,,02131006,,,2016-10-11 00:00:00.0 NULL NULL NULL NULL NULL NULL NULL
2102350FHT10GA000140,02350FHT,21021310068NG6000802,,02131006,,,2016-10-11 00:00:00.0 NULL NULL NULL NULL NULL NULL NULL
2102350FHT10GA000140,02350FHT,090405G02N1Y16715A10,,,,,2016-10-11 00:00:00.0 NULL NULL NULL NULL NULL NULL NULL
2102350MXF10GA000008,02350MXF,031UWV10GA000317,,,,,2016-10-11 00:00:00.0 NULL NULL NULL NULL NULL NULL NULL
2102350MXF10GA000010,02350MXF,031UWV10GA000292,,,,,2016-10-11 00:00:00.0 NULL NULL NULL NULL NULL NULL NULL
210305587010GA000007,03055870,NW900CG,,,,,2016-10-10 00:00:00.0 NULL NULL NULL NULL NULL NULL NULL
210305679210GA000012,03056792,031SKB10G9000172,,,,,2016-10-10 00:00:00.0 NULL NULL NULL NULL NULL NULL NULL
210305679210GA000012,03056792,09340607791H16808011,,,,,2016-10-10 00:00:00.0 NULL NULL NULL NULL NULL NULL NULL
2102350FHT10GA000106,02350FHT,203DB23EB5B9,,,,,2016-10-09 00:00:00.0 NULL NULL NULL NULL NULL NULL NULL
2102350FHT10GA000142,02350FHT,2102350HMT10GA000142,,02350HMT,,,2016-10-11 00:00:00.0 NULL NULL NULL NULL NULL NULL NULL
2102350FHT10GA000142,02350FHT,031URL10G9000698,,,,,2016-10-11 00:00:00.0 NULL NULL NULL NULL NULL NULL NULL
Time taken: 0.312 seconds, Fetched: 11 row(s)
I also found that the size of the file on HDFS is only 761 bytes, but locally it is 899533 bytes and really has 11157 lines:
[hadoop@insightcluster137 ~]$ hdfs dfs -ls /user/hadoop/BAR_RLTD_INF
Found 1 items
-rw-r--r-- 3 hadoop hdfs 761 2016-10-13 14:05 /user/hadoop/BAR_RLTD_INF/bar_rltd_inf.1476338750863.tmp
[hadoop@insightcluster137 ~]$ hdfs dfs -du /user/hadoop/BAR_RLTD_INF
761 /user/hadoop/BAR_RLTD_INF/bar_rltd_inf.1476338750863.tmp
[hadoop@insightcluster137 ~]$ hdfs dfs -get /user/hadoop/BAR_RLTD_INF/bar_rltd_inf.1476338750863.tmp bar_rltd_inf.1476338750863.tmp
[hadoop@insightcluster137 ~]$ ls -l bar_rltd_inf.1476338750863.tmp
-rw-r--r-- 1 hadoop hadoop 899533 Oct 13 14:26 bar_rltd_inf.1476338750863.tmp
[hadoop@insightcluster137 ~]$ wc -l bar_rltd_inf.1476338750863.tmp
11157 bar_rltd_inf.1476338750863.tmp
Some of the Flume conf is below:
agent.sources.sqlSource.hibernate.connection.provider_class = org.hibernate.connection.C3P0ConnectionProvider
agent.sources.sqlSource.hibernate.c3p0.min_size=1
agent.sources.sqlSource.hibernate.c3p0.max_size=12
agent.sinks.hdfssink.type = hdfs
agent.sinks.hdfssink.channel = ch8
agent.sinks.hdfssink.hdfs.path = hdfs://insightcluster132.huawei.com:8020/user/hadoop/BAR_RLTD_INF
agent.sinks.hdfssink.hdfs.fileType = DataStream
agent.sinks.hdfssink.hdfs.filePrefix = bar_rltd_inf
agent.sinks.hdfssink.hdfs.rollInterval = 0
agent.sinks.hdfssink.hdfs.rollSize = 0
agent.sinks.hdfssink.hdfs.rollCount = 0
agent.sinks.hdfssink.hdfs.threadsPoolSize = 18
agent.sinks.hdfssink.hdfs.batchSize = 10
I think the problem might be in HDFS or Flume. Can anyone help?
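(A hedged observation rather than a confirmed diagnosis: the .tmp suffix means Flume still holds the file open for write. For an open file, the NameNode reports only the length persisted so far, so hdfs dfs -ls and readers such as Hive see data only up to the last completed sync, which would explain both the 761-byte listing and the 11 rows. One way to confirm which files are still open:)
# list files currently open for write under the sink directory
hdfs fsck /user/hadoop/BAR_RLTD_INF -openforwrite -files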
Labels:
- Apache Flume
- Apache Hadoop
- Apache Hive
10-12-2016 11:34 AM
Can we just copy /etc/passwd and /etc/group from the original hosts to the hosts that are being added?
10-09-2016 07:56 AM
First, I set up a cluster with 7 hosts. Then I added another 2 hosts with a freshly installed OS, all through the Ambari UI. Currently the 9 hosts run with no problems, but I noticed that the UIDs and GIDs on the later 2 hosts differ somewhat from those on the original 7 hosts. For example, on the original 7 hosts, /etc/passwd has:
zookeeper:x:1000:1000::/home/zookeeper:/bin/bash
ams:x:1001:1000::/home/ams:/bin/bash
ambari-qa:x:1002:1000::/home/ambari-qa:/bin/bash
hdfs:x:1003:1000::/home/hdfs:/bin/bash
yarn:x:1004:1000::/home/yarn:/bin/bash
mapred:x:1005:1000::/home/mapred:/bin/bash
hbase:x:1006:1000::/home/hbase:/bin/bash
slider:x:1007:990:SLIDER:/var/lib/slider:/bin/bash
hive:x:1008:1000::/home/hive:/bin/bash
oozie:x:1009:1000::/home/oozie:/bin/bash
tez:x:1010:1000::/home/tez:/bin/bash
flume:x:1011:1000::/home/flume:/bin/bash
kafka:x:1012:1000::/home/kafka:/bin/bash
sqoop:x:1013:1000::/home/sqoop:/bin/bash
hcat:x:1014:1000::/home/hcat:/bin/bash
falcon:x:996:986:Falcon:/var/lib/falcon:/bin/bash
zeppelin:x:1015:1000::/home/zeppelin:/bin/bash
livy:x:1016:1000::/home/livy:/bin/bash
spark:x:1017:1000::/home/spark:/bin/bash
and /etc/group has:
hadoop:x:1000:zookeeper,ams,hdfs,yarn,mapred,hbase,hive,flume,kafka,sqoop,hcat,zeppelin,livy,spark
hdfs:x:1001:hdfs
zookeeper:x:994:
yarn:x:993:
mapred:x:992:
hbase:x:991:
slider:x:990:
flume:x:989:
hive:x:988:
oozie:x:987:
falcon:x:986:falcon
livy:x:1002:
spark:x:1003:
zeppelin:x:1004:
But on the later 2 hosts, /etc/passwd has:
hive:x:1000:1003::/home/hive:/bin/bash
zookeeper:x:1001:1003::/home/zookeeper:/bin/bash
oozie:x:1002:1003::/home/oozie:/bin/bash
ams:x:1003:1003::/home/ams:/bin/bash
tez:x:1004:1003::/home/tez:/bin/bash
zeppelin:x:1005:1003::/home/zeppelin:/bin/bash
livy:x:1006:1003::/home/livy:/bin/bash
spark:x:1007:1003::/home/spark:/bin/bash
ambari-qa:x:1008:1003::/home/ambari-qa:/bin/bash
flume:x:1009:1003::/home/flume:/bin/bash
kafka:x:1010:1003::/home/kafka:/bin/bash
hdfs:x:1011:1003::/home/hdfs:/bin/bash
sqoop:x:1012:1003::/home/sqoop:/bin/bash
yarn:x:1013:1003::/home/yarn:/bin/bash
mapred:x:1014:1003::/home/mapred:/bin/bash
hbase:x:1015:1003::/home/hbase:/bin/bash
hcat:x:1016:1003::/home/hcat:/bin/bash
falcon:x:996:987:Falcon:/var/lib/falcon:/bin/bash
slider:x:1017:986:SLIDER:/var/lib/slider:/bin/bash
and /etc/group has:
livy:x:1000:
spark:x:1001:
zeppelin:x:1002:
hadoop:x:1003:hive,zookeeper,ams,zeppelin,livy,spark,flume,kafka,hdfs,sqoop,yarn,mapred,hbase,hcat
hdfs:x:1004:hdfs
zookeeper:x:994:
yarn:x:993:
mapred:x:992:
flume:x:991:
hbase:x:990:
hive:x:989:
oozie:x:988:
falcon:x:987:falcon
slider:x:986:
sqoop:x:985:
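(For anyone comparing the drift host by host, a hedged shell sketch; the host names are hypothetical placeholders:)
# compare the numeric UIDs/GIDs of the service accounts on an original and a new host
for h in host1 host8; do
  echo "== $h =="
  ssh "$h" 'getent passwd hdfs yarn hive hbase'
done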
Labels:
- Apache Ambari
- Apache Hadoop
10-09-2016 07:45 AM
Spark 1.6.x (HDP 2.5)
09-27-2016 07:03 AM
[root@insightcluster135 /]# sqoop import --connect jdbc:oracle:thin:@10.107.217.161:1521/odw --username ****** --password ******** --query "select * from hw_cpb_relation where LAST_UPDATED_DATE > TO_DATE('2016-09-21 00:00:00', 'YYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN') AND \$CONDITIONS" --target-dir /user/root/mytest --hive-import --hive-table hw_cpb_relation -m 1
Warning: /usr/hdp/2.5.0.0-1245/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
16/09/26 16:05:22 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.5.0.0-1245
16/09/26 16:05:22 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/09/26 16:05:22 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
16/09/26 16:05:22 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
16/09/26 16:05:22 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
16/09/26 16:05:22 INFO manager.SqlManager: Using default fetchSize of 1000
16/09/26 16:05:22 INFO tool.CodeGenTool: Beginning code generation
16/09/26 16:05:23 INFO manager.OracleManager: Time zone has been set to GMT
16/09/26 16:05:23 INFO manager.SqlManager: Executing SQL statement: select * from hw_cpb_relation where LAST_UPDATED_DATE > TO_DATE('2016-09-21 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN') AND (1 = 0)
16/09/26 16:05:23 INFO manager.SqlManager: Executing SQL statement: select * from hw_cpb_relation where LAST_UPDATED_DATE > TO_DATE('2016-09-21 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN') AND (1 = 0)
16/09/26 16:05:23 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.5.0.0-1245/hadoop-mapreduce
Note: /tmp/sqoop-root/compile/e319b0ed1331b8c84f27e37105ddd274/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/09/26 16:05:25 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/e319b0ed1331b8c84f27e37105ddd274/QueryResult.jar
16/09/26 16:05:25 INFO mapreduce.ImportJobBase: Beginning query import.
16/09/26 16:05:26 INFO impl.TimelineClientImpl: Timeline service address: http://insightcluster132.aaabbb.com:8188/ws/v1/timeline/
16/09/26 16:05:26 INFO client.RMProxy: Connecting to ResourceManager at insightcluster133.aaabbb.com/202.1.2.133:8050
16/09/26 16:05:26 INFO client.AHSProxy: Connecting to Application History server at insightcluster132.aaabbb.com/202.1.2.132:10200
16/09/26 16:05:28 INFO db.DBInputFormat: Using read commited transaction isolation
16/09/26 16:05:28 INFO mapreduce.JobSubmitter: number of splits:1
16/09/26 16:05:28 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1474533507895_0038
16/09/26 16:05:28 INFO impl.YarnClientImpl: Submitted application application_1474533507895_0038
16/09/26 16:05:28 INFO mapreduce.Job: The url to track the job: http://InsightCluster133.aaabbb.com:8088/proxy/application_1474533507895_0038/
16/09/26 16:05:28 INFO mapreduce.Job: Running job: job_1474533507895_0038
16/09/26 16:05:34 INFO mapreduce.Job: Job job_1474533507895_0038 running in uber mode : false
16/09/26 16:05:34 INFO mapreduce.Job: map 0% reduce 0%
16/09/26 16:08:02 INFO mapreduce.Job: map 100% reduce 0%
16/09/26 16:08:02 INFO mapreduce.Job: Job job_1474533507895_0038 completed successfully
16/09/26 16:08:02 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=158561
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=87
HDFS: Number of bytes written=4085515414
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=145371
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=145371
Total vcore-milliseconds taken by all map tasks=145371
Total megabyte-milliseconds taken by all map tasks=818729472
Map-Reduce Framework
Map input records=18119433
Map output records=18119433
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=859
CPU time spent (ms)=150120
Physical memory (bytes) snapshot=1008168960
Virtual memory (bytes) snapshot=6935724032
Total committed heap usage (bytes)=983040000
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=4085515414
16/09/26 16:08:02 INFO mapreduce.ImportJobBase: Transferred 3.8049 GB in 156.1788 seconds (24.9474 MB/sec)
16/09/26 16:08:02 INFO mapreduce.ImportJobBase: Retrieved 18119433 records.
16/09/26 16:08:02 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners
16/09/26 16:08:02 INFO manager.OracleManager: Time zone has been set to GMT
16/09/26 16:08:02 INFO manager.SqlManager: Executing SQL statement: select * from hw_cpb_relation where LAST_UPDATED_DATE > TO_DATE('2016-09-21 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN') AND (1 = 0)
16/09/26 16:08:02 INFO manager.SqlManager: Executing SQL statement: select * from hw_cpb_relation where LAST_UPDATED_DATE > TO_DATE('2016-09-21 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN') AND (1 = 0)
16/09/26 16:08:02 WARN hive.TableDefWriter: Column ORGANIZATION_ID had to be cast to a less precise type in Hive
16/09/26 16:08:02 WARN hive.TableDefWriter: Column WIP_ENTITY_ID had to be cast to a less precise type in Hive
16/09/26 16:08:02 WARN hive.TableDefWriter: Column CPB_ITEM_ID had to be cast to a less precise type in Hive
16/09/26 16:08:02 WARN hive.TableDefWriter: Column ZCB_ITEM_ID had to be cast to a less precise type in Hive
16/09/26 16:08:02 WARN hive.TableDefWriter: Column CREATED_BY had to be cast to a less precise type in Hive
16/09/26 16:08:02 WARN hive.TableDefWriter: Column CREATED_DATE had to be cast to a less precise type in Hive
16/09/26 16:08:02 WARN hive.TableDefWriter: Column LAST_UPDATED_BY had to be cast to a less precise type in Hive
16/09/26 16:08:02 WARN hive.TableDefWriter: Column LAST_UPDATED_DATE had to be cast to a less precise type in Hive
16/09/26 16:08:02 WARN hive.TableDefWriter: Column LOAD_BY had to be cast to a less precise type in Hive
16/09/26 16:08:02 WARN hive.TableDefWriter: Column LOAD_DATE had to be cast to a less precise type in Hive
16/09/26 16:08:02 WARN hive.TableDefWriter: Column COLLECT_USER had to be cast to a less precise type in Hive
16/09/26 16:08:02 WARN hive.TableDefWriter: Column CHK_FLAG had to be cast to a less precise type in Hive
16/09/26 16:08:02 INFO hive.HiveImport: Loading uploaded data into Hive
Logging initialized using configuration in jar:file:/usr/hdp/2.5.0.0-1245/hive/lib/hive-common-1.2.1000.2.5.0.0-1245.jar!/hive-log4j.properties
OK
Time taken: 1.748 seconds
Loading data to table default.hw_cpb_relation
Failed with exception org.apache.hadoop.security.AccessControlException: User does not belong to hdfs
at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setOwner(FSDirAttrOp.java:88)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setOwner(FSNamesystem.java:1708)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setOwner(NameNodeRpcServer.java:821)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setOwner(ClientNamenodeProtocolServerSideTranslatorPB.java:472)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
Is this hitting the Hive bug https://issues.apache.org/jira/browse/HIVE-13810? If so, how can it be fixed in HDP 2.5?
Thanks!
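(A hedged note on the failure mode, not a confirmed fix: the MoveTask is failing inside setOwner, i.e. while Hive tries to chown/chgrp the loaded files, and "User does not belong to hdfs" suggests the importing user is not a member of the hdfs group as the NameNode sees it. Two workarounds often suggested around HIVE-13810, both to be verified in your own environment:)
# 1) add the importing user to the hdfs group on the cluster hosts
usermod -a -G hdfs root
# 2) or stop Hive from propagating warehouse-directory ownership onto
#    loaded files (hive-site.xml, shown as key=value)
hive.warehouse.subdir.inherit.perms=false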
Labels:
- Apache Hive
- Apache Sqoop