04-28-2017
12:07 AM
Affected versions:
SmartSense 1.2.0, 1.2.1
Symptoms:
Bundle upload fails with the following error in hst-server.log:
10 Aug 2016 16:56:17,131 INFO [pool-1-thread-1] GatewayHttpsClient:219 - Executing Request:POST https://gateway0host.gateway.com:9451/api/v1/upload/bundle HTTP/1.1
10 Aug 2016 16:56:17,142 ERROR [pool-1-thread-1] GatewayHttpsClient:241 - SSL error occurred. Cleaning up certificates/keys at client side is required.
10 Aug 2016 16:56:17,142 WARN [pool-1-thread-1] UploadRunnable:106 - SSL error occurred while connecting to gateway. Reseting local keys and certificates.
Reason:
Support for using the md5 algorithm to set up 2-way SSL was dropped in JDK 8. SmartSense 1.2.1 and lower used md5 as the default algorithm. The result is an SSL setup conflict when the HST Server runs a JDK lower than 8 and the Gateway runs JDK 8, for example HST Server on Java 1.6 and Gateway on Java 1.8.
Solution:
This is fixed in SmartSense 1.2.2, so upgrade the HST Server to a version higher than 1.2.1. Upgrading the Gateway usually has no impact, because the Gateway is backward compatible with older HST Server versions. If a SmartSense upgrade is not possible, align the JDK versions on the Gateway host and the HST Server host.
A similar issue can occur in 1.2.1 if the hst-agents and the hst-server have the same kind of mismatch in installed JDK versions. More details and steps to fix this manually are provided here: https://community.hortonworks.com/articles/24928/smartsense-agent-dying.html
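Before deciding between an upgrade and a JDK alignment, it can help to compare the Java major versions on the HST Server and Gateway hosts. The sketch below is illustrative. It assumes you have captured the text printed by `java -version 2>&1` on each host (the command writes to stderr). The helper names `java_major_version` and `jdk_mismatch` are hypothetical and are not part of SmartSense.

```python
import re


def java_major_version(version_output: str) -> int:
    """Extract the Java major version from `java -version` output.

    Handles the legacy "1.x" scheme ("1.8.0_112" -> 8) and the
    newer scheme ("11.0.2" -> 11).
    """
    match = re.search(r'version "([^"]+)"', version_output)
    if match is None:
        raise ValueError(f"no version string in: {version_output!r}")
    parts = match.group(1).split(".")
    if parts[0] == "1" and len(parts) > 1:
        return int(parts[1])
    return int(re.match(r"\d+", parts[0]).group())


def jdk_mismatch(hst_server_output: str, gateway_output: str) -> bool:
    """True when the HST Server and Gateway hosts run different Java major versions."""
    return java_major_version(hst_server_output) != java_major_version(gateway_output)


if __name__ == "__main__":
    hst = 'java version "1.6.0_45"'
    gateway = 'java version "1.8.0_112"'
    print(java_major_version(hst), java_major_version(gateway))  # 6 8
    print(jdk_mismatch(hst, gateway))  # True
```

The same comparison applies to the hst-agent hosts in the 1.2.1 agent/server case mentioned above.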
04-28-2017
12:02 AM
Affected versions:
SmartSense 1.2.x, 1.3.0
Symptoms:
The standalone Gateway is up and running, but a failure still pops up when trying to upload a bundle. Error details are provided in the logs below.
hst-server.log:
Caused by: com.hortonworks.smartsense.gateway.client.GatewayClientException: java.lang.IllegalStateException: Expected BEGIN_OBJECT but was STRING at line 1 column 7
at com.hortonworks.smartsense.gateway.client.GatewayHttpsClient.uploadBundle(GatewayHttpsClient.java:249)
at com.hortonworks.support.tools.gateway.UploadRunnable.uploadBundle(UploadRunnable.java:100)
... 12 more
Caused by: com.google.gson.JsonSyntaxException: java.lang.IllegalStateException: Expected BEGIN_OBJECT but was STRING at line 1 column 7
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:176)
at com.google.gson.Gson.fromJson(Gson.java:795)
at com.google.gson.Gson.fromJson(Gson.java:761)
at com.google.gson.Gson.fromJson(Gson.java:710)
at com.google.gson.Gson.fromJson(Gson.java:682)
at com.hortonworks.smartsense.gateway.client.security.ClientCertificateManager.reqSignCert(ClientCertificateManager.java:159)
at com.hortonworks.smartsense.gateway.client.security.ClientCertificateManager.checkSecuritySetup(ClientCertificateManager.java:190)
at com.hortonworks.smartsense.gateway.client.https.HttpsClientFactory.verifyTwoWaySSLSetup(HttpsClientFactory.java:80)
at com.hortonworks.smartsense.gateway.client.https.HttpsClientFactory.getSSLHttpClient(HttpsClientFactory.java:114)
at com.hortonworks.smartsense.gateway.client.https.HttpsClientFactory.getHttpsClient(HttpsClientFactory.java:69)
at com.hortonworks.smartsense.gateway.client.GatewayHttpsClient.uploadBundle(GatewayHttpsClient.java:190)
... 13 more
Caused by: java.lang.IllegalStateException: Expected BEGIN_OBJECT but was STRING at line 1 column 7
at com.google.gson.stream.JsonReader.expect(JsonReader.java:339)
hst-gateway.log:
17 Oct 2016 14:13:39,820 INFO qtp1898155970-67 GatewaySecurityFilter:71 - Filtering https://770745-hadoopmaster03.abcxyz.com:9450/certs/770745-hadoopmaster03.abcxyz.com for security purposes
17 Oct 2016 14:13:39,821 WARN qtp1898155970-67 GatewaySecurityFilter:125 - Request https://770745-hadoopmaster03.abcxyz.com:9450/certs/770745-hadoopmaster03.abcxyz.com doesn't match any pattern.
17 Oct 2016 14:13:39,821 WARN qtp1898155970-67 GatewaySecurityFilter:77 - This request is not allowed on this port: https://770745-hadoopmaster03.abcxyz.com:9450/certs/770745-hadoopmaster03.abcxyz.com
Popup while clicking on "upload":
Unable to connect to the on-premise SmartSense Gateway (host: 770745-hadoopmaster03.abcxyz.com, port: 9451). Please verify the gateway.host and gateway.port properties in the HST Server configurations are set correctly and the on-premise SmartSense Gateway is up and running.
Reason:
The Gateway has a security filter that rejects any request whose host name in the URL starts with a digit. The HST Server's certificate signing request to /certs/... is therefore rejected, and the HST Server cannot parse the reply as the JSON object it expects.
Solution:
The host pattern matcher in the URL filter is fixed in SmartSense 1.3.1. Upgrade to 1.3.1, or rename the Gateway host to remove the leading digits and restart the Gateway. SmartSense 1.3.1 upgrade documents: http://docs.hortonworks.com/HDPDocuments/SS1/SmartSense-1.3.1/bk_installation/content/upgrade_scenarios.html
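To check whether a Gateway host name falls into this case, you can compare a letter-first host rule with the RFC 1123 rule. RFC 1123 allows a label to begin with a digit. This is a minimal sketch. The actual pattern used by the pre-1.3.1 Gateway filter is not shown in the logs, so `LETTER_FIRST_HOST` is a hypothetical stand-in that only mimics the behaviour described above.

```python
import re

# Hypothetical stand-in for a letter-first host rule (RFC 952 style): the
# first label must begin with a letter. This is NOT the Gateway's real pattern.
LETTER_FIRST_HOST = re.compile(r"[A-Za-z][A-Za-z0-9-]*(\.[A-Za-z0-9][A-Za-z0-9-]*)*")

# RFC 1123 host names: each label may begin with a letter or a digit,
# must end with a letter or digit, and is at most 63 characters long.
LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"
RFC1123_HOST = re.compile(rf"{LABEL}(?:\.{LABEL})*")


def accepted(pattern: re.Pattern, host: str) -> bool:
    """True if the whole host name matches the pattern."""
    return pattern.fullmatch(host) is not None


if __name__ == "__main__":
    for host in ["770745-hadoopmaster03.abcxyz.com", "hadoopmaster03.abcxyz.com"]:
        print(host, accepted(LETTER_FIRST_HOST, host), accepted(RFC1123_HOST, host))
```

A host that passes the RFC 1123 check but fails the letter-first check is the kind of name that needs either the 1.3.1 upgrade or a rename.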
04-27-2017
11:55 PM
Affected versions:
SmartSense 1.3.x
Symptoms:
Gateway start fails with the following exception in hst-gateway.log:
INFO 2016-10-21 12:32:42,881 hst-gateway.py:520 - SmartSense Gateway is not running...
INFO 2016-10-21 12:32:42,882 hst-gateway.py:539 - Running server: ['/bin/sh', '-c', 'ulimit -n 10000; /usr/java/default/bin/java -server -XX:NewRatio=3 -XX:+UseConcMarkSweepGC -XX:-UseGCOverheadLimit -XX:CMSInitiatingOccupancyFraction=60 -Dlog.file.name=hst-gateway.log -Xms512m -Xmx1024m -cp /etc/hst/conf:/usr/hdp/share/hst/hst-common/lib/* com.hortonworks.smartsense.gateway.server.GatewayServer >/var/log/hst/hst-gateway.out 2>&1 &']
INFO 2016-10-21 12:32:42,884 hst-gateway.py:542 - Executing Gateway Server start command
21 Oct 2016 12:32:43,505 DEBUG main GatewayConfigValidator:40 - Found supported transfer type:HTTPS
21 Oct 2016 12:32:43,507 INFO main GatewayConfigValidator:67 - Found configurations to be used for connectivity.
21 Oct 2016 12:32:43,509 INFO main GatewayValidator:149 - Trying to reach: smartsense-dev.hortonworks.com:443
21 Oct 2016 12:32:43,667 ERROR main GatewayValidator:93 - Failed to reach SmartSense landing zone host:smartsense-dev.hortonworks.com, port:443
21 Oct 2016 12:32:43,667 INFO main GatewayValidator:149 - Trying to reach: hortonworks.com:80
21 Oct 2016 12:32:43,678 ERROR main GatewayValidator:100 - Failed to reach hortonworks.com:80.
21 Oct 2016 12:32:43,679 ERROR main GatewayServer:78 - Error occured during Gateway Server Setup. Unable to start Gateway Server. com.hortonworks.smartsense.gateway.GatewayException: Unable to reach hortonworks.com:80. Please verify outbound connectivity on Gateway.
at com.hortonworks.smartsense.gateway.server.GatewayValidator.validateReachable(GatewayValidator.java:102)
at com.hortonworks.smartsense.gateway.server.GatewayValidator.validateSetup(GatewayValidator.java:65)
at com.hortonworks.smartsense.gateway.server.GatewayServer.run(GatewayServer.java:99)
at com.hortonworks.smartsense.gateway.server.GatewayServer.main(GatewayServer.java:76)
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at com.hortonworks.smartsense.gateway.server.GatewayValidator.isReachable(GatewayValidator.java:150)
at com.hortonworks.smartsense.gateway.server.GatewayValidator.validateReachable(GatewayValidator.java:90)
... 3 more
Reason:
During startup, the Gateway runs a validation that checks outbound connectivity. This validation opens a direct outbound socket connection without going through the configured proxy. It therefore fails in environments where outbound traffic must pass through the proxy.
Solution:
This is fixed in SmartSense 1.4.0. Upgrade to this version and restart the Gateway, and the issue is resolved automatically. The Gateway is backward compatible, so HST 1.2.x or HST 1.3.x can use Gateway 1.4.0 to upload bundles. Otherwise, for 1.3.x, set the following property in the gateway section of /etc/hst/conf/hst-gateway.ini and restart the Gateway:
start.validation.enabled=false
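If you apply the 1.3.x workaround across several Gateway hosts, a small script can set the property instead of editing each file by hand. This is a sketch under stated assumptions. It assumes hst-gateway.ini is a standard ini file and that its section is literally named `gateway`. The helper name `disable_start_validation` is hypothetical. Also note that Python's `configparser` does not preserve comments, so keep a backup of the original file.

```python
import configparser
import io


def disable_start_validation(ini_text: str, section: str = "gateway") -> str:
    """Return ini_text with start.validation.enabled=false set in `section`.

    Assumes a standard ini layout; configparser drops comments and
    normalizes formatting when it writes the file back out.
    """
    config = configparser.ConfigParser(interpolation=None)
    config.optionxform = str  # keep key case exactly as written
    config.read_string(ini_text)
    if not config.has_section(section):
        config.add_section(section)
    config.set(section, "start.validation.enabled", "false")
    out = io.StringIO()
    config.write(out, space_around_delimiters=False)
    return out.getvalue()
```

To use it, read /etc/hst/conf/hst-gateway.ini, pass its contents to `disable_start_validation`, write the result back, and restart the Gateway. The function works on text rather than on a path, which keeps the file handling (and any backup step) explicit at the call site.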
04-27-2017
11:50 PM
Affected versions:
SmartSense 1.2.x, 1.3.x with Ambari lower than 2.4.1
Symptoms:
Users and groups are pre-created before setting up the cluster, so ignore_groupsusers_create is set to true to keep groups from being modified when services are added.
This works fine for all components except SmartSense, which fails with an error while running usermod:
resource_management.core.exceptions.Fail: Execution of 'usermod -G -adp-hdfs -g adp-hadoop adp-xxxx' returned 6. usermod: group 'adp-hdfs' does not exist
This group does NOT exist.
Note that stdout does show the following message:
Skipping creation of User and Group as host is sys prepped or ignore_groupusers_create flag is on
Reason:
The ignore_groupsusers_create configuration was not honored while adding or installing the SmartSense service.
Solution:
A fix is available in Ambari 2.4.2.6-5 or higher. Upgrading is the cleanest solution in this case.
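Since this failure comes down to a group that SmartSense expects but that was never pre-created, it can be useful to check, on each host, which of the expected groups actually resolve. The sketch below is a diagnostic only, not a fix. The helper name `missing_groups` is hypothetical, and the group list is taken from the usermod error above.

```python
import grp


def missing_groups(group_names):
    """Return the group names that cannot be resolved on this host.

    grp.getgrnam() goes through the system name service, so groups
    from LDAP/SSSD count as present, not just entries in /etc/group.
    """
    missing = []
    for name in group_names:
        try:
            grp.getgrnam(name)
        except KeyError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Group names taken from the failing usermod command.
    print(missing_groups(["adp-hdfs", "adp-hadoop"]))
```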
04-27-2017
11:42 PM
Affected versions:
All SmartSense versions
Symptoms:
The YARN and MR dashboards work fine, but there is no data in the HDFS dashboard. Activity Analyzer logs are available at /var/log/smartsense-activity/activity-analyzer.log. These logs show an error stack trace similar to the following:
ERROR [pool-14-thread-1] ActivityManager:89 - Failed to process activity id /hadoop/hdfs/namenode/current/fsimage_0009281 of type HDFS
com.hortonworks.smartsense.activity.ActivityException: Failed to process activity. Activity Type: HDFS; ActivityId: /hadoop/hdfs/namenode/current/fsimage_0009281; ActivityData: null
at com.hortonworks.smartsense.activity.hdfs.HDFSImageProcessor.processActivity(HDFSImageProcessor.java:104)
at com.hortonworks.smartsense.activity.ActivityManager$1.call(ActivityManager.java:80)
at com.hortonworks.smartsense.activity.ActivityManager$1.call(ActivityManager.java:73)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.hortonworks.smartsense.activity.ActivityException: Error while getting fsImage iterator from path /hadoop/hdfs/namenode/current/fsimage_0009281
at org.apache.hadoop.hdfs.tools.offlineImageViewer.PBImageParser.getIterator(PBImageParser.java:118)
at com.hortonworks.smartsense.activity.hdfs.HDFSImageProcessor.processActivity(HDFSImageProcessor.java:88)
... 6 more
Caused by: java.io.IOException: Unsupported layout version -63
at org.apache.hadoop.hdfs.server.namenode.FSImageUtil.loadSummary(FSImageUtil.java:75)
Reason:
Activity Analyzer stores all the dashboard data in a new schema created in the same database that Ambari Metrics Collector uses. It also uses the HDFS client libraries shipped with AMS. This issue can happen if Activity Analyzer picks up older Hadoop/MapReduce/YARN libraries from AMS. As the "Unsupported layout version" error shows, those older libraries cannot parse the fsimage written by the NameNode.
Solution:
Check all the jars under /usr/lib/ambari-metrics-collector and remove the older versions, then restart the Activity Analyzer. Examples of older-version jars:
/usr/lib/ambari-metrics-collector/hadoop-annotations-2.6.0.2.2.0.0-2041.jar
/usr/lib/ambari-metrics-collector/hadoop-auth-2.6.0.2.2.0.0-2041.jar
/usr/lib/ambari-metrics-collector/hadoop-client-2.6.0.2.2.0.0-2041.jar
/usr/lib/ambari-metrics-collector/hadoop-common-2.6.0.2.2.0.0-2041.jar
/usr/lib/ambari-metrics-collector/hadoop-hdfs-2.6.0.2.2.0.0-2041.jar
/usr/lib/ambari-metrics-collector/hadoop-mapreduce-client-app-2.6.0.2.2.0.0-2041.jar
/usr/lib/ambari-metrics-collector/hadoop-mapreduce-client-common-2.6.0.2.2.0.0-2041.jar
/usr/lib/ambari-metrics-collector/hadoop-mapreduce-client-core-2.6.0.2.2.1.0-2340.jar
/usr/lib/ambari-metrics-collector/hadoop-mapreduce-client-jobclient-2.6.0.2.2.0.0-2041.jar
/usr/lib/ambari-metrics-collector/hadoop-mapreduce-client-shuffle-2.6.0.2.2.0.0-2041.jar
/usr/lib/ambari-metrics-collector/hadoop-yarn-api-2.6.0.2.2.0.0-2041.jar
/usr/lib/ambari-metrics-collector/hadoop-yarn-client-2.6.0.2.2.0.0-2041.jar
/usr/lib/ambari-metrics-collector/hadoop-yarn-common-2.6.0.2.2.0.0-2041.jar
/usr/lib/ambari-metrics-collector/hadoop-yarn-registry-2.6.0.2.2.0.0-2041.jar
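Spotting stale jars by eye is error-prone when a directory holds dozens of them. The sketch below lists every jar whose artifact name also appears in a newer version. It assumes file names follow the `<artifact>-<version>.jar` pattern seen in the list above. The helper names are hypothetical. The script only reports candidates, so review the output before removing anything.

```python
import re
from collections import defaultdict

# Splits "hadoop-hdfs-2.6.0.2.2.0.0-2041.jar" into the artifact name
# "hadoop-hdfs" and the version "2.6.0.2.2.0.0-2041".
JAR_NAME = re.compile(r"(?P<artifact>.+?)-(?P<version>\d[\w.-]*)\.jar")


def version_key(version):
    """Compare versions numerically, component by component."""
    return tuple(int(n) for n in re.findall(r"\d+", version))


def older_duplicate_jars(jar_names):
    """Return jar names whose artifact is also present in a newer version."""
    by_artifact = defaultdict(list)
    for name in jar_names:
        match = JAR_NAME.fullmatch(name)
        if match:
            by_artifact[match["artifact"]].append((version_key(match["version"]), name))
    older = []
    for versions in by_artifact.values():
        versions.sort()
        # Everything except the highest version is a removal candidate.
        older.extend(name for _, name in versions[:-1])
    return sorted(older)
```

On a host, you could feed it `p.name for p in Path("/usr/lib/ambari-metrics-collector").glob("*.jar")` using `pathlib.Path`. The function takes names rather than a directory, which keeps it side-effect free.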
03-22-2017
12:35 AM
@Bruce Perez, I thought you wanted to use ip-10-69-161-179.dir.svc.accenture.comds.dev.accenture.com as your hostname. However, based on the hosts file and network config, it looks like sandbox.hortonworks.com is the intended hostname, so revert the change to /etc/hosts that I recommended earlier. The main problem is that multiple hostnames are configured for the same host, so the incorrect host entry should probably be deleted. What do you see when you click Hosts at the top center of the Ambari view? Do you see sandbox.hortonworks.com as your hostname? If yes, go ahead and use the delete command below, replacing the username, password, and cluster name. Here is the command to delete the host using the API:
curl -u [USERNAME]:[PASSWORD] -H "X-Requested-By: ambari" -X DELETE http://10.69.161.179:8080/api/v1/clusters/[CLUSTERNAME]/hosts/ip-10-69-161-179.dir.svc.accenture.comds.dev.accenture.com
Then stop and restart the HST Agent as shown in the screenshot earlier.
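If you prefer to script this cleanup rather than run curl by hand, the same Ambari REST call can be built with Python's standard library. This is a sketch that mirrors the curl command above: the same URL shape, basic auth, and the `X-Requested-By: ambari` header. The helper name `build_host_delete_request` is hypothetical. The function only builds the request. Sending it with `urllib.request.urlopen(req)` actually deletes the host, so double-check the host name first.

```python
import base64
import urllib.request


def build_host_delete_request(ambari_url, cluster, host, user, password):
    """Build (but do not send) the Ambari REST call that removes a host from a cluster."""
    url = f"{ambari_url.rstrip('/')}/api/v1/clusters/{cluster}/hosts/{host}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={
            "Authorization": f"Basic {token}",
            # Same header as the curl command; Ambari's CSRF protection
            # expects it on modifying requests.
            "X-Requested-By": "ambari",
        },
    )
```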
03-21-2017
11:38 PM
The correct entry to put in /etc/hosts is:
127.0.0.1 ip-10-69-161-179.dir.svc.accenture.comds.dev.accenture.com
After that, verify with the command hostname -f. It should show the corrected hostname. Then stop and restart the HST Agent as shown in the screenshot (agent-operation.png).
03-21-2017
04:52 PM
@Bruce Perez,
1. Can you please provide the Ambari and SmartSense versions? Do you have any custom script that identifies hostnames for Ambari?
2. Please execute this command and share the output: hst list-agents
The SmartSense agent needs to be registered under the "hostname -f" name, and that name should also match the one used by the ambari-agent.
07-01-2016
10:20 AM
The decimal declaration format is decimal(precision,scale). Precision is the total number of digits, including the digits after the decimal point, and scale is the number of digits after the decimal point. So for a number such as 1001.23, the declaration should be decimal(6,2), and for your example of 15.56 it should be decimal(4,2).
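The precision/scale rule can be checked mechanically. This minimal sketch uses Python's `decimal` module to derive the smallest decimal(precision,scale) that holds a given literal exactly. The helper name `min_decimal_type` is hypothetical, and it handles finite values only.

```python
from decimal import Decimal


def min_decimal_type(literal: str) -> str:
    """Return the smallest decimal(precision,scale) that holds the literal exactly."""
    _, digits, exponent = Decimal(literal).as_tuple()
    if exponent >= 0:
        # Integer value: no fractional digits; trailing zeros count toward precision.
        precision, scale = len(digits) + exponent, 0
    else:
        scale = -exponent
        # Values like 0.05 have fewer significant digits than the scale.
        precision = max(len(digits), scale)
    return f"decimal({precision},{scale})"


if __name__ == "__main__":
    print(min_decimal_type("1001.23"))  # decimal(6,2)
    print(min_decimal_type("15.56"))  # decimal(4,2)
```

In practice you would size the column for the largest value you expect, not for a single sample.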
06-29-2016
07:35 PM
Hive Insert Query Optimization
Some business users analyze their data profile in depth, especially skewness across partitions. There are many tuning parameters for optimizing inserts, such as Tez parallelism or manually setting the number of reduce tasks (not recommended). This article focuses on tuning the insert query itself to get more control over how partitions are handled, with no need to tweak any of these properties.
Consider a scenario where we insert 3 million records across 200 partitions. The file format is ORC, and the table is further bucketed and sorted for quicker retrieval during selects. Apart from writing, bucketing and sorting add extra work for the reducers.
Target Table
create table if not exists final.final_data_2 (
creation_timestamp bigint,
creator string,
deletion_timestamp bigint,
deletor string,
subject string,
predicate string,
object string,
language_code string )
partitioned by(range_partition bigint)
CLUSTERED BY(creation_timestamp) SORTED BY(creation_timestamp) INTO 8 BUCKETS
stored as ORC;
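The article suggests choosing the bucket count so that each bucket file is marginally smaller than one split. Expressed as arithmetic, that is a ceiling division. Below is a minimal sketch. It assumes rows spread evenly across buckets, and it assumes you size for the largest partition because the bucket count is declared once for the whole table. The helper name `bucket_count` is hypothetical, and the sizes in the example are illustrative.

```python
import math

MB = 1024 * 1024


def bucket_count(partition_sizes_bytes, split_bytes):
    """Smallest bucket count that keeps each bucket file at or below one split.

    Bucketing is declared once per table, so size it for the largest
    partition, assuming rows spread evenly across buckets.
    """
    largest = max(partition_sizes_bytes, default=0)
    return max(1, math.ceil(largest / split_bytes))


if __name__ == "__main__":
    print(bucket_count([300 * MB, 1000 * MB], 128 * MB))  # 8
```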
Simple Insert Query
hive> set hive.exec.dynamic.partition=true;
hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> insert overwrite table final_data_1 partition (range_partition) select creation_timestamp, creator, deletion_timestamp, deletor, subject, predicate, object, language_code, floor(creation_timestamp/1000000000) as range_partition from staging.staging_data;
Query ID = hdfs_20160629110841_fd2ee9ed-b36f-417e-ad6d-d76b45cda15d
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED 15 15 0 0 0 0
Reducer 2 ...... SUCCEEDED 8 8 0 0 0 0
--------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 138.84 s
--------------------------------------------------------------------------------
Loading data to table final.final_data_1 partition (range_partition=null)
Time taken for load dynamic partitions : 100940
Loading partition {range_partition=1299}
Loading partition {range_partition=1247}
Loading partition {range_partition=1304}
Loading partition {range_partition=1343}
Loading partition {range_partition=1235}
Loading partition {range_partition=1170}
...
Partition final.final_data_1{range_partition=1360} stats: [numFiles=8, numRows=95800, totalSize=2168211, rawDataSize=60300725]
Partition final.final_data_1{range_partition=1361} stats: [numFiles=8, numRows=5916, totalSize=173888, rawDataSize=3611249]
Partition final.final_data_1{range_partition=1362} stats: [numFiles=8, numRows=20602, totalSize=819304, rawDataSize=13403465]
Partition final.final_data_1{range_partition=1363} stats: [numFiles=8, numRows=25376, totalSize=767015, rawDataSize=16242356]
Partition final.final_data_1{range_partition=1364} stats: [numFiles=8, numRows=33810, totalSize=901617, rawDataSize=21328693]
OK
Time taken: 298.047 seconds
The above query required 15 mappers, based on the splits identified while selecting from the source. One reducer is assigned per bucket, and each reducer works on the corresponding bucket file across all 200 partitions. Each of the 8 reducers therefore sorts and writes 200 files, which makes the reducer stage the clear bottleneck in such cases. The example selected here is relatively small: 3 million records and around 300 MB of data on a 3-node cluster. The problem is more pronounced with more rows and larger data sizes.
Optimization
The modified query below distributes the work across more reducers. It splits the insert into several insert clauses, each covering a range of partitions. Choose the partitions for each insert clause so that the inserted rows are distributed evenly. For example, if the table is partitioned by date and the number of records grows every year:
query 1: complete years 2000 to 2005
query 2: complete years 2005 to 2007
query 3: years 2008 to 2009
query 4: year 2010
query 5: 6 months, and so on.
Choose the number of buckets so that each bucket file is marginally smaller than one split.
Optimized Insert Query
hive> set hive.exec.dynamic.partition=true;
hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> from staging.staging_view stg
> insert into final_data_2 partition(range_partition) select stg.creation_timestamp, stg.creator,
> stg.deletion_timestamp, stg.deletor, stg.subject,
> stg.predicate, stg.object, stg.language_code, stg.range_partition
> where range_partition between 1161 and 1180
> insert into final_data_2 partition(range_partition) select stg.creation_timestamp, stg.creator,
> stg.deletion_timestamp, stg.deletor, stg.subject,
> stg.predicate, stg.object, stg.language_code, stg.range_partition
> where range_partition between 1181 and 1200
> insert into final_data_2 partition(range_partition) select stg.creation_timestamp, stg.creator,
> stg.deletion_timestamp, stg.deletor, stg.subject,
> stg.predicate, stg.object, stg.language_code, stg.range_partition
> where range_partition between 1201 and 1220
> insert into final_data_2 partition(range_partition) select stg.creation_timestamp, stg.creator,
> stg.deletion_timestamp, stg.deletor, stg.subject,
> stg.predicate, stg.object, stg.language_code, stg.range_partition
> where range_partition between 1221 and 1240
> insert into final_data_2 partition(range_partition) select stg.creation_timestamp, stg.creator,
> stg.deletion_timestamp, stg.deletor, stg.subject,
> stg.predicate, stg.object, stg.language_code, stg.range_partition
> where range_partition between 1241 and 1260
> insert into final_data_2 partition(range_partition) select stg.creation_timestamp, stg.creator,
> stg.deletion_timestamp, stg.deletor, stg.subject,
> stg.predicate, stg.object, stg.language_code, stg.range_partition
> where range_partition between 1261 and 1280
> insert into final_data_2 partition(range_partition) select stg.creation_timestamp, stg.creator,
> stg.deletion_timestamp, stg.deletor, stg.subject,
> stg.predicate, stg.object, stg.language_code, stg.range_partition
> where range_partition between 1281 and 1300
> insert into final_data_2 partition(range_partition) select stg.creation_timestamp, stg.creator,
> stg.deletion_timestamp, stg.deletor, stg.subject,
> stg.predicate, stg.object, stg.language_code, stg.range_partition
> where range_partition between 1301 and 1320
> insert into final_data_2 partition(range_partition) select stg.creation_timestamp, stg.creator,
> stg.deletion_timestamp, stg.deletor, stg.subject,
> stg.predicate, stg.object, stg.language_code, stg.range_partition
> where range_partition between 1321 and 1340
> insert into final_data_2 partition(range_partition) select stg.creation_timestamp, stg.creator,
> stg.deletion_timestamp, stg.deletor, stg.subject,
> stg.predicate, stg.object, stg.language_code, stg.range_partition
> where range_partition between 1341 and 1360
> insert into final_data_2 partition(range_partition) select stg.creation_timestamp, stg.creator,
> stg.deletion_timestamp, stg.deletor, stg.subject,
> stg.predicate, stg.object, stg.language_code, stg.range_partition
> where range_partition between 1361 and 1380;
Query ID = hdfs_20160629130923_ae533e87-3621-4773-8ed9-9d53a1cc857a
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1466522743023_0015)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED 15 15 0 0 0 0
Reducer 10 . RUNNING 8 2 0 6 0 0
Reducer 11 ... RUNNING 8 5 0 3 0 0
Reducer 12 . RUNNING 8 2 0 6 0 0
Reducer 2 . RUNNING 8 2 0 6 0 0
Reducer 3 ... RUNNING 8 5 0 3 0 0
Reducer 4 ... RUNNING 8 5 0 3 0 0
Reducer 5 ... RUNNING 8 5 0 3 0 0
Reducer 6 .. RUNNING 8 3 2 3 0 0
Reducer 7 ... RUNNING 8 5 0 3 0 0
Reducer 8 . RUNNING 8 2 3 3 0 0
Reducer 9 . RUNNING 8 2 3 3 0 0
--------------------------------------------------------------------------------
VERTICES: 01/12 [=============>>-------------] 51% ELAPSED TIME: 94.33 s
--------------------------------------------------------------------------------
Here you can see that 8 reducers are assigned per insert clause. Each reducer therefore sorts and writes only 20 files: one bucket file for each of the 20 partitions in its range.
Status: Running (Executing on YARN cluster with App id application_1466522743023_0015)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED 15 15 0 0 0 0
Reducer 10 ..... SUCCEEDED 8 8 0 0 0 0
Reducer 11 ..... SUCCEEDED 8 8 0 0 0 0
Reducer 12 ..... SUCCEEDED 8 8 0 0 0 0
Reducer 2 ...... SUCCEEDED 8 8 0 0 0 0
Reducer 3 ...... SUCCEEDED 8 8 0 0 0 0
Reducer 4 ...... SUCCEEDED 8 8 0 0 0 0
Reducer 5 ...... SUCCEEDED 8 8 0 0 0 0
Reducer 6 ...... SUCCEEDED 8 8 0 0 0 0
Reducer 7 ...... SUCCEEDED 8 8 0 0 0 0
Reducer 8 ...... SUCCEEDED 8 8 0 0 0 0
Reducer 9 ...... SUCCEEDED 8 8 0 0 0 0
--------------------------------------------------------------------------------
VERTICES: 12/12 [==========================>>] 100% ELAPSED TIME: 129.25 s
--------------------------------------------------------------------------------
Loading data to table final.final_data_2 partition (range_partition=null)
Time taken for load dynamic partitions : 9291
Loading partition {range_partition=1178}
Loading partition {range_partition=1174}
Loading partition {range_partition=1163}
Loading partition {range_partition=1165}
Loading partition {range_partition=1172}
Loading partition {range_partition=1176}
Loading partition {range_partition=1179}
Loading partition {range_partition=1166}
Loading partition {range_partition=1175}
Loading partition {range_partition=1177}
Loading partition {range_partition=1167}
Loading partition {range_partition=1180}
Time taken for adding to write entity : 8
Loading data to table final.final_data_2 partition (range_partition=null)
Time taken for load dynamic partitions : 9718
...
OK
Time taken: 269.01 seconds
hive>
Here we see an improvement from 298 seconds to 269 seconds, but you will have to try this on your particular case to measure the exact impact. This way of parallelizing inserts is not new. The multi-insert syntax is usually used to insert into multiple tables, but the same construct gives more control over reducers without explicitly setting the reducer task number property. The number of reducer tasks increases, but each task runs for much less time than in the version with just 8 reducers.
Note: Increasing the number of reducers does not always improve performance. It helps only if the reducers are the bottleneck. Task slot utilization analysis will show whether enough task slots are available to benefit from splitting the work across more reducers.
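Writing eleven near-identical insert clauses by hand is tedious, and choosing balanced ranges is the part that matters most. The sketch below does two things under stated assumptions. First, it groups partition keys into contiguous ranges with roughly equal row counts. It assumes you already have per-partition counts, for example from `select range_partition, count(*) from staging.staging_view group by range_partition`. Second, it emits a multi-insert statement shaped like the one in this article. The helper names `balanced_ranges` and `multi_insert_sql` are hypothetical. The column list is copied from the target table, with the dynamic partition column last.

```python
def balanced_ranges(row_counts, num_groups):
    """Split partition keys into contiguous ranges with roughly equal row counts.

    row_counts maps partition key -> row count. Returns (first_key, last_key) pairs.
    """
    if not row_counts:
        return []
    keys = sorted(row_counts)
    total = sum(row_counts.values())
    ranges, start, cumulative = [], keys[0], 0
    for i, key in enumerate(keys):
        cumulative += row_counts[key]
        # Close the current range once the running total reaches its share.
        boundary = total * (len(ranges) + 1) / num_groups
        if i < len(keys) - 1 and len(ranges) < num_groups - 1 and cumulative >= boundary:
            ranges.append((start, key))
            start = keys[i + 1]
    ranges.append((start, keys[-1]))
    return ranges


# Columns of the article's target table; the dynamic partition column goes last.
COLUMNS = ["creation_timestamp", "creator", "deletion_timestamp", "deletor",
           "subject", "predicate", "object", "language_code", "range_partition"]


def multi_insert_sql(source, target, ranges, part_col="range_partition", columns=COLUMNS):
    """Build one Hive multi-insert statement with one insert clause per range."""
    select_list = ", ".join(f"stg.{c}" for c in columns)
    clauses = [
        f"insert into {target} partition({part_col}) select {select_list}\n"
        f"where {part_col} between {lo} and {hi}"
        for lo, hi in ranges
    ]
    return f"from {source} stg\n" + "\n".join(clauses) + ";"


if __name__ == "__main__":
    # Hypothetical per-partition row counts that grow over time.
    counts = {key: 1000 + 50 * (key - 1161) for key in range(1161, 1381)}
    print(multi_insert_sql("staging.staging_view", "final_data_2", balanced_ranges(counts, 11)))
```

Because the ranges are built from row counts rather than fixed widths, later partitions with more data end up in narrower ranges. This matches the year-based example in the Optimization section.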