Member since: 09-29-2015 | Posts: 25 | Kudos Received: 7 | Solutions: 6
04-28-2017
12:07 AM
Affected versions: 1.2.0, 1.2.1

Symptoms: Bundle upload fails with the following error in hst-server.log:

10 Aug 2016 16:56:17,131 INFO [pool-1-thread-1] GatewayHttpsClient:219 - Executing Request:POST https://gateway0host.gateway.com:9451/api/v1/upload/bundle HTTP/1.1
10 Aug 2016 16:56:17,142 ERROR [pool-1-thread-1] GatewayHttpsClient:241 - SSL error occurred. Cleaning up certificates/keys at client side is required.
10 Aug 2016 16:56:17,142 WARN [pool-1-thread-1] UploadRunnable:106 - SSL error occurred while connecting to gateway. Reseting local keys and certificates.

Reason: Support for using the md5 algorithm to set up 2-way SSL was dropped in JDK 8. SmartSense 1.2.1 and lower used md5 as the default algorithm, so the SSL setup conflicts when the HST Server JDK version is lower than JDK 8 while the Gateway runs on JDK 8.

Solution: This happens when HST is on Java 1.6 and the Gateway is on Java 1.8. It is fixed in 1.2.2; upgrade HST to a version higher than 1.2.1. Upgrading the Gateway will mostly have no impact because it provides backward compatibility with respect to the HST Server. If a SmartSense upgrade is not possible, try to sync the JDK versions between the Gateway host and the HST Server host. A similar issue can occur in 1.2.1 if the hst-agents and hst-server have a similar mismatch in installed JDK versions. More details and steps to fix this manually are provided here: https://community.hortonworks.com/articles/24928/smartsense-agent-dying.html
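Before choosing between an upgrade and a JDK sync, it can help to compare the JDK on both hosts. A minimal check, assuming java is on the PATH of each host:

# On the HST Server host:
java -version
# On the Gateway host:
java -version
# The conflict described above corresponds to one side reporting 1.6/1.7 and the other 1.8.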
04-28-2017
12:02 AM
Affected versions: 1.2.x, 1.3.0

Symptoms: The standalone Gateway is up and running, but a failure pop-up still appears when trying to upload a bundle. Error details are provided in the logs below:

##hst-server.log
Caused by: com.hortonworks.smartsense.gateway.client.GatewayClientException: java.lang.IllegalStateException: Expected BEGIN_OBJECT but was STRING at line 1 column 7
at com.hortonworks.smartsense.gateway.client.GatewayHttpsClient.uploadBundle(GatewayHttpsClient.java:249)
at com.hortonworks.support.tools.gateway.UploadRunnable.uploadBundle(UploadRunnable.java:100)
... 12 more
Caused by: com.google.gson.JsonSyntaxException: java.lang.IllegalStateException: Expected BEGIN_OBJECT but was STRING at line 1 column 7
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:176)
at com.google.gson.Gson.fromJson(Gson.java:795)
at com.google.gson.Gson.fromJson(Gson.java:761)
at com.google.gson.Gson.fromJson(Gson.java:710)
at com.google.gson.Gson.fromJson(Gson.java:682)
at com.hortonworks.smartsense.gateway.client.security.ClientCertificateManager.reqSignCert(ClientCertificateManager.java:159)
at com.hortonworks.smartsense.gateway.client.security.ClientCertificateManager.checkSecuritySetup(ClientCertificateManager.java:190)
at com.hortonworks.smartsense.gateway.client.https.HttpsClientFactory.verifyTwoWaySSLSetup(HttpsClientFactory.java:80)
at com.hortonworks.smartsense.gateway.client.https.HttpsClientFactory.getSSLHttpClient(HttpsClientFactory.java:114)
at com.hortonworks.smartsense.gateway.client.https.HttpsClientFactory.getHttpsClient(HttpsClientFactory.java:69)
at com.hortonworks.smartsense.gateway.client.GatewayHttpsClient.uploadBundle(GatewayHttpsClient.java:190)
... 13 more
Caused by: java.lang.IllegalStateException: Expected BEGIN_OBJECT but was STRING at line 1 column 7
at com.google.gson.stream.JsonReader.expect(JsonReader.java:339)

##hst-gateway.log
17 Oct 2016 14:13:39,820 INFO qtp1898155970-67 GatewaySecurityFilter:71 - Filtering https://770745-hadoopmaster03.abcxyz.com:9450/certs/770745-hadoopmaster03.abcxyz.com for security purposes
17 Oct 2016 14:13:39,821 WARN qtp1898155970-67 GatewaySecurityFilter:125 - Request https://770745-hadoopmaster03.abcxyz.com:9450/certs/770745-hadoopmaster03.abcxyz.com doesn't match any pattern.
17 Oct 2016 14:13:39,821 WARN qtp1898155970-67 GatewaySecurityFilter:77 - This request is not allowed on this port: https://770745-hadoopmaster03.abcxyz.com:9450/certs/770745-hadoopmaster03.abcxyz.com

##Popup while clicking on "upload"
Unable to connect to the on-premise SmartSense Gateway (host: 770745-hadoopmaster03.abcxyz.com, port: 9451). Please verify the gateway.host and gateway.port properties in the HST Server configurations are set correctly and the on-premise SmartSense Gateway is up and running.

Reason: The Gateway had a filter that rejected any request whose host URL started with a number.

Solution: The pattern matcher for the host in the URL is fixed in 1.3.1. Upgrade to 1.3.1, or rename the gateway host to remove the leading numbers and restart the Gateway. SmartSense 1.3.1 upgrade documents: http://docs.hortonworks.com/HDPDocuments/SS1/SmartSense-1.3.1/bk_installation/content/upgrade_scenarios.html
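A quick way to confirm you are hitting this filter is to check whether the gateway host's FQDN begins with a digit. A minimal shell check (nothing here is SmartSense-specific):

# Run on the gateway host.
case "$(hostname -f)" in
  [0-9]*) echo "FQDN starts with a digit: affected by this filter issue" ;;
  *)      echo "FQDN does not start with a digit" ;;
esac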
04-27-2017
11:55 PM
1 Kudo
Affected versions: SmartSense 1.3.x

Symptoms: Gateway start fails with the following exception in hst-gateway.log:

INFO 2016-10-21 12:32:42,881 hst-gateway.py:520 - SmartSense Gateway is not running...
INFO 2016-10-21 12:32:42,882 hst-gateway.py:539 - Running server: ['/bin/sh', '-c', 'ulimit -n 10000; /usr/java/default/bin/java -server -XX:NewRatio=3 -XX:+UseConcMarkSweepGC -XX:-UseGCOverheadLimit -XX:CMSInitiatingOccupancyFraction=60 -Dlog.file.name=hst-gateway.log -Xms512m -Xmx1024m -cp /etc/hst/conf:/usr/hdp/share/hst/hst-common/lib/* com.hortonworks.smartsense.gateway.server.GatewayServer >/var/log/hst/hst-gateway.out 2>&1 &']
INFO 2016-10-21 12:32:42,884 hst-gateway.py:542 - Executing Gateway Server start command
21 Oct 2016 12:32:43,505 DEBUG main GatewayConfigValidator:40 - Found supported transfer type:HTTPS
21 Oct 2016 12:32:43,507 INFO main GatewayConfigValidator:67 - Found configurations to be used for connectivity.
21 Oct 2016 12:32:43,509 INFO main GatewayValidator:149 - Trying to reach: smartsense-dev.hortonworks.com:443
21 Oct 2016 12:32:43,667 ERROR main GatewayValidator:93 - Failed to reach SmartSense landing zone host:smartsense-dev.hortonworks.com, port:443
21 Oct 2016 12:32:43,667 INFO main GatewayValidator:149 - Trying to reach: hortonworks.com:80
21 Oct 2016 12:32:43,678 ERROR main GatewayValidator:100 - Failed to reach hortonworks.com:80.
21 Oct 2016 12:32:43,679 ERROR main GatewayServer:78 - Error occured during Gateway Server Setup. Unable to start Gateway Server. com.hortonworks.smartsense.gateway.GatewayException: Unable to reach hortonworks.com:80. Please verify outbound connectivity on Gateway.
at com.hortonworks.smartsense.gateway.server.GatewayValidator.validateReachable(GatewayValidator.java:102)
at com.hortonworks.smartsense.gateway.server.GatewayValidator.validateSetup(GatewayValidator.java:65)
at com.hortonworks.smartsense.gateway.server.GatewayServer.run(GatewayServer.java:99)
at com.hortonworks.smartsense.gateway.server.GatewayServer.main(GatewayServer.java:76)
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at com.hortonworks.smartsense.gateway.server.GatewayValidator.isReachable(GatewayValidator.java:150)
at com.hortonworks.smartsense.gateway.server.GatewayValidator.validateReachable(GatewayValidator.java:90)
... 3 more

Reason: A gateway validation process checks for outbound connectivity during startup. This validation attempted the outbound socket connectivity check without passing through the proxy, so it failed.

Solution: Fixed in SmartSense 1.4.0. Upgrade to this version and restart the Gateway to resolve the issue automatically. The Gateway is backward compatible, so HST 1.2.x or HST 1.3.x can use Gateway 1.4.0 to upload bundles. Otherwise, for 1.3.x, set the following property in the gateway section of /etc/hst/conf/hst-gateway.ini and restart the Gateway: start.validation.enabled=false
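For reference, a sketch of the resulting /etc/hst/conf/hst-gateway.ini fragment (this assumes the section header is literally [gateway], as the post implies; leave the rest of the file unchanged):

[gateway]
# ... existing gateway properties stay as they are ...
start.validation.enabled=false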
04-27-2017
11:50 PM
Affected versions:
SmartSense 1.2.x, 1.3.x with Ambari lower than 2.4.1
Symptoms:
Users/groups are pre-created before setting up the cluster. As such, ignore_groupsusers_create needs to be set to true so groups are not modified when services are added. This works fine for all components except SmartSense, which fails with an error performing usermod:

resource_management.core.exceptions.Fail: Execution of 'usermod -G -adp-hdfs -g adp-hadoop adp-xxxx' returned 6. usermod: group 'adp-hdfs' does not exist

This group does NOT exist. Note that within the stdout you do actually see the following message:

Skipping creation of User and Group as host is sys prepped or ignore_groupusers_create flag is on

Reason: The "ignore_groupsusers_create" configuration was not considered while adding/installing the SmartSense service.

Solution: A fix is available in Ambari 2.4.2.6-5 or higher. Upgrading is the cleanest solution in this case.
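If an immediate Ambari upgrade is not possible, one possible stopgap (an assumption on my part, not an officially documented workaround) is to pre-create the group the failing usermod expects before retrying the SmartSense install:

# Hypothetical workaround: create the group named in the error above,
# then retry adding the SmartSense service from Ambari.
groupadd adp-hdfs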
04-27-2017
11:42 PM
Affected versions: All SmartSense versions

Symptoms: The YARN and MR dashboards work fine, but there is no data for the HDFS dashboard. Activity Analyzer logs are available at /var/log/smartsense-activity/activity-analyzer.log and show an error stack trace similar to the one below:

ERROR [pool-14-thread-1] ActivityManager:89 - Failed to process activity id /hadoop/hdfs/namenode/current/fsimage_0009281 of type HDFS
com.hortonworks.smartsense.activity.ActivityException: Failed to process activity. Activity Type: HDFS; ActivityId: /hadoop/hdfs/namenode/current/fsimage_0009281; ActivityData: null
at com.hortonworks.smartsense.activity.hdfs.HDFSImageProcessor.processActivity(HDFSImageProcessor.java:104)
at com.hortonworks.smartsense.activity.ActivityManager$1.call(ActivityManager.java:80)
at com.hortonworks.smartsense.activity.ActivityManager$1.call(ActivityManager.java:73)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.hortonworks.smartsense.activity.ActivityException: Error while getting fsImage iterator from path /hadoop/hdfs/namenode/current/fsimage_0009281
at org.apache.hadoop.hdfs.tools.offlineImageViewer.PBImageParser.getIterator(PBImageParser.java:118)
at com.hortonworks.smartsense.activity.hdfs.HDFSImageProcessor.processActivity(HDFSImageProcessor.java:88)
... 6 more
Caused by: java.io.IOException: Unsupported layout version -63
at org.apache.hadoop.hdfs.server.namenode.FSImageUtil.loadSummary(FSImageUtil.java:75)

Reason: Activity Analyzer maintains all the dashboards in a new schema created in the same database set up by Ambari Metrics Collector. It also utilizes the HDFS clients available in the AMS libraries. This issue can happen if Activity Analyzer runs against older Hadoop/MapReduce/YARN libraries from AMS.

Solution: Verify all the jars available under /usr/lib/ambari-metrics-collector and remove the older versions, then restart the Activity Analyzer. Examples of older-version jars:

/usr/lib/ambari-metrics-collector/hadoop-annotations-2.6.0.2.2.0.0-2041.jar
/usr/lib/ambari-metrics-collector/hadoop-auth-2.6.0.2.2.0.0-2041.jar
/usr/lib/ambari-metrics-collector/hadoop-client-2.6.0.2.2.0.0-2041.jar
/usr/lib/ambari-metrics-collector/hadoop-common-2.6.0.2.2.0.0-2041.jar
/usr/lib/ambari-metrics-collector/hadoop-hdfs-2.6.0.2.2.0.0-2041.jar
/usr/lib/ambari-metrics-collector/hadoop-mapreduce-client-app-2.6.0.2.2.0.0-2041.jar
/usr/lib/ambari-metrics-collector/hadoop-mapreduce-client-common-2.6.0.2.2.0.0-2041.jar
/usr/lib/ambari-metrics-collector/hadoop-mapreduce-client-core-2.6.0.2.2.1.0-2340.jar
/usr/lib/ambari-metrics-collector/hadoop-mapreduce-client-jobclient-2.6.0.2.2.0.0-2041.jar
/usr/lib/ambari-metrics-collector/hadoop-mapreduce-client-shuffle-2.6.0.2.2.0.0-2041.jar
/usr/lib/ambari-metrics-collector/hadoop-yarn-api-2.6.0.2.2.0.0-2041.jar
/usr/lib/ambari-metrics-collector/hadoop-yarn-client-2.6.0.2.2.0.0-2041.jar
/usr/lib/ambari-metrics-collector/hadoop-yarn-common-2.6.0.2.2.0.0-2041.jar
/usr/lib/ambari-metrics-collector/hadoop-yarn-registry-2.6.0.2.2.0.0-2041.jar
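A hedged sketch of that cleanup (this assumes moving the stale jars aside is acceptable on your host; the version patterns come from the example list above):

# Move the older 2.2.x-build jars out of the collector's lib directory.
mkdir -p /tmp/ams-old-jars
mv /usr/lib/ambari-metrics-collector/hadoop-*2.2.0.0-2041.jar /tmp/ams-old-jars/
mv /usr/lib/ambari-metrics-collector/hadoop-*2.2.1.0-2340.jar /tmp/ams-old-jars/
# Then restart the Activity Analyzer (for example, from Ambari).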
03-22-2017
12:35 AM
@Bruce Perez, I thought you wanted to use ip-10-69-161-179.dir.svc.accenture.comds.dev.accenture.com as your hostname, but based on the hosts file and network config, it looks like sandbox.hortonworks.com is the intended hostname. So revert the change I recommended earlier in /etc/hosts. The main problem is multiple hostnames configured for the same host, so the incorrect host should probably be deleted. What do you see when you click on Hosts at the top center of the Ambari view? Do you see sandbox.hortonworks.com as your hostname? If yes, go ahead and use the delete command below, replacing the username, password, and cluster name. Here is the command to delete using the API:

curl -u [USERNAME]:[PASSWORD] -H "X-Requested-By: ambari" -X DELETE http://10.69.161.179:8080/api/v1/clusters/[CLUSTERNAME]/hosts/ip-10-69-161-179.dir.svc.accenture.comds.dev.accenture.com

Then stop and restart the HST Agent as shown in the earlier screenshot.
03-21-2017
11:38 PM
The correct content to put in /etc/hosts is:

127.0.0.1 ip-10-69-161-179.dir.svc.accenture.comds.dev.accenture.com

After that, verify with the command hostname -f; it should show the corrected hostname. Then stop and restart the HST Agent as follows: agent-operation.png
03-21-2017
04:52 PM
@Bruce Perez, 1. Can you please provide the Ambari and SmartSense versions? Do you have any custom script to identify hostnames for Ambari? 2. Please execute this command and share the output: hst list-agents. We need "hostname -f" to be the registered SmartSense agent hostname, and it should also match the ambari-agent hostname.
07-01-2016
10:20 AM
The decimal declaration format is decimal(precision,scale). Precision is the total number of digits, including the digits after the decimal point. So if you want a number such as 1001.23, the decimal declaration should be decimal(6,2), or decimal(4,2) for your given example of 15.56.
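To illustrate in Hive (the table and column names here are hypothetical):

hive> create table price_demo (amount decimal(6,2));
hive> insert into price_demo values (1001.23);

A decimal(4,2) column would fit 15.56 but would overflow on 1001.23.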
06-29-2016
07:35 PM
1 Kudo
Hive Insert Query Optimization

Some business users deeply analyze their data profile, especially skewness across partitions. There are many other tuning parameters to optimize inserts, like Tez parallelism or manually setting the number of reduce tasks (not recommended). This article focuses on insert query tuning to give more control over handling partitions without tweaking any of those properties.

Consider a scenario where we are inserting 3 million records across 200 partitions. The file format is ORC, and it is further bucketed and sorted for quicker retrieval during select. Apart from writing, bucketing/sorting adds extra work for the reducer.

Target Table

create table if not exists final.final_data_2 (
creation_timestamp bigint,
creator string,
deletion_timestamp bigint,
deletor string,
subject string,
predicate string,
object string,
language_code string )
partitioned by(range_partition bigint)
CLUSTERED BY(creation_timestamp) SORTED BY(creation_timestamp) INTO 8 BUCKETS
stored as ORC;
Simple Insert Query

hive> set hive.exec.dynamic.partition=true;
hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> insert overwrite table final_data_1 partition (range_partition) select creation_timestamp, creator, deletion_timestamp, deletor, subject, predicate, object, language_code, floor(creation_timestamp/1000000000) as range_partition from staging.staging_data;
Query ID = hdfs_20160629110841_fd2ee9ed-b36f-417e-ad6d-d76b45cda15d
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED 15 15 0 0 0 0
Reducer 2 ...... SUCCEEDED 8 8 0 0 0 0
--------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 138.84 s
--------------------------------------------------------------------------------
Loading data to table final.final_data_1 partition (range_partition=null)
Time taken for load dynamic partitions : 100940
Loading partition {range_partition=1299}
Loading partition {range_partition=1247}
Loading partition {range_partition=1304}
Loading partition {range_partition=1343}
Loading partition {range_partition=1235}
Loading partition {range_partition=1170}
.
.
.
.
.
Partition final.final_data_1{range_partition=1360} stats: [numFiles=8, numRows=95800, totalSize=2168211, rawDataSize=60300725]
Partition final.final_data_1{range_partition=1361} stats: [numFiles=8, numRows=5916, totalSize=173888, rawDataSize=3611249]
Partition final.final_data_1{range_partition=1362} stats: [numFiles=8, numRows=20602, totalSize=819304, rawDataSize=13403465]
Partition final.final_data_1{range_partition=1363} stats: [numFiles=8, numRows=25376, totalSize=767015, rawDataSize=16242356]
Partition final.final_data_1{range_partition=1364} stats: [numFiles=8, numRows=33810, totalSize=901617, rawDataSize=21328693]
OK
Time taken: 298.047 seconds
The above query required 15 mappers based on the splits identified while selecting from the source. One reducer per bucket is assigned, and each reducer works on the corresponding bucket file across each of the 200 partitions. The reducer is clearly a bottleneck in such cases. The case selected in this particular example is relatively small, containing 3 million records and around 300 MB of data on a 3-node cluster. The problem is more pronounced at higher row counts and byte sizes.

Optimization

Refer to the modified query below, which distributes the work across a larger number of reducers. The partitions for each insert branch should be selected so that inserts are equally distributed. For example, if the table is partitioned by date and the number of records grows every year:

query 1: complete years 2000 to 2005
query 2: complete years 2006 to 2007
query 3: years 2008 to 2009
query 4: year 2010
query 5: 6 months
and so on.

The bucket count can be chosen so that each bucket file ends up marginally smaller than one split size (for example, with a 256 MB split size, target bucket files just under 256 MB).

Optimized Insert Query

hive> set hive.exec.dynamic.partition=true;
hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> from staging.staging_view stg
> insert into final_data_2 partition(range_partition) select stg.creation_timestamp, stg.creator,
> stg.deletion_timestamp, stg.deletor, stg.subject,
> stg.predicate, stg.object, stg.language_code, stg.range_partition
> where range_partition between 1161 and 1180
> insert into final_data_2 partition(range_partition) select stg.creation_timestamp, stg.creator,
> stg.deletion_timestamp, stg.deletor, stg.subject,
> stg.predicate, stg.object, stg.language_code, stg.range_partition
> where range_partition between 1181 and 1200
> insert into final_data_2 partition(range_partition) select stg.creation_timestamp, stg.creator,
> stg.deletion_timestamp, stg.deletor, stg.subject,
> stg.predicate, stg.object, stg.language_code, stg.range_partition
> where range_partition between 1201 and 1220
> insert into final_data_2 partition(range_partition) select stg.creation_timestamp, stg.creator,
> stg.deletion_timestamp, stg.deletor, stg.subject,
> stg.predicate, stg.object, stg.language_code, stg.range_partition
> where range_partition between 1221 and 1240
> insert into final_data_2 partition(range_partition) select stg.creation_timestamp, stg.creator,
> stg.deletion_timestamp, stg.deletor, stg.subject,
> stg.predicate, stg.object, stg.language_code, stg.range_partition
> where range_partition between 1241 and 1260
> insert into final_data_2 partition(range_partition) select stg.creation_timestamp, stg.creator,
> stg.deletion_timestamp, stg.deletor, stg.subject,
> stg.predicate, stg.object, stg.language_code, stg.range_partition
> where range_partition between 1261 and 1280
> insert into final_data_2 partition(range_partition) select stg.creation_timestamp, stg.creator,
> stg.deletion_timestamp, stg.deletor, stg.subject,
> stg.predicate, stg.object, stg.language_code, stg.range_partition
> where range_partition between 1281 and 1300
> insert into final_data_2 partition(range_partition) select stg.creation_timestamp, stg.creator,
> stg.deletion_timestamp, stg.deletor, stg.subject,
> stg.predicate, stg.object, stg.language_code, stg.range_partition
> where range_partition between 1301 and 1320
> insert into final_data_2 partition(range_partition) select stg.creation_timestamp, stg.creator,
> stg.deletion_timestamp, stg.deletor, stg.subject,
> stg.predicate, stg.object, stg.language_code, stg.range_partition
> where range_partition between 1321 and 1340
> insert into final_data_2 partition(range_partition) select stg.creation_timestamp, stg.creator,
> stg.deletion_timestamp, stg.deletor, stg.subject,
> stg.predicate, stg.object, stg.language_code, stg.range_partition
> where range_partition between 1341 and 1360
> insert into final_data_2 partition(range_partition) select stg.creation_timestamp, stg.creator,
> stg.deletion_timestamp, stg.deletor, stg.subject,
> stg.predicate, stg.object, stg.language_code, stg.range_partition
> where range_partition between 1361 and 1380;
Query ID = hdfs_20160629130923_ae533e87-3621-4773-8ed9-9d53a1cc857a
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1466522743023_0015)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED 15 15 0 0 0 0
Reducer 10 . RUNNING 8 2 0 6 0 0
Reducer 11 ... RUNNING 8 5 0 3 0 0
Reducer 12 . RUNNING 8 2 0 6 0 0
Reducer 2 . RUNNING 8 2 0 6 0 0
Reducer 3 ... RUNNING 8 5 0 3 0 0
Reducer 4 ... RUNNING 8 5 0 3 0 0
Reducer 5 ... RUNNING 8 5 0 3 0 0
Reducer 6 .. RUNNING 8 3 2 3 0 0
Reducer 7 ... RUNNING 8 5 0 3 0 0
Reducer 8 . RUNNING 8 2 3 3 0 0
Reducer 9 . RUNNING 8 2 3 3 0 0
--------------------------------------------------------------------------------
VERTICES: 01/12 [=============>>-------------] 51% ELAPSED TIME: 94.33 s
--------------------------------------------------------------------------------
You can see here that 8 reducers are assigned per insert branch, so each reducer sorts and writes only 20 files (one bucket file for each of the 20 partitions in its range).
Status: Running (Executing on YARN cluster with App id application_1466522743023_0015)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED 15 15 0 0 0 0
Reducer 10 ..... SUCCEEDED 8 8 0 0 0 0
Reducer 11 ..... SUCCEEDED 8 8 0 0 0 0
Reducer 12 ..... SUCCEEDED 8 8 0 0 0 0
Reducer 2 ...... SUCCEEDED 8 8 0 0 0 0
Reducer 3 ...... SUCCEEDED 8 8 0 0 0 0
Reducer 4 ...... SUCCEEDED 8 8 0 0 0 0
Reducer 5 ...... SUCCEEDED 8 8 0 0 0 0
Reducer 6 ...... SUCCEEDED 8 8 0 0 0 0
Reducer 7 ...... SUCCEEDED 8 8 0 0 0 0
Reducer 8 ...... SUCCEEDED 8 8 0 0 0 0
Reducer 9 ...... SUCCEEDED 8 8 0 0 0 0
--------------------------------------------------------------------------------
VERTICES: 12/12 [==========================>>] 100% ELAPSED TIME: 129.25 s
--------------------------------------------------------------------------------
Loading data to table final.final_data_2 partition (range_partition=null)
Time taken for load dynamic partitions : 9291
Loading partition {range_partition=1178}
Loading partition {range_partition=1174}
Loading partition {range_partition=1163}
Loading partition {range_partition=1165}
Loading partition {range_partition=1172}
Loading partition {range_partition=1176}
Loading partition {range_partition=1179}
Loading partition {range_partition=1166}
Loading partition {range_partition=1175}
Loading partition {range_partition=1177}
Loading partition {range_partition=1167}
Loading partition {range_partition=1180}
Time taken for adding to write entity : 8
Loading data to table final.final_data_2 partition (range_partition=null)
Time taken for load dynamic partitions : 9718
.
.
.
.
.
.
.
.
.
.
.
.
OK
Time taken: 269.01 seconds
hive>
Here we see an improvement from 298 seconds to 269 seconds, but you will have to try this on your particular case to identify the exact impact. This process of parallelizing inserts is not new; it is usually designed to insert into multiple tables. But the same modification can give us more control over reducers without explicitly setting the reducer task number property. The number of reducer tasks increases, but each task runs for much less time than with just 8 reducers.

Note: Increasing reducers may not always increase performance. It helps only when the reducers are the bottleneck. Task slot utilization analysis will point out whether there are sufficient task slots available to leverage breaking the work into even more reducers.