Member since: 03-21-2017
6 Posts
3 Kudos Received
1 Solution

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 4353 | 03-21-2017 06:00 PM |
03-25-2017
05:09 PM
I see the problem. Second cluster hardware (3 nodes): 64 logical cores, 256GB RAM, 2TB HDD per node. As per my maths, total cores = 64 * 3 = 192 and total RAM = 256 * 3 = 768GB. The NodeManager capacities, yarn.nodemanager.resource.memory-mb and yarn.nodemanager.resource.cpu-vcores, should probably be set to 252 * 1024 = 258048 (megabytes) and 60 respectively. We avoid allocating 100% of the resources to YARN containers because each node needs some resources to run the OS and Hadoop daemons; in this case, leave 4GB and 4 cores for these system processes. Check yarn-site.xml: these values may already be set, and if not, set them yourself. Now, a better option would be to use --num-executors 35 --executor-cores 5 --executor-memory 19G --driver-memory 32G. Why? With 5 cores per executor, each node fits 60 / 5 = 12 executors, so this config results in twelve executors on every node except the one hosting the AM, which will have eleven (3 * 12 - 1 = 35). --executor-memory was derived as 252GB / 12 executors per node = 21GB; 21 * 0.07 = 1.47GB is set aside for memory overhead, and 21 - 1.47 ≈ 19GB.
03-22-2017
08:28 AM
Can you post the error logs available at the tracking URL?
For example, for application id application_1459318710253_0026, we check the logs at http://cluster.manager.ip:8088/cluster/app/application_1459318710253_0026. If the application has finished, you can also fetch them from the command line with yarn logs -applicationId <application_id>.
03-22-2017
06:31 AM
2 Kudos
local[*] / new SparkConf().setMaster("local[2]")

- This is specific to running the job in local mode.
- It is typically used to test code on a small amount of data in a local environment.
- It does not provide the advantages of a distributed environment.
- The number in brackets is the number of CPU cores to allocate for the local operation; local[*] uses all available cores.
- It helps in debugging the code by applying breakpoints while running from Eclipse or IntelliJ.
yarn-client / --master yarn --deploy-mode client

- In YARN client mode, your driver program runs on the client machine where you type the command to submit the Spark application (which may not be a machine in the YARN cluster).
- Although the driver program runs on the client machine, the tasks are executed on the executors in the node managers of the YARN cluster.

yarn-cluster / --master yarn --deploy-mode cluster

- In YARN cluster mode, your driver program runs inside the ApplicationMaster on one of the cluster's nodes, not on the machine where you type the submit command.
- This is the most advisable pattern for executing/submitting your Spark jobs in production.
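To make the contrast concrete, here is a minimal Scala sketch. The app name is hypothetical, and for the YARN modes the master is normally supplied at submit time rather than hard-coded:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Local mode: driver and executors run in a single JVM on this machine.
// local[2] pins two cores; local[*] uses every available core.
val conf = new SparkConf()
  .setAppName("mode-demo")   // hypothetical app name
  .setMaster("local[2]")

val sc = new SparkContext(conf)

// For YARN, leave setMaster out of the code and pick the mode at submit time:
//   spark-submit --master yarn --deploy-mode client  ...   (driver on the submitting machine)
//   spark-submit --master yarn --deploy-mode cluster ...   (driver inside the YARN AM)
```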
03-22-2017
06:00 AM
This seems to be more of a job tuning problem. Kindly share the following parameters used while submitting the jobs on both clusters: --num-executors, --executor-cores, --executor-memory, --driver-memory.
03-21-2017
06:00 PM
1 Kudo
Analysis:
As per Oracle:
Oracle Database 8i and earlier versions did not support TIMESTAMP
data, but Oracle DATE data used to have a time component as an
extension to the SQL standard. So, Oracle Database 8i and earlier
versions of JDBC drivers mapped oracle.sql.DATE to java.sql.Timestamp
to preserve the time component. Starting with Oracle Database 9.0.1,
TIMESTAMP support was included and 9i JDBC drivers started mapping
oracle.sql.DATE to java.sql.Date. This mapping was incorrect as it
truncated the time component of Oracle DATE data. To overcome this
problem, Oracle Database 11.1 introduced a new flag
mapDateToTimestamp. The default value of this flag is true, which
means that by default the drivers will correctly map oracle.sql.DATE
to java.sql.Timestamp, retaining the time information. If you still
want the incorrect but 10g compatible oracle.sql.DATE to java.sql.Date
mapping, then you can get it by setting the value of
mapDateToTimestamp flag to false. (See Oracle's JDBC documentation for the original note.) Solution: So, as instructed by Oracle, set the property oracle.jdbc.mapDateToTimestamp to false:

```scala
// Load the Oracle JDBC driver and disable the legacy DATE -> java.sql.Date mapping.
Class.forName("oracle.jdbc.driver.OracleDriver")
val info = new java.util.Properties()
info.put("user", user)
info.put("password", password)
info.put("oracle.jdbc.mapDateToTimestamp", "false")
val jdbcDF = spark.read.jdbc(jdbcURL, tableFullName, info)
```
Add an Oracle database connector jar that supports the oracle.jdbc.mapDateToTimestamp flag, such as ojdbc14.jar, to the classpath. Hope it helps!
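The same property can also be passed through Spark's DataFrameReader option API, since options the JDBC source does not recognize are forwarded to the driver as connection properties. A minimal sketch, assuming the same jdbcURL, tableFullName, user, and password values as above:

```scala
val jdbcDF2 = spark.read
  .format("jdbc")
  .option("url", jdbcURL)
  .option("dbtable", tableFullName)
  .option("user", user)
  .option("password", password)
  .option("oracle.jdbc.mapDateToTimestamp", "false")  // keep the time component of DATE columns
  .load()
```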