Member since: 03-21-2017
6 Posts
3 Kudos Received
1 Solution

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 4353 | 03-21-2017 06:00 PM |
03-25-2017
05:09 PM
I see the problem. Second cluster hardware (3 nodes): 64 logical cores, 256GB RAM, 2TB HDD per node. As per my maths, total cores = 64 * 3 = 192 and total RAM = 256 * 3 = 768GB. The NodeManager capacities, yarn.nodemanager.resource.memory-mb and yarn.nodemanager.resource.cpu-vcores, should probably be set to 252 * 1024 = 258048 (megabytes) and 60 respectively. We avoid allocating 100% of the resources to YARN containers because each node needs some resources to run the OS and Hadoop daemons; in this case, leave 4GB and 4 cores for these system processes. Check yarn-site.xml: these values may already be set, and if not, set them yourself. Now, a better option would be to use --num-executors 35 --executor-cores 5 --executor-memory 19G --driver-memory 32G. Why? With 5 cores per executor, each node fits 60 / 5 = 12 executors, so this config results in twelve executors on every node except the one hosting the AM, which will have eleven (3 * 12 - 1 = 35). --executor-memory was derived as 252GB / 12 executors per node = 21GB; 21 * 0.07 = 1.47GB is set aside for memory overhead, and 21 - 1.47 ≈ 19GB.
03-22-2017
08:28 AM
Can you post the error logs available at the tracking URL?
For example, for application id application_1459318710253_0026, we check the logs at http://cluster.manager.ip:8088/cluster/app/application_1459318710253_0026. If the application has finished, you can also fetch them from the command line with yarn logs -applicationId <application_id>.
03-22-2017
06:31 AM
2 Kudos
local[*] / new SparkConf().setMaster("local[2]")

- This is specific to running the job in local mode.
- It is typically used to test code on a small amount of data in a local environment.
- It does not provide the advantages of a distributed environment.
- The number in brackets is the number of CPU cores to allocate for the local operation; local[*] uses all available cores.
- It helps in debugging the code by applying breakpoints while running from Eclipse or IntelliJ.
yarn-client / --master yarn --deploy-mode client

- In YARN client mode, your driver program runs on the client machine where you type the command to submit the Spark application (which may not be a machine in the YARN cluster).
- Although the driver program runs on the client machine, the tasks are executed on the executors in the node managers of the YARN cluster.

yarn-cluster / --master yarn --deploy-mode cluster

- In YARN cluster mode, your driver program runs inside the ApplicationMaster on one of the cluster's nodes, not on the machine where you type the submit command.
- This is the most advisable pattern for executing/submitting your Spark jobs in production.
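To make the contrast concrete, here is a minimal Scala sketch. The app name is hypothetical, and for the YARN modes the master is normally supplied at submit time rather than hard-coded:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Local mode: driver and executors run in a single JVM on this machine.
// local[2] pins two cores; local[*] uses every available core.
val conf = new SparkConf()
  .setAppName("mode-demo")   // hypothetical app name
  .setMaster("local[2]")

val sc = new SparkContext(conf)

// For YARN, leave setMaster out of the code and pick the mode at submit time:
//   spark-submit --master yarn --deploy-mode client  ...   (driver on the submitting machine)
//   spark-submit --master yarn --deploy-mode cluster ...   (driver inside the YARN AM)
```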
03-22-2017
06:00 AM
This seems to be more of a job tuning problem. Kindly share the following parameters used while submitting the jobs on both clusters: --num-executors, --executor-cores, --executor-memory, --driver-memory.
03-21-2017
06:00 PM
1 Kudo
Analysis:
As per Oracle:
Oracle Database 8i and earlier versions did not support TIMESTAMP
data, but Oracle DATE data used to have a time component as an
extension to the SQL standard. So, Oracle Database 8i and earlier
versions of JDBC drivers mapped oracle.sql.DATE to java.sql.Timestamp
to preserve the time component. Starting with Oracle Database 9.0.1,
TIMESTAMP support was included and 9i JDBC drivers started mapping
oracle.sql.DATE to java.sql.Date. This mapping was incorrect as it
truncated the time component of Oracle DATE data. To overcome this
problem, Oracle Database 11.1 introduced a new flag
mapDateToTimestamp. The default value of this flag is true, which
means that by default the drivers will correctly map oracle.sql.DATE
to java.sql.Timestamp, retaining the time information. If you still
want the incorrect but 10g compatible oracle.sql.DATE to java.sql.Date
mapping, then you can get it by setting the value of
mapDateToTimestamp flag to false. (See Oracle's JDBC documentation for the original note.) Solution: So, as instructed by Oracle, set the property oracle.jdbc.mapDateToTimestamp to false:

```scala
// Load the Oracle JDBC driver and disable the legacy DATE -> java.sql.Date mapping.
Class.forName("oracle.jdbc.driver.OracleDriver")
val info = new java.util.Properties()
info.put("user", user)
info.put("password", password)
info.put("oracle.jdbc.mapDateToTimestamp", "false")
val jdbcDF = spark.read.jdbc(jdbcURL, tableFullName, info)
```
Add an Oracle database connector jar that supports the oracle.jdbc.mapDateToTimestamp flag, such as ojdbc14.jar, to the classpath. Hope it helps!
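The same property can also be passed through Spark's DataFrameReader option API, since options the JDBC source does not recognize are forwarded to the driver as connection properties. A minimal sketch, assuming the same jdbcURL, tableFullName, user, and password values as above:

```scala
val jdbcDF2 = spark.read
  .format("jdbc")
  .option("url", jdbcURL)
  .option("dbtable", tableFullName)
  .option("user", user)
  .option("password", password)
  .option("oracle.jdbc.mapDateToTimestamp", "false")  // keep the time component of DATE columns
  .load()
```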