Member since: 01-23-2017 | Posts: 114 | Kudos Received: 19 | Solutions: 4
05-23-2018
02:29 PM
2 Kudos
This article discusses the process of manually updating the Oozie sharelib and the prerequisites for the Spark Oozie sharelib. First, copy the existing sharelib from HDFS to a local directory:
# mkdir oozie_share_lib
# hadoop fs -copyToLocal <current-share-lib-directory> oozie_share_lib/lib
Once the existing Oozie sharelib has been copied from HDFS to local as above, update it with:
/usr/hdp/current/oozie-client/bin/oozie-setup.sh sharelib create -fs /user/oozie/share/lib/ -locallib oozie_share_lib/
This creates a new sharelib, including the Spark Oozie sharelib:
the destination path for sharelib is: /user/oozie/share/lib/lib_20180502070613
Fixing oozie spark sharelib
Spark is locally installed at /usr/hdp/2.6.3.0-235/oozie/../spark
Renaming spark to spark_orig in /user/oozie/share/lib/lib_20180502070613
Creating new spark directory in /user/oozie/share/lib/lib_20180502070613
Copying Oozie spark sharelib jar to /user/oozie/share/lib/lib_20180502070613/spark
Copying oozie_share_lib/lib/spark/oozie-sharelib-spark-4.2.0.2.6.3.0-235.jar to /user/oozie/share/lib/lib_20180502070613/spark
Copying local spark libraries to /user/oozie/share/lib/lib_20180502070613/spark
Copying local spark python libraries to /user/oozie/share/lib/lib_20180502070613/spark
Copying local spark hive site to /user/oozie/share/lib/lib_20180502070613/spark
But from the corresponding HDFS folder we can see that the Spark libraries were not added to the Spark Oozie sharelib:
$ hadoop fs -ls /user/oozie/share/lib/lib_20180502070613/spark
Found 1 items
-rwxrwxrwx 3 oozie hadoop 191121639 2018-05-02 07:18 /user/oozie/share/lib/lib_20180502070613/spark/spark-assembly-1.6.3.2.6.3.0-235-hadoop2.7.3.2.6.3.0-235.jar
This means the Oozie sharelib update is not working as expected for Spark, even though it reports "Spark is locally installed at /usr/hdp/2.6.3.0-235/oozie/../spark". In this case the Spark client was not installed on the node from which the sharelib update command was run (see no-spark-client-installed.png).
From a node where the Spark client is installed, the Oozie sharelib update does properly update the Spark Oozie sharelib:
the destination path for sharelib is: /user/oozie/share/lib/lib_20180502064112
Fixing oozie spark sharelib
Spark is locally installed at /usr/hdp/2.6.3.0-235/oozie/../spark
Renaming spark to spark_orig in /user/oozie/share/lib/lib_20180502064112
Creating new spark directory in /user/oozie/share/lib/lib_20180502064112
Copying Oozie spark sharelib jar to /user/oozie/share/lib/lib_20180502064112/spark
Copying oozie-new-sharelib/lib/spark/oozie-sharelib-spark-4.2.0.2.6.3.0-235.jar to /user/oozie/share/lib/lib_20180502064112/spark
Copying local spark libraries to /user/oozie/share/lib/lib_20180502064112/spark
Ignoring file /usr/hdp/2.6.3.0-235/oozie/../spark/lib/spark-examples-1.6.3.2.6.3.0-235-hadoop2.7.3.2.6.3.0-235.jar
Copying /usr/hdp/2.6.3.0-235/oozie/../spark/lib/datanucleus-core-3.2.10.jar to /user/oozie/share/lib/lib_20180502064112/spark
Copying /usr/hdp/2.6.3.0-235/oozie/../spark/lib/spark-assembly-1.6.3.2.6.3.0-235-hadoop2.7.3.2.6.3.0-235.jar to /user/oozie/share/lib/lib_20180502064112/spark
Ignoring file /usr/hdp/2.6.3.0-235/oozie/../spark/lib/spark-hdp-assembly.jar
Copying /usr/hdp/2.6.3.0-235/oozie/../spark/lib/datanucleus-rdbms-3.2.9.jar to /user/oozie/share/lib/lib_20180502064112/spark
Copying /usr/hdp/2.6.3.0-235/oozie/../spark/lib/datanucleus-api-jdo-3.2.6.jar to /user/oozie/share/lib/lib_20180502064112/spark
Copying local spark python libraries to /user/oozie/share/lib/lib_20180502064112/spark
Copying /usr/hdp/2.6.3.0-235/oozie/../spark/python/lib/pyspark.zip to /user/oozie/share/lib/lib_20180502064112/spark
Copying /usr/hdp/2.6.3.0-235/oozie/../spark/python/lib/py4j-0.9-src.zip to /user/oozie/share/lib/lib_20180502064112/spark
Ignoring file /usr/hdp/2.6.3.0-235/oozie/../spark/python/lib/PY4J_LICENSE.txt
Copying local spark hive site to /user/oozie/share/lib/lib_20180502064112/spark
Copying /etc/spark/conf/hive-site.xml to /user/oozie/share/lib/lib_20180502064112/spark
Here we can see that Oozie picks up the files from /usr/hdp/2.6.3.0-235/spark/conf/ and copies them to HDFS under /user/oozie/share/lib/lib_20180502064112/spark, because the Spark client is installed on this node (see spark-client-installed.png).
$ hadoop fs -ls /user/oozie/share/lib/lib_20180502064112/spark
Found 8 items
-rw-r--r-- 3 oozie hdfs 339666 2018-05-02 06:41 /user/oozie/share/lib/lib_20180502064112/spark/datanucleus-api-jdo-3.2.6.jar
-rw-r--r-- 3 oozie hdfs 1890075 2018-05-02 06:41 /user/oozie/share/lib/lib_20180502064112/spark/datanucleus-core-3.2.10.jar
-rw-r--r-- 3 oozie hdfs 1809447 2018-05-02 06:41 /user/oozie/share/lib/lib_20180502064112/spark/datanucleus-rdbms-3.2.9.jar
-rw-r--r-- 3 oozie hdfs 1918 2018-05-02 06:41 /user/oozie/share/lib/lib_20180502064112/spark/hive-site.xml
-rw-r--r-- 3 oozie hdfs 23278 2018-05-02 06:41 /user/oozie/share/lib/lib_20180502064112/spark/oozie-sharelib-spark-4.2.0.2.6.3.0-235.jar
-rw-r--r-- 3 oozie hdfs 44846 2018-05-02 06:41 /user/oozie/share/lib/lib_20180502064112/spark/py4j-0.9-src.zip
-rw-r--r-- 3 oozie hdfs 358253 2018-05-02 06:41 /user/oozie/share/lib/lib_20180502064112/spark/pyspark.zip
-rw-r--r-- 3 oozie hdfs 191121639 2018-05-02 06:41 /user/oozie/share/lib/lib_20180502064112/spark/spark-assembly-1.6.3.2.6.3.0-235-hadoop2.7.3.2.6.3.0-235.jar
In summary, to get a properly updated Spark Oozie sharelib, the Spark client must be installed on the node/server from which the manual Oozie sharelib update is run.
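As a follow-up, after the new lib_<timestamp> directory is created, the running Oozie server usually also needs to be told to pick it up. A minimal sketch, assuming the Oozie server URL is http://<oozie-host>:11000/oozie (adjust for your cluster):
# Verify the Spark client is present on the node before running the sharelib update
ls -l /usr/hdp/current/spark-client/
# Ask the Oozie server to refresh its sharelib pointer to the newest lib_<timestamp> directory
oozie admin -oozie http://<oozie-host>:11000/oozie -sharelibupdate
# Confirm the Spark sharelib contents the server now sees
oozie admin -oozie http://<oozie-host>:11000/oozie -shareliblist spark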
Tags: Governance & Lifecycle, Issue Resolution, Oozie, oozie-sharelib, oozie-spark
05-23-2018
05:54 AM
Can you please provide the job.properties and workflow.xml files, and also describe how spark-submit is called, i.e. are we using an Oozie Spark action or calling spark-submit from a shell action?
05-23-2018
05:38 AM
@Fasil Ahamed Do all the Oozie client nodes and NodeManagers have the spark2-client installed? Thanks Venkat
05-03-2018
05:14 AM
@Satya P
The error is: StandaloneSchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up. And the master is set to --master spark://111.33.22.111:50070. Is there any specific reason to use the NameNode port 50070 instead of the Spark-related ports? Thanks Venkat
04-20-2018
11:28 AM
1 Kudo
This has been identified as a bug in Spark 2.2, which is fixed in Spark 2.3.
04-20-2018
11:13 AM
@Ravikiran Dasari Make sure you have the Sqoop client installed on all the nodes. Can you please check the map task logs? The logs given here don't have much information. Thanks Venkat
04-18-2018
01:20 PM
@heta desai You can adjust the parameters based on your environment, and the referenced page gives details about LDAP error codes. Thanks Venkat
04-18-2018
11:39 AM
@heta desai This is what we use for add.ldif:
dn: CN=<username>,OU=prod1,OU=Hadoop,OU=Users,OU=UK,DC=global,DC=org
changetype: add
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: user
distinguishedName: CN=<username>,OU=prod1,OU=Hadoop,OU=Users,OU=UK,DC=global,DC=org
cn: <username>
userAccountControl: 514
unicodePwd::IgBTAHQAYQBnAGkAbgBnAEAAMgAwADEANwAiAA==
accountExpires: 0
userPrincipalName: <username>@GLOBAL.ORG
This works for us. Please check your DNs, OUs, and the corresponding objectClass values to be specified; these are entirely environment dependent. Thanks Venkat
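As a minimal sketch, such an LDIF can be applied with ldapadd; the LDAP host and admin bind DN below are placeholders for illustration:
# Bind as an account allowed to create users and apply the LDIF (prompts for the bind password)
ldapadd -H ldaps://<ad-host>:636 -D "CN=<admin>,OU=Users,DC=global,DC=org" -W -f add.ldif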
04-18-2018
10:38 AM
@heta desai Can you please check whether an ldapsearch with the same user you are trying to connect as, and against the same OU, is working? Thanks Venkat
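For example, a quick check along these lines (host, port, and DNs are placeholders, assuming a simple bind):
# Bind as the same user and search the same OU; a clean result rules out credential/OU issues
ldapsearch -H ldap://<ldap-host>:389 -D "CN=<username>,OU=prod1,OU=Hadoop,OU=Users,OU=UK,DC=global,DC=org" -W \
  -b "OU=prod1,OU=Hadoop,OU=Users,OU=UK,DC=global,DC=org" "(cn=<username>)"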
04-17-2018
02:55 PM
@Kiran Nittala --files and --conf spark.yarn.dist.files both work. Is there any specific reason we have to pass these parameters even though the files hive-site.xml and hbase-site.xml are already present in /etc/spark2/conf? Thanks Venkat
04-17-2018
02:39 PM
@Vinod K C I haven't come across any document, but in an HDP installation you can find it in /etc/spark2/conf/spark-env.sh:
# Options read in YARN client mode
#SPARK_EXECUTOR_INSTANCES="2" #Number of workers to start (Default: 2)
#SPARK_EXECUTOR_CORES="1" #Number of cores for the workers (Default: 1).
#SPARK_EXECUTOR_MEMORY="1G" #Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
#SPARK_DRIVER_MEMORY="512M" #Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb)
#SPARK_YARN_APP_NAME="spark" #The name of your application (Default: Spark)
#SPARK_YARN_QUEUE="default" #The hadoop queue to use for allocation requests (Default: default)
#SPARK_YARN_DIST_FILES="" #Comma separated list of files to be distributed with the job.
#SPARK_YARN_DIST_ARCHIVES="" #Comma separated list of archives to be distributed with the job.
But this covers only YARN client mode, and the job is not picking up the files available in /etc/spark2/conf either. Thanks Venkat
04-13-2018
01:59 PM
@heta desai If you have access to the nodes, you can check with ls -l /usr/hdp/current/pig-client/ and also by getting into the Pig shell, if you have only 2-3 nodes. Thanks Venkat
04-13-2018
11:37 AM
@Rohit Khose As I mentioned, --files does work. But when the file is given as part of SPARK_YARN_DIST_FILES, and the file is also available at /etc/spark2/conf/hive-site.xml, Spark should be able to pick it up. Is there any specific reason why it is not getting picked up? Thanks Venkat
04-13-2018
11:01 AM
We are on HDP 2.6.3 with Spark 2.2, running the job in YARN cluster mode via spark-submit. spark-env.sh contains SPARK_YARN_DIST_FILES="/etc/spark2/conf/hive-site.xml,/etc/spark2/conf/hbase-site.xml", but these values are not honored.
spark-submit --class com.virtuslab.sparksql.MainClass --master yarn --deploy-mode cluster /tmp/spark-hive-test/spark_sql_under_the_hood-spark2.2.0.jar
The job tries to connect to Hive and fetch data from a table, but it fails with a table-not-found error:
diagnostics: User class threw exception: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'xyz' not found in database 'qwerty';
ApplicationMaster host: 121.121.121.121
ApplicationMaster RPC port: 0
queue: default
start time: 1523616607943
final status: FAILED
tracking URL: https://managenode002xxserver:8090/proxy/application_1523374609937_10224/
user: abc123
Exception in thread "main" org.apache.spark.SparkException: Application application_1523374609937_10224 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1187)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1233)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:782)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
The same job works when we pass the --files parameter:
spark-submit --class com.virtuslab.sparksql.MainClass --master yarn --deploy-mode cluster --files /etc/spark2/conf/hive-site.xml /tmp/spark-hive-test/spark_sql_under_the_hood-spark2.2.0.jar
Result attached. Any pointers on why SPARK_YARN_DIST_FILES is not being picked up? Thanks Venkat
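For comparison, the equivalent of SPARK_YARN_DIST_FILES can also be passed directly on the command line through the spark.yarn.dist.files property; a sketch against the same class and jar as above (whether it behaves differently from the spark-env.sh setting is exactly what is in question here):
spark-submit --class com.virtuslab.sparksql.MainClass --master yarn --deploy-mode cluster \
  --conf spark.yarn.dist.files=/etc/spark2/conf/hive-site.xml,/etc/spark2/conf/hbase-site.xml \
  /tmp/spark-hive-test/spark_sql_under_the_hood-spark2.2.0.jar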
Labels: Apache Spark, Apache YARN
04-13-2018
10:45 AM
@heta desai Do you have the Pig client installed on all the nodes, and does the Pig Oozie sharelib exist? Thanks Venkat
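For instance, the Pig sharelib can be checked in HDFS along these lines (the lib_<timestamp> directory name differs per cluster):
# List the Pig jars under the current Oozie sharelib directory in HDFS
hadoop fs -ls /user/oozie/share/lib/lib_*/pig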
04-10-2018
07:09 AM
@ssathish Isn't that only for the currently running jobs? Are we able to see the containers and their details for completed jobs as well? Here is a running job that shows Total Allocated Containers: running-containers.png. Here is a completed job that shows Total Allocated Containers: finished-job.png. But none of these Total Allocated Containers values make it into the RM REST API. The XMLs below show only the currently allocated containers. Running Job XML: running.xml. Finished Job XML: finished-job.xml. And the NodeManager REST API, curl http://<Nodemanager address>:<port>/ws/v1/node/containers/<containerID>, gives container details only for running containers, not for completed ones. Is there a way to get what the YARN application UI shows (https://manag003:8090/cluster/appattempt/appattempt_1522212350151_40488_000001) under Total Allocated Containers through a REST API? Thanks Venkat
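One avenue that may help, though I have not verified it on this cluster (treat the endpoint and port as assumptions), is the YARN Application History / Timeline Server REST API, which can list containers per application attempt:
# List the containers of a finished application attempt via the Application History REST API
curl http://<timeline-server-host>:8188/ws/v1/applicationhistory/apps/application_1522212350151_40488/appattempts/appattempt_1522212350151_40488_000001/containers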
04-09-2018
04:39 PM
I'm following the YARN REST API, which shows: allocatedMB (int), the sum of memory in MB allocated to the application's running containers; and allocatedVCores (int), the sum of virtual cores allocated to the application's running containers. But these are aggregated metrics. I'm looking for the total number of containers and, for each container, how much memory and how many vcores are allocated. Is there a way this can be achieved? Thanks Venkat
Labels: Apache YARN
04-04-2018
08:27 AM
@Saumil Mayani Thanks a lot for the details. That makes it much clearer.
04-03-2018
03:53 PM
1 Kudo
None of these packages are required on the DataNodes, but they are required on the hosts running NodeManagers, because the containers launched on the NodeManagers need the client services to be available. Make sure the Sqoop client is added to all NodeManager and Oozie client hosts; that is the basic requirement for a Sqoop job. Thanks Venkat
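As a quick sanity check, a sketch to run on each NodeManager and Oozie client host:
# Confirm the Sqoop client package is laid down and usable
ls -l /usr/hdp/current/sqoop-client/
sqoop version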
04-03-2018
05:42 AM
We have a cluster with the CPU configuration below:
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 56
On-line CPU(s) list: 0-55
Thread(s) per core: 2
Core(s) per socket: 14
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Stepping: 1
CPU MHz: 2400.000
BogoMIPS: 4794.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 35840K
NUMA node0 CPU(s): 0-13,28-41
NUMA node1 CPU(s): 14-27,42-55
We have 2 physical sockets with 14 cores each and hyper-threading of 2: 2 (sockets) * 14 (cores per socket) * 2 (hyper-threading) = 56. But the YARN config from Ambari shows 112 cores for the property yarn.nodemanager.resource.cpu-vcores (56 * 2). This is done by the Ambari stack advisor code. The question here is: are we multiplying by 2 by default on the assumption that hyper-threading is not enabled, or are we assuming the CPU is capable of running multiple containers per core and leaving it to admins to tune the environment for CPU-bound or I/O-bound workloads? Thanks Venkat
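The arithmetic from the lscpu output above, as a quick shell check (the doubling to 112 is what the Ambari stack advisor applies on top of the logical CPU count):
# 2 sockets * 14 cores per socket * 2 threads per core = 56 logical CPUs (matches CPU(s): 56 above)
echo $((2 * 14 * 2))
# The stack advisor then doubles this for yarn.nodemanager.resource.cpu-vcores
echo $((56 * 2))   # 112, the value shown in the YARN configs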
Labels: Apache Hadoop, Apache YARN
03-26-2018
04:53 AM
1 Kudo
@toide Ambari 2.6.1.3 is no longer a valid version, per the communication sent out by Hortonworks; the reported bugs were fixed in 2.6.1.5 to avoid any potential issues. Thanks Venkat
03-25-2018
07:29 AM
@sajid mohammed
this issue is not related to ZEPPELIN-1263, as it concerns user impersonation with the Zeppelin Spark interpreter.
You can find more details under https://issues.apache.org/jira/browse/ZEPPELIN-3016 and the corresponding community article. Please note Zeppelin gives the error:
ERROR [2017-10-20 12:28:46,619] ({pool-2-thread-5} RemoteScheduler.java[getStatus]:256) - Can't get status information org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused)
even for the following scenarios:
1) Log directory not having permissions
2) User doesn't have folder/file-level permissions
3) Jar files missing from the path
4) ENV variables missing
These are some of the scenarios in which I have seen this error with the Zeppelin Spark interpreter. Thanks Venkat
01-24-2018
05:50 AM
Environment:
We are using EMR, with Spark 2.1 and EMR FS.
Process we are doing:
We are running a PySpark job that joins 2 Hive tables and creates another Hive table from the result using saveAsTable, storing it as ORC with partitions. Issue:
18/01/23 10:21:28 INFO OutputCommitCoordinator: Task was denied committing, stage: 84, partition: 901, attempt: 10364
18/01/23 10:21:28 INFO TaskSetManager: Starting task 901.10365 in stage 84.0 (TID 212686, ip-172-31-46-97.ec2.internal, executor 10, partition 901, PROCESS_LOCAL, 6235 bytes)
18/01/23 10:21:28 WARN TaskSetManager: Lost task 884.10406 in stage 84.0 (TID 212677, ip-172-31-46-97.ec2.internal, executor 85): TaskCommitDenied (Driver denied task commit) for job: 84, partition: 884, attemptNumber: 10406
This log message repeats throughout the Spark logs, and by the time we killed the job we had seen it about ~170,000 (160,595) times, as shown in spark-task-commit-denied.jpg.
The Spark source code shows this:
/**
 * :: DeveloperApi ::
 * Task requested the driver to commit, but was denied.
 */
@DeveloperApi
case class TaskCommitDenied(
    jobID: Int,
    partitionID: Int,
    attemptNumber: Int) extends TaskFailedReason {
  override def toErrorString: String = s"TaskCommitDenied (Driver denied task commit)" +
    s" for job: $jobID, partition: $partitionID, attemptNumber: $attemptNumber"
  /**
   * If a task failed because its attempt to commit was denied, do not count this failure
   * towards failing the stage. This is intended to prevent spurious stage failures in cases
   * where many speculative tasks are launched and denied to commit.
   */
  override def countTowardsTaskFailures: Boolean = false
}
Please note we have not enabled spark.speculation (it is false), and we do not see this property at all in the Spark job's Environment tab. But while the job is running, we can see that the corresponding files are created in EMRFS under the table's temporary directories, for example: hdfs://ip-172-31-18-155.ec2.internal:8020/hive/location/hive.db/hivetable/_temporary/0/task_1513431588574_1185_3_01_000000/00000_0.orc. We can see about 2001 such folders (as we have set spark.sql.shuffle.partitions = 2001). Question(s): 1) What could cause the job to launch ~170,000 tasks even though we have not enabled spark.speculation? 2) When it has completed writing the data to HDFS (EMRFS), why is each executor trying to launch new tasks? 3) Is there a way we can avoid this? Thanks a lot for looking into this; any inputs will help us a lot. Venkat
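To rule out speculation explicitly rather than relying on the default, the settings can be pinned on the submit command; a sketch, with the PySpark script name as a placeholder:
# Explicitly disable speculative execution and keep the partition count used above
spark-submit --master yarn --deploy-mode cluster \
  --conf spark.speculation=false \
  --conf spark.sql.shuffle.partitions=2001 \
  your_join_job.py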
Labels: Apache Hive, Apache Spark
12-22-2017
12:45 AM
@Karan Alang Can you please test the same command with one broker at a time, i.e. instead of giving all the brokers in --broker-list? It looks like only host1:9093 is having the issue, based on this error message:
[2017-12-21 19:48:49,846] WARN Fetching topic metadata with correlation id 11 for topics [Set(mmtest4)] from broker [BrokerEndPoint(0,<host1>,9093)] failed (kafka.client.ClientUtils$)
java.io.EOFException
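For example, a single-broker test might look like the following sketch (the kafka-broker path and the SSL client config file are placeholders; the producer.config part applies only if port 9093 is SSL-secured):
# Try producing to the suspect broker alone to isolate it
/usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list <host1>:9093 \
  --topic mmtest4 --producer.config <client-ssl.properties>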
12-11-2017
05:02 AM
@Manfred PAUL Yes, I was looking only at the current session. Can you please check whether all the keytabs were generated properly for all the services?
12-11-2017
04:58 AM
@Abhijit Nayak As noted by @Jay Kumar SenSharma, the JIRA (https://issues.apache.org/jira/browse/AMBARI-19666) was a bug in Ambari 2.4.0, but your Ambari version is 2.5.0.3, where it is fixed according to the JIRA. Please also check the suggestion from @Jay Kumar SenSharma: it might be a browser setting interrupting the file download partway through, so please try a different browser to see if the behaviour persists.
12-07-2017
06:32 AM
1 Kudo
@Joffrey C 17/12/06 16:29:20 WARN metastore: set_ugi() not successful, Likely cause: new client talking to old server. Continuing without it.
This mostly happens when Spark is using the wrong hive-site.xml file. If you notice, /etc/spark/conf has a separate hive-site.xml that is not the same as /etc/hive/conf/hive-site.xml; if, during the upgrade, /etc/spark/conf/hive-site.xml was replaced with /etc/hive/conf/hive-site.xml, these kinds of issues occur.
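A quick way to spot this (a sketch) is to compare the two copies; in a healthy install they are expected to differ:
# If this prints no differences, Spark's hive-site.xml was likely overwritten with Hive's copy
diff /etc/spark/conf/hive-site.xml /etc/hive/conf/hive-site.xml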
12-07-2017
05:50 AM
@Abhijit Nayak I have seen this issue: if there are any problems with Excel formatting, Excel tends to show different row counts. Instead of checking the row counts, can you please validate the data and see whether you have any data mismatch issues?
12-07-2017
05:44 AM
@Manfred PAUL In relation to this: "When I connect as the hive user, here is what I get (without a principal / keytab to test): sudo su - hive; klist; klist: No credentials cache found (ticket cache FILE:/tmp/krb5cc_xxxx)". Are you missing a kinit? Can you please explain how you are generating the ticket for the hive user? Did you configure it in .bashrc?
12-07-2017
05:39 AM
@Abhijit Nayak Not sure about that; I didn't get a chance to test it. I will test and let you know if I can reproduce the issue or find a workaround.