Member since: 06-22-2016
Posts: 36
Kudos Received: 7
Solutions: 0
10-12-2016
06:44 AM
Hi, I am using HDP 2.4.0. I have created a Hive table called table1 from a Spark application; the data is stored in Parquet format and contains complex JSON. Incremental data is loaded into this table from MongoDB on an hourly basis, and the table is an external table. I then created table2 with the same schema as table1 and tried to INSERT into it, but the insert throws the exception below.

Exception: Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:172)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:518)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:163)
... 8 more
Caused by: java.lang.RuntimeException: Parquet record is malformed: empty fields are illegal, the field should be ommited completely instead
at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:64)
at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:59)
at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:31)
at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:121)
at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:123)
at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:42)
at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:111)
at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:124)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:753)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:162)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
... 9 more
Caused by: parquet.io.ParquetEncodingException: empty fields are illegal, the field should be ommited completely instead
at parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.endField(MessageColumnIO.java:244)
at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeArray(DataWritableWriter.java:186)
at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeValue(DataWritableWriter.java:113)
at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeGroupFields(DataWritableWriter.java:89)
at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeGroup(DataWritableWriter.java:146)
at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeValue(DataWritableWriter.java:119)
at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeGroupFields(DataWritableWriter.java:89)
at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:60)
... 23 more
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
Note: in the JSON, some of the columns may be empty. Please help me understand how to handle this kind of situation. Thanks in advance.
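A hedged sketch of one possible workaround, assuming the failing rows contain empty arrays or maps (which Hive's Parquet writer rejects): convert empty collections to NULL in Spark before the data reaches the Hive INSERT, so the field is omitted rather than written empty. The column name "tags" and the paths below are hypothetical placeholders.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;
import static org.apache.spark.sql.functions.expr;

public class EmptyFieldWorkaround {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("EmptyFieldWorkaround");
        JavaSparkContext sc = new JavaSparkContext(conf);
        HiveContext hiveContext = new HiveContext(sc.sc());

        // Read the Parquet data produced by the Spark application (example path).
        DataFrame df = hiveContext.read().parquet("/data/daily/2016-08-11_15_31_34.995");

        // Replace empty arrays with NULL so Hive's Parquet writer never sees an empty field.
        // "tags" is a hypothetical array column; repeat for every array/map column in the schema.
        DataFrame cleaned = df.withColumn("tags",
                expr("CASE WHEN size(tags) = 0 THEN NULL ELSE tags END"));

        // Write the cleaned data; table2 can then be loaded from this location.
        cleaned.write().mode("overwrite").parquet("/data/daily_cleaned/2016-08-11");
        sc.stop();
    }
}

The same idea can be applied directly in the Hive statement by selecting CASE WHEN size(col) = 0 THEN NULL ELSE col END for each collection column instead of SELECT *.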
Labels:
- Apache Hive
- Apache Spark
09-13-2016
10:47 AM
@Laurence Da Luz Thanks for the response. What happens if the process fails while fetching or storing the data? Is it feasible for production use? And how can this NiFi flow be executed on an hourly basis?
09-13-2016
07:18 AM
@Joe Widen Thanks for the response. I did try to increase it, but there was no change in performance. Please find below the parameters I have used. I have two datasets: one is ~200 MB per day, and the master dataset is ~20 GB. Please note that the master dataset grows daily.

conf.set("spark.shuffle.blockTransferService", "nio");
conf.set("spark.files.overwrite","true");
conf.set("spark.kryoserializer.buffer", "70");
conf.set("spark.driver.extraJavaOptions", "-XX:+UseG1GC");
conf.set("spark.executor.extraJavaOptions", "-XX:+UseG1GC");
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
conf.set("spark.broadcast.compress", "true");
conf.set("spark.shuffle.compress", "true");
conf.set("spark.shuffle.spill.compress", "true");
conf.set("spark.io.compression.codec","org.apache.spark.io.LZ4CompressionCodec");
conf.set("spark.sql.inMemoryColumnarStorage.compressed", "true");
conf.set("spark.sql.autoBroadcastJoinThreshold","100485760");
Kindly suggest what I should do for the above scenario.
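One hedged observation: spark.sql.autoBroadcastJoinThreshold is set here to 100485760 bytes (about 96 MB), which is smaller than the ~200 MB daily dataset, so Spark will not broadcast that side automatically. A minimal Java sketch of forcing the broadcast join explicitly; the table names "daily" and "master" and the join column "id" are hypothetical.

import static org.apache.spark.sql.functions.broadcast;

import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public class BroadcastJoinExample {
    // Join the small daily DataFrame (~200 MB) against the large master DataFrame (~20 GB).
    public static DataFrame joinDailyWithMaster(HiveContext hiveContext) {
        // Hypothetical table names; replace with the real daily and master tables.
        DataFrame daily = hiveContext.table("daily");
        DataFrame master = hiveContext.table("master");

        // broadcast() hints Spark SQL to ship the small side to every executor,
        // so the 20 GB master dataset is not shuffled for the join.
        return master.join(broadcast(daily), "id");
    }
}

Raising spark.sql.autoBroadcastJoinThreshold above the small dataset's size (given enough driver and executor memory) has a similar effect for plain SQL joins.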
09-13-2016
07:04 AM
@Rajkumar Singh I did try this example, but my MongoDB JSON data is very complex.
09-13-2016
05:55 AM
2 Kudos
Hi All, Can someone please tell me how to import data from MongoDB into HBase, either using Spark or without Spark? If neither works, is there any other way? Regards, Vijay
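A minimal sketch of one possible approach using Spark with the mongo-hadoop connector and HBase's TableOutputFormat, assuming both sets of jars are on the classpath; the MongoDB URI, HBase table name "orders", and column family "cf" are placeholders, not values from the post.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.bson.BSONObject;
import com.mongodb.hadoop.MongoInputFormat;
import scala.Tuple2;

public class MongoToHBase {
    public static void main(String[] args) throws Exception {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("MongoToHBase"));

        // Read from MongoDB via the mongo-hadoop connector (URI is an example).
        Configuration mongoConf = new Configuration();
        mongoConf.set("mongo.input.uri", "mongodb://mongoHost:27017/mydb.orders");
        JavaPairRDD<Object, BSONObject> docs =
                sc.newAPIHadoopRDD(mongoConf, MongoInputFormat.class, Object.class, BSONObject.class);

        // Configure the HBase output table ("orders" with column family "cf" is hypothetical).
        Configuration hbaseConf = HBaseConfiguration.create();
        Job job = Job.getInstance(hbaseConf);
        job.setOutputFormatClass(TableOutputFormat.class);
        job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, "orders");

        // Convert each BSON document into an HBase Put keyed by the Mongo _id.
        JavaPairRDD<ImmutableBytesWritable, Put> puts = docs.mapToPair(doc -> {
            byte[] rowKey = Bytes.toBytes(doc._2().get("_id").toString());
            Put put = new Put(rowKey);
            // Store the whole document as JSON in one cell; real code would map individual fields.
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("json"), Bytes.toBytes(doc._2().toString()));
            return new Tuple2<>(new ImmutableBytesWritable(rowKey), put);
        });

        puts.saveAsNewAPIHadoopDataset(job.getConfiguration());
        sc.stop();
    }
}

Without Spark, one alternative is a staged approach: export from MongoDB to HDFS first and then bulk-load into HBase (for example with HBase's ImportTsv), but the Spark route keeps it in a single job.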
Labels:
- Apache HBase
- Apache Spark
09-08-2016
02:46 PM
@ymoiseev Thanks for the response. In fact, I am using a HiveContext, joining the two DataFrames, and applying analytic functions such as rank and row_number over a partition. Could you please show me how to use a broadcast variable with these DataFrames in Java code? That would be very helpful.
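A minimal Java sketch of one common pattern, assuming the smaller DataFrame fits in driver memory: collect it into a Map, broadcast the Map, and read it inside UDFs or closures. The table name "lookup_table" and its two string columns are hypothetical.

import java.util.HashMap;
import java.util.Map;

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.hive.HiveContext;

public class DataFrameBroadcastExample {
    public static Broadcast<Map<String, String>> broadcastLookup(JavaSparkContext sc, HiveContext hiveContext) {
        // Hypothetical small lookup table with columns "id" and "name".
        DataFrame small = hiveContext.table("lookup_table");

        // A DataFrame itself cannot be broadcast directly; collect it to the driver first.
        Map<String, String> lookup = new HashMap<>();
        for (Row row : small.collect()) {
            lookup.put(row.getString(0), row.getString(1));
        }

        // The broadcast variable is shipped once per executor and can be read inside UDFs and closures.
        return sc.broadcast(lookup);
    }
}

If the goal is only to have the join broadcast the small side, wrapping it with org.apache.spark.sql.functions.broadcast() inside the join is usually simpler and avoids the manual collect.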
09-06-2016
03:57 PM
1 Kudo
Hi All, Can someone please clarify how to create a broadcast variable for a DataFrame? An example would be great. Thanks in advance. Regards, Vijay
Labels:
- Apache Spark
08-20-2016
04:42 AM
Thanks @Kuldeep Kulkarni. It was a permission issue.
08-19-2016
04:44 PM
Hi all, I have created a shell action in an Oozie workflow. It was executing perfectly until recently, but it has now started failing with the error below.

Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.ShellMain], main() threw exception, Cannot run program "checkForFileExistance.sh" (in directory "/data1/hadoop/yarn/local/usercache/hadoop/appcache/application_1471524954637_0316/container_e27_1471524954637_0316_01_000002"): error=2, No such file or directory
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], main() threw exception, Cannot run program "checkForFileExistance.sh" (in directory "/data1/hadoop/yarn/local/usercache/hadoop/appcache/application_1471524954637_0316/container_e27_1471524954637_0316_01_000002"): error=2, No such file or directory
java.io.IOException: Cannot run program "checkForFileExistance.sh" (in directory "/data1/hadoop/yarn/local/usercache/hadoop/appcache/application_1471524954637_0316/container_e27_1471524954637_0316_01_000002"): error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at org.apache.oozie.action.hadoop.ShellMain.execute(ShellMain.java:95)
at org.apache.oozie.action.hadoop.ShellMain.run(ShellMain.java:57)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:47)
at org.apache.oozie.action.hadoop.ShellMain.main(ShellMain.java:49)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:241)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:248)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 17 more
Please help me with this error. Please note that the shell script is available in the lib folder under the workflow directory.
Labels:
- Apache Oozie
08-16-2016
03:55 PM
@Timothy Spann I have followed the same steps. I am now able to read the Parquet file, but how does that solve the problem?
08-16-2016
02:15 PM
@Timothy Spann As per your suggestion, I downloaded parquet-tools from GitHub and tried to package it, but it throws an error:

Failed to execute goal on project parquet-tools: Could not resolve dependencies for project com.twitter:parquet-tools:jar:1.6.0rc3-SNAPSHOT: Failure to find com.twitter:parquet-hadoop:jar:1.6.0rc3-SNAPSHOT in https://oss.sonatype.org/content/repositories/snapshots was cached in the local repository, resolution will not be reattempted until the update interval of sonatype-nexus-snapshots has elapsed or updates are forced

Please help!
08-12-2016
08:42 PM
@gopal If I count them, the total number of Ids is 4030. Whether I use distinct or not, the result should be the same, as the Id column does not have any duplicate records.
08-12-2016
07:25 PM
Hadoop version: 2.7.1.2.4.0.0-169
08-12-2016
07:23 PM
Spark version: 1.6.0, Hive version: 1.2.1
08-12-2016
07:19 PM
2 Kudos
Hi, I have developed a simple Java Spark application that fetches data from MongoDB into HDFS on an hourly basis. The data is stored in Parquet format. Once the data is in HDFS, the actual testing begins. I am taking a simple row count, but it differs between two scenarios. How is it possible to get different counts?

Code:

import org.apache.spark.sql.hive.HiveContext
val hivecontext = new HiveContext(sc)
val parquetFile = hivecontext.parquetFile("/data/daily/2016-08-11_15_31_34.995/*")
parquetFile.count

Result: 4030

Extending the above code with the registerTempTable method, the count differs.

Code:

import org.apache.spark.sql.hive.HiveContext
val hivecontext = new HiveContext(sc)
val parquetFile = hivecontext.parquetFile("/data/daily/2016-08-11_15_31_34.995/*")
parquetFile.registerTempTable("ParquetTable")
val ParquetResult = hivecontext.sql("select count(distinct Id) from ParquetTable")
ParquetResult.show

Result: 4026

This shows a difference between the direct count and the count through the registered temp table. I am confused about why the counts do not match. Can someone explain the reason behind the difference?

Note: it is a simple Java Spark application that extracts the data from MongoDB into HDFS. There is no intermediate transformation in the code.

Regards, Vijay Kumar J
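A hedged note on the mismatch: count() counts every row, while count(distinct Id) ignores NULL Ids and collapses repeated values, so a gap of 4 normally points to rows whose Id is NULL or duplicated. A small Java sketch (reusing the path from the post) to narrow down which case applies:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public class CountDiagnostics {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("CountDiagnostics"));
        HiveContext hiveContext = new HiveContext(sc.sc());

        DataFrame parquetFile = hiveContext.read().parquet("/data/daily/2016-08-11_15_31_34.995/*");
        parquetFile.registerTempTable("ParquetTable");

        // count(*) counts all rows, count(Id) skips NULL Ids, count(distinct Id) also collapses duplicates.
        hiveContext.sql(
            "select count(*) as total_rows, count(Id) as non_null_ids, count(distinct Id) as distinct_ids "
            + "from ParquetTable").show();

        // List any Ids that appear more than once.
        hiveContext.sql(
            "select Id, count(*) as cnt from ParquetTable group by Id having count(*) > 1").show();

        sc.stop();
    }
}

If non_null_ids comes back as 4026, the missing rows have NULL Ids; if only distinct_ids is lower, the second query lists the colliding values.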
Labels:
- Apache Hadoop
- Apache Hive
- Apache Spark
07-14-2016
09:09 PM
Hi @Bernhard Walter, In spite of creating the fat jar, the error below still occurred.

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, org.apache.spark.util.Utils$.DEFAULT_DRIVER_MEM_MB()I
java.lang.NoSuchMethodError: org.apache.spark.util.Utils$.DEFAULT_DRIVER_MEM_MB()I
at org.apache.spark.deploy.yarn.ClientArguments.<init>(ClientArguments.scala:49)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1120)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:104)
at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:95)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:47)
at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:38)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:241)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
07-14-2016
06:15 PM
Hi @Bernhard Walter, Thanks for the reply! I have followed your suggestion, but it now throws a different error. Please help me.

diagnostics: Application application_1468279065782_0300 failed 2 times due to AM Container for appattempt_1468279065782_0300_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://yarnNM:8088/cluster/app/application_1468279065782_0300Then, click on links to logs of each attempt.
Diagnostics: Permission denied: user=hadoop, access=EXECUTE, inode="/user/yarn/.sparkStaging/application_1468279065782_0300/__spark_conf__1316069581048982381.zip":yarn:yarn:drwx------
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:259)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:205)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1771)
at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getFileInfo(FSDirStatAndListingOp.java:108)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3866)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1076)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:843)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)
07-14-2016
03:47 AM
Hi, We have installed HDP 2.4.0.0. As per the requirement, I need to configure an Oozie job with a Spark action. I have written the following code.

workflow.xml:

<?xml version="1.0"?>
<workflow-app name="${OOZIE_WF_NAME}" xmlns="uri:oozie:workflow:0.5">
<global>
<configuration>
<property>
<name>oozie.launcher.yarn.app.mapreduce.am.env</name>
<value>SPARK_HOME=/usr/hdp/2.4.0.0-169/spark/</value>
</property>
</configuration>
</global>
<start to="spark-mongo-ETL"/>
<action name="spark-mongo-ETL">
<spark xmlns="uri:oozie:spark-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<master>yarn-cluster</master>
<mode>cluster</mode>
<name>SparkMongoLoading</name>
<class>com.SparkSqlExample</class>
<jar>${nameNode}${WORKFLOW_HOME}/lib/SparkParquetExample-0.0.1-SNAPSHOT.jar</jar>
</spark>
<ok to="End"/>
<error to="killAction"/>
</action>
<kill name="killAction">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="End"/>
</workflow-app>
job.properties:

nameNode=hdfs://nameNode1:8020
jobTracker=yarnNM:8050
queueName=default
user.name=hadoop
oozie.libpath=/user/oozie/share/lib/
oozie.use.system.libpath=true
WORKFLOW_HOME=/user/hadoop/SparkETL
OOZIE_WF_NAME=Spark-Mongo-ETL-wf
SPARK_MONGO_JAR=${nameNode}${WORKFLOW_HOME}/lib/SparkParquetExample-0.0.1-SNAPSHOT.jar
oozie.wf.application.path=${nameNode}/user/hadoop/SparkETL/
Two jars are placed under the lib folder:

SparkParquetExample-0.0.1-SNAPSHOT.jar
spark-assembly-1.6.0.2.4.0.0-169-hadoop2.7.1.2.4.0.0-169.jar

When I submit the Oozie job, the action is killed with the following error:

Error: java.lang.UnsupportedOperationException: Not implemented by the TFS FileSystem implementation
at org.apache.hadoop.fs.FileSystem.getScheme(FileSystem.java:217)
at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2624)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2634)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2651)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
at org.apache.hadoop.fs.FileSystem.getLocal(FileSystem.java:342)
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.confChanged(LocalDirAllocator.java:270)
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:432)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:164)
at org.apache.hadoop.mapred.YarnChild.configureLocalDirs(YarnChild.java:256)
at org.apache.hadoop.mapred.YarnChild.configureTask(YarnChild.java:314)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:146)
Also, please let me know how to pass the jars and files explicitly in the workflow. The spark-submit command below executes successfully from the command line:

spark-submit --class com.SparkSqlExample --master yarn-cluster --num-executors 2 --driver-memory 1g --executor-memory 2g --executor-cores 2 --files /usr/hdp/current/spark-client/conf/hive-site.xml --jars /usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar,/usr/hdp/current/spark-client/lib/jackson-core-2.4.4.jar,/usr/hdp/current/spark-client/lib/mongo-hadoop-spark-1.5.2.jar,/usr/share/java/slf4j-simple-1.7.5.jar,/usr/hdp/current/spark-client/lib/spark-core_2.10-1.6.0.jar,/usr/hdp/current/spark-client/lib/spark-hive_2.10-1.6.0.jar,/usr/hdp/current/spark-client/lib/spark-sql_2.10-1.6.0.jar,/usr/hdp/current/spark-client/lib/mongo-hadoop-core-1.5.2.jar,/usr/hdp/current/spark-client/lib/spark-avro_2.10-2.0.1.jar,/usr/hdp/current/spark-client/lib/spark-csv_2.10-1.4.0.jar,/usr/hdp/current/spark-client/lib/spark-mongodb_2.10-0.11.2.jar,/usr/hdp/current/spark-client/lib/spark-streaming_2.10-1.6.0.jar,/usr/hdp/current/spark-client/lib/commons-csv-1.1.jar,/usr/hdp/current/spark-client/lib/mongodb-driver-3.2.2.jar,/usr/hdp/current/spark-client/lib/mongo-hadoop-master-1.5.2.jar,/usr/hdp/current/spark-client/lib/mongo-java-driver-3.2.2.jar,/usr/hdp/current/spark-client/lib/spark-1.6.0.2.4.0.0-169-yarn-shuffle.jar --conf spark.yarn.jar=hdfs:///user/spark/spark-assembly-1.6.0.2.4.0.0-169-hadoop2.7.1.2.4.0.0-169.jar --conf spark.yarn.executor.memoryOverhead=512 /home/hadoop/SparkParquetExample-0.0.1-SNAPSHOT.jar

Can anyone suggest a solution?
Labels:
- Apache Spark
07-05-2016
07:04 PM
Hi Trupti, Sorry for the delay, and thanks for the response. I have tried the above-mentioned process, but I am still facing the same error.
06-24-2016
03:30 AM
@Timothy Spann I haven't tried NiFi. Please find the details below.

yarn.nodemanager.resource.memory-mb=200GB
yarn.scheduler.minimum-allocation-mb=2GB
yarn.scheduler.maximum-allocation-mb=6GB
yarn.scheduler.maximum-allocation-vcores=8
yarn.scheduler.minimum-allocation-vcores=1
yarn.nodemanager.resource.cpu-vcores=16
yarn.nodemanager.resource.percentage-physical-cpu-limit=80%
06-22-2016
09:48 AM
Please help me. Thanks in advance!
06-22-2016
09:48 AM
Also note that there is no security implemented on the cluster. Please help me. Thanks in advance!
06-22-2016
09:46 AM
1 Kudo
(Attachment: img-22062016-154852.png)

Hi Team,

We are using HDP 2.4.0.0-169 installed on Ubuntu 14.04. There is a Spark-Mongo application that extracts data from MongoDB into HDFS. This application is executed using the three Spark modes local[*], yarn-client, and yarn-cluster, and all three spark-submit commands work from the command prompt. We have written an Oozie workflow to execute the job on an hourly basis. While executing the Oozie job, the job stays in RUNNING state but fails with an SLA error. I have followed the steps below.

1. I changed the YARN properties in Ambari:
yarn.nodemanager.resource.memory-mb = 67436288, increased to 200999168
yarn.scheduler.minimum-allocation-mb = 1024, increased to 2048
yarn.scheduler.maximum-allocation-mb = 8192, decreased to 6144

2. I also added the changes below in custom spark-defaults:
spark.authenticate = false
spark.driver.extraLibraryPath = /usr/hdp/2.4.0.0-169/hadoop/lib/native
spark.dynamicAllocation.executorIdleTimeout = 60
spark.dynamicAllocation.schedulerBacklogTimeout = 1
spark.executor.extraLibraryPath = /usr/hdp/2.4.0.0-169/hadoop/lib/native
spark.serializer = org.apache.spark.serializer.KryoSerializer
spark.yarn.am.extraLibraryPath = /usr/hdp/2.4.0.0-169/hadoop/lib/native
spark.yarn.config.gatewayPath = /usr/hdp
spark.yarn.config.replacementPath = {{HADOOP_COMMON_HOME}}/../../..
spark.yarn.jar = local:/usr/hdp/2.4.0.0-169/spark/lib/spark-assembly-1.6.0.2.4.0.0-169-hadoop2.7.1.2.4.0.0-169.jar

3. Added the property below in advanced hadoop-env:
export SPARK_HOME=/usr/hdp/2.4.0.0-169/spark

4. I added the Spark jars to the Oozie lib folder with the required permissions (tried 777 and 755 for all jars), but no luck:
spark-assembly-1.6.0.2.4.0.0-169-hadoop2.7.1.2.4.0.0-169.jar
spark-1.6.0.2.4.0.0-169-hadoop2.7.1.2.4.0.0-169-yarn-shuffle.jar

5. Also added the above-mentioned jars and the jars mentioned below to the Oozie share folder (/user/oozie/share/lib/lib_20160420150601/oozie) with the required permissions, but no luck.

The code is defined as follows.

workflow.xml:

<?xml version="1.0"?>
<workflow-app name="sparkmongo" xmlns="uri:oozie:workflow:0.5">
<start to="spark-mongo-ETL"/>
<action name="spark-mongo-ETL">
<spark xmlns="uri:oozie:spark-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<master>${master}</master>
<mode>client</mode>
<name>SparkMongoLoading</name>
<class>com.snapfish.spark.etl.MongoETL</class>
<jar>/user/hadoop/Sparkmongo/lib/original-etl-0.0.1-SNAPSHOT.jar</jar>
<spark-opts> spark.driver.extraClassPath=hdfs://namenode1:8020/user/oozie/share/lib/lib_20150711021244/spark/* spark.yarn.historyServer.address=http://yarnNM:19888/ spark.eventLog.dir=hdfs://namenode1:8020/spark-history spark.eventLog.enabled=true </spark-opts>
<arg>orders</arg>
</spark>
<ok to="End"/>
<error to="killAction"/>
</action>
<kill name="killAction">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="End"/>
</workflow-app>
I tried removing the property below, but no luck:
spark.driver.extraClassPath=hdfs://namenode1:8020/user/oozie/share/lib/lib_20150711021244/spark/* spark.yarn.historyServer.address=http://yarnNM:19888/ spark.eventLog.dir=hdfs://namenode1:8020/spark-history spark.eventLog.enabled=true
Jars in the lib folder:
mongo-hadoop-spark-1.5.2.jar
mongo-java-driver-3.2.2.jar
mongodb-driver-3.2.2.jar
original-etl-0.0.1-SNAPSHOT.jar
spark-1.6.0.2.4.0.0-169-yarn-shuffle.jar
spark-assembly-1.6.0.2.4.0.0-169-hadoop2.7.1.2.4.0.0-169.jar
spark-core_2.10-1.6.0.jar
spark-mongodb_2.10-0.11.2.jar
I tried removing all jars except the two below, but no luck:
original-etl-0.0.1-SNAPSHOT.jar
spark-assembly-1.6.0.2.4.0.0-169-hadoop2.7.1.2.4.0.0-169.jar
job.properties:

nameNode=hdfs://namenode1:8020
jobTracker=yarnNM:8050
master=yarn-client
queueName=default
oozie.use.system.libpath=true
oozie.libpath=/user/oozie/share/lib/
user.name=hadoop
mapreduce.job.username=yarn
oozie.wf.application.path=${nameNode}/user/hadoop/Sparkmongo/
I also changed to the properties below, but no luck:

nameNode=hdfs://namenode1:8020
jobTracker=yarnNM:8050
master=yarn-client
queueName=default
oozie.use.system.libpath=true
user.name=hadoop
oozie.wf.application.path=${nameNode}/user/hadoop/Sparkmongo/

nameNode=hdfs://namenode1:8020
jobTracker=yarnNM:8050
master=yarn-client
queueName=default
oozie.use.system.libpath=true
oozie.libpath=/user/oozie/share/lib/
user.name=hadoop
oozie.wf.application.path=${nameNode}/user/hadoop/Sparkmongo/

After trying all these combinations, I am still facing the issue shown in the attached screenshots (capture1.jpg, capture2.jpg).
Labels:
- Apache Oozie
- Apache Spark
- Apache YARN