Member since: 11-04-2016
Posts: 74
Kudos Received: 16
Solutions: 7
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3167 | 02-28-2019 03:22 AM
 | 2829 | 02-01-2019 01:15 AM
 | 4067 | 04-16-2018 03:38 AM
 | 32341 | 09-16-2017 04:36 AM
 | 8980 | 09-11-2017 02:43 PM
01-16-2019
07:41 AM
Hi Nick, Thanks for the advice. I have added "dr.who" to the list and now everything is back to normal! Many thanks mate 🙂
01-11-2019
02:36 AM
Hi @lwang, Thanks for your reply. My cluster is not Kerberized. Also, SPNEGO is not selected/enabled. Here is my fair-scheduler.xml file:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<allocations>
<queue name="root">
<weight>1.0</weight>
<schedulingPolicy>drf</schedulingPolicy>
<aclSubmitApps>maziyar,test-user,hdfs,admin iscpif-hadoop,admin,hdfs,hive</aclSubmitApps>
<aclAdministerApps>maziyar,admin </aclAdministerApps>
<queue name="users" type="parent">
<maxResources>60.0%</maxResources>
<maxChildResources>10.0%</maxChildResources>
<maxRunningApps>15</maxRunningApps>
<weight>4.0</weight>
<schedulingPolicy>drf</schedulingPolicy>
<aclSubmitApps>*</aclSubmitApps>
<aclAdministerApps>maziyar,root,spark,hdfs </aclAdministerApps>
<queue name="mpanahi">
<maxResources>30.0%</maxResources>
<weight>3.0</weight>
<schedulingPolicy>drf</schedulingPolicy>
</queue>
</queue>
<queue name="default">
<maxResources>10.0%</maxResources>
<weight>1.0</weight>
<schedulingPolicy>fifo</schedulingPolicy>
<aclSubmitApps>*</aclSubmitApps>
<aclAdministerApps>*</aclAdministerApps>
</queue>
<queue name="multivac">
<maxResources>80.0%</maxResources>
<maxRunningApps>3</maxRunningApps>
<weight>5.0</weight>
<schedulingPolicy>drf</schedulingPolicy>
<aclSubmitApps>mziyar,hdfs,hive </aclSubmitApps>
<aclAdministerApps>maziyar </aclAdministerApps>
</queue>
</queue>
<defaultQueueSchedulingPolicy>drf</defaultQueueSchedulingPolicy>
<queuePlacementPolicy>
<rule name="specified" create="false"/>
<rule name="nestedUserQueue" create="true">
<rule name="default" create="true" queue="users"/>
</rule>
<rule name="default"/>
</queuePlacementPolicy>
</allocations>
Thanks again for your follow-up, I really appreciate it. Best, Maziyar
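P.S. In case the ACL strings above look odd: per the Fair Scheduler docs, each ACL entry is a comma-separated list of users, then a single space, then a comma-separated list of groups. So the root queue's submit ACL breaks down like this (illustrative re-reading of the existing line, not a change to the file):
<!-- users "maziyar,test-user,hdfs,admin", one space, then groups "iscpif-hadoop,admin,hdfs,hive" -->
<aclSubmitApps>maziyar,test-user,hdfs,admin iscpif-hadoop,admin,hdfs,hive</aclSubmitApps>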
01-09-2019
06:33 AM
Hi @lwang, My previous version was the latest CDH 5 (5.16.x), which made it eligible for the upgrade to 6.1. Yes, everything else is fine: HDFS browser, Hive editor, notebook, etc. I don't have Impala. I only see the error when I access Hue for the first time. As you can see, in the search section there is a solr_url. Of hue.ini, hue_safety_valve_server.ini, and hue_safety_valve.ini, I only have hue_safety_valve.ini, which is for the notebook, and nowhere in it do I mention anything about Solr. Many thanks.
01-09-2019
06:21 AM
Hi @edy, Thanks for the reply. I have tested reading CSV and JSON files in spark-shell, and it was fine! So I figured this is a similar issue to the one I had before with Zeppelin: Zeppelin 0.8.x uses a different version of "commons-lang3" (3.5) than the one used in Spark 2.4 (3.7). I have reported this issue already: https://issues.apache.org/jira/projects/ZEPPELIN/issues/ZEPPELIN-3939 Sorry for the false alarm 🙂
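P.S. For anyone else chasing the same InvalidClassException: a quick sanity check is to print which jar actually supplies FastDateParser on the driver versus on an executor, which makes the version mismatch visible directly. This is just a sketch, assuming a live SparkContext sc in spark-shell or a Zeppelin Spark paragraph:
// Where does the driver load FastDateParser from?
val driverSrc = {
  val cs = classOf[org.apache.commons.lang3.time.FastDateParser].getProtectionDomain.getCodeSource
  if (cs == null) "bootstrap/unknown" else String.valueOf(cs.getLocation)
}
// Same lookup, but executed inside a task on an executor.
val executorSrc = sc.parallelize(Seq(0), 1).map { _ =>
  val cs = classOf[org.apache.commons.lang3.time.FastDateParser].getProtectionDomain.getCodeSource
  if (cs == null) "bootstrap/unknown" else String.valueOf(cs.getLocation)
}.first()
println(s"driver:   $driverSrc")
println(s"executor: $executorSrc")
If the two lines print different commons-lang3 jars, that is the driver/executor mismatch behind the serialVersionUID error.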
01-09-2019
06:18 AM
Hi @lwang, Yes! I have yarn.acl.enable set so that other users don't have admin-level access to the queues (mostly so they can't kill others' applications by mistake). My username has admin-level access, but I see the same wrong format for my own applications as well as the other users' apps in the YARN UI. In the "Administration Access Control" section of the queues, there are only two options:
- Allow anyone to administer this pool
- Allow these users and groups to administer this pool
I chose the second one, listing my own username and a few others (system). I couldn't find any file named fair-scheduler.xml on any of my servers; is this something I should generate? Also, I couldn't find "startedTime" in aclAdministerApps, but here is the JSON of what's inside aclAdministerApps (I think it is generated automatically):
{
"defaultFairSharePreemptionThreshold": null,
"defaultFairSharePreemptionTimeout": null,
"defaultMinSharePreemptionTimeout": null,
"defaultQueueSchedulingPolicy": "drf",
"queueMaxAMShareDefault": null,
"queueMaxAppsDefault": null,
"queuePlacementRules": [{
"create": false,
"name": "specified",
"queue": null,
"rules": null
}, {
"create": true,
"name": "nestedUserQueue",
"queue": null,
"rules": [{
"create": true,
"name": "default",
"queue": "users",
"rules": null
}]
}, {
"create": null,
"name": "default",
"queue": null,
"rules": null
}],
"queues": [{
"aclAdministerApps": "maziyar ",
"aclSubmitApps": "maziyar,test-user,hdfs,admin hadoop-admin,admin,hdfs,hive",
"allowPreemptionFrom": null,
"fairSharePreemptionThreshold": null,
"fairSharePreemptionTimeout": null,
"minSharePreemptionTimeout": null,
"name": "root",
"queues": [{
"aclAdministerApps": "maziyar,root,spark,hdfs ",
"aclSubmitApps": "*",
"allowPreemptionFrom": null,
"fairSharePreemptionThreshold": null,
"fairSharePreemptionTimeout": null,
"minSharePreemptionTimeout": null,
"name": "users",
"queues": [{
"aclAdministerApps": null,
"aclSubmitApps": null,
"allowPreemptionFrom": null,
"fairSharePreemptionThreshold": null,
"fairSharePreemptionTimeout": null,
"minSharePreemptionTimeout": null,
"name": "maziyar",
"queues": [],
"schedulablePropertiesList": [{
"impalaClampMemLimitQueryOption": null,
"impalaDefaultQueryMemLimit": null,
"impalaDefaultQueryOptions": null,
"impalaMaxMemory": null,
"impalaMaxQueryMemLimit": null,
"impalaMaxQueuedQueries": null,
"impalaMaxRunningQueries": null,
"impalaMinQueryMemLimit": null,
"impalaQueueTimeout": null,
"maxAMShare": null,
"maxChildResources": null,
"maxResources": {
"cpuPercent": 30.0,
"memory": null,
"memoryPercent": 30.0,
"vcores": null
},
"maxRunningApps": null,
"minResources": null,
"scheduleName": "default",
"weight": 3.0
}],
"schedulingPolicy": "drf",
"type": null
}],
"schedulablePropertiesList": [{
"impalaClampMemLimitQueryOption": null,
"impalaDefaultQueryMemLimit": null,
"impalaDefaultQueryOptions": null,
"impalaMaxMemory": null,
"impalaMaxQueryMemLimit": null,
"impalaMaxQueuedQueries": null,
"impalaMaxRunningQueries": null,
"impalaMinQueryMemLimit": null,
"impalaQueueTimeout": null,
"maxAMShare": null,
"maxChildResources": {
"cpuPercent": 10.0,
"memory": null,
"memoryPercent": 10.0,
"vcores": null
},
"maxResources": {
"cpuPercent": 60.0,
"memory": null,
"memoryPercent": 60.0,
"vcores": null
},
"maxRunningApps": 15,
"minResources": null,
"scheduleName": "default",
"weight": 4.0
}],
"schedulingPolicy": "drf",
"type": "parent"
}, {
"aclAdministerApps": "*",
"aclSubmitApps": "*",
"allowPreemptionFrom": null,
"fairSharePreemptionThreshold": null,
"fairSharePreemptionTimeout": null,
"minSharePreemptionTimeout": null,
"name": "default",
"queues": [],
"schedulablePropertiesList": [{
"impalaClampMemLimitQueryOption": null,
"impalaDefaultQueryMemLimit": null,
"impalaDefaultQueryOptions": null,
"impalaMaxMemory": null,
"impalaMaxQueryMemLimit": null,
"impalaMaxQueuedQueries": null,
"impalaMaxRunningQueries": null,
"impalaMinQueryMemLimit": null,
"impalaQueueTimeout": null,
"maxAMShare": null,
"maxChildResources": null,
"maxResources": {
"cpuPercent": 10.0,
"memory": null,
"memoryPercent": 10.0,
"vcores": null
},
"maxRunningApps": null,
"minResources": null,
"scheduleName": "default",
"weight": 1.0
}],
"schedulingPolicy": "fifo",
"type": null
}, {
"aclAdministerApps": "maziyar ",
"aclSubmitApps": "mziyar,hdfs,hive ",
"allowPreemptionFrom": null,
"fairSharePreemptionThreshold": null,
"fairSharePreemptionTimeout": null,
"minSharePreemptionTimeout": null,
"name": "multivac",
"queues": [],
"schedulablePropertiesList": [{
"impalaClampMemLimitQueryOption": null,
"impalaDefaultQueryMemLimit": null,
"impalaDefaultQueryOptions": null,
"impalaMaxMemory": null,
"impalaMaxQueryMemLimit": null,
"impalaMaxQueuedQueries": null,
"impalaMaxRunningQueries": null,
"impalaMinQueryMemLimit": null,
"impalaQueueTimeout": null,
"maxAMShare": null,
"maxChildResources": null,
"maxResources": {
"cpuPercent": 80.0,
"memory": null,
"memoryPercent": 80.0,
"vcores": null
},
"maxRunningApps": 3,
"minResources": null,
"scheduleName": "default",
"weight": 5.0
}],
"schedulingPolicy": "drf",
"type": null
}],
"schedulablePropertiesList": [{
"impalaClampMemLimitQueryOption": null,
"impalaDefaultQueryMemLimit": null,
"impalaDefaultQueryOptions": null,
"impalaMaxMemory": null,
"impalaMaxQueryMemLimit": null,
"impalaMaxQueuedQueries": null,
"impalaMaxRunningQueries": null,
"impalaMinQueryMemLimit": null,
"impalaQueueTimeout": null,
"maxAMShare": null,
"maxChildResources": null,
"maxResources": null,
"maxRunningApps": null,
"minResources": null,
"scheduleName": "default",
"weight": 1.0
}],
"schedulingPolicy": "drf",
"type": null
}],
"userMaxAppsDefault": null,
"users": []
}
Many thanks, I feel we are close to solving this problem 🙂
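P.S. On the missing fair-scheduler.xml: as far as I can tell, Cloudera Manager generates the allocation file from the Dynamic Resource Pools settings rather than keeping a hand-editable copy around, so the easiest way to see what the ResourceManager is actually using seems to be its REST API. A sketch only; rm-host is a placeholder for the ResourceManager hostname and 8088 its default port:
# Dumps the live scheduler state: queue layout, weights, and resource caps.
curl -s "http://rm-host:8088/ws/v1/cluster/scheduler" | python -m json.tool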
01-08-2019
06:11 AM
Also, the reading CSV files result in this error, I thought maybe helps to narrow down the problem: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 4 times, most recent failure: Lost task 0.3 in stage 4.0 (TID 117, hadoop-16, executor 3): java.io.InvalidClassException: org.apache.commons.lang3.time.FastDateParser; local class incompatible: stream classdesc serialVersionUID = 2, local class serialVersionUID = 3 at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1885) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2042) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431) at scala.collection.immutable.List$SerializationProxy.readObject(List.scala:490) at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1170) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2178) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573) at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431) at scala.collection.immutable.List$SerializationProxy.readObject(List.scala:490) at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1170) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2178) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431) at scala.collection.immutable.List$SerializationProxy.readObject(List.scala:490) at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1170) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2178) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431) at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75) at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:83) at org.apache.spark.scheduler.Task.run(Task.scala:121) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$11.apply(Executor.scala:407) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:413) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at 
java.lang.Thread.run(Thread.java:748) Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1890) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1878) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1877) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1877) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:929) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:929) at scala.Option.foreach(Option.scala:257) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:929) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2111) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2060) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2049) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:740) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2073) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2094) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2113) at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:365) at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38) at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3383) at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2544) at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2544) at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3364) at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73) at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3363) at org.apache.spark.sql.Dataset.head(Dataset.scala:2544) at org.apache.spark.sql.Dataset.take(Dataset.scala:2758) at org.apache.spark.sql.Dataset.getRows(Dataset.scala:254) at org.apache.spark.sql.Dataset.showString(Dataset.scala:291) at org.apache.spark.sql.Dataset.show(Dataset.scala:745) at org.apache.spark.sql.Dataset.show(Dataset.scala:704) at org.apache.spark.sql.Dataset.show(Dataset.scala:713) ... 
47 elided Caused by: java.io.InvalidClassException: org.apache.commons.lang3.time.FastDateParser; local class incompatible: stream classdesc serialVersionUID = 2, local class serialVersionUID = 3 at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1885) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2042) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431) at scala.collection.immutable.List$SerializationProxy.readObject(List.scala:490) at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1170) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2178) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069) at 
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431) at scala.collection.immutable.List$SerializationProxy.readObject(List.scala:490) at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1170) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2178) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431) at scala.collection.immutable.List$SerializationProxy.readObject(List.scala:490) at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1170) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2178) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431) at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75) at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:83) at org.apache.spark.scheduler.Task.run(Task.scala:121) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$11.apply(Executor.scala:407) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:413) ... 3 more
01-08-2019
04:46 AM
Hi, I have a simple single that used to read a JSON file on HDFS (line by line) into a DataFram. After upgrading to 6.1 and Spark 2.4 from my CDH 5.16 with Spark 2.3, now I can't run the same code. I am facing this error: org.apache.spark.SparkException: Job aborted due to stage failure: Aborting TaskSet 0.0 because task 22 (partition 22) cannot run anywhere due to node and executor blacklist. Most recent failure: Lost task 22.1 in stage 0.0 (TID 3, hadoop-9, executor 2): java.io.InvalidClassException: org.apache.commons.lang3.time.FastDateParser; local class incompatible: stream classdesc serialVersionUID = 2, local class serialVersionUID = 3 at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1885) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2042) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431) at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75) at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:83) at org.apache.spark.scheduler.Task.run(Task.scala:121) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$11.apply(Executor.scala:407) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:413) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Blacklisting behavior can be configured via spark.blacklist.*. at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1890) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1878) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1877) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1877) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:929) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:929) at scala.Option.foreach(Option.scala:257) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:929) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2111) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2060) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2049) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:740) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2073) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2170) at org.apache.spark.sql.catalyst.json.JsonInferSchema$.infer(JsonInferSchema.scala:83) at org.apache.spark.sql.execution.datasources.json.TextInputJsonDataSource$$anonfun$inferFromDataset$1.apply(JsonDataSource.scala:109) at org.apache.spark.sql.execution.datasources.json.TextInputJsonDataSource$$anonfun$inferFromDataset$1.apply(JsonDataSource.scala:109) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125) at org.apache.spark.sql.execution.datasources.json.TextInputJsonDataSource$.inferFromDataset(JsonDataSource.scala:108) at org.apache.spark.sql.execution.datasources.json.TextInputJsonDataSource$.infer(JsonDataSource.scala:98) at org.apache.spark.sql.execution.datasources.json.JsonDataSource.inferSchema(JsonDataSource.scala:64) at org.apache.spark.sql.execution.datasources.json.JsonFileFormat.inferSchema(JsonFileFormat.scala:59) at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:179) at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:179) at scala.Option.orElse(Option.scala:289) at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:178) at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:372) at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211) at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:391) at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:325) ... 48 elided The code: val simpleDF = spark.read.json(pathToFile) If I change json to text it runs without any problem. I also didn't experience any problem with reading parquet or other things I have inside my pipeline. It's just reading JSON. 
I haven't seen anything about JSON-related changes in Spark 2.4, so I don't know what is happening here. Many thanks
Labels:
- Apache Spark
01-08-2019
02:30 AM
1 Kudo
Hi,
I have upgraded my CM/CDH to 6.1, and every time I log in to Hue I receive this error:
Solr server could not be contacted properly: HTTPConnectionPool(host='localhost', port=8983): Max retries exceeded with url: /solr/admin/info/system?user.name=hue&doAs=mpanahi&wt=json (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection refused',))
The problem is that I have never installed Solr, nor configured it inside Hue. The Solr dependency is set to None for Hue in CM, and nothing refers to [search] or [dashboard] inside my hue_safety_valve.ini, so I don't know why it tries to connect to something I never set up.
Does anyone have any idea why the new Hue tries to connect to Solr without it being installed or selected?
Many thanks,
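P.S. If it helps anyone else digging into this: localhost:8983 is just Hue's built-in default solr_url, which would explain the host in the error. The only knob I know of to stop Hue probing Solr entirely is blacklisting the Search app through the safety valve. Untested on my side; standard hue.ini layout assumed:
[desktop]
# Hide the Search app so Hue stops probing Solr; remove the line to restore it.
app_blacklist=search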
Labels:
- Apache Solr
- Cloudera Hue
01-08-2019
02:13 AM
Just to update this post: I have upgraded to CM/CDH 6.1 and I am still experiencing the same thing! I am out of ideas and don't know how to fix this 🙂
11-22-2018
09:00 AM
Hi, What do you suggest instead to overcome this issue? I mean being able to use something like Ansible and ask all the nodes to run conda install ... ?
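For concreteness, the kind of thing I have in mind is an ad-hoc run like this (just a sketch; the inventory group, inventory file, conda path, and package name are all placeholders):
# Run the same conda install on every node in a hypothetical "workers" group.
ansible workers -i hosts.ini -m shell -a "/opt/anaconda/bin/conda install -y numpy"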