Created 10-14-2024 09:40 PM
Please help me.
I'm getting a "table not found" error from a batch job submitted through Livy, even though the table exists.
This is the script I use to submit the Livy job:
curl -X POST --data '{
  "file": "hdfs:///scripts/get_count.py",
  "conf": {
    "spark.executor.memory": "17g",
    "spark.executor.cores": "5",
    "spark.executor.instances": "23",
    "spark.driver.memory": "17g",
    "spark.driver.cores": "5",
    "spark.default.parallelism": "110",
    "spark.driver.maxResultSize": "17g",
    "spark.driver.memoryOverhead": "1740m",
    "spark.executor.memoryOverhead": "1740m",
    "spark.dynamicAllocation.enabled": "false",
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    "spark.sql.catalog.spark_catalog": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.spark_catalog.type": "hive",
    "spark.sql.catalog.spark_catalog.uri": "thrift://master1.com:9083,thrift://master2.com:9083",
    "spark.sql.catalog.spark_catalog.warehouse": "hdfs:///apps/spark/warehouse"
  },
  "jars": ["hdfs:///user/spark/iceberg-jars/iceberg-spark-runtime-3.4_2.12-1.5.2.jar"],
  "name": "your-spark-job",
  "executorMemory": "17g",
  "executorCores": 5,
  "numExecutors": 23,
  "driverMemory": "17g",
  "driverCores": 5
}' -H "Content-Type: application/json" http://master1.com:8998/batches
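
For context, this is the query inside get_count.py that fails, reconstructed from the traceback at the end of the Livy log below. Only the spark.sql() call on line 29 is confirmed by the log; the SparkSession setup shown here is a minimal sketch. Note the table name is referenced unqualified, with no database or catalog prefix:

# Minimal sketch of the relevant part of get_count.py.
# Only the spark.sql() call is confirmed by the traceback below; the rest is assumed.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("get_count").getOrCreate()

# Unqualified table name: Spark resolves it against the current schema
# (typically 'default') of the session catalog 'spark_catalog'.
df = spark.sql("SELECT * FROM user_clients LIMIT 10")
df.show()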
However, the Livy log shows this error:
24/10/15 04:26:20 INFO SparkContext: Running Spark version 3.4.2.1.2.2.0-46
24/10/15 04:26:20 INFO ResourceUtils: ==============================================================
24/10/15 04:26:20 INFO ResourceUtils: No custom resources configured for spark.driver.
24/10/15 04:26:20 INFO ResourceUtils: ==============================================================
24/10/15 04:26:20 INFO SparkContext: Submitted application: LIVY
24/10/15 04:26:20 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(memoryOverhead -> name: memoryOverhead, amount: 921, script: , vendor: , cores -> name: cores, amount: 5, script: , vendor: , memory -> name: memory, amount: 9216, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
24/10/15 04:26:20 INFO ResourceProfile: Limiting resource is cpus at 5 tasks per executor
24/10/15 04:26:20 INFO ResourceProfileManager: Added ResourceProfile id: 0
24/10/15 04:26:20 INFO SecurityManager: Changing view acls to: root
24/10/15 04:26:20 INFO SecurityManager: Changing modify acls to: root
24/10/15 04:26:20 INFO SecurityManager: Changing view acls groups to:
24/10/15 04:26:20 INFO SecurityManager: Changing modify acls groups to:
24/10/15 04:26:20 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: root; groups with view permissions: EMPTY; users with modify permissions: root; groups with modify permissions: EMPTY
24/10/15 04:26:21 INFO Utils: Successfully started service 'sparkDriver' on port 42139.
24/10/15 04:26:21 INFO SparkEnv: Registering MapOutputTracker
24/10/15 04:26:21 INFO SparkEnv: Registering BlockManagerMaster
24/10/15 04:26:21 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
24/10/15 04:26:21 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
24/10/15 04:26:21 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
24/10/15 04:26:21 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-cca161d1-f821-457b-ba47-9b3105aa298e
24/10/15 04:26:21 INFO MemoryStore: MemoryStore started with capacity 8.9 GiB
24/10/15 04:26:21 INFO SparkEnv: Registering OutputCommitCoordinator
24/10/15 04:26:22 INFO JettyUtils: Start Jetty 0.0.0.0:4040 for SparkUI
24/10/15 04:26:22 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
24/10/15 04:26:22 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
24/10/15 04:26:22 INFO Utils: Successfully started service 'SparkUI' on port 4042.
24/10/15 04:26:22 INFO SparkContext: Added JAR hdfs:///user/spark/iceberg-jars/iceberg-spark-runtime-3.4_2.12-1.5.2.jar at hdfs:///user/spark/iceberg-jars/iceberg-spark-runtime-3.4_2.12-1.5.2.jar with timestamp 1728966380634
24/10/15 04:26:22 INFO AHSProxy: Connecting to Application History server at master1.com/192.168.2.211:10200
24/10/15 04:26:23 INFO ConfiguredRMFailoverProxyProvider: Failing over to rm2
24/10/15 04:26:23 INFO Configuration: found resource resource-types.xml at file:/usr/odp/1.2.2.0-46/hadoop/conf/resource-types.xml
24/10/15 04:26:23 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (39936 MB per container)
24/10/15 04:26:23 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
24/10/15 04:26:23 INFO Client: Setting up container launch context for our AM
24/10/15 04:26:23 INFO Client: Setting up the launch environment for our AM container
24/10/15 04:26:23 INFO Client: Preparing resources for our AM container
24/10/15 04:26:23 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
24/10/15 04:26:27 INFO Client: Uploading resource file:/tmp/spark-a0d06438-489e-4606-af57-eb39cadc53ab/__spark_libs__1742541905345715334.zip -> hdfs://ha-cluster/user/root/.sparkStaging/application_1728888529853_1238/__spark_libs__1742541905345715334.zip
24/10/15 04:26:37 INFO Client: Source and destination file systems are the same. Not copying hdfs://ha-cluster/user/spark/iceberg-jars/iceberg-spark-runtime-3.4_2.12-1.5.2.jar
24/10/15 04:26:38 INFO Client: Uploading resource file:/usr/odp/1.2.2.0-46/spark3/python/lib/pyspark.zip -> hdfs://ha-cluster/user/root/.sparkStaging/application_1728888529853_1238/pyspark.zip
24/10/15 04:26:38 INFO Client: Uploading resource file:/usr/odp/1.2.2.0-46/spark3/python/lib/py4j-0.10.9.7-src.zip -> hdfs://ha-cluster/user/root/.sparkStaging/application_1728888529853_1238/py4j-0.10.9.7-src.zip
24/10/15 04:26:38 INFO Client: Uploading resource file:/tmp/spark-a0d06438-489e-4606-af57-eb39cadc53ab/__spark_conf__7724803187618892992.zip -> hdfs://ha-cluster/user/root/.sparkStaging/application_1728888529853_1238/__spark_conf__.zip
24/10/15 04:26:38 INFO SecurityManager: Changing view acls to: root
24/10/15 04:26:38 INFO SecurityManager: Changing modify acls to: root
24/10/15 04:26:38 INFO SecurityManager: Changing view acls groups to:
24/10/15 04:26:38 INFO SecurityManager: Changing modify acls groups to:
24/10/15 04:26:38 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: root; groups with view permissions: EMPTY; users with modify permissions: root; groups with modify permissions: EMPTY
24/10/15 04:26:38 INFO Client: Submitting application application_1728888529853_1238 to ResourceManager
24/10/15 04:26:38 INFO YarnClientImpl: Submitted application application_1728888529853_1238
24/10/15 04:26:39 INFO Client: Application report for application_1728888529853_1238 (state: ACCEPTED)
24/10/15 04:26:39 INFO Client:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1728966398738
final status: UNDEFINED
tracking URL: http://master2.com:8088/proxy/application_1728888529853_1238/
user: root
24/10/15 04:26:40 INFO Client: Application report for application_1728888529853_1238 (state: ACCEPTED)
24/10/15 04:26:41 INFO Client: Application report for application_1728888529853_1238 (state: ACCEPTED)
24/10/15 04:26:42 INFO Client: Application report for application_1728888529853_1238 (state: ACCEPTED)
24/10/15 04:26:44 INFO Client: Application report for application_1728888529853_1238 (state: ACCEPTED)
24/10/15 04:26:45 INFO Client: Application report for application_1728888529853_1238 (state: ACCEPTED)
24/10/15 04:26:46 INFO Client: Application report for application_1728888529853_1238 (state: RUNNING)
24/10/15 04:26:46 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 192.168.2.235
ApplicationMaster RPC port: -1
queue: default
start time: 1728966398738
final status: UNDEFINED
tracking URL: http://master2.com:8088/proxy/application_1728888529853_1238/
user: root
24/10/15 04:26:46 INFO YarnClientSchedulerBackend: Application application_1728888529853_1238 has started running.
24/10/15 04:26:46 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 40573.
24/10/15 04:26:46 INFO NettyBlockTransferService: Server created on master1.com:40573
24/10/15 04:26:46 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
24/10/15 04:26:46 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, master1.com, 40573, None)
24/10/15 04:26:46 INFO BlockManagerMasterEndpoint: Registering block manager master1.com:40573 with 8.9 GiB RAM, BlockManagerId(driver, master1.com, 40573, None)
24/10/15 04:26:46 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, master1.com, 40573, None)
24/10/15 04:26:46 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, master1.com, 40573, None)
24/10/15 04:26:46 INFO YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> master1.com,master2.com, PROXY_URI_BASES -> http://master1.com:8088/proxy/application_1728888529853_1238,http://master2.com:8088/proxy/application_1728888529853_1238, RM_HA_URLS -> master1.com:8088,master2.com:8088), /proxy/application_1728888529853_1238
24/10/15 04:26:46 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
24/10/15 04:26:46 INFO SingleEventLogFileWriter: Logging events to hdfs:/spark3-history/application_1728888529853_1238.inprogress
24/10/15 04:26:46 INFO ServerInfo: Adding filter to /jobs: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
24/10/15 04:26:46 INFO ServerInfo: Adding filter to /jobs/json: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
24/10/15 04:26:46 INFO ServerInfo: Adding filter to /jobs/job: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
24/10/15 04:26:46 INFO ServerInfo: Adding filter to /jobs/job/json: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
24/10/15 04:26:46 INFO ServerInfo: Adding filter to /stages: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
24/10/15 04:26:46 INFO ServerInfo: Adding filter to /stages/json: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
24/10/15 04:26:46 INFO ServerInfo: Adding filter to /stages/stage: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
24/10/15 04:26:46 INFO ServerInfo: Adding filter to /stages/stage/json: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
24/10/15 04:26:47 INFO ServerInfo: Adding filter to /stages/pool: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
24/10/15 04:26:47 INFO ServerInfo: Adding filter to /stages/pool/json: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
24/10/15 04:26:47 INFO ServerInfo: Adding filter to /storage: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
24/10/15 04:26:47 INFO ServerInfo: Adding filter to /storage/json: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
24/10/15 04:26:47 INFO ServerInfo: Adding filter to /storage/rdd: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
24/10/15 04:26:47 INFO ServerInfo: Adding filter to /storage/rdd/json: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
24/10/15 04:26:47 INFO ServerInfo: Adding filter to /environment: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
24/10/15 04:26:47 INFO ServerInfo: Adding filter to /environment/json: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
24/10/15 04:26:47 INFO ServerInfo: Adding filter to /executors: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
24/10/15 04:26:47 INFO ServerInfo: Adding filter to /executors/json: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
24/10/15 04:26:47 INFO ServerInfo: Adding filter to /executors/threadDump: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
24/10/15 04:26:47 INFO ServerInfo: Adding filter to /executors/threadDump/json: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
24/10/15 04:26:47 INFO ServerInfo: Adding filter to /static: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
24/10/15 04:26:47 INFO ServerInfo: Adding filter to /: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
24/10/15 04:26:47 INFO ServerInfo: Adding filter to /api: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
24/10/15 04:26:47 INFO ServerInfo: Adding filter to /jobs/job/kill: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
24/10/15 04:26:47 INFO ServerInfo: Adding filter to /stages/stage/kill: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
24/10/15 04:26:47 INFO ServerInfo: Adding filter to /metrics/json: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
24/10/15 04:26:52 INFO YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000000000(ns)
24/10/15 04:26:52 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir.
24/10/15 04:26:52 INFO SharedState: Warehouse path is 'hdfs://ha-cluster/apps/spark/warehouse'.
24/10/15 04:26:52 INFO ServerInfo: Adding filter to /SQL: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
24/10/15 04:26:52 INFO ServerInfo: Adding filter to /SQL/json: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
24/10/15 04:26:52 INFO ServerInfo: Adding filter to /SQL/execution: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
24/10/15 04:26:52 INFO ServerInfo: Adding filter to /SQL/execution/json: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
24/10/15 04:26:52 INFO ServerInfo: Adding filter to /static/sql: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
24/10/15 04:26:55 INFO HiveConf: Found configuration file file:/etc/spark3/1.2.2.0-46/0/hive-site.xml
24/10/15 04:26:55 INFO metastore: Trying to connect to metastore with URI thrift://master2.com:9083
24/10/15 04:26:55 INFO metastore: Opened a connection to metastore, current connections: 1
24/10/15 04:26:55 INFO metastore: Connected to metastore.
Traceback (most recent call last):
File "/tmp/spark-5c957e19-f868-4993-a196-f5832dc0a1d9/get_count.py", line 29, in <module>
df = spark.sql("SELECT * FROM user_clients LIMIT 10")
File "/usr/odp/1.2.2.0-46/spark3/python/lib/pyspark.zip/pyspark/sql/session.py", line 1440, in sql
File "/usr/odp/1.2.2.0-46/spark3/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
File "/usr/odp/1.2.2.0-46/spark3/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py", line 175, in deco
pyspark.errors.exceptions.captured.AnalysisException: [TABLE_OR_VIEW_NOT_FOUND] The table or view `user_clients` cannot be found. Verify the spelling and correctness of the schema and catalog.
If you did not qualify the name with a schema, verify the current_schema() output, or qualify the name with the correct schema and catalog.
To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF EXISTS.; line 1 pos 14;
'GlobalLimit 10
+- 'LocalLimit 10
+- 'Project [*]
+- 'UnresolvedRelation [user_clients], [], false
24/10/15 04:26:56 INFO SparkContext: Invoking stop() from shutdown hook
24/10/15 04:26:56 INFO SparkContext: SparkContext is stopping with exitCode 0.
24/10/15 04:26:57 INFO SparkUI: Stopped Spark web UI at http://master1.com:4042
24/10/15 04:26:57 INFO YarnClientSchedulerBackend: Interrupting monitor thread
24/10/15 04:26:57 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.2.223:41684) with ID 3, ResourceProfileId 0
24/10/15 04:26:57 INFO YarnClientSchedulerBackend: Shutting down all executors
24/10/15 04:26:57 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
24/10/15 04:26:57 INFO YarnClientSchedulerBackend: YARN client scheduler backend Stopped
24/10/15 04:26:57 INFO BlockManagerMasterEndpoint: Registering block manager slave6.com:34935 with 4.6 GiB RAM, BlockManagerId(3, slave6.com, 34935, None)
24/10/15 04:26:57 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
24/10/15 04:26:57 WARN Dispatcher: Message RequestMessage(192.168.2.223:41684, NettyRpcEndpointRef(spark://CoarseGrainedScheduler@master1.com:42139), RemoveExecutor(3,Unable to create executor due to Exception thrown in awaitResult: )) dropped due to sparkEnv is stopped. Could not find CoarseGrainedScheduler.
24/10/15 04:26:57 INFO MemoryStore: MemoryStore cleared
24/10/15 04:26:57 INFO BlockManager: BlockManager stopped
24/10/15 04:26:57 INFO BlockManagerMaster: BlockManagerMaster stopped
24/10/15 04:26:57 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
24/10/15 04:26:57 INFO SparkContext: Successfully stopped SparkContext
24/10/15 04:26:57 INFO ShutdownHookManager: Shutdown hook called
24/10/15 04:26:57 INFO ShutdownHookManager: Deleting directory /tmp/spark-a0d06438-489e-4606-af57-eb39cadc53ab/pyspark-814ca5e7-dd77-4dca-a14f-f80b8876408c
24/10/15 04:26:57 INFO ShutdownHookManager: Deleting directory /tmp/spark-5c957e19-f868-4993-a196-f5832dc0a1d9
24/10/15 04:26:57 INFO ShutdownHookManager: Deleting directory /tmp/spark-a0d06438-489e-4606-af57-eb39cadc53ab
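
For what it's worth, here is a diagnostic sketch I could add to the script to check where the table actually lives, following the hint in the AnalysisException above. The database name my_db is only a placeholder, not a real name from my cluster:

# Diagnostic sketch: confirm which catalog/schema the session resolves
# unqualified names against, and where user_clients actually lives.
print(spark.catalog.currentCatalog())    # expected: spark_catalog
print(spark.catalog.currentDatabase())   # expected: default
spark.sql("SHOW DATABASES").show(truncate=False)
spark.sql("SHOW TABLES IN my_db").show(truncate=False)  # 'my_db' is a placeholder

# If the table lives outside the current schema, qualify it explicitly:
df = spark.sql("SELECT * FROM spark_catalog.my_db.user_clients LIMIT 10")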