Support Questions


Airflow DAG failing

New Contributor

I have a shell script that moves a CSV file from HDFS into Hive. Airflow has two DAGs:

DAG 1: a BashOperator running a shell script that checks whether the file exists in HDFS.
DAG 2: a BashOperator running a shell script that submits a Spark job to load the file from HDFS into Hive (a rough sketch of both DAGs follows).
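
For reference, here is a minimal sketch of the two DAGs. The DAG and task ids follow the log below; the script paths, owner, and schedule are placeholders rather than the real values.

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator  # Airflow 1.x import path

default_args = {"owner": "etl", "start_date": datetime(2020, 9, 1)}

# DAG 1: check whether the CSV file exists in HDFS.
with DAG("check_hdfs_file", default_args=default_args,
         schedule_interval="@daily", catchup=False) as dag1:
    check_file = BashOperator(
        task_id="check_file",
        # Trailing space keeps Airflow from treating the .sh path as a Jinja template file.
        bash_command="/path/to/check_file_exists.sh ",
    )

# DAG 2: run the shell script that spark-submits the job loading the file from HDFS into Hive.
with DAG("import-file1", default_args=default_args,
         schedule_interval="@daily", catchup=False) as dag2:
    start_insertfile = BashOperator(
        task_id="start_insertfile",
        bash_command="/path/to/insert_file.sh ",
    )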
When the DAGs were triggered, DAG 1 succeeded but DAG 2 failed: it could not connect to YARN. When I run the DAG 2 script manually from PuTTY, the file is moved as expected, but when the same script is launched from Airflow the job fails. I checked the task log but could not find an explicit error. Please find the log below; could you please help me understand the problem?

{taskinstance.py:887} INFO - Executing <Task(BashOperator): start_insertfile> on xxx
{standard_task_runner.py:53} INFO - Started process 6927 to run task
{logging_mixin.py:112} INFO - Running %s on host %s <TaskInstance: import-file1.start_file1 xxx [running]> Host.bc
{bash_operator.py:82} INFO - Tmp dir root location:
/tmp
Temporary script location: /tmp/airflowtmp/filexxx
Running command: xxxx.sh
Output:
which: no /usr/hdp/2.x.x.0-xxx//hadoop/bin/hadoop.distro in ((null))
dirname: missing operand
Try 'dirname --hxxxp' for more information.
xx_2020_09_30.csv xx.csv
INFO SparkContext:58 - Running Spark version 1.6.3
WARN SparkConf:70 - In Spark 1.0 and later spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN).
WARN SparkConf:70 -
SPARK_CLASSPATH was detected (set to '/usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar:/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/current/hbase-client/lib/hbase-hadoop-compat.jar:/usr/hdp/current/hbase-client/lib/hbase-hadoop2-compat.jar:/usr/hdp/current/phoenix-client/phoenix-cient.jar:/usr/hdp/current/phoenix-client/*.jar:/usr/hdp/current/phoenix-client/lib/*.jar:/usr/hdp/current/hive-client/lib/hive-hbase-handler-1.2.1000.2.6.3.0-235.jar:/usr/hdp/current/hive-client/lib/hive-hbase-handler.jar').
This is deprecated in Spark 1.0+.
Please instead use:
- ./spark-submit with --driver-class-path to augment the driver classpath
- spark.executor.extraClassPath to augment the executor classpath

WARN SparkConf:70 - Setting 'spark.executor.extraClassPath' to '/usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar:/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/current/hbase-client/lib/hbase-hadoop-compat.jar:/usr/hdp/current/hbase-client/lib/hbase-hadoop2-compat.jar:/usr/hdp/current/phoenix-client/phoenix-cient.jar:/usr/hdp/current/phoenix-client/*.jar:/usr/hdp/current/phoenix-client/lib/*.jar:/usr/hdp/current/hive-client/lib/hive-hbase-handler-1.2.1000.2.6.3.0-235.jar:/usr/hdp/current/hive-client/lib/hive-hbase-handler.jar' as a work-around.
WARN SparkConf:70 - Setting 'spark.driver.extraClassPath' to '/usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar:/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/current/hbase-client/lib/hbase-hadoop-compat.jar:/usr/hdp/current/hbase-client/lib/hbase-hadoop2-compat.jar:/usr/hdp/current/phoenix-client/phoenix-cient.jar:/usr/hdp/current/phoenix-client/*.jar:/usr/hdp/current/phoenix-client/lib/*.jar:/usr/hdp/current/hive-client/lib/hive-hbase-handler-1.2.1000.2.6.3.0-235.jar:/usr/hdp/current/hive-client/lib/hive-hbase-handler.jar' as a work-around.
INFO SecurityManager:58 - Changing view acls to: xxx
INFO SecurityManager:58 - Changing modify acls to: xxx
INFO SecurityManager:58 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(xxx); users with modify permissions: Set(xxx)
INFO Utils:58 - Successfully started service 'sparkDriver' on port xxxxx.
INFO Slf4jLogger:80 - Slf4jLogger started
INFO Remoting:74 - Starting remoting
INFO Remoting:74 - Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.xxx.0.xxx:xxxxx]
INFO Utils:58 - Successfully started service 'sparkDriverActorSystem' on port xxxxx.
INFO SparkEnv:58 - Registering MapOutputTracker
INFO SparkEnv:58 - Registering BlockManagerMaster
INFO DiskBlockManager:58 - Created local directory at /spark/blockmgr-xxx
INFO MemoryStore:58 - MemoryStore started with capacity 7.0 GB
INFO SparkEnv:58 - Registering OutputCommitCoordinator
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
...
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/api,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/static,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/executors/json,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/executors,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/environment/json,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/stages/json,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/stages,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/jobs,null}
Utils:70 - Service 'SparkUI' could not bind on port 3040. Attempting port xxx.
Server:272 - jetty-8.y.z-SNAPSHOT
AbstractConnector:338 - Started SxxxectChannxxxConnector@0.0.0.0:xxx
Utils:58 - Successfully started service 'SparkUI' on port xxx.
SparkUI:58 - Bound SparkUI to 0.0.0.0, and started at http://10.109.0.100:xxxx
HttpFileServer:58 - HTTP File server directory is /spark/spark-xxx/httpd-xxx
HttpServer:58 - Starting HTTP Server
Server:272 - jetty-8.y.z-SNAPSHOT
AbstractConnector:338 - Started SocketConnector@xxx
Utils:58 - Successfully started service 'HTTP file server' on port xxx.
SparkContext:58 - Added JAR file:/xxx.jar at http://10.108.0.xxxx:xxxxx/jars/xxx.jar with timestamp xxx
SparkContext:58 - Added JAR file:xxx.jar at http://10.108.0.xxx:xxxxx/jars/xxx.jar with timestamp xxx
spark.yarn.driver.memoryOverhead is set but does not apply in client mode.
AHSProxy:42 - Connecting to Application History server at xxx.bc/xxx
RequestHedgingRMFailoverProxyProvider:146 - Looking for the active RM in [rm1, rm2]...
RequestHedgingRMFailoverProxyProvider:170 - Found active RM [rm1]
Client:58 - Requesting a new application from cluster with 21 NodeManagers
Client:58 - Verifying our application has not requested more than the maximum memory capability of the cluster (614400 MB per container)
Client:58 - Will allocate AM container, with 896 MB memory including 384 MB overhead
Client:58 - Setting up container launch context for our AM
Client:58 - Setting up the launch environment for our AM container
Client:58 - Using the spark assembly jar on HDFS because you are using HDP, defaultSparkAssembly:hdfs://xxx/hdp/apps/2.6.5.0-292/spark/spark-hdp-assembly.jar
Client:58 - Preparing resources for our AM container
YarnSparkHadoopUtil:58 - getting token for namenode: hdfs://xxx/user/xxx/.sparkStaging/application_xxx
DFSClient:1052 - Created HDFS_DxxxEGATION_TOKEN token 29307570 for xxx on ha-hdfs:xxx
metastore:376 - Trying to connect to metastore with URI thrift://xxxxxx.bc:xxxx
metastore:472 - Connected to metastore.
RecoverableZooKeeper:120 - Process identifier=hconnection-xxxx connecting to ZooKeeper ensemble=xxx5176.bc:xxx,xxxxxx.bc:xxx,xxx5178.bc:xxx
ZooKeeper:100 - Client environment:zookeeper.version=3.4.6-xxx--1, built on 05/11/2018 06:40 GMT
ZooKeeper:100 - Client environment:host.name=Host.bc
ZooKeeper:100 - Client environment:java.version=1.8.0_262
ZooKeeper:100 - Client environment:java.vendor=Oracle Corporation
ZooKeeper:100 - Client environment:java.home=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.xxx.b10-0.xxx7_8.x86_64/jre
ZooKeeper:100 - Client environment:java.class.path=/usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar:/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/current/hbase-client/lib/hbase-hadoop-compat.jar:/usr/hdp/current/hbase-client/lib/hbase-hadoop2-compat.jar:/usr/hdp/current/phoenix-client/phoenix-cient.jar:/usr/hdp/current/phoenix-client/*.jar:/usr/hdp/current/phoenix-client/lib/*.jar:/usr/hdp/current/hive-client/lib/hive-hbase-handler-1.2.1000.2.6.3.0-235.jar:/usr/hdp/current/hive-client/lib/hive-hbase-handler.jar:/usr/hdp/current/spark-client/conf/:/usr/hdp/current/spark-client/lib/spark-assembly-1.6.3.2.6.5.0-292-hadoop2.7.3.2.6.5.0-292.jar:/usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar:/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar:/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar:/usr/hdp/2.6.5.0-292/hadoop/conf/:/usr/hdp/current/hadoop-client/lib/aws-java-sdk-s3-1.10.6.jar:/usr/hdp/current/hadoop-client/lib/aws-java-sdk-core-1.10.6.jar:/usr/hdp/current/hadoop-client/lib/aws-java-sdk-kms-1.10.6.jar
ZooKeeper:100 - Client environment:java.library.path=/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
ZooKeeper:100 - Client environment:java.io.tmpdir=/tmp
ZooKeeper:100 - Client environment:java.compiler=<NA>
ZooKeeper:100 - Client environment:os.name=Linux
ZooKeeper:100 - Client environment:os.arch=amd64
ZooKeeper:100 - Client environment:os.version=3.10.0-xxxx.19.1.xxx7.x86_64
ZooKeeper:100 - Client environment:user.name=xxx
ZooKeeper:100 - Client environment:user.home=/home/xxx
ZooKeeper:100 - Client environment:user.dir=/tmp/airflowtmphp8uukgh
ZooKeeper:438 - Initiating client connection, connectString=xxxx.bc:xxx,xxxxxx.bc:xxx,xxx.bc: sessionTimeout=180000 watcher=org.apache.hadoop.hbase.zookeeper.PendingWatcher@7af327e3
ClientCnxn:1019 - Opening socket connection to server xxxxxx.bc/10.xxx.0.98:xxx. Will not attempt to authenticate using SASL (unknown error)
ClientCnxn:864 - Socket connection established, initiating session, client: /10.xxx.0.100:xxxxx, server: xxxxxx.bc/xxx
ClientCnxn:1279 - Session establishment complete on server xxxxxx.bc/10.xxx.0.98:xxx, sessionid = xxx, negotiated timeout = 60000
ConnectionManager$HConnectionImplementation:1703 - Closing zookeeper sessionid=xxx
ZooKeeper:684 - Session: xxx closed
ClientCnxn:524 - EventThread shut down
YarnSparkHadoopUtil:58 - Added HBase security token to credentials.
Client:58 - Using the spark assembly jar on HDFS because you are using HDP, defaultSparkAssembly:hdfs://xxx/hdp/apps/2.6.5.0-292/spark/spark-hdp-assembly.jar
Client:58 - Source and destination file systems are the same. Not copying hdfs://xxx/hdp/apps/2.6.5.0-xxx/spark/spark-hdp-assembly.jar
Client:58 - Uploading resource file:/xxx/airflow/xxx.keytab -> hdfs://xxx/user/xxx/.sparkStaging/application_xxx/xxx.keytab
Client:58 - Uploading resource file:/xxx/ux_source/import/conf/file1-log4j.properties -> hdfs://xxx/user/xxx/.sparkStaging/application_xxx/file1-log4j.properties
Client:58 - Uploading resource file:/spark/spark-xxxxx/__spark_conf__xxxx.zip -> hdfs://xxx/user/xxx/.sparkStaging/application_xxx/__spark_conf__xxxxx.zip
SecurityManager:58 - Changing view acls to: xxx
SecurityManager:58 - Changing modify acls to: xxx
SecurityManager:58 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(xxx); users with modify permissions: Set(xxx)
Client:58 - Submitting application 8168 to ResourceManager
TimxxxineClientImpl:302 - Timxxxine service address: http://xxx/ws/v1/timxxxine/
YarnClientImpl:274 - Submitted application application_xxx
SchedulerExtensionServices:58 - Starting Yarn extension services with app application_xxx and attemptId None
Client:58 - Application report for application_xxx (state: ACCEPTED)
Client:58 -
client token: Token { kind: YARN_CLIENT_TOKEN, service: }
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: DAILY
start time: 1601568091265
final status: UNDEFINED
tracking URL: http://xxx5176.bc:8088/proxy/application_xxx/
user: xxx
Client:58 - Application report for application_xxx (state: ACCEPTED)
Client:58 - Application report for application_xxx (state: ACCEPTED)
Client:58 - Application report for application_xxx (state: ACCEPTED)
YarnSchedulerBackend$YarnSchedulerEndpoint:58 - ApplicationMaster registered as NettyRpcEndpointRef(null)
YarnClientSchedulerBackend:58 - Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> xxx5176.bc,xxxxxx.bc, PROXY_URI_BASES -> http://xxx5176.bc:8088/proxy/application_xxx,http://xxxxxx.bc:8088/proxy/application_xxx), /proxy/application_xxx
JettyUtils:58 - Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
Client:58 - Application report for application_xxx (state: RUNNING)
Client:58 -
client token: Token { kind: YARN_CLIENT_TOKEN, service: }
diagnostics: N/A
ApplicationMaster host: 10.xxx.0.xxx
ApplicationMaster RPC port: 0
queue: DAILY
start time: 1601568091265
final status: UNDEFINED
tracking URL: http://xxx5176.bc:8088/proxy/application_xxx/
user: xxx
YarnClientSchedulerBackend:58 - Application application_xxx has started running.
Utils:58 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port xxxxx.
NettyBlockTransferService:58 - Server created on xxxxx
BlockManagerMaster:58 - Trying to register BlockManager
BlockManagerMasterEndpoint:58 - Registering block manager 10.xx.0.xx:xxxxx with 7.0 GB RAM, BlockManagerId(driver, 10.xx.0.xxx, xxxxx)
BlockManagerMaster:58 - Registered BlockManager
EventLoggingListener:58 - Logging events to hdfs:///spark-history/application_xxx
YarnClientSchedulerBackend:58 - Registered executor NettyRpcEndpointRef(null) (xx.bc:xx) with ID 5
YarnClientSchedulerBackend:58 - Registered executor NettyRpcEndpointRef(null) (xx.bc:xx) with ID 1
BlockManagerMasterEndpoint:58 - Registering block manager xx.bc:xx with 7.0 GB RAM, BlockManagerId(5, xx.bc, xx)
BlockManagerMasterEndpoint:58 - Registering block manager xx.bc:xx with 7.0 GB RAM, BlockManagerId(1, xx.bc, xx)
YarnClientSchedulerBackend:58 - Registered executor NettyRpcEndpointRef(null) (xxx.bc:xx) with ID 4
BlockManagerMasterEndpoint:58 - Registering block manager xxx.bc:xxx with 7.0 GB RAM, BlockManagerId(4, xxx.bc, xxx)
YarnClientSchedulerBackend:58 - Registered executor NettyRpcEndpointRef(null) (xxx.bc:xx) with ID 2
BlockManagerMasterEndpoint:58 - Registering block manager xx.bc:xx with 7.0 GB RAM, BlockManagerId(2, xxx.bc, xxx)
YarnClientSchedulerBackend:58 - SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
YarnClientSchedulerBackend:58 - Registered executor NettyRpcEndpointRef(null) (xxx.bc:xxx) with ID 3
BlockManagerMasterEndpoint:58 - Registering block manager xxx.bc:xxx with 7.0 GB RAM, BlockManagerId(3, xxx.bc, xxx)
HiveContext:58 - Initializing execution hive, version 1.2.1
ClientWrapper:58 - Inspected Hadoop version: 2.7.3.2.6.5.0-292
ClientWrapper:58 - Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.7.3.2.6.5.0-292
HiveMetaStore:589 - 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
ObjectStore:289 - ObjectStore, initialize called
Persistence:77 - Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
Persistence:77 - Property datanucleus.cache.levxxx2 unknown - will be ignored
ObjectStore:370 - Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FixxxdSchema,Order"
Datastore:77 - The class "org.apache.hadoop.hive.metastore.modxxx.MFixxxdSchema" is tagged as "embedded-only" so does not have its own datastore table.
Datastore:77 - The class "org.apache.hadoop.hive.metastore.modxxx.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
Datastore:77 - The class "org.apache.hadoop.hive.metastore.modxxx.MFixxxdSchema" is tagged as "embedded-only" so does not have its own datastore table.
Datastore:77 - The class "org.apache.hadoop.hive.metastore.modxxx.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
MetaStoreDirectSql:139 - Using direct SQL, underlying DB is DERBY
ObjectStore:272 - Initialized ObjectStore
ObjectStore:6666 - Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
ObjectStore:568 - Failed to get database default, returning NoSuchObjectException
HiveMetaStore:663 - Added admin role in metastore
HiveMetaStore:672 - Added public role in metastore
HiveMetaStore:712 - No user is added in admin role, since config is empty
SessionState:641 - Created local directory: /tmp/xxx_resources
SessionState:641 - Created HDFS directory: /tmp/hive/xxx/xxx
SessionState:641 - Created local directory: /tmp/xxx/xxx
SessionState:641 - Created HDFS directory: /tmp/hive/xxx/xxx/_tmp_space.db
HiveContext:58 - default warehouse location is /user/hive/warehouse
HiveContext:58 - Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
ClientWrapper:58 - Inspected Hadoop version: 2.7.3.2.6.5.0-292
ClientWrapper:58 - Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.7.3.2.6.5.0-292
metastore:376 - Trying to connect to metastore with URI thrift://xxxxxx.bc:9xxx
metastore:472 - Connected to metastore.
SessionState:641 - Created local directory: /tmp/xxx
SessionState:641 - Created HDFS directory: /tmp/hive/xxx/xxx
SessionState:641 - Created local directory: /tmp/xxx/xxx
SessionState:641 - Created HDFS directory: /tmp/hive/xxx/xxx/_tmp_space.db
file1$:1212 - Amount of files to be processed: 2
file1$:1215 - Files to be processed;
file.csv, file1.csv
file1$:1220 - Start processing file;file.csv
MemoryStore:58 - Block broadcast_0 stored as values in memory (estimated size 379.1 KB, free 7.0 GB)
MemoryStore:58 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 33.6 KB, free 7.0 GB)
BlockManagerInfo:58 - Added broadcast_0_piece0 in memory on 10.108.0.100:45893 (size: 33.6 KB, free: 7.0 GB)
SparkContext:58 - Created broadcast 0 from textFile at MainRevamp.scala:1227
deprecation:1261 - mapred.job.id is deprecated. Instead, use mapreduce.job.id
deprecation:1261 - mapred.tip.id is deprecated. Instead, use mapreduce.task.id
deprecation:1261 - mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
deprecation:1261 - mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
deprecation:1261 - mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
FileOutputCommitter:123 - File Output Committer Algorithm version is 1
FileOutputCommitter:138 - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
DefaultWriterContainer:58 - Using output committer class org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
DFSClient:1052 - Created HDFS_DxxxEGATION_TOKEN token 29307571 for xxx on ha-hdfs:xxx
TokenCache:144 - Got dt for hdfs://xxx; Kind: HDFS_DxxxEGATION_TOKEN, Service: ha-hdfs:xxx, Ident: (HDFS_DxxxEGATION_TOKEN token 29307571 for xxx)
FileInputFormat:249 - Total input paths to process : 1
SparkContext:58 - Starting job: save at MainRevamp.scala:1248
DAGScheduler:58 - Got job 0 (save at MainRevamp.scala:1248) with 1 output partitions
DAGScheduler:58 - Final stage: ResultStage 0 (save at MainRevamp.scala:1248)
DAGScheduler:58 - Parents of final stage: List()
DAGScheduler:58 - Missing parents: List()
DAGScheduler:58 - Submitting ResultStage 0 (MapPartitionsRDD[4] at createDataFrame at MainRevamp.scala:1242), which has no missing parents
MemoryStore:58 - Block broadcast_1 stored as values in memory (estimated size 104.3 KB, free 7.0 GB)
MemoryStore:58 - Block broadcast_1_piece0 stored as bytes in memory (estimated size 39.1 KB, free 7.0 GB)
BlockManagerInfo:58 - Added broadcast_1_piece0 in memory on 10.108.0.100:45893 (size: 39.1 KB, free: 7.0 GB)
SparkContext:58 - Created broadcast 1 from broadcast at DAGScheduler.scala:1008
DAGScheduler:58 - Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[4] at createDataFrame at MainRevamp.scala:1242)
YarnScheduler:58 - Adding task set 0.0 with 1 tasks
TaskSetManager:58 - Starting task 0.0 in stage 0.0 (TID 0, xxx.bc, partition 0,RACK_LOCAL, 2349 bytes)
BlockManagerInfo:58 - Added broadcast_1_piece0 in memory on xxx.bc:463 (size: 39.1 KB, free: 7.0 GB)
BlockManagerInfo:58 - Added broadcast_0_piece0 in memory on xxx.bc:463 (size: 33.6 KB, free: 7.0 GB)
TaskSetManager:58 - Finished task 0.0 in stage 0.0 (TID 0) in 10579 ms on xxx.bc (1/1)
YarnScheduler:58 - Removed TaskSet 0.0, whose tasks have all completed, from pool
DAGScheduler:58 - ResultStage 0 (save at MainRevamp.scala:1248) finished in 10.585 s
DAGScheduler:58 - Job 0 finished: save at MainRevamp.scala:1248, took 10.766074 s
DefaultWriterContainer:58 - Job job_202010011801_0000 committed.
OrcRxxxation:58 - Listing hdfs://xxx/apps/hive/warehouse/xxx/thedate=2020-09-30 on driver
OrcRxxxation:58 - Listing hdfs://xxx/apps/hive/warehouse/xxx/thedate=2020-09-30 on driver
ParseDriver:185 - Parsing command: ALTER TABLE tablename ADD IF NOT EXISTS PARTITION(thedate='2020-09-30')
ParseDriver:209 - Parse Completed
PerfLogger:121 - <PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:121 - <PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:121 - <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:121 - <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>
ParseDriver:185 - Parsing command: ALTER TABLE tablename ADD IF NOT EXISTS PARTITION(thedate='2020-09-30')
ParseDriver:209 - Parse Completed
PerfLogger:148 - </PERFLOG method=parse start=xxx end=xxx duration=1011 from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:121 - <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
Driver:436 - Semantic Analysis Completed
PerfLogger:148 - </PERFLOG method=semanticAnalyze start=xxx end=xxx duration=185 from=org.apache.hadoop.hive.ql.Driver>
Driver:240 - Returning Hive schema: Schema(fixxxdSchemas:null, properties:null)
PerfLogger:148 - </PERFLOG method=compile start=xxx end=xxx duration=1237 from=org.apache.hadoop.hive.ql.Driver>
Driver:160 - Concurrency mode is disabled, not creating a lock manager
PerfLogger:121 - <PERFLOG method=Driver.execute from=org.apache.hadoop.hive.ql.Driver>
Driver:1328 - Starting command(queryId=xxx): ALTER TABLE tablename ADD IF NOT EXISTS PARTITION(thedate='2020-09-30')
PerfLogger:148 - </PERFLOG method=TimeToSubmit start=xxx end=xxx duration=1244 from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:121 - <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:121 - <PERFLOG method=task.DDL.Stage-0 from=org.apache.hadoop.hive.ql.Driver>
Driver:1651 - Starting task [Stage-0:DDL] in serial mode
PerfLogger:148 - </PERFLOG method=runTasks start=xxx end=xxx duration=63 from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:148 - </PERFLOG method=Driver.execute start=xxx end=xxx duration=70 from=org.apache.hadoop.hive.ql.Driver>
Driver:951 - OK
PerfLogger:121 - <PERFLOG method=rxxxeasxxxocks from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:148 - </PERFLOG method=rxxxeasxxxocks start=xxx end=xxx duration=0 from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:148 - </PERFLOG method=Driver.run start=xxx end=xxx duration=1308 from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:121 - <PERFLOG method=rxxxeasxxxocks from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:148 - </PERFLOG method=rxxxeasxxxocks start=xxx end=xxx duration=0 from=org.apache.hadoop.hive.ql.Driver>
file1$:1259 - Filefile.csv is processed and all data has been inserted into Hive
file1$:1261 - Filefile.csv has been moved to the /completed directory
file1$:1220 - Start processing file; file.csv
MemoryStore:58 - Block broadcast_2 stored as values in memory (estimated size 379.1 KB, free 7.0 GB)
MemoryStore:58 - Block broadcast_2_piece0 stored as bytes in memory (estimated size 33.6 KB, free 7.0 GB)
BlockManagerInfo:58 - Added broadcast_2_piece0 in memory on xxx(size: 33.6 KB, free: 7.0 GB)
SparkContext:58 - Created broadcast 2 from textFile at MainRevamp.scala:1227
FileOutputCommitter:123 - File Output Committer Algorithm version is 1
FileOutputCommitter:138 - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
DefaultWriterContainer:58 - Using output committer class org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
DFSClient:1052 - Created HDFS_DxxxEGATION_TOKEN token 29307572 for xxx on ha-hdfs:xxx
TokenCache:144 - Got dt for hdfs://xxx; Kind: HDFS_DxxxEGATION_TOKEN, Service: ha-hdfs:xxx, Ident: (HDFS_DxxxEGATION_TOKEN token xxx for xxx)
FileInputFormat:249 - Total input paths to process : 1
SparkContext:58 - Starting job: save at MainRevamp.scala:1248
DAGScheduler:58 - Got job 1 (save at MainRevamp.scala:1248) with 1 output partitions
DAGScheduler:58 - Final stage: ResultStage 1 (save at MainRevamp.scala:1248)
DAGScheduler:58 - Parents of final stage: List()
DAGScheduler:58 - Missing parents: List()
DAGScheduler:58 - Submitting ResultStage 1 (MapPartitionsRDD[11] at createDataFrame at MainRevamp.scala:1242), which has no missing parents
MemoryStore:58 - Block broadcast_3 stored as values in memory (estimated size 104.3 KB, free 7.0 GB)
MemoryStore:58 - Block broadcast_3_piece0 stored as bytes in memory (estimated size 39.1 KB, free 7.0 GB)
BlockManagerInfo:58 - Added broadcast_3_piece0 in memory on 10.xx.0.xx:xxxxx (size: 39.1 KB, free: 7.0 GB)
SparkContext:58 - Created broadcast 3 from broadcast at DAGScheduler.scala:1008
DAGScheduler:58 - Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[11] at createDataFrame at MainRevamp.scala:1242)
YarnScheduler:58 - Adding task set 1.0 with 1 tasks
TaskSetManager:58 - Starting task 0.0 in stage 1.0 (TID 1, xxx.bc, partition 0,RACK_LOCAL, 2349 bytes)
BlockManagerInfo:58 - Added broadcast_3_piece0 in memory on xxx.bc:xxx (size: 39.1 KB, free: 7.0 GB)

1 REPLY

New Contributor

Hello,

I keep getting the same error as shown in the log above. Can you help me with this?

 

Best regards.