Support Questions

debbiecortez · 10-09-2020

I have a shell script that moves a CSV file from HDFS to Hive. Airflow has two DAGs:

dag 1: a BashOperator with a shell script that checks whether the file exists in HDFS
dag 2: a BashOperator with a shell script that runs a Spark job to read the file from HDFS and load it into Hive

When the Airflow DAG was triggered, dag 1 succeeded but dag 2 failed; it couldn't connect to YARN. When I run the dag 2 script from PuTTY, the file is moved, but when it is launched from Airflow the job fails. I looked through the log but couldn't find any errors. Please find the log below. Could you help me understand the problem?
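
For anyone scanning a log like the one below, a small filter can help separate the few suspicious lines from the INFO noise. This is only a sketch: the `SUSPECT` pattern and the `suspicious_lines` helper are made-up names, the pattern is a guess at what is worth looking at, and the sample text is copied from a few lines of the log in this post.

```python
import re

# Heuristic pattern (an assumption, tune as needed): log levels above INFO,
# Java stack-trace frames, and a few shell-level complaints seen in this log.
SUSPECT = re.compile(
    r"\b(?:WARN|ERROR|FATAL)\b|Exception"
    r"|^\s*at [\w.$]+\("
    r"|^which: no |missing operand|could not"
)


def suspicious_lines(log_text):
    """Return (line_number, text) pairs for lines matching SUSPECT."""
    return [
        (number, line)
        for number, line in enumerate(log_text.splitlines(), start=1)
        if SUSPECT.search(line)
    ]


# A few lines copied from the task output in this post.
sample = """\
which: no /usr/hdp/2.x.x.0-xxx//hadoop/bin/hadoop.distro in ((null))
dirname: missing operand
INFO SparkContext:58 - Running Spark version 1.6.3
INFO SparkEnv:58 - Registering OutputCommitCoordinator
at sun.nio.ch.Net.bind0(Native Method)
Utils:70 - Service 'SparkUI' could not bind on port 3040. Attempting port xxx.
"""

for number, line in suspicious_lines(sample):
    print(number, line)
```

A match only means the line deserves a look. For example, the SparkUI bind message is followed by a successful start on another port in the same log.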

{taskinstance.py:887} INFO - Executing <Task(BashOperator): start_insertfile> on xxx
{standard_task_runner.py:53} INFO - Started process 6927 to run task
{logging_mixin.py:112} INFO - Running %s on host %s <TaskInstance: import-file1.start_file1 xxx [running]> Host.bc
{bash_operator.py:82} INFO - Tmp dir root location:
/tmp
Temporary script location: /tmp/airflowtmp/filexxx
Running command: xxxx.sh
Output:
which: no /usr/hdp/2.x.x.0-xxx//hadoop/bin/hadoop.distro in ((null))
dirname: missing operand
Try 'dirname --help' for more information.
xx_2020_09_30.csv xx.csv
INFO SparkContext:58 - Running Spark version 1.6.3
WARN SparkConf:70 - In Spark 1.0 and later spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN).
WARN SparkConf:70 -
SPARK_CLASSPATH was detected (set to '/usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar:/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/current/hbase-client/lib/hbase-hadoop-compat.jar:/usr/hdp/current/hbase-client/lib/hbase-hadoop2-compat.jar:/usr/hdp/current/phoenix-client/phoenix-cient.jar:/usr/hdp/current/phoenix-client/*.jar:/usr/hdp/current/phoenix-client/lib/*.jar:/usr/hdp/current/hive-client/lib/hive-hbase-handler-1.2.1000.2.6.3.0-235.jar:/usr/hdp/current/hive-client/lib/hive-hbase-handler.jar').
This is deprecated in Spark 1.0+.
Please instead use:
- ./spark-submit with --driver-class-path to augment the driver classpath
- spark.executor.extraClassPath to augment the executor classpath

WARN SparkConf:70 - Setting 'spark.executor.extraClassPath' to '/usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar:/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/current/hbase-client/lib/hbase-hadoop-compat.jar:/usr/hdp/current/hbase-client/lib/hbase-hadoop2-compat.jar:/usr/hdp/current/phoenix-client/phoenix-cient.jar:/usr/hdp/current/phoenix-client/*.jar:/usr/hdp/current/phoenix-client/lib/*.jar:/usr/hdp/current/hive-client/lib/hive-hbase-handler-1.2.1000.2.6.3.0-235.jar:/usr/hdp/current/hive-client/lib/hive-hbase-handler.jar' as a work-around.
WARN SparkConf:70 - Setting 'spark.driver.extraClassPath' to '/usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar:/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/current/hbase-client/lib/hbase-hadoop-compat.jar:/usr/hdp/current/hbase-client/lib/hbase-hadoop2-compat.jar:/usr/hdp/current/phoenix-client/phoenix-cient.jar:/usr/hdp/current/phoenix-client/*.jar:/usr/hdp/current/phoenix-client/lib/*.jar:/usr/hdp/current/hive-client/lib/hive-hbase-handler-1.2.1000.2.6.3.0-235.jar:/usr/hdp/current/hive-client/lib/hive-hbase-handler.jar' as a work-around.
INFO SecurityManager:58 - Changing view acls to: xxx
INFO SecurityManager:58 - Changing modify acls to: xxx
INFO SecurityManager:58 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(xxx); users with modify permissions: Set(xxx)
INFO Utils:58 - Successfully started service 'sparkDriver' on port xxxxx.
INFO Slf4jLogger:80 - Slf4jLogger started
INFO Remoting:74 - Starting remoting
INFO Remoting:74 - Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.xxx.0.xxx:xxxxx]
INFO Utils:58 - Successfully started service 'sparkDriverActorSystem' on port xxxxx.
INFO SparkEnv:58 - Registering MapOutputTracker
INFO SparkEnv:58 - Registering BlockManagerMaster
INFO DiskBlockManager:58 - Created local directory at /spark/blockmgr-xxx
INFO MemoryStore:58 - MemoryStore started with capacity 7.0 GB
INFO SparkEnv:58 - Registering OutputCommitCoordinator
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
...
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/api,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/static,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/executors/json,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/executors,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/environment/json,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/stages/json,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/stages,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/jobs,null}
Utils:70 - Service 'SparkUI' could not bind on port 3040. Attempting port xxx.
Server:272 - jetty-8.y.z-SNAPSHOT
AbstractConnector:338 - Started SelectChannelConnector@0.0.0.0:xxx
Utils:58 - Successfully started service 'SparkUI' on port xxx.
SparkUI:58 - Bound SparkUI to 0.0.0.0, and started at http://10.109.0.100:xxxx
HttpFileServer:58 - HTTP File server directory is /spark/spark-xxx/httpd-xxx
HttpServer:58 - Starting HTTP Server
Server:272 - jetty-8.y.z-SNAPSHOT
AbstractConnector:338 - Started SocketConnector@xxx
Utils:58 - Successfully started service 'HTTP file server' on port xxx.
SparkContext:58 - Added JAR file:/xxx.jar at http://10.108.0.xxxx:xxxxx/jars/xxx.jar with timestamp xxx
SparkContext:58 - Added JAR file:xxx.jar at http://10.108.0.xxx:xxxxx/jars/xxx.jar with timestamp xxx
spark.yarn.driver.memoryOverhead is set but does not apply in client mode.
AHSProxy:42 - Connecting to Application History server at xxx.bc/xxx
RequestHedgingRMFailoverProxyProvider:146 - Looking for the active RM in [rm1, rm2]...
RequestHedgingRMFailoverProxyProvider:170 - Found active RM [rm1]
Client:58 - Requesting a new application from cluster with 21 NodeManagers
Client:58 - Verifying our application has not requested more than the maximum memory capability of the cluster (614400 MB per container)
Client:58 - Will allocate AM container, with 896 MB memory including 384 MB overhead
Client:58 - Setting up container launch context for our AM
Client:58 - Setting up the launch environment for our AM container
Client:58 - Using the spark assembly jar on HDFS because you are using HDP, defaultSparkAssembly:hdfs://xxx/hdp/apps/2.6.5.0-292/spark/spark-hdp-assembly.jar
Client:58 - Preparing resources for our AM container
YarnSparkHadoopUtil:58 - getting token for namenode: hdfs://xxx/user/xxx/.sparkStaging/application_xxx
DFSClient:1052 - Created HDFS_DELEGATION_TOKEN token 29307570 for xxx on ha-hdfs:xxx
metastore:376 - Trying to connect to metastore with URI thrift://xxxxxx.bc:xxxx
metastore:472 - Connected to metastore.
RecoverableZooKeeper:120 - Process identifier=hconnection-xxxx connecting to ZooKeeper ensemble=xxx5176.bc:xxx,xxxxxx.bc:xxx,xxx5178.bc:xxx
ZooKeeper:100 - Client environment:zookeeper.version=3.4.6-xxx--1, built on 05/11/2018 06:40 GMT
ZooKeeper:100 - Client environment:host.name=Host.bc
ZooKeeper:100 - Client environment:java.version=1.8.0_262
ZooKeeper:100 - Client environment:java.vendor=Oracle Corporation
ZooKeeper:100 - Client environment:java.home=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.xxx.b10-0.xxx7_8.x86_64/jre
ZooKeeper:100 - Client environment:java.class.path=/usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar:/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/current/hbase-client/lib/hbase-hadoop-compat.jar:/usr/hdp/current/hbase-client/lib/hbase-hadoop2-compat.jar:/usr/hdp/current/phoenix-client/phoenix-cient.jar:/usr/hdp/current/phoenix-client/*.jar:/usr/hdp/current/phoenix-client/lib/*.jar:/usr/hdp/current/hive-client/lib/hive-hbase-handler-1.2.1000.2.6.3.0-235.jar:/usr/hdp/current/hive-client/lib/hive-hbase-handler.jar:/usr/hdp/current/spark-client/conf/:/usr/hdp/current/spark-client/lib/spark-assembly-1.6.3.2.6.5.0-292-hadoop2.7.3.2.6.5.0-292.jar:/usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar:/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar:/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar:/usr/hdp/2.6.5.0-292/hadoop/conf/:/usr/hdp/current/hadoop-client/lib/aws-java-sdk-s3-1.10.6.jar:/usr/hdp/current/hadoop-client/lib/aws-java-sdk-core-1.10.6.jar:/usr/hdp/current/hadoop-client/lib/aws-java-sdk-kms-1.10.6.jar
ZooKeeper:100 - Client environment:java.library.path=/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
ZooKeeper:100 - Client environment:java.io.tmpdir=/tmp
ZooKeeper:100 - Client environment:java.compiler=<NA>
ZooKeeper:100 - Client environment:os.name=Linux
ZooKeeper:100 - Client environment:os.arch=amd64
ZooKeeper:100 - Client environment:os.version=3.10.0-xxxx.19.1.xxx7.x86_64
ZooKeeper:100 - Client environment:user.name=xxx
ZooKeeper:100 - Client environment:user.home=/home/xxx
ZooKeeper:100 - Client environment:user.dir=/tmp/airflowtmphp8uukgh
ZooKeeper:438 - Initiating client connection, connectString=xxxx.bc:xxx,xxxxxx.bc:xxx,xxx.bc: sessionTimeout=180000 watcher=org.apache.hadoop.hbase.zookeeper.PendingWatcher@7af327e3
ClientCnxn:1019 - Opening socket connection to server xxxxxx.bc/10.xxx.0.98:xxx. Will not attempt to authenticate using SASL (unknown error)
ClientCnxn:864 - Socket connection established, initiating session, client: /10.xxx.0.100:xxxxx, server: xxxxxx.bc/xxx
ClientCnxn:1279 - Session establishment complete on server xxxxxx.bc/10.xxx.0.98:xxx, sessionid = xxx, negotiated timeout = 60000
ConnectionManager$HConnectionImplementation:1703 - Closing zookeeper sessionid=xxx
ZooKeeper:684 - Session: xxx closed
ClientCnxn:524 - EventThread shut down
YarnSparkHadoopUtil:58 - Added HBase security token to credentials.
Client:58 - Using the spark assembly jar on HDFS because you are using HDP, defaultSparkAssembly:hdfs://xxx/hdp/apps/2.6.5.0-292/spark/spark-hdp-assembly.jar
Client:58 - Source and destination file systems are the same. Not copying hdfs://xxx/hdp/apps/2.6.5.0-xxx/spark/spark-hdp-assembly.jar
Client:58 - Uploading resource file:/xxx/airflow/xxx.keytab -> hdfs://xxx/user/xxx/.sparkStaging/application_xxx/xxx.keytab
Client:58 - Uploading resource file:/xxx/ux_source/import/conf/file1-log4j.properties -> hdfs://xxx/user/xxx/.sparkStaging/application_xxx/file1-log4j.properties
Client:58 - Uploading resource file:/spark/spark-xxxxx/__spark_conf__xxxx.zip -> hdfs://xxx/user/xxx/.sparkStaging/application_xxx/__spark_conf__xxxxx.zip
SecurityManager:58 - Changing view acls to: xxx
SecurityManager:58 - Changing modify acls to: xxx
SecurityManager:58 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(xxx); users with modify permissions: Set(xxx)
Client:58 - Submitting application 8168 to ResourceManager
TimelineClientImpl:302 - Timeline service address: http://xxx/ws/v1/timeline/
YarnClientImpl:274 - Submitted application application_xxx
SchedulerExtensionServices:58 - Starting Yarn extension services with app application_xxx and attemptId None
Client:58 - Application report for application_xxx (state: ACCEPTED)
Client:58 -
client token: Token { kind: YARN_CLIENT_TOKEN, service: }
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: DAILY
start time: 1601568091265
final status: UNDEFINED
tracking URL: http://xxx5176.bc:8088/proxy/application_xxx/
user: xxx
Client:58 - Application report for application_xxx (state: ACCEPTED)
Client:58 - Application report for application_xxx (state: ACCEPTED)
Client:58 - Application report for application_xxx (state: ACCEPTED)
YarnSchedulerBackend$YarnSchedulerEndpoint:58 - ApplicationMaster registered as NettyRpcEndpointRef(null)
YarnClientSchedulerBackend:58 - Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> xxx5176.bc,xxxxxx.bc, PROXY_URI_BASES -> http://xxx5176.bc:8088/proxy/application_xxx,http://xxxxxx.bc:8088/proxy/application_xxx), /proxy/application_xxx
JettyUtils:58 - Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
Client:58 - Application report for application_xxx (state: RUNNING)
Client:58 -
client token: Token { kind: YARN_CLIENT_TOKEN, service: }
diagnostics: N/A
ApplicationMaster host: 10.xxx.0.xxx
ApplicationMaster RPC port: 0
queue: DAILY
start time: 1601568091265
final status: UNDEFINED
tracking URL: http://xxx5176.bc:8088/proxy/application_xxx/
user: xxx
YarnClientSchedulerBackend:58 - Application application_xxx has started running.
Utils:58 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port xxxxx.
NettyBlockTransferService:58 - Server created on xxxxx
BlockManagerMaster:58 - Trying to register BlockManager
BlockManagerMasterEndpoint:58 - Registering block manager 10.xx.0.xx:xxxxx with 7.0 GB RAM, BlockManagerId(driver, 10.xx.0.xxx, xxxxx)
BlockManagerMaster:58 - Registered BlockManager
EventLoggingListener:58 - Logging events to hdfs:///spark-history/application_xxx
YarnClientSchedulerBackend:58 - Registered executor NettyRpcEndpointRef(null) (xx.bc:xx) with ID 5
YarnClientSchedulerBackend:58 - Registered executor NettyRpcEndpointRef(null) (xx.bc:xx) with ID 1
BlockManagerMasterEndpoint:58 - Registering block manager xx.bc:xx with 7.0 GB RAM, BlockManagerId(5, xx.bc, xx)
BlockManagerMasterEndpoint:58 - Registering block manager xx.bc:xx with 7.0 GB RAM, BlockManagerId(1, xx.bc, xx)
YarnClientSchedulerBackend:58 - Registered executor NettyRpcEndpointRef(null) (xxx.bc:xx) with ID 4
BlockManagerMasterEndpoint:58 - Registering block manager xxx.bc:xxx with 7.0 GB RAM, BlockManagerId(4, xxx.bc, xxx)
YarnClientSchedulerBackend:58 - Registered executor NettyRpcEndpointRef(null) (xxx.bc:xx) with ID 2
BlockManagerMasterEndpoint:58 - Registering block manager xx.bc:xx with 7.0 GB RAM, BlockManagerId(2, xxx.bc, xxx)
YarnClientSchedulerBackend:58 - SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
YarnClientSchedulerBackend:58 - Registered executor NettyRpcEndpointRef(null) (xxx.bc:xxx) with ID 3
BlockManagerMasterEndpoint:58 - Registering block manager xxx.bc:xxx with 7.0 GB RAM, BlockManagerId(3, xxx.bc, xxx)
HiveContext:58 - Initializing execution hive, version 1.2.1
ClientWrapper:58 - Inspected Hadoop version: 2.7.3.2.6.5.0-292
ClientWrapper:58 - Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.7.3.2.6.5.0-292
HiveMetaStore:589 - 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
ObjectStore:289 - ObjectStore, initialize called
Persistence:77 - Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
Persistence:77 - Property datanucleus.cache.level2 unknown - will be ignored
ObjectStore:370 - Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
Datastore:77 - The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
Datastore:77 - The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
Datastore:77 - The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
Datastore:77 - The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
MetaStoreDirectSql:139 - Using direct SQL, underlying DB is DERBY
ObjectStore:272 - Initialized ObjectStore
ObjectStore:6666 - Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
ObjectStore:568 - Failed to get database default, returning NoSuchObjectException
HiveMetaStore:663 - Added admin role in metastore
HiveMetaStore:672 - Added public role in metastore
HiveMetaStore:712 - No user is added in admin role, since config is empty
SessionState:641 - Created local directory: /tmp/xxx_resources
SessionState:641 - Created HDFS directory: /tmp/hive/xxx/xxx
SessionState:641 - Created local directory: /tmp/xxx/xxx
SessionState:641 - Created HDFS directory: /tmp/hive/xxx/xxx/_tmp_space.db
HiveContext:58 - default warehouse location is /user/hive/warehouse
HiveContext:58 - Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
ClientWrapper:58 - Inspected Hadoop version: 2.7.3.2.6.5.0-292
ClientWrapper:58 - Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.7.3.2.6.5.0-292
metastore:376 - Trying to connect to metastore with URI thrift://xxxxxx.bc:9xxx
metastore:472 - Connected to metastore.
SessionState:641 - Created local directory: /tmp/xxx
SessionState:641 - Created HDFS directory: /tmp/hive/xxx/xxx
SessionState:641 - Created local directory: /tmp/xxx/xxx
SessionState:641 - Created HDFS directory: /tmp/hive/xxx/xxx/_tmp_space.db
file1$:1212 - Amount of files to be processed: 2
file1$:1215 - Files to be processed;
file.csv, file1.csv
file1$:1220 - Start processing file;file.csv
MemoryStore:58 - Block broadcast_0 stored as values in memory (estimated size 379.1 KB, free 7.0 GB)
MemoryStore:58 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 33.6 KB, free 7.0 GB)
BlockManagerInfo:58 - Added broadcast_0_piece0 in memory on 10.108.0.100:45893 (size: 33.6 KB, free: 7.0 GB)
SparkContext:58 - Created broadcast 0 from textFile at MainRevamp.scala:1227
deprecation:1261 - mapred.job.id is deprecated. Instead, use mapreduce.job.id
deprecation:1261 - mapred.tip.id is deprecated. Instead, use mapreduce.task.id
deprecation:1261 - mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
deprecation:1261 - mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
deprecation:1261 - mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
FileOutputCommitter:123 - File Output Committer Algorithm version is 1
FileOutputCommitter:138 - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
DefaultWriterContainer:58 - Using output committer class org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
DFSClient:1052 - Created HDFS_DELEGATION_TOKEN token 29307571 for xxx on ha-hdfs:xxx
TokenCache:144 - Got dt for hdfs://xxx; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:xxx, Ident: (HDFS_DELEGATION_TOKEN token 29307571 for xxx)
FileInputFormat:249 - Total input paths to process : 1
SparkContext:58 - Starting job: save at MainRevamp.scala:1248
DAGScheduler:58 - Got job 0 (save at MainRevamp.scala:1248) with 1 output partitions
DAGScheduler:58 - Final stage: ResultStage 0 (save at MainRevamp.scala:1248)
DAGScheduler:58 - Parents of final stage: List()
DAGScheduler:58 - Missing parents: List()
DAGScheduler:58 - Submitting ResultStage 0 (MapPartitionsRDD[4] at createDataFrame at MainRevamp.scala:1242), which has no missing parents
MemoryStore:58 - Block broadcast_1 stored as values in memory (estimated size 104.3 KB, free 7.0 GB)
MemoryStore:58 - Block broadcast_1_piece0 stored as bytes in memory (estimated size 39.1 KB, free 7.0 GB)
BlockManagerInfo:58 - Added broadcast_1_piece0 in memory on 10.108.0.100:45893 (size: 39.1 KB, free: 7.0 GB)
SparkContext:58 - Created broadcast 1 from broadcast at DAGScheduler.scala:1008
DAGScheduler:58 - Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[4] at createDataFrame at MainRevamp.scala:1242)
YarnScheduler:58 - Adding task set 0.0 with 1 tasks
TaskSetManager:58 - Starting task 0.0 in stage 0.0 (TID 0, xxx.bc, partition 0,RACK_LOCAL, 2349 bytes)
BlockManagerInfo:58 - Added broadcast_1_piece0 in memory on xxx.bc:463 (size: 39.1 KB, free: 7.0 GB)
BlockManagerInfo:58 - Added broadcast_0_piece0 in memory on xxx.bc:463 (size: 33.6 KB, free: 7.0 GB)
TaskSetManager:58 - Finished task 0.0 in stage 0.0 (TID 0) in 10579 ms on xxx.bc (1/1)
YarnScheduler:58 - Removed TaskSet 0.0, whose tasks have all completed, from pool
DAGScheduler:58 - ResultStage 0 (save at MainRevamp.scala:1248) finished in 10.585 s
DAGScheduler:58 - Job 0 finished: save at MainRevamp.scala:1248, took 10.766074 s
DefaultWriterContainer:58 - Job job_202010011801_0000 committed.
OrcRelation:58 - Listing hdfs://xxx/apps/hive/warehouse/xxx/thedate=2020-09-30 on driver
OrcRelation:58 - Listing hdfs://xxx/apps/hive/warehouse/xxx/thedate=2020-09-30 on driver
ParseDriver:185 - Parsing command: ALTER TABLE tablename ADD IF NOT EXISTS PARTITION(thedate='2020-09-30')
ParseDriver:209 - Parse Completed
PerfLogger:121 - <PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:121 - <PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:121 - <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:121 - <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>
ParseDriver:185 - Parsing command: ALTER TABLE tablename ADD IF NOT EXISTS PARTITION(thedate='2020-09-30')
ParseDriver:209 - Parse Completed
PerfLogger:148 - </PERFLOG method=parse start=xxx end=xxx duration=1011 from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:121 - <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
Driver:436 - Semantic Analysis Completed
PerfLogger:148 - </PERFLOG method=semanticAnalyze start=xxx end=xxx duration=185 from=org.apache.hadoop.hive.ql.Driver>
Driver:240 - Returning Hive schema: Schema(fieldSchemas:null, properties:null)
PerfLogger:148 - </PERFLOG method=compile start=xxx end=xxx duration=1237 from=org.apache.hadoop.hive.ql.Driver>
Driver:160 - Concurrency mode is disabled, not creating a lock manager
PerfLogger:121 - <PERFLOG method=Driver.execute from=org.apache.hadoop.hive.ql.Driver>
Driver:1328 - Starting command(queryId=xxx): ALTER TABLE tablename ADD IF NOT EXISTS PARTITION(thedate='2020-09-30')
PerfLogger:148 - </PERFLOG method=TimeToSubmit start=xxx end=xxx duration=1244 from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:121 - <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:121 - <PERFLOG method=task.DDL.Stage-0 from=org.apache.hadoop.hive.ql.Driver>
Driver:1651 - Starting task [Stage-0:DDL] in serial mode
PerfLogger:148 - </PERFLOG method=runTasks start=xxx end=xxx duration=63 from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:148 - </PERFLOG method=Driver.execute start=xxx end=xxx duration=70 from=org.apache.hadoop.hive.ql.Driver>
Driver:951 - OK
PerfLogger:121 - <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:148 - </PERFLOG method=releaseLocks start=xxx end=xxx duration=0 from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:148 - </PERFLOG method=Driver.run start=xxx end=xxx duration=1308 from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:121 - <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:148 - </PERFLOG method=releaseLocks start=xxx end=xxx duration=0 from=org.apache.hadoop.hive.ql.Driver>
file1$:1259 - Filefile.csv is processed and all data has been inserted into Hive
file1$:1261 - Filefile.csv has been moved to the /completed directory
file1$:1220 - Start processing file; file.csv
MemoryStore:58 - Block broadcast_2 stored as values in memory (estimated size 379.1 KB, free 7.0 GB)
MemoryStore:58 - Block broadcast_2_piece0 stored as bytes in memory (estimated size 33.6 KB, free 7.0 GB)
BlockManagerInfo:58 - Added broadcast_2_piece0 in memory on xxx(size: 33.6 KB, free: 7.0 GB)
SparkContext:58 - Created broadcast 2 from textFile at MainRevamp.scala:1227
FileOutputCommitter:123 - File Output Committer Algorithm version is 1
FileOutputCommitter:138 - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
DefaultWriterContainer:58 - Using output committer class org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
DFSClient:1052 - Created HDFS_DELEGATION_TOKEN token 29307572 for xxx on ha-hdfs:xxx
TokenCache:144 - Got dt for hdfs://xxx; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:xxx, Ident: (HDFS_DELEGATION_TOKEN token xxx for xxx)
FileInputFormat:249 - Total input paths to process : 1
SparkContext:58 - Starting job: save at MainRevamp.scala:1248
DAGScheduler:58 - Got job 1 (save at MainRevamp.scala:1248) with 1 output partitions
DAGScheduler:58 - Final stage: ResultStage 1 (save at MainRevamp.scala:1248)
DAGScheduler:58 - Parents of final stage: List()
DAGScheduler:58 - Missing parents: List()
DAGScheduler:58 - Submitting ResultStage 1 (MapPartitionsRDD[11] at createDataFrame at MainRevamp.scala:1242), which has no missing parents
MemoryStore:58 - Block broadcast_3 stored as values in memory (estimated size 104.3 KB, free 7.0 GB)
MemoryStore:58 - Block broadcast_3_piece0 stored as bytes in memory (estimated size 39.1 KB, free 7.0 GB)
BlockManagerInfo:58 - Added broadcast_3_piece0 in memory on 10.xx.0.xx:xxxxx (size: 39.1 KB, free: 7.0 GB)
SparkContext:58 - Created broadcast 3 from broadcast at DAGScheduler.scala:1008
DAGScheduler:58 - Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[11] at createDataFrame at MainRevamp.scala:1242)
YarnScheduler:58 - Adding task set 1.0 with 1 tasks
TaskSetManager:58 - Starting task 0.0 in stage 1.0 (TID 1, xxx.bc, partition 0,RACK_LOCAL, 2349 bytes)
BlockManagerInfo:58 - Added broadcast_3_piece0 in memory on xxx.bc:xxx (size: 39.1 KB, free: 7.0 GB)
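
The `which: no ... in ((null))` line near the top of this output suggests that `PATH` was empty when the script ran under Airflow, which would differ from an interactive PuTTY session. One possible cause (an assumption, the log does not confirm it) is that the task's environment was replaced rather than extended: when an `env` mapping is passed to BashOperator, it is used instead of the inherited environment. The sketch below uses only the Python standard library to show that effect on a child process. `child_path` is a hypothetical helper and the `SPARK_HOME` value is just an illustrative variable.

```python
import os
import subprocess
import sys


def child_path(env=None):
    """Return the PATH value a child process sees when started with `env`."""
    probe = "import os; print(os.environ.get('PATH', '<unset>'))"
    result = subprocess.run(
        [sys.executable, "-c", probe],
        env=env,               # None means: inherit the parent's environment
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Illustrative extra variable; the value is an assumption for this sketch.
extra = {"SPARK_HOME": "/usr/hdp/current/spark-client"}

inherited = child_path()                       # parent's environment as-is
replaced = child_path(extra)                   # ONLY the extra variable, PATH is gone
merged = child_path({**os.environ, **extra})   # parent's environment plus the extra variable

print("inherited:", inherited)
print("replaced: ", replaced)
print("merged:   ", merged)
```

If this turns out to be the cause, merging the existing environment into whatever is passed to the task (as in the `merged` case), or exporting `PATH` at the top of the shell script, would be two ways to make the Airflow run look more like the PuTTY run.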

debbiecortez · 10-14-2020

Hello,

I keep getting the following error. Can you help me with this?

{taskinstance.py:887} INFO - Executing <Task(BashOperator): start_insertfile> on xxx
{standard_task_runner.py:53} INFO - Started process 6927 to run task
{logging_mixin.py:112} INFO - Running %s on host %s <TaskInstance: import-file1.start_file1 xxx [running]> Host.bc
{bash_operator.py:82} INFO - Tmp dir root location:
/tmp
Temporary script location: /tmp/airflowtmp/filexxx
Running command: xxxx.sh
Output:
which: no /usr/hdp/2.x.x.0-xxx//hadoop/bin/hadoop.distro in ((null))
dirname: missing operand
Try 'dirname --help' for more information.
xx_2020_09_30.csv xx.csv
INFO SparkContext:58 - Running Spark version 1.6.3
WARN SparkConf:70 - In Spark 1.0 and later spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN).
WARN SparkConf:70 -
SPARK_CLASSPATH was detected (set to '/usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar:/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/current/hbase-client/lib/hbase-hadoop-compat.jar:/usr/hdp/current/hbase-client/lib/hbase-hadoop2-compat.jar:/usr/hdp/current/phoenix-client/phoenix-cient.jar:/usr/hdp/current/phoenix-client/*.jar:/usr/hdp/current/phoenix-client/lib/*.jar:/usr/hdp/current/hive-client/lib/hive-hbase-handler-1.2.1000.2.6.3.0-235.jar:/usr/hdp/current/hive-client/lib/hive-hbase-handler.jar').
This is deprecated in Spark 1.0+.
Please instead use:
- ./spark-submit with --driver-class-path to augment the driver classpath
- spark.executor.extraClassPath to augment the executor classpath

WARN SparkConf:70 - Setting 'spark.executor.extraClassPath' to '/usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar:/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/current/hbase-client/lib/hbase-hadoop-compat.jar:/usr/hdp/current/hbase-client/lib/hbase-hadoop2-compat.jar:/usr/hdp/current/phoenix-client/phoenix-cient.jar:/usr/hdp/current/phoenix-client/*.jar:/usr/hdp/current/phoenix-client/lib/*.jar:/usr/hdp/current/hive-client/lib/hive-hbase-handler-1.2.1000.2.6.3.0-235.jar:/usr/hdp/current/hive-client/lib/hive-hbase-handler.jar' as a work-around.
WARN SparkConf:70 - Setting 'spark.driver.extraClassPath' to '/usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar:/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/current/hbase-client/lib/hbase-hadoop-compat.jar:/usr/hdp/current/hbase-client/lib/hbase-hadoop2-compat.jar:/usr/hdp/current/phoenix-client/phoenix-cient.jar:/usr/hdp/current/phoenix-client/*.jar:/usr/hdp/current/phoenix-client/lib/*.jar:/usr/hdp/current/hive-client/lib/hive-hbase-handler-1.2.1000.2.6.3.0-235.jar:/usr/hdp/current/hive-client/lib/hive-hbase-handler.jar' as a work-around.
INFO SecurityManager:58 - Changing view acls to: xxx
INFO SecurityManager:58 - Changing modify acls to: xxx
INFO SecurityManager:58 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(xxx); users with modify permissions: Set(xxx)
INFO Utils:58 - Successfully started service 'sparkDriver' on port xxxxx.
INFO Slf4jLogger:80 - Slf4jLogger started
INFO Remoting:74 - Starting remoting
INFO Remoting:74 - Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.xxx.0.xxx:xxxxx]
INFO Utils:58 - Successfully started service 'sparkDriverActorSystem' on port xxxxx.
INFO SparkEnv:58 - Registering MapOutputTracker
INFO SparkEnv:58 - Registering BlockManagerMaster
INFO DiskBlockManager:58 - Created local directory at /spark/blockmgr-xxx
INFO MemoryStore:58 - MemoryStore started with capacity 7.0 GB
INFO SparkEnv:58 - Registering OutputCommitCoordinator
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
...
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/api,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/static,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/executors/json,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/executors,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/environment/json,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/stages/json,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/stages,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/jobs,null}
Utils:70 - Service 'SparkUI' could not bind on port 3040. Attempting port xxx.
Server:272 - jetty-8.y.z-SNAPSHOT
AbstractConnector:338 - Started SelectChannelConnector@0.0.0.0:xxx
Utils:58 - Successfully started service 'SparkUI' on port xxx.
SparkUI:58 - Bound SparkUI to 0.0.0.0, and started at http://10.109.0.100:xxxx
HttpFileServer:58 - HTTP File server directory is /spark/spark-xxx/httpd-xxx
HttpServer:58 - Starting HTTP Server
Server:272 - jetty-8.y.z-SNAPSHOT
AbstractConnector:338 - Started SocketConnector@xxx
Utils:58 - Successfully started service 'HTTP file server' on port xxx.
SparkContext:58 - Added JAR file:/xxx.jar at http://10.108.0.xxxx:xxxxx/jars/xxx.jar with timestamp xxx
SparkContext:58 - Added JAR file:xxx.jar at http://10.108.0.xxx:xxxxx/jars/xxx.jar with timestamp xxx
spark.yarn.driver.memoryOverhead is set but does not apply in client mode.
AHSProxy:42 - Connecting to Application History server at xxx.bc/xxx
RequestHedgingRMFailoverProxyProvider:146 - Looking for the active RM in [rm1, rm2]...
RequestHedgingRMFailoverProxyProvider:170 - Found active RM [rm1]
Client:58 - Requesting a new application from cluster with 21 NodeManagers
Client:58 - Verifying our application has not requested more than the maximum memory capability of the cluster (614400 MB per container)
Client:58 - Will allocate AM container, with 896 MB memory including 384 MB overhead
Client:58 - Setting up container launch context for our AM
Client:58 - Setting up the launch environment for our AM container
Client:58 - Using the spark assembly jar on HDFS because you are using HDP, defaultSparkAssembly:hdfs://xxx/hdp/apps/2.6.5.0-292/spark/spark-hdp-assembly.jar
Client:58 - Preparing resources for our AM container
YarnSparkHadoopUtil:58 - getting token for namenode: hdfs://xxx/user/xxx/.sparkStaging/application_xxx
DFSClient:1052 - Created HDFS_DELEGATION_TOKEN token 29307570 for xxx on ha-hdfs:xxx
metastore:376 - Trying to connect to metastore with URI thrift://xxxxxx.bc:xxxx

metastore:472 - Connected to metastore.
RecoverableZooKeeper:120 - Process identifier=hconnection-xxxx connecting to ZooKeeper ensemble=xxx5176.bc:xxx,xxxxxx.bc:xxx,xxx5178.bc:xxx
ZooKeeper:100 - Client environment:zookeeper.version=3.4.6-xxx--1, built on 05/11/2018 06:40 GMT
ZooKeeper:100 - Client environment:host.name=Host.bc
ZooKeeper:100 - Client environment:java.version=1.8.0_262
ZooKeeper:100 - Client environment:java.vendor=Oracle Corporation
ZooKeeper:100 - Client environment:java.home=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.xxx.b10-0.xxx7_8.x86_64/jre
ZooKeeper:100 - Client environment:java.class.path=/usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar:/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/current/hbase-client/lib/hbase-hadoop-compat.jar:/usr/hdp/current/hbase-client/lib/hbase-hadoop2-compat.jar:/usr/hdp/current/phoenix-client/phoenix-cient.jar:/usr/hdp/current/phoenix-client/*.jar:/usr/hdp/current/phoenix-client/lib/*.jar:/usr/hdp/current/hive-client/lib/hive-hbase-handler-1.2.1000.2.6.3.0-235.jar:/usr/hdp/current/hive-client/lib/hive-hbase-handler.jar:/usr/hdp/current/spark-client/conf/:/usr/hdp/current/spark-client/lib/spark-assembly-1.6.3.2.6.5.0-292-hadoop2.7.3.2.6.5.0-292.jar:/usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar:/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar:/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar:/usr/hdp/2.6.5.0-292/hadoop/conf/:/usr/hdp/current/hadoop-client/lib/aws-java-sdk-s3-1.10.6.jar:/usr/hdp/current/hadoop-client/lib/aws-java-sdk-core-1.10.6.jar:/usr/hdp/current/hadoop-client/lib/aws-java-sdk-kms-1.10.6.jar
ZooKeeper:100 - Client environment:java.library.path=/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
ZooKeeper:100 - Client environment:java.io.tmpdir=/tmp
ZooKeeper:100 - Client environment:java.compiler=<NA>
ZooKeeper:100 - Client environment:os.name=Linux
ZooKeeper:100 - Client environment:os.arch=amd64
ZooKeeper:100 - Client environment:os.version=3.10.0-xxxx.19.1.xxx7.x86_64
ZooKeeper:100 - Client environment:user.name=xxx
ZooKeeper:100 - Client environment:user.home=/home/xxx
ZooKeeper:100 - Client environment:user.dir=/tmp/airflowtmphp8uukgh
ZooKeeper:438 - Initiating client connection, connectString=xxxx.bc:xxx,xxxxxx.bc:xxx,xxx.bc: sessionTimeout=180000 watcher=org.apache.hadoop.hbase.zookeeper.PendingWatcher@7af327e3
ClientCnxn:1019 - Opening socket connection to server xxxxxx.bc/10.xxx.0.98:xxx. Will not attempt to authenticate using SASL (unknown error)
ClientCnxn:864 - Socket connection established, initiating session, client: /10.xxx.0.100:xxxxx, server: xxxxxx.bc/xxx
ClientCnxn:1279 - Session establishment complete on server xxxxxx.bc/10.xxx.0.98:xxx, sessionid = xxx, negotiated timeout = 60000
ConnectionManager$HConnectionImplementation:1703 - Closing zookeeper sessionid=xxx
ZooKeeper:684 - Session: xxx closed
ClientCnxn:524 - EventThread shut down
YarnSparkHadoopUtil:58 - Added HBase security token to credentials.
Client:58 - Using the spark assembly jar on HDFS because you are using HDP, defaultSparkAssembly:hdfs://xxx/hdp/apps/2.6.5.0-292/spark/spark-hdp-assembly.jar
Client:58 - Source and destination file systems are the same. Not copying hdfs://xxx/hdp/apps/2.6.5.0-xxx/spark/spark-hdp-assembly.jar
Client:58 - Uploading resource file:/xxx/airflow/xxx.keytab -> hdfs://xxx/user/xxx/.sparkStaging/application_xxx/xxx.keytab
Client:58 - Uploading resource file:/xxx/ux_source/import/conf/file1-log4j.properties -> hdfs://xxx/user/xxx/.sparkStaging/application_xxx/file1-log4j.properties
Client:58 - Uploading resource file:/spark/spark-xxxxx/__spark_conf__xxxx.zip -> hdfs://xxx/user/xxx/.sparkStaging/application_xxx/__spark_conf__xxxxx.zip
SecurityManager:58 - Changing view acls to: xxx
SecurityManager:58 - Changing modify acls to: xxx
SecurityManager:58 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(xxx); users with modify permissions: Set(xxx)
Client:58 - Submitting application 8168 to ResourceManager
TimelineClientImpl:302 - Timeline service address: http://xxx/ws/v1/timeline/
YarnClientImpl:274 - Submitted application application_xxx
SchedulerExtensionServices:58 - Starting Yarn extension services with app application_xxx and attemptId None
Client:58 - Application report for application_xxx (state: ACCEPTED)
Client:58 -
client token: Token { kind: YARN_CLIENT_TOKEN, service: }
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: DAILY
start time: 1601568091265
final status: UNDEFINED
tracking URL: http://xxx5176.bc:8088/proxy/application_xxx/
user: xxx
Client:58 - Application report for application_xxx (state: ACCEPTED)
Client:58 - Application report for application_xxx (state: ACCEPTED)
Client:58 - Application report for application_xxx (state: ACCEPTED)
YarnSchedulerBackend$YarnSchedulerEndpoint:58 - ApplicationMaster registered as NettyRpcEndpointRef(null)
YarnClientSchedulerBackend:58 - Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> xxx5176.bc,xxxxxx.bc, PROXY_URI_BASES -> http://xxx5176.bc:8088/proxy/application_xxx,http://xxxxxx.bc:8088/proxy/application_xxx), /proxy/application_xxx
JettyUtils:58 - Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
Client:58 - Application report for application_xxx (state: RUNNING)
Client:58 -
client token: Token { kind: YARN_CLIENT_TOKEN, service: }
diagnostics: N/A
ApplicationMaster host: 10.xxx.0.xxx
ApplicationMaster RPC port: 0
queue: DAILY
start time: 1601568091265
final status: UNDEFINED
tracking URL: http://xxx5176.bc:8088/proxy/application_xxx/
user: xxx
YarnClientSchedulerBackend:58 - Application application_xxx has started running.
Utils:58 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port xxxxx.
NettyBlockTransferService:58 - Server created on xxxxx
BlockManagerMaster:58 - Trying to register BlockManager
BlockManagerMasterEndpoint:58 - Registering block manager 10.xx.0.xx:xxxxx with 7.0 GB RAM, BlockManagerId(driver, 10.xx.0.xxx, xxxxx)
BlockManagerMaster:58 - Registered BlockManager
EventLoggingListener:58 - Logging events to hdfs:///spark-history/application_xxx
YarnClientSchedulerBackend:58 - Registered executor NettyRpcEndpointRef(null) (xx.bc:xx) with ID 5
YarnClientSchedulerBackend:58 - Registered executor NettyRpcEndpointRef(null) (xx.bc:xx) with ID 1
BlockManagerMasterEndpoint:58 - Registering block manager xx.bc:xx with 7.0 GB RAM, BlockManagerId(5, xx.bc, xx)
BlockManagerMasterEndpoint:58 - Registering block manager xx.bc:xx with 7.0 GB RAM, BlockManagerId(1, xx.bc, xx)
YarnClientSchedulerBackend:58 - Registered executor NettyRpcEndpointRef(null) (xxx.bc:xx) with ID 4
BlockManagerMasterEndpoint:58 - Registering block manager xxx.bc:xxx with 7.0 GB RAM, BlockManagerId(4, xxx.bc, xxx)
YarnClientSchedulerBackend:58 - Registered executor NettyRpcEndpointRef(null) (xxx.bc:xx) with ID 2
BlockManagerMasterEndpoint:58 - Registering block manager xx.bc:xx with 7.0 GB RAM, BlockManagerId(2, xxx.bc, xxx)
YarnClientSchedulerBackend:58 - SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
YarnClientSchedulerBackend:58 - Registered executor NettyRpcEndpointRef(null) (xxx.bc:xxx) with ID 3
BlockManagerMasterEndpoint:58 - Registering block manager xxx.bc:xxx with 7.0 GB RAM, BlockManagerId(3, xxx.bc, xxx)
HiveContext:58 - Initializing execution hive, version 1.2.1
ClientWrapper:58 - Inspected Hadoop version: 2.7.3.2.6.5.0-292
ClientWrapper:58 - Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.7.3.2.6.5.0-292
HiveMetaStore:589 - 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
ObjectStore:289 - ObjectStore, initialize called
Persistence:77 - Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
Persistence:77 - Property datanucleus.cache.level2 unknown - will be ignored
ObjectStore:370 - Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
Datastore:77 - The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
Datastore:77 - The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
Datastore:77 - The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
Datastore:77 - The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
MetaStoreDirectSql:139 - Using direct SQL, underlying DB is DERBY
ObjectStore:272 - Initialized ObjectStore
ObjectStore:6666 - Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
ObjectStore:568 - Failed to get database default, returning NoSuchObjectException
HiveMetaStore:663 - Added admin role in metastore
HiveMetaStore:672 - Added public role in metastore
HiveMetaStore:712 - No user is added in admin role, since config is empty
SessionState:641 - Created local directory: /tmp/xxx_resources
SessionState:641 - Created HDFS directory: /tmp/hive/xxx/xxx
SessionState:641 - Created local directory: /tmp/xxx/xxx
SessionState:641 - Created HDFS directory: /tmp/hive/xxx/xxx/_tmp_space.db
HiveContext:58 - default warehouse location is /user/hive/warehouse
HiveContext:58 - Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
ClientWrapper:58 - Inspected Hadoop version: 2.7.3.2.6.5.0-292
ClientWrapper:58 - Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.7.3.2.6.5.0-292
metastore:376 - Trying to connect to metastore with URI thrift://xxxxxx.bc:9xxx
metastore:472 - Connected to metastore.
SessionState:641 - Created local directory: /tmp/xxx
SessionState:641 - Created HDFS directory: /tmp/hive/xxx/xxx
SessionState:641 - Created local directory: /tmp/xxx/xxx
SessionState:641 - Created HDFS directory: /tmp/hive/xxx/xxx/_tmp_space.db
file1$:1212 - Amount of files to be processed: 2
file1$:1215 - Files to be processed;
file.csv, file1.csv
file1$:1220 - Start processing file;file.csv
MemoryStore:58 - Block broadcast_0 stored as values in memory (estimated size 379.1 KB, free 7.0 GB)
MemoryStore:58 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 33.6 KB, free 7.0 GB)
BlockManagerInfo:58 - Added broadcast_0_piece0 in memory on 10.108.0.100:45893 (size: 33.6 KB, free: 7.0 GB)
SparkContext:58 - Created broadcast 0 from textFile at MainRevamp.scala:1227
deprecation:1261 - mapred.job.id is deprecated. Instead, use mapreduce.job.id
deprecation:1261 - mapred.tip.id is deprecated. Instead, use mapreduce.task.id
deprecation:1261 - mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
deprecation:1261 - mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
deprecation:1261 - mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
FileOutputCommitter:123 - File Output Committer Algorithm version is 1
FileOutputCommitter:138 - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
DefaultWriterContainer:58 - Using output committer class org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
DFSClient:1052 - Created HDFS_DELEGATION_TOKEN token 29307571 for xxx on ha-hdfs:xxx
TokenCache:144 - Got dt for hdfs://xxx; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:xxx, Ident: (HDFS_DELEGATION_TOKEN token 29307571 for xxx)
FileInputFormat:249 - Total input paths to process : 1
SparkContext:58 - Starting job: save at MainRevamp.scala:1248
DAGScheduler:58 - Got job 0 (save at MainRevamp.scala:1248) with 1 output partitions
DAGScheduler:58 - Final stage: ResultStage 0 (save at MainRevamp.scala:1248)
DAGScheduler:58 - Parents of final stage: List()
DAGScheduler:58 - Missing parents: List()
DAGScheduler:58 - Submitting ResultStage 0 (MapPartitionsRDD[4] at createDataFrame at MainRevamp.scala:1242), which has no missing parents
MemoryStore:58 - Block broadcast_1 stored as values in memory (estimated size 104.3 KB, free 7.0 GB)
MemoryStore:58 - Block broadcast_1_piece0 stored as bytes in memory (estimated size 39.1 KB, free 7.0 GB)
BlockManagerInfo:58 - Added broadcast_1_piece0 in memory on 10.108.0.100:45893 (size: 39.1 KB, free: 7.0 GB)
SparkContext:58 - Created broadcast 1 from broadcast at DAGScheduler.scala:1008
DAGScheduler:58 - Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[4] at createDataFrame at MainRevamp.scala:1242)
YarnScheduler:58 - Adding task set 0.0 with 1 tasks
TaskSetManager:58 - Starting task 0.0 in stage 0.0 (TID 0, xxx.bc, partition 0,RACK_LOCAL, 2349 bytes)
BlockManagerInfo:58 - Added broadcast_1_piece0 in memory on xxx.bc:463 (size: 39.1 KB, free: 7.0 GB)
BlockManagerInfo:58 - Added broadcast_0_piece0 in memory on xxx.bc:463 (size: 33.6 KB, free: 7.0 GB)
TaskSetManager:58 - Finished task 0.0 in stage 0.0 (TID 0) in 10579 ms on xxx.bc (1/1)
YarnScheduler:58 - Removed TaskSet 0.0, whose tasks have all completed, from pool
DAGScheduler:58 - ResultStage 0 (save at MainRevamp.scala:1248) finished in 10.585 s
DAGScheduler:58 - Job 0 finished: save at MainRevamp.scala:1248, took 10.766074 s
DefaultWriterContainer:58 - Job job_202010011801_0000 committed.
OrcRelation:58 - Listing hdfs://xxx/apps/hive/warehouse/xxx/thedate=2020-09-30 on driver
OrcRelation:58 - Listing hdfs://xxx/apps/hive/warehouse/xxx/thedate=2020-09-30 on driver
ParseDriver:185 - Parsing command: ALTER TABLE tablename ADD IF NOT EXISTS PARTITION(thedate='2020-09-30')
ParseDriver:209 - Parse Completed
PerfLogger:121 - <PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:121 - <PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:121 - <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:121 - <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>
ParseDriver:185 - Parsing command: ALTER TABLE tablename ADD IF NOT EXISTS PARTITION(thedate='2020-09-30')
ParseDriver:209 - Parse Completed
PerfLogger:148 - </PERFLOG method=parse start=xxx end=xxx duration=1011 from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:121 - <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
Driver:436 - Semantic Analysis Completed
PerfLogger:148 - </PERFLOG method=semanticAnalyze start=xxx end=xxx duration=185 from=org.apache.hadoop.hive.ql.Driver>
Driver:240 - Returning Hive schema: Schema(fieldSchemas:null, properties:null)
PerfLogger:148 - </PERFLOG method=compile start=xxx end=xxx duration=1237 from=org.apache.hadoop.hive.ql.Driver>
Driver:160 - Concurrency mode is disabled, not creating a lock manager
PerfLogger:121 - <PERFLOG method=Driver.execute from=org.apache.hadoop.hive.ql.Driver>
Driver:1328 - Starting command(queryId=xxx): ALTER TABLE tablename ADD IF NOT EXISTS PARTITION(thedate='2020-09-30')
PerfLogger:148 - </PERFLOG method=TimeToSubmit start=xxx end=xxx duration=1244 from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:121 - <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:121 - <PERFLOG method=task.DDL.Stage-0 from=org.apache.hadoop.hive.ql.Driver>
Driver:1651 - Starting task [Stage-0:DDL] in serial mode
PerfLogger:148 - </PERFLOG method=runTasks start=xxx end=xxx duration=63 from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:148 - </PERFLOG method=Driver.execute start=xxx end=xxx duration=70 from=org.apache.hadoop.hive.ql.Driver>
Driver:951 - OK
PerfLogger:121 - <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:148 - </PERFLOG method=releaseLocks start=xxx end=xxx duration=0 from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:148 - </PERFLOG method=Driver.run start=xxx end=xxx duration=1308 from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:121 - <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:148 - </PERFLOG method=releaseLocks start=xxx end=xxx duration=0 from=org.apache.hadoop.hive.ql.Driver>
file1$:1259 - Filefile.csv is processed and all data has been inserted into Hive
file1$:1261 - Filefile.csv has been moved to the /completed directory
file1$:1220 - Start processing file; file.csv
MemoryStore:58 - Block broadcast_2 stored as values in memory (estimated size 379.1 KB, free 7.0 GB)
MemoryStore:58 - Block broadcast_2_piece0 stored as bytes in memory (estimated size 33.6 KB, free 7.0 GB)
BlockManagerInfo:58 - Added broadcast_2_piece0 in memory on xxx(size: 33.6 KB, free: 7.0 GB)
SparkContext:58 - Created broadcast 2 from textFile at MainRevamp.scala:1227
FileOutputCommitter:123 - File Output Committer Algorithm version is 1
FileOutputCommitter:138 - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
DefaultWriterContainer:58 - Using output committer class org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
DFSClient:1052 - Created HDFS_DELEGATION_TOKEN token 29307572 for xxx on ha-hdfs:xxx
TokenCache:144 - Got dt for hdfs://xxx; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:xxx, Ident: (HDFS_DELEGATION_TOKEN token xxx for xxx)
FileInputFormat:249 - Total input paths to process : 1
SparkContext:58 - Starting job: save at MainRevamp.scala:1248
DAGScheduler:58 - Got job 1 (save at MainRevamp.scala:1248) with 1 output partitions
DAGScheduler:58 - Final stage: ResultStage 1 (save at MainRevamp.scala:1248)
DAGScheduler:58 - Parents of final stage: List()
DAGScheduler:58 - Missing parents: List()
DAGScheduler:58 - Submitting ResultStage 1 (MapPartitionsRDD[11] at createDataFrame at MainRevamp.scala:1242), which has no missing parents
MemoryStore:58 - Block broadcast_3 stored as values in memory (estimated size 104.3 KB, free 7.0 GB)
MemoryStore:58 - Block broadcast_3_piece0 stored as bytes in memory (estimated size 39.1 KB, free 7.0 GB)
BlockManagerInfo:58 - Added broadcast_3_piece0 in memory on 10.xx.0.xx:xxxxx (size: 39.1 KB, free: 7.0 GB)
SparkContext:58 - Created broadcast 3 from broadcast at DAGScheduler.scala:1008
DAGScheduler:58 - Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[11] at createDataFrame at MainRevamp.scala:1242)
YarnScheduler:58 - Adding task set 1.0 with 1 tasks
TaskSetManager:58 - Starting task 0.0 in stage 1.0 (TID 1, xxx.bc, partition 0,RACK_LOCAL, 2349 bytes)
BlockManagerInfo:58 - Added broadcast_3_piece0 in memory on xxx.bc:xxx (size: 39.1 KB, free: 7.0 GB)

Best regards.