Member since: 10-09-2020
Posts: 2
Kudos Received: 0
Solutions: 0
10-14-2020 01:42 AM
Hello, I keep getting the error shown in the log in my original post below. Can you help me with this? Best regards.
10-09-2020 06:27 AM
I have a shell script that moves a CSV file from HDFS into Hive. In Airflow I have two DAGs:

- DAG 1: a BashOperator that runs a shell script checking whether the file exists in HDFS.
- DAG 2: a BashOperator that runs a shell script which submits a Spark job to read the file from HDFS and load it into Hive.

When the DAGs were triggered, DAG 1 succeeded but DAG 2 failed; it could not connect to YARN. When I run DAG 2's script by hand in a PuTTY session the file is moved successfully, but when it is launched from Airflow the job fails, and I cannot find any explicit error in the task log. Could you please help me understand the problem? The full log is below, and a simplified sketch of the setup follows it.

{taskinstance.py:887} INFO - Executing <Task(BashOperator): start_insertfile> on xxx
{standard_task_runner.py:53} INFO - Started process 6927 to run task
{logging_mixin.py:112} INFO - Running %s on host %s <TaskInstance: import-file1.start_file1 xxx [running]> Host.bc
{bash_operator.py:82} INFO - Tmp dir root location: /tmp
Temporary script location: /tmp/airflowtmp/filexxx
Running command: xxxx.sh
Output:
which: no /usr/hdp/2.x.x.0-xxx//hadoop/bin/hadoop.distro in ((null))
dirname: missing operand
Try 'dirname --help' for more information.
xx_2020_09_30.csv
xx.csv
INFO SparkContext:58 - Running Spark version 1.6.3
WARN SparkConf:70 - In Spark 1.0 and later spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN).
WARN SparkConf:70 - SPARK_CLASSPATH was detected (set to '/usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar:/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/current/hbase-client/lib/hbase-hadoop-compat.jar:/usr/hdp/current/hbase-client/lib/hbase-hadoop2-compat.jar:/usr/hdp/current/phoenix-client/phoenix-cient.jar:/usr/hdp/current/phoenix-client/*.jar:/usr/hdp/current/phoenix-client/lib/*.jar:/usr/hdp/current/hive-client/lib/hive-hbase-handler-1.2.1000.2.6.3.0-235.jar:/usr/hdp/current/hive-client/lib/hive-hbase-handler.jar'). This is deprecated in Spark 1.0+. Please instead use:
 - ./spark-submit with --driver-class-path to augment the driver classpath
 - spark.executor.extraClassPath to augment the executor classpath
WARN SparkConf:70 - Setting 'spark.executor.extraClassPath' to '/usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar:/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/current/hbase-client/lib/hbase-hadoop-compat.jar:/usr/hdp/current/hbase-client/lib/hbase-hadoop2-compat.jar:/usr/hdp/current/phoenix-client/phoenix-cient.jar:/usr/hdp/current/phoenix-client/*.jar:/usr/hdp/current/phoenix-client/lib/*.jar:/usr/hdp/current/hive-client/lib/hive-hbase-handler-1.2.1000.2.6.3.0-235.jar:/usr/hdp/current/hive-client/lib/hive-hbase-handler.jar' as a work-around.
WARN SparkConf:70 - Setting 'spark.driver.extraClassPath' to '/usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar:/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/current/hbase-client/lib/hbase-hadoop-compat.jar:/usr/hdp/current/hbase-client/lib/hbase-hadoop2-compat.jar:/usr/hdp/current/phoenix-client/phoenix-cient.jar:/usr/hdp/current/phoenix-client/*.jar:/usr/hdp/current/phoenix-client/lib/*.jar:/usr/hdp/current/hive-client/lib/hive-hbase-handler-1.2.1000.2.6.3.0-235.jar:/usr/hdp/current/hive-client/lib/hive-hbase-handler.jar' as a work-around.
INFO SecurityManager:58 - Changing view acls to: xxx
INFO SecurityManager:58 - Changing modify acls to: xxx
INFO SecurityManager:58 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(xxx); users with modify permissions: Set(xxx)
INFO Utils:58 - Successfully started service 'sparkDriver' on port xxxxx.
INFO Slf4jLogger:80 - Slf4jLogger started
INFO Remoting:74 - Starting remoting
INFO Remoting:74 - Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.xxx.0.xxx:xxxxx]
INFO Utils:58 - Successfully started service 'sparkDriverActorSystem' on port xxxxx.
INFO SparkEnv:58 - Registering MapOutputTracker
INFO SparkEnv:58 - Registering BlockManagerMaster
INFO DiskBlockManager:58 - Created local directory at /spark/blockmgr-xxx
INFO MemoryStore:58 - MemoryStore started with capacity 7.0 GB
INFO SparkEnv:58 - Registering OutputCommitCoordinator
        at sun.nio.ch.Net.bind0(Native Method)
        at sun.nio.ch.Net.bind(Net.java:433)
        ...
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/api,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/static,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/executors/json,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/executors,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/environment/json,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/stages/json,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/stages,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
ContextHandler:843 - stopped o.s.j.s.ServletContextHandler{/jobs,null}
Utils:70 - Service 'SparkUI' could not bind on port 3040. Attempting port xxx.
Server:272 - jetty-8.y.z-SNAPSHOT
AbstractConnector:338 - Started SelectChannelConnector@0.0.0.0:xxx
Utils:58 - Successfully started service 'SparkUI' on port xxx.
SparkUI:58 - Bound SparkUI to 0.0.0.0, and started at http://10.109.0.100:xxxx
HttpFileServer:58 - HTTP File server directory is /spark/spark-xxx/httpd-xxx
HttpServer:58 - Starting HTTP Server
Server:272 - jetty-8.y.z-SNAPSHOT
AbstractConnector:338 - Started SocketConnector@xxx
Utils:58 - Successfully started service 'HTTP file server' on port xxx.
SparkContext:58 - Added JAR file:/xxx.jar at http://10.108.0.xxxx:xxxxx/jars/xxx.jar with timestamp xxx
SparkContext:58 - Added JAR file:xxx.jar at http://10.108.0.xxx:xxxxx/jars/xxx.jar with timestamp xxx
spark.yarn.driver.memoryOverhead is set but does not apply in client mode.
AHSProxy:42 - Connecting to Application History server at xxx.bc/xxx
RequestHedgingRMFailoverProxyProvider:146 - Looking for the active RM in [rm1, rm2]...
RequestHedgingRMFailoverProxyProvider:170 - Found active RM [rm1]
Client:58 - Requesting a new application from cluster with 21 NodeManagers
Client:58 - Verifying our application has not requested more than the maximum memory capability of the cluster (614400 MB per container)
Client:58 - Will allocate AM container, with 896 MB memory including 384 MB overhead
Client:58 - Setting up container launch context for our AM
Client:58 - Setting up the launch environment for our AM container
Client:58 - Using the spark assembly jar on HDFS because you are using HDP, defaultSparkAssembly:hdfs://xxx/hdp/apps/2.6.5.0-292/spark/spark-hdp-assembly.jar
Client:58 - Preparing resources for our AM container
YarnSparkHadoopUtil:58 - getting token for namenode: hdfs://xxx/user/xxx/.sparkStaging/application_xxx
DFSClient:1052 - Created HDFS_DELEGATION_TOKEN token 29307570 for xxx on ha-hdfs:xxx
metastore:376 - Trying to connect to metastore with URI thrift://xxxxxx.bc:xxxx
metastore:472 - Connected to metastore.
RecoverableZooKeeper:120 - Process identifier=hconnection-xxxx connecting to ZooKeeper ensemble=xxx5176.bc:xxx,xxxxxx.bc:xxx,xxx5178.bc:xxx
ZooKeeper:100 - Client environment:zookeeper.version=3.4.6-xxx--1, built on 05/11/2018 06:40 GMT
ZooKeeper:100 - Client environment:host.name=Host.bc
ZooKeeper:100 - Client environment:java.version=1.8.0_262
ZooKeeper:100 - Client environment:java.vendor=Oracle Corporation
ZooKeeper:100 - Client environment:java.home=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.xxx.b10-0.xxx7_8.x86_64/jre
ZooKeeper:100 - Client environment:java.class.path=/usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar:/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/current/hbase-client/lib/hbase-hadoop-compat.jar:/usr/hdp/current/hbase-client/lib/hbase-hadoop2-compat.jar:/usr/hdp/current/phoenix-client/phoenix-cient.jar:/usr/hdp/current/phoenix-client/*.jar:/usr/hdp/current/phoenix-client/lib/*.jar:/usr/hdp/current/hive-client/lib/hive-hbase-handler-1.2.1000.2.6.3.0-235.jar:/usr/hdp/current/hive-client/lib/hive-hbase-handler.jar:/usr/hdp/current/spark-client/conf/:/usr/hdp/current/spark-client/lib/spark-assembly-1.6.3.2.6.5.0-292-hadoop2.7.3.2.6.5.0-292.jar:/usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar:/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar:/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar:/usr/hdp/2.6.5.0-292/hadoop/conf/:/usr/hdp/current/hadoop-client/lib/aws-java-sdk-s3-1.10.6.jar:/usr/hdp/current/hadoop-client/lib/aws-java-sdk-core-1.10.6.jar:/usr/hdp/current/hadoop-client/lib/aws-java-sdk-kms-1.10.6.jar
ZooKeeper:100 - Client environment:java.library.path=/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
ZooKeeper:100 - Client environment:java.io.tmpdir=/tmp
ZooKeeper:100 - Client environment:java.compiler=<NA>
ZooKeeper:100 - Client environment:os.name=Linux
ZooKeeper:100 - Client environment:os.arch=amd64
ZooKeeper:100 - Client environment:os.version=3.10.0-xxxx.19.1.xxx7.x86_64
ZooKeeper:100 - Client environment:user.name=xxx
ZooKeeper:100 - Client environment:user.home=/home/xxx
ZooKeeper:100 - Client environment:user.dir=/tmp/airflowtmphp8uukgh
ZooKeeper:438 - Initiating client connection, connectString=xxxx.bc:xxx,xxxxxx.bc:xxx,xxx.bc: sessionTimeout=180000 watcher=org.apache.hadoop.hbase.zookeeper.PendingWatcher@7af327e3
ClientCnxn:1019 - Opening socket connection to server xxxxxx.bc/10.xxx.0.98:xxx. Will not attempt to authenticate using SASL (unknown error)
ClientCnxn:864 - Socket connection established, initiating session, client: /10.xxx.0.100:xxxxx, server: xxxxxx.bc/xxx
ClientCnxn:1279 - Session establishment complete on server xxxxxx.bc/10.xxx.0.98:xxx, sessionid = xxx, negotiated timeout = 60000
ConnectionManager$HConnectionImplementation:1703 - Closing zookeeper sessionid=xxx
ZooKeeper:684 - Session: xxx closed
ClientCnxn:524 - EventThread shut down
YarnSparkHadoopUtil:58 - Added HBase security token to credentials.
Client:58 - Using the spark assembly jar on HDFS because you are using HDP, defaultSparkAssembly:hdfs://xxx/hdp/apps/2.6.5.0-292/spark/spark-hdp-assembly.jar
Client:58 - Source and destination file systems are the same. Not copying hdfs://xxx/hdp/apps/2.6.5.0-xxx/spark/spark-hdp-assembly.jar
Client:58 - Uploading resource file:/xxx/airflow/xxx.keytab -> hdfs://xxx/user/xxx/.sparkStaging/application_xxx/xxx.keytab
Client:58 - Uploading resource file:/xxx/ux_source/import/conf/file1-log4j.properties -> hdfs://xxx/user/xxx/.sparkStaging/application_xxx/file1-log4j.properties
Client:58 - Uploading resource file:/spark/spark-xxxxx/__spark_conf__xxxx.zip -> hdfs://xxx/user/xxx/.sparkStaging/application_xxx/__spark_conf__xxxxx.zip
SecurityManager:58 - Changing view acls to: xxx
SecurityManager:58 - Changing modify acls to: xxx
SecurityManager:58 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(xxx); users with modify permissions: Set(xxx)
Client:58 - Submitting application 8168 to ResourceManager
TimelineClientImpl:302 - Timeline service address: http://xxx/ws/v1/timeline/
YarnClientImpl:274 - Submitted application application_xxx
SchedulerExtensionServices:58 - Starting Yarn extension services with app application_xxx and attemptId None
Client:58 - Application report for application_xxx (state: ACCEPTED)
Client:58 - client token: Token { kind: YARN_CLIENT_TOKEN, service: }
         diagnostics: AM container is launched, waiting for AM container to Register with RM
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: DAILY
         start time: 1601568091265
         final status: UNDEFINED
         tracking URL: http://xxx5176.bc:8088/proxy/application_xxx/
         user: xxx
Client:58 - Application report for application_xxx (state: ACCEPTED)
Client:58 - Application report for application_xxx (state: ACCEPTED)
Client:58 - Application report for application_xxx (state: ACCEPTED)
YarnSchedulerBackend$YarnSchedulerEndpoint:58 - ApplicationMaster registered as NettyRpcEndpointRef(null)
YarnClientSchedulerBackend:58 - Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> xxx5176.bc,xxxxxx.bc, PROXY_URI_BASES -> http://xxx5176.bc:8088/proxy/application_xxx,http://xxxxxx.bc:8088/proxy/application_xxx), /proxy/application_xxx
JettyUtils:58 - Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
Client:58 - Application report for application_xxx (state: RUNNING)
Client:58 - client token: Token { kind: YARN_CLIENT_TOKEN, service: }
         diagnostics: N/A
         ApplicationMaster host: 10.xxx.0.xxx
         ApplicationMaster RPC port: 0
         queue: DAILY
         start time: 1601568091265
         final status: UNDEFINED
         tracking URL: http://xxx5176.bc:8088/proxy/application_xxx/
         user: xxx
YarnClientSchedulerBackend:58 - Application application_xxx has started running.
Utils:58 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port xxxxx.
NettyBlockTransferService:58 - Server created on xxxxx
BlockManagerMaster:58 - Trying to register BlockManager
BlockManagerMasterEndpoint:58 - Registering block manager 10.xx.0.xx:xxxxx with 7.0 GB RAM, BlockManagerId(driver, 10.xx.0.xxx, xxxxx)
BlockManagerMaster:58 - Registered BlockManager
EventLoggingListener:58 - Logging events to hdfs:///spark-history/application_xxx
YarnClientSchedulerBackend:58 - Registered executor NettyRpcEndpointRef(null) (xx.bc:xx) with ID 5
YarnClientSchedulerBackend:58 - Registered executor NettyRpcEndpointRef(null) (xx.bc:xx) with ID 1
BlockManagerMasterEndpoint:58 - Registering block manager xx.bc:xx with 7.0 GB RAM, BlockManagerId(5, xx.bc, xx)
BlockManagerMasterEndpoint:58 - Registering block manager xx.bc:xx with 7.0 GB RAM, BlockManagerId(1, xx.bc, xx)
YarnClientSchedulerBackend:58 - Registered executor NettyRpcEndpointRef(null) (xxx.bc:xx) with ID 4
BlockManagerMasterEndpoint:58 - Registering block manager xxx.bc:xxx with 7.0 GB RAM, BlockManagerId(4, xxx.bc, xxx)
YarnClientSchedulerBackend:58 - Registered executor NettyRpcEndpointRef(null) (xxx.bc:xx) with ID 2
BlockManagerMasterEndpoint:58 - Registering block manager xx.bc:xx with 7.0 GB RAM, BlockManagerId(2, xxx.bc, xxx)
YarnClientSchedulerBackend:58 - SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
YarnClientSchedulerBackend:58 - Registered executor NettyRpcEndpointRef(null) (xxx.bc:xxx) with ID 3
BlockManagerMasterEndpoint:58 - Registering block manager xxx.bc:xxx with 7.0 GB RAM, BlockManagerId(3, xxx.bc, xxx)
HiveContext:58 - Initializing execution hive, version 1.2.1
ClientWrapper:58 - Inspected Hadoop version: 2.7.3.2.6.5.0-292
ClientWrapper:58 - Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.7.3.2.6.5.0-292
HiveMetaStore:589 - 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
ObjectStore:289 - ObjectStore, initialize called
Persistence:77 - Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
Persistence:77 - Property datanucleus.cache.level2 unknown - will be ignored
ObjectStore:370 - Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
Datastore:77 - The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
Datastore:77 - The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
Datastore:77 - The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
Datastore:77 - The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
MetaStoreDirectSql:139 - Using direct SQL, underlying DB is DERBY
ObjectStore:272 - Initialized ObjectStore
ObjectStore:6666 - Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
ObjectStore:568 - Failed to get database default, returning NoSuchObjectException
HiveMetaStore:663 - Added admin role in metastore
HiveMetaStore:672 - Added public role in metastore
HiveMetaStore:712 - No user is added in admin role, since config is empty
SessionState:641 - Created local directory: /tmp/xxx_resources
SessionState:641 - Created HDFS directory: /tmp/hive/xxx/xxx
SessionState:641 - Created local directory: /tmp/xxx/xxx
SessionState:641 - Created HDFS directory: /tmp/hive/xxx/xxx/_tmp_space.db
HiveContext:58 - default warehouse location is /user/hive/warehouse
HiveContext:58 - Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
ClientWrapper:58 - Inspected Hadoop version: 2.7.3.2.6.5.0-292
ClientWrapper:58 - Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.7.3.2.6.5.0-292
metastore:376 - Trying to connect to metastore with URI thrift://xxxxxx.bc:9xxx
metastore:472 - Connected to metastore.
SessionState:641 - Created local directory: /tmp/xxx
SessionState:641 - Created HDFS directory: /tmp/hive/xxx/xxx
SessionState:641 - Created local directory: /tmp/xxx/xxx
SessionState:641 - Created HDFS directory: /tmp/hive/xxx/xxx/_tmp_space.db
file1$:1212 - Amount of files to be processed: 2
file1$:1215 - Files to be processed; file.csv, file1.csv
file1$:1220 - Start processing file; file.csv
MemoryStore:58 - Block broadcast_0 stored as values in memory (estimated size 379.1 KB, free 7.0 GB)
MemoryStore:58 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 33.6 KB, free 7.0 GB)
BlockManagerInfo:58 - Added broadcast_0_piece0 in memory on 10.108.0.100:45893 (size: 33.6 KB, free: 7.0 GB)
SparkContext:58 - Created broadcast 0 from textFile at MainRevamp.scala:1227
deprecation:1261 - mapred.job.id is deprecated. Instead, use mapreduce.job.id
deprecation:1261 - mapred.tip.id is deprecated. Instead, use mapreduce.task.id
deprecation:1261 - mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
deprecation:1261 - mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
deprecation:1261 - mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
FileOutputCommitter:123 - File Output Committer Algorithm version is 1
FileOutputCommitter:138 - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
DefaultWriterContainer:58 - Using output committer class org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
DFSClient:1052 - Created HDFS_DELEGATION_TOKEN token 29307571 for xxx on ha-hdfs:xxx
TokenCache:144 - Got dt for hdfs://xxx; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:xxx, Ident: (HDFS_DELEGATION_TOKEN token 29307571 for xxx)
FileInputFormat:249 - Total input paths to process : 1
SparkContext:58 - Starting job: save at MainRevamp.scala:1248
DAGScheduler:58 - Got job 0 (save at MainRevamp.scala:1248) with 1 output partitions
DAGScheduler:58 - Final stage: ResultStage 0 (save at MainRevamp.scala:1248)
DAGScheduler:58 - Parents of final stage: List()
DAGScheduler:58 - Missing parents: List()
DAGScheduler:58 - Submitting ResultStage 0 (MapPartitionsRDD[4] at createDataFrame at MainRevamp.scala:1242), which has no missing parents
MemoryStore:58 - Block broadcast_1 stored as values in memory (estimated size 104.3 KB, free 7.0 GB)
MemoryStore:58 - Block broadcast_1_piece0 stored as bytes in memory (estimated size 39.1 KB, free 7.0 GB)
BlockManagerInfo:58 - Added broadcast_1_piece0 in memory on 10.108.0.100:45893 (size: 39.1 KB, free: 7.0 GB)
SparkContext:58 - Created broadcast 1 from broadcast at DAGScheduler.scala:1008
DAGScheduler:58 - Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[4] at createDataFrame at MainRevamp.scala:1242)
YarnScheduler:58 - Adding task set 0.0 with 1 tasks
TaskSetManager:58 - Starting task 0.0 in stage 0.0 (TID 0, xxx.bc, partition 0,RACK_LOCAL, 2349 bytes)
BlockManagerInfo:58 - Added broadcast_1_piece0 in memory on xxx.bc:463 (size: 39.1 KB, free: 7.0 GB)
BlockManagerInfo:58 - Added broadcast_0_piece0 in memory on xxx.bc:463 (size: 33.6 KB, free: 7.0 GB)
TaskSetManager:58 - Finished task 0.0 in stage 0.0 (TID 0) in 10579 ms on xxx.bc (1/1)
YarnScheduler:58 - Removed TaskSet 0.0, whose tasks have all completed, from pool
DAGScheduler:58 - ResultStage 0 (save at MainRevamp.scala:1248) finished in 10.585 s
DAGScheduler:58 - Job 0 finished: save at MainRevamp.scala:1248, took 10.766074 s
DefaultWriterContainer:58 - Job job_202010011801_0000 committed.
OrcRelation:58 - Listing hdfs://xxx/apps/hive/warehouse/xxx/thedate=2020-09-30 on driver
OrcRelation:58 - Listing hdfs://xxx/apps/hive/warehouse/xxx/thedate=2020-09-30 on driver
ParseDriver:185 - Parsing command: ALTER TABLE tablename ADD IF NOT EXISTS PARTITION(thedate='2020-09-30')
ParseDriver:209 - Parse Completed
PerfLogger:121 - <PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:121 - <PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:121 - <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:121 - <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>
ParseDriver:185 - Parsing command: ALTER TABLE tablename ADD IF NOT EXISTS PARTITION(thedate='2020-09-30')
ParseDriver:209 - Parse Completed
PerfLogger:148 - </PERFLOG method=parse start=xxx end=xxx duration=1011 from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:121 - <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
Driver:436 - Semantic Analysis Completed
PerfLogger:148 - </PERFLOG method=semanticAnalyze start=xxx end=xxx duration=185 from=org.apache.hadoop.hive.ql.Driver>
Driver:240 - Returning Hive schema: Schema(fieldSchemas:null, properties:null)
PerfLogger:148 - </PERFLOG method=compile start=xxx end=xxx duration=1237 from=org.apache.hadoop.hive.ql.Driver>
Driver:160 - Concurrency mode is disabled, not creating a lock manager
PerfLogger:121 - <PERFLOG method=Driver.execute from=org.apache.hadoop.hive.ql.Driver>
Driver:1328 - Starting command(queryId=xxx): ALTER TABLE tablename ADD IF NOT EXISTS PARTITION(thedate='2020-09-30')
PerfLogger:148 - </PERFLOG method=TimeToSubmit start=xxx end=xxx duration=1244 from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:121 - <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:121 - <PERFLOG method=task.DDL.Stage-0 from=org.apache.hadoop.hive.ql.Driver>
Driver:1651 - Starting task [Stage-0:DDL] in serial mode
PerfLogger:148 - </PERFLOG method=runTasks start=xxx end=xxx duration=63 from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:148 - </PERFLOG method=Driver.execute start=xxx end=xxx duration=70 from=org.apache.hadoop.hive.ql.Driver>
Driver:951 - OK
PerfLogger:121 - <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:148 - </PERFLOG method=releaseLocks start=xxx end=xxx duration=0 from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:148 - </PERFLOG method=Driver.run start=xxx end=xxx duration=1308 from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:121 - <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
PerfLogger:148 - </PERFLOG method=releaseLocks start=xxx end=xxx duration=0 from=org.apache.hadoop.hive.ql.Driver>
file1$:1259 - File file.csv is processed and all data has been inserted into Hive
file1$:1261 - File file.csv has been moved to the /completed directory
file1$:1220 - Start processing file; file.csv
MemoryStore:58 - Block broadcast_2 stored as values in memory (estimated size 379.1 KB, free 7.0 GB)
MemoryStore:58 - Block broadcast_2_piece0 stored as bytes in memory (estimated size 33.6 KB, free 7.0 GB)
BlockManagerInfo:58 - Added broadcast_2_piece0 in memory on xxx (size: 33.6 KB, free: 7.0 GB)
SparkContext:58 - Created broadcast 2 from textFile at MainRevamp.scala:1227
FileOutputCommitter:123 - File Output Committer Algorithm version is 1
FileOutputCommitter:138 - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
DefaultWriterContainer:58 - Using output committer class org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
DFSClient:1052 - Created HDFS_DELEGATION_TOKEN token 29307572 for xxx on ha-hdfs:xxx
TokenCache:144 - Got dt for hdfs://xxx; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:xxx, Ident: (HDFS_DELEGATION_TOKEN token xxx for xxx)
FileInputFormat:249 - Total input paths to process : 1
SparkContext:58 - Starting job: save at MainRevamp.scala:1248
DAGScheduler:58 - Got job 1 (save at MainRevamp.scala:1248) with 1 output partitions
DAGScheduler:58 - Final stage: ResultStage 1 (save at MainRevamp.scala:1248)
DAGScheduler:58 - Parents of final stage: List()
DAGScheduler:58 - Missing parents: List()
DAGScheduler:58 - Submitting ResultStage 1 (MapPartitionsRDD[11] at createDataFrame at MainRevamp.scala:1242), which has no missing parents
MemoryStore:58 - Block broadcast_3 stored as values in memory (estimated size 104.3 KB, free 7.0 GB)
MemoryStore:58 - Block broadcast_3_piece0 stored as bytes in memory (estimated size 39.1 KB, free 7.0 GB)
BlockManagerInfo:58 - Added broadcast_3_piece0 in memory on 10.xx.0.xx:xxxxx (size: 39.1 KB, free: 7.0 GB)
SparkContext:58 - Created broadcast 3 from broadcast at DAGScheduler.scala:1008
DAGScheduler:58 - Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[11] at createDataFrame at MainRevamp.scala:1242)
YarnScheduler:58 - Adding task set 1.0 with 1 tasks
TaskSetManager:58 - Starting task 0.0 in stage 1.0 (TID 1, xxx.bc, partition 0,RACK_LOCAL, 2349 bytes)
BlockManagerInfo:58 - Added broadcast_3_piece0 in memory on xxx.bc:xxx (size: 39.1 KB, free: 7.0 GB)
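For reference, here is the simplified sketch of the two-DAG setup I mentioned above. This is not my real code: apart from the DAG id and task id that appear in the log, the paths, schedule, and script names are placeholders (Airflow 1.10-style imports):

# Simplified sketch of the two DAGs (hypothetical paths and names).
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {"owner": "xxx", "start_date": datetime(2020, 9, 1)}

# DAG 1: checks whether the CSV file has arrived in HDFS.
with DAG("check-file1", default_args=default_args,
         schedule_interval="@daily") as dag1:
    check_file = BashOperator(
        task_id="check_file",
        # The real script does a check along the lines of:
        #   hdfs dfs -test -e /data/incoming/file.csv
        # (the trailing space below keeps Airflow from treating the .sh
        # path as a Jinja template file)
        bash_command="/path/to/check_file.sh ",
    )

# DAG 2: submits the Spark job that loads the file from HDFS into Hive.
with DAG("import-file1", default_args=default_args,
         schedule_interval="@daily") as dag2:
    start_file1 = BashOperator(
        task_id="start_file1",
        # BashOperator writes the command into a temp script under /tmp and
        # runs it with the Airflow worker's (non-login) environment, which is
        # why the same script can behave differently here than in an
        # interactive PuTTY session.
        bash_command="/path/to/insert_file.sh ",
    )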
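And in case it helps, this is roughly what the Spark job appears to do, judging from the log: read each CSV with textFile, build a DataFrame, write ORC into the warehouse partition directory, then add the partition with ALTER TABLE. The actual job is Scala (MainRevamp.scala) on Spark 1.6; the sketch below is only a PySpark 1.6-style equivalent with placeholder schema, delimiter, paths, and table name:

# Rough PySpark 1.6-style equivalent of the load pattern visible in the log
# (the real job is Scala; schema, delimiter, paths and table are placeholders).
from pyspark import SparkContext
from pyspark.sql import HiveContext, Row

sc = SparkContext(appName="import-file1")
sqlContext = HiveContext(sc)

thedate = "2020-09-30"

# textFile -> createDataFrame, as at MainRevamp.scala:1227/1242 in the log.
lines = sc.textFile("hdfs:///data/incoming/file.csv")
rows = lines.map(lambda l: l.split(";")).map(lambda f: Row(col1=f[0], col2=f[1]))
df = sqlContext.createDataFrame(rows)

# Write ORC into the partition directory (save at MainRevamp.scala:1248),
# then register the partition, matching the ALTER TABLE seen in the log.
df.write.mode("append").orc("hdfs:///apps/hive/warehouse/xxx/thedate=%s" % thedate)
sqlContext.sql("ALTER TABLE tablename ADD IF NOT EXISTS PARTITION(thedate='%s')" % thedate)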
Labels:
- Apache Spark
- Apache YARN