Member since 08-08-2013
339 Posts
132 Kudos Received
27 Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 14898 | 01-18-2018 08:38 AM |
 | 1591 | 05-11-2017 06:50 PM |
 | 9237 | 04-28-2017 11:00 AM |
 | 3457 | 04-12-2017 01:36 AM |
 | 2856 | 02-14-2017 05:11 AM |
04-30-2014
06:50 AM
Hi, I want to build a pipeline that ingests log data via Flume into Solr, which sounds like nothing special... But I get stuck when starting the Flume agent (with an exec source 'tail -F ...'): its log shows that it stops doing anything after "INFO org.kitesdk.morphline.api.MorphlineContext: Importing commands", because this is the last log entry before the agent gets restarted again and again (every ~30 seconds). If I remove the Solr sink from my Flume config, the expected files are written to the HDFS sink, so the basic workflow is fine. For my testing I tried to use the syslog example provided in the Search User Guide (http://www.cloudera.com/content/cloudera-content/cloudera-docs/Search/latest/Cloudera-Search-User-Guide/csug_morphline_example.html).

One strange thing: how do I configure the grok dictionaries in morphlines.conf while using Cloudera Manager for the configuration? The configuration itself is clear (the text area in "Flume-NG Solr Sink"), but how do I reference the grok dictionaries? Just "dictionaryFiles : [grok-dictionaries]", or with some path prefix?

=========================
This is the log of the Flume agent (written while I am adding entries to the watched file, but nothing gets processed):
""
2014-04-30 15:42:37,285 INFO org.apache.flume.sink.hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
2014-04-30 15:44:16,448 INFO org.apache.flume.node.PollingPropertiesFileConfigurationProvider: Configuration provider starting
2014-04-30 15:44:16,493 INFO org.apache.flume.node.PollingPropertiesFileConfigurationProvider: Reloading configuration file:/var/run/cloudera-scm-agent/process/1027-flume-AGENT/flume.conf
2014-04-30 15:44:16,506 INFO org.apache.flume.conf.FlumeConfiguration: Processing:HDFS
2014-04-30 15:44:16,507 INFO org.apache.flume.conf.FlumeConfiguration: Processing:solrSink
2014-04-30 15:44:16,507 INFO org.apache.flume.conf.FlumeConfiguration: Processing:HDFS
2014-04-30 15:44:16,508 INFO org.apache.flume.conf.FlumeConfiguration: Processing:HDFS
2014-04-30 15:44:16,508 INFO org.apache.flume.conf.FlumeConfiguration: Processing:HDFS
2014-04-30 15:44:16,508 INFO org.apache.flume.conf.FlumeConfiguration: Processing:HDFS
2014-04-30 15:44:16,509 INFO org.apache.flume.conf.FlumeConfiguration: Processing:solrSink
2014-04-30 15:44:16,509 INFO org.apache.flume.conf.FlumeConfiguration: Processing:solrSink
2014-04-30 15:44:16,510 INFO org.apache.flume.conf.FlumeConfiguration: Processing:solrSink
2014-04-30 15:44:16,510 INFO org.apache.flume.conf.FlumeConfiguration: Processing:solrSink
2014-04-30 15:44:16,510 INFO org.apache.flume.conf.FlumeConfiguration: Processing:HDFS
2014-04-30 15:44:16,511 INFO org.apache.flume.conf.FlumeConfiguration: Added sinks: HDFS solrSink Agent: agent
2014-04-30 15:44:16,511 INFO org.apache.flume.conf.FlumeConfiguration: Processing:HDFS
2014-04-30 15:44:16,512 INFO org.apache.flume.conf.FlumeConfiguration: Processing:solrSink
2014-04-30 15:44:16,512 INFO org.apache.flume.conf.FlumeConfiguration: Processing:HDFS
2014-04-30 15:44:16,513 INFO org.apache.flume.conf.FlumeConfiguration: Processing:HDFS
2014-04-30 15:44:16,561 INFO org.apache.flume.conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [agent]
2014-04-30 15:44:16,562 INFO org.apache.flume.node.AbstractConfigurationProvider: Creating channels
2014-04-30 15:44:16,580 INFO org.apache.flume.channel.DefaultChannelFactory: Creating instance of channel memoryChannel type memory
2014-04-30 15:44:16,592 INFO org.apache.flume.node.AbstractConfigurationProvider: Created channel memoryChannel
2014-04-30 15:44:16,594 INFO org.apache.flume.source.DefaultSourceFactory: Creating instance of source execSrc, type exec
2014-04-30 15:44:16,609 INFO org.apache.flume.sink.DefaultSinkFactory: Creating instance of sink: solrSink, type: org.apache.flume.sink.solr.morphline.MorphlineSolrSink
2014-04-30 15:44:16,616 INFO org.apache.flume.sink.DefaultSinkFactory: Creating instance of sink: HDFS, type: hdfs
2014-04-30 15:44:17,477 INFO org.apache.flume.sink.hdfs.HDFSEventSink: Hadoop Security enabled: false
2014-04-30 15:44:17,481 INFO org.apache.flume.node.AbstractConfigurationProvider: Channel memoryChannel connected to [execSrc, solrSink, HDFS]
2014-04-30 15:44:17,509 INFO org.apache.flume.node.Application: Starting new configuration:{ sourceRunners:{execsrc=EventDrivenSourceRunner: { source:org.apache.flume.source.ExecSource{name:execSrc,state:IDLE} }} sinkRunners:{HDFS=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@9a87fad counterGroup:{ name:null counters:{} } }, solrSink=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@15563bcf counterGroup:{ name:null counters:{} } }} channels:{memoryChannel=org.apache.flume.channel.MemoryChannel{name: memoryChannel}} }
2014-04-30 15:44:17,521 INFO org.apache.flume.node.Application: Starting Channel memoryChannel
2014-04-30 15:44:17,623 INFO org.apache.flume.instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: memoryChannel: Successfully registered new MBean.
2014-04-30 15:44:17,623 INFO org.apache.flume.instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: memoryChannel started
2014-04-30 15:44:17,630 INFO org.apache.flume.node.Application: Starting Sink HDFS
2014-04-30 15:44:17,632 INFO org.apache.flume.node.Application: Starting Sink solrSink
2014-04-30 15:44:17,632 INFO org.apache.flume.instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: HDFS: Successfully registered new MBean.
2014-04-30 15:44:17,633 INFO org.apache.flume.instrumentation.MonitoredCounterGroup: Component type: SINK, name: HDFS started
2014-04-30 15:44:17,633 INFO org.apache.flume.sink.solr.morphline.MorphlineSink: Starting Morphline Sink solrSink (MorphlineSolrSink) ...
2014-04-30 15:44:17,633 INFO org.apache.flume.instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: solrSink: Successfully registered new MBean.
2014-04-30 15:44:17,633 INFO org.apache.flume.instrumentation.MonitoredCounterGroup: Component type: SINK, name: solrSink started
2014-04-30 15:44:17,634 INFO org.apache.flume.node.Application: Starting Source execSrc
2014-04-30 15:44:17,637 INFO org.apache.flume.source.ExecSource: Exec source starting with command:tail -F /tmp/spooldir/test.txt
2014-04-30 15:44:17,650 INFO org.apache.flume.instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: execSrc: Successfully registered new MBean.
2014-04-30 15:44:17,650 INFO org.apache.flume.instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: execSrc started
2014-04-30 15:44:17,687 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2014-04-30 15:44:17,877 INFO org.mortbay.log: jetty-6.1.26
2014-04-30 15:44:17,956 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:41414
2014-04-30 15:44:18,134 INFO org.kitesdk.morphline.api.MorphlineContext: Importing commands
2014-04-30 15:45:00,994 INFO org.apache.flume.sink.hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
""
These log lines are written every ~30 seconds.

=====flume config====
agent.sources = execSrc
agent.channels = memoryChannel
agent.sinks = HDFS solrSink
agent.sources.execSrc.type = exec
agent.sources.execSrc.command = tail -F /tmp/spooldir/test.txt
agent.sources.execSrc.interceptors.uuidinterceptor.type = org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder
agent.sources.execSrc.interceptors.uuidinterceptor.headerName = id
agent.sources.execSrc.interceptors.uuidinterceptor.preserveExisting = false
agent.sources.execSrc.interceptors.uuidinterceptor.prefix = myhostname
agent.sources.execSrc.channels = memoryChannel
agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.capacity = 10000
agent.channels.memoryChannel.transactionCapacity = 1000
agent.sinks.solrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
agent.sinks.solrSink.channel = memoryChannel
agent.sinks.solrSink.batchSize = 1000
agent.sinks.solrSink.batchDurationMillis = 1000
agent.sinks.solrSink.morphlineFile = morphlines.conf
agent.sinks.solrSink.morphlineId = morphline1
agent.sinks.HDFS.channel = memoryChannel
agent.sinks.HDFS.type = hdfs
agent.sinks.HDFS.hdfs.path = hdfs://hadoop-pg-6.cluster:8020/tmp/test4solr
agent.sinks.HDFS.hdfs.fileType = DataStream
agent.sinks.HDFS.hdfs.writeFormat = Text
agent.sinks.HDFS.hdfs.batchSize = 2000
agent.sinks.HDFS.hdfs.rollSize = 0
agent.sinks.HDFS.hdfs.rollCount = 2000
agent.sinks.HDFS.hdfs.rollInterval = 30

======morphline config=======
# Specify server locations in a SOLR_LOCATOR variable; used later in variable substitutions:
SOLR_LOCATOR : {
  collection : workshop
  # ZooKeeper ensemble
  zkHost : "$ZK_HOST"
  # The maximum number of documents to send to Solr per network batch (throughput knob)
  # batchSize : 100
}
morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]
    commands : [
      { readLine { charset : UTF-8 } }
      { addCurrentTime { field : manual_timestamp preserveExisting : false } }
      {
        grok {
          dictionaryFiles : [grok-dictionaries]
          expressions : {
            message : """<%{POSINT:syslog_pri}>%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}"""
          }
        }
      }
      # convert timestamp field to native Solr timestamp format
      # e.g. 2012-09-06T07:14:34Z to 2012-09-06T07:14:34.000Z
      {
        convertTimestamp {
          field : created_at
          inputFormats : ["yyyy-MM-dd'T'HH:mm:ss'Z'", "yyyy-MM-dd"]
          inputTimezone : America/Los_Angeles
          outputFormat : "yyyy-MM-dd'T'HH:mm:ss.SSSZ"
          outputTimezone : UTC
        }
      }
      # Recall that Solr throws an exception on any attempt to load a document that contains a
      # field that isn't specified in schema.xml.
      {
        sanitizeUnknownSolrFields {
          # Location from which to fetch Solr schema
          solrLocator : ${SOLR_LOCATOR}
        }
      }
      # log the record at DEBUG level to SLF4J
      { logDebug { format : "output record: {}", args : ["@{}"] } }
      # load the record into a SolrServer
      { loadSolr { solrLocator : ${SOLR_LOCATOR} } }
    ]
  }
]

Additionally, I wanted to ask where the logDebug output from the morphline is written to. What do I need to modify to be able to ingest data into Solr? Any help appreciated.
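To make the dictionary question a bit more concrete, these are the two variants I can think of for referencing the dictionaries; both are unverified sketches, the /etc/flume-ng/conf path is only a placeholder, and dictionaryString is only an option if the bundled Kite version supports it:

grok {
  # Variant 1: point to a dictionary file/directory that exists on every Flume host (placeholder path)
  dictionaryFiles : [/etc/flume-ng/conf/grok-dictionaries]
  # Variant 2: embed the few patterns that are actually needed directly in morphlines.conf
  # dictionaryString : """SYSLOGTIMESTAMP %{MONTH} +%{MONTHDAY} %{TIME}"""
  # expressions : { ... }   (unchanged from the config above)
}

Regarding the logDebug question: as far as I understand, logDebug writes through SLF4J, so its output should end up in the Flume agent's own log file once the corresponding logger (e.g. org.kitesdk.morphline) is set to DEBUG.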
Labels:
04-28-2014
06:48 AM
5 Kudos
Hi, some further googling enlightened me about the correct syntax 😉 For anyone interested, the correct call is:

hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi -Dmapred.job.queue.name=lowprio 10 100
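For reference, the MR2/YARN name of the same property should work as well; this is an untested variant that assumes the usual deprecation mapping between mapred.job.queue.name and mapreduce.job.queuename. The important part is that the -D generic option comes before the positional arguments (nMaps, nSamples):

# same job, submitted to the 'lowprio' queue via the newer property name
hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi -Dmapreduce.job.queuename=lowprio 10 100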
04-28-2014
06:35 AM
1 Kudo
Hi, I configured the Capacity Scheduler with 3 queues and some ACLs on them. Now I want to test the setup by executing the "pi" MapReduce example provided by CDH. How do I have to call it to assign the job to a specific queue? I tried:

sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 20 20 -D mapreduce.job.queuename=lowprio

but this just prints out the usage hint:
""
Usage: org.apache.hadoop.examples.QuasiMonteCarlo <nMaps> <nSamples>
Generic options supported are
-conf <configuration file> specify an application configuration file
-D <property=value> use value for given property
-fs <local|namenode:port> specify a namenode
-jt <local|jobtracker:port> specify a job tracker
-files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars> specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines.
The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
""
thanks
Labels:
04-17-2014
06:46 AM
Hi, the previous error when trying to access a Parquet-based table via Shark, "java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat", has been resolved by adding parquet-hive-bundle-1.4.1.jar to Shark's lib folder. Now the Hive metastore can be read successfully (including the Parquet-based table). But if I want to select from that table I receive:

org.apache.spark.SparkException: Job aborted: Task 0.0:0 failed 4 times (most recent failure: Exception failure: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)

This is really strange, since the class org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe is included in parquet-hive-bundle-1.4.1.jar as well. I copied that .jar to both lib folders, Shark (/opt/shark/shark-0.9.1/lib) and Spark (under /opt/cloudera/parcels...), and I'm getting more and more confused 😉 Any help? Regards, Gerd
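What I would try next is only a sketch based on two assumptions: that the ClassNotFoundException now comes from the executor side rather than the driver, and that this Shark build picks up SPARK_CLASSPATH from shark-env.sh:

# make the SerDe jar visible to the executors as well, then restart the Shark shell/server
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/opt/shark/shark-0.9.1/lib/parquet-hive-bundle-1.4.1.jar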
04-17-2014
03:14 AM
Hi, I have a Parquet-based table and can successfully select from it within Hive and Impala, but if I want to select from that table in Shark, I receive this error:

14/04/17 11:33:49 INFO parse.ParseDriver: Parse Completed
14/04/17 11:33:49 INFO parse.SharkSemanticAnalyzer: Get metadata for source tables
FAILED: Hive Internal Error: java.lang.RuntimeException(java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat)
14/04/17 11:33:50 ERROR shark.SharkDriver: FAILED: Hive Internal Error: java.lang.RuntimeException(java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat)
java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
at org.apache.hadoop.hive.ql.metadata.Table.getInputFormatClass(Table.java:306)
at org.apache.hadoop.hive.ql.metadata.Table.<init>(Table.java:99)
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:988)
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:891)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1083)

Which jar includes this class? What do I have to do/link/install/configure to get rid of the error? I am using CDH5; the Parquet libs are in /opt/cloudera/parcels/CDH/lib/parquet. Thanks in advance, Gerd
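A quick way to find out which jar actually ships the missing class would be something like the following; the directories are just the ones mentioned in this installation and may need adjusting:

# print every jar under the parquet and hive lib dirs that contains the class
for j in /opt/cloudera/parcels/CDH/lib/parquet/*.jar /opt/cloudera/parcels/CDH/lib/hive/lib/*.jar; do
  unzip -l "$j" 2>/dev/null | grep -q 'org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat' && echo "$j"
done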
Labels:
04-15-2014
12:17 PM
1 Kudo
Hi, the issue has been solved. The problem was a mismatch between directory permissions and ownership (the ownership was the culprit, not the 700 permissions; stupid thing 😉). Nevertheless, the error message is somewhat misleading; it would preferably state that the user/permissions are incorrect. Gerd
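For reference, these are the kind of commands involved in bringing a replaced data directory back in line; the path and the hdfs:hadoop owner are placeholders, the correct values are simply whatever the healthy volumes on that node use:

# align the new directory with its healthy siblings (illustrative path and owner)
chown -R hdfs:hadoop /data/6/dfs/dn
chmod 700 /data/6/dfs/dn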
04-15-2014
09:32 AM
Hi, in our CDH3 cluster (hadoop-0.20.2, yes, it's pretty old 😉) we had a disk failure on one node and thereby the datanode went down. After replacing the disk and setting up the directories/permissions, starting the datanode still fails with this error:

2014-04-15 16:14:43,165 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: org.apache.hadoop.util.DiskChecker$DiskErrorException: Too many failed volumes - current valid volumes: 5, volumes configured: 6, volumes failed: 1, volume failures tolerated: 0
at org.apache.hadoop.hdfs.server.datanode.FSDataset.<init>(FSDataset.java:1025)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:416)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:303)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1643)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1583)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1601)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1727)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1744)

How do I tell the datanode that the disk has been replaced, i.e. how do I "enable" the replaced disk? I don't want to configure a tolerated disk failure of 1 just to be able to start the datanode 😉 br, Gerd
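For context, this is roughly how the replaced volume can be compared with a healthy one; the paths are illustrative, and the authoritative list of data directories is the dfs.data.dir property in hdfs-site.xml:

# check which directories the datanode expects, then compare ownership/permissions of the volumes
grep -A1 dfs.data.dir /etc/hadoop/conf/hdfs-site.xml
ls -ld /data/*/dfs/dn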
Labels:
- Apache Hadoop
- HDFS
04-14-2014
12:17 AM
1 Kudo
Hi Sean, thanks for your hint; increasing the worker memory settings solved the problem. I set worker_max_heapsize back to its default value of 512 MB (it was just 64 MB before) and the total executor memory size to 2 GB. Thanks, Gerd
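For reference, a rough non-Cloudera-Manager equivalent of those two settings, assuming spark-env.sh is used; Cloudera Manager normally manages these values itself, so this is only a sketch:

# heap of the worker daemon itself vs. memory the worker may hand out to executors
export SPARK_DAEMON_MEMORY=512m
export SPARK_WORKER_MEMORY=2g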
04-13-2014
11:07 PM
Hi, I'm going to start working with Spark and installed the parcels in our CDH5 GA cluster. Master: hadoop-pg-5.cluster, Worker: hadoop-pg-7.cluster. Both daemons are running, the Master web UI shows the connected worker, and the log entries look good:

master:
2014-04-13 21:26:40,641 INFO Remoting: Starting remoting
2014-04-13 21:26:40,930 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkMaster@hadoop-pg-5.cluster:7077]
2014-04-13 21:26:41,356 INFO org.apache.spark.deploy.master.Master: Starting Spark master at spark://hadoop-pg-5.cluster:7077
...
2014-04-13 21:26:41,439 INFO org.eclipse.jetty.server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:18080
2014-04-13 21:26:41,441 INFO org.apache.spark.deploy.master.ui.MasterWebUI: Started Master web UI at http://hadoop-pg-5.cluster:18080
2014-04-13 21:26:41,476 INFO org.apache.spark.deploy.master.Master: I have been elected leader! New state: ALIVE
2014-04-13 21:27:40,319 INFO org.apache.spark.deploy.master.Master: Registering worker hadoop-pg-5.cluster:7078 with 2 cores, 64.0 MB RAM

worker:
2014-04-13 21:27:39,037 INFO akka.event.slf4j.Slf4jLogger: Slf4jLogger started
2014-04-13 21:27:39,136 INFO Remoting: Starting remoting
2014-04-13 21:27:39,413 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkWorker@hadoop-pg-7.cluster:7078]
2014-04-13 21:27:39,706 INFO org.apache.spark.deploy.worker.Worker: Starting Spark worker hadoop-pg-7.cluster:7078 with 2 cores, 64.0 MB RAM
2014-04-13 21:27:39,708 INFO org.apache.spark.deploy.worker.Worker: Spark home: /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/spark
...
2014-04-13 21:27:39,888 INFO org.eclipse.jetty.server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:18081
2014-04-13 21:27:39,889 INFO org.apache.spark.deploy.worker.ui.WorkerWebUI: Started Worker web UI at http://hadoop-pg-7.cluster:18081
2014-04-13 21:27:39,890 INFO org.apache.spark.deploy.worker.Worker: Connecting to master spark://hadoop-pg-5.cluster:7077...
2014-04-13 21:27:40,360 INFO org.apache.spark.deploy.worker.Worker: Successfully registered with master spark://hadoop-pg-5.cluster:7077

Looks good so far. Now I want to execute the Python pi example by running (on the worker):

cd /opt/cloudera/parcels/CDH/lib/spark && ./bin/pyspark ./python/examples/pi.py spark://hadoop-pg-5.cluster:7077

Here the strange thing happens: the script does not get executed, it hangs (repeating this output forever) at:

14/04/13 21:31:03 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
14/04/13 21:31:18 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

The whole log is:

14/04/13 21:30:44 INFO Slf4jLogger: Slf4jLogger started
14/04/13 21:30:45 INFO Remoting: Starting remoting
14/04/13 21:30:45 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@hadoop-pg-7.cluster:50601]
14/04/13 21:30:45 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@hadoop-pg-7.cluster:50601]
14/04/13 21:30:45 INFO SparkEnv: Registering BlockManagerMaster
14/04/13 21:30:45 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20140413213045-acec
14/04/13 21:30:45 INFO MemoryStore: MemoryStore started with capacity 294.9 MB.
14/04/13 21:30:45 INFO ConnectionManager: Bound socket to port 57506 with id = ConnectionManagerId(hadoop-pg-7.cluster,57506)
14/04/13 21:30:45 INFO BlockManagerMaster: Trying to register BlockManager
14/04/13 21:30:45 INFO BlockManagerMasterActor$BlockManagerInfo: Registering block manager hadoop-pg-7.cluster:57506 with 294.9 MB RAM
14/04/13 21:30:45 INFO BlockManagerMaster: Registered BlockManager
14/04/13 21:30:45 INFO HttpServer: Starting HTTP Server
14/04/13 21:30:45 INFO HttpBroadcast: Broadcast server started at http://10.147.210.7:51224
14/04/13 21:30:45 INFO SparkEnv: Registering MapOutputTracker
14/04/13 21:30:45 INFO HttpFileServer: HTTP File server directory is /tmp/spark-f9ab98c8-2adf-460a-9099-6dc07c7dc89f
14/04/13 21:30:45 INFO HttpServer: Starting HTTP Server
14/04/13 21:30:46 INFO SparkUI: Started Spark Web UI at http://hadoop-pg-7.cluster:4040
14/04/13 21:30:46 INFO AppClient$ClientActor: Connecting to master spark://hadoop-pg-5.cluster:7077...
14/04/13 21:30:47 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20140413213046-0000
14/04/13 21:30:48 INFO SparkContext: Starting job: reduce at ./python/examples/pi.py:36
14/04/13 21:30:48 INFO DAGScheduler: Got job 0 (reduce at ./python/examples/pi.py:36) with 2 output partitions (allowLocal=false)
14/04/13 21:30:48 INFO DAGScheduler: Final stage: Stage 0 (reduce at ./python/examples/pi.py:36)
14/04/13 21:30:48 INFO DAGScheduler: Parents of final stage: List()
14/04/13 21:30:48 INFO DAGScheduler: Missing parents: List()
14/04/13 21:30:48 INFO DAGScheduler: Submitting Stage 0 (PythonRDD[1] at reduce at ./python/examples/pi.py:36), which has no missing parents
14/04/13 21:30:48 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (PythonRDD[1] at reduce at ./python/examples/pi.py:36)
14/04/13 21:30:48 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
14/04/13 21:31:03 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
14/04/13 21:31:18 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

So I have to cancel the execution of the script. When I do that, I receive the following log entries on the master (at cancellation of the Python pi script!):

2014-04-13 21:30:46,965 INFO org.apache.spark.deploy.master.Master: Registering app PythonPi
2014-04-13 21:30:46,974 INFO org.apache.spark.deploy.master.Master: Registered app PythonPi with ID app-20140413213046-0000
2014-04-13 21:31:27,123 INFO org.apache.spark.deploy.master.Master: akka.tcp://spark@hadoop-pg-7.cluster:50601 got disassociated, removing it.
2014-04-13 21:31:27,125 INFO org.apache.spark.deploy.master.Master: Removing app app-20140413213046-0000
2014-04-13 21:31:27,143 INFO org.apache.spark.deploy.master.Master: akka.tcp://spark@hadoop-pg-7.cluster:50601 got disassociated, removing it.
2014-04-13 21:31:27,144 INFO akka.actor.LocalActorRef: Message [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from Actor[akka://sparkMaster/deadLetters] to Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%4010.147.210.7%3A44207-2#-389971336] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
2014-04-13 21:31:27,194 ERROR akka.remote.EndpointWriter: AssociationError [akka.tcp://sparkMaster@hadoop-pg-5.cluster:7077] -> [akka.tcp://spark@hadoop-pg-7.cluster:50601]: Error [Association failed with [akka.tcp://spark@hadoop-pg-7.cluster:50601]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://spark@hadoop-pg-7.cluster:50601]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: hadoop-pg-7.cluster/10.147.210.7:50601
]
2014-04-13 21:31:27,199 INFO org.apache.spark.deploy.master.Master: akka.tcp://spark@hadoop-pg-7.cluster:50601 got disassociated, removing it.
2014-04-13 21:31:27,215 ERROR akka.remote.EndpointWriter: AssociationError [akka.tcp://sparkMaster@hadoop-pg-5.cluster:7077] -> [akka.tcp://spark@hadoop-pg-7.cluster:50601]: Error [Association failed with [akka.tcp://spark@hadoop-pg-7.cluster:50601]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://spark@hadoop-pg-7.cluster:50601]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: hadoop-pg-7.cluster/10.147.210.7:50601
]
2014-04-13 21:31:27,222 INFO org.apache.spark.deploy.master.Master: akka.tcp://spark@hadoop-pg-7.cluster:50601 got disassociated, removing it.
2014-04-13 21:31:27,234 ERROR akka.remote.EndpointWriter: AssociationError [akka.tcp://sparkMaster@hadoop-pg-5.cluster:7077] -> [akka.tcp://spark@hadoop-pg-7.cluster:50601]: Error [Association failed with [akka.tcp://spark@hadoop-pg-7.cluster:50601]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://spark@hadoop-pg-7.cluster:50601]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: hadoop-pg-7.cluster/10.147.210.7:50601
]
2014-04-13 21:31:27,238 INFO org.apache.spark.deploy.master.Master: akka.tcp://spark@hadoop-pg-7.cluster:50601 got disassociated, removing it.

What is going wrong here? I get the same behaviour if I start the spark-shell on the worker and try to execute e.g. sc.parallelize(1 to 100, 10).count. Any help highly appreciated, Gerd
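Since the warning complains about resources, one hedged way to test whether the job is simply asking for more memory than the 64.0 MB the worker advertises would be to request a tiny executor memory for the test run; depending on the Spark version the variable may be SPARK_EXECUTOR_MEMORY or the older SPARK_MEM, so treat this purely as a sketch:

# ask for less executor memory than the worker offers, just to see whether tasks start
cd /opt/cloudera/parcels/CDH/lib/spark
SPARK_EXECUTOR_MEMORY=32m ./bin/pyspark ./python/examples/pi.py spark://hadoop-pg-5.cluster:7077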
Labels:
04-02-2014
12:14 AM
Hi Clint, many thanks, the "world:anyone" combination was the missing piece 😉 Despite an "Authentication is not valid" message when executing the setAcl, I was able to access and delete the "shutdown" node under /hbase.

Log:
[zk: localhost:2181(CONNECTED) 4] getAcl /hbase/shutdown
'sasl,'hbase
: cdrwa
[zk: localhost:2181(CONNECTED) 5] setAcl /hbase/shutdown world:anyone:cdrwa
Authentication is not valid : /hbase/shutdown
[zk: localhost:2181(CONNECTED) 6] delete /hbase/shutdown
[zk: localhost:2181(CONNECTED) 7] getAcl /hbase/shutdown
Node does not exist: /hbase/shutdown

HBase is up and running again, that's what matters 😄 regards....: Gerd :...
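For completeness, an alternative that avoids opening the znode to world:anyone would be to restart the ZooKeeper server temporarily with ACL checks disabled and delete the node then; zookeeper.skipACL is a standard ZooKeeper server system property, but where exactly to set it in a Cloudera Manager managed setup (e.g. a Java options safety valve) is an assumption, and it should be removed again right afterwards:

# temporary ZooKeeper server JVM option: disable ACL enforcement while cleaning up
-Dzookeeper.skipACL=yes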