New Contributor
Posts: 5
Registered: ‎09-25-2015

How does speed layer updates model?

I am running a complete Oryx 2 setup on a server.

Everything works fine, but when new data is ingested through the serving layer (the /ingest API), I see that it is processed only at 5-minute intervals.

Does this model update run in the speed layer or the batch layer?

In oryx.conf, the batch layer has the following configuration:

batch {
  streaming {
    generation-interval-sec = 300
    num-executors = 4
    executor-cores = 8
    executor-memory = "4g"
  }
  # remaining batch settings omitted
}

It seems the batch frequency is 5 minutes, so is the ingested data being processed by the batch layer? Is the complete model recomputed every 5 minutes?

Thanks.

Cloudera Employee
Posts: 366
Registered: ‎07-29-2013

Re: How does speed layer updates model?

There, you are showing configuration for the batch layer. The batch
layer builds models, rather than updates them. (In a way those are
similar things -- a new model is like a complete update to the old
one.) Yes, a model is built every 5 minutes in this configuration.

If you also run a speed layer, it can make updates more rapidly, even
though they're only approximate updates.

New Contributor
Posts: 5
Registered: ‎09-25-2015

Re: How does speed layer updates model?

I am running the speed layer as well, but I don't see any activity on the Kafka topic when data is ingested until the 5 minutes have elapsed.

So right now the batch layer is what takes all the input and regenerates the model. Here is the complete config file:

kafka-brokers = "XXX"
zk-servers = "XXX"
hdfs-base = "hdfs:///XXX"

oryx {
  id = "ALSExample"
  input-topic {
    broker = ${kafka-brokers}
    lock = {
      master = ${zk-servers}
    }
  }
  update-topic {
    broker = ${kafka-brokers}
    lock = {
      master = ${zk-servers}
    }
  }
  batch {
    streaming {
      generation-interval-sec = 300
      num-executors = 4
      executor-cores = 8
      executor-memory = "4g"
    }
    update-class = "com.cloudera.oryx.app.batch.mllib.als.ALSUpdate"
    storage {
      data-dir = ${hdfs-base}"/data/"
      model-dir = ${hdfs-base}"/model/"
    }
    ui {
      port = 4040
    }
  }
  speed {
    model-manager-class = "com.cloudera.oryx.app.speed.als.ALSSpeedModelManager"
    ui {
      port = 4041
    }
  }
  serving {
    model-manager-class = "com.cloudera.oryx.app.serving.als.model.ALSServingModelManager"
    application-resources = "com.cloudera.oryx.app.serving,com.cloudera.oryx.app.serving.als"
    api {
      port = 8080
    }
  }
}

Does the config need any change to force the speed layer to process input?

Cloudera Employee
Posts: 366
Registered: ‎07-29-2013

Re: How does speed layer updates model?

You don't see activity on which topic -- the input or update topic?
You should certainly see input topic activity right after you ingest.
If not, that's the source of all the other issues, presumably.

You should see activity on the update topic after a model is built,
yes, after 5 mins. If you're not running a speed layer, you would not
see any other update topic activity.
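
To tell the kinds of update-topic traffic apart, it can help to tally messages by key. Below is a minimal Python sketch, assuming the update topic uses the keys "MODEL" or "MODEL-REF" for a newly built model and "UP" for incremental updates. That key convention, the function name, and the sample records are assumptions for illustration, not something confirmed in this thread. The records would come from whatever consumer you use to dump the topic.

```python
from collections import Counter

# Assumed key convention for the update topic (not confirmed in this thread).
MODEL_KEYS = {"MODEL", "MODEL-REF"}
UPDATE_KEY = "UP"


def summarize_update_topic(records):
    """Tally (key, message) records and count "UP" records seen since the last model."""
    counts = Counter()
    updates_since_last_model = 0
    for key, _message in records:
        if key in MODEL_KEYS:
            counts["models"] += 1
            updates_since_last_model = 0  # a new model resets the running count
        elif key == UPDATE_KEY:
            counts["updates"] += 1
            updates_since_last_model += 1
        else:
            counts["other"] += 1
    return {
        "models": counts["models"],
        "updates": counts["updates"],
        "other": counts["other"],
        "updates_since_last_model": updates_since_last_model,
    }


# Hypothetical records; real ones would be read from the update topic.
sample = [
    ("MODEL-REF", "path/to/model.pmml"),
    ("UP", '["X","u1",[0.1,0.2]]'),
    ("MODEL-REF", "path/to/next-model.pmml"),
    ("UP", '["Y","i7",[0.3,0.4]]'),
    ("UP", '["X","u2",[0.5,0.6]]'),
]
print(summarize_update_topic(sample))
```

If only model messages ever show up, that would be consistent with the batch layer being the sole writer. Depending on the app, the batch layer may itself publish "UP" messages right after a model, so "UP" messages that keep arriving between model builds are the more telling sign of speed layer activity.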

New Contributor
Posts: 5
Registered: ‎09-25-2015

Re: How does speed layer updates model?

Sorry, I am a little confused.

I do see the data ingested into the input topic, and the model created by
the batch layer appears on the Kafka topic after the 5-minute interval. But
the data is ingested by the serving layer, and the batch layer publishes a
complete model to the topic using that data.

Isn't the speed layer supposed to publish model updates to the update topic
before the model is regenerated by the batch layer? I don't see those model
updates on the topic even though the speed layer is running.

Cloudera Employee
Posts: 366
Registered: ‎07-29-2013

Re: How does speed layer updates model?

Yes. But are you running a speed layer?

New Contributor
Posts: 5
Registered: ‎09-25-2015

Re: How does speed layer updates model?

Yes, it is running.

Here is the output of the ps command:

ec2-user 12073 12006 0 07:05 pts/0 00:00:00 bash ./oryx-run.sh speed

ec2-user 12106 12073 0 07:05 pts/0 00:01:02 java -Xmx512m
-Dspark.yarn.dist.files=oryx.conf
-Dspark.jars=oryx-speed-2.0.0-SNAPSHOT.jar,/opt/cloudera/parcels/CDH/jars/spark-examples-1.3.0-cdh5.4.2-hadoop2.6.0-cdh5.4.2.jar
-Dsun.io.serialization.extendeddebuginfo=true
-Dspark.executor.extraJavaOptions="-Dconfig.file=oryx.conf
-Dsun.io.serialization.extendeddebuginfo=true" -Dconfig.file=oryx.conf -cp
oryx-speed-2.0.0-SNAPSHOT.jar:/etc/hadoop/conf:/opt/cloudera/parcels/CDH/jars/avro-1.7.6-cdh5.4.2.jar:/opt/cloudera/parcels/CDH/jars/commons-cli-1.2.jar:/opt/cloudera/parcels/CDH/jars/commons-collections-3.2.1.jar:/opt/cloudera/parcels/CDH/jars/commons-configuration-1.7.jar:/opt/cloudera/parcels/CDH/jars/commons-lang-2.6.jar:/opt/cloudera/parcels/CDH/jars/hadoop-auth-2.6.0-cdh5.4.2.jar:/opt/cloudera/parcels/CDH/jars/hadoop-common-2.6.0-cdh5.4.2.jar:/opt/cloudera/parcels/CDH/jars/hadoop-hdfs-2.6.0-cdh5.4.2.jar:/opt/cloudera/parcels/CDH/jars/hadoop-hdfs-nfs-2.6.0-cdh5.4.2.jar:/opt/cloudera/parcels/CDH/jars/hadoop-mapreduce-client-core-2.6.0-cdh5.4.2.jar:/opt/cloudera/parcels/CDH/jars/hadoop-yarn-api-2.6.0-cdh5.4.2.jar:/opt/cloudera/parcels/CDH/jars/hadoop-yarn-applications-distributedshell-2.6.0-cdh5.4.2.jar:/opt/cloudera/parcels/CDH/jars/hadoop-yarn-client-2.6.0-cdh5.4.2.jar:/opt/cloudera/parcels/CDH/jars/hadoop-yarn-common-2.6.0-cdh5.4.2.jar:/opt/cloudera/parcels/CDH/jars/hadoop-yarn-server-web-proxy-2.6.0-cdh5.4.2.jar:/opt/cloudera/parcels/CDH/jars/htrace-core-3.0.4.jar:/opt/cloudera/parcels/CDH/jars/httpclient-4.2.5.jar:/opt/cloudera/parcels/CDH/jars/httpcore-4.2.5.jar:/opt/cloudera/parcels/CDH/jars/httpmime-4.2.5.jar:/opt/cloudera/parcels/CDH/jars/jackson-core-asl-1.9.12.jar:/opt/cloudera/parcels/CDH/jars/jackson-mapper-asl-1.9.12.jar:/opt/cloudera/parcels/CDH/jars/protobuf-java-2.5.0.jar:/opt/cloudera/parcels/CDH/jars/snappy-java-1.0.5.jar:/opt/cloudera/parcels/CDH/jars/spark-assembly-1.3.0-cdh5.4.2-hadoop2.6.0-cdh5.4.2.jar:/opt/cloudera/parcels/CDH/jars/spark-examples-1.3.0-cdh5.4.2-hadoop2.6.0-cdh5.4.2.jar:/opt/cloudera/parcels/CDH/jars/zookeeper-3.4.5-cdh5.4.2.jar
com.cloudera.oryx.speed.Main

Cloudera Employee
Posts: 366
Registered: ‎07-29-2013

Re: How does speed layer updates model?

OK that's good. You can look at the driver UI on port 4041, according
to your config. That might give you some insight. Has it processed any
data?

The speed layer can't make updates until it has seen the first model.
So you would see nothing happen before the first 5 minutes passed and
the first model was loaded. Then subsequent new data can create
updates.
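
The ordering described above can be sketched as a toy timeline in Python. This is a simplified model under stated assumptions, not Oryx's actual scheduling: batch models are taken to appear exactly at multiples of the batch interval (training time ignored), the speed layer is taken to work in its own short micro-batches (the 10-second default below is a hypothetical value), and speed updates are only possible once the first model exists.

```python
import math


def next_boundary(t, interval):
    """First multiple of `interval` strictly after time t (in seconds)."""
    return (math.floor(t / interval) + 1) * interval


def first_reflected(ingest_t, batch_interval=300, speed_interval=10, speed_running=True):
    """Return (time, layer) at which data ingested at ingest_t first affects the model.

    Toy model: speed_interval is a hypothetical value, and the first batch
    model is assumed to be available at t == batch_interval.
    """
    batch_time = next_boundary(ingest_t, batch_interval)
    if not speed_running:
        return batch_time, "batch"
    first_model_time = batch_interval
    speed_time = next_boundary(ingest_t, speed_interval)
    # The speed layer can only emit an update once a model has been loaded,
    # and it only matters if it lands before the next full batch model.
    if first_model_time <= speed_time < batch_time:
        return speed_time, "speed"
    return batch_time, "batch"


for t in (20, 310):
    print(t, first_reflected(t))
```

In this sketch, data ingested during the first 5 minutes is only reflected by the first batch model, while data ingested after that can be reflected by a speed update well before the next batch generation.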

I'm about to release a "2.0.0 beta 3", which has some fixes related to
this. I don't think it is causing a problem for you, but it will be
worth consuming if you are using ALS and the speed layer.

New Contributor
Posts: 5
Registered: ‎09-25-2015

Re: How does speed layer updates model?

Let me check if I can get some more information from the driver UI. Will let you know.

Explorer
Posts: 27
Registered: ‎01-06-2016

Re: How does speed layer updates model?

Was there any follow-up on this? I am seeing this issue where the speed layer does not seem to pick up the updates. I also constantly get a 503 from the serving layer. The batch layer is working, and I can see the data and model output being pushed to HDFS. I am just trying to run the ALS example on my local machine.
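
One way to narrow down a persistent 503 is to poll the serving layer until it reports a loaded model. Below is a minimal Python sketch, assuming the serving layer exposes a readiness endpoint that returns 200 once a model is loaded and 503 before that. The /ready path, the helper names, and the retry settings are assumptions for illustration; port 8080 is taken from the config posted earlier in the thread.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=5.0):
    """Return the HTTP status code of a GET on url, including error codes such as 503.

    Connection failures (urllib.error.URLError) are not caught and will propagate.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def wait_until_ready(check, attempts=30, delay_sec=10.0, sleep=time.sleep):
    """Call check() until it returns 200; return the attempt number, or None if it never does."""
    for attempt in range(1, attempts + 1):
        if check() == 200:
            return attempt
        if attempt < attempts:
            sleep(delay_sec)
    return None


# Example call (hypothetical URL; uncomment to run against a live serving layer):
# print(wait_until_ready(lambda: http_status("http://localhost:8080/ready")))
```

If this never returns an attempt number even after several batch intervals, the serving layer is probably not receiving a model from the update topic at all, which would point back at the topic and broker settings.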
