Member since
01-06-2016
06-27-2017
06:18 AM
Apologies, new thread created: https://community.cloudera.com/t5/Data-Science-and-Machine/Oryx2-ALS-recommender/m-p/56566
06-27-2017
05:00 AM
Hi srowen,

I have a general question about the ALS recommender. Let's say that, instead of a specific user rating, I'd like to recommend based on a usage count, i.e. every time a user interacts with an item, I submit a new ingestion: userID,itemID (with this type of input the "strength" is assigned a default value of 1).

I've seen that if I submit multiple requests like this for the same userID,itemID combo, /recommend/userID/?considerKnownItems=true will return itemID with a higher recommended score. Is this based on a cumulative "strength", i.e. multiple "ratings" of the same item adding up, or on the fact that the item was "rated" more recently than other items? Or is it something else?

Alternatively, on ingestion, would it make sense to get the current "rating" of an item by a user and increment it by 1 before submitting it? Does my use case make sense, or would it be better suited to a different type of algorithm?

Thanks
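To make the "increment before submitting" alternative concrete, here is a minimal sketch of a client-side tally that counts interactions per userID,itemID pair and emits one userID,itemID,strength line per pair. The class InteractionTally and its methods are hypothetical and not part of Oryx, the three-field line format is an assumption, and the sketch says nothing about how Oryx itself combines repeated inputs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of Oryx: tallies interactions per user/item pair. */
class InteractionTally {

    // Key is "userID,itemID"; value is the number of interactions recorded so far.
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    /** Records one interaction and returns the new count for the pair. */
    int record(String userID, String itemID) {
        return counts.merge(userID + "," + itemID, 1, Integer::sum);
    }

    /** Builds one "userID,itemID,strength" line per pair, in first-seen order. */
    List<String> toIngestLines() {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            lines.add(e.getKey() + "," + e.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        InteractionTally tally = new InteractionTally();
        tally.record("u1", "i1");
        tally.record("u1", "i1");
        tally.record("u1", "i2");
        // Prints one line per pair with its accumulated count.
        tally.toIngestLines().forEach(System.out::println);
    }
}
```

If the model already adds up repeated inputs for the same pair, submitting running totals like this would count interactions more than once, so the two approaches should not be mixed.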
04-24-2017
09:34 AM
Hi, I can confirm my setup is working with 0.10.0.0. I noticed one issue in the serving layer output (I noticed this before I made the change, so it is not new):
2017-04-24 16:26:54,052 INFO ALSServingModelManager:96 ALSServingModel[features:10, implicit:true, X:(877 users), Y:(1639 items, partitions: [0:1296, 1:343]...), fractionLoaded:1.0]
2017-04-24 16:26:54,053 INFO SolverCache:78 Computing cached solver
2017-04-24 16:26:54,111 INFO SolverCache:83 Computed new solver null
Should there be a null value in this last message? Here's the code in question, from com.cloudera.oryx.app.als.SolverCache:
if (newYTYSolver != null) {
  log.info("Computed new solver {}", solver);
  solver.set(newYTYSolver);
}
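One possible explanation, assuming `solver` is a java.util.concurrent.atomic.AtomicReference (the snippet does not show its declaration): AtomicReference.toString() returns the string form of the value it currently holds, so logging the reference before calling set() prints the previous value, which is null the first time. The sketch below reproduces that ordering with plain string concatenation standing in for the logger; the class and method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical demo, not Oryx code: shows the effect of logging a reference before set(). */
class LogBeforeSetDemo {

    /** Same order as the quoted snippet: build the message first, then set the value. */
    static String logThenSet(AtomicReference<String> ref, String newValue) {
        // String concatenation calls ref.toString(), i.e. String.valueOf(ref.get()).
        String message = "Computed new solver " + ref;
        ref.set(newValue);
        return message;
    }

    /** Alternative order: log the new value itself, then set it. */
    static String logNewValueThenSet(AtomicReference<String> ref, String newValue) {
        String message = "Computed new solver " + newValue;
        ref.set(newValue);
        return message;
    }

    public static void main(String[] args) {
        // An empty reference holds null, so the first call reports "null".
        System.out.println(logThenSet(new AtomicReference<>(), "solverA"));
        System.out.println(logNewValueThenSet(new AtomicReference<>(), "solverA"));
    }
}
```

Under that assumption the null in the log would be cosmetic: the new solver is still stored, and only the message reports the old value.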
04-20-2017
08:12 AM
I just spotted an issue with using S3:
java.lang.IllegalArgumentException: Wrong FS: s3://mybucket/Oryx/data/oryx-1492697400000.data, expected: hdfs://<master-node>:8020
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:653)
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:194)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:106)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1430)
at com.cloudera.oryx.lambda.batch.SaveToHDFSFunction.call(SaveToHDFSFunction.java:71)
at com.cloudera.oryx.lambda.batch.SaveToHDFSFunction.call(SaveToHDFSFunction.java:35)
It's strange that it worked a few times and then crashed on this attempt. Anyway, following guidance from this post: https://forums.aws.amazon.com/thread.jspa?threadID=30945 I updated com.cloudera.oryx.lambda.batch.SaveToHDFSFunction from:
FileSystem fs = FileSystem.get(hadoopConf);
to:
FileSystem fs = FileSystem.get(path.toUri(), hadoopConf);
This seems to have done the trick.
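The "Wrong FS" error arises when a path with one scheme is handed to a file system object that serves another. The sketch below is a simplified, standard-library-only model of that check, not Hadoop's actual code; the class FsSchemeCheck and its methods are hypothetical, and the host name master-node stands in for the elided one in the trace. It illustrates why picking the file system from the path's own URI, as FileSystem.get(path.toUri(), hadoopConf) does, avoids the mismatch when the default file system is HDFS.

```java
import java.net.URI;

/** Hypothetical, simplified model of a scheme check; not Hadoop's implementation. */
class FsSchemeCheck {

    /** A path with no scheme is accepted; otherwise its scheme must match the file system's. */
    static boolean accepts(URI fsUri, URI path) {
        String scheme = path.getScheme();
        return scheme == null || scheme.equalsIgnoreCase(fsUri.getScheme());
    }

    /** Models choosing the file system from the path, falling back to the default. */
    static URI fsFor(URI path, URI defaultFs) {
        if (path.getScheme() == null) {
            return defaultFs;
        }
        String authority = path.getAuthority();
        return URI.create(path.getScheme() + "://" + (authority == null ? "/" : authority));
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://master-node:8020");
        URI path = URI.create("s3://mybucket/Oryx/data/oryx-1492697400000.data");

        // Default file system with an s3 path: the schemes differ, so the check fails.
        System.out.println(accepts(defaultFs, path));
        // File system chosen from the path itself: the schemes match.
        System.out.println(accepts(fsFor(path, defaultFs), path));
    }
}
```

This may also be consistent with the failure appearing only on some runs, since the trace shows the mismatch surfacing at an exists() call in SaveToHDFSFunction rather than at every write.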
04-20-2017
06:26 AM
I can confirm that the speed layer is working as expected now. My current setup is that I have two separate EC2 instances and two separate EMR clusters. One EC2 instance is running Kafka, while the other is running Zookeeper and an instance of the Serving layer. (This serving layer currently doesn't have any access to HDFS, so I will probably see some errors if the models get too large.) One EMR cluster is running the Batch layer and the other is running the Speed layer. This approach seems to be working fine for now.

My Batch layer is currently even writing the output to S3. I didn't expect this to work straight out of the box, but it did. I just updated the oryx.conf and I guess Amazon's implementation of HDFS (EMRFS) takes care of the rest:
hdfs-base = "s3://mybucket/Oryx"
Do you see any issues with this setup? (Apart from the Serving layer and HDFS access.)
04-20-2017
03:23 AM
The batch and serving layers look good. I'm testing out the speed layer now. Here's a question. As per the architectural description (http://oryx.io/index.html), I see that historical data is stored in HDFS by the batch layer. The speed and serving layers only seem to interact with the Kafka topics for input and updates. So my question is: is it acceptable for the batch and speed layers to actually run on separate Hadoop clusters? They are configured to use the same Kafka brokers.

The reason I ask is that AWS EMR clusters only allow adding "steps" that are run in sequential order. So my speed layer would never actually be launched on the batch layer cluster unless the batch layer Spark job was killed or stopped.

Also, does the serving layer interact with HDFS in some way? I see that the Hadoop dependencies are needed for the jar, but I was under the impression that it only interacted with the Kafka topics.
04-20-2017
02:36 AM
I mentioned above the reason for the 'disconnected' message. It is resolved now. There was no more detail but I found this similar issue which had me look into the kafka versions: http://stackoverflow.com/questions/42851834/apache-kafka-producer-networkclient-broker-server-disconnected
04-19-2017
09:44 AM
I solved this! The issue was that my kafka/zookeeper hosts and my cluster were using kafka_2.11-0.10.0.0, while 2.4.0-SNAPSHOT is using 0.10.1.1. I've updated to use kafka_2.11-0.10.1.1.
04-19-2017
08:46 AM
Correction to my last post: the ProducerConfig only shows in the serving layer output once the ingest is invoked. I also see the following:
Apr 19, 2017 12:04:01 PM org.apache.catalina.util.SessionIdGeneratorBase createSecureRandom
INFO: Creation of SecureRandom instance for session ID generation using [SHA1PRNG] took [34,430] milliseconds.
2017-04-19 12:04:01,939 WARN NetworkClient:568 Bootstrap broker <kafka-host>:9092 disconnected
2017-04-19 12:04:02,040 INFO OryxApplication:65 Creating JAX-RS from endpoints in package(s) com.cloudera.oryx.app.serving,com.cloudera.oryx.app.serving.als,com.cloudera.oryx.lambda.serving
2017-04-19 12:04:02,148 WARN NetworkClient:568 Bootstrap broker <kafka-host>:9092 disconnected
2017-04-19 12:04:02,354 WARN NetworkClient:568 Bootstrap broker <kafka-host>:9092 disconnected
2017-04-19 12:04:02,413 INFO Reflections:229 Reflections took 345 ms to scan 1 urls, producing 17 keys and 106 values
2017-04-19 12:04:02,572 WARN NetworkClient:568 Bootstrap broker <kafka-host>:9092 disconnected
2017-04-19 12:04:02,681 INFO Reflections:229 Reflections took 219 ms to scan 1 urls, producing 10 keys and 64 values
2017-04-19 12:04:02,781 WARN NetworkClient:568 Bootstrap broker <kafka-host>:9092 disconnected
2017-04-19 12:04:02,831 INFO Reflections:229 Reflections took 145 ms to scan 1 urls, producing 13 keys and 14 values
2017-04-19 12:04:02,998 WARN NetworkClient:568 Bootstrap broker <kafka-host>:9092 disconnected
2017-04-19 12:04:03,216 WARN NetworkClient:568 Bootstrap broker <kafka-host>:9092 disconnected
2017-04-19 12:04:03,439 WARN NetworkClient:568 Bootstrap broker <kafka-host>:9092 disconnected
Apr 19, 2017 12:04:03 PM org.apache.coyote.AbstractProtocol start
INFO: Starting ProtocolHandler ["http-nio2-8080"]