I wanted to quickly share that we’ve released FIVE new Applied ML Prototypes (AMPs) in Cloudera Machine Learning (CML) and Cloudera Data Science Workbench (CDSW)! These AMPs solve a wide range of problems for data scientists and help jump-start ML/AI projects, so teams can deliver greater value faster across their organization. Here’s an overview of what was released:
Summarization — This project demonstrates four automatic summarization models, including extractive and abstractive techniques.
Why you should care: Summarization lets users quickly extract the important information from larger bodies of text. This is useful for any organization that wants to accelerate research or competitive analysis, or that needs to process and understand volumes of text that would be time-prohibitive for a human to read.
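To make the extractive side of this concrete, here is a minimal sketch of a frequency-based extractive summarizer. It is not one of the four models in the AMP; the stopword list, the scoring rule, and the function names are all illustrative assumptions. The idea is simply to score each sentence by how common its words are across the whole text and keep the top scorers in their original order.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption); real projects use a fuller one.
STOPWORDS = {
    "the", "a", "an", "and", "of", "to", "in", "is", "it", "that",
    "for", "on", "with", "as", "are", "this", "by", "be", "was",
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def extractive_summary(text, num_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    freqs = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        tokens = tokenize(sentence)
        # Average word frequency, so long sentences are not favored automatically.
        return sum(freqs[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in keep)


text = (
    "Fraud models need fresh data. "
    "Fraud models drift when data changes. "
    "Lunch was nice today."
)
print(extractive_summary(text, num_sentences=2))
```

An abstractive model, by contrast, generates new sentences rather than selecting existing ones, which is why it usually requires a trained sequence-to-sequence model instead of a scoring rule like this.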
AutoML with TPOT - This project enables automated machine learning on a sample of credit card fraud data.
Why you should care: AutoML has the potential to accelerate many repetitive processes in the ML model development lifecycle. While many proprietary AutoML offerings make it difficult to tweak or adjust how a model is built, TPOT, an open-source library for AutoML, makes it easy to get the most from AutoML without compromising flexibility and customization.
Train Gensim’s Word2Vec - This project demonstrates how to train Word2Vec for a non-language use case to learn embeddings for products on an e-commerce website.
Why you should care: Word embedding is a very popular natural language processing (NLP) technique that captures the meaning of a word from the context in which it appears in a document. For users, this means extracting a more accurate meaning from words when performing natural language processing, which leads to more accurate text analysis. The same idea carries over to non-language data, such as the products on an e-commerce website in this AMP.
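As a rough illustration of how a non-language use case maps onto Word2Vec, the sketch below treats each browsing session as a "sentence" and each product ID as a "word", then generates the (center, context) pairs that skip-gram training learns from. The session data, the product IDs, and the function name are hypothetical, and this is not the AMP's code.

```python
from collections import Counter

# Hypothetical browsing sessions: each is an ordered list of product IDs,
# playing the role that a sentence of words plays in text.
sessions = [
    ["sku_tent", "sku_sleeping_bag", "sku_stove"],
    ["sku_tent", "sku_stove", "sku_lantern"],
    ["sku_laptop", "sku_mouse"],
]


def skipgram_pairs(sequences, window=1):
    """Yield (center, context) pairs for items within `window` positions of each other."""
    for seq in sequences:
        for i, center in enumerate(seq):
            lo = max(0, i - window)
            hi = min(len(seq), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    yield center, seq[j]


pairs = list(skipgram_pairs(sessions, window=1))
co_views = Counter(pairs)

# Products that appear next to each other in sessions produce training pairs;
# products that never share a session produce none.
print(len(pairs))
print(co_views[("sku_tent", "sku_stove")])
print(co_views[("sku_tent", "sku_laptop")])
```

With Gensim installed, a list shaped like `sessions` can be passed directly as the `sentences` argument of `gensim.models.Word2Vec`, along with parameters such as `vector_size`, `window`, `min_count`, and `sg=1` for skip-gram. Products that share similar neighbors then end up with similar embeddings.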
Why you should care: In addition to the UI in CML, the new API v2 release lets you interact with your models programmatically. For users, this means greater flexibility across production environments for interacting with and retraining models, giving them the freedom to maintain more ML projects effectively.
TensorBoard as a CML Application — TensorBoard is a tool that provides the measurements and visualizations needed to help inspect, debug, and iterate during the machine learning workflow.
Why you should care: TensorBoard makes it easier to track the complex process of developing an ML model. This AMP demonstrates how to run TensorBoard within Cloudera Machine Learning (CML) via the Application feature, an example that will be easy to repurpose for any CML project.
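For a sense of what running TensorBoard as an Application involves, here is a small sketch that builds the launch command. It assumes CML hands the Application its port through the `CDSW_APP_PORT` environment variable and expects the server to bind to `127.0.0.1`; check both against your CML version. The function name and the `logs` directory are hypothetical, and this is not the AMP's own launch script.

```python
import os


def tensorboard_command(logdir, env=None):
    """Build the argv list for launching TensorBoard on the Application's port."""
    env = os.environ if env is None else env
    # Assumption: CML exposes the Application's port as CDSW_APP_PORT.
    # 6006 is TensorBoard's usual default when that variable is absent.
    port = env.get("CDSW_APP_PORT", "6006")
    return [
        "tensorboard",
        "--logdir", logdir,
        "--host", "127.0.0.1",
        "--port", str(port),
    ]


# Show the command that would run, using a made-up port for illustration.
print(" ".join(tensorboard_command("logs", env={"CDSW_APP_PORT": "8100"})))
```

In an Application's entry script, the returned list could be passed to `subprocess.run(...)` so that TensorBoard serves whatever event files your training jobs write to the log directory.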