Created on 04-26-2023 11:50 PM - edited 04-26-2023 11:51 PM
Data ingestion is a common task in the data science workflow, and it often requires coordinating with multiple teams. With the "Add Data" action on CDP Data Connections, data scientists can now upload data directly into CDP Data Stores such as Impala or Hive Virtual Warehouses, where it can be managed and governed at scale. This lets data scientists focus on analyzing their own data rather than on the complexities of data ingestion.
To get started with this feature, open the "Data" tab in your CML Project, click the "Add Data" action on the CDP Data Connection you wish to use, and follow the prompts to upload your data into a CDP Data Store.
In addition to simplifying data ingestion, the "Add Data" action provides several options for customizing the import. Users can select the database and table name for the data, as well as the column delimiter and locale. They can also change the column names and types during the import, which gives them more flexibility in how the data lands. Together, these options let data scientists import data into CDP in a form that suits their needs, reducing the time and effort required to prepare it for analysis.
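To get a feel for what the delimiter, column name, and column type options do, it can help to preview a file locally before uploading it. The following is a minimal sketch using only the Python standard library. It is not how "Add Data" is implemented, the helper name `preview_import` and the sample data are hypothetical, and it does not cover the locale option.

```python
import csv
import io


def preview_import(text, delimiter=",", rename=None, types=None):
    """Parse delimited text into a list of dicts for a local preview.

    Hypothetical helper: it mimics choosing a delimiter, renaming
    columns, and casting column types, but runs entirely locally.
    """
    rename = rename or {}   # maps original column name -> new column name
    types = types or {}     # maps new column name -> callable used to cast
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = []
    for raw in reader:
        row = {}
        for name, value in raw.items():
            new_name = rename.get(name, name)
            cast = types.get(new_name, str)  # columns default to strings
            row[new_name] = cast(value)
        rows.append(row)
    return rows


# Made-up sample: semicolon-delimited, with a header that contains a space.
sample = "id;fruit name;price\n1;apple;0.5\n2;pear;0.75\n"

rows = preview_import(
    sample,
    delimiter=";",
    rename={"fruit name": "fruit_name"},
    types={"id": int, "price": float},
)
print(rows)
```

A preview like this makes it easier to spot a wrong delimiter or a column that will not cast cleanly before making the same choices in the import prompts.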
For more information about the "Add Data" action on CDP Data Connections, users can refer to Cloudera's documentation.