About RonPick

RonPick · ‎06-09-2026

Bridging the Visibility Gap for Legacy and Niche Systems

While Cloudera Data Lineage provides native harvesting for various BI systems, many organizations rely on niche, homegrown, or legacy tools that aren't covered by our standard support. To bridge this gap, we previously offered the "Universal Connector," which let users import lineage from ETL tools, reports, or databases using CSV templates. However, a significant hurdle remained: although SQL scripts power many processes and reports, users had to manually define every source column and its associated logic. Furthermore, the platform had no way to represent the actual scripts or the specific logic they contained.

The Solution: Custom Lineage Connector

The latest update to this capability lets you include source SQL scripts directly within the CSV files, so source SQL queries for databases, ETL/ELT tools, and BI tools can be injected via CSV. To emphasize this new flexibility and acknowledge the manual setup involved, the "Universal Connector" is being rebranded as the Custom Lineage Connector.

Key Benefits:

Automated Lineage Creation: The injected metadata is automatically analyzed, creating unified lineage that behaves exactly like a native connector across the entire platform, fully integrated into all lineage layers, discovery, and the Knowledge Hub.

Streamlined Manual Mapping: By parsing the SQL scripts inside the CSV files, the connector now detects sources and transformations automatically and integrates them into the lineage platform alongside existing metadata.

Enhanced Script Visibility: The scripts themselves are now visible in the platform, which addresses the previous inability to represent specific logic.

Unified Perspective: Manual entries are consolidated with native metadata to create a unified perspective of the data environment.

Clearer Product Identity: The updated name highlights our capacity to ingest lineage from sources beyond our standard out-of-the-box support while clearly indicating that some manual configuration is required.

Key Use Cases:

Integrating ETL processes or reports from systems lacking native support that use SQL queries instead of direct table references.

Documenting embedded logic within database objects that exceed the scope of our standard compatibility.

Limitations of the Custom Connector:

Source parsing is restricted to SQL only at this time.

The connector does not harvest metadata. The CSV templates need to be filled in by the customer or by professional services.

Resources

Technical Documentation: Custom Lineage Connector Guide
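To make the idea of "SQL inside a CSV" concrete, here is a minimal sketch in Python. It is not the connector's actual template or parser: the column names (object_name, object_type, source_sql), the sample report, and the regex-based table detection are all assumptions made for illustration. The real layout is defined in the Custom Lineage Connector Guide.

```python
import csv
import io
import re

# Hypothetical column names; the real template layout is defined in the
# Custom Lineage Connector Guide and may differ.
FIELDS = ["object_name", "object_type", "source_sql"]

# Hypothetical sample row: one report whose logic is a multi-line SQL query.
ROWS = [
    {
        "object_name": "monthly_revenue_report",
        "object_type": "report",
        "source_sql": (
            "SELECT c.region, SUM(o.amount) AS revenue\n"
            "FROM sales.orders o\n"
            "JOIN sales.customers c ON c.id = o.customer_id\n"
            "GROUP BY c.region"
        ),
    }
]


def build_csv(rows):
    """Serialize rows to CSV text; the csv module quotes commas and newlines."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Naive stand-in for real SQL parsing: take the identifier after FROM or JOIN.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def source_tables(sql):
    """Return the distinct table names referenced in FROM/JOIN clauses, sorted."""
    return sorted(set(TABLE_REF.findall(sql)))


def read_sources(csv_text):
    """Map each object name in the CSV to the tables its SQL reads from."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["object_name"]: source_tables(row["source_sql"]) for row in reader}


if __name__ == "__main__":
    print(read_sources(build_csv(ROWS)))
```

The point of the sketch is the division of labor: the person filling in the template only supplies the script, and the source tables are derived from it rather than typed in by hand. A regex like this would miss subqueries, CTEs, and quoted identifiers, which is why a real SQL parser is needed in practice.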

RonPick · ‎05-11-2026

This update allows organizations to move beyond reactive troubleshooting by providing a live, continuous pulse of their data ecosystem. By delivering instant visibility into infrastructure, services, and active workloads, Real-Time Monitoring (RTM) helps keep your platform healthy, performant, and reliable.

Eliminate Blind Spots with Proactive Visibility

In high-scale data environments, even a few minutes of downtime or a runaway query can derail critical business operations. RTM addresses this by monitoring three core areas:

Cluster Infrastructure: Track the health of your underlying hardware and cloud resources (e.g., CPU, memory, and disk I/O) to spot resource exhaustion before it triggers a crash.

Service Health: Monitor the uptime and performance of vital services like Hive, Impala, and Spark in real time.

Workload Performance: Identify failing jobs or deviating queries as they happen, allowing for immediate intervention to prevent cascading failures across the cluster.

Key Benefits:

Proactive Mitigation: Shift from "fixing what broke" to "preventing the break" by detecting deviations from normal baseline behavior instantly.

Unified SaaS Experience: Access deep, real-time insights across hybrid and multi-cloud environments from a single, centralized dashboard.

Optimized SLAs: Maintain high availability for mission-critical applications by reducing Mean Time to Detection (MTTD) and Mean Time to Resolution (MTTR).

Resource Efficiency: Identify and kill "rogue" processes that are consuming capacity before they impact other users.

Use Cases

Optimize Capacity Planning: Analyze average cluster usage at any given time to identify available capacity and efficiently schedule new jobs or queries.

Monitor Health in Real Time: Monitor the uptime and performance of vital services like Hive, Impala, and Spark in real time.

Manage Workload Performance: Track Hive, Impala, Spark, MapReduce (MR), and Oozie workloads in real time. Instead of waiting for a query to finish to see why it was slow, you can now identify bottlenecks while the query is executing.

Track Granular Infrastructure Metrics: Monitor system telemetry at the node level, including CPU, memory, network, and storage, to identify CPU spikes, memory quota issues, and similar problems.

Improved Financial Governance: By integrating real-time data with financial governance, customers can precisely track provisioned versus utilized costs. This helps eliminate over-provisioning and optimizes cloud spend through more accurate capacity forecasting.

The Bottom Line: Real-time monitoring enables Cloudera Observability to provide a live view of your platform, helping you safeguard its current state and proactively avoid workload failures.

Resources

Release Notes
Support Matrix
How to enable real time monitoring
Feature docs
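For readers wondering what "detecting deviations from normal baseline behavior" can look like in practice, the following Python sketch flags metric samples that stray from a rolling baseline. It is an illustration only, not how RTM is implemented: the function name, the window size, the threshold, and the CPU readings are all assumptions.

```python
from statistics import mean, stdev


def deviations(samples, window=5, threshold=3.0):
    """Return (index, value) pairs that deviate from the rolling baseline.

    The baseline for each sample is the mean of the previous `window`
    samples. A sample is flagged when it lies more than `threshold`
    standard deviations away from that mean.
    """
    flagged = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat baseline: any change at all counts as a deviation.
            if samples[i] != mu:
                flagged.append((i, samples[i]))
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            flagged.append((i, samples[i]))
    return flagged


if __name__ == "__main__":
    # Hypothetical per-minute CPU utilization (%) for one node.
    cpu = [41.0, 43.0, 42.0, 40.0, 44.0, 42.0, 43.0, 97.0, 45.0]
    print(deviations(cpu))  # [(7, 97.0)]
```

Only the spike to 97% is flagged. The reading after it is not, because the spike itself has widened the baseline by then, which is one reason production systems tend to use more robust baselines than a plain rolling mean.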

RonPick · ‎03-18-2026

Native Governance and Data Lineage for the Unified Data Fabric

The Cloudera Data Lineage team is happy to announce that Cloudera Data Lineage now supports Trino, the federated query engine for big data. This integration allows Trino users to access data wherever it lives without sacrificing governance, trust, or visibility.

As modern enterprises navigate increasingly complex, hybrid, and multi-cloud environments, data often becomes fragmented across various systems and silos. Combining the federated query engine of Trino with the automated lineage of Cloudera Data Lineage gives organizations insight and visibility across those systems.

Here are the primary benefits and use cases that the integration of Cloudera Data Lineage with Trino provides:

Interoperability and Federated Access: Cloudera Data Lineage's newly released Trino integration enables customers to federate queries and securely access data from both Cloudera and non-Cloudera engines. This capability allows organizations to query data in place, dramatically improving interoperability within extended data environments.

Comprehensive Visibility Across the Data Estate: While Trino seamlessly connects disparate and decentralized data sources, Cloudera Data Lineage ensures that this federated querying doesn't result in a "black box." The integration provides comprehensive cross-system visibility across the entire data estate, mapping out end-to-end data lineage and extracting deep metadata insights across all connected systems.

End-to-End Impact and Root Cause Analysis: Querying data across complex, hybrid environments naturally complicates troubleshooting and change management. While other federated query providers (like Starburst) restrict your visibility to just the first level of lineage on immediate systems, Cloudera Data Lineage delivers complete, multi-layered traceability. Data teams can perform instant impact analysis by investigating upstream all the way to the original source systems, and tracing downstream to see the exact impact on the BI reports and AI models supported by Trino. This turns a potentially blind, complex debugging process into a fast, automated, and fully transparent workflow.

Example: If a data engineer needs to drop or rename a column in an on-prem Oracle database, they can instantly trace the lineage to see that this specific column feeds a Trino query powering a critical executive Power BI dashboard. They can proactively update the downstream queries before making the change, preventing dashboard failures and data downtime.

Discovery & Business Glossary: By linking the technical metadata from all federated sources to standardized business terms, the integration bridges the gap between IT and business teams. Users can instantly find the data they need and fully understand its business context before running a Trino query, ensuring everyone speaks the same language and operates from a single source of truth.

Example: A business analyst wanting to report on "Customer Lifetime Value" no longer has to guess which of the dozens of cryptic, federated tables (e.g., cust_ltv_v2 vs c_val_final) to query. They simply search the Cloudera Data Lineage business glossary for the approved term and are immediately directed to the exact, certified technical tables they should query via Trino.

A Trusted Foundation for AI Adoption: For AI and advanced analytics initiatives to succeed, organizations must fundamentally trust the data feeding their models. Trino delivers the expansive data access required for AI, and Cloudera Data Lineage provides the crucial transparency and audit trails needed to verify data origins and transformations. This ensures that AI models are built on reliable, governed data.

Example: When deploying a new AI or machine learning model for fraud detection, compliance officers can use the lineage graph to audit the exact Trino data pipelines feeding the model. They can show internal risk boards and external regulators that the model relies exclusively on governed, bias-checked source data, accelerating the safe launch of the AI initiative.

The integration of Cloudera Data Lineage for Trino creates a robust, open data fabric that accelerates time-to-insight and safely powers modern AI and analytics.

To read the technical documentation, visit this link. To learn more about features and current capabilities, please visit the new Internal Documentation Portal, which includes technical product deep-dives and configuration guides.

A huge thank you to the Engineering teams for their hard work in bringing these enhancements to production! If you have any questions or feedback, please join the conversation on Slack at octopai_technical_questions.
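The impact-analysis example above (an Oracle column feeding a Trino query that powers a Power BI dashboard) boils down to walking a lineage graph in both directions. The Python sketch below shows that traversal under stated assumptions: the asset names and the edge map are hypothetical, and this is a generic breadth-first search, not the product's API or internal model.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the assets listed as its value.
LINEAGE = {
    "oracle.crm.customers.segment": ["trino.query.exec_kpis"],
    "trino.query.exec_kpis": ["powerbi.dashboard.executive_overview"],
    "hive.sales.orders": ["trino.query.exec_kpis"],
}


def downstream(graph, start):
    """Breadth-first walk returning every asset reachable from `start`."""
    reached = []
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                reached.append(child)
                queue.append(child)
    return reached


def upstream(graph, target):
    """Walk the reversed edges to find every asset that feeds `target`."""
    reverse = {}
    for src, dsts in graph.items():
        for dst in dsts:
            reverse.setdefault(dst, []).append(src)
    return downstream(reverse, target)


if __name__ == "__main__":
    # Impact analysis: what breaks if this Oracle column is renamed?
    print(downstream(LINEAGE, "oracle.crm.customers.segment"))
    # Root cause analysis: what feeds the executive dashboard?
    print(upstream(LINEAGE, "powerbi.dashboard.executive_overview"))
```

Walking downstream from the column answers the impact question, and walking upstream from the dashboard answers the root-cause question. Multi-layer traceability means following every hop rather than stopping at the first neighbor.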

RonPick · ‎01-24-2026

Our goal is to provide a seamless, high-performance environment that bridges the gap between raw data and actionable insights. By integrating AI-driven SQL conversion and strengthening our connector suite, we are empowering teams to work faster, more securely, and with greater visibility across the entire data lifecycle.

To better align with Cloudera’s suite of solutions, the Octopai convention and reference will now be labelled Cloudera Data Lineage. All materials referencing Cloudera Data Lineage relate to what was formerly Octopai.

New Features

This release focuses on three core pillars: advanced cloud-data integration, enterprise-grade security, and robust pipeline connectivity.

Enterprise-Grade Authentication

The following releases prioritize support for Cloudera authentication protocols.

Spark Lineage + Kerberos Authentication Support – Secure your Spark lineage through integration with industry-standard Kerberos authentication.

Secure Hive & Impala Kerberos Integration – Supported authentication protocols for secure integration.

Databricks Ecosystem & AI Intelligence

The following releases position our lineage as a more robust solution than the native Databricks offering!

Unity Catalog Integration – Lowers the barrier to analyzing complex Databricks data catalogs.

Enhanced Delta Live Tables analysis, with deepened support for Databricks Delta Tables to ensure high-performance ACID transactions and metadata handling.

Lineage support for Databricks Notebooks AI that reside outside Unity Catalog.

Lineage support for Python jobs run by the Databricks compute engine.

Databricks/Hive Metastore (HMS) Connector (Multiple Metastores) – Seamlessly bridge your Hive metadata with Databricks environments.

Streaming, Connectivity & Lineage

Snowflake Stage Enhancement – Improved support for pipeline lineage, providing end-to-end visibility of data movement into Snowflake.

Apache Kafka & Kafka Connect – New, robust connectors to support Kafka lineage.

DataStage Sequencers – Enhanced analysis support for IBM InfoSphere DataStage Sequencer jobs to improve ETL orchestration visibility. This is an industry-first integration!

Apache NiFi Connector – Full support for NiFi integration, including Apache Knox authentication.

Use Cases

These enhancements prioritize productivity, security, and transparency. By automating manual tasks and hardening our security posture, we enable data teams to focus on innovation rather than infrastructure.

For Data Engineers & Analysts

Solve the "metadata management" chaos by using Cloudera Data Lineage as a risk-mitigation tool that automatically maps the entire lineage across Cloudera and external systems like Databricks, Oracle, or Snowflake, instantly revealing which data is obsolete and which is business-critical.

End-to-End Lineage: With enhanced Databricks and Snowflake Stage support, teams can audit and trace data flow with 100% confidence, simplifying compliance and troubleshooting.

Seamless Ingestion: Use the new Kafka and NiFi connectors to build real-time pipelines without custom-coding complex integrations.

For Enterprise Security Teams

Cloudera connector protocols: Five Cloudera connectors enable unified governance, enforced through a single governance model that follows the data regardless of where it is stored or processed.

Unified Governance: The Databricks connector provides a single source of truth for metadata, reducing the risk of "data silos" or partial governance. It also enables full Databricks lineage (going beyond Cloudera), giving insight into migration success.

To learn more about these features and releases, please visit our [Internal Documentation Portal], which includes technical deep-dives, configuration guides, and demo videos. To receive more details on how you can benefit from these new integrations and enhancements, please reach out to your Cloudera representative.

Last Visited	‎06-10-2026 05:41 AM

Member Since	‎01-16-2026 05:58 AM
Posts	5
Kudos received	1

Cloudera Community

Cloudera Data Lineage Custom Lineage Connector Rel...

Cloudera Real time Monitoring for 7.1.9+ and 7.2.1...

Announcing Cloudera Data Lineage for Trino

Cloudera Data Lineage enhancements - Cloudera conn...