Member since
07-26-2019
68
Posts
30
Kudos Received
10
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 15351 | 01-22-2019 02:40 PM |
|  | 2640 | 01-09-2019 03:10 PM |
|  | 2192 | 08-30-2018 08:14 PM |
|  | 1655 | 08-06-2018 07:58 PM |
|  | 4788 | 07-16-2018 07:05 PM |
07-22-2023
01:40 AM
Thanks for the awesome information!
05-07-2020
08:43 AM
Here is a very good explanation of Hive View and Hue replacement in HDP 3.0: https://hadoopcdp.com/data-analytics-studio-das-replace-of-hue-hive-views-in-cdp/
12-29-2019
09:39 PM
I am unable to find libfb303 in Zeppelin 0.8.2. Can you please help?
11-28-2019
04:41 AM
Hi wcdata, I took your advice and redesigned my workflow, but even with 10 records the PutDatabaseRecord processor runs and runs. I suspect that the Translate Field Names setting is to blame, because my source columns are uppercase and my target columns are lowercase. I don't want to define a schema here.
10-21-2019
01:09 PM
Thanks, @HorizonNet - I've updated the article with the correction!
10-07-2019
12:56 PM
Hi - It looks like this error is being caused by having two persistence providers in the providers.xml. You may need to add the file-based provider properties (providers.flowPersistenceProvider.file-provider.property.Flow Storage Directory and xml.providers.flowPersistenceProvider.file-provider.class) to the nifi.registry.providers.ignored list so that they are not added to the xml file when CM generates it. Here is an article detailing the process: https://community.cloudera.com/t5/Community-Articles/Configuring-the-Git-Persistence-Provider-for-the-NiFi/ta-p/278867
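For orientation, a correctly generated providers.xml contains only one flowPersistenceProvider element. The sketch below shows roughly what the file should look like after the file-based provider properties are ignored and only the Git provider remains; the storage path and remote name are placeholders, not values from any particular cluster:

```xml
<providers>
    <!-- Only ONE flowPersistenceProvider element may be present.       -->
    <!-- Here the Git provider is kept; the file-based provider entries -->
    <!-- have been suppressed. Paths below are placeholders.            -->
    <flowPersistenceProvider>
        <class>org.apache.nifi.registry.provider.flow.git.GitFlowPersistenceProvider</class>
        <property name="Flow Storage Directory">/var/lib/nifiregistry/flow_storage</property>
        <property name="Remote To Push">origin</property>
    </flowPersistenceProvider>
</providers>
```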
09-12-2019
09:01 AM
1 Kudo
OBJECTIVE: Add more useful service-level metrics for NiFi in Cloudera Flow Management (CFM)/Cloudera DataFlow (CDF).

OVERVIEW: The default install of early releases of Cloudera Flow Management (CFM) includes only a few basic metrics for NiFi in the Cloudera Manager dashboards. This article explains how to add several more service- and host-level metrics that improve performance monitoring in Cloudera Manager 5.x or 6.x.

PREREQUISITES: CFM (CDF) 1.0.1 or later, and CM 5.16 or CM 6.2 or later.

ADDING NiFi METRICS TO CM

Basic Metrics

1. From the NiFi status page in Cloudera Manager, select “Chart Builder” from the “Charts” menu.

2. Insert one of the following SELECT statements in the query text box:

NiFi Memory
SELECT total_mem_rss_across_nifi_nodes, total_mem_virtual_across_nifi_nodes WHERE category = SERVICE AND entityName = $SERVICENAME

CPU Rates
SELECT cpu_user_rate_across_nifi_nodes, cpu_system_rate_across_nifi_nodes WHERE entityName = $SERVICENAME AND category = SERVICE

NiFi Bytes Written
SELECT write_bytes_rate_across_nifi_nodes, avg(write_bytes_rate_across_nifi_nodes) WHERE entityName = $SERVICENAME AND category = SERVICE

3. Select a chart type (“Line” works well for fluid metrics like memory and CPU).

4. Click “Build Chart”.

5. Update the title (e.g., “CPU Rates”).

6. Select “All Combined” to place all selected metrics on the same chart, or “All Separate” to add a separate chart for each metric. (NOTE: You may see a syntax error because the $SERVICENAME variable is not available in the builder outside of a dashboard. This may be ignored and will resolve after saving.)

[Screenshot: new chart before saving]

7. Click “Save” and select the NiFi Status Page dashboard (listed under CDH5 or CDH6, depending upon release).

[Screenshot: CM chart save operation]

8. Open the NiFi Status Page and confirm that your new metric has been added.

[Screenshot: NiFi Status Page with new metrics charts]

Stacked Metrics

For some metrics, such as memory utilization, a stacked presentation better represents total usage across the service.

1. Add the SELECT statement as before, for example:

CPU Rates Stacked
SELECT cpu_user_rate_across_nifi_nodes, cpu_system_rate_across_nifi_nodes WHERE entityName = $SERVICENAME AND category = SERVICE

2. Select “All Combined”.

3. Select a stacked format, such as “Stack Area”.

[Screenshot: CM chart builder, stacked area memory]

4. Build the chart.

5. Save the chart to the appropriate dashboard as above.

CONCLUSION: Once you have added the metrics, experiment with other metric categories, filters, and aggregations to develop a dashboard that suits your needs. For additional metrics reporting capabilities, consider adding a reporting task for Datadog or AppDynamics to push NiFi metrics to one of these general-purpose SIEM/operations tools.
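The same tsquery statements can also be run against the Cloudera Manager Time Series API rather than the Chart Builder UI. The sketch below builds such a request URL; the host, port, API version, and the concrete service name "nifi" substituted for $SERVICENAME are all assumptions to adjust for your cluster:

```python
import urllib.parse

CM_HOST = "cm.example.com"   # hypothetical CM host
API_VERSION = "v19"          # assumed CM API version

def timeseries_url(tsquery: str) -> str:
    """Build a Cloudera Manager Time Series API URL for a tsquery string."""
    base = f"https://{CM_HOST}:7183/api/{API_VERSION}/timeseries"
    return base + "?" + urllib.parse.urlencode({"query": tsquery})

# The CPU-rate query from step 2, with $SERVICENAME replaced by a
# concrete (assumed) service name, since the variable only exists
# inside CM dashboards:
query = ('SELECT cpu_user_rate_across_nifi_nodes, '
         'cpu_system_rate_across_nifi_nodes '
         'WHERE entityName = "nifi" AND category = SERVICE')
print(timeseries_url(query))
```

Fetching that URL with your CM credentials returns the same data points the dashboard charts render.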
01-04-2019
03:24 AM
11 Kudos
Image Data Flow for Industrial Imaging

OBJECTIVE: Ingest and store manufacturing quality assurance images, measurements, and metadata in a cost-effective and simple-to-retrieve-from platform that can provide analytic capability in the future.

OVERVIEW: In high-speed manufacturing, imaging systems may be used to identify material imperfections, monitor thermal state, or identify when tolerances are exceeded. Many commercially available systems automate measurement and reporting of specific tests, but combining results from multiple instrumentation vendors, longer-term storage, process analytics, and comprehensive auditability requires different technology. Using HDF’s NiFi and HDP’s HDFS, Hive or HBase, and Zeppelin, one can build a cost-effective and performant solution to store and retrieve these images, as well as provide a platform for machine learning based on that data. Sample files and code, including the Zeppelin notebook, can be found in this GitHub repository: https://github.com/wcbdata/materials-imaging

PREREQUISITES:
HDF 3.0 or later (NiFi 1.2.0.3+)
HDP 2.6.5 or later (Hadoop 2.6.3+ and Hive 1.2.1+)
Spark 2.1.1.2.6.2.0-205 or later
Zeppelin 0.7.2+

STEPS:

Get the files to a filesystem accessible to NiFi. In this case, we are assuming the source system can get the files to a local directory (e.g., via an NFS mount).

Ingest the image and data files to long-term storage.

Use a ListFile processor to scrape the directory. In this example, the collected data files are in a root directory, with each manufacturing run’s image files placed in a separate subdirectory. We’ll use that location to separate out the files later.

Use a FetchFile to pull the files listed in the flowfiles generated by our ListFile. FetchFile can move all the files to an archive directory once they have been read.

Since we’re using a different path for the images versus the original source data, we’ll split the flow using an UpdateAttribute to store the file type, then route the files to two PutHDFS processors to place them appropriately. Note that PutHDFS_images uses the automatically parsed original ${path} to reproduce the source folder structure.

Parse the data files
to make them available for SQL queries.

Beginning with only the csv flowfiles, an ExtractGrok processor is used to pick one field from the second line of the flowfile (skipping the header row). This field is referenced by expression language that sets the schema name we will use to parse the flowfile.

A RouteOnAttribute processor checks the schema name using a regex to determine whether the flowfile format is one that requires additional processing to parse. In the example, flowfiles identified as the “sem_meta_10083” schema are routed to the processor group “Preprocess-SEM-particle.” This processor group contains the steps for parsing nested arrays within the csv flowfile.

Within the “Preprocess-SEM-particle” processor group, the flowfile is parsed using a temporary schema. A temporary schema can be helpful to parse some sections of a flowfile row (or tuple) while leaving others for later processing.

The flowfile is split into individual records by a SplitRecord processor. SplitRecord is similar to a SplitJson or SplitText processor, but it uses NiFi’s record-oriented parsers to identify each record rather than relying strictly on length or linebreaks.

A JoltTransform uses the powerful JOLT language to parse a section of the csv file with nested arrays. In this case, a semicolon-delimited array of comma-separated values is reformatted to valid JSON, then split out into separate flowfiles by an EvaluateJsonPath processor. This JOLT transform uses an interesting combination of JOLT wildcards and repeated processing of the same path to handle multiple possible formats.

Once formatted by a record-oriented processor such as ConvertRecord or SplitRecord, the flowfile can be reformatted easily as Avro, then inserted into a Hive table using a PutHiveStreaming processor. PutHiveStreaming can be configured to ignore extra fields in the source flowfile or target Hive table, so that many overlapping formats can be written to a table with a superset of columns in Hive. In this example, the 10083-formatted flowfiles are inserted row by row, and the particle and 10021-formatted flowfiles are inserted in bulk.

Create a simple interface to retrieve individual images for review.

The browser-based Zeppelin notebook can natively render images stored in SQL tables or in HDFS. The notebook begins
with some basic queries to view the data loaded from the imaging subsystems.

The first example paragraph uses SQL to pull a specific record from the manufacturing run, then looks for the matching file on HDFS by its timestamp.

The second set of example paragraphs uses an HTML/Angular form to collect the information, then displays the matching image.

The third set of sample paragraphs demonstrates how to obtain the image via Scala for analysis or display.

RELATED POSTS: JOLT transformation quick reference

FUTURE POSTS:
Storing microscopy data in HDF5/USID schema to make it available for analysis using standard libraries
Applying TensorFlow to microscopy data for image analysis
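The nested-array reformatting step described above can be sketched outside NiFi as well. The following Python sketch shows the same transformation the JOLT step performs, turning a semicolon-delimited array of comma-separated values into valid JSON records; the sample field and column names are invented for illustration and are not taken from the actual SEM files:

```python
import json

# Hypothetical sample: one csv field holding a semicolon-delimited array
# of comma-separated particle measurements (names invented).
raw_field = "12.1,0.8,ok;13.4,1.1,ok;9.7,2.3,reject"
columns = ["diameter_um", "eccentricity", "status"]  # assumed column names

def to_json_records(field: str, names: list) -> str:
    """Reformat 'a,b,c;d,e,f' into a JSON array of keyed records."""
    records = []
    for entry in field.split(";"):           # outer array: semicolon-delimited
        values = entry.split(",")            # inner values: comma-separated
        records.append(dict(zip(names, values)))
    return json.dumps(records)

print(to_json_records(raw_field, columns))
# Each resulting record can then be split into its own flowfile,
# as EvaluateJsonPath does in the flow described above.
```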
09-11-2018
01:14 AM
OBJECTIVE: Resolve issues with some lightweight LDAP services, such as the HDP Demo LDAP provider.

OVERVIEW: Some LDAP services do not properly support paging of LDAP query results. To support these LDAP services, paging of results needs to be disabled in the LDAP provider properties for the NiFi Registry service.

SYMPTOM: The NiFi Registry does not respond or times out, and the following error is seen repeatedly in the NiFi Registry log (nifi-registry-app.log):

2018-09-01 01:01:05,189 INFO [main] o.s.l.c.AbstractRequestControlDirContextProcessor No matching response control found - looking for 'class javax.naming.ldap.PagedResultsResponseControl
2018-09-01 01:01:05,274 INFO [main] o.s.l.c.AbstractRequestControlDirContextProcessor No matching response control found - looking for 'class javax.naming.ldap.PagedResultsResponseControl
2018-09-01 01:01:05,359 INFO [main] o.s.l.c.AbstractRequestControlDirContextProcessor No matching response control found - looking for 'class javax.naming.ldap.PagedResultsResponseControl

RESOLUTION: In Ambari, under the Configs tab for NiFi Registry, navigate to the Advanced nifi-registry-authorizers-env section. Edit the "Template for authorizers.xml" value to remove the Page Size property. Removing this property disables paging for LDAP queries in this identity provider. Be aware that for extremely large result sets, this can result in a connection timeout. After saving the change, restart the NiFi Registry service.
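For orientation, the relevant section of the generated authorizers.xml looks roughly like the sketch below. The property values are placeholders and the surrounding connection properties are elided; the key point is the "Page Size" property, which the resolution above removes from the template:

```xml
<userGroupProvider>
    <identifier>ldap-user-group-provider</identifier>
    <class>org.apache.nifi.registry.security.ldap.tenants.LdapUserGroupProvider</class>
    <!-- ...connection and search properties elided... -->
    <!-- Removing the property below disables paging of LDAP results, -->
    <!-- which some lightweight LDAP servers do not support.          -->
    <property name="Page Size">500</property>
</userGroupProvider>
```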
07-17-2018
02:03 PM
Excellent - glad that's working! There is a patch being tested for this issue here, too: https://issues.apache.org/jira/browse/ZEPPELIN-3128