Member since
07-26-2019
68
Posts
30
Kudos Received
10
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 15351 | 01-22-2019 02:40 PM |
|  | 2640 | 01-09-2019 03:10 PM |
|  | 2192 | 08-30-2018 08:14 PM |
|  | 1655 | 08-06-2018 07:58 PM |
|  | 4788 | 07-16-2018 07:05 PM |
07-22-2023
01:40 AM
Thanks for the awesome information!
05-07-2020
08:43 AM
Here is a very good explanation of Hive View and Hue replacement in HDP 3.0: https://hadoopcdp.com/data-analytics-studio-das-replace-of-hue-hive-views-in-cdp/
12-29-2019
09:39 PM
I am unable to find libfb303 in Zeppelin 0.8.2. Can you please help?
11-28-2019
04:41 AM
Hi wcdata, I took your advice and redesigned my workflow, but even with 10 records the PutDatabaseRecord processor runs and runs. I suspect that the Translate Field Names setting is to blame, because my source columns are uppercase and my target columns are lowercase. I don't want to define a schema here.
10-21-2019
01:09 PM
Thanks, @HorizonNet - I've updated the article with the correction!
10-07-2019
12:56 PM
Hi - It looks like this error is being caused by having two persistence providers in the providers.xml. You may need to add the file-based provider properties (providers.flowPersistenceProvider.file-provider.property.Flow Storage Directory and xml.providers.flowPersistenceProvider.file-provider.class) to the nifi.registry.providers.ignored list so that they are not added to the xml file when CM generates it. Here is an article detailing the process: https://community.cloudera.com/t5/Community-Articles/Configuring-the-Git-Persistence-Provider-for-the-NiFi/ta-p/278867
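For orientation, a correctly generated providers.xml contains only one flowPersistenceProvider element. The sketch below shows roughly what the file should look like after the file-based provider properties are ignored and only the Git provider remains; the storage path and remote name are placeholders, not values from any particular cluster:

```xml
<providers>
    <!-- Only ONE flowPersistenceProvider element may be present.       -->
    <!-- Here the Git provider is kept; the file-based provider entries -->
    <!-- have been suppressed. Paths below are placeholders.            -->
    <flowPersistenceProvider>
        <class>org.apache.nifi.registry.provider.flow.git.GitFlowPersistenceProvider</class>
        <property name="Flow Storage Directory">/var/lib/nifiregistry/flow_storage</property>
        <property name="Remote To Push">origin</property>
    </flowPersistenceProvider>
</providers>
```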
09-12-2019
09:01 AM
1 Kudo
OBJECTIVE: Add more useful service-level metrics for NiFi in Cloudera Flow Management (CFM)/Cloudera DataFlow (CDF).

OVERVIEW: The default install of early releases of Cloudera Flow Management (CFM) includes only a few basic metrics for NiFi in the Cloudera Manager dashboards. This article explains how to add several more service- and host-level metrics that improve performance monitoring in Cloudera Manager 5.x or 6.x.

PREREQUISITES: CFM (CDF) 1.0.1 or later, and CM 5.16 or CM 6.2 or later.

ADDING NiFi METRICS TO CM

Basic Metrics

1. From the NiFi status page in Cloudera Manager, select “Chart Builder” from the “Charts” menu.

2. Insert one of the following SELECT statements in the query text box:

NiFi Memory
SELECT total_mem_rss_across_nifi_nodes, total_mem_virtual_across_nifi_nodes WHERE category = SERVICE AND entityName = $SERVICENAME

CPU Rates
SELECT cpu_user_rate_across_nifi_nodes, cpu_system_rate_across_nifi_nodes WHERE entityName = $SERVICENAME AND category = SERVICE

NiFi Bytes Written
SELECT write_bytes_rate_across_nifi_nodes, avg(write_bytes_rate_across_nifi_nodes) WHERE entityName = $SERVICENAME AND category = SERVICE

3. Select a chart type (“Line” works well for fluid metrics like memory and CPU).

4. Click “Build Chart”.

5. Update the title (e.g., “CPU Rates”).

6. Select “All Combined” to place all selected metrics on the same chart, or “All Separate” to add a separate chart for each metric. (NOTE: You may see a syntax error because the $SERVICENAME variable is not available in the builder outside of a dashboard. This may be ignored and will resolve after saving.)

[Screenshot: new chart before saving]

7. Click “Save” and select the NiFi Status Page dashboard (listed under CDH5 or CDH6, depending upon release).

[Screenshot: CM chart save operation]

8. Open the NiFi Status Page and confirm that your new metric has been added.

[Screenshot: NiFi Status Page with new metrics charts]

Stacked Metrics

For some metrics, such as memory utilization, a stacked presentation better represents total usage across the service.

1. Add the SELECT statement as before, for example:

CPU Rates Stacked
SELECT cpu_user_rate_across_nifi_nodes, cpu_system_rate_across_nifi_nodes WHERE entityName = $SERVICENAME AND category = SERVICE

2. Select “All Combined”.

3. Select a stacked format, such as “Stack Area”.

[Screenshot: CM chart builder, stacked area memory]

4. Build the chart.

5. Save the chart to the appropriate dashboard as above.

CONCLUSION: Once you have added the metrics, experiment with other metric categories, filters, and aggregations to develop a dashboard that suits your needs. For additional metrics reporting capabilities, consider adding a reporting task for Datadog or AppDynamics to push NiFi metrics to one of these general-purpose SIEM/operations tools.
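The same tsquery statements can also be run against the Cloudera Manager Time Series API rather than the Chart Builder UI. The sketch below builds such a request URL; the host, port, API version, and the concrete service name "nifi" substituted for $SERVICENAME are all assumptions to adjust for your cluster:

```python
import urllib.parse

CM_HOST = "cm.example.com"   # hypothetical CM host
API_VERSION = "v19"          # assumed CM API version

def timeseries_url(tsquery: str) -> str:
    """Build a Cloudera Manager Time Series API URL for a tsquery string."""
    base = f"https://{CM_HOST}:7183/api/{API_VERSION}/timeseries"
    return base + "?" + urllib.parse.urlencode({"query": tsquery})

# The CPU-rate query from step 2, with $SERVICENAME replaced by a
# concrete (assumed) service name, since the variable only exists
# inside CM dashboards:
query = ('SELECT cpu_user_rate_across_nifi_nodes, '
         'cpu_system_rate_across_nifi_nodes '
         'WHERE entityName = "nifi" AND category = SERVICE')
print(timeseries_url(query))
```

Fetching that URL with your CM credentials returns the same data points the dashboard charts render.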
01-04-2019
03:24 AM
11 Kudos
Image Data Flow for Industrial Imaging

OBJECTIVE: Ingest and store manufacturing quality assurance images, measurements, and metadata in a cost-effective and simple-to-retrieve-from platform that can provide analytic capability in the future.

OVERVIEW: In high-speed manufacturing, imaging systems may be used to identify material imperfections, monitor thermal state, or identify when tolerances are exceeded. Many commercially available systems automate measurement and reporting of specific tests, but combining results from multiple instrumentation vendors, longer-term storage, process analytics, and comprehensive auditability requires different technology. Using HDF’s NiFi and HDP’s HDFS, Hive or HBase, and Zeppelin, one can build a cost-effective and performant solution to store and retrieve these images, as well as provide a platform for machine learning based on that data. Sample files and code, including the Zeppelin notebook, can be found in this GitHub repository: https://github.com/wcbdata/materials-imaging

PREREQUISITES:
HDF 3.0 or later (NiFi 1.2.0.3+)
HDP 2.6.5 or later (Hadoop 2.6.3+ and Hive 1.2.1+)
Spark 2.1.1.2.6.2.0-205 or later
Zeppelin 0.7.2+

STEPS:

Get the files to a filesystem accessible to NiFi. In this case, we are assuming the source system can get the files to a local directory (e.g., via an NFS mount).

Ingest the image and data files to long-term storage.

Use a ListFile processor to scrape the directory. In this example, the collected data files are in a root directory, with each manufacturing run’s image files placed in a separate subdirectory. We’ll use that location to separate out the files later.

Use a FetchFile to pull the files listed in the flowfiles generated by our ListFile. FetchFile can move all the files to an archive directory once they have been read.

Since we’re using a different path for the images versus the original source data, we’ll split the flow using an UpdateAttribute to store the file type, then route the files to two PutHDFS processors to place them appropriately. Note that PutHDFS_images uses the automatically parsed original ${path} to reproduce the source folder structure.

Parse the data files
to make them available for SQL queries.

Beginning with only the csv flowfiles, an ExtractGrok processor is used to pick one field from the second line of the flowfile (skipping the header row). This field is referenced by expression language that sets the schema name we will use to parse the flowfile.

A RouteOnAttribute processor checks the schema name using a regex to determine whether the flowfile format is one that requires additional processing to parse. In the example, flowfiles identified as the “sem_meta_10083” schema are routed to the processor group “Preprocess-SEM-particle.” This processor group contains the steps for parsing nested arrays within the csv flowfile.

Within the “Preprocess-SEM-particle” processor group, the flowfile is parsed using a temporary schema. A temporary schema can be helpful to parse some sections of a flowfile row (or tuple) while leaving others for later processing.

The flowfile is split into individual records by a SplitRecord processor. SplitRecord is similar to a SplitJson or SplitText processor, but it uses NiFi’s record-oriented parsers to identify each record rather than relying strictly on length or linebreaks.

A JoltTransform uses the powerful JOLT language to parse a section of the csv file with nested arrays. In this case, a semicolon-delimited array of comma-separated values is reformatted to valid JSON, then split out into separate flowfiles by an EvaluateJsonPath processor. This JOLT transform uses an interesting combination of JOLT wildcards and repeated processing of the same path to handle multiple possible formats.

Once formatted by a record-oriented processor such as ConvertRecord or SplitRecord, the flowfile can be reformatted easily as Avro, then inserted into a Hive table using a PutHiveStreaming processor. PutHiveStreaming can be configured to ignore extra fields in the source flowfile or target Hive table, so that many overlapping formats can be written to a table with a superset of columns in Hive. In this example, the 10083-formatted flowfiles are inserted row by row, and the particle and 10021-formatted flowfiles are inserted in bulk.

Create a simple interface to retrieve individual images for review.

The browser-based Zeppelin notebook can natively render images stored in SQL tables or in HDFS. The notebook begins
with some basic queries to view the data loaded from the imaging subsystems.

The first example paragraph uses SQL to pull a specific record from the manufacturing run, then looks for the matching file on HDFS by its timestamp.

The second set of example paragraphs uses an HTML/Angular form to collect the information, then displays the matching image.

The third set of sample paragraphs demonstrates how to obtain the image via Scala for analysis or display.

RELATED POSTS: JOLT transformation quick reference

FUTURE POSTS:
Storing microscopy data in HDF5/USID schema to make it available for analysis using standard libraries
Applying TensorFlow to microscopy data for image analysis
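The nested-array reformatting step described above can be sketched outside NiFi as well. The following Python sketch shows the same transformation the JOLT step performs, turning a semicolon-delimited array of comma-separated values into valid JSON records; the sample field and column names are invented for illustration and are not taken from the actual SEM files:

```python
import json

# Hypothetical sample: one csv field holding a semicolon-delimited array
# of comma-separated particle measurements (names invented).
raw_field = "12.1,0.8,ok;13.4,1.1,ok;9.7,2.3,reject"
columns = ["diameter_um", "eccentricity", "status"]  # assumed column names

def to_json_records(field: str, names: list) -> str:
    """Reformat 'a,b,c;d,e,f' into a JSON array of keyed records."""
    records = []
    for entry in field.split(";"):           # outer array: semicolon-delimited
        values = entry.split(",")            # inner values: comma-separated
        records.append(dict(zip(names, values)))
    return json.dumps(records)

print(to_json_records(raw_field, columns))
# Each resulting record can then be split into its own flowfile,
# as EvaluateJsonPath does in the flow described above.
```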
09-11-2018
01:14 AM
OBJECTIVE: Resolve issues with some lightweight LDAP services, such as the HDP Demo LDAP provider.

OVERVIEW: Some LDAP services do not properly support paging of LDAP query results. To support these LDAP services, paging of results needs to be disabled in the LDAP provider properties for the NiFi Registry service.

SYMPTOM: The NiFi Registry does not respond or times out, and the following error is seen repeatedly in the NiFi Registry log (nifi-registry-app.log):

2018-09-01 01:01:05,189 INFO [main] o.s.l.c.AbstractRequestControlDirContextProcessor No matching response control found - looking for 'class javax.naming.ldap.PagedResultsResponseControl
2018-09-01 01:01:05,274 INFO [main] o.s.l.c.AbstractRequestControlDirContextProcessor No matching response control found - looking for 'class javax.naming.ldap.PagedResultsResponseControl
2018-09-01 01:01:05,359 INFO [main] o.s.l.c.AbstractRequestControlDirContextProcessor No matching response control found - looking for 'class javax.naming.ldap.PagedResultsResponseControl

RESOLUTION: In Ambari, under the Configs tab for NiFi Registry, navigate to the Advanced nifi-registry-authorizers-env section. Edit the "Template for authorizers.xml" value to remove the Page Size property. Removing this property disables paging for LDAP queries in this identity provider. Be aware that for extremely large result sets, this can result in a connection timeout. After saving the change, restart the NiFi Registry service.
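For orientation, the relevant section of the generated authorizers.xml looks roughly like the sketch below. The property values are placeholders and the surrounding connection properties are elided; the key point is the "Page Size" property, which the resolution above removes from the template:

```xml
<userGroupProvider>
    <identifier>ldap-user-group-provider</identifier>
    <class>org.apache.nifi.registry.security.ldap.tenants.LdapUserGroupProvider</class>
    <!-- ...connection and search properties elided... -->
    <!-- Removing the property below disables paging of LDAP results, -->
    <!-- which some lightweight LDAP servers do not support.          -->
    <property name="Page Size">500</property>
</userGroupProvider>
```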
07-17-2018
02:03 PM
Excellent - glad that's working! There is a patch being tested for this issue here, too: https://issues.apache.org/jira/browse/ZEPPELIN-3128