Cloudera DataFlow (CDF) is an integrated data management platform designed to handle real-time streaming data pipelines. It allows organizations to ingest, process, analyze, and distribute data in motion across various environments such as on-premises, public clouds, or hybrid architectures.

CDF uses Kubernetes as the underlying infrastructure for deploying, managing, and scaling data flow applications.
CDF also uses Apache NiFi as the core engine for data movement, transformation, and management.

When developing or troubleshooting DataFlows, you might need to set a processor's log level to DEBUG.
NiFi uses Logback as its default logging framework for application activity and error messages.

With the examples below, you can set a processor to DEBUG by knowing only the processor name.

The example below assumes that your shell has an environment variable named KUBECONFIG that points to the appropriate kubeconfig file.
However, you can always add the --kubeconfig=/path/to/your/kubeconfig-file flag to each kubectl command instead.
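As a minimal sketch of the two options, the wrapper below runs kubectl with an explicit --kubeconfig flag when a path is supplied and otherwise relies on the KUBECONFIG variable. The function name kc and the variable KUBECONFIG_FILE are hypothetical names used only for this illustration; the path is a placeholder.

```shell
# Hypothetical helper (not part of kubectl): pass --kubeconfig explicitly
# when KUBECONFIG_FILE is set, otherwise let kubectl read $KUBECONFIG.
kc() {
  if [ -n "${KUBECONFIG_FILE:-}" ]; then
    kubectl --kubeconfig="$KUBECONFIG_FILE" "$@"
  else
    kubectl "$@"
  fi
}

# Usage (placeholder path):
#   export KUBECONFIG=/path/to/your/kubeconfig-file
#   kc get namespaces
# or, without the exported variable:
#   KUBECONFIG_FILE=/path/to/your/kubeconfig-file kc get namespaces
```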


You will need to know which namespace your DataFlow runs in:

kubectl get namespaces

*** Deployed DataFlows use the namespace naming convention dfx-<Flow name>-ns
*** If you need to set DEBUG for a DataFlow running in Flow Designer, the namespace naming convention is dfx-ts-<numbers>-ns
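To narrow the namespace list down to DataFlow namespaces, something like the sketch below may help. It assumes only the dfx-...-ns convention described above; the function name filter_dfx_namespaces is made up for this example.

```shell
# Keep only namespace names that follow the dfx-...-ns convention.
# Reads one name per line on stdin and strips the "namespace/" prefix
# that `kubectl get namespaces -o name` prints.
filter_dfx_namespaces() {
  sed 's|^namespace/||' | grep -E '^dfx-.+-ns$'
}

# Usage:
#   kubectl get namespaces -o name | filter_dfx_namespaces
```

Both deployed DataFlows and Flow Designer namespaces match this pattern, so you still need to pick the right one from the output.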
You will also need to know the name of the processor you want to set to DEBUG.
Finally, after you run the command below, your processor will begin logging at DEBUG after about 30 seconds.

The following one-line command sets DEBUG for the processor you choose on the pod named dfx-nifi-0.

*** If your DataFlow autoscales, you can run the same command against the other pods, whose names increment by 1 (dfx-nifi-1, dfx-nifi-2, and so on)
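For an autoscaled DataFlow, a loop over the NiFi pods saves retyping the pod name. The sketch below assumes the pods follow the dfx-nifi-<number> pattern mentioned above and that the container is named nifi, as in the command in this article; the function names are hypothetical.

```shell
# List the NiFi pod names in a namespace, one per line
# (dfx-nifi-0, dfx-nifi-1, ...). $1 is the namespace.
list_nifi_pods() {
  kubectl get pods -n "$1" -o name | sed 's|^pod/||' | grep -E '^dfx-nifi-[0-9]+$'
}

# Run the given command in the nifi container of every NiFi pod.
# $1 is the namespace; the remaining arguments are the command.
for_each_nifi_pod() {
  ns=$1
  shift
  for pod in $(list_nifi_pods "$ns"); do
    kubectl exec -n "$ns" "$pod" -c nifi -- "$@"
  done
}

# Usage:
#   for_each_nifi_pod dfx-logback-test-ns hostname
```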

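Change the namespace, the pod name, and the processor name to match your environment.

In the example below, I am setting the processor queryrecord to DEBUG in the namespace dfx-logback-test-ns. The name is matched case-insensitively against the processor type (class name) stored in flow.json.gz.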

kubectl exec -n dfx-logback-test-ns dfx-nifi-0 -c nifi -- bash -c \
"CLASS=\$(zcat /opt/nifi/nifi-current/data/flow.json.gz | \
jq -r '.. | .processors? // empty | .[].type | select(test(\"queryrecord\"; \"i\"))' | \
sort | uniq); sed -i '/<logger name=\"org.apache.nifi\" \
level=\"INFO\"\/>/a \ <logger name=\"'\$CLASS'\" level=\"DEBUG\"\/>' \
/opt/nifi/nifi-current/conf/logback.xml"
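If you want to see what the sed step does before touching a pod, the sketch below applies the same kind of append to a throwaway file on your own machine. The sample logback.xml content, the class name passed in, and the function name are assumptions for illustration, and it relies on GNU sed (the -i flag and the one-line a command).

```shell
# Local dry run of the sed append on a temporary sample file.
# $1 is the fully qualified processor class name to log at DEBUG.
demo_add_debug_logger() {
  class=$1
  tmp=$(mktemp)
  # Assumed minimal stand-in for NiFi's logback.xml, not the real file.
  cat > "$tmp" <<'EOF'
<configuration scan="true" scanPeriod="30 seconds">
    <logger name="org.apache.nifi" level="INFO"/>
</configuration>
EOF
  # Append a DEBUG logger for the class right after the INFO logger line.
  sed -i '/<logger name="org.apache.nifi" level="INFO"\/>/a <logger name="'"$class"'" level="DEBUG"/>' "$tmp"
  cat "$tmp"
  rm -f "$tmp"
}

# Usage (class name shown only as an example):
#   demo_add_debug_logger org.apache.nifi.processors.standard.QueryRecord
```

The printed file should contain the new DEBUG logger on the line after the org.apache.nifi INFO logger, which is the same change the one-line command makes inside the pod.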

 
