Member since: 12-14-2020
Posts: 95
Kudos Received: 8
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 8224 | 02-29-2024 09:08 PM
 | 1903 | 03-10-2023 12:55 AM
11-07-2024
10:06 PM
1 Kudo
The JSON_EXTRACT function in the QueryRecord processor may not be interpreting Src_obj__event_metadata as a JSON object. Instead, it likely sees Src_obj__event_metadata as a plain string, so it cannot directly access the "$.timestamp" field. We may need to use an EvaluateJsonPath processor first to extract timestamp from Src_obj__event_metadata into a new field. Use the following configuration in the Properties tab:
• Destination: flowfile-content
• Return Type: json
• JSON Path Expression: add a property named timestamp with the value $.Src_obj__event_metadata.timestamp
Once timestamp has been extracted as a separate column, we can reference it directly in the QueryRecord processor:
SELECT * FROM flowfile ORDER BY timestamp ASC
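For illustration, a minimal sketch of a record where the metadata arrives as a JSON string rather than a nested object (event_id, source, and the sample values are made up; only Src_obj__event_metadata and timestamp come from the question):

```json
{
  "event_id": "e-123",
  "Src_obj__event_metadata": "{\"timestamp\": \"2024-11-01T10:15:30Z\", \"source\": \"app-a\"}"
}
```

Because the whole metadata value is a single quoted string, a record-oriented processor sees one text field instead of an object with a timestamp member, which is why "$.timestamp" is not directly addressable from QueryRecord.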
11-06-2024
10:27 PM
1 Kudo
If the CAST and JSON_PARSE functions are not supported in the NiFi processor you're using, we may try extracting the timestamp value as a string and sorting it alphabetically:
SELECT * FROM flowfile ORDER BY JSON_EXTRACT_SCALAR(Src_obj__event_metadata, "$.timestamp") ASC
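A note on why a plain string sort can still give the right order: with a fixed-width, zero-padded format such as ISO-8601, lexicographic order matches chronological order, e.g. (sample values made up for illustration) "2024-11-01T09:05:00Z" < "2024-11-01T10:15:30Z" < "2024-11-02T08:00:00Z". If the timestamps are epoch numbers of varying length or use mixed formats, string ordering may not match chronological ordering.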
11-06-2024
09:56 PM
1 Kudo
The field Src_obj__event_metadata is a JSON string, so to access fields within it, you might need to parse it into a JSON object first. Some systems may require you to explicitly parse JSON strings before extracting fields. Please try:
SELECT * FROM flowfile ORDER BY CAST(JSON_EXTRACT(Src_obj__event_metadata, "$.timestamp") AS TIMESTAMP) ASC
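If the cast succeeds but the ordering still looks wrong, a small variation of the same query can help with debugging by exposing the parsed value as its own column (a sketch only; event_ts is an illustrative alias and the query assumes the same record schema):

```sql
SELECT *,
       CAST(JSON_EXTRACT(Src_obj__event_metadata, '$.timestamp') AS TIMESTAMP) AS event_ts
FROM flowfile
ORDER BY event_ts ASC
```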
07-29-2024
10:38 PM
1 Kudo
Hi, please see if you can access:
http(s)://<CDSW_DOMAIN>/<USER>/<PROJECT>/settings/delete
If you can, use the delete button there to delete the project. If you cannot access the page, then we can only remove it from the backend psql db:
# kubectl exec -it $(kubectl get pods -l role=db -o jsonpath='{.items[*].metadata.name}') -- psql -P pager=off -U sense
Find the project table and remove the related entry.
Note: please be very careful when operating on the backend db. An incorrect operation may cause irreversible loss.
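A sketch of what the lookup inside psql might look like, assuming the project metadata lives in a table named projects with id and name columns (these names are assumptions; verify the real schema with \dt and \d before changing anything):

```sql
-- Locate the row for the project first and confirm it is the right one.
SELECT id, name FROM projects WHERE name = '<PROJECT>';

-- Only after double-checking the id, remove the entry.
DELETE FROM projects WHERE id = <PROJECT_ID>;
```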
02-29-2024
09:08 PM
1 Kudo
The Kafka broker address is specified as "localhost". Is the broker running on the same host as the producer? I guess not.
prop.setProperty("bootstrap.servers", "localhost:9092");
Please use the IP address or hostname of the host where your Kafka broker is running and try again.
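A minimal sketch of the producer setup with the broker address corrected (broker-host.example.com:9092 and test-topic are placeholders; substitute your actual broker address, port, and topic):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerCheck {
    public static void main(String[] args) {
        Properties prop = new Properties();
        // Point bootstrap.servers at the host where the broker actually runs,
        // not localhost (unless the producer is on the same machine).
        prop.setProperty("bootstrap.servers", "broker-host.example.com:9092");
        prop.setProperty("key.serializer", StringSerializer.class.getName());
        prop.setProperty("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(prop)) {
            producer.send(new ProducerRecord<>("test-topic", "key", "value"));
            producer.flush();
        }
    }
}
```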
02-28-2024
11:24 PM
1 Kudo
Hello @Andy1989, "socket connection setup timeout" sounds like a network issue on the client side. May I know how you specify the Kafka broker address and port in your code? Is the address resolvable from the client side, and is the Kafka broker's port number correct?
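Both can be checked quickly from the client host (broker-host.example.com and 9092 are placeholders for your actual broker address and port):

```shell
# Can the client resolve the broker hostname?
nslookup broker-host.example.com

# Can the client open a TCP connection to the broker port?
nc -vz broker-host.example.com 9092
```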
03-10-2023
12:55 AM
1 Kudo
Hi @Hyeonseo, if we print the package locations from predict.py and from the command line respectively:
python -c 'import site; print(site.getsitepackages())'
do you find that the results are different? If we add all of the package locations, is there still such an issue?
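If the locations do differ, a minimal sketch of pulling the missing location onto the search path inside predict.py (the directory below is a placeholder; use the site-packages path printed by the environment where the package imports correctly):

```python
import site
import sys

# Show where this interpreter currently looks for installed packages.
print(site.getsitepackages())

# Placeholder: the site-packages directory reported by the working environment.
missing_location = "/path/to/other/site-packages"
if missing_location not in sys.path:
    sys.path.append(missing_location)
```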
01-30-2023
01:21 AM
May I know what values you have set for the properties below?
yarn.nodemanager.local-dirs
yarn.nodemanager.log-dirs
Also, please make sure you don't have the noexec or nosuid flags set on the corresponding disks. You can check this with the "mount" command.
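For example, if yarn.nodemanager.local-dirs points somewhere under /data1 (a placeholder path), the mount options of the backing filesystem can be checked with:

```shell
# Look for noexec or nosuid in the option list of the matching mount.
mount | grep /data1
```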
06-17-2022
12:33 AM
1 Kudo
Hi, this is most likely due to long-running jobs such as Spark Streaming, which continuously generate logs while they run. We need to adjust logging on the application side. Taking Spark Streaming as an example, we can add a rolling appender to the application's log4j.properties, so the job rotates its logs at the size limit you set in the log4j file. For detailed steps please refer to:
https://my.cloudera.com/knowledge/Long-running-Spark-streaming-applications-s-YARN-container?id=90615
https://my.cloudera.com/knowledge/Video-KB-How-to-configure-log4j-for-Spark-on-YARN-cluster?id=271201
Other types of jobs are handled similarly: the application team needs to tune the logging configuration so the jobs do not generate an unbounded amount of logs.
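A sketch of such a rolling appender in log4j.properties for a Spark on YARN application (the file size, backup count, and file name are illustrative; the KB articles above describe how to ship this file with the job):

```properties
# Route application logging through a size-based rolling appender.
log4j.rootLogger=INFO, rolling

log4j.appender.rolling=org.apache.log4j.RollingFileAppender
log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
log4j.appender.rolling.layout.conversionPattern=[%d] %p %m (%c)%n

# Keep at most 5 files of 50 MB each per container.
log4j.appender.rolling.maxFileSize=50MB
log4j.appender.rolling.maxBackupIndex=5

# Write into the YARN container log directory so the logs stay with the container.
log4j.appender.rolling.file=${spark.yarn.app.container.log.dir}/spark.log
```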
04-22-2022
02:55 AM
Hi @reca, may I know whether you have specified the Kerberos principal and keytab in your Flume conf:
https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/cm_sg_use_subs_vars_s11.html
If you have many long-running jobs, we would recommend increasing the default HDFS delegation token max lifetime and renew interval. Add the following properties to "HDFS Service Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml":
dfs.namenode.delegation.token.max-lifetime: default 604800000 (7 days) -> increase to 30 days
dfs.namenode.delegation.token.renew-interval: default 86400000 (1 day) -> increase to 30 days
You can set max-lifetime to as much as 1 year; the renew interval just needs to be equal to or smaller than max-lifetime.
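A sketch of the corresponding safety valve entries, assuming 30 days for both values (30 days = 2592000000 ms):

```xml
<!-- Maximum lifetime of an HDFS delegation token: raised from 7 days to 30 days. -->
<property>
  <name>dfs.namenode.delegation.token.max-lifetime</name>
  <value>2592000000</value>
</property>
<!-- How often the token must be renewed: raised from 1 day to 30 days;
     must stay less than or equal to the max lifetime. -->
<property>
  <name>dfs.namenode.delegation.token.renew-interval</name>
  <value>2592000000</value>
</property>
```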