Member since: 07-29-2024
Posts: 5
Kudos Received: 0
Solutions: 0
08-19-2024
07:03 AM
Hi @ckumar, I couldn't find the command. Would it be in the parcels path, as shown in the attached screenshot?
08-12-2024
06:14 AM
In the development cluster, the NiFi nodes do not show data or a connection to Cloudera Manager, but if I access them via the web UI, the nodes appear properly connected and functional. NiFi logs and screenshots are attached as evidence.

Previously I was instructed to review the Ranger policies as described in the Cloudera documentation, but the problem persisted; the current Ranger settings are attached.

I also found the following error curious:

Cookie rejected [hadoop.auth=""u=nifi&p=nifi/v321d186.prevnet@DEVDATALAKE.CDPDTP&t=kerberos&e=1721689709429&s=qJPbU6hDNqLtkL1BokMp...", version:0, domain:v321d175.prevnet, path:/solr/ranger_audits_shard1_replica_n1, expiry:null] Domain attribute "v321d175.prevnet" violates the Netscape cookie specification

However, the same error appears in the prod cluster's logs, and there I don't have this problem.
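As a side note on that cookie message: it usually just means the cookie's domain attribute does not match the host the request actually went to, so the HTTP client discards the cookie. A minimal sketch with Python's standard-library cookie matcher, using the two (hypothetical, taken from the log above) hostnames:

```python
from http.cookiejar import domain_match

# The cookie was issued for domain v321d175.prevnet, but the request
# went to v321d186.prevnet. The domains do not match, so the client
# rejects the cookie -- this is typically harmless log noise, which is
# consistent with the same message appearing in the healthy prod cluster.
print(domain_match("v321d186.prevnet", "v321d175.prevnet"))  # False
```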
Labels:
- Apache NiFi
08-09-2024
11:03 AM
@smruti I've validated it with the user, and here's what happens: the date field arrives as a string containing only the date portion, "yyyy-mm-dd", with no time component, and it was being converted to a timestamp, which adds time-of-day fields; that's where the divergence was. Is there anything in the tool that can standardize this, or do I need to change the data engineering?
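A minimal plain-Python sketch of the diagnosis above (the sample value is illustrative): parsing a date-only string as a timestamp silently adds a midnight time-of-day, which later timezone handling can then move, whereas keeping it as a date leaves nothing to shift:

```python
from datetime import datetime, date

raw = "2024-07-15"  # date-only string, as delivered upstream

# Converted to a timestamp, a 00:00:00 time-of-day is added
as_ts = datetime.strptime(raw, "%Y-%m-%d")
print(as_ts)    # 2024-07-15 00:00:00

# Kept as a plain date, there is no time component to diverge
as_date = date.fromisoformat(raw)
print(as_date)  # 2024-07-15
```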
08-08-2024
08:38 AM
Hello @smruti, here's a screenshot of a query on the source table in string format; I'll also provide a query in ORC format.
08-02-2024
07:49 AM
We are experiencing an issue when converting data from string to timestamp in Hive: the timestamps undergo unexpected hour changes during conversion and processing. We would like to verify whether this could be a bug in the tool or whether additional configuration is required.

Process and steps to reproduce the issue:

1. Loading data into HDFS: we use the hdfs put command to load a positional file into HDFS, storing it in a directory accessible via the external table ex_sa_beneficios_mantidos, where the data is in string format.
2. Creating the external table: we create an external table in Hive to access the data as strings.
3. Converting string to timestamp and writing to an ORC table: when converting the string to timestamp, we notice an alteration in the hour of the dt_afast_trabalho field.
4. Verifying the conversion and data reading: when reading the data back from the ORC table, the hours in the dt_afast_trabalho field differ from what we expected.
5. Counting timestamps by hour: we counted the timestamps grouped by hour to verify the changes.

Spark configuration in Zeppelin: when initializing Hive in Zeppelin, we used the following parameters:

%livy.spark
from pyspark_llap import HiveWarehouseSession
hive = HiveWarehouseSession.session(spark).build()
spark.conf.set("spark.datasource.hive.warehouse.read.mode", "SECURE_ACCESS")
spark.conf.set("spark.datasource.hive.warehouse.load.staging.dir", "hdfs://pcdpclusterdatalakennha:8020/dados/datalake/zona_temporaria/dadosprovisorios/hwc")
spark.conf.set("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation", "true")
spark.conf.set("spark.sql.session.timeZone", "America/Sao_Paulo")

Expected behavior: we expect the hour values of the dt_afast_trabalho field to remain consistent throughout the conversion and processing stages, respecting the America/Sao_Paulo timezone.

Current behavior: we observe that the hour values of the dt_afast_trabalho field are altered during the process of converting from string to timestamp and subsequent processing in Hive and Spark.
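As a hedged illustration of the suspected mechanism (a plain-Python sketch, not Hive's or Spark's actual internals): if a string is parsed as midnight in the session timezone but persisted as a UTC-based epoch value, a reader that renders the epoch as UTC wall-clock time sees the hour shifted by the zone offset:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

SP = ZoneInfo("America/Sao_Paulo")  # UTC-3, no DST since 2019

# Date-only string parsed as midnight in the session timezone
local_ts = datetime(2024, 7, 15, 0, 0, tzinfo=SP)

# Persisted as a UTC-based epoch value (how many engines store timestamps)
epoch = local_ts.timestamp()

# Read back by a component that renders the epoch as UTC wall-clock time:
read_back = datetime.fromtimestamp(epoch, tz=timezone.utc)
print(read_back)  # 2024-07-15 03:00:00+00:00 -- the hour has shifted by 3
```

If this matches what you are seeing, aligning the timezone assumption on both the write and read paths (or keeping date-only data in a DATE column) should remove the divergence.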
Labels:
- Apache Hive
- Apache Spark
- Apache Zeppelin