Member since: 05-30-2025
Posts: 10
Kudos Received: 0
Solutions: 0
08-15-2025
07:03 AM
So this means that the default configuration is the correct one? I would assume the default configuration is supposed to ensure these core functionalities work 100%.
08-13-2025
05:15 AM
But isn't this supposed to be configured out of the box? Aren't the aggregated logs using the service principals and built-in authentication?
07-10-2025
01:02 AM
Hi everyone, I have been facing intermittent issues with a DELETE statement on an Iceberg v2 table. Below is the error I am getting, and I don't know what is causing it. Has anyone run into the same issue? Thank you.

[HiveServer2-Background-Pool: Thread-1255]: Vertex failed, vertexName=Reducer 2, vertexId=vertex_1752069399412_0193_1_01, diagnostics=[Task failed, taskId=task_1752069399412_0193_1_01_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Node: xxxxx-compute0.xxxxx.ysv060.a0.cloudera.site/10.XX.XX.XX : Error while running task ( failure ) : attempt_1752069399412_0193_1_01_000000_0:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing vector batch (tag=0) (vectorizedVertexNum 1)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:351)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:280)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:84)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:70)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:70)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:40)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing vector batch (tag=0) (vectorizedVertexNum 1)
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:411)
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:259)
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:318)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:297)
... 16 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing vector batch (tag=0) (vectorizedVertexNum 1)
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:519)
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:402)
... 19 more
Caused by: java.lang.UnsupportedOperationException: writePage with SizeStatistics is not implemented
at org.apache.parquet.column.page.PageWriter.writePage(PageWriter.java:99)
at org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:69)
at org.apache.parquet.column.impl.ColumnWriterBase.writePage(ColumnWriterBase.java:378)
at org.apache.parquet.column.impl.ColumnWriteStoreBase.sizeCheck(ColumnWriteStoreBase.java:242)
at org.apache.parquet.column.impl.ColumnWriteStoreBase.endRecord(ColumnWriteStoreBase.java:227)
at org.apache.parquet.column.impl.ColumnWriteStoreV1.endRecord(ColumnWriteStoreV1.java:29)
at org.apache.iceberg.parquet.ParquetWriter.add(ParquetWriter.java:139)
at org.apache.iceberg.deletes.PositionDeleteWriter.write(PositionDeleteWriter.java:64)
at org.apache.iceberg.deletes.PositionDeleteWriter.write(PositionDeleteWriter.java:35)
at org.apache.iceberg.io.RollingFileWriter.write(RollingFileWriter.java:90)
at org.apache.iceberg.io.ClusteredWriter.write(ClusteredWriter.java:104)
at org.apache.iceberg.mr.hive.writer.HiveIcebergDeleteWriter.write(HiveIcebergDeleteWriter.java:66)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1176)
at org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:111)
at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:968)
at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158)
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:503)
... 20 more
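In case it helps reproduce or narrow this down, here is a minimal diagnostic sketch (my own, not taken from the original post) that issues an equivalent row-level DELETE through Spark SQL instead of HiveServer2/Tez; Iceberg v2 tables support deletes there as well, so it can show whether the failure is specific to the Hive writer path. The table name, predicate, and catalog setup are hypothetical assumptions.

# Hedged diagnostic sketch: run the same kind of row-level delete through Spark SQL.
# Assumes the session is already configured with the cluster's Iceberg catalog and
# runtime jars; table name and predicate are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-delete-check").getOrCreate()

spark.sql("DELETE FROM db.my_iceberg_v2_table WHERE event_date = DATE '2025-07-01'")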
07-09-2025
05:34 AM
I am using Cloudera CDP 7.2.18... Where should the library be installed? I have installed it on the nodes and restarted the Zeppelin service, but I still cannot use NumPy, for example...
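In case it helps narrow this down, below is a small diagnostic sketch (an assumption about the setup, not from the Zeppelin docs) that can be pasted into a %pyspark paragraph: it prints which Python executable the driver and the executors actually use and whether that interpreter can import NumPy. Installing the package on the nodes only helps if it lands in the interpreter these paths point at; `sc` is assumed to be the SparkContext Zeppelin provides.

# Diagnostic sketch for a Zeppelin %pyspark paragraph (assumes Zeppelin exposes `sc`).
# Shows the Python executable used by the driver and by each executor, and whether
# that interpreter can import numpy.
import sys

print("driver python:", sys.executable)

def probe(_):
    import sys
    try:
        import numpy
        return (sys.executable, "numpy " + numpy.__version__)
    except ImportError as err:
        return (sys.executable, "no numpy: " + str(err))

print(sc.parallelize(range(4), 2).map(probe).distinct().collect())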
07-09-2025
04:03 AM
Hi everyone, I am having an issue where the aggregated logs are not being generated/saved for Spark jobs. I get the error below, but I don't know what could be causing it. The user has Ranger permissions for ADLS.

Failed to setup application log directory for application_1751535521169_0953
Failed to acquire a SAS token for get-status on /oplogs/yarn-app-logs/csso_luis.simoes/bucket-logs-ifile/0953/application_1751535521169_0953 due to org.apache.hadoop.security.AccessControlException: org.apache.ranger.raz.intg.RangerRazException: <!doctype html><html lang="en"><head><title>HTTP Status 401 – Unauthorized</title><style type="text/css">body {font-family:Tahoma,Arial,sans-serif;} h1, h2, h3, b {color:white;background-color:#525D76;} h1 {font-size:22px;} h2 {font-size:16px;} h3 {font-size:14px;} p {font-size:12px;} a {color:black;} .line {height:1px;background-color:#525D76;border:none;}</style></head><body><h1>HTTP Status 401 – Unauthorized</h1><hr class="line" /><p><b>Type</b> Status Report</p><p><b>Message</b> Authentication required</p><p><b>Description</b> The request has not been applied to the target resource because it lacks valid authentication credentials for that resource.</p><hr class="line" /><h3>Apache Tomcat/8.5.100</h3></body></html>; HttpStatus: 401
at org.apache.hadoop.fs.azurebfs.services.AbfsClient.appendSASTokenToQuery(AbfsClient.java:1233)
at org.apache.hadoop.fs.azurebfs.services.AbfsClient.appendSASTokenToQuery(AbfsClient.java:1199)
at org.apache.hadoop.fs.azurebfs.services.AbfsClient.getPathStatus(AbfsClient.java:905)
at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.getFileStatus(AzureBlobFileSystemStore.java:1007)
at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.getFileStatus(AzureBlobFileSystem.java:729)
at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.getFileStatus(AzureBlobFileSystem.java:719)
at org.apache.hadoop.yarn.logaggregation.filecontroller.LogAggregationFileController.checkExists(LogAggregationFileController.java:530)
at org.apache.hadoop.yarn.logaggregation.filecontroller.LogAggregationFileController$1.run(LogAggregationFileController.java:479)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
at org.apache.hadoop.yarn.logaggregation.filecontroller.LogAggregationFileController.createAppDir(LogAggregationFileController.java:460)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:273)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:223)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:366)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:69)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:267)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:157)
at java.lang.Thread.run(Thread.java:750)
Any help would be very much appreciated.
07-08-2025
07:21 AM
Hi everyone, I am currently trying to create a PySpark application that requires NumPy. I have tried to install it on every worker and master node, but without success: the notebook always returns an error saying it does not exist. Has anyone done this? I believe this should be fairly easy, but my experience is probably not enough to get there at this point. Thanks
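One commonly used pattern (sketched here under my own assumptions, since the exact submission setup isn't shown) is to pack a Python environment that already contains NumPy, ship it with the job, and point the executor-side Python workers at it. The archive name, HDFS path, and environment name below are hypothetical placeholders.

# Hedged sketch: ship a packed conda environment containing numpy with the job
# (e.g. built beforehand with `conda pack -n pyspark_env -o pyspark_env.tar.gz`)
# and point the Python workers at it. Paths and names are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("numpy-on-executors-sketch")
    # distribute the archive to every executor and unpack it under ./pyspark_env
    .config("spark.yarn.dist.archives", "hdfs:///tmp/envs/pyspark_env.tar.gz#pyspark_env")
    # make the executor-side Python workers use the shipped interpreter
    .config("spark.pyspark.python", "./pyspark_env/bin/python")
    .getOrCreate()
)

import numpy as np  # resolves on the driver only if numpy is also installed there
print(np.__version__)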
06-20-2025
03:26 AM
As an update, this is not a Kafka-related issue. The same situation happens with mappings using Hive, HDFS, or others. If anyone has had a similar situation, please let me know.
05-31-2025
05:31 AM
Hi everyone, I'm currently building my first Informatica mapping, which is designed to read XML documents from a Kafka topic and store them in an HDFS location. Since I'm still new to both Informatica and Cloudera, I'd appreciate your guidance on a few issues I'm facing.

Setup:
- Cloudera version: 7.2.18 (Public Cloud)
- Authentication: I'm using my user keytab and a KDC/FreeIPA certificate. I've also created a jaas_client.conf file that allows Kafka access. This setup works fine within the Informatica Developer tool when using the files on the Informatica server.

Issue 1: I'm struggling to pass these authentication files (keytab, certificate, JAAS config) to the Spark execution context so that Spark can connect to Kafka and HDFS. I manually copied the files to the /tmp directory of the master and worker nodes, but I'm unsure if this is the correct approach.
Question: Is manually copying these files to the Spark nodes the recommended method, or should Informatica handle this automatically when submitting the job?

Issue 2: Occasionally, my job fails with the following error on certain nodes:
Caused by: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via: [TOKEN, KERBEROS]
This seems to indicate an authentication failure, possibly related to the way credentials are being propagated or used.

Any tips, best practices, or clarifications would be greatly appreciated! Thanks in advance for your support.
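For comparison with whatever Informatica generates under the hood, here is a hedged sketch of how a hand-written PySpark job commonly ships these artifacts with the application instead of relying on files pre-copied to /tmp on each node; the file paths, broker address, and topic are hypothetical placeholders, and this is not a statement of how Informatica DEI itself does it.

# Hedged sketch: ship the JAAS config and keytab with the Spark application and
# point the JVMs at them. Paths, broker, and topic are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("kafka-xml-to-hdfs-sketch")
    # distribute the JAAS config and keytab to every executor's working directory
    .config("spark.files", "/path/on/client/jaas_client.conf,/path/on/client/user.keytab")
    # executors read the JAAS file from their working directory (relative path)
    .config("spark.executor.extraJavaOptions",
            "-Djava.security.auth.login.config=jaas_client.conf")
    # the driver reads it from its local path
    .config("spark.driver.extraJavaOptions",
            "-Djava.security.auth.login.config=/path/on/client/jaas_client.conf")
    .getOrCreate()
)

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker-host:9093")
    .option("kafka.security.protocol", "SASL_SSL")
    .option("subscribe", "xml-documents")
    .load()
)

Note that the keyTab path referenced inside jaas_client.conf has to resolve on whichever node reads the file, which is the same path-consistency concern raised in the related follow-up question.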
Labels: Apache Kafka, Apache Spark, Kerberos
05-30-2025
06:07 AM
How is this configured when using a tool like Informatica DEI? Where do we configure the submission of these files, and how do we make sure the JAAS path to the keytab is correct? Thank you