Member since: 05-30-2025
Posts: 10
Kudos Received: 0
Solutions: 0
08-15-2025
07:03 AM
So this means that the default configuration is the correct one? I would assume the default configuration is supposed to ensure these core functionalities work 100%.
08-13-2025
05:15 AM
But isn't this supposed to be configured out of the box? Aren't the aggregated logs using the service principals and built-in authentication?
07-10-2025
01:02 AM
Hi everyone, I have been facing intermittent issues with a DELETE statement on an Iceberg v2 table. Below is the error I am getting, and I don't know what is causing it. Has anyone run into the same issue? Thank you.

[HiveServer2-Background-Pool: Thread-1255]: Vertex failed, vertexName=Reducer 2, vertexId=vertex_1752069399412_0193_1_01, diagnostics=[Task failed, taskId=task_1752069399412_0193_1_01_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Node: xxxxx-compute0.xxxxx.ysv060.a0.cloudera.site/10.XX.XX.XX : Error while running task ( failure ) : attempt_1752069399412_0193_1_01_000000_0:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing vector batch (tag=0) (vectorizedVertexNum 1)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:351)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:280)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:84)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:70)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:70)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:40)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing vector batch (tag=0) (vectorizedVertexNum 1)
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:411)
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:259)
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:318)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:297)
... 16 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing vector batch (tag=0) (vectorizedVertexNum 1)
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:519)
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:402)
... 19 more
Caused by: java.lang.UnsupportedOperationException: writePage with SizeStatistics is not implemented
at org.apache.parquet.column.page.PageWriter.writePage(PageWriter.java:99)
at org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:69)
at org.apache.parquet.column.impl.ColumnWriterBase.writePage(ColumnWriterBase.java:378)
at org.apache.parquet.column.impl.ColumnWriteStoreBase.sizeCheck(ColumnWriteStoreBase.java:242)
at org.apache.parquet.column.impl.ColumnWriteStoreBase.endRecord(ColumnWriteStoreBase.java:227)
at org.apache.parquet.column.impl.ColumnWriteStoreV1.endRecord(ColumnWriteStoreV1.java:29)
at org.apache.iceberg.parquet.ParquetWriter.add(ParquetWriter.java:139)
at org.apache.iceberg.deletes.PositionDeleteWriter.write(PositionDeleteWriter.java:64)
at org.apache.iceberg.deletes.PositionDeleteWriter.write(PositionDeleteWriter.java:35)
at org.apache.iceberg.io.RollingFileWriter.write(RollingFileWriter.java:90)
at org.apache.iceberg.io.ClusteredWriter.write(ClusteredWriter.java:104)
at org.apache.iceberg.mr.hive.writer.HiveIcebergDeleteWriter.write(HiveIcebergDeleteWriter.java:66)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1176)
at org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:111)
at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:968)
at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158)
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:503)
... 20 more
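In case it helps reproduce or narrow this down, here is a minimal diagnostic sketch (my own, not taken from the original post) that issues an equivalent row-level DELETE through Spark SQL instead of HiveServer2/Tez; Iceberg v2 tables support deletes there as well, so it can show whether the failure is specific to the Hive writer path. The table name, predicate, and catalog setup are hypothetical assumptions.

# Hedged diagnostic sketch: run the same kind of row-level delete through Spark SQL.
# Assumes the session is already configured with the cluster's Iceberg catalog and
# runtime jars; table name and predicate are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-delete-check").getOrCreate()

spark.sql("DELETE FROM db.my_iceberg_v2_table WHERE event_date = DATE '2025-07-01'")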
07-09-2025
05:34 AM
I am using Cloudera CDP 7.2.18... Where should the library be installed? I have installed it on the nodes and restarted the Zeppelin service, but I still cannot use NumPy, for example...
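In case it helps narrow this down, below is a small diagnostic sketch (an assumption about the setup, not from the Zeppelin docs) that can be pasted into a %pyspark paragraph: it prints which Python executable the driver and the executors actually use and whether that interpreter can import NumPy. Installing the package on the nodes only helps if it lands in the interpreter these paths point at; `sc` is assumed to be the SparkContext Zeppelin provides.

# Diagnostic sketch for a Zeppelin %pyspark paragraph (assumes Zeppelin exposes `sc`).
# Shows the Python executable used by the driver and by each executor, and whether
# that interpreter can import numpy.
import sys

print("driver python:", sys.executable)

def probe(_):
    import sys
    try:
        import numpy
        return (sys.executable, "numpy " + numpy.__version__)
    except ImportError as err:
        return (sys.executable, "no numpy: " + str(err))

print(sc.parallelize(range(4), 2).map(probe).distinct().collect())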
07-09-2025
04:03 AM
Hi everyone, I am having an issue where the aggregated logs are not being generated/saved for Spark jobs. I get the error below, but I don't know what could be causing it. The user has Ranger permissions for ADLS.

Failed to setup application log directory for application_1751535521169_0953
Failed to acquire a SAS token for get-status on /oplogs/yarn-app-logs/csso_luis.simoes/bucket-logs-ifile/0953/application_1751535521169_0953 due to org.apache.hadoop.security.AccessControlException: org.apache.ranger.raz.intg.RangerRazException: <!doctype html><html lang="en"><head><title>HTTP Status 401 – Unauthorized</title><style type="text/css">body {font-family:Tahoma,Arial,sans-serif;} h1, h2, h3, b {color:white;background-color:#525D76;} h1 {font-size:22px;} h2 {font-size:16px;} h3 {font-size:14px;} p {font-size:12px;} a {color:black;} .line {height:1px;background-color:#525D76;border:none;}</style></head><body><h1>HTTP Status 401 – Unauthorized</h1><hr class="line" /><p><b>Type</b> Status Report</p><p><b>Message</b> Authentication required</p><p><b>Description</b> The request has not been applied to the target resource because it lacks valid authentication credentials for that resource.</p><hr class="line" /><h3>Apache Tomcat/8.5.100</h3></body></html>; HttpStatus: 401
at org.apache.hadoop.fs.azurebfs.services.AbfsClient.appendSASTokenToQuery(AbfsClient.java:1233)
at org.apache.hadoop.fs.azurebfs.services.AbfsClient.appendSASTokenToQuery(AbfsClient.java:1199)
at org.apache.hadoop.fs.azurebfs.services.AbfsClient.getPathStatus(AbfsClient.java:905)
at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.getFileStatus(AzureBlobFileSystemStore.java:1007)
at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.getFileStatus(AzureBlobFileSystem.java:729)
at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.getFileStatus(AzureBlobFileSystem.java:719)
at org.apache.hadoop.yarn.logaggregation.filecontroller.LogAggregationFileController.checkExists(LogAggregationFileController.java:530)
at org.apache.hadoop.yarn.logaggregation.filecontroller.LogAggregationFileController$1.run(LogAggregationFileController.java:479)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
at org.apache.hadoop.yarn.logaggregation.filecontroller.LogAggregationFileController.createAppDir(LogAggregationFileController.java:460)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:273)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:223)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:366)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:69)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:267)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:157)
at java.lang.Thread.run(Thread.java:750)
Any help would be very much appreciated.
07-08-2025
07:21 AM
Hi everyone, I am currently trying to create a PySpark application that requires NumPy. I have tried to install it on every worker and master node, but without success: the notebook always returns an error saying it does not exist. Has anyone done this? I believe this should be fairly easy, but my experience is probably not enough to get there at this point. Thanks
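One commonly used pattern (sketched here under my own assumptions, since the exact submission setup isn't shown) is to pack a Python environment that already contains NumPy, ship it with the job, and point the executor-side Python workers at it. The archive name, HDFS path, and environment name below are hypothetical placeholders.

# Hedged sketch: ship a packed conda environment containing numpy with the job
# (e.g. built beforehand with `conda pack -n pyspark_env -o pyspark_env.tar.gz`)
# and point the Python workers at it. Paths and names are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("numpy-on-executors-sketch")
    # distribute the archive to every executor and unpack it under ./pyspark_env
    .config("spark.yarn.dist.archives", "hdfs:///tmp/envs/pyspark_env.tar.gz#pyspark_env")
    # make the executor-side Python workers use the shipped interpreter
    .config("spark.pyspark.python", "./pyspark_env/bin/python")
    .getOrCreate()
)

import numpy as np  # resolves on the driver only if numpy is also installed there
print(np.__version__)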
06-20-2025
03:26 AM
As an update, this is not a Kafka-related issue. The same situation happens with mappings using Hive, HDFS, or others. If anyone has had a similar situation, please let me know.
05-31-2025
05:31 AM
Hi everyone, I'm currently building my first Informatica mapping, which is designed to read XML documents from a Kafka topic and store them in an HDFS location. Since I'm still new to both Informatica and Cloudera, I'd appreciate your guidance on a few issues I'm facing.

Setup:
- Cloudera version: 7.2.18 (Public Cloud)
- Authentication: I'm using my user keytab and a KDC/FreeIPA certificate. I've also created a jaas_client.conf file that allows Kafka access. This setup works fine within the Informatica Developer tool when using the files on the Informatica server.

Issue 1: I'm struggling to pass these authentication files (keytab, certificate, JAAS config) to the Spark execution context so that Spark can connect to Kafka and HDFS. I manually copied the files to the /tmp directory of the master and worker nodes, but I'm unsure if this is the correct approach.
Question: Is manually copying these files to the Spark nodes the recommended method, or should Informatica handle this automatically when submitting the job?

Issue 2: Occasionally, my job fails with the following error on certain nodes:
Caused by: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via: [TOKEN, KERBEROS]
This seems to indicate an authentication failure, possibly related to the way credentials are being propagated or used.

Any tips, best practices, or clarifications would be greatly appreciated! Thanks in advance for your support.
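For comparison with whatever Informatica generates under the hood, here is a hedged sketch of how a hand-written PySpark job commonly ships these artifacts with the application instead of relying on files pre-copied to /tmp on each node; the file paths, broker address, and topic are hypothetical placeholders, and this is not a statement of how Informatica DEI itself does it.

# Hedged sketch: ship the JAAS config and keytab with the Spark application and
# point the JVMs at them. Paths, broker, and topic are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("kafka-xml-to-hdfs-sketch")
    # distribute the JAAS config and keytab to every executor's working directory
    .config("spark.files", "/path/on/client/jaas_client.conf,/path/on/client/user.keytab")
    # executors read the JAAS file from their working directory (relative path)
    .config("spark.executor.extraJavaOptions",
            "-Djava.security.auth.login.config=jaas_client.conf")
    # the driver reads it from its local path
    .config("spark.driver.extraJavaOptions",
            "-Djava.security.auth.login.config=/path/on/client/jaas_client.conf")
    .getOrCreate()
)

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker-host:9093")
    .option("kafka.security.protocol", "SASL_SSL")
    .option("subscribe", "xml-documents")
    .load()
)

Note that the keyTab path referenced inside jaas_client.conf has to resolve on whichever node reads the file, which is the same path-consistency concern raised in the related follow-up question.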
Labels: Apache Kafka, Apache Spark, Kerberos
05-30-2025
06:07 AM
How is this configured when using a tool like Informatica DEI? Where do we configure the submission of these files, and how do we make sure the JAAS path to the keytab is correct? Thank you