03-04-2024 03:30 AM
@vhp1360 Given the behavior you've observed with different batch sizes and column counts, a memory or resource constraint is the likely cause of the error when dealing with a large number of columns and rows. Here are some potential causes and troubleshooting steps to consider:

1. Memory constraints: Loading a dataset with 200 columns and 20 million rows can require a significant amount of memory, especially if individual columns hold large values. Ensure that the system running IBM DataStage has enough memory allocated for the job.

2. Configuration limits: Check whether any IBM DataStage or Hive connector settings impose limits that the job is exceeding, for example a maximum allowed stack size or buffer size that large datasets push past.

3. Resource utilization: Monitor CPU, memory, and disk I/O on the system running IBM DataStage while the load runs. Sustained high utilization or contention points to a bottleneck that could be triggering the error.

4. Optimization techniques: Tune parameters such as batch size, record count, or buffer size, and experiment with different combinations until the job can handle the larger dataset without errors (see the batch-size probing sketch after this list).

5. Data format issues: Verify that the dataset's format and schema are consistent and compatible with the Hive table schema; type mismatches or inconsistent records can fail during the load (a schema-comparison sketch follows the first example below).
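Since the failure apparently shows up only at larger batch sizes, one practical way to approach step 4 is to search for the largest batch that loads cleanly rather than guessing. The sketch below is illustrative only, not DataStage code: `try_load` is a hypothetical stand-in for whatever actually writes one batch (for example, a wrapper around the Hive connector load), and the binary search narrows down the size at which loading starts to fail.

def find_max_batch_size(rows, try_load, lo=1, hi=20000):
    """Binary-search the largest batch size for which try_load succeeds.

    rows:     list of sample rows (tuples in table column order).
    try_load: callable taking a list of rows; returns True on success,
              returns False or raises on failure. Hypothetical stand-in
              for the real load step.
    """
    hi = min(hi, len(rows))     # cannot test a batch larger than the sample
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        batch = rows[:mid]
        try:
            ok = try_load(batch)
        except Exception:
            ok = False
        if ok:
            best = mid          # this size works; try a larger one
            lo = mid + 1
        else:
            hi = mid - 1        # this size fails; try a smaller one
    return best

Knowing the threshold also tells you something useful: if it shrinks as the column count grows, the limit likely depends on total data per batch (columns x rows), which points back at a buffer or statement-size limit rather than the row count itself.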
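For the schema check in step 5, one option is to compare the expected source schema against what Hive itself reports via DESCRIBE. This is a minimal sketch assuming the pyhive library is available; the host name, table name, and expected column types are placeholders, and on partitioned tables DESCRIBE appends extra partition-info rows you would need to skip.

from pyhive import hive  # assumption: pyhive is installed and HiveServer2 is reachable

# Expected source schema: column name -> Hive type (illustrative values only).
EXPECTED = {
    "id": "bigint",
    "event_ts": "timestamp",
    "payload": "string",
}

def schema_mismatches(host, table, expected):
    """Return (column, expected_type, actual_type) tuples that disagree
    with the Hive table's DESCRIBE output."""
    conn = hive.connect(host=host, port=10000)
    cursor = conn.cursor()
    try:
        cursor.execute(f"DESCRIBE {table}")
        # DESCRIBE returns rows of (col_name, data_type, comment)
        actual = {name: dtype for name, dtype, _comment in cursor.fetchall()}
    finally:
        cursor.close()
        conn.close()
    return [(col, want, actual.get(col))
            for col, want in expected.items()
            if actual.get(col) != want]

# Example usage (placeholder host and table):
# for col, want, got in schema_mismatches("hive-server.example.com",
#                                         "target_table", EXPECTED):
#     print(f"{col}: expected {want}, table has {got}")

A mismatch reported here (or a column present on one side only) is a strong candidate for the load error, and is cheaper to rule out than re-running the full 20-million-row job.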
Regards,
Chethan YM