Hello everyone, first of all I apologise for my English.
I'm facing a big problem between IBM DataStage and Hortonworks. Let me first explain IBM DataStage: it's an ETL tool that provides several connector types for importing and exporting data from a number of data source types.
I'm trying to load data from IBM DataStage 11.7 into Hive using the Hive connector, but I'm encountering some strange behavior:
There are a couple of configuration properties for the Hive connector, the most important of which are, as I suspected:
- Record count=2000
- Batch size=2000
For a dataset with 8 columns and almost 1,000 rows, the data is inserted into Hive without any problem.
For a dataset with 200 columns and 20 million rows, it behaves strangely:
If I select only 10 of the columns, the job works.
With more than 10 columns, the job fails at a multiple of the Batch size property - I mean after 2000, 4000 or 20000 rows - with 'IIS-CONN-DAAPI-00099 Hive_Connector_7,0: java.lang.StringIndexOutOfBoundsException: String index out of bounds: 0 at java.lang.String.substring(String.java:2667)'.
I'm sure this error isn't really about a String, because with 'Batch size=2000' the job loads almost 2,000 rows into the Hive table, and if I increase the value to 4000 it loads almost 4,000 records.
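To make the pattern I'm describing concrete: a connector that buffers rows and flushes them in batches will hit its flush code exactly at multiples of the batch size, which matches where my job dies. The sketch below is a hypothetical illustration of that buffering logic only (the class name and method are mine, not DataStage's or the Hive connector's actual internals):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of how a connector might flush buffered rows in batches.
// It only illustrates why an error raised during the flush step (e.g. a
// substring() call on an empty string while building the batch statement)
// would surface exactly at multiples of the "Batch size" property.
public class BatchSketch {

    // Returns the row counts at which a flush (e.g. executeBatch/commit) occurs.
    public static List<Long> flushPoints(long totalRows, int batchSize) {
        List<Long> points = new ArrayList<>();
        long buffered = 0;
        for (long row = 1; row <= totalRows; row++) {
            buffered++;
            if (buffered == batchSize) {
                points.add(row);   // flush happens here: a bad value hit during
                buffered = 0;      // the flush throws at row 2000, 4000, ...
            }
        }
        if (buffered > 0) {
            points.add(totalRows); // final partial flush for leftover rows
        }
        return points;
    }

    public static void main(String[] args) {
        // With Batch size=2000, flushes happen at rows 2000, 4000, then the rest.
        System.out.println(flushPoints(5000, 2000)); // prints [2000, 4000, 5000]
    }
}
```

That would explain why the job always loads "almost" a whole number of batches before failing: every row up to the last successful flush is already in the table.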
Does anyone know the reason for this error?
Thanks a lot