insert data into Hive from IBM DataStage


Hello everyone, first of all I apologise for my English.
I'm facing a big problem between IBM DataStage and HortonWorks. Let me first explain IBM DataStage: it's an ETL tool that has several connector types for importing and exporting data from a number of data source types.
I'm trying to load data from IBM DataStage 11.7 into Hive using the Hive connector, but I'm encountering some strange behavior:

The Hive connector has a couple of configuration properties. The most important ones, I suspect, are:
- Record count = 2000
- Batch size = 2000

For a dataset with 8 columns and almost 1000 rows, the data is inserted into Hive without problems.

For a dataset with 200 columns and 20 million rows, it behaves strangely:

With 10 columns, it works.

With more than 10 columns, the job fails for every value of the size properties I tried (2000, 4000 or 20000 rows) with 'IIS-CONN-DAAPI-00099 Hive_Connector_7,0: java.lang.StringIndexOutOfBoundsException: String index out of bounds: 0 at java.lang.String.substring(String.java:2667)'

I'm sure this error isn't related to String, because with 'Batch size=2000' the job loads almost 2000 rows into the Hive table before failing, and if I increase the value to 4000 it loads almost 4000 rows.

Does anyone know the reason for this error?

Thanks a lot
