Created 07-13-2017 06:12 PM
Hi,
I have a Hive table in HDFS with a single partition stored in S3.
I can select from and read the table, but the ANALYZE command errors out after around 110 files.
There are close to 300 files, each holding around 50k records. When I decrease the file size, ANALYZE TABLE ... PARTITION works fine. Likewise, if I keep all the data in HDFS, the ANALYZE command works fine, which makes me believe this is an S3 bug. The table is stored as ORC with Snappy compression.
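For reference, the statement I run is along these lines (a sketch; the partition spec is copied from the log excerpt below):

-- sketch of the failing statement; partition key/value taken from the log below
ANALYZE TABLE testraj.test_table
  PARTITION (filename='fy16_p08_may16_01_test_detail.csv.gz')
  COMPUTE STATISTICS;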
Below is the error:

2017-07-07 00:02:11,969 INFO [StatsNoJobTask-Thread-0]: orc.ReaderImpl (ReaderImpl.java:rowsOptions(480)) - Reading ORC rows from s3a://my.hadoop.development/apps/hive/warehouse/testraj.db/test_table/filename=fy16_p08_may16_01_test_detail.csv.gz/000183_0 with {include: null, offset: 0, length: 0}
2017-07-07 00:02:11,969 INFO [StatsNoJobTask-Thread-0]: orc.RecordReaderImpl (RecordReaderImpl.java:<init>(161)) - Reader schema not provided -- using file schema struct<_col0:string,_col1:string, ... ,_col285:string> (286 string columns; full listing trimmed)
2017-07-07 00:02:12,496 INFO [StatsNoJobTask-Thread-0]: exec.Task (SessionState.java:printInfo(980)) - [Warning] could not update stats for testraj.test_table{filename=fy16_p08_may16_01_test_detail.csv.gz}. Failed with exception com.amazonaws.AbortedException:
    at com.amazonaws.internal.SdkFilterInputStream.abortIfNeeded(SdkFilterInputStream.java:51)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:71)
    at org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:373)
    at org.apache.hadoop.fs.s3a.S3AInputStream.readFully(S3AInputStream.java:579)
    at org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:111)
    at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractFileTail(ReaderImpl.java:378)
    at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:321)
    at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:241)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1329)
    at org.apache.hadoop.hive.ql.exec.StatsNoJobTask$StatsCollection.run(StatsNoJobTask.java:156)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Appreciate your response.
Thanks & regards,
Rajdeep
Created 07-13-2017 09:45 PM
Can you share the logs? An AbortedException means the thread was interrupted.
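If the stats-gathering thread pool is tearing down S3 streams mid-read, one thing worth trying (an assumption, not a confirmed fix) is reducing the number of concurrent stats threads and raising the S3A connection pool, for example:

-- fewer concurrent ORC footer reads during ANALYZE (default is 10)
set hive.stats.gather.num.threads=1;
-- larger S3A connection pool; exhaustion can abort in-flight streams (default is 15)
set fs.s3a.connection.maximum=100;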
Created 07-14-2017 05:41 PM
@Rajesh Balamohan this error is excerpted from the logs.