Member since
02-11-2014
162
Posts
2
Kudos Received
4
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3012 | 12-04-2015 12:46 PM
 | 3482 | 02-12-2015 01:06 PM
 | 2869 | 03-20-2014 12:41 PM
 | 4649 | 03-19-2014 08:54 AM
02-01-2019
02:39 PM
@Harsh J How would I do this for just one job? I tried the settings below but they are not working. The issue is that I want to use a version of Jersey that I bundled into my fat jar; however, the gateway node has an older version of that jar and a class gets loaded from there, resulting in a NoSuchMethodException. My application is not a MapReduce job; I run it with hadoop jar on 5.14.4.
export HADOOP_USER_CLASSPATH_FIRST=true
export HADOOP_CLASSPATH=/projects/poc/test/config:$HADOOP_CLASSPATH
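For what it's worth, a quick way to confirm which copy of a conflicting class actually wins at runtime is to print its code source. A minimal sketch; the Jersey class name here is only an example, substitute the class whose method is missing:

public class WhichJar {
    public static void main(String[] args) throws ClassNotFoundException {
        // Example Jersey 1.x class; replace with the class that throws in your job.
        Class<?> c = Class.forName("com.sun.jersey.api.client.Client");
        // Prints the jar (or directory) the class was loaded from, which shows
        // whether the fat jar's copy or the gateway node's copy was picked up.
        System.out.println(c.getProtectionDomain().getCodeSource().getLocation());
    }
}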
11-20-2017
01:48 PM
Hello, We have a use case for ingesting binary files from a mainframe to HDFS in Avro format. These binary files contain different record types that are variable in length; the first 4 bytes denote the length of the record. I have written a standalone Java program to ingest the data to HDFS using Avro DataFileWriter. Now, these files from the mainframe are much smaller in size (under a block size) and create small files. Some of the options we came up with to avoid this are: 1. Convert the batch process into more of a service that runs behind the scenes, so the Avro DataFileWriter can keep running and flush the data at certain intervals (time/size). I do not see a default implementation for this right now. 2. Write the data into an HDFS tmp location, merge the files every hour or so, and move them to the final HDFS destination. We can afford a latency of an hour before data is made available to consumers. 3. Make use of the Avro append functionality. Appreciate your help!
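If option 3 is pursued, here is a minimal sketch of appending to an existing Avro container file on HDFS, assuming Avro 1.7.x's DataFileWriter.appendTo and a cluster with HDFS append enabled (class and variable names are illustrative):

import java.io.IOException;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.mapred.FsInput;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AvroHdfsAppend {
    // Appends records to an Avro container file that already exists on HDFS.
    public static void appendRecords(Configuration conf, Path file,
                                     Iterable<GenericRecord> records) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        DataFileWriter<GenericRecord> writer =
                new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>());
        // appendTo reads the schema and sync marker from the existing file and
        // writes new blocks to the appended output stream.
        writer.appendTo(new FsInput(file, conf), fs.append(file));
        try {
            for (GenericRecord r : records) {
                writer.append(r);
            }
        } finally {
            writer.close(); // flushes the final block
        }
    }
}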
09-12-2017
10:37 AM
If your search is based on a subset of the data that is present in the HBase table, you could create a Solr schema and index the records into Solr as they are inserted into HBase. I used the Lily indexer/Morphlines to do that a couple of years back. You could index the row key in this schema as a hidden value, so if a user needs detailed search information you can use that row key to query HBase. If the search information required is minimal, it can be served from Cloudera Search itself.
09-12-2017
10:33 AM
Hello, I have a Java batch process that reads binary files from a legacy platform, converts them to Avro, and writes to HDFS. The files have variable-length records and I do not want to land them directly on HDFS, so they currently land on an edge node. The batch process runs every hour and creates files that are 10 to 15 MB in size. Since this is not ideal for the NameNode, we run a merge process every 24 hours to merge these files. This Hadoop cluster is not used for analytics, and all the canned reports via Hive queries access only data from the last 24 hours (hence we are able to run the merge job after 24 hours). Now that we plan to bring in more data from legacy systems, we need to run the merge process more often (every couple of hours or so instead of every 24 hours). Other processes, such as Hive queries, could be running during that time and accessing the files that are candidates for the merge, in which case the merge process should wait for them to complete (since there is a mv involved). Is there a technical solution for how this could be accomplished? Thanks, Nishanth
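One pattern that narrows the window, though it does not by itself coordinate with readers that are already in flight, is to build the merged file in a staging directory and publish it with an HDFS rename, which is atomic per file, deleting the small files only after the rename succeeds. A rough sketch with illustrative names:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MergePublish {
    // Moves a merged file from staging into the final directory, then removes
    // the small source files. New readers see either the old small files or
    // the merged file, never a half-written one.
    public static void publish(Configuration conf, Path mergedInStaging,
                               Path finalDir, Iterable<Path> smallFiles) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path target = new Path(finalDir, mergedInStaging.getName());
        if (!fs.rename(mergedInStaging, target)) {
            throw new IOException("rename failed: " + mergedInStaging + " -> " + target);
        }
        for (Path small : smallFiles) {
            fs.delete(small, false);
        }
    }
}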
09-12-2017
10:22 AM
What is the error you are getting? Are you trying to connect to HBase?
06-02-2017
09:10 AM
Hello Hive users, We are looking at migrating files (less than 5 MB of data in total) with variable record lengths from a mainframe system to Hive. You could think of this as metadata. Each record can have anywhere from 3 to n columns (that is, each record type has a different number of columns). What would be the best strategy to migrate this to Hive? I was thinking of converting these files into one variable-length CSV file and then importing it into a Hive table. The Hive table would consist of 4 columns, with the 4th column holding a comma-separated list of the values from columns 4 to n. Are there alternative or better approaches? Appreciate any feedback on this. Thanks, Nishanth
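For what it's worth, a tiny sketch of that flattening step; the column positions follow the 4-column layout described above, and using Hive's default '\001' field delimiter is just an illustrative choice so that the embedded commas in column 4 survive:

import java.util.Arrays;
import java.util.List;

public class RecordFlattener {
    // First three fields stay as their own columns; everything from field 4
    // onward is packed into one comma-separated value.
    public static String toRow(List<String> fields) {
        String c1 = fields.size() > 0 ? fields.get(0) : "";
        String c2 = fields.size() > 1 ? fields.get(1) : "";
        String c3 = fields.size() > 2 ? fields.get(2) : "";
        String rest = fields.size() > 3
                ? String.join(",", fields.subList(3, fields.size()))
                : "";
        return String.join("\u0001", Arrays.asList(c1, c2, c3, rest));
    }
}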
10-26-2016
01:38 PM
Hello, We have a use case for ingesting binary files from a mainframe to HDFS in Avro format. These binary files contain different record types that are variable in length; the first 4 bytes denote the length of the record. I have written a standalone Java program to ingest the data to HDFS using Avro DataFileWriter. Now, these files from the mainframe are much smaller in size (under a block size), and I have been creating one output Avro file per input file. What is the best option in this case from a performance and maintainability standpoint? Can we have just one file per record type and append to that file? That file would become very large in the future. Another option would be to have one Avro file per day. Please let me know which one would be best suited. Thanks, Nishanth
08-19-2016
11:00 AM
Hello, I have a requirement to decrypt a few columns in a Hive table. The key is managed in its own appliance and cannot be brought to the Hadoop cluster for security reasons. The UDFs that we wrote did not give much performance. We would want to pass multiple row values in the same decryption request. Should I look at something like a generic UDAF for this? Thanks
05-17-2016
10:58 AM
Hello, I have a data processing flow in which we need to remove duplicates from a dataset. There are 3 phases where we remove duplicates. The first two fit quite well in Hive. In the last phase we have to filter out certain records based on some procedural code. What we need is functionality that takes multiple rows as input and returns multiple rows as output; the function would do the duplicate checking and return the ids of those records. I have looked at generic UDTFs but am not sure whether that is the right approach. Any pointers would be helpful. Thanks, Nishan
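In case it helps to see the shape of it, here is a bare-bones GenericUDTF skeleton (Hive 1.x-era API). The duplicate check itself is a placeholder, and since a UDTF is invoked once per input row, multi-row input would still need to be grouped first, for example with collect_list:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class DuplicateIdsUDTF extends GenericUDTF {

    @Override
    public StructObjectInspector initialize(ObjectInspector[] args)
            throws UDFArgumentException {
        // One output column: the id of a record flagged as a duplicate.
        List<String> names = Arrays.asList("dup_id");
        List<ObjectInspector> ois = Arrays.<ObjectInspector>asList(
                PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        return ObjectInspectorFactory.getStandardStructObjectInspector(names, ois);
    }

    @Override
    public void process(Object[] row) throws HiveException {
        // Placeholder: run the procedural duplicate check over the (grouped)
        // input and forward one output row per duplicate id found.
        for (String dupId : findDuplicateIds(row)) {
            forward(new Object[] { dupId });
        }
    }

    private List<String> findDuplicateIds(Object[] row) {
        return new ArrayList<String>(); // hypothetical; real logic goes here
    }

    @Override
    public void close() throws HiveException {
        // nothing buffered in this skeleton
    }
}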
03-07-2016
04:33 PM
Thanks Harsh. Yes, my key was void. So I changed the Avro output to use AvroKeyOutputFormat (the earlier value is now the key) instead of AvroKeyValueOutputFormat, and it worked. Thanks, Nishanth
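For anyone hitting the same thing, a minimal sketch of that driver change using the newer org.apache.avro.mapreduce API (the schema argument stands in for however the record schema is obtained):

import org.apache.avro.Schema;
import org.apache.avro.mapreduce.AvroJob;
import org.apache.avro.mapreduce.AvroKeyOutputFormat;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;

public class AvroKeyJobSetup {
    // The record itself becomes the key, so the container file carries only
    // the record schema instead of the KeyValuePair wrapper.
    public static void configure(Job job, Schema recordSchema) {
        AvroJob.setOutputKeySchema(job, recordSchema);
        job.setOutputFormatClass(AvroKeyOutputFormat.class);
        job.setOutputValueClass(NullWritable.class);
        // In the mapper/reducer: context.write(new AvroKey<GenericRecord>(record), NullWritable.get());
    }
}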
02-12-2016
11:22 AM
Hello, I am trying to load data from an Avro-backed Hive table in a Pig script using the command below:
A = LOAD 'dev.avro_test' USING org.apache.hive.hcatalog.pig.HCatLoader();
We are running into the error below. Requesting some direction. We are using CDH 5.5.1.
2016-02-12 19:21:26,515 [main] ERROR org.apache.pig.PigServer - exception during parsing: Error during parsing. Type void not present
Failed to parse: Type void not present
    at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:198)
    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1688)
    at org.apache.pig.PigServer$Graph.access$000(PigServer.java:1421)
    at org.apache.pig.PigServer.parseAndBuild(PigServer.java:354)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:379)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:365)
    at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:140)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:769)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
    at org.apache.pig.Main.run(Main.java:484)
    at org.apache.pig.Main.main(Main.java:158)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.TypeNotPresentException: Type void not present
    at org.apache.hive.hcatalog.data.schema.HCatFieldSchema$Type.getPrimitiveHType(HCatFieldSchema.java:92)
    at org.apache.hive.hcatalog.data.schema.HCatFieldSchema.<init>(HCatFieldSchema.java:226)
    at org.apache.hive.hcatalog.data.schema.HCatSchemaUtils.getHCatFieldSchema(HCatSchemaUtils.java:122)
    at org.apache.hive.hcatalog.data.schema.HCatSchemaUtils.getHCatFieldSchema(HCatSchemaUtils.java:115)
    at org.apache.hive.hcatalog.common.HCatUtil.getHCatFieldSchemaList(HCatUtil.java:151)
    at org.apache.hive.hcatalog.common.HCatUtil.getTableSchemaWithPtnCols(HCatUtil.java:184)
    at org.apache.hive.hcatalog.pig.HCatLoader.getSchema(HCatLoader.java:216)
    at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:175)
    at org.apache.pig.newplan.logical.relational.LOLoad.<init>(LOLoad.java:89)
    at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:853)
    at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3568)
    at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1625)
    at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:1102)
    at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:560)
    at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:421)
    at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:188)
    ... 19 more
2016-02-12 19:21:26,518 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Type void not present
02-09-2016
12:09 PM
If you want container files rather than key/value pairs as the map reduce output, you should use https://avro.apache.org/docs/1.7.6/api/java/org/apache/avro/mapred/AvroOutputFormat.html rather than AvroKeyValueOutputFormat.
02-09-2016
11:45 AM
I have used the combination in cases where the data model was complex and changing over time. It is pretty easy to create an Avro schema and the Java bindings. There are cases where Avro is a better fit than Parquet. If you are not sure, it may be worthwhile to start with Avro, do a performance analysis, and then switch to Parquet later, which is very easy. Nishan
02-02-2016
10:27 AM
Hello, My map reduce program currently outputs Avro data in the following format. Is there a way I can avoid the key and the wrapper literals in the output?
{
  "type" : "record",
  "name" : "KeyValuePair",
  "namespace" : "org.apache.avro.mapreduce",
  "doc" : "A key/value pair",
  "fields" : [
    { "name" : "key", "type" : "null", "doc" : "The key" },
    { "name" : "value", "type" : {
        "type" : "record",
        "name" : "RecordType",
        "namespace" : "model",
        "fields" : [ { ]} }
01-29-2016
11:37 AM
Abhinay, Can you try placing the jars as described in this post? http://blog.cloudera.com/blog/2014/05/how-to-use-the-sharelib-in-apache-oozie-cdh-5/ Thanks, Nishan
01-29-2016
11:33 AM
Hey Alex, I am encountering this even in Hive (CDH 5.5) when running a DDL that used to work with CDH 5.3.2 (it was failing silently in the metastore). I am guessing I will have to restructure my schema to avoid this (keep the aggregated column names in the struct under 4000 bytes). The problem is that much of my Java code that operates on this schema has already been written. What would be the best way to reorganize the schema? I would really appreciate some pointers. Thanks, Nishanth
12-30-2015
10:37 AM
Thanks. Please update this thread if there is work in this direction; I could then upgrade my cluster.
12-30-2015
08:15 AM
Hey Alex, If I try to create the table using a DDL with explicit column names in Hive it fails, but if I use the Avro schema to define the table it does not. I guess the only way for me now is to flatten out the schema. I was hopeful that Impala would be able to handle structs if the nesting is less than 100 columns; it looks like there is a limit on the length of the column names in the struct as well. Thanks for your help!
12-22-2015
02:27 PM
I created a Hive table pointing to an Avro schema, and that Avro schema has this particular nesting that comes to more than 4000 characters. It did not cause any issues for me when running queries against that Hive table. Since Impala supports complex data types only with Parquet, I went ahead and created a Parquet file with data in it, tried creating the table in Impala, and ran into this issue. Let me know if you have more questions. Nishan
12-22-2015
11:18 AM
Hello, I am trying to create a table in Impala (CDH 5.5) but it fails with the error message below. The Parquet file has many struct data types whose definitions are more than 4000 characters long. I did not face this issue when using Hive, but it surprisingly came up for Impala. I would appreciate any solution.
[quickstart.cloudera:21000] > CREATE TABLE columns_from LIKE PARQUET '/projects/dps/test/parquet/test.parquet' STORED AS PARQUET;
Query: create TABLE columns_from LIKE PARQUET '/projects/dps/test/parquet/test.parquet' STORED AS PARQUET
ERROR: AnalysisException: Type of column 'segment1' exceeds maximum type length of 4000 characters:
12-04-2015
12:46 PM
I tried using AvroParquetOutputFormat and the MultipleOutputs class and was able to generate Parquet files for one specific schema type. For the other schema type I am running into the error below. Any help is appreciated.
java.lang.ArrayIndexOutOfBoundsException: 2820
    at org.apache.parquet.io.api.Binary.hashCode(Binary.java:489)
    at org.apache.parquet.io.api.Binary.access$100(Binary.java:34)
    at org.apache.parquet.io.api.Binary$ByteBufferBackedBinary.hashCode(Binary.java:382)
    at org.apache.parquet.it.unimi.dsi.fastutil.objects.Object2IntLinkedOpenHashMap.getInt(Object2IntLinkedOpenHashMap.java:587)
    at org.apache.parquet.column.values.dictionary.DictionaryValuesWriter$PlainBinaryDictionaryValuesWriter.writeBytes(DictionaryValuesWriter.java:235)
    at org.apache.parquet.column.values.fallback.FallbackValuesWriter.writeBytes(FallbackValuesWriter.java:162)
    at org.apache.parquet.column.impl.ColumnWriterV1.write(ColumnWriterV1.java:203)
    at org.apache.parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.addBinary(MessageColumnIO.java:347)
    at org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:257)
    at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:167)
    at org.apache.parquet.avro.AvroWriteSupport.writeRecord(AvroWriteSupport.java:149)
    at org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:262)
    at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:167)
    at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:142)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:121)
    at org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:123)
    at org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:42)
    at org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat$LazyRecordWriter.write(LazyOutputFormat.java:115)
    at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.write(MultipleOutputs.java:457)
    at com.visa.dps.mapreduce.logger.LoggerMapper.map(LoggerMapper.java:271)
12-04-2015
10:16 AM
Hello All, We have a Java map reduce application that reads binary files, does some data processing, and converts them to Avro data. Currently we have two Avro schemas and use the AvroMultipleOutputs class to write to multiple locations based on the schema. After some research we found that it would be beneficial if we could store the data as Parquet. What is the best way to do this? Should I change the native map reduce job to convert from Avro to Parquet, or is there some other utility I can use? Thanks, Nishan
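One approach that keeps the existing MapReduce job largely intact is to swap the output format for parquet-avro's AvroParquetOutputFormat, so the tasks keep emitting the same Avro records and the conversion happens at write time. A rough sketch; the package prefix (parquet.avro vs org.apache.parquet.avro) depends on the Parquet version bundled with the CDH release:

import org.apache.avro.Schema;
import org.apache.hadoop.mapreduce.Job;
import org.apache.parquet.avro.AvroParquetOutputFormat;

public class AvroToParquetDriver {
    // The Avro schema drives the Parquet schema; tasks write (null, record) pairs.
    public static void configure(Job job, Schema avroSchema) {
        job.setOutputFormatClass(AvroParquetOutputFormat.class);
        AvroParquetOutputFormat.setSchema(job, avroSchema);
        // In the task: context.write(null, genericRecord);  // key type is Void
    }
}

For the two-schema case this would be combined with MultipleOutputs, along the lines described in the 12-04-2015 reply.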
11-23-2015
09:24 PM
Can you get the trace from that particular Solr replica's logs? If your tlogs are large, it could be replaying them, but then it would normally be in a recovery state. Are your tlogs corrupt?
11-23-2015
02:49 PM
The issue in my case was that I was not closing the AvroMultipleOutputs instance in the mapper. The combination of LazyOutputFormat and closing the AvroMultipleOutputs instance in the mapper fixed the issue for me.
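A skeleton of what that fix looks like in practice; the named output "typeA", the input key/value types, and the parsing step are illustrative:

import java.io.IOException;

import org.apache.avro.generic.GenericRecord;
import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapreduce.AvroMultipleOutputs;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class RouterMapper
        extends Mapper<LongWritable, Text, AvroKey<GenericRecord>, NullWritable> {

    private AvroMultipleOutputs amos;

    @Override
    protected void setup(Context context) {
        amos = new AvroMultipleOutputs(context);
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        GenericRecord record = toRecord(value); // placeholder conversion
        amos.write("typeA", new AvroKey<GenericRecord>(record), NullWritable.get());
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        // Without this close(), buffered Avro blocks are never flushed and the
        // named-output files come out empty.
        amos.close();
    }

    private GenericRecord toRecord(Text value) {
        return null; // real parsing goes here
    }
}

// Driver side, so that empty default part files are not created for tasks
// that emit nothing:
//   org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat
//       .setOutputFormatClass(job, org.apache.avro.mapreduce.AvroKeyOutputFormat.class);
//   AvroMultipleOutputs.addNamedOutput(job, "typeA",
//       org.apache.avro.mapreduce.AvroKeyOutputFormat.class, schemaA);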
09-10-2015
03:19 PM
Hello Harsh, I tried the same for AvroMultipleOutputs files and it still generates empty Avro files. Should something be done in addition when we are using AvroMultipleOutputs? I am using Avro 1.7.7 and CDH 5.4. Please let me know if you have faced this issue. Thanks, Nishanth
02-12-2015
01:06 PM
I would suggest using a curl command to add the replica rather than the Solr UI. This is the wiki reference on how you can do that using the Collections API: https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api_addreplica You may want to do this during a quiet time, as it puts more I/O on the system; again, it depends on what your cluster environment and index size look like. Thanks, Nishan
09-02-2014
03:32 PM
Hey Senthi, Can you restart your Cloudera SCM agents and try again? Thanks, Nisha
07-07-2014
02:05 PM
Hi, Is there a way to find the number of HLogs that are yet to be replicated? I suspect that my HBase replication is slow. Please let me know if there is a way to check this.
05-30-2014
03:35 PM
Hi, Thank you guys. I set up my email server, and on sending a test alert I am getting the exception below. On netstat I see that the port is being used by some process. Can someone help?
2014-05-30 16:30:48,215 WARN org.apache.camel.impl.DefaultPollingConsumerPollStrategy: Consumer Consumer[event://hostname:7184?eventStoreHttpPort=7185&eventsQueryTimeoutMillis=60000] could not poll endpoint: event://hostname:7184?eventStoreHttpPort=7185&eventsQueryTimeoutMillis=60000 caused by: java.net.ConnectException: Connection refused
org.apache.avro.AvroRemoteException: java.net.ConnectException: Connection refused
    at org.apache.avro.ipc.specific.SpecificRequestor.invoke(SpecificRequestor.java:88)
    at com.sun.proxy.$Proxy9.queryEvents(Unknown Source)
    at com.cloudera.cmf.event.query.AvroEventStoreQueryProxy.doQuery(AvroEventStoreQueryProxy.java:160)
    at com.cloudera.enterprise.alertpublisher.component.EventStoreConsumer.poll(EventStoreConsumer.java:167)
    at org.apache.camel.impl.ScheduledPollConsumer.run(ScheduledPollConsumer.java:97)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
    at sun.net.www.http.HttpClient.(HttpClient.java:211)
    at sun.net.www.http.HttpClient.New(HttpClient.java:308)
    at sun.net.www.http.HttpClient.New(HttpClient.java:326)
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:996)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:932)
    at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:850)
    at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1091)
    at org.apache.avro.ipc.HttpTransceiver.writeBuffers(HttpTransceiver.java:71)
    at org.apache.avro.ipc.Transceiver.transceive(Transceiver.java:58)
    at org.apache.avro.ipc.Transceiver.transceive(Transceiver.java:72)
    at org.apache.avro.ipc.Requestor.request(Requestor.java:147)
    at org.apache.avro.ipc.Requestor.request(Requestor.java:101)
    at org.apache.avro.ipc.specific.SpecificRequestor.invoke(SpecificRequestor.java:72)