Member since: 02-11-2014
Posts: 162
Kudos Received: 2
Solutions: 4
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 4146 | 12-04-2015 12:46 PM
 | 5685 | 02-12-2015 01:06 PM
 | 4504 | 03-20-2014 12:41 PM
 | 8755 | 03-19-2014 08:54 AM
02-01-2019
02:39 PM
@Harsh J How would I do this for just one job? I tried the settings below, but they are not working. The issue is that I want to use a version of Jersey that I bundled into my fat jar; however, the gateway node has an older version of that jar, and a class gets loaded from there, resulting in a NoSuchMethodException. My application is not a MapReduce job; I run it with hadoop jar on CDH 5.14.4.

export HADOOP_USER_CLASSPATH_FIRST=true
export HADOOP_CLASSPATH=/projects/poc/test/config:$HADOOP_CLASSPATH
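If the goal is to scope these settings to a single invocation rather than exporting them for the whole shell session, the variables can be prefixed onto the command itself; a minimal sketch (the jar name and main class below are placeholders, not from the original post):

```
# Apply the classpath overrides to this one hadoop invocation only
# (myapp.jar and com.example.Main are hypothetical placeholders)
HADOOP_USER_CLASSPATH_FIRST=true \
HADOOP_CLASSPATH=/projects/poc/test/config:$HADOOP_CLASSPATH \
hadoop jar myapp.jar com.example.Main
```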
11-20-2017
01:48 PM
Hello, We have a use case for ingesting binary files from a mainframe into HDFS in Avro format. These binary files contain different record types of variable length; the first 4 bytes denote the length of each record. I have written a standalone Java program that ingests the data to HDFS using Avro's DataFileWriter. The files coming off the mainframe are much smaller than a block, so this creates small files. Some of the options we came up with to avoid this:
1. Convert the batch process into more of a service that runs behind the scenes, so the Avro DataFileWriter can keep running and flush the data at a certain interval (time/size). I do not see a default implementation for this right now.
2. Write the data into a temporary HDFS location, merge the files every hour or so, and move them to the final HDFS destination. We can afford an hour of latency before the data is made available to consumers.
3. Make use of Avro's append functionality (see the sketch below).
Appreciate your help!
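Regarding option 3, a minimal sketch of Avro's append support against HDFS, assuming append is enabled on the cluster; the file path is a hypothetical placeholder. DataFileWriter.appendTo() picks up the schema and sync marker from the existing container file and continues writing blocks onto the stream returned by FileSystem.append():

```java
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.mapred.FsInput;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AvroHdfsAppender {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/data/mainframe/records.avro"); // hypothetical path

        DataFileWriter<GenericRecord> writer =
            new DataFileWriter<>(new GenericDatumWriter<GenericRecord>());
        // Reads the schema and sync marker from the existing file header,
        // then appends new Avro blocks on the HDFS append stream.
        writer.appendTo(new FsInput(file, conf), fs.append(file));

        // writer.append(record) per decoded mainframe record; a periodic
        // writer.flush() (time- or size-based) would cover option 1 as well.
        writer.close();
    }
}
```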
Labels:
- HDFS
03-07-2016
04:33 PM
Thanks Harsh. Yes, my key was void, so I changed the Avro output from AvroKeyValueOutputFormat to AvroKeyOutputFormat (the earlier value is now the key) and it worked. Thanks, Nishanth
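A minimal sketch of that change on the driver side (the record schema is a placeholder to supply); the job then emits (AvroKey&lt;record&gt;, NullWritable) pairs instead of key/value pairs:

```java
import org.apache.avro.Schema;
import org.apache.avro.mapreduce.AvroJob;
import org.apache.avro.mapreduce.AvroKeyOutputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class AvroKeyJobSetup {
    // Configure a job whose tasks emit (AvroKey<record>, NullWritable).
    public static Job configure(Schema recordSchema) throws Exception {
        Job job = Job.getInstance(new Configuration(), "avro-key-output");
        AvroJob.setOutputKeySchema(job, recordSchema); // the former value schema
        job.setOutputFormatClass(AvroKeyOutputFormat.class);
        return job;
    }
}
```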
02-12-2016
11:22 AM
Hello, I am trying to load data from an Avro-backed Hive table in a Pig script using the command below:

A = LOAD 'dev.avro_test' USING org.apache.hive.hcatalog.pig.HCatLoader();

We are running into the error below. Requesting some direction. We are using CDH 5.5.1.

2016-02-12 19:21:26,515 [main] ERROR org.apache.pig.PigServer - exception during parsing: Error during parsing. Type void not present
Failed to parse: Type void not present
    at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:198)
    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1688)
    at org.apache.pig.PigServer$Graph.access$000(PigServer.java:1421)
    at org.apache.pig.PigServer.parseAndBuild(PigServer.java:354)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:379)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:365)
    at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:140)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:769)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
    at org.apache.pig.Main.run(Main.java:484)
    at org.apache.pig.Main.main(Main.java:158)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.TypeNotPresentException: Type void not present
    at org.apache.hive.hcatalog.data.schema.HCatFieldSchema$Type.getPrimitiveHType(HCatFieldSchema.java:92)
    at org.apache.hive.hcatalog.data.schema.HCatFieldSchema.<init>(HCatFieldSchema.java:226)
    at org.apache.hive.hcatalog.data.schema.HCatSchemaUtils.getHCatFieldSchema(HCatSchemaUtils.java:122)
    at org.apache.hive.hcatalog.data.schema.HCatSchemaUtils.getHCatFieldSchema(HCatSchemaUtils.java:115)
    at org.apache.hive.hcatalog.common.HCatUtil.getHCatFieldSchemaList(HCatUtil.java:151)
    at org.apache.hive.hcatalog.common.HCatUtil.getTableSchemaWithPtnCols(HCatUtil.java:184)
    at org.apache.hive.hcatalog.pig.HCatLoader.getSchema(HCatLoader.java:216)
    at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:175)
    at org.apache.pig.newplan.logical.relational.LOLoad.<init>(LOLoad.java:89)
    at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:853)
    at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3568)
    at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1625)
    at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:1102)
    at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:560)
    at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:421)
    at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:188)
    ... 19 more
2016-02-12 19:21:26,518 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Type void not present
12-04-2015
12:46 PM
I tried using AvroParquetOutputFormat with the MultipleOutputs class and was able to generate Parquet files for one schema type. For the other schema type I am running into the error below. Any help is appreciated.

java.lang.ArrayIndexOutOfBoundsException: 2820
    at org.apache.parquet.io.api.Binary.hashCode(Binary.java:489)
    at org.apache.parquet.io.api.Binary.access$100(Binary.java:34)
    at org.apache.parquet.io.api.Binary$ByteBufferBackedBinary.hashCode(Binary.java:382)
    at org.apache.parquet.it.unimi.dsi.fastutil.objects.Object2IntLinkedOpenHashMap.getInt(Object2IntLinkedOpenHashMap.java:587)
    at org.apache.parquet.column.values.dictionary.DictionaryValuesWriter$PlainBinaryDictionaryValuesWriter.writeBytes(DictionaryValuesWriter.java:235)
    at org.apache.parquet.column.values.fallback.FallbackValuesWriter.writeBytes(FallbackValuesWriter.java:162)
    at org.apache.parquet.column.impl.ColumnWriterV1.write(ColumnWriterV1.java:203)
    at org.apache.parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.addBinary(MessageColumnIO.java:347)
    at org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:257)
    at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:167)
    at org.apache.parquet.avro.AvroWriteSupport.writeRecord(AvroWriteSupport.java:149)
    at org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:262)
    at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:167)
    at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:142)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:121)
    at org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:123)
    at org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:42)
    at org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat$LazyRecordWriter.write(LazyOutputFormat.java:115)
    at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.write(MultipleOutputs.java:457)
    at com.visa.dps.mapreduce.logger.LoggerMapper.map(LoggerMapper.java:271)
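For context, a hypothetical reconstruction of the mapper-side wiring the trace implies (the class name comes from the trace, but the body, input types, named output, and parsing are placeholders); this illustrates the approach being described, not a fix for the exception itself:

```java
import java.io.IOException;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class LoggerMapper extends Mapper<LongWritable, Text, Void, GenericRecord> {
    private MultipleOutputs<Void, GenericRecord> mos;

    // Driver side (per schema type):
    // MultipleOutputs.addNamedOutput(job, "typeA",
    //     AvroParquetOutputFormat.class, Void.class, GenericRecord.class);

    @Override
    protected void setup(Context context) {
        mos = new MultipleOutputs<>(context);
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        GenericRecord record = decode(value); // placeholder for real parsing
        // Route by record type; "typeA" must match the driver registration.
        mos.write("typeA", null, record, "typeA/part");
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        mos.close(); // Parquet buffers rows; without this the footers are never written
    }

    private GenericRecord decode(Text value) {
        throw new UnsupportedOperationException("schema-specific parsing goes here");
    }
}
```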
12-04-2015
10:16 AM
Hello All, We have a Java MapReduce application which reads binary files, does some data processing, and converts them to Avro data. Currently we have two Avro schemas and use the AvroMultipleOutputs class to write to multiple locations based on the schema. After some research we found it would be beneficial to store the data as Parquet. What is the best way to do this? Should I change the native MapReduce job to convert from Avro to Parquet, or is there some other utility I can use? Thanks, Nishan
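A minimal driver sketch of the Avro-to-Parquet switch being asked about, assuming a single output schema (the job name, schema file, and paths are placeholders); the map logic can stay the same, with only the output format and schema changing. Routing two schemas would still need a MultipleOutputs-style mechanism, as in the attempt above:

```java
import org.apache.avro.Schema;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.parquet.avro.AvroParquetOutputFormat;

public class AvroToParquetDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "avro-to-parquet");
        job.setJarByClass(AvroToParquetDriver.class);

        // Reuse the existing Avro schema; parquet-avro derives the Parquet
        // schema from it (/record.avsc is a hypothetical resource).
        Schema schema = new Schema.Parser().parse(
            AvroToParquetDriver.class.getResourceAsStream("/record.avsc"));
        job.setOutputFormatClass(AvroParquetOutputFormat.class);
        AvroParquetOutputFormat.setSchema(job, schema);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```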
11-23-2015
09:24 PM
Can you get the trace from that particular Solr replica's logs? If your tlogs are large, it could be replaying those, but then it would normally be in a recovering state. Are your tlogs corrupt?
11-23-2015
02:49 PM
The issue in my case was that I was not closing the AvroMultipleOutputs instance in the mapper. The combination of LazyOutputFormat and closing the AvroMultipleOutputs instance in the mapper fixed the issue for me.
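A sketch of that two-part fix (class name, named output, and parsing are placeholders): LazyOutputFormat avoids creating zero-record files, and closing AvroMultipleOutputs in cleanup() flushes the buffered Avro blocks so the files are not left empty:

```java
import java.io.IOException;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapreduce.AvroMultipleOutputs;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class RoutingMapper
        extends Mapper<LongWritable, Text, AvroKey<GenericRecord>, NullWritable> {
    private AvroMultipleOutputs amos;

    // Driver side:
    // LazyOutputFormat.setOutputFormatClass(job, AvroKeyOutputFormat.class);

    @Override
    protected void setup(Context context) {
        amos = new AvroMultipleOutputs(context);
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        GenericRecord record = parse(value); // placeholder for real parsing
        amos.write("typeA", new AvroKey<>(record), NullWritable.get());
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        amos.close(); // without this, the output files stay empty
    }

    private GenericRecord parse(Text value) {
        throw new UnsupportedOperationException("record parsing goes here");
    }
}
```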
09-10-2015
03:19 PM
Hello Harsh, I tried the same for AvroMultipleOutputs files and it still generates empty Avro files. Should something be done in addition when using AvroMultipleOutputs? I am using Avro 1.7.7 and CDH 5.4. Please let me know if you have faced this issue. Thanks, Nishanth
02-12-2015
01:06 PM
I would suggest using a curl command to add the replica rather than the Solr UI. This is the wiki reference on how to do that using the Collections API: https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api_addreplica You may want to do this during a quiet time, as it adds I/O load on the system; again, it depends on what your cluster environment and index size look like. Thanks, Nishan
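A hypothetical ADDREPLICA invocation (host, port, collection, shard, and node names are all placeholders to replace with your own values):

```
# Add a replica of shard1 on a specific node via the Collections API
curl 'http://solrhost:8983/solr/admin/collections?action=ADDREPLICA&collection=myCollection&shard=shard1&node=newhost:8983_solr'
```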