Member since: 11-11-2016
Posts: 43
Kudos Received: 4
Solutions: 2
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 3796 | 08-22-2017 09:21 AM |
|  | 3191 | 02-25-2017 03:18 AM |
11-25-2019
07:20 PM
Hi Vijay, did you solve this issue? I am having the same exception. Kindly share.
03-13-2019
09:48 PM
Can you please share sample Java code for reading a Hadoop sequence file that has hbase.io.ImmutableBytesWritable as the key class and hbase.client.Result as the value class? I need to read it from an input stream that reads from HDFS and write it to an output stream. My input stream shows the file can be read from HDFS, but I cannot parse it, so I need to build a parser for it.
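A minimal sketch of such a reader, assuming the file was produced by the HBase Export tool (a SequenceFile of ImmutableBytesWritable/Result pairs) and that org.apache.hadoop.hbase.mapreduce.ResultSerialization is on the classpath; the HDFS path is a placeholder:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.ResultSerialization;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.SequenceFile;

public class ReadExportedSequenceFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Result is not a Writable, so register the HBase serialization used by
        // the Export tool before opening the SequenceFile; otherwise the reader
        // fails with "Could not find a deserializer for the Value class".
        conf.setStrings("io.serializations",
                conf.get("io.serializations"),
                ResultSerialization.class.getName());

        // Placeholder path to one part file of the export on HDFS.
        Path input = new Path("hdfs://quickstart.cloudera:8020/tmp/export/part-m-00000");

        try (SequenceFile.Reader reader =
                     new SequenceFile.Reader(conf, SequenceFile.Reader.file(input))) {
            ImmutableBytesWritable key = new ImmutableBytesWritable();
            // Use the Object-based next/getCurrentValue so the registered
            // serializations (not the Writable path) deserialize each record.
            while (reader.next((Object) key) != null) {
                Result value = (Result) reader.getCurrentValue((Object) null);
                System.out.println("row=" + Bytes.toString(key.get()));
                for (Cell cell : value.rawCells()) {
                    System.out.println("  " + Bytes.toString(CellUtil.cloneFamily(cell))
                            + ":" + Bytes.toString(CellUtil.cloneQualifier(cell))
                            + " = " + Bytes.toString(CellUtil.cloneValue(cell)));
                }
            }
        }
    }
}
```

From there you can write whatever representation you need to your output stream instead of printing to stdout.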
06-03-2017
07:04 PM
Solved it after using the correct path.

Create the snapshot:

```
snapshot 'FundamentalAnalytic','FundamentalAnalyticSnapshot'
```

Export the snapshot to local HDFS:

```
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot FundamentalAnalyticSnapshot -copy-to /tmp -mappers 16
```

Driver job configuration to run MapReduce on the HBase snapshot:

```java
String snapshotName = "FundamentalAnalyticSnapshot";
Path restoreDir = new Path("hdfs://quickstart.cloudera:8020/tmp");
String hbaseRootDir = "hdfs://quickstart.cloudera:8020/hbase";

TableMapReduceUtil.initTableSnapshotMapperJob(
        snapshotName,        // snapshot name
        scan,                // Scan instance to control CF and attribute selection
        DefaultMapper.class, // mapper class
        NullWritable.class,  // mapper output key
        Text.class,          // mapper output value
        job,
        true,                // add dependency jars
        restoreDir);
```

Also, running MapReduce against an HBase snapshot skips the scan on the HBase table itself, so there is no impact on the region servers.
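For reference, a self-contained sketch of what the full driver could look like, assuming a hypothetical DefaultMapper that simply emits row keys and a placeholder output path; the snapshot name and restore directory are taken from the snippet above:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class SnapshotScanDriver {

    // Hypothetical mapper: emits each row key as a line of text.
    public static class DefaultMapper extends TableMapper<NullWritable, Text> {
        @Override
        protected void map(ImmutableBytesWritable key, Result value, Context context)
                throws java.io.IOException, InterruptedException {
            context.write(NullWritable.get(), new Text(Bytes.toString(key.get())));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "mr-over-hbase-snapshot");
        job.setJarByClass(SnapshotScanDriver.class);

        Scan scan = new Scan();
        scan.setCacheBlocks(false); // recommended for MapReduce scans

        String snapshotName = "FundamentalAnalyticSnapshot";
        Path restoreDir = new Path("hdfs://quickstart.cloudera:8020/tmp");

        TableMapReduceUtil.initTableSnapshotMapperJob(
                snapshotName,        // snapshot to read
                scan,                // Scan to restrict column families/attributes
                DefaultMapper.class, // mapper class
                NullWritable.class,  // mapper output key
                Text.class,          // mapper output value
                job,
                true,                // ship HBase dependency jars with the job
                restoreDir);         // temp dir the snapshot is restored into

        job.setNumReduceTasks(0);
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileOutputFormat.setOutputPath(job, new Path("/tmp/snapshot-scan-output")); // placeholder

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```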
02-25-2017
03:18 AM
Finally I managed to resolve it. I just used multipleOutputs.write(NullWritable.get(), new Text(sb.toString()), strName); inside the for loop, and that solved my problem. I have tested it with a very large data set (a 19 GB file) and it worked fine for me. This is my final solution. Initially I thought it might create too many objects, but it is working fine, and the MapReduce job is also completing very fast.
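A minimal sketch of how that call can sit inside a reducer, assuming the job's output types are NullWritable/Text and that strName (the base output file name) is derived from the key; those details are placeholders for whatever the original job uses:

```java
import java.io.IOException;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class SplitByKeyReducer extends Reducer<Text, Text, NullWritable, Text> {

    private MultipleOutputs<NullWritable, Text> multipleOutputs;

    @Override
    protected void setup(Context context) {
        multipleOutputs = new MultipleOutputs<>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text value : values) {
            StringBuilder sb = new StringBuilder();
            sb.append(key.toString()).append('|').append(value.toString());
            // strName is the base name of the side output file; deriving it from
            // the key is an assumption about the original job.
            String strName = key.toString().replaceAll("[^A-Za-z0-9]", "");
            multipleOutputs.write(NullWritable.get(), new Text(sb.toString()), strName);
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // Closing MultipleOutputs is required, otherwise the side files can end up empty.
        multipleOutputs.close();
    }
}
```

The write(key, value, baseOutputPath) overload writes through the job's regular FileOutputFormat, so no addNamedOutput call is needed in the driver for this pattern.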
03-31-2017
05:52 AM
The customer needs the data in the proper file, even if a file ends up with only 10 KB of data.
11-28-2016
09:40 PM
1 Kudo
There are a couple of optimizations you can try (below), but they almost certainly will not reduce a job duration from more than 24 hours to a few hours. It is likely that your cluster is too small for the amount of processing you are doing. In that case, your best bet is to break your 200 GB data set into smaller chunks and bulk load each sequentially (or, preferably, add more nodes to your cluster). Also, be sure that you are not bulk loading while the scheduled major compaction is running.

Optimizations: in addition to looking at your log, go to Ambari and see what is maxing out ... memory? CPU?

This link gives a good overview of optimizing HBase loads: https://www.ibm.com/support/knowledgecenter/SSPT3X_3.0.0/com.ibm.swg.im.infosphere.biginsights.analyze.doc/doc/bigsql_loadhints.html It is not focused on bulk loading specifically, but it does still come into play. Note: for each property mentioned, set it in your importtsv script as -D<property>=<value> \

One thing that usually helps MapReduce jobs is compressing the map output so it travels across the wire to the reducers faster (see the sketch after these flags):

-Dmapred.compress.map.output=true \
-Dmapred.map.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec \

As mentioned, though, it is likely that your cluster is not scaled properly for your workload.
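A small sketch of driving ImportTsv with those compression properties set programmatically, assuming an HBase release where ImportTsv implements Tool; the column mapping, staging directory, table name, and input path are placeholders:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.ImportTsv;
import org.apache.hadoop.util.ToolRunner;

public class CompressedImportTsvDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();

        // Compress map output so it crosses the wire to the reducers faster.
        // (Newer Hadoop releases spell these mapreduce.map.output.compress /
        // mapreduce.map.output.compress.codec; the old names are still mapped.)
        conf.setBoolean("mapred.compress.map.output", true);
        conf.set("mapred.map.output.compression.codec",
                "org.apache.hadoop.io.compress.GzipCodec");

        // Placeholder column mapping and HFile staging directory for bulk load.
        conf.set("importtsv.columns", "HBASE_ROW_KEY,cf:col1,cf:col2");
        conf.set("importtsv.bulk.output", "hdfs:///tmp/bulkload-staging");

        // Placeholder table name and input directory.
        int rc = ToolRunner.run(conf, new ImportTsv(),
                new String[] { "MyTable", "hdfs:///tmp/input-tsv" });
        System.exit(rc);
    }
}
```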