Member since: 06-07-2016
Posts: 923
Kudos Received: 322
Solutions: 115
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4114 | 10-18-2017 10:19 PM |
| | 4349 | 10-18-2017 09:51 PM |
| | 14872 | 09-21-2017 01:35 PM |
| | 1845 | 08-04-2017 02:00 PM |
| | 2427 | 07-31-2017 03:02 PM |
01-25-2017
07:15 AM
@Karan Alang Here is what I would suggest. Run the ZooKeeper CLI by running zkCli.sh (probably at the location below). Then do "ls /" to find the znodes. You should see a path for HBase, which should be /hbase; if not, it will be something similar. Then just run rmr <path>, for example rmr /hbase.
/usr/hdp/current/zookeeper-client/bin/zkCli.sh -server localhost:2181 -> assuming ZooKeeper is local
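If you would rather do this from code than from zkCli.sh, here is a minimal sketch using the ZooKeeper Java client (the connect string and session timeout are assumptions, and /hbase is the default znode, so adjust if yours differs):
import org.apache.zookeeper.ZKUtil;
import org.apache.zookeeper.ZooKeeper;

public class ClearHBaseZnode {
    public static void main(String[] args) throws Exception {
        // connect string assumes a local ZooKeeper on the default port
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, null);
        // recursively deletes the znode -- same effect as "rmr /hbase" in zkCli.sh
        ZKUtil.deleteRecursive(zk, "/hbase");
        zk.close();
    }
}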
01-25-2017
02:39 AM
@Karan Alang
Did you try clearing out your ZooKeeper directory? Your ZooKeeper directory is hbase.zookeeper.property.dataDir (in your hbase-site.xml). Log in to the ZooKeeper CLI and run rmr /path. Make sure both HBase and ZooKeeper are shut down first.
01-23-2017
06:37 AM
The only thing you can do is limit which IPs can access your cluster, basically by specifying security group rules for inbound traffic (or outbound as well): http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html#ec2-classic-security-groups
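As an illustration only, a rough sketch with the AWS SDK for Java of adding an inbound rule that allows SSH only from a specific range (the security group ID and CIDR are placeholders, and the exact builder methods depend on your SDK version):
import com.amazonaws.services.ec2.AmazonEC2;
import com.amazonaws.services.ec2.AmazonEC2ClientBuilder;
import com.amazonaws.services.ec2.model.AuthorizeSecurityGroupIngressRequest;
import com.amazonaws.services.ec2.model.IpPermission;
import com.amazonaws.services.ec2.model.IpRange;

public class RestrictClusterAccess {
    public static void main(String[] args) {
        AmazonEC2 ec2 = AmazonEC2ClientBuilder.defaultClient();
        // allow SSH (port 22) only from a placeholder CIDR range
        IpPermission sshRule = new IpPermission()
                .withIpProtocol("tcp")
                .withFromPort(22)
                .withToPort(22)
                .withIpv4Ranges(new IpRange().withCidrIp("203.0.113.0/24"));
        // sg-xxxxxxxx is a placeholder security group ID
        ec2.authorizeSecurityGroupIngress(new AuthorizeSecurityGroupIngressRequest()
                .withGroupId("sg-xxxxxxxx")
                .withIpPermissions(sshRule));
    }
}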
01-23-2017
06:20 AM
@Avijeet Dash In a cloud environment you create the cluster within a VPC (AWS) or an Azure Virtual Network, which becomes an extension of your own network. In addition, both cloud environments (and other major ones) offer network ACLs. You are not really opening up ports to a DMZ. Any practical deployment should use these features regardless of Hadoop.
01-22-2017
11:57 PM
2 Kudos
@ripunjay godhani Before I answer your question, please read the following discussion, which will help you understand why larger block sizes are required for Hadoop: https://community.hortonworks.com/questions/51408/hdfs-federation-1.html
Now, assuming you have read the above link, you understand why small files will not work well with Hadoop. So not only do you need a 64 MB block size, you should actually bump it up to 128 MB (that is the default in HDP). This is not bad news for your use case. There are literally 1000-plus deployments at this point where historical data is archived in Hadoop. Why do you have small files? Are those files small because the whole table is a few MB (less than 64 MB)? What is the total amount of data you are looking to offload into Hadoop? Once we know this, we can answer better, but offloading historical data is a classic Hadoop use case and you shouldn't run into the small files problem.
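For reference, here is a small sketch of how the block size can be controlled from a Java client when loading archive files (the path, replication factor, and sizes are placeholder values; dfs.blocksize only affects files written by that client):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ArchiveLoadSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // client-side default block size for new files, in bytes (128 MB, the HDP default)
        conf.setLong("dfs.blocksize", 128L * 1024 * 1024);
        FileSystem fs = FileSystem.get(conf);

        // a block size can also be set per file at create time
        FSDataOutputStream out = fs.create(
                new Path("/archive/history/part-00000"),   // placeholder path
                true,                                      // overwrite
                conf.getInt("io.file.buffer.size", 4096),  // buffer size
                (short) 3,                                 // replication
                256L * 1024 * 1024);                       // 256 MB blocks for this file
        out.close();
        fs.close();
    }
}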
01-17-2017
09:10 PM
Yes, I think it should. I have not done it specifically, but I have used the Result class, so it should work since it is the same class. Here is how I have done it (hbaseTableName, javaSparkContext, and the cfarr/namearr/typesarr column-family/column-name/column-type arrays are defined elsewhere in my job):
// create hbase configuration
Configuration configuration = HBaseConfiguration.create();
configuration.addResource(new Path("/etc/hbase/conf/hbase-site.xml"));
configuration.set(TableInputFormat.INPUT_TABLE, hbaseTableName);

// create java hbase context
JavaHBaseContext javaHBaseContext = new JavaHBaseContext(javaSparkContext, configuration);

// read the table as an RDD of (row key, Result) pairs
JavaPairRDD<ImmutableBytesWritable, Result> hbaseRDD =
    javaSparkContext.newAPIHadoopRDD(configuration, TableInputFormat.class, ImmutableBytesWritable.class, Result.class);

// map each Result to a Spark SQL Row, one entry per configured column
JavaRDD<Row> rowJavaRDD = hbaseRDD.map(new Function<Tuple2<ImmutableBytesWritable, Result>, Row>() {
    private static final long serialVersionUID = -2021713021648730786L;

    public Row call(Tuple2<ImmutableBytesWritable, Result> tuple) throws Exception {
        Result result = tuple._2;
        Object[] rowObject = new Object[namearr.length];
        for (int i = 0; i < namearr.length; i++) {
            // handle each data type we support
            if (typesarr[i].equals("string")) {
                String str = Bytes.toString(result.getValue(Bytes.toBytes(cfarr[i]), Bytes.toBytes(namearr[i])));
                rowObject[i] = str;
            }
        }
        // build the Spark Row from the collected values (assumed completion)
        return RowFactory.create(rowObject);
    }
});
01-17-2017
05:38 AM
@Todd Niven In your configuration, set the following and then use getColumnCells to get the version you want. Familiarize yourself with the Result class from the HBase client API, which is probably what you are using: conf.set("hbase.mapreduce.scan.maxversions", "VERSION_YOU_WANT")
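Once the scan returns multiple versions, here is a small sketch of reading them back from a Result (the "cf"/"col" family and qualifier names are placeholders):
import java.util.List;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class VersionedReadExample {
    // prints every stored version of the placeholder column cf:col
    static void printAllVersions(Result result) {
        List<Cell> cells = result.getColumnCells(Bytes.toBytes("cf"), Bytes.toBytes("col"));
        for (Cell cell : cells) {
            // cells come back newest first; the timestamp identifies the version
            System.out.println(cell.getTimestamp() + " -> " + Bytes.toString(CellUtil.cloneValue(cell)));
        }
    }
}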
01-15-2017
06:16 PM
@Rohit Sharma Can you please try creating the table on your /data/folder1 directory instead of on the file?
01-13-2017
05:33 PM
1 Kudo
@ed day My first thought was using backticks, but I see you have already tried that. Can you also try:
select 'user'.contributors_enabled from tweets;
or the following
select "user".contributors_enabled from tweets
or
select "user.contributors_enabled" from tweets
01-11-2017
10:05 PM
@Avijeet Dash Here is a link for HBase sizing that you can use: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_Sys_Admin_Guides/content/ch_clust_capacity.html
If you are using both HBase and SOLR, I am going to assume you are going to index HBase columns in SOLR. There are two concepts in SOLR when it comes to sizing: what you will be indexing and what you will be storing. You need to know what you'll be storing (all of the HBase columns? Probably not, but that is your call) and what you'll be indexing (definitely not everything, but whatever you index is in addition to what you store).
As for whether SOLR is better without HDFS, that is more of an opinion. I have seen clusters where SOLR Cloud runs just fine alongside HBase and HDFS. Here is what you should remember: ZooKeeper should have its own dedicated disk (please do not share ZooKeeper disks; I cannot overemphasize this). Size appropriately, meaning have the right amount of CPU and memory resources. If you give SOLR only 4 GB of heap space, there will likely be problems (but do not go to the other extreme either, as that results in Java garbage collection pauses; an ideal heap to start with is 8-12 GB). Another thing to remember is what kind of queries your end users will be running. If they start scanning the entire SOLR index, there is no doubt you will run into issues.