Member since: 06-07-2016
Posts: 923
Kudos Received: 322
Solutions: 115
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4114 | 10-18-2017 10:19 PM |
| | 4349 | 10-18-2017 09:51 PM |
| | 14872 | 09-21-2017 01:35 PM |
| | 1845 | 08-04-2017 02:00 PM |
| | 2427 | 07-31-2017 03:02 PM |
01-25-2017
07:15 AM
@Karan Alang Here is what I would suggest. Run the ZooKeeper CLI by running zkCli.sh (probably at the location below). Then do "ls /" to find the znodes. You should see a path for HBase, which should be /hbase; if not, it will be something similar. Then just run rmr <path>, for example rmr /hbase.
/usr/hdp/current/zookeeper-client/bin/zkCli.sh -server localhost:2181 -> assuming ZooKeeper is local
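If you would rather do this from code than from zkCli.sh, here is a minimal sketch using the ZooKeeper Java client (the connect string and session timeout are assumptions, and /hbase is the default znode, so adjust if yours differs):
import org.apache.zookeeper.ZKUtil;
import org.apache.zookeeper.ZooKeeper;

public class ClearHBaseZnode {
    public static void main(String[] args) throws Exception {
        // connect string assumes a local ZooKeeper on the default port
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, null);
        // recursively deletes the znode -- same effect as "rmr /hbase" in zkCli.sh
        ZKUtil.deleteRecursive(zk, "/hbase");
        zk.close();
    }
}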
01-25-2017
02:39 AM
@Karan Alang
Did you try clearing out your ZooKeeper directory? Your ZooKeeper directory is hbase.zookeeper.property.dataDir (in your hbase-site.xml). Log in to the ZooKeeper CLI and run rmr /path. Make sure both HBase and ZooKeeper are shut down first.
01-23-2017
06:37 AM
The only thing you can do is limit which IPs can access your cluster, basically by specifying security group rules for inbound traffic (or outbound as well): http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html#ec2-classic-security-groups
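As an illustration only, a rough sketch with the AWS SDK for Java of adding an inbound rule that allows SSH only from a specific range (the security group ID and CIDR are placeholders, and the exact builder methods depend on your SDK version):
import com.amazonaws.services.ec2.AmazonEC2;
import com.amazonaws.services.ec2.AmazonEC2ClientBuilder;
import com.amazonaws.services.ec2.model.AuthorizeSecurityGroupIngressRequest;
import com.amazonaws.services.ec2.model.IpPermission;
import com.amazonaws.services.ec2.model.IpRange;

public class RestrictClusterAccess {
    public static void main(String[] args) {
        AmazonEC2 ec2 = AmazonEC2ClientBuilder.defaultClient();
        // allow SSH (port 22) only from a placeholder CIDR range
        IpPermission sshRule = new IpPermission()
                .withIpProtocol("tcp")
                .withFromPort(22)
                .withToPort(22)
                .withIpv4Ranges(new IpRange().withCidrIp("203.0.113.0/24"));
        // sg-xxxxxxxx is a placeholder security group ID
        ec2.authorizeSecurityGroupIngress(new AuthorizeSecurityGroupIngressRequest()
                .withGroupId("sg-xxxxxxxx")
                .withIpPermissions(sshRule));
    }
}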
01-23-2017
06:20 AM
@Avijeet Dash In a cloud environment you create the cluster within a VPC (AWS) or an Azure Virtual Network, which becomes an extension of your own network. In addition, both cloud environments (and other major ones) offer network ACLs. You are not really opening up ports to a DMZ. Any practical deployment should use these features regardless of Hadoop.
01-22-2017
11:57 PM
2 Kudos
@ripunjay godhani Before I answer your question, please read the following discussion, which will help you understand why larger block sizes are required for Hadoop: https://community.hortonworks.com/questions/51408/hdfs-federation-1.html
Now, assuming you have read the above link, you understand why small files will not work well with Hadoop. So not only do you need a 64 MB block size, you should actually bump it up to 128 MB (that is the default in HDP). This is not bad news for your use case. There are literally 1000-plus deployments at this point where historical data is archived in Hadoop. Why do you have small files? Are those files small because the whole table is a few MB (less than 64 MB)? What is the total amount of data you are looking to offload into Hadoop? Once we know this, we can answer better, but offloading historical data is a classic Hadoop use case and you shouldn't run into the small files problem.
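For reference, here is a small sketch of how the block size can be controlled from a Java client when loading archive files (the path, replication factor, and sizes are placeholder values; dfs.blocksize only affects files written by that client):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ArchiveLoadSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // client-side default block size for new files, in bytes (128 MB, the HDP default)
        conf.setLong("dfs.blocksize", 128L * 1024 * 1024);
        FileSystem fs = FileSystem.get(conf);

        // a block size can also be set per file at create time
        FSDataOutputStream out = fs.create(
                new Path("/archive/history/part-00000"),   // placeholder path
                true,                                      // overwrite
                conf.getInt("io.file.buffer.size", 4096),  // buffer size
                (short) 3,                                 // replication
                256L * 1024 * 1024);                       // 256 MB blocks for this file
        out.close();
        fs.close();
    }
}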
01-17-2017
09:10 PM
Yes, I think it should. I have not done it specifically, but I have used the Result class, so it should work since it is the same class. Here is how I have done it (hbaseTableName, javaSparkContext, and the cfarr/namearr/typesarr column-family/column-name/column-type arrays are defined elsewhere in my job):
// create hbase configuration
Configuration configuration = HBaseConfiguration.create();
configuration.addResource(new Path("/etc/hbase/conf/hbase-site.xml"));
configuration.set(TableInputFormat.INPUT_TABLE, hbaseTableName);

// create java hbase context
JavaHBaseContext javaHBaseContext = new JavaHBaseContext(javaSparkContext, configuration);

// read the table as an RDD of (row key, Result) pairs
JavaPairRDD<ImmutableBytesWritable, Result> hbaseRDD =
    javaSparkContext.newAPIHadoopRDD(configuration, TableInputFormat.class, ImmutableBytesWritable.class, Result.class);

// map each Result to a Spark SQL Row, one entry per configured column
JavaRDD<Row> rowJavaRDD = hbaseRDD.map(new Function<Tuple2<ImmutableBytesWritable, Result>, Row>() {
    private static final long serialVersionUID = -2021713021648730786L;

    public Row call(Tuple2<ImmutableBytesWritable, Result> tuple) throws Exception {
        Result result = tuple._2;
        Object[] rowObject = new Object[namearr.length];
        for (int i = 0; i < namearr.length; i++) {
            // handle each data type we support
            if (typesarr[i].equals("string")) {
                String str = Bytes.toString(result.getValue(Bytes.toBytes(cfarr[i]), Bytes.toBytes(namearr[i])));
                rowObject[i] = str;
            }
        }
        // build the Spark Row from the collected values (assumed completion)
        return RowFactory.create(rowObject);
    }
});
01-17-2017
05:38 AM
@Todd Niven In your configuration, set the following and then use getColumnCells to get the version you want. Familiarize yourself with the Result class from the HBase client API, which is probably what you are using: conf.set("hbase.mapreduce.scan.maxversions", "VERSION_YOU_WANT")
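Once the scan returns multiple versions, here is a small sketch of reading them back from a Result (the "cf"/"col" family and qualifier names are placeholders):
import java.util.List;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class VersionedReadExample {
    // prints every stored version of the placeholder column cf:col
    static void printAllVersions(Result result) {
        List<Cell> cells = result.getColumnCells(Bytes.toBytes("cf"), Bytes.toBytes("col"));
        for (Cell cell : cells) {
            // cells come back newest first; the timestamp identifies the version
            System.out.println(cell.getTimestamp() + " -> " + Bytes.toString(CellUtil.cloneValue(cell)));
        }
    }
}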
01-15-2017
06:16 PM
@Rohit Sharma Can you please try creating the table on your /data/folder1 directory instead of on the file?
01-13-2017
05:33 PM
1 Kudo
@ed day My first thought was using backticks, but I see you have already tried that. Can you also try:
select 'user'.contributors_enabled from tweets;
or the following
select "user".contributors_enabled from tweets
or
select "user.contributors_enabled" from tweets
01-11-2017
10:05 PM
@Avijeet Dash Here is a link for HBase sizing that you can use: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_Sys_Admin_Guides/content/ch_clust_capacity.html
If you are using both HBase and SOLR, I am going to assume you are going to index HBase columns in SOLR. There are two concepts in SOLR when it comes to sizing: what you will be indexing and what you will be storing. You need to know what you'll be storing (all of the HBase columns? Probably not, but that is your call) and what you'll be indexing (definitely not everything, but whatever you index is in addition to what you store).
As for whether SOLR is better without HDFS, that is more of an opinion. I have seen clusters where SOLR Cloud runs just fine alongside HBase and HDFS. Here is what you should remember: ZooKeeper should have its own dedicated disk (please do not share ZooKeeper disks; I cannot overemphasize this). Size appropriately, meaning have the right amount of CPU and memory resources. If you give SOLR only 4 GB of heap space, there will likely be problems (but do not go to the other extreme either, as that results in Java garbage collection pauses; an ideal heap to start with is 8-12 GB). Another thing to remember is what kind of queries your end users will be running. If they start scanning the entire SOLR index, there is no doubt you will run into issues.