Member since: 06-07-2016
Posts: 923
Kudos Received: 322
Solutions: 115
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4114 | 10-18-2017 10:19 PM |
| | 4353 | 10-18-2017 09:51 PM |
| | 14884 | 09-21-2017 01:35 PM |
| | 1852 | 08-04-2017 02:00 PM |
| | 2429 | 07-31-2017 03:02 PM |
12-15-2016
05:28 PM
@Xiaojie Ma One thing just popped into my mind: do you have snapshots that may be pointing to your data? In that case, the data is not deleted by major compaction; it is moved to an archive folder instead. Check under /hbase to see whether you have a .archive folder.
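In case it helps, here is a minimal sketch (not part of the original reply, assuming the HBase 2.x Java client and a local ZooKeeper quorum) that lists any existing snapshots:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.SnapshotDescription;

public class ListHBaseSnapshots {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.zookeeper.quorum", "localhost"); // assumption: local ZooKeeper quorum
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Admin admin = connection.getAdmin()) {
      // Any snapshot listed here can keep HFiles alive under the archive folder.
      for (SnapshotDescription snapshot : admin.listSnapshots()) {
        System.out.println(snapshot.getName() + " -> " + snapshot.getTableName());
      }
    }
  }
}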
12-13-2016
08:18 PM
@Asma Dhaouadi
I am not sure what the issue is. If you are following the directions, then what you are doing is right; it may be that you are missing a user name. If it is still not working, then try adding the following to yarn-site.xml (127.0.0.1 may be replaced with localhost, assuming that is where the ResourceManager is running):
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>127.0.0.1</value>
</property>
Also, I am assuming you already have the following in your yarn-site.xml:
<property>
  <name>yarn.resourcemanager.address</name>
  <value>localhost:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>localhost:8030</value>
</property>
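As a quick connectivity check, here is a minimal sketch (not from the original reply, assuming the Hadoop YARN client library is on the classpath and the ResourceManager listens on localhost:8032) that confirms a client can reach the ResourceManager:

import java.util.List;
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class CheckResourceManager {
  public static void main(String[] args) throws Exception {
    YarnConfiguration conf = new YarnConfiguration();
    conf.set("yarn.resourcemanager.address", "localhost:8032"); // assumption: local ResourceManager
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(conf);
    yarnClient.start();
    // If this call succeeds, the client can talk to the ResourceManager.
    List<NodeReport> nodes = yarnClient.getNodeReports(NodeState.RUNNING);
    System.out.println("Running NodeManagers: " + nodes.size());
    yarnClient.stop();
  }
}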
12-13-2016
06:05 PM
You don't need to set yarn.resourcemanager.hostname explicitly, because it should simply be wherever your ResourceManager is running. Is it running on localhost? If not, use the IP address of whatever host it is running on.
12-13-2016
05:52 PM
@Asma Dhaouadi Ah! By default this should already be configured. Go to YARN in Cloudera Manager and click on the Configuration tab. On the left side of the Configuration page there is a link for "Ports and Addresses". When you click it, it will show you both of these settings:
yarn.resourcemanager.scheduler.address --> port 8030 by default
yarn.resourcemanager.address --> port 8032 by default
12-13-2016
05:40 PM
@Asma Dhaouadi
By default the ResourceManager UI runs on port 8088. See if the following takes you to the ResourceManager UI: http://<host where Resource manager is running>:8088 When YARN is running, Cloudera Manager will show you a "Web UI" tab. Under that tab there is a link for the ResourceManager Web UI; click it and it opens the ResourceManager, running by default at:
http://<host>:8088/cluster
12-13-2016
05:26 PM
1 Kudo
@Michael Young I am reasonably sure you are aware of the core Java API of Apache ORC. Is this link not enough? https://orc.apache.org/docs/core-java.html
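For reference, here is a minimal sketch of writing a file with that core Java API (along the lines of the linked documentation; the output path and schema are illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.orc.OrcFile;
import org.apache.orc.TypeDescription;
import org.apache.orc.Writer;

public class OrcCoreWriteExample {
  public static void main(String[] args) throws Exception {
    // Illustrative schema and output path.
    TypeDescription schema = TypeDescription.fromString("struct<id:bigint,name:string>");
    Writer writer = OrcFile.createWriter(new Path("/tmp/example.orc"),
        OrcFile.writerOptions(new Configuration()).setSchema(schema));
    VectorizedRowBatch batch = schema.createRowBatch();
    LongColumnVector id = (LongColumnVector) batch.cols[0];
    BytesColumnVector name = (BytesColumnVector) batch.cols[1];
    for (int r = 0; r < 10; r++) {
      int row = batch.size++;
      id.vector[row] = r;
      name.setVal(row, ("row-" + r).getBytes());
      if (batch.size == batch.getMaxSize()) {
        writer.addRowBatch(batch);
        batch.reset();
      }
    }
    if (batch.size != 0) {
      writer.addRowBatch(batch);
    }
    writer.close();
  }
}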
12-13-2016
03:54 PM
1 Kudo
@Huahua Wei HBaseStorageHandler is what is required to read HBase tables from Hive. At the end of the day, you first have to create and manage the table in HBase and then query it through Hive. Since you are going to be doing updates, this might be the best way to go about it, but I would strongly recommend looking at the following approach as well. The reason is probably my personal preference for not using HBase until it is required, as it is complex and the skill set required to implement it successfully is hard to find. That being said, for your use case, if you don't like the following approach, I'd prefer HBase over Hive ACID. http://hortonworks.com/blog/four-step-strategy-incremental-updates-hive/
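To make the HBaseStorageHandler route concrete, here is a minimal sketch (not from the original reply; the HiveServer2 location, credentials, and table/column names are assumptions) that maps an existing HBase table into Hive via the Hive JDBC driver:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HiveOverHBaseExample {
  public static void main(String[] args) throws Exception {
    // Assumption: HiveServer2 is reachable at localhost:10000 and the Hive JDBC
    // driver (org.apache.hive.jdbc.HiveDriver) is on the classpath.
    try (Connection conn = DriverManager.getConnection(
             "jdbc:hive2://localhost:10000/default", "hive", "");
         Statement stmt = conn.createStatement()) {
      // Hypothetical names: exposes the HBase table 'events' (column family 'cf')
      // as a Hive external table backed by the HBaseStorageHandler.
      stmt.execute(
          "CREATE EXTERNAL TABLE IF NOT EXISTS hbase_events (rowkey STRING, payload STRING) "
          + "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' "
          + "WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:payload') "
          + "TBLPROPERTIES ('hbase.table.name' = 'events')");
    }
  }
}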
12-13-2016
03:36 PM
@Junaid Rao No, this was not possible even when Windows was a supported platform. HDP support for Windows has now been deprecated; we strongly suggest using Linux platforms.
12-13-2016
03:32 PM
@oula.alshiekh@gmail.com alshiekh When you say that you are able to insert fewer than 30K records, does that mean with the same source and destination? The reason I ask is that your error points towards a permission/access issue.
12-13-2016
04:01 AM
@Huahua Wei What is your use case? What type of data? Hive ACID performance will likely be slower than Hive on top of HBase, specifically if you access data using the HBase row key. Before I recommend Hive/ORC vs. HBase, I'd like to understand your use case better. Here is what I say about HBase:
When to use HBase:
• Storing large amounts of data (TB/PB)
• High throughput for a large number of requests
• Storing unstructured or variable-column data
• Big Data with random reads and writes
• Well suited for sparse rows where the number of columns varies
• Highly available and scalable (since it runs on HDFS)
When NOT to use HBase:
• Only use it for Big Data problems; if you have data for only one or two nodes, HBase is likely not the tool you should be using to begin with.
• You read straight through files.
• You write all at once or append new files.
• You do not do random reads or writes.
• Access patterns of the data are ill-defined.