Member since: 06-07-2016
Posts: 923
Kudos Received: 322
Solutions: 115
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4114 | 10-18-2017 10:19 PM |
| | 4353 | 10-18-2017 09:51 PM |
| | 14884 | 09-21-2017 01:35 PM |
| | 1852 | 08-04-2017 02:00 PM |
| | 2429 | 07-31-2017 03:02 PM |
12-15-2016
05:28 PM
@Xiaojie Ma One thing just popped into my mind: do you have snapshots that may be pointing to your data? In that case, the data is not deleted by major compaction; it is moved to an archive folder instead. Check under /hbase to see whether you have a .archive folder.
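In case it helps, here is a minimal sketch (not part of the original reply, assuming the HBase 2.x Java client and a local ZooKeeper quorum) that lists any existing snapshots:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.SnapshotDescription;

public class ListHBaseSnapshots {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.zookeeper.quorum", "localhost"); // assumption: local ZooKeeper quorum
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Admin admin = connection.getAdmin()) {
      // Any snapshot listed here can keep HFiles alive under the archive folder.
      for (SnapshotDescription snapshot : admin.listSnapshots()) {
        System.out.println(snapshot.getName() + " -> " + snapshot.getTableName());
      }
    }
  }
}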
12-13-2016
08:18 PM
@Asma Dhaouadi
I am not sure what the issue is. If you are following the directions, then what you are doing is right; it may be that you are missing a user name. If it is still not working, then try adding the following to yarn-site.xml (127.0.0.1 may be replaced with localhost, assuming that is where the ResourceManager is running):
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>127.0.0.1</value>
</property>
Also, I am assuming you already have the following in your yarn-site.xml:
<property>
  <name>yarn.resourcemanager.address</name>
  <value>localhost:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>localhost:8030</value>
</property>
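As a quick connectivity check, here is a minimal sketch (not from the original reply, assuming the Hadoop YARN client library is on the classpath and the ResourceManager listens on localhost:8032) that confirms a client can reach the ResourceManager:

import java.util.List;
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class CheckResourceManager {
  public static void main(String[] args) throws Exception {
    YarnConfiguration conf = new YarnConfiguration();
    conf.set("yarn.resourcemanager.address", "localhost:8032"); // assumption: local ResourceManager
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(conf);
    yarnClient.start();
    // If this call succeeds, the client can talk to the ResourceManager.
    List<NodeReport> nodes = yarnClient.getNodeReports(NodeState.RUNNING);
    System.out.println("Running NodeManagers: " + nodes.size());
    yarnClient.stop();
  }
}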
12-13-2016
06:05 PM
You don't need to set yarn.resourcemanager.hostname explicitly, because it should simply be wherever your ResourceManager is running. Is it running on localhost? If not, use the IP address of whatever host it is running on.
12-13-2016
05:52 PM
@Asma Dhaouadi Ah! By default this should already be configured. Go to YARN in Cloudera Manager and click on the Configuration tab. On the left side of the Configuration page there is a link for "Ports and Addresses". When you click it, it will show you both of these settings:
yarn.resourcemanager.scheduler.address --> port 8030 by default
yarn.resourcemanager.address --> port 8032 by default
12-13-2016
05:40 PM
@Asma Dhaouadi
By default the ResourceManager UI runs on port 8088. See if the following takes you to the ResourceManager UI: http://<host where Resource manager is running>:8088 When YARN is running, Cloudera Manager will show you a "Web UI" tab. Under that tab there is a link for the ResourceManager Web UI; click it and it opens the ResourceManager, running by default at:
http://<host>:8088/cluster
12-13-2016
05:26 PM
1 Kudo
@Michael Young I am reasonably sure you are aware of the core Java API of Apache ORC. Is this link not enough? https://orc.apache.org/docs/core-java.html
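For reference, here is a minimal sketch of writing a file with that core Java API (along the lines of the linked documentation; the output path and schema are illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.orc.OrcFile;
import org.apache.orc.TypeDescription;
import org.apache.orc.Writer;

public class OrcCoreWriteExample {
  public static void main(String[] args) throws Exception {
    // Illustrative schema and output path.
    TypeDescription schema = TypeDescription.fromString("struct<id:bigint,name:string>");
    Writer writer = OrcFile.createWriter(new Path("/tmp/example.orc"),
        OrcFile.writerOptions(new Configuration()).setSchema(schema));
    VectorizedRowBatch batch = schema.createRowBatch();
    LongColumnVector id = (LongColumnVector) batch.cols[0];
    BytesColumnVector name = (BytesColumnVector) batch.cols[1];
    for (int r = 0; r < 10; r++) {
      int row = batch.size++;
      id.vector[row] = r;
      name.setVal(row, ("row-" + r).getBytes());
      if (batch.size == batch.getMaxSize()) {
        writer.addRowBatch(batch);
        batch.reset();
      }
    }
    if (batch.size != 0) {
      writer.addRowBatch(batch);
    }
    writer.close();
  }
}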
12-13-2016
03:54 PM
1 Kudo
@Huahua Wei HBaseStorageHandler is what is required to read HBase tables from Hive. At the end of the day, you first have to create and manage the table in HBase and then query it through Hive. Since you are going to be doing updates, this might be the best way to go about it, but I would strongly recommend looking at the following approach as well. The reason is probably my personal preference for not using HBase until it is required, as it is complex and the skill set required to implement it successfully is hard to find. That being said, for your use case, if you don't like the following approach, I'd prefer HBase over Hive ACID. http://hortonworks.com/blog/four-step-strategy-incremental-updates-hive/
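To make the HBaseStorageHandler route concrete, here is a minimal sketch (not from the original reply; the HiveServer2 location, credentials, and table/column names are assumptions) that maps an existing HBase table into Hive via the Hive JDBC driver:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HiveOverHBaseExample {
  public static void main(String[] args) throws Exception {
    // Assumption: HiveServer2 is reachable at localhost:10000 and the Hive JDBC
    // driver (org.apache.hive.jdbc.HiveDriver) is on the classpath.
    try (Connection conn = DriverManager.getConnection(
             "jdbc:hive2://localhost:10000/default", "hive", "");
         Statement stmt = conn.createStatement()) {
      // Hypothetical names: exposes the HBase table 'events' (column family 'cf')
      // as a Hive external table backed by the HBaseStorageHandler.
      stmt.execute(
          "CREATE EXTERNAL TABLE IF NOT EXISTS hbase_events (rowkey STRING, payload STRING) "
          + "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' "
          + "WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:payload') "
          + "TBLPROPERTIES ('hbase.table.name' = 'events')");
    }
  }
}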
12-13-2016
03:36 PM
@Junaid Rao No, this was not possible even when Windows was a supported platform. HDP support for Windows has now been deprecated; we strongly suggest using Linux platforms.
12-13-2016
03:32 PM
@oula.alshiekh@gmail.com alshiekh When you say that you are able to insert fewer than 30K records, does that mean with the same source and destination? The reason I ask is that your error points towards a permission/access issue.
12-13-2016
04:01 AM
@Huahua Wei What is your use case? What type of data? Hive ACID performance will likely be slower than Hive on top of HBase, specifically if you access data using the HBase row key. Before I recommend Hive/ORC vs. HBase, I'd like to understand your use case better. Here is what I say about HBase:
When to use HBase:
• Storing large amounts of data (TB/PB)
• High throughput for a large number of requests
• Storing unstructured or variable-column data
• Big Data with random reads and writes
• Well suited for sparse rows where the number of columns varies
• Highly available and scalable (since it runs on HDFS)
When NOT to use HBase:
• Only use it for Big Data problems; if you have data for only one or two nodes, HBase is likely not the tool you should be using to begin with.
• You read straight through files.
• You write all at once or append new files.
• You do not do random reads or writes.
• Access patterns of the data are ill-defined.