Member since 09-21-2015
31 Posts
59 Kudos Received
9 Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2615 | 06-01-2016 12:10 PM
 | 4844 | 03-08-2016 06:19 PM
 | 2401 | 01-19-2016 06:18 PM
 | 2026 | 12-15-2015 03:18 PM
 | 4587 | 12-03-2015 10:53 PM
12-15-2015
03:18 PM
1 Kudo
It appears you cannot resolve mirrorlist.centos.org via DNS from your virtual machine. Does the following return a result?
nslookup mirrorlist.centos.org
If not, I expect you have configured the VM with a Host-Only adapter, which will not allow the VM to access the internet.
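For anyone who would rather script the check than run nslookup by hand, here is a minimal sketch in Python. The function name `can_resolve` and its `resolver` parameter are made up for illustration; only `socket.getaddrinfo` is a real standard-library call.

```python
import socket


def can_resolve(hostname, resolver=socket.getaddrinfo):
    """Return True if the hostname resolves to at least one address.

    `resolver` is a hypothetical hook so the lookup can be swapped out;
    by default the system resolver is used via socket.getaddrinfo.
    """
    try:
        return len(resolver(hostname, None)) > 0
    except socket.gaierror:
        # Raised when the name cannot be resolved (for example, no DNS access).
        return False
```

Calling `can_resolve("mirrorlist.centos.org")` from inside the VM should return False in the same situation where nslookup returns no result.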
12-12-2015
07:48 AM
2 Kudos
Here is the mini cluster project: hadoop-mini-clusters
Here is Dhruv's testing project: iot-integration-tester
12-03-2015
11:02 PM
4 Kudos
FWIW, XFS is the default in RHEL 7, so I expect an uptick in new clusters.
12-03-2015
10:53 PM
3 Kudos
Hello Mike,
Check that /tmp is not mounted with the noexec flag on that node:
sudo mount | grep /tmp
If so, remounting without that option should fix this. If removing noexec isn't an option, you can control the directory Java uses for temporary storage through the java.io.tmpdir system property. Give the following a try, replacing the directory with your home directory or another filesystem without the noexec flag:
hbase -Djava.io.tmpdir=/some/other/writable/directory shell
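If you want to automate the noexec check, the sketch below parses mount-table text in the /proc/mounts layout. The function names `mount_options` and `is_noexec` are hypothetical helpers, and the sketch assumes a Linux host where /proc/mounts is readable.

```python
def mount_options(mounts_text, mount_point):
    """Return the option list for mount_point, or None if it is not a mount.

    Expects /proc/mounts-style lines:
    device mount_point fstype options dump pass
    """
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            # A later entry for the same mount point overrides an earlier one.
            options = fields[3].split(",")
    return options


def is_noexec(mounts_text, mount_point="/tmp"):
    """Return True if mount_point is mounted with the noexec option."""
    options = mount_options(mounts_text, mount_point)
    return options is not None and "noexec" in options
```

On the affected node this could be called as `is_noexec(open("/proc/mounts").read())`. Note that it only looks at an entry for /tmp itself; if /tmp is not a separate mount, it returns False even when the parent filesystem carries noexec.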
12-03-2015
08:52 PM
4 Kudos
DefaultResourceCalculator only takes memory into account. Here is a brief explanation of what you are seeing (relevant part bolded).
Pluggable resource-vector in YARN scheduler
The CapacityScheduler has the concept of a ResourceCalculator – a pluggable layer that is used for carrying out the math of allocations by looking at all the identified resources. This includes utilities to help make the following decisions:
- Does this node have enough resources of each resource-type to satisfy this request?
- How many containers can I fit on this node, sorting a list of nodes with varying resources available.
There are two kinds of calculators currently available in YARN – the DefaultResourceCalculator and the DominantResourceCalculator. The DefaultResourceCalculator only takes memory into account when doing its calculations. This is why CPU requirements are ignored when carrying out allocations in the CapacityScheduler by default. All the math of allocations is reduced to just examining the memory required by resource-requests and the memory available on the node that is being looked at during a specific scheduling-cycle.
You can find more on this topic on our blog: managing-cpu-resources-in-your-hadoop-yarn-clusters
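To make the difference concrete, here is a simplified Python illustration of the "how many containers fit on this node" question under each approach. This is not YARN's code; the `Resource` class, the function names, and the node and request sizes are all made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    memory_mb: int
    vcores: int


def containers_memory_only(available, required):
    """Memory-only math, in the spirit of the DefaultResourceCalculator."""
    return available.memory_mb // required.memory_mb


def containers_memory_and_cpu(available, required):
    """Memory and CPU both constrain the result; the scarcer one wins."""
    return min(
        available.memory_mb // required.memory_mb,
        available.vcores // required.vcores,
    )


# Hypothetical node with 64 GB of memory and 8 vcores.
node = Resource(memory_mb=65536, vcores=8)
# Hypothetical request for 2 GB and 1 vcore per container.
request = Resource(memory_mb=2048, vcores=1)

print(containers_memory_only(node, request))     # 32
print(containers_memory_and_cpu(node, request))  # 8
```

With memory-only math the node appears to fit 32 containers even though it has only 8 vcores, which matches the behavior described above where CPU requirements are ignored by default.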
11-09-2015
04:01 PM
1 Kudo
I don't necessarily agree with this answer. We could avoid needing to change ownership by leveraging proxy users. I hope to find time to write a patch to demonstrate this. I'd also be interested in how many clusters are actually Kerberos-enabled. I expect it's lower than you think. Data ownership does matter and provides at least rudimentary controls when the user does not or cannot enable Kerberos.
11-05-2015
02:20 PM
When writing data to HDFS in the PutHDFS NiFi Processor, the data is owned by "anonymous". I'm trying to find a good way to control the ownership of data landed via this processor. I looked into Remote Owner and Remote Group; however, those require that the NiFi server is running as the "hdfs" user. This seems like a bad idea to me. I'm curious why this processor doesn't leverage Hadoop Proxy Users, versus enforcing that the NiFi server runs as hdfs? Any other workarounds? My initial thought was to stage the data in HDFS with NiFi and use Falcon to move it to its final location; however, this seems like overkill for users that simply want to ingest the data into its final location. Am I missing something obvious here?
11-03-2015
11:56 PM
1 Kudo
Demo article has been added here: creating-hbase-hfiles-from-an-existing-hive-table
11-03-2015
11:53 PM
10 Kudos
Hive HBase Generate HFiles
Demo scripts available at: https://github.com/sakserv/hive-hbase-generatehfiles
Below is an example of leveraging the Hive HBaseStorageHandler for HFile generation. This pattern provides a means of taking data already stored in Hive, exporting it as HFiles, and bulk loading the HBase table from those HFiles.
Overview
The HFile generation feature was added in HIVE-6473. It adds the following properties that are then leveraged by the Hive HBaseStorageHandler.
- hive.hbase.generatehfiles - true to generate HFiles
- hfile.family.path - path in HDFS to put the HFiles
Note that for hfile.family.path, the final subdirectory MUST MATCH the column family name.
The scripts in the repo called out above can be used with the Hortonworks Sandbox to test and demo this feature.
Example
The following is an example of how to use this feature. The scripts in the repo above implement the steps below. It is assumed that the user already has data stored in a Hive table. For the sake of this example, the following table was used:
CREATE EXTERNAL TABLE passwd_orc(userid STRING, uid INT, shell STRING)
STORED AS ORC
LOCATION '/tmp/passwd_orc';
First, decide on the HBase table and column family name. We want to use a single column family. For the example below, the HBase table name is "passwd_hbase" and the column family name is "passwd".
Below is the DDL for the HBase table created through Hive. A couple of notes:
- userid is used as the row key.
- :key is special syntax in the hbase.columns.mapping.
- Each column (qualifier) is in the form column family:column (qualifier).
CREATE TABLE passwd_hbase(userid STRING, uid INT, shell STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,passwd:uid,passwd:shell');
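As a quick way to see how the mapping string in the DDL above breaks down, here is a small Python sketch that splits it into the row key position and (column family, qualifier) pairs. The function names are hypothetical, and the parser only handles the two forms used in this example (`:key` and `family:qualifier`); it ignores any other forms the storage handler may accept.

```python
def parse_columns_mapping(mapping):
    """Split an hbase.columns.mapping value into its parts.

    Returns (row_key_index, columns), where row_key_index is the position
    of the :key entry and columns is a list of (family, qualifier) pairs.
    """
    row_key_index = None
    columns = []
    for index, entry in enumerate(mapping.split(",")):
        entry = entry.strip()
        if entry == ":key":
            row_key_index = index
        else:
            family, _, qualifier = entry.partition(":")
            columns.append((family, qualifier))
    return row_key_index, columns


def column_families(mapping):
    """Return the distinct column family names used in the mapping."""
    _, columns = parse_columns_mapping(mapping)
    return sorted({family for family, _ in columns})


mapping = ":key,passwd:uid,passwd:shell"
print(parse_columns_mapping(mapping))  # (0, [('passwd', 'uid'), ('passwd', 'shell')])
print(column_families(mapping))        # ['passwd']
```

The single entry returned by `column_families` confirms the mapping uses one column family, which is what this pattern expects.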
Next, generate the HFiles for the table. A couple of notes again:
- The hfile.family.path is where the HFiles will be generated.
- The final subdirectory name MUST match the column family name.
SET hive.hbase.generatehfiles=true;
SET hfile.family.path=/tmp/passwd_hfiles/passwd;
INSERT OVERWRITE TABLE passwd_hbase SELECT DISTINCT userid,uid,shell FROM passwd_orc CLUSTER BY userid;
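Since a mismatch between hfile.family.path and the column family name is an easy mistake to make, a small pre-flight check can catch it before running the INSERT. The sketch below is a hypothetical helper in Python, not part of Hive or the demo repo; it only compares strings and does not touch HDFS.

```python
import posixpath


def family_path_matches(hfile_family_path, column_family):
    """Return True if the last path component equals the column family name."""
    # HDFS paths use forward slashes, so posixpath is used regardless of OS.
    last_component = posixpath.basename(hfile_family_path.rstrip("/"))
    return last_component == column_family


print(family_path_matches("/tmp/passwd_hfiles/passwd", "passwd"))  # True
print(family_path_matches("/tmp/passwd_hfiles", "passwd"))         # False
```

The first call uses the values from the example above; the second shows the case where the column family subdirectory was left off.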
Finally, load the HFiles into the HBase table:
export HADOOP_CLASSPATH=`hbase classpath`
yarn jar /usr/hdp/current/hbase-client/lib/hbase-server.jar completebulkload /tmp/passwd_hfiles passwd_hbase
The data can now be queried from Hive or HBase.
11-03-2015
11:48 PM
This shows promise as well. I plan to give this a try soon. However, the accepted answer avoids needing to go from ORC back to CSV, so it gets the win. 🙂