Member since
10-01-2015
3933
Posts
1150
Kudos Received
374
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3365 | 05-03-2017 05:13 PM |
| | 2796 | 05-02-2017 08:38 AM |
| | 3074 | 05-02-2017 08:13 AM |
| | 3006 | 04-10-2017 10:51 PM |
| | 1516 | 03-28-2017 02:27 AM |
02-08-2017
12:34 PM
You would also use `-include -f` with more than one host, not just that single datanode; I thought I was clear on that.
02-08-2017
12:14 PM
It is possible. I've seen people Sqoop into a managed table and then alter the table to be external; Sqoop continues to work: http://stackoverflow.com/questions/27991258/how-to-create-external-table-in-hive-using-sqoop-need-suggestions#29602510 Another option is to manually create the table in Hive as external and point Sqoop at that table; see the section on incremental_table: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_dataintegration/content/incrementally-updating-hive-table-with-sqoop-and-ext-table.html For Sqoop to create an external table outright, you might want to open an enhancement JIRA.
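As a sketch of the first approach (the table name and location below are placeholders, not from the original post):

```sql
-- Sqoop the data into a managed table first, then flip it to external.
-- In Hive, a managed table becomes external by setting this table property:
ALTER TABLE imported_table SET TBLPROPERTIES ('EXTERNAL'='TRUE');

-- Optionally repoint it at a specific HDFS location afterwards:
ALTER TABLE imported_table SET LOCATION 'hdfs:///data/imported_table';
```

Subsequent incremental Sqoop imports keep writing to the same table; only Hive's drop semantics change (the data is no longer deleted on DROP TABLE).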
02-08-2017
12:00 PM
Try adding the rest of the HDFS ports to port forwarding: https://ambari.apache.org/1.2.3/installing-hadoop-using-ambari/content/reference_chap2_1.html If you are on the HDP 2.5 sandbox, follow this tutorial to set up the port forwarding: https://community.hortonworks.com/articles/65914/how-to-add-ports-to-the-hdp-25-virtualbox-sandbox.html
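For a VirtualBox sandbox, a forwarding rule can also be added from the host with `VBoxManage`; a minimal sketch, assuming the NameNode web UI on 50070 and a VM named "Hortonworks Sandbox" (check yours with `VBoxManage list vms`):

```shell
# Forward host port 50070 to guest port 50070 on the (powered-off) sandbox VM
VBoxManage modifyvm "Hortonworks Sandbox" --natpf1 "namenode-ui,tcp,,50070,,50070"
```

Repeat the rule with a different name and port pair for each additional HDFS port you need.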
02-08-2017
11:52 AM
In the Azure portal, you should be able to open endpoints for those ports. I'm not sure whether the same option is available in HDInsight; if not, just open a case with them and ask them to open the ZooKeeper and HBase ports for you.
02-08-2017
11:46 AM
Here you go https://github.com/apache/ambari/blob/trunk/ambari-server/docs/api/v1/index.md
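The API in that reference can be exercised with a quick curl call; the host, port, and credentials below are assumptions for a default Ambari install:

```shell
# List clusters managed by this Ambari server (default admin/admin credentials)
curl -u admin:admin -H 'X-Requested-By: ambari' \
  http://ambari-host:8080/api/v1/clusters
```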
02-08-2017
11:41 AM
Check that the HBase Master is up. Is there a standby master available? If not, add one. Then make sure all ports for HBase are open; if you are on a cloud cluster, open the endpoints and use public IPs. The doc below is from an old HDP release, and I'm looking for the equivalent for HDP 2.x; just keep in mind that the 60000-range HBase ports switched to the 16000 range. The ZooKeeper ports are unchanged. http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.2.0/bk_reference/content/reference_chap2_4.html
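A quick reachability check from a client machine; the hostnames here are placeholders:

```shell
# HBase Master RPC port (16000 on HDP 2.x; was 60000 on older releases)
nc -zv hbase-master.example.com 16000

# ZooKeeper client port; a healthy server answers "imok"
echo ruok | nc zk1.example.com 2181
```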
02-08-2017
11:36 AM
1 Kudo
You can deploy HBase as it ships with HDP and then push the coprocessor jar to all region servers with pdsh, Ansible, or anything else. Editing hbase-site.xml is handled via Ambari configs. You don't need to rebuild HBase from source.
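A sketch of the jar push with the pdsh toolkit; hostnames and paths are hypothetical:

```shell
# pdcp ships with pdsh; copy the coprocessor jar to every region server's lib dir
pdcp -w 'rs[01-10].example.com' my-coprocessor.jar /usr/hdp/current/hbase-regionserver/lib/

# then restart the region servers from Ambari so the jar is picked up
```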
02-08-2017
03:13 AM
Are you asking about balancing HDFS so that data is evenly distributed across all nodes? In that case you need to run the HDFS balancer, and it will spread that node's data across the cluster. You can use `-include -f hostsfile` to hint which nodes to run the balancer against, but the idea is that you need more than one node there; it is meant to work across multiple datanodes, not a single datanode, otherwise it defeats the point of balancing out HDFS. https://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#balancer If your question is how to balance data across all disks on one node, you can use the disk balancer; sadly, it's a new feature in Hadoop 3. https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSDiskbalancer.html
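A hedged example of the balancer invocation; the threshold and hosts file path are illustrative:

```shell
# Balance until every included datanode is within 10% of average utilization;
# the hosts file should name more than one datanode
hdfs balancer -threshold 10 -include -f /tmp/balance-hosts.txt
```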
02-08-2017
02:52 AM
Interesting question, I should try it out. That said, if you can't figure it out on your own or don't get other responses, try creating a Hive schema on top of your Avro files, then run a statement like the following in Hive to create a new table in ORC or text format from your original Avro table: `create table newtable as select * from avrotable` After that you should be able to find examples of exporting from ORC or text with Sqoop.
--sample avro backed hive table
CREATE DATABASE DEMO;
USE DEMO;
CREATE EXTERNAL TABLE mailinglist
COMMENT "just drop the schema right into the HQL"
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/tmp/rdbms'
TBLPROPERTIES ('avro.schema.literal'='{
  "type": "record",
  "name": "mailinglist",
  "namespace": "any.data",
  "fields": [
    { "name": "id",         "type": ["null", "int"] },
    { "name": "first_name", "type": ["null", "string"] },
    { "name": "last_name",  "type": ["null", "string"] },
    { "name": "email",      "type": ["null", "string"] },
    { "name": "gender",     "type": ["null", "string"] }
  ]
}');
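Building on that, a hedged sketch of the conversion step; the ORC table name is an example carried over from the snippet above:

```sql
-- Create an ORC-backed copy of the Avro table so Sqoop can export it
CREATE TABLE mailinglist_orc STORED AS ORC AS SELECT * FROM mailinglist;
```

From there, `sqoop export --connect <jdbc-url> --table <rdbms-table> --hcatalog-table mailinglist_orc` is the usual HCatalog-based route for ORC data (the JDBC URL and target table are placeholders).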
02-08-2017
02:36 AM
Why would you want to do that? Try HDC to spin up an HDP cluster with ease: http://hortonworks.com/products/cloud/aws/ If you need more than just ETL, data science, machine learning, or BI, then try Cloudbreak: http://hortonworks.com/apache/cloudbreak/