Member since: 12-16-2015
Posts: 17
Kudos Received: 10
Solutions: 2
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 6314 | 06-13-2016 12:35 PM |
 | 4998 | 02-03-2016 04:17 PM |
01-27-2017
08:10 PM
1 Kudo
After pivoting, you need to run an aggregate function (e.g. sum) to get back a DataFrame/Dataset; only after the aggregation can you show() the data. You can find an excellent overview of pivoting in this Databricks blog post: https://databricks.com/blog/2016/02/09/reshaping-data-with-pivot-in-apache-spark.html
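To make that concrete, here is a minimal sketch of the groupBy/pivot/aggregate pattern in Spark Scala (e.g. pasted into spark-shell on Spark 2.x). The sales DataFrame, its column names (year, product, amount), and the values are invented for illustration; they are not from the original question.

// Minimal pivot sketch (Spark 2.x, Scala); the `sales` data is made up for illustration.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

val spark = SparkSession.builder().appName("PivotExample").getOrCreate()
import spark.implicits._

// Toy input: one row per (year, product, amount)
val sales = Seq(
  (2015, "HDP", 100.0),
  (2015, "HDF", 50.0),
  (2016, "HDP", 120.0),
  (2016, "HDF", 80.0)
).toDF("year", "product", "amount")

// pivot() on its own returns a RelationalGroupedDataset, not a DataFrame;
// it is the aggregate (sum here) that gives you back a DataFrame you can show().
val pivoted = sales
  .groupBy("year")      // one output row per year
  .pivot("product")     // distinct product values become columns
  .agg(sum("amount"))   // aggregation step required before show()

pivoted.show()

If you already know the pivot values, passing them explicitly, e.g. pivot("product", Seq("HDP", "HDF")), saves Spark an extra pass over the data to discover the distinct values.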
06-13-2016
12:35 PM
2 Kudos
Hello, and thanks everyone. As it turned out, some Ambari features were in maintenance mode, which meant there really was a discrepancy between the discoverable folder structures. Turning off maintenance mode and rebooting did the trick! Thanks
Aidan
03-08-2016
02:30 AM
Hey guys. The tutorial mentioned above has been updated and is also compatible with the latest Sandbox, HDP 2.4. It addresses the issue of permissions. Here is the link: http://hortonworks.com/hadoop-tutorial/how-to-process-data-with-apache-hive/ When you get a chance, can you go through the tutorial on our new Sandbox?
12-17-2015
10:17 AM
Hi @Aidan Condron, if you're not bulk loading, you can upload to HBase through Hive. Head to Hive through Ambari. You can upload your .csv file to HDFS; I use the /tmp folder. Then run the following in Hive:

-- Stage the raw CSV lines in a single-column table
CREATE TABLE MyTable (col_value STRING);
LOAD DATA INPATH '/tmp/MyData.csv' OVERWRITE INTO TABLE MyTable;

-- Split each line into columns with regexp_extract
CREATE TABLE MyHiveTable (FirstName STRING, LastName STRING);
INSERT OVERWRITE TABLE MyHiveTable
SELECT
  regexp_extract(col_value, '^(?:([^,]*),?){1}', 1) AS FirstName,
  regexp_extract(col_value, '^(?:([^,]*),?){2}', 1) AS LastName
FROM MyTable;

-- HBase-backed Hive table: FirstName becomes the HBase row key, LastName maps to column f:c1
CREATE TABLE MyHBaseTable (firstname STRING, lastname STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,f:c1')
TBLPROPERTIES ('hbase.table.name' = 'MyNamesTable');

-- Copy the parsed rows into the HBase-backed table
FROM MyHiveTable
INSERT INTO TABLE MyHBaseTable
SELECT MyHiveTable.*;

It's not a fast method, but the regex and intermediary stages are useful if you need additional control over your data before it goes into HBase.