Member since: 12-16-2015
Posts: 17
Kudos Received: 10
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 6131 | 06-13-2016 12:35 PM
 | 4832 | 02-03-2016 04:17 PM
01-27-2017 08:10 PM (1 Kudo)
After pivoting you need to apply an aggregate function (e.g. sum) to get back a DataFrame/Dataset; only after that aggregation can you show() the data. You can find an excellent overview of pivoting in this Databricks blog post: https://databricks.com/blog/2016/02/09/reshaping-data-with-pivot-in-apache-spark.html
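For example, here is a minimal Scala sketch of that pattern (the DataFrame sales and the columns product, year and amount are hypothetical, just to illustrate groupBy/pivot/agg):

import org.apache.spark.sql.functions.sum
// groupBy() picks the rows of the result, pivot() turns the distinct
// values of "year" into columns, and agg() supplies the required aggregate
val pivoted = sales
  .groupBy("product")
  .pivot("year")
  .agg(sum("amount"))
// show() works here because agg() returned a regular DataFrame
pivoted.show()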
06-13-2016 12:35 PM (2 Kudos)
Hello, thanks everyone. As it turned out, some Ambari features were in maintenance mode, which meant there really was a discrepancy between the discoverable folder structures. Turning off maintenance mode and rebooting did the trick! Thanks, Aidan
03-08-2016 02:30 AM
Hey guys. The tutorial mentioned above has been updated and is now compatible with the latest Sandbox, HDP 2.4. It addresses the permissions issue. Here is the link: http://hortonworks.com/hadoop-tutorial/how-to-process-data-with-apache-hive/ When you get a chance, can you go through the tutorial on our new Sandbox?
12-17-2015 10:17 AM
Hi @Aidan Condron, if you're not bulk loading, you can load data into HBase through Hive. Head to Hive through Ambari. Upload your .csv file to HDFS first (I use the /tmp folder), then run the following in Hive:

-- Staging table that holds each CSV line as a single string
CREATE TABLE MyTable (col_value STRING);
LOAD DATA INPATH '/tmp/MyData.csv' OVERWRITE INTO TABLE MyTable;

-- Split each line into typed columns with regexp_extract
CREATE TABLE MyHiveTable (FirstName STRING, LastName STRING);
INSERT OVERWRITE TABLE MyHiveTable
SELECT
  regexp_extract(col_value, '^(?:([^,]*)\,?){1}', 1) FirstName,
  regexp_extract(col_value, '^(?:([^,]*)\,?){2}', 1) LastName
FROM MyTable;

-- Hive table backed by HBase: FirstName becomes the row key, LastName maps to f:c1
CREATE TABLE MyHBaseTable (firstname STRING, lastname STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,f:c1')
TBLPROPERTIES ('hbase.table.name' = 'MyNamesTable');

-- Copy the parsed rows into the HBase-backed table
FROM MyHiveTable INSERT INTO TABLE MyHBaseTable
SELECT MyHiveTable.*;

It's not a fast method, but the regex and intermediary stages are useful if you need additional control over your data before it goes into HBase.