Member since: 06-18-2018
Posts: 34
Kudos Received: 13
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
| 86588 | 02-02-2016 03:08 PM
| 1948 | 01-13-2016 09:52 AM
02-02-2016
03:08 PM
2 Kudos
The solution is to dynamically create a table from the Avro schema, and then create a new Parquet-format table from the Avro one. Here is the Hive source code, in case it helps you:
CREATE TABLE avro_test ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS AVRO TBLPROPERTIES ('avro.schema.url'='myHost/myAvroSchema.avsc');
CREATE EXTERNAL TABLE parquet_test LIKE avro_test STORED AS PARQUET LOCATION 'hdfs://myParquetFilesPath';
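If you also need to copy existing Avro data into the Parquet table (rather than pointing the external table at Parquet files that already exist), a minimal sketch, assuming the two tables above, would be:
INSERT OVERWRITE TABLE parquet_test SELECT * FROM avro_test; -- Hive rewrites the selected rows as Parquet files under the table's location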
02-02-2016
03:04 PM
Actually, there is no answer to my question yet; I'll publish the answer soon and accept it.
01-29-2016
10:13 AM
2 Kudos
Let's assume that my HDFS block size is 256 MB and that I need to store 20 GB of data in ORC/Parquet file(s). Is it better to store all the data in one ORC/Parquet file, or to store it in many ORC/Parquet files of 256 MB (the HDFS block size)?
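For reference, a minimal sketch of the Hive session settings one could use to steer output file sizes toward the HDFS block size when writing such a table (the table names are placeholders and the values are assumptions illustrating a 256 MB target, not tested recommendations):
SET hive.merge.mapfiles=true; -- merge small files produced by map-only jobs
SET hive.merge.mapredfiles=true; -- merge small files produced by map-reduce jobs
SET hive.merge.size.per.task=268435456; -- target size of merged output files, roughly 256 MB
SET hive.merge.smallfiles.avgsize=134217728; -- merge when the average output file is smaller than this
INSERT OVERWRITE TABLE orc_target SELECT * FROM staging_source; -- orc_target and staging_source are hypothetical names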
Labels:
- Apache Hadoop
01-27-2016
05:38 PM
Hello back! Sorry for the 6-day delay in my answer. Also, I couldn't find how Ozone stores data on HDFS, so I could see how it handles small files. Do you have any idea? Thanks a lot 🙂
01-20-2016
10:45 PM
Thanks a lot for your answer once again 🙂 1 - What do you mean by source to destination? Is it some kind of ETL on raw data to put into a DW? 2.1 - Is there an MPP database recommended by Hortonworks? 2.2 - If not, what other alternatives exist? Thanks 😉
01-20-2016
10:24 PM
No worries about that; I'm more talking about read performance. I know that HBase performs well at range scans, but is that still true with huge amounts of data, when it runs into operational issues like compaction, node rebuilds, and load distribution?
01-20-2016
06:12 PM
Hello,
In short: can I use HBase over HDFS as a data lake?
In detail: since Hadoop has been designed to store massive amounts of data (as big files), I was wondering whether, given my use case (storing lots of small files),
HBase would be more suitable.
Of course, data in HBase is stored in HDFS, but what about metadata, and what happens when HBase runs into operational issues like compaction, node rebuilds, and load distribution? Thanks in advance.
Labels:
- Apache Hadoop
- Apache HBase
01-20-2016
09:44 AM
First of all, thanks for your answer. The duplication wasn't about the date but more about the data in Parquet and HBase; besides, using Hive over HBase is not really as good as having a columnar format... Have a nice day 🙂
01-19-2016
09:28 PM
Hello, I have some questions related to real-time analytics on Hadoop; here are my use case and questions. I'm trying to use some BI solutions (like Tableau) to do real-time analytics on Hadoop. 1 - What are the most commonly used architectures to achieve my goal? 2 - Does it make sense to use an MPP database as a data mart (loading data according to the business fields from Hadoop into the MPP)? 3 - Can a NoSQL database like Cassandra replace an MPP database? If yes, is it better?
Labels:
- Apache Hadoop
01-19-2016
03:18 PM
Yes, thanks ^^. In my case, I'm using HBase because I'm handling a large number of small files.