Member since: 06-17-2015
Posts: 61
Kudos Received: 20
Solutions: 4
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1919 | 01-21-2017 06:18 PM |
| | 2312 | 08-19-2016 06:24 AM |
| | 1693 | 06-09-2016 03:23 AM |
| | 2769 | 05-27-2016 08:27 AM |
02-02-2017
02:37 AM
Yes, similar to this: https://community.hortonworks.com/questions/79103/what-is-the-best-way-to-store-small-files-in-hadoo.html#comment-80387
01-29-2017
06:26 PM
@mqureshi Thanks a lot for your help and for the guidance 🙂. Thanks for explaining in detail.
01-29-2017
03:46 PM
Thanks so much for answering; I think I am closer to the answer. Please elaborate on the solutions below that you advised; I am not very familiar with Hive.

Put data into Hive. There are ways to put XML data into Hive: at a quick-and-dirty level you have the xpath UDFs to work on XML data in Hive, or you can package it more cleanly by converting the XML to Avro and then using a SerDe to map the fields to column names. (Let me know if you want to go over this in more detail and I can help you there.)

Combine a bunch of files, zip them up, and upload to HDFS. This option is good if your access is very cold (once in a while) and you are going to access the files physically (like hadoop fs -get).

FYI below: Please advise on how to store and read archived data from Hive. While storing data in Hive, should I save it as a .har in HDFS? Our application generates small XML files which are stored on NAS, with the associated XML metadata in a DB. The plan is to extract the metadata from the DB into one file and compress the XML into one huge archive, say 10 GB. Each archive is 10 GB and the data is 3 months old, so I wanted to know the best solution for storing and accessing this archived data in Hadoop (HDFS/Hive/HBase). Please advise which approach you think is better for reading this archived data. Suppose I store this archived data in Hive; how do I retrieve it? Please guide me on storing archived data in Hive, and also on retrieving/reading it from Hive when needed.
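To make the Hive route concrete, here is a minimal sketch of the xpath UDF approach, assuming each XML document has been flattened to a single line in a text file on HDFS. The table name (xml_raw), the HDFS location, and the /record/... XPath expressions are hypothetical placeholders, not anything from this thread.

    # Minimal sketch of the xpath UDF approach (assumes one XML document per line).
    # Table name, HDFS location and XPath expressions are hypothetical.
    hive -e "
    CREATE EXTERNAL TABLE IF NOT EXISTS xml_raw (doc STRING)
    LOCATION '/data/archive/xml_lines';

    SELECT xpath_string(doc, '/record/id')      AS record_id,
           xpath_string(doc, '/record/created') AS created
    FROM xml_raw
    LIMIT 10;
    "

The Avro/SerDe option mentioned above would replace the single string column with typed columns, at the cost of an upfront XML-to-Avro conversion step.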
01-29-2017
03:42 PM
Hi, thanks for your reply; I appreciate your inputs. Please advise on how to store and read archived data from Hive. While storing data in Hive, should I save it as a .har in HDFS?

Our application generates small XML files which are stored on NAS, with the associated XML metadata in a DB. The plan is to extract the metadata from the DB into one file and compress the XML into one huge archive, say 10 GB. Each archive is 10 GB and the data is 3 months old, so I wanted to know the best solution for storing and accessing this archived data in Hadoop (HDFS/Hive/HBase).

Please advise which approach you think is better for reading this archived data. Suppose I store this archived data in Hive; how do I retrieve it? Please guide me on storing archived data in Hive, and also on retrieving/reading it from Hive when needed.
01-29-2017
03:42 PM
@hduraiswamy I appreciate your inputs. Please advise on how to store and read archived data from Hive. While storing data in Hive, should I save it as a .har in HDFS?

Our application generates small XML files which are stored on NAS, with the associated XML metadata in a DB. The plan is to extract the metadata from the DB into one file and compress the XML into one huge archive, say 10 GB. Each archive is 10 GB and the data is 3 months old, so I wanted to know the best solution for storing and accessing this archived data in Hadoop (HDFS/Hive/HBase).

Please advise which approach you think is better for reading this archived data. Suppose I store this archived data in Hive; how do I retrieve it? Please guide me on storing archived data in Hive, and also on retrieving/reading it from Hive when needed.
01-29-2017
03:40 PM
@mqureshi Hi, our application generates small XML files which are stored on NAS, with the associated XML metadata in a DB. The plan is to extract the metadata from the DB into one file and compress the XML into one huge archive, say 10 GB. Each archive is 10 GB and the data is 3 months old, so I wanted to know the best solution for storing and accessing this archived data in Hadoop (HDFS/Hive/HBase).

Please advise which approach you think is better for reading this archived data. Suppose I store this archived data in Hive; how do I retrieve it? Please guide me on storing archived data in Hive, and also on retrieving/reading it from Hive when needed.
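One hedged way to keep the ".har in Hive" question inside Hive itself is Hive's partition archiving feature, which packs a partition's files into a HAR behind the scenes while keeping the partition queryable. The table name (xml_archive), partition column (load_month) and value below are hypothetical, and availability depends on the Hive version in use.

    # Hedged sketch of Hive partition archiving (packs the partition's files into a HAR).
    # Table name and partition column are hypothetical examples.
    hive -e "
    SET hive.archive.enabled=true;
    SET hive.archive.har.parentdir.settable=true;

    -- pack a 3-month-old partition; it stays queryable, just with slower reads
    ALTER TABLE xml_archive ARCHIVE PARTITION (load_month='2016-10');

    -- unpack it again if the data ever needs to be rewritten
    ALTER TABLE xml_archive UNARCHIVE PARTITION (load_month='2016-10');
    "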
01-22-2017
06:55 PM
Hi Team, consider a Hadoop cluster with a default block size of 64 MB. We have a case where we would like to use Hadoop for storing historical data and retrieving it as needed. The historical data would be in the form of archives containing many small files (millions), which is why we would like to reduce the default block size in Hadoop to 32 MB.

I also understand that changing the default size to 32 MB may have an adverse effect if we plan to use that cluster for applications which store very large files, so can anyone advise what to do in such situations?
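One hedged middle ground, rather than lowering the cluster-wide default, is to override dfs.blocksize only for the writes that need it, since the block size is chosen by the client at file-creation time. The paths and file names below are placeholders.

    # Hedged sketch: 32 MB blocks for this upload only; other applications keep the cluster default.
    # Paths and file names are placeholders.
    hdfs dfs -D dfs.blocksize=33554432 -put /nas/archive/batch-2016-10.tar.gz /data/archive/
    hdfs dfs -stat "name=%n blocksize=%o" /data/archive/batch-2016-10.tar.gz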
Labels:
- Apache Hadoop
01-22-2017
06:23 PM
We have a situation where we have lots of small XML files residing on a Unix NAS, with their associated metadata in an Oracle DB.

We want to combine the 3-month-old XML files and their associated metadata into one archive file (10 GB) and store it in Hadoop. What is the best way to implement this in Hadoop? Note that after creating one big archive, we will still have many small files (each maybe 1 MB or less) inside the archive, so I would reduce the block size to 32 MB, for example.

I have read about Hadoop archive (.har) files and about storing data in HBase; I would like to know the pros and cons from the Hadoop community's experience, and the recommended practice for such situations. Can you also please advise on reducing the HDFS block size to 32 MB to cater to this requirement; how does that look? I want to read this data from Hadoop whenever needed without affecting performance. Thanks in advance.
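For the .har option, here is a hedged sketch of building a Hadoop Archive from a directory of small XML files and reading it back through the har:// scheme. The directory layout and file names are placeholders.

    # Hedged sketch: pack one month of small XML files into a single .har, then read through har://.
    # All paths and file names are placeholders.
    hdfs dfs -put /nas/xml/2016-10 /staging/xml/2016-10
    hadoop archive -archiveName xml-2016-10.har -p /staging/xml 2016-10 /data/archive
    hdfs dfs -ls  har:///data/archive/xml-2016-10.har/2016-10
    hdfs dfs -cat har:///data/archive/xml-2016-10.har/2016-10/some-file.xml

A HAR cuts the NameNode file count, but the archive itself is immutable, so it suits the cold, read-mostly access pattern described here.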
Labels:
- Apache Hadoop
01-21-2017
06:18 PM
Thanks for confirming. So what I wrote is correct, i.e. changing dfs.blocksize; a restart will happen anyway.
01-19-2017
07:12 AM
Can you please advise whether we need to change any other parameter apart from dfs.blocksize in the HDFS config? Any other suggestions are also welcome, as long as they help in using block space efficiently and not wasting it. Also, do we need any change on the DataNode side?
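As a hedged check alongside any dfs.blocksize change: the block size is applied by the HDFS client at write time, so it is worth verifying what value is actually in effect and remembering that files already written keep their original block size. The path below is a placeholder.

    # Hedged sketch: show the effective default block size and inspect existing files' blocks.
    hdfs getconf -confKey dfs.blocksize
    hdfs fsck /data/archive -files -blocks | head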
Labels:
- Apache Hadoop