Member since: 06-17-2015
Posts: 61
Kudos Received: 20
Solutions: 4
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1117 | 01-21-2017 06:18 PM
 | 1285 | 08-19-2016 06:24 AM
 | 1155 | 06-09-2016 03:23 AM
 | 1340 | 05-27-2016 08:27 AM
02-02-2017
02:37 AM
Yes, similar to this: https://community.hortonworks.com/questions/79103/what-is-the-best-way-to-store-small-files-in-hadoo.html#comment-80387
... View more
01-29-2017
06:26 PM
@mqureshi Thanks a lot for your help and guidance 🙂. Thanks for explaining everything in detail.
... View more
01-29-2017
03:46 PM
Thanks so much for answering; I think I am getting closer to an answer. Please elaborate on the solutions you advised below, as I am not very familiar with Hive.
You advised: "Put data into Hive. There are ways to put XML data into Hive: at a very dirty level you can use the xpath UDF to work on XML data in Hive, or you can package it more elegantly by converting the XML to Avro and then using a SerDe to map the fields to column names (let me know if you want to go over this in more detail and I can help you there). Combine a bunch of files, zip them up, and upload them to HDFS. This option is good if your access is very cold (once in a while) and you are going to access the files physically (e.g. hadoop fs -get)."
FYI below: Please advise on how to store and read archived data from Hive. While storing data in Hive, should I save it as a .har file in HDFS? Our application generates small XML files which are stored on NAS, with the associated XML metadata in a database. The plan is to extract the metadata from the DB into one file and compress the XML into one huge archive, say 10 GB. Assuming each archive is 10 GB and the data is 3 months old, I want to know the best solution for storing and accessing this archived data in Hadoop --> HDFS/Hive/HBase. Please advise which approach you think is better for reading this archived data: if I store it in Hive, how do I retrieve it? Please guide me on storing archived data in Hive, and also on retrieving/reading it from Hive when needed.
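For reference, a minimal sketch of the xpath UDF route mentioned above, assuming one XML document per line in the files; the table name, location, and XPath expressions are hypothetical and would need to match your actual XML layout:
$ hive -e "CREATE EXTERNAL TABLE xml_archive (xml_text STRING) LOCATION '/data/archive/xml';"
$ hive -e "SELECT xpath_string(xml_text, '/record/id'), xpath_string(xml_text, '/record/created') FROM xml_archive LIMIT 10;"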
... View more
01-29-2017
03:42 PM
Hi, thanks for your reply, I appreciate your inputs. Please advise on how to store and read archived data from Hive. While storing data in Hive, should I save it as a .har file in HDFS? Our application generates small XML files which are stored on NAS, with the associated XML metadata in a database. The plan is to extract the metadata from the DB into one file and compress the XML into one huge archive, say 10 GB. Assuming each archive is 10 GB and the data is 3 months old, I want to know the best solution for storing and accessing this archived data in Hadoop --> HDFS/Hive/HBase. Please advise which approach you think is better for reading this archived data: if I store it in Hive, how do I retrieve it? Please guide me on storing archived data in Hive, and also on retrieving/reading it from Hive when needed.
... View more
01-29-2017
03:42 PM
@hduraiswamy I appreciate your inputs. Please advise on how to store and read archived data from Hive. While storing data in Hive, should I save it as a .har file in HDFS? Our application generates small XML files which are stored on NAS, with the associated XML metadata in a database. The plan is to extract the metadata from the DB into one file and compress the XML into one huge archive, say 10 GB. Assuming each archive is 10 GB and the data is 3 months old, I want to know the best solution for storing and accessing this archived data in Hadoop --> HDFS/Hive/HBase. Please advise which approach you think is better for reading this archived data: if I store it in Hive, how do I retrieve it? Please guide me on storing archived data in Hive, and also on retrieving/reading it from Hive when needed.
... View more
01-29-2017
03:40 PM
@mqureshi Hi, our application generates small XML files which are stored on NAS, with the associated XML metadata in a database. The plan is to extract the metadata from the DB into one file and compress the XML into one huge archive, say 10 GB. Assuming each archive is 10 GB and the data is 3 months old, I want to know the best solution for storing and accessing this archived data in Hadoop --> HDFS/Hive/HBase. Please advise which approach you think is better for reading this archived data: if I store it in Hive, how do I retrieve it? Please guide me on storing archived data in Hive, and also on retrieving/reading it from Hive when needed.
... View more
01-23-2017
05:38 PM
I would like to know if anyone has tried running the Cloudera or Hortonworks Docker image in Kubernetes. Is anyone aware of a good GitHub project that runs Cloudera/Hortonworks in containers?
... View more
Labels:
- Apache Hadoop
01-22-2017
06:55 PM
Hi Team, consider a Hadoop cluster with a default block size of 64 MB. We have a case where we would like to use Hadoop for storing historical data and retrieving it as needed. The historical data would be in the form of archives containing many (millions of) small files, which is why we would like to reduce the default block size in Hadoop to 32 MB. I also understand that changing the default to 32 MB may adversely affect the cluster if we plan to use it for applications that store very large files, so can anyone advise what to do in such situations?
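One thing worth noting (a sketch under the assumption that only the small-file ingest needs the smaller size, not the whole cluster): the block size can be overridden per command at write time instead of changing the cluster-wide default, for example:
$ hdfs dfs -D dfs.blocksize=33554432 -put archive-part-0001.xml /data/archive/
where 33554432 is 32 MB in bytes and the file/path names are hypothetical.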
... View more
Labels:
- Apache Hadoop
01-22-2017
06:23 PM
We have a situation where we have lots of small XML files residing on a Unix NAS, with the associated metadata in an Oracle DB.
We want to combine the 3-month-old XML and its associated metadata into one archive file (~10 GB) and store it in Hadoop. What is the best way to implement this in Hadoop? Note that after creating one big archive, we will still have many small files (each maybe 1 MB or less) inside the archive, so I might reduce the block size to 32 MB, for example.
I have read about Hadoop archive (.har) files and about storing data in HBase, and I would like to know the pros and cons from the Hadoop community's experience. What is the recommended practice for such situations? Can you please also advise on reducing the HDFS block size to 32 MB to cater to this requirement; how does that look? I want to read this data from Hadoop whenever needed without affecting performance. Thanks in advance.
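For what it's worth, a minimal sketch of the Hadoop Archive (.har) option discussed here; the directory names are hypothetical:
$ hadoop archive -archiveName xml-2016-q4.har -p /data/xml/2016-q4 /data/archive
$ hdfs dfs -ls har:///data/archive/xml-2016-q4.har
The archive packs the many small files into a few large HDFS files, while the har:// URI still lets you list and read the original files.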
... View more
Labels:
- Apache Hadoop
01-21-2017
06:18 PM
Thanks for confirming. So what I wrote is correct, i.e. changing dfs.blocksize; a restart will happen anyway.
... View more
01-19-2017
07:12 AM
Can you please advise whether we need to change any other parameter apart from dfs.blocksize in the HDFS config? Any other suggestions are also welcome, as long as they help save block space rather than wasting it. Also, do we need any change on the DataNode side?
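For context, a sketch of the property in question (33554432 bytes = 32 MB), which Ambari writes into hdfs-site.xml when you change it under the HDFS configs:
dfs.blocksize=33554432
Note that dfs.blocksize is applied by the client at write time, so existing files keep the block size they were written with.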
... View more
Labels:
- Apache Hadoop
09-19-2016
10:41 AM
1 Kudo
Please see the options below. NOTE: for both options (CopyTable and Export/Import), since the cluster is up, there is a risk that edits could be missed in the export process.
http://hbase.apache.org/0.94/book/ops_mgt.html#copytable
CopyTable is a utility that can copy part or all of a table, either to the same cluster or to another cluster. The usage is as follows:
$ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable [--starttime=X] [--endtime=Y] [--new.name=NEW] [--peer.adr=ADR] tablename
http://hbase.apache.org/0.94/book/ops_mgt.html#export
Export is a utility that dumps the contents of a table to HDFS as a sequence file. Invoke via:
$ bin/hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
Note: caching for the input Scan is configured via hbase.client.scanner.caching in the job configuration.
Import is a utility that loads data that has been exported back into HBase. Invoke via:
$ bin/hbase org.apache.hadoop.hbase.mapreduce.Import <tablename> <inputdir>
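As an illustration only (the table name and the peer ZooKeeper quorum are hypothetical), a concrete CopyTable invocation to another cluster might look like:
$ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable --peer.adr=zk1,zk2,zk3:2181:/hbase-unsecure mytable
Here --peer.adr is the destination cluster's ZooKeeper quorum, client port, and parent znode.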
... View more
08-24-2016
09:20 AM
@da li Hey, please have a look at the link below; if it helps, accept the answer: https://community.hortonworks.com/questions/153/impersonation-error-while-trying-to-access-ambari.html
You need to create the proxy settings for 'root', since Ambari runs as root. This allows it to impersonate the user in HDFS. You need to do a similar thing for the oozie user, like it is done for root:
hadoop.proxyuser.root.groups=*
hadoop.proxyuser.root.hosts=*
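A minimal sketch of the corresponding core-site entries for the oozie user (the * values are wide open; you may want to restrict them to the Oozie server hosts and specific groups):
hadoop.proxyuser.oozie.groups=*
hadoop.proxyuser.oozie.hosts=*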
... View more
08-22-2016
04:16 PM
@Scott Shaw Thanks, but please see the questions below:
1. Can I get the same performance that I get in my optimized, purpose-built HDP cluster? Since the data lake is central, can I tune it specifically for one application?
2. How can I manage different HDP versions in a data lake?
3. If something goes wrong with security or configuration because of one application, will my whole data lake be impacted?
... View more
08-22-2016
03:54 PM
1 Kudo
Hi, I have a small application that generates some reports without using any MapReduce code, and I want to understand the real benefits of using a data lake. I think it would be useful for an enterprise where many products write data to various Hadoop clusters, in order to have a unified view of the various issues and a common data store; apart from this, what are the other real benefits? How does a data lake work if I want a particular HDP version? I think it is easier to switch to a particular HDP version in a separate cluster via Ambari, but what about a data lake? Also, if multiple applications use the data lake and just one application requires frequent changes, like an HBase coprocessor for testing various things, is it advisable to go for a data lake? We get HA in a dedicated cluster as well, so what are the main technical advantages if we don't consider cost?
... View more
Labels:
- Apache Hadoop
- Apache HBase
08-19-2016
11:02 AM
1 Kudo
Hi Team, is anyone aware of installation issues that cause so many broken symlink problems during installation? I faced this with HDP 2.3.4 and Ambari 2.2.2.0. Please see: https://community.hortonworks.com/questions/33492/hdp-234-failed-parent-directory-usrhdpcurrenthadoo.html
I was installing a 3-node HDP 2.4.0.0 cluster; at the "Install, Start and Test" step the installation went fine on one node, but the other two nodes had random symlink issues. I had to fix the broken symlinks manually most of the time, and after spending a lot of time I was finally able to install HDP 2.4.0.0 successfully. The issues looked like the log below (also shown in the attached image):
2016-08-18 21:20:17,474 - Directory['/etc/hive'] {'mode': 0755}
2016-08-18 21:20:17,474 - Directory['/usr/hdp/current/hive-client/conf'] {'owner': 'hive', 'group': 'hadoop', 'recursive': True}
2016-08-18 21:20:17,474 - Creating directory Directory['/usr/hdp/current/hive-client/conf'] since it doesn't exist
I had the proper prerequisites in place before starting the installation, as given in http://docs.hortonworks.com/HDPDocuments/Ambari-2.2.2.0/bk_Installing_HDP_AMB/content/_hdp_24_repositories.html, and retrying randomly eventually works 😞. Please advise if you think I am doing something wrong, and share any good best practices for installation and debugging. Thanks,
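For anyone hitting the same thing, the /usr/hdp/current symlinks can usually be inspected and recreated with hdp-select; a sketch (the version string below is just an example and should match the installed stack):
$ hdp-select versions
$ hdp-select status hive-client
$ hdp-select set all 2.4.0.0-169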
... View more
Labels:
- Apache Ambari
- Apache Hadoop
- Apache Hive
08-19-2016
06:24 AM
2 Kudos
@Ted Yu @emaxwell @Josh Elser Thanks all for your confirmation; that's why I asked whether the RPM is relocatable 🙂. So the bottom line is that the Hortonworks installation directories cannot be changed: all HDP binary and config files go in /usr and /etc, since that is hardcoded in the RPMs and the RPMs are not relocatable. I will close this thread. But I believe relocatability should be supported from a corporate IT policy point of view, where we often have issues putting files in /usr and /etc. I also suggest that at RPM creation time Hortonworks make the RPMs relocatable, so that binary and config files can be installed in directories other than /usr and /etc. I understand that HDP consists of other software, but ultimately Hortonworks can customize the bundle to support user-specific needs. I should open this as an idea, WDYT?
... View more
08-19-2016
06:23 AM
@Ted Yu @emaxwell @Josh Elser Thanks all for your confirmation; that's why I asked whether the RPM is relocatable 🙂. I will close this thread. But I believe this should be supported from a corporate IT policy point of view, where we often have issues putting files in /usr and /etc. I should open this as an idea, WDYT?
... View more
08-18-2016
07:19 PM
1 Kudo
Hi team, I see that HDP stores its lib files and packages in /usr/hdp and maintains different versions there. Can we control the HDP installation packages/RPMs and make the installation relocatable to other directories such as /opt? If my IT team does not permit installation inside /usr, then what can we do?
# ls /usr/hdp/
2.4.0.0-169 2.4.2.0-258 current
Please advise.
$ rpm -ql hdp-select-2.4.2.0-258.el6.noarch
/usr/bin/conf-select
/usr/bin/hdp-select
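For what it's worth, whether an installed RPM is relocatable can be checked from its header; a sketch using the package listed above:
$ rpm -qi hdp-select-2.4.2.0-258.el6.noarch | grep -i relocat
For the HDP packages this shows "(not relocatable)", which is why the /usr prefix cannot simply be moved at install time.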
... View more
Labels:
- Apache Hadoop
- Apache HBase
08-18-2016
06:35 AM
2 Kudos
Hi Team, we see that the logs of the various Hadoop services are stored under /var/log. Can we change this to a customized location if we don't want to store logs in the locations below?
/var/log/ambari-agent/
/var/log/ambari-metrics-monitor/
/var/log/ambari-server/
/var/log/hbase
/var/log/zookeeper
I see that changing the log location in Ambari is disabled?
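In case it is useful, the service log directories are normally driven by the *-env configs in Ambari, while the Ambari server/agent paths live in their own files under /etc. The property names below are the usual ones for this kind of stack, so treat them as assumptions to verify on your version:
hdfs_log_dir_prefix=/custom/log (HDFS > Advanced hadoop-env)
hbase_log_dir=/custom/log/hbase (HBase > Advanced hbase-env)
zk_log_dir=/custom/log/zookeeper (ZooKeeper > Advanced zookeeper-env)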
... View more
Labels:
- Apache Hadoop
- Apache HBase
08-03-2016
06:26 AM
1 Kudo
Hi Team, my Hadoop NameNode servers are without HBA storage, but the servers use RAID 10. Do I still need an NFS mount point to save the NameNode metadata (fsimage, edits, etc.) if I have an active NameNode in the cluster as well? Also, if my hardware has no HBA storage and uses RAID 10, can I connect to an NFS mount point from such hardware? Basically, what are the recommendations for NameNode HA?
... View more
Labels:
- Apache Hadoop
- Apache HBase
08-01-2016
07:09 AM
Thanks a lot, Kuldeep. I agree, and that's why I wanted suggestions from experts like you 🙂
... View more
08-01-2016
06:11 AM
1 Kudo
Hi Team, I have 3 virtual machines in an HDP cluster. If I have huge capacity (TBs) on the DataNode disks, can I use the same disk with different mount points to store the DataNode data, the NameNode data, the Secondary NameNode data, the JobTracker data (master node data), and /usr and /var? I know that if the disk has an issue, then all of that data will be affected.
Basically, I wanted to know: if my DataNode disks have a lot of space (TBs), do you recommend creating different mounts on the same DataNode disks for different purposes, like /usr and /var and storing the NN/SN/JT data? Also, each HDP version's data lives in /usr/hdp.
... View more
Labels:
- Apache Hadoop
- Apache HBase
07-19-2016
06:02 AM
We can store application-related data and logs on SAN/NAS. However, SAN/NAS is not at all recommended for I/O-sensitive and CPU-bound jobs, to avoid bottlenecks when reading data from disk or over the network, or when processing data. So:
- Logs/application data --> SAN/NAS
- DataNode data --> DAS with JBOD configuration, no RAID
- NN/SN/JT nodes --> should be highly available [RAID 5/10, depending on the use case]
Hadoop is a scale-out, shared-nothing architecture.
http://www.bluedata.com/blog/2015/12/separating-hadoop-compute-and-storage/
https://community.emc.com/servlet/JiveServlet/previewBody/41473-102-1-132603/Virtualizing%20Hadoop%20in%20Large%20Scale%20Infrastructures.pdf
Also, I understand that sometimes the true cost of DAS is higher once you account for Hadoop replication, but this is how Hadoop thrives (one of the key tenets of Hadoop is to bring the compute to the storage instead of the storage to the compute).
... View more
07-19-2016
05:55 AM
@Sbandaru: I researched this more deeply, and the conclusion is that we don't need an edge node. An edge node is not needed if the Hadoop cluster and the application are in the same network; it is only needed when the Hadoop cluster and the application are in different networks, in which case the edge node acts as a gateway to the Hadoop cluster (like a proxy). Thanks for your inputs.
... View more
07-16-2016
03:34 AM
1 Kudo
Hi Team,
We are going to deploy HDP 2.3.4 for a big environment setup.
Can someone please explain the architecture of an edge node in Hadoop? I am only able to find the definition on the internet. I have some queries:
1) What is an edge node? 2) When and why do we need it? 3) Does every production cluster contain an edge node? 4) Is the edge node part of the cluster? (What advantages do we have if it is inside the cluster? Does it store any blocks of data in HDFS? Any performance improvement?) 5) Should the edge node be outside the cluster? 6) Please refer me to any docs where I can learn more about it, preferably Hortonworks docs.
... View more
Labels:
- Apache Hadoop
07-13-2016
05:32 PM
That will be helpful.
... View more
07-13-2016
02:20 PM
The answer looks good. Thanks for your answer. Can you please advise how to decide the disk I/O for a cluster? Which factors should be considered in the disk I/O calculation?
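By way of illustration only (every number below is a made-up assumption, not a recommendation), a back-of-the-envelope disk I/O estimate usually runs along these lines:
- assume 1 TB of data must be read within a 4-hour batch window --> required read throughput ≈ 1,000,000 MB / 14,400 s ≈ 70 MB/s
- allow roughly 3x for shuffle and write amplification --> ≈ 210 MB/s aggregate
- assume ≈ 100 MB/s sequential throughput per SATA disk --> at least 2-3 disks' worth of aggregate throughput, spread across nodes so the I/O happens in parallel
The real inputs are your ingest rate, the processing window, the replication factor, and how much of the workload is sequential versus random I/O.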
... View more