About ripu

ripu · ‎02-02-2017

Yes, similar to this: https://community.hortonworks.com/questions/79103/what-is-the-best-way-to-store-small-files-in-hadoo.html#comment-80387

hduraiswamy · ‎02-01-2017

Hive is very similar to a database design, so as a first step you can create a Hive table using syntax like this (in its simplest form): create table table_name (id int, name string) partitioned by (dt string). Note that the partition column must not also appear in the regular column list. There are many variants you can add to this table creation, such as where it is stored, how it is delimited, etc., but in my opinion keep it simple first and then expand your mastery. This link (the one that I always refer to) covers the DDL syntax and its different options in detail: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL

Once you have this taken care of, you can start inserting data into Hive. The different options available for this are explained in the DML documentation: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML

These two links are a good start for getting closer to Hive in general. Then, specifically for your question on loading XML data: you can either load the whole XML file as a single column and read it using the XPath UDFs at read time, or break each XML tag out into a separate column at write time. I will go through both options here in a little detail.

Writing XML data as a single column: you can simply create a table like CREATE TABLE xmlfiles (id int, xmlfile string) and then put the entire XML data into the string column. At read time, you can use the XPath UDFs (user-defined functions that come with Hive) to read the data. Details here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+XPathUDF This approach makes writing data easy, but may have some performance overhead when reading data (as well as limitations on doing some aggregates on the result set).

Writing XML data as columnar values into Hive: this approach is a little more drawn out at write time, but easier and more flexible for read operations. Here you first convert your XML data into either Avro or JSON and then use one of the SerDes (serializer/deserializer) to write the data to Hive. This will give you some context: https://community.hortonworks.com/repos/30883/hive-json-serde.html

Hope this makes sense. If you find this answer helpful, please 'Accept' my initial answer above.
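As a rough illustration of the second approach (converting XML before loading it), here is a minimal Python sketch that flattens XML records into one JSON object per line, the layout JSON SerDes typically expect. The element names (records, record, id, name, dt) and the helper xml_to_json_lines are hypothetical and not taken from the thread; all values are emitted as strings, so any type handling would still need to be decided on the Hive side.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample input; the element names are illustrative only.
SAMPLE_XML = """
<records>
  <record><id>1</id><name>alice</name><dt>2017-02-01</dt></record>
  <record><id>2</id><name>bob</name><dt>2017-02-02</dt></record>
</records>
"""


def xml_to_json_lines(xml_text, record_tag="record"):
    """Return one JSON string per record element.

    Each direct child tag of a record becomes a key; its text becomes
    the (string) value. Nested elements and attributes are ignored.
    """
    root = ET.fromstring(xml_text.strip())
    lines = []
    for rec in root.iter(record_tag):
        row = {child.tag: (child.text or "").strip() for child in rec}
        lines.append(json.dumps(row, sort_keys=True))
    return lines


if __name__ == "__main__":
    # Print one JSON document per line, ready to be written to a file
    # and placed under a table's storage location.
    for line in xml_to_json_lines(SAMPLE_XML):
        print(line)
```

The resulting lines could be written to a file and loaded into a table defined with a JSON SerDe; which SerDe class to use depends on the Hive distribution, so that part is left out here.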

ripu · ‎01-21-2017

Thanks for confirming. So what I wrote is correct, that is, changing dfs.blocksize. A restart will happen anyway.

RobertOro · ‎10-17-2016

Hi, my issue was solved by updating SUSE 11 SP4. I installed the updates, as the OS was in its initial state. The error was gone after that.

rbg412 · ‎08-23-2016

@Scott Shaw Thanks a lot

aervits · ‎08-20-2016

Refer to the manual installation doc for hdp-select to fix your symlink issues: https://docs.hortonworks.com/HDPDocuments/Ambari-2.1.2.0/bk_upgrading_Ambari/content/_Run_HDP_Select_mamiu.html When you have a specific error, open a question; generally you shouldn't get these errors.

ripu · ‎08-19-2016

@Ted Yu @emaxwell @Josh Elser Thanks all for your confirmation; that's why I asked if the RPM is relocatable 🙂 So the bottom line is that the Hortonworks installation directories cannot be changed: all binary and config files of HDP go in /usr and /etc, since that is hardcoded in the RPM and the RPM is not relocatable. I will close this thread.

But I believe it should support relocatability from a corporate IT policy point of view, since we often have issues putting files in /usr and /etc. I also suggest that, at RPM creation time, Hortonworks make the RPM relocatable, to allow installing binary and config files in directories other than /usr and /etc. I understand that HDP consists of other software, but ultimately Hortonworks can customize this bundle to support user-specific needs.

I should open this as an idea. WDYT?

hirschs · ‎09-15-2016

@ripunjay godhani I want to be sure I understand your post. Are you saying that modifying a single Ambari property will relocate logs for all components on a restart? If so, can you share the name of that property? The page you linked to does not have a single mention of log location. In a perfect world, I would have left plenty of room under /var for logging, but we have a heavily used cluster with a lot of data and constant crashes from a full /var on many of the machines. I need to move everything to a new location.

ArpitAgarwal · ‎08-04-2016

Hi @ripunjay godhani, we no longer recommend setting up NameNode HA with NFS. Instead please use the Quorum Journal Manager setup. The Apache HA with QJM documentation is a good start: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html NameNode image files will be stored on two nodes (active and standby NN) in this setup. The latest edit logs will be on the active NameNode and at least two journal nodes (usually all three, unless one Journal Node has an extended downtime). The NameNodes can optionally be configured to write their edit logs to separate NFS shares if you really want but it is not necessary. You don't need RAID 10. HDFS HA with QJM provides good durability and availability with commodity hardware.
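The "at least two journal nodes" figure above follows from majority-quorum arithmetic: an edit must be acknowledged by a majority of JournalNodes. A minimal Python sketch of that arithmetic, where the function names are illustrative only and not part of any Hadoop API:

```python
def quorum_size(journal_nodes: int) -> int:
    """Smallest number of JournalNodes that forms a majority."""
    return journal_nodes // 2 + 1


def tolerated_failures(journal_nodes: int) -> int:
    """How many JournalNodes can be down while a majority remains."""
    return (journal_nodes - 1) // 2


if __name__ == "__main__":
    for n in (3, 5):
        print(n, quorum_size(n), tolerated_failures(n))
```

With three JournalNodes the majority is two, so one node can have extended downtime without blocking edit-log writes.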

LesterMartin · ‎08-08-2016

I think there is an HCC article on this very topic, but https://martin.atlassian.net/wiki/x/EoC3Ag is a blog post I wrote back in mid-2015 on this subject as well in case it helps any. Good luck!


Member Since	‎06-17-2015 08:02 AM
Last Visited	‎09-27-2016 02:35 AM
Posts	61
Kudos received	20

Cloudera Community

Re: want to decrease block size in HDP ambari , wh...

Re: can we customize HDP installation in other dir...

Re: issue error :cloudera unsupported major minor ...

Re: HDP 2.3.4 failed, parent directory /usr/hdp/cu...

Re: cases where changing hadoop block size is not ...

Re: what is the best way to store small files in h...

Re: want to decrease block size in HDP ambari , wh...

Re: Clouderaa Manager Install Error Cannot have em...

Re: When do we need to consider moving to hadoop ...

Re: HDP 2.3.4 and ambari-2.2.2.0 installation has...

Re: can we customize HDP installation in other dir...

Re: can i change log location in HDP installation

Re: Planning hardware for NameNode/Active/Secondar...

Re: can i use the same disk with diff mount points...