Member since
06-17-2015
61
Posts
20
Kudos Received
4
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1953 | 01-21-2017 06:18 PM |
| | 2368 | 08-19-2016 06:24 AM |
| | 1712 | 06-09-2016 03:23 AM |
| | 2852 | 05-27-2016 08:27 AM |
02-02-2017
02:37 AM
Yes, similar to this: https://community.hortonworks.com/questions/79103/what-is-the-best-way-to-store-small-files-in-hadoo.html#comment-80387
02-01-2017
04:12 PM
Hive is very similar to a database design, so as a first step you can create a Hive table using syntax like this (in its simplest form):
create table table_name (
id int,
name string
)
partitioned by (`date` string)
Note that the partition column is declared only in the partitioned by clause, not repeated in the column list, and that date needs backticks because it is a reserved keyword in recent Hive versions.
There are many options you can add to this table creation, such as where it is stored and how it is delimited, but in my opinion keep it simple first and expand from there. This link (the one that I always refer to) covers the DDL syntax and its options in detail: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
Once you have this taken care of, you can start inserting data into Hive. The available options are explained in the DML documentation: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML
These two links are a good starting point for getting familiar with Hive in general.
Specifically for your question on loading XML data: you can either load the whole XML file as a single column and read it with the XPath UDFs at read time, or break each XML tag out into a separate column at write time. I will go through both options in a little detail.
Writing XML data as a single column: you can simply create a table like
CREATE TABLE xmlfiles (id int, xmlfile string)
and then put the entire XML data into the string column. At read time, you can use the XPath UDFs (user-defined functions that come with Hive) to read the data. Details here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+XPathUDF
This approach makes writing data easy, but may have some performance overhead at read time (as well as limitations on doing some aggregates on the result set).
Writing XML data as columnar values into Hive: this approach is a little more drawn out at write time, but easier and more flexible for read operations. Here you first convert your XML data into either Avro or JSON and then use one of the SerDes (serializer/deserializer) to write the data to Hive. This will give you some context: https://community.hortonworks.com/repos/30883/hive-json-serde.html
Hope this makes sense. If you find this answer helpful, please 'Accept' my initial answer above.
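For the second option (convert the XML first, then load it through a SerDe), the conversion step could look roughly like the Python sketch below. It is only an illustration under stated assumptions: the <records>/<record> layout, the field names, and the helper name xml_to_json_lines are all hypothetical and not taken from the original question. It emits one JSON object per line, which is the usual input shape for JSON SerDes.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample input: the <records>/<record> layout and the field
# names are assumptions for illustration only.
SAMPLE_XML = """
<records>
  <record><id>1</id><name>alice</name></record>
  <record><id>2</id><name>bob</name></record>
</records>
"""


def xml_to_json_lines(xml_text, record_tag="record"):
    """Return one JSON string per <record_tag> element found in xml_text."""
    root = ET.fromstring(xml_text.strip())
    lines = []
    for rec in root.iter(record_tag):
        # Each direct child tag becomes a key; its text becomes the value.
        # Values are kept as strings; type conversion is left out here.
        row = {child.tag: (child.text or "").strip() for child in rec}
        lines.append(json.dumps(row, sort_keys=True))
    return lines


if __name__ == "__main__":
    # Print one JSON document per line, ready to be written to a file.
    for line in xml_to_json_lines(SAMPLE_XML):
        print(line)
```

The resulting lines would then be written to a file and loaded into a table defined with a JSON SerDe. Nested XML elements and attributes are not handled in this sketch.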
01-21-2017
06:18 PM
Thanks for confirming. So what I wrote is correct, that is, changing dfs.blocksize. A restart will happen anyway.
10-17-2016
01:40 AM
Hi, my issue was solved by updating SUSE 11 SP4. I installed the updates because the OS was still in its initial state. The error was gone after that.
08-23-2016
03:02 AM
@Scott Shaw Thanks a lot
08-20-2016
02:20 PM
Refer to the manual installation doc for hdp-select to fix your symlink issues: https://docs.hortonworks.com/HDPDocuments/Ambari-2.1.2.0/bk_upgrading_Ambari/content/_Run_HDP_Select_mamiu.html When you have a specific error, open a question; generally you shouldn't get these errors.
08-19-2016
06:24 AM
2 Kudos
@Ted Yu @emaxwell @Josh Elser Thanks all for your confirmation, that's why I asked whether the RPM is relocatable 🙂 So the bottom line is that the Hortonworks installation directories cannot be changed: all binary and config files of HDP go in /usr and /etc, since that is hardcoded in the RPM and the RPM is not relocatable. I will close this thread. But I believe it should support relocation from a corporate IT policy point of view, since we often have issues putting files in /usr and /etc. I also suggest that Hortonworks build the RPMs as relocatable, to allow installing binary and config files in directories other than /usr and /etc. I understand that HDP consists of other software as well, but ultimately Hortonworks can customize this bundle to support user-specific needs. I should open this as an idea, WDYT?
09-15-2016
06:36 PM
@ripunjay godhani I want to be sure I understand your post. Are you saying that modifying a single Ambari property will relocate logs for all components on a restart? If so, can you share the name of that property? The page you linked to does not have a single mention of log location. In a perfect world, I would have left plenty of room under /var for logging, but we have a heavily used cluster with a lot of data and constant crashes from full /var on many of the machines. I need to move everything to a new location.
08-04-2016
09:47 PM
2 Kudos
Hi @ripunjay godhani, we no longer recommend setting up NameNode HA with NFS. Instead please use the Quorum Journal Manager setup. The Apache HA with QJM documentation is a good start: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html NameNode image files will be stored on two nodes (active and standby NN) in this setup. The latest edit logs will be on the active NameNode and at least two journal nodes (usually all three, unless one Journal Node has an extended downtime). The NameNodes can optionally be configured to write their edit logs to separate NFS shares if you really want but it is not necessary. You don't need RAID 10. HDFS HA with QJM provides good durability and availability with commodity hardware.
08-08-2016
02:07 AM
I think there is an HCC article on this very topic, but https://martin.atlassian.net/wiki/x/EoC3Ag is a blog post I wrote back in mid-2015 on this subject as well in case it helps any. Good luck!