Member since
06-29-2016
81
Posts
43
Kudos Received
1
Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
  | 701 | 03-16-2016 08:26 PM |
12-21-2016
11:26 AM
2 Kudos
I see 3 different options to deploy HDP on Azure:
1. HDInsight (built on top of HDP)
2. HDP as a Service
3. Deploying HDP on Azure's bare metal
In my understanding, 1 and 2 are managed services where control is limited when it comes to the choice of OS etc. HDInsight has multiple cluster types (I'm not sure what the rationale behind this is, though). Questions:
1. What is the rationale behind having multiple cluster types for HDInsight?
2. Why are two services (1 and 2 above) offered? When should each be used (apart from this)?
3. Are there any performance benchmarks for HDInsight or HDP on Azure in a production situation?
4. What are the different storage types possible on the above services? At least on HDInsight I see that Blob storage and Data Lake Store are options, but both are external to the compute nodes. That may hurt performance, hence my curiosity about question 3, apart from the fact that the clusters run on virtual machines.
5. What are the options to provision HDP on Azure bare-metal nodes (option 3)? Does Cloudbreak help there?
Labels:
- Hortonworks Data Platform (HDP)
08-25-2016
08:50 AM
@Tom McCuch Thanks a lot for the views and inputs. It definitely helps.
08-22-2016
06:14 AM
@Tom McCuch Thanks again. Do you recommend that the data be sorted for the ORC optimization to work, or does it not really matter? Also, has any volume benchmarking or performance testing been done for ad-hoc queries with the optimizations mentioned above?
08-18-2016
09:45 AM
@Tom McCuch Thanks for the detailed response. In terms of querying capabilities (from a BI tool, a CLI, or Hue), one way to achieve the fast query response required for operational reporting is to structure the data (by means of partitions etc.) for pre-defined queries. But for ad-hoc operational reporting queries, what is your take on an ODS in Hadoop achieving the desired performance? One option is to restrict the volume of data in the ODS layer (in addition to ORC format, Tez, etc.), since it serves operational needs anyway (so history may not be required). Please share your thoughts.
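As a concrete illustration of the "structure the data" option mentioned above, a minimal Hive sketch (table and column names are hypothetical, not from any actual schema):

```sql
-- Hypothetical ODS claims table, partitioned by event date so that
-- pre-defined operational queries can prune partitions.
CREATE TABLE IF NOT EXISTS ods_claims (
  claim_id   STRING,
  policy_id  STRING,
  amount     DECIMAL(10,2)
)
PARTITIONED BY (event_date STRING)
STORED AS ORC
tblproperties ("orc.compress"="SNAPPY");

-- A query that filters on the partition column only reads the
-- matching partition directories:
--   SELECT policy_id, SUM(amount) FROM ods_claims
--   WHERE event_date = '2016-08-01' GROUP BY policy_id;
```

This helps the pre-defined queries; truly ad-hoc queries that do not filter on the partition column get no pruning benefit, which is where restricting ODS volume matters.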
08-16-2016
09:48 AM
2 Kudos
I have been asked to build an ODS (operational data store) in Hadoop for an insurance client. In this regard, a few questions:
First of all, is it recommended to build the ODS in Hadoop? What are the pros and cons of building an ODS in Hadoop? Are there any best practices around this topic? The ODS should facilitate operational reporting needs, including support for ad-hoc queries.
Labels:
- Apache Hadoop
06-29-2016
01:00 PM
@Benjamin Leonhardi Thanks, that makes sense.
06-28-2016
10:13 PM
Data comes from multiple sources and is exposed in Hive tables for the users. A specific column is sensitive and needs to be given restricted access. If a user wants to join two such tables on the column that he does not have access to, what is the best approach to make it work? One option is to link the sensitive column to a generated key so that the user can join on the generated key instead. Is this a good idea, or is there a better one?
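A minimal sketch of the generated-key idea described above (all table and column names are hypothetical; it assumes a mapping table populated by a privileged process, with access to the raw column and the mapping table restricted):

```sql
-- Privileged side: map each sensitive value (e.g. an SSN) to a
-- surrogate key. End users never see this table or the raw column.
CREATE TABLE IF NOT EXISTS sensitive_mapping (
  ssn            STRING,   -- restricted
  surrogate_key  STRING    -- e.g. a hash or sequence value
);

-- Views exposed to users carry only the surrogate key:
CREATE VIEW customers_v AS
  SELECT m.surrogate_key, c.name, c.region
  FROM customers c JOIN sensitive_mapping m ON c.ssn = m.ssn;

CREATE VIEW claims_v AS
  SELECT m.surrogate_key, cl.claim_id, cl.amount
  FROM claims cl JOIN sensitive_mapping m ON cl.ssn = m.ssn;

-- The user can now join the two datasets without ever reading the
-- sensitive column itself:
--   SELECT * FROM customers_v c JOIN claims_v cl
--     ON c.surrogate_key = cl.surrogate_key;
```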
Labels:
- Apache Hadoop
- Apache Hive
05-13-2016
01:53 PM
What does it mean for a Hive table in ORC or Avro format to have a field delimiter specified? Does Hive ignore it even if it is specified? For example:

```sql
CREATE TABLE IF NOT EXISTS T (
  C1 STRING,
  C2 STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
STORED AS ORC tblproperties ("orc.compress"="SNAPPY");
```
Labels:
- Apache Hive
04-11-2016
02:50 PM
Can a non-numeric column be specified as the --split-by key parameter in Sqoop? What are the potential issues in doing so?
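For context on why this question matters: Sqoop divides an import among mappers by first running a boundary query over the split column and then partitioning that range. A rough sketch of what it issues (table and column names are hypothetical):

```sql
-- Boundary query Sqoop runs for the split column:
SELECT MIN(order_id), MAX(order_id) FROM orders;

-- With numeric bounds 1..1000 and 4 mappers, each mapper then runs
-- a range predicate, roughly:
--   SELECT ... FROM orders WHERE order_id >= 1   AND order_id < 251;
--   SELECT ... FROM orders WHERE order_id >= 251 AND order_id < 501;
--   ... and so on.
```

Splitting a numeric range into even sub-ranges is straightforward; dividing a range of strings evenly is much less well defined, which is the core risk with a non-numeric split key.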
Labels:
- Apache Sqoop
03-30-2016
01:24 PM
@Benjamin Leonhardi I think I got it. It's still the same number of files, just with more reducers. In my mind it was always just the buckets, not the partitions, so I thought it was 30 files (30 buckets and 40 partitions). But in fact it's still 1200 files (40 partitions × 30 buckets) in both cases; the optimized version simply uses more reducers.