1973 Posts | 1225 Kudos Received | 124 Solutions
06-09-2016
08:37 PM
Where to store a 672-dimension, million-record dataset for online applications? How would you store it and lay it out? Most queries work with smaller subsets of the dimensions, say 20-30 at a time. HBase, or HBase + Phoenix, has been considered. Or would Hive + Tez + ORC work well? Should it be cached with something like Apache Ignite, Apache Geode, or Redis? Any suggestions? Looking for best practices for a greenfield application.
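A minimal sketch of the HBase + Phoenix option, assuming Phoenix's JDBC driver and a hypothetical FEATURES table (the host, table, and column names are all made up for illustration): grouping the 672 dimensions into a handful of HBase column families means a query that projects 20-30 related dimensions only reads the families it touches.

```scala
import java.sql.DriverManager

object PhoenixLayoutSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical ZooKeeper quorum in the Phoenix JDBC URL.
    val conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181")
    val stmt = conn.createStatement()

    // One possible layout: prefixing a column with "G1." or "G2." places it
    // in that HBase column family, so related dimensions live together on disk.
    stmt.execute(
      """CREATE TABLE IF NOT EXISTS FEATURES (
        |  RECORD_ID BIGINT NOT NULL PRIMARY KEY,
        |  G1.DIM_001 DOUBLE,
        |  G1.DIM_002 DOUBLE,
        |  G2.DIM_341 DOUBLE
        |)""".stripMargin)

    // A query that only projects columns from family G1 never reads G2's data.
    val rs = stmt.executeQuery("SELECT RECORD_ID, G1.DIM_001 FROM FEATURES LIMIT 10")
    while (rs.next()) println(s"${rs.getLong(1)} ${rs.getDouble(2)}")
    conn.close()
  }
}
```

Whether this beats Hive + Tez + ORC generally depends on the access pattern: point lookups and narrow scans tend to favor Phoenix, while full-table analytics favor ORC.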
Labels:
- Apache HBase
- Apache Hive
- Apache Phoenix
06-09-2016
03:26 PM
Are there functions out there that utilize something like the Accumulator interface in Pig, where the data doesn't have to stay in memory?
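For reference, Spark's aggregateByKey is the closest analogue to Pig's Accumulator interface: values for a key are folded into a small running accumulator one at a time, so the whole bag for a key never has to be materialized. A minimal sketch (the app name and sample data are hypothetical):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object IncrementalAgg {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("accumulator-style"))

    val pairs = sc.parallelize(Seq(("a", 1.0), ("a", 2.0), ("b", 3.0)))

    // The accumulator is a small (sum, count) pair: seqOp folds one value in
    // at a time and combOp merges partial accumulators across partitions, so
    // the full list of values per key is never held in memory at once.
    val sumCount = pairs.aggregateByKey((0.0, 0L))(
      (acc, v) => (acc._1 + v, acc._2 + 1),
      (l, r)   => (l._1 + r._1, l._2 + r._2)
    )

    sumCount.mapValues { case (s, c) => s / c }.collect().foreach(println)
    sc.stop()
  }
}
```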
Labels:
- Apache Pig
- Apache Spark
06-08-2016
09:15 PM
@Ancil McBarnett Is there an upgrade? HDP 2.4 was just mentioned as supported by EMC.
06-07-2016
11:31 PM
Found it: http://hortonworks.com/hadoop-tutorial/how-to-use-hcatalog-basic-pig-hive-commands/#explore-pig-latin-data-transformation See section 5.4, "Save the script and execute it": first add the -useHCatalog argument (case sensitive) using the box in the bottom right-hand corner. At the top of the screen, make sure the box "Execute on Tez" is checked. Then click Execute to run the script; this action creates one or more Tez jobs. @Revlin Abbi Make sure you add the -useHCatalog argument.
06-07-2016
11:29 PM
Same here with the latest sandbox; it won't work from Ambari.
06-07-2016
06:34 PM
You need to engage Hortonworks Professional Services and your current Pivotal support team. If you are in the Northeast, drop me an email and I can help you engage those teams. You can also engage WANdisco, who have some pretty cool migration software: http://hortonworks.com/blog/migration-to-hdp-as-easy-as-1-2-3-without-downtime-or-disruption/
06-07-2016
02:11 AM
Any example on GitHub?
06-06-2016
08:48 PM
1 Kudo
If I am only concerned with the performance of HiveQL queries, and not about storage space, is it better to create my tables with no compression? http://stackoverflow.com/questions/32373460/parquet-vs-orc-vs-orc-with-snappy What if my data size is a few hundred gigs? Terabytes? A petabyte? Same answer? Let's assume sane queries returning a few thousand rows, a dozen columns or so, a few items in a WHERE clause, and an ORDER BY. Interactive queries for Tableau.
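One way to test this empirically, sketched below with hypothetical table and column names: ORC's codec is set per table through the standard 'orc.compress' property (NONE, SNAPPY, or ZLIB), so you can build three copies of the same data and time a representative Tableau-style query against each. Shown via a Hive-enabled SparkSession (assuming Spark 2.x), but the same DDL runs directly in Hive.

```scala
import org.apache.spark.sql.SparkSession

object OrcCompressionBench {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("orc-compression-bench")
      .enableHiveSupport()
      .getOrCreate()

    // Three copies of the same (hypothetical) source table, one per codec.
    for (codec <- Seq("NONE", "SNAPPY", "ZLIB")) {
      spark.sql(
        s"""CREATE TABLE sales_${codec.toLowerCase}
           |STORED AS ORC TBLPROPERTIES ('orc.compress'='$codec')
           |AS SELECT * FROM sales_raw""".stripMargin)
    }

    // Time a representative interactive query against each copy.
    for (codec <- Seq("none", "snappy", "zlib")) {
      val start = System.nanoTime()
      spark.sql(
        s"""SELECT region, SUM(amount) AS total
           |FROM sales_$codec
           |WHERE sale_year = 2016
           |GROUP BY region
           |ORDER BY total DESC""".stripMargin).collect()
      println(s"$codec: ${(System.nanoTime() - start) / 1e9} s")
    }
  }
}
```

The usual trade-off: NONE saves CPU on decompression but reads more bytes off disk, so for I/O-bound interactive queries ZLIB or SNAPPY often wins despite the extra CPU.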
Labels:
- Apache Hive
06-06-2016
07:49 PM
I didn't want to use Hive Streaming at this point. I was really focusing on Spark and NiFi. Just curious about Pig, Sqoop and other tools in the HDP stack.
06-06-2016
06:46 PM
1 Kudo
I would have liked to use Apache NiFi, but that is not yet available in the current version (coming soon). I can do it from Sqoop, Pig, Spark, ... Any other options? For relational databases in bulk, Sqoop seems like a solid option. For real-time, Spark Streaming? For batch, Pig? I am looking for performance, but also ease of use and a minimal amount of coding.
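As a point of comparison for the Spark option, a minimal bulk-pull sketch (assuming Spark 2.x; the connection details, table, and split column are all hypothetical): Spark's JDBC source can partition the read much like Sqoop's --split-by, then land the result directly in a Hive ORC table.

```scala
import org.apache.spark.sql.SparkSession

object JdbcToHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("jdbc-to-hive")
      .enableHiveSupport()
      .getOrCreate()

    // Parallel read: the source table is split into 8 ranges on order_id,
    // comparable to Sqoop's --split-by / --num-mappers.
    val df = spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://db-host:5432/sales") // hypothetical
      .option("dbtable", "public.orders")                    // hypothetical
      .option("user", "etl")
      .option("password", "secret")
      .option("partitionColumn", "order_id")
      .option("lowerBound", "1")
      .option("upperBound", "10000000")
      .option("numPartitions", "8")
      .load()

    // Land the result as an ORC-backed Hive table.
    df.write.mode("overwrite").format("orc").saveAsTable("orders_hive")
  }
}
```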
Labels:
- Apache Hive
- Apache Pig
- Apache Spark