1973 Posts | 1225 Kudos Received | 124 Solutions
06-09-2016
08:37 PM
Where to store a 672-dimension, million-record dataset for online applications? How would you store it and lay it out? Most queries work with smaller subsets of the dimensions, say 20-30 at a time. HBase, or HBase + Phoenix, has been considered. Or would Hive + Tez + ORC work well? Should it be cached with something like Apache Ignite, Apache Geode, or Redis? Any suggestions? Looking for best practices for a greenfield application.
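A minimal sketch of the HBase + Phoenix option, assuming Phoenix's JDBC driver and a hypothetical FEATURES table (the host, table, and column names are all made up for illustration): grouping the 672 dimensions into a handful of HBase column families means a query that projects 20-30 related dimensions only reads the families it touches.

```scala
import java.sql.DriverManager

object PhoenixLayoutSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical ZooKeeper quorum in the Phoenix JDBC URL.
    val conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181")
    val stmt = conn.createStatement()

    // One possible layout: prefixing a column with "G1." or "G2." places it
    // in that HBase column family, so related dimensions live together on disk.
    stmt.execute(
      """CREATE TABLE IF NOT EXISTS FEATURES (
        |  RECORD_ID BIGINT NOT NULL PRIMARY KEY,
        |  G1.DIM_001 DOUBLE,
        |  G1.DIM_002 DOUBLE,
        |  G2.DIM_341 DOUBLE
        |)""".stripMargin)

    // A query that only projects columns from family G1 never reads G2's data.
    val rs = stmt.executeQuery("SELECT RECORD_ID, G1.DIM_001 FROM FEATURES LIMIT 10")
    while (rs.next()) println(s"${rs.getLong(1)} ${rs.getDouble(2)}")
    conn.close()
  }
}
```

Whether this beats Hive + Tez + ORC generally depends on the access pattern: point lookups and narrow scans tend to favor Phoenix, while full-table analytics favor ORC.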
Labels:
- Apache HBase
- Apache Hive
- Apache Phoenix
06-09-2016
03:26 PM
Are there functions out there that utilize something like the Accumulator interface in Pig, where the data doesn't have to stay in memory?
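For reference, Spark's aggregateByKey is the closest analogue to Pig's Accumulator interface: values for a key are folded into a small running accumulator one at a time, so the whole bag for a key never has to be materialized. A minimal sketch (the app name and sample data are hypothetical):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object IncrementalAgg {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("accumulator-style"))

    val pairs = sc.parallelize(Seq(("a", 1.0), ("a", 2.0), ("b", 3.0)))

    // The accumulator is a small (sum, count) pair: seqOp folds one value in
    // at a time and combOp merges partial accumulators across partitions, so
    // the full list of values per key is never held in memory at once.
    val sumCount = pairs.aggregateByKey((0.0, 0L))(
      (acc, v) => (acc._1 + v, acc._2 + 1),
      (l, r)   => (l._1 + r._1, l._2 + r._2)
    )

    sumCount.mapValues { case (s, c) => s / c }.collect().foreach(println)
    sc.stop()
  }
}
```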
Labels:
- Apache Pig
- Apache Spark
06-08-2016
09:15 PM
@Ancil McBarnett Is there an upgrade? HDP 2.4 was just mentioned as supported by EMC.
06-07-2016
11:31 PM
Found it: http://hortonworks.com/hadoop-tutorial/how-to-use-hcatalog-basic-pig-hive-commands/#explore-pig-latin-data-transformation See section 5.4, "Save the script and execute it": first add the -useHCatalog argument (case sensitive) using the box in the bottom right-hand corner. At the top of the screen, make sure the box "Execute on Tez" is checked. Then click Execute to run the script; this action creates one or more Tez jobs. @Revlin Abbi Make sure you add the -useHCatalog argument.
06-07-2016
11:29 PM
Same here with the latest sandbox; it won't work from Ambari.
06-07-2016
06:34 PM
You need to engage Hortonworks Professional Services and your current Pivotal support team. If you are in the Northeast, drop me an email and I can help you engage those teams. You can also engage WANdisco, who have some pretty cool migration software: http://hortonworks.com/blog/migration-to-hdp-as-easy-as-1-2-3-without-downtime-or-disruption/
06-07-2016
02:11 AM
Any example on GitHub?
06-06-2016
08:48 PM
1 Kudo
If I am only concerned with the performance of HiveQL queries, and not about storage space, is it better to create my tables with no compression? http://stackoverflow.com/questions/32373460/parquet-vs-orc-vs-orc-with-snappy What if my data size is a few hundred gigs? Terabytes? A petabyte? Same answer? Let's assume sane queries returning a few thousand rows, a dozen columns or so, a few items in a WHERE clause, and an ORDER BY. Interactive queries for Tableau.
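One way to test this empirically, sketched below with hypothetical table and column names: ORC's codec is set per table through the standard 'orc.compress' property (NONE, SNAPPY, or ZLIB), so you can build three copies of the same data and time a representative Tableau-style query against each. Shown via a Hive-enabled SparkSession (assuming Spark 2.x), but the same DDL runs directly in Hive.

```scala
import org.apache.spark.sql.SparkSession

object OrcCompressionBench {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("orc-compression-bench")
      .enableHiveSupport()
      .getOrCreate()

    // Three copies of the same (hypothetical) source table, one per codec.
    for (codec <- Seq("NONE", "SNAPPY", "ZLIB")) {
      spark.sql(
        s"""CREATE TABLE sales_${codec.toLowerCase}
           |STORED AS ORC TBLPROPERTIES ('orc.compress'='$codec')
           |AS SELECT * FROM sales_raw""".stripMargin)
    }

    // Time a representative interactive query against each copy.
    for (codec <- Seq("none", "snappy", "zlib")) {
      val start = System.nanoTime()
      spark.sql(
        s"""SELECT region, SUM(amount) AS total
           |FROM sales_$codec
           |WHERE sale_year = 2016
           |GROUP BY region
           |ORDER BY total DESC""".stripMargin).collect()
      println(s"$codec: ${(System.nanoTime() - start) / 1e9} s")
    }
  }
}
```

The usual trade-off: NONE saves CPU on decompression but reads more bytes off disk, so for I/O-bound interactive queries ZLIB or SNAPPY often wins despite the extra CPU.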
Labels:
- Apache Hive
06-06-2016
07:49 PM
I didn't want to use Hive Streaming at this point. I was really focusing on Spark and NiFi. Just curious about Pig, Sqoop and other tools in the HDP stack.
06-06-2016
06:46 PM
1 Kudo
I would have liked to use Apache NiFi, but that is not yet available in the current version (coming soon). I can do it from Sqoop, Pig, Spark, ... Any other options? For relational databases in bulk, Sqoop seems like a solid option. For real-time, Spark Streaming? For batch, Pig? I am looking for performance, but also ease of use and a minimal amount of coding.
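As a point of comparison for the Spark option, a minimal bulk-pull sketch (assuming Spark 2.x; the connection details, table, and split column are all hypothetical): Spark's JDBC source can partition the read much like Sqoop's --split-by, then land the result directly in a Hive ORC table.

```scala
import org.apache.spark.sql.SparkSession

object JdbcToHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("jdbc-to-hive")
      .enableHiveSupport()
      .getOrCreate()

    // Parallel read: the source table is split into 8 ranges on order_id,
    // comparable to Sqoop's --split-by / --num-mappers.
    val df = spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://db-host:5432/sales") // hypothetical
      .option("dbtable", "public.orders")                    // hypothetical
      .option("user", "etl")
      .option("password", "secret")
      .option("partitionColumn", "order_id")
      .option("lowerBound", "1")
      .option("upperBound", "10000000")
      .option("numPartitions", "8")
      .load()

    // Land the result as an ORC-backed Hive table.
    df.write.mode("overwrite").format("orc").saveAsTable("orders_hive")
  }
}
```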
Labels:
- Apache Hive
- Apache Pig
- Apache Spark