Member since 01-28-2016
2 Posts
0 Kudos Received
0 Solutions
10-19-2017 01:12 PM
The way I'm reducing the file (by customer ID) is not how it will be accessed in Hive; access will be by YearMon and CustomerId. So maybe if I use both of those as my key, it will work as I intend? Year/month would be the higher level and customer ID a subset. Example output: /user/awesome/productorders/yearmon=201510/customerid=123456/data... Then I'll set up an external Hive table to reference that HDFS file, right? I actually wanted to use Spark, but we are on Spark 1.6.4 and I needed to finish before our scheduled upgrade. And as of this writing, I'm a much better MapReduce programmer than Spark programmer (but looking to change that).
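To make the directory layout in the example output concrete, here is a minimal sketch in plain Java (no Hadoop imports) that builds the Hive-style relative path for one year/month and customer. The class name `HivePartitionPath`, the method `relativePath`, and the file name `part` are hypothetical and only for illustration. In a real job, a string like this could be passed as the `baseOutputPath` argument of `MultipleOutputs.write(key, value, baseOutputPath)` from the reducer, so the directories are produced in the same job; that wiring is not shown here and would need to be checked against your Hadoop version.

```java
import java.util.Objects;

// Hypothetical helper, not part of Hadoop or Hive.
final class HivePartitionPath {
    private HivePartitionPath() {}

    // Builds "yearmon=<yyyyMM>/customerid=<id>/<fileName>", relative to the table root.
    static String relativePath(int yearMon, long customerId, String fileName) {
        Objects.requireNonNull(fileName, "fileName");
        int month = yearMon % 100;
        // Reject values that are not a six-digit yyyyMM with a month of 01..12.
        if (yearMon < 100000 || yearMon > 999999 || month < 1 || month > 12) {
            throw new IllegalArgumentException("yearMon must be yyyyMM, got " + yearMon);
        }
        return "yearmon=" + yearMon + "/customerid=" + customerId + "/" + fileName;
    }

    public static void main(String[] args) {
        // Table root taken from the example output in the post; "part" is an assumed file name.
        String root = "/user/awesome/productorders/";
        System.out.println(root + relativePath(201510, 123456L, "part"));
    }
}
```

The year/month directory comes first so that it is the higher-level partition, with the customer ID nested beneath it, matching the example output.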
10-18-2017 07:04 PM
I have a MapReduce job that calculates metrics for a customer through time (for each month that they are a customer with our company). Once it completes, the access pattern will likely be by year and month, so I was thinking that I should create a partition based on that. I'm new to custom partitioning, and from what I'm reading, the partition step comes before the reduce step (where I need the key to be customer ID). I'm wondering if there is a way to create the partition after the reduce step without running another MapReduce job afterward. Thanks,
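For context on the partition step mentioned above: in Hadoop MapReduce, a `Partitioner` only decides which reduce task receives a given key, which is a separate concern from Hive's directory partitions. The sketch below, in plain Java with no Hadoop imports, shows the shape of that decision for a key that carries both customer ID and year/month while still routing by customer ID alone. The class `CustomerMonthKey` and the method `reducerFor` are hypothetical names; in a real job the same arithmetic would sit in a `Partitioner` subclass's `getPartition(key, value, numPartitions)`.

```java
import java.util.Objects;

// Hypothetical composite key: customer ID plus a yyyyMM month.
final class CustomerMonthKey implements Comparable<CustomerMonthKey> {
    final long customerId;
    final int yearMon;

    CustomerMonthKey(long customerId, int yearMon) {
        this.customerId = customerId;
        this.yearMon = yearMon;
    }

    // Sort by customer first, then by month, so one customer's months arrive in order.
    @Override
    public int compareTo(CustomerMonthKey other) {
        int byCustomer = Long.compare(customerId, other.customerId);
        return byCustomer != 0 ? byCustomer : Integer.compare(yearMon, other.yearMon);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof CustomerMonthKey)) return false;
        CustomerMonthKey k = (CustomerMonthKey) o;
        return customerId == k.customerId && yearMon == k.yearMon;
    }

    @Override
    public int hashCode() {
        return Objects.hash(customerId, yearMon);
    }

    // Picks a reducer index from the customer ID only, ignoring the month,
    // so every month of one customer lands on the same reduce task.
    static int reducerFor(CustomerMonthKey key, int numReducers) {
        return (Long.hashCode(key.customerId) & Integer.MAX_VALUE) % numReducers;
    }
}
```

With a composite key like this, the reducer would by default see each (customer, month) pair as its own group; grouping all months of one customer into a single `reduce` call would additionally need a grouping comparator on customer ID, which is not shown.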
Labels:
- Apache Hadoop