Member since: 08-10-2016
Posts: 170
Kudos Received: 14
Solutions: 6
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 19977 | 01-31-2018 04:55 PM |
| | 4287 | 11-29-2017 03:28 PM |
| | 1902 | 09-27-2017 02:43 PM |
| | 2056 | 09-12-2016 06:36 PM |
| | 1982 | 09-02-2016 01:58 PM |
08-08-2017
03:57 PM
There are a couple of ways you could solve this problem.

1) Explode and duplicate the data. Hive is big data; it's NoSQL. You don't have to solve this problem in a SQL way. You could explode the data and see if you get a performance increase. (Don't forget to choose good partitions.) This may feel wrong, but when you're using big data the rules change; you don't have to solve this with SQL.
2) lateral view explode - This may give you some of the table structure you need (see the sketch after this list).
3) Look at using JSON with Hive. This would enable you to keep the nested hierarchical data you are talking about.
4) A combination of some of the above.

Hope this helps.
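As a rough illustration of option 2, here is a minimal sketch; the orders table and its items array column are hypothetical stand-ins for your own schema:

```sql
-- Hypothetical table: orders(order_id STRING, items ARRAY<STRING>)
-- LATERAL VIEW explode() emits one output row per array element,
-- flattening the nested data into a plain tabular shape.
SELECT o.order_id, item
FROM orders o
LATERAL VIEW explode(o.items) item_view AS item;
```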
07-25-2017
02:21 PM
Did you end up resolving this?
07-24-2017
09:39 PM
1 Kudo
Hmmm, are you using bucketing by chance?
If you are, this could be a similar issue to this bug. If you aren't, then we're barking up the wrong tree.
07-14-2017
02:18 PM
Your log4j settings are incorrect, and that's what's throwing the error in your log. Check this property in your log4j settings, specifically the one that looks like the line below. It sounds like log4j is trying to write to the container folder, and that's causing the issue (well, at least it's causing the error in your log).

log4j.appender.[category].File

Hope this helps!
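For reference, a hedged sketch of what that entry usually looks like in log4j.properties; the appender name RFA and the path here are hypothetical, so substitute your own and make sure the directory is writable by the service user:

```
# Hypothetical appender name and file path; point File at a
# directory the process can actually write to.
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=/var/log/myapp/myapp.log
```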
06-24-2017
05:33 PM
> When I reload this table by creating another table by doing select on target table, this newly created table does not contain too many small files. What could be the reason?
There are lots of factors that go into how many files are output; in map-reduce terms, how many reducers are used. Depending on the workload type, each job may have been allowed a different number of reducers. Beyond that: did you use the same engine to create both tables (Spark and Hive, or just Hive in both cases)? Did you run both jobs under the same user, with the same config? So yes, lots of things could affect it. A sketch of the relevant knobs follows below.
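If you want to nudge Hive toward fewer, larger files, these are standard Hive merge settings, though the values here are only illustrative, not recommendations:

```sql
-- Ask Hive to merge small output files at the end of the job.
SET hive.merge.mapfiles=true;                 -- merge outputs of map-only jobs
SET hive.merge.mapredfiles=true;              -- merge outputs of map-reduce jobs
SET hive.merge.smallfiles.avgsize=128000000;  -- merge when avg output file < ~128 MB
SET hive.merge.size.per.task=256000000;       -- aim for ~256 MB per merged file
```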
06-24-2017
05:27 PM
My links aren't showing up, but you can find discussion on this site about bucketing.
06-24-2017
05:17 PM
Maybe consider using clustering/bucketing? It allows you to specify a fixed number of files per partition; there is a sketch below. See also the answer on: How many buckets does your table need.
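As a minimal sketch, assuming a hypothetical events table bucketed on user_id; the names and the bucket count of 32 are placeholders, so size the count for your own data volume:

```sql
-- Rows are hashed on user_id into a fixed 32 buckets (files) per partition.
CREATE TABLE events_bucketed (
  user_id BIGINT,
  event   STRING
)
PARTITIONED BY (dt STRING)
CLUSTERED BY (user_id) INTO 32 BUCKETS;

-- On older Hive versions, bucketing must also be enforced at insert time:
SET hive.enforce.bucketing=true;
```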
12-09-2016
08:41 PM
Oozie is a little old school. Have you thought about using HDF or Apache Falcon? They are both a little more feature-rich. What are you trying to do?
11-14-2016
07:38 PM
For deleting a column or performing transformations on null values, I would use the UpdateAttribute processor. If you want to delete a row, I'd use RouteOnAttribute; there is a rough sketch below. (I would then route the matched rows to HDFS to log that I 'deleted' them.) I hope this helps.
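As a rough sketch of the routing side, here is a dynamic property on RouteOnAttribute using NiFi Expression Language; the attribute name my.column is hypothetical and stands in for whatever field you have extracted:

```
# Dynamic property on RouteOnAttribute:
#   name:  null.rows
#   value: matches flowfiles whose attribute is missing or blank
${my.column:isEmpty()}
```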
10-05-2016
07:38 PM
@Gobi Subramani I would suggest that you download and install HDF. It can handle creating the data flow for you; here's an example of it collecting logs. Instead of writing to an event bus, you could use the PutHDFS processor and it would write to HDFS for you. There isn't a lot of trickery to get the date/folder naming to work: you just need ${now()} in place of the folder name to get the scheme you are looking for (see the sketch below). If you look around there are lots of walkthroughs and templates. I have included a pic of a simple flow that would likely solve your issue.
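For instance, the PutHDFS Directory property accepts the expression language directly; the /data/logs base path and the date pattern here are just placeholders:

```
# PutHDFS "Directory" property: one folder per day, named at write time
/data/logs/${now():format('yyyy-MM-dd')}
```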