Member since: 08-10-2016
Posts: 170
Kudos Received: 14
Solutions: 6
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 19977 | 01-31-2018 04:55 PM |
| | 4287 | 11-29-2017 03:28 PM |
| | 1902 | 09-27-2017 02:43 PM |
| | 2056 | 09-12-2016 06:36 PM |
| | 1982 | 09-02-2016 01:58 PM |
08-08-2017
03:57 PM
There are a couple of ways you could solve this problem.

1) Explode and duplicate the data. Hive is big data; it's NoSQL. You don't have to solve this problem in a SQL way. You could explode the data and see if you get a performance increase. (Don't forget to choose good partitions.) This may feel wrong, but when you're using big data the rules change; you don't have to solve this with SQL.
2) lateral view explode - This may give you some of the table structure you need (see the sketch after this list).
3) Look at using JSON with Hive. This would enable you to keep the nested hierarchical data you are talking about.
4) A combination of some of the above.

Hope this helps.
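As a rough illustration of option 2, here is a minimal sketch; the orders table and its items array column are hypothetical stand-ins for your own schema:

```sql
-- Hypothetical table: orders(order_id STRING, items ARRAY<STRING>)
-- LATERAL VIEW explode() emits one output row per array element,
-- flattening the nested data into a plain tabular shape.
SELECT o.order_id, item
FROM orders o
LATERAL VIEW explode(o.items) item_view AS item;
```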
07-25-2017
02:21 PM
Did you end up resolving this?
07-24-2017
09:39 PM
1 Kudo
Hmmm, are you using bucketing by chance?
If you are, this could be a similar issue to this bug. If you aren't, then we're barking up the wrong tree.
07-14-2017
02:18 PM
Your log4j settings are incorrect, and that's what's throwing the error in your log. Check this property in your log4j settings, specifically the one that looks like the line below. It sounds like log4j is trying to write to the container folder, and that's causing the issue (well, at least it's causing the error in your log).

log4j.appender.[category].File

Hope this helps!
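For reference, a hedged sketch of what that entry usually looks like in log4j.properties; the appender name RFA and the path here are hypothetical, so substitute your own and make sure the directory is writable by the service user:

```
# Hypothetical appender name and file path; point File at a
# directory the process can actually write to.
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=/var/log/myapp/myapp.log
```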
06-24-2017
05:33 PM
> When I reload this table by creating another table by doing select on target table, this newly created table does not contain too many small files. What could be the reason?
There are lots of factors that go into how many files are output; in map-reduce terms, how many reducers are used. Depending on the workload type, each job may have been allowed a different number of reducers. Beyond that: did you use the same engine to create both tables (Spark and Hive, or just Hive in both cases)? Did you run both jobs under the same user, with the same config? So yes, lots of things could affect it. A sketch of the relevant knobs follows below.
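If you want to nudge Hive toward fewer, larger files, these are standard Hive merge settings, though the values here are only illustrative, not recommendations:

```sql
-- Ask Hive to merge small output files at the end of the job.
SET hive.merge.mapfiles=true;                 -- merge outputs of map-only jobs
SET hive.merge.mapredfiles=true;              -- merge outputs of map-reduce jobs
SET hive.merge.smallfiles.avgsize=128000000;  -- merge when avg output file < ~128 MB
SET hive.merge.size.per.task=256000000;       -- aim for ~256 MB per merged file
```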
06-24-2017
05:27 PM
My links aren't showing up, but you can find discussion on this site about bucketing.
06-24-2017
05:17 PM
Maybe consider using clustering/bucketing? It allows you to specify a fixed number of files per partition; there is a sketch below. See also the answer on: How many buckets does your table need.
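As a minimal sketch, assuming a hypothetical events table bucketed on user_id; the names and the bucket count of 32 are placeholders, so size the count for your own data volume:

```sql
-- Rows are hashed on user_id into a fixed 32 buckets (files) per partition.
CREATE TABLE events_bucketed (
  user_id BIGINT,
  event   STRING
)
PARTITIONED BY (dt STRING)
CLUSTERED BY (user_id) INTO 32 BUCKETS;

-- On older Hive versions, bucketing must also be enforced at insert time:
SET hive.enforce.bucketing=true;
```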
12-09-2016
08:41 PM
Oozie is a little old school. Have you thought about using HDF or Apache Falcon? They are both a little more feature-rich. What are you trying to do?
11-14-2016
07:38 PM
For deleting a column or performing transformations on null values, I would use the UpdateAttribute processor. If you want to delete a row, I'd use RouteOnAttribute; there is a rough sketch below. (I would then route the matched rows to HDFS to log that I 'deleted' them.) I hope this helps.
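As a rough sketch of the routing side, here is a dynamic property on RouteOnAttribute using NiFi Expression Language; the attribute name my.column is hypothetical and stands in for whatever field you have extracted:

```
# Dynamic property on RouteOnAttribute:
#   name:  null.rows
#   value: matches flowfiles whose attribute is missing or blank
${my.column:isEmpty()}
```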
10-05-2016
07:38 PM
@Gobi Subramani I would suggest that you download and install HDF. It can handle creating the data flow for you; here's an example of it collecting logs. Instead of writing to an event bus, you could use the PutHDFS processor and it would write to HDFS for you. There isn't a lot of trickery to get the date/folder naming to work: you just need ${now()} in place of the folder name to get the scheme you are looking for (see the sketch below). If you look around there are lots of walkthroughs and templates. I have included a pic of a simple flow that would likely solve your issue.
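For instance, the PutHDFS Directory property accepts the expression language directly; the /data/logs base path and the date pattern here are just placeholders:

```
# PutHDFS "Directory" property: one folder per day, named at write time
/data/logs/${now():format('yyyy-MM-dd')}
```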