Posts: 9
Topics: 1
Kudos: 20
Solutions: 0
Registered: ‎08-26-2016

Ask me anything!

I'm opening this topic some days early to let folks enter questions, and to let the community vote on them with kudos. I'll try to hit the most popular ones first, but will continue to take questions all through the AMA on Tuesday. As always, please pay attention to the Community Forum guidelines in your posts and responses.

New Contributor
Posts: 1
Registered: ‎08-26-2016

Re: Ask me anything!

Sir,

I have a question about the Hadoop framework.
- What is the difference between FileStatus and Inode of a file?
- How can we get the Inode ID from a FileStatus object?
New Contributor
Posts: 6
Registered: ‎08-30-2016

Re: Ask me anything!

Hello Mike, founding Cloudera was a big leap forward that required vision and insight. When was that AHA MOMENT when you realized that Hadoop was going to be the future?

New Contributor
Posts: 4
Registered: ‎12-17-2015

Re: Ask me anything!

 - Will Cloudera continue to support Kitten (https://github.com/cloudera/kitten) now that its creator has left the organisation? 

 - When will the CSD format support custom statistics in Cloudera Manager?

New Contributor
Posts: 2
Registered: ‎05-01-2015

Cloudera & Apache Kafka

 

First, I would like to congratulate you on what you and the team have built at Cloudera.

Even now that the company is maturing, there is a DNA of deep expertise & vision with regard to the products and technologies; and most importantly, you keep customers and their use cases at the center.

 

Cloudera quickly understood the importance of (interactive) SQL and invested heavily in the area. You were also early to understand and discuss the importance of Apache Spark and showed the ability to move beyond the core technologies. Finally, you show with Apache Kudu that, if needed, you will invest in technology to address specific needs.

 

I am interested to understand what you think about the importance of Apache Kafka?

What do you think of the newer components like Connect and Streams?

What should Cloudera’s role be in this community? What does Cloudera want to achieve in it?

Are you happy with your current position and progress against your plans?

How will you benchmark yourself in this area (against established Apache Hadoop players or against the specialized start-ups)?

 

I might be biased here, because I truly believe in the power of Apache Kafka. I believe it can grow the big data market and enable many new use cases and at the same time drastically simplify some of the classic architectural patterns.

I am a bit concerned that Cloudera does not share (or act on) this view. Apache Kafka does not get that much ‘airtime’ in keynotes, blogs or roadmap sessions.

There is no such thing as a One Platform Initiative for Apache Kafka with clear ambitions and roadmap.

 

Looking forward to getting your insights on this!

 

New Contributor
Posts: 1
Registered: ‎08-30-2016

Re: Ask me anything!

Mike,
Kudos for building an amazing company by recognizing an industry trend and its potential way before anybody else did!

My understanding is that Cloudera has built its market dominance primarily on greenfield opportunities, i.e., new applications and new domains. In order to tap into the $40+B monster of the data management market, however, greenfield won’t get you very far? (All database startups of the past 15 years are testament to the limits of greenfield – and ended up getting acquired)

What is Cloudera doing to migrate applications off of incumbent systems in order to take market share from, say, Teradata or Oracle? Are you developing tools like Amazon’s Database Migration Services or Datometry’s Hyper-Q to make re-platforming to Cloudera not only attractive but feasible for enterprises?

 

Thanks!

 

Posts: 9
Topics: 1
Kudos: 20
Solutions: 0
Registered: ‎08-26-2016

Re: Ask me anything!


Swatishsa wrote:
Sir,

I have a question about the Hadoop framework.
- What is the difference between FileStatus and Inode of a file?
- How can we get the Inode ID from a FileStatus object?

Hey Swatishsa,

 

Thanks for being my first questioner!

 

Two answers:

 

First, an Inode is a server-side object, used to manage the block storage, allocation and recovery services in HDFS. A FileStatus is a client-side object, to work with contents of files, pathnames and so on. There's no public method for getting the associated server-side Inode from a FileStatus object.

 

Second, no one has paid me to write code since 1997. The answer above is from growing up UNIX, and searching the docs. It's a totally legitimate question, but one better posed to the developer community here, on one of the forum topics that has to do with storage.

 

Hope this helps!

Posts: 9
Topics: 1
Kudos: 20
Solutions: 0
Registered: ‎08-26-2016

Re: Ask me anything!

cragius wrote:

 - Will Cloudera continue to support Kitten (https://github.com/cloudera/kitten) now that its creator has left the organisation? 

 - When will the CSD format support custom statistics in Cloudera Manager?


We don't support Kitten, in the sense that you can call or email us with a problem and get us to open a support case. We never have. We have a collection of open source projects, generally available on a variety of git repositories, that we publish to simplify use of the platform, to give people ideas on how to use the software, and to encourage others to create and share code in the same way.

 

Kitten's a good example of that. There are lots of others that we publish by way of Cloudera Labs, http://www.cloudera.com/developers/cloudera-labs.html, but folks inside and outside the company often share code in their personal or professional git repositories as well.

 

We're not actively developing Kitten at this point, but that doesn't mean it's moribund. If you like it, we encourage you to use and extend it.

 

To your second question, on custom stats reporting: I can't share a committed release date for any particular CM feature because it screws up our revenue recognition if I tell you a date, someone in the world buys the product because of that promise and we miss it. Big-company problem, I know, but the finance team totally wigs out if I do that. I can tell you that installation of third-party server-side code via Cloudera Manager is a key differentiator for us in the market, and we recognize the value of reporting back statistics that are generated by, and specifically relevant to, that server-side code. I am a big fan of the idea.

 

It's one of many future enhancements we're tracking. If you're a Cloudera customer, you can get a detailed product roadmap briefing under non-disclosure from your sales team. They'll pull in the right technical field or product management folks to discuss futures.

Explorer
Posts: 13
Registered: ‎10-28-2013

Re: Ask me anything!

Hello Mike,

 

I would also like to congratulate you and the Cloudera team on some wonderful achievements.

 

I always look forward to seeing Amr Awadallah, and sometimes Sean Owen, at the Cloudera London Sessions.

 

I do have a simple technical question that maybe you can get some of the Cloudera engineers to look at:

 

My Mahout arff.vector command produces NaN output for real and double values but works with integer values,

i.e., it works with input data like 2,1,3,1,15, ... but not with input data like 2.5,1.6,5.00,2.8,1.11, ... Is there a simple solution?

I am using Cloudera CDH5 Version 5.6.0-1.cdh5.6.0.p0.45 and Mahout Version 0.9+cdh5.6.0+26.

 

With regard to Cloudera and Mahout, I am sad to see the Mahout MapReduce implementations deprecated, but I am happy to hear you are moving on to new pastures with Apache Spark.

 

I am wondering how long one will be able to use the Mahout MapReduce algorithms with Cloudera's CDH distribution.

Posts: 9
Topics: 1
Kudos: 20
Solutions: 0
Registered: ‎08-26-2016

Re: Ask me anything!


xmorera wrote:

Hello Mike, founding Cloudera was a big leap forward that required vision and insight. When was that AHA MOMENT when you realized that Hadoop was going to be the future?


It was the week of June 9, 2008. Seriously.

 

But here's the more detailed backstory.

 

I have been doing database work since the middle 1980s, commercially and in academia. I worked on POSTGRES at Berkeley before it turned into PostgreSQL, and had jobs at a bunch of companies -- Britton Lee, Illustra, Informix, Sleepycat, Oracle -- over the course of twenty-five years or so.

 

Oracle acquired Sleepycat in 2006, and I worked at the big company for a couple of years before striking out to do something new. When I left Oracle, I wanted to try something different and looked for ideas that weren't right smack in the heart of the relational database market.

 

I had read the Google papers on the Google File System and MapReduce in the early 2000s, but like most of the industry, paid little attention at the time. We were doing enterprise-grade database work at Sleepycat and Oracle, and the consumer companies were doing something different. We didn't really recognize it as data management per se.

 

When I left Oracle, though, I started looking around for interesting stuff to work on. Some investors from Accel introduced me to Amr Awadallah at Yahoo! and Jeff Hammerbacher at Facebook. I knew Christophe Bisciglia at Google separately, from the conference circuit. I had a chance to spend time with Jeff in particular, learning about what FB was doing with big data. I want to tell you that the light came on in one searing incident of illumination, but it actually took me a week or so to get my head wrapped around it.

 

I hadn't thought of GFS+MapReduce as a database system, but that's how Jeff and Amr were using it. When I recognized the kinds of problems they were solving -- advanced analytics at massive scale -- the heavens parted and the chorus sang. I was instantly convinced this would be a big deal for traditional enterprises.

 

All three of my co-founders had independently reached the same conclusion. We were lucky to find one another and team up; we were the only team working on Apache Hadoop-based big data for enterprises for a year or two, and that solo position in the market was a big advantage to us.
