Unstructured Data processing

New Contributor

Hi,

I'm wondering what processing techniques are available for typical unstructured data within the Hadoop ecosystem. For example, is there a processing framework that supports images, audio, video, etc.?

  • Is there something available in any of the existing engines?
  • If nothing is readily available, are there any third-party commercial vendors who provide such a capability?
  • Or should it be completely custom built according to the use case?

For example, if it's just a matter of extracting metadata, Tika / Lucene can be used. However, if I have to process image files to look for some object, or process CCTV footage to look for suspicious entities, how can that be done with the data stored in HDFS?

Many Thanks

8 REPLIES

Re: Unstructured Data processing

Mentor

Re: Unstructured Data processing

New Contributor

@Artem Ervits Thanks for your response. I believe that to make use of OpenCV, we would need to go through the Hadoop Streaming API? Alternatively, JavaCV might be usable. Overall, I think images can be handled more easily than more complex types like audio and video.
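To make the Hadoop Streaming idea concrete, here is a rough sketch of what a Python mapper might look like. It assumes the job input is a text file listing one HDFS path per line and that each file is pulled with `hdfs dfs -cat`. The names `fetch_from_hdfs`, `detect_format` and `run_mapper` are made up for illustration, and `detect_format` is only a placeholder that checks magic bytes; a real job would decode the bytes there (for example with OpenCV) and run its own detector.

```python
import subprocess
import sys


def fetch_from_hdfs(path):
    """Return the raw bytes of an HDFS file via the `hdfs dfs -cat` CLI."""
    return subprocess.run(
        ["hdfs", "dfs", "-cat", path], check=True, stdout=subprocess.PIPE
    ).stdout


def detect_format(data):
    """Placeholder analysis: identify the image format from its magic bytes.

    Assumption: a real job would decode `data` here (e.g. with OpenCV)
    and run an object detector; this stand-in keeps the sketch
    dependency-free.
    """
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    return "unknown"


def run_mapper(lines, fetch=fetch_from_hdfs, analyze=detect_format):
    """Yield one tab-separated `path<TAB>result` record per input path."""
    for line in lines:
        path = line.strip()
        if not path:
            continue  # skip blank lines in the path list
        yield f"{path}\t{analyze(fetch(path))}"


if __name__ == "__main__":
    # Hadoop Streaming feeds input records on stdin and reads
    # key<TAB>value records from stdout.
    for record in run_mapper(sys.stdin):
        print(record)
```

A script like this would be handed to Hadoop Streaming through its `-mapper` option. Passing `fetch` and `analyze` as parameters is just a convenience so the per-file logic can be tried locally without a cluster.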

Is there any similar capability for audio and video?

Thanks

Re: Unstructured Data processing

Mentor

Search on HCC or Stack Overflow; that's what I would do :). There has to be something, as we have customers doing these types of use cases. @Greenhorn Techie

Re: Unstructured Data processing

Mentor

@Greenhorn Techie take a look at http://keystone-ml.org/ as well; it has an option to convert speech to text.

Re: Unstructured Data processing

Mentor

@Greenhorn Techie what did you end up with? We want to hear about your solution.

Re: Unstructured Data processing

@Greenhorn Techie This is a good blog post on this use case: Link
