Unstructured Data processing

Hi,

I'm wondering what processing techniques are available for typical unstructured data in the Hadoop ecosystem. For example, is there a processing framework that supports processing images, audio, video, etc.?

  • Is there something available in any of the existing engines?
  • If nothing is readily available, are there third-party commercial vendors that provide such a capability?
  • Or does it have to be completely custom-built for each use case?

For example, if it's just a matter of extracting metadata, Tika / Lucene can be used. However, if I have to process an image file to look for some object, or process CCTV footage to look for suspicious entities, how can I do that with data stored in HDFS?

Many Thanks
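To make the distinction in the question concrete: metadata usually sits in a file's header and can be read without decoding the content, whereas finding an object means decoding every pixel. Tika does header-level extraction across many formats; as a minimal stdlib-only illustration (not Tika's API), here is a sketch that reads image dimensions from a PNG IHDR header. It assumes PNG input; the function name `png_metadata` and the returned keys are illustrative.

```python
import struct
from typing import Dict, Optional

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_metadata(data: bytes) -> Optional[Dict[str, int]]:
    """Return basic metadata from a PNG header, or None if the bytes aren't a PNG."""
    # Layout: signature (8) + IHDR length (4) + b"IHDR" (4) + width, height,
    # bit depth, colour type (4 + 4 + 1 + 1 = 10 bytes), so we need 26 bytes.
    if len(data) < 26 or not data.startswith(PNG_SIGNATURE) or data[12:16] != b"IHDR":
        return None
    # PNG integers are big-endian.
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return {
        "width": width,
        "height": height,
        "bit_depth": bit_depth,
        "color_type": color_type,
    }
```

Only the first 26 bytes are touched, which is why metadata extraction is cheap even over millions of files; object detection on the same files needs a full decode (e.g. with OpenCV) and is a much heavier batch job.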

Re: Unstructured Data processing

Re: Unstructured Data processing

@Artem Ervits Thanks for your response. I believe that to use OpenCV we would need the Hadoop Streaming API? Alternatively, JavaCV might be usable. Overall, I think image processing is better supported than more complex types such as audio and video.

Is there any similar capability for audio and video?

Thanks
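On the Hadoop Streaming route: Streaming runs any executable that reads records from stdin and writes `key<TAB>value` lines to stdout. Because the default text input format splits records on newlines, raw image bytes don't survive the trip through stdin intact, so a common workaround is to make the job input a text file listing one HDFS path per line and have the mapper fetch each file itself. A minimal sketch under those assumptions: `analyze_image` is a hypothetical placeholder where the OpenCV calls would go, and fetching via the `hdfs dfs -cat` CLI assumes `hdfs` is on the PATH of every worker node.

```python
#!/usr/bin/env python3
"""Hadoop Streaming mapper sketch: each input line is an HDFS path to an image."""
import subprocess
import sys
from typing import Callable, Iterable, Iterator, Tuple


def read_hdfs_bytes(path: str) -> bytes:
    # Fetch the raw file through the HDFS CLI (assumed to be on PATH).
    return subprocess.run(
        ["hdfs", "dfs", "-cat", path], check=True, capture_output=True
    ).stdout


def analyze_image(data: bytes) -> str:
    # Hypothetical placeholder. Real code would decode with OpenCV, e.g.
    # cv2.imdecode(numpy.frombuffer(data, numpy.uint8), cv2.IMREAD_COLOR),
    # then run a detector. Here we only report the byte count.
    return f"bytes={len(data)}"


def map_paths(
    lines: Iterable[str],
    fetch: Callable[[str], bytes] = read_hdfs_bytes,
    analyze: Callable[[bytes], str] = analyze_image,
) -> Iterator[Tuple[str, str]]:
    """Yield (path, result) pairs, one per non-blank input line."""
    for line in lines:
        path = line.strip()
        if not path:
            continue  # skip blank lines
        try:
            yield path, analyze(fetch(path))
        except Exception as exc:  # record the failure instead of killing the task
            yield path, f"error={type(exc).__name__}"


if __name__ == "__main__":
    for key, value in map_paths(sys.stdin):
        print(f"{key}\t{value}")  # Streaming expects key<TAB>value
```

A map-only run might look like `hadoop jar <path-to>/hadoop-streaming.jar -files mapper.py -input image_paths.txt -output results -mapper "python3 mapper.py" -numReduceTasks 0`, where the jar location varies by distribution and `image_paths.txt` / `results` are illustrative names. Passing `fetch` and `analyze` as parameters keeps the mapper testable without a cluster.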

Re: Unstructured Data processing

Search on HCC or Stack Overflow; that's what I would do :). There has to be something, as we have customers doing these types of use cases. @Greenhorn Techie

Re: Unstructured Data processing

@Greenhorn Techie also take a look at http://keystone-ml.org/, which has an option to convert speech to text.

Re: Unstructured Data processing

@Greenhorn Techie what did you end up with? We want to hear about your solution.

Re: Unstructured Data processing

Re: Unstructured Data processing

@Greenhorn Techie This is one of the good blogs on this use case: Link

Re: Unstructured Data processing