Created 05-22-2016 04:14 PM
Hi,
I am new to Hadoop and the Big Data concept. I understand the basics of the technology and the idea behind it. I was trying to understand it better using the Pig tool and am looking for practical, real-life examples of where it can be used. How can we use it with different file types such as .csv, doc, audio and video files (streaming data), and relational databases? All I can find on the web are examples that process .csv files, which don't give any idea of the practical use of Hadoop or of how to process unstructured data.
If anybody can explain this in a better way or point me to some good resources, I would appreciate it.
Created 05-22-2016 05:56 PM
Hadoop is a distributed filesystem plus distributed compute, so you can store and process any kind of data. I know that a lot of examples point to CSV and DB imports since those are the most common use cases.
I will give a list of ways the data types you listed can be stored and processed in Hadoop. You can also look at blogs and public repos for examples.
1. CSV: Like you said, you will see a lot of examples, including in our sandbox tutorials; there is a small Pig sketch after this list.
2. doc: You can put raw 'doc' documents into HDFS and use Tika or Tesseract to extract text / do OCR on these documents.
3. Audio and video: You can again put the raw data in HDFS. Processing depends on what you want to do with it; for example, you can extract metadata from these files with a YARN application.
4. Relational DB: Take a look at the Sqoop examples for how to ingest a relational DB into HDFS and use Hive/HCatalog to access the data; see the Pig + HCatalog sketch after this list.
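Since you mentioned Pig, here is a minimal Pig Latin sketch for case 1. The file path and columns (a hypothetical sales.csv with date, store id and amount) are made up for illustration:

    -- Load a comma-separated file from HDFS (path and schema are hypothetical)
    sales = LOAD '/data/sales.csv' USING PigStorage(',')
            AS (sale_date:chararray, store_id:int, amount:double);

    -- Keep only rows with a positive amount
    valid = FILTER sales BY amount > 0.0;

    -- Total revenue per store
    by_store = GROUP valid BY store_id;
    totals = FOREACH by_store GENERATE group AS store_id, SUM(valid.amount) AS revenue;

    DUMP totals;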
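And for case 4, once Sqoop has landed the data in a Hive table, you can read it from Pig through HCatalog. This is only a sketch; the table name 'customers' and its 'country' column are hypothetical and assumed to have been created by the Sqoop import:

    -- Read a Hive table via HCatalog (table and column names are hypothetical)
    customers = LOAD 'customers' USING org.apache.hive.hcatalog.pig.HCatLoader();

    -- Simple check: count rows per country
    by_country = GROUP customers BY country;
    counts = FOREACH by_country GENERATE group AS country, COUNT(customers) AS num_customers;

    DUMP counts;

Run it with pig -useHCatalog so the HCatalog jars are on the classpath.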
Created on 05-22-2016 05:50 PM - edited 08-18-2019 04:55 AM
You are looking for this: http://hortonworks.com/solutions/
You will use Big Data tool sets to innovate and renovate.
Innovation is "making changes in something established."
Renovation is "the process of improving an outdated structure."