There are good discussions available on moving "unstructured data" to Hadoop, and they give good references.
- Sqoop in Hadoop is mostly used to extract structured data from databases such as Teradata, Oracle, etc. A good overview of this tool can be found at: https://hortonworks.com/apache/sqoop/
Apache Sqoop efficiently transfers bulk data between Apache Hadoop and structured datastores such as relational databases. Sqoop helps offload certain tasks (such as ETL processing) from the EDW to Hadoop for efficient execution at a much lower cost. Sqoop can also be used to extract data from Hadoop and export it into external structured datastores. Sqoop works with relational databases such as Teradata, Netezza, Oracle, MySQL, Postgres, and HSQLDB.
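As a minimal sketch of the import direction described above (the JDBC URL, credentials path, and table name here are placeholders, not from the original), a typical Sqoop import from MySQL into HDFS looks like this:

```shell
# Import the hypothetical "orders" table from a MySQL database into HDFS.
# Connection string, user, password file, and paths are all assumptions.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user \
  --password-file /user/etl/.db_password \
  --table orders \
  --target-dir /data/raw/orders \
  --num-mappers 4
```

The reverse direction (`sqoop export`) pushes data from HDFS back into a relational table, which matches the EDW-offload pattern mentioned above. This is a cluster command fragment, so it only runs against a live Hadoop installation.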
- Flume in Hadoop is used to ingest data from various sources and deals mostly with unstructured data. A good overview of this tool can be found at: https://hortonworks.com/apache/flume/
Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of streaming data into the Hadoop Distributed File System (HDFS). It has a simple and flexible architecture based on streaming data flows; and is robust and fault tolerant with tunable reliability mechanisms for failover and recovery.
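To make the source/channel/sink flow concrete, here is a hedged sketch of a single Flume agent that tails a log file into HDFS. The agent name, file paths, and component names are assumptions for illustration only:

```shell
# Hypothetical single-agent Flume configuration (agent name: a1)
# that tails an application log and writes events into HDFS.
cat > tail-to-hdfs.conf <<'EOF'
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: tail -F on an application log (exec source)
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app/app.log
a1.sources.r1.channels = c1

# Channel: in-memory buffer between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# Sink: write events into date-bucketed HDFS directories
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /data/raw/app-logs/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true
EOF

# Start the agent against that configuration
flume-ng agent --conf conf --conf-file tail-to-hdfs.conf --name a1
```

The memory channel here trades durability for speed; for the "tunable reliability" mentioned above, a file channel is the usual swap-in. This is a config fragment and needs a running Hadoop cluster to do anything.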
It's a broad question. Apart from structured vs. unstructured, you may have to look at other parameters such as frequency of ingestion, file size, whether it is an event or a batch load, and where you are picking the data up from.
Sqoop (structured) --> used to import RDBMS data into HDFS/Hive
Flume/Kafka/NiFi (unstructured) --> can be used to capture unstructured data into HDFS
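To sketch the unstructured path in the simplest terms (file names, topic name, and broker address below are assumptions): plain files can just be copied into HDFS, while streaming events might be published to Kafka first and landed in HDFS by a downstream consumer.

```shell
# Batch: copy raw files straight into HDFS
hdfs dfs -mkdir -p /data/raw/clickstream
hdfs dfs -put local-clicks.json /data/raw/clickstream/

# Streaming: publish events to a hypothetical Kafka topic; a consumer
# (e.g. a Flume Kafka source or a Spark job) would then write them to HDFS
kafka-console-producer.sh --broker-list localhost:9092 --topic clickstream < local-clicks.json
```

These are command fragments that assume a running HDFS namenode and Kafka broker; they only illustrate the batch-vs-event distinction raised above.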
Choosing the tool depends on many parameters beyond the ones mentioned above. Each tool has its own pros and cons, so you may have to dig deeper if this is for more than learning purposes. Hope it helps!!