I have a zip file with 10k Mera files and 10k data files. What is the best way to ingest this into Hive? The meta and data tables are separate.

Super Guru

What is a mera file?

@sunile.manjee Apologies for the typo; I meant meta files.

@Nilesh Shrimant

You should experiment with several ways of moving data from Data -> HDFS -> Hive.

In its simplest form, if your data is already clean and consistently formatted, you can upload it to HDFS and create an external Hive table on top of it, giving the CREATE EXTERNAL TABLE statement the settings it needs to read your data (row format, field delimiter, storage format, and location).
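As a rough sketch of the upload step: a Hive external table reads every file under its LOCATION, so the meta files and the data files need to be unpacked into separate directories before they go to HDFS. The Python below (stdlib only) assumes a hypothetical naming convention where meta files end in `.meta`; `split_zip` and the `staging` directory are illustrative names, not part of any tool.

```python
import os
import shutil
import zipfile


def split_zip(zip_path, out_dir, is_meta=lambda name: name.endswith(".meta")):
    """Unpack zip_path into out_dir/meta and out_dir/data.

    The default is_meta rule (".meta" suffix) is a hypothetical naming
    convention; replace it with whatever identifies your meta files.
    Returns a dict with the number of files written to each directory.
    """
    dirs = {"meta": os.path.join(out_dir, "meta"),
            "data": os.path.join(out_dir, "data")}
    for path in dirs.values():
        os.makedirs(path, exist_ok=True)

    counts = {"meta": 0, "data": 0}
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # skip folder entries inside the archive
            # Keep only the base name: this flattens nested folders and keeps
            # entries from escaping out_dir. Files that share a base name in
            # different folders would overwrite each other.
            name = os.path.basename(info.filename)
            kind = "meta" if is_meta(name) else "data"
            with zf.open(info) as src, open(os.path.join(dirs[kind], name), "wb") as dst:
                shutil.copyfileobj(src, dst)
            counts[kind] += 1
    return counts
```

After running something like `split_zip("upload.zip", "staging")`, each directory can be copied with `hdfs dfs -put`, for example `hdfs dfs -put staging/meta <hdfs-meta-dir>`, and each HDFS directory then becomes the LOCATION of its own external table.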

If your data needs processing and preparation, I recommend NiFi. I use NiFi to do this (for more than 50 million records) in several different ways. You will need to look through the NiFi Hive processors and decide which one best fits your use case.

Super Guru

Here are a few options:

  1. Store the data in its native format and create Hive external tables on it.
    1. Not the best query performance, but it may do the job for you.
  2. Store the data in ORC or Parquet format on HDFS and create an external table on it.
    1. Much better performance, but some of Hive's built-in optimizations are not available.
  3. Store the data in ORC or Parquet format and ingest it into Hive as an internal (managed) table.
    1. Best performance.
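To make the three options concrete, here is a hedged Python sketch that only builds HiveQL strings; it does not connect to Hive. The table names, columns, and paths are hypothetical, and it assumes delimited text files. Options 2 and 3 load ORC from the option 1 text table with INSERT OVERWRITE ... SELECT, which is one common pattern rather than the only one.

```python
def hive_ddl(table, columns, text_location, orc_location, delimiter=","):
    """Build HiveQL for the three storage options (strings only, nothing is executed).

    columns is a list of (name, hive_type) pairs. All names and paths are
    hypothetical placeholders.
    """
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    raw = f"{table}_raw"

    # Option 1: external table over the files in their native (text) format.
    native_external = (
        f"CREATE EXTERNAL TABLE {raw} (\n  {cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{text_location}';"
    )
    # Option 2: external ORC table, filled from the text table.
    orc_external = (
        f"CREATE EXTERNAL TABLE {table}_orc_ext (\n  {cols}\n)\n"
        f"STORED AS ORC\n"
        f"LOCATION '{orc_location}';\n"
        f"INSERT OVERWRITE TABLE {table}_orc_ext SELECT * FROM {raw};"
    )
    # Option 3: internal (managed) ORC table; Hive owns its storage location.
    orc_managed = (
        f"CREATE TABLE {table} (\n  {cols}\n)\n"
        f"STORED AS ORC;\n"
        f"INSERT OVERWRITE TABLE {table} SELECT * FROM {raw};"
    )
    return {
        "native_external": native_external,
        "orc_external": orc_external,
        "orc_managed": orc_managed,
    }
```

For example, `hive_ddl("records", [("id", "STRING"), ("value", "DOUBLE")], "/staging/data", "/warehouse/records_orc")["orc_managed"]` gives the managed-table statements, which you would then run through beeline or the Hive CLI. Since the meta and data tables are separate, the function would be called once per table; switching `STORED AS ORC` to `STORED AS PARQUET` covers the Parquet variant.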

Tools you can use for ingest:

  1. NiFi
    1. Super easy.
    2. You can convert data from one format to another in the pipeline.
  2. Sqoop
    1. If the source is an RDBMS.
  3. Spark
    1. Super easy, but no UI.
    2. You can convert data from one format to another in the pipeline.
  4. Storm
    1. Super fast ingest, but not as easy.
    2. You can convert data from one format to another in the pipeline.