
I have a zip file with 10k Mera files and 10k data files. What is the best way to ingest this in hive? Meta and data tables are separate


4 REPLIES

Re: I have a zip file with 10k Mera files and 10k data files. What is the best way to ingest this in hive? Meta and data tables are separate

Super Guru

What is a Mera file?


Re: I have a zip file with 10k Mera files and 10k data files. What is the best way to ingest this in hive? Meta and data tables are separate

@sunile.manjee Apologies for the typo, it's meta files.


Re: I have a zip file with 10k Mera files and 10k data files. What is the best way to ingest this in hive? Meta and data tables are separate

Master Collaborator
@Nilesh Shrimant

You should experiment with several methods of Data -> HDFS -> Hive.

In the simplest case, if your data needs no preparation, you can upload the files to HDFS and create an external Hive table over them, giving the CREATE EXTERNAL TABLE statement the settings it needs to parse your data (such as the row format, delimiter, and location).
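
Since the meta and data tables are separate, the archive has to be unpacked and the two kinds of files put in separate directories before the upload, because an external table's LOCATION is a directory. A minimal sketch in Python, assuming the meta files can be recognized by a file name suffix (the `.meta` suffix and the function name `split_archive` are hypothetical, not from the original post):

```python
import zipfile
from pathlib import Path


def split_archive(zip_path, out_dir, meta_suffix=".meta"):
    """Extract zip_path into out_dir/meta and out_dir/data.

    meta_suffix is an assumed naming convention; change the test in the
    loop to match however the meta files are actually identified.
    """
    out = Path(out_dir)
    counts = {"meta": 0, "data": 0}
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            kind = "meta" if info.filename.endswith(meta_suffix) else "data"
            target_dir = out / kind
            target_dir.mkdir(parents=True, exist_ok=True)
            # Folder structure inside the archive is flattened, so two files
            # with the same base name would overwrite each other.
            target = target_dir / Path(info.filename).name
            target.write_bytes(archive.read(info))
            counts[kind] += 1
    return counts
```

Each of the two output directories could then be copied to its own HDFS directory, for example with `hdfs dfs -put`, and used as the location of the matching external table.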

If your data needs processing and preparation, I recommend NiFi. I use NiFi to do this (on more than 50 million records) in several different ways. You will need to inspect the NiFi Hive processors and decide which one best fits your use case.

Thanks,
Steven @ DFHZ


Re: I have a zip file with 10k Mera files and 10k data files. What is the best way to ingest this in hive? Meta and data tables are separate

Super Guru

Here are a few options:

  1. Store the data in its native format and create Hive external tables on it.
    1. Not the best performance when it comes to querying the data, but it may do the job for you.
  2. Store the data in ORC or Parquet format on HDFS and create an external table on it.
    1. Much better performance, but Hive's built-in optimizations are not available.
  3. Store the data in ORC or Parquet format and ingest it into Hive as an internal table.
    1. Best performance.
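
Options 1 and 3 can be combined: expose the raw files through an external table, then copy them into an internal ORC table. The sketch below only builds the HiveQL text for those two steps. It assumes comma-delimited text files, and the helper names, table names, columns, and path are all hypothetical. Inputs are interpolated without escaping, so it is only suitable for trusted values.

```python
def external_table_ddl(table, columns, location, delimiter=","):
    """Option 1: external table over delimited text files already in HDFS.

    columns is a list of (name, hive_type) pairs.
    """
    cols = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )


def orc_copy_ddl(internal_table, source_table):
    """Option 3: internal ORC table filled from the external table (CTAS)."""
    return (
        f"CREATE TABLE {internal_table} STORED AS ORC AS\n"
        f"SELECT * FROM {source_table}"
    )


if __name__ == "__main__":
    # Hypothetical schema and HDFS path, for illustration only.
    print(external_table_ddl("meta_raw", [("id", "INT"), ("name", "STRING")], "/data/meta"))
    print(orc_copy_ddl("meta", "meta_raw"))
```

The same pair of statements would be generated once for the meta table and once for the data table, each with its own columns and location.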

Tools you can use for ingest:

  1. NiFi
    1. Super easy.
    2. You can convert data from one format to another in the pipeline.
  2. Sqoop
    1. If the source is in an RDBMS.
  3. Spark
    1. Super easy, but no UI.
    2. You can convert data from one format to another in the pipeline.
  4. Storm
    1. Super fast ingest, but not as easy.
    2. You can convert data from one format to another in the pipeline.