
Where to store a really wide table?

Master Guru

If you have 600+ columns and need to access 20-30 of them at a time, what is the optimal type of storage:

  • Hive table stored as ORC with compression, vectorization, and optimizations enabled; accessed with Tez and properly partitioned and bucketed (see the DDL sketch after this list)
  • HBase
  • HBase in Phoenix Table
  • Parquet File
  • AVRO File
  • Accumulo
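
For the Hive-on-ORC option, here is a minimal DDL sketch of what such a table could look like; the table name, columns, partition key, bucket count, and compression codec are all hypothetical and would need to be adapted to the real dataset.

  -- Hypothetical wide customer table stored as ORC with compression,
  -- partitioned by snapshot date and bucketed on the key so Tez can prune
  -- partitions and ORC can skip the column streams a query does not read.
  CREATE TABLE customer_wide (
    customer_id    BIGINT,
    first_name     STRING,
    last_name      STRING,
    -- ... remaining ~600 attribute columns ...
    lifetime_value DECIMAL(18,2)
  )
  PARTITIONED BY (snapshot_date STRING)
  CLUSTERED BY (customer_id) INTO 32 BUCKETS
  STORED AS ORC
  TBLPROPERTIES ("orc.compress" = "ZLIB");

  -- Vectorized execution on Tez, set per session or in hive-site.xml:
  SET hive.execution.engine=tez;
  SET hive.vectorized.execution.enabled=true;

Because ORC is columnar, a query that touches only 20-30 of the 600 columns reads just those column streams rather than whole rows.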

5 REPLIES

Master Collaborator

You can choose HBase as the storage layer.

HBase can easily handle hundreds of columns. Consider grouping the columns normally accessed together in the same column family.
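
As a sketch of that grouping, here is a hypothetical Phoenix DDL (Phoenix is one of the options listed in the question) where the prefix before the dot names the HBase column family, so each bundle of related attributes lands in its own family; all table, family, and column names are made up for illustration.

  -- Columns that are read together share a column family, so a scan over
  -- one bundle does not have to read the HFiles of the other bundles.
  CREATE TABLE customer (
    customer_id      BIGINT NOT NULL PRIMARY KEY,
    demo.age         INTEGER,
    demo.gender      VARCHAR,
    billing.balance  DECIMAL(18,2),
    billing.currency VARCHAR,
    web.last_login   TIMESTAMP
    -- ... further attribute bundles as additional families ...
  );

Note that the HBase documentation generally recommends keeping the number of column families small, so 20-30 bundles would likely need to be consolidated into a handful of families.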

Super Guru

Accumulo would work for the same reasons that HBase does.

Master Collaborator

If you can share more about your use case, we can give more specific advice.

Master Guru

600 columns of detailed information on customers with many kinds of attributes. The data needs to be accessed interactively in reports and through web applications: a few hundred to a few thousand rows (plus summary information) are pulled from the dataset based on known 20-30 column bundles of related information.

The goal is interactive exploration of the data and extraction of these lists for use elsewhere.

Master Collaborator

HBase is a viable solution.

For querying, consider Phoenix.
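
As a sketch of the query side, and assuming the hypothetical customer table above, reports and web applications could pull just the needed bundle of columns over the Phoenix JDBC driver with plain SQL; a covered secondary index (names here are again hypothetical) keeps such lookups from scanning the full table.

  -- Fetch one bundle of related columns for a slice of customers.
  SELECT customer_id, billing.balance, billing.currency
  FROM customer
  WHERE demo.age BETWEEN 30 AND 40
  LIMIT 1000;

  -- Optional covered index so the filter above is served from the index
  -- rather than a full table scan.
  CREATE INDEX idx_customer_age ON customer (demo.age)
    INCLUDE (billing.balance, billing.currency);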