Member since 07-10-2016 · 3 Posts · 2 Kudos Received · 0 Solutions
07-19-2016
11:16 AM
Hello,

In our current relational environment (Teradata, Oracle, SQL Server), we often use the online SQL-based data dictionary facilities (DBC, ALL_TABLES, ALL_TAB_COLUMNS, INFORMATION_SCHEMA, etc.) to automate operational tasks and to search columns for "data of interest" from other data marts. How can this be done in Hive? I'm aware of "show tables" and "describe", but I need much more power.

I see that the metastore class org.apache.hadoop.hive.metastore.api.Table has a rich set of functionality, but I have not found examples of what I want to do. I would like to write some Java that walks all databases of interest and generates an output of ALL tables in those databases along with their columns, types, etc. Or does HCatalog or some other tool offer this type of functionality as an easier/quicker alternative? If someone could point me to similar examples or intro material, it would be much appreciated. Thanks!
Labels:
-
Apache HCatalog
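One way to do what the post asks can be sketched directly against the metastore Thrift client. The following is a minimal sketch, not a definitive implementation: it assumes hive-site.xml is on the classpath (so HiveConf can locate the metastore) and that the Hive client jars are available; the object name DumpMetastore is made up for illustration.

```scala
import scala.collection.JavaConverters._
import org.apache.hadoop.hive.conf.HiveConf
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient

// Hypothetical driver: walks every database and table in the metastore
// and prints one line per column, data-dictionary style.
object DumpMetastore {
  def main(args: Array[String]): Unit = {
    // HiveConf picks up hive.metastore.uris from hive-site.xml (assumption).
    val client = new HiveMetaStoreClient(new HiveConf())
    try {
      for (db <- client.getAllDatabases.asScala;
           tblName <- client.getAllTables(db).asScala) {
        val tbl = client.getTable(db, tblName)
        // Columns live in the table's storage descriptor.
        for (col <- tbl.getSd.getCols.asScala) {
          println(s"$db\t$tblName\t${col.getName}\t${col.getType}")
        }
      }
    } finally {
      client.close()
    }
  }
}
```

The output is tab-separated and can be loaded back into a Hive table or grepped, which approximates the ALL_TAB_COLUMNS-style searches described above.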
07-11-2016
03:03 AM
@slachterman Thank you very much ! That worked well ! -Greg
07-10-2016
01:59 PM
2 Kudos
I successfully worked through Tutorial 400 (Using Hive with ORC from Apache Spark). But what I would really like to do is read established Hive ORC tables into Spark without having to know the HDFS path and filenames. I created an ORC table in Hive, then ran the following commands from the tutorial in Scala, but judging from the exception, the read/load expects an HDFS filename. How do I read directly from the Hive table, not HDFS? I searched but could not find an existing answer. Thanks much! -Greg

hive> create table test_enc_orc stored as ORC as select * from test_enc;
hive> select count(*) from test_enc_orc;
OK
10
spark-shell --master yarn-client --driver-memory 512m --executor-memory 512m
import org.apache.spark.sql.hive.orc._
import org.apache.spark.sql._
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
val test_enc_orc = hiveContext.read.format("orc").load("test_enc_orc")
java.io.FileNotFoundException: File does not exist:
hdfs://sandbox.hortonworks.com:8020/user/xxxx/test_enc_orc
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1319)
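The FileNotFoundException arises because read.format("orc").load(...) treats its argument as a filesystem path, not a table name. A minimal sketch of reading the table by name through the metastore instead (assuming the same spark-shell session on Spark 1.x, where sc is provided):

```scala
// HiveContext resolves table names via the Hive metastore.
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)

// Read the Hive table by name -- no HDFS path or filename needed.
val byName = hiveContext.table("test_enc_orc")

// Equivalent: plain SQL against the Hive catalog.
val bySql = hiveContext.sql("SELECT * FROM test_enc_orc")

byName.printSchema()
println(byName.count())
```

Either form returns a DataFrame backed by the ORC files wherever the metastore says they live, so the code keeps working even if the table's location changes.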
Labels: