Member since 07-10-2016 · 3 Posts · 2 Kudos Received · 0 Solutions
07-19-2016
11:16 AM
Hello,

In our current relational environment (Teradata, Oracle, SQL Server), we often use the online SQL-based data dictionary facilities (DBC, ALL_TABLES, ALL_TAB_COLUMNS, INFORMATION_SCHEMA, etc.) to automate operational tasks and to search columns for "data of interest" from other data marts. How can this be done in Hive? I'm aware of "show tables" and "describe", but I need much more power.

I see that the metastore class org.apache.hadoop.hive.metastore.api.Table has a rich set of functionality, but I have not found examples of what I want to do. I would like to write some Java that walks all databases of interest and generates an output of ALL tables in those databases along with their columns, types, etc. Or does HCatalog or some other tool offer this type of functionality as an easier/quicker alternative? If someone could point me to similar examples or intro material, it would be much appreciated. Thanks!
Labels:
-
Apache HCatalog
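One way to do what the post asks can be sketched directly against the metastore Thrift client. The following is a minimal sketch, not a definitive implementation: it assumes hive-site.xml is on the classpath (so HiveConf can locate the metastore) and that the Hive client jars are available; the object name DumpMetastore is made up for illustration.

```scala
import scala.collection.JavaConverters._
import org.apache.hadoop.hive.conf.HiveConf
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient

// Hypothetical driver: walks every database and table in the metastore
// and prints one line per column, data-dictionary style.
object DumpMetastore {
  def main(args: Array[String]): Unit = {
    // HiveConf picks up hive.metastore.uris from hive-site.xml (assumption).
    val client = new HiveMetaStoreClient(new HiveConf())
    try {
      for (db <- client.getAllDatabases.asScala;
           tblName <- client.getAllTables(db).asScala) {
        val tbl = client.getTable(db, tblName)
        // Columns live in the table's storage descriptor.
        for (col <- tbl.getSd.getCols.asScala) {
          println(s"$db\t$tblName\t${col.getName}\t${col.getType}")
        }
      }
    } finally {
      client.close()
    }
  }
}
```

The output is tab-separated and can be loaded back into a Hive table or grepped, which approximates the ALL_TAB_COLUMNS-style searches described above.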
07-11-2016
03:03 AM
@slachterman Thank you very much ! That worked well ! -Greg
07-10-2016
01:59 PM
2 Kudos
I successfully worked through Tutorial 400 (Using Hive with ORC from Apache Spark). But what I would really like to do is read established Hive ORC tables into Spark without having to know the HDFS path and filenames. I created an ORC table in Hive, then ran the following commands from the tutorial in Scala, but judging from the exception, the read/load expects an HDFS filename. How do I read directly from the Hive table, not HDFS? I searched but could not find an existing answer. Thanks much! -Greg

hive> create table test_enc_orc stored as ORC as select * from test_enc;
hive> select count(*) from test_enc_orc;
OK
10
spark-shell --master yarn-client --driver-memory 512m --executor-memory 512m
import org.apache.spark.sql.hive.orc._
import org.apache.spark.sql._
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
val test_enc_orc = hiveContext.read.format("orc").load("test_enc_orc")
java.io.FileNotFoundException: File does not exist:
hdfs://sandbox.hortonworks.com:8020/user/xxxx/test_enc_orc
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1319)
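The FileNotFoundException arises because read.format("orc").load(...) treats its argument as a filesystem path, not a table name. A minimal sketch of reading the table by name through the metastore instead (assuming the same spark-shell session on Spark 1.x, where sc is provided):

```scala
// HiveContext resolves table names via the Hive metastore.
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)

// Read the Hive table by name -- no HDFS path or filename needed.
val byName = hiveContext.table("test_enc_orc")

// Equivalent: plain SQL against the Hive catalog.
val bySql = hiveContext.sql("SELECT * FROM test_enc_orc")

byName.printSchema()
println(byName.count())
```

Either form returns a DataFrame backed by the ORC files wherever the metastore says they live, so the code keeps working even if the table's location changes.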
Labels: