Spark dataframe returning no data from hive

We have several tables ingested into Hive via NiFi. The tables show up and can be queried from Hive directly. However, when we query some of them from Spark, only the column names are returned.

Not Working bits:

CREATE TABLE IF NOT EXISTS ngmss.company(
  OPERATING_COMPANY_NUMBER VARCHAR(4),
  OPERATING_COMPANY_NAME VARCHAR(50),
  LAST_MODIFIED_USERID VARCHAR(8),
  LAST_MODIFIED_DATE TIMESTAMP,
  TASK_FOR_ORD_COMP_PROC VARCHAR(8),
  TBS_OWNER_IND CHAR(1),
  PARTY_ID INT,
  PARTY_ROLE_SEQ INT)
CLUSTERED BY (OPERATING_COMPANY_NUMBER) INTO 2 BUCKETS
STORED AS ORC
TBLPROPERTIES ("transactional"="true");

spark.sql("select * from ngmss.company").show()
+------------------------+----------------------+--------------------+------------------+----------------------+-------------+--------+--------------+
|operating_company_number|operating_company_name|last_modified_userid|last_modified_date|task_for_ord_comp_proc|tbs_owner_ind|party_id|party_role_seq|
+------------------------+----------------------+--------------------+------------------+----------------------+-------------+--------+--------------+
+------------------------+----------------------+--------------------+------------------+----------------------+-------------+--------+--------------+

Working bits:

CREATE TABLE IF NOT EXISTS ngmss.circuit_position(
  CIRCUIT_DESIGN_ID INT,
  CIRCUIT_POSITION_NUMBER INT,
  REMARKS VARCHAR(60),
  CIRCUIT_NODE_STATUS CHAR(1),
  LAST_MODIFIED_USERID VARCHAR(8),
  LAST_MODIFIED_DATE TIMESTAMP,
  RATE_CODE VARCHAR(10),
  CIRCUIT_DESIGN_ID_3 INT,
  DOCUMENT_NUMBER INT,
  PENDING_DT TIMESTAMP,
  CIRCUIT_DESIGN_ID_PREV INT,
  STS_CHAN_NBR INT,
  VTG_CHAN_NBR INT,
  VT_CHAN_NBR INT,
  EQUIV_CHAN INT,
  ADDITIONAL_ASSIGNMENT_SEQ_NBR INT,
  NS_COMP_ID INT,
  NS_ID INT,
  PROTECTED_PATH_TRI CHAR(1))
CLUSTERED BY (CIRCUIT_DESIGN_ID) INTO 2 BUCKETS
STORED AS ORC
TBLPROPERTIES ("transactional"="true");

spark.sql("select * from ngmss.circuit_position").show()
I can't share the output, but rows are returned.

1 ACCEPTED SOLUTION

@Royce Whetstine

Spark does not currently fully support Hive's transactional tables. The relevant JIRA tickets are SPARK-16996 and SPARK-15348.
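
To make it easier to spot which tables are affected, here is a minimal sketch that checks the "transactional" table property before querying. It assumes a Hive-enabled SparkSession is passed in as `spark`; the helper names `is_transactional` and `table_properties` are hypothetical and not part of Spark or Hive.

```python
def is_transactional(tblproperties):
    """Return True if the given Hive table properties mark the table as transactional.

    `tblproperties` is a plain dict of property name -> value, such as the one
    built by table_properties() below.
    """
    value = tblproperties.get("transactional", "false")
    return str(value).strip().lower() == "true"


def table_properties(spark, table):
    """Collect the output of SHOW TBLPROPERTIES into a dict.

    Assumes `spark` is a SparkSession with Hive support enabled and that the
    result has `key` and `value` columns.
    """
    rows = spark.sql(f"SHOW TBLPROPERTIES {table}").collect()
    return {row["key"]: row["value"] for row in rows}


def warn_if_transactional(spark, table):
    """Print a warning when Spark may not see all rows of the table."""
    if is_transactional(table_properties(spark, table)):
        print(f"{table} is a Hive transactional table; Spark may return no rows.")
```

For example, `warn_if_transactional(spark, "ngmss.company")` would print the warning for the table in the question. A table flagged this way may need to be read through Hive itself, or copied in Hive to a non-transactional table first; that is a commonly used workaround, not something confirmed in this thread.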

2 REPLIES

This may be a silly question, but is there any data at all in ngmss.company? Can you see it via Hive?
