Spark dataframe returning no data from hive

We have several tables ingested into Hive via NiFi. The tables show up and can be queried from Hive directly. However, when we query some of them from Spark, only the column names are returned.

Not Working bits:

CREATE TABLE IF NOT EXISTS ngmss.company(
  OPERATING_COMPANY_NUMBER VARCHAR(4),
  OPERATING_COMPANY_NAME VARCHAR(50),
  LAST_MODIFIED_USERID VARCHAR(8),
  LAST_MODIFIED_DATE TIMESTAMP,
  TASK_FOR_ORD_COMP_PROC VARCHAR(8),
  TBS_OWNER_IND CHAR(1),
  PARTY_ID INT,
  PARTY_ROLE_SEQ INT)
CLUSTERED BY (OPERATING_COMPANY_NUMBER) INTO 2 BUCKETS
STORED AS ORC
TBLPROPERTIES ("transactional"="true");
spark.sql("select * from ngmss.company").show()
+------------------------+----------------------+--------------------+------------------+----------------------+-------------+--------+--------------+
|operating_company_number|operating_company_name|last_modified_userid|last_modified_date|task_for_ord_comp_proc|tbs_owner_ind|party_id|party_role_seq|
+------------------------+----------------------+--------------------+------------------+----------------------+-------------+--------+--------------+
+------------------------+----------------------+--------------------+------------------+----------------------+-------------+--------+--------------+

Working bits:

CREATE TABLE IF NOT EXISTS ngmss.circuit_position(
  CIRCUIT_DESIGN_ID INT,
  CIRCUIT_POSITION_NUMBER INT,
  REMARKS VARCHAR(60),
  CIRCUIT_NODE_STATUS CHAR(1),
  LAST_MODIFIED_USERID VARCHAR(8),
  LAST_MODIFIED_DATE TIMESTAMP,
  RATE_CODE VARCHAR(10),
  CIRCUIT_DESIGN_ID_3 INT,
  DOCUMENT_NUMBER INT,
  PENDING_DT TIMESTAMP,
  CIRCUIT_DESIGN_ID_PREV INT,
  STS_CHAN_NBR INT,
  VTG_CHAN_NBR INT,
  VT_CHAN_NBR INT,
  EQUIV_CHAN INT,
  ADDITIONAL_ASSIGNMENT_SEQ_NBR INT,
  NS_COMP_ID INT,
  NS_ID INT,
  PROTECTED_PATH_TRI CHAR(1))
CLUSTERED BY (CIRCUIT_DESIGN_ID) INTO 2 BUCKETS
STORED AS ORC
TBLPROPERTIES ("transactional"="true");
spark.sql("select * from ngmss.circuit_position").show()
I can't share the output, but this query does return rows.

1 ACCEPTED SOLUTION

Re: Spark dataframe returning no data from hive

@Royce Whetstine

Spark does not currently fully support Hive's transactional tables. For reference, see the JIRA issues SPARK-16996 and SPARK-15348.
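
Both tables in the question are declared with "transactional"="true", so it may help to confirm from Spark which tables carry that property before relying on query results. The following is only a rough sketch: the helper names is_transactional and table_properties are made up for illustration, and it assumes a Hive-enabled SparkSession named spark and that SHOW TBLPROPERTIES returns key and value columns.

```python
def is_transactional(tbl_properties):
    """Return True if the properties mark the table as a Hive transactional table.

    `tbl_properties` is a dict of property name -> value (hypothetical helper input).
    A missing "transactional" property is treated as "false".
    """
    value = tbl_properties.get("transactional", "false")
    return str(value).strip().lower() == "true"


def table_properties(spark, table_name):
    """Collect the output of SHOW TBLPROPERTIES into a dict.

    Assumes `spark` is a live SparkSession with Hive support and that the
    result has `key` and `value` columns.
    """
    rows = spark.sql(f"SHOW TBLPROPERTIES {table_name}").collect()
    return {row["key"]: row["value"] for row in rows}


# Example usage (needs a running Spark session, so it is left commented out):
# for name in ["ngmss.company", "ngmss.circuit_position"]:
#     props = table_properties(spark, name)
#     print(name, "transactional:", is_transactional(props))
```

If a table is reported as transactional, results read through plain spark.sql may be incomplete or empty, as seen in the question.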

2 REPLIES

Re: Spark dataframe returning no data from hive

This may be a silly question, but do you have any data at all in ngmss.company? Can you see it when you query through Hive?
