Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

HdInsight Azure DW Polybase to Hive Table (ORC) with Date partition is failing

Highlighted

HdInsight Azure DW Polybase to Hive Table (ORC) with Date partition is failing

New Contributor

I am trying to create a polybase external table in Azure Data warehouse for the Hive table which is stored as ORC(HD insight Data lake ) and partitioned by date when i query the external table its failing in SQL server without any proper error message.

When i don't use partition i am able to access hive ORC table using external table , but with partition i am getting below error without any proper error message.

Error:

Msg 106000, Level 16, State 1, Line 33 Index: 23, Size: 23

Detail:

Hive Table:

CREATE EXTERNAL TABLE DL_ClickStream_fnl(

.

.) PARTITIONED BY(TIME_STAMP Date) STORED AS ORC;

Azure DW External Table:

CREATE EXTERNAL TABLE [stg].[dl_clickstream_procd] (

.

.

) WITH (DATA_SOURCE = [AzureDataLakeStore_clusters],LOCATION = N'clusters/BLMSpark/hive/warehouse/coremetrics.db/dl_clickstream_procd' ,FILE_FORMAT = [ORCFileFormat_DL],REJECT_TYPE = VALUE,REJECT_VALUE = 2)

File Format:

CREATE EXTERNAL FILE FORMAT [ORCFileFormat_DL] WITH (FORMAT_TYPE = ORC)

1 REPLY 1

Re: HdInsight Azure DW Polybase to Hive Table (ORC) with Date partition is failing

New Contributor

I came across this issue, somehow Polybase table didn't allow the partition by column as part of the definition. Removing the column from selection list resolved the issue.

In your case, your Polybase table structure might have 24 columns including partition by column.

Example:

polybase table:

CREATE EXTERNAL TABLE [dbo].[ProcessControl] (

[tablename] varchar(50) COLLATE Latin1_General_100_CI_AS_KS_WS NULL,

[lastupdatetimestamp] datetime2(3) NULL,

[etlcreateddatetime] datetime2(3) NULL )

-- etlcreatedate date NULL -- partition by column is not accepted in column list.

WITH (LOCATION='/data/target/process_control', . . )

Hive table:

CREATE EXTERNAL TABLE `process_control`( `tablename` string, `pdwlastupdatetimestamp` timestamp, `etlcreateddatetime` timestamp)

PARTITIONED BY ( 
  `etlcreatedate` string) 

ROW FORMAT DELIMITED ... STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'

Thanks,

Santosh

Don't have an account?
Coming from Hortonworks? Activate your account here