I have downloaded and configured HDP_2.6.5_virtualbox_180626. I am able to connect and upload files to HDFS, and I can access the file data using Hive.
To access the data in the HDFS files from SQL Server (PolyBase), I have configured a credential, an external data source, an external file format, and an external table:
CREATE DATABASE SCOPED CREDENTIAL Hadoop_Cred WITH IDENTITY = 'raj_ops', Secret = 'raj_ops';
GO
CREATE EXTERNAL DATA SOURCE MyHadoopCluster WITH (
    TYPE = HADOOP,
    LOCATION = 'hdfs://192.168.0.107:8020',
    CREDENTIAL = Hadoop_Cred
);
GO
DROP EXTERNAL FILE FORMAT Hadoop_file_format;
GO
CREATE EXTERNAL FILE FORMAT Hadoop_file_format WITH
(
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS
    (
        FIELD_TERMINATOR = ',',
        USE_TYPE_DEFAULT = TRUE
    )
);
GO
CREATE EXTERNAL TABLE [dbo].HDFS_File
(
    street varchar(100),
    city varchar(100),
    zip varchar(100),
    [state] varchar(100),
    beds int,
    baths int,
    sq__ft varchar(100),
    [type] varchar(100),
    sale_date varchar(100),
    price varchar(100),
    latitude varchar(100),
    longitude varchar(100)
)
WITH (
    LOCATION = N'/user/raj_ops/TP/',
    DATA_SOURCE = MyHadoopCluster,
    FILE_FORMAT = Hadoop_file_format,
    REJECT_TYPE = VALUE,
    REJECT_VALUE = 0
);
When I query the table, I get the following error:
Msg 7320, Level 16, State 110, Line 33
Cannot execute the query "Remote Query" against OLE DB provider "SQLNCLI11" for linked server "(null)".
HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: BlockMissingException: Could not obtain block: BP-243674277-172.17.0.2-1529333510191:blk_1073743046_2226 file=/user/raj_ops/TP/Sacramentorealestatetransactions.csv
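For anyone trying to reproduce or diagnose this, here is a minimal sketch of queries that might help collect more detail on the SQL Server side. It assumes the standard PolyBase catalog view sys.external_data_sources and the DMV sys.dm_exec_compute_node_errors are available on this instance. The first statement is only an assumed way of triggering the error, since the original query is not shown above.

```sql
-- Assumed reproducing query; the original statement is not shown in the post.
SELECT TOP (10) * FROM [dbo].HDFS_File;

-- Confirm which HDFS endpoint the external data source points at.
SELECT name, location, type_desc
FROM sys.external_data_sources
WHERE name = 'MyHadoopCluster';

-- Most recent PolyBase compute-node errors; the details column
-- may contain more of the underlying Hadoop-side error text.
SELECT TOP (5) create_time, details
FROM sys.dm_exec_compute_node_errors
ORDER BY create_time DESC;
```

Comparing the location returned by the second query with the address embedded in the block pool ID in the error (172.17.0.2) may be a useful starting point, since the two differ.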
Created 01-20-2020 05:16 AM
Any pointers to resolve this issue, or to help me look in the right direction to identify it, would be appreciated.
Thanks
Created 01-26-2020 06:12 AM
Dear All,
I have spent a lot of time trying to find the issue, with no success. I might be looking in the wrong place, or I may have done something wrong.
I would appreciate any help.
Regards
Sufian