
Conversion of Microsoft SQL script to HIVE


Hi,

 

I am trying to convert a Microsoft SQL Server CREATE TABLE script into Hive, and below is the script from MS SQL.


CREATE TABLE [dbo].[Data](
[NodeID] [int] IDENTITY(1,1) NOT NULL,
[ObjectSubType] [nvarchar](50) NULL,
[IP_Address] [nvarchar](50) NULL,
[IP_Address_Type] [nvarchar](10) NOT NULL,
[DynamicIP] [bit] NULL,
[VendorIcon] [char](20) NULL,
[IOSImage] [nvarchar](255) NULL,
[IOSVersion] [nvarchar](255) NULL,
[GroupStatus] [char](40) NULL,
[TotalMemory] [real] NULL,
[External] [bit] NULL,
[EntityType] [nvarchar](100) NULL,
[CMTS] [char](1) NULL,
[BlockUntil] [datetime] NOT NULL,
[IPAddressGUID] [uniqueidentifier] NULL,
[CustomStatus] [bit] NOT NULL,
CONSTRAINT [PK_Data] PRIMARY KEY CLUSTERED
(
[NodeID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

 

Could you please help me understand how the data types and the primary key in this script should be modified to suit Hive?

 

Which Hive data type is relevant to this:

IDENTITY(1,1)

 

And which Hive data type is relevant to this:

 

[IPAddressGUID] [uniqueidentifier]

 

How do I define the below PK constraint in Hive:

 

CONSTRAINT [PK_Data] PRIMARY KEY CLUSTERED 
(
[NodeID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

 

 

Thanks

BDA.

2 Replies

Expert Contributor

Hi BDA, 

 

I understand that you would like to convert an MS SQL script into Hive.

 

1. For IDENTITY(1,1) - Auto increment

Ans: There is no built-in auto-increment feature in Hive. However, it can be achieved using UDFs, although this might impact performance since Hive executes queries in a distributed fashion. The Apache documentation for UDFs and the Apache JIRA below might help you achieve this; a rough sketch using a window function follows the links.

 

UDF in Hive: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

Apache JIRA: https://issues.apache.org/jira/browse/HIVE-1304
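
As a rough illustration only (the table and column names below are hypothetical, not taken from your script), a common workaround is to generate the surrogate key with the ROW_NUMBER() window function (available since Hive 0.11) while loading from a staging table:

-- Hedged sketch: emulate IDENTITY(1,1) by assigning row numbers at load time.
-- data_staging and data_final are hypothetical tables used only for illustration.
INSERT OVERWRITE TABLE data_final
SELECT
  ROW_NUMBER() OVER (ORDER BY ip_address) AS nodeid,  -- stands in for IDENTITY(1,1)
  ip_address,
  entity_type
FROM data_staging;

Keep in mind that ROW_NUMBER() with a global ORDER BY funnels all rows through a single ordering step, so this is usually acceptable for one-off loads but not for very large, frequent ingests.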

 

2. Datatypes and Primary Key:

 

By default there is no primary key/unique identifier support in Hive. If you require some of your columns to have unique values, you need to handle that programmatically outside Hive before loading the data into the destination table. For example, if you are importing data from an RDBMS and the primary key is already enforced there, the same data arrives in Hive, so there is no need to handle it explicitly; a plain import will do. If that is not the case, you might need to import the RDBMS table into a temporary or staging table in Hive and, based on the requirement, load the distinct values into the destination table using HiveQL (see the sketch below).
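
For instance, a minimal sketch of that staging-table approach might look like the following (the table and column names are illustrative only, and the staging table is assumed to have been populated already, e.g. with Sqoop):

-- Hedged sketch: load only distinct rows from a hypothetical staging table
-- into the destination table so that duplicates do not slip in.
INSERT OVERWRITE TABLE data
SELECT DISTINCT nodeid, ip_address, entity_type
FROM data_staging;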

 

Below are the primitive data types in Hive:

 

Type | Size | Literal syntax examples
TINYINT | 1-byte signed integer | 20
SMALLINT | 2-byte signed integer | 20
INT | 4-byte signed integer | 20
BIGINT | 8-byte signed integer | 20
BOOLEAN | Boolean true or false | TRUE
FLOAT | Single-precision floating point | 3.14159
DOUBLE | Double-precision floating point | 3.14159
STRING | Sequence of characters; the character set can be specified; single or double quotes can be used | 'Now is the time', "for all good men"
TIMESTAMP (v0.8.0+) | Integer, float, or string | 1327882394 (Unix epoch seconds), 1327882394.123456789 (Unix epoch seconds plus nanoseconds), '2012-02-03 12:34:56.123456789' (JDBC-compliant java.sql.Timestamp format)
BINARY (v0.8.0+) | Array of bytes |
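
Putting this together, below is an untested sketch of how your CREATE TABLE might be translated to Hive DDL. The assumptions: NodeID becomes a plain INT (Hive has no IDENTITY; populate it at load time, for example with the ROW_NUMBER() approach above), IPAddressGUID becomes a STRING (there is no uniqueidentifier type), the primary key and NOT NULL constraints are dropped, and STORED AS ORC is just a common choice rather than a requirement:

CREATE TABLE data (
  nodeid           INT,        -- was INT IDENTITY(1,1); populate at load time
  objectsubtype    STRING,     -- nvarchar(50)
  ip_address       STRING,     -- nvarchar(50)
  ip_address_type  STRING,     -- nvarchar(10)
  dynamicip        BOOLEAN,    -- bit
  vendoricon       STRING,     -- char(20)
  iosimage         STRING,     -- nvarchar(255)
  iosversion       STRING,     -- nvarchar(255)
  groupstatus      STRING,     -- char(40)
  totalmemory      FLOAT,      -- real (single precision)
  `external`       BOOLEAN,    -- bit; backticks because EXTERNAL is a Hive keyword
  entitytype       STRING,     -- nvarchar(100)
  cmts             STRING,     -- char(1)
  blockuntil       TIMESTAMP,  -- datetime
  ipaddressguid    STRING,     -- uniqueidentifier; store the GUID as text
  customstatus     BOOLEAN     -- bit
)
STORED AS ORC;  -- assumption; choose the file format that fits your workload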

 


Thank you for the reply