Created on 01-04-2019 03:39 PM - edited 09-16-2022 07:01 AM
Hi,
I am trying to convert Microsoft SQL Server Create Script into HIVE and below is the script from MS SQL.
CREATE TABLE [dbo].[Data](
[NodeID] [int] IDENTITY(1,1) NOT NULL,
[ObjectSubType] [nvarchar](50) NULL,
[IP_Address] [nvarchar](50) NULL,
[IP_Address_Type] [nvarchar](10) NOT NULL,
[DynamicIP] [bit] NULL,
[VendorIcon] [char](20) NULL,
[IOSImage] [nvarchar](255) NULL,
[IOSVersion] [nvarchar](255) NULL,
[GroupStatus] [char](40) NULL,
[TotalMemory] [real] NULL,
[External] [bit] NULL,
[EntityType] [nvarchar](100) NULL,
[CMTS] [char](1) NULL,
[BlockUntil] [datetime] NOT NULL,
[IPAddressGUID] [uniqueidentifier] NULL,
[CustomStatus] [bit] NOT NULL,
CONSTRAINT [PK_Data] PRIMARY KEY CLUSTERED
(
[NodeID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
Could you please help me understand what data type and primary key should be modified in this query that suits to HIVE please:
which data type is relevant to this:
IDENTITY(1,1)
which data type is relevant to this in HIVE;
[IPAddressGUID] [uniqueidentifier]
How to define below PK constraint in HIVE:
CONSTRAINT [PK_Data] PRIMARY KEY CLUSTERED
(
[NodeID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
Thanks
BDA.
Created 01-07-2019 11:20 AM
Hi BDA,
I understand that you would like to convert a MS SQL query into Hive.
1. For IDENTITY(1,1) - Auto increment
Ans: There is no such built-in feature in Hive for auto-increment. However, this can be achieved by using UDF's. However, it might impact the performance since Hive runs as distributed queries. The below Apache documentation for UDF and Apache JIRA might help you achieve this.
UDF in Hive: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
Apache JIRA: https://issues.apache.org/jira/browse/HIVE-1304
2. Datatypes and Primary Key:
In default there is no primary key/unique identifier support in hive. If you require some of your column to have unique values you need to programatically handle it outside Hive before loading the data into the destination table. For example, If you are importing data from a RDBMS to Hive, if your PK is preserved in RDBMS, the same data comes to Hive. So no need to explicity handle the same, just a import should do. But if that is not the case you might need to import the RDBMS table to a temporary or staging table in Hive and from there based on the requirement, you can load the distinct values into destination table using some Hive QL.
Below are the datatypes in Hive,
Type | Size | Literal syntax examples |
TINYINT | 1 byte signed integer. | 20 |
SMALLINT | 2 byte signed integer. | 20 |
INT | 4 byte signed integer. | 20 |
BIGINT | 8 byte signed integer. | 20 |
BOOLEAN | Boolean true or false. | TRUE |
FLOAT | Single precision floating point. | 3.14159 |
DOUBLE | Double precision floating point. | 3.14159 |
STRING | Sequence of characters. The character set can be specified. Single or double quotes can be used. | Now is the time',"for all good men" |
TIMESTAMP (v0.8.0+) | Integer, float, or string. | 1327882394 (Unix epoch seconds), 1327882394.123456789 (Unix ep- och seconds plus nanoseconds), and '2012-02-03 12:34:56.123456789' (JDBC- compliant java.sql.Timestamp format) |
BINARY (v0.8.0+) | Array of bytes |
|
Created 01-31-2019 03:12 PM
Thank you for the reply