
Conversion of Microsoft SQL script to HIVE


Hi,

 

I am trying to convert a Microsoft SQL Server CREATE TABLE script into Hive, and below is the script from MS SQL.


CREATE TABLE [dbo].[Data](
[NodeID] [int] IDENTITY(1,1) NOT NULL,
[ObjectSubType] [nvarchar](50) NULL,
[IP_Address] [nvarchar](50) NULL,
[IP_Address_Type] [nvarchar](10) NOT NULL,
[DynamicIP] [bit] NULL,
[VendorIcon] [char](20) NULL,
[IOSImage] [nvarchar](255) NULL,
[IOSVersion] [nvarchar](255) NULL,
[GroupStatus] [char](40) NULL,
[TotalMemory] [real] NULL,
[External] [bit] NULL,
[EntityType] [nvarchar](100) NULL,
[CMTS] [char](1) NULL,
[BlockUntil] [datetime] NOT NULL,
[IPAddressGUID] [uniqueidentifier] NULL,
[CustomStatus] [bit] NOT NULL,
CONSTRAINT [PK_Data] PRIMARY KEY CLUSTERED
(
[NodeID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

 

Could you please help me understand how the data types and the primary key in this script should be modified to suit Hive?

 

Which Hive data type is relevant to this:

IDENTITY(1,1)

 

And which Hive data type is relevant to this:

 

[IPAddressGUID] [uniqueidentifier]

 

How do I define the below PK constraint in Hive:

 

CONSTRAINT [PK_Data] PRIMARY KEY CLUSTERED 
(
[NodeID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

 

 

Thanks

BDA.

2 Replies

Expert Contributor

Hi BDA, 

 

I understand that you would like to convert an MS SQL script into Hive.

 

1. For IDENTITY(1,1) - Auto increment

Ans: There is no built-in auto-increment feature in Hive. However, it can be achieved using UDFs, although this might impact performance since Hive executes queries in a distributed fashion. The Apache documentation for UDFs and the Apache JIRA below might help you achieve this; a rough sketch using a window function follows the links.

 

UDF in Hive: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

Apache JIRA: https://issues.apache.org/jira/browse/HIVE-1304
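
As a rough illustration only (the table and column names below are hypothetical, not taken from your script), a common workaround is to generate the surrogate key with the ROW_NUMBER() window function (available since Hive 0.11) while loading from a staging table:

-- Hedged sketch: emulate IDENTITY(1,1) by assigning row numbers at load time.
-- data_staging and data_final are hypothetical tables used only for illustration.
INSERT OVERWRITE TABLE data_final
SELECT
  ROW_NUMBER() OVER (ORDER BY ip_address) AS nodeid,  -- stands in for IDENTITY(1,1)
  ip_address,
  entity_type
FROM data_staging;

Keep in mind that ROW_NUMBER() with a global ORDER BY funnels all rows through a single ordering step, so this is usually acceptable for one-off loads but not for very large, frequent ingests.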

 

2. Datatypes and Primary Key:

 

By default there is no primary key/unique identifier support in Hive. If you require some of your columns to have unique values, you need to handle that programmatically outside Hive before loading the data into the destination table. For example, if you are importing data from an RDBMS and the primary key is already enforced there, the same data arrives in Hive, so there is no need to handle it explicitly; a plain import will do. If that is not the case, you might need to import the RDBMS table into a temporary or staging table in Hive and, based on the requirement, load the distinct values into the destination table using HiveQL (see the sketch below).
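
For instance, a minimal sketch of that staging-table approach might look like the following (the table and column names are illustrative only, and the staging table is assumed to have been populated already, e.g. with Sqoop):

-- Hedged sketch: load only distinct rows from a hypothetical staging table
-- into the destination table so that duplicates do not slip in.
INSERT OVERWRITE TABLE data
SELECT DISTINCT nodeid, ip_address, entity_type
FROM data_staging;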

 

Below are the primitive data types in Hive:

 

Type | Size | Literal syntax examples
TINYINT | 1-byte signed integer | 20
SMALLINT | 2-byte signed integer | 20
INT | 4-byte signed integer | 20
BIGINT | 8-byte signed integer | 20
BOOLEAN | Boolean true or false | TRUE
FLOAT | Single-precision floating point | 3.14159
DOUBLE | Double-precision floating point | 3.14159
STRING | Sequence of characters; the character set can be specified; single or double quotes can be used | 'Now is the time', "for all good men"
TIMESTAMP (v0.8.0+) | Integer, float, or string | 1327882394 (Unix epoch seconds), 1327882394.123456789 (Unix epoch seconds plus nanoseconds), '2012-02-03 12:34:56.123456789' (JDBC-compliant java.sql.Timestamp format)
BINARY (v0.8.0+) | Array of bytes |
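
Putting this together, below is an untested sketch of how your CREATE TABLE might be translated to Hive DDL. The assumptions: NodeID becomes a plain INT (Hive has no IDENTITY; populate it at load time, for example with the ROW_NUMBER() approach above), IPAddressGUID becomes a STRING (there is no uniqueidentifier type), the primary key and NOT NULL constraints are dropped, and STORED AS ORC is just a common choice rather than a requirement:

CREATE TABLE data (
  nodeid           INT,        -- was INT IDENTITY(1,1); populate at load time
  objectsubtype    STRING,     -- nvarchar(50)
  ip_address       STRING,     -- nvarchar(50)
  ip_address_type  STRING,     -- nvarchar(10)
  dynamicip        BOOLEAN,    -- bit
  vendoricon       STRING,     -- char(20)
  iosimage         STRING,     -- nvarchar(255)
  iosversion       STRING,     -- nvarchar(255)
  groupstatus      STRING,     -- char(40)
  totalmemory      FLOAT,      -- real (single precision)
  `external`       BOOLEAN,    -- bit; backticks because EXTERNAL is a Hive keyword
  entitytype       STRING,     -- nvarchar(100)
  cmts             STRING,     -- char(1)
  blockuntil       TIMESTAMP,  -- datetime
  ipaddressguid    STRING,     -- uniqueidentifier; store the GUID as text
  customstatus     BOOLEAN     -- bit
)
STORED AS ORC;  -- assumption; choose the file format that fits your workload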

 


Thank you for the reply