Sunday, 8 September 2013

IDENTITY in SQL

                          IDENTITY in SQL

I have often came across many questions on IDENTITY like 'What is IDENTITY?', 'What is the use of IDENTITY?'  , 'How does IDENTITY help us identify duplicate rows in a table?', etc...

This post will cover the most basic functionality of IDENTITY

What is IDENTITY?
  • The IDENTITY clause specifies that the column is an IDENTITY column
  • If the column is set as IDENTITY column than it will be filled with an auto-incremented number and no other data can be inserted  into that column
  • IDENTITY is also known as 'Auto-Increment'
  • IDENTITY is used with CREATE and ALTER statements
  • It supports only int, bigint, smallint, tinyint, or decimal / numeric with a scale of 0
Syntax:
IDENTITY [ ( seed , increment ) ]
Arguments
seed
Is the value that is used for the very first row loaded into the table.
increment
Is the incremental value that is added to the identity value of the previous row that was loaded.
You must specify both the seed and increment or neither. If neither is specified, the default is (1,1).
Example:

CREATE TABLE new_employees( 
id_num int IDENTITY(1,1),
fname varchar (20),
minit char(1),
lname varchar(30))


Alter Table new_employees
add id_num bigint identity(1,1)




What is the use of IDENTITY?
                       
                  The most common use of IDENTITY is to set PRIMARY Key column or UNIQUE key column as IDENTITY column which lets you provide a simple unique row identifier without the need to maintain it yourself. SQL Server does the work for you.

                   
How does IDENTITY help us identify duplicate rows in a table?

                               We can use IDENTITY to identify duplicate rows in a table. Now we will go through an example to know how it works. Suppose we have a table named 'new_employees' and the data in the table is as in the below screen shot.


Now if you observe there is multiple entry made for 'XYZ'  named employee in the table. Now you come to know that there is duplicate entry in the table and here you can identify the id for this row and type a simple DELETE Command to delete row based on 'id_num'. But suppose there is duplicate entry for more than 1,000 employees, in this case it will not be possible to to identify 'id_num' manually.

In this case, we will simply write a sub query to get the desired results.

Sub query:

select * from new_employees where id_num in (select MAX(id_num) from new_employees group by fname)

The output of the query is as below



If you see the latest entry for the employee 'XYZ'  is selected and the old entry is not considered.

To convert above SELECT statement into DELETE command to delete the old entry we simply need to do the below highlighted changes.

delete from new_employees where id_num not in (select MAX(id_num) from new_employees group by fname)


No comments:

Post a Comment