Monday, 9 September 2013

Sub query/Nested Query/Inner Query in SQL

            Sub Query/Nested Query/Inner Query in SQL

Sub query or Inner query or Nested query is a query in a query. A sub query is usually added in the WHERE Clause of the SQL statement.

A Subquery is used to return data that will be used in the main query as a condition to further restrict the data to be retrieved.
Subqueries can be used with the SELECT, INSERT, UPDATE, and DELETE statements along with the operators like =, <, >, >=, <=, IN, BETWEEN etc.
There are a few rules that subqueries must follow:

  • Subqueries must be enclosed within parentheses.

  • A subquery can have only one column in the SELECT clause, unless multiple columns are in the main query for the subquery to compare its selected columns.

  • An ORDER BY cannot be used in a subquery, although the main query can use an ORDER BY. The GROUP BY can be used to perform the same function as the ORDER BY in a subquery.

  • Subqueries that return more than one row can only be used with multiple value operators, such as the IN operator.

  • The SELECT list cannot include any references to values that evaluate to a BLOB, ARRAY, CLOB, or NCLOB.

  • A subquery cannot be immediately enclosed in a set function.

  • The BETWEEN operator cannot be used with a subquery; however, the BETWEEN operator can be used within the subquery.


CTE in SQL

                    CTE(Common Table Expressions) in SQL


CTE( Common Table Expressions) was first introduced with SQL 2005. It is the best replacement for Views.

There are times when we might require to work on small data sets that is a part of large database. For example, we might have a huge sized table that contains data for all transactions that a company has made, but we require current month's data only and apply some grouping or filters on that particular data of current month. Typically views are used to do the task. But, views sometimes over occupy our resources as views are stored at system level. CTE is the best substitute for Views. It is also helpful when we don't have access to create object in a database.


Common Table Expressions or CTE are more readable form of derived table. CTE can be declared once and can be used in multiple queries. CTE can be recursive and non-recursive (we will cover this topic in next post)

Common Table Expression Syntax
A Common Table Expression contains three core parts:
  • The CTE name (this is what follows the WITH keyword)
  • The column list (optional)
  • The query (appears within parentheses after the AS keyword)

EXAMPLE:

WITH ProductAndCategoryNamesOverTenDollars (ProductName, CategoryName, UnitPrice) AS
(
   SELECT
      p.ProductName,
      c.CategoryName,
      p.UnitPrice
   FROM Products p
      INNER JOIN Categories c ON
         c.CategoryID = p.CategoryID
   WHERE p.UnitPrice > 10.0
)


SELECT *
FROM ProductAndCategoryNamesOverTenDollars
ORDER BY CategoryName ASC, UnitPrice ASC, ProductName ASC


                            The CTE is part of the subsequent statement only.The subsequent statement can be a single SELECT/INSERT/UPDATE/DELETE, or a compound (with UNION, INTERSECT etc). So if we modify our above example and include a query between CTE creation query(WITH) and the query that refer the CTE.(like below)



WITH ProductAndCategoryNamesOverTenDollars (ProductName, CategoryName, UnitPrice) AS
(
   SELECT
      p.ProductName,
      c.CategoryName,
      p.UnitPrice
   FROM Products p
      INNER JOIN Categories c ON
         c.CategoryID = p.CategoryID
   WHERE p.UnitPrice > 10.0
)

SELECT * FROM Products 

SELECT *
FROM ProductAndCategoryNamesOverTenDollars
ORDER BY CategoryName ASC, UnitPrice ASC, ProductName ASC


Here, we will see that when the query that refers to CTE(ProductAndCategoryNamesOverTenDollars) is executed we will get an error "Invalid object name 'ProductAndCategoryNamesOverTenDollars'" 

Sunday, 8 September 2013

VBA- Enable Developer Tab - EXCEL

                     Enable Developer Tab

Following are the steps to enable developer tab in Excel 2007


  • Click on 'Office Button' to top left of the workbook and then click on 'Excel Options'

  • Check the option 'Show developer tab in the ribbon' and click on OK

You might now be able to see developer tab in the workbook

  • To start the Visual Basic Editor click on 'Developer' Tab and click on the 'Visual Studio' option (very first option from left hand side)




You will now see visual basic editor to begin coding



SQL Tuning/Query Optimization/SQL Optimization

           SQL Tuning/Query Optimization/SQL Optimization


Query Optimization is often misunderstood as a technique or some setting that we need to change in SQL. In-real query optimization is something related to best practice that we have to follow in your day-to-day life.

Query optimization is all about how we use clause and operators in our query.

I have listed below some examples that we might be using daily and best practice that needs to follow to optimize for better performance.

  • SQL query becomes faster if you specify column names in query instead of '*'
                  EXAMPLE:
                             Normal way:
                                 select * from emp_details

                             Optimized way:
                                 select fname,lname from emp_details   

                  

  • Always use HAVING clause instead of WHERE clause whenever you are using GROUP BY in your query. Using HAVING clause allows SQL to first group the data and then filter the grouped data using HAVING clause. 

                  EXAMPLE:
                             Normal way:
                                 select fname.lname from emp_details
                                 where fname like 'XYZ'
                                 group by fname

                             Optimized way:
                                select fname,lname from emp_details
                                group by fname
                                having fname like 'XYZ'


  • Create Index on the column which we will be using to filter data.Following priority order should be followed when any index is created a) WHERE clause, b) JOIN clause, c) ORDER BY clause, d) SELECT clause
  • Sometimes you may have more than one subqueries in your main query. Try to minimize the number of subquery block in your query. 


                 EXAMPLE:
                             Normal way:
                                SELECT name FROM employee
                  WHERE salary = (SELECT MAX(salary) FROM employee_details) 
                  AND age = (SELECT MAX(age) FROM employee_details) 
                  AND emp_dept = 'Electronics';


                             Optimized way:
                                SELECT name FROM employee 
                  WHERE (salary, age ) = (SELECT MAX (salary), MAX (age) 
                  FROM employee_details) 
                  AND dept = 'Electronics'; 
  • Always use EXISTS instead of IN operator, Generally on large data IN operator works really slow compare to EXISTS. IN is efficient when most of the filter criteria is in the sub-query. EXISTS is efficient when most of the filter criteria is in the main query.
  • Always use UNION ALL instead of UNION if you does not expect unique result. UNION checks for duplicate rows and eliminate it . Hence it takes more time for execution compare to UNION ALL
  • If you want to re-use the query, create a STORE PROCEDURE for the same.
  • Check if there is at-least 30% HHD is empty – it improves the performance a bit
  • Remove any unnecessary joins from table

IDENTITY in SQL

                          IDENTITY in SQL

I have often came across many questions on IDENTITY like 'What is IDENTITY?', 'What is the use of IDENTITY?'  , 'How does IDENTITY help us identify duplicate rows in a table?', etc...

This post will cover the most basic functionality of IDENTITY

What is IDENTITY?
  • The IDENTITY clause specifies that the column is an IDENTITY column
  • If the column is set as IDENTITY column than it will be filled with an auto-incremented number and no other data can be inserted  into that column
  • IDENTITY is also known as 'Auto-Increment'
  • IDENTITY is used with CREATE and ALTER statements
  • It supports only int, bigint, smallint, tinyint, or decimal / numeric with a scale of 0
Syntax:
IDENTITY [ ( seed , increment ) ]
Arguments
seed
Is the value that is used for the very first row loaded into the table.
increment
Is the incremental value that is added to the identity value of the previous row that was loaded.
You must specify both the seed and increment or neither. If neither is specified, the default is (1,1).
Example:

CREATE TABLE new_employees( 
id_num int IDENTITY(1,1),
fname varchar (20),
minit char(1),
lname varchar(30))


Alter Table new_employees
add id_num bigint identity(1,1)




What is the use of IDENTITY?
                       
                  The most common use of IDENTITY is to set PRIMARY Key column or UNIQUE key column as IDENTITY column which lets you provide a simple unique row identifier without the need to maintain it yourself. SQL Server does the work for you.

                   
How does IDENTITY help us identify duplicate rows in a table?

                               We can use IDENTITY to identify duplicate rows in a table. Now we will go through an example to know how it works. Suppose we have a table named 'new_employees' and the data in the table is as in the below screen shot.


Now if you observe there is multiple entry made for 'XYZ'  named employee in the table. Now you come to know that there is duplicate entry in the table and here you can identify the id for this row and type a simple DELETE Command to delete row based on 'id_num'. But suppose there is duplicate entry for more than 1,000 employees, in this case it will not be possible to to identify 'id_num' manually.

In this case, we will simply write a sub query to get the desired results.

Sub query:

select * from new_employees where id_num in (select MAX(id_num) from new_employees group by fname)

The output of the query is as below



If you see the latest entry for the employee 'XYZ'  is selected and the old entry is not considered.

To convert above SELECT statement into DELETE command to delete the old entry we simply need to do the below highlighted changes.

delete from new_employees where id_num not in (select MAX(id_num) from new_employees group by fname)


Saturday, 7 September 2013

Difference between DELETE and TRUNCATE command in SQL

     Difference between DELETE and TRUNCATE command

The most important and the most favorite question of the interviewers, that i had came across, is 'what's the difference between delete and truncate command?'. It is very important to know the difference between the two when you are working on a database.

Few important difference are as follows:
  1. TRUNCATE:
    • TRUNCATE is faster and uses fewer system and transaction log resources than DELETE.
    • TRUNCATE removes the data by de-allocating the data pages used to store the table's data, and only the page de-allocations are recorded in the transaction log.
    • TRUNCATE removes all rows from a table, but the table structure, its columns, constraints, indexes and so on, remains. The counter used by an identity for new rows is reset to the seed for the column.
    • You cannot use TRUNCATE TABLE on a table referenced by a FOREIGN KEY constraint. Because TRUNCATE TABLE is not logged, it cannot activate a trigger.
    • TRUNCATE cannot be rolled back.
    • TRUNCATE is DDL Command.
    • TRUNCATE Resets identity of the table
    • Cannot use WHERE conditions
  2. DELETE:
    • DELETE removes rows one at a time and records an entry in the transaction log for each deleted row.
    • If you want to retain the identity counter, use DELETE instead. If you want to remove table definition and its data, use the DROP TABLE statement.
    • DELETE Can be used with or without a WHERE clause
    • DELETE Activates Triggers.
    • DELETE can be rolled back.
    • DELETE is DML Command.
    • DELETE does not reset identity of the table.
Recently, I came across a database. The size of the MDF file of that database was around 300 GB and when i took the backup of the database the backup file was of only 90 GB. I was confused and consult with the team that was working on that database to know there basic practice of working on DB. I found that they always use 'DELETE' command to remove data from a table. As mentioned above, DELETE command always stores an entry for each row in transaction log and hence,  this discrepancy was observed.

It is always recommended that you use TRUNCATE over DELETE if you want to delete all the records from the table.

One more impotant difference is: DELETE and TRUNCATE both can be rolled back when surrounded by TRANSACTION if the current session is not closed. If TRUNCATE is written in Query Editor surrounded by TRANSACTION and if session is closed, it can not be rolled back but DELETE can be rolled back.


Indexes in SQL

                                            Indexes in SQL

Index is a database object which help to fetch/retrieve data faster from SQL server. It is basically use to optimize query performance.

An index can be used to efficiently find all row matching some column in your query and then walk through only that subset of the table to find exact matches. If you don't have indexes on any column in the WHERE clause, the SQL server have to walk through the whole table and check every row to see if it matches, which may be a slow operation on big tables.

                                      Query to create index:
                                                       Create Index IndexName
                                                       On TableName (ColumnName1,ColumnName2,....)

                                      Query  to rename index:
                                                       sp_rename 'IndexName','New_IndexName'


 Two main types of indexes are Clustered and Non-Clustered index

Clustered Index : A clustered index determines the order in which the rows of a table are stored on disk. If a table has a clustered index, then the rows of that table will be stored on disk in the same exact order as the clustered index.There can be only one clustered index created on a table.

Non-Clustered Index: A table can have multiple non-clustered indexes because they don’t affect the order in which the rows are stored on disk like clustered indexes. A single table can have up-to 256 non-clustered index.

                                   


                                   Difference between Clustered and Non-Clustered Index
  • A clustered index determines the order in which the rows of the table will be stored on disk – and it actually stores row level data in the leaf nodes of the index itself. A non-clustered index has no effect on which the order of the rows will be stored.
  • A table can have multiple non-clustered indexes. But, a table can have only one clustered index.
  • Non clustered indexes store both a value and a pointer to the actual row that holds that value. Clustered indexes don’t need to store a pointer to the actual row because of the fact that the rows in the table are stored on disk in the same exact order as the clustered index – and the non-clustered index actually stores the row-level data in it’s leaf nodes.





Different types of statement in SQL

Different types of statement in SQL

SQL statements can be classified into four types as follows: 

DDL
Data Definition Language statements are used to define the database structure or schema. Examples:
  • CREATE - to create objects in the database
  • ALTER - alters the structure of the database
  • DROP - delete objects from the database
  • TRUNCATE - remove all records from a table, including all spaces allocated for the records are removed
  • COMMENT - add comments to the data dictionary
  • RENAME - rename an object

DML

Data Manipulation Language statements are used for managing data within schema objects. Examples:
  • SELECT - retrieve data from the a database
  • INSERT - insert data into a table
  • UPDATE - updates existing data within a table
  • DELETE - deletes all records from a table, the space for the records remain
  • CALL - call a PL/SQL or Java subprogram
  • EXPLAIN PLAN - explain access path to data
  • LOCK TABLE - control concurrency

DCL

Data Control Language statements are generally used to provide control/access/privileges to user over database. Examples:
  • GRANT - gives user's access privileges to database
  • REVOKE - withdraw access privileges given with the GRANT command

TCL

Transaction Control statements are used to manage the changes made by DML statements. It allows statements to be grouped together into logical transactions. Examples:
  • COMMIT - save work done
  • SAVEPOINT - identify a point in a transaction to which you can later roll back
  • ROLLBACK - restore database to original state
  • SET TRANSACTION - Change transaction options like isolation level and what rollback segment to use