Friday 22 November 2013

Difference between Char,Varchar and Nvarchar in SQL

                Difference between Char,Varchar and Nvarchar in SQL


Char DataType:

  • Char Datatype is used to store fixed length characters. 
  • If declared as Char(100) then it will allocate space of memory for 100 characters, no matter how much characters is actually stored in the column
Varchar DataType: 
  • Abbreviation for Varchar is Variable length characters
  • It is used to store Non-Unicode characters
  • Since the name contains 'Variable' it specifies that the memory allocation varies as per the characters stored in the column. For e.g 'Nadir' is stored in the column then the memory allocation would be 5. It allocates 1 byte per character
  • Maximum length for this data type is 8000
  • Varchar supports collation which requires 1 byte memory storage per characters
  • No other language except the once requires 1 byte memory storage per characters can be stored for example: English language characters can be stored in varchar
Nvarchar DataType:
  • Abbreviation for Nvarchar is Uni-code variable length characters
  • It is used to store Unicode characters
  • The memory allocation varies similar to varchar datatype except it allocates 2 byte per characters
  • Maximum length for this data type is 4000
  • Nvarchar supports all types of collation
  • Multiple languages can be stored in nvarchar datatype by using prefix 'N'. Below example explains this point.
                 e.g: Declare @str nvarchar(4000) = N'最低点'

                                                               Highlights:
  1. As both varchar and nvarchar datatypes are variable in nature it acquires space as per the characters stored irrespective of the max limit specified
  2. It is always recommendable to use varchar compared to nvarchar when the data is in no other language then English
  3. Query execution for varchar is generally faster compared to nvarchar

Sunday 17 November 2013

Data Flow Task in SSIS

                                    Data Flow Task in SSIS


We will begin with the tutorial for Data flow task in SSIS. SSIS is one my most favorite topic and simply like to work on this technology.

There is always a misconception between control flow and data flow. Hence we need to know the difference between both.

Control flow

                  A control flow consists of one or more tasks and containers that execute when the package runs. To control order or define the conditions for running the next task or container in the package control flow, we use precedence constraints to connect the tasks and containers in a package. A subset of tasks and containers can also be grouped and run repeatedly as a unit within the package control flow. SQL Server 2005 Integration Services (SSIS) provides three different types of control flow elements: Containers that provide structures in packages, Tasks that provide functionality, and Precedence Constraints that connect the executables, containers, and tasks into an ordered control flow.

Data flow
                A data flow consists of the sources and destinations that extract and load data, the transformations that modify and extend data, and the paths that link sources, transformations, and destinations The Data Flow task is the executable within the SSIS package that creates, orders, and runs the data flow. A separate instance of the data flow engine is opened for each Data Flow task in a package. Data Sources, Transformations, and Data Destinations are the three important categories in the Data Flow.


Data flow as name suggests is use to transfer data from source to destination. Data flow should always contain a source and a destination. If  business demands us to modify our data while transferring data from source to destination we might need to use transformations to achieve our task.


Following are some commonly used transformation that are available:

AGGREGATE  - It applies aggregate functions to Record Sets to produce new output records from aggregated values.
AUDIT  - Adds Package and Task level Metadata - such as Machine Name, Execution Instance, Package Name, Package ID, etc.. 
CHARACTER MAP - Performs SQL Server level makes string data changes such as changing data from lower case to upper case.
CONDITIONAL SPLIT – Separates available input into separate output pipelines based on Boolean Expressions configured for each output.
COPY COLUMN - Add a copy of column to the output we can later transform the copy keeping the original for auditing.
DATA CONVERSION - Converts columns data types from one to another type. It stands for Explicit Column Conversion.
DATA MINING QUERY – Used to perform data mining query against analysis services and manage Predictions Graphs and Controls.
DERIVED COLUMN - Create a new (computed) column from given expressions.
EXPORT COLUMN – Used to export a Image specific column from the database to a flat file.
FUZZY GROUPING – Used for data cleansing by finding rows that are likely duplicates.
FUZZY LOOKUP -  Used for Pattern Matching and Ranking based on fuzzy logic.
IMPORT COLUMN - Reads image specific column from database onto a flat file.
LOOKUP - Performs the lookup (searching) of a given reference object set against a data source. It is used for exact matches only.
MERGE - Merges two sorted data sets into a single data set into a single data flow.
MERGE JOIN - Merges two data sets into a single dataset using a join junction.
MULTI CAST - Sends a copy of supplied Data Source onto multiple Destinations.
ROW COUNT - Stores the resulting row count from the data flow / transformation into a variable.
ROW SAMPLING - Captures sample data by using a row count of the total rows in dataflow specified by rows or percentage.
UNION ALL - Merge multiple data sets into a single dataset.
PIVOT – Used for Normalization of data sources to reduce analomolies by converting rows into columns

UNPIVOT – Used for demoralizing the data structure by converts columns into rows incase of building Data Warehouses. 

Saturday 16 November 2013

Difference between UNION and UNION ALL

                                    Difference between UNION and UNION ALL


Both Union and Union all statements are use to merge data from multiple queries which has same number of columns. Also, the data types of the column should be same.

We will look into the difference between both operators using an example. We have the following two tables

Empdetails:











Custdetails:














Union: Union merge data from two or  more queries and returns a sorted unique result set. Since this operator provides distinct result set it takes more time than Union All.

Query:

               select * from EmpDetails union  select * from custdetails   
Output:















Union All: Union all merge data from two or more queries and returns non-unique result set. Since this operator does not perform distinct on the result set, it's performance is much faster than Union operator

Query:

            select * from EmpDetails union all select * from custdetails 

Output:




















Following are the constraints that are associated with UNION and UNION ALL operator:

  • The tables should contains the same number of columns
  • Columns in the tables must be of the same data type
  • Order by clause should be applied to the final result set
  • Column name for the result set would be of the first query mentioned



See Also:   Different Types of Joins in SQL   , Difference between CTE and View



Different Types of Joins in SQL

                          Different Types of Joins in SQL



Joins are classified into three types as below:

  1. Inner Join
  2. Outer Join
  3. Cross Join


We will look into this by considering an example. Suppose we have two table as follows:

Empdetails:













EmpSalary:













Inner and outer joins perform join on the basis of at least one field. in our example this common file is empid.
  • Inner Join: Inner join is the most commonly used join. This join returns rows when there is a match in both the tables. Now we will apply this join between the two sample tables
                     Query:
                                 select empdetails.empid,empdetails.empname,empsalary.empsalary
                                 from empdetails , empsalary where empdetails.empid = empsalary.empid

                 Now in this query we are trying to pull salary of employee from empsalary tables for the                                employees which are present in empdetails table The result for this query will be as follows:


  • Outer Join: Outer join is of three type: Left outer Join and Right Outer Join
Left outer Join: This join returns all the rows from the left table in conjunction with the matching rows from the right table. If there are no columns matching in the right table, it returns NULL values.

Example:

Query:

                     select empdetails.empid,empdetails.empname,empsalary.empsalary
                     from empdetails left outer join empsalary on empdetails.empid = empsalary.empid



Output:















Right outer Join:This join returns all the rows from the right table in conjunction with the matching rows from the left table. If there are no columns matching in the left table, it returns NULL values.

Example:

Query:


    select empdetails.empid,empdetails.empname,empsalary.empid,empsalary.empname,empsalary.empsalary

    from empdetails right outer join empsalary on empdetails.empid = empsalary.empid





Output:


























Full outer Join:This join combines left outer join and right outer join. It returns row from either table when the conditions are met and returns null value when there is no match.

Example:

Query:

select empdetails.empid,empdetails.empname,empsalary.empid,empsalary.empname,empsalary.empsalary

from empdetails full outer join empsalary on empdetails.empid = empsalary.empid





Output:


















  • Cross Join: It is the Cartesian product of the two tables involved. 

Example:

Query:


   select empdetails.empid,empdetails.empname,empsalary.empid,empsalary.empname,empsalary.empsalary
   from empdetails cross join empsalary 





Output:




























Join conditions can be specified in either the FROM or WHERE clauses; specifying them in the FROM clause is recommended. WHERE and HAVING clauses can also contain search conditions to further filter the rows selected by the join conditions.

Friday 15 November 2013

Difference between truncate and drop command in SQL

                 Difference between truncate and drop command in SQL


Both truncate and delete are DDL statements and changes done by both can't be reverted. Following are the main differences between both:

Truncate Command:
  • Truncate removes all the records from the table/view
  • Truncate command is used to remove data from the table
  • The schema for the table remains as it is
  • It does not effect relationship between tables
Drop Command:

  • Drop command is used to remove and object from the database
  • The object(table/view/indexes,etc) is completely removed from the database
  • The schema is also removed
  • All the relationship for that object is removed and if we need that object to use again then we will have to develop it from scratch

Saturday 9 November 2013

Difference between having and where clause in SQL

           Difference between having and where clause in SQL


It is very important to know the difference between Having and Where in order to get the desired results from the query

WHERE clause:
  • can be used with Select,Insert,Delete statements
  • can't use aggregate functions in where clause
  • Where clause filter results before the results are Grouped as Where clause can only be written before Group clause 
  • Where clause applies to rows
Example:

Select fname,country_name from person
where country_name in ('US','UK')

HAVING clause:
  • can be used with Select statement only
  • can use aggregate functions in Having clause
  • Having clause filters results after the results are grouped as having clause can be used after Group clause only
  • Having clause applies to groups
Example:

Select fname,max(sal) from person
group by fname having max(sal) > 10000


Similar Posts:  CTE vs VIEW, TEMP TABLE vs CTE

Synchronous and Asynchronous transformation in ssis

Synchronous and Asynchronous transformation in ssis


                     Synchronous transformation process the incoming rows and pass it to the output one at a time.Output is synchronized with the input. it does not create a separate buffer for output. same buffer is used both for input and output rows. All the non-blocking transformations are synchronous transformation. 

                   An Asynchronous transformation store the input data into the buffer and perform some operations before releasing it to the output buffer. There is two different buffer create for both input and output.Semi-Blocking and Full-Blocking transformations are asynchronous transformation.

Nonblocking, Partial Blocking and Blocking Transformations in SSIS

   Nonblocking, Partial Blocking and Blocking Transformations In SSIS



Transformation in SSIS lets you manipulate data during ETL(extract, transform and Load).  Operations such as aggregate, merge, split, modify,etc can be performed on data using transformations.

Transformations are generally divided into three categories as follows:

  1. Non-blocking transformation: The output of this transformation uses the same buffer as that of input. It works on row by row basis. The transformation works on a synchronous basis. This transformations are efficient and lightweight results in better and faster transformation compared to other types of transformation.
  2. Partial Blocking: This transformations works on a row-set basis and stores the data in buffer before releasing it to the output buffer.This transformation creates a new buffer for output rows. As this transformations generally stores some data in the buffer to perform  some operations this are generally heavy-weighted compare to Non-blocking transformations and hence performance is slower compared to Non-blocking transformation.
  3.  Blocking Transformations: This transformations work on the whole data set at a time and hence stores the entire data in buffer. This transformation works on asynchronous basis.This transformation creates new buffer for output rows. consider an aggregate transformation which calculate the sum of a field in the data set, for this the transformation will have to store the entire data into buffer and then perform sum on that data set and then the results will be sent to output buffer. Because transformation stores the entire data set into the buffer, this transformations are generally the most heavy-weighted compare to other transformations and hence results in low performance. 

Following are the list of Non blocking, Partial Blocking and Blocking Transformations in SSIS