Bollywood Updates: November 2013

Friday, 22 November 2013

Difference between Char,Varchar and Nvarchar in SQL

Char DataType:

Char Datatype is used to store fixed length characters.
If declared as Char(100) then it will allocate space of memory for 100 characters, no matter how much characters is actually stored in the column

Varchar DataType:

Abbreviation for Varchar is Variable length characters
It is used to store Non-Unicode characters
Since the name contains 'Variable' it specifies that the memory allocation varies as per the characters stored in the column. For e.g 'Nadir' is stored in the column then the memory allocation would be 5. It allocates 1 byte per character
Maximum length for this data type is 8000
Varchar supports collation which requires 1 byte memory storage per characters
No other language except the once requires 1 byte memory storage per characters can be stored for example: English language characters can be stored in varchar

Nvarchar DataType:

Abbreviation for Nvarchar is Uni-code variable length characters
It is used to store Unicode characters
The memory allocation varies similar to varchar datatype except it allocates 2 byte per characters
Maximum length for this data type is 4000
Nvarchar supports all types of collation
Multiple languages can be stored in nvarchar datatype by using prefix 'N'. Below example explains this point.

e.g: Declare @str nvarchar(4000) = N'最低点'

Highlights:

As both varchar and nvarchar datatypes are variable in nature it acquires space as per the characters stored irrespective of the max limit specified
It is always recommendable to use varchar compared to nvarchar when the data is in no other language then English
Query execution for varchar is generally faster compared to nvarchar

Sunday, 17 November 2013

Data Flow Task in SSIS

We will begin with the tutorial for Data flow task in SSIS. SSIS is one my most favorite topic and simply like to work on this technology.

There is always a misconception between control flow and data flow. Hence we need to know the difference between both.

Control flow

A control flow consists of one or more tasks and containers that execute when the package runs. To control order or define the conditions for running the next task or container in the package control flow, we use precedence constraints to connect the tasks and containers in a package. A subset of tasks and containers can also be grouped and run repeatedly as a unit within the package control flow. SQL Server 2005 Integration Services (SSIS) provides three different types of control flow elements: Containers that provide structures in packages, Tasks that provide functionality, and Precedence Constraints that connect the executables, containers, and tasks into an ordered control flow.

Data flow

A data flow consists of the sources and destinations that extract and load data, the transformations that modify and extend data, and the paths that link sources, transformations, and destinations The Data Flow task is the executable within the SSIS package that creates, orders, and runs the data flow. A separate instance of the data flow engine is opened for each Data Flow task in a package. Data Sources, Transformations, and Data Destinations are the three important categories in the Data Flow.

Data flow as name suggests is use to transfer data from source to destination. Data flow should always contain a source and a destination. If business demands us to modify our data while transferring data from source to destination we might need to use transformations to achieve our task.

Following are some commonly used transformation that are available:

AGGREGATE - It applies aggregate functions to Record Sets to produce new output records from aggregated values.
AUDIT - Adds Package and Task level Metadata - such as Machine Name, Execution Instance, Package Name, Package ID, etc..
CHARACTER MAP - Performs SQL Server level makes string data changes such as changing data from lower case to upper case.
CONDITIONAL SPLIT – Separates available input into separate output pipelines based on Boolean Expressions configured for each output.
COPY COLUMN - Add a copy of column to the output we can later transform the copy keeping the original for auditing.
DATA CONVERSION - Converts columns data types from one to another type. It stands for Explicit Column Conversion.
DATA MINING QUERY – Used to perform data mining query against analysis services and manage Predictions Graphs and Controls.
DERIVED COLUMN - Create a new (computed) column from given expressions.
EXPORT COLUMN – Used to export a Image specific column from the database to a flat file.
FUZZY GROUPING – Used for data cleansing by finding rows that are likely duplicates.
FUZZY LOOKUP - Used for Pattern Matching and Ranking based on fuzzy logic.
IMPORT COLUMN - Reads image specific column from database onto a flat file.
LOOKUP - Performs the lookup (searching) of a given reference object set against a data source. It is used for exact matches only.
MERGE - Merges two sorted data sets into a single data set into a single data flow.
MERGE JOIN - Merges two data sets into a single dataset using a join junction.
MULTI CAST - Sends a copy of supplied Data Source onto multiple Destinations.
ROW COUNT - Stores the resulting row count from the data flow / transformation into a variable.
ROW SAMPLING - Captures sample data by using a row count of the total rows in dataflow specified by rows or percentage.
UNION ALL - Merge multiple data sets into a single dataset.
PIVOT – Used for Normalization of data sources to reduce analomolies by converting rows into columns

UNPIVOT – Used for demoralizing the data structure by converts columns into rows incase of building Data Warehouses.

Saturday, 16 November 2013

Difference between UNION and UNION ALL

Both Union and Union all statements are use to merge data from multiple queries which has same number of columns. Also, the data types of the column should be same.

We will look into the difference between both operators using an example. We have the following two tables

Empdetails:

Custdetails:

Union: Union merge data from two or more queries and returns a sorted unique result set. Since this operator provides distinct result set it takes more time than Union All.

Query:

select * from EmpDetails union select * from custdetails
Output:

Union All: Union all merge data from two or more queries and returns non-unique result set. Since this operator does not perform distinct on the result set, it's performance is much faster than Union operator

Query:

select * from EmpDetails union all select * from custdetails

Output:

Following are the constraints that are associated with UNION and UNION ALL operator:

The tables should contains the same number of columns
Columns in the tables must be of the same data type
Order by clause should be applied to the final result set
Column name for the result set would be of the first query mentioned

See Also: Different Types of Joins in SQL , Difference between CTE and View

Different Types of Joins in SQL

Joins are classified into three types as below:

Inner Join
Outer Join
Cross Join

We will look into this by considering an example. Suppose we have two table as follows:

Empdetails:

EmpSalary:

Inner and outer joins perform join on the basis of at least one field. in our example this common file is empid.

Inner Join: Inner join is the most commonly used join. This join returns rows when there is a match in both the tables. Now we will apply this join between the two sample tables

Query:

select empdetails.empid,empdetails.empname,empsalary.empsalary

from empdetails , empsalary where empdetails.empid = empsalary.empid

Now in this query we are trying to pull salary of employee from empsalary tables for the employees which are present in empdetails table The result for this query will be as follows:

Outer Join: Outer join is of three type: Left outer Join and Right Outer Join

Left outer Join: This join returns all the rows from the left table in conjunction with the matching rows from the right table. If there are no columns matching in the right table, it returns NULL values.

Example:

Query:

select empdetails.empid,empdetails.empname,empsalary.empsalary
from empdetails left outer join empsalary on empdetails.empid = empsalary.empid

Output:

Right outer Join:This join returns all the rows from the right table in conjunction with the matching rows from the left table. If there are no columns matching in the left table, it returns NULL values.

Example:

Query:

select empdetails.empid,empdetails.empname,empsalary.empid,empsalary.empname,empsalary.empsalary

from empdetails right outer join empsalary on empdetails.empid = empsalary.empid

Output:

Full outer Join:This join combines left outer join and right outer join. It returns row from either table when the conditions are met and returns null value when there is no match.

Example:

Query:

select empdetails.empid,empdetails.empname,empsalary.empid,empsalary.empname,empsalary.empsalary

from empdetails full outer join empsalary on empdetails.empid = empsalary.empid

Output:

Cross Join: It is the Cartesian product of the two tables involved.

Example:

Query:

select empdetails.empid,empdetails.empname,empsalary.empid,empsalary.empname,empsalary.empsalary

from empdetails cross join empsalary

Output:

Join conditions can be specified in either the FROM or WHERE clauses; specifying them in the FROM clause is recommended. WHERE and HAVING clauses can also contain search conditions to further filter the rows selected by the join conditions.

Friday, 15 November 2013

Difference between truncate and drop command in SQL

Both truncate and delete are DDL statements and changes done by both can't be reverted. Following are the main differences between both:

Truncate Command:

Truncate removes all the records from the table/view
Truncate command is used to remove data from the table
The schema for the table remains as it is
It does not effect relationship between tables

Drop Command:

Drop command is used to remove and object from the database
The object(table/view/indexes,etc) is completely removed from the database
The schema is also removed
All the relationship for that object is removed and if we need that object to use again then we will have to develop it from scratch

Saturday, 9 November 2013

Difference between having and where clause in SQL

It is very important to know the difference between Having and Where in order to get the desired results from the query

WHERE clause:

can be used with Select,Insert,Delete statements
can't use aggregate functions in where clause
Where clause filter results before the results are Grouped as Where clause can only be written before Group clause
Where clause applies to rows

Example:

Select fname,country_name from person

where country_name in ('US','UK')

HAVING clause:

can be used with Select statement only
can use aggregate functions in Having clause
Having clause filters results after the results are grouped as having clause can be used after Group clause only
Having clause applies to groups

Example:

Select fname,max(sal) from person

group by fname having max(sal) > 10000

Similar Posts: CTE vs VIEW, TEMP TABLE vs CTE

Synchronous and Asynchronous transformation in ssis

Synchronous transformation process the incoming rows and pass it to the output one at a time.Output is synchronized with the input. it does not create a separate buffer for output. same buffer is used both for input and output rows. All the non-blocking transformations are synchronous transformation.

An Asynchronous transformation store the input data into the buffer and perform some operations before releasing it to the output buffer. There is two different buffer create for both input and output.Semi-Blocking and Full-Blocking transformations are asynchronous transformation.

Nonblocking, Partial Blocking and Blocking Transformations in SSIS

Nonblocking, Partial Blocking and Blocking Transformations In SSIS

Transformation in SSIS lets you manipulate data during ETL(extract, transform and Load). Operations such as aggregate, merge, split, modify,etc can be performed on data using transformations.

Transformations are generally divided into three categories as follows:

Non-blocking transformation: The output of this transformation uses the same buffer as that of input. It works on row by row basis. The transformation works on a synchronous basis. This transformations are efficient and lightweight results in better and faster transformation compared to other types of transformation.
Partial Blocking: This transformations works on a row-set basis and stores the data in buffer before releasing it to the output buffer.This transformation creates a new buffer for output rows. As this transformations generally stores some data in the buffer to perform some operations this are generally heavy-weighted compare to Non-blocking transformations and hence performance is slower compared to Non-blocking transformation.
Blocking Transformations: This transformations work on the whole data set at a time and hence stores the entire data in buffer. This transformation works on asynchronous basis.This transformation creates new buffer for output rows. consider an aggregate transformation which calculate the sum of a field in the data set, for this the transformation will have to store the entire data into buffer and then perform sum on that data set and then the results will be sent to output buffer. Because transformation stores the entire data set into the buffer, this transformations are generally the most heavy-weighted compare to other transformations and hence results in low performance.

Following are the list of Non blocking, Partial Blocking and Blocking Transformations in SSIS

Labels

Friday, 22 November 2013

Difference between Char,Varchar and Nvarchar in SQL

Sunday, 17 November 2013

Data Flow Task in SSIS

Saturday, 16 November 2013

Difference between UNION and UNION ALL

Different Types of Joins in SQL

Friday, 15 November 2013

Difference between truncate and drop command in SQL

Saturday, 9 November 2013

Difference between having and where clause in SQL

Synchronous and Asynchronous transformation in ssis

Nonblocking, Partial Blocking and Blocking Transformations In SSIS