Bollywood Updates: 2013

Saturday, 7 December 2013

Creating simple Bar chart in Tableau

Creating simple Bar chart in Tableau

This is my first post on Tableau. Tableau is a fantastic data visualization tool. It help us to create some fantastic dashboards using this tool.

In this post we will learn how to create a simple bar chart in the dashboard.

As shown in the above screen shot, my data contains list of countries the regions and sub-region in which they falls.

We will now create a bar chart which will display the no. of countries according to the regions.

Step1: Drag and drop Region field in the dimension section to columns and Number of countries field in the row section.

A graph will be plotted automatically as shown in the above figure.

Step2: To add label to the graph, drag and drop the 'number of countries' field in the label section.

Now the labels is visible on the graph

Step3: Now we will color the graph according to the 'Number of countries'. The darker the color the larger the number of countries available in the region.

Step4: Sort the graph according to the number of countries. Go to column field and click on the 'Region' field and select the option sort -> ascending-> sort by field-> click OK.

You can also sort the chart in descending order by selecting the descending option.

Friday, 22 November 2013

Difference between Char,Varchar and Nvarchar in SQL

Char DataType:

Char Datatype is used to store fixed length characters.
If declared as Char(100) then it will allocate space of memory for 100 characters, no matter how much characters is actually stored in the column

Varchar DataType:

Abbreviation for Varchar is Variable length characters
It is used to store Non-Unicode characters
Since the name contains 'Variable' it specifies that the memory allocation varies as per the characters stored in the column. For e.g 'Nadir' is stored in the column then the memory allocation would be 5. It allocates 1 byte per character
Maximum length for this data type is 8000
Varchar supports collation which requires 1 byte memory storage per characters
No other language except the once requires 1 byte memory storage per characters can be stored for example: English language characters can be stored in varchar

Nvarchar DataType:

Abbreviation for Nvarchar is Uni-code variable length characters
It is used to store Unicode characters
The memory allocation varies similar to varchar datatype except it allocates 2 byte per characters
Maximum length for this data type is 4000
Nvarchar supports all types of collation
Multiple languages can be stored in nvarchar datatype by using prefix 'N'. Below example explains this point.

e.g: Declare @str nvarchar(4000) = N'最低点'

Highlights:

As both varchar and nvarchar datatypes are variable in nature it acquires space as per the characters stored irrespective of the max limit specified
It is always recommendable to use varchar compared to nvarchar when the data is in no other language then English
Query execution for varchar is generally faster compared to nvarchar

Sunday, 17 November 2013

Data Flow Task in SSIS

We will begin with the tutorial for Data flow task in SSIS. SSIS is one my most favorite topic and simply like to work on this technology.

There is always a misconception between control flow and data flow. Hence we need to know the difference between both.

Control flow

A control flow consists of one or more tasks and containers that execute when the package runs. To control order or define the conditions for running the next task or container in the package control flow, we use precedence constraints to connect the tasks and containers in a package. A subset of tasks and containers can also be grouped and run repeatedly as a unit within the package control flow. SQL Server 2005 Integration Services (SSIS) provides three different types of control flow elements: Containers that provide structures in packages, Tasks that provide functionality, and Precedence Constraints that connect the executables, containers, and tasks into an ordered control flow.

Data flow

A data flow consists of the sources and destinations that extract and load data, the transformations that modify and extend data, and the paths that link sources, transformations, and destinations The Data Flow task is the executable within the SSIS package that creates, orders, and runs the data flow. A separate instance of the data flow engine is opened for each Data Flow task in a package. Data Sources, Transformations, and Data Destinations are the three important categories in the Data Flow.

Data flow as name suggests is use to transfer data from source to destination. Data flow should always contain a source and a destination. If business demands us to modify our data while transferring data from source to destination we might need to use transformations to achieve our task.

Following are some commonly used transformation that are available:

AGGREGATE - It applies aggregate functions to Record Sets to produce new output records from aggregated values.
AUDIT - Adds Package and Task level Metadata - such as Machine Name, Execution Instance, Package Name, Package ID, etc..
CHARACTER MAP - Performs SQL Server level makes string data changes such as changing data from lower case to upper case.
CONDITIONAL SPLIT – Separates available input into separate output pipelines based on Boolean Expressions configured for each output.
COPY COLUMN - Add a copy of column to the output we can later transform the copy keeping the original for auditing.
DATA CONVERSION - Converts columns data types from one to another type. It stands for Explicit Column Conversion.
DATA MINING QUERY – Used to perform data mining query against analysis services and manage Predictions Graphs and Controls.
DERIVED COLUMN - Create a new (computed) column from given expressions.
EXPORT COLUMN – Used to export a Image specific column from the database to a flat file.
FUZZY GROUPING – Used for data cleansing by finding rows that are likely duplicates.
FUZZY LOOKUP - Used for Pattern Matching and Ranking based on fuzzy logic.
IMPORT COLUMN - Reads image specific column from database onto a flat file.
LOOKUP - Performs the lookup (searching) of a given reference object set against a data source. It is used for exact matches only.
MERGE - Merges two sorted data sets into a single data set into a single data flow.
MERGE JOIN - Merges two data sets into a single dataset using a join junction.
MULTI CAST - Sends a copy of supplied Data Source onto multiple Destinations.
ROW COUNT - Stores the resulting row count from the data flow / transformation into a variable.
ROW SAMPLING - Captures sample data by using a row count of the total rows in dataflow specified by rows or percentage.
UNION ALL - Merge multiple data sets into a single dataset.
PIVOT – Used for Normalization of data sources to reduce analomolies by converting rows into columns

UNPIVOT – Used for demoralizing the data structure by converts columns into rows incase of building Data Warehouses.

Saturday, 16 November 2013

Difference between UNION and UNION ALL

Both Union and Union all statements are use to merge data from multiple queries which has same number of columns. Also, the data types of the column should be same.

We will look into the difference between both operators using an example. We have the following two tables

Empdetails:

Custdetails:

Union: Union merge data from two or more queries and returns a sorted unique result set. Since this operator provides distinct result set it takes more time than Union All.

Query:

select * from EmpDetails union select * from custdetails
Output:

Union All: Union all merge data from two or more queries and returns non-unique result set. Since this operator does not perform distinct on the result set, it's performance is much faster than Union operator

Query:

select * from EmpDetails union all select * from custdetails

Output:

Following are the constraints that are associated with UNION and UNION ALL operator:

The tables should contains the same number of columns
Columns in the tables must be of the same data type
Order by clause should be applied to the final result set
Column name for the result set would be of the first query mentioned

See Also: Different Types of Joins in SQL , Difference between CTE and View

Different Types of Joins in SQL

Joins are classified into three types as below:

Inner Join
Outer Join
Cross Join

We will look into this by considering an example. Suppose we have two table as follows:

Empdetails:

EmpSalary:

Inner and outer joins perform join on the basis of at least one field. in our example this common file is empid.

Inner Join: Inner join is the most commonly used join. This join returns rows when there is a match in both the tables. Now we will apply this join between the two sample tables

Query:

select empdetails.empid,empdetails.empname,empsalary.empsalary

from empdetails , empsalary where empdetails.empid = empsalary.empid

Now in this query we are trying to pull salary of employee from empsalary tables for the employees which are present in empdetails table The result for this query will be as follows:

Outer Join: Outer join is of three type: Left outer Join and Right Outer Join

Left outer Join: This join returns all the rows from the left table in conjunction with the matching rows from the right table. If there are no columns matching in the right table, it returns NULL values.

Example:

Query:

select empdetails.empid,empdetails.empname,empsalary.empsalary
from empdetails left outer join empsalary on empdetails.empid = empsalary.empid

Output:

Right outer Join:This join returns all the rows from the right table in conjunction with the matching rows from the left table. If there are no columns matching in the left table, it returns NULL values.

Example:

Query:

select empdetails.empid,empdetails.empname,empsalary.empid,empsalary.empname,empsalary.empsalary

from empdetails right outer join empsalary on empdetails.empid = empsalary.empid

Output:

Full outer Join:This join combines left outer join and right outer join. It returns row from either table when the conditions are met and returns null value when there is no match.

Example:

Query:

select empdetails.empid,empdetails.empname,empsalary.empid,empsalary.empname,empsalary.empsalary

from empdetails full outer join empsalary on empdetails.empid = empsalary.empid

Output:

Cross Join: It is the Cartesian product of the two tables involved.

Example:

Query:

select empdetails.empid,empdetails.empname,empsalary.empid,empsalary.empname,empsalary.empsalary

from empdetails cross join empsalary

Output:

Join conditions can be specified in either the FROM or WHERE clauses; specifying them in the FROM clause is recommended. WHERE and HAVING clauses can also contain search conditions to further filter the rows selected by the join conditions.

Friday, 15 November 2013

Difference between truncate and drop command in SQL

Both truncate and delete are DDL statements and changes done by both can't be reverted. Following are the main differences between both:

Truncate Command:

Truncate removes all the records from the table/view
Truncate command is used to remove data from the table
The schema for the table remains as it is
It does not effect relationship between tables

Drop Command:

Drop command is used to remove and object from the database
The object(table/view/indexes,etc) is completely removed from the database
The schema is also removed
All the relationship for that object is removed and if we need that object to use again then we will have to develop it from scratch

Saturday, 9 November 2013

Difference between having and where clause in SQL

It is very important to know the difference between Having and Where in order to get the desired results from the query

WHERE clause:

can be used with Select,Insert,Delete statements
can't use aggregate functions in where clause
Where clause filter results before the results are Grouped as Where clause can only be written before Group clause
Where clause applies to rows

Example:

Select fname,country_name from person

where country_name in ('US','UK')

HAVING clause:

can be used with Select statement only
can use aggregate functions in Having clause
Having clause filters results after the results are grouped as having clause can be used after Group clause only
Having clause applies to groups

Example:

Select fname,max(sal) from person

group by fname having max(sal) > 10000

Similar Posts: CTE vs VIEW, TEMP TABLE vs CTE

Synchronous and Asynchronous transformation in ssis

Synchronous transformation process the incoming rows and pass it to the output one at a time.Output is synchronized with the input. it does not create a separate buffer for output. same buffer is used both for input and output rows. All the non-blocking transformations are synchronous transformation.

An Asynchronous transformation store the input data into the buffer and perform some operations before releasing it to the output buffer. There is two different buffer create for both input and output.Semi-Blocking and Full-Blocking transformations are asynchronous transformation.

Nonblocking, Partial Blocking and Blocking Transformations in SSIS

Nonblocking, Partial Blocking and Blocking Transformations In SSIS

Transformation in SSIS lets you manipulate data during ETL(extract, transform and Load). Operations such as aggregate, merge, split, modify,etc can be performed on data using transformations.

Transformations are generally divided into three categories as follows:

Non-blocking transformation: The output of this transformation uses the same buffer as that of input. It works on row by row basis. The transformation works on a synchronous basis. This transformations are efficient and lightweight results in better and faster transformation compared to other types of transformation.
Partial Blocking: This transformations works on a row-set basis and stores the data in buffer before releasing it to the output buffer.This transformation creates a new buffer for output rows. As this transformations generally stores some data in the buffer to perform some operations this are generally heavy-weighted compare to Non-blocking transformations and hence performance is slower compared to Non-blocking transformation.
Blocking Transformations: This transformations work on the whole data set at a time and hence stores the entire data in buffer. This transformation works on asynchronous basis.This transformation creates new buffer for output rows. consider an aggregate transformation which calculate the sum of a field in the data set, for this the transformation will have to store the entire data into buffer and then perform sum on that data set and then the results will be sent to output buffer. Because transformation stores the entire data set into the buffer, this transformations are generally the most heavy-weighted compare to other transformations and hence results in low performance.

Following are the list of Non blocking, Partial Blocking and Blocking Transformations in SSIS

Saturday, 12 October 2013

Decision Making statements in Visual Basic

Decision making statements are often used to validate certain condition and based on that execute some piece of code. As the name suggest, this statements are used to check certain conditions. We can have multiple conditions in the statement and according to output, whether the check returns True or False, make some decision in your program.

There are two types of decision making statements in visual basic, as follows:

If Statements
Select Case Statements

If statements can further be classified into three types:

If - Then Statements: An If - Then statements has only one condition and if that condition gets satisfied that a particular piece of code gets executed

Example:

If name = "XYZ" Then

MsgBox "Welcome" & name

End if

If -then-else statements: This statements handles both the scenario, what should happen when a condition gets satisfied and what should happens if it doesn't.

Example:

If name = "XYZ" Then

MsgBox "Welcome" & name

Else

MsgBox "You are not an authorized person"

End if

Nested IF-Then-Else Statements: This statements has two or more statements based on which our program acts.

Example:

If name = "XYZ" Then

MsgBox "Welcome" & name

Elseif name = "ABC"

MsgBox "Welcome" & name

Else

MsgBox "You are not an authorized person"

End if

Select Case statements are generally used when there are more conditions to check , which gets quite complicated to handle

Example:

Select Case name

Case "XYZ"

MsgBox "Welcome" & name

Case "ABC"

MsgBox "Welcome" & name

Case "IJK"

MsgBox "Welcome" & name

Case "ELSE"

MsgBox "You are not an authorized person"

END SELECT

Difference between temp table and CTE

Difference between temp table and CTE

CTE:

CTE is un-materialized/ non-indexable (cannot create indexes on CTE)
CTE is logical/disposable View
CTE persists only till the very next query
CTE cannot have constraints
CTE is mostly used for recursion, as CTE can call itself
CTE resists in memory

Temp Table:

Temp table gets stored in temp table
Temp table persists till the current connection ends
Temp table can be referred in sub procedure
Temp table can have constraints,indexes and primary defined
Indexes can be implemented in Temp Table
Data can be updated in Temp Table
Temp Tables are stored in disk

Similar Posts: Difference between CTE and View , CTE(Common Table Expressions) in SQL , T- SQL Interview Questions and Answer

Difference between CTE and View - SQL

Difference between CTE and View

A CTE is a temporary/logical View, it is not store physically. It is a named query, the result for which is only available to the very next query after the CTE is defined. CTE is defined using WITH clause.

A View is a physical object that is present in the database. View is as good as a Table but it doesn't store data physically as compared to a table, only the data schema is stored in View. View,when referred, pulls data by executing the query that is associated with it.

The biggest difference between a CTE and View, is that, View or derived table cannot call itself, whereas CTE can call itself and hence support recursion.

In many databases, views have options, for instance to index them.

Also See: Having vs Where clause , Difference between temp table and CTE , CTE(Common Table Expressions) in SQL , Difference between DELETE and TRUNCATE command

Monday, 9 September 2013

Sub query/Nested Query/Inner Query in SQL

Sub Query/Nested Query/Inner Query in SQL

Sub query or Inner query or Nested query is a query in a query. A sub query is usually added in the WHERE Clause of the SQL statement.

A Subquery is used to return data that will be used in the main query as a condition to further restrict the data to be retrieved.

Subqueries can be used with the SELECT, INSERT, UPDATE, and DELETE statements along with the operators like =, <, >, >=, <=, IN, BETWEEN etc.

There are a few rules that subqueries must follow:

Subqueries must be enclosed within parentheses.

A subquery can have only one column in the SELECT clause, unless multiple columns are in the main query for the subquery to compare its selected columns.

An ORDER BY cannot be used in a subquery, although the main query can use an ORDER BY. The GROUP BY can be used to perform the same function as the ORDER BY in a subquery.

Subqueries that return more than one row can only be used with multiple value operators, such as the IN operator.

The SELECT list cannot include any references to values that evaluate to a BLOB, ARRAY, CLOB, or NCLOB.

A subquery cannot be immediately enclosed in a set function.

The BETWEEN operator cannot be used with a subquery; however, the BETWEEN operator can be used within the subquery.

CTE in SQL

CTE(Common Table Expressions) in SQL

CTE( Common Table Expressions) was first introduced with SQL 2005. It is the best replacement for Views.

There are times when we might require to work on small data sets that is a part of large database. For example, we might have a huge sized table that contains data for all transactions that a company has made, but we require current month's data only and apply some grouping or filters on that particular data of current month. Typically views are used to do the task. But, views sometimes over occupy our resources as views are stored at system level. CTE is the best substitute for Views. It is also helpful when we don't have access to create object in a database.

Common Table Expressions or CTE are more readable form of derived table. CTE can be declared once and can be used in multiple queries. CTE can be recursive and non-recursive (we will cover this topic in next post)

Common Table Expression Syntax
A Common Table Expression contains three core parts:

The CTE name (this is what follows the WITH keyword)
The column list (optional)
The query (appears within parentheses after the AS keyword)

EXAMPLE:

WITH ProductAndCategoryNamesOverTenDollars (ProductName, CategoryName, UnitPrice) AS
(
   SELECT
      p.ProductName,
      c.CategoryName,
      p.UnitPrice
   FROM Products p
      INNER JOIN Categories c ON
         c.CategoryID = p.CategoryID
   WHERE p.UnitPrice > 10.0
)

SELECT *
FROM ProductAndCategoryNamesOverTenDollars
ORDER BY CategoryName ASC, UnitPrice ASC, ProductName ASC

The CTE is part of the subsequent statement only.The subsequent statement can be a single SELECT/INSERT/UPDATE/DELETE, or a compound (with UNION, INTERSECT etc). So if we modify our above example and include a query between CTE creation query(WITH) and the query that refer the CTE.(like below)

SELECT * FROM Products

SELECT *
FROM ProductAndCategoryNamesOverTenDollars
ORDER BY CategoryName ASC, UnitPrice ASC, ProductName ASC

Here, we will see that when the query that refers to CTE(ProductAndCategoryNamesOverTenDollars) is executed we will get an error "Invalid object name 'ProductAndCategoryNamesOverTenDollars'"

Also See: Difference between CTE and View , Difference between temp table and CTE

Sunday, 8 September 2013

VBA- Enable Developer Tab - EXCEL

Enable Developer Tab

Following are the steps to enable developer tab in Excel 2007

Click on 'Office Button' to top left of the workbook and then click on 'Excel Options'

Check the option 'Show developer tab in the ribbon' and click on OK

You might now be able to see developer tab in the workbook

To start the Visual Basic Editor click on 'Developer' Tab and click on the 'Visual Studio' option (very first option from left hand side)

You will now see visual basic editor to begin coding

SQL Tuning/Query Optimization/SQL Optimization

Query Optimization is often misunderstood as a technique or some setting that we need to change in SQL. In-real query optimization is something related to best practice that we have to follow in your day-to-day life.

Query optimization is all about how we use clause and operators in our query.

I have listed below some examples that we might be using daily and best practice that needs to follow to optimize for better performance.

SQL query becomes faster if you specify column names in query instead of '*'

EXAMPLE:

Normal way:

select * from emp_details

Optimized way:

select fname,lname from emp_details

Always use HAVING clause instead of WHERE clause whenever you are using GROUP BY in your query. Using HAVING clause allows SQL to first group the data and then filter the grouped data using HAVING clause.

EXAMPLE:

Normal way:

select fname.lname from emp_details

where fname like 'XYZ'

group by fname

Optimized way:

select fname,lname from emp_details

group by fname

having fname like 'XYZ'

Create Index on the column which we will be using to filter data.Following priority order should be followed when any index is created a) WHERE clause, b) JOIN clause, c) ORDER BY clause, d) SELECT clause

Sometimes you may have more than one subqueries in your main query. Try to minimize the number of subquery block in your query.

EXAMPLE:

Normal way:

SELECT name FROM employee

WHERE salary = (SELECT MAX(salary) FROM employee_details)
AND age = (SELECT MAX(age) FROM employee_details)
AND emp_dept = 'Electronics';

Optimized way:

SELECT name FROM employee

WHERE (salary, age ) = (SELECT MAX (salary), MAX (age)
FROM employee_details)
AND dept = 'Electronics';

Always use EXISTS instead of IN operator, Generally on large data IN operator works really slow compare to EXISTS. IN is efficient when most of the filter criteria is in the sub-query. EXISTS is efficient when most of the filter criteria is in the main query.

Always use UNION ALL instead of UNION if you does not expect unique result. UNION checks for duplicate rows and eliminate it . Hence it takes more time for execution compare to UNION ALL

If you want to re-use the query, create a STORE PROCEDURE for the same.

Check if there is at-least 30% HHD is empty – it improves the performance a bit

Remove any unnecessary joins from table

IDENTITY in SQL

I have often came across many questions on IDENTITY like 'What is IDENTITY?', 'What is the use of IDENTITY?' , 'How does IDENTITY help us identify duplicate rows in a table?', etc...

This post will cover the most basic functionality of IDENTITY

What is IDENTITY?

The IDENTITY clause specifies that the column is an IDENTITY column
If the column is set as IDENTITY column than it will be filled with an auto-incremented number and no other data can be inserted into that column
IDENTITY is also known as 'Auto-Increment'
IDENTITY is used with CREATE and ALTER statements
It supports only int, bigint, smallint, tinyint, or decimal / numeric with a scale of 0

Syntax:

IDENTITY [ ( seed , increment ) ]

Arguments

seed

Is the value that is used for the very first row loaded into the table.

increment

Is the incremental value that is added to the identity value of the previous row that was loaded.

You must specify both the seed and increment or neither. If neither is specified, the default is (1,1).

Example:

CREATE TABLE new_employees(
id_num int IDENTITY(1,1),
fname varchar (20),
minit char(1),
lname varchar(30))

Alter Table new_employees

add id_num bigint identity(1,1)

What is the use of IDENTITY?

The most common use of IDENTITY is to set PRIMARY Key column or UNIQUE key column as IDENTITY column which lets you provide a simple unique row identifier without the need to maintain it yourself. SQL Server does the work for you.

How does IDENTITY help us identify duplicate rows in a table?

We can use IDENTITY to identify duplicate rows in a table. Now we will go through an example to know how it works. Suppose we have a table named 'new_employees' and the data in the table is as in the below screen shot.

Now if you observe there is multiple entry made for 'XYZ' named employee in the table. Now you come to know that there is duplicate entry in the table and here you can identify the id for this row and type a simple DELETE Command to delete row based on 'id_num'. But suppose there is duplicate entry for more than 1,000 employees, in this case it will not be possible to to identify 'id_num' manually.

In this case, we will simply write a sub query to get the desired results.

Sub query:

select * from new_employees where id_num in (select MAX(id_num) from new_employees group by fname)

The output of the query is as below

If you see the latest entry for the employee 'XYZ' is selected and the old entry is not considered.

To convert above SELECT statement into DELETE command to delete the old entry we simply need to do the below highlighted changes.

delete from new_employees where id_num not in (select MAX(id_num) from new_employees group by fname)

Labels

Saturday, 7 December 2013

Friday, 22 November 2013

Difference between Char,Varchar and Nvarchar in SQL

Sunday, 17 November 2013

Data Flow Task in SSIS

Saturday, 16 November 2013

Difference between UNION and UNION ALL

Different Types of Joins in SQL

Friday, 15 November 2013

Difference between truncate and drop command in SQL

Saturday, 9 November 2013

Difference between having and where clause in SQL

Synchronous and Asynchronous transformation in ssis

Nonblocking, Partial Blocking and Blocking Transformations In SSIS

Saturday, 12 October 2013

Decision Making statements in Visual Basic

Difference between CTE and View

Monday, 9 September 2013

Sub Query/Nested Query/Inner Query in SQL

See also: Difference between CTE and View - SQL, T- SQL, SSIS, SSRS Interview Questions and Answer

CTE(Common Table Expressions) in SQL

Also See: Difference between CTE and View , Difference between temp table and CTE

Sunday, 8 September 2013

Enable Developer Tab

SQL Tuning/Query Optimization/SQL Optimization

IDENTITY in SQL

Syntax:

Arguments