Wednesday, February 6, 2013

Half cooked concepts are dangerous

Well I am not sure, what it called in English or any other language but in Pashto we say, A half scholar is deadly danger for your knowledge, and a half doctor is deadly danger for your health.

The same principle apply to every field including database designing, I came to this conclusion when I encounter such practice I am currently analyzing for optimization.

Some people use a technique for publishing data, the data that is formatted for some particular purpose to be access with optimum performance, for this purpose the removed some checking constraints, and de-normalize it. 

De-normalize means you will have no need to put join in your query and all your desired data will be in a single table, and that table might be used for a single purpose. The constraints are removed for faster writing, because you might be updating it too quickly in a background/a-sync process, as well the query used for extracting information might be expensive.

But I have seen example of joining tables that were de-normalized and some constraints were removed, the performance of was too bad, there was a lot of I/O read, CPU utilization and more then 20K Operator  cast. 

I was technically not allowed to make some physical design changes in the current system, so I had to stay with the current design and optimize it to some acceptable level. 

I put some clustered and non clustered indices on main table and on associated tables. I get some 40% performance, and when I put some statistics on few columns I was amazed by the performance I got 85 to 90% performance gain.

The conclusion is never max premature concepts and avoid using publish table for different purposes, what I suggest always use a single publishing table for a single purpose.    


Sunday, February 3, 2013

Publishing Tables Issue

Today I encountered a very straight forward SQL Query  a simple Select with columns list, and order by an integer column. I wondered because there was no such big columns in the selection list, and even there were only a few thousands of records.

 When I analyzed the underlying table I cam to know that it was a publishing table (as it belonged to pub schema), which does not has any separate primary key nor it inherit it from some other tables, so in short there were no Clustered Index neither Non Clustered Index. Here is the execution plan details.





I experiment on that same table I created another table with an Auto ID columns to be used as primary key. So I just got approximately 4 to 7 percent performance as follow.  
So...I was not happy with this performance, I dig a bit on the other side, instead of creating another table, move data in it, was too much for that little performance boost. So I tried another thing why not I should create a clustered index on the table for the column it is sorting the data, and I did it and I got only 2 % percent performance gain.

And finally I got some thing that why I should not try Non clustered index on the same column, so I created a Non Clustered index as follow.

CREATE NONCLUSTERED INDEX [NCI_SequencNo_MyTable] ON [pub].[MyTable] 
(
[SequenceNo] ASC
)
INCLUDE ( [ID],
[Name],
[IsInActive],
[IsSuspended]) WITH (SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]

And I got a lot of performance....look at yourself...

I tried another thing, why not to remove the clustered index I created before which did give me only 2 % performance, and see if I can get more. Here is the final execution plan details I got.


And the most beutiful thing was in the execution plan I was not seeing the sorting details, seems it was removeing the Order By clause from the query, as it understands the result will be sorted..