Wednesday, May 6, 2009

Comparing Hardcoded functions, In-Line TVF's, and Scalar Functions

While perusing some of the threads on SQLServerCentral.com one of the questions that hit me was regarding the use of scalar functions in queries.  In this particular case, in the column list of a SELECT statement.  Did it make sense to encapsulate a routine to strip of the time portion of a datetime value or should you put the actual function call(s) in the SELECT list.

I decided to run some tests so I created to user-defined functions that would strip off the time portion of a datetime value, one was created as a scalar function and the second as an in-line table valued function.  In my SandBox database I have a table, dbo.LAPTest, that has 17,996,362 records.  One of the columns in this table is a datetime column with random datetime values.

Here is the code for the two functions I wrote for the test:

CREATE FUNCTION [dbo].[ufnDayOnly](@DateValue DateTime)
RETURNS table
AS
RETURN (select dateadd(dd, datediff(dd,0, @DateValue),0) as DateOnly );
GO


CREATE FUNCTION [dbo].[fn_DayOnly] (@DateValue DateTime)
RETURNS DateTime
AS
BEGIN
RETURN dateadd(dd, datediff(dd,0, @DateValue),0)
END
GO



I then wrote three separate T-SQL scripts, shown below, that will run 8 time each selecting 1 to 10,000,000 rows.  Each script strips the time portion of datetime values using three methods; hardcoded date functions, the table-valued function, and the scalar function.  Each time through the loop we will clear the caches.



Set NoCount On;
Declare @cpu_ int,
@lreads_ int,
@eMsec_ int;
declare @AcctID int,
@Amount money,
@TDate datetime,
@ADate datetime;
declare @LoopCnt int;
set @LoopCnt = 0;

while @LoopCnt < 8
begin
dbcc freeproccache with no_infomsgs;
dbcc freesystemcache('ALL') with no_infomsgs;

Select
@cpu_ = cpu_time
, @lreads_ = logical_reads
, @eMsec_ = total_elapsed_time
From
sys.dm_exec_requests
Where
session_id = @@spid;

select top (power(10, @LoopCnt))
@AcctID = AccountID,
@Amount = Amount,
@TDate = TDate,
@ADate = dateadd(dd,datediff(dd,0,TDate),0)
from
dbo.LAPTest;

Select
power(10, @LoopCnt) as RowsProcessed,
cpu_time-@cpu_ as CpuMs
, logical_reads- @lreads_ as LogRds
, total_elapsed_time - @eMsec_ as Elapsed
From
sys.dm_exec_requests
Where
session_id = @@spid;
set @LoopCnt = @LoopCnt + 1;
end


 



Set NoCount On;
Declare @cpu_ int,
@lreads_ int,
@eMsec_ int;
declare @AcctID int,
@Amount money,
@TDate datetime,
@ADate datetime;
declare @LoopCnt int;
set @LoopCnt = 0;

while @LoopCnt < 8
begin
dbcc freeproccache with no_infomsgs;
dbcc freesystemcache('ALL') with no_infomsgs;

Select
@cpu_ = cpu_time
, @lreads_ = logical_reads
, @eMsec_ = total_elapsed_time
From
sys.dm_exec_requests
Where
session_id = @@spid;

select top (power(10, @LoopCnt))
@AcctID = AccountID,
@Amount = Amount,
@TDate = TDate,
@ADate = DateOnly
from
dbo.LAPTest
cross apply dbo.ufnDayOnly(TDate);

Select
power(10, @LoopCnt) as RecordsProcessed,
cpu_time-@cpu_ as CpuMs
, logical_reads- @lreads_ as LogRds
, total_elapsed_time - @eMsec_ as Elapsed
From
sys.dm_exec_requests
Where
session_id = @@spid;
set @LoopCnt = @LoopCnt + 1;
end


 



Set NoCount On;
Declare @cpu_ int,
@lreads_ int,
@eMsec_ int;
declare @AcctID int,
@Amount money,
@TDate datetime,
@ADate datetime;
declare @LoopCnt int;
set @LoopCnt = 0;

while @LoopCnt < 8
begin
dbcc freeproccache with no_infomsgs;
dbcc freesystemcache('ALL') with no_infomsgs;

Select
@cpu_ = cpu_time
, @lreads_ = logical_reads
, @eMsec_ = total_elapsed_time
From
sys.dm_exec_requests
Where
session_id = @@spid;

select top (power(10,@LoopCnt))
@AcctID = AccountID,
@Amount = Amount,
@TDate = TDate,
@ADate = dbo.fn_DayOnly(TDate)
from
dbo.LAPTest;

Select
power(10,@LoopCnt) as RecordsProcessed,
cpu_time-@cpu_ as CpuMs
, logical_reads- @lreads_ as LogRds
, total_elapsed_time - @eMsec_ as Elapsed
From
sys.dm_exec_requests
Where
session_id = @@spid;
set @LoopCnt = @LoopCnt + 1;
end


The results of running these scripts are as follows:





















































































































































































































Hardcoded Functions In-Line TVF/Cross Apply Scalar Function

RowsProcessed



CpuMs



LogRds



Elapsed



CpuMs



LogRds



Elapsed



CpuMs



LogRds



Elapsed



1



0



26



1



0



52



1



0



40



2



10



0



4



0



0



4



1



0



18



1



100



0



4



1



0



4



0



0



18



2



1000



0



14



1



0



14



2



16



28



36



10000



16



58



12



16



58



13



94



72



102



100000



125



523



143



125



523



115



703



537



699



1000000



703



5130



730



703



5130



715



7687



5144



7958



10000000



6984



51171



7072



7063



51171



7157



67016



51185



67414




As you can see there isn't much difference between the hardcoded functions and the In-Line TVF/Cross Apply, however, as the number of rows increase there is a definite disadvantage to using the scalar function.



Does this mean you should convert all your scalar functions to In-Line TVF's?  Not necessarily, as you may need those functions else where, such as a WHERE clause where it may only be called once.



Does this mean you should always use the CROSS APPLY with In-Line TVF's? Again, not necessarily as it may not be performant either in all cases.



What can I suggest?  Test, test, and test again is the best advice.  The testing I ran above is limited and not necessarily what you may do on a normal, day to day basis.  But it does give you some food for thought.



 



See you on the forums.  I look forward to seeing what you all have to say.

Friday, May 1, 2009

Published, Again!

I have written two new articles and they are going to be published, not on SSC but on SSWUG.  I thought I'd help out Chris Shaw a bit and write a few articles for him as well.  Plus it broadens my horizons some as well.

The first article, based on numerous posts here on SSC is titled Deleting Numerous Rows from a Table, and will be published on May 25th.  It shows how to delete records in batches and control the size of the transaction log files, if the database is using the bulk logged or full recovery model.  This article originally started as my first blog entry, but when I finished it, it seemed more like an article.

The second article, to be published on June 1st, is titled The Dynamic Tally or Numbers Table.  I wrote this one after reading an article in SQL Server Magazine about creating a tally table.  I think you will find the methodology behind my dynamic tally table much more performant than the one in the SQL Server Magazine article.