While perusing some of the threads on SQLServerCentral.com one of the questions that hit me was regarding the use of scalar functions in queries. In this particular case, in the column list of a SELECT statement. Did it make sense to encapsulate a routine to strip of the time portion of a datetime value or should you put the actual function call(s) in the SELECT list.
I decided to run some tests so I created to user-defined functions that would strip off the time portion of a datetime value, one was created as a scalar function and the second as an in-line table valued function. In my SandBox database I have a table, dbo.LAPTest, that has 17,996,362 records. One of the columns in this table is a datetime column with random datetime values.
Here is the code for the two functions I wrote for the test:
CREATE FUNCTION [dbo].[ufnDayOnly](@DateValue DateTime)
RETURNS table
AS
RETURN (select dateadd(dd, datediff(dd,0, @DateValue),0) as DateOnly );
GO
CREATE FUNCTION [dbo].[fn_DayOnly] (@DateValue DateTime)
RETURNS DateTime
AS
BEGIN
RETURN dateadd(dd, datediff(dd,0, @DateValue),0)
END
GO
I then wrote three separate T-SQL scripts, shown below, that will run 8 time each selecting 1 to 10,000,000 rows. Each script strips the time portion of datetime values using three methods; hardcoded date functions, the table-valued function, and the scalar function. Each time through the loop we will clear the caches.
Set NoCount On;
Declare @cpu_ int,
@lreads_ int,
@eMsec_ int;
declare @AcctID int,
@Amount money,
@TDate datetime,
@ADate datetime;
declare @LoopCnt int;
set @LoopCnt = 0;
while @LoopCnt < 8
begin
dbcc freeproccache with no_infomsgs;
dbcc freesystemcache('ALL') with no_infomsgs;
Select
@cpu_ = cpu_time
, @lreads_ = logical_reads
, @eMsec_ = total_elapsed_time
From
sys.dm_exec_requests
Where
session_id = @@spid;
select top (power(10, @LoopCnt))
@AcctID = AccountID,
@Amount = Amount,
@TDate = TDate,
@ADate = dateadd(dd,datediff(dd,0,TDate),0)
from
dbo.LAPTest;
Select
power(10, @LoopCnt) as RowsProcessed,
cpu_time-@cpu_ as CpuMs
, logical_reads- @lreads_ as LogRds
, total_elapsed_time - @eMsec_ as Elapsed
From
sys.dm_exec_requests
Where
session_id = @@spid;
set @LoopCnt = @LoopCnt + 1;
end
Set NoCount On;
Declare @cpu_ int,
@lreads_ int,
@eMsec_ int;
declare @AcctID int,
@Amount money,
@TDate datetime,
@ADate datetime;
declare @LoopCnt int;
set @LoopCnt = 0;
while @LoopCnt < 8
begin
dbcc freeproccache with no_infomsgs;
dbcc freesystemcache('ALL') with no_infomsgs;
Select
@cpu_ = cpu_time
, @lreads_ = logical_reads
, @eMsec_ = total_elapsed_time
From
sys.dm_exec_requests
Where
session_id = @@spid;
select top (power(10, @LoopCnt))
@AcctID = AccountID,
@Amount = Amount,
@TDate = TDate,
@ADate = DateOnly
from
dbo.LAPTest
cross apply dbo.ufnDayOnly(TDate);
Select
power(10, @LoopCnt) as RecordsProcessed,
cpu_time-@cpu_ as CpuMs
, logical_reads- @lreads_ as LogRds
, total_elapsed_time - @eMsec_ as Elapsed
From
sys.dm_exec_requests
Where
session_id = @@spid;
set @LoopCnt = @LoopCnt + 1;
end
Set NoCount On;
Declare @cpu_ int,
@lreads_ int,
@eMsec_ int;
declare @AcctID int,
@Amount money,
@TDate datetime,
@ADate datetime;
declare @LoopCnt int;
set @LoopCnt = 0;
while @LoopCnt < 8
begin
dbcc freeproccache with no_infomsgs;
dbcc freesystemcache('ALL') with no_infomsgs;
Select
@cpu_ = cpu_time
, @lreads_ = logical_reads
, @eMsec_ = total_elapsed_time
From
sys.dm_exec_requests
Where
session_id = @@spid;
select top (power(10,@LoopCnt))
@AcctID = AccountID,
@Amount = Amount,
@TDate = TDate,
@ADate = dbo.fn_DayOnly(TDate)
from
dbo.LAPTest;
Select
power(10,@LoopCnt) as RecordsProcessed,
cpu_time-@cpu_ as CpuMs
, logical_reads- @lreads_ as LogRds
, total_elapsed_time - @eMsec_ as Elapsed
From
sys.dm_exec_requests
Where
session_id = @@spid;
set @LoopCnt = @LoopCnt + 1;
end
The results of running these scripts are as follows:
|
Hardcoded Functions |
In-Line TVF/Cross Apply |
Scalar Function |
|
RowsProcessed |
CpuMs |
LogRds |
Elapsed |
CpuMs |
LogRds |
Elapsed |
CpuMs |
LogRds |
Elapsed |
1 |
0 |
26 |
1 |
0 |
52 |
1 |
0 |
40 |
2 |
10 |
0 |
4 |
0 |
0 |
4 |
1 |
0 |
18 |
1 |
100 |
0 |
4 |
1 |
0 |
4 |
0 |
0 |
18 |
2 |
1000 |
0 |
14 |
1 |
0 |
14 |
2 |
16 |
28 |
36 |
10000 |
16 |
58 |
12 |
16 |
58 |
13 |
94 |
72 |
102 |
100000 |
125 |
523 |
143 |
125 |
523 |
115 |
703 |
537 |
699 |
1000000 |
703 |
5130 |
730 |
703 |
5130 |
715 |
7687 |
5144 |
7958 |
10000000 |
6984 |
51171 |
7072 |
7063 |
51171 |
7157 |
67016 |
51185 |
67414 |
As you can see there isn't much difference between the hardcoded functions and the In-Line TVF/Cross Apply, however, as the number of rows increase there is a definite disadvantage to using the scalar function.
Does this mean you should convert all your scalar functions to In-Line TVF's? Not necessarily, as you may need those functions else where, such as a WHERE clause where it may only be called once.
Does this mean you should always use the CROSS APPLY with In-Line TVF's? Again, not necessarily as it may not be performant either in all cases.
What can I suggest? Test, test, and test again is the best advice. The testing I ran above is limited and not necessarily what you may do on a normal, day to day basis. But it does give you some food for thought.
See you on the forums. I look forward to seeing what you all have to say.