March 19, 2009

Compare varchar and int - Implicit vs Explicit conversions

Hi

While working on a query, I stumbled across the following issue completely by accident, but I believe it is worth mentioning.

I had a table with a varchar column that actually contained numbers, or so I believed. In a moment of inattention, I wrote the following predicate in my query:
where col_name = 123456

When I executed the query, after a few seconds I got this error message:
Msg 245, Level 16, State 1, Line 1
Conversion failed when converting the varchar value 'asdf' to data type int.


I was a bit confused by this message, since nowhere in my query did I perform such a conversion, until I realized that a conversion is needed in order to evaluate my predicate.
I was pretty shocked that SQL Server chose to convert each varchar value to int and then perform the comparison, instead of converting the int literal to varchar just once. (The reason is SQL Server's data type precedence rules: int has a higher precedence than varchar, so the varchar side of the comparison is the one that gets converted.) You would expect the database optimizer to optimize the query... :-)
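As a minimal illustration (the table and values here are made up for the example), a single non-numeric value is enough to make the implicit conversion fail:

create table t (col_name varchar(10))
insert into t values ('123456')
insert into t values ('asdf')

-- int literal: every varchar value is converted to int, so 'asdf' blows up
select * from t where col_name = 123456

-- varchar literal: a plain string comparison, no conversion and no error
select * from t where col_name = '123456'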

I fixed my query, but, as usual, I also ran a benchmark to check how much difference there is between the two possible behaviours - converting yourself, or letting SQL Server convert for you. In my benchmark, I counted the rows with a specific value in a table of 1 million rows. Here are the results:
No conversion - 186 milliseconds
Implicit conversion - 330 milliseconds
Explicit conversion - 186 milliseconds

That teaches us an important lesson - we must help the optimizer help us, and we should always check what really happens when our queries execute.
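One easy way to check is to look at the execution plan. In the implicit conversion case, the predicate in the plan shows a CONVERT_IMPLICIT on the column, which is the warning sign to look for. A minimal sketch, using the benchmark table from the script below:

set showplan_text on
go
select count(*) from varchar_table where var_col = 50000
go
set showplan_text off
go

The plan text contains a predicate along the lines of CONVERT_IMPLICIT(int, [var_col], 0) = (50000), telling us the conversion is applied to every single row.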

All of the above is true when there is no index on the varchar column. If there is an index on it, performing the explicit conversion yourself becomes crucial; otherwise, SQL Server cannot use an index seek and falls back to scanning the entire table (or index) instead.
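Here is a short sketch of that effect, reusing the benchmark table from the script below (the index name is my own):

create index ix_var_col on varchar_table (var_col)

-- implicit conversion on the column: no seek is possible, the whole index is scanned
select count(*) from varchar_table where var_col = 50000

-- converting the literal side instead allows an index seek
select count(*) from varchar_table where var_col = cast(50000 as varchar(10))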

Here is the script:

create table varchar_table (var_col varchar(10))

set nocount on
declare @i int
set @i = 0
begin tran

while (@i < 1000000)
begin
insert into varchar_table values (cast(@i as varchar))

if ((@i % 10000)=0)
begin
commit
begin tran
end

set @i = @i+1
end
commit

declare @a datetime
declare @b datetime

set @a = getdate()
select count(*)
from varchar_table
where var_col = '50000'
set @b = getdate()
select 'No conversion', datediff(ms, @a, @b)

set @a = getdate()
select count(*)
from varchar_table
where var_col = 50000
set @b = getdate()
select 'Implicit conversion', datediff(ms, @a, @b)

declare @c int
set @c = 50000
set @a = getdate()
select count(*)
from varchar_table
where var_col = cast(@c as varchar)
set @b = getdate()
select 'Explicit conversion', datediff(ms, @a, @b)



Bye

Best way to truncate time part of datetime

Hi

I needed to truncate the time part of a datetime column in one of my queries, in order to group by the date part - a pretty standard requirement. I found 3 methods to do so on the web, and I wondered which one performs best, since I had to apply it to millions of records.

I conducted a benchmark in which I tried to measure the time it takes SQL Server to perform the truncation. Since I wanted to isolate the truncation itself as much as possible, I ran a query that selects the maximum date from a table with 1 million rows. Here are the results:
No casting - 373 milliseconds
Casting using convert to varchar - 1346 milliseconds
Casting using convert to float - 623 milliseconds
Casting using dateadd function - 490 milliseconds

It seems like we have a winner - the dateadd method...

This is the script I used:

create table testdate (date_col datetime )

declare @i int
set nocount on
set @i = 0
begin tran
while (@i < 1000000)
begin
insert into testdate values (getdate() + @i)

if ((@i % 5000)=0)
begin
commit
begin tran
end
set @i = @i+1
end

commit


declare @a datetime
declare @b datetime

set @a = getdate();

with a as
(
select date_col
from testdate
)
select max(date_col)
from a

set @b = getdate()

select 'No cast',datediff(ms,@a,@b)


set @a = getdate();

with a as
(
select CONVERT(datetime, CONVERT(varchar, date_col, 101)) date_col
from testdate
)
select max(date_col)
from a

set @b = getdate()

select 'Cast Using convert',datediff(ms,@a,@b)

set @a = getdate();

with a as
(
select cast(floor(cast(date_col as float)) as datetime) date_col
from testdate
)
select max(date_col)
from a

set @b = getdate()

select 'Cast using float',datediff(ms,@a,@b)

set @a = getdate();

with a as
(
select DATEADD(dd, DATEDIFF(dd,0,date_col), 0) date_col
from testdate
)
select max(date_col)
from a

set @b = getdate()

select 'Cast using dateadd',datediff(ms,@a,@b)

BTW, if you try casting the datetime value to int, for example cast(cast(date_col as int) as datetime), you will run into a weird behaviour.
If the time part of the datetime value is before noon, the cast returns the date part of the value. If the time part is after noon, the cast returns the date of the following day (actually the cutoff is not exactly noon, but a tiny bit before it), for example:

select cast(cast(cast('2009-03-19 11:59:59.994' as datetime) as int) as datetime) =>2009-03-19 00:00:00.000
select cast(cast(cast('2009-03-19 11:59:59.995' as datetime) as int) as datetime)=>2009-03-20 00:00:00.000

Personally, I consider this a bug: casting datetime to int rounds to the nearest day, while casting a non-whole numeric value to int just truncates it - for example, select cast(5.9 as int) returns 5.
So mind this issue when you are dealing with datetime values.
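To tie this back to the original requirement, here is how the winning dateadd method would look in the group-by query I actually needed (a sketch against the testdate table from above):

select dateadd(dd, datediff(dd, 0, date_col), 0) as date_only,
       count(*) as row_count
from testdate
group by dateadd(dd, datediff(dd, 0, date_col), 0)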

Till next time...

March 11, 2009

Automatic Decryption of Encrypted Data

Hi

A couple of days ago, a fellow DBA presented me with a problem. He has a data warehouse with encrypted data, and he wants to run reports against it. The problem is that his reporting tool (Business Objects) issues SQL statements against the database directly, and he has no way of adding the OPEN SYMMETRIC KEY statement to the statements the tool generates.

I was able to solve his problem using two SQL Server features.
The first step was to create views that perform the decryption of all the encrypted columns, and to have the reporting tool query the views instead of the base tables.
The second step was to implement a logon trigger that runs whenever a user logs in to SQL Server. The trigger checks whether the login request comes from the login used by the reporting tool, and if so, it calls a stored procedure that issues the OPEN SYMMETRIC KEY statement.

Here is the sample code:
Creating the key and test data

use db_test
go
create symmetric key key1 with ALGORITHM = DES encryption by password='P@ssword'
go
create table T (a int, b varchar(100), c varbinary(8000))
go

open symmetric key key1 decryption by password='P@ssword'
insert into T select 1,'paul',encryptbykey(key_guid('key1'),'Paul')
insert into T select 2,'john',encryptbykey(key_guid('key1'),'john')
close all symmetric keys
go


grant select on database::db_test to public
grant CONTROL on database::db_test to public


Create the login and the user used for testing
create login George with password='k', DEFAULT_DATABASE = db_test, check_policy=off
go

create user George for login George
go

Create the procedure that opens the key
create procedure p_OpenKey
as
begin
open symmetric key key1 decryption by password='P@ssword'
end

go

Create the logon trigger
CREATE TRIGGER connection_limit_trigger
ON ALL SERVER
FOR LOGON
AS
BEGIN
begin try
IF ORIGINAL_LOGIN()= 'George'
begin
exec db_Test.dbo.p_OpenKey;
print 'Hello, George'
end
else
begin
print 'Hello, '+ORIGINAL_LOGIN()
end
end try
begin catch
print ERROR_Message()
rollback;
end catch
END;


There are couple of things to note in the code:
1) The p_OpenKey stored procedure must be created in the database that contains the tables and the key.
2) The reason the OPEN SYMMETRIC KEY statement is issued in the procedure and not directly in the trigger is that the trigger runs in the context of the master database, where the key does not exist.
3) The ROLLBACK statement in the logon trigger makes sure that if we could not open the key for some reason, the login request is denied. This is not mandatory, of course; even if the OPEN SYMMETRIC KEY fails, the user can still work - he just won't get the data decrypted.

The only thing left is to create views on the table and use the DecryptByKey function in order to decrypt the data.
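For completeness, here is a minimal sketch of such a view over the test table above (the view name and the varchar length are my own choices):

create view v_T
as
select a,
       b,
       -- decryptbykey returns NULL unless key1 is already open in the session,
       -- which is exactly what the logon trigger takes care of
       convert(varchar(100), decryptbykey(c)) as c_plain
from T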

Take Care

March 4, 2009

Using a stored procedure with SSIS OLE DB Source

Hi

I had a mission to write an SSIS package that transfers data from a source to a destination (pretty ordinary... :-)). The catch was that the SELECT statement had to be generated dynamically inside a stored procedure, and I wanted to use the result set produced by that procedure as the output of the OLE DB Source. (The structure of the result set is constant; only the SELECT statement that generates it is dynamic.)

At first, SSIS didn't work properly with the dynamic query, since it could not determine the structure of the result set. To get around this problem, I created a second stored procedure that inserts the result of the first stored procedure into a table variable and then selects from that variable. This way, SSIS could recognize the structure of my result set.

The first procedure, sp1, is the one that performs the dynamic select.
The second procedure:

create procedure sp2 @p1 varchar(100), @p2 numeric(19,0)
as

SET NOCOUNT ON
declare @tmp table (
col1 varchar(250),
col2 datetime)
insert into @tmp
exec sp1 @p1, @p2


select * from @tmp

SET NOCOUNT OFF
return


Notice the SET NOCOUNT statements at the beginning and the end of the procedure.
At first I didn't have these statements in my procedure, and I kept getting this error message:

"Error: A rowset based on the SQL command was not returned by the OLE DB provider."
error 0xc02092b4

But once I added the SET NOCOUNT statements, everything worked.

Some Internet forums and blogs suggested using the SET FMTONLY statement, but I didn't need to use it.
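For reference, the workaround those forums describe is to prefix the SQL command of the OLE DB Source with SET FMTONLY OFF, so that SSIS really executes the procedure (here the original sp1, making the wrapper unnecessary) when it probes for metadata. I haven't tested this variant myself, so treat it as a sketch:

-- hypothetical OLE DB Source SQL command; the ? marks are OLE DB parameter placeholders
SET FMTONLY OFF
exec sp1 ?, ?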

Till next time
Bye