February 25, 2009

Generate Random Numbers

I needed to generate a massive amount of random numbers.
There are two ways to do it in SQL Server 2005 and above that I know of.
The first one is to use the rand function which returns a number between 0 and 1,
this is very convinient since you can limit your range of values to a specific range (though if you want negative numbers, it is a bit more tricky).
The problem with rand() is that it returns the same value for each time you invoke it in a SQL statement, for example if I run
select rand()*100
from my_table
I get back the same results for all the records. This means that if I want to generate different random values I would have to use a loop.
The second option is to use the newid() function which generates a uniqueidentifier. This has two great advantages over using rand():
1. it will generate a unique value for each record in the result set, so you can use a single statement without using loops.
2. it is easier to limit the range of generated values to a range of a specific data type, for example
select cast(cast(newid() as binary(4)) as int)
will return random integers.
If you want to limit the values to a specific range you can always use the % operator, for example
select cast(cast(newid() as binary(4)) as int)%7
will always return numbers between -6 and 6
That is it for now
Take Care

XML Shredding

Hi
Today I've started working on a new project, which has a requirement to store a list of 300k values in a table (having each value stored in a different record).
Facing this requirement I had two possible solutions in mind, either send an XML value or a CSV string, and since I'm not that familiar with SQL Server's XML abilities I decided to conduct a benchmark.
In order to simulate an input xml of a 300k list, I have created a table with random numbers and used a query in order to generate the XML variable.

create table random_numbers (random_number int)
declare @a xml
select @a = (select random_number as a from random_numbers for xml auto)
It took SQL Server about 2 seconds to generate the xml variable.
The next step was to shred it back into a table.
My first attempt was
select t.a.value('(/random_numbers[1]/@a)','int') id
from @a.nodes('/random_numbers') t(a)
but when I ran this query, all the rows had the value from the first row in the xml
so i modified it to this query
select t.a.query('.').value('(/random_numbers/@a)[1]','int') id
from @a.nodes('/random_numbers') t(a)
this gave me the required results, but it took forever
1000 records - 2 seconds
2000 records - 7 seconds
3000 records - 18 seconds
4000 records - 34 seconds
5000 records - 47 seconds
6000 records - 67 seconds
7000 records - 97 seconds
8000 records - 123 seconds
9000 records - 187 seconds
finally I changed the query to
select t.a.value('@a','int') id
from @a.nodes('/random_numbers') t(a)
This query worked amazingly well. It took SQL Server only 5 seconds to shred the 300k rows xml.
Hope this post helped you.

My New Blog

Hi
This is my new blog in which I will share my insights regarding SQL Server.
I'm a SQL Server DBA in one of the leading online gaming companies.
Hope you'll enjoy it