Tips For Dealing with Large Datasets in Sql Server

Are you dealing with large (100 million row +) datasets that live in a sql database? Have you found your old methods not to be satisfactory? Here are a couple of tips from my own experience.

  1. Do not use count(*) to figure out how big of a table you have.
  2.  Do not use the max function to figure out when the last record was inserted

1.

I found this out the hard way when I actually got an error when I tried to run
Select Count(*) From Table  and received an arithmetic overflow error. I was puzzled at first and after some  searching I found the problem. A 'count' in sql server returns the datatype int, which means it can only be 2^31-1 or 2,147,483,647. The table that I was working with had more than 2.1 billion rows, so that caused a problem. Now, you may be thinking that you could just use a Count_Big instead, but that is probably not the right answer. Try using sp_spaceused instead. If you are interested in turning this into a shortcut for SSMS, look here.

2.

Sometimes I need to figure out when the latest record was inserted. Instead of taking a max on a datetime field. I can often get to my answer by using information about an index. Hopefully your table has a unique auto incrementing primary key that can aid you in finding the last record inserted. Make sure you understand the process that builds or alters your table. It could be that the maximum value of your primary key is not related to the most recent records.

Select max(LocalTimeStamp) From Table ( Slow )

vs

Select LocalTimeStamp From Table Where PrimaryKey = ( Select Max(PrimaryKey) From Table ) ( Quick )

Have some useful tips to add? Please post them in the comments.

Leave a Comment

Filed under Sql Server

Leave a Reply