Database Fundamentals #8: All About Data Types

Home / Database Fundamentals / Database Fundamentals #8: All About Data Types

SQL Server provides all different kinds of data types in order to give you more flexibility and control for how you store your data. You define these data types as you define the columns in your tables. If you wanted to store information about the date and time that a purchase was made online you’re going to store the values in a column, or columns, that define dates and times in order to ensure accuracy in your data.

Choosing a Data Type

You could make the data type into one that stores just about anything you give it, such as one of the character types like char or varchar. Do this and you’re going to run into issues when you start running queries against that data. For example, your business may ask for the amount of time between purchases. Since you chose to store the date and time as character values, you won’t be able to use some of the functions provided by the date and time data types. Instead you’ll have to either convert your data into date and time data types or you’ll have to write your own functions. But, if you used the datetime data types you get all that functionality and more, such as date/time formatting for different countries and validation that the date and time you’re entering are valid dates and times as well as Universal Time Code (UTC) values and offsets. The same thing is true of all the various data types. You don’t have to use a specific data type, but you’re sacrificing functionality if you don’t try to specify a data type appropriate to the business need.

Each data type takes up a different amount of space on your system. The measure of space for these data types at the lowest level is measure in bytes. A byte basically defines the amount of storage needed to define a single character. A value like ‘HORSE’ would consist of 5 bytes. It can be much more complicated than this, but in general, this rule will hold up well. You want to try to store your information in the most efficient manner you can. This means, as much as possible, using the smallest possible data type.

Types of Data

Breaking down the data types into several different categories will make it easier to talk about them. You can think about these as some general types of data available to you in SQL Server:

Data Type Category Primary Use
Textual Data For storing words, numbers and symbols, both large and small
Numeric Data Stores numbers and only numbers can be integers, decimals or floating point values
Date and Time Use to keep date and time information so that special calculations can be performed on that data
Binary Data For storing computer information such as Microsoft Word documents or even programs
Other Data Types Speciality data types described separately

Each of these data types stores a different kind of information and there are different rules and functions associated with them. Describing everything about these types of data is outside the scope of this blog, but you need to have a good understanding of these types before you can use them appropriately within your own tables.

Textual Data

Textual data represents any kind of information that can be brought into your system and displayed in a human readable form. That includes information like the words in this sentence, numbers, dates, times, just about anything you can think of that is readable by people. But it’s important to note that this information is stored as textual data and not as any of those other things. This data can be broken down into two general types, fixed length and variable length data.

Fixed Length Textual Data

Fixed length textual data stores characters of a specified length. Regardless of the data you enter, the fixed length data will always include enough blank values to completely fill the space. For example, if you create a column that has a length of 10 and you put the value ‘DOG’ into that column, SQL Server will automatically fill in 7 blanks, which display as spaces, into the column. That’s because the length is fixed, so it must always be exactly what it says. This would be used for very stringent text requirements for values that must fill in a particular space. This is actually a somewhat more old-fashioned method for storing information so that you can make a series of columns of the data all the same length. Data types include:

CHAR
NCHAR

These data types must have a length specified. The difference between the two is that the NCHAR data type allows you to store Unicode information. This allows you to define the character and the character set when store it. By character set, we’re referring to foreign language characters such as Kanji. Fixed length data can only be a maximum of 8,000 bytes. This means for a CHAR field, you can have 8000 characters. For a nchar field, because you have to define the character set, you can only have 4000 characters.

Variable Length Textual Data

Variable length data is a much more flexible means of storing textual data. You can store up to 2GB of information for each column. That’s a lot of text. But, you’re not always going to want to store that much information. In fact, usually, you’re going to want to put limits on the size of information stored so that you have more control. Having a person’s last name be 2GB long doesn’t make any sense. So, you can put a limit on variable length textual data. This means the information stored will only be as long as what you define it, up to the limit defined. For example, if you define a variable length field of having a maximum of four values, when you put the value ‘DOG’ into it, you’re only ever storing 3 characters (or 6 for Unicode). Variable length textual data types include:

VARCHAR
NVARCHAR
TEXT
NTEXT

The last two data types, TEXT and NTEXT, are included for backwards compability. They should not be used in any new databases you create since they’re scheduled for deprecation in an upcoming version of SQL Server. The Unicode storage is much the same as it is in the fixed length textual data. You can define a limit on variable length textual data up to 8000, after that things are stored differently so you can’t put a limit on. For Unicode data this would be 4000. To store data past the 8000/4000 limit, use the key word MAX when defining the column and then you can store up to 2GB in that column.

VARCHAR and NVARCHAR are the most frequently used textual data types.

Numeric Data

While you can store numbers within textual data, you’ll get a lot more functionality by using one of the many numeric data types. Numeric data types offer much more efficient methods for storing information than simply storing numbers as text. You can store more numbers in a smaller space by using these data types. You also get all sorts of mathematical functionality with numbers that you just don’t get with textual data. Numeric data types can be generalized between the precise numbers and the approximate numbers.

Precise Numbers

These numbers are of fixed length and fixed decimal places. There is actually quite a bit of difference between the non-decimal numbers and the decimal numbers. The integer values will not allow for any type of decimal notation. This means you get numbers 0, 1,2,-1,-2, but you can’t store 1.1. The decimal numbers allow you to store both an integer and a decimal, but the integer is stored as an implied decimal, meaning 1.0 instead of 1. Basically, use the type of number that makes the most sense for the business need. For example counting the number of online purchases made would be a whole, round number and probably use one of the fixed length, integer, data types. But the purchase values themselves would probably be stored as a decimal value.

The integer data types are listed along with the allowable values within them:

Data Type Bytes Limit
BIT 1 byte 0, 1
TINYINT 1 byte 0 – 255
SMALLINT 2 bytes -32,768 – 32,768
INT 4 bytes -2,147,483,648 – 2,147,483,648
BIGINT 8 bytes -9,223,372,036,854,775,808 – 9,223,372,036,854,775,808

Each of these types stores more and more information, but you can see that they are storing the information much more efficiently than if you made the values text. For example, to type in a BIGINT data type into a VARCHAR you’d need 19 bytes instead of the 8 needed for the appropriate numeric data type.

The decimal data types are defined differently than the integer data types. You first define the precision. This represents the number of digits, in total, that will be stored. Depending on the precision, the number will take up more space. You then have to define the scale. The scale represents the number of digits to the right of the decimal point. This can be confusing. For example, if you want to define a number of like 312.213, you would give it a precision of 6, because it is six digits in total, and a scale of 3 because there are three digits to the right of the decimal point. You can store up to 38 digits, which means a maximum precision of 38. These are the decimal data types:

DECIMAL
NUMERIC

This is how setting different precision values affects storage size of the data types:

Precision Bytes
1-9 5
10-19 9
20-28 13
29-38 17

In addition to the general decimal types there are also two types that are dedicated to storing values associated with money. The unique functionality they offer is that they will display as the appropriate money system that you define for them. However they are much more limited in terms of size and flexibility so are not used that often.

SMALLMONEY
MONEY

The SMALLMONEY data type stores values between -214,748.3648 and 214,748.3647 and is 4 bytes in size. Money expands the values to -922,337,203,684,477.5808 to 922,337,203,685,477.5807 with a size of 8 bytes.

Approximate Numbers

These numbers use floating point storage and are therefore not considered to be completely accurate. In theory, as you modify the values, the floating point decimal could cause rounding errors. This is seldom evident, but might be for extremely high number values. The data types and there storage is defined as:

Approximate Number Range Value Precision Bytes
Real  3.40E + 38 to -1.18E – 38, 0 and 1.18E – 38 to 3.40E + 38 4
Float(n) – 1.79E+308 to -2.23E-308, 0 and 2.23E-308 to 1.79E+308 1-24 7 4
25-53 15 8

These data types are used primarily for scientific types of data storage and use of them to store monetary values are not encourage

Date And Time Data

Date and time data types can be some of the more complicated to use and understand. While everyone gets the idea of a date or the time of day, the fact that these data types store information in ways that seems completely unrelated to the date and time that people thought they were recording leads to confusion. This is primarily the result of forgetting that you can store a date, a time, or a date and time. When storing a date and time, just because you didn’t supply either the date or the time value doesn’t mean that one wasn’t supplied. Depending on how you define the information when you store it, you could end up with interesting results.

The key is to use the right data type for the job. Here are time and date data types, their sizes and uses:

Data Type Bytes Range Use
DATE 3 January 1, 1 CE – December 31, 9999 CE Stores date values within the range specified
DATETIME 8 January 1, 1753 – December 31, 9999 and times 00:00:00 – 23:59:59.997 Stores date and time values down to fractions of a second within the range specified
SMALLDATETIME 4 January 1, 1900 – June 6, 2079 and times 00:00:00 – 23:59:59 Stores date and time values down to the second within the range specified
DATETIME2(n) 6-8 January 1, 1 CE – DECEMBER 31, 9999 CE and times 00:00:00 –  23:59:59.9999999 A much wider range of time values for more accurate storage than that represented by the standard DATETIME data type. Scale can vary by changing (n)
TIME(n) 3-5 00:00:00 – 23:59:59.9999999 Stores time values down to the fraction of a second. Scale can vary by changing the value of (n)
DATETIMEOFFSET(n) 10 January 1, 1 CE – December 31, 9999 CE and times 00:00:00 – 23:59:59.9999999 Stores date and time values down to the fraction of a second as well as a time zone for that date based on an offset from GMT.

As you can see there are quite a large number of ways you can store date values. It’s always best to store the smallest data type for the type of values needed. Further, with dates and times, you need to be sure whether you need both date and time or simply need one or the other. With all that information you can make a more informed choice.

Binary Data

Binary is a special data type for storing formatted, machine readable binary information. You can store it as fixed length or variable length values, similar to the textual data type. This data is primarily used for special types of document management applications or storing encrypted data. You’ll seldom need to use this for most business applications. The data types are:

BINARY
VARBINARY

The size of the data depends on the size defined. Like textual data you can store up to 8000 bytes in the fixed data types and up to 2GB in the VARBINARY as long as you use the MAX keyword to define the data type.

Other Data Types

There are a number of special data types within SQL Server for dealing with very particular types of information. Detailing these will take too long, but you should be aware that some of these exist in order to take advantage of the special opportunities they offer. The special data types include:

XML

This data type is specially designed to hold well-formed XML information. This means you can define an XML fragment or an entire XML document and have the formatting enforced by the data type. This may be useful if you need to deal with XML information directly.

UNIQUEIDENTIFIER

This data type creates a special hexadecimal value based on information from your computer to make what should be a globally unique value that you can use to identify information. This is especially attractive when you would prefer to create unique values on the client side of an application rather than directly within the database. But, these are rather large values, coming in at 16 bytes, so you want to be careful about how and where you use them.

HIERARCHYID

The HIERARCHYID data type is used to create an identifier within a hierarchy such as a relationship tree used when setting up a business hierarchy or similar activity. The data type doesn’t create a tree, but it does store a location within a specified hiearchy

SPATIAL

This stores either geographic or geospatial information so that you can define shapes, points and lines as part of a drawing or on a map. This offers a whole range of functionality relating to tieing database information to locations. Many businesses find this type of data extremely useful.

Conclusion

That is an overview of the types of information that you can store. With that in hand, you can go on to create several tables in the database. Just remember that you really should pick the right data type for the right situation. Cramming the wrong data into the wrong data type just leads to problems and lots of rework.

4 Comments

  • Gary

    Terrific refresher, but I’ve a mental blank trying to unwrap the Value, Precision and Bytes columns for Real & Float(n). Is there a layout problem in there??

    • Actually it’s formatted correctly. There isn’t a changing size for Real, it’s 4 bytes. Where as the size of float depends upon the precision, 7 or 15 digits. Probably not the clearest way to lay it out.

  • jennifer

    when I try to execute my query, it comes back as: error: conversion failed when converting varchar value ‘yes’ to data type bit.
    how do I fix this errorschach in order to execute the query??

OK, fine, but what do you think?