Spatial Data Hurts My Brain

I’m still barely scratching the surface working with spatial data in SQL Server 2008. We’ve ported some of the data into a table where we built a geography spatial data column and we’re begginning to work with point data. The requirements from the developers are, so far, very simple. They’ll feed me a point and I find all the locations “close” to it. We had to go round & round on what defines “close” but finally settled on, I think, 15km.

The query to answer a question like this is ridiculously simple (a few object names have been changed):

SELECT ebe.[Location].STDistance(@Location) AS Distance,
ebe.[InterestId],
ebe.[Location].Lat AS Latitude,
ebe.[Location].Long AS Longitude,
ebe.[OrgId]
FROM dbo.[ebe] AS ebe
WHERE ebe[OrgId] = @OrgId
AND ebe.[Location].STDistance(@Location) < @CloseDistance

I’m not even hard-coding the “close” value so it can change when they change their minds. It retrieves exactly what’s needed. But… Well, look at the STATISTICS IO & TIME:

Table 'ebe'. Scan count 3, logical reads 40179
Table 'Worktable'. Scan count 0, logical reads 0
CPU time = 376 ms, elapsed time = 373 ms.

Not exactly snappy, is it? So, the obvious answer is to provide an index it can use so that it doesn’t have to scan the clustered index. Spatial indexes only support certain functions with the spatial column, STIntersects, STEquals, STDistance. Since I’m dealing with STDistance, I have a shot at this working. Here’s the script:

CREATE SPATIAL INDEX [ebe_spatial] ON [dbo].[ebe]
(
[Location]
)
USING GEOGRAPHY_GRID
WITH
( GRIDS =( LEVEL_1 = MEDIUM, LEVEL_2 = MEDIUM, LEVEL_3 = MEDIUM, LEVEL_4 = MEDIUM),
CELLS_PER_OBJECT= 128,
PAD_INDEX = OFF,
SORT_IN_TEMPDB = OFF,
DROP_EXISTING = OFF,
ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON)
ON [PRIMARY]

When I then run the query, well, here’s the STATISTICS IO & TIME again:

Table 'Worktable'. Scan count 0, logical reads 0
Table 'extended_index_469576711_384000'. Scan count 2076
Table 'ebe'. Scan count 0, logical reads 448064
Table 'Worktable'. Scan count 0, logical reads 0
CPU time = 1155 ms, elapsed time = 1393 ms.

You read that correctly, it used the spatial index which caused the performance to decrease. There are adjustments you can make to spatial indexes. You can change the number of cells per object or set the detail level on the grid, but it’s hard to understand what this does to the index.

After fighting with it for a bit, I sent a tweet out on twitter, just whining about the index causing a slow-down. Surprised as anything, I get a response from Paul Randall, uh, wow, suggesting an approach. This is responded to by Isaac Kunen, you know the Microsoft PM & spatial expert. You just have to love Twitter. I took the information they gave me and did a few searches on the web. This lead to Bob Beauchemin’s blog and my introduction to some new Dynamic Management Views, sp_help_spatial_geometry_index and sp_help_spatial_geography_index. Now we’re talking. A mechanism to identify how useful the spatial data index is, similar to looking at sys.dm_db_index_physical_stats to see information about a clustered index. This lead to the following query and output:

DECLARE  @LocationGEOGRAPHY 
SET @Location = GEOGRAPHY::STPointFromText('POINT (-86.674582 36.148843)',4326)
EXEC sp_help_spatial_geography_index @tabname = 'EWH_BIM_Extract2'
,@indexname='ebe_spatial'
,@verboseoutput = 1
,@query_sample = @Location

The question is, how efficient is the index? In my case, the Primary_Filter_Efficiency was listed as 3.33333. Yes, that’s 3% efficient. No wonder it killed the performance. What do I do to make the index more efficient? I haven’t a clue. That may be the next blog post. But at least now I know how to evaluate the usefulness of a spatial index.

13 thoughts on “Spatial Data Hurts My Brain

  • Great post. Spatial data is seriously challenging in my opinion. I love the title of the post. Reminds me of a Monthy pythos skit….brain hurts! I’m just getting into spatial data myself. I’ll keep post good articles on twitter on this if i find anything.
    Love the jumpstarttv videos

    ManuelRdgz

  • You know I’ve always wondered what SQL server was doing with the spatial indexes. Real 2d (let alone 3d!) indexing is a truly difficult thing, and not a solved problem AFAIK. Let us know if you find anything on the internals of spatial indexes, Grant.

  • scarydba

    I’ll put up some more information on another post, thanks Isaac. I’m still not convinced my query or my data are flawed. Partly because I didn’t import the data, so I’m not convinced everything is correct. Partly because I didn’t write the original query and I’ve already had to rewrite it. Let me get it a bit more settled and I’ll see if it changes the output. If it does, that’ll be interesting too.

  • If you’re working with point data, you want your spatial index grids to be all HIGH (you have them all MEDIUM now). You make the spatial index more effective by twiddling with the grids and cells-per-object.

    My spatial index diagnostic blog entries entitled “Filter” and “Filter Output” describe what the filter info means. You’re trying to get the data type to use the primary filter or internal filter (both implemented by the index) rather than do the long computation.

    If this is your real query on the blog, you want to look at this blog post: http://www.sqlskills.com/BLOGS/BOBB/post/How-to-ensure-your-spatial-index-is-being-used.aspx. You likely need to rewrite your query not to use variables, and make sure you install SP1. A spatial index is just like an ordinary index in that if you use variables the coster what know what the value is at plan creation time.

    HTH,
    Bob

  • scarydba

    Excellent. Thanks. Yeah, that’s not exactly the real query. I was planning on using parameters rather than hard-coding the “close” value just because the business can’t decide what “close” means. I’ll be doing a bunch more testing on it tomorrow.

    Thanks again.

    Grant

  • If you’re still on RTM, apply SP1. There were changes made around the costing for spatial queries and in SP1 the optimiser may actually select a spatial query without needing hints. It may still pick the wrong one (if you have multiple spatial indexes with different detail level), but it should select one.

    RTM the optimiser almost never selected a spatial index without a hint.

  • The structure of spatial indexes is nothing revolutionary Barry. Briefly (and as I understand it) the space it tessellated and contains/doesn’t contain info is stored in a regular b-tree.

  • scarydba

    Hey Gail,

    Thanks for the pointers. We are on SP1. The problem in this case wasn’t so much getting the index to work, it worked. It was getting it to work correctly. I’ll be clear to do some more experiments with it this morning. I’ll post more info as I find it.

  • Thanks Gail. Here I was hoping that someone had figured out how to effectively reverse quad-trees and use them as lookup indexes. Hmm, I guess tessellation is similar to that.

Please let me know what you think about this article or any questions:

This site uses Akismet to reduce spam. Learn how your comment data is processed.