The Age of Data and Software Development

I am so excited to be a data professional in the modern era. Yeah, 15-20 years ago, it was cool to be a DBA and a database developer. However, now, it’s amazing. Data drives, or should drive, all our decisions. Whether we’re deciding how high to set the cost threshold for parallelism, which query we want to tune, or even which product would serve us best, we should be making these decisions based on data. It’s not just about getting the average or the min and max, although those are the starting points. Now, you need to start taking standard deviation into account, and you probably should learn how to run a regression analysis. All these tools will make you a better, more valuable employee. It’s not any different for a business.
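To make that a little more concrete, here’s a minimal sketch in Python. Everything in it is made up for illustration: the query durations, the row counts, and the helper names `summarize` and `fit_line` are assumptions, not output from any real system. It shows two sets of durations with the same average but very different standard deviations, plus a simple least-squares line of duration against rows read.

```python
import statistics


def summarize(samples):
    """Return the basic descriptive statistics for a list of numbers."""
    return {
        "min": min(samples),
        "max": max(samples),
        "mean": statistics.mean(samples),
        "stdev": statistics.stdev(samples),  # sample standard deviation
    }


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical query durations in milliseconds. Both lists average 100 ms.
steady = [100, 102, 98, 101, 99]
spiky = [20, 30, 25, 400, 25]

steady_stats = summarize(steady)
spiky_stats = summarize(spiky)

# Hypothetical rows read vs. duration (ms) for the same query.
rows_read = [1000, 2000, 3000, 4000]
duration_ms = [12, 22, 32, 42]
slope, intercept = fit_line(rows_read, duration_ms)

print("steady:", steady_stats)
print("spiky: ", spiky_stats)
print("ms per row:", slope, "fixed overhead (ms):", intercept)
```

The average alone says these two queries behave the same. The standard deviation says one of them occasionally falls off a cliff, and that’s probably the one you want to tune first.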

Data Driven Software Development

Just as you should be making your decisions based on data, so should the business. Software development ain’t easy. Making the wrong choices on which processes or functions to tune, which new methods to build, or which old ones to toss could have gigantic negative repercussions. So, software companies are collecting information about how you use their software. It’s just common sense.

Now, there are lots of laws in place (and international treaties, and all sorts of fun stuff, but this is not a law blog) that prevent companies from capturing certain kinds of information. There are further laws in place that prevent them from sharing or selling certain kinds of information. Same things apply to the information captured within a business (and if you’re working in health care, you might want to look up the term ‘Mens Rea’ to be sure you’re not violating any of the information laws that you’re aware of).

The information gathered about how you’re using software helps the company build better software.

I work for Redgate Software. We have telemetry information in our products that lets us know how you’re using the software. It reports back to us. Don’t panic. We have a publicly posted privacy policy that we follow religiously. In fact, we’re legally obligated to both show you this information and follow it. Please don’t start uninstalling our software. You’ll note, we let you opt out (please don’t, more data is better data… well, as long as it’s clean & accurate and well distributed… OK, different discussion, sorry). You never have to share anything with us.

Same thing goes for Microsoft and SQL Server. They have a (detailed, woof) privacy policy, publicly posted and available. It shows what they collect and how they collect it. In fact, from my (non-lawyer, this is NOT legal advice) point of view, it goes above and beyond just the strict legal requirements.

Do me a favor, read those policies. Yes, they’re dull reading. However, that’s the information you need to understand so that you know how a company is dealing with telemetry. You need to look at information like this, published by the company, to know how and what they are doing (and are allowed to do) before you panic and start running around with your hair on fire.

If your SQL Server instances do not have access to the internet, Microsoft will never see your usage info. Personally, I think many, most, maybe even all of your instances should be isolated from the internet (why invite attack?). However, honest people can, and do, disagree on topics like this. Microsoft never has to see your telemetry information if you don’t want them to. In fact, read the document; they tell you exactly how to control the telemetry (assuming you leave your instance connected to the internet). In short, you absolutely can opt out.

Conclusion

It is all about the data, and that is exciting. Let’s just be clear about the reality of things. When we hear that Company X is collecting data about you, yes, they are. We all are. It’s sort of our thing. That doesn’t make it evil unless they are violating the law (which you can check), or their own policies (which you can check), or are trying to hide it from you (which, again, you can check). So check first, then take action if needed.


PS: I don’t have a privacy policy. This is a privately owned, publicly posted blog. All my information (except my password) is exposed as will be anything you choose to share here. Just so we’re clear. After all, fair’s fair.

2 thoughts on “The Age of Data and Software Development”

  • Grant, it’s an interesting topic. Privacy policies are a start, but they are often filled with loopholes (in part to avoid getting sued for accidentally doing something bad perhaps!). Take your own policy for example, it says “As a result, these releases may automatically collect additional data, provide fewer controls, and otherwise employ different privacy and security measures than those typically present in our products. If you participate in previews or pre-release programs, we may contact you about your feedback or your interest in continuing to use the product after general release”. That means IF I try something out in a Prod or near prod environment I may be leaking information. Don’t try in Prod? Sure.

    Beyond that, I worry that the policy or data gathered can change at any time and I’m unlikely to notice. Plus, it’s entirely easy for a company (or a zealous product team member) to elect to just gather some new stuff.

    I get the value of telemetry and there are times when I’ll cheerfully opt in. But I think it should default to opt in. Will you (and everyone else) get less? Sure. Offer an incentive. Discounts, upgrades, maybe even an annual report based on telemetry I sent plus the aggregate.

    Finally, I’m absolutely in favor of my db servers having zero internet access. Makes life harder, makes data safer.

  • Thanks for the feedback Andy. It is a tough call sometimes. I do think the most important thing for any vendor is to provide a way to opt out. While even this may be imperfect, it is the one important thing that has to happen.
