Distributing Jupyter Notebooks

If you’re working with the Microsoft Data Platform, you should be, at the least, exploring Azure Data Studio as a new tool in your toolbox. One of the big reasons for this is the inclusion of Jupyter Notebooks.

For those who don’t know, Jupyter Notebooks are an open source documentation tool that lets you combine text and pictures with live code. From this we can talk about runbooks that you can share with people, lessons in combination with videos, presentations, interactive software documentation and lots more.
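To make "text combined with live code" a little more concrete: a notebook file (.ipynb) is a JSON document holding a list of cells. Here is a minimal sketch in Python, standard library only, that builds one markdown cell and one code cell. The helper names (`markdown_cell`, `code_cell`, `make_notebook`) and the cell contents are made up for illustration, and I'm assuming version 4.4 of the notebook format so the cells don't need ids.

```python
import json


def markdown_cell(text):
    """Return a documentation cell (hypothetical helper, not part of any library)."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(code):
    """Return a code cell that has not been run yet (hypothetical helper)."""
    return {
        "cell_type": "code",
        "execution_count": None,  # stays null until the cell is executed
        "metadata": {},
        "outputs": [],            # results are stored here after a run
        "source": code,
    }


def make_notebook(cells):
    """Wrap the cells in the top-level notebook structure (format 4.4 assumed)."""
    return {
        "cells": cells,
        "metadata": {},  # left empty here; a real notebook usually names its kernel
        "nbformat": 4,
        "nbformat_minor": 4,
    }


notebook = make_notebook([
    markdown_cell("# Check the server version\nRun the cell below."),
    code_cell("SELECT @@VERSION;"),
])

# The text that would be saved as a .ipynb file
text = json.dumps(notebook, indent=1)
```

Saving `text` to a file with an .ipynb extension should give you something a notebook tool can open, although with empty metadata you will probably have to pick the kernel yourself. Because it's all just text in a file, distributing a notebook really comes down to distributing files.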

I’m myopically focused at the moment on Azure Data Studio, but there are a lot of other places and ways to create or consume notebooks. However, I’m going to keep my focus.

The issue I’m running into is distributing the notebooks.

Where to go to get a Jupyter Notebook?

Honestly, I don’t know. This is one of those times I’m using the blog as a way to learn while simultaneously sharing with you (and yeah, I know that some of you hate this approach of mine).

I’ve started with the idea of using a simple method of distribution. I’ve put the notebooks I’ve created so far (and there are exceedingly few at the moment) in a GitHub repository. Follow the link. You should be able to see my notebooks. They are sloppy and poor examples right now. As I say, I’m just figuring this stuff out.

However, I’m not convinced that this is the right way to distribute the notebooks.

What Do You Think?

If you don’t mind, I’d like to ask two questions of you.

First, are you exploring Azure Data Studio and Jupyter Notebooks?

Second, what do you think the best way to distribute notebooks could be?

Please answer in the comments. Also, share this blog post around. I’d really like to get some feedback on this one. I’m convinced of the utility of Jupyter Notebooks. I’m not convinced that enough of you are convinced, if that makes sense.

24 thoughts on “Distributing Jupyter Notebooks”

  • Disclaimer:- I love Notebooks as you can see on my blog and my GitHub, the PowerShell module I have written and the fun that I had enabling the Tiger Teams BP Check to be in a notebook. With that in mind, here is my tuppence.

    The best way to distribute Notebooks that the community can use is, I think, GitHub, as it provides people with the capability to clone your repository and use it themselves. For an open source module like dbachecks I am currently creating some notebooks that will live in the repository and will help people to be able to explore the module. Equally for my presentations, I have kept my code and slides in my GitHub for a number of years and will add my Notebooks to them. If I want to see what Grant talked about at WhereEverConf then I can look in your GitHub and find the Notebook.

    I could be persuaded (and would be happy to set one up in the sqlcollaborative GitHub) that a Community Notebooks repository would be useful for Notebooks relating to community tooling or common problem solving solutions.

    This is not the place for Notebooks for companies though. I have helped to set up Notebooks for the DBA Team at one of my clients that we store in an Azure DevOps repository. When someone wants to update the Notebook, they create a branch, update the Notebook and create a PR and when that is approved it is deployed to a folder on the DBA Management boxes, ready for the next time it is required. A similar process is used for Notebooks for 1st line support and could easily be used for field engineers or other teams.

  • Reitse

    I don’t like notebooks. I love them! So many cool things to do with those. For distribution GitHub would be my preferred route. It’s easy to find and easy to use if you set it up correctly (stop looking at my GitHub!). But only for public access. For private stuff, think of other routes.
    And to be honest, much of my enthusiasm is fueled by the beardy fellow

  • Andrew Price

    I think Notebooks are a great way to save documentation with queries (and even save data if necessary). I am encouraging my team to learn them with the plan to use them for internal documentation and even to reply to internal users (like developers) with self-documenting notebooks. I look forward to seeing more notebooks and learning how to use them more effectively. Thank you for all you do!

  • Alex

    Improved tools for documentation and presenting are valuable, but I’m finding that many agile shops don’t think they have the time to invest in Azure Data Studio and Jupyter Notebooks yet. I hope I am wrong, but so far that’s what I’ve seen and I would expect more people to jump on board the more SQL Saturday speakers demo it and the more “getting started” blog posts exist to help get people up to speed faster.

    Personally, I’m a big fan and my problem is that I’m still getting used to writing PowerShell in VS Code – I was so used to the ISE – and I need to learn the debugging features in VS Code before I can afford to really give this topic the attention it deserves.

    Thank you for the post, it’s a great reminder and I hope this topic gains more traction.

    • Thank you for the feedback. I think documentation and Agile sometimes do not get quite as cozy as they ought to, so your opinion here is probably pretty accurate.

      I’m in the same boat. Writing PowerShell in ADS is still very much a work in progress. I’m slowly, ever so slowly, figuring it out. I’ve been doing demos using it (and only occasionally mucking them up badly). It’s a good way to start to learn how it behaves.

  • dba100

    A notebook is like a post in a forum: text, query and picture, not much difference,

    but uploading to GitHub is an easy way to share.

  • Still exploring notebooks. I like the concept, but sometimes they feel a bit long to go through. However, I’ve liked the little I’ve seen w/ the dbatools module running the diag scripts into a notebook. There’s potential there, but I need a way to balance the documentation/ease there with presentability of the information. I often find it easier to read the output data in something like Excel or to visualize it with PowerBI.

    As for distribution – publicly, GitHub or similar makes the most sense right now. It’s easy to point to, download, etc. For private use, I’m not quite sure. A repo of some sort makes the most sense but I could see it being stored in SharePoint or some similar manner. I could see that if a large number of useful notebooks start arising, a catalog of some sort would be helpful – similar to PSGallery. At that point, a curated/moderated set of notebooks with some basic overviews and/or screens might be really useful to find the sort of notebook you want to use for a starting point.

    Editing – still in ADS for me. My notebook usage is pretty much all around SQL Server so I haven’t really explored them as much with other platforms (except for a very brief trial with some Python code).

  • A couple of options I haven’t seen in the comments yet (at least until my post goes live!):

    Binder (https://mybinder.org) and Azure Notebooks (https://notebooks.azure.com) are services which let you host notebooks remotely. Binder reads from a GitHub repo and spins up a virtual environment for you. Azure Notebooks lets you run notebooks (including F# notebooks) against free VMs in Azure, or you can use your own VM for more power.

    Azure Notebooks let you fork projects pretty easily, so I’ve used that for pre-cons and other trainings, as well as some private usage on machines which didn’t have Jupyter installed.
