How to Download Sports Data

By: Bryan Povlinski

So you really want to dominate your fantasy league this year. You’ve been looking through some of the rankings provided by some of the major fantasy sports sites online, but they don’t seem to fit to your league’s settings. You’ve been brushing up on your Excel skills so if only you could get your hands on some data you could do your own analysis and rank players based on your own set of criteria. Or maybe you’re trying to decide who to select in your weekly pick ’em pool and you want to base your selection off some objective data. The Import Data function in Excel doesn’t work well with some of the sites you’ve been trying to pull data from so you’re stuck. Sound familiar? Fortunately, you’re in luck. There are a few different ways that you can easily get data out of any website and into your own Excel spreadsheet. Here is an overview of the methods I use starting with the most simple.

1. Microsoft Excel Import Data

The “Import Data from Web” functionality in Excel is probably the easiest way to get sports data into a spreadsheet.

excel_import_web_data

Simply click on the “Data” tab in Excel and then select “From Web” as your data connection (the screenshot is from Excel 2013, but the same functionality should be there going back to Excel 2007 on Windows. Excel for Mac does not seem to have direct connection to the web, but you could download an HTML file and then connect directly to that static file). Then you’ll input the URL where your data currently lives on the web and you should see a small yellow arrow next to the table on the webpage. Click that arrow and then “Import” and you should see all the sports data dumped into your spreadsheet. Once you have that connection established in your Excel sheet every time you come back you can choose to “Refresh All” so that any web data source that your worksheet connects to will be refreshed to the latest data.

Sometimes the way the web page is formatted prevents Excel from importing the data with this method. If the web page is using a special plug-in or tool to display their data (that’s not simply text in a table format on a page) then this method will not work. Also, if you the web page you’re pulling from is not always the same URL (due to query variables, or day-to-day or week-to-week differences) there’s not an easy way to dynamically change the URL. If this is a problem for the particular task you’re trying to accomplish take a look at the next method.

2. Google Docs – ImportHTML

From a pure functionality standpoint Excel is head and shoulders above Google Docs/Google Spreadsheets. There are many formatting advantages and a broader list of functions to choose from in Excel, but when it comes to interacting with data on the web – Google Spreadsheets blows Excel away in my opinion. The most important function for this purpose (that does not exist in Excel) is =importHTML. An example of an importHTML function would be:

=importhtml(“http://espn.go.com/nfl/statistics/player/_/stat/passing/sort/quarterbackRating/year/2012/seasontype/2″,”table”,0)

There are a few things to think about to make sure you get the syntax right:

The data returned from this function will automatically refresh every time you open the spreadsheet. This can be really useful when you have a web page that constantly updates stats or game results throughout the season and you want to build a model that is designed to make predictions continually.

This function can become extremely powerful when you want to access a data source from which the URL changes (either over time or depending on the variables you choose). You can build a function in Excel that uses =concatenate that breaks out variables in different cells and then dynamically builds the final URL based on whatever you put in the input cells. Now you can reference your concatenated cell that holds your final URL in the importHTML function. This is useful on sites like www.basketball-reference.com and any others on the Sports Reference network. It’s also valuable for fantasy football rankings pages that have something like “week1” in the URL that would change from week to week.

There are still some limitations to this function that prevent it from working with certain web pages. Sites that are password protected or secured in some way will not return any data. Also, sites that use an embedded tool or plug-in to display data (instead of displaying it as a table or list) will not work with this formula. Unfortunately, when you try to pull data from a page that has an issue it typically causes errors in the spreadsheet even after you delete the formula. If you have a spreadsheet that contains a lot of formulas or important data it’s probably best to test out your importHTML function on a separate blank sheet to make sure it works and doesn’t cause problems with your valuable sheet.

3. Chrome Scraper Extension + ImportXML

Finally, the last tool that might be the easiest to use is the Chrome Web Scraper. This is a plug-in that works with Google Chrome that allows you to point and click on any web page and export the data into Google Docs. You’ll need to install it as a Chrome extension and then it’s ready to use. To use the scraper, scroll across the first few rows of data from the web page you’re interested in and then right click and choose “Scrape Similar.” You should see a screen that looks like this:

excel_import_web_data

The example above is from the same ESPN page from the Excel example and as you see there’s a “Export to Google Docs” button in the bottom right. Simply click that button and you’ll have all the data that you see within the Scraper interface in your spreadsheet. If you don’t like how the results look try scrolling over and selecting the data in a different way (then right clicking and selecting “Scrape Similar”) to get it into the correct format.

This tool is used more useful for a one-time analysis than it is to continually pull data. However, the other nice thing that this tool does is give you the XPath reference for the data on the page. Xpath is a query language for XML data and it’s not something I’m terrible comfortable with. There are many resources out there that can offer a much better explanation of XPath than I can. For the purposes of pulling sports data into a spreadsheet it can be useful if none of the other options work with your page.

You can take the XPath reference that the Scraper shows you and plug that string of text into the =importXML function within Google Docs. There are 2 arguments for the importXML function. First is the URL (enclosed in quotes) and second is the XPath Reference that the Scraper identifies for you (also in quotes). Now you’ve programmatically pulled in your data and it will be refreshed each time you open up the spreadsheet.

If you have questions on any of these methods to pull sports data into a spreadsheet feel free to add them in the comments. Every situation is different and I know how frustrating it can be when you expect something to work and it doesn’t. Keep in mind though that certain web pages are simple impossible to pull from and you may have to resort to simple copy and pasting or attacking the problem from a more technical perspective.

  • John W

    When I enter the ImportHtml formula into Google Sheets I get a ‘Formula Parse Error’ even though the link itself is still valid on the ESPN sight. I tried changing ‘table’ to ‘list’ and also tried changing the number of the last argument but I still get the error. I’m not sure when you wrote this post but I see that you referred to ‘Google Docs’. Google spreadsheet is now called ‘Google Sheets’.) Maybe the exact syntax no longer valid?

  • Michael M

    I purchased the MLB DraftKings projection tool last season and am considering purchasing it again but am also trying to learn to do some projections on my own. So to my question: using the “importhtml” feature with Google Sheets, is there a way to specify some of the dynamic changes to a table even when the url doesn’t change? Specific example: NumberFire’s daily MLB projections defaults to FanDuel numbers and the url is static at https://www.numberfire.com/mlb/daily-fantasy/daily-baseball-projections/ . Within that webpage you can dynamically select different sites (DraftKings etc), game types, etc., but the url always remains the same. As a result, the “importhtml” feature always reverts back to the default table with the FanDuel columns. Another example is using the DraftKings Chrome extension for baseball-reference.com’s pages but not being able to access those columns using the “importhtml” feature. Any advice would be appreciated. Thanks!

    • spreadsheetsports

      The importhtml feature is going to get you what the web site has made available directly through the HTML in their page. On the numberFire page – the FanDuel numbers are available within the HTML, but to change the platform it looks like it has to run some kind of process (probably through Javascript) on the backend to display those numbers. The importhtml feature won’t have access to them because they’re not explicitly written within HTML on the page. That’s the same issue with the extension. That just applies a calculation to the existing numbers for display in your browser. The Google sheet can’t find it because it’s not there within the HTML. There are a ton of ways to display data on a page and if a site wants to restrict from common scraping tools they can pretty easily. You can almost always find a way to get around it, but it takes some pretty extensive coding and tools beyond Excel or Google Sheets.

      I know that’s not a very positive answer, but I’ve run into that many times as well. There may be other sites where you can get the numbers from that you’re looking for – you just have to find the right source. Hope you decide to give the Projection Tool a try again though.

      • Michael M

        Thanks for the quick reply. I was afraid some advanced coding was required but wanted to make sure I wasn’t missing something simple.

  • Goodzy

    Hi there.. nice article thanks for the info.. though I’m still struggling, hoping you could offer some advice or direction.

    What I am trying to achieve is to take data from Fangraphs.com and import it into my googlesheet. For example I would like the last 14 days wOBA for team hitting. Though done in a way so that I also have that day’s match ups with the wOBA stat falling into the right place cell. That way I can glance at the match up and see who is hitting better lately.

    For example:
    TEAM wOBA
    Chicago Cubs – 344
    Boston Red Sox – 312

    There are other stats I want to use as well, but the above is just a start.

    Trying to automate a ‘glancing’ spread sheet for the days games with some key stats. So that each day I open the spread sheet, the games update along with the stats I want.

    Is that possible ?

    Thanks