Google Data Analytics Certificate Capstone Project

Go with Blue by afagen is licensed under a Creative Commons CC BY-NC-SA 2.0 license

Introduction

This case study is my capstone project for the Google Data Analytics Certificate. It involves analysis of historical data for a fictional company, Cyclistic, a bike sharing company located in Chicago, to make recommendations for an upcoming marketing campaign. Although the company and scenario are fictitious, the data used for this project are real data collected between August 2020 and July 2021 from a bike share program in Chicago. In this project I am assuming the role of the junior analyst.

Scenario

Cyclistic is a fictional bike sharing company located in Chicago. It operates a fleet of more than 5,800 bicycles which can be accessed from over 600 docking stations across the city. Bikes can be borrowed from one docking station, ridden, then returned to any docking station in the system. Over the years marketing campaigns have been broad and targeted a cross-section of potential users. Data analysis has shown that riders with an annual membership are more profitable than casual riders. The marketing team are interested in creating a campaign to encourage casual riders to convert to members.

The marketing analyst team would like to know how annual members and casual riders differ, why casual riders would buy a membership, and how Cyclistic can use digital media to influence casual riders to become members. The team is interested in analyzing the Cyclistic historical bike trip data to identify trends in the usage of bikes by casual and member riders.

I. ASK

Business Objective

To increase profitability by converting casual riders to annual members via a targeted marketing campaign.

Business Task for Junior Analyst

The junior analyst has been tasked with answering this question: How do annual members and casual riders use Cyclistic bikes differently?

Stakeholders

The stakeholders in this project include:

Lily Moreno, Director of Marketing at Cyclistic, who is responsible for the marketing campaigns at Cyclistic.

The Cyclistic marketing analytics team. This team is responsible for collecting, analyzing and reporting data to be used in marketing campaigns. I am the junior analyst on this team.

The Cyclistic executive team. This team makes the final decision on the recommended marketing plan. They are notoriously detail-oriented.

II. PREPARE

Where is Data Located?

The data used for this analysis were obtained from Motivate, Inc., a company employed by the City of Chicago to collect data on bike share usage.

How is the Data Organized?

The data is organized in monthly csv files. The most recent twelve months of data (August 2020 – July 2021) were used for this project. The files consist of 13 columns containing information such as ride ID, rider type, ride start and end times, start and end stations, and geographic coordinates.

Credibility of the Data

The data is collected directly by Motivate, Inc., the company that runs the Cyclistic Bike Share program for the City of Chicago. The data is comprehensive in that it consists of data for all the rides taken on the system and is not just a sample of the data. The data is current. It is released monthly and, as of August 2021, was current to July 2021. The City of Chicago makes the data available to the public.

Licensing, privacy, security, and accessibility

This data is anonymized as it has been stripped of all identifying information. This ensures privacy, but it limits the extent of the analysis possible. There is not enough data to determine if casual riders are repeat riders or if casual riders are residents of the Chicago area. The data is released under this license.

Ability of Data to be used to answer Business Question

One of the fields in the data records the type of rider: casual riders pay for individual or daily rides and member riders purchase an annual subscription. This information is crucial for determining how the two groups use the bike share program differently.

Problems with the data

There are some problems with the data. Most of the problems (duplicate records, missing fields, etc.) can be dealt with by data cleaning, but a couple of issues require further explanation.

Rideable-type Field

The rideable_type field contains one of three values – Electric bike, Classic bike or Docked bike. Electric and Classic bikes seem self-explanatory, but it is unclear exactly what a Docked bike is. From a review of the data, it seems that electric bikes were available to both types of users for the entire 12-month period; classic bikes were available to both groups of users but only from December 2, 2020 to July 31, 2021; and docked bikes were available to members from August 1, 2020 to January 13, 2021 and to casual users for the entire 12 months. For the purpose of this analysis, rideable types will not be used to segment the data or draw any conclusions about bike preferences, as data collection for this variable is not consistent across the period being analyzed.

Latitude and Longitude

There is also a challenge with the latitude and longitude coordinates for the stations. Each station is represented by a range of lat/long coordinates, because the start/end latitude and longitude seem to record the location of a particular bike rather than of the station itself. Creating a list of popular stations is not difficult, but mapping the stations is more problematic. This was remedied by averaging the lat and long values for each station before mapping, so that each station's ride count is attached to a single set of lat/long coordinates.
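A minimal sketch of the averaging step, written in base R with aggregate() on a few hypothetical rows rather than the project's actual code. It assumes columns named start_station_name, start_lat and start_lng (the coordinate names are assumed by analogy with the end_lat/end_lng fields used in the cleaning code below); the station names and coordinates are invented.

```r
# Hypothetical miniature of the trip table: bikes docked at the same station
# report slightly different coordinates.
trips <- data.frame(
  start_station_name = c("Station A", "Station A", "Station B", "Station A"),
  start_lat = c(41.900, 41.902, 41.880, 41.904),
  start_lng = c(-87.630, -87.632, -87.620, -87.634)
)

# Average the recorded coordinates so each station gets one lat/long pair.
station_coords <- aggregate(
  cbind(start_lat, start_lng) ~ start_station_name,
  data = trips,
  FUN = mean
)

# Count rides per station (each row is one ride).
ride_counts <- aggregate(
  ride_count ~ start_station_name,
  data = transform(trips, ride_count = 1),
  FUN = sum
)

# One row per station: a single point to map, carrying that station's ride count.
station_map <- merge(station_coords, ride_counts, by = "start_station_name")
print(station_map)
```

The same result could be produced with dplyr's group_by() and summarise(); base R is used here only to keep the sketch free of package dependencies.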

III. PROCESS & CLEAN

What tools are you choosing and why?

For this project I chose to use RStudio Desktop to analyze and clean the data and Tableau to create the visualizations. The data set was too large to be processed in spreadsheets or RStudio Cloud.

Review of Data

Data was reviewed to get an overall understanding of the content of the fields and the data formats, and to ensure the data's integrity. The review involved checking column names across the 12 original files and checking for missing values, trailing white spaces, duplicate records, and other data anomalies.
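The per-file checks described above could look roughly like this in base R. The two small data frames are hypothetical stand-ins for monthly files (in practice each would come from read.csv() on one monthly csv file), and the rider-type column is assumed to be named member_casual; start_station_id and ride_id are the names used in the cleaning code.

```r
# Two hypothetical stand-ins for monthly files (the real ones have 13 columns).
aug_2020 <- data.frame(
  ride_id = c("A1", "A2", "A2"),
  member_casual = c("member", "casual", "casual"),
  start_station_id = c("13022", NA, "13300")
)
sep_2020 <- data.frame(
  ride_id = c("B1", "B2"),
  member_casual = c("casual", "member"),
  start_station_id = c("13042", "13008")
)
monthly_files <- list(aug_2020, sep_2020)

# Check 1: every file has the same column names, in the same order.
reference_names <- names(monthly_files[[1]])
columns_match <- all(vapply(monthly_files,
                            function(df) identical(names(df), reference_names),
                            logical(1)))

# Check 2: number of missing values in each column of each file.
missing_by_file <- lapply(monthly_files, function(df) colSums(is.na(df)))

# Check 3: number of ride IDs repeated within each file.
duplicates_by_file <- vapply(monthly_files,
                             function(df) sum(duplicated(df$ride_id)),
                             integer(1))

print(columns_match)
print(missing_by_file)
print(duplicates_by_file)
```

Duplicates that span two different months would only show up once the files are combined, which is why the cleaning step below runs distinct() on the combined data frame.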

The review of the data revealed several problems:

  • Duplicate record ID numbers
  • Records with missing start or end stations
  • Records with very short or very long ride durations
  • Records for trips starting or ending at an administrative station (repair or testing station)

Once the initial review was completed, all twelve files were loaded into one data frame. The resulting combined data frame consisted of 4,731,081 rows with 13 columns of character and numeric data. This matched the total number of records in the twelve monthly files.
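A sketch of the combine-and-verify step, assuming the monthly files have already been read into a list of data frames. The folder name in the comment and the tiny data frames are hypothetical; rbind() is the base R equivalent of dplyr's bind_rows() here and requires the column names to match.

```r
# In practice the list might be built with something like:
#   monthly_files <- lapply(list.files("trip_data", pattern = "\\.csv$",
#                                      full.names = TRUE), read.csv)
# ("trip_data" is a hypothetical folder name.)
monthly_files <- list(
  data.frame(ride_id = c("A1", "A2"), member_casual = c("member", "casual")),
  data.frame(ride_id = c("B1"), member_casual = c("casual")),
  data.frame(ride_id = c("C1", "C2", "C3"),
             member_casual = c("member", "member", "casual"))
)

# Stack the monthly data frames into one.
alltrips <- do.call(rbind, monthly_files)

# Confirm that the combined row count equals the sum of the monthly row counts.
expected_rows <- sum(vapply(monthly_files, nrow, integer(1)))
stopifnot(nrow(alltrips) == expected_rows)
print(dim(alltrips))
```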

Extraction of Data from Existing Fields

To allow for more granular analysis and more insights, several new columns were created and populated with values extracted from the started_at date-time column. These new columns were day, month, year, time and day of the week.

Another column was created to contain the trip duration (length of each trip), calculated as the difference between the end time and the start time of the ride. A second version of this column was then created to contain the trip duration in minutes.
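The derived columns can be sketched in base R as follows. The two timestamps are hypothetical, the new column names (day, month, year, time, day_of_week, ride_length_min) are illustrative guesses, and the timestamps are parsed in UTC purely for simplicity, since the original script's time-zone handling is not shown. ride_length is kept in seconds because the cleaning filter below compares it with 60 and 86,400.

```r
# Hypothetical sample rows in a "YYYY-MM-DD HH:MM:SS" timestamp format.
trips <- data.frame(
  started_at = c("2021-07-03 17:05:00", "2020-12-14 08:10:30"),
  ended_at   = c("2021-07-03 17:40:00", "2020-12-14 08:22:00")
)

# Parse both columns in the same (assumed) time zone.
trips$started_at <- as.POSIXct(trips$started_at, tz = "UTC")
trips$ended_at   <- as.POSIXct(trips$ended_at,   tz = "UTC")

# New columns extracted from started_at.
trips$day         <- format(trips$started_at, "%d")
trips$month       <- format(trips$started_at, "%m")
trips$year        <- format(trips$started_at, "%Y")
trips$time        <- format(trips$started_at, "%H:%M:%S")
trips$day_of_week <- weekdays(trips$started_at)  # names depend on the locale

# Trip duration in seconds, then a minutes version.
trips$ride_length <- as.numeric(difftime(trips$ended_at, trips$started_at,
                                         units = "secs"))
trips$ride_length_min <- trips$ride_length / 60

print(trips)
```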

Data Cleaning

Duplicate records (based on the ride_id field) were removed. (209 records removed)

library(dplyr)  # distinct() is provided by dplyr
alltrips_v2 <- distinct(alltrips, ride_id, .keep_all = TRUE)

Records for trips less than 60 seconds (false starts) or longer than 24 hours were removed. Bikes out longer than 24 hours are considered stolen and the rider is charged for a replacement. (82,282 records removed)

alltrips_v2 <- alltrips_v2[!(alltrips_v2$ride_length<60 | alltrips_v2$ride_length>86400),]

Records with missing values in the start_station_id, end_station_id, ride_id, rideable_type, started_at, ended_at, end_lat or end_lng fields were removed. (544,204 records removed)

alltrips_v3 <- alltrips_v2[!(is.na(alltrips_v2$start_station_id) | is.na(alltrips_v2$end_station_id) | is.na(alltrips_v2$ride_id) | is.na(alltrips_v2$rideable_type) | is.na(alltrips_v2$started_at) | is.na(alltrips_v2$ended_at) | is.na(alltrips_v2$end_lat) | is.na(alltrips_v2$end_lng)),]

Records for trips that started or ended at DIVVY CASSETTE REPAIR MOBILE STATION, HUBBARD ST BIKE CHECKING (LBS-WH-TEST) or WATSON TESTING DIVVY were removed, as these are administrative stations. (143 records removed)

admin_stations <- c("DIVVY CASSETTE REPAIR MOBILE STATION", "HUBBARD ST BIKE CHECKING (LBS-WH-TEST)", "WATSON TESTING DIVVY")
alltrips_v3 <- alltrips_v3[!(alltrips_v3$start_station_name %in% admin_stations | alltrips_v3$end_station_name %in% admin_stations),]

Station names were checked for leading and trailing spaces during the analysis phase and there did not appear to be any.
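One way to run that check, sketched in base R on a hypothetical vector of station names: a name has leading or trailing white space exactly when trimws() changes it. Missing names are treated as unpadded so they do not turn the result into NA.

```r
# Hypothetical station names; the second and fourth are padded.
station_names <- c("Streeter Dr & Grand Ave", " Clark St & Elm St",
                   "Lake Shore Dr & Monroe St", "Wells St & Concord Ln  ", NA)

# TRUE where trimming would change the value (NA names count as FALSE).
padded <- !is.na(station_names) & station_names != trimws(station_names)
print(station_names[padded])

# If padded names were found, trimws() would strip the extra spaces.
station_names_clean <- trimws(station_names)
```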

Initially the data set contained 4,731,081 records. Once the data was cleaned, 4,101,243 records remained, so 13.3% of the records were removed.

IV. ANALYZE

Once the data was cleaned, analysis of the data was undertaken in RStudio to determine the following:

  • Mean, median, maximum and minimum ride duration (by rider type)
  • Average ride length by day and by rider type
  • Count of trips by rider type
  • Count of trips by bicycle type
  • Count of the number of start stations
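The summaries listed above can be sketched with base R's aggregate(), table() and unique() on a hypothetical five-ride sample. The column names member_casual, day_of_week and the rideable_type values (classic_bike, docked_bike, electric_bike) are assumptions for illustration; ride_length is in seconds, as in the cleaning code.

```r
# Hypothetical miniature of the cleaned data (ride_length in seconds).
trips <- data.frame(
  member_casual = c("casual", "casual", "member", "member", "member"),
  day_of_week   = c("Saturday", "Sunday", "Monday", "Monday", "Saturday"),
  rideable_type = c("docked_bike", "electric_bike", "classic_bike",
                    "electric_bike", "classic_bike"),
  start_station_name = c("Streeter Dr & Grand Ave", "Streeter Dr & Grand Ave",
                         "Clark St & Elm St", "Wells St & Concord Ln",
                         "Clark St & Elm St"),
  ride_length   = c(2400, 1500, 600, 900, 720)
)

# Mean, median, maximum and minimum ride duration by rider type
# (stored as a matrix column with one column per statistic).
duration_stats <- aggregate(
  ride_length ~ member_casual, data = trips,
  FUN = function(x) c(mean = mean(x), median = median(x),
                      max = max(x), min = min(x))
)

# Average ride length by rider type and day of the week.
avg_by_day <- aggregate(ride_length ~ member_casual + day_of_week,
                        data = trips, FUN = mean)

# Trip counts by rider type and by bicycle type.
rides_by_type <- table(trips$member_casual)
rides_by_bike <- table(trips$rideable_type)

# Number of distinct start stations.
n_start_stations <- length(unique(trips$start_station_name))

print(duration_stats)
print(avg_by_day)
print(rides_by_type)
print(rides_by_bike)
print(n_start_stations)
```

Dividing ride_length by 60 before aggregating would give the same statistics in minutes, the unit used in the summary below.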

The cleaned data set was used to create a csv file that was exported from RStudio and imported into Tableau for further analysis and creation of visualizations.

Tableau was used to further analyze the data and determine:

  • Ride duration
  • Times of Day for rides
  • Days of the week for rides
  • Months of the year of the rides
  • Top 20 start stations by user type
  • Top 20 end stations by user type

Summary of analysis

From the analysis we can see that there are several key differences between casual and member riders.

Number and Length of Rides

Member riders take more trips than casual riders but casual riders take longer rides than member riders (Figure 1). Casual riders average 31.85 minutes per ride as opposed to 14.43 minutes for member riders (Figure 2). The total amount of time ridden by casual riders is greater than by member riders (Figure 3).

Pie graph showing that member riders take about 56% of all rides.  Casual riders take 44%.
Figure 1.
Graph showing casual riders' trips take an average of 31.85 minutes and member riders' trips take an average of 14.43 minutes.
Figure 2.

Figure 3.

Trips by Time/Day/Month

Graph showing start hours for casual and member riders.  Member riders have three peak times: the highest at 5 pm, another at lunch time and another at 8 am.  The number of casual rides increases during the day and peaks at 5 pm.
Figure 4

The number of trips made by casual riders increases over the day, peaking at 5 pm. Member trips also peak at 5 pm, but there are two smaller peaks earlier in the day, at 8 am and at lunch time, which correspond with the working day.

Figure 5.

Figure 5 shows that weekend days are popular with casual riders whereas member rider trips are spread out more evenly throughout the week.

Figure 6.

The winter months (December, January and February) see very few rides. The summer months are popular with both types of riders. July is the busiest month for casual riders.

Station Usage

Member and casual riders also differ in the stations that are popular for starting and ending their rides.

Top starting and destination stations for casual riders cluster around tourist destinations within about 1 km of the lakefront, from Lincoln Park in the north to the Field Museum in the south. Member riders' top stations are more spread out and reflect office locations.

Figure 7
Figure 8

V. SHARE

Detailed documentation of R code is available on Kaggle and further, interactive visualizations are available on Tableau Public.

VI. ACT

Top Three Recommendations

Based on an analysis of the data, the following recommendations can be made to the Cyclistic stakeholders:

  1. The marketing campaign should be targeted at the popular start and end stations for casual riders.
  2. To reach the most riders, marketing should be targeted at the busiest casual rider days (Friday, Saturday and Sunday), busiest hours (afternoon) and most popular months (June, July and August).
  3. Further data should be released or obtained to determine which casual riders are local to the Chicago area, as these riders are more likely than visitors from out of the city to consider an annual membership. This data could also help determine what changes might make the existing membership subscription model more appealing to casual riders: casual riders have an average trip length of 32 minutes (longer on weekends), while the annual membership caps rides at 45 minutes.

Annotating Alice

This past Monday and Tuesday I attended eCampusOntario’s Technology + Education Seminar + Showcase (TESS) atop the Globe and Mail Centre in Toronto.  As part of this conference I presented an Ignite Talk on a recent Open Pedagogy project I have been involved in:  Hypothesis and Alice B. Toklas:  Student Annotation of a Public Domain Work.

This presentation described an annotation assignment that was part of an Introduction to Non-fiction course that was taught at Ryerson University in the spring of 2019.   The text of The Autobiography of Alice B. Toklas by Gertrude Stein (public domain in Canada and Australia) was pulled into Ryerson Library’s Pressbooks authoring platform so that it was freely available to students on the web.  In the EDU version of Pressbooks, Hypothesis can be turned on within a book.  This makes it easy for students to create annotations as all they need is a Hypothesis account.

This text lends itself well to annotations that provide additional information about the contents of the text as many names, places and works of art are discussed.  Students paired up and chose a chapter to annotate.  They then presented their annotations to the others in the class.  Feedback from the students was positive; this was a new experience for them as they had not done this type of assignment before and they enjoyed the sleuthing involved.  The instructor found that using the annotation assignment was a good way of working through the text and felt confident that the students had actually read the text.  She also thought that it encouraged peer-to-peer learning and helped them uncover themes between the chapters.

Several lessons were learned as a result of this project.  A private Hypothesis group was set up for this class, but students ended up annotating in the public layer.  In future this could be remedied by reviewing the first few annotations to ensure that they are being made in the correct group.  The text of the book that was pulled into Pressbooks contained a considerable number of typos introduced during the OCR process.  These were corrected by Library staff after the book had been “published” on Ryerson’s Pressbooks site.  Error correction could be built into the assignment by encouraging students to report errors (using tags in Hypothesis).  The last lesson learned had more to do with pedagogy.  The instructor indicated that in future she would do a bit more scaffolding to model how and what to annotate.  Some annotations were deemed to be excessively long, and some not relevant to the content of the course.

The Open Syllabus Project shows that half of the top 20 assigned texts are public domain books.  Because these texts are no longer copyrighted, they are readily available to pull into Pressbooks and are ideal candidates for assignments such as this and for other open pedagogy projects.

Anatomy of a Creative Commons License

Anatomy of a CC License

The assignment this week was to create an infographic, presentation, video, etc. to explain the anatomy of a Creative Commons license.  I decided to try using Canva for this assignment to produce a cross between a poster and an infographic.  The main challenge for this assignment was to succinctly summarize the information in such a way that it was comprehensive but not overwhelming.

Canva comes with various graphics that can be used in your creation.  You can also upload your own images.  For the cell image, I used an image from the Noun Project with appropriate attribution.  For the structure of the poster, I used one of the Canva infographic templates, but by the time I had made my edits, it looked very little like the original, which was about bicycle safety.

The Landscape of Copyright Law

Landscape of Copyright Law by Sally Wilson licensed with a CC-BY 4.0 International License

This week’s assignment for the Creative Commons Certificate course was to create something to explain Copyright Law.  For the previous assignment I got bogged down by having to hack various technologies to get them to do what I wanted, so this week I took a more analog tack and used acrylic paints and deli paper (and also Google Docs, Word and Illustrator).  The idea was to explain copyright law using the landscape as a metaphor.  As copyright law varies considerably from country to country, this explanation is necessarily somewhat abstract.   This is the result. (Click on the image and zoom in to read the text).

The Story of the Commons

This fall I am taking a Creative Commons Certificate Course for Librarians.  The first assignment involves telling the story of the Creative Commons using one of a selection of tools.

I wanted to create a visually appealing presentation using a tool like Haiku Deck.  I have used it in the past and like the layout templates and easy to use Creative Commons images.  I started creating my video using this tool, but soon discovered that an upgrade to the Pro version was necessary to embed audio into the presentation.

Plan B involved using Google Slides for the presentation part and then embedding invisible videos to take care of the audio, with the whole lot being exported to the web.  After upgrading Audacity, installing something called ffmpeg, trying to create videos from audio files and then getting a Google video playback error, Plan C was invoked.

Plan C – Install Screencastify, make short videos and embed them in Google Slides. This worked somewhat, but the resulting embedded slideshow below does not play the audio automatically. If you would like to view the slide show, please advance the slides manually and click on the play arrow to hear the audio.

1K Radius – Torontopia

Spacing Magazine has opened a pop-up shop at the northwest corner of Queen and Bay Streets in Toronto.  It is open from Wednesday to Saturday until the Labour Day weekend.  The store, which operates out of a shipping container, has a selection of Toronto-themed goods (a subset of what is sold at the Spacing store at 401 Richmond St.). The t-shirts, postcards, socks, buttons, bags and books will appeal to both tourists and Toronto residents alike.

This is the first in a planned series of brief blog posts about places of interest within a 1 km radius of Ryerson University that can easily be reached on a lunch time stroll.

Notes from the #ccsummit

This past weekend (April 28 to April 30, 2017), Creative Commons held their global summit in Toronto.  There was a packed program of sessions, workshops, keynotes, etc. over the three days. Here are a few of my highlights from the Summit.

3D Surprise

The first day of the Summit got off to a spectacular start with Ryan Merkley’s (CEO of Creative Commons) announcement of the 3D printing of a column from the ancient city of Palmyra in Syria.  This tetrapylon is a copy of one that was destroyed in 2016.  More information about this project  can be found on the #NewPalmyra website.

This story was featured on the CBC National on Wednesday May 3 (at the 36:45 minute mark)


Can you Plant That Seed?

Tom Michaels (University of Minnesota, Dept of Horticultural Science) asked this question when talking about preparing a bright red pepper for eating.  When you have trimmed, cored and sliced the pepper, can you go ahead and plant the seed?  Fifty years ago the answer to this question would have been “Yes”.  Today the answer is “Maybe” as many large plant breeders have placed limitations on what you can do with seeds.  Thirty years from now we do not want the answer to be “No”, so we need an alternative to patent-protected seeds sold by large agricultural companies.  Enter the Open Source Seed Initiative (OSSI) which aims to create open source varieties of crop seeds.

Since seeds do not lend themselves well to software licenses, other options were considered to ensure that at least some seeds are always available for sharing and are not locked by intellectual property rights.  The following simple pledge was developed to provide an alternative to the legal restrictions on many seeds:

“You have the freedom to use these OSSI-Pledged seeds in any way you choose. In return, you pledge not to restrict others’ use of these seeds or their derivatives by patents or other means, and to include this pledge with any transfer of these seeds or their derivatives.”

If you are looking for seeds that preserve the farmer’s and gardener’s rights to save, replant, share, breed, and sell seed, look for seeds marked with the logo on the left.


Open Syllabus Project

I caught the latter part of this presentation about the Open Syllabus Project software that mines the syllabi from millions of courses, primarily in the United States but also with representation from Canada, the UK, Australia and New Zealand, to surface the texts being taught in these courses.

Top ten titles in Canadian Syllabi (May 2017)

From the OER perspective, the data surfaced via the Open Syllabus Explorer, software that mines the data collected by the project, gives a clear picture of which public domain works are heavily used across the college and university curriculum and would be good candidates for including in open text projects.

The list to the left was generated by the Explorer and shows the top ten titles appearing in Canadian syllabi as of early May, 2017.

Future versions of the project will include a syllabi map, bar charts, ability to see changes over time, metadata improvements and more.

Creative Commons Certificates

“CC Certificates Logo” by Creative Commons licensed under CC-BY

This session introduced a new and exciting certification initiative from Creative Commons.  This is not going to be a read some text, complete some disposable assignments and answer a few questions course, but rather a big questions, applied practice, reflection, creating and sharing learning experience.  The bank of assignments being developed for the program borrows many positive characteristics from the popular DS106 (Digital Storytelling) course which challenges participants to take their learning to a higher level.

Currently there are four certificates planned:  a Core Certificate and three sector-based (Library, Education and Government) Certificates.

The Creative Commons Certificates aren’t quite ready for prime time yet, but you can check them out, start thinking about how you might complete the assignments, or provide feedback on the project.

Sharing Photos

CC0 licensed image by Tim Wright

CC0 licensed image by Scott Webb

Unsplash

The Capturing the Community of Photographers session highlighted two recent photo sharing projects.   Unsplash, based in Montreal, has created a collection of high-resolution public domain photographs that you can use as you wish.  All photos published on Unsplash are licensed under Creative Commons Zero which means you can copy, modify, distribute and use the photos for free, including commercial purposes, without asking permission from or providing attribution to the photographer or Unsplash.  Attribution would be a great way of thanking the photographer and Unsplash for making the images freely available.

Capture.Canada

This photography app is a Government of Canada project which received funding as part of the Canada 150 celebrations. The app is designed to let Canadians take pictures of the country and upload them to a living archive.  Coming soon to a phone near you.


Some Notes from OpenCon 2016 Toronto

This past Saturday a group of enthusiastic Open Access advocates met to attend OpenCon 2016 Toronto held at TWG (The Working Group) on Adelaide St. in Toronto.  Lorraine Chuen, Co-Founder of OOO Canada Research Network hosted the event.

Open Access and Social Justice: Aligning Open Access with the Mission of the Public University

The first speaker of the day was Leslie Chan, Senior Lecturer in the Department of Arts, Culture and Media and the Centre for Critical Development Studies at the University of Toronto, Scarborough.  His talk discussed shifting the narrative of Open Access from showcasing an institution’s research output to using it for public good.  Chan pointed out that changing this narrative is difficult in an era of metrics, marketing and recruitment as there is a disconnect between the stated mission of public good and the tendency to favour rankings as a yardstick.

The second part of his talk focused on the tyranny of journal articles, their format (a relatively recent development perpetuated by the major journal publishers to facilitate sales), the insidious nature of impact factors, and the focus on the discovery aspect of research.

Chan also discussed new models of access that can help address the inequity not just of access but also of knowledge dissemination. This interesting map used in the presentation graphically illustrates the imbalance in contributions to scientific research:

Contributions to Science Research

Map that shows size of countries based on the amount of science research

Map created and copyrighted by SASI Group (University of Sheffield) and Mark Newman (University of Michigan) and licensed under a Creative Commons BY-NC-ND license and is available at: http://www.worldmapper.org/display.php?selected=205. Based on data from the World Bank’s 2005 World Development Indicators.

Beyond Free: Harnessing the Resonant Value in Open and Collaborative Practices for Public Good

David Porter, the new CEO of eCampus Ontario, gave a talk that highlighted many existing Open Educational Resource (OER) initiatives and outlined plans for developing an OER textbook  platform in Ontario (by reviewing and importing the BCcampus Open Textbooks) and engaging faculty here.

Beyond the economic benefits to students of free textbooks, Porter outlined five other benefits:

  • Teachers have full legal control to customize and contextualize learning resources for their students
  • Access to customized resources improves learning (by providing choices for students)
  • Opportunities for authentic learning activities (student contributions to learning)
  • Collegial collaboration
  • Demonstration of the service mission of the institution

For those interested in attending more Open events, Porter mentioned the following:

Porter’s slides and notes are available on SlideShare:

Lightning Sessions

After lunch there were several lightning sessions highlighting various open initiatives:

Dr. Rachel Harding talked about making her research into Huntington’s disease open by blogging about her findings on Lab Scribbles, where she reports on her research in real time, and by posting and licensing her data under Creative Commons on Zenodo. She has found that this practice has led to other researchers sharing their work with her, reducing the amount of time required for her research.

Wes Kerfoot, a recent philosophy graduate from McMaster, talked about and demoed his software, Textbook Commons,  that he built to help students find public domain copies of course readings and texts.

Karen Young, recently graduated from the University of Toronto, spoke about Student Participation in the Open Access Movement and its Applications for Mental Health, and how in this sphere Open Access benefits not only the researcher but also those facing mental health challenges while pursuing higher education.

University of Toronto Culture and New Media Professor Alessandro Delfanti talked about academic social media sites, Academia.edu and ResearchGate, that faculty members use to disseminate their research.  He mentioned both the positive and negative aspects of these sites.  While using these sites to post research does make it more accessible, there are issues such as the opaque algorithms that generate scores, the proprietary nature of the software, and the danger of publishers purchasing the sites and their content.

Open Access in the Creative Disciplines

The final formal conference session was given by Chris Landry, Scholarly Communications Librarian at OCADU. His talk, Open Access in the Creative Disciplines, focused on some of the challenges of Open Access in the arts.  He also talked about the difficulties of making content available through institutional repositories as most institutions only encourage contribution to these repositories and it is a rare few such as Harvard that require faculty members to opt out of participation.

The day wrapped up with a one-hour unconference focusing on the following topics:

Building an Open Education Movement for Student Leaders in Ontario
Facilitator: Chris Fernlund (eCampus Ontario)

Open Access and Social Justice
Facilitator:  Dr. Leslie Chan
Open Access and Social Justice Session Notes

Reproducible Research & Open Science Tools
Facilitators:  Mike Galang & Rachel Harding

History and New Media Adventure

This semester several of us in the Ryerson Library (librarians, developers, co-ordinator of our Digital Media Experience Lab) are contributing to a course, HIS500, being taught to upper-year students at Ryerson by Art Blake.  Many of these students are history majors, but there are also students who come from other disciplines across campus.  Library staff will be assisting with this course by drawing on our previous experiences with various digital media tools and, in particular, with a tool called RULA Maps.  RULA Maps was developed by the Library in conjunction with the Ryerson Department of Architectural Science for use in one of its courses.   The plan is to build upon the RULA Maps app for a class project tentatively called Feeling Ryerson, Feeling Toronto.  The app was initially designed with buildings as the focal points, but with this project we hope to incorporate emotions and narratives into the app.

As part of this class we are also exploring using Slack for communication.  Slack is a team collaboration tool that allows for communication between individuals and groups and is available on desktops and laptops and via downloadable apps for iOS and Android.  Use of this app for communication was slow to start, but now that we have started working on the class project, Slack is getting much more use.  Several groups have been set up in Slack to facilitate communication within and between the various teams.

Also of interest in this course is the use of an Open Access textbook, Writing History in the Digital Age, that is distributed under a Creative Commons Attribution-NonCommercial 3.0 United States license.  Students are commenting on the readings from this text in the blogs that they have set up for this class.  These blogs are aggregated on the History and New Media Fall 2015 blog.


Ontario Snowy Owl Sightings – Winter 2014/2015

Snowy Owl, Toronto, December 24, 2014


eBird, a project of the Cornell Lab of Ornithology and the National Audubon Society, collects data about bird sightings from observers around the world.  In addition to providing many ways of viewing this data online, it also allows downloading of data for use in non-commercial projects.  To access the data you will need to set up a free account (or use your Project FeederWatch or Great Backyard Bird Count account) and request access to the data.

I requested data about bird sightings in Ontario and a smaller file of sightings of Snowy Owls in Ontario.  The data comes with terms of use, a recommended citation format and metadata information.


This map was created using the eBird data and CartoDB.  I uploaded the dataset, created a query to extract a subset of the data relating to the winter of 2014/2015 and created a torque map using the CartoDB map wizard.  The map shows a time-lapse of reported sightings of Snowy Owls in Ontario from October 1, 2014 – April 29, 2015.

Data retrieved from:  eBird Basic Dataset. Version: EBD_relMay-2015. Cornell Lab of Ornithology, Ithaca, New York. May 2015.