Course information

Date and time
Tuesdays and Thursdays, 11:30am-12:45pm
Location
Zoom (Online, synchronous)

Zoom meeting information will be shared directly with students through Sakai.

Course staff information

Instructor
Sayamindu Dasgupta
Teaching assistant
Malvika Pillai
Office hours
By appointment

Overview

In a world that is increasingly driven by software and data, developing fluency with the basics of programming and data analysis is a crucial skill. This course will introduce basic programming and data science tools to give students the skills to use data to answer questions about local and online communities.

In particular, the class will cover the basics of the Python programming language, introduce web APIs (including APIs from Wikipedia and Twitter), and teach basic tools and techniques for data analysis and visualization. As part of the class, participants will learn to write software in Python to collect data from public datasets and web APIs and process that data to produce numbers, tables, and graphical visualizations that answer their questions.

The class will be built around student-designed independent projects. Every student will pick a question or issue they are interested in pursuing and will work with the instructor to build from that question toward an analysis of data that the student has collected using software they have written.

Please note that this course is designed for students with little or no prior programming experience. If you already consider yourself to be knowledgeable about programming, this is probably not the course for you. Furthermore, this introduction to programming is intentionally quick and dirty, and is focused on what you need to get things done. If you want to become a professional programmer, this is also probably not the right class. If you want to learn about programming so that you can more effectively answer questions with data by writing your own software and by managing and communicating more effectively with programmers, you are in the right place.

Objectives

At the end of the course, you will be able to:

Grading

You will be graded based on the following elements:

There is a total of 100 points.

Final grades will be assigned according to the following schedule for undergraduate students:

Grade Points
A 95 to 100
A- 90 to 94
B+ 87 to 89
B 84 to 86
B- 80 to 83
C+ 77 to 79
C 74 to 76
C- 70 to 73
D+ 67 to 69
D 60 to 66
F <60

Final grades will be assigned according to the following schedule for graduate students:

Grade Points
H 95 to 100
P 80 to 94
L 60 to 79
F <60

Assessment

Project idea

Maximum length: 500 words
Due date: October 6, 11:00 AM
Submission: Sakai

In this assignment, you should concisely identify a community or context that you are interested in, a source of data, and a list of at least 3-4 questions you might be interested in answering in your final project. I am hoping that each of you will pick an area or domain that you are intellectually committed to and invested in (e.g., your town, or an online community that you participate in). You will be successful if you describe the scope of the problem and explain why you are interested in using the techniques you are learning in this class to tackle it.

I will give you feedback on these write-ups and will let each of you know if I think you have identified a question that might be too ambitious, too trivial, too broad, too narrow, etc.

Project proposal

Maximum length: 1000 words
Due date: October 22, 11:00 AM
Submission: Sakai

Building on your project idea assignment, you should describe the specific types of data you will collect, the steps you will take to collect the dataset, the limitations and strengths of these data for answering the question you have selected, and the kinds of reports and visualizations you will make. An important step here is going to be framing your analysis. Why is this an important question? Why do you care? What do we need to know (e.g., about the question, about underlying theories, about your business, about the topic, about the community) to understand this analysis? This will all need to be part of your final project.

I will give you feedback on these proposals and suggest changes or modifications that are likely to make them more successful or compelling. I will also work with you to make sure that you have the resources and support necessary to carry out your project successfully.

Final project

Presentation date: Week of November 10
Paper due date: TBD, 12:00 PM

For your final project, I expect you to build on the first two assignments to describe what you have done and what you have found. I’ll expect every student to give both:

I expect that your reports will include text from the first two assignments and reflect comprehensive documentation of your project. Each project should include: (a) a description of the question and community you have identified and the information necessary to frame your question, (b) a description of how you collected your data, and (c) the results.

A successful project will tell a compelling story and will engage with, and improve upon, the course material to teach an audience that includes me and your classmates how to take advantage of programming with data more effectively. The very best papers will give us all a new understanding of some aspect of the course material and change the way I teach some portion of this course in the future.

Paper and Code

Your final project should include detailed information on:

If you want inspiration for how people use data science to communicate these kinds of findings broadly and effectively, take a look at great sources of data journalism such as FiveThirtyEight or The Upshot at the New York Times. Both of these publish a large number of excellent examples of data analysis aimed at broad, non-technical audiences like the ones you’ll be communicating with, and quite a bit of their work is actually done using Python. A simple FiveThirtyEight story will include a clear question, a brief overview of the data sources and methods, a figure or two plus several paragraphs walking through the results, followed by a nice conclusion. I’m asking you to try to produce something roughly like this.

Keep in mind that most stories on FiveThirtyEight are under 1000 words, and I’m giving you up to 2000 words to show me what you’ve learned. As a result, you should do more than FiveThirtyEight does in a single story. You can ask and answer more questions, provide more background, context, and justification, give more details on your methods and data sources, show us more graphs, and discuss the implications of your findings in more depth. You should use the space I’ve given you to show off what you’ve done and what you’ve learned!

As you will submit a Jupyter notebook as your final paper, I will automatically get to see your code. If you work from your own copy of your data, make sure that you also include it with your submission. However, I will not be emphasizing the quality or quantity of your code but rather the degree to which you have been successful at answering the substantive questions you have identified.

Presentation

Your presentation should do everything that your paper does and should give me a very clear idea of what to expect in your final paper. I’m going to give each of you at least a paragraph of feedback after your talk. This will be an opportunity for me to see a preview of your paper and give you a sense of what I think you can improve. It’s to your advantage both to give a compelling talk and to give me a clear sense of your project.

Weekly coding activities

Every Thursday from August 20th onward will be dedicated to a set of coding activities that will involve changing or adding to code related to the topic of the week. These coding activities will not be turned in and will not be graded.

In many cases, you will find yourself continuing to work on these activities beyond class. I will share my solutions to each of the coding activities by the subsequent Monday in a Sakai forum. As you will see over the course of the semester, there are many possible solutions to many programming problems, and my own approaches will often be different from yours. That’s completely fine! Coding is a creative act!

Please do not share answers to activities before midnight on Sunday so that everybody has a chance to work through answers on their own. After midnight on Sunday, you are all welcome to share your solutions and/or to discuss different approaches. We will discuss the coding activities for a short period of time at the beginning of each class.

Participation

The course relies heavily on participation. The material we’re going to be covering is difficult and we’re going to be covering it quickly. It is going to be extremely difficult to make up any missed classes. Attendance will be the most important part of participation and missing more than 1 session is going to make it extremely difficult to excel in the class. Participation will be graded according to these criteria:

Attendance
It is important for you to attend class. Please be seated and ready when class begins. If personal difficulties (serious illness, etc.) make attendance problematic, please consult with me so that we can make an appropriate plan.
Deportment
You should be attentive in class and respectful of your classmates and the instructor. Turn off cell phones and other devices that might disrupt class. Use laptops and other devices to support current course activities only.
Engagement
Engagement includes: participating in class activities; responding to discussion questions or other questions that I might ask during a lecture; actively listening and taking notes. I value all informed opinions and encourage you to share them.

Engagement will be weighted more heavily than attendance and deportment.

Resources & technology

Textbook

We will follow the “Python for Everybody” textbook for this course. A copy of the book will be made available in the Resources section in Sakai. You can also buy printed copies (if you prefer printed books) or a version for your e-book reader by following the links from the book’s website.

Zoom

We will be using Zoom to run the class. Each Zoom session will be recorded and uploaded to Sakai. All UNC students are eligible for a Zoom account through UNC. To install and sign up for Zoom, go to https://software.sites.unc.edu/zoom/.

The Zoom meeting URL and meeting ID are shared in the Resources section of Sakai, in a PDF file named zoom.pdf.

Sakai

Sakai will be used for assignments, forum discussions, and resources. The textbook for this course will be made available in the resources section of Sakai.

Jupyter notebooks

Although we will be using Python, you will not need to download and install Python on your own laptop. We will be using Jupyter notebooks to write programs. To use Jupyter notebooks, you will need a web browser such as Mozilla Firefox or Google Chrome. I will share the link where you can sign in during class.

Calendar

Note: This is a tentative schedule and is subject to change. Any changes will be announced in class and by email.

Apart from weekly coding activities, there will be readings for some of the days. I will announce those in advance (at least a week before) and share the material through Sakai’s resource section.

Starting the week of August 18th, every Thursday will be dedicated to working hands-on on a set of coding activities that will involve changing or adding to code that is related to the topic of the week. These coding activities will not be turned in and will not be graded. In many cases, you will have to continue to work on these activities beyond class. I will share my solutions to each of the coding activities by the subsequent Monday morning in a Sakai forum. Please do not share answers to activities before midnight on Sunday so that everybody has a chance to work through answers on their own. After midnight on Sunday, you are all welcome to share your solutions and/or to discuss different approaches. We will discuss the coding activities briefly at the beginning of each Tuesday’s class.

Date

Topic

Tuesday, August 11

Introduction and logistics

Thursday, August 13

Introduction to programming

Tuesday, August 18

Introduction to data analysis and Jupyter

Thursday, August 20

Getting started with Python and Jupyter (part 1)

Tuesday, August 25

Class cancelled

Thursday, August 27

Tuesday, September 01

Getting started with Python and Jupyter (part 2)

Thursday, September 03

Tuesday, September 08

First data set - baby names

Thursday, September 10

Tuesday, September 15

Data from Chapel Hill (part 1)

Thursday, September 17

Tuesday, September 22

Data from Chapel Hill (part 2)

Thursday, September 24

Tuesday, September 29

Visualizing data

Thursday, October 01

Tuesday, October 06

Data from the web: Wikipedia (part 1)

Thursday, October 08

Tuesday, October 13

Data from the web: Wikipedia (part 2)

Thursday, October 15

Tuesday, October 20

Data from the web: Twitter (part 1)

Thursday, October 22

Tuesday, October 27

Data from the web: Twitter (part 2)

Thursday, October 29

Tuesday, November 03

Review and final project prep

Thursday, November 05

Tuesday, November 10

Final presentations part 1

Thursday, November 12

Final presentations part 2

Tuesday, November 17

Final presentations part 3

Policies

Instructor communication

For specific, concrete questions, e-mail is the most reliable means of contact for us. You should receive a response within a day or so, but sometimes it may take 2-3 days. If you do not receive a response after a few days, please follow up. Please keep this in mind when you are scheduling your own activities, especially those related to activities with due dates. If you wait until the day before a due date to ask me a clarification question, there is a good chance that you will not receive a response in time.

It is always helpful if your e-mail includes a targeted subject line that begins with “INLS 490.” Please use complete sentences and professional language in your e-mail.

For more complicated questions or help, make an appointment to talk with us by sending us an email.

You are welcome to call me (Sayamindu) by my first name (“Sayamindu” – pronounced “Shayomindoo”). However, you may also use “Dr. Dasgupta” or “Professor Dasgupta” if that is more comfortable for you. Any one of those is fine.

Academic integrity

The UNC Honor Code states that:

It shall be the responsibility of every student enrolled at the University of North Carolina to support the principles of academic integrity and to refrain from all forms of academic dishonesty…

This includes prohibitions against the following:

All scholarship builds on previous work, and all scholarship is a form of collaboration, even when working independently. Incorporating the work of others, and collaborating with colleagues, is welcomed in academic work. However, the honor code clarifies that you must always acknowledge when you make use of the ideas, words, or assistance of others in your work. This is typically accomplished through practices of reference, quotation, and citation.

If you are not certain what constitutes proper procedures for acknowledging the work of others, please ask the course staff for assistance. It is your responsibility to ensure that the honor code is appropriately followed. The UNC Office of Student Conduct provides a variety of honor code resources.

The UNC Libraries has online tutorials on citation practices and plagiarism that you might find helpful.

Students with disabilities

Students with disabilities should request accommodations from the UNC Office of Accessibility Resources and Service.

Use of Amazon Web Services (AWS) for course technology

This course uses Amazon Web Services (AWS) for some of its underlying technology.

The specific server used in this course operates in a UNC-managed AWS virtual private cloud. While the course server is not physically located on campus, it uses a private IP address that is not accessible through the public internet. Furthermore, connections to the course server are restricted to campus and UNC VPN, and login access is only available to students, the course staff, and UNC information technology support staff.

Students enrolled in this course must acknowledge and consent to the following:

  1. Students must use this AWS environment to complete required course assignments.
  2. Students must agree not to upload or publish any sensitive data in this specific AWS environment.

Acknowledgement

This syllabus builds on the Community Data Science Course taught by Benjamin Mako Hill and Tommy Guy at the University of Washington. You can find their courses and material at https://wiki.communitydata.science/Workshops_and_Classes


  1. Python code and/or data does not count toward the word limit.