Course information

Date and time
Tuesdays and Thursdays, 12:30PM-1:45PM
Location
Manning 208

Course staff information

Instructor
Sayamindu Dasgupta
Office hours
Set up an online appointment. Note that you will have to be signed in through your UNC account to set up the appointment.

Overview

In a world that is increasingly driven by software and data, developing fluency with the basics of programming and data analysis is a crucial skill. This course will introduce basic programming and data science tools to give students the skills to use data to answer questions about local and online communities.

In particular, the class will cover the basics of the Python programming language, introduce web APIs including APIs from Wikipedia and Twitter, and teach basic tools and techniques for data analysis and visualization. As part of the class, participants will learn to write software in Python to collect data from public datasets and web APIs and process that data to produce numbers, tables, and graphical visualizations that answer their questions.

The class will be built around student-designed independent projects. Every student will pick a question or issue they are interested in pursuing and will work with the instructor to build from that question toward an analysis of data that the student has collected using software they have written.

Please note that this course is designed for students with little or no prior programming experience. If you already consider yourself to be knowledgeable about programming, this is probably not the course for you. Furthermore, this introduction to programming is intentionally quick and dirty, and is focused on what you need to get things done. If you want to become a professional programmer, this is also probably not the right class. If you want to learn about programming so that you can more effectively answer questions with data by writing your own software and by managing and communicating more effectively with programmers, you are in the right place.

Objectives

At the end of the course, you will be able to:

Grading

You will be graded based on the following elements:

There is a total of 100 points.

Final grades will be assigned according to the following schedule for undergraduate students:

Grade Points
A 95 to 100
A- 90 to 94
B+ 87 to 89
B 84 to 86
B- 80 to 83
C+ 77 to 79
C 74 to 76
C- 70 to 73
D+ 67 to 69
D 60 to 66
F <60

Final grades will be assigned according to the following schedule for graduate students:

Grade Points
H 95 to 100
P 80 to 94
L 60 to 79
F <60

Assessment

Midterm

Question distributed: September 30
Due date: October 19
Submission: Sakai

For your midterm, you will have to write Python code to answer a few questions that will be set by me. This exercise will help us both understand where you are in terms of being able to write Python code that analyses data. The exercise will be based on examples that we have already covered in class. You will submit the exercise as a Jupyter notebook.

Project idea

Maximum length: 500 words
Due date: October 26
Submission: Sakai

In this assignment, you should identify communities or contexts that you are interested in as sources of data, along with a list of at least 3-4 questions you might be interested in answering for your final project. I am hoping that each of you will pick contexts that you are intellectually committed to and invested in (e.g., your town, or an online community that you participate in). You will be successful if you describe the scope of the problem and describe why you are interested in using the techniques you are learning in this class to tackle this problem.

I will give you feedback on these write-ups and will let you each know if I think you have identified a question that might be too ambitious, too trivial, too broad, too narrow, etc.

Project proposal

Maximum length: 1000 words
Due date: November 10
Submission: Sakai

Building on your project idea assignment, you should describe the specific types of data you will collect, the steps you will take to collect the dataset, the limits and strengths of these data for answering the question you have selected, and the kinds of report and visualization you will make. An important step here is going to be framing your analysis. Why is this an important question? Why do you care? What do we need to know (e.g., about the question, about underlying theories, about your business, about the topic, about the community) to understand this analysis? This will all need to be part of your final project.

I will give you feedback on these proposals, suggest changes or modifications that are more likely to make them successful or compelling, and work with you to make sure that you have the resources and support necessary to carry out your project successfully.

Final project

Presentation dates: See calendar below.
Paper due date: Friday, December 3, 12:00 PM

For your final project, I expect you to build on the first two assignments to describe what you have done and what you have found. I’ll expect every student to give both:

I expect that your reports will include text from the first two assignments and reflect comprehensive documentation of your project. Each project should include: (a) a description of the question and community you have identified and the information necessary to frame your question, (b) a description of how you collected your data, and (c) the results.

A successful project will tell a compelling story and will engage with, and improve upon, the course material to teach an audience that includes me and your classmates how to take advantage of programming with data more effectively. The very best papers will give us all a new understanding of some aspect of course material and change the way I teach some portion of this course in the future.

Paper and Code

Your final project should include detailed information on:

If you want inspiration for how people use data science to communicate these kinds of findings broadly and effectively, take a look at great sources of data journalism including FiveThirtyEight or The Upshot at the New York Times. Both of these publish a large number of excellent examples of data analysis aimed at broader non-technical audiences like the ones you’ll be communicating with, and quite a bit of their work is actually done using Python. A simple FiveThirtyEight story will include a clear question, a brief overview of the data sources and method, a figure or two plus several paragraphs walking through the results, followed by a nice conclusion. I’m asking you to try to produce something roughly like this.

Keep in mind that most stories on FiveThirtyEight are under 1000 words and I’m giving you up to 2000 words to show me what you’ve learned. As a result, you should do more than FiveThirtyEight does in a single story. You can ask and answer more questions, you can provide more background, context, and justification, you can provide more details on your methods and data sources, you can show us more graphs, and you can discuss the implications of your findings more. You should use the space I’ve given you to show off what you’ve done and what you’ve learned!

As you will submit a Jupyter notebook as your final paper, I will automatically get to see your code. Make sure that you also submit your data (if you use a copy) with your submission. However, I will not be emphasizing the quality or quantity of your code but rather the degree to which you have been successful at answering the substantive questions you have identified.

Presentation

Your presentation should do everything that your paper does and should provide me with a very clear idea of what to expect in your final paper. I’m going to give you all feedback after your talk. This will be an opportunity for me to see a preview of your paper and give you a sense of what I think you can improve. It’s to your advantage both to give a compelling talk and to give me a sense of your project.

Weekly coding activities

Every other day from August 26th onward will be dedicated to a set of coding activities that will involve changing or adding to code related to the topic of the week. These coding activities will not be turned in and will not be graded.

In many cases, you will find yourself continuing to work on these activities beyond the class. Though these activities are not graded, if you do not complete them, you will face difficulties in class going forward. If you feel that you are having a tough time completing these activities, I encourage you to set up a time with me so that I can help you complete them.

At the beginning of class on the subsequent day for each coding activity, I will go over the ways in which the activities can be completed correctly. I will also share the completed activities on Sakai.

Participation

The course relies heavily on participation. The material we’re going to be covering is significant in terms of volume and we’re going to be covering it quickly. It will be extremely difficult to make up any missed classes. Attendance will be the most important part of participation and missing class sessions will make it hard to be successful in the class. Participation will be graded according to these criteria:

Attendance
It is important for you to attend class. Please be seated and ready when class begins. If personal difficulties (serious illness, etc.) make attendance problematic, please consult with me so that we can make an appropriate plan.
Deportment
You should be attentive in class and respectful of your classmates and the instructor. Turn off cell phones and other devices that might disrupt class. Use laptops and other devices to support current course activities only.
Engagement
Engagement includes: participating in class activities; responding to discussion questions or other questions that I might ask during a lecture; actively listening and taking notes. I value all informed opinions and encourage you to share them.

Engagement will be weighted more heavily than attendance and deportment.

Resources & technology

Textbook

We will follow the “Python for Everybody” textbook for this course. A copy of the book will be made available in the Resources section in Sakai. You can also buy printed copies (if you prefer printed books) or a version for your e-book reader by following the links from the book’s website.

Sakai

Sakai will be used for assignments, forum discussions, and resources. The textbook for this course will be made available in the resources section of Sakai.

Jupyter notebooks

Although we will be using Python, you will not need to download and install Python on your own laptops. We will be using Jupyter notebooks to write programs. In order to use Jupyter notebooks, you will have to use a web browser such as Mozilla Firefox or Google Chrome. I will share the link where you can sign in during class.

Calendar

Apart from weekly coding activities, there will be readings for some of the days. I will announce those in advance and share the material through Sakai’s resource section.

Date Topic
Thursday, August 19 Introduction and logistics
Tuesday, August 24 Introduction to programming
Thursday, August 26 Introduction to data analysis and Jupyter
Tuesday, August 31
Thursday, September 02 Getting started with Python and Jupyter (part 1)
Tuesday, September 07
Thursday, September 09 Getting started with Python and Jupyter (part 2)
Tuesday, September 14
Thursday, September 16 First data set—baby names
Tuesday, September 21
Thursday, September 23 Data from Chapel Hill (part 1)
Tuesday, September 28
Thursday, September 30 Data from Chapel Hill (part 2)
Tuesday, October 05
Thursday, October 07 Visualizing data
Tuesday, October 12
Thursday, October 14 Data from the web: Wikipedia (part 1)
Tuesday, October 19
Thursday, October 21 No class: Fall break
Tuesday, October 26 Data from the web: Wikipedia (part 2)
Thursday, October 28
Tuesday, November 02 Data from the web: Twitter (part 1)
Thursday, November 04
Tuesday, November 09 Data from the web: Twitter (part 2)
Thursday, November 11
Tuesday, November 16 Review and final project prep
Thursday, November 18
Tuesday, November 23 Final presentations part 1
Thursday, November 25 No class: Thanksgiving
Tuesday, November 30 Final presentations part 2

Policies

Community standards in our course and mask use

This fall semester, while we are in the midst of a global pandemic, all enrolled students are required to wear a mask covering your mouth and nose at all times in our classroom. This requirement is to protect our educational community–your classmates and me–as we learn together. If you choose not to wear a mask, or wear it improperly, I will ask you to leave immediately, and I will submit a report to the Office of Student Conduct. At that point you will be disenrolled from this course for the protection of our educational community. Students who have an authorized accommodation from Accessibility Resources and Service have an exception. For additional information, see https://carolinatogether.unc.edu/community-standards/#chapter-2.

Syllabus changes

I may make changes to this syllabus, including due dates and topics covered. These changes will be announced as early as possible.

Instructor communication

For specific, concrete questions, e-mail is the most reliable means of contact for us. You should receive a response within a day or so, but sometimes it may take 2-3 days. If you do not receive a response after a few days, please follow up. Please keep this in mind when you are scheduling your own activities, especially those related to activities with due dates. If you wait until the day before a due date to ask me a clarification question, there is a good chance that you will not receive a response in time.

It is always helpful if your e-mail includes a targeted subject line that begins with “INLS 490.” Please use complete sentences and professional language in your e-mail.

For more complicated questions or help, make an appointment to talk with me.

You are welcome to call me (Sayamindu) by my first name (“Sayamindu” – pronounced “Shayomindoo”). However, you may also use “Dr. Dasgupta” or “Professor Dasgupta” if that is more comfortable for you. Any one of those is fine.

Late work

Please avoid late submissions, i.e., submissions made after a deadline. Depending on the circumstances, late submissions will be penalized by points being deducted. If you feel that you will be unable to meet a deadline, contact me prior to the deadline.

Academic integrity

The UNC Honor Code states that:

It shall be the responsibility of every student enrolled at the University of North Carolina to support the principles of academic integrity and to refrain from all forms of academic dishonesty…

This includes prohibitions against the following:

All scholarship builds on previous work, and all scholarship is a form of collaboration, even when working independently. Incorporating the work of others, and collaborating with colleagues, is welcomed in academic work. However, the honor code clarifies that you must always acknowledge when you make use of the ideas, words, or assistance of others in your work. This is typically accomplished through practices of reference, quotation, and citation.

If you are not certain what constitutes proper procedures for acknowledging the work of others, please ask the course staff for assistance. It is your responsibility to ensure that the honor code is appropriately followed. The UNC Office of Student Conduct provides a variety of honor code resources.

The UNC Libraries has online tutorials on citation practices and plagiarism that you might find helpful.

Use of Amazon Web Services (AWS) for course technology

This course uses Amazon Web Services (AWS) for some of its underlying technology.

The specific server used in this course operates in a UNC-managed AWS virtual private cloud. While the course server is not physically located on campus, it uses a private IP address that is not accessible through the public internet. Furthermore, connections to the course server are restricted to campus and UNC VPN, and login access is only available to students, the course staff, and UNC information technology support staff.

Students enrolled in this course must acknowledge and consent to the following:

  1. Students must use this AWS environment to complete required course assignments.
  2. Students must agree not to upload or publish any sensitive data in this specific AWS environment.

University resources and services

Accessibility resources

The University of North Carolina at Chapel Hill facilitates the implementation of reasonable accommodations, including resources and services, for students with disabilities, chronic medical conditions, a temporary disability or pregnancy complications resulting in barriers to fully accessing University courses, programs and activities.

Accommodations are determined through the Office of Accessibility Resources and Service (ARS) for individuals with documented qualifying disabilities in accordance with applicable state and federal laws. See the ARS Website for contact information: https://ars.unc.edu or email .

Counseling and psychological services

Counseling and Psychological Services (CAPS) is strongly committed to addressing the mental health needs of a diverse student body through timely access to consultation and connection to clinically appropriate services, whether for short or long-term needs. Go to their website: https://caps.unc.edu/ or visit their facilities on the third floor of the Campus Health Services building for a walk-in evaluation to learn more.

Dealing with discrimination, harassment, violence or exploitation

Any student who is impacted by discrimination, harassment, interpersonal (relationship) violence, sexual violence, sexual exploitation, or stalking is encouraged to seek resources on campus or in the community. Please contact the Director of Title IX Compliance (Adrienne Allison – ), Report and Response Coordinators in the Equal Opportunity and Compliance Office (), Counseling and Psychological Services (confidential), or the Gender Violence Services Coordinators (; confidential) to discuss your specific needs. Additional resources are available at https://safe.unc.edu.

Acknowledgement

This syllabus builds on the Community Data Science Course taught by Benjamin Mako Hill and Tommy Guy at the University of Washington. You can find their courses and material at https://wiki.communitydata.science/Workshops_and_Classes

Parts of this syllabus also draw from material developed for INLS 201 (taught by Prof. Melanie Feinberg) and INLS 560 (taught by Prof. David Gotz).


  1. Python code and/or data does not count toward the word limit.