Course information

Date and time
Tuesdays and Thursdays, 11:00AM-12:15PM
Location
Zoom (Online, synchronous)

Zoom meeting information will be shared directly with students through Sakai.

Course staff information

Instructor
Sayamindu Dasgupta
Office hours
Set up an online appointment. Note that you will have to be signed in through your UNC account to set up the appointment.

Overview

In a world that is increasingly driven by software and data, developing fluency with the basics of programming and data analysis is a crucial skill. This course will introduce basic programming and data science tools to give students the skills to use data to answer questions about local and online communities.

In particular, the class will cover the basics of the Python programming language, introduce web APIs including APIs from Wikipedia and Twitter, and teach basic tools and techniques for data analysis and visualization. As part of the class, participants will learn to write software in Python to collect data from public datasets and web APIs and process that data to produce numbers, tables, and graphical visualizations that answer their questions.

The class will be built around student-designed independent projects. Every student will pick a question or issue they are interested in pursuing and will work with the instructor to build from that question toward an analysis of data that the student has collected using software they have written.

Please note that this course is designed for students with little or no prior programming experience. If you already consider yourself to be knowledgeable about programming, this is probably not the course for you. Furthermore, this introduction to programming is intentionally quick and dirty, and is focused on what you need to get things done. If you want to become a professional programmer, this is also probably not the right class. If you want to learn about programming so that you can more effectively answer questions with data by writing your own software and by managing and communicating more effectively with programmers, you are in the right place.

Objectives

At the end of the course, you will be able to:

Grading

You will be graded based on the following elements:

There is a total of 100 points.

Final grades will be assigned according to the following schedule for undergraduate students:

Grade Points
A 95 to 100
A- 90 to 94
B+ 87 to 89
B 84 to 86
B- 80 to 83
C+ 77 to 79
C 74 to 76
C- 70 to 73
D+ 67 to 69
D 60 to 66
F <60

Final grades will be assigned according to the following schedule for graduate students:

Grade Points
H 95 to 100
P 80 to 94
L 60 to 79
F <60
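
As an illustration only, the two grading schedules above can be written as a small Python lookup. The names `letter_grade`, `UNDERGRADUATE`, and `GRADUATE` and the `level` parameter are hypothetical, and the sketch assumes point totals are whole numbers, since the schedules do not say how fractional totals are handled.

```python
# Illustrative sketch only: names and structure are assumptions, not course policy.
# Each schedule lists (minimum points, grade) from the highest cutoff to the lowest.
UNDERGRADUATE = [
    (95, "A"), (90, "A-"), (87, "B+"), (84, "B"), (80, "B-"),
    (77, "C+"), (74, "C"), (70, "C-"), (67, "D+"), (60, "D"),
]
GRADUATE = [(95, "H"), (80, "P"), (60, "L")]


def letter_grade(points, level="undergraduate"):
    """Return the letter grade for a whole-number point total out of 100."""
    if not 0 <= points <= 100:
        raise ValueError("points must be between 0 and 100")
    schedule = UNDERGRADUATE if level == "undergraduate" else GRADUATE
    # Walk down the cutoffs and return the first grade whose minimum is met.
    for minimum, grade in schedule:
        if points >= minimum:
            return grade
    return "F"  # anything below 60 in either schedule


print(letter_grade(88))              # B+
print(letter_grade(88, "graduate"))  # P
```

The same point total maps to different letters depending on the schedule, which is why the function takes the student level as a second argument.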

Assessment

Midterm

Question distributed: March 9
Due date: March 23
Submission: Sakai

For your midterm, you will have to write Python code to answer a few questions that will be set by me. This exercise will help us both understand where you are in terms of being able to write Python code that analyses data. The exercise will be based on examples that we have already covered in class. You will submit the exercise as a Jupyter notebook.

Project idea

Maximum length: 500 words
Due date: April 2
Submission: Sakai

In this assignment, you should identify communities or contexts that you are interested in as sources of data, along with a list of at least 3-4 questions you might be interested in answering for your final project. I am hoping that each of you will pick contexts that you are intellectually committed to and invested in (e.g., your town, or an online community that you participate in). You will be successful if you describe the scope of the problem and describe why you are interested in using the techniques you are learning in this class to tackle this problem.

I will give you feedback on these write-ups and will let you each know if I think you have identified a question that might be too ambitious, too trivial, too broad, too narrow, etc.

Project proposal

Maximum length: 1000 words
Due date: April 16
Submission: Sakai

Building on your project idea assignment, you should describe the specific types of data you will collect, the steps you will take to collect the dataset, the limits and strengths of these data for answering the question you have selected, and the kinds of report and visualization you will make. An important step here is going to be framing your analysis. Why is this an important question? Why do you care? What do we need to know (e.g., about the question, about underlying theories, about your business, about the topic, about the community) to understand this analysis? This will all need to be part of your final project.

I will give you feedback on these proposals, suggest changes or modifications that are likely to make them more successful or compelling, and work with you to make sure that you have the resources and support necessary to carry out your project successfully.

Final project

Presentation dates: See calendar below.
Paper due date: May 7, 12:00 PM

For your final project, I expect you to build on the first two assignments to describe what you have done and what you have found. I’ll expect every student to give both:

I expect that your reports will include text from the first two assignments and reflect comprehensive documentation of your project. Each project should include: (a) a description of the question and community you have identified and the information necessary to frame your question, (b) a description of how you collected your data, and (c) the results.

A successful project will tell a compelling story and will engage with, and improve upon, the course material to teach an audience that includes me and your classmates how to take advantage of programming with data more effectively. The very best papers will give us all a new understanding of some aspect of course material and change the way I teach some portion of this course in the future.

Paper and Code

Your final project should include detailed information on:

If you want inspiration for how people use data science to communicate these kinds of findings broadly and effectively, take a look at great sources of data journalism including FiveThirtyEight or The Upshot at the New York Times. Both of these publish a large number of excellent examples of data analysis aimed at broader non-technical audiences like the ones you’ll be communicating with, and quite a bit of their work is actually done using Python. A simple FiveThirtyEight story will include a clear question, a brief overview of the data sources and method, a figure or two plus several paragraphs walking through the results, followed by a nice conclusion. I’m asking you to try to produce something roughly like this.

Keep in mind that most stories on FiveThirtyEight are under 1000 words and I’m giving you up to 2000 words to show me what you’ve learned. As a result, you should do more than FiveThirtyEight does in a single story. You can ask and answer more questions, provide more background, context, and justification, give more details on your methods and data sources, show us more graphs, and discuss the implications of your findings more. I want you to use the space I’ve given you to show off what you’ve done and what you’ve learned!

As you will submit a Jupyter notebook as your final paper, I will automatically get to see your code. Make sure that you also submit your data (if you use a copy) with your submission. However, I will not be emphasizing the quality or quantity of your code but rather the degree to which you have been successful at answering the substantive questions you have identified.

Presentation

Your presentation should do everything that your paper does and should provide me with a very clear idea of what to expect in your final paper. I’m going to give you all feedback after your talk. This will be an opportunity for me to see a preview of your paper and give you a sense of what I think you can improve. It’s to your advantage to both give a compelling talk and to give me a sense of your project.

Weekly coding activities

Every other class session from January 28th onward will be dedicated to a set of coding activities that will involve changing or adding to code related to the topic of the week. These coding activities will not be turned in and will not be graded.

In many cases, you will find yourself continuing to work on these activities beyond the class. Though these activities are not graded, if you do not complete them, you will face difficulties in class going forward. If you feel that you are having a tough time completing these activities, I encourage you to set up a time with me so that I can help you complete them.

At the beginning of class on the subsequent day for each coding activity, I will go over the ways in which the activities can be completed correctly. I will also share the completed activities on Sakai.

Participation

The course relies heavily on participation. The material we’re going to be covering is significant in terms of volume and we’re going to be covering it quickly. It will be extremely difficult to make up any missed classes. Attendance will be the most important part of participation and missing class sessions will make it hard to be successful in the class. Participation will be graded according to these criteria:

Attendance
It is important for you to attend class. Please be seated and ready when class begins. If personal difficulties (serious illness, etc.) make attendance problematic, please consult with me so that we can make an appropriate plan.
Deportment
You should be attentive in class and respectful of your classmates and the instructor. Turn off cell phones and other devices that might disrupt class. Use laptops and other devices to support current course activities only.
Engagement
Engagement includes: participating in class activities; responding to discussion questions or other questions that I might ask during a lecture; actively listening and taking notes. I value all informed opinions and encourage you to share them.

Engagement will be weighted more heavily than attendance and deportment.

Resources & technology

Textbook

We will follow the “Python for Everybody” textbook for this course. A copy of the book will be made available in the Resources section in Sakai. You can also buy printed copies (if you prefer printed books) or a version for your e-book reader by following the links from the book’s website.

Zoom

We will be using Zoom to run the class. Each Zoom session will be recorded and uploaded to Sakai. All UNC students are eligible for a Zoom account through UNC—to install and sign up for Zoom, go to https://software.sites.unc.edu/zoom/.

The Zoom meeting URL and meeting ID are shared in the resources section of Sakai in a PDF file named zoom.pdf.

Sakai

Sakai will be used for assignments, forum discussions, and resources. The textbook for this course will be made available in the resources section of Sakai.

Jupyter notebooks

Although we will be using Python, you will not need to download and install Python on your own laptops. We will be using Jupyter notebooks to write programs. In order to use Jupyter notebooks, you will have to use a web browser such as Mozilla Firefox or Google Chrome. I will share the link where you can sign in during class.

Calendar

Apart from weekly coding activities, there will be readings for some of the days. I will announce those in advance and share the material through Sakai’s resource section.

Date Topic
Tuesday, January 19 Introduction and logistics
Thursday, January 21 Introduction to programming
Tuesday, January 26 Introduction to data analysis and Jupyter
Thursday, January 28
Tuesday, February 02 Getting started with Python and Jupyter (part 1)
Thursday, February 04
Tuesday, February 09 Getting started with Python and Jupyter (part 2)
Thursday, February 11
Tuesday, February 16 Wellness day
Thursday, February 18 Class cancelled
Tuesday, February 23 First data set—baby names
Thursday, February 25
Tuesday, March 02 Data from Chapel Hill (part 1)
Thursday, March 04
Tuesday, March 09 Data from Chapel Hill (part 2)
Thursday, March 11
Tuesday, March 16 Visualizing data
Thursday, March 18 Wellness day
Tuesday, March 23
Thursday, March 25 Data from the web: Wikipedia (part 1)
Tuesday, March 30
Thursday, April 01 Data from the web: Wikipedia (part 2)
Tuesday, April 06
Thursday, April 08 Data from the web: Twitter (part 1)
Tuesday, April 13
Thursday, April 15 Data from the web: Twitter (part 2)
Tuesday, April 20
Thursday, April 22 Review and final project prep
Tuesday, April 27
Thursday, April 29 Final presentations part 1
Tuesday, May 04 Final presentations part 2

Policies

Syllabus changes

I may make changes to this syllabus, including due dates and topics covered. These changes will be announced as early as possible.

Instructor communication

For specific, concrete questions, e-mail is the most reliable means of contact for us. You should receive a response within a day or so, but sometimes it may take 2-3 days. If you do not receive a response after a few days, please follow up. Please keep this in mind when you are scheduling your own activities, especially those related to activities with due dates. If you wait until the day before a due date to ask me a clarification question, there is a good chance that you will not receive a response in time.

It is always helpful if your e-mail includes a targeted subject line that begins with “INLS 490.” Please use complete sentences and professional language in your e-mail.

For more complicated questions or help, make an appointment to talk with me.

You are welcome to call me (Sayamindu) by my first name (“Sayamindu” – pronounced “Shayomindoo”). However, you may also use “Dr. Dasgupta” or “Professor Dasgupta” if that is more comfortable for you. Any one of those is fine.

Late work

Please avoid late submissions, i.e., submissions made after a deadline. Depending on the circumstances, points will be deducted for late submissions. If you feel that you will be unable to meet a deadline, contact me prior to the deadline.

Academic integrity

The UNC Honor Code states that:

It shall be the responsibility of every student enrolled at the University of North Carolina to support the principles of academic integrity and to refrain from all forms of academic dishonesty…

This includes prohibitions against the following:

All scholarship builds on previous work, and all scholarship is a form of collaboration, even when working independently. Incorporating the work of others, and collaborating with colleagues, is welcomed in academic work. However, the honor code clarifies that you must always acknowledge when you make use of the ideas, words, or assistance of others in your work. This is typically accomplished through practices of reference, quotation, and citation.

If you are not certain what constitutes proper procedures for acknowledging the work of others, please ask the course staff for assistance. It is your responsibility to ensure that the honor code is appropriately followed. The UNC Office of Student Conduct provides a variety of honor code resources.

The UNC Libraries has online tutorials on citation practices and plagiarism that you might find helpful.

Use of Amazon Web Services (AWS) for course technology

This course uses Amazon Web Services (AWS) for some of its underlying technology.

The specific server used in this course operates in a UNC-managed AWS virtual private cloud. While the course server is not physically located on campus, it uses a private IP address that is not accessible through the public internet. Furthermore, connections to the course server are restricted to campus and UNC VPN, and login access is only available to students, the course staff, and UNC information technology support staff.

Students enrolled in this course must acknowledge and consent to the following:

  1. Students must use this AWS environment to complete required course assignments.
  2. Students must agree not to upload or publish any sensitive data in this specific AWS environment.

University resources and services

Accessibility resources

The University of North Carolina at Chapel Hill facilitates the implementation of reasonable accommodations, including resources and services, for students with disabilities, chronic medical conditions, a temporary disability or pregnancy complications resulting in barriers to fully accessing University courses, programs and activities.

Accommodations are determined through the Office of Accessibility Resources and Service (ARS) for individuals with documented qualifying disabilities in accordance with applicable state and federal laws. See the ARS Website for contact information: https://ars.unc.edu or email .

Counseling and psychological services

Counseling and Psychological Services (CAPS) is strongly committed to addressing the mental health needs of a diverse student body through timely access to consultation and connection to clinically appropriate services, whether for short or long-term needs. Go to their website: https://caps.unc.edu/ or visit their facilities on the third floor of the Campus Health Services building for a walk-in evaluation to learn more.

Dealing with discrimination, harassment, violence or exploitation

Any student who is impacted by discrimination, harassment, interpersonal (relationship) violence, sexual violence, sexual exploitation, or stalking is encouraged to seek resources on campus or in the community. Please contact the Director of Title IX Compliance (Adrienne Allison – ), Report and Response Coordinators in the Equal Opportunity and Compliance Office (), Counseling and Psychological Services (confidential), or the Gender Violence Services Coordinators (; confidential) to discuss your specific needs. Additional resources are available at https://safe.unc.edu.

Acknowledgement

This syllabus builds on the Community Data Science Course taught by Benjamin Mako Hill and Tommy Guy at the University of Washington. You can find their courses and material at https://wiki.communitydata.science/Workshops_and_Classes

Parts of this syllabus also draw from material developed for INLS 201 (taught by Prof. Melanie Feinberg) and INLS 560 (taught by Prof. David Gotz).


  1. Python code and/or data does not count toward the word limit.