Draft:Data Science Discovery

From Wikipedia, the free encyclopedia


Data Science Discovery is an open-access educational platform and course developed by the University of Illinois at Urbana-Champaign (UIUC). It serves as both a free online resource for data science education and the foundation for the university's STAT/CS/IS 107: Data Science Discovery course.[1]

Overview

The platform was created by UIUC faculty members Wade Fagen-Ulmschneider (Computer Science) and Karle Flanagan (Statistics) to address the lack of high-quality, accessible online data science content.[2]

The platform consists of four main components:

  • Data Science Lessons
  • Data Science Guides
  • Data Science Datasets
  • Data Science MicroProjects

Background

The Data Science Discovery platform and course trace their origins to early efforts by UIUC faculty to create accessible and innovative data science education. In 2016-2017, Wade Fagen-Ulmschneider piloted a small data visualization course called CS 205, which laid the groundwork for what would eventually become STAT/CS/IS 107: Data Science Discovery. Despite its initial success, CS 205 was paused so that faculty could focus on core courses such as CS 225 during a period of significant growth in computer science enrollment at UIUC.[3]

In 2019, as part of a broader campus initiative to expand data science offerings, the foundation for STAT/CS/IS 107 was established. This course was designed to make data science education widely accessible, with no prerequisites, and it quickly became a cornerstone of the university’s efforts to foster interdisciplinary engagement in data science. The course was also integrated into the university’s General Education program, ensuring it could reach students from a wide range of academic backgrounds.[1]

Academic Course

As STAT/CS/IS 107, Data Science Discovery is offered as a 4-credit-hour course at UIUC. The course satisfies the university's General Education criteria for Quantitative Reasoning I and is cross-listed between the Statistics, Computer Science, and Information Sciences departments. The course emphasizes project-driven learning, with students performing hands-on analysis of real-world datasets while considering social issues such as privacy and design.[4]

Faculty

The course is co-developed by Wade Fagen-Ulmschneider from the Department of Computer Science and Karle Flanagan from the Department of Statistics.

Prof. Fagen-Ulmschneider’s contributions draw from his expertise in systems, data visualization, and open-access education. His DISCOVERY platform and interactive visualizations, such as the 91-DIVOC COVID-19 project and UIUC GPA visualizations, emphasize engaging with data to inspire students.[5]

Prof. Flanagan brings her focus on applied statistics and innovative teaching to the course, aiming to make data science accessible to a broad audience. Her experience as a teaching professor complements the course's interdisciplinary approach.[6]

Course Composition

The Data Science Discovery course is organized into six modules, each designed to provide foundational and advanced knowledge in data science concepts and practices. The curriculum emphasizes a hands-on approach using Python for data analysis, visualization, and predictive modeling.[7]

Module 1: Basics of Data Science with Python

This module introduces the fundamentals of data science and Python programming. Topics include experimental design, data manipulation using DataFrames, and software version control with Git. Students learn to design experiments, explore subsets of data, and handle confounding variables.

Module 2: Exploratory Data Analysis

In this module, students learn tools and techniques for analyzing and visualizing real-world datasets. The focus is on descriptive statistics, grouping data, and creating visual representations such as histograms and box plots.
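
As a rough illustration of the kind of analysis this module describes, the following sketch groups records and computes descriptive statistics for each group. It is not taken from the course materials: the records, the field names `major` and `hours`, and the function `summarize_by_group` are hypothetical, and the sketch uses only Python's standard library rather than the DataFrame tools taught in the course.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical records: weekly study hours reported by students in two majors.
records = [
    {"major": "Statistics", "hours": 10},
    {"major": "Statistics", "hours": 14},
    {"major": "Statistics", "hours": 12},
    {"major": "History", "hours": 8},
    {"major": "History", "hours": 6},
]


def summarize_by_group(rows, key, value):
    """Group rows by `key` and return count, mean, and median of `value`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {
        name: {"count": len(vals), "mean": mean(vals), "median": median(vals)}
        for name, vals in groups.items()
    }


summary = summarize_by_group(records, "major", "hours")
for name, stats in summary.items():
    print(name, stats)
```

The same per-group summaries are what a histogram or box plot would display graphically, one plot per group.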

Module 3: Simulation and Distributions

This module delves into computational simulations, introducing students to common distributions such as the Normal Distribution. Concepts like the Law of Large Numbers are explored through practical Python programming exercises.
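
The Law of Large Numbers can be illustrated with a short simulation. The sketch below is an assumed example, not course material: it rolls a simulated fair six-sided die and tracks the running mean, which should drift toward the expected value of 3.5 as the number of rolls grows. The function name `running_mean_of_die_rolls` and the fixed seed are choices made for this illustration.

```python
import random


def running_mean_of_die_rolls(n_rolls, seed=0):
    """Simulate fair six-sided die rolls and return the mean after each roll."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0
    means = []
    for i in range(1, n_rolls + 1):
        total += rng.randint(1, 6)
        means.append(total / i)
    return means


means = running_mean_of_die_rolls(100_000)
for n in (10, 1_000, 100_000):
    print(f"mean after {n:>7} rolls: {means[n - 1]:.3f}")
```

Early means can sit well away from 3.5, while the mean after many rolls is typically very close to it.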

Module 4: Prediction and Probability

Students examine probabilistic models to predict outcomes under uncertainty. The module covers foundational probability concepts, including Bayes' Theorem and multi-event probability rules, supported by Python implementations.
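
Bayes' Theorem states that P(A | B) = P(B | A) P(A) / P(B). A minimal sketch of how it might be implemented in Python follows. The function `bayes_posterior` and the screening-test numbers are hypothetical and chosen only for illustration.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(A | B) given P(A), P(B | A), and P(B | not A)."""
    # Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical screening test: 1% prevalence, 95% sensitivity,
# and a 5% false positive rate.
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(f"P(condition | positive test) = {posterior:.3f}")
```

With these assumed numbers the posterior is about 0.161: because the condition is rare, most positive results come from the much larger group of people who do not have it.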

Module 5: Polling, Confidence Intervals, and the Normal Distribution

This module explores sampling techniques, polling, and the statistical principles underlying hypothesis testing and confidence intervals. Students analyze how sampling bias and variability affect results and apply the Central Limit Theorem in Python.
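
The link between polling, confidence intervals, and the Central Limit Theorem can be sketched with a simulation. The example below is an assumption-laden illustration rather than course code: it repeatedly polls a hypothetical population in which 52% answer "yes", builds a normal-approximation 95% confidence interval from each poll, and counts how often the interval contains the true value. The names `poll_confidence_interval` and `true_support` are invented for this sketch.

```python
import math
import random


def poll_confidence_interval(responses, z=1.96):
    """Approximate 95% confidence interval for a proportion.

    Uses the normal approximation justified by the Central Limit Theorem.
    """
    n = len(responses)
    p_hat = sum(responses) / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


rng = random.Random(0)  # fixed seed so the run is repeatable
true_support = 0.52  # hypothetical share of the population answering "yes"

# Repeat the poll many times and count how often the interval covers the truth.
n_polls, poll_size = 1_000, 500
covered = 0
for _ in range(n_polls):
    sample = [rng.random() < true_support for _ in range(poll_size)]
    low, high = poll_confidence_interval(sample)
    covered += low <= true_support <= high

coverage = covered / n_polls
print(f"intervals covering the true value: {coverage:.1%}")
```

Roughly 95% of the intervals should cover the true value. Each individual poll is still off by some amount, which is the sampling variability the module describes.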

Module 6: Towards Machine Learning

The final module builds on prior knowledge to introduce machine learning techniques. Topics include correlation analysis, linear regression, and clustering, with a focus on using scikit-learn for practical implementations.
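
To show what correlation analysis and linear regression compute, the sketch below fits a least-squares line by hand. It deliberately uses plain Python instead of scikit-learn so that it is self-contained. The data (hours studied versus exam score) and the function names `fit_line` and `correlation` are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: hours studied and exam score for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

slope, intercept = fit_line(hours, scores)
r = correlation(hours, scores)
print(f"score = {slope:.2f} * hours + {intercept:.2f}, r = {r:.3f}")
```

A library implementation of ordinary least squares, such as scikit-learn's `LinearRegression`, would be expected to produce the same line for these data.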

The course equips learners with a comprehensive skill set for advancing into specialized machine learning studies or professional roles in data science.

Educational Philosophy

The platform maintains a commitment to being free and accessible to all learners, aligning with UIUC's land-grant mission. It emphasizes clear, understandable instruction through:[2]

  • Conversation-style "office hour" lectures
  • Written explanations
  • Example worksheets
  • Practice questions
  • Real-world applications across disciplines

Influence on Data Science Education

The Data Science Discovery platform has influenced data science education at UIUC and, as a free online resource, is available to learners beyond the university. Demand for these skills is rising: the U.S. Bureau of Labor Statistics projects 36% growth in data science occupations from 2021 to 2031, reflecting the need for professionals adept at handling and interpreting large datasets.[8]

At UIUC, the course serves as the foundation for the "X + Data Science" family of undergraduate degrees, which combine data science with other disciplines such as Astronomy, Accountancy, and Business. These interdisciplinary programs aim to prepare students to lead in the rapidly evolving digital landscape by blending domain-specific knowledge with core data science competencies.[9]

The platform also emphasizes inclusivity. By reducing technical prerequisites and offering open-access materials, Data Science Discovery aims to let students from diverse academic backgrounds gain essential data science skills. The project-driven nature of the course promotes hands-on analysis of real-world datasets while encouraging students to reflect on ethical issues such as privacy and algorithmic bias.[7][10]

Platform Development

The initiative received support from:

  • Discovery Partners Institute (DPI)
  • College of Liberal Arts and Sciences
  • The Grainger College of Engineering

As of 2024, the platform receives thousands of daily page views and appears in over 10,000 Google search results daily.[9]

Future Goals

Under the FY24 Investment for Growth Program, Data Science Discovery aims to:[9]

  • Generate 1,000,000 monthly page views
  • Create over 3,000 educational resources
  • Establish micro-credential opportunities for non-technical majors
  • Develop a campuswide program with the Center for Innovation in Teaching and Learning (CITL)
  • Support faculty in teaching courses with visible social impact
  • Expand accessibility and maximize student enrollment across undergraduate, graduate, and online programs


Responses, Reviews, and Feedback

The course attracts students from various academic backgrounds and has an average GPA of 3.56.[11]

Instructors Wade Fagen-Ulmschneider and Karle Flanagan consistently receive high ratings on teaching review platforms, noted for their approachable teaching style and dedication to student success.[12]

References

  1. ^ a b "Data Science Discovery". University of Illinois. Retrieved 2024-11-11.
  2. ^ a b "About Data Science Discovery". University of Illinois. Retrieved 2024-11-11.
  3. ^ "Data Science Discovery: A new course four-years in the making". r/UIUC. Retrieved 2024-12-01.
  4. ^ "STAT 107 Data Science Discovery". University of Illinois Course Catalog. Retrieved 2024-11-11.
  5. ^ "Fagen-Ulmschneider webpage". Retrieved 2024-11-11.
  6. ^ "Flanagan webpage". Retrieved 2024-11-11.
  7. ^ a b "Data Science Discovery". University of Illinois. Retrieved 2024-11-11.
  8. ^ "Interdisciplinary Education in Data Science". University of Illinois. Retrieved 2024-12-01.
  9. ^ a b c "FY24 Funded Investment for Growth Programs". University of Illinois Office of the Provost. Retrieved 2024-11-11.
  10. ^ "Data Science Discovery for a Broad Audience". University of Illinois. Retrieved 2024-12-01.
  11. ^ "UIUC GPA Distribution". UIUC GPA. Retrieved 2024-11-11.
  12. ^ "Professor Ratings". Rate My Professors. Retrieved 2024-11-11.