DS 4XX: Harnessing Language Models for Data Science
Boston University
 Course Overview
An advanced seminar exploring the use of large language models (LLMs) in data science workflows. Topics include prompt engineering, retrieval-augmented generation (RAG), automation of analysis and reporting, and the ethical implications of AI-assisted data science.
Proposed Course Design
Notes: Proposed and designed by me. Intended as an advanced elective for upper-level undergraduates and master's students interested in the intersection of GenAI and data science. Not yet offered.
Learning Objectives
Students will develop expertise in:
- Prompt Engineering: Designing effective prompts for data science tasks
 - RAG Systems: Building retrieval-augmented generation for domain-specific analysis
 - Automated Analysis: Using LLMs to streamline data exploration and reporting
 - Code Generation: Leveraging AI for data science programming assistance
 - Quality Assurance: Validating and verifying AI-generated insights
 - Ethical AI Use: Responsible practices for AI-assisted data science
 
Proposed Course Structure
- Hands-on Workshops: Practical experience with state-of-the-art LLMs
 - Project-based Learning: Students build AI-assisted data science workflows
 - Ethics Integration: Ongoing discussion of responsible AI practices
 - Industry Case Studies: Real-world examples of LLM use in data science
 - Technical Deep Dives: Understanding LLM capabilities and limitations
 
Key Technical Skills
- Prompt Design: Crafting effective prompts for different data science tasks
 - API Integration: Working with OpenAI, Anthropic, and other LLM providers
 - RAG Implementation: Building systems that combine LLMs with external knowledge
 - Workflow Automation: Using LLMs to streamline repetitive analysis tasks
 - Model Evaluation: Assessing LLM performance on data science problems
 - Fine-tuning: Adapting models for specific data science domains
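
As an illustration of the RAG Implementation skill above, here is a minimal sketch of the retrieval step, assuming a tiny in-memory set of notes and a simple bag-of-words cosine similarity in place of learned embeddings. The names `retrieve`, `build_prompt`, and `NOTES` are hypothetical and not part of any course material; no LLM provider is called, and the resulting prompt would be passed to whichever API a student chooses.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Return up to k documents with non-zero similarity to the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for score, d in scored[:k] if score > 0]


def build_prompt(query, documents, k=2):
    """Assemble a prompt that grounds the question in retrieved context."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Hypothetical domain notes standing in for an external knowledge base.
NOTES = [
    "The sales table has one row per order and a region column.",
    "Missing ages in the survey data are coded as -1.",
    "Revenue is reported in thousands of US dollars.",
]

if __name__ == "__main__":
    print(build_prompt("How are missing ages coded?", NOTES))
```

A production system would likely replace the bag-of-words scoring with embedding-based search, but the structure (retrieve, then build a grounded prompt) stays the same.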
 
Application Areas
- Exploratory Data Analysis: AI-assisted data exploration and hypothesis generation
 - Report Generation: Automated creation of data science reports and summaries
 - Code Documentation: Using LLMs to improve code readability and documentation
 - Data Cleaning: AI-assisted identification and correction of data quality issues
 - Insight Discovery: Leveraging LLMs to identify patterns and generate hypotheses
 
Ethical Considerations
- Bias and Fairness: Understanding how LLM biases affect data analysis
 - Transparency: Maintaining interpretability in AI-assisted workflows
 - Validation: Ensuring human oversight of AI-generated insights
 - Attribution: Properly crediting AI assistance in data science work
 - Privacy: Protecting sensitive data when using external LLM services
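
The Privacy point above can be illustrated with a minimal sketch that masks obvious identifiers before text is sent to an external LLM service. The patterns and the `redact` helper are hypothetical and deliberately simple (email addresses and US-style phone numbers only); real projects would need a broader, audited approach to sensitive data.

```python
import re

# Illustrative patterns only; they do not cover every identifier format.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def redact(text):
    """Replace email addresses and phone numbers with placeholder tags."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


if __name__ == "__main__":
    print(redact("Contact jane@example.com or 617-555-0100."))
```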
 
Course Innovation
First-of-its-kind course specifically focused on the practical integration of large language models into data science practice, addressing both technical implementation and ethical considerations.