Data Science (DATA)
DATA 50000 - Mathematics for Data Scientists (3)
Differentiation and integration of functions; basic matrix operations; linearization; linear and nonlinear optimization techniques; clustering and similarity measures; introduction to probability and statistics; basic computational algorithms. Includes frequent illustration of concepts using mathematical computation tools.
DATA 50100 - Probability and Statistics for Data Scientists (3)
This course covers aspects of probability theory and statistical analysis used in data science. Students will study elementary probability theory, basic combinatorics, conditional probability and independence, Bayes’ rule, random variables, mathematical expectation, discrete and continuous distributions, estimation theory, and tests of hypotheses. This course requires the use of statistical computing with the R programming language for solving sample problems.
Prerequisite: DATA 50000 (may be taken concurrently) or MATH 51000 (may be taken concurrently)
DATA 51000 - Data Mining and Analytics (3)
This course covers techniques for knowledge extraction in very large-scale data. Students will learn how to analyze real-world datasets using different data mining techniques like document similarity detection, association rule mining, clustering, link analysis, and predictive modeling. Topics also include applications for e-advertising and recommendation systems.
Prerequisite: (CPSC 50200 (may be taken concurrently) or DATA 50000 (may be taken concurrently) or MATH 51000 (may be taken concurrently)) and (CPSC 50100 (may be taken concurrently) or DATA 51100 (may be taken concurrently) or CPSC 51100 (may be taken concurrently))
DATA 51100 - Statistical Programming (3)
Programming structures and algorithms for large-scale statistical data processing and visualization. Students will use commonly available data analysis software packages to apply concepts and skills to large data sets and will also develop their own code using an object-oriented programming language.
Prerequisite: CPSC 50100 (may be taken concurrently) or DATA 50100 (may be taken concurrently)
DATA 51200 - Multivariate Data Analysis (3)
This course explores statistical techniques for analysis of multivariate data. It covers exploratory factor analysis, multiple regression analysis, multiple discriminant analysis, logistic regression, multivariate analysis of variance and covariance, general linear models, and cluster analysis. Extensive use of statistical software is required.
Prerequisite: DATA 50100 (may be taken concurrently)
DATA 53000 - Data Visualization (3)
The theory and practice of visualizing large, complicated data sets to clarify areas of emphasis. Human factors best practices will be presented. Programming with advanced visualization frameworks and practices will be demonstrated and used in group programming projects.
Prerequisite: CPSC 50100 (may be taken concurrently) or CPSC 51100 (may be taken concurrently) or DATA 51100 (may be taken concurrently)
DATA 54000 - Large-Scale Data Storage Systems (3)
The design and operation of large-scale, cloud-based systems for storing data. Topics include operating system virtualization, distributed network storage, distributed computing, cloud models (IaaS, PaaS, and SaaS), and techniques for securing cloud and virtual systems.
Prerequisite: CPSC 50100 (may be taken concurrently) or CPSC 51100 (may be taken concurrently) or DATA 51100 (may be taken concurrently)
DATA 55000 - Supervised Machine Learning (3)
This course covers methods and theory related to generating predictive models from labeled datasets. Students will be introduced to computational learning theory, study algorithms for generating predictive models, perform feature selection and hyperparameter tuning, and learn how to evaluate model performance. Examples of supervised machine learning techniques covered in the course include naïve Bayes learning, logistic regression, decision tree induction, support vector machines, and deep neural networks. Other recent developments and state-of-the-art methods related to supervised learning may also be covered. Students will be required to write programs that demonstrate machine learning techniques on real-world datasets.
Prerequisite: (CPSC 50200 (may be taken concurrently) or DATA 50000 (may be taken concurrently) or MATH 51000 (may be taken concurrently)) and (CPSC 50100 (may be taken concurrently) or CPSC 51100 (may be taken concurrently) or DATA 51100 (may be taken concurrently))
DATA 55100 - Unsupervised Machine Learning (3)
This course will survey leading algorithms for unsupervised learning and high-dimensional data analysis. The first part of the course will cover clustering algorithms and generative models of high-dimensional data, such as distance/similarity measures, k-means clustering, hierarchical clustering, Fuzzy C-Means (FCM), Possibilistic C-Means (PCM), Principal Components Analysis (PCA), and Linear Discriminant Analysis (LDA). The second part of the course will cover spectral methods for dimensionality reduction, including multidimensional scaling, spectral clustering, and manifold learning. The third part of the course will cover self-organizing maps (SOMs) as well as an introduction to semi-supervised learning. Other recent developments and state-of-the-art methods related to unsupervised learning may also be covered.
Prerequisite: (CPSC 50200 (may be taken concurrently) or DATA 50000 (may be taken concurrently)) or (CPSC 50100 (may be taken concurrently) or DATA 51100 (may be taken concurrently))
DATA 55200 - Semantic Web (3)
Expressing relationships among items in a way that enables automated, distributed analysis in an application-independent way; text mining to derive meaning from semantic networks; algorithms for processing semantic networks; developing a web of things.
Prerequisite: (CPSC 50100 (may be taken concurrently)) and DATA 51100 (may be taken concurrently)
DATA 56000 - Neural Networks and Deep Learning (3)
This course will survey leading algorithms and methods for neural networks and deep learning. The first part of the course will cover the basics of neural networks through the backpropagation algorithm. Then, the fundamentals of neural networks will be discussed through radial-basis function (RBF) networks and restricted Boltzmann machines. The second part of the course will cover deep learning methods through recurrent neural networks, convolutional neural networks (CNNs), generative adversarial networks (GANs), and transformer networks. Other recent developments and state-of-the-art methods related to deep learning, such as attention mechanisms and pre-trained language models, may also be covered.
Prerequisite: DATA 50000 and DATA 51100 or CPSC 50100
DATA 56600 - Digital Image Processing (3)
This course provides an introduction to basic concepts, methodologies, and algorithms of digital image processing focusing on the following two major problems concerned with digital images: image enhancement and restoration for easier interpretation of images, and image analysis and object recognition. Some advanced image processing and computer vision techniques (e.g., object detection and tracking or camera models and stereo vision) might also be studied in this course. The primary goal of this course is to lay a solid foundation for students to study advanced image analysis topics such as computer vision systems, biomedical image analysis, and multimedia processing and retrieval.
Prerequisite: (DATA 50000 (may be taken concurrently)) and (CPSC 50100)
DATA 59000 - Data Science Project for Computer Scientists (3)
The capstone experience for students pursuing the Computer Science concentration in Data Science. Students will develop a solution for a real-world problem in data mining and analytics, document their work in a scholarly report, and present their methodology and results to faculty and peers.
DATA 59500 - Data Science Thesis Research (3)
In this course, students will work with a faculty advisor on research in the field of Data Science or its applications. The student will research open problems in data science, select a topic for their thesis, and implement novel solutions, which will be documented in a formal thesis. The course will require students to form a thesis committee and defend their thesis before graduating from the program. This course is meant to be repeated three times to fulfill the concentration requirements.
DATA 61000 - Advanced Data Mining and Prescriptive Analytics (3)
In this course, students will learn how to use advanced data mining techniques to improve decision making. Topics include generation of predictive models, optimal decision making, computational simulation systems, and expert and recommendation systems.
Prerequisite: DATA 51000 (may be taken concurrently) and DATA 51100 (may be taken concurrently)
DATA 62500 - Data Mining for Cyber Security (3)
The application of Data Science techniques is of increasing importance in computer security. Data mining and machine learning algorithms are now extensively employed in detecting cyber-attacks, developing authentication methods that distinguish legitimate from illegitimate users, and testing the strength of existing security technologies. In this course, students will learn how to use data mining techniques to solve real-world security problems, processing datasets, training models, and deploying solutions to strengthen a system’s defenses.
Prerequisite: DATA 55000 (may be taken concurrently)
DATA 64000 - Data Engineering (3)
This course covers the theory and practice of data engineering. Students will learn about the data engineer role, data lineage, privacy, and governance. Students will learn how to stitch together various technologies to securely serve the needs of downstream data consumers such as analysts, data scientists, and machine learning engineers. Students will be introduced to popular tools and techniques used by data engineers in industry, such as NiFi, Spark, GraphQL, and ORC/Parquet.
Prerequisite: DATA 54000 (may be taken concurrently) or CPSC 50900 (may be taken concurrently)
DATA 75000 - Network Data Analysis (3)
In this course, students will learn how various types of network data can be represented and analyzed using link analysis, node centrality measures, community detection algorithms, and other techniques. Much of the course will be spent on applications of the techniques to specific types of networks, such as biological networks, transportation networks, and social networks. Network data represents the relationships between various entities (nodes) and their connections. These could be neurons in the brain (connectomes), airports connected by airline routes (transportation networks), or people connected by their relationships (social networks). By analyzing network data, we can determine important nodes and connections, discover community structures, and make predictions.
Prerequisite: DATA 51000 and DATA 51100
DATA 76000 - Spatial and Temporal Data Analysis (3)
This course introduces students to the methods and techniques used to analyze spatial and temporal data such as GPS traces, traffic conditions, stock prices, geographic information, and brain connectomes. The course emphasizes uncertainty management, machine learning approaches, visualization techniques, and applied case studies. The course covers the unique challenges that arise when data is influenced by both spatial (positional and structural) and temporal (time-based) factors, and teaches students how to model, interpret, and visualize such data. Topics include spatial indexing, spatial clustering, spatial autocorrelation, time-series forecasting, trajectory computing, geospatial visualization, machine learning for spatial and temporal data, and applications in fields such as transportation, urban planning, epidemiology, and the understanding of the human brain.
Prerequisite: DATA 51100
DATA 77000 - Recommender Systems (3)
Making recommendations is important in many current industries. Recommendation systems are built to provide optimal, personalized recommendations using large datasets about user preferences and item features. This course teaches the underlying techniques for building such systems and their implementation. Topics include content-based recommendations, collaborative filtering, knowledge-based systems and hybrid systems, evaluation issues, implementation design, trustworthiness, and fairness.
Prerequisite: DATA 51000 and DATA 51100
DATA 78000 - Data Protection and Ethics (3)
This course provides an in-depth exploration of the technical and ethical aspects of data protection and privacy, emphasizing privacy-preserving algorithms and methods used in data science. Students will learn how to develop, apply, and evaluate techniques for protecting data privacy and security in machine learning, data analysis, and data sharing, while addressing ethical considerations. Key topics include data anonymity, data validity, differential privacy, homomorphic encryption, federated learning, algorithmic fairness, the Fair Information Principles (FIPs), and data privacy regulations. The course also covers practical approaches to mitigating privacy risks and ethical concerns in real-world data science applications.
Prerequisite: DATA 55000
DATA 79001 - Doctoral Project Proposal 1 (3)
In this course, doctoral students in data science will work with the instructor to choose an appropriate project and identify members that can serve on the project advisory committee. Students will work on developing a formal proposal and perform preliminary work on the project.
DATA 79002 - Doctoral Project Proposal 2 (3)
In this course, students will continue working on their proposed project and defend their proposal in front of their advisory committee.
Prerequisite: DATA 79001
DATA 79501 - Doctoral Project Implementation 1 (3)
In this course, students will begin the implementation of their project, in coordination with their advisory committee members and any external collaborators.
Prerequisite: DATA 79002
DATA 79502 - Doctoral Project Implementation 2 (3)
In this course, students will continue the implementation of their project, in coordination with their advisory committee members and any external collaborators.
Prerequisite: DATA 79501
DATA 79503 - Doctoral Project Implementation 3 (3)
In this course, students will complete the implementation of their project, in coordination with their advisory committee members and any external collaborators.
Prerequisite: DATA 79502
DATA 79900 - Doctoral Project Presentation (3)
In this course, students will work on documenting their work throughout the project and preparing a formal report. They will also defend their project with an oral presentation in front of the advisory committee.
Prerequisite: DATA 79503