Student performance

A data set of the attributes of 382 students in secondary education collected from two schools. The goal is to predict the grade in math and Portugese at the end of the third period. See the cited sources for additional information.

student

Format

382 observations from 13 variables represented as a list consisting of a binary factor response matrix y with two responses: portugese and math for the final scores in period three for the respective subjects. The list also contains x: a sparse feature matrix of class 'dgCMatrix' with the following variables:

school_ms: student's primary school, 1 for Mousinho da Silveira and 0 for Gabriel Pereira
sex: sex of student, 1 for male
age: age of student
urban: urban (1) or rural (0) home address
large_family: whether the family size is larger than 3
cohabitation: whether parents live together
Medu: mother's level of education (ordered)
Fedu: fathers's level of education (ordered)
Mjob_health: whether the mother was employed in health care
Mjob_other: whether the mother was employed as something other than the specified job roles
Mjob_services: whether the mother was employed in the service sector
Mjob_teacher: whether the mother was employed as a teacher
Fjob_health: whether the father was employed in health care
Fjob_other: whether the father was employed as something other than the specified job roles
Fjob_services: whether the father was employed in the service sector
Fjob_teacher: whether the father was employed as a teacher
reason_home: school chosen for being close to home
reason_other: school chosen for another reason
reason_rep: school chosen for its reputation
nursery: whether the student attended nursery school
internet: Pwhether the student has internet access at home

Source

P. Cortez and A. Silva. Using Data Mining to Predict Secondary School Student Performance. In A. Brito and J. Teixeira Eds., Proceedings of 5th FUture BUsiness TEChnology Conference (FUBUTEC 2008) pp. 5-12, Porto, Portugal, April, 2008, EUROSIS, ISBN 978-9077381-39-7. http://www3.dsi.uminho.pt/pcortez/student.pdf

Dua, D. and Karra Taniskidou, E. (2017). UCI Machine Learning Repository http://archive.ics.uci.edu/ml. Irvine, CA: University of California, School of Information and Computer Science.

Preprocessing

All of the grade-specific predictors were dropped from the data set. (Note that it is not clear from the source why some of these predictors are specific to each grade, such as which parent is the student's guardian.) The categorical variables were dummy-coded. Only the final grades (G3) were kept as dependent variables, whilst the first and second period grades were dropped.

Format

Source

Preprocessing

Contents