admin-tools for Processing Student Enrolment Data

Introduction

The school gets a nightly dump of enrolment information about all students currently associated with our school or msor (either enrolled in one of our courses or declared as one of our majors). This is processed into a tab separated file that enables easy unix-based analysis of the data. Each file contains the full enrolment history of each student, and the school keeps a database of past files (each day for the past week, last friday of the month for every month).

Each line of the file contains the details of the enrolment of a student in a course, along with additional details about the student (eg, degrees, majors, gender, international/domestic). The first field of the line is the student ID, the fields are tab separated, and the file is sorted by student ID.

Tools

There are some standard tools for querying this data. There are some additional tools that pondy has created that do some additional useful queries. To be able to use these tools:

  • type need admin-tools at your shell to get access to the standard tools

  • add ~pondy/bin/admin-tools to your path get pondy's extensions.

The basic query tool is students

  • students
    will list all the fields of the file, along with their index (needed for some queries)
    (note, this is pondy's version of the standard enrolments command. It is roughly the same as the enrolments -p, but does some additional cleaning up, (for example, adding a comma to A, B, and C, in the grades field so that grades sort properly with unix sort).

  • students query-terms
    • lists all the entries in data file that match the query terms.
    • no arguments will list the fields in the data file
    • a query term can either be a pattern to match in the whole line, or it can be preceded by a field number. eg
    • students comp102 2:2014 3:1 "\sbehons\s"
      will list all the course enrolment lines for students taking comp102 in 2014 in trimester 1 and who are enrolled for a BE.

  • students-single query-terms
    lists all the entries for each student that has any course enrolment matching all the query terms. (ie, the complete history for each such student)

  • students-multi query-terms
    Lists the complete history for each student who has course enrolments matching each of the query terms.

All three queries take flag arguments to specify a past year (-y YYYY) or a specific date (-d YYYY-MM-DD)

You can pipe the results of these query (or anything else) through the following:

  • project - field-numbers
    selects just the specified fields in the order specified

  • select - query-terms
    performs the same query as the student command above, but on its input rather than the original data file
    . (students uses select). [use selectdb if you are on a bash shell]

  • joinup -
    given several lines with the same value in their first field, puts all the remaining fields of all the lines onto a single line: eg
           a    1
           a    2
           b    3
           b    5
           b    6
    
    turns into
           a    1       2
           b    3       5       6
    

  • unjoinup -
    does the reverse of joinup

  • students-tt query-terms
    lists the courses for the current year of students matching the query terms, and then lists the timetable of all the courses

  • students-count query-terms
    enrolment summary of courses matching the query terms

  • paf query-terms
    gives a summary of the points and courses for the enrolments matching the query terms. Mosty useful only when the query terms are selecting all the courses of a single student (eg their ID number)

Two other useful databases:

  • rooms query-terms
    list the entries in the rooms file that match the terms

  • tt query-terms
    lists all the entries in the current year's timetable that match the terms