This subject covers the theory and practice of modern statistical learning, regression and classification modelling. Techniques covered range from traditional model selection and generalised linear model structures to modern, computer-intensive methods including generalised additive models, splines and tree methods. Methods to handle continuous, ordinal and nominal response variables and assessment of fit via cross-validation and residual diagnostics are also considered. All techniques will be investigated via practical application on real data using the statistical software package R.
|Academic unit:||Bond Business School|
|Subject title:||Statistical Learning and Regression Models|
Delivery & attendance
|Attendance and learning activities:||Attendance at all class sessions is expected. Students are expected to notify the instructor of any absences with as much advance notice as possible.|
|Prescribed resources:|| |
|iLearn@Bond & Email:||iLearn@Bond is the online learning environment at Bond University and is used to provide access to subject materials, lecture recordings and detailed subject information regarding the subject curriculum, assessment and timing. Both iLearn and the Student Email facility are used to provide important subject notifications. Additionally, official correspondence from the University will be forwarded to students’ Bond email account and must be monitored by the student.|
To access these services, log on to the Student Portal from the Bond University website at www.bond.edu.au.
Assumed knowledge is the minimum level of knowledge of a subject area that students are assumed to have acquired through previous study. It is the responsibility of students to ensure they meet the assumed knowledge expectations of the subject. Students who do not possess this prior knowledge are strongly recommended against enrolling and do so at their own risk. No concessions will be made for students’ lack of prior knowledge.
Assumed Prior Learning (or equivalent):
Possess demonstrable knowledge in the theory and application of simple and multiple linear regression models to the level of a unit such as ECON71-200 Linear Models and Applied Econometrics, as well as basic data science concepts and techniques to the level of a unit such as DTSC71-200 Data Science.
Assurance of learning
Assurance of Learning means that universities take responsibility for creating, monitoring and updating curriculum, teaching and assessment so that students graduate with the knowledge, skills and attributes they need for employability and/or further study.
At Bond University, we carefully develop subject and program outcomes to ensure that student learning in each subject contributes to the whole student experience. Students are encouraged to carefully read and consider subject and program outcomes as combined elements.
Program Learning Outcomes (PLOs)
Program Learning Outcomes provide a broad and measurable set of standards that incorporate a range of knowledge and skills that will be achieved on completion of the program. If you are undertaking this subject as part of a degree program, you should refer to the relevant degree program outcomes and graduate attributes as they relate to this subject.
Subject Learning Outcomes (SLOs)
On successful completion of this subject the learner will be able to:
- Demonstrate advanced knowledge of the limitations of linear regression models and the ability to develop an appropriate regression model.
- Evaluate and choose between a variety of regression models.
- Demonstrate advanced knowledge of the role of regularisation and the ability to use the concept to develop a variety of regression models.
- Apply regression models for limited dependent variables (i.e., binomial, ordered and count data).
- Apply generalised linear models, including proper use and assessment of model diagnostic techniques.
- Develop regression models utilising splines, additive models and tree-based methods.
- Correctly and concisely communicate the results and implications of a regression analysis in a professional written report.
|Type||Task||Weighting||Timing*||Outcomes assessed|
|Capstone Project §||Group Project: Part 1 – Proposal for data-based analysis to be performed, as well as discussion of data collection issues and data summarisation.||10%||Week 5||1.|
|Capstone Project §||Group Project: Part 2 – Final report on proposed data-based analysis, including methodology discussion and conclusions regarding findings.||20%||Week 13||2, 3, 4, 5, 6, 7.|
|Computer-Aided Examination (Open)||Comprehensive Final Examination – Practical, computer-based analysis of provided datasets covering techniques presented to date (emphasis on techniques presented after Mid-semester)||45%||Final Examination Period||1, 2, 3, 4, 5, 6.|
|Computer-Aided Examination (Open)||Mid-semester Examination – Practical, computer-based analysis of provided datasets covering techniques presented to date.||25%||Week 7 (Mid-Semester Examination Period)||1, 2, 3.|
- § Indicates group/teamwork-based assessment
- * Assessment timing is indicative of the week that the assessment is due or begins (where conducted over multiple weeks), and is based on the standard University academic calendar
- C = Students must reach a level of competency to successfully complete this assessment.
|High Distinction||85-100||Outstanding or exemplary performance in the following areas: interpretative ability; intellectual initiative in response to questions; mastery of the skills required by the subject, general levels of knowledge and analytic ability or clear thinking.|
|Distinction||75-84||Usually awarded to students whose performance goes well beyond the minimum requirements set for tasks required in assessment, and who perform well in most of the above areas.|
|Credit||65-74||Usually awarded to students whose performance is considered to go beyond the minimum requirements for work set for assessment. Assessable work is typically characterised by a strong performance in some of the capacities listed above.|
|Pass||50-64||Usually awarded to students whose performance meets the requirements set for work provided for assessment.|
|Fail||0-49||Usually awarded to students whose performance is not considered to meet the minimum requirements set for particular tasks. The fail grade may be a result of insufficient preparation, of inattention to assignment guidelines or lack of academic ability. A frequent cause of failure is lack of attention to subject or assignment guidelines.|
For the purposes of quality assurance, Bond University conducts an evaluation process to measure and document student assessment as evidence of the extent to which program and subject learning outcomes are achieved. Some examples of student work will be retained for potential research and quality auditing purposes only. Any student work used will be treated confidentially and no student grades will be affected.
Students must check the iLearn@Bond subject site for detailed assessment information and submission procedures.
Policy on late submission and extensions
A late penalty will be applied to all overdue assessment tasks unless an extension is granted by the subject coordinator. The standard penalty will be 10% of marks awarded to that assessment per day late with no assessment to be accepted seven days after the due date. Where a student is granted an extension, the penalty of 10% per day late starts from the new due date.
Policy on plagiarism
The University’s Academic Integrity Policy defines plagiarism as the act of misrepresenting as one’s own original work: another’s ideas, interpretations, words, or creative works; and/or one’s own previous ideas, interpretations, words, or creative work without acknowledging that it was used previously (i.e., self-plagiarism). The University considers the act of plagiarising to be a breach of the Student Conduct Code and, therefore, subject to the Discipline Regulations, which provide for a range of penalties including the reduction of marks or grades, fines and suspension from the University.
Feedback on assessment
Feedback on assessment will be provided to students within two weeks of the assessment submission due date, as per the Assessment Policy.
Accessibility and Inclusion Support
If you have a disability, illness, injury or health condition that impacts your capacity to complete studies, exams or assessment tasks, it is important that you let us know your special requirements early in the semester. You will need to make an application for support and submit it with recent, comprehensive documentation at an appointment with a Disability Officer. Students with a disability are encouraged to contact the Disability Office at the earliest possible time to meet staff and learn about the services available to meet their specific needs. Please note that late notification or failure to disclose your disability can be to your disadvantage, as the University cannot guarantee support under such circumstances.
Additional subject information
The delivery of this subject will include the use of the R programming language, which is fully open-source. RStudio is the recommended front-end and is also freely available. A peer-evaluation system will be used in this subject to help determine the individual marks for all group assessments. As part of the requirements for Business School quality accreditation, the Bond Business School employs an evaluation process to measure and document student assessment as evidence of the extent to which program and subject learning outcomes are achieved.
Introduction and revision of statistical modelling principles, including the importance of the bias-variance trade-off. Review of theory and application of least-squares multiple linear regression, including residual diagnostics, and re-acquaintance with the statistical software package R.
Introduces a number of variable selection methods for use in choosing an optimal subset of available independent variables. Methods are evaluated using goodness-of-fit ideas as well as cross-validated prediction errors. Both sequential and automated approaches will be covered.
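As a rough illustration of how cross-validated prediction error can be used to compare candidate models, the sketch below contrasts an intercept-only model with a one-predictor linear model. It is written in plain Python for brevity, whereas the subject itself works in R. The data, the helper names (`fit_mean`, `fit_line`, `kfold_mse`) and the deterministic interleaved fold assignment are all assumptions made for this example; in practice folds are usually assigned at random.

```python
def fit_mean(x, y):
    """Intercept-only model: always predicts the training mean of y."""
    mean_y = sum(y) / len(y)
    return lambda xi: mean_y


def fit_line(x, y):
    """Least-squares simple linear regression of y on x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return lambda xi: intercept + slope * xi


def kfold_mse(x, y, fit, k):
    """Mean squared prediction error estimated by k-fold cross-validation.

    Observation i is placed in fold i % k (an illustrative, non-random split).
    """
    n = len(x)
    total_sq_err = 0.0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        model = fit([x[i] for i in train], [y[i] for i in train])
        total_sq_err += sum((y[i] - model(x[i])) ** 2 for i in test)
    return total_sq_err / n


# Made-up data: roughly y = 1 + 2x with small deterministic perturbations.
x = [float(i) for i in range(10)]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3, 0.0, 0.1]
y = [1.0 + 2.0 * xi + e for xi, e in zip(x, noise)]

cv_mean = kfold_mse(x, y, fit_mean, k=5)
cv_line = kfold_mse(x, y, fit_line, k=5)
print("CV MSE, intercept only:", cv_mean)
print("CV MSE, with predictor:", cv_line)
```

The model with the smaller cross-validated error would be preferred; the same comparison extends to any set of candidate predictor subsets.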
Introduces model regularisation using ridge and LASSO regression. In addition, dimension reduction methods for the predictor space are considered, including principal components and partial least squares.
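To give a feel for what regularisation does, the sketch below works through the simplest possible case: one predictor and no intercept. It is written in plain Python for brevity, whereas the subject itself works in R. The penalised objectives assumed here are the sum of squared errors plus `lam * b**2` for ridge and plus `lam * abs(b)` for LASSO; the data and function names are hypothetical.

```python
def _sums(x, y):
    """Return (sum of x*y, sum of x*x) for a no-intercept, one-predictor fit."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy, sxx


def ols_slope(x, y):
    """Unpenalised least-squares slope."""
    sxy, sxx = _sums(x, y)
    return sxy / sxx


def ridge_slope(x, y, lam):
    """Minimiser of sum((y - b*x)^2) + lam * b^2: shrinks the slope smoothly."""
    sxy, sxx = _sums(x, y)
    return sxy / (sxx + lam)


def lasso_slope(x, y, lam):
    """Minimiser of sum((y - b*x)^2) + lam * |b|: soft-thresholds the slope."""
    sxy, sxx = _sums(x, y)
    shrunk = max(abs(sxy) - lam / 2.0, 0.0)
    sign = 1.0 if sxy >= 0 else -1.0
    return sign * shrunk / sxx


# Made-up data for illustration only.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-4.0, -2.0, 0.5, 2.0, 4.0]

print(ols_slope(x, y))           # 2.0
print(ridge_slope(x, y, 10.0))   # 1.0
print(lasso_slope(x, y, 10.0))   # 1.5
print(lasso_slope(x, y, 40.0))   # 0.0
```

In this one-predictor setting, ridge pulls the estimate towards zero without ever reaching it, while a large enough LASSO penalty sets it exactly to zero, which is why LASSO can also act as a variable selection tool.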
Introduces ordinary and weighted least-squares methods for dichotomous outcomes as well as likelihood-based probit and logistic regressions. In addition, linear, quadratic and logistic discrimination are introduced along with the concepts of sensitivity, specificity, ROC curves and confusion matrices.
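The classification summaries named above can be sketched in a few lines. The example below is plain Python for brevity, whereas the subject itself works in R. It assumes hypothetical predicted probabilities from some fitted classifier, with outcomes coded 1 (positive) and 0 (negative), and the function names are made up for this illustration.

```python
def confusion_counts(actual, scores, threshold):
    """Cross-tabulate actual classes against predictions at a given cut-off.

    An observation is predicted positive when its score is >= threshold.
    """
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, s in zip(actual, scores):
        predicted = 1 if s >= threshold else 0
        if a == 1 and predicted == 1:
            counts["tp"] += 1
        elif a == 0 and predicted == 1:
            counts["fp"] += 1
        elif a == 0 and predicted == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def sensitivity(c):
    """Proportion of actual positives that are correctly identified."""
    return c["tp"] / (c["tp"] + c["fn"])


def specificity(c):
    """Proportion of actual negatives that are correctly identified."""
    return c["tn"] / (c["tn"] + c["fp"])


def roc_point(actual, scores, threshold):
    """One point on the ROC curve: (false positive rate, true positive rate)."""
    c = confusion_counts(actual, scores, threshold)
    return 1.0 - specificity(c), sensitivity(c)


# Made-up outcomes and predicted probabilities for illustration only.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.05]

c = confusion_counts(actual, scores, threshold=0.5)
print(c)                # {'tp': 3, 'fp': 1, 'tn': 5, 'fn': 1}
print(sensitivity(c))   # 0.75
for t in (0.25, 0.5, 0.75):
    print(t, roc_point(actual, scores, t))
```

Lowering the threshold raises sensitivity at the cost of specificity; tracing `roc_point` over all thresholds gives the ROC curve.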
Examines traditional models for dealing with count responses, including Poisson and negative binomial regression models. In addition, zero-inflated and zero-truncated models are considered and applied.
Introduces models for contingency table data, including multinomial models and ordered-response models such as proportional odds and proportional hazards regressions. Model structures for testing independence are discussed, and various possible likelihood structures are introduced and explained.
Explores the traditional exponential family-based generalised linear model (GLM) structure. Applications are investigated and residual diagnostics including assessment of over-dispersion are discussed. In addition, quasi-likelihood methods for handling over-dispersion are introduced. Further, influence diagnostics are introduced as is graphical investigation of model fit and structure.
Introduces the concept of splines and smoothing models. Both B-splines and P-splines are surveyed as are local regression techniques. In addition, these spline methods are extended to introduce generalised additive model (GAM) techniques as well as multivariate adaptive regression splines (MARS).
Methods of binary search and recursive partitioning tree models are explored. In addition, modern computer-intensive statistical learning techniques which derive from trees, such as random forests, are introduced, and these ideas are extended to the more general concepts of boosting and bootstrap aggregation (bagging).