Final Project Part 1: Research Proposal and Data Introduction
Due: October 20, 2022 by 11:59PM ET
Goal of the Assessment:
The purpose of this assessment is twofold. First, it gives you a head start on your final
project by having you find an area of interest to study and real-world data to work with.
Second, it asks you to research your area of interest to see what has already been
accomplished surrounding your question and to highlight the importance of your proposal.
The steps involved in completing this assignment encompass the general process of proposing
a research question and will form the basis for a solid introduction section for your final
project report (Part 3). Completing this assignment will also give you the chance to think about
the appropriateness of linear regression as a tool for answering your proposed research
question using your chosen data. Lastly, this assignment provides an opportunity to get some
feedback on your writing and research question that can be used to improve your final report.
Instructions:
1. Decide on one (or a few possible) areas of interest that you may want to explore.
These areas of interest can be anything that matters or is of interest to you. Some
examples could be (but are certainly not limited to) sports, medicine, public health,
economics, video games, literature, etc. Pick something that you really care about.
2. Next, think about possible research questions you may want to study in these areas.
What do you want to know about this area? You want to make sure that your question
can be answered/studied using linear regression models. So, you’ll want to frame your
question to be something related to modelling a relationship or predicting a value
based on this relationship. You’ll also want to consider whether the variable of interest
would allow the assumptions of linear regression to hold (see Module 3 content). See
the workshop slides from September 23 for advice on framing your research question
effectively.
3. After producing a research question, you will need to find some open-source data that
you may use in your data analysis. You want to make sure that the data you find has
both: 1) your response variable of interest (or has variables that could be used to
create that variable), and 2) any other variables you may want to use as predictors. By
looking for data online, you may realize you need to modify your research question
slightly or pick another one if you can’t quite find the data you’re looking for.
Alternatively, you can stick with your research question, but be sure to mention that
you expect there to be many limitations to the dataset because it doesn’t quite meet
your needs. Step 4 can also help you decide what predictors might be needed for you
to answer your question.
Examples of open data sources:
o https://open.toronto.ca for open data from Toronto
o https://data.ontario.ca for open data from Ontario
o https://www150.statcan.gc.ca/n1/en/type/data?MM=1 for data collected by
Statistics Canada
o https://sports-statistics.com/sports-data/ for various sports-related datasets
o https://data.oecd.org for data on various country-level variables
o https://mdl.library.utoronto.ca for links to many other data portals through
the University of Toronto library
4. Once you’ve found your dataset and have decided on your research question (or you
can work on steps 2-4 simultaneously and use what you find in all of them to finalize
your research question), you need to look at what others have studied in relation to
your research question. Do a quick search on the University of Toronto library website
or other databases that feature scholarly articles (see workshop slides from Sept. 23)
to learn about anything related to your area of interest and research question. Look
for academic papers (i.e., peer-reviewed work that has been published in reputable
scholarly journals, not websites, blogs, or news articles, etc.) that studied the same
research question or something related, and that tell you more about what you may
need to consider in your analysis. Also use the academic papers to justify why your
research question is important.
o Focus on giving your reader a rough idea of how many academic papers have
studied your research topic (or closely related concepts to your topic). This
process of looking at the number of academic papers which describe a specific
topic tells your audience how popular the area of research is and how much
research has been done.
o Give examples from a few important papers about what was found or
discovered to be important in relation to your question. This can be important
variables, important results, surprising results, etc. The process of identifying
and describing important papers tells your audience that you are aware of prior
results and that you will be using these to plan your analysis.
o Think about how your research question fits into the general area of research
about your topic. Is your research question different to research questions in
other studies? If so, how? A novel research question consists of something
that nobody has studied before, or studied in the way you are looking at, or in
the population you hope to examine. The process of examining if your research
question is novel tells your audience that you see the importance of what you
are researching and can frame it against what has already been done.
Library resources:
o https://guides.library.utoronto.ca/librarysearchtips/gettingstarted for more
details about searching for articles related to your question
o https://guides.library.utoronto.ca/citing for details about why and how to
cite your references
o https://guides.library.utoronto.ca/c.php?g=251103&p=1673071 for help
getting the correct citation format
5. Lastly, perform a short exploratory data analysis of your chosen dataset. You will want
to focus on identifying anything that you may need to consider moving forward. This
includes identifying:
a. skews,
b. statistical outliers,
c. variables with high spread or observations that don’t make sense,
d. missing data
For section 5, you want to make sure you specifically mention the presence of any of
the characteristics in 5a-d (or lack thereof) and what this means for the analysis you
will eventually perform. For example, this may include describing how any of the
characteristics in 5a-d might cause problems (or not) with the results of linear
regression or generalizability. You will need to present numerical and/or graphical
summaries describing the variables. Choose the options that highlight the features of
the data that you want to point out but will also let your reader clearly understand
the data that you will be working with.
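As a rough illustration of the checks in 5a-d, the sketch below summarizes a single numeric variable: it counts missing values, reports centre and spread, computes a moment-based skewness, and flags points outside the 1.5 x IQR fences. It is written in Python with only the standard library, purely to show the logic. The function name `summarize`, the use of `None` for a missing observation, and the sample values are all assumptions made for this example and are not part of the assignment.

```python
import statistics


def summarize(values):
    """Summarize one numeric variable; None marks a missing observation."""
    observed = [v for v in values if v is not None]
    n_missing = len(values) - len(observed)

    mean = statistics.mean(observed)
    median = statistics.median(observed)
    sd = statistics.stdev(observed)  # sample standard deviation (spread)

    # Moment-based skewness: a positive value suggests a right skew.
    n = len(observed)
    m2 = sum((v - mean) ** 2 for v in observed) / n
    m3 = sum((v - mean) ** 3 for v in observed) / n
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0

    # 1.5 * IQR rule of thumb for flagging potential outliers.
    q1, _, q3 = statistics.quantiles(observed, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in observed if v < low or v > high]

    return {
        "n_missing": n_missing,
        "mean": mean,
        "median": median,
        "sd": sd,
        "skewness": skewness,
        "outliers": outliers,
    }


# Hypothetical values for one variable; None stands in for a missing entry.
sample = [12, 15, 14, 13, None, 16, 15, 90, 14, 13]
summary = summarize(sample)
for name, value in summary.items():
    print(name, value)
```

A mean well above the median, a positive skewness, and a flagged point such as the 90 here are the kinds of features worth mentioning, together with what they could mean for the regression results.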
Guidelines for Picking a Dataset
o Government data portals often contain many datasets about diverse topics – if one
dataset doesn’t have all the variables you might want to consider, feel free to
combine different datasets together
o Just make sure that each unit being measured is the same in both datasets (i.e.
it’s reasonable that both measurements are on the same unit)
o There are many data repositories online – if you find a dataset there that is of interest
to you, you MUST ensure that your question is different than what the dataset was
originally used for.
o YOU MAY NOT use any dataset that is part of any R package or library, or that is
contained in a textbook. If you’re not sure, please ask the instructor or one of the
TAs.
o You will need to make sure you have enough variables to be able to showcase the
statistical methods that you will learn later in the course. This is because your final
project replaces a final exam and so the teaching team needs to assess your
knowledge on all topics covered in the course. Some topics the teaching team will
require include model validation and model refinement so please ensure your
dataset has at least 5 predictor variables.
o You will also need to make sure you have enough observations to be able to validate
your model, which will involve splitting your dataset into two roughly equal parts.
o A good rule of thumb for a minimum number is to have about 10
observations per variable in each half of your dataset (e.g. 6 predictors x 10
observations/variable x 2 halves of the dataset = 120 observations in total)
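The arithmetic behind this rule of thumb, and the roughly equal split it anticipates, can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names, the random seed, and the use of a simple random shuffle to form the two halves are choices made for this example, not requirements of the assignment.

```python
import random


def minimum_observations(n_predictors, per_variable=10, n_halves=2):
    """Rule-of-thumb minimum: predictors x observations per variable x halves."""
    return n_predictors * per_variable * n_halves


def split_in_two(rows, seed=2022):
    """Shuffle the rows reproducibly and return two roughly equal halves."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # the seed value is an arbitrary choice
    midpoint = len(shuffled) // 2
    return shuffled[:midpoint], shuffled[midpoint:]


# The handout's example: 6 predictors x 10 observations x 2 halves.
required = minimum_observations(6)

# Row indices stand in for the observations of a hypothetical dataset.
training_half, validation_half = split_in_two(range(required))
print(required, len(training_half), len(validation_half))
```

With the minimum of 5 predictors required above, the same calculation gives 5 x 10 x 2 = 100 observations.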
Proposal Content Requirements:
Your proposal should be created to satisfy the following requirements:
o The proposal should be organized clearly (consider using headings or sections) and
include the following information:
a. Your research question, why you chose it (i.e., why it’s of interest to you), and
why it may be of interest to others.
b. Summaries of academic papers related to your question or topic, highlighting
similarities/differences to what you propose, and how you will incorporate this
knowledge into your model/project.
c. Details and summaries on your chosen dataset including the variables
collected, the number of observations and anything that stands out in the data
that would need to be addressed/investigated further in your analysis.
d. A discussion of how and why a linear model fits your chosen data and will
allow you to answer your proposed research question, as well as whether,
based on your EDA, you anticipate any problems arising in your analysis.
e. References for where you located the data, and your background research on
your topic
o The proposal should be written/presented for an audience that has some statistics
background but is not necessarily familiar with the area of your research question or
linear regression models,
o The proposal should contain figures and/or tables with proper labels/titles as
appropriate in your Data Description – Exploratory Data Analysis section,
o The proposal should have references listed in proper APA format, and
o The proposal itself should not contain R code.
Technical Requirements:
Your submission to Quercus should include the following:
1. A video that presents your proposed research area and question, the dataset you have
chosen, and the exploration of your dataset.
o The video should be no more than 5 minutes in length
o You must display your U of T Student ID card (or other valid government-issued
photo ID) at the beginning of your video
o The presenter’s face must be visible throughout the video
o The presentation should include an appropriate visual medium (e.g., slides) to
display important information in an easily readable way.
o The video should be hosted on a video-sharing service (e.g., MS Stream and
MyMedia are supported by the university)
2. The proposed dataset you will use in your Final Project, as a csv or xlsx file, or if too
large, as a link to cloud storage where the dataset is saved in csv or xlsx.
3. A copy of the slides/visual aids used in your presentation saved as a PDF document.
4. The R Markdown file containing the code used to produce your exploratory data
analysis and tables/figures.
How to Upload:
o Link to Video Presentation – add as a comment to your submission
o Instructions for uploading to MS Stream: https://learn.microsoft.com/en-us/stream/portal-upload-video
o Instructions for uploading to MyMedia: https://itoengineering.screenstepslive.com/s/ito_fase/a/1291600-how-do-i-upload-a-video-or-audio-file-to-mymedia
o Both require you to log in with your UofT credentials.
o R Markdown File – as a file upload on Quercus
o Slides used in Presentation – as a file upload on Quercus
o Chosen Dataset – either as a file upload OR as a comment to your submission (best
option if the file is large)
Characteristics
Each characteristic is scored at one of four levels: Excellent (3 points), Good Effort (2 points), Needs Improvement (1 point), or Missing or Incomplete (0 points).

Introduction – Clarity of Research Question (i.e., what are you studying)
o Excellent (3 points): The proposed research question is explicitly stated with no ambiguity as to the variables being considered.
o Good Effort (2 points): The proposed research question is stated explicitly but it is somewhat ambiguous which variables are being considered.
o Needs Improvement (1 point): The proposed research question is stated but is very confusing, and it is highly ambiguous which variables are being considered.
o Missing or Incomplete (0 points): No explicit research question provided.

Introduction – Relevance or Importance of Research Question (i.e., why does it matter to you personally and to others)
o Excellent (3 points): The question/topic is original, and the student has conveyed why the question is important to both themselves personally and to a wider audience.
o Good Effort (2 points): The question/topic is reasonably original and/or the student has not explicitly conveyed why the question is important to both themselves personally and to a wider audience.
o Needs Improvement (1 point): The question/topic lacks originality (e.g., a common/well-known dataset/question) and/or the student has not explicitly conveyed why the question is important to both themselves personally and to a wider audience.
o Missing or Incomplete (0 points): No explanation why the question is important to both themselves personally and to a wider audience.

Background and Literature – Keyword Search (i.e., how did you search the literature)
o Excellent (3 points): The student has used keywords appropriate to the scope of their work to search for peer-reviewed publications in their chosen area and has provided information on the number of search hits and the databases searched.
o Good Effort (2 points): The student has searched for peer-reviewed publications in their chosen area, however, either the keywords used were not entirely appropriate for the scope of their work or the number of search hits or databases searched were not reported.
o Needs Improvement (1 point): The student has searched for peer-reviewed publications in their chosen area, however, keywords used were either not reported or not appropriate for the scope of their work and the number of search hits or databases searched were not reported.
o Missing or Incomplete (0 points): The student has not searched for peer-reviewed publications, has not reported keywords used to search literature, and has not reported the number of search hits nor the databases searched.

Background and Literature – Summary of Relevant Literature (i.e., what has been done before)
o Excellent (3 points): The student provided a summary of 3 peer-reviewed papers relevant to their chosen area. The summary of each, brief and in their own words, discussed the research question, the main results/conclusion, and how the study arrived at this conclusion.
o Good Effort (2 points): The student provided a summary of only 2 peer-reviewed papers relevant to their chosen area. The summary of each, brief and in their own words, discussed the research question, the main results/conclusion, and how the study arrived at this conclusion. Or, if 3 peer-reviewed papers were discussed, at most one summary lacked at least one of the above characteristics.
o Needs Improvement (1 point): The student provided a summary of only 1 peer-reviewed paper relevant to their chosen area. The summary of this paper, brief and in their own words, discussed the research question, the main results/conclusion, and how the study arrived at this conclusion. Or, if 2+ peer-reviewed papers were discussed, at most two summaries lacked at least one of the above characteristics.
o Missing or Incomplete (0 points): The student has not provided a summary of any peer-reviewed paper relevant to their chosen area.

Background and Literature – Relevance to Current Work (i.e., how will you use existing knowledge in your project)
o Excellent (3 points): For each paper summarized, the student explicitly states how the paper informed 1) their research question, and 2) their choice of dataset and/or the variables to be considered, while highlighting where their project will be similar and/or different to what was done before.
o Good Effort (2 points): For each paper summarized, the student either explicitly states how the paper informed 1) their research question, or 2) their choice of dataset and/or the variables to be considered (but not both 1 and 2), while explicitly highlighting where their project will be similar and/or different to what was done before.
o Needs Improvement (1 point): On some papers summarized, the student either explicitly states how the paper informed 1) their research question, and/or 2) their choice of dataset and/or the variables considered, or explicitly highlights where their project will be similar and/or different to what was done before, but not all papers are discussed.
o Missing or Incomplete (0 points): There is no explicit commentary on how existing work informed current practice.

Data Description – Variables Chosen (i.e., what does your dataset contain)
o Excellent (3 points): The student explicitly states where the dataset was taken from online, describes what each variable represents and, if not all variables will be used, justifies each.
o Good Effort (2 points): The student lacks only one of the following: explicitly states where the dataset was taken from online, describes what each variable represents, and, if not all variables will be used, justifies each.
o Needs Improvement (1 point): The student lacks two of the following: explicitly stating from where the dataset was taken online, describing each variable present, and, if not all variables will be used, justifying each.
o Missing or Incomplete (0 points): The student lacks all the following: explicitly stating from where the dataset was taken online, describing each variable present, and, if not all variables will be used, justifying each.

Data Description – Exploratory Data Analysis (EDA) (i.e., what does your data look like and are there any problems)
o Excellent (3 points): 1) Univariate numerical/visual summaries are provided for all the chosen variables, and 2) The student correctly highlights important data characteristics of note in the data and speculates on impact to generalizability/ability to make reliable inference/properties of estimators.
o Good Effort (2 points): 1) Univariate numerical/visual summaries are provided for most chosen variables and/or 2) The student correctly highlights the most important data characteristics of note in the data and speculates on impact to generalizability/ability to make reliable inference/properties of estimators.
o Needs Improvement (1 point): 1) Univariate numerical/visual summaries are provided for few chosen variables and/or 2) The student does not correctly highlight important characteristics of note in the data and speculates on impact to generalizability/ability to make reliable inference/properties of estimators.
o Missing or Incomplete (0 points): No univariate numerical/visual summaries are provided for the chosen variables and the student does not provide commentary on characteristics of note in the data.

Discussion – Plan for Answering Research Question (i.e., how will a regression model answer your question)
o Excellent (3 points): The student has explicitly stated both the role/function each variable chosen from the dataset will play in the model and how a linear regression model is an appropriate tool to answer the research question.
o Good Effort (2 points): Either the student has explicitly stated the role/function each variable chosen from the dataset will play in the model but how a linear regression model is an appropriate tool to answer the research question is not made explicit, or vice versa.
o Needs Improvement (1 point): The student has both stated the role/function each variable chosen from the dataset will play in the model and how a linear regression model is an appropriate tool to answer the research question, but either both are not explicit and clear, or are incorrect.
o Missing or Incomplete (0 points): The student has not provided any information about the roles of variables or the appropriateness of linear regression.

Discussion – Assumptions of Linear Models (i.e., why would a linear model make sense (statistically) to answer your question)
o Excellent (3 points): The student has correctly discussed whether the EDA informally supports the assumptions of linear regression, with each assumption being addressed individually.
o Good Effort (2 points): The student has either not addressed each assumption individually or has incorrectly determined that the EDA informally supports the assumptions of linear regression.
o Needs Improvement (1 point): The student has both not addressed each assumption individually and has incorrectly determined that the EDA informally supports the assumptions of linear regression.
o Missing or Incomplete (0 points): The student has not included any discussion at all regarding the assumptions of linear regression or has not used the EDA to informally assess the assumptions.

Presentation Quality – Visual Aids (i.e., was the information presented well visually)
o Excellent (3 points): All the below requirements were met.
o Good Effort (2 points): At most two of the below requirements were not met.
o Needs Improvement (1 point): At most four of the below requirements were not met.
o Missing or Incomplete (0 points): Either no visuals were used, or more than four of the below requirements were not met.
Requirements:
1. Slides/visual aids were neither overly cluttered nor overly sparse.
2. Tables and figures were used effectively to display information.
3. Tables and figures were appropriately labelled.
4. No R code or R output was included.
5. Grammar and spelling errors were minimized.
6. References were provided at the end and were cited appropriately during the presentation.
7. Information on the slides was easily digestible during the time on screen.

Presentation Quality – Video/Submission Requirements (i.e., was the information provided appropriately)
o Excellent (3 points): All the below requirements were met.
o Good Effort (2 points): At most two of the below requirements were not met.
o Needs Improvement (1 point): At most four of the below requirements were not met.
o Missing or Incomplete (0 points): More than four of the below requirements were not met.
Requirements:
1. Video length was no more than 5 minutes.
2. Student displayed their student ID card (or other valid ID).
3. Student was visible throughout the presentation.
4. Student tried to speak to the correct audience.
5. Student spoke at a reasonable pace and was easy to understand.
6. Student did not simply read their slides/visual aids.
7. All components as required in instructions were submitted to Quercus in the correct format.