
Read123456789: reading research papers

(For a video lecture on this page, see here.)

As technologies change, technologists need to continually update their technical knowledge. The problem is that keeping up with the latest research is hard: working through complex technical papers is itself a complex, technical task. For example, if you ask new graduate students to read ten papers in a particular sub-field:

- It can take a full day to read the first paper.
- But after reading ten papers, they can do it much faster.

Since reading is so important, the rest of this page offers:

- Notes on [how to read papers, faster](#how-to-read-papers-faster).
- A [reading assignment](#exercises-in-reading-faster) that lets students practice their skills.

The assignment can be used in two ways:

- For a fast-path assignment for newcomers, lecturers could assign parts one and two over a 2-3 week period.
- For a longer, more advanced assignment, lecturers could assign the following over one semester (one paper per week, then a month of open time until the end-of-semester essay in part four is due):
  - part one;
  - then part three (note: not part two, yet);
  - then part two;
  - then part four.

How to Read Papers, Faster

There are four keys to reading papers faster:

- Rhetorical strategies: understanding the rhetorical strategies used by the authors.
- Terminology: having a working knowledge of the half-dozen key terms in a paper.
- Context: experts can read papers faster when they know of other work in the field and can place the new paper in the context of that work.
- Feature extraction: experts at anything are experts because they know what to look for, and what can be skipped. This is true for many tasks, including reading:
  - Experts do not read entire papers, word for word.
  - Rather, they hunt and peek, looking for certain key features (which we number below as 1 to 19).

Feature extraction, details

To put that last point another way, papers are not read for _repeatability_ (of the whole paper) but for _reusability_ of their parts. Technical papers are really a presentation of many connected technical concepts, some of which the reader will extract and apply to their own work. So we should not read papers in order to repaint them as beautiful, complete works of art. Rather, we should treat them like the design of some complex product, which can be exploded into its various parts, any of which might be repurposed in other areas.

In other words, we should not read papers; we should survey them, to:

- map out their internal structure;
- find and extract whatever parts might be useful.

Of course, once we find the (little) bits that we really want to use, we might spend hours or days struggling to understand those parts. But otherwise, we need to read over papers, not through them.

Here is a list of what we might find within a paper:

| # | Item | Description |
|---|------|-------------|
| 1 | Motivational statements | Reports, challenge statements, or lists of open issues that prompt an analysis. |
| 2 | Hypotheses | Expected effects in some area. |
| 3 | Checklists | Used to design the analysis (see also _The Checklist Manifesto_). |
| 4 | Related work | Comprehensive, annotated, and insightful (e.g. showing how a field developed, or its open areas). |
| 5 | Study instruments | E.g. surveys, interview scripts, etc. |
| 6 | Statistical tests | Mathematical tools to analyze results (along with some notes explaining why or when each test is necessary). |
| 7 | Commentary | About the scripts used in the analysis. |
| 8 | Informative visualizations | E.g. sparklines: http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msgid=0001OR |
| 9 | Baseline results | Results against which new work can be compared. |
| 10 | Sampling procedures | E.g. "how did you choose the projects you studied?" |
| 11 | Patterns | Describing best practices for performing this kind of analysis. |
| | Anti-patterns | Describing cautionary tales of "gotchas" to avoid when doing this kind of work. |
| 12 | Negative results | Anti-patterns, backed up by empirical results. |
| 13 | Tutorial materials | Guides to help newcomers become proficient in the area. Some of these tutorial materials may be generated by the researcher, and others may be collected from other sources. |
| 14 | New results | Guidance on how best to handle future problems. |
| 15 | Future work | Based on the results, speculation about open issues or future issues that might become the motivation for the next round of research. |
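Item 8 mentions sparklines, Tufte's word-sized graphics that sit inline with the text. As a rough illustration only, here is a minimal Python sketch of a common text-mode approximation, where each value is scaled onto one of eight Unicode block characters. The function name `sparkline` and the block-character rendering are our own assumptions. This is not the Python Sparklines generator mentioned under item 17, nor Tufte's own tooling.

```python
# Minimal text-mode sparkline sketch (illustrative only; an assumed
# approximation, not Tufte's tooling or any published generator).
BLOCKS = "▁▂▃▄▅▆▇█"  # eight levels, lowest to highest


def sparkline(values):
    """Return a one-line string with one block character per value."""
    if not values:
        return ""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # All values are equal: draw a flat line at the lowest level.
        return BLOCKS[0] * len(values)
    top = len(BLOCKS) - 1
    # Scale each value into 0..top and round to the nearest level.
    return "".join(BLOCKS[round((v - lo) / span * top)] for v in values)


if __name__ == "__main__":
    print(sparkline([1, 5, 22, 13, 53, 1, 7]))
```

For example, `sparkline([1, 5, 22, 13, 53, 1, 7])` gives `▁▂▄▃█▁▂`. Scaling to the data's own minimum and maximum keeps the line readable but hides absolute magnitudes, so it helps to state the range next to the sparkline.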

Here are items that are usually too large to include in a paper, but which a paper might list as an external resource:

| # | Item | Description |
|---|------|-------------|
| 16 | Data | Used in an analysis: either raw data from a project, or some derived product. |
| 17 | Scripts | Used to perform the analysis (the main analysis, or the subsequent statistical tests or visualizations; e.g. the Python Sparklines generator or code for a fast a12 test). Scripts can also implement some of the patterns identified by the paper. |
| 18 | Sample models | Models that can generate exemplar data, or that offer an executable form of current hypotheses. Alternatively, these models could be a set of standard problems everyone shares (e.g. the verification community and the optimization community have libraries of standard models, or models ported from commercial apps, that they all use to baseline their tools). |
| 19 | Delivery tools | Things that let other people automatically rerun the analysis, e.g. config management files that can build the system/paper from raw material and/or update the relevant files using some package manager; or virtual machines containing all the above scripts, data, etc., pre-configured so that a newcomer can automatically rerun the old analysis. |
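Item 17 mentions code for an a12 test. The Vargha-Delaney A12 statistic is a non-parametric effect size. It estimates the probability that a value drawn from one sample is larger than a value drawn from another, with ties counting as half. As a sketch only, here is a direct Python version that compares every pair. It is not the fast implementation the text refers to. The function name `a12` and the sample numbers are made up for illustration.

```python
# Vargha-Delaney A12 effect size: P(X > Y) + 0.5 * P(X == Y),
# estimated by comparing every x in xs against every y in ys.
# A straightforward O(len(xs) * len(ys)) sketch, not an optimized version.


def a12(xs, ys):
    """Probability that a random x from xs beats a random y from ys (ties count half)."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    greater = ties = 0
    for x in xs:
        for y in ys:
            if x > y:
                greater += 1
            elif x == y:
                ties += 1
    return (greater + 0.5 * ties) / (len(xs) * len(ys))


if __name__ == "__main__":
    old = [0.62, 0.70, 0.71, 0.74]  # made-up scores, for illustration only
    new = [0.70, 0.78, 0.80, 0.81]
    print(a12(new, old))
```

A value near 0.5 suggests little difference between the two samples. Values near 1.0 (or 0.0) mean one sample almost always beats the other. The nested loop costs one comparison per pair. A "fast" version presumably avoids this, for example by sorting the samples first, but that optimization is not shown here.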

Rhetorical Strategies: details

Parts of a Paper

The following notes on "parts of a paper" are taken from the excellent notes by Tim Sheard and Todd Leen.

When reading a paper, take care to note:

- Items 1 to 19, listed above.
- Comments on:
  - The thesis being investigated
  - The contribution
  - The method of investigation
  - The “power” of the results
  - The applicability of the results
  - Summary of the technical development
  - Details of any examples

So, on a first pass over a paper, skim it to find:

- The abstract (to determine relevance and the kind of paper);
- Pictures, tables, graphs, and concept diagrams (just to get the big picture);
- Any of the items 1 to 19 listed above;
- References (do you recognize them?).

Swales' Three-Move Model

The following notes on "Swales' Three-Move Model" are taken from the excellent notes by James Luberda.

The following is based upon an empirically-derived model of how “real-world” research article introductions commonly proceed:

Note that it is not a set of rules, but rather something of a guide as to what readers of research articles and academic essays are likely to expect (and find), a set of patterns in introductions that facilitate their reading and comprehension.
You might think of each “move” below as a kind of verbal action—a “move” a writer will make to have a particular effect on the reader.

Move 1: Establishing a territory

In this opening move, the writer may do one or more of the following to broadly sketch out where the subject of the essay falls (the “big picture”):
- Point out the importance of the general subject
- Make generalizations about the subject
- Review items of previous research

Move 2: Establishing a niche

In this move, the writer then indicates to the reader the particular area of the broader subject that the essay will deal with. This can be done using one or more of the following:
- Make a counter-claim, i.e. assert something contrary to expectations
- Indicate a gap in the existing research/thinking
- Raise a question about existing research/thinking
- Suggest the essay is continuing a tradition, i.e. it is following in the footsteps of previous research/thinking

Move 3: Occupying the niche

In this move, the writer then sketches out exactly what this particular essay will accomplish in relation to Move 2, and gives the reader a sense of how the essay will proceed. In general, each of the steps below will appear in this move, in order:
- Step 1: Outline the purpose of the essay, or state the research that was pursued
- Step 2: State the principal findings of the essay—what the reader can expect the essay/research will have accomplished for them by the time they get to the end
- Step 3: Indicate, roughly, the structure of the essay—what will appear in it and in what order

Exercises In Reading Faster

Note that, at first, it will take hours to read one paper. However, after a couple of papers, your reading will speed up dramatically. So do not be discouraged if, at first, this is ridiculously slow.

Part1: Learn Historical Context

In the following, anything shown in italics is explained below.

One: Find a highly cited paper from the automated software engineering literature.
- Find some source of highly cited papers:
  - Do not review any paper from your own institution (so, fear not, you don't have to review the lecturer's paper).
  - For students of general software engineering, start with the International Conference on Software Engineering.
  - For students of automated software engineering, start with the International Conference on Automated Software Engineering.
- Pick any 2011 paper and _summarize some of its parts_.

Two, Three, Four, Five: Explore context, backwards.
- Find four papers in One's reference list that:
  - date from 2008 to 2010;
  - are the _most highly cited_ (note that recent papers have fewer citations than older papers).
- Walk them backwards in time, summarizing some of their parts.

Notes:

By _summarize some of its parts_ we mean writing 500 to 1000 words of text, as follows:
1. Start with a clear reference to the paper, e.g. Tim Menzies, Burak Turhan, Ayse Bener, Gregory Gay, Bojan Cukic, and Yue Jiang. 2008. Implications of ceiling effects in defect predictors. In Proceedings of the 4th international workshop on Predictor models in software engineering (PROMISE '08).
2. Write down the four most important keywords in the paper, plus a two-line definition of each.
   - Label them ii1, ii2, ii3, ii4.
3. Offer very brief notes on any four of the items listed as 1 to 19 (above).
   - Label them iii1, iii2, iii3, iii4.
4. Write down three ways the paper could be improved.
   - Label them iv1, iv2, iv3.
5. For Two, Three, Four, etc., also comment on the connection to the other papers.

- Do you know how long 1000 words is? About as long as this page. So you want to write something half this size.
- Your goal is to be able to write such a summary in thirty minutes.
  - It is unlikely you will reach this goal until you have read numerous papers.
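The summary format above asks for labelled parts (ii1 to ii4, iii1 to iii4, iv1 to iv3) and a length of 500 to 1000 words. If you draft summaries as plain text, a small self-check can catch a missing label or an overlong draft before you hand it in. Here is a minimal Python sketch. It assumes that each label appears as a stand-alone token such as `ii1:`; that is our own convention, not a course requirement. The name `check_summary` is hypothetical.

```python
import re

# Labels the summary format asks for: ii1-ii4 (keywords),
# iii1-iii4 (notes on items 1 to 19), iv1-iv3 (improvements).
REQUIRED = (
    [f"ii{i}" for i in range(1, 5)]
    + [f"iii{i}" for i in range(1, 5)]
    + [f"iv{i}" for i in range(1, 4)]
)


def check_summary(text, min_words=500, max_words=1000):
    """Return a list of problems found in a draft summary (empty list = OK)."""
    problems = []
    words = len(text.split())
    if not min_words <= words <= max_words:
        problems.append(f"{words} words; expected {min_words} to {max_words}")
    for label in REQUIRED:
        # Word boundaries stop "ii1" from matching inside "iii1".
        if not re.search(rf"\b{label}\b", text):
            problems.append(f"missing label {label}")
    return problems


if __name__ == "__main__":
    draft = "ii1: ceiling effects ... iv3: try more data sets"  # hypothetical draft
    for problem in check_summary(draft):
        print(problem)
```

The check only looks at form. Whether the keywords, notes, and improvements are any good is still up to you.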
To find the most highly cited papers, look up the items in the reference list of your week-1 paper (paper One) in scholar.google.com (or dl.acm.org or ieeexplore.ieee.org) and note their citation counts. For example, looking up "Mining metrics to predict component failures" in scholar.google.com returns an entry for that paper. At the bottom of that entry you can see "Cited by 527". If you click there, you will find the many other papers published since then that cite it.

Google Scholar tends to list these citing papers with the most cited near the top. So, to find the most highly cited papers that cite your week-1 paper, look up your week-1 paper in scholar.google.com (or dl.acm.org or ieeexplore.ieee.org), open its list of citing papers, and compare their citation counts.
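The selection rule in Part1 for papers Two to Five (take papers from One's reference list published 2008 to 2010, keep the most cited, then walk them backwards in time) is a small filter-and-sort. Here is a minimal Python sketch of it. The `Paper` record, the `pick` function, and the sample entries are hypothetical. You would type in the years and citation counts by hand from Google Scholar (or the ACM/IEEE digital libraries), because this sketch does not query any of those services. It ranks by raw counts, so keep in mind the note above that recent papers have fewer citations. The same function, called with `first=2012, last=2015, k=3, newest_first=False`, mirrors the forward walk for papers Seven to Nine in Part3.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    # Hypothetical record: values typed in by hand from a citation search.
    title: str
    year: int
    cites: int


def pick(papers, first, last, k, newest_first=True):
    """Keep papers published in [first, last], take the k most cited,
    then order them by year for reading (newest first = walk backwards)."""
    in_window = [p for p in papers if first <= p.year <= last]
    top = sorted(in_window, key=lambda p: p.cites, reverse=True)[:k]
    return sorted(top, key=lambda p: p.year, reverse=newest_first)


if __name__ == "__main__":
    refs = [  # made-up entries, for illustration only
        Paper("A", 2007, 900),
        Paper("B", 2008, 120),
        Paper("C", 2009, 300),
        Paper("D", 2010, 45),
        Paper("E", 2010, 80),
        Paper("F", 2009, 10),
    ]
    for p in pick(refs, first=2008, last=2010, k=4):
        print(p.year, p.cites, p.title)
```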

Part2: Identify reusable data

Six: For any paper in the above sequence, report any reusable data.

Notes:

To report any reusable data, try to fill in the form here. Hand in either:
- a page showing what you entered in those fields; or
- an explanation of why your kind of paper does not generate data of the kind that can be entered here.

Part3: Track advances.

Seven, Eight, Nine: Explore context, forwards.
- Find three papers that cite paper One and that:
  - date from 2012 to 2015;
  - are highly cited (note that recent papers have fewer citations than older papers).
- Walk them forwards in time, summarizing some of their parts.

Part4 (one big essay)

Take all the above and summarize the progression of research on some automated software engineering issue from 2008 to 2015.

- 10 pages, 2 columns, using the Word or LaTeX formats shown on this page.
- Include at least 20 references, eight of which you studied above, while the others are related work (or, indeed, far-flung work that you think should be connected to your eight but that, so far, no one has connected).
- Mention as many as possible of the items listed 1 to 19, above.

Note that, for this essay, the keyword definitions you generated above will become the core of your related work section.

For full marks:

- Throughout your text, comment on how eight of these nine papers improved on (or failed to improve on, ignored, extended, or refined) the issues raised in an earlier paper.
- End with your own recommendations for the path forward. Mention the issues that are now retired, that no one has retired, that someone should retire, or that no one should even try to retire.

Note: if the papers you studied above proved to be dull, feel free to start again with some other 2011 paper from here. Note that, by the time you get to Part4, it will take you less than a day to work through eight papers (it may even take just one afternoon).
