Origins of Assessment – Part 3: The Myth of the Metals

The Context of Schooling: 1905-1950

As compulsory school attendance was more rigorously enforced towards the end of the nineteenth century, the student population in schools became much more socially and culturally diverse. Educators and school boards continued to use written tests to evaluate pupils and schools, but while testing was seen as useful for identifying which students were succeeding or failing, many acknowledged its shortcomings in explaining why different students performed well or poorly. Educators and policy-makers acknowledged that characteristics such as intelligence, ambition, socio-economic status, race, and ethnicity influenced students’ academic performance; however, they lacked the statistical knowledge to weigh the relative impacts of each of these variables.

Pursuing Ryerson’s vision for common curricular standards while serving such a diverse student population would prove to be challenging. Whereas teachers previously turned a blind eye when ‘difficult’ or ‘unruly’ schoolchildren skipped class, new measures to enforce attendance, such as the hiring of truancy officers, meant that students would be required to remain in the school building for the duration of the school day. Consequently, students who “failed to keep step with [their] fellows, or who, because of physical or moral defect seriously interfered with the regular work of a class” were segregated into special classrooms (Tropea, 1987, p. 31). The segregation of “difficult” students was seen to increase bureaucratic efficiency because these students were not included in the calculation of a school’s overall examination or promotion scores (Tropea, 1987). So began the practice of streaming.

At the same time, the turn of the century marked the beginning of progressive educational reform. Progressive reformers advocated for individualized, child-centered instruction that would promote the “blossoming” of each student, rather than simply filling empty heads with prescribed knowledge. Proponents of progressive reform believed that “teachers must think of the school more definitely as a place for instructing pupils and improving their abilities and to consider the school no longer as a place for holding pupils until their abilities reach certain arbitrary standards” (Tropea, 1987, p. 42). In other words, in order to preserve efficiency within the schools, teachers would have to lower their standards for these ‘special populations’ and think of other interests besides academics as being just as “noble” and “worthy” as the academic life (Tropea, 1987). Teachers soon became well versed in the practice of “passing on” difficult or low-achieving pupils to avoid having to deal with them again the following year.

Central to the idea of universal, free, and compulsory schooling was the notion that educational opportunities (and, in turn, positions of power) should no longer be restricted to wealthy élites. Universal common schooling represented a shift from an aristocratic society (ruled by the wealthy) to a meritocratic society (ruled by the intelligent). In identifying intelligent individuals, the dominant perception at the time was that intelligence was unified, heritable, and fixed (Gould, 1996). Early attempts to “scientifically” measure intelligence gave rise to a number of pseudo-scientific disciplines, such as phrenology, craniometry, and eugenics. These fields sought to compare (and select for) the intelligence of individuals and races based on the size and shape of the brain or skull. However, at the turn of the century, several lines of research from across Europe and the United States converged to form a new discipline – psychology – and the mental testing movement soon followed. As Fass (1980) states, “it was only as the French concern with personality and abnormality and the English preoccupation with individual and group differences, as measured in aggregates and norms, were superimposed on the older German emphasis on laboratory testing of specific functions that mental testing as an American science was born” (p. 433).

Intelligence Testing

In 1904, the French minister of public education commissioned Alfred Binet to devise a technique for identifying low-achieving students who would benefit from special education (Gould, 1996). In response, Binet designed a test consisting of a “hodgepodge” of short, observable tasks and activities with varying degrees of difficulty (Gould, 1996). For Binet, “it mattered very little what the tests were so long as they were numerous” (Binet, 1911, in Gould, 1996, p. 179). Additionally, Binet sought to separate “natural intelligence” from “the trammels of the school,” so tasks involving reading, writing, or other “learned skills” were excluded (Gould, 1996). The output of this test was a single reported number that fell along a linear scale, which was used as a proxy for a child’s “general potential.” This number, later termed “IQ,” was intended only as a rough empirical guide for the specific practical purpose of identifying schoolchildren who needed help. Binet made it very clear that his test “does not permit the measure of intelligence because intellectual qualities are not superposable and therefore cannot be measured as linear surfaces are measured” (Binet, 1905, in Gould, 1996, p. 181). After publishing his scale in 1905, Binet was concerned that over-zealous teachers would abuse his test for the purposes of classifying, labeling, and ranking students, seeing it as “an excellent opportunity for getting rid of all the children who trouble us…who are unruly or disinterested in the school” (Binet, 1905, in Gould, 1996, p. 181).

The United States army testing of World War I represents the first time in history that intelligence tests were administered on a massive scale, with soldiers being assigned to different jobs based on their IQ scores. A byproduct of the army-testing program was the generation of large data sets on human intelligence, which could be mined by intelligence theorists and used to establish norms (US Congress, 1992). Following World War I, schools were quick to adopt the practice of mass IQ testing “to classify children according to their innate abilities, and in so doing, protect the slow witted from the embarrassments of failure while allowing the gifted to rise to their rightful levels of achievement” (US Congress, 1992, p. 121). The establishment of educational norms transformed education into a remedial activity – i.e. the main purpose of education was to make ‘deficient’ individuals as normal as possible (also known as the “deficit model” of learning) (Church, 1971).

Standard achievement tests, used for the purposes of selecting individuals based on intelligence, were also on the rise, and by 1932 it was estimated that there were 1,300 such tests on the market (US Congress, 1992). Of special note was the Scholastic Aptitude Test (SAT), developed by Carl Brigham and first administered in 1926, which came to serve as a tool for skimming the ‘crème de la crème’ off a distribution of applicants to be accepted to college (Lemann, 2000). Nicholas Lemann (2000) likened this practice to the fable of Cinderella, where “the educational system would fit glass slippers on the feet of a lucky few who would be whisked away to college and trained to lead American society” (p. 25). However, in a project called the “Eight Year Study,” conducted by Ralph Tyler from 1932-1940, SAT scores were shown to have very little predictive power in determining how well students actually performed in college (US Congress, 1992).

Many of the weaknesses of intelligence testing and aptitude testing stemmed from the fallacy that intelligence was a biologically determined, fixed trait. The implications of the fixed-ability mindset included phenomena such as fear of failure and learned helplessness in students. After all, as Matthew Syed (2010) states, “Why spend time and energy seeking to improve if success is available only to people with the right genes?” (p. 123).

In contemporary discussions surrounding assessment and evaluation, the corrosiveness of the fixed-ability mindset remains a prevailing theme. For example, giving praise to students based on their intelligence (e.g. “You’re so smart!”) rather than their efforts (e.g. “You must have worked really hard!”) has been shown to be detrimental to students, harming their motivation and teaching them to pursue easy challenges at the expense of valuable learning opportunities (i.e. ‘the least amount of work with the least amount of effort’) (Dweck, 1998, in Syed, 2010). Similarly, the self-esteem movement of the 1970s combined excessive praise of student intelligence with a lowering of academic standards (Syed, 2010). However, these efforts to make students feel good about their ‘innate’ abilities backfired, producing “poorly educated students who felt entitled to easy work and lavish praise” (Dweck, 2007, in Syed, 2010).

Continue to Part 4 →

