In the past dozen years, little progress has been made toward judging the success of new technologies. Why is this so? How do we explain the enormous commitment of dollars to technologies when so little evidence is available to determine the benefits that might emerge from the investment? How can we make wise technology choices when there are so few data upon which to build insight?
Test your own district's commitment to careful planning and program evaluation by completing the District Technology Self-Assessment Form. Ask colleagues to share their perceptions. Which issues need the most attention?
I. The Importance of Program Evaluation
We need to know more about the effects of new technologies upon the learning of students in order to support the thoughtful selection of technologies, the design and implementation of new programs, and the marketing of new technologies to various school constituencies.
A. Selection of Technologies
Given a limited hardware budget, what kind of investment will pay the greatest learning dividends? To answer that question, the district must first clarify its own objectives. What are the desired student outcomes? Is the district looking to equip students with job-related technology skills such as database research, writing, graphics and group problem-solving? Or is the improvement of reading scores a higher priority? Once the district has set its priorities, it should be able to turn to research studies that show the relative merits of various platforms and systems compared with other, non-technological educational delivery systems. Unfortunately, because few studies provide such comparative data, many districts go technology shopping with little to rely on but the data vendors supply.
B. Program Design and Implementation
What you don't know may well hurt you. Informed experimentation requires the frequent and skillful collection of data so that the newly implemented program can be modified and adapted. District technology planning committees are responsible for identifying research questions worth asking, commissioning an evaluation design, reviewing the significance of findings frequently, and suggesting program changes as the data warrant.
It makes great sense to learn from the experience and mistakes of others instead of reinventing the wheel, yet the literature on technology programs is often "testimonial" in character: a district pioneer writes an article describing the benefits of a particular innovation. Such articles tend to minimize the difficulties and exaggerate the benefits of the innovation. They also rarely include reliable data that might give program designers in other districts clues about which elements to adopt and which to avoid.
Formative evaluation, the collection of data while a program is under way in order to learn enough to guide its adaptation, is of prime importance. The implementation team keeps asking important questions and collecting relevant data, some of it quantitative (numerical) and some qualitative (descriptive). The goal is reflective practice, continually asking, "What's happening? How might we change what we are doing to improve results?" Programs must be implemented with an experimental spirit and style, on the understanding that the assumptions and ideas that guided the original design will deserve reconsideration as implementation proceeds.
Data collection is often viewed with suspicion by staff members who fear that the data might be used to evaluate their own performance. The mere hint of accountability raises eyebrows and defensiveness in many districts. To avoid such a reaction, staff members must first receive appropriate training in formative evaluation, so that they can see the benefits of data collection for program adjustment and development, and must then have a strong voice in the design of the study and the collection of the data. Teachers, if they are to act as technology pioneers, must become researchers, applying new technologies to learning challenges with an eye toward testing hypotheses and developing successful strategies.
Even good programs, like bushes and trees, can benefit from careful and timely pruning. Guided by the research data, the program team keeps asking what changes need to be made in the original plan. Which elements should be eliminated, cut back or modified?
Summative evaluation - the collection of data to judge the overall success of a program - is also important to help district decision-makers determine the return on investment. When all is said and done, how much bang did the district buy with its buck? When boards of education pay for elaborate systems and nobody can tell them the "return on investment" three years later, they often become suspicious, doubtful and uncooperative about new ventures.
C. Marketing New Technologies
Because many school district constituencies such as senior citizens and parents did not use these new technologies during their own schooling, the support for such programs is often quite soft, especially in recessionary times when budgets are tightened. Because anything viewed as innovative is vulnerable to the budget ax, school districts would be wise to involve all constituencies in seeing and experiencing the benefits of these new technologies. As outlined in the December, 1991 issue of From Now On, marketing involves getting to know the needs and interests of all the groups and then opening school technology programs to community participation in order to engender feelings of ownership and support.
Evaluation reports can be an important element in these marketing campaigns. How has student writing changed because of the writing labs? What proof can you provide that student performance has changed dramatically for the good? What does the new technology offer which the old technologies could not match?
Because few school districts invest in such data collection for new technologies and their associated programs, they rob themselves of several important opportunities. As mentioned, the data play an important role in building a successful program, but they can also help win community support, attract grants and maintain credibility. Without data showing student outcomes, new technologies can too easily be dismissed as frills.
Unfortunately, in too many cases, the hardware and the technology are seen as the program itself. First we buy the equipment, and then we ask how we might use it. Evaluation is not even an afterthought.
II. The Sad State of Research on Technology Programs
As the chart and the table reproduced below demonstrate, the number of articles in the ERIC collection that treat the evaluation of K-12 educational technology has never broken through a glass ceiling of 40 in any year. Many of those articles do not report actual findings but discuss the issue of evaluation in general terms.
Number of ERIC Articles on the Evaluation of K-12 Educational Technology, by Year, 1980-92

Year   Articles
1980    8
1981   16
1982   10
1983   29
1984   25
1985   22
1986   32
1987   27
1988   27
1989   27
1990   29
1991   36
1992    7
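To check the "glass ceiling" claim against the table, the annual counts can be tallied in a few lines of Python. This is an illustrative sketch only: the dictionary `ERIC_COUNTS` and the helper `summarize` are our own names, and the counts are simply transcribed from the table as printed.

```python
# Annual counts of ERIC articles on evaluating K-12 educational technology,
# transcribed from the table above (illustrative names, not an ERIC tool).
ERIC_COUNTS = {
    1980: 8, 1981: 16, 1982: 10, 1983: 29, 1984: 25, 1985: 22, 1986: 32,
    1987: 27, 1988: 27, 1989: 27, 1990: 29, 1991: 36, 1992: 7,
}


def summarize(counts, ceiling=40):
    """Return the peak year and count, the total, and whether any year reached the ceiling."""
    peak_year = max(counts, key=counts.get)
    return {
        "peak_year": peak_year,
        "peak_count": counts[peak_year],
        "total": sum(counts.values()),
        "ceiling_reached": any(n >= ceiling for n in counts.values()),
    }


print(summarize(ERIC_COUNTS))
```

With these figures the peak is 36 articles in 1991, no year reaches 40, and the total across all thirteen years is 295.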
According to a review by Becker, many of the published program evaluations turn out on close examination to be seriously flawed. Becker's account is worth quoting at some length:
In order to prepare a best-evidence synthesis on the effects of computer-based instructional programs on children's learning, we began a search for empirical research about the effects of computer-based approaches on basic curricular categories of learning (math, language arts, writing, science, etc.) in grades 1 through 12. We limited the search to reports produced since 1984 and including achievement measures as outcomes.
The 51 reports that were obtained included 11 dissertations, 13 reports of school district evaluations, 15 published articles, and 12 unpublished papers.
Of the 51 studies, 11 were eliminated because they had no comparison group, and measures of effects were limited to gains on standardized achievement tests over varying periods of time.
Of the remaining 40 studies, eight were excluded from consideration because they did not employ pre-test controls and neither classes nor students were randomly assigned. Lacking both pre-tests and random assignment, it was impossible to equate the computer-using groups and the traditional instruction groups.
Seven more studies were removed because the treatment period was shorter than eight weeks and, finally, we chose not to consider eight studies involving fewer than 40 children, or where each treatment involved only a single class of students and the experimental and control classes were taught by different teachers.
What about the major element of experimental design -- random assignment? Of the remaining studies, only one randomly assigned pupils to classes, which were in turn randomly assigned to computer-assisted or traditional treatments.1
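Becker's account does not state the final number of studies that survived every screen, but it follows from the figures he reports. The sketch below replays the exclusions in order; the list `SCREENING_STEPS` and the function `apply_screens` are hypothetical names chosen for illustration.

```python
# Exclusions in the order Becker describes them: (reason, number of reports removed).
SCREENING_STEPS = [
    ("no comparison group", 11),
    ("no pre-test controls and no random assignment", 8),
    ("treatment period shorter than eight weeks", 7),
    ("fewer than 40 children, or one class per treatment with different teachers", 8),
]


def apply_screens(start, steps):
    """Subtract each exclusion in turn and record how many reports remain after it."""
    remaining = start
    trail = []
    for reason, removed in steps:
        remaining -= removed
        trail.append((reason, remaining))
    return trail


for reason, left in apply_screens(51, SCREENING_STEPS):
    print(f"after excluding '{reason}': {left} remain")
```

The running totals are 40, 32, 25 and 17, so the single fully randomized study was one of only 17 that met the other criteria, out of 51 reports located.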
Curious whether Becker had updated this 1988 study, I conducted an ERIC search for his more recent articles and found four, all focused on patterns of computer use rather than on evaluation.2 One of the most important questions that Becker and others posed for further research back in 1988 was the relative effectiveness of computer-assisted instruction (CAI) compared with other expenditures, such as tutoring or staff development aimed at improving teachers' reading or math instruction. This question remains poorly addressed. In a 1991 update of their research, Kulik and Kulik acknowledged this hole in the research:
Finally, this meta-analysis produced no evidence on what is certainly one of the most important questions of all about CBI: Is it cost effective? An early analysis by Levin, Destner, & Meister (1986) had suggested that the costs of CBI were too great given its record of effectiveness. Levin et. al. suggested that nontechnological innovations, such as tutoring, produced results that were just as good at a lower cost. Later reanalyses, such as those by Blackwell, Niemiec and Walberg (1986), have suggested that computer-based instruction is not only a cost effective alternative to traditional instruction but that it is far more cost effective than such non-technological innovations as tutoring.3
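The cost-effectiveness question Kulik and Kulik raise is commonly framed as a ratio of effect to cost, for example the achievement gain (in standard-deviation units) bought for each $100 spent per student. The sketch below shows only that arithmetic. The `Option` class and every number in it are hypothetical placeholders, not figures from Levin and colleagues, from Blackwell, Niemiec and Walberg, or from any other study.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    cost_per_student: float  # annual cost in dollars (hypothetical)
    effect_size: float       # achievement gain in standard-deviation units (hypothetical)

    def effect_per_100_dollars(self) -> float:
        """Cost-effectiveness ratio: effect size bought for each $100 per student."""
        return self.effect_size / self.cost_per_student * 100


# Hypothetical placeholder values for illustration only; NOT results from any study.
options = [
    Option("computer-based instruction", cost_per_student=200.0, effect_size=0.30),
    Option("tutoring", cost_per_student=250.0, effect_size=0.40),
]

# Rank the options from most to least effect per dollar.
for opt in sorted(options, key=Option.effect_per_100_dollars, reverse=True):
    print(f"{opt.name}: {opt.effect_per_100_dollars():.3f} SD per $100")
```

Whatever ranking this produces is an artifact of the made-up inputs. The point is that the verdict depends on both the cost estimate and the effect estimate, so analyses that price or measure either one differently can reach opposite conclusions.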
Fortunately, several large county systems, cities and think tanks such as Research for Better Schools have invested in meaningful program evaluations of considerable reliability and validity.
III. Examples of Effective Research
A. Writing to Read
One example of effective evaluation is the series of reports from the New York City Public Schools evaluating the IBM Writing to Read program over several years. These studies establish control groups, honor sound design standards and report findings comprehensively. A district considering the purchase and implementation of this program could learn a great deal about the learner outcomes it might expect from the investment compared with more traditional strategies. Readers must be careful, however, to look beyond the abstracts of such studies, because the summaries often mask or omit important findings.
The abstract below fails to mention, for example, that the advantage of WTR students over non-WTR students in writing performance was strongest at grade one but virtually disappeared at grade three. One hypothesis is that regular grade one programs traditionally paid less attention to writing, while writing instruction was already important in traditional grade three programs. The third grade finding suggests that the grade one gains associated with WTR might have been matched simply by teaching writing more frequently in traditional first grade classes.4
Writing To Read 1988-89. Evaluation Section Report. New York City Board of Education, Brooklyn, NY. Office of Research, Evaluation, and Assessment. Apr 1990. 67p.
The Writing to Read Program (WTR) objectives for 1988-89 included: to extend and support the implementation of the WTR program in New York City elementary schools; to promote the reading and writing achievement of kindergarten, first-, and second-grade students; and to introduce students in early childhood to computer technology. In 1988-89 the program served 87 schools in 22 community school districts. The methods used to evaluate the program included on-site interviews, lab and classroom observations, questionnaires distributed to all program participants and a selected group of parents, pre- and post-program writing samples, and reading achievement scores for the Metropolitan Achievement Test for both selected program participants and matching control groups. Overall reaction to the program was positive. Most participants believed that the program provided a good foundation in basic skills, helped to develop confident and mature writers, and that the computers and center setting were significant motivational devices. Some of the additional major findings included: (1) WTR program has little immediate impact, and no long-term impact on improving reading performance of participating students when compared with other reading programs; (2) students in the program made significant progress in their writing; (3) WTR students improved their writing skills to a greater degree than did similar students who did not participate in the program; and (4) monolingual students at the kindergarten level showed a statistically significant improvement in writing over bilingual kindergartners. (Eleven tables of data are included. One appendix includes the Writing Sample Scoring Scale.)
B. Research for Better Schools Writing Evaluation - Delaware
Collaborating with the Delaware Department of Instruction, RBS (Research for Better Schools) researcher Francine Beyer tested the following three hypotheses regarding writing instruction combined with word processing at the middle grade level:
1. Student writing skills would be improved through the use of computer-assisted instruction for the teaching of the writing process.
2. Student enjoyment of writing would be improved through the use of computer-assisted instruction for the teaching of the writing process.
3. Student enjoyment of computer-assisted instruction for writing would be improved through the use of computer-assisted instruction for the teaching of the writing process.5
This study was conducted in 16 schools, 13 of which provided reasonably comparable control groups. The researcher identified both process and outcome objectives for the project, noting that the process objectives (how the program was delivered) must be met for findings about outcomes to be meaningful. The process data would also be used to adjust program implementation.
Districts wishing to construct local evaluations for writing programs may want to model their design after this excellent study because it effectively combines formative with summative evaluation. Copies are available from Francine Beyer at RBS at 444 North Third Street, Philadelphia, PA 19123.
The study found that all three study hypotheses were supported by the evaluation data, as participants in experimental groups generally achieved greater gains in skill and attitude than members of control groups.
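The RBS finding rests on comparing gains (post-test minus pre-test) between experimental and control groups. Below is a minimal sketch of that comparison, assuming each student has a pre-test and a post-test writing score on the same rubric. It reports the difference in mean gains and standardizes it as an effect size (Cohen's d with a pooled standard deviation). The helper names and all scores are hypothetical and are not data from the RBS study.

```python
from math import sqrt
from statistics import mean, stdev


def gains(pairs):
    """Gain score for each student: post-test minus pre-test."""
    return [post - pre for pre, post in pairs]


def cohens_d(treatment, control):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)
    pooled = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(treatment) - mean(control)) / pooled


# Hypothetical (pre, post) writing scores on a 1-6 rubric; NOT data from the RBS study.
experimental = [(2, 4), (3, 4), (2, 3), (3, 5), (4, 5)]
control = [(2, 3), (3, 3), (3, 4), (2, 3), (4, 4)]

g_exp, g_ctl = gains(experimental), gains(control)
print("mean gain, experimental:", mean(g_exp))
print("mean gain, control:", mean(g_ctl))
print("effect size (d):", round(cohens_d(g_exp, g_ctl), 2))
```

A real evaluation would also test whether the difference is statistically significant and would first check that the groups were comparable at pre-test, the same concern that led Becker to exclude studies lacking pre-test controls.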
Even such well-designed evaluations and programs can run into interesting design issues. Subtle though they may be, these issues deserve careful attention from those creating evaluations.
In the RBS study, for example, growth in student writing skill was measured with writing samples produced with paper and pencil rather than with the new technology. How might the results have differed if the experimental groups had been permitted to compose their samples on the word processor? If one believes that the word processor encourages a different kind of idea processing because of its low-risk, flexible editing and rewriting, how well will that growth show up when the student must revert to the old (paper and pencil) technology? It is a bit like testing the effect of new composite tennis rackets on a player's serve by giving pre- and post-tests with wooden rackets.
A careful reading of the study reveals a second subtle but important issue. During the program year, many of the students evidently composed their first drafts on paper and then transferred them to the computer. For those who prize the playfulness of composing on the computer (the way adults use it in the workplace), this behavior signals what may have been a fairly serious problem. What percentage of the students were actually able to do prewriting exercises and first drafts on the computers? Did access and time constraints keep many students from this experience? The study does not tell us, but the answers could prove instructive, since some would hypothesize that students would perform even more strongly if given such opportunities.
A third issue is whether a timed writing sample is an appropriate measure of student writing proficiency after a year spent on writing as a process, an approach that emphasizes stretching writing out over time so that ideas can percolate. The instrument may not fit the program. Because timed writing samples favor the non-process, time-pressured approach to writing once typical of most school writing programs, they may shrink the differences between control and experimental groups.
I offer these comments as a cautionary note to illustrate the peculiar challenge of designing valid evaluation studies for new technologies. Here we have an excellent study which should serve as a model, and yet the instrumentation available to us (such as pencil and paper, timed writing samples) may belong to an old paradigm unsuited to measuring the effects of various new technology programs. If we are not careful, these design issues may depress differences between experimental and control groups, eventually undermining support for the new programs.
IV. Why So Little Technology Evaluation?
In this section we explore a number of hypotheses that might explain the "glass ceiling" that has kept annual ERIC evaluation reports under 40 since 1980. Each hypothesis is followed by a rationale, much of which is conjecture.
Hypothesis #1: Most school districts do not have the expertise or the resources to conduct solid evaluation studies. Most existing studies have been completed by large districts, vendors or universities. Few districts have personnel with formal evaluation skills or a specific assignment to conduct such evaluations. Research is rarely part of the decision-making process. The collection and analysis of data, a cornerstone of the Total Quality movement, is rare in many school districts. In times of scarce resources, evaluation budgets and projects are among the first to be cut.
Hypothesis #2: Program proponents have a vested interest in protecting new programs from scrutiny. Those who push new frontiers and encourage large expenditures are always taking a considerable risk, especially when there is little reliable data available to predict success in advance. Careful program evaluation puts the innovation under a magnifying glass and increases the risk to the pioneers.
Hypothesis #3: Accountability sometimes runs counter to the culture. Many school districts have carefully avoided data collection that might be used to judge performance.
Hypothesis #4: There is little understanding of formative evaluation as program steering. Since most program evaluation in the past has been summative (Does it work?), few school leaders have much experience with using data formatively to steer programs and modify them. While this kind of data analysis would seem to be more useful, more helpful and less threatening than summative evaluation, lack of familiarity may breed suspicion.
Hypothesis #5: Vendors have much to lose and little to gain from following valid research design standards. Districts are unlikely to pour hundreds of thousands of dollars into computers and software that careful studies show produce no significant gains. Careful research design tends to deflate some of the bold results associated with gadgetry and the Hawthorne effect. Amazing first-year gains, for example, often decline as programs enter their third year. In some cases, vendors report only on the districts or schools with the best results and remain silent about those that are disappointing.
Hypothesis #6: School leaders have little respect for educational research. Many school leaders joke that you can find an educational study to prove or disprove the efficacy of just about any educational strategy. Studies have shown that such leaders typically consult little research as they plan educational programs.
Hypothesis #7: Technology is often seen as capital rather than program. Some school leaders do not associate technology with program. They view technology as equipment not requiring program evaluation. Equipment may be evaluated for speed, efficiency and cost but not learning power.
Hypothesis #8: Evaluation requires clarity regarding program goals. Unless the district states its learning objectives in observable and measurable terms, as the RBS study did, it will be difficult to design a meaningful evaluation study. In some districts, the technology is selected before anyone decides how it will be used.
Hypothesis #9: Adherence to evaluation design standards may create political problems. Besides increasing risk by spotlighting a program, evaluation can anger parents when some students are placed in experimental groups while others must put up with the traditional approach. Random selection can anger people on either side of the innovation, participating teachers included. Voluntary participation, on the other hand, immediately distorts the findings, because volunteers may differ in important ways from those who do not volunteer.
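One way to approach the random-assignment problem in Hypothesis #9 is to randomize whole classes rather than individual students, as in the second step of the one fully randomized study Becker found, and to use a recorded seed so the draw can be reproduced and audited. The sketch below is illustrative only; `assign_classes` and the class labels are hypothetical. A public, reproducible lottery may reduce suspicions of favoritism, though it will not remove every political objection.

```python
import random


def assign_classes(classes, seed):
    """Randomly split classes into treatment and control groups of (nearly) equal size.

    A fixed seed makes the draw reproducible, so it can be audited later.
    If the number of classes is odd, the control group gets the extra class.
    """
    rng = random.Random(seed)
    shuffled = list(classes)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": sorted(shuffled[:half]), "control": sorted(shuffled[half:])}


# Hypothetical class labels for illustration.
classes = [f"7{letter}" for letter in "ABCDEFGH"]
groups = assign_classes(classes, seed=1993)
print(groups)
```

Running the function again with the same seed reproduces the same split, which lets a district show parents and teachers exactly how the groups were formed.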
Hypothesis #10: Innovative programs are so demanding that launching an evaluation at the same time may overload the system. Many schools are persistently stable, conservative organizations that prefer first-order change (tinkering) to second-order change (fundamental change). The need for stability conflicts with innovation, as change is seen as threatening and painful. Because the potential for resistance runs high in such organizations, many leaders may sacrifice evaluation simply to win buy-in for a change.
We have devoted nearly a dozen years to exciting new technologies without penetrating most regular classroom programs or achieving the kind of program integration that makes sense. Much technology use is tangential or token. Before we can expect greater migration into the regular program areas, we must gather more convincing evidence about the learning effects of these technologies, yet it seems unrealistic to expect either vendors or school districts to fill this void during the next few years.
Perhaps it is time for federal research dollars to tackle this challenge in a comprehensive, highly organized fashion. Meanwhile, we must all muddle through, the blind leading the blind, trusting our instincts and intuition rather than solid data. Wise districts will keep an eye out for studies like the RBS/Delaware writing project reported in this article and will build local versions to test their own computerized writing programs, borrowing research designs and instruments from professionals in the field.
1. Becker, Henry J. "The Impact of Computers on Children's Learning." Principal, November, 1988.
2. Becker, Henry Jay. "How Computers Are Used in United States Schools: Basic Data from the 1989 I.E.A. Computers in Education Survey." Journal of Educational Computing Research; v7 n4 p385-406 1991.
Becker, Henry Jay. "Mathematics and Science Uses of Computers in American Schools, 1989." Journal of Computers in Mathematics and Science Teaching; v10 n4 p19-25 Sum 1991.
Becker, Henry Jay. "When Powerful Tools Meet Conventional Beliefs and Institutional Restraints." Computing Teacher; v18 n8 p6-9 May 1991.
Becker, Henry Jay. "Encyclopedias on CD-ROM: Two Orders of Magnitude More than Any Other Educational Software Has Ever Delivered Before." Educational Technology; v31 n2 p7-20 Feb 1991.
3. Kulik, James A. and Chen-Lin Kulik. "Effectiveness of Computer-based Instruction: An Updated Analysis." Computers in Human Behavior, Vol. 7, 1991, p. 91.
4. Writing To Read 1988-89. Evaluation Section Report. New York City Board of Education, Brooklyn, NY. Office of Research, Evaluation, and Assessment. Apr 1990 67p.
5. Beyer, Francine S. "Impact of Computers on Middle-Level Student Writing Skills." Research for Better Schools, Philadelphia, PA, March, 1992.
Credits: The background is from Jay Boersma.
Other drawings and graphics are by Jamie McKenzie.
Copyright Policy: Materials published in From Now On may be duplicated in hard copy format if unchanged in format and content for educational, non-profit school district use only. All other uses, transmissions and duplications are prohibited unless permission is granted expressly. Showing these pages remotely through frames is not permitted.
FNO is applying for formal copyright registration for articles.