QUANTITATIVE
Thank you to my fellow PhD student, Rostati Rostati, for her critical eye, discussions and support!
Successfully passed exam without revision
Wirkala, C., & Kuhn, D. (2011). Problem-based learning in K-12 education: Is it effective and how does it achieve its effect? American Educational Research Journal, 48(5), 1157-1186.

INQUIRY TOOL EXAM
OVERVIEW/NOTES
Rationale/Theoretical Framework: The study was guided by the overarching research question, "Do the benefits of PBL justify its demands?" The authors addressed it by breaking the multifaceted PBL method down into its constituent elements and evaluating the effectiveness of each, using a theoretical framework based on earlier work by Capon and Kuhn (2004) and Pease and Kuhn (2011). Those earlier studies evaluated the effectiveness of PBL in natural instructional settings under tight experimental control. Unlike the two earlier studies, however, Wirkala and Kuhn also compare two forms of PBL: "Specifically, we compare not only PBL and lecture/discussion instructional conditions but also two forms of PBL instruction - team and individual - in order to examine whether the effectiveness of PBL is reduced when its social component is subtracted, and hence whether social collaboration is an essential component of the PBL method" (p. 1159).

Although Wirkala and Kuhn do not state a formal research question or questions as such, they explain early in the article the overall purpose and general question that guided their research. For instance, they state that "In order to address the research questions of specific interest in this study, analyses for most performance variables consisted of two planned comparisons - one between the LD group and the PBL groups and one between the two PBL groups" (p. 1172), yet they never formally articulate the specific research questions that guided their study. They do, however, note that "Given its growing use, and the potential for much more widespread use, in K-12 education, the question of whether its benefits justify its demands is thus one of great practical significance" (p. 1159). In support of their choice of research design, population, setting, and data analysis techniques, Wirkala and Kuhn add: "We see the present study as contributing to an essential research base necessary to answer this question" (p. 1159).

Literature Review:
Table 1
Contents of the Literature Review Presented by Wirkala and Kuhn (2011)
Source | No. | Comments
Peer-reviewed journal articles | 26 | Representative journals included: Journal of Educational Psychology, Educational Psychologist, Cognition and Instruction, Journal of Experimental Child Psychology, and Academic Medicine.
Scholarly texts | 7 | Titles were as follows:
1. Bransford, J. D., Brown, A. L., & Cocking, R. R. (Eds.). (2000). How people learn: Brain, mind, experience, and school. Washington, DC: National Academy Press.
2. Higbee, K. L. (2001). Your memory: How it works and how to improve it (2nd ed.). New York, NY: Marlowe & Company.
3. Kuhn, D. (2005). Education for thinking. Cambridge, MA: Harvard University Press.
4. Kuhn, D., & Pease, M. (2009). The dual components of developing strategy use: Production and inhibition. In H. S. Waters & W. Schneider (Eds.), Metacognition, strategy use, and instruction (pp. 135-159). New York, NY: Guilford.
5. Lahey, B. B. (2008). Psychology: An introduction (10th ed.). New York, NY: McGraw-Hill.
6. Lorayne, H., & Lucas, J. (1974). The memory book: The classic guide to improving your memory at work, at school, and at play. New York, NY: Ballantine Books.
7. Mitchell, K., Shkolnik, J., Song, M., Uekawa, K., Murphy, R., Garet, M., & Means, B. (2005). Rigor, relevance, and results: The quality of teacher assignments and student work in new and conventional high schools. Washington, DC: American Institutes for Research and SRI International.

Research Design - Post-Test Only Experimental Design: The research design and analysis used to achieve this comparison was a crossed within-subjects design that manipulated two independent variables: (a) instructional format (PBL vs. LD) and (b) grouping condition (PBL-team vs. PBL-individual). Based on the study's findings, Wirkala and Kuhn conclude that, compared with traditional lecture/discussion, PBL was superior in producing long-term comprehension and application of the target concepts, but that the social component does not contribute to its overall effectiveness.

The design encompassed two basic topics:
1. Topic 1 was groupthink, the faulty decision making that can occur in groups with low cognitive diversity and other characteristics.
2. Topic 2 was learning and memory, particularly how certain study factors affect memory for learned material (p. 1160).

Positivist philosophical stance: the research is treated as free of the values, passions, politics, and ideology of the researcher. It does not take into account how people live, how they view the world, how they cope with it, or how they change it (for example, the study does not consider the onset of puberty or the physiological differences between males and females, particularly in group/team settings). Post-positivists, by contrast, conduct research among people, learning with them, whereas positivists conduct research on them.

Method:

Participants: Sixth-grade students at an alternative urban public middle school, in three classes with Ns of 30, 29, and 31 (Wirkala & Kuhn, 2011, p. 1159). The school administration assigned incoming students to three equivalent classes based on gender, ethnicity, standardized test scores, essay responses on the school's admission exam, and previous academic record. (This is balanced assignment, not random selection; random selection uses probability sampling techniques to draw a sample of subjects representative of a population, so that results can be generalized.) Students in the PBL-team condition were randomly assigned to teams, an assignment made after all subjects had been selected, but it is unclear whether participants were randomly assigned to the LD and PBL-individual conditions. Random assignment is critical to internal validity.

Curricular content: age and developmentally appropriate
Wirkala and Kuhn established a no-instruction baseline to gauge prior knowledge of the targeted concepts and the equivalence in difficulty of the two topics (groupthink and learning/memory), using an additional group of 94 students from one grade below at the same school. These 94 students were administered the cued comprehension assessments but did not otherwise participate in the study (2011, p. 1160). Wirkala and Kuhn also stated, without elaborating further, that "the content was developmentally and age appropriate for the population. However, it was entirely new to these students and different from the typical content in the course in which the intervention took place (social studies), thus minimizing previous knowledge as a variable factor across participants" (2011, p. 1160). It should be noted that "no special education students attend the school" (p. 1160). "Initial comprehension prior to instruction was not assessed for the main experimental groups, so as not to prime their use of the concepts or otherwise interfere with the effects of instruction" (Wirkala & Kuhn, 2011, p. 1173).

Length of instruction. Instruction on each topic took place during three 40-minute class sessions over the course of 1½ weeks (two hours total). The entire intervention, including both topics, consisted of six class sessions and a total of four hours. Instruction on the first topic (groupthink) took place at the end of the sixth-grade year; instruction on the second (memory) took place at the beginning of the seventh-grade year, separated from the first by the summer break.

Assessment measures. Long-term learning was assessed 9 weeks after instruction ended, via a cued assessment of comprehension and an uncued assessment of application to a new context. The authors were interested not in short-term gains but in enduring learning.

Conditions. A crossed within-subjects design, manipulating two independent variables: instructional format (PBL vs. LD) and grouping condition (PBL-team vs. PBL-individual). Assignment of classes to conditions is shown in Table 2.

Table 2*
Assignment of Classes to Conditions
Class | Topic 1 (groupthink) | Topic 2 (memory)
Class 1 | PBL-individual | PBL-team
Class 2 | PBL-team | LD
Class 3 | LD | PBL-individual
Note. PBL = problem-based learning; LD = lecture/discussion.
*Wirkala & Kuhn, 2011, p. 1161
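Encoding Table 2 as data makes the logic of the crossed design easy to check: each topic is taught under all three conditions (one per class), which supports between-subjects comparisons, and each class experiences a different pair of conditions, which supports within-subjects comparisons. A minimal sketch follows; the class-to-condition assignments are those in Table 2, while the dictionary layout and function names are hypothetical, chosen only for illustration.

```python
from itertools import combinations

# Table 2 encoded as data (topic 1 = groupthink, topic 2 = memory).
ASSIGNMENT = {
    "Class 1": {"groupthink": "PBL-individual", "memory": "PBL-team"},
    "Class 2": {"groupthink": "PBL-team", "memory": "LD"},
    "Class 3": {"groupthink": "LD", "memory": "PBL-individual"},
}


def between_subjects(assignment):
    """For each topic, map each condition to the class that received it."""
    by_topic = {}
    for cls, topics in assignment.items():
        for topic, condition in topics.items():
            by_topic.setdefault(topic, {})[condition] = cls
    return by_topic


def within_subjects(assignment):
    """For each class, the pair of conditions its students experienced."""
    return {cls: frozenset(topics.values()) for cls, topics in assignment.items()}


if __name__ == "__main__":
    conditions = {"PBL-individual", "PBL-team", "LD"}
    # Every topic is taught under all three conditions, one per class ...
    assert all(set(c) == conditions for c in between_subjects(ASSIGNMENT).values())
    # ... and every pair of conditions is compared within exactly one class.
    pairs = set(within_subjects(ASSIGNMENT).values())
    assert pairs == {frozenset(p) for p in combinations(conditions, 2)}
    print(within_subjects(ASSIGNMENT))
```

Read this way, Class 1 carries the within-class PBL-team vs. PBL-individual contrast, while Classes 2 and 3 each contrast PBL with LD in opposite topic orders, which is how the authors address a possible order effect between PBL and LD.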

The researchers examined the effect of instructional condition for each topic on both the comprehension and application measures, using both between-subjects and within-subjects comparisons. They asked: "(a) Does PBL produce superior results to LD? (b) Does PBL-team produce superior results to PBL-individual (i.e., is social context an essential component of the PBL method?)" (Wirkala & Kuhn, 2011, p. 1161).

Data Collection (Wirkala & Kuhn, 2011, p. 1169):

1) Application assessment (groupthink and memory).
"The application assessment measured students' integration and application of the concepts to a new context. The main topic and concepts were not referred to in the essay question, and students could potentially respond without mentioning the concept. Therefore, to recognize their applicability in the new context, students needed to have understood, retained, and integrated these concepts into their long-term knowledge structures. Thus, this assessment tested deep understanding by examining whether students spontaneously applied the concepts learned to a novel situation, without being explicitly prompted to do so."

A two-tiered coding system was employed:
1st tier: whether a concept was defined at all in the student's essay (per topic, 1 point per correctly defined concept, out of a possible 7).
2nd tier: for each concept a student defined, a score for the level of explanation achieved, on an ordinal scale of 1-4 (4 being the highest level).
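The two-tiered scheme can be sketched as a small scoring function that turns one student's coded response into the summary measures reported later (concepts defined, modal explanation level, highest explanation level). This is an illustration only: the concept names are placeholders, and two rules are our assumptions rather than details the authors report, namely that an undefined concept counts as level 0 when summarizing (consistent with the modal level of 0 reported for the no-instruction baseline in Table 4) and that ties for the modal level are broken toward the lower level.

```python
from collections import Counter


def score_response(levels, concepts):
    """Two-tier scoring of one student's response to one topic.

    levels: maps each concept the student defined to its explanation
    level (1-4); concepts the student did not define are simply absent.
    concepts: all concepts scored for the topic.
    """
    for concept, level in levels.items():
        if concept not in concepts:
            raise ValueError(f"unknown concept: {concept}")
        if not 1 <= level <= 4:
            raise ValueError(f"level out of range for {concept}: {level}")
    # Tier 1: one point per concept defined at all.
    concepts_defined = len(levels)
    # Assumption: an undefined concept counts as level 0 in the summary.
    all_levels = [levels.get(c, 0) for c in concepts]
    counts = Counter(all_levels)
    # Modal level; sorted() plus max() breaks ties toward the lower level.
    modal_level = max(sorted(counts), key=counts.__getitem__)
    return {
        "concepts_defined": concepts_defined,
        "modal_level": modal_level,
        "highest_level": max(all_levels),
    }


if __name__ == "__main__":
    concepts = [f"concept_{i}" for i in range(1, 8)]  # placeholder names
    print(score_response({"concept_1": 3, "concept_2": 3, "concept_4": 1}, concepts))
```

With three of seven concepts defined, the unexplained concepts dominate, so the modal level is 0 even though the highest level reached is 3; this is why the notes report modal and highest levels as separate measures.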

2) Comprehension assessment.
Define and fully explain the following groupthink concepts
Define and fully explain the following memory concepts

A two-tiered coding system was employed:
1st tier: whether a concept was defined at all in the cued comprehension assessment (per topic, 1 point per correctly defined concept, out of a possible 7).
2nd tier: for each concept a student defined, a score for the level of explanation achieved, on an ordinal scale of 1-4 (4 being the highest level).


Reliability (consistency of the measuring tool) and Validity (whether the tool measures what it is intended to measure):

Coding reliability, as reported by Wirkala and Kuhn (2011, p. 1172):
"A primary coder (the first author) coded all responses, and a proportion (20%) of responses across topics and conditions was coded by a second coder who was a trained doctoral student but not otherwise involved in the study. Both coders were blind to the students' identity and condition, as well as the other coder's scores. Percentage agreement between the two coders was as follows: groupthink comprehension, 86% (Cohen's kappa = .82); groupthink application, 86.8% (Cohen's kappa = .79); memory comprehension, 91% (Cohen's kappa = .87); memory application, 89% (Cohen's kappa = .70)."
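The gap between percentage agreement and kappa in these figures (for example, 89% agreement but kappa = .70 for memory application) reflects kappa's correction for the agreement two coders would reach by chance alone. A minimal sketch of both measures, assuming each coder assigns one categorical code per response; the function names and the example codes are hypothetical, not the authors' data, and the version shown is unweighted kappa (the notes do not say whether a weighted variant was used for the ordinal levels).

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Proportion of responses to which both coders assigned the same code."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two equal-length, non-empty lists of codes")
    return sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Unweighted Cohen's kappa: observed agreement corrected for chance."""
    p_o = percent_agreement(coder_a, coder_b)
    n = len(coder_a)
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: product of the coders' marginal proportions, summed over codes.
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / n ** 2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical explanation-level codes for ten responses.
    first = [0, 1, 2, 2, 3, 4, 1, 2, 0, 3]
    second = [0, 1, 2, 3, 3, 4, 1, 2, 1, 3]
    print(percent_agreement(first, second))
    print(round(cohens_kappa(first, second), 3))
```

In this invented example the coders agree on 80% of responses, but because about 21% agreement would be expected by chance, kappa falls to roughly .75.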

The statistical analyses used by Wirkala and Kuhn, with a general description of each test and the authors' rationale for its use, are set forth in Table 3 below:

Table 3
Statistical Analyses Used by Wirkala and Kuhn (2011): Type, Description*, and Authors' Rationale**
t test (inferential)
Description*: The t test is a parametric test (it assumes the scores come from a population whose distribution, typically normal, is described by parameters such as the mean) and is used to determine whether two means differ significantly at a selected probability level. Statistical significance is commonly set at p < .05: if there were truly no difference between the populations, a difference at least as large as the one observed would be expected fewer than 5 times in 100 samples. The t test is typically used in a basic two-group design.
Authors' rationale**: This test was used to evaluate the difference in the mean number of concepts defined/applied.
Chi-square
Description*: This test evaluates the hypothesis that two variables are independent, without indicating the strength or direction of their relationship. Chi-square is a nonparametric test (it does not depend on population parameters), often used when data are in the form of frequency counts, or percentages that can be converted to actual counts.
$\chi^2 = \sum \frac{(O - E)^2}{E}$, where $O$ is the observed (actual) number of cases in a category, $E$ is the expected number of cases in that category, and $\sum$ denotes the sum over all categories.
To determine significance, compare the computed value with the critical value in a chi-square table for the appropriate degrees of freedom (p < .05).
Authors' rationale**: The chi-square statistic was used where ordinal scales were involved, to assess the depth of explanation achieved.
Wilcoxon signed-rank test (sample size of at least 20)
Description*: A nonparametric procedure used with two related variables to test the hypothesis that the two variables have the same distribution. It makes no assumptions about the shapes of the distributions of the two variables. The test takes into account the magnitude of the differences within pairs, giving more weight to pairs that show large differences than to pairs that show small ones. The test statistic is based on the ranks of the absolute values of the differences between the two variables. It compares two sets of scores that come from the same participants.
Authors' rationale**: The Wilcoxon signed-rank test was used to assess individual patterns over the two topics (groupthink and learning/memory). It reports a Z statistic and evaluates differences between repeated scores based on the magnitude of the difference within each pair of observations. Statistical significance was set at the .05 alpha level.
Sources:
* Kaufhold, J. A. (2007). Basic statistics for educational research. Lincoln, NE: iUniverse.
** Wirkala, C., & Kuhn, D. (2011).
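To make the three procedures in Table 3 concrete, the sketch below computes each test statistic from scratch using only the Python standard library. It is an illustration under stated assumptions, not the authors' analysis: the function names and all example scores are hypothetical; the t test shown is Student's pooled-variance version (the notes do not specify which variant was used); and the Wilcoxon z uses the large-sample normal approximation with the usual tie correction, which is one reason a minimum sample size such as 20 is often recommended. p-values are left to a table lookup, mirroring the table-based procedure described above.

```python
import math
from statistics import mean, variance


def pooled_t(group1, group2):
    """Student's two-sample t statistic with pooled variance; returns (t, df)."""
    n1, n2 = len(group1), len(group2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two scores")
    # Pooling assumes the two populations have equal variances.
    sp2 = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / se, n1 + n2 - 2


def chi_square(observed):
    """Chi-square test of independence for an r x c table of counts; returns (chi2, df)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under independence
            if e == 0:
                raise ValueError("a row or column total is zero")
            chi2 += (o - e) ** 2 / e
    return chi2, (len(row_totals) - 1) * (len(col_totals) - 1)


def wilcoxon_signed_rank(x, y):
    """Wilcoxon signed-rank test for paired scores; returns (W+, W-, z)."""
    if len(x) != len(y):
        raise ValueError("scores must be paired")
    # Pairs with no difference (same score under both methods) are dropped.
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all differences are zero")
    order = sorted(range(n), key=lambda k: abs(diffs[k]))
    ranks = [0.0] * n
    tie_correction = 0
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2  # average of 1-based ranks i+1..j+1
        t = j - i + 1
        tie_correction += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    # Normal approximation, with the usual correction for tied ranks.
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24 - tie_correction / 48)
    return w_plus, w_minus, (w_plus - mu) / sigma


if __name__ == "__main__":
    # Hypothetical scores, invented for illustration only.
    print(pooled_t([5, 6, 4, 5, 7, 5], [3, 4, 2, 3, 4, 3]))
    print(chi_square([[5, 10, 15], [15, 10, 5]]))
    print(wilcoxon_signed_rank([5, 6, 4, 7, 5, 6, 3, 5], [3, 4, 4, 5, 2, 6, 4, 2]))
```

With these invented data, t is about 4.15 with 10 degrees of freedom (two-tailed critical value at .05 is about 2.228), chi-square is 10.0 with 2 degrees of freedom (critical value about 5.991), and the Wilcoxon z is about 2.02 (beyond 1.96). Dropping zero differences corresponds to the students who "defined the same number of concepts in both" in the within-group analyses: they carry no information about which method worked better.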
Although times have changed and methodologies and controls are now tighter, the results of the Wirkala and Kuhn study should be laid side by side with those of earlier studies. Such a comparison would have provided quantitative support for Wirkala and Kuhn's claims of superior rigor of control and accuracy.

Findings: The main findings were that both versions of PBL were superior to lecture/discussion in producing long-term learning, and that there was no difference between the PBL-individual and PBL-team conditions. The authors interpret this as indicating that social interaction is not what makes PBL effective (Wirkala & Kuhn, 2011, pp. 1157-1158).
Table 4
Summary of Findings (statistical significance set at .05; Wirkala & Kuhn, 2011, p. 1173)
No-Instruction Baseline
"Results of comprehension assessments for the two topics among the no-instruction group indicated that students had negligible prior knowledge of the concepts. These results also established that the two topics were equivalent in difficulty" (Wirkala & Kuhn, 2011, p. 1173).
Participants: 94 students who did not receive instruction
Mean number of concepts defined: groupthink .56; memory .52
Difference across topics: nonsignificant, p = .732
Explanation levels: modal explanation level was 0; highest level of explanation was 0

"The few students who were able to define any one concept provided only a very basic definition, with no student reaching the explanation level. A comparison of the highest level of explanation achieved for any concept across topics further supported the equivalency of the two topics, p = .346" (Wirkala & Kuhn, 2011, p. 1173).

Groupthink Topic Following Instruction
Comprehension assessment (mean number of concepts defined)
PBL-team 5.19 vs. PBL-individual 4.84: difference in defining concepts nonsignificant, p = .329; modal explanation level marginally significant, p = .050; highest level of explanation nonsignificant, p = .331
Combined PBL groups 5.02 vs. LD 3.33: combined PBL defined more concepts, p = .001; PBL also showed higher modal explanation levels, p = .006, and higher levels of explanation, p = .002
Application assessment (mean number of concepts applied)
PBL-team 2.59 vs. PBL-individual 2.88: difference in applying concepts nonsignificant, p = .577; highest level of explanation nonsignificant, p = .568
Combined PBL groups 2.74 vs. LD 1.17: combined PBL applied more concepts, p = .001, and showed higher levels of explanation, p = .002
As in the comprehension assessment, PBL students in the application assessment were overrepresented at the highest levels and underrepresented at the lower levels; the majority of students in the LD group performed at the no reference, mention, and definition levels, while the majority of PBL students reached the explanation levels (Table 5) (Wirkala & Kuhn, 2011, p. 1175).
Memory Topic Following Instruction
Comprehension assessment (mean number of concepts defined)
PBL-team 4.7 vs. PBL-individual 4.00: difference in defining concepts nonsignificant, p = .105; modal explanation level nonsignificant, p = .765; highest level of explanation nonsignificant, p = .744
Combined PBL groups 4.34 vs. LD 2.75: combined PBL defined more concepts, p = .001; PBL also showed higher modal explanation levels, p = .001, and higher levels of explanation, p = .007
Application assessment (mean number of concepts applied)
PBL-team 2.10 vs. PBL-individual 2.42: difference in applying concepts nonsignificant, p = .478; highest level of explanation nonsignificant, p = .331
Combined PBL groups 2.26 vs. LD 1.24: combined PBL applied more concepts, p = .001, and showed higher levels of explanation, p = .023
Nearly half of PBL students reached the explanation levels, while the majority of LD students reached only definitional levels (Wirkala & Kuhn, 2011, p. 1177).
*Comparison Across Topics
The two topics were analyzed separately, in effect treating one as a replication of the other, to establish that results were not specific to one topic.
Comprehension assessment (mean number of concepts defined): groupthink 4.49 vs. memory 3.84; difference in defining concepts significant, p = .03; modal explanation level nonsignificant, p = .41; highest level of explanation nonsignificant, p = .135
Application assessment (mean percentage of concepts applied**): groupthink 28.08% vs. memory 27.62%; difference in applying concepts nonsignificant, p = .899; highest level of explanation nonsignificant, p = .277
*"Especially because participants encountered the two topics at two distinct times separated by several months, in the analyses presented thus far we elected to analyze the two topics separately, in effect treating one as a replication of the other to establish that results were not specific to one topic. However, it was also of interest in a secondary set of analyses to examine each group's performance across topics for the two conditions they encountered. To do so, it is essential to establish that the topics are of equivalent difficulty. As reported earlier, we did this for an independent sample. However, it is also desirable to do so for the main sample in the assessments that followed instruction. Comparisons across the two topics for all performance assessments were consistent with the results reported for the noninstruction group: The two topics were of equivalent difficulty. All but one comparison were nonsignificant at the .05 level. We present those results here" (Wirkala & Kuhn, 2011, p. 1177).
**"Percentages are used for this comparison because, cognitive diversity was split into two concepts, making 8 vs. 7" (Wirkala & Kuhn, 2011, p. 1178).
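The footnote's arithmetic is easy to illustrate: as we read it, the groupthink topic had 8 scoreable concepts (cognitive diversity counted as two) and the memory topic 7, so the same raw count represents a different share of each topic. A minimal sketch under that reading; the function name is hypothetical.

```python
# Concepts available per topic for this comparison, as we read the footnote.
CONCEPTS_PER_TOPIC = {"groupthink": 8, "memory": 7}


def percent_applied(count, topic):
    """Concepts applied as a percentage of the concepts available in the topic."""
    total = CONCEPTS_PER_TOPIC[topic]
    if not 0 <= count <= total:
        raise ValueError(f"count must be between 0 and {total}")
    return 100 * count / total


if __name__ == "__main__":
    # The same raw count of 3 concepts is a smaller share of the groupthink topic.
    print(percent_applied(3, "groupthink"))  # 37.5
    print(round(percent_applied(3, "memory"), 2))  # 42.86
```

Comparing raw counts would therefore slightly favor the groupthink topic; converting to percentages puts both topics on the same scale before they are compared.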
Within-Group Analyses of Individual Patterns
"The overall equivalence of difficulty level across topics permitted an additional set of analyses to be conducted within groups across the two instructional methods that the group experienced" (Wirkala & Kuhn, 2011, p. 1178; see note 2 below).
Class 1 (groupthink via PBL-individual; memory via PBL-team)
Comprehension assessment (mean number of concepts defined): PBL-individual (groupthink) 4.9 vs. PBL-team (memory) 4.62
Difference in defining concepts: nonsignificant, p = .339. Nine students defined more concepts when learning through PBL-individual, 7 defined more concepts when learning through PBL-team, and 5 students defined the same number of concepts in both.
Difference in explanation levels: modal explanation level nonsignificant, p = .689; highest level of explanation nonsignificant, p = .130. Seven students achieved a higher level of explanation when learning via PBL-individual, 3 students achieved a higher level of explanation when learning via PBL-team, and 11 students achieved the same level of explanation in both.
Application assessment (mean percentage of concepts applied): PBL-individual (groupthink) 39.77% vs. PBL-team (memory) 30.52%
Difference in applying concepts: nonsignificant, p = .123. Fifteen students applied more concepts when learning via PBL-individual, and 7 students applied more concepts when learning via PBL-team.
Difference in explanation levels: highest level of explanation nonsignificant, p = .075. Eight students achieved a higher level of explanation when learning via PBL-individual, 5 students achieved a higher level of explanation when learning via PBL-team, and 9 students achieved the same level of explanation in both.
"Analyses of individual patterns thus confirm the results of the between-subjects comparisons: Neither comprehension nor application differs significantly across PBL-individual and PBL-team instructional conditions" (Wirkala & Kuhn, 2011, p. 1179).
Class 2 (groupthink via PBL-team; memory via LD)
Comprehension assessment (mean number of concepts defined): PBL-team (groupthink) 4.95 vs. LD (memory) 2.67
Difference in defining concepts: PBL higher than LD, significant, p = .001. Eighteen students defined more concepts when learning via PBL-team, only 1 student defined more concepts when learning via LD, and 2 students defined the same number of concepts in each.
Difference in explanation levels: modal explanation level higher for PBL than LD, significant, p = .001; highest level of explanation significant, p = .01. The majority of students had modal levels of 2 or above when learning via PBL and 0 when learning via LD. Most students also reached higher levels of explanation when learning via PBL: 12 students achieved a higher level of explanation when learning via PBL-team, 3 students achieved a higher level of explanation when learning through LD, and 6 students achieved the same level of explanation in each.
Application assessment (mean percentage of concepts applied): PBL-team (groupthink) 35.80% vs. LD (memory) 20.78%
Difference in applying concepts: PBL higher than LD, significant, p = .032. Sixteen students applied more concepts when learning via PBL-team, 5 students applied more concepts when learning via LD, and 1 student applied the same number of concepts in each.
Difference in explanation levels: PBL higher than LD, significant, p = .003. Fourteen students achieved a higher level of explanation when learning via PBL-team, 3 students achieved a higher level of explanation when learning through LD, and 5 students achieved the same level of explanation in each.
"Analyses of individual patterns in Class 2 thus also confirm the results of the between-subjects comparisons. The majority of students recalled, comprehended, and applied concepts better when learning took place via PBL compared to LD" (Wirkala & Kuhn, 2011, p. 1179).
Class 3 (groupthink via LD; memory via PBL-individual)
Comprehension assessment (mean number of concepts defined): PBL-individual (memory) 4.32 vs. LD (groupthink) 3.42
Difference in defining concepts: PBL higher than LD, significant, p = .035. Ten students defined more concepts when learning via PBL-individual, only 2 students defined more concepts when learning through LD, and 7 students defined the same number of concepts in each.
Difference in explanation levels: modal explanation level higher for PBL than LD, significant, p = .024; highest level of explanation nonsignificant, p = .267. Nine students achieved a higher level of explanation when learning via PBL-individual, 4 students achieved a higher level of explanation when learning via LD, and 6 students achieved the same level of explanation in each.
Application assessment (mean percentage of concepts applied): PBL-individual (memory) 34.59% vs. LD (groupthink) 12.5%
Difference in applying concepts: PBL higher than LD, significant, p = .001. Fifteen students applied more concepts when learning via PBL-individual, only 1 student applied more concepts when learning via LD, and 3 students applied the same number of concepts in both topics.
Difference in explanation levels: highest level of explanation significant, p = .026. Eleven students achieved a higher level of explanation when learning via PBL-individual, only 2 students achieved a higher level of explanation when learning via LD, and 6 students achieved the same level of explanation in each.
"Analyses of individual patterns in Class 3 thus also support between-subjects analyses in indicating that the majority of students recalled, comprehended, and applied concepts better when learning took place via PBL compared to LD" (Wirkala & Kuhn, 2011, p. 1180).
Note 2. "The choice of two topics unrelated to one another minimizes the possibility of order effects. Comparison of Class 2 (problem-based learning [PBL], then lecture/discussion [LD]) and Class 3 (LD, then PBL) addresses the possibility of an order effect between PBL and LD. The order of the two PBL conditions was not varied, but performance did not differ across conditions. The possibility remains that a reverse order (team first) could have produced different results. However, it is the order we used (individual, then team) that theory would predict most likely to manifest a PBL-team superiority, and this did not appear" (Wirkala & Kuhn, 2011, p. 1185).

Potential Bias: In an effort to overcome the weaknesses inherent in previous studies of the effectiveness of PBL, including (a) nonrandom assignment of students to PBL and traditional instruction, (b) variations in time and exposure to treatment across frequently lengthy interventions, and (c) varying instructors and conditions, the authors undertook to study the effectiveness of PBL in a natural instructional setting under tight experimental control. To this end, they emphasize that "We also follow their approach in acknowledging the varying practices that have been characterized as falling under the heading and undertaking to instantiate PBL in its 'best practice' form, namely, as its advocates claim it to be most effective" (p. 1159).

What are the limitations and strengths of the design?
Limitations
The researchers present themselves as having no prior information about the effectiveness of PBL versus traditional LD delivery, yet they appear to have an inherent bias against LD; in addition, the study lacks an outside control group. The authors emphasize that "We also follow their [prior researchers'] approach in acknowledging the varying practices that have been characterized as falling under the heading and undertaking to instantiate PBL in its 'best practice' form, namely, as its advocates claim it to be most effective" (p. 1159).

"Groupthink" might actually constitute consensus building - competitive types of learning can actually hinder girls' learning and development.
Other interpretations of the results are possible, as the authors themselves admit:
      "Still, we cannot rule out the possibility that the instructor consciously or unconsciously delivered a superior product in one case due to subtle differences that were extraneous to the definitions of each practice. In the case of PBL practice, this potential influence is diluted by the presence of multiple coaches and the indirect role they play in instruction" (p. 1181).
An instructor who delivers superior instruction because of extraneous "subtle differences" may simply have been better prepared or more knowledgeable. The multiple coaches in PBL would then dilute this advantage. Left unmentioned is the possibility that, in such a case, the LD product is better by default.

Another critical issue that Wirkala and Kuhn identify as needing more research is the number of coaches used in PBL. This may be a critical variable to adjust in order to make the modality work more effectively on its own or in coordination with LD. The study's PBL groups had two extra coaches on staff who supplemented the primary teacher (unfortunately, because of budget constraints, employing extra coaches is not always possible in a school setting).
Neither male/female physiological differences (gender equality) nor individual learning styles were considered.

Strengths
The students selected appear to be diverse and valid; particular classes of people have not been excluded from the report.
The study was designed to overcome the weaknesses of previous PBL research noted above (nonrandom assignment of students to PBL and traditional instruction, variations in time and exposure to treatment across frequently lengthy interventions, and varying instructors and conditions) by studying the effectiveness of PBL in a natural instructional setting under tight experimental control.

Individual learning styles
Individual learning styles were not taken into consideration at all, which could be considered a weakness, although Wirkala and Kuhn acknowledge that the "present work focuses on outcomes rather than process" (2011, p. 1184). Note taking was prohibited in all groups. The rationale provided was: "Students of this age are not effective note takers. Moreover, doing so could have hurt LD students by taking their attention away from the teacher and the class discussion" (p. 1168).

Were the site and participants selection justified?
Yes. The authors followed Capon and Kuhn (2004) and Pease and Kuhn (2011) in studying the effectiveness of PBL in a natural instructional setting (in this case, a middle school) under tight experimental control. Because prior studies had involved adults and omitted K-12 education, the researchers investigated whether PBL's benefits justify its demands in a K-12 context; hence the middle school setting.

Ethics: The fundamental ethical issue raised by this study concerns the authors' assumption that problem-based learning represents an across-the-board, end-all "best practices" pedagogical solution rather than a valuable alternative that should be used in the situations where it works best.

In prior PBL research, the majority of study participants have been adults in university settings. Ethical issues almost always arise when human subjects are involved, especially children such as the middle school participants in this study. While this complicates the ethical climate, it does not in and of itself affect the integrity of the report. However, Wirkala and Kuhn do not describe the institutional review board (IRB) oversight that research involving human subjects requires.

Future Research/Concerns: Across the country, educators are increasingly incorporating computer-based applications and one-to-one laptop initiatives in the classroom, which are changing the way educational services are delivered. Problem-based learning can certainly be applied to these new learning resources (e.g., simulated virtual environments), but because every classroom is unique, a better use of scarce resources might be to investigate how these current trends will affect PBL rather than to continue studying which parts of the process work best.

Considering the small scale of the study (a small number of groups), is it justified to conclude that the social component is not significant?
The researchers stated a general conclusion: collaboration yielded no benefit over individual PBL, and "social interaction by itself is not a 'magic bullet' that benefits students" (p. 1183). They qualified this by noting that until "microgenetic observations of collaborative learning are carried out, we are limited in what we can conclude about its nature." Because the study relied on limited post-test measures, the social component itself does not appear to have been analyzed or observed in depth.

The authors addressed the strengths of the study's short duration and small scale. What about its limitations?
The design of the present study could not readily be replicated with semester-long instruction without introducing other variables. Wirkala and Kuhn acknowledge that the best strategy would be to "very gradually increase length of instruction (as well as to vary other factors such as subject matter and grade level) to ascertain how broadly the present findings extend" (2011, pp. 1181-1182).

"A further factor critical to internal validity is the fact that this intervention was shorter than those typical in the PBL literature, many of which span a full year, and was targeted to very specific and precisely defined learning objectives. With longer interventions and broader, more extensive learning objectives, it is much more difficult to measure outcomes precisely, as well as to thoroughly control for extraneous variables confounding treatment, such as variations in time and exposure to treatment, differing instructors with teaching styles, and varying student participation" (Wirkala & Kuhn, 2011, p. 1181).

Terms:

Problem-based learning (PBL) "is a teaching and learning method in which students engage a problem without preparatory study and with knowledge insufficient to solve the problem, requiring that they extend existing knowledge and understanding and apply this enhanced understanding to generating a solution" (Wirkala & Kuhn, 2011, p. 1157).

Random Selection: Random selection is a probability-sampling technique (its main forms are simple random sampling, stratified random sampling, and cluster sampling). It is designed to select a sample of subjects from a population in such a way that findings from the sample can be generalized to the population (Kaufhold, 2007).
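The three probability-sampling methods named above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the 12-student population and the schools "A", "B", and "C" are hypothetical, and the function names are invented for this example rather than taken from Kaufhold (2007) or from the study.

```python
import random

# Hypothetical sampling frame: 12 students spread evenly across 3 schools.
# School names and sizes are illustrative assumptions, not study data.
POPULATION = [
    {"id": i, "school": school}
    for i, school in enumerate(["A"] * 4 + ["B"] * 4 + ["C"] * 4)
]


def simple_random_sample(population, n, rng):
    """Every subset of size n is equally likely to be drawn."""
    return rng.sample(population, n)


def stratified_random_sample(population, per_stratum, rng, key="school"):
    """Draw a simple random sample of fixed size within each stratum."""
    strata = {}
    for unit in population:
        strata.setdefault(unit[key], []).append(unit)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, per_stratum))
    return sample


def cluster_sample(population, n_clusters, rng, key="school"):
    """Randomly pick whole clusters, then keep every unit inside them."""
    clusters = sorted({unit[key] for unit in population})
    chosen = set(rng.sample(clusters, n_clusters))
    return [unit for unit in population if unit[key] in chosen]


rng = random.Random(42)  # fixed seed so the draws are reproducible
print(len(simple_random_sample(POPULATION, 6, rng)))      # 6
print(len(stratified_random_sample(POPULATION, 2, rng)))  # 6
print(len(cluster_sample(POPULATION, 2, rng)))            # 8
```

The stratified sample guarantees two students from each school, while the cluster sample takes every student from two randomly chosen schools.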

Random Assignment: Random assignment is used to assign subjects to different groups after all the subjects have been selected. Every subject has an equal chance of being placed in any of the treatment conditions (Kaufhold, 2007).
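Here is a minimal sketch of random assignment, assuming the subjects have already been selected. The condition labels mirror the three instructional conditions compared in the study (lecture/discussion, team PBL, and individual PBL). The subject IDs and the shuffle-and-deal procedure are assumptions for illustration. They do not describe how Wirkala and Kuhn actually assigned their students.

```python
import random

# Labels for the three instructional conditions compared in the study
CONDITIONS = ["LD", "PBL-team", "PBL-individual"]


def randomly_assign(subjects, conditions, rng):
    """Shuffle the already-selected subjects, then deal them round-robin
    into the conditions. When the number of subjects is a multiple of the
    number of conditions, every subject has an equal chance of landing in
    each condition and the groups are the same size."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for position, subject in enumerate(shuffled):
        groups[conditions[position % len(conditions)]].append(subject)
    return groups


# Hypothetical subject IDs 1 to 30
groups = randomly_assign(range(1, 31), CONDITIONS, random.Random(7))
print({condition: len(members) for condition, members in groups.items()})
# {'LD': 10, 'PBL-team': 10, 'PBL-individual': 10}
```

Shuffling first and then dealing keeps the group sizes balanced. Who ends up where is still decided entirely by chance.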

Effect on Validity: Nonrandom selection affects the external validity of an experiment, that is, how well the findings can be generalized to the population and to other situations. Random assignment, however, is critical to internal validity: if subjects are not assigned at random, confounding may occur (Kaufhold, 2007).
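A small simulation can illustrate the internal-validity point. The pre-test scores in this sketch are invented. It compares a nonrandom rule, which places the ten highest scorers in the treatment group, with repeated random shuffles. The nonrandom rule builds a 20-point pre-test gap into the groups, so any later difference would be confounded with prior knowledge. Random assignment, by contrast, leaves no systematic gap on average.

```python
import random
import statistics

# Invented pre-test scores for 20 students: 50, 52, ..., 88
PRETEST = list(range(50, 90, 2))


def mean_gap(treatment, control):
    """Difference in mean pre-test score between two groups."""
    return statistics.mean(treatment) - statistics.mean(control)


def nonrandom_gap(scores):
    """Nonrandom rule: the top half of scorers go to the treatment group."""
    ranked = sorted(scores, reverse=True)
    half = len(ranked) // 2
    return mean_gap(ranked[:half], ranked[half:])


def average_random_gap(scores, reps, rng):
    """Average gap over many random splits into two equal halves."""
    half = len(scores) // 2
    gaps = []
    for _ in range(reps):
        shuffled = list(scores)
        rng.shuffle(shuffled)
        gaps.append(mean_gap(shuffled[:half], shuffled[half:]))
    return statistics.mean(gaps)


print(nonrandom_gap(PRETEST))  # 20
print(round(average_random_gap(PRETEST, 2000, random.Random(1)), 2))  # near 0; exact value depends on the seed
```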

Nonparametric Tests: Nonparametric tests are often used in place of their parametric counterparts when certain assumptions about the underlying population are questionable. For example, when comparing two independent samples, the Wilcoxon-Mann-Whitney test does not assume that the populations are normally distributed, whereas its parametric counterpart, the two-sample t-test, does. When those assumptions are not satisfied, nonparametric tests may be, and often are, more powerful in detecting population differences (Kaufhold, 2007).
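The sketch below makes the contrast concrete. It computes the Mann-Whitney U statistic and Student's pooled two-sample t statistic on invented scores. It produces only the test statistics, not p-values. A real analysis would normally rely on a statistics package (for example, SciPy's `mannwhitneyu` and `ttest_ind`). U depends only on the ordering of the values, so replacing the top score of 18 with an outlier of 80 leaves U unchanged at 22. The t statistic, however, drops from about 2.4 to about 1.2 because the outlier inflates the sample variance.

```python
import math
import statistics


def mann_whitney_u(x, y):
    """U statistic for sample x: the number of pairs (xi, yj) with xi > yj,
    counting ties as 1/2. Only the ordering of values matters, so no
    normality assumption is involved."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def pooled_t(x, y):
    """Student's two-sample t statistic with pooled variance; the
    corresponding test assumes normal populations with equal variances."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * statistics.variance(x)
           + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    diff = statistics.mean(x) - statistics.mean(y)
    return diff / math.sqrt(sp2 * (1 / nx + 1 / ny))


# Invented post-test scores for two hypothetical groups
group_a = [12, 14, 15, 16, 18]
group_b = [10, 11, 12, 13, 14]
group_a_outlier = [12, 14, 15, 16, 80]  # same data, top score replaced by an outlier

print(mann_whitney_u(group_a, group_b), mann_whitney_u(group_a_outlier, group_b))  # 22.0 22.0
print(round(pooled_t(group_a, group_b), 2), round(pooled_t(group_a_outlier, group_b), 2))
```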

Parametric Tests: A parametric test assumes that the data come from a population whose distribution can be described by a small set of parameters (for example, a normal distribution defined by its mean and standard deviation). A parameter is a value, usually unknown (and which therefore has to be estimated), used to represent a certain population characteristic. For example, the population mean is a parameter that is often used to indicate the average value of a quantity. Within a population, a parameter is a fixed value which does not vary. Each sample drawn from the population has its own value of any statistic that is used to estimate this parameter. For example, the mean of the data in a sample is used to give information about the overall mean in the population from which that sample was drawn (Kaufhold, 2007).
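The difference between a fixed parameter and a sample statistic can be simulated directly. In this sketch the population of 1,000 scores is hypothetical, drawn from a normal distribution with mean 70 and standard deviation 10. The population mean plays the role of the parameter. Each sample of 25 produces its own sample mean as an estimate of it.

```python
import random
import statistics

rng = random.Random(2011)  # fixed seed for reproducibility

# Hypothetical population of 1,000 scores; its mean is the parameter,
# a single fixed value that in real research is usually unknown.
population = [rng.gauss(70, 10) for _ in range(1000)]
population_mean = statistics.mean(population)

# Each random sample of 25 has its own statistic (the sample mean),
# which varies from sample to sample around the fixed parameter.
sample_means = [statistics.mean(rng.sample(population, 25)) for _ in range(5)]

for m in sample_means:
    print(f"sample mean = {m:.2f}   population mean = {population_mean:.2f}")
```

The population mean printed on every line stays the same, while the sample means scatter around it.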
