End-of-Course Sophomore Interpreting Tests in HUFLIT: Reliability and Washback Effect


HUFLIT International Conference on Ensuring a High-Quality Human Resource in the Modern Age - Oct 16, 2020
doi: 10.15625/vap.2020.00104

Nguyen Duc Chau
Ho Chi Minh City University of Foreign Languages - Information Technology
chauducnguyen06@yahoo.com

ABSTRACT: Liaison Interpretation (Interpreting) and Consecutive Interpretation have been included in HUFLIT’s curriculum, especially for sophomores, since the early days. With their distinctive features, however, they pose challenges such as the need for experienced lecturers, proper teaching methodology, reliable and valid materials, and effective facilities for simulation, among which selecting the proper type of interpreting test for the level seems the most problematic. Liaison Interpretation (Interpreting) focuses on training first-level learners to deal with daily contacts or dialogues, and Consecutive Interpretation centers on teaching second-level learners to work in a more complicated mode. Most lecturers at HUFLIT (Ho Chi Minh City University of Foreign Languages and Information Technology) have nearly reached a consensus on applying Consecutive Interpretation as the testing mode, for varied reasons. The paper presents a viewpoint on an experimental test of Consecutive Interpretation in the “Two-scorer OPI” form that can be practically applied at HUFLIT in particular, and at other Vietnamese institutions in general, aiming at greater efficiency, reliability, validity, and practicality, which are seen as crucial for positive and motivating didactics and for teaching effectiveness in the 4.0 technological age.

Keywords: 4.0 technology; Liaison Interpreting; Consecutive Interpreting; reliability; validity; practicality.

I. OVERVIEW

A. Background to the study

Training students to meet job-market demand, a requirement from Vietnam’s Ministry of Education and Training, has long been a hard-to-hit target for all departments of the institution, which aims to improve its curriculum in the context of world integration. One of HUFLIT’s missions is to produce graduates equipped not only with good ability in their field of expertise but also with real competence in computing and foreign language(s), to help them gain a suitable job in the competitive job market. Renovation has been carried out widely in all aspects, and improving interpreting tests is considered a top priority (Nguyen, 2018).

B. Statement of the problem

Different types of interpreting tests have been applied in colleges and English centers across Vietnam in general, and in the Department of Foreign Languages at Ho Chi Minh City University of Foreign Languages and Information Technology (HUFLIT) in particular, in search of the best one. According to the curriculum, all students of English, who account for over 90 % of the total, take this most challenging test, which has a crucial washback effect on teaching and learning.

Description

Interpretation requires students to render messages (codes) orally from L1 to L2 and vice versa. Students are expected to memorize the original message after it is uttered by the speaker and then transfer it without delay to the target (or the source) language with understandable pronunciation and intonation, acceptable fluency, simultaneity, proper use of structures and vocabulary, confidence, memory, and flexibility. It is quite different from the subject of written translation, in which students’ writing skills are required in transferring messages from L1 to L2 and vice versa (Nguyen, 2018). The type discussed in the paper is the two-scorer OPI test that experimentally replaces the traditional repetition test.
This kind of test was first used for important speaking tests and then, when applied for interpretation, ... The second part of the test is Vietnamese-English interpretation, in which students are required to listen to the Vietnamese teacher’s text and then try to produce an English equivalent. Students’ output is assessed in terms of intonation, pronunciation, fluency, simultaneity, and use of vocabulary and structures. Test takers are invited individually into the testing room to face 2 examiners, exactly as in the face-to-face OPI test. After picking a topic number at random, the test taker listens to 10 messages, 5 in English and 5 in Vietnamese, and tries to interpret each into the other language. The score given by the 2 examiners is based on pronunciation, intonation, fluency, simultaneity, and accuracy (proper use of vocabulary and structures).
Table 1. Rubric (Source: HUFLIT)

BM01.QT02/ĐNT-ĐT
HO CHI MINH CITY UNIVERSITY OF FOREIGN LANGUAGES - INFORMATION TECHNOLOGY
DEPARTMENT OF FOREIGN LANGUAGES
FINAL EXAMINATION
Subject: INTERPRETATION 1     Semester: I     Academic Year: 2019-2020

1. SCORING CRITERIA
   1. Pronunciation & intonation                     20 %
   2. Fluency                                        10 %
   3. Accuracy (proper use of VOC & structures)      70 %
   TOTAL                                            100 %

The test is designed for sophomores, who encounter this test form for the first time, and each message is approximately 15-20 words long. The number of words has long been tried out against the teaching methodology and the test takers’ memory and competence, so that it can serve as the foundation for the test design. All messages are recorded with an appropriate pause after each (exactly 18 seconds), which has also been tested for years, to ensure equality and reliability for all test takers.

C. Hypothesis

Test takers’ real competence can be more accurately assessed and improved if the end-of-course Interpretation 1 test tasks in HUFLIT are upgraded to produce the washback effect on teaching and learning (Rivers, 1987).

D. Research questions

The research question in this minor study centers on the views of Interpretation lecturers on the Interpretation 1 test that all of them have experienced:
- Is the Interpretation 1 SOPI test in HUFLIT practicable and reliable?

E. Significance of the study

Identifying the drawbacks of the traditional end-of-course Interpretation 1 test and of the currently applied Interpretation 1 SOPI test in HUFLIT, and modifying the latter accordingly, can help upgrade the testing system, meet the proposed learning outcomes in improving students’ motivation and real competence in English, a requirement in the job market, and meet the institution’s demand for innovation.

F. The scope and delimitation of the study

The study can be widely employed in other Vietnamese institutions, including those that still apply various forms of the interpretation (interpreting) test.
However, an apparent setback of my proposal is that not all universities are eager for a change that may be costly and time- and effort-consuming when adequate professionals are lacking. This pilot study, with a small number of participants, was carried out only in HUFLIT; therefore, its conclusions may not hold when it is repeated on a larger scale or in other institutions.

II. LITERATURE REVIEW

Saad et al. (1999) produce a so-called standard set of testing procedures, in which the vital testing concepts are clarified.

“Inter-rater reliability indicates how consistent test scores are likely to be if the test is scored by two or more raters. On some tests, raters evaluate responses to questions and determine the score. Differences in judgments among raters are likely to produce variations in test scores. A high inter-rater reliability coefficient indicates that the judgment process is stable and the resulting scores are reliable. Inter-rater reliability coefficients are typically lower than other types of reliability estimates. However, it is possible to obtain higher levels of inter-rater reliabilities if raters are appropriately trained” (Saad et al., 1999: 24).

Their opinions apparently contribute a necessary modification to HUFLIT’s SOPI interpreting tests.

Another point in their concepts is internal consistency reliability, which “indicates the extent to which items on a test measure the same thing. A high internal consistency reliability coefficient for a test indicates that the items on the test are very similar to each other in content (homogeneous). It is important to note that the length of a test can affect internal consistency reliability. Tests that measure multiple characteristics are usually divided into distinct components. Manuals for such tests typically report a separate internal consistency reliability coefficient for each component in addition to one for the whole test. Test manuals and reviews report several kinds of internal consistency reliability estimates. Each type of estimate is appropriate under certain circumstances. The test manual should explain why a particular estimate is reported” (Saad et al., 1999: 24).

Saad et al. (1999) also explain test-retest reliability, which “indicates the repeatability of test scores with the passage of time. This estimate also reflects the stability of the characteristic or construct being measured by the test. Some constructs are more stable than others. For example, an individual's reading ability is more stable over a particular period of time than that individual's anxiety level. Therefore, you would expect a higher test-retest reliability coefficient on a reading test than you would on a test that measures anxiety” (Saad et al., 1999: 24).

They define alternate or parallel form reliability as “how consistent test scores are likely to be if a person takes two or more forms of a test. A high parallel form reliability coefficient indicates that the different forms of the test are very similar which means that it makes virtually no difference which version of the test a person takes. On the other hand, a low parallel form reliability coefficient suggests that the different forms are probably not comparable; they may be measuring different things and therefore cannot be used interchangeably” (Saad et al., 1999: 24).
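Two of the coefficients quoted above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the quoted passages do not prescribe a particular statistic, so Pearson's correlation is used here as one simple inter-rater coefficient and Cronbach's alpha as one common internal consistency estimate. The function names and the sample scores are hypothetical and do not come from the paper or from HUFLIT data.

```python
from statistics import mean, pvariance


def pearson_r(rater_a, rater_b):
    """Pearson correlation between two raters' scores for the same test takers."""
    if len(rater_a) != len(rater_b) or len(rater_a) < 2:
        raise ValueError("need two equally long score lists with at least 2 entries")
    mean_a, mean_b = mean(rater_a), mean(rater_b)
    covariance = sum((a - mean_a) * (b - mean_b) for a, b in zip(rater_a, rater_b))
    spread_a = sum((a - mean_a) ** 2 for a in rater_a)
    spread_b = sum((b - mean_b) ** 2 for b in rater_b)
    if spread_a == 0 or spread_b == 0:
        raise ValueError("correlation is undefined when a rater gives constant scores")
    return covariance / (spread_a * spread_b) ** 0.5


def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores[i][j] is test taker i's score on item j."""
    n_items = len(item_scores[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [
        pvariance([row[j] for row in item_scores]) for j in range(n_items)
    ]
    total_variance = pvariance([sum(row) for row in item_scores])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical totals (out of 10) given by two examiners to five test takers.
    examiner_1 = [6.5, 7.0, 8.5, 5.0, 9.0]
    examiner_2 = [6.0, 7.5, 8.0, 5.5, 9.0]
    print("inter-rater r:", round(pearson_r(examiner_1, examiner_2), 3))

    # Hypothetical per-message scores (three messages) for four test takers.
    per_message = [
        [0.6, 0.7, 0.5],
        [0.8, 0.9, 0.8],
        [0.4, 0.5, 0.3],
        [0.9, 1.0, 0.9],
    ]
    print("Cronbach's alpha:", round(cronbach_alpha(per_message), 3))
```

A coefficient close to 1 from `pearson_r` would indicate that the two scorers rank test takers similarly; it does not show that they give the same absolute marks, which would need a separate check of the mean difference between raters.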
Oral tests, especially interpretation tests, have specific problems related to their nature and require careful preparation with respect to what international test researchers and developers refer to as objectivity, authenticity, validity, reliability, and practicality or feasibility (Nguyen, 2013).

Bendazzoli, C. & Sandrelli, A. (2011) stated that: “There is still a considerable gap between CTS (corpus-based studies on written translation) has been more advanced than the development of CIS (corpus-based interpreting studies), both in terms of corpus size and availability and in terms of number of studies and pedagogical applications due to the greater challenges and obstacles involved in setting up interpreting corpora, i.e. electronic corpora of transcribed speech events, which include an original (source language, hereafter SL) speech and its parallel (target language, hereafter TL) version into one or more foreign languages” (p. 1) (quoted from Nguyen, 2019).

Liskin-Gasparro (1987) describes the differences between the SOPI and the OPI. The simulated oral proficiency interview (SOPI) is a type of semi-direct speaking test that models, as closely as is practical, the format of the oral proficiency interview (OPI). The OPI is used by government agencies belonging to the Interagency Language Roundtable (ILR) and by the American Council on the Teaching of Foreign Languages (ACTFL) to assess general speaking proficiency in a second language.

ERIC Custom Transformations Team (1989) states that “the SOPI correlates so highly with the OPI that it seems safe to say that the tests measure the same abilities. Also, a comparison of the advantages of each suggests that the SOPI offers certain practical and psychometric advantages over the OPI. Thus, it may be useful to consider the circumstances that should motivate the selection of one format or the other”.
ERIC Custom Transformations Team compares the OPI with the SOPI and concludes that the latter would seem to have certain advantages. “The OPI must be administered by a trained interviewer, whereas any teacher, aide, or language lab technician can administer the SOPI. This may be especially useful in locations where a trained interviewer is not available. The SOPI can be simultaneously administered to a group of examinees by a single administrator, whereas the OPI must be individually administered. Thus, the SOPI may be preferable when many examinees need to be tested within a short span of time” (ERIC Custom Transformations Team, 1989).

III. METHODS OF STUDY AND SOURCES OF DATA

A. Research Purpose

The end-of-course Interpretation 1 test has been controversial recently because of conflicting opinions on the testing mode. The backwash effect (Rivers, 1987) has been viewed as a huge problem for both test designers and lecturers. The proper choice of testing form decides whether learning motivation in this challenging subject can be maintained. The writer conducts the study with the aim of sorting out the relevant problems of the Interpretation 1 SOPI test so that it can be appropriately adjusted to enhance students’ learning motivation and to help them gain and upgrade their real competence for fierce competition in the job market.

B. Population of the Study

The instructors in the Faculty of Methodology and Translation were invited to take part in the survey and were grouped by working experience and gender. The majority of the respondents are experienced instructors who have dealt with interpreting tests of all kinds and whose contributions to various aspects of teaching and testing have been recognized. The number of respondents is 14, which is academically insufficient for reliable research but represents most of the faculty members (22 in all). Some newcomers were not invited because of their limited knowledge of the field. This can be seen as a pilot study in HUFLIT for further research.
C. Instrument Used

Given the humble scope of the article, the only tool in the study is data collection through attitude questionnaires, distributed on January 8 and collected on January 15. The questionnaires were sent to participants via email to ensure prompt reception, and 10 were returned before the deadline, which eased data collection.

D. Statistical Treatment

Statistical data is treated with SPSS to show the percentage of agreement/disagreement and the individual solutions/proposals/personal views from the participants.

IV. PRESENTATION, DATA ANALYSIS, AND DISCUSSION

A. Presentation and Data analysis

The survey centers on the reliability, validity, and practicality of the Interpretation 1 SOPI test for English-majored sophomores in the Department of Foreign Languages, HUFLIT. The test is designed with 10 components: 5 English messages and 5 Vietnamese messages. Its difficulty level is set by the number of words (15-18 for each component) and the 18-second pause between the components. The pause is for the test taker’s response. Examinees are supposed to transfer the messages to the other language and are judged on such criteria as pronunciation, intonation, fluency, and accuracy.

The first two questions focus on gender and experience. The respondents were divided into 2 groups: 6 females and 8 males. Just one of them was seen as a junior lecturer, with under 10 years of teaching experience. The other 13 were very experienced, with professional careers of over 10 years (some over 30 years). They have all been working with interpreting tests for a long time, enough to be aware of the advantages and disadvantages of each one.

The third question is whether the “Two-Scorer SOPI test” is the right choice for HUFLIT at the moment. 1 respondent disagreed (7.1 %), 10 affirmed it (71.4 %), and just 3 gave no opinion (Table 1).

Table 1. Question 3
              Frequency   Percent   Valid Percent   Cumulative Percent
Valid  C           2        14.3         14.3              14.3
       D          10        71.4         71.4              85.7
       E           2        14.3         14.3             100.0
       Total      14       100.0        100.0

Question 4 can be seen as very important in the survey. Relating to internal consistency reliability, it asks whether the test is fair to different groups of students. One of the participants said no, 3 made no choice, and 10 agreed. There was no consensus on the problem; however, the pros outweighed the cons (Table 2).

Table 2. Question 4
              Frequency   Percent   Valid Percent   Cumulative Percent
Valid  B           1         7.1          7.1               7.1
       C           3        21.4         21.4              28.6
       D          10        71.4         71.4             100.0
       Total      14       100.0        100.0

Question 5 is a technically detailed question about the construct validity of the test. The number of words in a test component represents the level of difficulty, which is proportional to the students’ level. As the test regulates, 15-18 words per component is considered valid for testing a sophomore. The result shows that 14.3 % of the population disagreed while 78.6 % agreed. Only one expressed no opinion (Table 3).

Table 3. Question 5
              Frequency   Percent   Valid Percent   Cumulative Percent
Valid  A           1         7.1          7.1               7.1
       B           1         7.1          7.1              14.3
       C           1         7.1          7.1              21.4
       D          11        78.6         78.6             100.0
       Total      14       100.0        100.0

Question 6 is also about test validity. One participant disagreed, requesting fewer components; 3 had no opinion; 10 (71.4 %) accepted the current setting (Table 4).

Table 4. Question 6
              Frequency   Percent   Valid Percent   Cumulative Percent
Valid  B           1         7.1          7.1               7.1
       C           3        21.4         21.4              28.6
       D          10        71.4         71.4             100.0
       Total      14       100.0        100.0

Question 7 is about the inter-rater reliability of the test, asking whether the “Two-scorer SOPI test” in HUFLIT is more objective than the traditional test. 100 % showed agreement (Table 5).
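The frequency tables in this section report, for each answer option, a count, a percent, and a cumulative percent. The paper states that SPSS was used; the following Python sketch is not the author's procedure but a minimal illustration of the same arithmetic, assuming answers are coded with the letters A to E as they appear in the tables. The function name `frequency_table` is hypothetical.

```python
from collections import Counter


def frequency_table(responses, options="ABCDE"):
    """Return (option, frequency, percent, cumulative percent) rows.

    Options nobody chose are skipped, as in the tables of this section.
    The cumulative percent is computed from the running count, not by
    adding rounded percentages, so rounding errors do not accumulate.
    """
    if not responses:
        raise ValueError("no responses to tabulate")
    counts = Counter(responses)
    total = len(responses)
    rows = []
    running = 0
    for option in options:
        frequency = counts.get(option, 0)
        if frequency == 0:
            continue
        running += frequency
        rows.append(
            (
                option,
                frequency,
                round(100 * frequency / total, 1),
                round(100 * running / total, 1),
            )
        )
    return rows


if __name__ == "__main__":
    # Counts taken from the Question 4 table: B = 1, C = 3, D = 10.
    answers = ["B"] + ["C"] * 3 + ["D"] * 10
    for row in frequency_table(answers):
        print(row)
```

With the Question 4 counts, the sketch yields 7.1, 21.4, and 71.4 percent with cumulative values of 7.1, 28.6, and 100.0, which matches the corresponding table.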
Table 5. Question 7
              Frequency   Percent   Valid Percent   Cumulative Percent
Valid  D           9        64.3         64.3              64.3
       E           5        35.7         35.7             100.0
       Total      14       100.0        100.0

Similar to Question 6, Question 8 is about validity. As described, there is an 18-second pause after each component for the test taker’s response. It is seen as valid for the level in terms of difficulty and simultaneity. 100 % agreed (Table 6).

Table 6. Question 8
              Frequency   Percent   Valid Percent   Cumulative Percent
Valid  D          14       100.0        100.0             100.0

Question 9 asks whether the current number of test takers, < 20, is appropriate for the raters to feel balanced and concentrated in scoring. 100 % said yes, of whom 14.3 % entirely agreed (Table 7).

Table 7. Question 9
              Frequency   Percent   Valid Percent   Cumulative Percent
Valid  D          12        85.7         85.7              85.7
       E           2        14.3         14.3             100.0
       Total      14       100.0        100.0

Question 10 focuses on the relevance of testing to teaching, surveying whether the teaching topics tally with those of testing. 100 % agreed (Table 8).

Table 8. Question 10
              Frequency   Percent   Valid Percent   Cumulative Percent
Valid  D          10        71.4         71.4              71.4
       E           4        28.6         28.6             100.0
       Total      14       100.0        100.0

Similar to Question 10, Question 11 centers on whether the vocabulary in the test is equivalent to that in the teaching materials (new words are always given). 100 % agreed (Table 9).

Table 9. Question 11
              Frequency   Percent   Valid Percent   Cumulative Percent
Valid  B           1         7.1          7.1               7.1
       C           4        28.6         28.6              35.7
       D           6        42.9         42.9              78.6
       E           3        21.4         21.4             100.0
       Total      14       100.0        100.0

Question 12 relates to suggested modifications, comprising: longer duration for Vietnamese-English components, more structural exercises in teaching textbooks, and better standardization in the tests.

B. Discussion

The following discussion examines the test’s reliability, validity, practicality, and backwash (also known as washback) effect so that its shortcomings can be identified, as posed by the research question.

1. Test Description

a) Procedures
- Test takers are invited individually into the testing room according to the ready-made list issued by the management.
- Due to HUFLIT’s humble facilities, the recording test in the lab proposed by Nguyen (2013) cannot be applied, and the two-scorer SOPI form is employed instead.
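The two-scorer form combined with the rubric weights in Table 1 (20 % pronunciation and intonation, 10 % fluency, 70 % accuracy) can be sketched in Python as follows. Several points are assumptions of this sketch rather than statements from the paper: that the weights are applied to each message separately, that each criterion is rated on a 0 to 1 scale, and that the two examiners' totals are averaged. The paper only states that the two examiners rate independently, that each message is worth at most 1 point, and that the 10 messages total 10. All names are hypothetical.

```python
# Rubric weights taken from Table 1 (Rubric) of the paper.
WEIGHTS = {"pronunciation_intonation": 0.20, "fluency": 0.10, "accuracy": 0.70}
MESSAGES_PER_TEST = 10  # 5 English and 5 Vietnamese messages, 1 point each


def message_score(ratings):
    """Weighted score (0 to 1) for a single message.

    `ratings` maps each rubric criterion to a value between 0 and 1.
    Applying the weights per message is an assumption of this sketch.
    """
    if set(ratings) != set(WEIGHTS):
        raise ValueError("ratings must cover exactly the rubric criteria")
    if any(not 0 <= value <= 1 for value in ratings.values()):
        raise ValueError("each rating must lie between 0 and 1")
    return sum(WEIGHTS[name] * ratings[name] for name in WEIGHTS)


def examiner_total(messages):
    """Total (0 to 10) awarded by one examiner across the 10 messages."""
    if len(messages) != MESSAGES_PER_TEST:
        raise ValueError(f"expected {MESSAGES_PER_TEST} rated messages")
    return sum(message_score(ratings) for ratings in messages)


def final_score(examiner_a, examiner_b):
    """Average of the two examiners' totals (the averaging rule is assumed)."""
    return (examiner_total(examiner_a) + examiner_total(examiner_b)) / 2


if __name__ == "__main__":
    # Hypothetical ratings: one examiner is slightly stricter on accuracy.
    lenient = {"pronunciation_intonation": 0.9, "fluency": 0.8, "accuracy": 0.8}
    strict = {"pronunciation_intonation": 0.9, "fluency": 0.8, "accuracy": 0.7}
    print(final_score([lenient] * 10, [strict] * 10))
```

Because accuracy carries 70 % of the weight, a disagreement between examiners on accuracy moves the final mark far more than the same disagreement on fluency, which is one reason the consistency of the two scorers matters for this rubric.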
b) The test
- As discussed in 4.1, test takers have to produce their oral translation within the 18-second pause after listening to the message.
- After each message, their performance is independently rated by two examiners, based on their pronunciation, intonation, fluency, and accuracy. Each message has a maximum score of 1; the total for the 10 messages (5 in English, 5 in Vietnamese) is 10. The rationale for 10 messages in the Interpretation 1 test is its practicality: the amount of time allotted for a 20-student testing room.
- Notes (vocabulary) are provided in case candidates have problems with new terms.
- Examinees’ performance used to be recorded in case of complaints, but this is no longer done, since there have been no complaints and archiving the recordings is complicated.

2. Test participants

All test participants are viewed as qualified candidates when they have finished a designated number of credits. They come from various majors: Office Management, Business English, Teaching Methodology, and Interpretation-Translation.

c) Knowledge

As mentioned in 4.2.2, to assure the validity and reliability of the test, all relevant topics are of general knowledge, such as culture, society, arts, environment, education, medical science, technology, and business. Informal interviews with randomly chosen test takers after the test suggest that the topics are not a problem at all, although vocabulary is for a few.

d) Skill

According to our hands-on observation, most of the examinees perform well in vocabulary knowledge, and not many show weakness in interpreting skills, i.e. simultaneity, intonation, pronunciation, and accuracy, thanks to proper teaching. Informal interviews with students also reveal their apparent motivation to learn the subject, which can be viewed as a promising signal for the test designers, lecturers, and the management.

V. CONCLUSION AND RECOMMENDATIONS

A. Summary of findings

10 participants (71.4 %) affirmed that the “Two-Scorer SOPI test” is the right choice for HUFLIT at the moment. Relating to internal consistency reliability, 10 agreed that the test is fair to different groups of students. On the construct validity of the test, 78.6 % agreed that 15-18 words per component is valid for testing a sophomore. Also regarding construct validity, 10 (71.4 %) accepted the current setting of 10 components in a test. As far as inter-rater reliability goes, 100 % agreed that the “Two-scorer SOPI test” in HUFLIT is more objective than the traditional test. Also regarding construct validity, all accepted that an 18-second pause after each component for the test taker’s response is valid for the level in terms of difficulty and simultaneity. 100 % said that the current number of test takers, < 20, is appropriate for a testing room. All agreed that the teaching topics tally with those of testing. 100 % agreed that the vocabulary in the test is equivalent to that in the teaching materials. Some suggested modifications: longer duration for Vietnamese-English components, more structural exercises in teaching textbooks, and better standardization in the tests.

B. Conclusion

Although the Interpretation 1 SOPI tests in HUFLIT have been studied and scientifically adjusted for over 5 years, they still need some minor modifications depending on practical circumstances, such as input levels that differ from year to year. Because the conflicts between textbooks, teaching materials, and testing have been well resolved, the reliability of the test has been technically upgraded, and the construct validity of the test has been updated, a positive backwash on teaching and learning (Rivers, 1987) has also been gained. Students’ learning motivation has risen, as shown by informal interviews as well as by the increasing number of classes.
C. Recommendations

1. For test designers

The proposal is that test designers should further study the test takers’ competence through the requisite tests to obtain reliable data before working out new interpreting SOPI tests, paying special attention to internal consistency reliability, which has been considered a big problem for all stakeholders.

2. For students

Students should do their best to practice not only in the classroom but also outside it, whenever and wherever they have a chance. They should spend more time in free English-speaking clubs and on group work, which combine learning with entertainment. Improved language proficiency in general, and interpreting skill in particular, is a definite advantage for new job seekers.

3. For researchers

This is just a pilot study with a humble number of respondents, insufficient for full-scale research. There is a need for further research on whether the new test mode of “Recording test” can be employed to replace the current one and on how to get closer to international test criteria.

4. For instructors

Teaching Interpretation 1 seems easier than teaching the higher courses. However, it requires lecturers to pay more attention to helping their learners achieve the learning outcomes set by the course outline. Varied and attractive teaching techniques should be studied and applied to gain better motivation.

D. Limitations

Besides the above-mentioned limitation of a small sample, the study may be seen as a rather distinctive case study of HUFLIT, not widely applicable to other English classrooms. It is also difficult to persuade busy test designers to make an update if they are unwilling to change, especially in testing, a controversial area in Vietnam’s modern universities.
The findings of the pilot study may not be reproduced when it is applied to a bigger population, with different teachers, or in different institutions.

VI. REFERENCES

[1] Bendazzoli, C. & Sandrelli, A. (2011) “Corpus-based Interpreting Studies: Early Work and Future Prospects”, Revista Tradumatica - Traduccio i Technologies de la Informacio i la Communicacio 07: l’aplicació del Corpus Linguistics a la Traduccio. ISSN: 1578-7559.
[2] ERIC Custom Transformations Team (1989) Simulated Oral Proficiency Interviews, p. 2-6, ERIC Digest.
[3] Liskin-Gasparro, J. (1987) “Testing and teaching for oral proficiency”. Boston, MA: Heinle and Heinle Publishers.
[4] Nguyen Duc Chau (2013) Interpretation Test in English in the Selected Universities in Ho Chi Minh City, Dissertation, Tarlac State University, The Philippines.
[5] Nguyen Duc Chau (2016) Teaching Interpretation at universities in VN: Important Steps (presentation), Proceedings of Teaching Interpreting-Translation, HCMC National University publisher, 2016.
[6] Rivers, W. (1987) Interactive Language Teaching, Cambridge: Cambridge University Press.
[7] Saad, Syed et al. (1999) Testing and Assessment: An Employer's Guide to Good Practices, Educational Resources Information Center (ERIC).