Stepping on Rakes: 10 Critical Mistakes in Designing Knowledge Tests
Before enrollment opens for the new Machine Learning Advanced course, we test prospective students to determine how ready they are and to understand what exactly we need to offer them to prepare for the course. But a dilemma arises: on the one hand, we have to test their knowledge of Data Science; on the other hand, we cannot arrange a full 4-hour exam.
To solve this problem, we set up a TestDev headquarters right inside the Data Science course development team (and it looks like this is just the beginning). Here is a list of 10 “rakes” that people step on when developing knowledge assessment tests. Hopefully, the online learning world will be a little better after that.
Rake 1: Not clearly defining the purpose of testing
To define the goals correctly and build a test that takes them into account, at the planning stage we must answer a few questions for ourselves:
- What do we actually check?
- What environment will the testing take place in, and what mechanics will be used? What are the limitations of this environment? This question also clarifies the technical requirements for the device on which the test will be taken and for the content (if the test is taken on phones, pictures should be readable even on a small screen, it should be possible to enlarge them, etc.).
- How long will the testing take? You need to think about the conditions under which the user will take the test. Could the user need to interrupt the test and then resume it?
- Will there be any feedback? How do we form and deliver it? What does the test taker need to receive? Is there a time lag between completing the test and getting feedback?
In our case, having answered these questions, we defined the following list of goals for the test:
- The test should show whether future students are ready to take the course, that is, whether they have enough knowledge and skills.
- The test should give us material for feedback and indicate the topics in which students made mistakes, so that they can improve their knowledge. We describe how to compose this feedback below.
Rake 2: Not drawing up a technical specification for the expert who writes the test
When preparing test items, it is very important to involve an expert in the field in which knowledge is being tested. The expert, in turn, needs a competent technical specification that lists the test topics, the knowledge and skills to be verified, and their level.
An expert will not write such a specification for himself, because his job is to come up with tasks, not a test structure. Moreover, so far few people develop tests professionally, even in teaching. This is taught as a separate discipline, psychometrics.
If you want to get acquainted with psychometrics quickly, there is a summer school in Russia for everyone interested. For more in-depth study, the Institute of Education offers master’s and postgraduate programs.
When preparing the specification, we put together a detailed description of the test for the expert (or better, together with him): the topics of the tasks, the types of tasks, and their number.
How to choose the type of tasks: having decided on the topics, we decide which kinds of tasks will test them best. Classic options are open-answer tasks, single- or multiple-choice tasks, matching tasks, etc. (don’t forget about the technical limitations of the testing environment!). Once the task types are defined and written down, we have a ready-made technical specification for the expert. You can call it a test specification.
Rake 3: Not Involving an Expert in Test Development
When bringing an expert into test development, it is very important not only to outline the “scope of work” for him, but also to involve him in the development procedure itself.
How to make working with an expert as efficient as possible:
- Prepare the expert in advance and spend some time talking about the science of test development, psychometrics.
- Focus the expert’s attention on creating a valid and reliable assessment tool, not a list of questions.
- Explain that his work includes a preparatory stage, not only the development of the tasks themselves.
Some experts (by their nature) may perceive this as a check of their own work, so we explain to them that even great assignments may simply not suit the goals of a specific test.
To speed up the process, we prepare a topic coverage table (knowledge and skills) together with the expert; it is part of the test specification. It is this table that lets us work out the questions precisely and determine what we will measure. In each case, it can be composed a little differently. Our task is to check how well a person has mastered the knowledge and skills of the previous, basic courses, in order to understand how ready he is to study in the new course.
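A topic coverage table like the one described above can be kept as plain data and checked with a few lines of code. The sketch below is only an illustration: the topics, skills, task types, and counts are hypothetical and are not taken from our actual test specification.

```python
from collections import Counter

# Hypothetical specification rows: (topic, skill, task type, number of tasks).
# All values are illustrative, not the real test specification.
COVERAGE = [
    ("Python basics", "read and predict code output", "single choice", 3),
    ("Statistics", "interpret a confidence interval", "single choice", 2),
    ("Linear models", "choose a regularization approach", "multiple choice", 2),
    ("Model validation", "spot data leakage in a pipeline", "open answer", 1),
]


def tasks_per_topic(coverage):
    """Sum the planned number of tasks for each topic."""
    totals = Counter()
    for topic, _skill, _task_type, count in coverage:
        totals[topic] += count
    return dict(totals)


def uncovered_topics(coverage, required_topics):
    """Return required topics that have no planned tasks yet."""
    covered = {topic for topic, _skill, _task_type, count in coverage if count > 0}
    return sorted(set(required_topics) - covered)


if __name__ == "__main__":
    print(tasks_per_topic(COVERAGE))
    print(uncovered_topics(COVERAGE, ["Python basics", "Statistics", "SQL"]))
```

Keeping the table as data makes it easy to see, together with the expert, which required topics still have no tasks before anyone starts writing questions.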
Rake 4: Thinking The Expert “Knows Best”
The expert does know the subject best, but does not always explain it clearly. It is very important to check the wording of the assignments and to write clear instructions, such as “Pick 1 Correct Option”. In 90% of cases, experts write questions in a way that they themselves understand. And that’s okay. But before giving the test to those who will take it, you need to check and polish everything so that test takers understand exactly what is required of them and do not make mistakes just because they misinterpreted the text of the assignment.
To avoid ambiguous assignments, we run “cognitive labs”. We ask people from the target audience to take the test while saying out loud what they are thinking, and we record it in detail. Cognitive labs let you “catch” incomprehensible questions and poor wording and get the first feedback on the test.
Rake 5: Ignoring Test Run Time
sarcasm mode: on
Of course, our test is the best, everyone dreams of passing it! Yes, all 4 hours.
sarcasm mode: off
When you have a list of everything you could check, the main thing is not to check all of it (at first glance it sounds strange, doesn’t it?). You have to cut mercilessly, working with the expert to single out the key knowledge and skills (yes, a number of skills can also be checked in a test). We look at the task types and estimate the target completion time: if it still exceeds reasonable limits, we cut further!
To reduce the volume, you can also (carefully) try to test two skills in one task. In this case, it is harder to understand why the person got it wrong, but if done correctly, both skills can be taken into account. It is important to make sure these 2 skills belong to the same area of expertise.
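The "estimate the time, then cut" step can be sketched in code. This is a minimal illustration under stated assumptions: the minutes per task type and the task priorities are made-up numbers, and the greedy selection is just one simple way to trim a list, not the procedure we actually followed.

```python
# Assumed average minutes per task type; real values should come from pilot runs.
MINUTES_PER_TYPE = {"single choice": 1.0, "multiple choice": 1.5, "open answer": 4.0}


def estimated_minutes(task_types):
    """Estimate the total run time for a list of task types."""
    return sum(MINUTES_PER_TYPE[task_type] for task_type in task_types)


def trim_to_budget(tasks, budget_minutes):
    """Greedily keep the highest-priority tasks that fit into the time budget.

    `tasks` is a list of (name, task_type, priority) tuples; a higher
    priority means the skill is more important to check.
    """
    kept, used = [], 0.0
    for name, task_type, _priority in sorted(tasks, key=lambda task: -task[2]):
        cost = MINUTES_PER_TYPE[task_type]
        if used + cost <= budget_minutes:
            kept.append(name)
            used += cost
    return kept, used


if __name__ == "__main__":
    draft = [
        ("leakage", "open answer", 3),
        ("p-value", "single choice", 2),
        ("regularization", "multiple choice", 1),
    ]
    print(trim_to_budget(draft, budget_minutes=6))
```

The priorities are where the expert's judgment comes in: the code only enforces the time limit, it does not decide which skills matter.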
Rake 6: Not thinking through the scoring system
Assessment tests often use a classic points system, for example, 1 point for easy tasks and 2 points for difficult ones. But it is not universal. The sum of points alone tells us little: we do not know which tasks earned these points, and we can only determine the number of correct tasks. We need an accurate understanding of which skills test takers demonstrate. In addition, we want to give them feedback on which topics need more work.
After all, we are building a test that divides people into those who are ready for the program and those who are not, and we will advise some of them to prepare for the course with free training. It is important to us that this group includes only those who really need the training and are ready for it.
What we do in our situation: within the working group of test developers, we determine which groups of people need to be distinguished (for example, ready for training, partially ready), and we draw up a table of characteristics for these groups, indicating which skills and knowledge are relevant for the group that is ready for training. This is how you can set the “difficulty” of tasks for such tests.
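To make the difference between a bare sum of points and a skill-based profile concrete, here is a minimal sketch. The task-to-skill mapping, the group names, and the thresholds are all hypothetical; a real table of group characteristics would be agreed on by the working group.

```python
from collections import defaultdict

# Hypothetical mapping of task id -> skill it checks.
TASK_SKILL = {
    "q1": "python", "q2": "python",
    "q3": "statistics", "q4": "statistics",
    "q5": "linear models",
}

# Hypothetical group profiles: minimum share of correct answers per skill.
# Groups are listed from the most demanding to the least demanding.
GROUP_PROFILES = [
    ("ready", {"python": 1.0, "statistics": 0.5, "linear models": 1.0}),
    ("partially ready", {"python": 0.5, "statistics": 0.5, "linear models": 0.0}),
]


def skill_shares(results):
    """Turn {task_id: is_correct} into {skill: share of correct answers}."""
    correct, total = defaultdict(int), defaultdict(int)
    for task_id, is_correct in results.items():
        skill = TASK_SKILL[task_id]
        total[skill] += 1
        correct[skill] += int(is_correct)
    return {skill: correct[skill] / total[skill] for skill in total}


def assign_group(shares):
    """Return the first group whose thresholds are all met, else 'not ready'."""
    for name, thresholds in GROUP_PROFILES:
        if all(shares.get(skill, 0.0) >= minimum for skill, minimum in thresholds.items()):
            return name
    return "not ready"
```

With these assumed thresholds, two people with the same number of correct answers can land in different groups: three correct answers in q1, q2, and q5 give "not ready" (no statistics), while three correct answers in q1, q3, and q4 give "partially ready".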
Rake 7: Evaluating results only automatically
Of course, the assessment should be as objective as possible, so some of the students’ work is assessed automatically, “by keys”, that is, by comparison with the correct answers. Even if there is no dedicated testing system, there are tons of free solutions. And if you understand the basics of scripting, you can do whatever you want with Google Forms and the results in spreadsheets. If some of the tasks are checked by experts, we need to think through how answers are delivered to the experts without information about who submitted them, and how to integrate the results of the expert review into the final assessment.
Initially, we wanted to include several open-ended coding tasks that experts would evaluate against predefined criteria. We even prepared a system that exports individual answers of test participants to a special table for experts and then imports the results into a table that calculates the score. But after discussions with representatives of the target audience, a product manager, and an instructional designer, we decided that a technical interview with instant feedback from an expert and a discussion of the code, as well as of individual questions, would be much more effective and useful for the participants themselves.
Now the expert verifies the test results, clarifying some of the questions. For this, we prepared a question guide and assessment criteria for the technical interview. Before the technical interview, the examiner receives the test taker’s answer sheet in order to select the questions to ask.
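Both ideas from this paragraph, checking "by keys" and passing answers to experts without identifying the author, can be sketched in a few lines. This is an assumption-laden illustration, not the system we built: the answer keys, the task ids, and the use of a salted hash as a pseudonym are all hypothetical choices.

```python
import hashlib

# Hypothetical answer keys for automatically checked tasks.
ANSWER_KEYS = {"q1": "b", "q2": "a,c", "q3": "42"}


def normalize(answer):
    """Lowercase, drop spaces, and sort multi-choice options."""
    parts = [part.strip().lower() for part in str(answer).split(",")]
    return ",".join(sorted(part for part in parts if part))


def grade_by_keys(answers):
    """Return {task_id: True/False} for every task that has a key."""
    return {
        task_id: normalize(answers.get(task_id, "")) == normalize(key)
        for task_id, key in ANSWER_KEYS.items()
    }


def blind_review_rows(submissions, open_tasks, salt):
    """Build rows for experts with a pseudonymous alias instead of the email.

    `submissions` maps an email to that person's answers. The salt stays
    with the test team so that expert scores can be matched back later.
    """
    rows = []
    for email, answers in submissions.items():
        alias = hashlib.sha256((salt + email).encode("utf-8")).hexdigest()[:10]
        for task_id in open_tasks:
            rows.append({"alias": alias, "task": task_id, "answer": answers.get(task_id, "")})
    return rows
```

The same alias can later be recomputed from the email and the salt, which is one simple way to merge expert scores back into the final assessment.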
Rake 8: Don’t Explain Test Results
Providing feedback to participants is a separate issue. We need not only to report the test score, but also to help participants understand their results.
Feedback can include:
- The tasks in which the participant made a mistake and the ones he completed correctly.
- Topics in which the participant made mistakes.
- His ranking among those who took the test.
- A description of the participant’s level, matched, for example, against descriptions of specialist levels (based on job postings).
During the pilot launch of our test, we showed those who wanted to enter the program a list of topics that need improvement along with their results. This is certainly not ideal, and we will keep improving the feedback.
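Feedback in the form of a list of topics to improve, as in our pilot, can be generated directly from the results. The sketch below assumes a hypothetical mapping of tasks to topics and a made-up message format; it only shows the idea.

```python
# Hypothetical mapping of task id -> topic, in test order.
TASK_TOPIC = {
    "q1": "Python basics",
    "q2": "Statistics",
    "q3": "Statistics",
    "q4": "Linear models",
}


def topics_to_improve(results):
    """List topics with at least one wrong answer, in test order, no repeats."""
    topics = []
    for task_id, topic in TASK_TOPIC.items():
        if not results.get(task_id, False) and topic not in topics:
            topics.append(topic)
    return topics


def feedback_text(results):
    """Compose a short feedback message for the test taker."""
    correct = sum(bool(results.get(task_id, False)) for task_id in TASK_TOPIC)
    lines = [f"Correct answers: {correct} of {len(TASK_TOPIC)}."]
    topics = topics_to_improve(results)
    if topics:
        lines.append("Topics to review before the course:")
        lines.extend(f"- {topic}" for topic in topics)
    else:
        lines.append("No gaps found in the tested topics.")
    return "\n".join(lines)


if __name__ == "__main__":
    print(feedback_text({"q1": True, "q2": False, "q3": False, "q4": True}))
```

Unanswered tasks are treated as wrong here, which is a design choice worth stating to the test taker.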
Rake 9: Not discussing the test with the developers
Perhaps the sharpest rake, and an especially unpleasant one to step on, is sending the developers a test, a description, and a scoring scale in an “as is” state.
What exactly needs to be discussed:
- The appearance of the questions, the structure, the position of the graphics, and how selecting the correct answer looks.
- How the score is calculated (if needed) and whether there are any additional conditions.
- How the feedback is formed, where to get the texts, and whether there are any additional automatically generated blocks.
- What additional information needs to be collected and at what point (for example, contact details).
To avoid misunderstandings, we ask our developers to code 2 or 3 different questions so that we can see how they look before the test itself is programmed.
Rake 10: Shipping straight to production without testing
3 times, guys: different people should check the test 3 times, or better yet, 3 times each. This truth was earned with blood, sweat, and pixels of lines of code.
Our test is checked by the following trio:
- Product manager: checks that the test works, its appearance, and its mechanics.
- Test developer: checks the text of the tasks, their order, the form of working with the test, the types of tasks, the correct answers, and the readability and proper display of graphics.
- Task author (expert): checks the correctness of the test from an expert’s standpoint.
An example from practice: only on the third run did the task author notice that 1 task was still in the old version of its wording. Edits had been made actively during all the previous runs, too. Also, once the test is coded, it looks different from what was originally imagined. Most likely, something will have to be edited. This must be taken into account.
Having carefully sidestepped all these “rakes”, we created a special Telegram bot to test the knowledge of applicants. Anyone can try it while we prepare the next article, in which we will describe what happened inside the bot and what it all turned into later.
You can get an in-demand profession from scratch or level up your skills and salary by taking SkillFactory online courses:
- Machine Learning Course (12 weeks)
- Advanced Course “Machine Learning Pro + Deep Learning” (20 weeks)
- Course “Mathematics and Machine Learning for Data Science” (20 weeks)
- Teaching the Data Science profession from scratch (12 months)
- Online bootcamp for Data Science (14 weeks)
- Data Analytics Online Bootcamp (5 weeks)
- Analytics profession with any starting level (18 months)
- Data Analytics Course (6 months)
- DevOps course (12 months)
- Profession Web developer (8 months)
- Python for Web Development Course (9 months)
- The profession of iOS developer from scratch (12 months)
- The profession of Android developer from scratch (18 months)
- The profession of Java developer from scratch (18 months)
- UX designer profession from scratch (9 months)
- Web designer profession (7 months)