Cambridge Assessment started testing Adobe PDF Extract API on the most standardised type of exam question: multiple choice. Each multiple-choice question starts with the context of the question, which may include several statements, figures or tables. This is followed by the question prompt and four possible responses.
Adobe PDF Extract API extracts the content of a PDF into JSON format and also provides associated PNG files for images and CSV files for tables. Cambridge Assessment created a post-processing pipeline that applies a set of logic rules to the JSON output to separate each question into the context, prompt and responses.
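The kind of rule such a pipeline might apply can be sketched in a few lines of Python. This is an illustration only, not Cambridge Assessment's actual logic: it assumes each extracted element is a dictionary with a `Text` field, that responses are labelled A to D at the start of a line, and that the prompt is the paragraph immediately before response A. The function name `split_question` and the sample question are hypothetical.

```python
import re

# Assumed convention: a response starts with a label A-D, then a space, dot or bracket.
RESPONSE_RE = re.compile(r"^([A-D])[\s.)]+(\S.*)$")


def label_of(text):
    """Return the response label (A-D) a line starts with, or None."""
    match = RESPONSE_RE.match(text)
    return match.group(1) if match else None


def split_question(elements):
    """Split one question's text elements into context, prompt and responses."""
    texts = [e["Text"].strip() for e in elements if e.get("Text", "").strip()]
    labels = [label_of(t) for t in texts]

    # Find the last run of four consecutive lines labelled A, B, C, D.
    # Requiring the full run avoids mistaking "A car travels..." for a response.
    start = None
    for i in range(len(texts) - 3):
        if labels[i:i + 4] == ["A", "B", "C", "D"]:
            start = i
    if start is None or start == 0:
        raise ValueError("could not find a prompt followed by responses A-D")

    responses = {
        labels[i]: RESPONSE_RE.match(texts[i]).group(2)
        for i in range(start, start + 4)
    }
    return {
        "context": texts[:start - 1],   # everything before the prompt
        "prompt": texts[start - 1],     # the line just before response A
        "responses": responses,
    }


# Hypothetical elements, shaped like a simplified extraction result.
sample = [
    {"Text": "A car accelerates uniformly from rest at 4 m/s²."},
    {"Text": "What is its speed after 2 s?"},
    {"Text": "A 4 m/s"},
    {"Text": "B 8 m/s"},
    {"Text": "C 10 m/s"},
    {"Text": "D 12 m/s"},
]
result = split_question(sample)
print(result["prompt"])
print(result["responses"])
```

A real pipeline would also need rules for figures, tables and questions that span pages, which this sketch leaves out.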
Once a question has been converted to QTI XML format, a subject expert checks it for errors and adds metadata to help categorise it according to what it is testing, its difficulty level and other such information. Finally, the questions are stored in a content bank for use by customer-facing products and services.
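As a rough illustration of the QTI XML step, the sketch below writes a separated question out as a minimal multiple-choice item. It assumes QTI 2.1 element names (`assessmentItem`, `choiceInteraction`, `simpleChoice`); the QTI version is not stated here, and the function name `to_qti_item`, the identifier and the sample question are hypothetical. No correct answer or metadata is recorded, since those come from the subject expert's review.

```python
import xml.etree.ElementTree as ET

# Namespace for QTI 2.1 (an assumption; the QTI version used is not stated).
QTI_NS = "http://www.imsglobal.org/xsd/imsqti_v2p1"


def to_qti_item(identifier, question):
    """Build a minimal QTI-style multiple-choice item and return it as a string."""
    item = ET.Element("assessmentItem", {
        "xmlns": QTI_NS,
        "identifier": identifier,
        "title": identifier,
        "adaptive": "false",
        "timeDependent": "false",
    })
    # Declares the candidate's response: a single choice identifier.
    ET.SubElement(item, "responseDeclaration", {
        "identifier": "RESPONSE",
        "cardinality": "single",
        "baseType": "identifier",
    })
    body = ET.SubElement(item, "itemBody")
    for paragraph in question["context"]:
        ET.SubElement(body, "p").text = paragraph
    interaction = ET.SubElement(body, "choiceInteraction", {
        "responseIdentifier": "RESPONSE",
        "shuffle": "false",
        "maxChoices": "1",
    })
    ET.SubElement(interaction, "prompt").text = question["prompt"]
    for label, text in question["responses"].items():
        ET.SubElement(interaction, "simpleChoice", {"identifier": label}).text = text
    return ET.tostring(item, encoding="unicode")


# Hypothetical question, already separated into context, prompt and responses.
question = {
    "context": ["A car accelerates uniformly from rest at 4 m/s²."],
    "prompt": "What is its speed after 2 s?",
    "responses": {"A": "4 m/s", "B": "8 m/s", "C": "10 m/s", "D": "12 m/s"},
}
xml_text = to_qti_item("q-0001", question)
print(xml_text)
```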
Before working with Adobe PDF Extract API, entering questions into the content bank was a tedious, manual process. Cambridge Assessment hired temporary workers to either retype questions or copy and paste content from PDF files into QTI XML format files. This approach was slow and prone to errors.
“By using Adobe PDF Extract API to automate how we pull questions into the content bank, we will save over 2,000 days of labour for every 50,000 questions harvested and eliminate the costs of hiring temporary workers for data entry,” says Child. “With the time saved, we will be able to harvest more questions and build a much richer content bank.”