Scripts for data acquisition with paper based surveys
SDAPS is an open source (GPLv3) program to create surveys that can be printed out, and then batch scanned and analysed.
With SDAPS, the questionnaire is designed using OpenOffice.org. From the OpenDocument text file (ODT), the program creates a PDF that can be printed and handed out to people. After the sheets are filled out, you just need to scan them, and the program will create a report.
If you are interested in using SDAPS, please contact benjamin@sipsolutions.net for more information and help.
Also have a look at the ToDo List.
Features
- Fully automatic recognition of scanned data
- OpenDocument text (ODT) for creating questionnaires
- LaTeX for creating questionnaires
- Supports any paper size
- Multipage questionnaires (1, 2, 4 or 6 pages)
- Different kinds of questions:
  - A mark type question (value from 1 to 5)
  - A choice of many, that may also include freeform fields
  - Freeform fields
- Creation of PDF reports for printout
  - Also supports creating reports of only partial result sets with arbitrary filters
- Export of data to CSV files for further analysis (excluding image data)
- Import of additional results from other sources. With this it is, for example, possible to merge data acquired via a webpage at a later point.
- A GUI application to check the recognition and correct errors
Getting SDAPS
SDAPS is currently only available via git. You can browse the repository or check it out using the following command:
git clone http://git.sipsolutions.net/sdaps.git
Process
This section walks through the process of carrying out a survey, using an example. The example questionnaire is currently only available in German, but the process is the same for any language.
Creating the Questionnaire
The questionnaire is created using OpenOffice.org and a special set of styles. Have a look at this document. Special marks to enable the automatic processing of the scanned data will be added by SDAPS, so only print this document for testing purposes.
You will notice that there are a number of special styles. These styles are later used to extract the needed information from the document. For example, "QObject-Choice" is used for multiple choice questions, while "QObject-Mark" is a numerical range (e.g. 1-5).
Initialising the Project
This is the first step that actually requires SDAPS. First of all, export your questionnaire from OpenOffice.org to a PDF document. After that, run:
$ sdaps project_path setup questionaire.odt questionaire.pdf
You will be presented with the detected headings, questions and answers. It is important that you verify the information that is printed out. It will look something like the following:
Fachschaft Elektro- und Informationstechnik
AG Lernverhalten
Datum: 28.07.2008
Umfrage: Prüfung ES Sommersemester 2008
Questionnaire
1. (Head) Allgemeines
1.1 (Choice) In welchem Studiengang bist Du immatrikuliert? {1}
0 (Checkbox) 23.0 63.6 3.5 3.5 ETIT
1 (Checkbox) 52.0 63.6 3.5 3.5 Anderer
1.2 (Choice) In welchem Fachsemester bist Du? {1}
0 (Checkbox) 23.0 76.4 3.5 3.5 1 – 2
1 (Checkbox) 52.0 76.4 3.5 3.5 3 – 4
2 (Checkbox) 81.0 76.4 3.5 3.5 5 – 6
3 (Checkbox) 110.0 76.4 3.5 3.5 7 – 8
4 (Checkbox) 139.0 76.4 3.5 3.5 9 – 10
5 (Checkbox) 168.0 76.4 3.5 3.5 11 und mehr
1.3 (Choice) War für Dich diese Prüfung der 1. Versuch? {1}
0 (Checkbox) 23.0 89.3 3.5 3.5 Ja
1 (Checkbox) 52.0 89.3 3.5 3.5 Nein
1.4 (Choice) Ist Deutsch Deine Muttersprache? {1}
0 (Checkbox) 23.0 102.1 3.5 3.5 Ja
1 (Checkbox) 52.0 102.1 3.5 3.5 Nein
[[...]]
4. (Head) Sonstiges
4.1 (Text) Kommentare (Was kann die Universität verbessern? Was kann die Fachschaft verbessern?) {2}
0 (Textbox ) 23.0 196.7 174.0 72.0
5. (Additional_Head) Resultat
5.1 (Additional_Mark) Welche Note hast Du bekommen? {0}
1 - 5
If something is wrong, double-check that the styles are correct, and then recreate the project (just remove the old version; it is a directory).
Printing it
To make everything machine readable, corner marks and other information need to be added. In our case we also needed to be able to uniquely identify each questionnaire, so that people could anonymously add more data via the internet at a later point.
You can first create a cover page that just summarizes what the survey is about.
./sdaps.py project_dir cover
This command will create a cover.pdf file inside project_dir.
To now create, e.g., 200 unique questionnaires, run:
$ sdaps project_dir stamp 200
A PDF file called stamped_1.pdf is created (the number increases if you rerun the command). This PDF can be printed out. Be careful to print it in duplex mode, and disable scaling if you print with Adobe Acrobat Reader (recognition will still work, with adjustments, if scaling has happened).
You can also just run:
$ sdaps project_dir stamp
This creates a one-page PDF that can be printed as many times as you want. The unique questionnaires described above can be used to add more data via a web form later on (by telling people to write down the ID of their page).
An example with just 10 sheets: example-stamped.pdf.
As you can see there is a unique "Fragebogen-ID" on each page.
Scanning
The sheets now need to be scanned in. For this you obviously should have a fast duplex scanner. Some notes:
- You do not need to care about rotation or anything, just stuff them all into the scanner and let it do its job
- Pages need to be scanned at 300 dpi (anything else needs changes to hardcoded values)
- You need to scan into a 1bpp (black/white) multipage tiff file.
- There should be no dithering for grey values; instead the scanner should use a threshold.
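The difference between thresholding and dithering can be illustrated with a minimal sketch. This is not SDAPS code: the function name `threshold_to_bilevel` and the cut-off value of 150 are assumptions chosen for illustration only. Each grey value is mapped to pure black or pure white by a single comparison, so solid pen strokes stay solid instead of being broken up into dot patterns.

```python
def threshold_to_bilevel(gray_rows, threshold=150):
    """Map 8-bit grey values to 1 bpp: 1 = black (ink), 0 = white (paper).

    Hypothetical helper; the default threshold is an illustrative value,
    not a setting taken from SDAPS.
    """
    return [[1 if value < threshold else 0 for value in row] for row in gray_rows]


# A tiny 2x4 "scan": dark pen strokes (low values) on light paper (high values).
scan = [
    [250, 30, 140, 245],
    [200, 160, 20, 255],
]
bilevel = threshold_to_bilevel(scan)
for row in bilevel:
    print(row)
```

A dithering scanner would instead render the mid-grey pixels as a scatter of black and white dots, which makes the fraction of dark pixels in a checkbox much harder to interpret.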
There is also some example data.
The software will likely need some modifications to handle different scanners and settings well. If you have access to a batch scanner it would be great if you can provide us with some example data and information about the used scanner.
Scanners tested so far:
- Konica Minolta Bizhub 750. Works fine.
- Canon DR-2510C. Scanning can be done with:
  scanimage -d canon_dr:libusb:002:003 --source "ADF Duplex" --mode Lineart --resolution 300 -l 0 -t 0 -x 210 -y 297 --page-height 297 --batch='out%05d.pnm' --batch-count=10 --threshold 150 --brightness -40
  Other options:
  checkbox: 0.35 < coverage < 0.65
  textbox: padding = 2.0
  The textbox padding needs to be increased because the scanner distorts the position on the paper quite a bit. It can happen that, e.g., the left side of the page is higher than the right side.
- Sharp MX-M753U. Works fine.
For any scanner, you should use the boxgallery tool to adjust the threshold for checkbox recognition before running recognize.
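To give an idea of what a coverage threshold for checkboxes means, here is a minimal sketch. It assumes that "coverage" is the fraction of dark pixels inside the box and that a box counts as checked when this fraction lies strictly between a lower and an upper bound (0.35 and 0.65, the values noted for the Canon DR-2510C). The functions `coverage` and `is_checked` are hypothetical and do not reflect how SDAPS implements recognition.

```python
def coverage(box_pixels):
    """Fraction of dark pixels (1 = black) in a bilevel box region."""
    total = sum(len(row) for row in box_pixels)
    dark = sum(sum(row) for row in box_pixels)
    return dark / total if total else 0.0


def is_checked(box_pixels, lower=0.35, upper=0.65):
    """Assumed rule: checked only if the coverage lies between both bounds."""
    return lower < coverage(box_pixels) < upper


empty_box = [[0, 0, 0, 0] for _ in range(4)]   # nothing drawn
crossed_box = [                                 # an "X" drawn in the box
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
]
filled_box = [[1, 1, 1, 1] for _ in range(4)]   # completely blacked out

for name, box in [("empty", empty_box), ("crossed", crossed_box), ("filled", filled_box)]:
    print(name, coverage(box), is_checked(box))
```

Under this assumed rule, a box that is completely blacked out is not counted as checked. Whether that matches the behaviour you want depends on your scanner and settings, which is why the bounds should be tuned on real scans.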
Adding the scanned data
Add the scanned data with:
$ sdaps project_dir add scanned_data.tif
Run the automated recognition
After you have scanned and added all data (you can run "add" as many times as you want), you should run the recognition algorithm.
$ sdaps project_dir recognize
This command analyses all the images, detects where boxes are checked and text has been written into freeform fields.
Using the Graphical Interface
You can have a look at what the program did, and also correct anything you find, with the graphical user interface. For this just run:
$ sdaps project_dir gui
Running the graphical interface is optional. It should only be necessary if you want to have a look at what is going on, and to check whether the recognition quality is good enough.
Have a look at the screenshot.png.
Creating a Report
As the last step, you can create a report in PDF form.
$ sdaps project_dir report
An example: example-report.pdf
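The commands used in this walkthrough can be collected in a small driver script. The following is only a sketch: the helpers `sdaps_commands` and `run_all` are hypothetical, the file names are the ones used in the examples above, and it assumes that an `sdaps` executable is on your PATH. By default it only prints the commands (dry run). In practice, printing and scanning happen between the stamp and add steps, so you would run the two halves separately.

```python
import subprocess


def sdaps_commands(project_dir, odt, pdf, scans, copies=None):
    """Return the sdaps invocations for one survey, in walkthrough order."""
    stamp = ["sdaps", project_dir, "stamp"]
    if copies:
        stamp.append(str(copies))  # e.g. 200 unique questionnaires
    commands = [["sdaps", project_dir, "setup", odt, pdf], stamp]
    # "add" may be run as many times as there are scanned files.
    commands += [["sdaps", project_dir, "add", scan] for scan in scans]
    commands += [
        ["sdaps", project_dir, "recognize"],
        ["sdaps", project_dir, "report"],
    ]
    return commands


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failing step


plan = sdaps_commands(
    "project_dir",
    "questionaire.odt",
    "questionaire.pdf",
    ["scanned_data.tif"],
    copies=200,
)
run_all(plan)  # dry run: only prints the five commands
```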
Interpreting the Data
Well, that is your job.
SDAPS has a few more features, such as creating reports for only a subset of the filled-out questionnaires, or adding more data from a web form at a later point. But this should be enough to give you an initial impression of what it is all about.
Troubleshooting
Please see the documentation in the Git repository, or write an email to the authors.