Scripts for data acquisition with paper-based surveys
SDAPS is a program to create surveys that can be printed out, and then batch scanned and analysed.
With SDAPS the questionnaire is designed using OpenOffice.org. From the OpenOffice document (ODT) the program will create a PDF that can be printed and handed out to people. After the sheets are filled out, you just need to scan them, and the program will create a report.
If you are interested in using SDAPS, please contact benjamin@sipsolutions.net for more information and help.
Also have a look at the ToDo List.
Getting SDAPS
SDAPS is currently only available via git. You can browse the repository or check it out using the following command:
git clone http://git.sipsolutions.net/sdaps.git
Process
This section walks through the process of carrying out a survey, using an example. The example questionnaire is currently only available in German, but the process is the same for any language.
Creating the Questionnaire
The questionnaire is created using OpenOffice.org and a special set of styles. Have a look at this document. SDAPS will add special marks that enable the automatic processing of the scanned data, so only print this document for testing purposes.
You will notice that there are a number of special styles. These styles are later used to extract the needed information from the document. For example, "QObject-Choice" is used for multiple choice questions, while "QObject-Mark" is a numerical range (e.g. 1-5).
Initialising the Project
This is the first step that actually requires SDAPS. First of all, export your questionnaire from OpenOffice into a PDF document. After that, run:
$ sdaps project_path setup questionaire.odt questionaire.pdf
You will be presented with the detected headings, questions and answers. It is important that you verify the information that is printed out. It will look something like the following:
Fachschaft Elektro- und Informationstechnik
AG Lernverhalten
Datum: 28.07.2008
Umfrage: Prüfung ES Sommersemester 2008
Questionnaire
1. (Head) Allgemeines
1.1 (Choice) In welchem Studiengang bist Du immatrikuliert? {1}
0 (Checkbox) 23.0 63.6 3.5 3.5 ETIT
1 (Checkbox) 52.0 63.6 3.5 3.5 Anderer
1.2 (Choice) In welchem Fachsemester bist Du? {1}
0 (Checkbox) 23.0 76.4 3.5 3.5 1 – 2
1 (Checkbox) 52.0 76.4 3.5 3.5 3 – 4
2 (Checkbox) 81.0 76.4 3.5 3.5 5 – 6
3 (Checkbox) 110.0 76.4 3.5 3.5 7 – 8
4 (Checkbox) 139.0 76.4 3.5 3.5 9 – 10
5 (Checkbox) 168.0 76.4 3.5 3.5 11 und mehr
1.3 (Choice) War für Dich diese Prüfung der 1. Versuch? {1}
0 (Checkbox) 23.0 89.3 3.5 3.5 Ja
1 (Checkbox) 52.0 89.3 3.5 3.5 Nein
1.4 (Choice) Ist Deutsch Deine Muttersprache? {1}
0 (Checkbox) 23.0 102.1 3.5 3.5 Ja
1 (Checkbox) 52.0 102.1 3.5 3.5 Nein
[[...]]
4. (Head) Sonstiges
4.1 (Text) Kommentare (Was kann die Universität verbessern? Was kann die Fachschaft verbessern?) {2}
0 (Textbox ) 23.0 196.7 174.0 72.0
5. (Additional_Head) Resultat
5.1 (Additional_Mark) Welche Note hast Du bekommen? {0}
1 - 5
If something is wrong, double-check that the styles are correct, and then recreate the project (the old version is just a directory, so simply remove it).
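When verifying a long listing, it can help to parse the box lines mechanically. The following is a minimal sketch, not part of SDAPS: it assumes the box lines have the form shown above (index, kind in parentheses, four numbers, optional label) and that the four numbers are x, y, width and height. The names `Box`, `parse_box_line` and `parse_boxes` are hypothetical helpers introduced for illustration.

```python
import re
from typing import List, NamedTuple, Optional


class Box(NamedTuple):
    index: int
    kind: str
    x: float       # assumption: horizontal position
    y: float       # assumption: vertical position
    width: float   # assumption: box width
    height: float  # assumption: box height
    label: str


# Index, "(Kind)", four numbers, then an optional label.
BOX_LINE = re.compile(
    r"^\s*(\d+)\s+\((\w+)\s*\)\s+"
    r"([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*(.*)$"
)


def parse_box_line(line: str) -> Optional[Box]:
    """Return a Box for a checkbox/textbox line, or None for any other line."""
    match = BOX_LINE.match(line)
    if match is None:
        return None
    index, kind, x, y, width, height, label = match.groups()
    return Box(int(index), kind, float(x), float(y),
               float(width), float(height), label.strip())


def parse_boxes(text: str) -> List[Box]:
    """Collect all box lines from the printed setup output."""
    boxes = []
    for line in text.splitlines():
        box = parse_box_line(line)
        if box is not None:
            boxes.append(box)
    return boxes
```

Headings and question lines such as "1.1 (Choice) ..." are skipped, so the result contains only the boxes, which can then be compared against the questionnaire.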
Printing it
To make everything machine readable, corner marks and other information need to be added. In our case we also needed to be able to uniquely identify each questionnaire, so that people could anonymously add more data via the internet at a later point.
You can first create a cover page that just summarizes what the survey is about.
./sdaps.py project_dir cover
This command will create a cover.pdf file inside project_dir.
To create, e.g., 200 unique questionnaires, run:
$ sdaps project_dir stamp 200
A PDF file called stamped_1.pdf is created (the number increases each time you rerun the command). Print this PDF in duplex mode, and disable scaling if you print with Adobe Acrobat Reader.
An example with just 10 sheets: example-stamped.pdf.
As you can see, there is a unique "Fragebogen-ID" (questionnaire ID) on each page.
Scanning
The sheets now need to be scanned in. For this you obviously should have a fast duplex scanner. Some notes:
- You do not need to care about rotation or anything, just stuff them all into the scanner and let it do its job
- The software is currently only tested with one particular scanner and certain settings:
  - The scanner (and printer) is a Konica Minolta Bizhub 750
  - 300 dpi resolution (everything else will currently require modifications of hardcoded values)
  - black/white mode that does not do any dithering, but just uses some threshold
- You need to scan into a 1bpp (black/white) multipage TIFF file.
There is also some example data.
The software will likely need some modifications to handle different scanners and settings well. If you have access to a batch scanner, it would be great if you could provide us with some example data and information about the scanner used.
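Before adding a scan, it may be worth checking that the file really matches the settings listed above (1 bit per pixel, 300 dpi). The following is a minimal sketch using only the Python standard library, not part of SDAPS. It reads the BitsPerSample, XResolution and ResolutionUnit tags of every page of a classic (non-BigTIFF) file and trusts the file to be well formed. The function names `tiff_pages` and `check_scan` are hypothetical.

```python
import struct


def tiff_pages(data):
    """Yield (bits_per_sample, dpi) for every page (IFD) of a classic TIFF."""
    if data[:2] == b"II":
        order = "<"  # little-endian
    elif data[:2] == b"MM":
        order = ">"  # big-endian
    else:
        raise ValueError("not a TIFF file")
    magic, offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file")
    while offset:
        (count,) = struct.unpack_from(order + "H", data, offset)
        bits, xres, unit = 1, None, 2  # TIFF defaults: 1 bit, unit = inch
        for i in range(count):
            entry = offset + 2 + 12 * i
            tag, _type, n = struct.unpack_from(order + "HHI", data, entry)
            if tag == 258:  # BitsPerSample; several samples means not bilevel
                if n == 1:
                    (bits,) = struct.unpack_from(order + "H", data, entry + 8)
                else:
                    bits = None
            elif tag == 296:  # ResolutionUnit: 2 = inch, 3 = centimetre
                (unit,) = struct.unpack_from(order + "H", data, entry + 8)
            elif tag == 282:  # XResolution: a RATIONAL stored at an offset
                (ptr,) = struct.unpack_from(order + "I", data, entry + 8)
                num, den = struct.unpack_from(order + "II", data, ptr)
                xres = num / den if den else None
        dpi = None
        if xres is not None:
            if unit == 2:
                dpi = xres
            elif unit == 3:
                dpi = xres * 2.54
        yield bits, dpi
        # The offset of the next IFD follows the last entry; 0 ends the chain.
        (offset,) = struct.unpack_from(order + "I", data, offset + 2 + 12 * count)


def check_scan(data, expected_dpi=300):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for page, (bits, dpi) in enumerate(tiff_pages(data), start=1):
        if bits != 1:
            problems.append("page %d: not 1 bit per pixel" % page)
        if dpi is None or round(dpi) != expected_dpi:
            problems.append("page %d: resolution is %s, expected %d dpi"
                            % (page, dpi, expected_dpi))
    return problems
```

A possible use is `check_scan(open("scanned_data.tif", "rb").read())`, printing any returned messages before the file is added to the project.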
Adding the scanned data
Add the scanned data with:
$ sdaps project_dir add scanned_data.tif
Run the automated recognition
After you have scanned and added all data (you can run "add" as many times as you want), you should run the recognition algorithm.
$ sdaps project_dir recognize
This command analyses all the images, detecting which boxes are checked and where text has been written into freeform fields.
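To give a rough idea of what detecting a checked box can involve, here is a simple illustration. It is not the algorithm SDAPS uses: it merely counts black pixels inside a rectangle of a 1bpp image and compares the fraction against a threshold. The conversion assumes that box positions are given in millimetres and that the scan is 300 dpi; the threshold of 0.2 is an arbitrary value chosen for the example, and all function names are hypothetical.

```python
MM_PER_INCH = 25.4


def mm_to_px(mm, dpi=300):
    """Convert a length in millimetres to pixels (assumption: 300 dpi scan)."""
    return round(mm / MM_PER_INCH * dpi)


def fill_ratio(image, x, y, width, height):
    """Fraction of black pixels (value 1) inside a rectangle given in pixels.

    `image` is a list of rows, each row a list of 0 (white) or 1 (black).
    """
    if width <= 0 or height <= 0:
        raise ValueError("rectangle must have a positive size")
    rows = image[y:y + height]
    black = sum(sum(row[x:x + width]) for row in rows)
    return black / (width * height)


def looks_checked(image, x, y, width, height, threshold=0.2):
    """Guess whether a box is checked; the threshold is an arbitrary example."""
    return fill_ratio(image, x, y, width, height) >= threshold
```

A real implementation also has to cope with the printed outline of the box, skewed or rotated pages and stray marks, which is one reason the results are worth reviewing afterwards.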
Using the Graphical Interface
You can have a look at what the program did, and also correct anything you find, with the graphical user interface. For this just run:
$ sdaps project_dir gui
Running the graphical interface is optional. It should only be necessary if you want to have a look at what is going on, or to check whether the recognition quality is good enough.
Have a look at screenshot.png.
Creating a Report
As the last step, you can create a report in PDF form.
$ sdaps project_dir report
An example: example-report.pdf
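The add, recognize and report steps can be chained in a small script. The following is a minimal sketch that only uses the sub-commands shown above. It assumes that the `sdaps` program is on the PATH; the function names `build_commands` and `run_pipeline` are hypothetical.

```python
import subprocess


def build_commands(project_dir, scans, program="sdaps"):
    """Return the command lines for adding scans, recognizing and reporting."""
    commands = [[program, project_dir, "add", scan] for scan in scans]
    commands.append([program, project_dir, "recognize"])
    commands.append([program, project_dir, "report"])
    return commands


def run_pipeline(project_dir, scans, program="sdaps"):
    """Run each step in order and stop at the first command that fails."""
    for command in build_commands(project_dir, scans, program):
        print("running:", " ".join(command))
        subprocess.run(command, check=True)
```

For example, `run_pipeline("project_dir", ["scanned_data.tif"])` would run the three steps in sequence. Because of `check=True`, a failing step raises `subprocess.CalledProcessError` instead of silently continuing.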
Interpreting the Data
Well, that is your job.
SDAPS has a few more features, such as creating reports for only a subset of the filled-out questionnaires, or adding more data from a web form at a later point. But this should be enough to give you an initial impression of what it is all about.