Scripts for data acquisition with paper based surveys

SDAPS is a program to create surveys that can be printed out, and then batch scanned and analysed.

With SDAPS, the questionnaire is designed using OpenOffice.org. From the OpenOffice document (ODT), the program creates a PDF that can be printed and handed out to people. After the sheets are filled out, you just need to scan them, and the program will create a report.

If you are interested in using SDAPS, please contact benjamin@sipsolutions.net for more information and help.

Also have a look at the ToDo List.

Getting SDAPS

SDAPS is currently only available via git. You can browse the repository or check it out using the following command:

git clone http://git.sipsolutions.net/sdaps.git

Process

This section shows the process of carrying out a survey, using an example. The example questionnaire is currently only in German, but the process is the same for any language.

Creating the Questionnaire

The questionnaire is created using OpenOffice.org and a special set of styles. Have a look at this document. SDAPS later adds the special marks that enable automatic processing of the scanned data, so only print this document for testing purposes.

You will notice that there are a number of special styles. These styles are later used to extract the needed information from the document. For example, "QObject-Choice" is used for multiple choice questions, while "QObject-Mark" is used for a numerical range (e.g. 1-5).

Initialising the Project

This is the first step that actually requires SDAPS. First of all, export your questionnaire from OpenOffice into a PDF document. After that, run:

$ sdaps project_path setup questionaire.odt questionaire.pdf

You will be presented with the detected headings, questions and answers. It is important that you verify the information that is printed out. It will look something like the following:

Fachschaft Elektro- und Informationstechnik
AG Lernverhalten
Datum: 28.07.2008
Umfrage: Prüfung ES Sommersemester 2008
Questionnaire
1. (Head) Allgemeines
1.1 (Choice) In welchem Studiengang bist Du immatrikuliert? {1}
        0 (Checkbox)  23.0  63.6   3.5   3.5 ETIT
        1 (Checkbox)  52.0  63.6   3.5   3.5 Anderer
1.2 (Choice) In welchem Fachsemester bist Du? {1}
        0 (Checkbox)  23.0  76.4   3.5   3.5 1 – 2
        1 (Checkbox)  52.0  76.4   3.5   3.5 3 – 4
        2 (Checkbox)  81.0  76.4   3.5   3.5 5 – 6
        3 (Checkbox) 110.0  76.4   3.5   3.5 7 – 8
        4 (Checkbox) 139.0  76.4   3.5   3.5 9 – 10
        5 (Checkbox) 168.0  76.4   3.5   3.5 11 und mehr
1.3 (Choice) War für Dich diese Prüfung der 1. Versuch? {1}
        0 (Checkbox)  23.0  89.3   3.5   3.5 Ja
        1 (Checkbox)  52.0  89.3   3.5   3.5 Nein
1.4 (Choice) Ist Deutsch Deine Muttersprache? {1}
        0 (Checkbox)  23.0 102.1   3.5   3.5 Ja
        1 (Checkbox)  52.0 102.1   3.5   3.5 Nein

[[...]]

4. (Head) Sonstiges
4.1 (Text) Kommentare (Was kann die Universität verbessern? Was kann die Fachschaft verbessern?) {2}
        0 (Textbox )  23.0 196.7 174.0  72.0 
5. (Additional_Head) Resultat
5.1 (Additional_Mark) Welche Note hast Du bekommen? {0}
        1 - 5

If something is wrong, double check that the styles are correct, and then recreate the project (just remove the old version; it is a directory).
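
Checking a long listing by eye is error-prone. As a minimal sketch, the shell function below counts the answer boxes under each question, so that a question with missing answers stands out. The names summarize_setup and setup_output.txt are assumptions made for illustration and are not part of SDAPS, and the line patterns are inferred only from the sample listing above, so they may need adjusting for other output.

```shell
#!/bin/sh
# Sketch: summarise a saved setup listing, one line per question.
# summarize_setup is a hypothetical helper, not part of SDAPS; the line
# format it expects is inferred from the sample listing only.

summarize_setup() {
    awk '
        # Question lines look like: 1.2 (Choice) Question text {1}
        /^[0-9]+\.[0-9]+ \(/ {
            if (id != "") print id, kind, boxes
            id = $1; kind = $2; boxes = 0
            next
        }
        # Answer lines are indented: 0 (Checkbox) x y width height label
        /^[ \t]+[0-9]+ \((Checkbox|Textbox) *\)/ { boxes++ }
        # Flush the last question once the input ends.
        END { if (id != "") print id, kind, boxes }
    '
}
```

Assuming the listing was saved to setup_output.txt, running "summarize_setup < setup_output.txt" prints the question number, its type and the number of boxes found, for example "1.2 (Choice) 6" for the semester question above.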

Printing it

To make everything machine readable, corner marks and other information need to be added. In our case we also needed to be able to uniquely identify each questionnaire, so that people could anonymously add more data via the internet at a later point.

You can first create a cover page that just summarizes what the survey is about.

./sdaps.py project_dir cover

This command will create a cover.pdf file inside project_dir.

To now create e.g. 200 unique questionnaires, run:

$ sdaps project_dir stamp 200

A PDF file called stamped_1.pdf is created (the number increases if you rerun the command). This PDF can be printed out. Be careful to print it in duplex mode, and disable scaling if you print with Adobe Acrobat Reader.

An example with just 10 sheets: example-stamped.pdf. As you can see, there is a unique "Fragebogen-ID" (questionnaire ID) on each page.
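
The two printing steps can be wrapped in a small shell function. This is only a sketch under stated assumptions: make_printables and the SDAPS_CMD variable are names invented here, not part of SDAPS, and the function uses nothing beyond the cover and stamp subcommands shown above. SDAPS_CMD defaults to sdaps and can be set to ./sdaps.py if that is how you start the program.

```shell
#!/bin/sh
# Sketch: create the cover page and the stamped questionnaires in one go.
# SDAPS_CMD and make_printables are illustrative names, not part of SDAPS.

SDAPS_CMD="${SDAPS_CMD:-sdaps}"

make_printables() {
    project_dir="$1"
    count="$2"
    # The project is a directory created by "setup"; stop if it is missing.
    if [ ! -d "$project_dir" ]; then
        echo "no such project directory: $project_dir" >&2
        return 1
    fi
    # Stamp only if the cover page was created successfully.
    $SDAPS_CMD "$project_dir" cover &&
    $SDAPS_CMD "$project_dir" stamp "$count"
}
```

For the example above, "make_printables project_dir 200" would run the cover command followed by the stamp command.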

Scanning

The sheets now need to be scanned in. For this you obviously should have a fast duplex scanner. Some notes:

There is also some example data.

The software will likely need some modifications to handle different scanners and settings well. If you have access to a batch scanner, it would be great if you could provide us with some example data and information about the scanner used.

Adding the scanned data

Add the scanned data with:

$ sdaps project_dir add scanned_data.tif

Run the automated recognition

After you have scanned and added all data (you can run "add" as many times as you want), you should run the recognition algorithm.

$ sdaps project_dir recognize

This command analyses all the images and detects which boxes are checked and where text has been written into freeform fields.

Using the Graphical Interface

You can have a look at what the program did, and also correct anything you find, with the graphical user interface. For this just run:

$ sdaps project_dir gui

Running the graphical interface is entirely optional. It should only be necessary if you want to have a look at what is going on and check whether the recognition quality is good enough.

Have a look at screenshot.png.

Creating a Report

As the last step, you can create a report in PDF form.

$ sdaps project_dir report

An example: example-report.pdf
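
The steps that follow scanning (add, recognize, report) can be chained so that a failure stops the run early. The following is a minimal sketch, not an official SDAPS script: evaluate_survey and SDAPS_CMD are hypothetical names, and only the subcommands shown in this document are used.

```shell
#!/bin/sh
# Sketch: add all scans, run the recognition, then build the report.
# SDAPS_CMD and evaluate_survey are illustrative names, not part of SDAPS.

SDAPS_CMD="${SDAPS_CMD:-sdaps}"

evaluate_survey() {
    project_dir="$1"
    shift
    # "add" may be run as often as needed, so call it once per scan file.
    for scan in "$@"; do
        $SDAPS_CMD "$project_dir" add "$scan" || return 1
    done
    # Only create the report if the recognition succeeded.
    $SDAPS_CMD "$project_dir" recognize &&
    $SDAPS_CMD "$project_dir" report
}
```

A call such as "evaluate_survey project_dir batch1.tif batch2.tif" (the file names are placeholders) adds both files before running the recognition and creating the report. The graphical interface is left out because it is interactive and optional.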

Interpreting the Data

Well that is your job :-)

SDAPS has a few more features, such as creating reports for only a subset of the filled out questionnaires, or adding more data from a web form at a later point. But this should be enough to give you an initial impression of what it is all about.

SDAPS (last edited 2009-04-07 16:47:55 by BenjaminBerg)