PrefLib Format

PrefLib is about sharing data representing preferences. We have attempted to unify the formatting of the data as much as possible. Below, we detail the different aspects of the hosted data: the database structure, the data types, the file format, and the metadata.

Database Structure

Our database gathers several datasets. A dataset is a collection of data files that are related to one another (for instance, different years of the same election). The data files represent preferences expressed in different formats, corresponding to the data types we host.

Datasets are also classified by tags, each of which a given dataset may or may not carry.

Data Types

Each data file has a data type, which represents the type of preferences expressed in the file. These types are also used as file extensions. We review each of them below.

SOC - Strict Orders - Complete List

The SOC extension contains preferences represented by a strict and complete linear order (a transitive and asymmetric relation) over the alternatives. They are complete in the sense that every order contains the whole set of alternatives, and strict in the sense that no two alternatives can be tied.

SOI - Strict Orders - Incomplete List

The SOI extension contains preferences represented by a strict and possibly incomplete linear order (a transitive and asymmetric relation) over the alternatives. They are possibly incomplete in the sense that some preferences might not contain the whole set of alternatives, and strict in the sense that no two alternatives can be tied.

TOC - Orders with Ties - Complete List

The TOC extension contains preferences represented by a transitive and complete relation over the alternatives. They are complete in the sense that every preference contains the whole set of alternatives. They need not be strict: several alternatives can be tied.

TOI - Orders with Ties - Incomplete List

The TOI extension contains preferences represented by a transitive and possibly incomplete relation over the alternatives. They are possibly incomplete in the sense that some preferences might not contain the whole set of alternatives. They also need not be strict: several alternatives can be tied.
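The four ordinal types differ only in two properties: whether ties occur and whether every alternative appears. As an illustration, here is a minimal Python sketch that reports the most specific ordinal type a single preference fits. The representation (a list of indifference classes, best first, with alternatives numbered 1 to m) and the function name `classify_order` are our own assumptions for this example, not part of PrefLib or any PrefLib tool.

```python
from typing import FrozenSet, List

# Hypothetical representation: a list of indifference classes, best first.
# [{3}, {1, 2, 4}] means 3 is preferred to 1, 2 and 4, which are tied.
Order = List[FrozenSet[int]]


def classify_order(order: Order, num_alternatives: int) -> str:
    """Return the most specific type ("soc", "soi", "toc" or "toi") for an
    order over alternatives assumed to be numbered 1..num_alternatives."""
    if any(len(cls) == 0 for cls in order):
        raise ValueError("an indifference class cannot be empty")
    ranked = [a for cls in order for a in cls]
    if len(ranked) != len(set(ranked)):
        raise ValueError("an alternative appears more than once")
    if not set(ranked) <= set(range(1, num_alternatives + 1)):
        raise ValueError("unknown alternative")
    strict = all(len(cls) == 1 for cls in order)   # no ties at all
    complete = len(ranked) == num_alternatives     # every alternative ranked
    if strict:
        return "soc" if complete else "soi"
    return "toc" if complete else "toi"


print(classify_order([frozenset({3}), frozenset({1, 2, 4})], 4))  # toc
```

Since the conditions only get weaker from SOC to TOI, an order reported as "soc" would also satisfy the SOI, TOC and TOI definitions; the sketch simply returns the tightest one.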

CAT - Categorical Preferences

Files with a CAT extension describe categorical preferences. In this domain, voters are asked to place all the alternatives into predetermined categories, for instance the categories “Yes”, “Maybe”, and “No”. An underlying ranking over the categories determines the voters' preferences. Note that these preferences are closely related to the ordinal ones described above, except that they allow some categories to be empty.
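One way to make the relationship with ordinal preferences concrete: if alternatives in the same category are treated as tied and empty categories are dropped, a categorical preference becomes a complete order with ties. The Python sketch below assumes categories are given as a list of sets, most preferred first; the function name `categories_to_weak_order` is hypothetical, and we do not claim this is how PrefLib derives any of its files.

```python
from typing import FrozenSet, List


def categories_to_weak_order(categories: List[FrozenSet[int]]) -> List[FrozenSet[int]]:
    """Turn a categorical preference (categories listed from most to least
    preferred) into an order with ties by dropping the empty categories.

    Alternatives placed in the same category end up in the same
    indifference class."""
    return [cat for cat in categories if cat]


# Hypothetical vote: "Yes" = {6}, "Maybe" = {} (empty), "No" = {1, 2}
vote = [frozenset({6}), frozenset(), frozenset({1, 2})]
print(categories_to_weak_order(vote))  # [frozenset({6}), frozenset({1, 2})]
```

The conversion loses information: the resulting order no longer records which named category ("Yes", "Maybe", "No") each class came from, which is why the two kinds of files are kept distinct.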

WMD - Weighted Matching Data

Files with a WMD extension describe weighted matching data. These are weighted directed graphs, i.e., collections of directed edges between the alternatives, each associated with a weight.

CSV and DAT - Extra Data File

Files with a CSV or a DAT extension are used when miscellaneous data are needed. They are generally paired with another file, providing more information than is expressible in the basic data formats.

File Format

All data files share a common file format, with a few adaptations for each specific type. Data files contain two parts: first, a list of metadata, on lines starting with “#”; second, the preferences themselves.
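Because the two parts are distinguished only by the leading “#”, a reader can separate them in a single pass. The following Python sketch assumes, based on the examples in this page, that every metadata line has the form “# KEY: VALUE”; the function name `split_file` is hypothetical and this is not an official PrefLib parser.

```python
from typing import Dict, List, Tuple


def split_file(text: str) -> Tuple[Dict[str, str], List[str]]:
    """Split a PrefLib-style file into its metadata and its preference lines.

    Metadata lines start with "#" and are assumed to look like
    "# KEY: VALUE"; every other non-blank line is a preference line."""
    metadata: Dict[str, str] = {}
    body: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Split on the first colon only, so values may contain colons.
            key, _, value = line[1:].partition(":")
            metadata[key.strip()] = value.strip()
        else:
            body.append(line)
    return metadata, body


meta, body = split_file("# DATA TYPE: soi\n# DESCRIPTION:\n5: 1,2\n")
print(meta, body)
```

Empty fields such as “# DESCRIPTION:” are kept with an empty string as value, so a missing key and an empty one can still be told apart.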

Below is an example of the first 8 lines of the header of a file in the Irish dataset.

# FILE NAME: 00001-00000001.soi
# TITLE: 2002 Dublin North
# DESCRIPTION:
# DATA TYPE: soi
# MODIFICATION TYPE: original
# RELATES TO:
# RELATED FILES: 00001-00000001.toc
# PUBLICATION DATE: 2013-08-17

Let us describe each of the metadata that are common to all files.

The preference part of the file is specific to each data type; we describe it for each type below.

Ordinal Preferences

The data types SOC, SOI, TOC, and TOI all represent ordinal preferences.

In addition to the metadata described above, some metadata are specific to ordinal preferences.

The orders are described as follows. Each line first indicates the number of voters who submitted the given preference list and then, after a colon, the preference list itself. Inside a preference list, a strict ordering is indicated by commas, and indifference classes are grouped in curly braces. For instance, the line “9: 3,{1,2,4}” means that 9 voters rank alternative 3 first, followed by alternatives 1, 2, and 4, which are tied.
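The line format above can be read with a small tokenizer: a brace group is one indifference class, and a bare number is a class containing a single alternative. The following Python sketch is a hypothetical helper (`parse_order_line` is our own name), written only from the description on this page; it does not validate malformed input.

```python
import re
from typing import FrozenSet, List, Tuple

# Either a brace group such as "{1,2,4}" (possibly empty, "{}") or a bare number.
TOKEN = re.compile(r"\{([^}]*)\}|(\d+)")


def parse_order_line(line: str) -> Tuple[int, List[FrozenSet[int]]]:
    """Parse a line such as "9: 3,{1,2,4}" into (9, [{3}, {1, 2, 4}]).

    Each returned frozenset is one indifference class, best first.
    Text that matches neither token kind is silently ignored in this sketch."""
    count_part, _, order_part = line.partition(":")
    order: List[FrozenSet[int]] = []
    for match in TOKEN.finditer(order_part):
        group, single = match.group(1), match.group(2)
        if single is not None:
            order.append(frozenset({int(single)}))
        else:
            order.append(frozenset(int(x) for x in group.split(",") if x.strip()))
    return int(count_part), order


print(parse_order_line("9: 3,{1,2,4}"))
```

Because the categorical format described further down uses the same notation (plus “{}” for an empty category), this sketch should also be able to read CAT preference lines.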

To conclude, here is an example of the first 27 lines of a data file of complete orders with ties (TOC), taken from the Debian election dataset.

# FILE NAME: 00002-00000001.toc
# TITLE: Debian 2002 Leader
# DESCRIPTION: Obtained from the soi by adding the unranked alternatives at the bottom
# DATA TYPE: toc
# MODIFICATION TYPE: imbued
# RELATES TO: 00002-00000001.soi
# RELATED FILES:
# PUBLICATION DATE: 2013-08-17
# MODIFICATION DATE: 2022-09-16
# NUMBER ALTERNATIVES: 4
# NUMBER VOTERS: 475
# NUMBER UNIQUE ORDERS: 31
# ALTERNATIVE NAME 1: Branden Robinson
# ALTERNATIVE NAME 2: Raphael Hertzog
# ALTERNATIVE NAME 3: Bdale Garbee
# ALTERNATIVE NAME 4: None Of The Above
100: 3,1,2,4
79: 1,3,2,4
54: 3,2,1,4
43: 2,3,1,4
34: 3,2,4,1
30: 1,2,3,4
29: 2,1,3,4
16: 1,3,4,2
14: 2,3,4,1
12: 3,1,4,2
9: 3,{1,2,4}
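The header counts can serve as a sanity check on the preference lines. The Python sketch below assumes that NUMBER VOTERS equals the sum of the per-line voter counts and that NUMBER UNIQUE ORDERS (or NUMBER UNIQUE PREFERENCES for CAT files) equals the number of preference lines; both are our reading of the format, and `check_counts` is a hypothetical name. It does not handle WMD files, whose lines contain no colon.

```python
from typing import Dict, List


def check_counts(lines: List[str]) -> List[str]:
    """Compare a file's preference lines with its NUMBER VOTERS and
    NUMBER UNIQUE ORDERS / NUMBER UNIQUE PREFERENCES metadata.

    Returns a list of human-readable problems (empty if none are found)."""
    meta: Dict[str, str] = {}
    counts: List[int] = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("#"):
            key, _, value = line[1:].partition(":")
            meta[key.strip()] = value.strip()
        elif line:
            # The voter count is everything before the first colon.
            counts.append(int(line.split(":", 1)[0]))
    problems: List[str] = []
    if "NUMBER VOTERS" in meta and sum(counts) != int(meta["NUMBER VOTERS"]):
        problems.append(f"{sum(counts)} voters found, {meta['NUMBER VOTERS']} declared")
    for key in ("NUMBER UNIQUE ORDERS", "NUMBER UNIQUE PREFERENCES"):
        if key in meta and len(counts) != int(meta[key]):
            problems.append(f"{len(counts)} lines found, {meta[key]} declared")
    return problems


# Hypothetical miniature file.
sample = ["# NUMBER VOTERS: 5", "# NUMBER UNIQUE ORDERS: 2", "3: 1,2", "2: 2,1"]
print(check_counts(sample))  # []
```

Run on the 27-line excerpt above, the check would report both mismatches, since the excerpt shows only 11 of the 31 unique orders (420 of the 475 voters); on a complete file it is expected to return an empty list.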

Categorical Preferences

Categorical preferences are very similar to ordinal ones. Some metadata are also specific to them.

The preferences are described as follows. Each line first indicates the number of voters who submitted the given preference list and then, after a colon, the preference list itself. Inside a preference list, the categories are separated by commas and the alternatives of each category are grouped in curly braces, except for categories containing a single alternative; an empty category is written “{}”. For instance, the line “13: {},{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16}” in the example below means that 13 voters placed no alternative in the first category and all 16 in the second.
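The writing rule can be checked by turning it around: the Python sketch below produces one categorical line from a voter count and a list of categories (most preferred first). Writing the alternatives of a category in increasing order is an assumption that happens to match the example below; `format_categories` is a hypothetical name, not a PrefLib function.

```python
from typing import FrozenSet, List


def format_categories(count: int, categories: List[FrozenSet[int]]) -> str:
    """Write one categorical preference line: a single-alternative category
    is written bare, any other category in braces, an empty one as "{}"."""
    parts = []
    for cat in categories:
        if len(cat) == 1:
            parts.append(str(next(iter(cat))))
        else:
            # Also covers the empty category, which becomes "{}".
            parts.append("{" + ",".join(str(a) for a in sorted(cat)) + "}")
    return f"{count}: " + ",".join(parts)


others = frozenset(range(1, 17)) - {6}
print(format_categories(13, [frozenset({6}), others]))
```

With 16 alternatives, the call above reproduces the first preference line of the French approval example that follows.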

Let's conclude with an example from the French approval dataset.

# FILE NAME: 00026-00000001.cat
# TITLE: GylesNonains
# DESCRIPTION:
# DATA TYPE: cat
# MODIFICATION TYPE: original
# RELATES TO:
# RELATED FILES: 00026-00000001.toc
# PUBLICATION DATE: 2017-04-13
# MODIFICATION DATE: 2022-09-16
# NUMBER ALTERNATIVES: 16
# NUMBER VOTERS: 365
# NUMBER UNIQUE PREFERENCES: 216
# NUMBER CATEGORIES: 2
# CATEGORY NAME 1: Yes
# CATEGORY NAME 2: No
# ALTERNATIVE NAME 1: Megret
# ALTERNATIVE NAME 2: Lepage
# ALTERNATIVE NAME 3: Gluckstein
...
13: 6,{1,2,3,4,5,7,8,9,10,11,12,13,14,15,16}
13: {},{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16}
10: {9,10},{1,2,3,4,5,6,7,8,11,12,13,14,15,16}
10: {1,6},{2,3,4,5,7,8,9,10,11,12,13,14,15,16}

Weighted Matching

Let us start, as usual, with the metadata specific to matching data.

The matching graph itself is described as a list of edges, one per line, each given as Source,Destination,Weight. Below is an example from the kidney matching dataset.

# FILE NAME: 00036-00000001.wmd
# TITLE: Kidney Matching - 16 with 0
# DESCRIPTION:
# DATA TYPE: wmd
# MODIFICATION TYPE: synthetic
# RELATES TO:
# RELATED FILES: 00036-00000001.dat
# PUBLICATION DATE: 2017-04-13
# MODIFICATION DATE: 2022-09-16
# NUMBER ALTERNATIVES: 16
# NUMBER VOTERS: 365
# NUMBER EDGES: 59
# ALTERNATIVE NAME 1: Pair 1
# ALTERNATIVE NAME 2: Pair 2
# ALTERNATIVE NAME 3: Pair 3
...
1,5,1.0
1,6,1.0
2,1,1.0
2,3,1.0
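A natural in-memory form for such a weighted directed graph is an adjacency map from each source to its destinations and weights. The Python sketch below builds one from the edge lines, skipping metadata lines; the function name `parse_edges` is hypothetical, and it assumes each (source, destination) pair appears at most once (a repeated edge would overwrite the earlier weight).

```python
from collections import defaultdict
from typing import Dict, List


def parse_edges(lines: List[str]) -> Dict[int, Dict[int, float]]:
    """Build an adjacency map {source: {destination: weight}} from the
    "Source,Destination,Weight" lines of a WMD file."""
    graph: Dict[int, Dict[int, float]] = defaultdict(dict)
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and metadata lines
        src, dst, weight = line.split(",")
        graph[int(src)][int(dst)] = float(weight)
    return dict(graph)


edges = ["# DATA TYPE: wmd", "1,5,1.0", "1,6,1.0", "2,1,1.0", "2,3,1.0"]
print(parse_edges(edges))  # {1: {5: 1.0, 6: 1.0}, 2: {1: 1.0, 3: 1.0}}
```

The four edge lines used here are the ones shown in the kidney matching excerpt above; alternatives with no outgoing edge simply do not appear as keys.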

Extra Data File

When miscellaneous data are needed, we use files with the DAT extension, which have no specified format.

Metadata

We have annotated most of our data files to allow a more fine-grained analysis of the data we host. This makes it possible, for instance, to offer a richer search tool. For each data file, its metadata are presented on its corresponding page.

Below, we present all the metadata we use. Note that they are not always available, as some of them require sophisticated computations and/or do not apply to all types of data.

Modification Type

Each data file is labeled as either Original, Induced, Imbued or Synthetic.

We encourage you to consider the impact that making these assumptions can have; see, e.g., Anna Popova, Michel Regenwetter, and Nicholas Mattei. A Behavioral Perspective on Social Choice. Annals of Mathematics and Artificial Intelligence 68(1-3), 2013.

General Properties

Preference Structure

Ballot Structure

Aggregated Structure