Our study of local policy agendas relies on a massive dataset of local council agendas from all 98 Danish municipalities from 2007 to 2016. In some instances, the data extend back to the early 1990s. We have collected more than 250,000 agenda items and classified each by the political issue on which it focuses. Our coding system includes 25 major topics and 189 subtopics covering the full range of Danish local government activities, such as primary schools, elder care, unemployment, and park maintenance. These data allow scholars to trace political attention precisely and consistently across the 98 local councils.
We coded the topic of each of the 250,000 agenda items. To do so, we applied an adapted version of the issue-coding scheme of the Comparative Agendas Project (CAP; see www.comparativeagendas.net) to categorize the issue content of each agenda item. The CAP codebook assigns topic codes to agenda items at the national level of government and therefore includes several categories that are not relevant at the local level, such as foreign affairs and defense. At the same time, the national-level CAP codebook groups together certain policy areas that must be treated as distinct at the local level, such as daycare, primary schools, and secondary schools. We preserve much of the CAP codebook's structure and, where necessary, recode selected portions to harmonize them with the powers and responsibilities of local governments.
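The harmonization described above can be pictured as two kinds of rules applied to the national topic list: drop topics with no local counterpart, and split topics that bundle distinct local responsibilities. The sketch below is a minimal illustration under that assumption. The topic names and the rules in `NATIONAL_TOPICS`, `DROP`, and `SPLIT` are hypothetical and are not the actual CAP or Danish local codebook entries; in practice, the recoding was a codebook-design decision made by the researchers rather than an automated step.

```python
# Minimal sketch of adapting a national topic list to local government.
# All topic names and the drop/split rules are hypothetical illustrations,
# not actual CAP or Danish local codebook entries.

NATIONAL_TOPICS = ["Education", "Defense", "Foreign affairs", "Health", "Environment"]

# Hypothetical: topics with no local-government counterpart are dropped.
DROP = {"Defense", "Foreign affairs"}

# Hypothetical: topics that bundle distinct local responsibilities are split.
SPLIT = {"Education": ["Daycare", "Primary schools", "Secondary schools"]}


def adapt_codebook(national, drop, split):
    """Return the local topic list, preserving the national ordering."""
    local = []
    for topic in national:
        if topic in drop:
            continue
        # Replace the topic by its local sub-areas, or keep it unchanged.
        local.extend(split.get(topic, [topic]))
    return local


print(adapt_codebook(NATIONAL_TOPICS, DROP, SPLIT))
```

Keeping the unchanged topics in their original order mirrors the aim of preserving as much of the CAP structure as possible while recoding only the parts that do not fit local government.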
The following codebooks have been developed for coding Danish local government agendas: the original Danish version and a translated English version.
Trained student coders, working in combination with supervised machine-learning classification, assigned one of the 189 subtopic codes to each agenda item in the data. We used machine learning both to make the coding process more efficient and to improve its final accuracy. The procedure began with student coders, who applied subtopic labels to an initial set of 25,000 agenda items randomly selected from the data. With these coded data in hand, we turned to an old and well-understood supervised classification algorithm, Naïve Bayes, to generate predicted subtopics for all the remaining data. From this point, data coding became a months-long process of alternating between human coders and the machine-learning algorithm. Student coders corrected random selections of the algorithm's predictions to check its accuracy. We then added the newly corrected data to the training data, retrained the algorithm, and again predicted subtopics for the entire data set. When the accuracy of the computer-generated coding converged to a level better than our intercoder reliability scores for human coders, we considered the process complete. We built the final data set by combining our human-coded training data with the remainder of the collected data, labeled by the final trained algorithm. In total, around 83,000 city council agenda items, 28.5% of the data, were coded solely using automated tools.
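The alternating human/machine procedure can be sketched as follows. This is a minimal, self-contained illustration, not the code used to build the data set: it implements a textbook multinomial Naïve Bayes classifier over whitespace-separated words with add-one smoothing, and the names `hybrid_coding`, `human_code`, `target`, and `sample_size` are hypothetical. The `human_code` callback stands in for a student coder correcting a prediction, and `target` plays the role of the human intercoder-reliability benchmark. For brevity, each round predicts only the sampled items rather than the entire data set.

```python
import math
import random
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens, with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            words = text.lower().split()
            self.word_counts[label].update(words)
            self.vocab.update(words)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        self.n_docs = len(labels)
        return self

    def predict_one(self, text):
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c, n_c in self.class_counts.items():
            # Log prior plus the smoothed log likelihood of each token.
            score = math.log(n_c / self.n_docs)
            for w in text.lower().split():
                score += math.log((self.word_counts[c][w] + 1) / (self.total_words[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


def hybrid_coding(texts, seed_labels, human_code, target, sample_size=50, max_rounds=20, seed=0):
    """Alternate model predictions with human correction of random samples (hypothetical API)."""
    rng = random.Random(seed)
    labels = dict(seed_labels)  # item index -> human-assigned subtopic
    accuracy = None
    for _ in range(max_rounds):
        unlabeled = [i for i in range(len(texts)) if i not in labels]
        if not unlabeled:
            break
        idx = sorted(labels)
        model = NaiveBayes().fit([texts[i] for i in idx], [labels[i] for i in idx])
        # Coders check a random sample of predictions; corrections join the training data.
        sample = rng.sample(unlabeled, min(sample_size, len(unlabeled)))
        predicted = model.predict([texts[i] for i in sample])
        corrected = {i: human_code(i) for i in sample}
        accuracy = sum(p == corrected[i] for i, p in zip(sample, predicted)) / len(sample)
        labels.update(corrected)
        if accuracy > target:  # stop once the model beats the human benchmark
            break
    # Final data set: human labels plus predictions from a model trained on all of them.
    idx = sorted(labels)
    model = NaiveBayes().fit([texts[i] for i in idx], [labels[i] for i in idx])
    rest = [i for i in range(len(texts)) if i not in labels]
    final = dict(labels)
    final.update(zip(rest, model.predict([texts[i] for i in rest])))
    return final, accuracy


# Toy, made-up agenda items; `truth` simulates what a student coder would assign.
texts = ["school budget", "school teachers", "elder care home", "elder care staff",
         "park maintenance", "park trees", "school building", "elder home meals",
         "park benches", "school lunch", "care home staff", "park lawn"]
truth = ["schools", "schools", "elder care", "elder care", "parks", "parks",
         "schools", "elder care", "parks", "schools", "elder care", "parks"]
seed_coded = {0: truth[0], 2: truth[2], 4: truth[4]}
final, acc = hybrid_coding(texts, seed_coded, lambda i: truth[i], target=0.8, sample_size=3)
print(acc, sum(final[i] == truth[i] for i in range(len(texts))), "of", len(texts))
```

Naïve Bayes is a natural fit for this loop because retraining is just recounting words, so each round of added corrections is cheap to incorporate.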
Our final computer-generated predictions achieved accuracy rates over 80%, which we verified with a final check by student coders. Naturally, this also implies that our computer classifications carry an error rate approaching 20%. However, the bar for success is set by the best alternative. In our case, the alternative was asking student coders to apply one of 189 policy subtopics to thousands of sentences. We estimated that well-trained student coders generally achieved intercoder reliability rates (the common standard for the accuracy of human coding) under 80%. Our computer accuracy rate was therefore at least as good as human performance, and when applied to national-level data, our model performed at a level comparable to other algorithms (see Loftis and Mortensen 2020). Furthermore, computer predictions were fast and cheap, and their errors were largely confined to distinguishing between similar categories and identifying the rarest subtopics in the data. For the most frequent subtopics, computer performance was nearly perfect.
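Comparing the model with a coder-verified sample, and with the human benchmark, comes down to a few counts. The sketch below assumes a list of model predictions paired with the labels coders assigned to the same items; the function `evaluate`, the 0.8 benchmark, and the ten-item sample are hypothetical. Breaking accuracy down by subtopic is one simple way to check whether errors concentrate in rare categories.

```python
from collections import Counter


def evaluate(predicted, verified, benchmark):
    """Score model predictions against coder-verified labels (hypothetical helper)."""
    pairs = list(zip(predicted, verified))
    accuracy = sum(p == v for p, v in pairs) / len(pairs)
    totals = Counter(v for _, v in pairs)
    hits = Counter(v for p, v in pairs if p == v)
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        # Does the model match or beat the human intercoder-reliability benchmark?
        "meets_benchmark": accuracy >= benchmark,
        # Per-subtopic accuracy shows whether errors cluster in rare categories.
        "per_subtopic": {t: hits[t] / totals[t] for t in totals},
    }


# Hypothetical verification sample: ten items, with "parks" as the rare subtopic.
verified = ["schools"] * 5 + ["elder care"] * 3 + ["parks", "parks"]
predicted = ["schools"] * 5 + ["elder care"] * 3 + ["parks", "schools"]
report = evaluate(predicted, verified, benchmark=0.8)
print(report["accuracy"], report["meets_benchmark"], report["per_subtopic"])
```

In this made-up sample, overall accuracy is 90%, above the assumed 0.8 benchmark, and the single error falls in the least frequent subtopic.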
Loftis, M., & Mortensen, P. B. (2020). Collaborating with the machines: A hybrid method for classifying policy documents. Policy Studies Journal, 48(1), 184-206. https://doi.org/10.1111/psj.12245
Link to the Comparative Agendas master codebook: https://www.comparativeagendas.net/pages/master-codebook