1. QUESTIONS
In earlier work, we developed a model (Yu et al. 2021) that simulates urban expansion as a multi-scale dynamic process. We subsequently estimated the parameters of the model for multiple time periods and functional urban areas across Europe and applied cluster analysis to the estimated parameter sets (Yu et al. 2022). One aim of the cluster analysis was to establish a data-driven approach to the development of urban growth scenarios. The data-driven clusters represent substantially different, yet plausible, future urbanisation patterns across the spectrum of possible developments. Four clusters emerged from the analysis, each representing a mode of expansion: compact, medium compact, medium dispersed and dispersed.
The current article puts the model and the approach to scenario development through a further test by investigating the plausibility of the generated scenarios. The hypothesis is that model simulations using parameters drawn from the four emergent clusters provide plausible realisations of possible urban expansion trajectories. We approach this with a Turing-like test, evaluating whether domain experts are able to visually differentiate between maps of true urban expansion data and model-generated data.
2. METHODS
2.1. QUIZ BASED SURVEY
To develop a test of plausibility we asked experts (n = 9) to visually compare model outputs with actual urban expansion. The experiment was set up as a quiz consisting of four multiple-choice questions that were identical for all participants. In each of the four questions, a participant was presented with a line-up of four maps of a 20 km × 20 km square centred on a functional urban area (FUA) in Europe. The four maps presented four patterns of possible urban expansion over the period 1975-2014 (Figure 1).
Of the four maps presented in each quiz question, one was the actual expansion pattern of that FUA and three were model-generated scenarios. Each of the three model-generated scenarios was produced using one of the data-driven urban growth mode clusters. The omitted cluster in each question was the one that the FUA’s actual expansion most strongly resembled. In other words, each question offered four expansion pattern maps, and the experiment tested whether participants could pick out the actual expansion pattern from among the model-generated scenarios. The four FUAs used in the quiz were chosen to reflect a wide range of urban expansion types.
The quiz was taken online by nine participants, all of whom are academics or researchers at UK institutions, in a workshop on mathematical optimization in landscape modelling.
The quiz was set up in Google Forms and to date remains available online (https://forms.gle/ReCqRGh1HqMrsk7N6). Although this is not exactly a Turing test (Turing 1950), we play an “imitation game” in which participants are asked to tell machine-generated information apart from real data.
2.2. DATA
The sole data source for the data-driven scenario development was the GHS Built database (Florczyk et al. 2019). The source dataset (GHS_BUILT_LDSMT_GLOBE_R2018A) provides built-up area density at 250 m resolution for the 1975-2014 period. For the model this was reclassified into binary urban/non-urban land using a 20% urban coverage threshold. Models were estimated for functional urban areas, separately for the periods “0”-2000 and 1975-2000, where “0” indicates the moment of urban genesis, i.e. when the land was completely void of urban areas.
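The reclassification step can be illustrated with a minimal sketch. It assumes the built-up density is held as a NumPy array of percentages (0 to 100) and that cells at or above the threshold count as urban; the source does not state whether the threshold is inclusive. The function name and the toy values are hypothetical and are not taken from the project code.

```python
import numpy as np

def to_binary_urban(density_pct, threshold_pct=20.0):
    """Return 1 where built-up density meets the threshold, else 0.

    Assumption: the threshold is inclusive (>=).
    """
    density = np.asarray(density_pct, dtype=float)
    return (density >= threshold_pct).astype(np.uint8)

# Toy 3 x 3 grid of built-up density values in percent (made-up numbers).
toy_density = np.array([[0.0, 12.5, 20.0],
                        [35.0, 19.9, 80.0],
                        [5.0, 50.0, 0.0]])
urban = to_binary_urban(toy_density)
```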
2.3. MODEL
The evaluated model is a Constrained Cellular Automata (CCA) model. The model operates on the binary urban/non-urban raster and dynamically changes pixels from non-urban to urban as the simulation steps through time. The model rules reflect two types of spatial processes as well as a random perturbation. The first process is urban agglomeration, which is modelled by making urban expansion more likely in the vicinity of existing urban areas. The second process is ecosystem service conservation, where urban expansion is less likely in areas where non-urban land is scarce. Both the vicinity of urban land and the scarcity of non-urban land are measured at two distinct spatial scales using distance decay functions.
Thus, the model has four parameters in total, which reflect the relative importance of urban agglomeration at two spatial scales and of ecosystem service conservation at two spatial scales, respectively. A fifth parameter, for the random perturbation, is omitted as its weight is implied by the four others.
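To make the structure of such a model concrete, the following is a rough sketch of a single constrained cellular automaton step, not the published implementation. Several choices are assumptions made purely for illustration: an exponential distance-decay kernel, scarcity of non-urban land taken as the squared decay-weighted urban density, weights that sum to one together with the random term, a fixed number of cells converted per step, and all function names, scales and weight values.

```python
import numpy as np

def decay_kernel(scale, radius):
    """Exponential distance-decay weights on a square window, normalised to sum to 1."""
    offsets = np.arange(-radius, radius + 1)
    dy, dx = np.meshgrid(offsets, offsets, indexing="ij")
    kernel = np.exp(-np.hypot(dy, dx) / scale)
    kernel[radius, radius] = 0.0  # a cell is not its own neighbour
    return kernel / kernel.sum()

def weighted_density(grid, kernel):
    """Decay-weighted share of urban cells around each cell (zero padding at edges)."""
    radius = kernel.shape[0] // 2
    padded = np.pad(grid.astype(float), radius)
    rows, cols = grid.shape
    out = np.zeros(grid.shape, dtype=float)
    for i in range(kernel.shape[0]):
        for j in range(kernel.shape[1]):
            out += kernel[i, j] * padded[i:i + rows, j:j + cols]
    return out

def cca_step(urban, weights, scales=(1.5, 6.0), n_new=5, rng=None):
    """Convert the n_new non-urban cells with the highest transition potential."""
    rng = np.random.default_rng() if rng is None else rng
    w_agg_small, w_agg_large, w_con_small, w_con_large = weights
    w_rand = 1.0 - sum(weights)  # assumption: the five weights sum to 1
    dens = [weighted_density(urban, decay_kernel(s, int(3 * s) + 1)) for s in scales]
    # Assumption: scarcity of open land grows with the square of urban density.
    scarcity = [d ** 2 for d in dens]
    potential = (w_agg_small * dens[0] + w_agg_large * dens[1]
                 - w_con_small * scarcity[0] - w_con_large * scarcity[1]
                 + w_rand * rng.random(urban.shape))
    potential[urban == 1] = -np.inf  # only non-urban cells can convert
    order = np.argsort(-potential, axis=None)[:n_new]
    new_urban = urban.copy()
    new_urban.flat[order] = 1
    return new_urban

# Toy run: a 2 x 2 seed settlement on a 40 x 40 grid, ten steps of five cells each.
rng = np.random.default_rng(0)
grid = np.zeros((40, 40), dtype=np.uint8)
grid[19:21, 19:21] = 1
for _ in range(10):
    grid = cca_step(grid, weights=(0.4, 0.2, 0.1, 0.1), rng=rng)
```

The "constrained" aspect is represented here by fixing the amount of new urban land per step, so that the rules only decide where expansion happens, not how much.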
The parameters are estimated using a Markov Chain Monte Carlo method with Approximate Bayesian Computation (MCMC-ABC). The full model code (Python) is available as open source (https://github.com/JingyanYu/LandUseDecisions).
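For readers unfamiliar with the technique, the sketch below shows a generic likelihood-free Metropolis sampler of the MCMC-ABC family on a toy one-parameter problem. It is not the estimation code from the repository: the uniform prior on [0, 1], the Gaussian random-walk proposal, the tolerance, the toy simulator and all names are assumptions chosen for illustration. With a uniform prior and a symmetric proposal, a proposal is accepted whenever it lies inside the prior support and its simulated summary falls within the tolerance of the observed one.

```python
import random

def abc_mcmc(simulate, distance, observed, theta0, n_iter=5000,
             step=0.1, epsilon=0.05, seed=0):
    """Likelihood-free Metropolis sampler with a uniform prior on [0, 1]."""
    rng = random.Random(seed)
    theta = theta0
    chain = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)  # symmetric random-walk proposal
        if 0.0 <= proposal <= 1.0:               # prior density is zero outside [0, 1]
            simulated = simulate(proposal, rng)
            if distance(simulated, observed) <= epsilon:
                theta = proposal                 # accept; otherwise keep the current value
        chain.append(theta)
    return chain

def simulate_share(p, rng, n=200):
    """Toy simulator: share of successes in n Bernoulli(p) draws."""
    return sum(rng.random() < p for _ in range(n)) / n

def abs_diff(a, b):
    return abs(a - b)

# Toy target: recover p from an observed success share of 0.3.
chain = abc_mcmc(simulate_share, abs_diff, observed=0.3, theta0=0.3)
burned = chain[1000:]  # discard the first 1000 draws as burn-in
posterior_mean = sum(burned) / len(burned)
```

In an urban growth setting the simulator would be the land use model and the distance would compare simulated and observed urban patterns; those components are not shown here.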
3. FINDINGS
If the model results were indistinguishable from real urban expansion patterns, then we would expect participants to identify the correct map 25% of the time. The results show that participants chose the correct map 19.4% of the time. This is below 25%, but not significantly different according to Pearson’s chi-squared test. It is also substantially below 100%, which would be expected for a model whose output can be readily identified by the experts.
For the four questions in the questionnaire, the share of correct answers was 0%, 33%, 33% and 11% respectively, which again shows no significant deviation from the 25% expected under perfect confusion.
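The overall comparison can be reproduced with a short calculation. The counts used below are inferred rather than reported: nine participants answering four questions give 36 answers, and 19.4% of 36 corresponds to 7 correct answers. The function name is hypothetical, and the sketch uses a goodness-of-fit test on two cells (correct, incorrect) with one degree of freedom, which may differ in detail from the test set-up the authors used.

```python
import math

def chi_squared_two_cells(n_correct, n_total, p_chance):
    """Pearson goodness-of-fit statistic and p-value for correct vs incorrect answers."""
    observed = [n_correct, n_total - n_correct]
    expected = [n_total * p_chance, n_total * (1.0 - p_chance)]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value

# Inferred counts: 7 correct answers out of 36, chance level 25%.
statistic, p_value = chi_squared_two_cells(7, 36, 0.25)
```

Under these assumptions the statistic is about 0.59 and the p-value roughly 0.44, in line with the reported absence of a significant difference from 25%.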
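With nine answers per question the expected number of correct answers is only 2.25, so an exact binomial test is a natural cross-check for the per-question results. The sketch below is an illustration and not necessarily the test the authors applied. The per-question counts (0, 3, 3 and 1 correct out of 9) are inferred from the reported percentages, and the function names are hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def exact_two_sided_p(k_obs, n, p):
    """Sum the probabilities of all outcomes no more likely than the observed one."""
    p_obs = binom_pmf(k_obs, n, p)
    return sum(binom_pmf(k, n, p) for k in range(n + 1)
               if binom_pmf(k, n, p) <= p_obs * (1 + 1e-9))

# Inferred counts of correct answers per question, nine participants each.
correct_per_question = [0, 3, 3, 1]
p_values = [exact_two_sided_p(k, 9, 0.25) for k in correct_per_question]
```

Under these assumptions none of the four p-values falls below 0.05, including the question that no participant answered correctly.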
The results offer support for the modelling approach introduced in the previous work (Yu et al. 2021, 2022). In particular, the methods create a range of urban outcomes, from compact to dispersed expansion, that are sufficiently realistic and plausible to be accepted by experts as true urban expansion. First, this confirms the viability of simulating a wide range of urban expansion dynamics realistically with just four parameters and no ancillary data. Second, it confirms the potential of using machine learning and dynamic models to characterize urban dynamics and explore possible future developments in the form of data-driven, model-based scenarios.
Validation of environmental models, including urban expansion models, is a recognized challenge (Bennett et al. 2013) and is commonly based on comparison between modelled and observed data. This is not possible in the case of scenarios that reflect a range of possible outcomes. In this case, the aim is not accuracy but realism or plausibility. Our Turing-like validation approach is a method to capture this elusive aspect of model performance.
Funding source
UKRI, NERC as part of the Landscape Decisions programme. Project number NE/T004150/1.