Research Question
This article explores the use of logistic-shaped diffusion curves (S-Curves) to predict the accumulation of atmospheric CO2. Atmospheric CO2 is measured at a number of stations globally; the longest continuous series is from Mauna Loa, and that data series, the Keeling Curve (Keeling et al. 2001), has been made famous in, among other places, An Inconvenient Truth (Gore 2006). CO2 in the atmosphere results from a variety of causes, but transport is considered one of the primary sources, amounting to about 24% of total CO2 emissions annually (Solaymani 2019).
As a well-known data series, the Keeling Curve has been used to demonstrate the relatively steady increase of atmospheric CO2, which has separately been correlated with the rise of global temperature.
Logistic Curves have been used for historic analysis, especially in the domain of understanding technology deployment, particularly the deployment of transport networks, and are used prospectively for forecasting (De Tarde 1890; Rogers 1995; Marchetti 1980; Batten and Johansson 1985; Nakicenovic 1988; Garrison 1989; Grubler 1990; Garrison and Souleyrette 1996; Levinson and Krizek 2017; Dediu 2018). Despite changes such as greater car use, cleaner engines, a growing population, and the implementation of carbon policies, we ask whether the curves of CO2 accumulation are stable; in other words, are those changes small or offsetting, with trends already embedded in the function? While the level of CO2 in the atmosphere is the result of countless microscopic individual decisions and actions, along with random environmental factors, perhaps the resulting aggregate trends produce a predictable macroscopic pattern.
The broader research question here is whether forecasts using logistic curves are stable as suggested by historic analyses of other systems, that is, do they predict consistently over time with different amounts of data? To the extent they are stable, we suppose they are more reliable for forecasting.
Methodology
S-Curves use the following equation
$$\frac{S_t}{S_{max} - S_t} = e^{b \cdot t + c} \qquad (1)$$
or
$$\ln\left(\frac{S_t}{S_{max} - S_t}\right) = b \cdot t + c \qquad (2)$$
Where:
$S_t$ = system status (CO2 accumulation) at time $t$.
$S_{max}$ = maximum system status (ultimate CO2 accumulation in the atmosphere).
$t$ = time (year). (Data are reported in months, denoted as decimal years.)
$c, b$ = model parameters.
The objective is to solve for $c$ and $b$ to best explain the relationship.
To apply the model, it is helpful to estimate the midpoint, or inflection year ($t_i$), at which $S_t = S_{max}/2$ and the left-hand side of Equation (2) equals zero. It turns out that:
$$t_i = \frac{-c}{b} \qquad (3)$$
We can then estimate the system size (in this case CO2 accumulation) ($S_t$) in any given year $t$ using the following equation:
$$\hat{S}_t = \frac{S_{max}}{1 + e^{-b(t - t_i)}} \qquad (4)$$
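As a concrete illustration, the following Python sketch implements Equations (3) and (4); the function and variable names are ours and are purely illustrative.

```python
import numpy as np

def inflection_year(b, c):
    """Equation (3): the year t_i at which S_t reaches S_max / 2."""
    return -c / b

def predict_s(t, b, c, s_max):
    """Equation (4): predicted CO2 accumulation S_hat_t in year(s) t."""
    t_i = inflection_year(b, c)
    return s_max / (1.0 + np.exp(-b * (np.asarray(t) - t_i)))
```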
In back-casting, explaining the deployment of extant systems, $S_{max}$ is apparent. Here we aim to identify the final system status ($S_{max}$) for a system whose deployment is ongoing. While we may know the current and historic system size ($S_t$), use of an S-curve requires knowing how large the system will ultimately be.
The method here solves for the $S_{max}$, $c$, and $b$ that maximize goodness of fit for the equation, measured as $R^2$. We use ordinary least squares regression to solve for $c$ and $b$ for a given $S_{max}$, and find the best $S_{max}$ using a generalized reduced gradient solver that adjusts $S_{max}$ to maximize $R^2$.
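A minimal Python sketch of this estimation procedure follows, assuming `t` (decimal years) and `s` (CO2 concentration, ppm) are NumPy arrays; scipy's bounded scalar minimizer stands in here for the generalized reduced gradient solver, and the search bounds are illustrative assumptions.

```python
import numpy as np
from scipy import optimize, stats

def fit_bc(t, s, s_max):
    """OLS fit of the linearized form (Equation 2) for a fixed S_max; returns (b, c, R^2)."""
    y = np.log(s / (s_max - s))
    result = stats.linregress(t, y)
    return result.slope, result.intercept, result.rvalue ** 2

def fit_logistic(t, s):
    """Search over S_max for the value that maximizes R^2 of the linearized regression."""
    lower = s.max() * 1.001            # S_max must exceed every observed value
    upper = s.max() * 100.0            # arbitrary wide upper bound (assumption)
    res = optimize.minimize_scalar(lambda m: -fit_bc(t, s, m)[2],
                                   bounds=(lower, upper), method="bounded")
    b, c, r2 = fit_bc(t, s, res.x)
    return {"s_max": res.x, "b": b, "c": c, "r2": r2}
```

When the search pushes $S_{max}$ against its upper bound, as the Findings section notes for the 1990 model, the fitted curve is effectively exponential over the observed range.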
The monthly average atmospheric CO2 concentrations in parts per million (ppm) are derived from in situ air measurements at Mauna Loa Observatory, Hawaii (Latitude 19.5°N, Longitude 155.6°W, Elevation 3397 m), as recorded by the Scripps Institution of Oceanography. The raw data, interpolated to complete missing observations, were used.
Starting with a pre-industrial baseline level of 280 ppm (Etheridge et al. 1998), we find the parameter estimates for the best-fit logistic curve for the Keeling Curve. We do this at seven different points in time (1960, 1970, 1980, 1990, 2000, 2010, 2020), using the data available at each of those points. So for 1960, we use data from 1958-1959; for 1970, data from 1958-1969; and so on, through 2020, which uses all the available data through the end of 2019.
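The re-estimation at each cutoff year can be sketched as a simple loop, assuming a pandas DataFrame `df` with columns `decimal_year` and `ppm` holding the interpolated monthly series (the column names are our own) and the `fit_logistic` function sketched above; how the 280 ppm baseline enters the fitted series is a modeling detail not shown here.

```python
import pandas as pd  # assumed: df loaded from the Scripps Mauna Loa monthly series

cutoffs = [1960, 1970, 1980, 1990, 2000, 2010, 2020]
models = {}
for year in cutoffs:
    available = df[df["decimal_year"] < year]   # data available at that point in time
    models[year] = fit_logistic(available["decimal_year"].to_numpy(),
                                available["ppm"].to_numpy())
```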
Findings
Table 1 shows the parameter values for each model, and Figure 1 shows those values graphed, along with the observed data. As can be seen from the figure, the models give a wide variation in results. While all models since 1980 are reasonably good fits ($R^2 > 0.9$) that reproduce the observed data they are trying to replicate, and the 2020 model has a very good fit ($R^2 > 0.99$), they produce very different outcomes. The growth of CO2 is not steady, and some decades have more change than others. The use of early forecasts to estimate maximum system states is precarious.
The 1960 model, using slightly less than two years of data (22 months), does not predict an increase in CO2 at all; instead, it takes the best-fit value of $S_{max}$ (330 ppm), sees the maximum as already having been reached, and puts the accumulation of CO2 in decline.
The 1970 model forecasts a small increase to 362 ppm (from a 1970 level of 325 ppm). The relatively low levels of $S_{max}$ resulting from extrapolation of 1950s-1960s data contrast sharply with the following decades, suggesting a faster rate of increase of CO2 (a positive second derivative) in the subsequent two decades (as seen in the final row of Table 1).
The level of CO2 started increasing at a faster rate in the 1970s and 1980s, so the 1980 model found a best fit at $S_{max} = 962$ ppm. The late 1980s is when concern about the issue began to become mainstream.
The 1990 model does not actually have a finite saturation level, and instead fits an exponential pattern, the result of steady increases in the rate of CO2 atmospheric accumulation. But even small differences in early years extrapolate to large (and potentially infinite) differences in later years with this model form. The models for the following years all converge on a best-fit value of $S_{max}$, as the rate of increase slowed.
Models for 2000 and 2010 are very close, with $S_{max}$ of 580 ppm and 572 ppm, respectively. This suggests that after 40 and 50 years of data, the results begin to stabilize.
The result using the most complete model (2020) is a saturation ($S_{max}$) level of 649 ppm, and implies the increase in CO2 won't begin to slow until 2042 ($t_i$). This growth trajectory is well in excess of levels required to keep the global average temperature under a 2° Celsius rise from the 1950s baseline (that fast-approaching level is estimated at 450 ppm) and may be sufficient to melt Arctic and Antarctic ice (Fischetti 2011). For comparison, the ‘worst case’ Representative Concentration Pathways (RCP) scenario RCP8.5 (8.5 W/m² of radiative forcing) has a year 2100 CO2 level of over 900 ppm, while the more optimistic RCP2.6 has a 400 ppm concentration, down from a mid-century peak (Van Vuuren et al. 2011).
The overall stability of the models can be considered by examining Table 2. For each data year, we compare all of the models. In almost all cases, the model estimated for a given year has the lowest root mean square error (RMSE, the standard deviation of the residuals) for that data year, as shown by the bold numbers on the diagonal, which is not surprising. The only exception is that the 2010 model has a slightly lower RMSE than the 2000 model for year 2000 data. The data-limited earlier year models perform poorly in predicting later years, while the data-rich later year models do reasonably well (though obviously not as well as the earlier year models) in predicting on the limited set of earlier year data. The other notable observation is that the 1990 model, which did not converge on an $S_{max}$ and so functions as an exponential model, does especially poorly in predicting later years, as it overestimates the accumulation of CO2. $R^2$ is not the only goodness-of-fit measure, and the models could be optimized on a different performance indicator, but it seems to reasonably reproduce the data when the models converge, and there are only small overall differences in RMSE among the converged models for the later years (models estimated for 1980, 2000, 2010, and 2020) over the span of the study period, though clearly they have quite different implications.
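The cross-comparison behind Table 2 can be reproduced in outline, under the same assumptions as the sketches above (`df`, `cutoffs`, `models`, and `predict_s`), by scoring each model's predictions against the data available in each data year:

```python
import numpy as np

rmse = {}
for model_year, m in models.items():
    for data_year in cutoffs:
        obs = df[df["decimal_year"] < data_year]            # data available in that data year
        pred = predict_s(obs["decimal_year"].to_numpy(), m["b"], m["c"], m["s_max"])
        rmse[(model_year, data_year)] = np.sqrt(np.mean((obs["ppm"].to_numpy() - pred) ** 2))
```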
While there is uncertainty about its absolute magnitude, in the absence of a policy, economic, technological, geologic, environmental, or other shock, forecasts from the past four decades are in broad agreement about the trajectory of the problem.
Future research can test the general question of logistic curve stability with additional types of data, including network extent, vehicle kilometers traveled (VKT), and car ownership. Some of these (such as the percent of people who own cars) have a natural upper limit, e.g. an $S_{max}$ of 100%. Others, like VKT, are continuous; while there might be a physical maximum, it is not naturally derived, and instead depends on conditions.