Due to rapid urbanization and intense human activities, the urban
heat island (UHI) effect has become an increasingly concerning climatic and
environmental issue. A high-spatial-resolution canopy UHI monitoring method
would help better understand the urban thermal environment. Taking the city
of Nanjing in China as an example, we propose a method for evaluating canopy
UHI intensity (CUHII) at high resolution by using remote sensing data and
machine learning with a random forest (RF) model. Firstly, the observed
environmental parameters, e.g., surface albedo, land use/land cover,
impervious surface, and anthropogenic heat flux (AHF), around densely
distributed meteorological stations were extracted from satellite images.
These parameters were used as independent variables to construct an RF model
for predicting air temperature. The correlation coefficient between the
predicted and observed air temperature in the test set was 0.73, and the
average root-mean-square error was 0.72
Throughout the world, cities have grown rapidly as populations increase
and people concentrate in certain areas to settle and build their lives. Such
urbanization brings not only economic development but also the urban heat
island (UHI) phenomenon (Oke, 1982; Mirzaei,
2015; Cao et al., 2016; Zhao et al., 2020). Two major types of UHIs can be
distinguished: (a) the canopy urban heat island (CUHI) and (b) the surface
urban heat island (SUHI). The particular type of UHI is defined based on the
height above the ground at which the phenomenon is observed and measured
(Oke, 1982). The UHI effect is now well established and
has adverse impacts on urban ecology and energy consumption (Roth,
2007; Yang et al., 2019; Y. Yang et al., 2020b; Zheng et al., 2020). UHIs
amplify thermal stress, so people residing in urban areas are more impacted
during heatwave episodes (Koken et al., 2003; Estrada et al., 2017). A
recent study of the global UHI predicted that about 30 % of the world's
population is exposed to lethal high temperatures for at least
20 d yr
There are two main approaches to studying UHIs: numerical simulation and observation. Numerical simulation can reduce the need for a large number of observations and reveal mechanistic insights by investigating the impacts of cities on meteorological variables (Chun and Guldmann, 2014; Zou et al., 2014; Zhang et al., 2015; Taleghani et al., 2016; Li et al., 2020). For instance, Zhang et al. (2015) investigated the influence of land use/land cover (LULC) and anthropogenic heat flux (AHF) on the structure of the urban boundary layer in the Pearl River Delta region, China, through a series of numerical experiments. However, it is important to acknowledge that numerical simulation is a simplification of the real world and cannot replace actual observations. Observational studies of UHIs are arguably more robust in their findings (Hu et al., 2016; Chakraborty and Lee, 2019; Dewan et al., 2021) and can mainly be categorized into the following three methods: (1) in situ (field) measurement, (2) mobile measurements, and (3) remote sensing technology.
In situ (field) measurements include conventional measurements from national meteorological stations, which are usually located in rural areas, and high-density microclimate observations from field experiments or dense networks of automatic stations over various underlying surfaces. Meteorological observation data make it easy to compare long-term series of air temperature (AT) between urban and rural stations (Liu et al., 2006, 2008; Qiu et al., 2008; Yang et al., 2012; Scott et al., 2018; Nganyiyimana et al., 2020). By analyzing such long time series, the contribution and trends of UHI intensity (UHII) can be clearly identified. However, because meteorological sites have limited spatial representativeness, it is difficult to build a comprehensive understanding of the spatial distribution of urban thermal environment parameters (such as urban canopy temperature, land surface temperature (LST), and vegetation) (Liu et al., 2008; Nganyiyimana et al., 2020). To overcome these limitations, high-density observation stations are used to explore the spatial distribution of the urban thermal environment and its relationship with the surrounding environment (Hu et al., 2016; Bassett et al., 2016; Ching et al., 2018; An et al., 2020). Deploying denser observation stations or urban microclimate surveys can, to some extent, compensate for coarse spatial resolution. However, such approaches are usually unsuitable for large-scale studies because of constraints imposed by natural conditions and social activities, as well as the high cost of construction and maintenance (An et al., 2020).
Mobile measurements, for example mobile transect surveys, have been used in many studies (Merbitz et al., 2012; Akdemir and Tagarakis, 2014; Hankey and Marshall, 2015; Al-Ameri et al., 2016; Liu et al., 2017; Popovici et al., 2018), as they can readily capture the distribution of parameters along a designed route using only one set of equipment mounted on a vehicle. However, it is rather costly to obtain observations with fine resolution, broad coverage, and high synchronicity with such an approach.
To overcome these issues, LST data from aerial sensors and Earth-observing satellites are commonly employed in UHI studies. Remote sensing data such as those from the Advanced Very High Resolution Radiometer (AVHRR) (Roth et al., 1989; Caselles et al., 1991; Gallo et al., 1993a), Landsat (Chen et al., 2007; Zhou et al., 2015; Zhao et al., 2016), MODIS (Peng et al., 2012; Zhou et al., 2015; Li et al., 2017; Yang et al., 2018; Chakraborty and Lee, 2019), aerial images (Buyadi et al., 2013; Heusinkveld et al., 2014; Yu et al., 2020), and other sources (Zhao et al., 2020; Gallo et al., 1993b; Qin et al., 2001; Chakraborty et al., 2020) are widely used to explain the spatial distribution of the surface UHI and its relationship with the local environment (e.g., LULC). Remote sensing data have good application prospects, as they can provide fine resolution and wide coverage where and when ground-based observations cannot. However, the retrieval of LST can be challenging under precipitation and cloud cover. In addition, each satellite remote sensing dataset has its own characteristics (Zhao et al., 2016; Chakraborty and Lee, 2019). For example, Landsat images have a high spatial resolution (30 m) that can resolve urban blocks, but the temporal resolution is rather low (16 d). The MODIS LST dataset has the advantage of high temporal resolution (four times per day), but the spatial resolution is only 1 km (Yang et al., 2018).
LST derived from satellites has become an important indicator for exploring the variation characteristics of the SUHI, because LST is closely related to land cover type and structure, population density, anthropogenic heat release, etc., and it can also significantly influence surface air temperature, wind field, humidity, and surface fluxes in the urban region (Ho et al., 2016; Yang et al., 2019; Li et al., 2020, 2021). However, LST can only quantify the SUHI effect, which is strongly affected by meteorological factors, e.g., clouds and evaporation. In contrast, as an important indicator of the energy exchange between the atmosphere and land in the urban canopy, AT is more representative than LST. In particular, AT is more closely related to human health and ecological changes in cities (Ho et al., 2016). However, UHI studies based on AT observed at meteorological sites suffer from limited spatial coverage, which impedes a comprehensive understanding of the influencing factors and causes of the canopy UHI (CUHI). Thus, there is an urgent need to develop rapid methods for estimating AT at high spatiotemporal resolution and for refined estimation of CUHI intensity (CUHII), in order to explore the mechanisms by which anthropogenic factors (e.g., urban land-use changes, anthropogenic heat emissions, urban morphology, and size) and natural factors (e.g., meteorological conditions and geographical differences) influence the CUHIs of complex and diverse cities.
Therefore, in this study, we (1) used machine learning to retrieve AT at a 30 m spatial and 1 h temporal resolution in the study area, based on remote sensing data, AT and wind speed data, and other environmental information from meteorological observations, and (2) calculated the CUHII distribution from the retrieved AT data and further explored the shape, intensity, and influencing factors of the CUHI by combining local LULC, wind vector, and urban morphology data.
Nanjing, the capital city of Jiangsu province in China, is located along the
lower reaches of the Yangtze River and, as part of the Yangtze River Delta
urban agglomeration, has a high level of urbanization. In fact, Nanjing has
been experiencing rapid urbanization since China's economic reform in 1978.
According to the National Bureau of Statistics, the population in Nanjing
increased from 6.13 million in 2000 to 8.34 million inhabitants in 2018. In 2016, the
built-up area of Nanjing expanded to 773.79 km
All of the satellite remote sensing data employed in this study are from the
geospatial data cloud (
Anthropogenic heat flux of Nanjing city and locations of
high-density automatic meteorological stations in Nanjing with recorded air
temperature:
High-density automatic meteorological observation data, including AT (with
resolution of 0.5
In addition to global climate change, the influence of human activities on
the CUHI cannot be ignored. Previous studies have pointed out that AHF is
closely related to the change in built-up areas and population density
around the stations, which reflects the fact that the effects from both
anthropogenic emissions and land-use change are related to latent heat flux
and sensible heat flux (Zhou et al., 2012; Y. Yang et al., 2020a; L. Wang et
al., 2020; Zhang et al., 2021). Therefore, AHF was retrieved via a physical
method (Chen and Shi, 2012; Chen et al., 2012, 2014)
based on NOAA nighttime light data at 1000 m spatial resolution together with
local economic development and energy consumption data; the AHF data for
Nanjing for the corresponding periods were provided by Chen and Shi (2012) and
Chen et al. (2012, 2014). Note that the AHF used here varies only annually. We
expect the AHF distribution to shape the main morphology of the urban thermal
environment. AHF data at diurnal and seasonal scales were not available; if
high-temporal-resolution AHF data become available in the future, they will be
incorporated into the model. Lastly, the digital elevation model (DEM) data
(30 m spatial resolution) used in this study are based on the second version
of ASTER-GDEM, which is provided by the Geospatial Data Cloud site, Computer
Network Information Center, Chinese Academy of Sciences
(
The random forest (RF) model is a highly flexible machine learning algorithm that can handle data with missing values or noise and is robust to interference. To date, the RF model has been widely used as a feature selection tool for high-dimensional data, for example, to identify the importance of variables and to predict or classify related variables. In this study, an RF model was constructed for the dataset of each date to estimate AT, using the RF package in the R language.
The process of urbanization has a significant impact on CUHIs (Zhou et al., 2015). To comprehensively account for the local urban environment, 18 factors were selected as independent variables, including anthropogenic parameters (i.e., AHF), geometric parameters (distance from the city center, proportion of LULC area, altitude, longitude, latitude, slope, and aspect), and physical parameters (proportion of impervious surface (IS) area, albedo, normalized difference vegetation index (NDVI), normalized difference built-up index (NDBI), green normalized difference vegetation index (gNDVI), soil-adjusted vegetation index (SAVI), and normalized difference moisture index (NDMI)). Their sources and spatial resolution are summarized in Table 1. The environmental variables were derived as follows. Based on Landsat 8 OLI satellite data, the LULC in Nanjing was divided into four broad categories (built-up, cropland, vegetation, and water body) by combining a support vector machine method and visual interpretation. The remote sensing indices were calculated using the corresponding bands (Yang et al., 2012; Shi et al., 2015). The IS and surface albedo data were extracted from multi-band information (Son et al., 2017; Liang, 2001). Then, the geometric center of the built-up area was taken as the city center, and the distances between the meteorological stations and the city center were calculated. Slope and aspect were calculated from the DEM data using ArcMap 10.2. The methods used for extracting the IS data and calculating the remote sensing indices and surface albedo are given in Sect. S1, together with the accuracy of IS and albedo. All the above data (except for DEM, aspect, and slope) were extracted for each of the 3 years corresponding to the three selected Landsat images. Taking the data on 21 July 2017 as an example, Fig. 2 shows the spatial distribution of some of the environmental parameters, i.e., IS, distance from city center, LULC, and NDVI, where high spatial consistency between these parameters and the urban structure can be seen. For example, high-density built-up areas correspond closely to high AHF and low vegetation cover.
Independent variables with their sources and spatial resolution.
Notes: DEM, digital elevation model; IS, impervious surface; NDVI, normalized difference vegetation index; NDBI, normalized difference built-up index; gNDVI, green normalized difference vegetation index; SAVI, soil-adjusted vegetation index; NDMI, normalized difference moisture index; AHF, anthropogenic heat flux.
Spatial distribution of typical environmental variables on 21 July
2017 in Nanjing:
Due to advection and turbulent transport, neighborhood surroundings can affect the local temperature (Yang et al., 2012; Shi et al., 2015). Therefore, a fixed buffer zone was built surrounding the meteorological stations. Within the buffer zone of each station the proportion of IS area and that of each LULC type, and the average values of surface albedo, AHF, NDVI, NDBI, SAVI, gNDVI, and NDMI were calculated. Together with longitude, latitude, altitude, and distance to the city center, these parameters were fed into the RF model as independent variables, with AT as the target variable. In addition, to find out the optimal size of the buffer zones for the model, we compared the model performances for different buffer zone sizes, i.e., buffer zones with a radius of 500, 1000, 2000, and 5000 m, respectively. Figure 3 summarizes the research framework of this paper.
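The buffer statistics described above can be illustrated with a minimal sketch. It assumes a regular raster stored as nested lists with square cells, and station coordinates expressed in the same metric units as the cell size. The function names and the toy values are hypothetical and do not reproduce the workflow used in this study.

```python
import math


def cells_in_buffer(n_rows, n_cols, cell_size, x, y, radius):
    """Return (row, col) of every cell whose centre lies within `radius` of (x, y)."""
    selected = []
    for row in range(n_rows):
        for col in range(n_cols):
            centre_x = (col + 0.5) * cell_size
            centre_y = (row + 0.5) * cell_size
            if math.hypot(centre_x - x, centre_y - y) <= radius:
                selected.append((row, col))
    return selected


def buffer_mean(grid, cell_size, x, y, radius):
    """Mean of a continuous variable (e.g. NDVI, albedo, AHF) inside the buffer."""
    cells = cells_in_buffer(len(grid), len(grid[0]), cell_size, x, y, radius)
    if not cells:
        raise ValueError("buffer contains no cell centres")
    return sum(grid[r][c] for r, c in cells) / len(cells)


def buffer_fraction(class_grid, cell_size, x, y, radius, target):
    """Proportion of buffer cells that belong to one class (e.g. a LULC type)."""
    cells = cells_in_buffer(len(class_grid), len(class_grid[0]), cell_size, x, y, radius)
    if not cells:
        raise ValueError("buffer contains no cell centres")
    return sum(1 for r, c in cells if class_grid[r][c] == target) / len(cells)


# Toy 4 x 4 rasters with 1 m cells (hypothetical values, not study data).
values = [[row * 4 + col for col in range(4)] for row in range(4)]
lulc = [
    ["water", "water", "water", "water"],
    ["water", "built-up", "built-up", "water"],
    ["water", "cropland", "cropland", "water"],
    ["water", "water", "water", "water"],
]

station_mean = buffer_mean(values, 1.0, 2.0, 2.0, 1.0)
station_built_up = buffer_fraction(lulc, 1.0, 2.0, 2.0, 1.0, "built-up")
```

Repeating the two calls with radii of 500, 1000, 2000, and 5000 m (and 30 m cells) would give one feature set per buffer size, which is the comparison the text describes.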
Flowchart for constructing the RF model and evaluating the CUHII (canopy layer urban heat island).
This paper uses the coefficient of determination (
The cross validation (CV) method can be used to evaluate the performance of
the RF model (Zheng et al., 2020). In this paper, we employ the
5-fold CV method, in which the entire dataset is randomly divided into
five subsets – each time four subsets are used to train the RF model, and
the remaining one is used for validating. After constructing the model, the
validation data are used to calculate the current
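A minimal sketch of the 5-fold procedure is given below. It is model-agnostic: `fit` and `predict` are placeholder callables standing in for the RF model, and the helper names, the fixed random seed, and the averaging of the scores over folds are assumptions made for illustration only.

```python
import math
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Randomly split sample indices into k roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant")
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


def cross_validate(features, target, fit, predict, k=5, seed=0):
    """Train on k-1 folds, validate on the held-out fold, and average the scores."""
    folds = k_fold_indices(len(target), k, seed)
    r2_scores, rmse_scores = [], []
    for held_out, test_idx in enumerate(folds):
        train_idx = [j for i, fold in enumerate(folds) if i != held_out for j in fold]
        model = fit([features[j] for j in train_idx], [target[j] for j in train_idx])
        predicted = predict(model, [features[j] for j in test_idx])
        observed = [target[j] for j in test_idx]
        r2_scores.append(r_squared(observed, predicted))
        rmse_scores.append(rmse(observed, predicted))
    return sum(r2_scores) / k, sum(rmse_scores) / k


# Toy check with a "perfect" stand-in model (y = 2x), not an RF model.
toy_x = [float(i) for i in range(10)]
toy_y = [2.0 * x for x in toy_x]
mean_r2, mean_rmse = cross_validate(
    toy_x, toy_y,
    fit=lambda xs, ys: None,
    predict=lambda model, xs: [2.0 * x for x in xs],
)
```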
Since not every variable in the model makes a prominent contribution to the
performance, removing variables that reduce the prediction
accuracy can both improve the performance and simplify the model. Therefore, the
number of variables should be minimized, provided that the performance of the
model is improved or unaffected. The contribution of each variable is
judged by two indicators: the percentage increase in mean-square error
(%IncMSE) and the increase in node purity (IncNodePurity).
Using the backward selection method, the variable with the smallest
contribution is identified and removed, and the model is re-run. These steps
are then repeated until only one variable remains. The
The
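The backward selection loop can be sketched as follows. The two callables are placeholders: in practice `importance_fn` would return an importance measure such as %IncMSE from a refitted model and `score_fn` a cross-validated score, whereas here both are toy functions. Returning the best-scoring subset, with ties broken in favor of fewer variables, is an assumption about how the final variable set is chosen.

```python
def backward_selection(variables, score_fn, importance_fn):
    """Repeatedly drop the least important variable and record each subset's score.

    score_fn(subset)      -> float, higher is better (placeholder for a CV score)
    importance_fn(subset) -> dict mapping variable name to importance
    """
    current = list(variables)
    history = []
    while True:
        history.append((list(current), score_fn(current)))
        if len(current) == 1:
            break
        importance = importance_fn(current)
        weakest = min(current, key=lambda name: importance[name])
        current.remove(weakest)
    # Prefer the highest score; on ties prefer the smaller subset.
    best_subset, best_score = max(history, key=lambda item: (item[1], -len(item[0])))
    return best_subset, best_score, history


# Toy stand-ins (hypothetical numbers, not results from the study).
toy_importance = {"NDVI": 3.0, "IS": 2.0, "noise": 0.1}


def toy_score(subset):
    useful = len({"NDVI", "IS"} & set(subset))
    penalty = 0.5 if "noise" in subset else 0.0
    return float(useful) - penalty


best_subset, best_score, history = backward_selection(
    ["NDVI", "IS", "noise"],
    score_fn=toy_score,
    importance_fn=lambda subset: {name: toy_importance[name] for name in subset},
)
```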
To build an RF model, two important parameters need to be set: the number of
decision trees (Ntree) and the number of variables sampled at each node
(Mtry). RF models were established for every combination of Ntree from 50 to
1200 (in steps of 50) and Mtry from 1 to 16 (in steps of 1), so that the whole
parameter grid was traversed. Figure 4 presents the
The principle of parameter selection is to choose a simpler model (smaller Ntree and Mtry) under the premise of good performance. In the end, the optimal Mtry and Ntree based on the datasets on 11 August 2013, 2 September 2015, and 21 July 2017 were 7 and 200, 10 and 150, and 7 and 50, respectively.
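The selection principle can be made concrete with a small sketch. The grid follows the ranges given in the text; the tolerance of 0.01 that defines "good performance" and the ordering that treats a smaller Ntree as simpler before comparing Mtry are assumptions, and the scores are invented for illustration.

```python
def build_grid():
    """All (Ntree, Mtry) pairs: Ntree 50..1200 in steps of 50, Mtry 1..16."""
    return [(ntree, mtry) for ntree in range(50, 1201, 50) for mtry in range(1, 17)]


def pick_simplest(scores, tolerance=0.01):
    """Pick the simplest setting whose score is within `tolerance` of the best.

    scores maps (Ntree, Mtry) to a validation score where higher is better.
    """
    best = max(scores.values())
    good_enough = [params for params, score in scores.items() if score >= best - tolerance]
    return min(good_enough, key=lambda params: (params[0], params[1]))


# Hypothetical validation scores for a handful of grid points.
toy_scores = {
    (50, 1): 0.600,
    (50, 7): 0.745,
    (200, 7): 0.750,
    (1200, 16): 0.751,
}
chosen = pick_simplest(toy_scores)
```

With these toy scores the largest model is marginally best, but the much smaller setting falls within the tolerance and is therefore preferred.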
Table 2 compares the performance of the RF model with different buffer sizes
(500, 1000, 2000, and 5000 m) in the 5-fold CV. The RF model based
on the dataset on 11 August 2013 and 2 September 2015 within 1 km buffer
zones performed best, with an
In addition, three methods of AT modeling were compared: two linear
regressions, namely stepwise linear regression (Alonso and Renard, 2019; Mira
et al., 2017) and geographically weighted regression (GWR) (L. Wang et al.,
2020; Li et al., 2021), and one nonlinear regression (the RF model; Alonso and Renard, 2020). A detailed description of the linear
regression methods is provided in Sect. S2. For each model, the combination
of variables with the largest
Figure 5 compares the measured AT of the high-density automatic stations in
the training set or testing set and the predicted AT of the RF model in the
5-fold CV. In general, a large number of scattered points of predicted
and observed AT are clustered around the 1 : 1 line, indicating good
performance of the model. In the training set, the average
Scatterplot of predicted and observed air temperature: 5-fold
cross validation (CV) for the training set on
Furthermore, we used %IncMSE and IncNodePurity to determine the contribution of each variable (Table 4) and to compare their importance. The NDVI and the proportions of IS, vegetation, and water body area appeared in all three models, indicating that vegetation, water bodies, and human activities have important and universal impacts on the AT distribution. The distance to the city center appeared in the models based on the data on 2 September 2015 and 21 July 2017 and ranked high, implying an impact of urbanization on the heat island.
Importance of input variables for the RF model of AT estimation on the three different days. Date format: dd/mm/yyyy.
Notes: NDVI, normalized difference vegetation index; IS, impervious surface; AHF, anthropogenic heat flux; DEM, digital elevation model; NDBI, normalized difference built-up index; gNDVI, green normalized difference vegetation index; SAVI, soil-adjusted vegetation index.
The predicted relative error of the air temperature by random
forest:
The absolute error of the RF prediction is defined as the difference between the predicted
AT and the observed AT at each weather station (see Fig. S2). The relative error
is defined as the absolute error divided by the observed AT, which is shown in
Fig. 6. In general, the mean relative (absolute) errors across all stations
are 0.07 % (0.014
To validate the robustness of this RF framework and its practicality over a longer
period, hourly meteorological AT observations during August 2013, September
2015, and July 2017 and the corresponding environmental variables were used to
establish the RF model. The temperature differences over a month are larger,
so these datasets cover more complicated situations. For 5-fold CV, a scatterplot of
predicted and observed air temperature is given in Fig. 7, showing that
the mean RMSEs are 0.75, 0.52, and 0.59
Scatterplot of predicted and observed air temperature using data
in a 1-month 5-fold CV for the testing set on
After establishing the model, a 2 km buffer area was created for each
30 m resolution pixel and the same 18 independent variables were calculated.
The constructed RF model took these pixel-wise variables as input and output
AT for each pixel, and hence we obtained the RF model–predicted AT map at
30 m resolution (Fig. 8). LST is also a physical manifestation of surface
energy and moisture flux exchange between the atmosphere and the biosphere.
Previous studies point out that there is a relationship between LST and AT
(Mutiibwa et al., 2015; Benali et al., 2012); therefore,
Fig. 9 shows the LSTs of Nanjing on these days, which were retrieved by
using Google Earth Engine. CUHII is an important indicator to quantify the
UHI effect, which is usually defined as the difference in AT at the same
level between urban and rural areas (Y. Yang et al., 2020b; Nganyiyimana et
al., 2020), as follows:
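In words, the definition above is the AT at a location minus the AT of the rural reference. The sketch below applies it to a gridded AT map. Using the mean AT over a masked reference rural area as the rural value is an assumption, and the function names and toy numbers are hypothetical.

```python
def rural_reference(at_map, rural_mask):
    """Mean AT over the cells flagged as the reference rural area."""
    rural_values = [
        at
        for at_row, mask_row in zip(at_map, rural_mask)
        for at, is_rural in zip(at_row, mask_row)
        if is_rural
    ]
    if not rural_values:
        raise ValueError("rural mask selects no cells")
    return sum(rural_values) / len(rural_values)


def cuhii_map(at_map, rural_mask):
    """CUHII per cell: AT of the cell minus the rural reference AT."""
    reference = rural_reference(at_map, rural_mask)
    return [[at - reference for at in row] for row in at_map]


# Toy 2 x 2 AT map in degrees Celsius (hypothetical values).
toy_at = [[36.0, 35.0], [34.0, 34.0]]
toy_rural = [[False, False], [True, True]]
toy_cuhii = cuhii_map(toy_at, toy_rural)
```

Positive values mark heat islands relative to the rural reference and negative values mark cold islands.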
Spatial distribution of AT in Nanjing and the
reference rural area:
Spatial distribution of the LST in Nanjing:
Spatial distribution of the CUHII in Nanjing:
Figure 8 shows that the AT on 11 August 2013 and 21 July 2017 was higher and
that the AT ranges were 35.4–37.8 and 33.6–36.4
Against different weather backgrounds, the spatial distributions of AT and CUHII exhibit heterogeneity in urban Nanjing on different days. The high-AT area on 11 August 2013 extended from the city center to a wide range, and the extreme value of AT was the highest (Fig. 8a), corresponding to the strongest CUHI (Fig. 10a). Combined with Fig. 2, we can see only a small range of vegetation coverage and water bodies in the central urban area, so the CUHII decreased slightly. Only in the suburban water body and farmland areas were there large cold island areas, and only on this day, the distribution of LST corresponds to that of AT. On 2 September 2015, the high-AT area was relatively small to the north of the Yangtze River. The AT on the Yangtze River was the lowest (Fig. 8b), with the strongest cold island here (Fig. 10b). The high-AT area extended from the central city to the south, and the cold islands in the southern water body and vegetation-covered areas were not significant. On 21 July 2017, the distribution of the heat island was the opposite. There was a large area of high AT to the north of the Yangtze River, and the cooling effect of the Yangtze River was weak (Fig. 8c). Meanwhile, the AT in the southern suburbs dropped significantly, and cold islands widely spread in water body and cropland areas (Fig. 10c). Compared with the distribution of CUHII on 11 August 2013, the AT over the water bodies and hills in the northeast of the central city was lower, forming a large and strong cold island area.
However, note that the distributions of LST on these three days are similar, and they are all strongly related to urban form and LULC (Li et al., 2021). This is because LST and AT are controlled by different factors, which leads to different spatial distributions. After absorbing solar energy, the ground transfers heat to the air through radiation, conduction, and convection, which is the main source of heat in the air (Hong et al., 2018; Khan et al., 2020). In contrast, the land surface is heated directly by solar energy, so LST is more sensitive to emissivity, surface material, and humidity; because these properties are related to LULC, LST tends to differ more strongly among LULC types (Janatian et al., 2016; Long et al., 2020). The LULC types in these periods are similar, so the LST differences among the three days are marginal.
Area occupied by different levels of urban heat island intensity on
different days (km
To further explore the intensity and coverage of the CUHI on different days,
the area (km
According to previous studies, three factors – the wind vector field (He, 2018), LULC (Cao et al., 2018; R. Wang et al., 2020) and the urban structure (Shahmohamadi et al., 2011; Li et al., 2020) – are the most important influencing factors of CUHIs. In this section, we explore these three drivers of CUHI in Nanjing.
The horizontal air flow has a significant impact on the intensity and shape of the CUHI (He et al., 2021). Figure 11 shows the wind vector field observed by weather stations on the three days analyzed in our study.
Wind vector field in Nanjing on
On 11 August 2013, the average wind speed at the stations was 0.70 m s
On all three days, the wind speed in the suburban areas was higher than that in the central city, and this is because there is no shelter provided by tall and dense buildings in the suburban areas, which is conducive to cooling from air convection and therefore a weakening of the CUHII (P. Yang et al., 2020). That said, records show that, surprisingly, the boundary-layer mean wind speed in a city can be higher than its rural counterpart. On the one hand, Nanjing is traversed by the Yangtze River, and the central city surrounds a large area of water, wherein the low surface roughness of the water is conducive to air convection. On the other hand, channeling/the Venturi effect might be an important factor. When the prevailing wind is parallel to the axis between buildings, it will be forced to enter between the buildings, resulting in higher wind pressure, which increases the wind speed (Droste et al., 2018).
Relationship between CUHII
and wind speed around all meteorological stations on
In order to quantify the relationship, the average CUHII and standard
deviation under different wind speeds at various meteorological stations
were calculated (Fig. 12). On 11 August 2013, the maximum wind speed was 2 m s
There are two aspects concerning the influence of air convection on CUHIs.
On the one hand, air convection will facilitate horizontal advection cooling
between urban and rural areas, thereby weakening the CUHI (Brandsma et al., 2003). The greater the wind speed, the more significant the
cooling effect (Fig. 12). On the other hand, horizontal convection
transfers heat from the upwind to the downwind area, weakening the upwind
CUHII and strengthening the downwind CUHII (Bassett et al., 2016)
(Figs. 10 and 11). Under different wind speeds, the synergy of these two
aspects differs significantly. On 11 August 2013, the average wind speed was
the smallest among the three days at only 0.7 m s
In contrast, the CUHII distribution is in good agreement with the LST distribution on 11 August 2013, whereas the two patterns differ considerably on the other two days (Figs. 9 and 11). This is because the calm wind on 11 August 2013 could not induce horizontal advection of urban heat; therefore, the spatial distributions of LST and AT were well matched on this day. However, under stronger winds (as on both 2 September 2015 and 21 July 2017), there is obvious advection of the urban heat island (Bassett et al., 2016), resulting in different patterns between CUHII and SUHI on these two days (Figs. 9 and 11).
LULC also has a significant impact on CUHII (Li et al., 2020; Zong et al., 2021) and LST (Yang et al., 2018; Li et al., 2021). The average values and standard deviation of
CUHII were calculated for each LULC type on the three days (Fig. 13). On
11 August 2013, the CUHII in the built-up area was the strongest, exceeding
1.1
Mean CUHII and standard
deviations over different LULC on
Different LULC types have different effects on AT due to their own
intrinsic physical properties, mainly reflected in three aspects:
(1) Due to the good thermal conductivity and small specific heat capacity of
the surface material in the built-up area, the ability to absorb shortwave
radiation during the day is stronger than that of other land uses. The LST
is significantly higher than that of the suburbs, and therefore the
atmosphere is easily heated (Hong et al., 2018).
(2) Due to sufficient water availability in cropland and vegetation-covered
areas, evaporation will increase the latent heat flux and cooling effect
(Zhao et al., 2020; Zheng et al., 2018). In contrast, the surface
humidity of the built-up area is low, with low corresponding latent heat
flux. The difference in latent heat flux will increase the difference in LST
and AT between urban and rural areas. The latent heat flux of the water
bodies is the largest, and the cooling effect is the most obvious.
(3) There is a significant correlation between LULC and wind speed
(Chen et al., 2020). Areas with tall buildings in built-up areas have
high surface roughness and low wind speed, whereas water bodies have low
surface roughness and high wind speed. The surface roughness of
vegetation-covered areas and cropland is somewhere between. The air
convection will increase the sensible heat flux and reduce the AT (Sect. 4.2.1).
Therefore, LULC and air convection will jointly enhance or weaken
the CUHII.
On 11 August 2013, the average wind speed and the difference in wind speed
between different LULC types were small and so was the difference in
sensible heat flux. The difference in radiation and sensible heat flux was
the main factor. On 21 July 2017, the average wind speed was the highest,
and the synergy in the three aspects led to the CUHII over different LULC
types being highest in the built-up area, followed by cropland, vegetation,
and then water bodies. On 2 September 2015, the CUHII was highest in the
built-up areas, followed by vegetation, cropland, and then water bodies.
This was due to the influence of low wind speeds, which would have produced
heat transfer and made the CUHII shift from the built-up area to other LULC
types (Sect. 4.2.1).
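The per-class statistics used in this comparison (mean and standard deviation of CUHII for each LULC type) can be computed with a short grouping routine. The sketch assumes the data are available as (LULC label, CUHII) pairs and uses the population standard deviation; both choices, as well as the toy values, are assumptions for illustration. The same routine would apply to wind-speed bins if the bin label replaced the LULC label.

```python
import statistics
from collections import defaultdict


def cuhii_by_class(samples):
    """Group (label, CUHII) pairs by label and return {label: (mean, std)}."""
    groups = defaultdict(list)
    for label, value in samples:
        groups[label].append(value)
    return {
        label: (statistics.mean(values), statistics.pstdev(values))
        for label, values in groups.items()
    }


# Hypothetical CUHII samples, not values from the study.
toy_samples = [("built-up", 1.0), ("built-up", 2.0), ("water", -1.0)]
toy_stats = cuhii_by_class(toy_samples)
```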
Human activities and urbanization have a significant impact on the spatial distribution of UHI (Shahmohamadi et al., 2011; Li et al., 2020). To explore this influence, concentric rings with various radii (5, 10, 15, 40 km) were created surrounding the city center. Within each ring, the average values and error ranges of AHF and CUHII, along with the average proportion of built-up area, were calculated. Figure 14 shows that the CUHII, AHF, and proportion of built-up area all significantly decrease with increasing distance to the city center.
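The ring averaging described above can be sketched as follows. Sample locations are assumed to be given in kilometers in a planar coordinate system, and each ring includes its inner edge but not its outer edge. The function name, the ring edges used in the toy call, and the values are hypothetical.

```python
import math


def ring_means(samples, centre, edges_km):
    """Average a variable within concentric rings around the city centre.

    samples  : iterable of (x_km, y_km, value)
    centre   : (x_km, y_km) of the city centre
    edges_km : increasing ring boundaries, e.g. [0, 5, 10, 15]
    Returns {(inner, outer): mean value}; rings without samples are omitted.
    """
    totals = {}
    for x, y, value in samples:
        distance = math.hypot(x - centre[0], y - centre[1])
        for inner, outer in zip(edges_km, edges_km[1:]):
            if inner <= distance < outer:
                running_sum, count = totals.get((inner, outer), (0.0, 0))
                totals[(inner, outer)] = (running_sum + value, count + 1)
                break
    return {ring: running_sum / count for ring, (running_sum, count) in totals.items()}


# Hypothetical CUHII samples along one direction from the centre.
toy_points = [(1, 0, 2.0), (3, 0, 1.0), (7, 0, 0.5), (12, 0, 0.0), (50, 0, 9.0)]
toy_rings = ring_means(toy_points, (0, 0), [0, 5, 10, 15])
```

Applying the same function to AHF and to the built-up fraction gives the three curves that are compared against distance from the city center.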
Changes in air temperature, AHF, and the
proportion of built-up areas with distance from the city center on
From a longitudinal perspective, the AHF and the proportion of built-up
areas both increased year by year. The built-up areas of Nanjing on the
three days were 982.78, 1076.19, and 1220.36 km
Based on the RF model and combined with local environment and background weather data, the pattern and causes of CUHIs can be analyzed in detail. On 11 August 2013, Nanjing experienced a heatwave, with almost no horizontal convection of air (Fig. 11a). In dry areas, such as built-up areas, the latent heat flux remained unchanged, but the high reflectivity of the surface raised the AT. During the heatwave, the higher AT increased the latent heat flux in rural areas (Khan et al., 2020); for example, vegetation and water bodies alleviated the increase in AT in rural areas. This combined effect exacerbated the difference in AT between the urban and rural areas, making the overall CUHI the strongest (Nganyiyimana et al., 2020; Meili et al., 2021). In Fig. 10a, it can be seen that the cooling efficiency of vegetation in the urban area was not high and the coverage of the cooling area was small. This is because the stomata of leaves would have been closed under high AT and dry weather, resulting in reduced evapotranspiration and increased AT (Manoli et al., 2019). On 2 September 2015, northwesterly winds prevailed (Fig. 11b), and there was abundant water vapor over the hills of northeast Nanjing and over the Yangtze River. The increase in latent heat flux and horizontal convection cooling lowered the CUHII. Cold islands even appeared to the north of the Yangtze River. The CUHII in the southeast direction was strong (Fig. 10b), which was mainly caused by the heat transport of the prevailing winds (Chuanyan et al., 2005), shifting the CUHI toward the downwind area. On 21 July 2017, southwesterly winds prevailed in Nanjing, with high wind speed, decreasing the CUHII in the upwind region (Figs. 10c and 11c). However, there were large areas of vegetation coverage in the range of 10–20 km in the downwind region, which were affected by the combined effects of land use and horizontal advection cooling, leading to a lower CUHII there than in the 20–30 km range.
This also confirms the conclusion (Bassett et al., 2016) that the upwind horizontal advection cooling has the strongest correlation with the weakening of the CUHI effect, and that the downwind region is affected by the wind speed.
Spatial distribution of canopy urban heat island intensity
(CUHII) in Nanjing during a heatwave period:
There are four main methods for retrieving AT for CUHII assessment:
(1) Statistical methods (Prihodko and Goward, 1997; Alonso and Renard, 2020; Li et al., 2021): statistical models relating environmental factors to temperature are established to estimate the AT, such as multiple linear regression models, partial least-squares regression, and GWR. In a previous study (Alonso and Renard, 2020), two methods of AT prediction (namely, stepwise linear regression and GWR) were compared with the RF model. The RF model has the highest accuracy and effectively avoids the problem of autocorrelation by filtering variables, which is consistent with previous work (Yoo et al., 2018; Zhu et al., 2019) and our present work. In addition, conventional statistical methods cannot effectively solve nonlinear problems (Oh et al., 2020).
(2) Temperature–vegetation index method (VTX) (Stisen et al., 2007; Vancutsem et al., 2010): this refers to inversion using the relationship between AT, LST, and vegetation index under the premise that the temperature of a dense vegetation canopy is similar to the AT. However, VTX only captures the relationships between the underlying surface, LST, and AT. In fact, many other factors can affect AT, e.g., anthropogenic heat, altitude, and distance to city. Because these factors are ignored, the accuracy of the VTX method was low (Stisen et al., 2007). In contrast, our RF model takes multiple input variables, covering more of the factors that affect AT.
(3) Physical model methods: this category mainly comprises the energy balance method (Yang et al., 2018), which retrieves AT using the principle of energy balance. The physical model approach is relatively complex, and its performance is highly dependent on the understanding of the mechanisms affecting AT, so it can only address specific problems, whereas the RF framework in this paper is relatively simple, comprehensive, and suitable for different weather backgrounds.
(4) Machine learning methods (Venter et al., 2020): predictions are made by establishing models relating various variables to AT, such as RF models or neural networks. Compared with other machine learning methods such as neural networks (Astsatryan et al., 2021), the RF model has better noise immunity and suits the small sample size in this study. Other machine learning methods usually require a lot of data with little noise, so data cleaning before modeling takes more time. In future, we would like to compare different machine learning methods, e.g., SVM and ANN, to identify a consistently well-performing model. We will also use a stacking ensemble strategy to combine the advantages of different models and obtain the best prediction results.
The RF prediction framework proposed in this work can not only dynamically predict CUHII in detail and at high frequency within highly heterogeneous cities but can also be built against different weather backgrounds, mainly because the environmental parameters entered into the model are relatively stable within a certain period (such as the same month or season). Once the environmental parameters have been acquired, they can be combined with the AT data in real time to establish the RF model, and the spatial distribution characteristics of CUHII with high temporal and spatial resolution can be obtained. For instance, we predicted the 30 m resolution AT and spatial distribution of CUHII (Fig. 15) with the wind vector field (Fig. S3) for an arbitrarily chosen heatwave period, 12–14 August 2012, thereby supporting those involved in making decisions with respect to urban climate, urban planning, and urban energy consumption. In particular, our proposed model has the potential to be reused across a short period, as most of the environmental parameters fed to the model probably remain stable for some time, e.g., 1 month or even longer.
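As a minimal sketch of the last step of this workflow, the snippet below converts a gridded AT prediction into a CUHII map. It assumes that CUHII is taken as the predicted AT of each cell minus the mean AT over a reference rural area; the function name `cuhii_grid`, the boolean `rural_mask`, and the toy values are hypothetical and serve only as illustration, not as the implementation used in this study.

```python
from statistics import mean


def cuhii_grid(predicted_at, rural_mask):
    """Return CUHII per cell: predicted AT minus the mean rural reference AT.

    predicted_at: 2-D list of predicted air temperatures (deg C), one per cell.
    rural_mask:   2-D list of booleans, True where a cell belongs to the
                  (assumed) reference rural area.
    """
    # Collect the predicted AT of every cell flagged as rural.
    rural_values = [
        at
        for at_row, mask_row in zip(predicted_at, rural_mask)
        for at, is_rural in zip(at_row, mask_row)
        if is_rural
    ]
    if not rural_values:
        raise ValueError("rural_mask selects no cells")
    rural_reference = mean(rural_values)
    # Subtract the single rural reference value from every cell.
    return [[at - rural_reference for at in row] for row in predicted_at]


# Toy 2 x 2 grid: the bottom row plays the role of the rural reference area.
toy_at = [[30.0, 32.0], [28.0, 28.0]]
toy_mask = [[False, False], [True, True]]
toy_cuhii = cuhii_grid(toy_at, toy_mask)
```

Because the environmental layers are assumed to be stable within a month or season, only the AT prediction needs to be refreshed before this subtraction is repeated for a new time step.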
However, owing to changes in local weather conditions (e.g., precipitation and cloud cover), satellite-based LST samples vary and LST usually changes considerably within 1 month, leading to uncertainties in predicting AT; therefore, LST is not suitable as an input variable for our present model of CUHII. In addition to human activities and LULC, the background weather conditions (such as heatwaves, air pollution, atmospheric circulation, and cloud cover) are also extremely important (Bassett et al., 2016; P. Yang et al., 2020; Khan et al., 2020) and should be introduced to improve the RF model of CUHII.
Taking Nanjing as an example and using remote sensing data together with data from local weather stations, parameters to characterize the urban environment were constructed, e.g., anthropogenic parameters (i.e., AHF), geometric parameters (distance from city center, proportions of LULC types by area, altitude, latitude and longitude, slope, and aspect), and physical parameters (proportion of IS, surface albedo, NDVI, NDBI, SAVI, gNDVI, and NDMI). A 2 km buffer zone was created around the meteorological stations, and the observed environmental parameters were extracted. A refined assessment framework of CUHII was then established by using a random forest model with observed AT and environmental variables.
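The buffer extraction step can be illustrated with a minimal sketch. It assumes a gridded parameter layer stored as a 2-D list, a simplified planar coordinate system in metres with the origin at the top-left corner of the grid, and a plain mean over all cells whose centres fall inside the buffer; the function name `buffer_mean` and these conventions are hypothetical and are not taken from the processing chain of this study.

```python
import math


def buffer_mean(grid, cell_size_m, station_xy, radius_m=2000.0):
    """Mean of the grid cells whose centres lie within radius_m of a station.

    grid:        2-D list of parameter values (e.g. NDVI); None marks no data.
    cell_size_m: cell edge length in metres (e.g. 30 for a 30 m layer).
    station_xy:  (x, y) station position in metres from the top-left corner,
                 with y increasing downwards (assumed convention).
    radius_m:    buffer radius in metres (2 km by default).
    """
    station_x, station_y = station_xy
    total, count = 0.0, 0
    for row_index, row in enumerate(grid):
        centre_y = (row_index + 0.5) * cell_size_m
        for col_index, value in enumerate(row):
            if value is None:
                continue  # skip no-data cells
            centre_x = (col_index + 0.5) * cell_size_m
            distance = math.hypot(centre_x - station_x, centre_y - station_y)
            if distance <= radius_m:
                total += value
                count += 1
    if count == 0:
        raise ValueError("no valid cells inside the buffer")
    return total / count


# Toy 3 x 3 layer with 1 km cells and a station at the centre of the grid.
toy_layer = [[0, 1, 0], [1, 10, 1], [0, 1, 0]]
mean_1km = buffer_mean(toy_layer, 1000.0, (1500.0, 1500.0), radius_m=1000.0)
mean_2km = buffer_mean(toy_layer, 1000.0, (1500.0, 1500.0))
```

In this toy case the 1 km buffer covers the centre cell and its four edge neighbours, whereas the 2 km buffer covers the whole grid, so the two means differ.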
Results showed that the correlation coefficient between the predicted and
observed AT was 0.731, and the average RMSE was 0.719
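For reference, the two agreement measures quoted here can be computed as in the following minimal sketch, which assumes the Pearson correlation coefficient and the usual root-mean-square error over paired observed and predicted AT values. The function names and the toy numbers are illustrative only and do not reproduce the results of this study.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    covariance = sum(
        (o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted)
    )
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    if var_obs == 0 or var_pred == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(var_obs * var_pred)


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    n = len(observed)
    if n != len(predicted) or n == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared_errors / n)


# Toy values only: predictions that are exactly twice the observations.
toy_observed = [1.0, 2.0, 3.0]
toy_predicted = [2.0, 4.0, 6.0]
toy_r = pearson_r(toy_observed, toy_predicted)
toy_rmse = rmse(toy_observed, toy_predicted)
```

The toy case shows why both measures are reported together: a perfectly correlated prediction can still carry a large RMSE when it is biased or wrongly scaled.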
In general, overlapping the refined CUHII with local environmental variables and weather conditions helps to explore the causes of CUHIs in more detail, instead of being limited to the location of meteorological sites and frequent changes in various types of weather. The new 30 m resolution CUHII evaluation framework developed in this study has strong portability and important practical value. Our findings are helpful for improving our understanding of the relationship between human activities and regional climate change, which can provide important guidance for urban development planning and allocation of public resources in the context of global warming and rapid urbanization.
The model in this paper is based on the random forest data package in the R language, and our implementation and analysis code are available upon request to the corresponding author (yyj1985@nuist.edu.cn).
Landsat 8 OLI datasets
(
The following data are available online: Sect. S1: specific inversion steps of related environmental variables. Section S2: stepwise linear regression and geographically weighted regression. Section S3: table and caption. Table S1: band ranges and the main use of Landsat 8 OLI. Section S4: figures and captions. Figure S1: the performance of the RF models under different variable combinations. Figure S2: the predicted error of the air temperature by random forest: (a) 11 August 2013; (b) 2 September 2015; (c) 21 July 2017. Figure S3: spatial distribution of air temperature and wind vector field in Nanjing and the reference rural area during 12–14 August 2013. The supplement related to this article is available online at:
YY was responsible for conceptualisation, supervision, and funding acquisition. SC developed the software and prepared the original draft. SC and YY developed the methodology and carried out formal analysis. SC and YZ were responsible for data curation, validation, and visualisation. FD, YZ, DL, CL, ZG, and YY reviewed and edited the text.
The contact author has declared that neither they nor their co-authors have any competing interests.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
We are sincerely grateful to the editor and the three anonymous reviewers for the valuable time they spent reviewing our manuscript.
This research has been supported by the National Natural Science Foundation of China (grant nos. 42175098 and 42061134009) and the University Student Innovation Training Project of Nanjing University of Information Science and Technology (grant no. 201910300283).
This paper was edited by Cheng Liu and reviewed by three anonymous referees.