Comparison of aircraft measurements during GoAmazon2014/5 and ACRIDICON-CHUVA

The indirect effect of atmospheric aerosol particles on the Earth’s radiation balance remains one of the most uncertain components affecting climate change throughout the industrial period. The large uncertainty is partly due to the incomplete understanding of aerosol–cloud interactions. One objective of the GoAmazon2014/5 and the ACRIDICON (Aerosol, Cloud, Precipitation, and Radiation Interactions and Dynamics of Convective Cloud Systems)-CHUVA (Cloud Processes of the Main Precipitation Systems in Brazil) projects was to understand the influence of emissions from the tropical megacity of Manaus (Brazil) on the surrounding atmospheric environment of the rainforest and to investigate its role in the life cycle of convective clouds. During one of the intensive observation periods (IOPs) in the dry season from 1 September to 10 October 2014, comprehensive measurements of trace gases and aerosol properties were carried out at several ground sites. In addition, advanced suites of in situ instruments were deployed aboard both the US Department of Energy Gulfstream-1 (G1) aircraft and the German High Altitude and Long-Range Research Aircraft (HALO) during three coordinated flights on 9 and 21 September and 1 October. Here, we report on the comparison of measurements collected by the two aircraft during these three flights. Such comparisons are challenging but essential for assessing the data quality from the individual platforms and quantifying their uncertainty sources. Similar instruments mounted on the G1 and HALO collected vertical profile measurements of aerosol particle number concentrations and size distribution, cloud condensation nuclei concentrations, ozone and carbon monoxide mixing ratios, cloud droplet size distributions, and downward solar irradiance.
We find that the above measurements from the two aircraft agreed within the measurement uncertainties. The relative fractions of the aerosol chemical composition measured by instruments on HALO agreed with the corresponding G1 data, although the total mass loadings agreed well only at high altitudes. Furthermore, possible causes of the discrepancies between measurements on the G1 and HALO are examined in this paper. Based on these results, criteria for meaningful aircraft measurement comparisons are discussed.
Published by Copernicus Publications on behalf of the European Geosciences Union.


Introduction
Dominated by biogenic sources, the Amazon basin is one of the few remaining continental regions where atmospheric conditions realistically represent those of the pristine or preindustrial era (Andreae et al., 2015). As a natural climatic "chamber", the area around the urban region of Manaus in central Amazonia is an ideal location for studying the atmosphere under natural conditions as well as under conditions influenced by human activities and biomass burning events (Andreae et al., 2015; Artaxo et al., 2013; Davidson et al., 2012; Keller et al., 2009; Kuhn et al., 2010; Martin et al., 2016b; Pöhlker et al., 2018; Pöschl et al., 2010; Salati and Vose, 1984). The Observations and Modeling of the Green Ocean Amazon (GoAmazon2014/5) campaign was conducted in 2014 and 2015 (Martin et al., 2016b). The primary objective of GoAmazon2014/5 was to improve the quantitative understanding of the effects of anthropogenic influences on atmospheric chemistry and aerosol-cloud interactions in the tropical rainforest area. During the dry season in 2014, the ACRIDICON (Aerosol, Cloud, Precipitation, and Radiation Interactions and Dynamics of Convective Cloud Systems)-CHUVA (Cloud Processes of the Main Precipitation Systems in Brazil) campaign also took place to study tropical convective clouds and precipitation over Amazonia (Wendisch et al., 2016).
A feature of the GoAmazon2014/5 field campaign was the design of the ground site locations, which used principles of Lagrangian sampling to align the sites with the Manaus pollution plume (Fig. 1: source location, Manaus (T1 site); downwind location, Manacapuru (T3 site)). The ground sites were overflown with the low-altitude US Department of Energy (DOE) Gulfstream-1 (G1) aircraft and the German High Altitude and Long Range Research Aircraft (HALO). These two aircraft are among the most advanced in atmospheric research, deploying suites of sophisticated and well-calibrated instruments (Schmid et al., 2014; Wendisch et al., 2016). The pollution plume from Manaus was intensively sampled during the G1 and HALO flights and also by the DOE Atmospheric Radiation Measurement (ARM) program Mobile Aerosol Observing System and ARM Mobile Facility located at one of the downwind surface sites (T3 site, 70 km west of Manaus). The routine ground measurements, together with coordinated and intensive observations from both aircraft, provided an extensive data set of multi-dimensional observations in the region, which serves (i) to improve the scientific understanding of the influence of the emissions of the tropical megacity of Manaus (Brazil) on the surrounding atmospheric environment of the rainforest and (ii) to understand the life cycle of deep convective clouds and study open questions related to their influence on the atmospheric energy budget and hydrological cycle.
As more and more data sets are merged to link the ground-based measurements with aircraft observations, and as more studies focus on the spatial variation and temporal evolution of the atmospheric properties, it is critical to quantify the uncertainty ranges when combining the data collected from the different platforms. Due to the challenges of airborne operations, especially when two aircraft are involved in data collection in the same area, direct comparison studies are rare. However, this type of research is critical for further combining the data sets between the ground sites and aircraft. Thus, the main objectives of this study are to demonstrate how to achieve meaningful comparisons between two moving platforms, to conduct detailed comparisons between data collected by two aircraft, to identify the potential measurement issues, to quantify reasonable uncertainty ranges of the extensive collection of measurements, and to evaluate the measurement sensitivities to the temporal and spatial variance. The comparisons and the related uncertainty estimations quantify the current measurement limits, which provide realistic measurement ranges to climate models as initial conditions to evaluate their output.
The combined GoAmazon2014/5 and ACRIDICON-CHUVA field campaigns not only provide critical measurements of aerosol and cloud properties in an undersampled geographic region but also offer a unique opportunity to understand and quantify the quality of these measurements using carefully orchestrated comparison flights. The comparisons between the measurements from similar instruments on the two research aircraft can be used to identify potential measurement issues and quantify the uncertainty range of the field measurements, which include primary meteorological variables (Sect. 3.1), trace gas concentrations (Sect. 3.2), aerosol particle properties (number concentration, size distribution, chemical composition, and microphysical properties) (Sect. 3.3), cloud properties (Sect. 3.4), and downward solar irradiance (Sect. 3.5). We evaluate the consistency between the measurements aboard the two aircraft for a nearly full set of gas, aerosol particle, and cloud variables. Results from this comparison study provide the foundation not only for assessing and interpreting the observations from multiple platforms (from the ground to low altitude and then to high altitude) but also for providing high-quality data to improve the understanding of the accuracy of the measurements related to the effects of human activities in Manaus on local air quality, terrestrial ecosystems in the rainforest, and tropical weather.

Instruments
The ARM Aerial Facility deployed several in situ instruments on the G1 to measure atmospheric state parameters, trace gas concentrations, aerosol particle properties, and cloud characteristics (Schmid et al., 2014). The instruments installed on HALO covered measurements of meteorological, chemical, microphysical, and radiation parameters. Details of measurements aboard HALO are discussed in the ACRIDICON-CHUVA campaign overview paper (Wendisch et al., 2016). The measurements compared between the G1 and HALO are listed in Table 1. Details on maintenance and calibration of the involved instrumentation can be found in the Supplement (Tables S1 and S2).

Atmospheric parameters
All G1 and HALO meteorological sensors were routinely calibrated to maintain measurement accuracy. The G1 primary meteorological data were provided at a 1 s time resolution based on the standard developed by the Inter-Agency Working Group for Airborne Data and Telemetry Systems (Webster and Freudinger, 2018). For static temperature measurement, the uncertainty given by the manufacturer (Emerson) is ±0.1 K, and the uncertainty of the field data is ±0.5 K. The static pressure had a measurement uncertainty of 0.5 hPa. The standard measurement uncertainties were ±2 K for the chilled mirror hygrometer and 0.5 m s⁻¹ for wind speed.
On HALO, primary meteorological data were obtained from the Basic HALO Measurement and Sensor System (BAHAMAS) at a 1 s time resolution. The system acquired data from airflow and thermodynamic sensors and from the aircraft avionics and a high-precision inertial reference system to derive basic meteorological parameters such as pressure, temperature, the 3-D wind vector, aircraft position, and attitude. The water vapor mixing ratio and further derived humidity quantities were measured by the Sophisticated Hygrometer for Atmospheric Research (SHARC) based on direct absorption measurement by a tunable diode laser (TDL) system. The absolute accuracy of the primary meteorological data was 0.5 K for air temperature, 0.3 hPa for air pressure, 0.4-0.6 m s⁻¹ for wind, and 5 % (±1 ppm) for water vapor mixing ratio. All sensors were routinely calibrated and traceable to national standards (Giez et al., 2017; Krautstrunk and Giez, 2012).

Gas phase
Constrained by data availability, the comparison of trace gas measurements is focused on carbon monoxide (CO) and ozone (O3) concentrations. Those measurements were made aboard the G1 by a CO/N2O/H2O instrument (Los Gatos Integrated Cavity Output Spectroscopy instrument model 907-0015-0001) and an ozone analyzer (Thermo Scientific, model 49i), respectively. The G1 CO analyzer was calibrated for response daily by NIST-traceable commercial standards before the flight. Due to the difference between laboratory and field conditions, the uncertainty of the CO measurements is about ±5 % for 1 s sampling periods. An ultrafast carbon monoxide monitor (Aero Laser GmbH, AL5002) was deployed on HALO. The detection of CO is based on vacuum-ultraviolet fluorimetry, employing the excitation of CO at 150 nm; the precision is 2 ppb, and the accuracy is about 5 %. The ozone analyzer measures ozone concentration based on the absorbance of ultraviolet light at a wavelength of 254 nm. The ozone analyzer (Thermo Scientific, model 49c) in the HALO payload is very similar to the one on the G1 (model 49i), with an accuracy of 2 ppb or about ±5 % for 4 s sampling periods. The G1 ozone monitor was calibrated at the New York State Department of Environmental Conservation testing laboratory in Albany.

Table 1. List of compared measurements and corresponding instruments deployed aboard the G1 and HALO during GoAmazon2014/5. The acronyms are defined in a table at the end of this paper. Dp indicates the particle diameter. ΔDp refers to the size resolution.
Table columns: Measurement variables | Instruments deployed on the G1 (Schmid et al., 2014) | Instruments deployed on HALO (Wendisch et al., 2016)

Aerosol
Aerosol number concentration was measured by different condensation particle counters (CPCs) on the G1 (TSI, CPC 3010) and HALO (Grimm, CPC model 5.410). Although the two CPCs were from different manufacturers, they were designed using the same principle, which is to detect particles by condensing butanol vapor on the particles to grow them to a large enough size that they can be counted optically. Both CPCs were routinely calibrated in the lab and reported the data at a 1 s time resolution. The HALO CPC operated at 0.6-1 L min⁻¹, with a nominal cutoff of 4 nm. Due to inlet losses, the effective cutoff diameter increases to 9.2 nm at 1000 hPa and 11.2 nm at 500 hPa (Petzold et al., 2011). The G1 CPC operated at a volumetric flow rate of 1 L min⁻¹, and the nominal cutoff diameter D50 measured in the lab was ∼ 10 nm. During a flight, the cutoff diameter may vary due to tubing losses, which contributes less than 10 % uncertainty to the comparison between the two CPC concentrations. Two instruments deployed on the G1 measured the aerosol particle size distribution. The first, a Fast Integrated Mobility Spectrometer (FIMS) inside the G1 cabin, measured the aerosol mobility size from 15 to 400 nm (Kulkarni and Wang, 2006a, b; Olfert et al., 2008; Wang, 2009). The ambient aerosol particles were charged after entering the FIMS inlet and then separated into different trajectories in an electric field based on their electrical mobility. The spatially separated particles grow into supermicrometer droplets in a condenser where supersaturation of the working fluid is generated by cooling. At the exit of the condenser, a high-speed charge-coupled device camera captures the image of an illuminated grown droplet at high resolution. In this study, we used the FIMS 1 Hz data for comparison. The size distribution data from FIMS were smoothed. Aside from the FIMS, the airborne version of the Ultra High Sensitivity Aerosol Spectrometer (UHSAS) was deployed on both the G1 and HALO.
The G1 and HALO UHSASs were manufactured by the same company, and both were mounted under the wing on a pylon. The UHSAS is an optical-scattering, laser-based particle spectrometer system. The size resolution is around 5 % of the particle size. The G1 UHSAS typically covered a size range of 60 to 1000 nm. The HALO UHSAS covered a 90 to 500 nm size range for the 9 September flight.
Based on their operating principles, the FIMS measures the aerosol electrical mobility size, and the UHSAS measures the aerosol optical equivalent size. Thus, the difference in the averaged size distributions from those two types of instruments might be linked to differences in their underlying operating principles, such as the assumptions about the optical properties of aerosol particles. The data processing in the G1 UHSAS assumed that the particle refractive index is similar to that of ammonium sulfate (1.55), which is larger than the average refractive index (1.41-0.013i) from a previous Amazon study (Guyon et al., 2003). The HALO UHSAS was calibrated with polystyrene latex spheres, which have a refractive index of about 1.572 for the UHSAS wavelength of 1054 nm. The uncertainty due to the refractive index can lead to up to 10 % variation in the UHSAS measured size (Kupc et al., 2018). Also, the assumption of spherical particles affects the accuracy of UHSAS sizing of ambient aerosols.
The chemical composition of submicron non-refractory (NR-PM1) organic and inorganic (sulfate, nitrate, ammonium) aerosol particles was measured using a high-resolution time-of-flight aerosol mass spectrometer (HR-ToF-AMS) aboard the G1 (DeCarlo et al., 2006; Jayne et al., 2000; Shilling et al., 2013, 2018). Based on the standard deviation of observed aerosol mass loadings during filter measurements, the HR-ToF-AMS detection limits for the average time of 13 s are approximately 0.13, 0.01, 0.02, and 0.01 µg m⁻³ (3σ values) for organic, sulfate, nitrate, and ammonium, respectively (DeCarlo et al., 2006). A compact time-of-flight aerosol mass spectrometer (C-ToF-AMS) was operated aboard HALO to investigate the aerosol composition. Aerosol particles enter both the C-ToF-AMS and HR-ToF-AMS via constant pressure inlets controlling the volumetric flow into the instrument, although the designs of the inlets are somewhat different (Bahreini et al., 2008). The details about the C-ToF-AMS operation and data analysis are reported by Schulz et al. (2018). The overall accuracy has been reported as ∼ 30 % for both AMS instruments (Alfarra et al., 2004; Middlebrook et al., 2012). Data presented in this section were converted to the same conditions as the HALO AMS data, which are 995 hPa and 300 K. Both AMS instruments were calibrated before and after the field deployment and also once a week during the field campaign.
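The conversion of mass loadings to common reference conditions can be illustrated with a minimal sketch. It assumes ideal-gas scaling of the sampled air volume; the function and parameter names are hypothetical and are not taken from the campaign data processing.

```python
def to_reference_conditions(mass_ambient, p_ambient_hpa, t_ambient_k,
                            p_ref_hpa=995.0, t_ref_k=300.0):
    """Scale a mass concentration (e.g. in ug m^-3) from ambient pressure and
    temperature to reference conditions (defaults: 995 hPa and 300 K)."""
    # Ideal gas assumption: a fixed amount of air occupies a volume
    # proportional to T / P, so a concentration scales with P / T.
    return mass_ambient * (p_ref_hpa / p_ambient_hpa) * (t_ambient_k / t_ref_k)
```

Under this assumption, a loading measured at half the reference pressure and at the reference temperature doubles when expressed at 995 hPa and 300 K.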
The number concentration of cloud condensation nuclei (CCN) was measured aboard both aircraft using the same type of CCN counter from Droplet Measurement Technologies (DMT, model 200). This CCN counter contains two continuous-flow, thermal-gradient diffusion chambers for measuring aerosols that can be activated at constant supersaturation. The supersaturation is created by taking advantage of the different diffusion rates of water vapor and heat. After the supersaturated water vapor condenses on the CCN in the sample air, droplets are formed, counted, and sized by an optical particle counter (OPC). The sampling interval is 1 s for both deployed CCN counters. Both CCN counters were calibrated using ammonium sulfate aerosol particles in the diameter range of 20-200 nm. The uncertainty of the effective water vapor supersaturation was ±5 % (Rose et al., 2008).

Clouds
Aircraft-based measurements are an essential method for in situ sampling of cloud properties (Wendisch and Brenguier, 2013). Over the last 50-60 years, hot-wire probes have been the most commonly used devices to estimate liquid water content (LWC) in clouds from research aircraft. Since the 1970s, the most widely used technique for cloud droplet spectra measurements has been based on the light-scattering effect. This type of instrument provides the cloud droplet size distribution as the primary measurement. By integrating the cloud droplet size distribution, additional information such as LWC can be derived as a higher-order data product.
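Deriving LWC by integrating a droplet size distribution can be sketched as follows. This is a minimal illustration, assuming spherical liquid droplets, a water density of 1 g cm⁻³, and per-bin number concentrations assigned to a single representative diameter; the function name and input layout are hypothetical.

```python
import math

WATER_DENSITY_G_PER_M3 = 1.0e6  # liquid water, about 1 g cm^-3 (assumed)

def liquid_water_content(bin_diameters_um, bin_concentrations_cm3):
    """Return LWC in g m^-3 from per-bin droplet diameters (um) and
    number concentrations (cm^-3)."""
    lwc = 0.0
    for d_um, n_cm3 in zip(bin_diameters_um, bin_concentrations_cm3):
        d_m = d_um * 1.0e-6                        # um -> m
        n_m3 = n_cm3 * 1.0e6                       # cm^-3 -> m^-3
        droplet_volume = math.pi / 6.0 * d_m ** 3  # sphere volume in m^3
        lwc += WATER_DENSITY_G_PER_M3 * droplet_volume * n_m3
    return lwc

# 100 droplets per cm^3 with a diameter of 10 um give roughly 0.05 g m^-3.
example_lwc = liquid_water_content([10.0], [100.0])
```

Because the sum is weighted by the cube of the diameter, sizing errors in the largest bins dominate the uncertainty of the derived LWC.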
Three cloud probes from the G1 are discussed in this paper. The cloud droplet probe (CDP) is a compact, lightweight, forward-scattering cloud particle spectrometer that measures cloud droplets in the 2 to 50 µm size range (Faber et al., 2018). Using state-of-the-art electro-optics and electronics, Stratton Park Engineering (SPEC Inc.) developed a fast cloud droplet probe (FCDP), which also uses forward scattering to determine cloud droplet distributions and concentrations in the same range as the CDP with up to a 100 Hz sampling rate. The G1 also carried a two-dimensional stereo probe (2DS, SPEC Inc.), which has two 128-photodiode linear arrays working independently. The 2DS electronics produce shadowgraph images with 10 µm pixel resolution. Two orthogonal laser beams cross in the middle of the sample volume, with a sample cross section of 0.8 cm² for each optical path. The manufacturer claims the maximum detection size is up to 3000 µm for the 2DS. However, because of counting statistics limitations, the data used in this study are from 10 to 1000 µm only (Lawson et al., 2006). The 2DS was upgraded with modified probe tips, and an arrival time algorithm was applied to the 2DS data processing. Both efforts effectively reduced the number of small shattered particles (Lawson, 2011). For the G1 cloud probes, the laboratory calibrations of the sample area and droplet sizing were performed before the field deployment. During the deployment, weekly calibrations with glass beads were performed with a size variation of less than 5 %, which was consistent with the pre-campaign and post-campaign calibrations. The LWC derived from cloud droplet spectra was compared with the hot-wire LWC measurement to estimate and eliminate the coincidence errors in cloud droplet concentration measurements (Lance et al., 2010; Wendisch et al., 1996).
Aboard HALO, two cloud probes were operated and are discussed in this paper, each consisting of a combination of two instruments: a cloud combination probe (CCP) and a cloud aerosol precipitation spectrometer (CAPS, denoted as NIXE-CAPS; NIXE: Novel Ice Experiment). The CCP is a combination of a CDP (denoted as CCP-CDP) with a CIPgs (cloud imaging probe with greyscale, DMT, denoted as CCP-CIPgs). NIXE-CAPS consists of a CAS-Dpol (cloud and aerosol spectrometer, DMT, denoted as NIXE-CAS) and a CIPgs (denoted as NIXE-CIPgs). The CIPgs is an optical array probe comparable to the 2DS operated on the G1. The CIPgs obtains images of cloud elements using a 64-element photodiode array (15 µm resolution) to generate two-dimensional images with a nominal detection diameter size range from 15 to 960 µm (Klingebiel et al., 2015; Molleker et al., 2014). The CCP-CDP detects the forward-scattered laser light by cloud particles in the size range of 2.5 to 46 µm. The sample area of the CCP-CDP was determined to be 0.27 ± 0.025 mm² with an uncertainty of less than 10 % (Klingebiel et al., 2015). The CAS-Dpol (or NIXE-CAS) is a light-scattering probe comparable to the CDP but covers the size range of 0.6 to 50 µm in diameter, thus including the upper size range of the aerosol particle size spectrum (Luebke et al., 2016). Furthermore, the CAS-Dpol measures the polarization state of the particles (Costa et al., 2017). Similar to the G1 CDP, the performance of the CCP-CDP and NIXE-CAS was frequently examined by glass bead calibrations. Prior to or after each HALO flight, CCP-CIPgs and NIXE-CIPgs calibrations were performed by using a mainly transparent spinning disk that carries opaque spots of different but known sizes. The particle concentrations measured by the CCP aboard HALO were corrected to ambient conditions using a thermodynamic approach developed by Weigel et al. (2016). For NIXE-CAPS, the size distributions were provided with the NIXE-CAS merged with the NIXE-CIPgs at 20 µm.

Solar radiation
The G1 radiation suite included shortwave (SW, 400-2700 nm) broadband total upward and downward irradiance measurements using Delta-T Devices model SPN1 radiometers. The radiation data were corrected for aircraft tilt from the horizontal reference plane. A methodology has been developed (Long et al., 2010) for using measurements of total and diffuse shortwave irradiance and corresponding aircraft navigation data (latitude, longitude, pitch, roll, heading) to calculate and apply a correction for platform tilt to the broadband hemispheric downward SW measurements. Additionally, any angular offset between the actual orientation of each radiometer's detector and the level reference given by the navigation data was determined to make the tilt correction as accurate as possible.
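The idea behind a tilt correction of this kind can be sketched with generic plane-of-array geometry. The sketch below is not the algorithm of Long et al. (2010). It assumes that only the direct component needs rescaling, that the diffuse irradiance is unaffected by small tilts, and that the detector tilt angle and the azimuth toward which the detector normal leans have already been derived from pitch, roll, and heading. All names are hypothetical.

```python
import math

def tilt_corrected_downward_sw(total_tilted, diffuse, sza_deg, sun_azimuth_deg,
                               tilt_deg, tilt_azimuth_deg):
    """Rescale the direct part of a tilted-detector irradiance (W m^-2)
    to a horizontal plane and add the diffuse part back."""
    sza = math.radians(sza_deg)
    tilt = math.radians(tilt_deg)
    d_az = math.radians(sun_azimuth_deg - tilt_azimuth_deg)
    # Cosine of the angle between the sun and the tilted detector normal.
    cos_incidence = (math.cos(tilt) * math.cos(sza)
                     + math.sin(tilt) * math.sin(sza) * math.cos(d_az))
    direct_tilted = total_tilted - diffuse
    # Direct irradiance on a plane scales with the cosine of the incidence angle.
    direct_horizontal = direct_tilted * math.cos(sza) / cos_incidence
    return diffuse + direct_horizontal
```

With zero tilt the function returns the input unchanged. A detector leaning toward the sun receives too much direct light, so the corrected value is lower than the measured one. The correction becomes unstable when the sun is near the detector horizon, where the incidence cosine approaches zero.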
The Spectral Modular Airborne Radiation measurement sysTem (SMART-Albedometer) was installed aboard HALO. Depending on the scientific objective and the configuration, the optical inlets determining the measured radiative quantities can be chosen. The SMART-Albedometer has been utilized to measure the spectral upward and downward irradiances, hence the name albedometer, and also to measure the spectral upward radiance. The SMART-Albedometer was initially designed to cover measurements in the solar spectral range between 300 and 2200 nm (Wendisch et al., 2001, 2016). However, due to the decreasing sensitivity of the spectrometer at large wavelengths, the use of the wavelengths was restricted to 300-1800 nm. The spectral resolution is defined by the full width at half maximum (FWHM), which is between 2 and 10 nm. In this case, the instruments were mounted on an active horizontal stabilization system that keeps the optical inlets horizontal during aircraft movements (up to ±6° from the horizontal plane).

Flight patterns
During the dry season IOP (1 September-10 October 2014), two types of coordinated flights were carried out: one flight under cloud-free conditions (9 September) and two flights with clouds present (21 September and 1 October). In this study, we compare the measurements for both coordinated flight patterns. The discussion is mainly focused on the flight under cloud-free conditions on 9 September and the flight with clouds present on 21 September, as shown in Fig. 1. The other coordinated flight on 1 October is included in the Supplement (Sect. S1, Figs. S1, S2, S7, and S8).
For the cloud-free coordinated flight on 9 September, the G1 took off first and orbited near the planned rendezvous point until HALO came into sight. It then coordinated with HALO and performed a wing-to-wing maneuver along straight legs around 500 m above sea level, as shown in Fig. 2. The normal G1 average sampling speed is 100 m s⁻¹, and the normal HALO average sampling speed is 200 m s⁻¹. During the coordinated flight on 9 September, both aircraft adjusted their normal sampling speed by about 50 m s⁻¹ so that they could fly side by side.
For the second type of coordinated flights, the G1 and HALO flew the stacked pattern at their own typical airspeed. On 21 September, the G1 also took off from the airport first, followed by HALO 15 min later. Then, both aircraft flew above the T3 ground site and subsequently flew several flight legs stacked at different altitudes. The two aircraft were vertically separated by about 330 m and sampled below, inside, and above clouds. Due to the different aircraft speeds, the time difference between two aircraft visiting the same part of the flight paths varied, increasing up to 1 h at the end of the flight path, as shown in Fig. 3. On 1 October, the G1 focused on the cloud microphysical properties and contrasting polluted versus clean clouds. HALO devoted the flight to the cloud vertical evolution and life cycle and also probed the cloud processing of aerosol particles and trace gases. The G1 and HALO coordinated two flight legs between 950 and 1250 m above the T3 site under cloud-free conditions. Following that, HALO flew to the south of Amazonia, and the G1 continued sampling plume-influenced clouds above the T3 site and then flew above the Rio Negro area.
In this study, to perform a meaningful comparison of in situ measurements, all the data from instruments were time synchronized with the aircraft (G1 or HALO) navigation system. For AMS and CPC data, the time shift due to tubing length and instrument flow was corrected. For the coordinated flight on 9 September, the data compared were from the same type of measurements with the same sampling rate. For measurements with different sampling rates, the data were binned to the same time interval for comparison. For the flights with clouds present (21 September and 1 October), the following criteria were used: (1) the data collected by the two aircraft must be less than 30 min apart from each other; (2) the comparison data were binned to 200 m altitude intervals; (3) the cloud flag was applied to the aerosol measurements, and the data affected by cloud shattering were eliminated from the comparisons of aerosol measurements. Moreover, additional comparison criteria are specified for individual measurements in the following section. Table 2 shows the total number of points used for the comparison.
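The three criteria for the cloudy flights can be sketched as a small pairing routine. This is a minimal illustration and not the processing code used in the study. It assumes that each sample is a hypothetical tuple `(time_s, altitude_m, value, in_cloud)` and that the 30 min criterion is evaluated on the mean sampling time of each altitude bin, which is one possible reading of criterion (1).

```python
from collections import defaultdict

BIN_WIDTH_M = 200.0       # criterion (2): 200 m altitude bins
MAX_TIME_GAP_S = 30 * 60  # criterion (1): less than 30 min apart

def bin_by_altitude(samples):
    """Return {bin_index: (mean_time_s, mean_value)} from cloud-free samples."""
    groups = defaultdict(list)
    for time_s, altitude_m, value, in_cloud in samples:
        if in_cloud:  # criterion (3): drop cloud-affected aerosol data
            continue
        groups[int(altitude_m // BIN_WIDTH_M)].append((time_s, value))
    binned = {}
    for idx, rows in groups.items():
        n = len(rows)
        binned[idx] = (sum(t for t, _ in rows) / n, sum(v for _, v in rows) / n)
    return binned

def paired_bins(samples_a, samples_b):
    """Return [(bin_base_altitude_m, mean_a, mean_b)] for bins sampled by both
    aircraft within the allowed time gap."""
    a, b = bin_by_altitude(samples_a), bin_by_altitude(samples_b)
    pairs = []
    for idx in sorted(set(a) & set(b)):
        (t_a, v_a), (t_b, v_b) = a[idx], b[idx]
        if abs(t_a - t_b) < MAX_TIME_GAP_S:
            pairs.append((idx * BIN_WIDTH_M, v_a, v_b))
    return pairs
```

The cloud filter is only appropriate for aerosol measurements; for state parameters or trace gases the `in_cloud` field would simply be set to `False`.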

Comparison of the G1 and HALO measurements of atmospheric state parameters
The atmospheric state parameters comprise the primary variables observed by the research aircraft. The measurements provide essential meteorological information not only for understanding the atmospheric conditions but also for providing the sampling conditions for other measurements, such as those of aerosol particles, trace gases, and cloud microphysical properties. For cloud-free coordinated flights, the comparison focused on the nearly side-by-side flight leg at around 500 m, as shown in Fig. 2. Table 3 shows the basic statistics of the data for primary atmospheric state parameters, assuming that two measurements from the G1 and HALO have a proportional relationship without any offset (Y = m0 × X). In general, the atmospheric state parameters observed from both aircraft were in excellent agreement. The linear regression achieved a slope that was near 1 for four individual measurements. The regression is evaluated using the equation below:
$$R^2 = 1 - \frac{SS_{\mathrm{regression}}}{SS_{\mathrm{Total}}},$$
where the sum squared regression error is calculated by $SS_{\mathrm{regression}} = \sum_i (y_i - y_{\mathrm{regression}})^2$, and the sum squared total error is calculated by $SS_{\mathrm{Total}} = \sum_i (y_i - \bar{y})^2$; $y_i$ is the individual data point, $\bar{y}$ is the mean value, and $y_{\mathrm{regression}}$ is the regression value. When the majority of the data points are in a narrow value range, the mean describes the data better than the regression line, and the R² will be negative ("neg." in Table 3). The difference between the average ambient temperatures on the two aircraft was 0.5 K, and the difference between the average dew-point temperatures was about 1 K. For temperature and humidity, the G1 data were slightly higher than the HALO data. The main contributions to the observed differences include the error propagation in the derivation of the ambient temperature from the measured temperature, instrumental-measurement uncertainty, and the temporal and spatial variability.
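A negative R² can arise from a proportional fit, as the following minimal sketch shows. It assumes an ordinary least-squares fit forced through the origin, with R² computed against the mean of Y as described above; the function name and the sample values are hypothetical.

```python
def proportional_fit(x, y):
    """Least-squares slope m0 for y = m0 * x, and R^2 = 1 - SS_regression / SS_total."""
    m0 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    y_mean = sum(y) / len(y)
    ss_regression = sum((yi - m0 * xi) ** 2 for xi, yi in zip(x, y))
    ss_total = sum((yi - y_mean) ** 2 for yi in y)  # zero if y is constant
    return m0, 1.0 - ss_regression / ss_total

# Hypothetical pressure-like values confined to a narrow range: the slope is
# close to 1, yet the fitted line explains the scatter worse than the mean does.
m0, r2 = proportional_fit([1000.0, 1001.0, 1002.0], [1002.0, 1001.0, 1000.0])
```

In this example the slope differs from 1 by less than 0.001 while R² is strongly negative, so a slope near 1 and a negative R² are not contradictory for data in a narrow value range.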
The average horizontal wind speed measured by HALO is 0.4 m s⁻¹ higher than the average horizontal wind speed measured by the G1. The uncertainty in the wind estimation is mainly due to the error propagation from the indicated aircraft speed measurement and the aircraft ground speed estimation from GPS. The static pressure distribution measured aboard HALO showed a smaller standard deviation (0.9 hPa) compared to the value of the G1 (1.5 hPa). Part of the reason for this difference is a more substantial variation of the G1 altitude during level flight legs when the G1 flew around 50 m s⁻¹ faster than its normal airspeed. Thus, any biases caused by their nearly side-by-side airspeeds being different from their typical airspeeds would be undetected during these coordinated flights.
For the coordinated flights under cloudy conditions, we used the criteria from Sect. 2 to compare ambient conditions measured by the G1 and HALO aircraft. In addition to the ordinary linear regression, we also used the orthogonal regression, which minimizes the perpendicular distances from the data points to the fitted line. The ordinary linear regression assumes that only the response variable (Y) contains measurement error and that the predictor (X) does not, which cannot be known in advance when comparing the measurements from the G1 and HALO. Thus, the additional orthogonal regression examines the assumption in the least squares regression and makes sure the roles of the variables have little influence on the results. In Table 4, two equations were used for the orthogonal regression. One assumes that two measurements have a proportional relationship (Y = m1 × X). The other one assumes a linear relationship, which can be described with the slope-intercept equation Y = m × X + b. The two regression results in Table 4 do not show a significant difference. The regression using the slope-intercept equation shows a different level of improvement in each individual measurement and will be discussed in the corresponding sections.
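Both forms of orthogonal regression can be sketched with the closed-form total least-squares slope. This is a minimal illustration that assumes equal error variance in X and Y, which is the standard orthogonal-regression assumption but is not stated in the text; the function name is hypothetical.

```python
import math

def orthogonal_fit(x, y, through_origin=False):
    """Return (slope, intercept) of the line minimizing the summed squared
    perpendicular distances. With through_origin=True the line is forced
    through (0, 0), i.e. Y = m1 * X."""
    n = len(x)
    x0, y0 = (0.0, 0.0) if through_origin else (sum(x) / n, sum(y) / n)
    s_xx = sum((xi - x0) ** 2 for xi in x)
    s_yy = sum((yi - y0) ** 2 for yi in y)
    s_xy = sum((xi - x0) * (yi - y0) for xi, yi in zip(x, y))
    # Closed-form total least-squares slope; undefined when s_xy == 0.
    slope = (s_yy - s_xx + math.sqrt((s_yy - s_xx) ** 2 + 4.0 * s_xy ** 2)) / (2.0 * s_xy)
    return slope, y0 - slope * x0
```

Unlike ordinary least squares, swapping the roles of the two aircraft simply inverts the slope (`orthogonal_fit(y, x)` returns the reciprocal), which is why the result does not depend on which platform is treated as the predictor.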
As shown in Fig. 4, the linear regression slopes for ambient temperature (Fig. 4a), pressure (Fig. 4b), and dew-point temperature were also close to 1 between the G1 and HALO measurements during the 21 September coordinated flight. The R² values are also close to 1. These results suggest that the G1 and HALO measurements achieved excellent agreement. Note that the G1 dew-point temperature data between 2200 and 2700 m and above 3700 m (Fig. 4c) were erroneous and were removed from the comparison because the G1 sensor was skewed by wetting in the cloud. The HALO dew-point temperature was calculated from the total water mixing ratio measured by TDL, and that measurement in the cloud was more accurate than the measurement made by the chilled mirror hygrometer aboard the G1.
The lower value of the R 2 value in horizontal wind speed means the ratio of the regression error and total error in wind measurement is much higher than the temperature and pressure measurements. The main contributions to this difference are the error propagation during the horizontal wind speed estimation and the temporal and spatial variance between two aircraft sampling locations. We observed differences between the two aircraft data of up to 2 m s −1 , caused by the increasing sampling distance as the two aircraft were climbing up. For example, the G1 flew a level leg above T3 around 2500 m between 16:20 and 16:30 UTC, while HALO stayed around 2500 m for a short period and kept climbing to a higher altitude. Due to strong vertical motion, turbulence, and different saturations (evaporationcondensation processes), the variances in the horizontal wind speed (Fig. 4d) were also more significant compared to the variances of temperature and pressure measurements.
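The interpretation of R² given above can be written out explicitly. The following is a minimal sketch that assumes the usual definition of the coefficient of determination, one minus the ratio of the residual sum of squares to the total sum of squares; the function name `r_squared` is hypothetical.

```python
import numpy as np

def r_squared(y_obs, y_fit):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_fit = np.asarray(y_fit, dtype=float)
    ss_res = np.sum((y_obs - y_fit) ** 2)          # regression (residual) error
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)   # total variation of the data
    return 1.0 - ss_res / ss_tot
```

A larger residual error relative to the total variation, as for the wind speed, therefore lowers R² even when the fitted slope stays close to 1.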

Comparison of trace gas measurements
For the cloud-free coordinated flight on 9 September, ozone is the only trace gas measurement available on both aircraft. The linear regression slope shows that the HALO ozone concentration was about 8 % higher than the G1 concentration. The difference between the averaged ozone concentrations was 4.1 ppb. As mentioned in Sect. 2.1.2, each instrument has a 2 ppb accuracy (or 5 %) on the ground, based on a direct photometric measurement of the ratio between a sample and an ozone-free cell. The in-flight calibration suggested that the uncertainty of each instrument could increase to 5 %-7 % (or 2-3.5 ppb). Thus, the difference between the averaged ozone concentrations (4.1 ppb) is within the instrument variation. The primary source of bias is probably the different ozone loss in the sampling and transfer lines.
The comparison made on the 21 September flight in Fig. 5 shows good agreement for the vertically averaged ozone measurements. Compared with the statistics from 9 September, the ozone measurement appears insensitive to the temporal and spatial changes. Although no CO comparison data are available for 9 September, the G1 and HALO CO comparison on 21 September shows a higher correlation than the ozone comparison at different altitudes. Note that the data points with more substantial variance in CO concentration were excluded because the G1 and HALO were sampling different air masses between 2000 and 3000 m, as indicated in Figs. S3-S5. The CO plot in Fig. 5b shows the real atmospheric variability. Around 4000 m, the CO readings from the G1 and HALO have the minimum variation and average around 85 ppb, which is the atmospheric background level. At lower altitudes and higher CO concentrations, the local contribution is not well mixed, and the inhomogeneity appears as the more substantial variations observed in the plot.

Comparison of aerosol measurements
Aerosol particles exhibited substantial spatial variations, both vertically and horizontally, due to many aerosol sources and complex atmospheric processes in the Amazon basin, especially with the local anthropogenic sources in Manaus. Thus, spatially resolved measurements are critical to characterizing the properties of the Amazonian aerosols. The cloud-free coordinated flights allow us to compare the G1 and HALO aerosol measurements and thus will facilitate further studies that utilize the airborne measurements. The vertical profiles obtained using the G1 and HALO in different aerosol regimes of the Amazon basin have contributed to many studies (Fan et al., 2018; Martin et al., 2017; Wang et al., 2016).
The design and performance of the aircraft inlets can strongly influence measured aerosol particle number concentration, size distribution, and chemical composition (Wendisch et al., 2004). Therefore, they need to be taken into consideration when comparing the measurements aboard two aircraft. The G1 aerosol inlet is a fully automated isokinetic inlet. Manufacturer wind tunnel tests and earlier studies show that this inlet operates for aerosol particles with a diameter up to 5 µm, with transmission efficiency around 50 % at 1.5 µm (Dolgos and Martins, 2014; Kleinman et al., 2007; Zaveri et al., 2010). The HALO submicrometer aerosol inlet (HASI) was explicitly designed for HALO. Based on the numerical flow modeling, optical particle counter measurements, and field study evaluation, HASI has a cutoff size of 3 µm, with transmission efficiency larger than 90 % at 1 µm (Minikin et al., 2017).

Aerosol particle number concentration
For the cloud-free coordinated flight on 9 September, the linear regression of CPC and UHSAS between the G1 and HALO measurements is also included in Table 3. The total number concentration measured by the HALO CPC was about 20 % lower than that measured by the G1 CPC, as shown in Fig. 6a. The CPC measurement is critically influenced by the isokinetic inlet operation and performance. During the flights, the aircraft attitude, such as the pitch and roll angles, can force the isokinetic sampling to occur under non-axial conditions. The non-axial flow at the probe inlet may result in flow separation, turbulence, and particle deposition. Therefore, quantitative particle measurements carry more substantial uncertainty. As shown in Fig. 6b, we compared the CPC data by applying three different data quality criteria. The first criterion is the one described in the previous section, which ensures that all compared measurements were made less than 30 min apart; the corresponding linear regression is included in Table 3. The second criterion restricts the data to isokinetic and isoaxial conditions; Fig. 6b shows that this criterion reduced the spread of the scattered data but did not significantly change the linear regression. The third criterion further constrains the data by averaging. Based on the average wind speed and the distance between the two aircraft, we averaged the data into 10 s intervals and found that the regression R² increased to 0.9392. The typical uncertainty between two CPCs is 5 %-10 % in a well-controlled environment (Gunthe et al., 2009; Liu and Pui, 1974). Although both CPCs from the G1 and HALO were characterized in the lab to be within 10 % of their respective lab standards, we observed a 20 % difference during the flight.
This result suggests that the challenging conditions of airborne measurement can significantly increase the systematic uncertainties of CPC measurements, through effects such as systematic instrument drifts, different aerosol particle losses inside the two CPCs, and different inlet transmission efficiencies in the two aircraft. The CPC data in Fig. 6 are color coded by UTC time. The general trend is that the aerosol number concentration increased with time through the Manaus plume between 15:30 and 15:40 UTC. A similar trend was observed in the aerosol particle number concentration (Fig. 7) measured by the UHSAS-Airborne version (referred to as UHSAS). The total number concentration data given by UHSAS (Fig. 7) are integrated over the overlapping size range (90-500 nm for the 9 September flight) for both the G1 and HALO UHSAS. The linear regression shows that the total aerosol particle number concentration from the HALO UHSAS is about 16.5 % higher than that from the G1 UHSAS. The discrepancy between the two UHSAS measurements is mainly due to the error propagation in the sampling flow, the differential pressure transducer reading, the instrument stability, and the calibration repeatability, consistent with another UHSAS study (Kupc et al., 2018). In the airborne version of the UHSAS, mechanical vibrations have a more significant impact on the pressure transducer reading than in the bench version of the UHSAS.
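The integration of the UHSAS size distribution over the overlapping size range can be sketched as follows. This is an illustration only: the function name `integrated_number` and the bin-edge inputs are hypothetical, the size distribution is assumed to be given as dN/dlogDp per bin, and only bins lying entirely inside the range are counted.

```python
import numpy as np

def integrated_number(d_lower_nm, d_upper_nm, dn_dlogdp,
                      d_min_nm=90.0, d_max_nm=500.0):
    """Number concentration (cm^-3) summed over bins inside [d_min, d_max].

    d_lower_nm, d_upper_nm: lower and upper diameter edge of each size bin (nm)
    dn_dlogdp: dN/dlog10(Dp) in each bin (cm^-3)
    """
    d_lower = np.asarray(d_lower_nm, dtype=float)
    d_upper = np.asarray(d_upper_nm, dtype=float)
    dn = np.asarray(dn_dlogdp, dtype=float)
    # Keep only bins that lie fully within the common size range.
    inside = (d_lower >= d_min_nm) & (d_upper <= d_max_nm)
    # Convert dN/dlogDp to number per bin using the logarithmic bin width.
    dlogdp = np.log10(d_upper / d_lower)
    return float(np.sum(dn[inside] * dlogdp[inside]))
```

Applying the same `d_min_nm` and `d_max_nm` to both instruments is what makes the two totals comparable, since each UHSAS may report a different native size range.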
Figure 7. The G1 and HALO comparison for aerosol number concentration measured by UHSAS (90-500 nm) on 9 September.

For the coordinated flight on 21 September, the G1 and HALO data are averaged to 200 m vertical altitude intervals (Fig. 8). The data points with an altitude between 2000 and 3000 m were excluded from the comparisons because the G1 and HALO sampled different air masses, as evidenced by the trace gas and aerosol chemical composition data (detailed in Sect. 3.2 and 3.3.3). The UHSAS size range was integrated from 100 to 700 nm on 21 September. The size range differs from that used on 9 September because the overlap of the size distributions from the two UHSAS instruments had changed. Both the CPC and UHSAS measurement comparisons show stronger variation at low altitude, especially below 2000 m. Above 3500 m, the variations in the CPC- and UHSAS-measured concentrations became significantly smaller than the variation at lower altitude. This result is consistent with the observation from the trace gas measurement and confirms that the variability of aerosol properties changes significantly with time and space. The discrepancy observed in the UHSAS measurement comparison is noticeably larger than that in the CPC comparison. That is because the aerosol flow control inside the UHSAS cannot respond quickly enough to rapid changes in altitude, which caused significant uncertainty in the data.
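The altitude averaging and the exclusion of the 2000-3000 m layer can be sketched as below. This is a hedged illustration rather than the processing used for Fig. 8: the function name `bin_by_altitude` is hypothetical, and the use of a simple mean per bin with bin edges at multiples of the bin width is an assumption.

```python
import numpy as np

def bin_by_altitude(alt_m, values, bin_width_m=200.0, exclude=(2000.0, 3000.0)):
    """Average values into fixed-width altitude bins.

    Samples with exclude[0] <= altitude < exclude[1] are dropped first.
    Returns a dict mapping bin-center altitude (m) to the bin mean.
    """
    alt = np.asarray(alt_m, dtype=float)
    val = np.asarray(values, dtype=float)
    keep = np.isfinite(val) & np.isfinite(alt)
    if exclude is not None:
        keep &= ~((alt >= exclude[0]) & (alt < exclude[1]))
    alt, val = alt[keep], val[keep]
    # Integer index of the altitude bin each sample falls into.
    idx = np.floor(alt / bin_width_m).astype(int)
    profile = {}
    for i in np.unique(idx):
        center = (i + 0.5) * bin_width_m
        profile[float(center)] = float(val[idx == i].mean())
    return profile
```

Running the same function on the G1 and HALO time series yields two profiles on a common altitude grid, which can then be compared bin by bin.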

Aerosol particle size distribution
For the cloud-free coordinated flight on 9 September, the averaged aerosol size distributions measured by FIMS, G1 UHSAS, and HALO UHSAS during one flight leg are compared in Fig. 9. For particle diameter below 90 nm, the G1 UHSAS overestimated the particle concentration, which is due to the uncertainty in counting efficiency correction. The UHSAS detection efficiency is close to 100 % for particles larger than 100 nm and concentrations below 3000 cm⁻³ but decreases considerably for both smaller particles and higher concentrations (Cai et al., 2008). The aerosol counting efficiency correction developed under the lab conditions does not necessarily apply under the conditions during the flight. Between 90 and 250 nm, FIMS agreed well with the G1 UHSAS, whereas HALO UHSAS was about 30 % higher than the two instruments. For the size range of 250-500 nm, FIMS had good agreement with HALO UHSAS and was about 30 %-50 % higher than the G1 UHSAS depending on the particle size. Because the UHSAS has a simplified "passive" inlet, the large size aerosol particle loss in the UHSAS inlet was expected to increase with increasing aircraft speed. Thus, the lower G1 UHSAS concentrations at a larger aerosol particle size are likely related to the particle loss correction.
For the 21 September flight, the vertical profiles of aerosol size distributions are averaged into 100 m altitude intervals (Fig. 10). Overall, all size distribution measurements captured the mode near 100 nm between 800 and 1000 m at the top of the convective boundary layer, as indicated by the potential temperature (Fig. 10d), which starts from a maximum near the ground and then becomes remarkably uniform across the convective boundary layer. The peak of the aerosol size distribution shifted from 100 to 150 nm with increasing altitude. Note that due to data availability, the aerosol size distribution data from the HALO UHSAS have a reduced vertical resolution. Figure 11 shows the vertical profiles of the aerosol mass concentrations measured by the two AMSs on 21 September. The upper panel shows the medians and interquartile ranges of the different species (organics, nitrate, sulfate, ammonium) and the total mass concentration for the G1 (circles) and HALO (triangles). The lower panel shows the difference between the medians of G1 and HALO. The error bars were calculated using error propagation from the error of the median (interquartile range divided by 2 × √N). The data were grouped into 400 m altitude bins. The total mass concentration is the highest in the lower altitudes between 100 and 2000 m with a median value of 5 µg m⁻³ (G1-AMS). At altitudes between 2000 and 3800 m, the aerosol mass concentration decreased to a median value of 1.2 µg m⁻³ (G1-AMS).
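The error bars on the median differences can be reproduced in outline as follows. The error of the median, interquartile range divided by 2 × √N, is taken from the text; combining the two errors in quadrature for the G1 minus HALO difference is an assumption about what "error propagation" means here, and both function names are hypothetical.

```python
import numpy as np

def median_and_error(samples):
    """Median of one altitude bin and its error, IQR / (2 * sqrt(N))."""
    s = np.asarray(samples, dtype=float)
    s = s[np.isfinite(s)]
    q25, q50, q75 = np.percentile(s, [25, 50, 75])
    err = (q75 - q25) / (2.0 * np.sqrt(s.size))
    return float(q50), float(err)

def median_difference(g1_samples, halo_samples):
    """G1 minus HALO median difference with a propagated error.

    Assumes independent errors, so the two are added in quadrature.
    """
    m_g1, e_g1 = median_and_error(g1_samples)
    m_halo, e_halo = median_and_error(halo_samples)
    return m_g1 - m_halo, float(np.hypot(e_g1, e_halo))
```

In practice each call would receive the AMS mass concentrations that fall inside one 400 m altitude bin.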

Aerosol particle chemical composition
The most significant difference was observed at altitudes below 1800 m. The aerosol mass concentration measured by HALO-AMS is less than that measured by G1 AMS, likely due to particle losses in the constant pressure inlet (CPI) used on the HALO AMS. Between 1800 and 3000 m, the mass concentrations measured by the HALO AMS exceed those measured by the G1-AMS. This is most likely because the G1 was sampling different air masses from the HALO, as indicated by the differences in CO mixing ratios and the CPC concentrations for this altitude region (see Figs. 5 and 8). Above 3000 m altitude, both instruments agree within the uncertainty range.
Among individual species, the largest difference above 2000 m is observed for ammonium. The deployed G1 AMS is a high-resolution mass spectrometer (HR-ToF), whereas the HALO AMS has a lower resolution (C-ToF). The higher resolution of the G1-AMS allows for a better separation of interfering ions at m/z 15, 16, and 17 (NH⁺, NH₂⁺, and NH₃⁺) and thereby a more reliable calculation of the ammonium mass concentration.
Overall, the aerosol chemical composition is dominated by organics, as is evident from the vertical profiles of the relative fractions (Fig. 12). Both AMSs show a dominant contribution of organics to the total mass concentration with values around 70 %. This contribution is constant at altitudes between 100 and 3500 m and decreases to 50 % at 3800 m altitude. The inorganic fraction has the highest contribution from sulfate (20 %), followed by ammonium (7 %) and nitrate (2 %-4 %). For organics, ammonium, and sulfate, both instruments give similar relative fractions; only for nitrate is a discrepancy observed, between 1000 and 3000 m. Although the absolute aerosol mass concentration measured by the HALO-AMS was affected by the constant pressure inlet below 1800 m altitude, the relative fractions of both instruments generally agree well. Similar results were found for a second comparison flight on 1 October 2014 (see Sect. S2 and Figs. S7, S8).

CCN number concentration
CCN number concentration measurements provide key information about aerosols' ability to form cloud droplets and thereby modify the microphysical properties of clouds. Numerous laboratory and field studies have improved the understanding of the connections among aerosol particle size, chemical composition, mixing states, and CCN activation properties (Bhattu and Tripathi, 2015; Broekhuizen et al., 2006; Chang et al., 2010; Duplissy et al., 2008; Lambe et al., 2011; Mei et al., 2013a, b; Pöhlker et al., 2016; Thalman et al., 2017). In addition, based on the simplified chemical composition and internal mixing state assumption, various CCN closure studies have achieved success within ±20 % uncertainty for ambient aerosols (Broekhuizen et al., 2006; Mei et al., 2013b; Rissler et al., 2004; Wang et al., 2008).
According to earlier studies (Gunthe et al., 2009; Pöhlker et al., 2016; Roberts et al., 2001, 2002; Thalman et al., 2017), the hygroscopicity (κ_CCN) of CCN in the Amazon basin is usually dominated by organic components (κ_Org). Long-term ground-based measurements at the Amazon Tall Tower Observatory also suggest low temporal variability and a lack of pronounced diurnal cycles in hygroscopicity only under natural rainforest background conditions (Pöhlker et al., 2016). Using FIMS and CCN data from both the G1 and HALO collected during the coordinated flight leg on 9 September, the critical activation dry diameter (D50) was determined by integrating the FIMS size distribution to match the CCN total number concentration (Sect. S3). Then, the effective particle hygroscopicity was derived from D50 and the supersaturation at which the CCN counter was operated, using the κ-Köhler theory. The histograms of the estimated hygroscopicity (κ_est) from both aircraft were compared for the flight leg above T3. The κ_est values derived from the G1 and HALO measurements during the flight leg above the T3 site are 0.19 ± 0.07 and 0.19 ± 0.08, respectively. Those values agree very well with the overall mean value of 0.17 ± 0.06 derived from long-term measurements at the Amazon Tall Tower Observatory (Pöhlker et al., 2016; Thalman et al., 2017).
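The two steps described above, finding D50 from the size distribution and converting it to κ, can be sketched as follows. This is an illustration under stated assumptions, not the procedure of Sect. S3: the function names are hypothetical, the size distribution is integrated from the largest bin downward with log-linear interpolation inside the last bin, and κ is obtained from the widely used analytical approximation of κ-Köhler theory, κ = 4A³ / (27 D50³ ln²S), with the surface tension and density of pure water at an assumed temperature of 298.15 K.

```python
import numpy as np

def critical_diameter(d_lower_nm, d_upper_nm, dn_dlogdp, n_ccn):
    """D50 (nm): diameter above which the integrated number equals n_ccn.

    Bins must be ordered from small to large diameter.
    Returns NaN if n_ccn exceeds the total particle number.
    """
    d_lower = np.asarray(d_lower_nm, dtype=float)
    d_upper = np.asarray(d_upper_nm, dtype=float)
    n_bin = np.asarray(dn_dlogdp, dtype=float) * np.log10(d_upper / d_lower)
    remaining = float(n_ccn)
    for i in range(len(n_bin) - 1, -1, -1):  # integrate from the largest bin down
        if n_bin[i] >= remaining:
            # Fraction of this bin (in log-diameter) needed to reach n_ccn.
            frac = remaining / n_bin[i] if n_bin[i] > 0 else 0.0
            log_d = np.log10(d_upper[i]) - frac * np.log10(d_upper[i] / d_lower[i])
            return float(10.0 ** log_d)
        remaining -= n_bin[i]
    return float("nan")

def kappa_from_d50(d50_nm, ss_percent, temp_k=298.15):
    """Approximate kappa from D50 and supersaturation (percent)."""
    sigma_w = 0.072      # surface tension of water, N m^-1 (assumed)
    m_w = 0.018015       # molar mass of water, kg mol^-1
    rho_w = 997.0        # density of water, kg m^-3
    r_gas = 8.314        # universal gas constant, J mol^-1 K^-1
    a = 4.0 * sigma_w * m_w / (r_gas * temp_k * rho_w)  # Kelvin parameter, m
    d50_m = d50_nm * 1e-9
    ln_s = np.log(1.0 + ss_percent / 100.0)
    return float(4.0 * a ** 3 / (27.0 * d50_m ** 3 * ln_s ** 2))
```

Because κ scales with the inverse cube of D50, a modest error in the size distribution or in the CCN concentration translates into a considerably larger relative error in κ_est, which is consistent with the spread reported for both aircraft.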
A comparison of the vertical profiles of the CCN concentrations at 0.5 % supersaturation on 21 September is shown in Fig. 13 as an example. The difference between the CCN measurements on the two aircraft is about 20 % on average. The linear regression slope would increase to 0.9120 if only the data above 2500 m were considered. The main contributions to the difference include the difference in aerosol inlet structure, the aerosol particle loss corrections for the main aircraft inlet and the constant pressure inlet, the systematic inlet difference below 2500 m shown in the AMS data, and the error propagation of the CCN measurements.

Comparison of cloud measurements
In situ cloud measurements help to capture the diversity of different cloud forms and their natural temporal and spatial variability. The G1 CDP and FCDP were deployed under different wing pylons and on different sides of the aircraft. The G1 2DS was deployed on the same side as the FCDP. The HALO cloud combination probe (CCP-CDP [...] Figs. 1b and 3, both aircraft were sampling above the T3 site and passing through the same cloud field at the ∼ 1600 m flight leg and the ∼ 1900 m flight leg, as shown in Figs. S11 and S12. We used the cloud probe data from the ∼ 1900 m flight leg for the cloud droplet number concentration comparison. Two size ranges were considered: 3-20 µm from light-scattering probes (CDP vs. FCDP on the G1, CCP-CDP vs. NIXE-CAS on HALO) and 2-960 µm from combined cloud probes.

Comparison of cloud droplet number concentration between 3 and 20 µm
For underwing cloud probes, such as the CDP and the CAS, Lance (2012) suggests an undercounting bias in measured particle number concentration of up to 44 % due to coincidence once the ambient cloud particle density rises to 1000 cm⁻³. At identical cloud particle densities, an earlier study (Baumgardner et al., 1985) estimates the coincidence bias for underwing cloud probes at about 20 %. In practice, the coincidence correction depends on the instruments' individual detection volume, the air's volume flow rate through the detector, and the cloud particles' residence time within the detection volume (Hermann and Wiedensohler, 2001; Jaenicke, 1972). For this comparison, no coincidence correction was applied to any of the cloud probe measurements, to avoid deviations caused by applying different corrections. The primary cloud layer was observed by both the G1 and HALO between 1000 and 2500 m above ground. Although the two aircraft sampled along the same flight path, the instruments probably observed different parts of the cloud field due to cloud movement with the prevailing wind or different cloud evolution stages. Thus, an initial comparison focuses on the redundant instruments on the same aircraft, which measured in a truly collocated and synchronous manner aboard HALO and the G1, respectively. In Fig. 14a, the data of the CCP-CDP and the NIXE-CAS, sampled over about 13 min, are juxtaposed for the particle detection size ranges considered most equivalent. The comparison reveals two ranges of particle number concentrations in which agreeing measurements cluster. At very low number concentrations (about 10⁻¹-10 cm⁻³), the presence of inactivated (interstitial) aerosols in the clear air space between the very few cloud elements should be considered.
Over specific ranges, however, the fine structure of the varying cloud droplet number concentration may cause the scatter in the regression, in which one instrument measures cloud particles while its counterpart appears to measure almost clear air, and vice versa. At higher number concentrations, i.e., between 10² and 10³ cm⁻³, the highly resolved data lie increasingly close to the 1 : 1 line. The overall data scatter of this comparison, however, may reflect the highly variable structure within clouds such as those investigated over the Amazon basin. The data of the G1 CDP and the FCDP are juxtaposed in the same way as those of the HALO cloud probes. However, the sampled cloud period was much shorter, about 3 min. Similar to the HALO cloud probe comparison, we observe two ranges of particle number concentrations in which agreeing measurements cluster, especially at the lower number concentrations (Fig. 14b). At higher number concentrations, only a few cloud elements were observed by the G1 cloud probes. That is because the G1 passed the same locations as HALO about 7-23 min later and encountered far fewer cloud elements.

Comparison of cloud droplet size distribution between 2 and 960 µm from both aircraft
Comparing the cloud probes from the G1 and HALO, the size distributions from the HALO CCP and NIXE-CAPS probes are in remarkably good agreement between 2 and 960 µm, and both peaked around 10 µm, as shown in Fig. 15. That is because the potential effects of shattering cloud elements on the probe measurements were treated similarly for the HALO-deployed CCP and NIXE instruments. On the G1, the CDP and FCDP differed more significantly in the size range below 8 µm, although both peaked between 10 and 20 µm. The difference between the G1 CDP and FCDP is mainly due to the data post-processing. The G1 CDP used an old data acquisition system from Science Engineering Associates, which limited its capability to store the particle-by-particle (PBP) data for further processing. An 800 µm diameter pinhole had been placed in front of the CDP sizing detector to minimize coincidence at concentrations up to 1850 cm⁻³. The FCDP, in contrast, was equipped with new electronics, and PBP data were stored locally on a flash drive aboard the Linux machine. For the G1 flights, a constant probe-dependent adjustment factor was applied to the FCDP to further adjust for coincidence in the final data product. The G1 CDP and FCDP operated with a redesigned probe tip to minimize the shattering effect. An additional algorithm was applied to the FCDP data to eliminate particles with short interarrival times. For cloud droplets larger than 20 µm, the difference between the cloud particle size distributions obtained from the two aircraft becomes substantial (up to 2 orders of magnitude), which indicates that the two aircraft observed different stages in the development of a precipitating cloud. The development of the precipitating cloud is evident in the elevated number concentrations of larger cloud elements observed during the later G1 measurement. We also observed that the general cloud characteristics are similar at different altitude levels, as shown in Fig. S13. The first two of the three averaging periods were chosen during the flight leg at ∼ 1600 m, and the last averaging period is for the flight leg at ∼ 1900 m compared in Fig. 15. Due to the averaging, the fine in-cloud structure is suppressed. The small-scale variabilities inside a cloud, which are illustrated by the scatter of the highly resolved measurement data in the instrument comparison (see Fig. 14), and the temporal evolution of in-cloud microphysics cannot be ascertained and are beyond the scope of this study.

Comparison of radiation measurements
In this study, the downward irradiance measured by the SPN1 unshaded center detector was compared with the integrated downward irradiance from the SMART-Albedometer between 300 and 1800 nm wavelengths in Fig. 16. Only measurements from flight legs where the G1 and HALO flew nearly side by side and at the same altitude were taken into consideration for analysis. Figure 16a shows the time series of SPN1 measurements, and Fig. 16b shows the time series of SMART-Albedometer measurements. The black dots represent all data, and the blue circles identify data when the navigation condition was within ±1° from the horizontal level. The large scatter in the data between 15:12-15:28 and 15:35-15:40 UTC is mainly due to the different sensor trajectories during the maneuvering of the aircraft to get to the coordinated flight position. Because each aircraft deviated differently from the horizontal, the measured signal varied from the signal of the direct component of sunlight. Each sensor might look at different directions of the sky or different parts of the clouds. In addition, both aircraft flew under scattered clouds, and this uneven sunlight blocking is another contribution to the "drop-off" behavior in the time series plots of the downward irradiance.
Comparing the G1 and HALO measurements between 15:15 and 01:55 UTC using the restricted navigation criteria in Fig. S14, we observed that the G1 SPN1 irradiance is slightly higher than the integrated irradiance from the SMART-Albedometer. We used the National Center for Atmospheric Research (NCAR) tropospheric ultraviolet and visible (TUV) radiation model to estimate the weighted irradiance at 15:42:00 UTC on 9 September 2014 and confirmed that the spectral variation in the instruments is the main contribution to the difference in the comparison.
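The selection of near-level data can be illustrated with a short mask. This is a sketch under an explicit assumption: the "navigation condition within ±1° from the horizontal level" is interpreted here as both pitch and roll angles lying within the tolerance, which the text does not state, and the function name `level_flight_mask` is hypothetical.

```python
import numpy as np

def level_flight_mask(pitch_deg, roll_deg, tol_deg=1.0):
    """Boolean mask that is True where pitch and roll are within +/- tol_deg."""
    pitch = np.asarray(pitch_deg, dtype=float)
    roll = np.asarray(roll_deg, dtype=float)
    return (np.abs(pitch) <= tol_deg) & (np.abs(roll) <= tol_deg)
```

Indexing an irradiance time series with this mask keeps only the samples for which the sensor plane was close to horizontal, which removes most of the attitude-driven scatter before the two instruments are compared.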

Uncertainty assessment
As mentioned in the introduction, a low-flying G1 and a high-flying HALO cover the sampling area from the atmospheric boundary layer to the free troposphere, and the sampling periods span the dry and wet seasons. This spatial coverage provides the user community with abundant atmospheric-related data sets for further studies, such as remote sensing validation and modeling evaluation. However, one critical step in bridging the proper usage of the observations with further atmospheric science study is to understand the measurement uncertainty in this data set, especially the variation between the coexisting measurements due to the temporal and spatial difference.
For the majority of the measurements during this field study, three primary sources contributed to the measurement variation between the two aircraft: the temporal and spatial variations, the difference in the inlet characterization, and the limitation of the instrument capability. We used both ordinary least squares (OLS) linear regression and orthogonal distance regression (ODR) to correlate the measurements from the G1 and HALO and confirmed that the slope and R² are very similar for the measurements made on 9 September. The results from Table 2 confirmed that the G1 and HALO measurements should be in a linear relationship without an offset if there is no altitude variation. They also show that the minimum discrepancy between two aerosol instruments (CPC or UHSAS) could be around 20 %, which includes the error caused by the difference in the inlet characterization and the limitation of the instrument capability. If we assume those two sources of measurement variation are not affected by the altitude, then by comparing the linear regression data from Table 3 to those in Table 2, we can estimate the temporal and spatial variation between two aircraft in a stack flight pattern. Three linear regression approaches were assessed, and the results are listed in Table 3. If we assume that the two measurements from the G1 and HALO should not have any offsets, the OLS and ODR regressions show similar results. For the meteorological parameters, this assumption is valid. In addition, good correlations also indicate that there is no significant temporal or spatial variation during the stack pattern flight. As expected, the wind speed and the aerosol measurements show that the correlations between the measurements from the G1 and HALO significantly improved with the offset assumption. This result suggests that the temporal and spatial variation in a half hour will add an additional 20 % variance to the measured aerosol properties.
This will lead to considerable uncertainty when we combine the observation data between the ground station and the airborne platform. Thus, to evaluate or constrain atmospheric modeling work, more routine and long-term airborne measurements should be used to provide statistically sufficient observation.

Summary
In situ measurements made by well-characterized instruments installed on two research aircraft (the G1 and HALO) during the GoAmazon2014/5 and ACRIDICON-CHUVA campaigns were compared (Table S3). Overall, the analysis shows good agreement between the G1 and HALO measurements for a relatively broad range of atmospheric-related variables in a challenging lower troposphere environment. Measured variables included atmospheric state parameters, aerosol particles, trace gases, clouds, and radiation properties. This study outlines the well-designed coordinated flights for achieving a meaningful comparison between two moving platforms. The high data quality was ensured by the most sophisticated instruments aboard the two aircraft using the most advanced techniques, assisted by the best calibration and characterization procedures. The comparisons and the related uncertainty estimations quantify the current measurement limits, which provide guidance to modelers for realistically quantifying the modeling input values and evaluating the variation between the measurement and the model output. The comparison also identified the measurement issues, outlined the associated reasonable measurement ranges, and evaluated the measurement sensitivities to the temporal and spatial variance.
The comparisons presented here were mainly from two coordinated flights. The flight on 9 September was classified as a cloud-free flight. During this flight, the G1 and HALO flew nearly side by side within a "polluted" leg, which was above the T3 site and across the downwind pollution plume from Manaus, and a "background" leg, which was outbound from Manaus to the west and could be influenced by the regional biomass burning events during the dry season. Both legs were at 500 m altitude and showed linear regression slopes of ambient temperature and pressure, horizontal wind speed, and dew-point temperature close to 1 between the G1 and HALO measurements. These comparisons provide a solid foundation for further evaluation of aerosol, trace gas, cloud, and radiation properties. The total aerosol concentrations from CPC and UHSAS were compared for the 500 m flight leg above the T3 site. The UHSAS measurements agreed better than the CPC measurements. That is because of the minor difference in the inlet structure and instrument design between the two UHSASs. The comparison of the two UHSASs with the FIMS on the G1 suggests that the UHSAS had an overcounting issue in the size range between 60 and 90 nm, which was probably due to electrical noise and a small signal-to-noise ratio in that size range. Good agreement in the aerosol size distribution measurement provides a "sanity" check for AMS measurements. A CCN closure study suggested that FIMS provides valuable size coverage for better CCN number concentration estimation. Based on the κ-Köhler parameterization, κ_est observed at 500 m above the T3 site is 0.19 ± 0.08, which is similar to the overall mean κ from long-term ATTO measurements (0.17 ± 0.06; Pöhlker et al., 2016). This similarity suggests that there is no significant spatial variability along the downwind transect, although the freshly emitted aerosol particles may have much less hygroscopicity.
The difference in the ozone measurement comparison is about 4.1 ppb, which suggests that the bias is due to the sampling line loss inside of the G1 gas inlet. The irradiance from the SPN1 unshaded center detector in the G1 was compared with the HALO integrated downward irradiance between 300 and 1800 nm and achieved a very encouraging agreement with a variance of less than 10 %.
During the second type of coordinated flights on 21 September (with cloudy conditions), HALO followed the G1 after take-off from Manaus airport; then, the two aircraft flew stacked legs relative to each other at different altitudes above the T3 site. For atmospheric state parameters, nearly linear correlations between the G1 and HALO were observed for ambient pressure, temperature, and dew-point temperature measurements at an altitude range from ground to around 5000 m. The horizontal wind had more variation than the rest of the meteorological properties, which is mainly due to the temporal and spatial variability. The aerosol number concentration and the trace gas measurements both suggest inhomogeneous aerosol distribution between 2000 and 3000 m altitude. The integrated aerosol number concentration from UHSAS showed consistent discrepancy at different altitudes. This considerable uncertainty in the UHSAS measurements is caused by the significant aerosol sample flow variations due to the slow and unstable flow control. Although the aircraft-based UHSAS is a challenging instrument to operate, a reasonable size distribution profile comparison was made between both UHSAS and FIMS on the G1. Overall, the chemical composition of the aerosol is dominated by organics. Around 70 % of the AMS-measured mass is organic, and this fractional contribution is maintained from the surface to 3500 m, then decreases to 50 % at higher altitudes. The most substantial difference among all the species is observed for ammonium due to the different mass resolution of the AMS instruments, and more reliable ammonium mass concentration can be achieved with the high-resolution mass spectrometer. Although the absolute aerosol mass concentration measured by the HALO AMS was affected below 1800 m altitude by the constant pressure inlet, the relative fractions of both instruments from the G1 and HALO agree well.
Cloud probe comparisons were first made for the cloud droplet number concentration between 3 and 20 µm, using the redundant instruments on the same aircraft. Then, the comparison of cloud droplet size distribution between 2 and 960 µm for a flight leg around 1900 m showed remarkably good agreement. The major cloud appearance was captured by both aircraft, although the cloud elements observed were affected by the cloud movement with the prevailing wind and the different cloud evolution stages. Furthermore, the relatively short time delay of 7 to 23 min between the independent measurements may hint at the timescales over which the cloud droplet spectra develop within a convective cloud over the Amazon basin.
The above results provide additional information about the reasonableness of the measurements for each atmospheric variable. This study confirms that a high-quality spatial and temporal data set with clearly identified uncertainty ranges was collected from the two aircraft, and it builds a good foundation for further studies on remote sensing validation and on the spatial and temporal evaluation of how models represent atmospheric processing and evolution.
Several efforts made by both airborne measurement teams have significantly contributed to the overall success of this comparison study, and we recommend them for future field operations.
Instruments should be characterized following the same established guidelines. For example, the aerosol instruments can follow the guideline from the World Calibration Centre for Aerosol Physics (WCCAP).
Periodically, the measurements from different instruments should be compared for consistency in the field. For example, we found that comparing the integrated aerosol volume distribution from the aerosol sizer with the converted total aerosol mass from the AMS measurement can help check both the instrument performance and the inlet operation condition. Additionally, measurements from different cloud probes should be compared in the overlapping size ranges.
Daily calibration would be valuable but is likely unrealistic to perform in the field. One alternative is to monitor, daily or even hourly, the variation of critical instrument parameters, such as the aerosol sample flow of the individual aerosol instruments.
For the cases with minor variations in the calibration results, the typical practice is to use the average calibration results for the variation period. However, we also recommend documenting the corresponding uncertainty with the data product.
A side-by-side comparison among similar instruments deployed on different platforms, including those at ground sites, is highly recommended and will provide a comprehensive view of the data reliability.
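The volume-to-mass consistency check mentioned in the recommendations above can be sketched as follows. This is a minimal illustration, not the processing used in this study: the function names, the trapezoidal integration over log10 of diameter, and the effective particle density (here 1.4 g cm^-3) are all assumptions chosen for the example, and the sketch ignores differences in size cut and in the species detected by each instrument.

```python
import math


def integrated_volume_um3_cm3(diameters_um, dn_dlogdp_cm3):
    """Integrate dV/dlogDp over log10(Dp) with the trapezoidal rule.

    diameters_um: bin diameters in micrometres (ascending).
    dn_dlogdp_cm3: dN/dlogDp in particles per cm^3 at each diameter.
    Returns total volume concentration in um^3 cm^-3 (spherical particles assumed).
    """
    if len(diameters_um) != len(dn_dlogdp_cm3) or len(diameters_um) < 2:
        raise ValueError("need two or more matching diameter/concentration points")
    # Convert the number distribution to a volume distribution for spheres.
    dv = [math.pi / 6.0 * d ** 3 * n for d, n in zip(diameters_um, dn_dlogdp_cm3)]
    logd = [math.log10(d) for d in diameters_um]
    total = 0.0
    for i in range(len(dv) - 1):
        total += 0.5 * (dv[i] + dv[i + 1]) * (logd[i + 1] - logd[i])
    return total


def volume_to_mass_ug_m3(volume_um3_cm3, density_g_cm3):
    """Convert volume concentration to mass concentration.

    1 um^3 cm^-3 of material at 1 g cm^-3 equals 1 ug m^-3,
    so the conversion is a simple product.
    """
    return volume_um3_cm3 * density_g_cm3


def closure_ratio(sizer_mass_ug_m3, ams_mass_ug_m3):
    """Ratio of sizer-derived mass to AMS total mass (1.0 means perfect closure)."""
    if ams_mass_ug_m3 <= 0:
        raise ValueError("AMS mass must be positive")
    return sizer_mass_ug_m3 / ams_mass_ug_m3


if __name__ == "__main__":
    # Hypothetical size distribution and AMS mass, for illustration only.
    diameters = [0.06, 0.1, 0.2, 0.4, 0.8]
    dn_dlogdp = [800.0, 1500.0, 900.0, 150.0, 10.0]
    assumed_density = 1.4  # g cm^-3, an assumption rather than a value from this study
    volume = integrated_volume_um3_cm3(diameters, dn_dlogdp)
    mass = volume_to_mass_ug_m3(volume, assumed_density)
    print("closure ratio:", closure_ratio(mass, ams_mass_ug_m3=5.0))
```

A ratio that drifts steadily away from its usual value over a flight or between flights would point to a change in instrument performance or inlet operation worth investigating; the acceptable range depends on the combined uncertainties of both instruments and on the assumed density.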
Data availability. The measured data collected with the HALO aircraft are available on the HALO database (HALO-DB). The link is https://halo-db.pa.op.dlr.de/ (last access: 10 February 2020). All ARM Aerial Facility data sets collected with the G1 aircraft used in this study can be downloaded from the ARM website at https://www.arm.gov/research/campaigns/amf2014goamazon (last access: 10 February 2020).
Author contributions. FM, JW, JMC, MP, and BS defined the scientific questions and scope of this study. STM, MW, LATM, MOA, and PA designed, planned, and supervised the broader GoAmazon2014/5 field experiment. JES, JS, CS, and SSdS carried out the AMS measurements and data processing. FM, JMC, RW, MK, CM, JF, AA, and SB carried out the cloud measurements and data processing. CNL, MW, and TK carried out radiation measurements and data processing. JH, MP, MZ, and AG carried out the atmospheric parameter measurements and data processing. FM, JT, BW, AM, MLP, UP, CP, and TK carried out the aerosol measurements and data processing. SS and HS carried out the trace gas measurements and data processing. FM prepared the paper with contributions from all co-authors.