<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing with OASIS Tables v3.0 20080202//EN" "https://jats.nlm.nih.gov/nlm-dtd/publishing/3.0/journalpub-oasis3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table" xml:lang="en" dtd-version="3.0" article-type="research-article">
  <front>
    <journal-meta><journal-id journal-id-type="publisher">AMT</journal-id><journal-title-group>
    <journal-title>Atmospheric Measurement Techniques</journal-title>
    <abbrev-journal-title abbrev-type="publisher">AMT</abbrev-journal-title><abbrev-journal-title abbrev-type="nlm-ta">Atmos. Meas. Tech.</abbrev-journal-title>
  </journal-title-group><issn pub-type="epub">1867-8548</issn><publisher>
    <publisher-name>Copernicus Publications</publisher-name>
    <publisher-loc>Göttingen, Germany</publisher-loc>
  </publisher></journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.5194/amt-19-4219-2026</article-id><title-group><article-title>Improving imputation of missing PM<sub>2.5</sub> speciation data using PMF-informed source-receptor relationships</article-title><alt-title>Imputation of PM<sub>2.5</sub> speciation data using PMF reconstruction</alt-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Zhu</surname><given-names>Wubin</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff2">
          <name><surname>Xie</surname><given-names>Mingjie</given-names></name>
          
        <ext-link>https://orcid.org/0000-0002-2717-7557</ext-link></contrib>
        <contrib contrib-type="author" corresp="yes" rid="aff1 aff3">
          <name><surname>Dai</surname><given-names>Qili</given-names></name>
          <email>daiql@nankai.edu.cn</email>
        <ext-link>https://orcid.org/0000-0001-9534-2887</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Bi</surname><given-names>Xiaohui</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Zhang</surname><given-names>Yufen</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Feng</surname><given-names>Yinchang</given-names></name>
          
        <ext-link>https://orcid.org/0000-0002-6014-5258</ext-link></contrib>
        <aff id="aff1"><label>1</label><institution>State Environmental Protection Key Laboratory of Urban Ambient Air Particulate Matter Pollution Prevention and Control, College of Environmental Science and Engineering, Nankai University, Tianjin 300350, China</institution>
        </aff>
        <aff id="aff2"><label>2</label><institution>Collaborative Innovation Center of Atmospheric Environment and Equipment Technology, Jiangsu Key Laboratory of Atmospheric Environment Monitoring and Pollution Control, School of Environmental Science and Engineering, Nanjing University of Information Science &amp; Technology, 219 Ningliu Road, Nanjing, 210044, China</institution>
        </aff>
        <aff id="aff3"><label>3</label><institution>Tianjin Key Laboratory of Software Experience and Human Computer Interaction, Tianjin 300457, China</institution>
        </aff>
      </contrib-group>
      <author-notes><corresp id="corr1">Qili Dai (daiql@nankai.edu.cn)</corresp></author-notes><pub-date><day>26</day><month>June</month><year>2026</year></pub-date>
      
      <volume>19</volume>
      <issue>12</issue>
      <fpage>4219</fpage><lpage>4231</lpage>
      <history>
        <date date-type="received"><day>27</day><month>January</month><year>2026</year></date>
           <date date-type="rev-request"><day>5</day><month>March</month><year>2026</year></date>
           <date date-type="rev-recd"><day>30</day><month>May</month><year>2026</year></date>
           <date date-type="accepted"><day>10</day><month>June</month><year>2026</year></date>
      </history>
      <permissions>
        <copyright-statement>Copyright: © 2026 Wubin Zhu et al.</copyright-statement>
        <copyright-year>2026</copyright-year>
      <license license-type="open-access"><license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p></license></permissions><self-uri xlink:href="https://amt.copernicus.org/articles/19/4219/2026/amt-19-4219-2026.html">This article is available from https://amt.copernicus.org/articles/19/4219/2026/amt-19-4219-2026.html</self-uri><self-uri xlink:href="https://amt.copernicus.org/articles/19/4219/2026/amt-19-4219-2026.pdf">The full text article is available as a PDF file from https://amt.copernicus.org/articles/19/4219/2026/amt-19-4219-2026.pdf</self-uri>
      <abstract><title>Abstract</title>

      <p id="d2e159">Missing values are ubiquitous in atmospheric monitoring due to instrument drift, calibration cycles, operational interruptions, and other random malfunctions. Such gaps can undermine the reliability of subsequent analyses and introduce systematic biases. Conventional imputation methods, such as geometric mean substitution, K-nearest neighbor (KNN), Bayesian principal component analysis (BPCA), and deep learning models, often rely primarily on statistical correlations, may require auxiliary inputs, and offer limited physical interpretability. To address this issue, we propose a novel source-receptor-informed Positive Matrix Factorization Reconstruction (PMFr) method that leverages PMF-derived source-receptor relationships, rather than purely statistical interpolation, to impute missing PM<sub>2.5</sub> speciation data without requiring auxiliary data. Benchmarking on a two-month dataset against commonly used imputation techniques, including KNN, BPCA, and a deep learning predictive model, demonstrates that PMFr achieves superior accuracy and robustness across real-world missing scenarios, with a mean coefficient of determination (<inline-formula><mml:math id="M4" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>) of 0.81, index of agreement (IoA) of 0.92, and mean absolute percentage error (MAPE) of 22.8 %, reducing MAPE by 25.5 %–29.1 %, particularly for key PM<sub>2.5</sub> species. Further PMF-based validation shows that PMFr better preserves source-profile composition and source-contribution temporal features, indicating that the completed dataset retains more physically meaningful source information and is more suitable for source apportionment. These results highlight PMFr as a robust and physically interpretable approach for reconstructing reliable PM<sub>2.5</sub> speciation data.</p>
  </abstract>
    
<funding-group>
<award-group id="gs1">
<funding-source>National Natural Science Foundation of China</funding-source>
<award-id>42577117</award-id>
</award-group>
<award-group id="gs2">
<funding-source>Natural Science Foundation of Tianjin Municipality</funding-source>
<award-id>24JCYBJC01870</award-id>
</award-group>
<award-group id="gs3">
<funding-source>Chinese Academy of Sciences</funding-source>
<award-id>n/a</award-id>
</award-group>
</funding-group>
</article-meta>
  </front>
<body>
      

<sec id="Ch1.S1" sec-type="intro">
  <label>1</label><title>Introduction</title>
      <p id="d2e209">Ambient fine particulate matter (PM<sub>2.5</sub>) remains a pressing global environmental challenge due to its well-documented impacts on climate forcing, atmospheric visibility degradation, and adverse health outcomes <xref ref-type="bibr" rid="bib1.bibx35 bib1.bibx44 bib1.bibx16 bib1.bibx25" id="paren.1"/>. These effects are governed by the chemically diverse nature of PM<sub>2.5</sub>, which comprises inorganic ions, carbonaceous materials, trace metals, and other species. Comprehensive PM<sub>2.5</sub> speciation measurements are therefore fundamental for tracking source contributions, elucidating atmospheric processes, and evaluating their diverse impacts. However, missing data are ubiquitous in both routine monitoring networks and intensive field campaigns due to instrument drift, calibration cycles, operational interruptions, and other random malfunctions <xref ref-type="bibr" rid="bib1.bibx57" id="paren.2"/>. Such gaps can undermine the reliability of subsequent analyses and introduce systematic biases. Consequently, accurate and robust imputation of missing values is essential, as inappropriate handling of missing data can lead to distorted interpretations and erroneous scientific conclusions.</p>
      <p id="d2e245">A wide range of methods have been developed to address missing values in PM<sub>2.5</sub> chemical component datasets, generally falling into listwise deletion, simple substitutions, and advanced statistical models <xref ref-type="bibr" rid="bib1.bibx2" id="paren.3"/>. Basic approaches such as listwise deletion and mean or median substitution, although recommended in the U.S. EPA's guidelines for their simplicity <xref ref-type="bibr" rid="bib1.bibx39" id="paren.4"/>, often compromise data quality: listwise deletion discards samples containing any missing species and substantially reduces statistical power, whereas median or mean substitution introduces bias that becomes more pronounced as data variability increases <xref ref-type="bibr" rid="bib1.bibx14 bib1.bibx46 bib1.bibx22" id="paren.5"/>. Linear interpolation is also frequently applied because of its ease of implementation, yet its performance is highly sensitive to the temporal pattern and extent of missingness <xref ref-type="bibr" rid="bib1.bibx49 bib1.bibx21" id="paren.6"/>. To better capture inter-species correlations and nonlinear dependencies, more advanced techniques, including KNN, BPCA, and deep learning models such as deep belief networks (DBN), have been explored and often outperform simpler methods <xref ref-type="bibr" rid="bib1.bibx29 bib1.bibx27 bib1.bibx60 bib1.bibx55 bib1.bibx50" id="paren.7"/>. Nevertheless, these statistical and machine-learning approaches typically rely on mathematical interpolation, may require auxiliary inputs such as meteorological variables or satellite-derived AOD, and offer limited physical interpretability <xref ref-type="bibr" rid="bib1.bibx53 bib1.bibx28 bib1.bibx18 bib1.bibx23 bib1.bibx24" id="paren.8"/>. As a result, accurately reconstructing missing PM<sub>2.5</sub> chemical species remains a methodological challenge.</p>
      <p id="d2e285">To address these limitations, we develop a physically interpretable imputation method grounded in air pollution source-receptor principles to reconstruct missing PM<sub>2.5</sub> chemical species. Source contributions and profiles are first resolved from pre-existing speciation data using Positive Matrix Factorization (PMF), which decomposes the dataset into a source chemical profile matrix and its corresponding contribution matrix. Under the commonly assumed temporal stability of source chemical compositions, the resolved profiles are then used to reproduce new PM<sub>2.5</sub> speciation datasets containing missing species by multiplying the estimated source-specific PM<sub>2.5</sub> mass by the resolved source profiles, enabling estimation of missing values based on physically meaningful source signatures. This approach ensures that reconstructed concentrations align with both chemical structure and emission characteristics rather than relying solely on mathematical interpolation. To evaluate performance, we generated artificial missing data in complete-speciation datasets; the proposed method was then compared against geometric mean substitution, linear interpolation, K-nearest neighbors (KNN), deep belief networks (DBN), and Bayesian principal component analysis (BPCA). The datasets completed by each imputation method were subsequently used as PMF inputs to assess how different imputation strategies influence downstream source apportionment results. This study highlights the potential of a source-informed strategy for robust and interpretable imputation, as well as for generating completed datasets suitable for subsequent source apportionment and chemical-constrained health and climate impact analyses.</p>
</sec>
<sec id="Ch1.S2">
  <label>2</label><title>Material and Methods</title>
<sec id="Ch1.S2.SS1">
  <label>2.1</label><title>Sample Collection and Data Processing</title>
      <p id="d2e330">Hourly PM<sub>2.5</sub> speciation data were collected on the rooftop of the Nanjing Environmental Protection Building (NEPB, 118.75° E, 32.06° N). Water-soluble inorganic ions including NH<inline-formula><mml:math id="M16" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">4</mml:mn><mml:mo>+</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, SO<inline-formula><mml:math id="M17" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">4</mml:mn><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula>, NO<inline-formula><mml:math id="M18" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">3</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, Cl<sup>−</sup>, Ca<sup>2+</sup>, Mg<sup>2+</sup>, K<sup>+</sup>, and Na<sup>+</sup> were determined by MARGA (ADI 2080; Applikon Analytical B.V., the Netherlands). Hourly OC and EC concentrations were measured using a semi-continuous OC/EC analyzer (RT-4, Sunset Laboratory Inc., USA) following NIOSH method 5040 <xref ref-type="bibr" rid="bib1.bibx6" id="paren.9"/>. An Xact 625 ambient metals monitor (Cooper Environmental, USA) was configured to quantify twenty-three elements (K, Fe, Zn, Ca, Si, Mn, Pb, Cu, Ti, As, V, Ba, Cr, Se, Ag, Cd, Ni, Au, Co, Sn, Sb, Tl, and Hg). Detailed information on the monitoring site, instrument setup and maintenance, and chemical analysis has been reported in previous studies <xref ref-type="bibr" rid="bib1.bibx58 bib1.bibx59 bib1.bibx56" id="paren.10"/>.</p>
      <p id="d2e439">The dataset used in this study comprises inorganic ions (NH<inline-formula><mml:math id="M24" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">4</mml:mn><mml:mo>+</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, SO<inline-formula><mml:math id="M25" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">4</mml:mn><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula>, and NO<inline-formula><mml:math id="M26" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">3</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>), trace elements (K, Fe, Zn, Ca, Si, Mn, Pb, Cu, Ti, As, V, Ba, Cr, and Se), and carbonaceous materials (OC and EC), which were measured from 01:00 local time (LT, GMT+8) on 1 October 2017 to 23:00 LT on 30 November 2017. A summary of the missing values in the raw data is given in Table S1 in the Supplement.</p>
</sec>
<sec id="Ch1.S2.SS2">
  <label>2.2</label><title>Missing Data Generation</title>
      <p id="d2e489">Four factors are considered to affect the performance of imputation methods: the missing-data generation mechanism, the proportion of missing data, the gap pattern of missing data <xref ref-type="bibr" rid="bib1.bibx20" id="paren.11"/>, and whether multiple species are missing simultaneously (MCMS) or independently (MCMI). Specifically, MCMS refers to the simultaneous absence of multiple species at a single timestamp, while MCMI denotes missing values occurring independently at distinct timestamps.</p>
      <p id="d2e495">The mechanisms of missing values are typically classified into three categories <xref ref-type="bibr" rid="bib1.bibx33" id="paren.12"/>: (i) missing completely at random (MCAR), where missingness is independent of both observed and unobserved values; (ii) missing at random (MAR), where missingness is related to observed data; and (iii) missing not at random (MNAR), where missingness is related to unobserved data, such as values below the detection limit (BDL). Analysis of the NEPB dataset shows no systematic association between missing occurrences and pollutant concentration levels or temporal patterns, indicating that the missingness does not follow the MNAR mechanism <xref ref-type="bibr" rid="bib1.bibx15" id="paren.13"/>. Consequently, missing data were generated randomly to ensure that the artificial missingness remained independent of pollutant concentrations.</p>
      <p id="d2e504">The proportion of missing data is a critical factor affecting imputation performance. In this study, missingness rates of 10 %, 15 %, and 20 % were imposed, matching the observed range of 10 %–20 % in the monitored dataset.</p>
      <p id="d2e507">Gap pattern refers to the proportion of different gap lengths within the total missing data. Based on the summary of the missing data, their physical meaning, and prior research <xref ref-type="bibr" rid="bib1.bibx45 bib1.bibx4 bib1.bibx19 bib1.bibx48" id="paren.14"/>, gap lengths (<inline-formula><mml:math id="M27" display="inline"><mml:mi>l</mml:mi></mml:math></inline-formula>) were categorized into three types: (i) short gaps, with <inline-formula><mml:math id="M28" display="inline"><mml:mi>l</mml:mi></mml:math></inline-formula> from 1 to 6; (ii) medium gaps, with <inline-formula><mml:math id="M29" display="inline"><mml:mi>l</mml:mi></mml:math></inline-formula> greater than 6 but less than 23; and (iii) large gaps, with lengths ranging from 23 to 115 consecutive values (1 to 5 d), where the upper bound is the longest gap observed in the raw dataset (Table S2).</p>
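      <p>As an illustration only, the gap categories above can be written as a short Python sketch. The function names <monospace>gap_lengths</monospace> and <monospace>classify_gap</monospace> are hypothetical and are not taken from the study's code. Gaps longer than 115 consecutive values, which do not occur in the raw dataset, are simply labelled as large here.</p>
<preformat preformat-type="code">
```python
def gap_lengths(missing):
    """Return the lengths of consecutive runs of True in a 1-D sequence.

    `missing` is a sequence of booleans for one species, where True
    marks a missing hourly value.
    """
    lengths = []
    run = 0
    for flag in missing:
        if flag:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    if run:  # close a gap that extends to the end of the series
        lengths.append(run)
    return lengths


def classify_gap(length):
    """Map a gap length l (number of consecutive missing values) to a type.

    short: l from 1 to 6; medium: l greater than 6 but less than 23;
    large: l of 23 or more (23 to 115 in the dataset described above).
    """
    if length >= 23:
        return "large"
    if length > 6:
        return "medium"
    return "short"
```
</preformat>
      <p>Applying <monospace>classify_gap</monospace> to every element returned by <monospace>gap_lengths</monospace> gives the share of each gap type for a species, which is one possible way to summarize a gap pattern.</p>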

<table-wrap id="T1" specific-use="star"><label>Table 1</label><caption><p id="d2e538">Description of different missing scenarios considered in this study.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="6">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="left"/>
     <oasis:colspec colnum="4" colname="col4" align="left"/>
     <oasis:colspec colnum="5" colname="col5" align="left"/>
     <oasis:colspec colnum="6" colname="col6" align="left"/>
     <oasis:thead>
       <oasis:row>
         <oasis:entry colname="col1">Missing Scenario</oasis:entry>
         <oasis:entry colname="col2">Case</oasis:entry>
         <oasis:entry colname="col3">Missing Compositions</oasis:entry>
         <oasis:entry colname="col4">Proportion</oasis:entry>
         <oasis:entry colname="col5">Missing Pattern</oasis:entry>
         <oasis:entry colname="col6">Gap</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3"/>
         <oasis:entry colname="col4">(%)</oasis:entry>
         <oasis:entry colname="col5"/>
         <oasis:entry colname="col6">Length</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">Scenario no. 1: Random</oasis:entry>
         <oasis:entry colname="col2">1</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M30" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">NH</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mo>+</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M31" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M32" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">NO</mml:mi><mml:mn mathvariant="normal">3</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, Ca,</oasis:entry>
         <oasis:entry colname="col4">15</oasis:entry>
         <oasis:entry colname="col5">Missing</oasis:entry>
         <oasis:entry colname="col6">Short</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Single Species Missing</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">Fe, Si, OC, EC</oasis:entry>
         <oasis:entry colname="col4"/>
         <oasis:entry colname="col5">Separately</oasis:entry>
         <oasis:entry colname="col6"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Scenario no. 2: Instrument</oasis:entry>
         <oasis:entry colname="col2">2</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M33" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">NH</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mo>+</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M34" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M35" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">NO</mml:mi><mml:mn mathvariant="normal">3</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col4">10, 20</oasis:entry>
         <oasis:entry colname="col5">MCMS</oasis:entry>
         <oasis:entry colname="col6">Medium</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Failure Induced Missing</oasis:entry>
         <oasis:entry colname="col2">3</oasis:entry>
         <oasis:entry colname="col3">OC, EC</oasis:entry>
         <oasis:entry colname="col4">10, 20</oasis:entry>
         <oasis:entry colname="col5">MCMS</oasis:entry>
         <oasis:entry colname="col6">Medium</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Scenario no. 3: Station-Wide</oasis:entry>
         <oasis:entry colname="col2">4</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M36" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">NH</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mo>+</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M37" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">NO</mml:mi><mml:mn mathvariant="normal">3</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col4">10, 20</oasis:entry>
         <oasis:entry colname="col5">MCMS, MCMI</oasis:entry>
         <oasis:entry colname="col6">Large</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Instrument Malfunctions</oasis:entry>
         <oasis:entry colname="col2">5</oasis:entry>
         <oasis:entry colname="col3">Fe, Ca, Si, Ti</oasis:entry>
         <oasis:entry colname="col4">10, 20</oasis:entry>
         <oasis:entry colname="col5">MCMS, MCMI</oasis:entry>
         <oasis:entry colname="col6">Large</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">6</oasis:entry>
         <oasis:entry colname="col3">K, <inline-formula><mml:math id="M38" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">NH</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mo>+</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M39" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">NO</mml:mi><mml:mn mathvariant="normal">3</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col4">10, 20</oasis:entry>
         <oasis:entry colname="col5">MCMS, MCMI</oasis:entry>
         <oasis:entry colname="col6">Large</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">7</oasis:entry>
         <oasis:entry colname="col3">K, OC, EC</oasis:entry>
         <oasis:entry colname="col4">10, 20</oasis:entry>
         <oasis:entry colname="col5">MCMS, MCMI</oasis:entry>
         <oasis:entry colname="col6">Large</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">8</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M40" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">NH</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mo>+</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M41" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">NO</mml:mi><mml:mn mathvariant="normal">3</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, OC, EC</oasis:entry>
         <oasis:entry colname="col4">10, 20</oasis:entry>
         <oasis:entry colname="col5">MCMS, MCMI</oasis:entry>
         <oasis:entry colname="col6">Large</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">9</oasis:entry>
         <oasis:entry colname="col3">K, <inline-formula><mml:math id="M42" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">NH</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mo>+</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M43" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">NO</mml:mi><mml:mn mathvariant="normal">3</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, OC, EC</oasis:entry>
         <oasis:entry colname="col4">10, 20</oasis:entry>
         <oasis:entry colname="col5">MCMI</oasis:entry>
         <oasis:entry colname="col6">Large</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d2e1000">In summary, missing data were generated according to the scenarios listed in Table <xref ref-type="table" rid="T1"/>. These scenarios include: (i) random single-species missingness that creates short gaps in individual species (Case 1); (ii) instrument-failure-induced missingness that produces medium gaps across all species measured by a given instrument, affecting the ionic (Case 2) and carbonaceous (Case 3) monitors; and (iii) station-wide instrument malfunctions that result in large gaps spanning multiple species. The station-wide scenario covers malfunction of the ionic monitoring instrument (Case 4) or the elemental monitoring instrument (Case 5), concurrent malfunction of two instruments (Cases 6–8), and malfunction of all monitoring instruments (Case 9). Potassium (K) was treated as missing in multi-instrument malfunction scenarios (Cases 6, 7 and 9) because of its strong correlations with both ionic and carbonaceous species (Fig. S1). The performance of the imputation methods was evaluated using the coefficient of determination (<inline-formula><mml:math id="M44" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>), the mean absolute percentage error (MAPE), and the index of agreement (IoA) (Sect. S1.2).</p>
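      <p>The difference between MCMS and MCMI missingness can be sketched in Python as follows. This is a simplified illustration, not the procedure used in the study: the function name <monospace>make_gap_mask</monospace> and all of its parameters are hypothetical, every gap is given the same fixed length, and gap start times are drawn uniformly at random so that the missingness is independent of concentration.</p>
<preformat preformat-type="code">
```python
import numpy as np


def make_gap_mask(n_times, n_species, proportion, gap_length,
                  simultaneous, seed=0):
    """Boolean mask (True = artificially missing) for a group of species.

    simultaneous=True  : MCMS, all species share the same gap timestamps.
    simultaneous=False : MCMI, gap timestamps are drawn per species.
    Assumes a fixed gap length and uniformly random gap start times.
    """
    rng = np.random.default_rng(seed)
    mask = np.zeros((n_times, n_species), dtype=bool)
    # Number of gaps needed to reach the target proportion per species.
    n_gaps = int(round(proportion * n_times / gap_length))
    n_starts = n_times - gap_length + 1  # number of valid start positions

    def draw_starts():
        return rng.choice(n_starts, size=n_gaps, replace=False)

    if simultaneous:
        for start in draw_starts():
            mask[start:start + gap_length, :] = True
    else:
        for j in range(n_species):
            for start in draw_starts():
                mask[start:start + gap_length, j] = True
    # Gaps may overlap, so the realized missing fraction can fall
    # slightly below the requested proportion.
    return mask
```
</preformat>
      <p>For example, <monospace>make_gap_mask(1000, 3, 0.10, 10, simultaneous=True)</monospace> returns a mask in which the three columns are identical, whereas <monospace>simultaneous=False</monospace> draws the gap positions of each column separately.</p>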
</sec>
<sec id="Ch1.S2.SS3">
  <label>2.3</label><title>Source-Receptor Informed Positive Matrix Factorization Reconstruction (PMFr) and Validation</title>
      <p id="d2e1024">A source-tracer for imputation, hereafter referred to as a tracer, is defined as a key species that distinguishes a specific factor (source) from others and reflects how that factor (source) influences the receptor over time. Co-tracers refer to co-varying tracers of the same factor, collectively characterizing the temporal behavior of the corresponding source. As illustrated in Fig. <xref ref-type="fig" rid="F1"/>, PMF is first applied to resolve factor profiles and their contributions, providing source-receptor relationships constrained by expert knowledge, given that pollution sources imprint distinct temporal patterns on the receptor site. Details of the use of PMF for source apportionment (SA) can be found in the literature <xref ref-type="bibr" rid="bib1.bibx17 bib1.bibx40 bib1.bibx41" id="paren.15"/>, and the uncertainty settings are provided in Sect. S1.3. Based on the SA results with selected source profiles, species requiring imputation are classified as tracers or non-tracers through a knowledge-driven step <xref ref-type="bibr" rid="bib1.bibx5" id="paren.16"/>. When imputing tracers, the availability of co-tracers should be checked at each timestamp before reconstruction, because the source contribution vector (<inline-formula><mml:math id="M45" display="inline"><mml:mi mathvariant="bold-italic">g</mml:mi></mml:math></inline-formula>) needs to be constrained by source-specific tracer information. If all tracers associated with a specific factor are simultaneously missing, the corresponding <inline-formula><mml:math id="M46" display="inline"><mml:mi mathvariant="bold-italic">g</mml:mi></mml:math></inline-formula> vector is less directly constrained by observed species; in such cases, these missing tracer values are first imputed using another imputation method, with KNN recommended for its simplicity, efficiency, and ability to provide a reasonable estimate of temporal variation. The corresponding uncertainty is set to 10 % of the imputed concentration.
For missing tracers with available co-tracers, as well as for non-tracers, missing values are replaced by the geometric mean. The uncertainty calculation is further discussed in Sect. S1.4. The pre-imputed dataset and its associated uncertainty matrix are then fed into the PMF analysis for reconstruction. The PMF run decomposes the dataset into factor profiles (<inline-formula><mml:math id="M47" display="inline"><mml:mi mathvariant="bold">F</mml:mi></mml:math></inline-formula>) and source contributions (<inline-formula><mml:math id="M48" display="inline"><mml:mi mathvariant="bold">G</mml:mi></mml:math></inline-formula>), and data reconstruction is achieved by multiplying the <inline-formula><mml:math id="M49" display="inline"><mml:mi mathvariant="bold">G</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M50" display="inline"><mml:mi mathvariant="bold">F</mml:mi></mml:math></inline-formula> matrices. Rather than relying directly on covariance in the high-dimensional chemical dataset, PMFr reconstructs missing values within the low-dimensional source structure represented by PMF-resolved source profiles and temporal contributions.</p>
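      <p>The pre-imputation and reconstruction steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, the contribution matrix <monospace>G</monospace> (samples by factors) and the profile matrix <monospace>F</monospace> (factors by species) are assumed to come from a separate PMF run that is not shown, and the uncertainty matrix is omitted.</p>
<preformat preformat-type="code">
```python
import numpy as np


def geometric_mean_prefill(X):
    """Replace NaNs in each species column by the geometric mean of the
    observed values in that column (observed values must be positive)."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    for j in range(X.shape[1]):
        col = X[:, j]
        observed = col[~np.isnan(col)]
        gm = np.exp(np.mean(np.log(observed)))
        filled[np.isnan(col), j] = gm
    return filled


def pmfr_fill(X, G, F):
    """Fill missing entries of X with the reconstruction G @ F.

    X : (n_samples, n_species) array with NaN where values are missing.
    G : (n_samples, n_factors) source contributions (assumed given).
    F : (n_factors, n_species) factor profiles (assumed given).
    Observed entries of X are returned unchanged.
    """
    X = np.asarray(X, dtype=float)
    X_hat = np.asarray(G, dtype=float) @ np.asarray(F, dtype=float)
    return np.where(np.isnan(X), X_hat, X)
```
</preformat>
      <p>In this sketch, <monospace>geometric_mean_prefill</monospace> would produce the matrix passed to the PMF run, and <monospace>pmfr_fill</monospace> would then be applied to the original matrix with the resulting <monospace>G</monospace> and <monospace>F</monospace>, so that only the originally missing entries are replaced.</p>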

      <fig id="F1" specific-use="star"><label>Figure 1</label><caption><p id="d2e1080">Flow chart of source-receptor informed Positive Matrix Factorization Reconstruction (PMFr) and validation.</p></caption>
          <graphic xlink:href="https://amt.copernicus.org/articles/19/4219/2026/amt-19-4219-2026-f01.png"/>

        </fig>

      <p id="d2e1089">The performance of PMFr was evaluated using two complementary validation endpoints: direct reconstruction accuracy and physical source-feature preservation. The reconstructed concentrations were directly compared with observed values and benchmarked against baseline methods, including linear interpolation (LI), KNN, DBN, BPCA, and geometric mean imputation (Mean), using <inline-formula><mml:math id="M51" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>, IoA, and MAPE. The U.S. EPA PMF 5.0 User Guide recommends handling missing values by replacing them with the species median and assigning a high uncertainty to downweight these substituted values. Here, missing values were replaced by the species-specific geometric mean, following the same constant-substitution and downweighting principle. Because the geometric mean is also a robust central value for skewed data and was adopted in the previous PMF analysis using the same hourly PM<sub>2.5</sub> speciation dataset <xref ref-type="bibr" rid="bib1.bibx56" id="paren.17"/>, it was used here as a representative conventional PMF missing-value treatment for comparison with PMFr. Physical source-feature preservation was further assessed by comparing the PMF-resolved source profiles and corresponding source contributions obtained from different imputed datasets with those derived from the original complete dataset.</p>
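      <p>For reference, the three accuracy metrics can be computed as in the Python sketch below. The exact definitions used in the study are given in Sect. S1.2 and are not reproduced here, so the formulas in this sketch are assumptions: <monospace>R2</monospace> is taken as one minus the ratio of the residual to the total sum of squares, MAPE as the mean absolute relative error in percent, and IoA as the original Willmott index of agreement. The function name <monospace>imputation_metrics</monospace> is hypothetical.</p>
<preformat preformat-type="code">
```python
import numpy as np


def imputation_metrics(observed, imputed):
    """Compare imputed values with the withheld observations.

    Both inputs are 1-D sequences of equal length; observed values
    must be non-zero for MAPE to be defined.
    """
    o = np.asarray(observed, dtype=float)
    p = np.asarray(imputed, dtype=float)
    o_mean = o.mean()
    sse = np.sum((p - o) ** 2)  # residual sum of squares
    # Assumed definition: 1 - SSE / SST.
    r2 = 1.0 - sse / np.sum((o - o_mean) ** 2)
    # Mean absolute percentage error, in percent.
    mape = 100.0 * np.mean(np.abs(p - o) / np.abs(o))
    # Willmott index of agreement.
    ioa = 1.0 - sse / np.sum((np.abs(p - o_mean) + np.abs(o - o_mean)) ** 2)
    return {"R2": r2, "MAPE": mape, "IoA": ioa}
```
</preformat>
      <p>If the study instead defines <monospace>R2</monospace> as the squared Pearson correlation coefficient, only the corresponding line needs to be changed; the other two metrics are unaffected.</p>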
</sec>
</sec>
<sec id="Ch1.S3">
  <label>3</label><title>Results and Discussion</title>
<sec id="Ch1.S3.SS1">
  <label>3.1</label><title>Source-receptor relationship resolved by PMF</title>
      <p id="d2e1131">PMF solutions were explored from four to nine factors using datasets containing 10 % missing values. The best-fitting solution was selected based on model performance, including the interpretability of the factor profiles, which is essential for determining the optimal factor number and for imputation, and the distributions of scaled residuals <xref ref-type="bibr" rid="bib1.bibx47 bib1.bibx7 bib1.bibx12 bib1.bibx13" id="paren.18"/> (Figs. S2 and S3). Bootstrapping (BS), displacement (DISP), and combined BS-DISP analyses were also performed for these solutions <xref ref-type="bibr" rid="bib1.bibx42 bib1.bibx34 bib1.bibx54" id="paren.19"/>. Four- to six-factor solutions were statistically insufficient to fully explain the variance in the input data matrix. When the factor number increased from six to seven, the <inline-formula><mml:math id="M53" display="inline"><mml:mrow><mml:mi>Q</mml:mi><mml:mo>/</mml:mo><mml:msub><mml:mi>Q</mml:mi><mml:mi mathvariant="normal">exp</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> ratio declined by 11.2 %. This drop indicates that the 6-factor model leaves a substantial amount of residual variance unexplained. Specifically, the 5-factor solution improperly lumped on-road traffic emissions with metal smelting (Fig. S5). In the 6-factor solution, sulfate and nitrate were merged into a single secondary inorganic aerosol factor (Fig. S6). Eight- and nine-factor solutions showed statistical over-resolution with diminishing returns. As the factor number increased from seven to eight, the <inline-formula><mml:math id="M54" display="inline"><mml:mrow><mml:mi>Q</mml:mi><mml:mo>/</mml:mo><mml:msub><mml:mi>Q</mml:mi><mml:mi mathvariant="normal">exp</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> ratio dropped by a smaller amount (8.5 %) than in the previous step.
Furthermore, the 8-factor solution exhibited a high unmapped rate during the BS analysis, indicating statistical instability. From a physical perspective, these higher-factor solutions over-resolved the data into physically meaningless profiles. For instance, the 8-factor solution isolated a factor with a high Cu loading that lacks a clear chemical profile (Fig. S7), while the 9-factor solution further fragmented coal combustion into two unidentifiable sources (Fig. S8). For the 7-factor solution, the model-predicted concentrations of species such as Ca, V, <inline-formula><mml:math id="M55" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">NH</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mo>+</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math id="M56" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">NO</mml:mi><mml:mn mathvariant="normal">3</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> correlated with the observed values with coefficients of determination (<inline-formula><mml:math id="M57" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>) of 0.92, 0.91, 0.98, and 0.88, respectively (Table S4). The high <inline-formula><mml:math id="M58" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> values of bulk species indicate that the 7-factor model fits the data well. For tracers like Si, Mn, Se, and Cu, the scaled residuals follow a normal distribution with a mean of 0 and a variance of 1.
For bulk species like <inline-formula><mml:math id="M59" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">NH</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mo>+</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M60" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">NO</mml:mi><mml:mn mathvariant="normal">3</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math id="M61" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula>, the scaled residuals exhibit a narrow-tailed distribution, with the highest frequency concentrated near 0 and values ranging from <inline-formula><mml:math id="M62" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>2 to 2. Additionally, the scaled residuals of the bulk species OC and EC follow a normal distribution with a mean of 0 and a variance of 1. The distribution of scaled residuals supports the validity of our solution. Physically, the 7-factor solution successfully decouples all distinct emission sources without redundant splitting. The first factor was interpreted as Coal Combustion (CC), characterized by high explained variances of Pb, As, and Se <xref ref-type="bibr" rid="bib1.bibx10" id="paren.20"/> and higher daytime concentrations (Fig. S10a) <xref ref-type="bibr" rid="bib1.bibx30" id="paren.21"/>. These species have relatively low DISP intervals. The Heavy Oil Combustion (HOC) factor was characterized by V and Ni, which are tracers of heavy oil combustion <xref ref-type="bibr" rid="bib1.bibx3" id="paren.22"/>. The presence of HOC is consistent with the fact that Nanjing is the largest container port on the Yangtze River.
The Metal Smelting (MS) factor was identified with high Cr, Fe, Mn, Zn and Ni explained variances. Cr, Mn, Zn and Fe are typically emitted from iron and steel production <xref ref-type="bibr" rid="bib1.bibx9 bib1.bibx43" id="paren.23"/>, and they exhibit relatively low DISP intervals. Cu and Ba serve as tracers for On-road Traffic (OT), reflecting vehicle exhaust and non-exhaust emissions such as brake and tire wear <xref ref-type="bibr" rid="bib1.bibx37 bib1.bibx3" id="paren.24"/>. OT is also identified by high loadings of OC and EC, with concentrations increasing during rush hours (Fig. S10d). The Crustal Dust (CD) factor is composed of crustal elements Ca, Si, Ba, Fe and Ti <xref ref-type="bibr" rid="bib1.bibx54" id="paren.25"/>. The remaining two factors are Secondary Sulfate (SS) and Secondary Nitrate (SN), whose tracers are sulfate for SS, and ammonium and nitrate for SN. SS and SN exhibit enhanced formation around midday and nighttime, respectively (Fig. S10f and g). The following reconstruction process is based on the 7-factor solution.</p>
</sec>
<sec id="Ch1.S3.SS2">
  <label>3.2</label><title>Comparison of Imputation Methods under Different Missing Scenarios</title>
<sec id="Ch1.S3.SS2.SSS1">
  <label>3.2.1</label><title>Overall Performance under All Missing Scenarios</title>
      <p id="d2e1303">As shown in Fig. S30, the PMFr method achieves an overall <inline-formula><mml:math id="M63" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> of 0.81 and MAPE of 22.8 % under the three evaluated missing scenarios. In comparison, DBN results in an <inline-formula><mml:math id="M64" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> of 0.73 and a MAPE of 32.2 %, BPCA yields an <inline-formula><mml:math id="M65" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> of 0.72 and a MAPE of 30.6 %, and KNN achieves an <inline-formula><mml:math id="M66" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> of 0.72 and a MAPE of 31.2 %. For simple baseline methods, LI produces an <inline-formula><mml:math id="M67" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> of 0.35 and a high MAPE of 61.7 %, while the geometric mean imputation method (Mean) results in a higher MAPE of 66.75 %. Given that mean imputation produces a constant value without temporal variation and consistently fails to provide effective reconstruction across individual scenarios (Figs. S11–S29), its performance is quantified solely by MAPE here, and it is excluded from further detailed comparisons in subsequent sections. Furthermore, the Taylor diagram (Fig.
S31) illustrates that the PMFr reconstructed data yield a normalized standard deviation (<inline-formula><mml:math id="M68" display="inline"><mml:mi mathvariant="italic">σ</mml:mi></mml:math></inline-formula>) of 0.93, closely matching the observational standard deviation (<inline-formula><mml:math id="M69" display="inline"><mml:mrow><mml:mi mathvariant="italic">σ</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1.0</mml:mn></mml:mrow></mml:math></inline-formula>), suggesting its capability to capture the amplitude of data variations.</p>
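<p>As a minimal sketch of the Taylor-diagram statistics mentioned above, the normalized standard deviation can be taken as the standard deviation of the imputed series divided by that of the observations, reported alongside the Pearson correlation. The function name <monospace>taylor_statistics</monospace> and the example series are hypothetical and are not taken from the dataset used here.</p>
<preformat preformat-type="code">
```python
import numpy as np

def taylor_statistics(obs, pred):
    """Return (normalized standard deviation, Pearson correlation)."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    # A value of 1.0 means the imputed series has the observed amplitude.
    sigma_norm = np.std(pred) / np.std(obs)
    corr = np.corrcoef(obs, pred)[0, 1]
    return float(sigma_norm), float(corr)

# Made-up series: the imputed values damp the observed variability by 10 %.
obs = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
pred = 1.0 + 0.9 * obs
sigma_norm, corr = taylor_statistics(obs, pred)
```
</preformat>
<p>A normalized standard deviation below 1 indicates that the imputed series varies less than the observations, even when the correlation is high.</p>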
</sec>
<sec id="Ch1.S3.SS2.SSS2">
  <label>3.2.2</label><title>Scenario no. 1: Random Single Species Missing</title>
      <p id="d2e1389">As shown in Fig. <xref ref-type="fig" rid="F2"/>, PMFr achieves the highest mean IoA (0.96) and lowest mean MAPE (16.88 %), both with low standard deviations. Both PMFr and DBN attain the highest mean <inline-formula><mml:math id="M70" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> of 0.86, with DBN exhibiting a lower standard deviation.</p>

      <fig id="F2" specific-use="star"><label>Figure 2</label><caption><p id="d2e1407">Performance of five imputation methods across nine Cases. Asymmetric error bars indicate the standard deviation. Points show the performance for individual species. <bold>(a)</bold> <inline-formula><mml:math id="M71" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>, <bold>(b)</bold> IoA, and <bold>(c)</bold> MAPE, where the LI method is excluded due to its poor performance under system failure conditions.</p></caption>
            <graphic xlink:href="https://amt.copernicus.org/articles/19/4219/2026/amt-19-4219-2026-f02.png"/>

          </fig>

      <p id="d2e1436">For inorganic ions, PMFr performs best when imputing <inline-formula><mml:math id="M72" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">NH</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mo>+</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M73" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">NO</mml:mi><mml:mn mathvariant="normal">3</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> with <inline-formula><mml:math id="M74" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> values of 0.96 and 0.91, respectively. PMFr shows the highest agreement with the observed values when imputing <inline-formula><mml:math id="M75" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">NH</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mo>+</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M76" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">NO</mml:mi><mml:mn mathvariant="normal">3</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, at both low and high concentrations (Figs. S11 and S12).
The performance of PMFr declines when imputing <inline-formula><mml:math id="M77" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula>, with <inline-formula><mml:math id="M78" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M79" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> 0.79, IoA <inline-formula><mml:math id="M80" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> 0.92 and MAPE <inline-formula><mml:math id="M81" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> 15.09 %. Nevertheless, PMFr still outperforms LI, KNN, and BPCA. DBN achieves higher <inline-formula><mml:math id="M82" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> and IoA (0.83 and 0.96, respectively), but PMFr attains a lower MAPE (15.09 %) than DBN (19.81 %). As shown in Fig. S12, values imputed by PMFr show better agreement with true observations when the missing data correspond to low <inline-formula><mml:math id="M83" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula> concentrations.
All methods except LI struggle to accurately impute high <inline-formula><mml:math id="M84" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula> concentrations. The absence of other cations like <inline-formula><mml:math id="M85" display="inline"><mml:mrow class="chem"><mml:msup><mml:mi mathvariant="normal">Na</mml:mi><mml:mo>+</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M86" display="inline"><mml:mrow class="chem"><mml:msup><mml:mi mathvariant="normal">Mg</mml:mi><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mo>+</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> may impact the imputation efficiency when the missing <inline-formula><mml:math id="M87" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula> concentrations are high. 
This difference is likely because the formation of <inline-formula><mml:math id="M88" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">NH</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">NO</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> typically dominates the nitrate fraction, while <inline-formula><mml:math id="M89" display="inline"><mml:mrow class="chem"><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="normal">NH</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub><mml:msub><mml:mo>)</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> accounts for only a portion of the total sulfate.</p>
      <p id="d2e1677">For elements, PMFr performs well, with <inline-formula><mml:math id="M90" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> values of 0.82–0.93, IoA values of 0.95–0.98, and MAPE of 13.21 %–17.17 %, all accompanied by low standard deviations. Compared with PMFr, DBN performs better when imputing Fe, yielding higher <inline-formula><mml:math id="M91" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> and IoA, but also a higher MAPE. Conversely, PMFr shows better performance when imputing Ca, particularly for high concentration values (Fig. S14). The proposed method underestimates Fe, whereas DBN shows better consistency for missing observations that correspond to high Fe concentrations. Nevertheless, all methods fail to accurately reconstruct those high Fe concentrations (Fig. S16). LI performs better when imputing elements than ions, indicating that element concentrations vary more smoothly over time.</p>
      <p id="d2e1702">For carbonaceous materials, PMFr attains the highest IoA (0.94) for OC and the second-highest IoA (0.95) for EC, with low MAPE values of 17.42 % and 15.53 %, respectively. KNN achieves the highest <inline-formula><mml:math id="M92" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> (0.86 for OC and 0.87 for EC), although with lower IoA values compared to PMFr. DBN performs worse for OC, especially for low concentrations (Fig. S17). Although LI performs reasonably well for OC, it exhibits weak correlations with the true observations for EC, a trend also observed when imputing <inline-formula><mml:math id="M93" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">NO</mml:mi><mml:mn mathvariant="normal">3</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M94" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula>, as its performance is easily affected by the distribution pattern of missing data <xref ref-type="bibr" rid="bib1.bibx21" id="paren.26"/>. EC is primarily emitted from motor vehicles, whereas OC encompasses both directly emitted primary organic carbon (POC) and secondary organic carbon (SOC) formed through atmospheric processes. The behavior of POC is consistent with partial origins from vehicular emissions, while the variations of SOC are likely associated with secondary sources such as SS and SN <xref ref-type="bibr" rid="bib1.bibx32" id="paren.27"/>. The proposed method effectively captures SOC by utilizing reasonable factor profiles, whereas other imputation methods fail to reveal the formation of SOC due to limited data. 
Therefore, PMFr is recommended for imputing missing components caused by random missingness.</p>
</sec>
<sec id="Ch1.S3.SS2.SSS3">
  <label>3.2.3</label><title>Scenario no. 2: Instrument Failure Induced Missing</title>
      <p id="d2e1760">As shown in Fig. <xref ref-type="fig" rid="F2"/>, PMFr achieves the highest mean <inline-formula><mml:math id="M95" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> (0.67) and IoA (0.83), and the lowest mean MAPE (36.78 %) in Case 2. Although the <inline-formula><mml:math id="M96" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> error bar is relatively broad, indicating species-dependent variability in correlation performance, the smaller MAPE error bar suggests that PMFr maintains more stable magnitude accuracy across species. The <inline-formula><mml:math id="M97" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>, IoA, and MAPE of PMFr span 0.54–0.81, 0.59–0.95, and 27.09 %–52.01 %, respectively. Performance declines for <inline-formula><mml:math id="M98" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula>, with IoA values of 0.58 and 0.64 for 10 % and 20 % missingness, respectively. PMFr yields lower MAPE (34.09 % and 52.01 %) when the missing percentages are 10 % and 20 %, respectively.
When imputing <inline-formula><mml:math id="M99" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">NH</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mo>+</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M100" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">NO</mml:mi><mml:mn mathvariant="normal">3</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, PMFr shows the best agreement with true observed values among all methods, at both low and high concentrations (Figs. S18–S21). This is attributed to the constructed source-receptor relationships, which effectively address the difficulties that machine-learning methods face in capturing extreme values.</p>
      <p id="d2e1841">In Case 3, the proposed method achieves the highest mean IoA (0.93) and the lowest mean MAPE (17.26 %), both with low standard deviations. Although performance declines relative to Case 1, PMFr remains comparable to DBN, which leverages inter-variable correlations for imputation. The decline is likely attributable to the absence of key tracers, consistent with the tracer-dependent variability observed at the NEPB site, where the strong OC-EC correlation may reflect their common origin in motor-vehicle emissions <xref ref-type="bibr" rid="bib1.bibx59" id="paren.28"/>. The performance of PMFr may be impacted because PMF tends to overestimate the loading of OC and EC in the OT factor, thereby obscuring their contributions from other sources. Nevertheless, this result highlights the interdependence between OC and EC, and the performance of KNN and BPCA declines more than that of the proposed method.</p>
</sec>
<sec id="Ch1.S3.SS2.SSS4">
  <label>3.2.4</label><title>Scenario no. 3: Station-Wide Instrument Malfunctions</title>
      <p id="d2e1855">As illustrated in Fig. <xref ref-type="fig" rid="F2"/>, PMFr achieves the highest mean <inline-formula><mml:math id="M101" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> and IoA and the lowest mean MAPE, with low standard deviations, in Case 4 (0.86, 0.95, and 19.97 %, respectively) and Case 5 (0.77, 0.89, and 29.01 %, respectively). In Case 4, PMFr captures the temporal variability of the imputed species more effectively, yielding higher <inline-formula><mml:math id="M102" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> and IoA values at both low and high concentrations of <inline-formula><mml:math id="M103" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">NH</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mo>+</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M104" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">NO</mml:mi><mml:mn mathvariant="normal">3</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, indicating the stability of SN even under extreme missing cases. In Case 5, the imputation results show that all elemental species are well reconstructed, with Ti being the only exception. For Ti, PMFr demonstrates the highest accuracy, achieving IoA values between 0.82 and 0.95 and outperforming other methods, especially at low concentration levels (Figs. S26–S29). This improvement is likely associated with the predominant emission of Ti from dust sources, enabling PMFr to estimate missing values by leveraging the characteristic Ti-Ca-Si ratios in source profiles once the CD factor is identified <xref ref-type="bibr" rid="bib1.bibx54" id="paren.29"/>.</p>
      <p id="d2e1912">PMFr consistently achieves the highest mean <inline-formula><mml:math id="M105" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> (0.81–0.84), IoA (0.93), and the lowest MAPE (16.06 %–27.45 %) under Cases 6–8, all accompanied by low standard deviations. Compared with Case 4, the performance of KNN and BPCA declines in Cases 6 and 8. For KNN, IoA falls from 0.93 to 0.88 (Case 6) and 0.89 (Case 8); for BPCA, IoA declines from 0.95 to 0.89 (Case 6) and 0.90 (Case 8), with both methods showing increased standard deviations. These results indicate that KNN and BPCA become unstable when additional species correlated with <inline-formula><mml:math id="M106" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">NH</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mo>+</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M107" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">NO</mml:mi><mml:mn mathvariant="normal">3</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> are missing, with the degradation being substantial in Case 6. 
In contrast, PMFr remains stable with low standard deviations because <inline-formula><mml:math id="M108" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">NH</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mo>+</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M109" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">NO</mml:mi><mml:mn mathvariant="normal">3</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> are estimated from source-receptor relationships – specifically the SN profile – rather than from correlations with species such as K, which are estimated via the CC and CD profiles. In Case 9, PMFr achieves the highest mean <inline-formula><mml:math id="M110" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> and IoA and the lowest MAPE (0.87, 0.94, and 18.79 %, respectively). The strong performance of PMFr, KNN, and BPCA in this MCMI setting is attributable to the abundant co-occurring information, even as the number of missing species increases.</p>

      <fig id="F3" specific-use="star"><label>Figure 3</label><caption><p id="d2e1992">Comparison of observed and imputed values derived from different imputation methods under Scenario no. 3 (Cases 4–9), stratified by chemical species: <bold>(a)</bold> Inorganic ions; <bold>(b)</bold> Trace elements; and <bold>(c)</bold> Carbonaceous materials.</p></caption>
            <graphic xlink:href="https://amt.copernicus.org/articles/19/4219/2026/amt-19-4219-2026-f03.png"/>

          </fig>

      <p id="d2e2011">As shown in Fig. <xref ref-type="fig" rid="F3"/>a and c, PMFr achieves the lowest MAPE (16.45 % and 27.93 %) and the highest <inline-formula><mml:math id="M111" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> (0.81 and 0.86) and IoA (both 0.94) when imputing ionic and carbonaceous species. The performance of DBN declines for ionic species, which may be attributed to insufficient valid training samples and variables caused by long missing gaps and an increasing number of missing species <xref ref-type="bibr" rid="bib1.bibx36" id="paren.30"/> (Figs. S22–S25). Furthermore, <inline-formula><mml:math id="M112" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">NH</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mo>+</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M113" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">NO</mml:mi><mml:mn mathvariant="normal">3</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> are strongly correlated, a pattern consistent with the predominance of <inline-formula><mml:math id="M114" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">NH</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">NO</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> during the fall at the NEPB site <xref ref-type="bibr" rid="bib1.bibx59" id="paren.31"/>. 
The absence of either species therefore degrades the performance of machine learning methods, whereas PMFr can reconstruct the <inline-formula><mml:math id="M115" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">NH</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mo>+</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M116" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">NO</mml:mi><mml:mn mathvariant="normal">3</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> relationship using the existing source profiles. When imputing OC and EC, PMFr performs best at low concentration ranges (0–10 <inline-formula><mml:math id="M117" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>g m<sup>−3</sup>), likely due to their relatively stable emission patterns. The limitations of machine-learning methods for imputing ionic and carbonaceous species have also been reported by Lee et al. (2023), particularly when the number of missing species increases. PMFr achieves the lowest MAPE (24.25 %) for elemental species while still maintaining a high <inline-formula><mml:math id="M119" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> (0.85) and IoA (0.95), remaining comparable to DBN, which attains the highest <inline-formula><mml:math id="M120" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> (0.87) and IoA (0.97). 
Machine-learning methods can effectively capture correlations between a target element and co-varying elements <xref ref-type="bibr" rid="bib1.bibx31" id="paren.32"/>, and elemental species are generally emitted directly without undergoing chemical reactions <xref ref-type="bibr" rid="bib1.bibx11" id="paren.33"/>, which contributes to the strong performance of DBN when imputing elemental species.</p>
</sec>
</sec>
<sec id="Ch1.S3.SS3">
  <label>3.3</label><title>Assessing the Impact of Imputation on PMF Source Apportionment</title>
      <p id="d2e2160">Results showed that the numerical advantage of PMFr over baseline methods narrowed mainly under two challenging conditions: instrument-failure-type missingness and missingness of specific species such as <inline-formula><mml:math id="M121" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula> and crustal elements such as Fe. Accordingly, two representative cases were selected for downstream PMF evaluation: <inline-formula><mml:math id="M122" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula> missingness in Case 2 at a 10 % missing rate and high-concentration Fe missingness in Case 5 at a 20 % missing rate. The SS and CD factors were used to assess whether these imputation differences propagated into PMF-resolved source profiles and source contributions. For the SS factor (Fig. 
S32), the <inline-formula><mml:math id="M123" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M124" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula> <inline-formula><mml:math id="M125" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">NH</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mo>+</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> mass ratio derived from the PMFr-completed dataset was 3.39 (<inline-formula><mml:math id="M126" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">NH</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mo>+</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> associated with <inline-formula><mml:math id="M127" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">NH</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">NO</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> removed), close to that from the complete observed dataset (3.44). BPCA also produced a comparable ratio of 3.33, whereas LI (3.93), KNN (2.91), DBN (2.55), and Mean (3.83) showed larger deviations. 
This indicates that PMFr better preserved the <inline-formula><mml:math id="M128" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M129" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">NH</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mo>+</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> relationship in the SS profile, which is critical for maintaining the chemical interpretability of the SS factor. For the CD factor (Fig. S33), PMFr also reproduced the crustal elemental ratios consistently. The Fe <inline-formula><mml:math id="M130" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula> Ca and Ca <inline-formula><mml:math id="M131" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula> Si ratios from the complete observed dataset were 0.93 and 1.64, respectively, while PMFr yielded corresponding values of 0.95 and 1.62. In contrast, larger deviations were observed for several baseline methods, such as DBN for Fe <inline-formula><mml:math id="M132" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula> Ca (1.21) and BPCA or LI for Ca <inline-formula><mml:math id="M133" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula> Si (1.43 and 1.44, respectively). These results suggest that inappropriate imputation can alter the resolved source-profile composition, whereas PMFr maintains the physical consistency of source profiles.</p>
      <p id="d2e2319">For the SS factor contributions, PMFr achieved the highest Pearson's correlation coefficient (<inline-formula><mml:math id="M134" display="inline"><mml:mi>r</mml:mi></mml:math></inline-formula>) of 0.943, followed by KNN (0.914), BPCA (0.913), LI (0.900), Mean (0.802), and DBN (0.743). For the CD factor contributions, PMFr also showed the highest temporal agreement, with an <inline-formula><mml:math id="M135" display="inline"><mml:mi>r</mml:mi></mml:math></inline-formula> of 0.954, followed by Mean (0.948), KNN (0.926), BPCA (0.925), LI (0.796), and DBN (0.706). These results indicate that competitive concentration-level imputation does not necessarily guarantee equivalent preservation of PMF-resolved source-contribution patterns. As shown in Fig. S34a, b, the PMFr-derived SS contribution closely reproduced the diurnal pattern from the original complete dataset, particularly during daytime periods when secondary sulfate formation is expected to be enhanced. Similarly, PMFr captured the diurnal variation of CD more consistently than baseline methods, especially around the daytime peak likely associated with dust resuspension and other daytime dust-related activities. The selected time-series episodes showed the same behavior (Fig. S35a, b). For Case 2 at a 20 % missing rate, PMFr achieved the highest <inline-formula><mml:math id="M136" display="inline"><mml:mi>r</mml:mi></mml:math></inline-formula> of 0.985 for SS, compared with BPCA (0.981), KNN (0.980), DBN (0.954), LI (0.936), and Mean (0.705). For the high-Fe missing case, PMFr also showed the highest agreement for CD, with an <inline-formula><mml:math id="M137" display="inline"><mml:mi>r</mml:mi></mml:math></inline-formula> of 0.951, followed by Mean (0.928), BPCA (0.888), KNN (0.881), DBN (0.713), and LI (0.683). 
Therefore, the advantage of PMFr is not limited to pointwise concentration accuracy; it also better preserves the chemical and temporal source structures needed for physically interpretable PMF source apportionment. These results indicate that inaccurate imputation may propagate into PMF analysis and introduce source-apportionment biases, potentially making the imputed dataset less reliable than one processed using conventional PMF missing-value treatments <xref ref-type="bibr" rid="bib1.bibx23" id="paren.34"/>.</p>
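      <p>The two downstream checks used in this section, namely the deviation of a profile ratio from its reference value and the Pearson correlation between factor-contribution time series, can be sketched as follows. The ratio values are those reported above for the SS factor. The two contribution series, the function names, and the variable names are hypothetical and serve only to show the calculation.</p>
      <preformat>
```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def relative_deviation(value, reference):
    """Signed deviation from the reference value, in percent."""
    return 100.0 * (value - reference) / reference


# SS-factor sulfate-to-ammonium mass ratios as reported in the text.
reference_ratio = 3.44  # complete observed dataset
method_ratios = {"PMFr": 3.39, "BPCA": 3.33, "LI": 3.93,
                 "KNN": 2.91, "DBN": 2.55, "Mean": 3.83}
deviations = {name: relative_deviation(ratio, reference_ratio)
              for name, ratio in method_ratios.items()}
closest = min(deviations, key=lambda name: abs(deviations[name]))

# Hypothetical factor contributions resolved from the complete dataset
# and from an imputed dataset (not data from the study).
g_complete = [2.0, 3.5, 5.0, 4.0, 2.5]
g_imputed = [2.2, 3.3, 4.8, 4.3, 2.4]
r = pearson_r(g_complete, g_imputed)

for name, dev in deviations.items():
    print(name, round(dev, 1))
print("closest to reference:", closest)
print("r:", round(r, 3))
```
</preformat>
      <p>The ratio check targets the chemical composition of a factor profile, while the correlation targets its temporal behavior. The two checks are complementary, so a method can do well on one and poorly on the other.</p>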
</sec>
<sec id="Ch1.S3.SS4">
  <label>3.4</label><title>Applicability and Limitations of PMFr</title>
      <p id="d2e2361">PMFr is applicable when the chemical profile of each source factor can be constrained by at least one representative tracer, ensuring that the completed dataset remains suitable for subsequent source apportionment analysis. One limitation of PMFr arises under missing patterns in which source-related constraints become insufficient. As shown in Table S7, the performance of PMFr declines when the missing pattern shifts from MCMI to MCMS. For <inline-formula><mml:math id="M138" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">NH</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mo>+</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> at a 10 % missing rate, the MAPE increases from 9.57 % under MCMI to 20.67 % under MCMS, and the IoA decreases from 0.98 to 0.95. For <inline-formula><mml:math id="M139" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">NO</mml:mi><mml:mn mathvariant="normal">3</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> at a 10 % missing rate, the MAPE increases from 14.82 % under MCMI to 23.92 % under MCMS. At a 20 % missing rate, the MAPE increases from 13.63 % to 25.87 % for <inline-formula><mml:math id="M140" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">NH</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mo>+</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, and from 22.81 % to 28.46 % for <inline-formula><mml:math id="M141" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">NO</mml:mi><mml:mn mathvariant="normal">3</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>. As shown in Table S6, when OC and EC are simultaneously missing, the performance of PMFr becomes comparable to that of baseline methods. 
For instance, at a 10 % missing rate, the <inline-formula><mml:math id="M142" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> values for OC are 0.73 for PMFr, 0.74 for DBN, 0.68 for KNN, and 0.66 for BPCA. For EC, <inline-formula><mml:math id="M143" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> values are 0.84 for PMFr, 0.85 for BPCA, 0.80 for DBN, and 0.79 for KNN. Fundamentally, PMFr assumes that the source contribution vector (<inline-formula><mml:math id="M144" display="inline"><mml:mi mathvariant="bold-italic">g</mml:mi></mml:math></inline-formula>) can be sufficiently constrained by observed species, which requires at least one key tracer for each factor. The key tracers used for imputation and source identification are shown in Table S13. If all key tracers for a specific source are simultaneously missing, the corresponding source contribution vector (<inline-formula><mml:math id="M145" display="inline"><mml:mi mathvariant="bold-italic">g</mml:mi></mml:math></inline-formula>) is less directly constrained by observed species and should be interpreted with caution. Nevertheless, sensitivity analysis indicates that PMFr can still outperform baseline methods when the pre-imputation step provides a reasonable estimate of the general temporal variation of the missing species (Sect. S1.5 and Table S14). The numerical advantage of PMFr is less substantial for certain species such as <inline-formula><mml:math id="M146" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula> and crustal elements. 
For crustal elements, baseline methods can become competitive because these species are primarily emitted directly and usually exhibit relatively stable inter-variable correlations. As shown in Table S5, when imputing Ca, Si, and Fe at a 15 % missing rate, several statistical or machine-learning methods perform comparably to PMFr. For Ca, the <inline-formula><mml:math id="M147" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> values are 0.93 for PMFr, 0.91 for BPCA, and 0.90 for DBN. For Si, PMFr achieves an <inline-formula><mml:math id="M148" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> of 0.82, which is matched by DBN and closely followed by KNN (0.79). For Fe, the <inline-formula><mml:math id="M149" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> values are 0.83 for PMFr, 0.86 for DBN, 0.84 for KNN, and 0.84 for BPCA, with DBN and KNN achieving slightly higher IoA values than PMFr. This reduced separation suggests that statistical or machine-learning methods can capture stable co-variation patterns among some primary species, thereby reducing the relative advantage of the source-constrained PMFr method for these specific cases. However, comparable concentration-level performance does not necessarily imply that baseline methods are equally reliable for source apportionment. The PMF evaluation results showed that PMFr better preserved source-profile composition and source-contribution temporal patterns, even in representative cases where direct imputation metrics became comparable among methods.</p>
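      <p>The tracer requirement discussed above can be made concrete with a simplified sketch. This is not the authors' implementation. It assumes fixed, already resolved source profiles and estimates a non-negative contribution vector g for one sample from the observed species only, using unweighted multiplicative least-squares updates, whereas PMF weights residuals by measurement uncertainty. The missing species is then reconstructed as the sum over factors of contribution times profile value. The two profiles, the sample, and all function names are hypothetical.</p>
      <preformat>
```python
def estimate_contributions(profiles, sample, n_iter=500, eps=1e-12):
    """Estimate non-negative factor contributions from the observed species.

    profiles: list of dicts (one per factor) mapping species to profile value
    sample:   dict mapping species to concentration, or None when missing
    """
    observed = [s for s, x in sample.items() if x is not None]
    g = [1.0] * len(profiles)
    for _ in range(n_iter):
        # Modelled concentration of each observed species for the current g.
        model = {s: sum(gk * f[s] for gk, f in zip(g, profiles))
                 for s in observed}
        # Multiplicative update: keeps g non-negative and does not
        # increase the unweighted squared residual.
        for k, f in enumerate(profiles):
            num = sum(f[s] * sample[s] for s in observed)
            den = sum(f[s] * model[s] for s in observed)
            g[k] = g[k] * num / (den + eps)
    return g


def reconstruct(profiles, g, species):
    """Reconstructed concentration of one species: sum of g_k * f_k."""
    return sum(gk * f[species] for gk, f in zip(g, profiles))


# Hypothetical profiles: a sulfate-rich factor and a crustal-dust-like factor.
profiles = [
    {"SO4": 0.60, "NH4": 0.20, "Fe": 0.00, "Ca": 0.00},
    {"SO4": 0.05, "NH4": 0.00, "Fe": 0.30, "Ca": 0.40},
]

# Hypothetical sample generated with g = (10, 5); Fe is treated as missing.
sample = {"SO4": 6.25, "NH4": 2.0, "Fe": None, "Ca": 2.0}

g = estimate_contributions(profiles, sample)
fe_hat = reconstruct(profiles, g, "Fe")
print([round(v, 3) for v in g], round(fe_hat, 3))
```
</preformat>
      <p>In this sketch, Ca acts as the observed tracer that pins down the dust-like factor, so Fe can be recovered from the profile. If every species with a non-zero entry in a factor's profile were missing, the numerator of the update would vanish and the data would no longer constrain that factor's contribution, which is the situation described above for simultaneously missing key tracers.</p>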
      <p id="d2e2503">Another limitation of PMFr lies in its assumption of relatively stable source profiles. In PMFr, source profiles are assumed to remain stable so that the source-receptor relationships resolved by PMF can be used to guide missing-value reconstruction. This assumption is generally more reasonable for short-term datasets, but it may become weaker for long-term datasets, especially those spanning multiple years, during which emission patterns and atmospheric processes can change substantially. Therefore, source-profile stability should be evaluated before applying PMFr in extended applications. Moving-window PMF approaches provide a promising way to examine source-profile stability in long-term applications by resolving time-dependent factor profiles within short moving windows and screening accepted PMF solutions using source-specific criteria, such as factor-tracer correlations, diurnal patterns, PMF diagnostics, and non-modeled time points <xref ref-type="bibr" rid="bib1.bibx51 bib1.bibx8" id="paren.35"/>. Future improvements to PMFr could incorporate time-dependent source profiles to address this limitation and better support reconstruction under changing atmospheric conditions.</p>
</sec>
</sec>
<sec id="Ch1.S4" sec-type="conclusions">
  <label>4</label><title>Conclusion</title>
      <p id="d2e2518">We developed a physically interpretable imputation method (PMFr) for reconstructing missing PM<sub>2.5</sub> speciation data by leveraging source-receptor relationships encoded in key chemical species. Benchmarking against commonly used imputation techniques, including Mean, LI, KNN, BPCA, and a deep learning predictive model (DBN), demonstrates that PMFr achieves improved accuracy and robustness while preserving physical and chemical interpretability, especially for key marker species. Crucially, the PMFr-completed dataset is better suited for subsequent PMF source apportionment because it preserves source-profile composition and source-contribution temporal features. Nevertheless, the advantage of PMFr may become less substantial when source-related constraints are weakened, such as when all key tracers for a specific source factor are simultaneously missing, or when baseline methods can already capture stable co-variation patterns for certain species. These chemically consistent and physically meaningful estimates also rely on the temporal stability of source chemical compositions. Recognizing the limitations of such static assumptions for long-term datasets, we highlight the necessity of systematically verifying source stability in extended applications. Overall, this work offers a simple and generalizable solution that strengthens the reliability of real-world speciation datasets and enhances their suitability for source apportionment and policy-relevant analyses.</p>
</sec>

      
      </body>
    <back><notes notes-type="codedataavailability"><title>Code and data availability</title>

      <p id="d2e2536">The PM<sub>2.5</sub> speciation dataset utilized in this research is derived from previous studies <xref ref-type="bibr" rid="bib1.bibx58 bib1.bibx59 bib1.bibx56" id="paren.36"/>. LI, KNN, and BPCA were implemented in R version 4.3.1, and DBN was implemented in Python 3.6.13. For the geometric mean substitution method, the geometric mean was used as the input. LI was performed using the R package “imputeTS” <xref ref-type="bibr" rid="bib1.bibx38" id="paren.37"/> (<uri>https://CRAN.R-project.org/package=imputeTS</uri>, last access: 19 June 2026). KNN was implemented using the R package “VIM”, which is designed to impute numerical, semi-continuous, and categorical variables <xref ref-type="bibr" rid="bib1.bibx26" id="paren.38"/> (<uri>https://CRAN.R-project.org/package=VIM</uri>, last access: 19 June 2026). DBN was implemented using the third-party open-source Python package deep-belief-network, available at: <uri>https://github.com/albertbup/deep-belief-network</uri> (last access: 19 June 2026; <xref ref-type="bibr" rid="bib1.bibx1" id="altparen.39"/>). BPCA was selected because it is an advanced factor-based imputation method that is mathematically similar to the proposed approach. Comparing the imputation performance of the proposed method with that of the BPCA method therefore better demonstrates the improvement achieved by incorporating physical information. The R package “pcaMethods” was used to implement the BPCA method <xref ref-type="bibr" rid="bib1.bibx52" id="paren.40"/> (<uri>https://bioconductor.org/packages/pcaMethods/</uri>, last access: 19 June 2026).</p>
  </notes><app-group>
        <supplementary-material position="anchor"><p id="d2e2576">The supplement related to this article is available online at <inline-supplementary-material xlink:href="https://doi.org/10.5194/amt-19-4219-2026-supplement" xlink:title="pdf">https://doi.org/10.5194/amt-19-4219-2026-supplement</inline-supplementary-material>.</p></supplementary-material>
        </app-group><notes notes-type="authorcontribution"><title>Author contributions</title>

      <p id="d2e2585">WZ: Writing – original draft, Writing – review and editing, Visualization, Methodology, Formal analysis, Data curation. MX: Data curation, Resources. QD: Conceptualization, Supervision, Writing – review and editing. XB: Writing – review and editing. YZ: Writing – review and editing. YF: Supervision, Writing – review and editing.</p>
  </notes><notes notes-type="competinginterests"><title>Competing interests</title>

      <p id="d2e2591">The contact author has declared that none of the authors has any competing interests.</p>
  </notes><notes notes-type="disclaimer"><title>Disclaimer</title>

      <p id="d2e2597">Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. The authors bear the ultimate responsibility for providing appropriate place names. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.</p>
  </notes><notes notes-type="financialsupport"><title>Financial support</title>

      <p id="d2e2603">This research has been supported by the National Natural Science Foundation of China (grant no. 42577117), the project of the Young Scientific and Technological Talents in Tianjin (grant no. QN20230350), the Tianjin Natural Science Foundation Project (grant no. 24JCYBJC01870), and the robotic AI-Scientist platform of the Chinese Academy of Sciences.</p>
  </notes><notes notes-type="reviewstatement"><title>Review statement</title>

      <p id="d2e2609">This paper was edited by Haichao Wang and reviewed by Cheng Wu and two anonymous referees.</p>
  </notes><ref-list>
    <title>References</title>

      <ref id="bib1.bibx1"><label>albertbup(2026)</label><mixed-citation>albertbup: deep-belief-network: A Python implementation of Deep Belief Networks built upon NumPy and TensorFlow with scikit-learn compatibility, GitHub repository [code], <uri>https://github.com/albertbup/deep-belief-network</uri>, last access: 19 June 2026.</mixed-citation></ref>
      <ref id="bib1.bibx2"><label>Alwateer et al.(2024)</label><mixed-citation>Alwateer, M., Atlam, E.-S., Abd El-Raouf, M. M., Ghoneim, O. A., and Gad, I.: Missing data imputation: A comprehensive review, Journal of Computer and Communications, 12, 53–75, <ext-link xlink:href="https://doi.org/10.4236/jcc.2024.1211004" ext-link-type="DOI">10.4236/jcc.2024.1211004</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx3"><label>Becagli et al.(2012)</label><mixed-citation>Becagli, S., Sferlazzo, D. M., Pace, G., di Sarra, A., Bommarito, C., Calzolai, G., Ghedini, C., Lucarelli, F., Meloni, D., Monteleone, F., Severi, M., Traversi, R., and Udisti, R.: Evidence for heavy fuel oil combustion aerosols from chemical analyses at the island of Lampedusa: a possible large role of ships emissions in the Mediterranean, Atmos. Chem. Phys., 12, 3479–3492, <ext-link xlink:href="https://doi.org/10.5194/acp-12-3479-2012" ext-link-type="DOI">10.5194/acp-12-3479-2012</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx4"><label>Betancourt et al.(2023)</label><mixed-citation>Betancourt, C., Li, C. W. Y., Kleinert, F., and Schultz, M. G.: Graph Machine Learning for Improved Imputation of Missing Tropospheric Ozone Data, Environ. Sci. Technol., 57, 18246–18258, <ext-link xlink:href="https://doi.org/10.1021/acs.est.3c05104" ext-link-type="DOI">10.1021/acs.est.3c05104</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx5"><label>Bi et al.(2019)</label><mixed-citation>Bi, X., Dai, Q., Wu, J., Zhang, Q., Zhang, W., Luo, R., Cheng, Y., Zhang, J., Wang, L., Yu, Z., Zhang, Y., Tian, Y., and Feng, Y.: Characteristics of the main primary source profiles of particulate matter across China from 1987 to 2017, Atmos. Chem. Phys., 19, 3223–3243, <ext-link xlink:href="https://doi.org/10.5194/acp-19-3223-2019" ext-link-type="DOI">10.5194/acp-19-3223-2019</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx6"><label>Birch(2002)</label><mixed-citation>Birch, M. E.: Occupational monitoring of particulate diesel exhaust by NIOSH method 5040, Applied Occupational and Environmental Hygiene, 17, 400–405, <ext-link xlink:href="https://doi.org/10.1080/10473220290035390" ext-link-type="DOI">10.1080/10473220290035390</ext-link>, 2002.</mixed-citation></ref>
      <ref id="bib1.bibx7"><label>Brown et al.(2015)</label><mixed-citation>Brown, S. G., Eberly, S., Paatero, P., and Norris, G. A.: Methods for estimating uncertainty in PMF solutions: Examples with ambient air and water quality data and guidance on reporting PMF results, Sci. Total Environ., 518–519, 626–635, <ext-link xlink:href="https://doi.org/10.1016/j.scitotenv.2015.01.022" ext-link-type="DOI">10.1016/j.scitotenv.2015.01.022</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx8"><label>Canonaco et al.(2021)</label><mixed-citation>Canonaco, F., Tobler, A., Chen, G., Sosedova, Y., Slowik, J. G., Bozzetti, C., Daellenbach, K. R., El Haddad, I., Crippa, M., Huang, R.-J., Furger, M., Baltensperger, U., and Prévôt, A. S. H.: A new method for long-term source apportionment with time-dependent factor profiles and uncertainty assessment using SoFi Pro: application to 1 year of organic aerosol data, Atmos. Meas. Tech., 14, 923–943, <ext-link xlink:href="https://doi.org/10.5194/amt-14-923-2021" ext-link-type="DOI">10.5194/amt-14-923-2021</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx9"><label>Chang et al.(2018)</label><mixed-citation>Chang, Y., Huang, K., Xie, M., Deng, C., Zou, Z., Liu, S., and Zhang, Y.: First long-term and near real-time measurement of trace elements in China's urban atmosphere: temporal variability, source apportionment and precipitation effect, Atmos. Chem. Phys., 18, 11793–11812, <ext-link xlink:href="https://doi.org/10.5194/acp-18-11793-2018" ext-link-type="DOI">10.5194/acp-18-11793-2018</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx10"><label>Cheng et al.(2015)</label><mixed-citation>Cheng, K., Wang, Y., Tian, H., Gao, X., Zhang, Y., Wu, X., Zhu, C., and Gao, J.: Atmospheric Emission Characteristics and Control Policies of Five Precedent-Controlled Toxic Heavy Metals from Anthropogenic Sources in China, Environ. Sci. Technol., 49, 1206–1214, <ext-link xlink:href="https://doi.org/10.1021/es5037332" ext-link-type="DOI">10.1021/es5037332</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx11"><label>Choi et al.(2022)</label><mixed-citation>Choi, E., Yi, S.-M., Lee, Y. S., Jo, H., Baek, S.-O., and Heo, J.-B.: Sources of airborne particulate matter-bound metals and spatial-seasonal variability of health risk potentials in four large cities, South Korea, Environ. Sci. Pollut. R., 29, 28359–28374, <ext-link xlink:href="https://doi.org/10.1007/s11356-021-18445-8" ext-link-type="DOI">10.1007/s11356-021-18445-8</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx12"><label>Dai et al.(2020)</label><mixed-citation>Dai, Q., Liu, B., Bi, X., Wu, J., Liang, D., Zhang, Y., Feng, Y., and Hopke, P. K.: Dispersion Normalized PMF Provides Insights into the Significant Changes in Source Contributions to PM2.5 after the COVID-19 Outbreak, Environ. Sci. Technol., 54, 9917–9927, <ext-link xlink:href="https://doi.org/10.1021/acs.est.0c02776" ext-link-type="DOI">10.1021/acs.est.0c02776</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx13"><label>Dai et al.(2021)</label><mixed-citation>Dai, Q., Ding, J., Song, C., Liu, B., Bi, X., Wu, J., Zhang, Y., Feng, Y., and Hopke, P. K.: Changes in source contributions to particle number concentrations after the COVID-19 outbreak: Insights from a dispersion normalized PMF, Sci. Total Environ., 759, 143548, <ext-link xlink:href="https://doi.org/10.1016/j.scitotenv.2020.143548" ext-link-type="DOI">10.1016/j.scitotenv.2020.143548</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx14"><label>Emmanuel et al.(2021)</label><mixed-citation>Emmanuel, T., Maupong, T., Mpoeleng, D., Semong, T., Mphago, B., and Tabona, O.: A survey on missing data in machine learning, Journal of Big Data, 8, 1–37, <ext-link xlink:href="https://doi.org/10.1186/s40537-021-00516-9" ext-link-type="DOI">10.1186/s40537-021-00516-9</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx15"><label>García-Laencina et al.(2010)</label><mixed-citation>García-Laencina, P. J., Sancho-Gómez, J.-L., and Figueiras-Vidal, A. R.: Pattern classification with missing data: a review, Neural Comput. Appl., 19, 263–282, <ext-link xlink:href="https://doi.org/10.1007/s00521-009-0295-6" ext-link-type="DOI">10.1007/s00521-009-0295-6</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bibx16"><label>Hao et al.(2023)</label><mixed-citation>Hao, H., Wang, Y., Zhu, Q., Zhang, H., Rosenberg, A., Schwartz, J., Amini, H., van Donkelaar, A., Martin, R., Liu, P., Weber, R., Russel, A., Yitshak-sade, M., Chang, H., and Shi, L.: National Cohort Study of Long-Term Exposure to PM2.5 Components and Mortality in Medicare American Older Adults, Environ. Sci. Technol., 57, 6835–6843, <ext-link xlink:href="https://doi.org/10.1021/acs.est.2c07064" ext-link-type="DOI">10.1021/acs.est.2c07064</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx17"><label>Hopke(2016)</label><mixed-citation>Hopke, P. K.: Review of receptor modeling methods for source apportionment, J. Air Waste Manage., 66, 237–259, <ext-link xlink:href="https://doi.org/10.1080/10962247.2016.1140693" ext-link-type="DOI">10.1080/10962247.2016.1140693</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx18"><label>Hu et al.(2014)</label><mixed-citation>Hu, J., Zhang, H., Chen, S., Ying, Q., Wiedinmyer, C., Vandenberghe, F., and Kleeman, M. J.: Identifying PM2.5 and PM0.1 Sources for Epidemiological Studies in California, Environ. Sci. Technol., 48, 4980–4990, <ext-link xlink:href="https://doi.org/10.1021/es404810z" ext-link-type="DOI">10.1021/es404810z</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx19"><label>Jing et al.(2022)</label><mixed-citation>Jing, X., Luo, J., Wang, J., Zuo, G., and Wei, N.: A Multi-imputation method to deal with hydro-meteorological missing values by integrating chain equations and random forest, Water Resour. Manag., 36, 1159–1173, <ext-link xlink:href="https://doi.org/10.1007/s11269-021-03037-5" ext-link-type="DOI">10.1007/s11269-021-03037-5</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx20"><label>Junger and Ponce de Leon(2015)</label><mixed-citation>Junger, W. and Ponce de Leon, A.: Imputation of missing data in time series for air pollutants, Atmos. Environ., 102, 96–104, <ext-link xlink:href="https://doi.org/10.1016/j.atmosenv.2014.11.049" ext-link-type="DOI">10.1016/j.atmosenv.2014.11.049</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx21"><label>Junninen et al.(2004)</label><mixed-citation>Junninen, H., Niska, H., Tuppurainen, K., Ruuskanen, J., and Kolehmainen, M.: Methods for imputation of missing values in air quality data sets, Atmos. Environ., 38, 2895–2907, <ext-link xlink:href="https://doi.org/10.1016/j.atmosenv.2004.02.026" ext-link-type="DOI">10.1016/j.atmosenv.2004.02.026</ext-link>, 2004.</mixed-citation></ref>
      <ref id="bib1.bibx22"><label>Khan and Hoque(2020)</label><mixed-citation>Khan, S. I. and Hoque, A. S. M. L.: SICE: an improved missing data imputation technique, Journal of Big Data, 7, 37, <ext-link xlink:href="https://doi.org/10.1186/s40537-020-00313-w" ext-link-type="DOI">10.1186/s40537-020-00313-w</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx23"><label>Kim et al.(2024)</label><mixed-citation>Kim, Y., Yi, S.-M., Heo, J., Kim, H., Lee, W., Kim, H., Hopke, P. K., Lee, Y. S., Shin, H.-J., Park, J., Yoo, M., Jeon, K., and Park, J.: Is replacing missing values of PM2.5 constituents with estimates using machine learning better for source apportionment than exclusion or median replacement?, Environ. Pollut., 354, 124165, <ext-link xlink:href="https://doi.org/10.1016/j.envpol.2024.124165" ext-link-type="DOI">10.1016/j.envpol.2024.124165</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx24"><label>Kim et al.(2025a)</label><mixed-citation>Kim, Y., Hopke, P. K., Yi, S.-M., Lee, W., Kim, H., Heo, J., Kim, H., Lee, Y. S., Jeon, K., and Park, J.: Positive matrix factorization outperforms machine learning in imputing missing PM2.5 and further identifying spatial patterns in multi-sites without external data, Urban Climate, 62, 102552, <ext-link xlink:href="https://doi.org/10.1016/j.uclim.2025.102552" ext-link-type="DOI">10.1016/j.uclim.2025.102552</ext-link>, 2025a.</mixed-citation></ref>
      <ref id="bib1.bibx25"><label>Kim et al.(2025b)</label><mixed-citation>Kim, Y., Kang, C., Yi, S.-M., Heo, J., Kim, H., Lee, W., Kim, H., Hopke, P. K., Lee, Y. S., Shin, H.-J., Park, J., Yoo, M., Jeon, K., and Park, J.: Imputing missing data with statistical-learning estimates: impacts on mortality risks attributable to area- and source-specific PM2.5, Atmos. Pollut. Res., 102785, <ext-link xlink:href="https://doi.org/10.1016/j.apr.2025.102785" ext-link-type="DOI">10.1016/j.apr.2025.102785</ext-link>, 2025b.</mixed-citation></ref>
      <ref id="bib1.bibx26"><label>Kowarik and Templ(2016)</label><mixed-citation>Kowarik, A. and Templ, M.: Imputation with the R Package VIM, J. Stat. Softw., 74, 1–16, <ext-link xlink:href="https://doi.org/10.18637/jss.v074.i07" ext-link-type="DOI">10.18637/jss.v074.i07</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx27"><label>Lai and Kuok(2019)</label><mixed-citation>Lai, W. Y. and Kuok, K.: A study on bayesian principal component analysis for addressing missing rainfall data, Water Resour. Manag., 33, 2615–2628, <ext-link xlink:href="https://doi.org/10.1007/s11269-019-02209-8" ext-link-type="DOI">10.1007/s11269-019-02209-8</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx28"><label>Lee et al.(2024)</label><mixed-citation>Lee, S.-J., Ju, J.-T., Lee, J.-J., Song, C.-K., Shin, S.-A., Jung, H.-J., Shin, H. J., and Choi, S.-D.: Mapping nationwide concentrations of sulfate and nitrate in ambient PM2.5 in South Korea using machine learning with ground observation data, Sci. Total Environ., 926, 171884, <ext-link xlink:href="https://doi.org/10.1016/j.scitotenv.2024.171884" ext-link-type="DOI">10.1016/j.scitotenv.2024.171884</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx29"><label>Lee et al.(2023)</label><mixed-citation>Lee, Y. S., Choi, E., Park, M., Jo, H., Park, M., Nam, E., Kim, D. G., Yi, S.-M., and Kim, J. Y.: Feature extraction and prediction of fine particulate matter (PM2.5) chemical constituents using four machine learning models, Expert Syst. Appl., 221, 119696, <ext-link xlink:href="https://doi.org/10.1016/j.eswa.2023.119696" ext-link-type="DOI">10.1016/j.eswa.2023.119696</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx30"><label>Li et al.(2020)</label><mixed-citation>Li, R., Wang, Q., He, X., Zhu, S., Zhang, K., Duan, Y., Fu, Q., Qiao, L., Wang, Y., Huang, L., Li, L., and Yu, J. Z.: Source apportionment of PM<sub>2.5</sub> in Shanghai based on hourly organic molecular markers and other source tracers, Atmos. Chem. Phys., 20, 12047–12061, <ext-link xlink:href="https://doi.org/10.5194/acp-20-12047-2020" ext-link-type="DOI">10.5194/acp-20-12047-2020</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx31"><label>Li et al.(2023)</label><mixed-citation>Li, R., Gao, Y., Chen, Y., Peng, M., Zhao, W., Wang, G., and Hao, J.: Measurement report: Rapid changes of chemical characteristics and health risks for highly time resolved trace elements in PM<sub>2.5</sub> in a typical industrial city in response to stringent clean air actions, Atmos. Chem. Phys., 23, 4709–4726, <ext-link xlink:href="https://doi.org/10.5194/acp-23-4709-2023" ext-link-type="DOI">10.5194/acp-23-4709-2023</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx32"><label>Liao et al.(2023)</label><mixed-citation>Liao, K., Wang, Q., Wang, S., and Yu, J. Z.: Bayesian Inference Approach to Quantify Primary and Secondary Organic Carbon in Fine Particulate Matter Using Major Species Measurements, Environ. Sci. Technol., 57, 5169–5179, <ext-link xlink:href="https://doi.org/10.1021/acs.est.2c09412" ext-link-type="DOI">10.1021/acs.est.2c09412</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx33"><label>Little and Rubin(2019)</label><mixed-citation>Little, R. J. and Rubin, D. B.: Statistical analysis with missing data, John Wiley &amp; Sons, <ext-link xlink:href="https://doi.org/10.1002/9781119482260" ext-link-type="DOI">10.1002/9781119482260</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx34"><label>Liu et al.(2017)</label><mixed-citation>Liu, B., Wu, J., Zhang, J., Wang, L., Yang, J., Liang, D., Dai, Q., Bi, X., Feng, Y., Zhang, Y., and Zhang, Q.: Characterization and source apportionment of PM<sub>2.5</sub> based on error estimation from EPA PMF 5.0 model at a medium city in China, Environ. Pollut., 222, 10–22, <ext-link xlink:href="https://doi.org/10.1016/j.envpol.2017.01.005" ext-link-type="DOI">10.1016/j.envpol.2017.01.005</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx35"><label>Liu and Matsui(2021)</label><mixed-citation>Liu, M. and Matsui, H.: Aerosol radiative forcings induced by substantial changes in anthropogenic emissions in China from 2008 to 2016, Atmos. Chem. Phys., 21, 5965–5982, <ext-link xlink:href="https://doi.org/10.5194/acp-21-5965-2021" ext-link-type="DOI">10.5194/acp-21-5965-2021</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx36"><label>Liu et al.(2022)</label><mixed-citation>Liu, X., Fu, Y., Wang, Q., Bi, Y., Zhang, L., Zhao, G., Xian, F., Cheng, P., Zhang, L., Zhou, J., and Zhou, W.: Unraveling the process of aerosols secondary formation and removal based on cosmogenic beryllium-7 and beryllium-10, Sci. Total Environ., 821, 153293, <ext-link xlink:href="https://doi.org/10.1016/j.scitotenv.2022.153293" ext-link-type="DOI">10.1016/j.scitotenv.2022.153293</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx37"><label>McKenzie et al.(2009)</label><mixed-citation>McKenzie, E. R., Money, J. E., Green, P. G., and Young, T. M.: Metals associated with stormwater-relevant brake and tire samples, Sci. Total Environ., 407, 5855–5860, <ext-link xlink:href="https://doi.org/10.1016/j.scitotenv.2009.07.018" ext-link-type="DOI">10.1016/j.scitotenv.2009.07.018</ext-link>, 2009.</mixed-citation></ref>
      <ref id="bib1.bibx38"><label>Moritz and Bartz-Beielstein(2017)</label><mixed-citation>Moritz, S. and Bartz-Beielstein, T.: imputeTS: Time Series Missing Value Imputation in R, R J., 9, 207–218, <ext-link xlink:href="https://doi.org/10.32614/RJ-2017-009" ext-link-type="DOI">10.32614/RJ-2017-009</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx39"><label>Norris et al.(2014)</label><mixed-citation>Norris, G., Duvall, R., Brown, S., and Bai, S.: EPA Positive Matrix Factorization (PMF) 5.0 Fundamentals and User Guide, U.S. Environmental Protection Agency, Office of Research and Development, Washington, DC, USA, EPA/600/R-14/108, <uri>https://www.epa.gov/sites/default/files/2015-02/documents/pmf_5.0_user_guide.pdf</uri> (last access: 19 June 2026), 2014.</mixed-citation></ref>
      <ref id="bib1.bibx40"><label>Paatero(1999)</label><mixed-citation>Paatero, P.: The Multilinear Engine: A Table-Driven, Least Squares Program for Solving Multilinear Problems, including the n-Way Parallel Factor Analysis Model, J. Comput. Graph. Stat., 8, 854–888, <ext-link xlink:href="https://doi.org/10.1080/10618600.1999.10474853" ext-link-type="DOI">10.1080/10618600.1999.10474853</ext-link>, 1999.</mixed-citation></ref>
      <ref id="bib1.bibx41"><label>Paatero and Hopke(2003)</label><mixed-citation>Paatero, P. and Hopke, P. K.: Discarding or downweighting high-noise variables in factor analytic models, Anal. Chim. Acta, 490, 277–289, <ext-link xlink:href="https://doi.org/10.1016/S0003-2670(02)01643-4" ext-link-type="DOI">10.1016/S0003-2670(02)01643-4</ext-link>, 2003.</mixed-citation></ref>
      <ref id="bib1.bibx42"><label>Paatero et al.(2014)</label><mixed-citation>Paatero, P., Eberly, S., Brown, S. G., and Norris, G. A.: Methods for estimating uncertainty in factor analytic solutions, Atmos. Meas. Tech., 7, 781–797, <ext-link xlink:href="https://doi.org/10.5194/amt-7-781-2014" ext-link-type="DOI">10.5194/amt-7-781-2014</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx43"><label>Pekney et al.(2006)</label><mixed-citation>Pekney, N. J., Davidson, C. I., Robinson, A., Zhou, L., Hopke, P., Eatough, D., and Rogge, W. F.: Major source categories for PM<sub>2.5</sub> in Pittsburgh using PMF and UNMIX, Aerosol Sci. Tech., 40, 910–924, <ext-link xlink:href="https://doi.org/10.1080/02786820500380271" ext-link-type="DOI">10.1080/02786820500380271</ext-link>, 2006.</mixed-citation></ref>
      <ref id="bib1.bibx44"><label>Peng et al.(2023)</label><mixed-citation>Peng, X., Xie, T.-T., Tang, M.-X., Cheng, Y., Peng, Y., Wei, F.-H., Cao, L.-M., Yu, K., Du, K., He, L.-Y., and Huang, X.-F.: Critical Role of Secondary Organic Aerosol in Urban Atmospheric Visibility Improvement Identified by Machine Learning, Environ. Sci. Technol. Lett., 10, 976–982, <ext-link xlink:href="https://doi.org/10.1021/acs.estlett.3c00084" ext-link-type="DOI">10.1021/acs.estlett.3c00084</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx45"><label>Plaia and Bondì(2006)</label><mixed-citation>Plaia, A. and Bondì, A.: Single imputation method of missing values in environmental pollution data sets, Atmos. Environ., 40, 7316–7330, <ext-link xlink:href="https://doi.org/10.1016/j.atmosenv.2006.06.040" ext-link-type="DOI">10.1016/j.atmosenv.2006.06.040</ext-link>, 2006.</mixed-citation></ref>
      <ref id="bib1.bibx46"><label>Polissar et al.(1998)</label><mixed-citation>Polissar, A. V., Hopke, P. K., Paatero, P., Malm, W. C., and Sisler, J. F.: Atmospheric aerosol over Alaska: 2. Elemental composition and sources, J. Geophys. Res.-Atmos., 103, 19045–19057, <ext-link xlink:href="https://doi.org/10.1029/98JD01212" ext-link-type="DOI">10.1029/98JD01212</ext-link>, 1998.</mixed-citation></ref>
      <ref id="bib1.bibx47"><label>Reff et al.(2007)</label><mixed-citation>Reff, A., Eberly, S. I., and Bhave, P. V.: Receptor modeling of ambient particulate matter data using positive matrix factorization: review of existing methods, J. Air Waste Manage., 57, 146–154, <ext-link xlink:href="https://doi.org/10.1080/10473289.2007.10465319" ext-link-type="DOI">10.1080/10473289.2007.10465319</ext-link>, 2007.</mixed-citation></ref>
      <ref id="bib1.bibx48"><label>Richardson and Hollinger(2007)</label><mixed-citation>Richardson, A. D. and Hollinger, D. Y.: A method to estimate the additional uncertainty in gap-filled NEE resulting from long gaps in the CO<sub>2</sub> flux record, Agr. Forest Meteorol., 147, 199–208, <ext-link xlink:href="https://doi.org/10.1016/j.agrformet.2007.06.004" ext-link-type="DOI">10.1016/j.agrformet.2007.06.004</ext-link>, 2007.</mixed-citation></ref>
      <ref id="bib1.bibx49"><label>Samal et al.(2021)</label><mixed-citation>Samal, K. K. R., Babu, K. S., and Das, S. K.: Multi-directional temporal convolutional artificial neural network for PM<sub>2.5</sub> forecasting with missing values: A deep learning approach, Urban Climate, 36, 100800, <ext-link xlink:href="https://doi.org/10.1016/j.uclim.2021.100800" ext-link-type="DOI">10.1016/j.uclim.2021.100800</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx50"><label>Shen et al.(2018)</label><mixed-citation>Shen, H., Li, T., Yuan, Q., and Zhang, L.: Estimating Regional Ground-Level PM<sub>2.5</sub> Directly From Satellite Top-Of-Atmosphere Reflectance Using Deep Belief Networks, J. Geophys. Res.-Atmos., 123, 13875–13886, <ext-link xlink:href="https://doi.org/10.1029/2018JD028759" ext-link-type="DOI">10.1029/2018JD028759</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx51"><label>Song et al.(2021)</label><mixed-citation>Song, L., Dai, Q., Feng, Y., and Hopke, P. K.: Estimating uncertainties of source contributions to PM<sub>2.5</sub> using moving window evolving dispersion normalized PMF, Environ. Pollut., 286, 117576, <ext-link xlink:href="https://doi.org/10.1016/j.envpol.2021.117576" ext-link-type="DOI">10.1016/j.envpol.2021.117576</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx52"><label>Stacklies et al.(2007)</label><mixed-citation>Stacklies, W., Redestig, H., Scholz, M., Walther, D., and Selbig, J.: pcaMethods – a bioconductor package providing PCA methods for incomplete data, Bioinformatics, 23, 1164–1167, <ext-link xlink:href="https://doi.org/10.1093/bioinformatics/btm069" ext-link-type="DOI">10.1093/bioinformatics/btm069</ext-link>, 2007.</mixed-citation></ref>
      <ref id="bib1.bibx53"><label>van Donkelaar et al.(2019)</label><mixed-citation>van Donkelaar, A., Martin, R. V., Li, C., and Burnett, R. T.: Regional Estimates of Chemical Composition of Fine Particulate Matter Using a Combined Geoscience-Statistical Method with Information from Satellites, Models, and Monitors, Environ. Sci. Technol., 53, 2595–2611, <ext-link xlink:href="https://doi.org/10.1021/acs.est.8b06392" ext-link-type="DOI">10.1021/acs.est.8b06392</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx54"><label>Wang et al.(2018)</label><mixed-citation>Wang, Q., Qiao, L., Zhou, M., Zhu, S., Griffith, S., Li, L., and Yu, J. Z.: Source Apportionment of PM<sub>2.5</sub> Using Hourly Measurements of Elemental Tracers and Major Constituents in an Urban Environment: Investigation of Time-Resolution Influence, J. Geophys. Res.-Atmos., 123, 5284–5300, <ext-link xlink:href="https://doi.org/10.1029/2017JD027877" ext-link-type="DOI">10.1029/2017JD027877</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx55"><label>Xie(2017)</label><mixed-citation>Xie, J.: Deep Neural Network for PM<sub>2.5</sub> Pollution Forecasting Based on Manifold Learning, in: 2017 International Conference on Sensing, Diagnostics, Prognostics, and Control (SDPC), 236–240, <ext-link xlink:href="https://doi.org/10.1109/SDPC.2017.52" ext-link-type="DOI">10.1109/SDPC.2017.52</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx56"><label>Xie et al.(2022)</label><mixed-citation>Xie, M., Lu, X., Ding, F., Cui, W., Zhang, Y., and Feng, W.: Evaluating the influence of constant source profile presumption on PMF analysis of PM<sub>2.5</sub> by comparing long- and short-term hourly observation-based modeling, Environ. Pollut., 314, 120273, <ext-link xlink:href="https://doi.org/10.1016/j.envpol.2022.120273" ext-link-type="DOI">10.1016/j.envpol.2022.120273</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx57"><label>Yu et al.(2017)</label><mixed-citation>Yu, Y., Yu, J. J., Li, V. O. K., and Lam, J. C. K.: Low-rank singular value thresholding for recovering missing air quality data, in: 2017 IEEE International Conference on Big Data (Big Data), 508–513, <ext-link xlink:href="https://doi.org/10.1109/BigData.2017.8257965" ext-link-type="DOI">10.1109/BigData.2017.8257965</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx58"><label>Yu et al.(2019)</label><mixed-citation>Yu, Y., He, S., Wu, X., Zhang, C., Yao, Y., Liao, H., Wang, Q., and Xie, M.: PM<sub>2.5</sub> elements at an urban site in Yangtze River Delta, China: High time-resolved measurement and the application in source apportionment, Environ. Pollut., 253, 1089–1099, <ext-link xlink:href="https://doi.org/10.1016/j.envpol.2019.07.096" ext-link-type="DOI">10.1016/j.envpol.2019.07.096</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx59"><label>Yu et al.(2020)</label><mixed-citation>Yu, Y., Ding, F., Mu, Y., Xie, M., and Wang, Q.: High time-resolved PM<sub>2.5</sub> composition and sources at an urban site in Yangtze River Delta, China after the implementation of the APPCAP, Chemosphere, 261, 127746, <ext-link xlink:href="https://doi.org/10.1016/j.chemosphere.2020.127746" ext-link-type="DOI">10.1016/j.chemosphere.2020.127746</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx60"><label>Zaini et al.(2022)</label><mixed-citation>Zaini, N., Ean, L. W., Ahmed, A. N., and Malek, M. A.: A systematic literature review of deep learning neural network for time series air quality forecasting, Environ. Sci. Pollut. R., 1–33, <ext-link xlink:href="https://doi.org/10.1007/s11356-021-17442-1" ext-link-type="DOI">10.1007/s11356-021-17442-1</ext-link>, 2022.</mixed-citation></ref>

  </ref-list></back>
    <!--<article-title-html>Improving imputation of missing PM<sub>2.5</sub> speciation data using PMF-informed source-receptor relationships</article-title-html>
<abstract-html/>
<ref-html id="bib1.bib1"><label>albertbup(2026)</label><mixed-citation>
      
albertbup: deep-belief-network: A Python implementation of Deep Belief Networks built upon NumPy and TensorFlow with scikit-learn compatibility, GitHub repository [code], <a href="https://github.com/albertbup/deep-belief-network" target="_blank">https://github.com/albertbup/deep-belief-network</a>, last access: 19 June 2026.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib2"><label>Alwateer et al.(2024)</label><mixed-citation>
      
Alwateer, M., Atlam, E.-S., Abd El-Raouf, M. M., Ghoneim, O. A., and Gad, I.: Missing data imputation: A comprehensive review, Journal of Computer and Communications, 12, 53–75, <a href="https://doi.org/10.4236/jcc.2024.1211004" target="_blank">https://doi.org/10.4236/jcc.2024.1211004</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib3"><label>Becagli et al.(2012)</label><mixed-citation>
      
Becagli, S., Sferlazzo, D. M., Pace, G., di Sarra, A., Bommarito, C., Calzolai, G., Ghedini, C., Lucarelli, F., Meloni, D., Monteleone, F., Severi, M., Traversi, R., and Udisti, R.: Evidence for heavy fuel oil combustion aerosols from chemical analyses at the island of Lampedusa: a possible large role of ships emissions in the Mediterranean, Atmos. Chem. Phys., 12, 3479–3492, <a href="https://doi.org/10.5194/acp-12-3479-2012" target="_blank">https://doi.org/10.5194/acp-12-3479-2012</a>, 2012.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib4"><label>Betancourt et al.(2023)</label><mixed-citation>
      
Betancourt, C., Li, C. W. Y., Kleinert, F., and Schultz, M. G.: Graph Machine Learning for Improved Imputation of Missing Tropospheric Ozone Data, Environ. Sci. Technol., 57, 18246–18258, <a href="https://doi.org/10.1021/acs.est.3c05104" target="_blank">https://doi.org/10.1021/acs.est.3c05104</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib5"><label>Bi et al.(2019)</label><mixed-citation>
      
Bi, X., Dai, Q., Wu, J., Zhang, Q., Zhang, W., Luo, R., Cheng, Y., Zhang, J., Wang, L., Yu, Z., Zhang, Y., Tian, Y., and Feng, Y.: Characteristics of the main primary source profiles of particulate matter across China from 1987 to 2017, Atmos. Chem. Phys., 19, 3223–3243, <a href="https://doi.org/10.5194/acp-19-3223-2019" target="_blank">https://doi.org/10.5194/acp-19-3223-2019</a>, 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib6"><label>Birch(2002)</label><mixed-citation>
      
Birch, M. E.: Occupational monitoring of particulate diesel exhaust by NIOSH method 5040, Applied Occupational and Environmental Hygiene, 17, 400–405, <a href="https://doi.org/10.1080/10473220290035390" target="_blank">https://doi.org/10.1080/10473220290035390</a>, 2002.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib7"><label>Brown et al.(2015)</label><mixed-citation>
      
Brown, S. G., Eberly, S., Paatero, P., and Norris, G. A.: Methods for estimating uncertainty in PMF solutions: Examples with ambient air and water quality data and guidance on reporting PMF results, Sci. Total Environ., 518–519, 626–635, <a href="https://doi.org/10.1016/j.scitotenv.2015.01.022" target="_blank">https://doi.org/10.1016/j.scitotenv.2015.01.022</a>, 2015.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib8"><label>Canonaco et al.(2021)</label><mixed-citation>
      
Canonaco, F., Tobler, A., Chen, G., Sosedova, Y., Slowik, J. G., Bozzetti, C., Daellenbach, K. R., El Haddad, I., Crippa, M., Huang, R.-J., Furger, M., Baltensperger, U., and Prévôt, A. S. H.: A new method for long-term source apportionment with time-dependent factor profiles and uncertainty assessment using SoFi Pro: application to 1 year of organic aerosol data, Atmos. Meas. Tech., 14, 923–943, <a href="https://doi.org/10.5194/amt-14-923-2021" target="_blank">https://doi.org/10.5194/amt-14-923-2021</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib9"><label>Chang et al.(2018)</label><mixed-citation>
      
Chang, Y., Huang, K., Xie, M., Deng, C., Zou, Z., Liu, S., and Zhang, Y.: First long-term and near real-time measurement of trace elements in China's urban atmosphere: temporal variability, source apportionment and precipitation effect, Atmos. Chem. Phys., 18, 11793–11812, <a href="https://doi.org/10.5194/acp-18-11793-2018" target="_blank">https://doi.org/10.5194/acp-18-11793-2018</a>, 2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib10"><label>Cheng et al.(2015)</label><mixed-citation>
      
Cheng, K., Wang, Y., Tian, H., Gao, X., Zhang, Y., Wu, X., Zhu, C., and Gao, J.: Atmospheric Emission Characteristics and Control Policies of Five Precedent-Controlled Toxic Heavy Metals from Anthropogenic Sources in China, Environ. Sci. Technol., 49, 1206–1214, <a href="https://doi.org/10.1021/es5037332" target="_blank">https://doi.org/10.1021/es5037332</a>, 2015.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib11"><label>Choi et al.(2022)</label><mixed-citation>
      
Choi, E., Yi, S.-M., Lee, Y. S., Jo, H., Baek, S.-O., and Heo, J.-B.: Sources of airborne particulate matter-bound metals and spatial-seasonal variability of health risk potentials in four large cities, South Korea, Environ. Sci. Pollut. R., 29, 28359–28374, <a href="https://doi.org/10.1007/s11356-021-18445-8" target="_blank">https://doi.org/10.1007/s11356-021-18445-8</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib12"><label>Dai et al.(2020)</label><mixed-citation>
      
Dai, Q., Liu, B., Bi, X., Wu, J., Liang, D., Zhang, Y., Feng, Y., and Hopke, P. K.: Dispersion Normalized PMF Provides Insights into the Significant Changes in Source Contributions to PM2.5 after the COVID-19 Outbreak, Environ. Sci. Technol., 54, 9917–9927, <a href="https://doi.org/10.1021/acs.est.0c02776" target="_blank">https://doi.org/10.1021/acs.est.0c02776</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib13"><label>Dai et al.(2021)</label><mixed-citation>
      
Dai, Q., Ding, J., Song, C., Liu, B., Bi, X., Wu, J., Zhang, Y., Feng, Y., and Hopke, P. K.: Changes in source contributions to particle number concentrations after the COVID-19 outbreak: Insights from a dispersion normalized PMF, Sci. Total Environ., 759, 143548, <a href="https://doi.org/10.1016/j.scitotenv.2020.143548" target="_blank">https://doi.org/10.1016/j.scitotenv.2020.143548</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib14"><label>Emmanuel et al.(2021)</label><mixed-citation>
      
Emmanuel, T., Maupong, T., Mpoeleng, D., Semong, T., Mphago, B., and Tabona, O.: A survey on missing data in machine learning, Journal of Big Data, 8, 1–37, <a href="https://doi.org/10.1186/s40537-021-00516-9" target="_blank">https://doi.org/10.1186/s40537-021-00516-9</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib15"><label>García-Laencina et al.(2010)</label><mixed-citation>
      
García-Laencina, P. J., Sancho-Gómez, J.-L., and Figueiras-Vidal, A. R.: Pattern classification with missing data: a review, Neural Comput. Appl., 19, 263–282, <a href="https://doi.org/10.1007/s00521-009-0295-6" target="_blank">https://doi.org/10.1007/s00521-009-0295-6</a>, 2010.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib16"><label>Hao et al.(2023)</label><mixed-citation>
      
Hao, H., Wang, Y., Zhu, Q., Zhang, H., Rosenberg, A., Schwartz, J., Amini, H., van Donkelaar, A., Martin, R., Liu, P., Weber, R., Russel, A., Yitshak-sade, M., Chang, H., and Shi, L.: National Cohort Study of Long-Term Exposure to PM2.5 Components and Mortality in Medicare American Older Adults, Environ. Sci. Technol., 57, 6835–6843, <a href="https://doi.org/10.1021/acs.est.2c07064" target="_blank">https://doi.org/10.1021/acs.est.2c07064</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib17"><label>Hopke(2016)</label><mixed-citation>
      
Hopke, P. K.: Review of receptor modeling methods for source apportionment, J. Air Waste Manage., 66, 237–259, <a href="https://doi.org/10.1080/10962247.2016.1140693" target="_blank">https://doi.org/10.1080/10962247.2016.1140693</a>, 2016.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib18"><label>Hu et al.(2014)</label><mixed-citation>
      
Hu, J., Zhang, H., Chen, S., Ying, Q., Wiedinmyer, C., Vandenberghe, F., and Kleeman, M. J.: Identifying PM2.5 and PM0.1 Sources for Epidemiological Studies in California, Environ. Sci. Technol., 48, 4980–4990, <a href="https://doi.org/10.1021/es404810z" target="_blank">https://doi.org/10.1021/es404810z</a>, 2014.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib19"><label>Jing et al.(2022)</label><mixed-citation>
      
Jing, X., Luo, J., Wang, J., Zuo, G., and Wei, N.: A Multi-imputation method to deal with hydro-meteorological missing values by integrating chain equations and random forest, Water Resour. Manag., 36, 1159–1173, <a href="https://doi.org/10.1007/s11269-021-03037-5" target="_blank">https://doi.org/10.1007/s11269-021-03037-5</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib20"><label>Junger and Ponce de Leon(2015)</label><mixed-citation>
      
Junger, W. and Ponce de Leon, A.: Imputation of missing data in time series for air pollutants, Atmos. Environ., 102, 96–104, <a href="https://doi.org/10.1016/j.atmosenv.2014.11.049" target="_blank">https://doi.org/10.1016/j.atmosenv.2014.11.049</a>, 2015.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib21"><label>Junninen et al.(2004)</label><mixed-citation>
      
Junninen, H., Niska, H., Tuppurainen, K., Ruuskanen, J., and Kolehmainen, M.: Methods for imputation of missing values in air quality data sets, Atmos. Environ., 38, 2895–2907, <a href="https://doi.org/10.1016/j.atmosenv.2004.02.026" target="_blank">https://doi.org/10.1016/j.atmosenv.2004.02.026</a>, 2004.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib22"><label>Khan and Hoque(2020)</label><mixed-citation>
      
Khan, S. I. and Hoque, A. S. M. L.: SICE: an improved missing data imputation technique, Journal of Big Data, 7, 37, <a href="https://doi.org/10.1186/s40537-020-00313-w" target="_blank">https://doi.org/10.1186/s40537-020-00313-w</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib23"><label>Kim et al.(2024)</label><mixed-citation>
      
Kim, Y., Yi, S.-M., Heo, J., Kim, H., Lee, W., Kim, H., Hopke, P. K., Lee, Y. S., Shin, H.-J., Park, J., Yoo, M., Jeon, K., and Park, J.: Is replacing missing values of PM2.5 constituents with estimates using machine learning better for source apportionment than exclusion or median replacement?, Environ. Pollut., 354, 124165, <a href="https://doi.org/10.1016/j.envpol.2024.124165" target="_blank">https://doi.org/10.1016/j.envpol.2024.124165</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib24"><label>Kim et al.(2025a)</label><mixed-citation>
      
Kim, Y., Hopke, P. K., Yi, S.-M., Lee, W., Kim, H., Heo, J., Kim, H., Lee, Y. S., Jeon, K., and Park, J.: Positive matrix factorization outperforms machine learning in imputing missing PM2.5 and further identifying spatial patterns in multi-sites without external data, Urban Climate, 62, 102552, <a href="https://doi.org/10.1016/j.uclim.2025.102552" target="_blank">https://doi.org/10.1016/j.uclim.2025.102552</a>, 2025a.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib25"><label>Kim et al.(2025b)</label><mixed-citation>
      
Kim, Y., Kang, C., Yi, S.-M., Heo, J., Kim, H., Lee, W., Kim, H., Hopke, P. K., Lee, Y. S., Shin, H.-J., Park, J., Yoo, M., Jeon, K., and Park, J.: Imputing missing data with statistical-learning estimates: impacts on mortality risks attributable to area- and source-specific PM2.5, Atmos. Pollut. Res., 102785, <a href="https://doi.org/10.1016/j.apr.2025.102785" target="_blank">https://doi.org/10.1016/j.apr.2025.102785</a>, 2025b.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib26"><label>Kowarik and Templ(2016)</label><mixed-citation>
      
Kowarik, A. and Templ, M.: Imputation with the R Package VIM, J. Stat. Softw., 74, 1–16, <a href="https://doi.org/10.18637/jss.v074.i07" target="_blank">https://doi.org/10.18637/jss.v074.i07</a>, 2016.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib27"><label>Lai and Kuok(2019)</label><mixed-citation>
      
Lai, W. Y. and Kuok, K.: A study on bayesian principal component analysis for addressing missing rainfall data, Water Resour. Manag., 33, 2615–2628, <a href="https://doi.org/10.1007/s11269-019-02209-8" target="_blank">https://doi.org/10.1007/s11269-019-02209-8</a>, 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib28"><label>Lee et al.(2024)</label><mixed-citation>
      
Lee, S.-J., Ju, J.-T., Lee, J.-J., Song, C.-K., Shin, S.-A., Jung, H.-J., Shin, H. J., and Choi, S.-D.: Mapping nationwide concentrations of sulfate and nitrate in ambient PM<sub>2.5</sub> in South Korea using machine learning with ground observation data, Sci. Total Environ., 926, 171884, <a href="https://doi.org/10.1016/j.scitotenv.2024.171884" target="_blank">https://doi.org/10.1016/j.scitotenv.2024.171884</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib29"><label>Lee et al.(2023)</label><mixed-citation>
      
Lee, Y. S., Choi, E., Park, M., Jo, H., Park, M., Nam, E., Kim, D. G., Yi, S.-M., and Kim, J. Y.: Feature extraction and prediction of fine particulate matter (PM<sub>2.5</sub>) chemical constituents using four machine learning models, Expert Syst. Appl., 221, 119696, <a href="https://doi.org/10.1016/j.eswa.2023.119696" target="_blank">https://doi.org/10.1016/j.eswa.2023.119696</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib30"><label>Li et al.(2020)</label><mixed-citation>
      
Li, R., Wang, Q., He, X., Zhu, S., Zhang, K., Duan, Y., Fu, Q., Qiao, L., Wang, Y., Huang, L., Li, L., and Yu, J. Z.: Source apportionment of PM<sub>2.5</sub> in Shanghai based on hourly organic molecular markers and other source tracers, Atmos. Chem. Phys., 20, 12047–12061, <a href="https://doi.org/10.5194/acp-20-12047-2020" target="_blank">https://doi.org/10.5194/acp-20-12047-2020</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib31"><label>Li et al.(2023)</label><mixed-citation>
      
Li, R., Gao, Y., Chen, Y., Peng, M., Zhao, W., Wang, G., and Hao, J.: Measurement report: Rapid changes of chemical characteristics and health risks for highly time resolved trace elements in PM<sub>2.5</sub> in a typical industrial city in response to stringent clean air actions, Atmos. Chem. Phys., 23, 4709–4726, <a href="https://doi.org/10.5194/acp-23-4709-2023" target="_blank">https://doi.org/10.5194/acp-23-4709-2023</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib32"><label>Liao et al.(2023)</label><mixed-citation>
      
Liao, K., Wang, Q., Wang, S., and Yu, J. Z.: Bayesian Inference Approach to Quantify Primary and Secondary Organic Carbon in Fine Particulate Matter Using Major Species Measurements, Environ. Sci. Technol., 57, 5169–5179, <a href="https://doi.org/10.1021/acs.est.2c09412" target="_blank">https://doi.org/10.1021/acs.est.2c09412</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib33"><label>Little and Rubin(2019)</label><mixed-citation>
      
Little, R. J. and Rubin, D. B.: Statistical analysis with missing data, John Wiley &amp; Sons, <a href="https://doi.org/10.1002/9781119482260" target="_blank">https://doi.org/10.1002/9781119482260</a>, 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib34"><label>Liu et al.(2017)</label><mixed-citation>
      
Liu, B., Wu, J., Zhang, J., Wang, L., Yang, J., Liang, D., Dai, Q., Bi, X., Feng, Y., Zhang, Y., and Zhang, Q.: Characterization and source apportionment of PM<sub>2.5</sub> based on error estimation from EPA PMF 5.0 model at a medium city in China, Environ. Pollut., 222, 10–22, <a href="https://doi.org/10.1016/j.envpol.2017.01.005" target="_blank">https://doi.org/10.1016/j.envpol.2017.01.005</a>, 2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib35"><label>Liu and Matsui(2021)</label><mixed-citation>
      
Liu, M. and Matsui, H.: Aerosol radiative forcings induced by substantial changes in anthropogenic emissions in China from 2008 to 2016, Atmos. Chem. Phys., 21, 5965–5982, <a href="https://doi.org/10.5194/acp-21-5965-2021" target="_blank">https://doi.org/10.5194/acp-21-5965-2021</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib36"><label>Liu et al.(2022)</label><mixed-citation>
      
Liu, X., Fu, Y., Wang, Q., Bi, Y., Zhang, L., Zhao, G., Xian, F., Cheng, P., Zhang, L., Zhou, J., and Zhou, W.: Unraveling the process of aerosols secondary formation and removal based on cosmogenic beryllium-7 and beryllium-10, Sci. Total Environ., 821, 153293, <a href="https://doi.org/10.1016/j.scitotenv.2022.153293" target="_blank">https://doi.org/10.1016/j.scitotenv.2022.153293</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib37"><label>McKenzie et al.(2009)</label><mixed-citation>
      
McKenzie, E. R., Money, J. E., Green, P. G., and Young, T. M.: Metals associated with stormwater-relevant brake and tire samples, Sci. Total Environ., 407, 5855–5860, <a href="https://doi.org/10.1016/j.scitotenv.2009.07.018" target="_blank">https://doi.org/10.1016/j.scitotenv.2009.07.018</a>, 2009.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib38"><label>Moritz and Bartz-Beielstein(2017)</label><mixed-citation>
      
Moritz, S. and Bartz-Beielstein, T.: imputeTS: Time Series Missing Value Imputation in R, R J., 9, 207–218, <a href="https://doi.org/10.32614/RJ-2017-009" target="_blank">https://doi.org/10.32614/RJ-2017-009</a>, 2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib39"><label>Norris et al.(2014)</label><mixed-citation>
      
Norris, G., Duvall, R., Brown, S., and Bai, S.: EPA Positive Matrix Factorization (PMF) 5.0 Fundamentals and User Guide, U.S. Environmental Protection Agency, Office of Research and Development, Washington, DC, USA, EPA/600/R-14/108, <a href="https://www.epa.gov/sites/default/files/2015-02/documents/pmf_5.0_user_guide.pdf" target="_blank">https://www.epa.gov/sites/default/files/2015-02/documents/pmf_5.0_user_guide.pdf</a> (last access: 19 June 2026), 2014.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib40"><label>Paatero(1999)</label><mixed-citation>
      
Paatero, P.: The Multilinear Engine: A Table-Driven, Least Squares Program for Solving Multilinear Problems, including the n-Way Parallel Factor Analysis Model, J. Comput. Graph. Stat., 8, 854–888, <a href="https://doi.org/10.1080/10618600.1999.10474853" target="_blank">https://doi.org/10.1080/10618600.1999.10474853</a>, 1999.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib41"><label>Paatero and Hopke(2003)</label><mixed-citation>
      
Paatero, P. and Hopke, P. K.: Discarding or downweighting high-noise variables in factor analytic models, Anal. Chim. Acta, 490, 277–289, <a href="https://doi.org/10.1016/S0003-2670(02)01643-4" target="_blank">https://doi.org/10.1016/S0003-2670(02)01643-4</a>, 2003.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib42"><label>Paatero et al.(2014)</label><mixed-citation>
      
Paatero, P., Eberly, S., Brown, S. G., and Norris, G. A.: Methods for estimating uncertainty in factor analytic solutions, Atmos. Meas. Tech., 7, 781–797, <a href="https://doi.org/10.5194/amt-7-781-2014" target="_blank">https://doi.org/10.5194/amt-7-781-2014</a>, 2014.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib43"><label>Pekney et al.(2006)</label><mixed-citation>
      
Pekney, N. J., Davidson, C. I., Robinson, A., Zhou, L., Hopke, P., Eatough, D., and Rogge, W. F.: Major source categories for PM2.5 in Pittsburgh using PMF and UNMIX, Aerosol Sci. Tech., 40, 910–924, <a href="https://doi.org/10.1080/02786820500380271" target="_blank">https://doi.org/10.1080/02786820500380271</a>, 2006.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib44"><label>Peng et al.(2023)</label><mixed-citation>
      
Peng, X., Xie, T.-T., Tang, M.-X., Cheng, Y., Peng, Y., Wei, F.-H., Cao, L.-M., Yu, K., Du, K., He, L.-Y., and Huang, X.-F.: Critical Role of Secondary Organic Aerosol in Urban Atmospheric Visibility Improvement Identified by Machine Learning, Environ. Sci. Technol. Lett., 10, 976–982,
<a href="https://doi.org/10.1021/acs.estlett.3c00084" target="_blank">https://doi.org/10.1021/acs.estlett.3c00084</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib45"><label>Plaia and Bondì(2006)</label><mixed-citation>
      
Plaia, A. and Bondì, A.: Single imputation method of missing values in environmental pollution data sets, Atmos. Environ., 40, 7316–7330,
<a href="https://doi.org/10.1016/j.atmosenv.2006.06.040" target="_blank">https://doi.org/10.1016/j.atmosenv.2006.06.040</a>, 2006.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib46"><label>Polissar et al.(1998)</label><mixed-citation>
      
Polissar, A. V., Hopke, P. K., Paatero, P., Malm, W. C., and Sisler, J. F.: Atmospheric aerosol over Alaska: 2. Elemental composition and sources, J. Geophys. Res.-Atmos., 103, 19045–19057, <a href="https://doi.org/10.1029/98JD01212" target="_blank">https://doi.org/10.1029/98JD01212</a>, 1998.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib47"><label>Reff et al.(2007)</label><mixed-citation>
      
Reff, A., Eberly, S. I., and Bhave, P. V.: Receptor modeling of ambient particulate matter data using positive matrix factorization: review of existing methods, J. Air Waste Manage., 57, 146–154, <a href="https://doi.org/10.1080/10473289.2007.10465319" target="_blank">https://doi.org/10.1080/10473289.2007.10465319</a>, 2007.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib48"><label>Richardson and Hollinger(2007)</label><mixed-citation>
      
Richardson, A. D. and Hollinger, D. Y.: A method to estimate the additional uncertainty in gap-filled NEE resulting from long gaps in the CO2 flux record, Agr. Forest Meteorol., 147, 199–208, <a href="https://doi.org/10.1016/j.agrformet.2007.06.004" target="_blank">https://doi.org/10.1016/j.agrformet.2007.06.004</a>, 2007.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib49"><label>Samal et al.(2021)</label><mixed-citation>
      
Samal, K. K. R., Babu, K. S., and Das, S. K.: Multi-directional temporal convolutional artificial neural network for PM2.5 forecasting with missing values: A deep learning approach, Urban Climate, 36, 100800,
<a href="https://doi.org/10.1016/j.uclim.2021.100800" target="_blank">https://doi.org/10.1016/j.uclim.2021.100800</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib50"><label>Shen et al.(2018)</label><mixed-citation>
      
Shen, H., Li, T., Yuan, Q., and Zhang, L.: Estimating Regional Ground-Level PM2.5 Directly From Satellite Top-Of-Atmosphere Reflectance Using Deep Belief Networks, J. Geophys. Res.-Atmos., 123, 13875–13886,
<a href="https://doi.org/10.1029/2018JD028759" target="_blank">https://doi.org/10.1029/2018JD028759</a>, 2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib51"><label>Song et al.(2021)</label><mixed-citation>
      
Song, L., Dai, Q., Feng, Y., and Hopke, P. K.: Estimating uncertainties of source contributions to PM2.5 using moving window evolving dispersion normalized PMF, Environ. Pollut., 286, 117576, <a href="https://doi.org/10.1016/j.envpol.2021.117576" target="_blank">https://doi.org/10.1016/j.envpol.2021.117576</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib52"><label>Stacklies et al.(2007)</label><mixed-citation>
      
Stacklies, W., Redestig, H., Scholz, M., Walther, D., and Selbig, J.: pcaMethods – a bioconductor package providing PCA methods for incomplete data, Bioinformatics, 23, 1164–1167, <a href="https://doi.org/10.1093/bioinformatics/btm069" target="_blank">https://doi.org/10.1093/bioinformatics/btm069</a>, 2007.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib53"><label>van Donkelaar et al.(2019)</label><mixed-citation>
      
van Donkelaar, A., Martin, R. V., Li, C., and Burnett, R. T.: Regional Estimates of Chemical Composition of Fine Particulate Matter Using a Combined Geoscience-Statistical Method with Information from Satellites, Models, and Monitors, Environ. Sci. Technol., 53, 2595–2611, <a href="https://doi.org/10.1021/acs.est.8b06392" target="_blank">https://doi.org/10.1021/acs.est.8b06392</a>, 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib54"><label>Wang et al.(2018)</label><mixed-citation>
      
Wang, Q., Qiao, L., Zhou, M., Zhu, S., Griffith, S., Li, L., and Yu, J. Z.: Source Apportionment of PM2.5 Using Hourly Measurements of Elemental Tracers and Major Constituents in an Urban Environment: Investigation of Time-Resolution Influence, J. Geophys. Res.-Atmos., 123, 5284–5300, <a href="https://doi.org/10.1029/2017JD027877" target="_blank">https://doi.org/10.1029/2017JD027877</a>, 2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib55"><label>Xie(2017)</label><mixed-citation>
      
Xie, J.: Deep Neural Network for PM2.5 Pollution Forecasting Based on Manifold Learning, in: 2017 International Conference on Sensing, Diagnostics, Prognostics, and Control (SDPC), 236–240, <a href="https://doi.org/10.1109/SDPC.2017.52" target="_blank">https://doi.org/10.1109/SDPC.2017.52</a>, 2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib56"><label>Xie et al.(2022)</label><mixed-citation>
      
Xie, M., Lu, X., Ding, F., Cui, W., Zhang, Y., and Feng, W.: Evaluating the influence of constant source profile presumption on PMF analysis of PM2.5 by comparing long- and short-term hourly observation-based modeling, Environ. Pollut., 314, 120273, <a href="https://doi.org/10.1016/j.envpol.2022.120273" target="_blank">https://doi.org/10.1016/j.envpol.2022.120273</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib57"><label>Yu et al.(2017)</label><mixed-citation>
      
Yu, Y., Yu, J. J., Li, V. O. K., and Lam, J. C. K.: Low-rank singular value thresholding for recovering missing air quality data, in: 2017 IEEE International Conference on Big Data (Big Data), 508–513, <a href="https://doi.org/10.1109/BigData.2017.8257965" target="_blank">https://doi.org/10.1109/BigData.2017.8257965</a>, 2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib58"><label>Yu et al.(2019)</label><mixed-citation>
      
Yu, Y., He, S., Wu, X., Zhang, C., Yao, Y., Liao, H., Wang, Q., and Xie, M.: PM2.5 elements at an urban site in Yangtze River Delta, China: High time-resolved measurement and the application in source apportionment, Environ. Pollut., 253, 1089–1099, <a href="https://doi.org/10.1016/j.envpol.2019.07.096" target="_blank">https://doi.org/10.1016/j.envpol.2019.07.096</a>, 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib59"><label>Yu et al.(2020)</label><mixed-citation>
      
Yu, Y., Ding, F., Mu, Y., Xie, M., and Wang, Q.: High time-resolved PM2.5 composition and sources at an urban site in Yangtze River Delta, China after the implementation of the APPCAP, Chemosphere, 261, 127746,
<a href="https://doi.org/10.1016/j.chemosphere.2020.127746" target="_blank">https://doi.org/10.1016/j.chemosphere.2020.127746</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib60"><label>Zaini et al.(2022)</label><mixed-citation>
      
Zaini, N., Ean, L. W., Ahmed, A. N., and Malek, M. A.: A systematic literature review of deep learning neural network for time series air quality forecasting, Environ. Sci. Pollut. R., 1–33, <a href="https://doi.org/10.1007/s11356-021-17442-1" target="_blank">https://doi.org/10.1007/s11356-021-17442-1</a>, 2022.

    </mixed-citation></ref-html>--></article>
