Reply on RC1

The article describes a new index for improving the extreme fire behaviour predictability by considering the vertical atmospheric profile, namely, to evaluate if conditions are favourable to convective fire behaviour. In general, the article is well written, the methodology is interesting, and the topic is relevant for the fire research community and for practical applications. However, I have some concerns and comments regarding the current version of the manuscript.

My main concern is that a discussion comparing the proposed index with existing works in the literature is lacking. In the introduction, the authors mention the Haines Index but quickly dismiss its usefulness, citing Pinto et al. 2020 and the saturation problem of the Haines Index. The cited work by Pinto et al. 2020 does however address the same problem, proposing an enhanced Fire Weather Index that combines the FWI with the "Continuous Haines Index". I was expecting to see some discussion or comparison to put the proposed EFBI in the context of those existing indices. For instance, since the EFBI uses the vertical profile of the atmosphere, I would assume that the authors have all the data necessary to compute the Continuous Haines Index (which is based on the temperature and dew point at two pressure levels) and the enhanced FWI. I believe such a comparison would be of great interest to the fire community and would set a new validation standard for future works on this topic.
In this paper we propose a methodology for computing an index that aims to improve on the concept developed by Haines by also considering the conditional stability, which may be modified by an ongoing fire. As the reviewer suggests, it is interesting to compare the existing indices, and we have added the "Continuous Haines" index (Chaines) to the machine learning section of the paper. However, an extensive comparison of an enhanced FWI with Chaines at different altitudes, or a comparison of the Haines and FWI indices from the ECMWF extreme forecast, falls outside the scope of the paper, which is intended to describe the new EFBI index. Comparing the EFBI with other existing indices would require a large amount of computation and additional analysis, which may be considered in subsequent work. We also consider that an inter-model comparison using machine learning would require hyperparameter tuning for each model. As noted by the reviewer, the Chaines can be computed at different heights, and it was in fact used for the analysis of the 2017 Pedrogao fire with ERA-Interim. The four height versions of the Continuous Haines index are shown below.
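For reference, the Continuous Haines index itself can be computed from the temperature and dew point at two pressure levels. The sketch below follows the commonly published formulation (standard version at 850 and 700 hPa, values in °C); the function and argument names are hypothetical, and using other level pairs for the higher versions is an assumption, not the exact configuration used for Pedrogao.

```python
def continuous_haines(t_low: float, t_high: float, td_low: float) -> float:
    """Continuous Haines index from the temperature at a lower and an upper
    pressure level and the dew point at the lower level (all in °C).

    Standard version: lower = 850 hPa, upper = 700 hPa. Other level pairs
    are an assumption for the higher-altitude variants.
    """
    # Stability term: temperature difference between the two levels
    ca = 0.5 * (t_low - t_high) - 2.0
    # Moisture term: dew-point depression at the lower level, capped at 30 °C
    depression = min(t_low - td_low, 30.0)
    cb = depression / 3.0 - 1.0
    # Damp the moisture term above 5 so it cannot dominate the index
    if cb > 5.0:
        cb = 5.0 + (cb - 5.0) / 2.0
    return ca + cb
```

For example, `continuous_haines(20.0, 8.0, 5.0)` gives 8.0 (stability term 4, moisture term 4).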
As the reviewer comments, the use of the EFBI required an aggregation, which was also applied to the FWI, in order to have a relatively simple and common approach that could be used with machine learning for both indices. The FWI was developed as a daily fire danger index, to be computed at 12:00, and is not intended to be computed continuously during the day. Sub-daily conditions may also be a disadvantage for the EFBI: it depicts the current state of the atmosphere above a fire, which has a lot of variability, and it may show much better results with high-frequency data, as in the Pedrogao use case, than with a label of large or small fire. A set of cases with high spatial and temporal accuracy would allow such an analysis. Databases such as GlobFire or FireAtlas were a direct option for finding these big fires in different conditions and locations, but these databases have limitations, so we added use cases in which we applied the EFBI.
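A minimal sketch of this kind of aggregation, assuming sub-daily EFBI values stored as (timestamp, value) pairs and a fire time window whose start is moved back by a number of days to allow for delayed MCD64A1 mapping. The function name and the `lead_days` parameter are hypothetical, and the values in the usage lines are illustrative, not real data.

```python
from datetime import datetime, timedelta
from statistics import mean


def aggregate_window(series, start, end, lead_days=2):
    """Min, max and mean of (timestamp, value) pairs in [start - lead_days, end].

    lead_days (hypothetical) moves the window start back so that fires mapped
    late by the burnt area product (e.g. because of clouds or plumes) are kept.
    """
    window_start = start - timedelta(days=lead_days)
    values = [v for t, v in series if window_start <= t <= end]
    if not values:
        return None  # no sub-daily values inside the window
    return {"min": min(values), "max": max(values), "mean": mean(values)}


# Illustrative hourly values (not real data)
hourly = [
    (datetime(2020, 7, 9, 12), 50.0),   # before the extended window: ignored
    (datetime(2020, 7, 10, 12), 100.0),
    (datetime(2020, 7, 12, 15), 250.0),
    (datetime(2020, 7, 13, 0), 175.0),
]
stats = aggregate_window(hourly, datetime(2020, 7, 12), datetime(2020, 7, 13))
```

The same function can be applied to the FWI percentiles so that both indices are aggregated over an identical window.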
The EFBI provides information about convection, but it does not consider the hourly conditions of wind speed, wind gusts, turbulence or fuel moisture. The EFBI is not dimensionless: its value describes the amount of energy required by a parcel of air when the temperature changes by one degree at the surface. The index can therefore also be high in very cold areas. The EFBI is useful when combined with the FWI; it is not intended to be used on its own.

Other comments
L20-21: I suggest rewording this sentence.

L116-117: Are the 222 small fires a subset of a larger initial selection matching the described criteria? I would expect the number of small fires to be higher.
Although neural networks are less sensitive to an unbalanced number of elements in each class, we decided to create a balanced subset of small fires located close to the large fires but in different years. This avoids including small spot fires around a big fire and allows nearby locations to be evaluated under different conditions. The "ground truth" is not the real truth, but the outcome of an automatic selection from a dataset with implicit uncertainty.
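The selection criterion can be sketched as follows, assuming each fire is reduced to a centroid and a year. The 100 km radius, the one-small-fire-per-large-fire rule and all names are hypothetical choices for illustration, not the exact criteria used in the article.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Fire:
    lat: float  # centroid latitude (degrees)
    lon: float  # centroid longitude (degrees)
    year: int


def haversine_km(a: Fire, b: Fire) -> float:
    """Great-circle distance between two fire centroids in km."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dphi = p2 - p1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def select_small_fires(large, small, radius_km=100.0):
    """For each large fire, pick at most one unused small fire within
    radius_km that burned in a different year (keeps the classes balanced)."""
    used = set()
    selected = []
    for lf in large:
        candidates = [
            (haversine_km(lf, sf), i)
            for i, sf in enumerate(small)
            if i not in used and sf.year != lf.year
        ]
        candidates = [c for c in candidates if c[0] <= radius_km]
        if candidates:
            _, best = min(candidates)  # closest eligible small fire
            used.add(best)
            selected.append(small[best])
    return selected
```

Excluding the same year is what keeps small spot fires around a large fire out of the "small" class.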
L136: If the initial day of the time window is increased by two days wouldn't the correct day be missed for the cases where the MODIS MCD64A1 gives the correct or the following day? How is the 2-day value selected?
MCD64A1 provides a lot of very valuable information but includes uncertainties, which could add noise to the dataset used. We extended the time window so as not to miss large convective fires, which may be mapped days later. We then evaluate the minimum, maximum and average values within the time window for each fire event. The right value is therefore included in the time window but, as the reviewer mentions, extending the time window could add some noise to the EFBI minimum and maximum values for fires that are correctly mapped in MCD64A1. However, we want to capture the large convective fires, which are sometimes mapped with a delay of some days because of clouds and plumes (as in the case of Pedrogao with MCD64A1).

L141-142: Consider updating to: "using Scikit-learn (Pedregosa et al., 2011)."
We used the citation recommended by the authors of scikit-learn (https://scikit-learn.org/stable/about.html).

L179-180: I suggest updating to: "than the percentile and value of drought code". In fact, looking at Figure 4, the MI for dc_percentile is not significantly higher than zero.
Done.

L197: What activation function was used in the multilayer perceptron?
ReLU. In addition, the solver was changed to Adam, and only three layers with 300 neurons each are used. Added to the text.
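In scikit-learn, the configuration described above corresponds to something like the sketch below. The feature scaling step, `max_iter`, the 5-fold cross-validation and the synthetic data (445 samples, matching the number of cases, with 10 hypothetical features) are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Three hidden layers of 300 ReLU units trained with Adam;
# the scaler and max_iter are assumptions, not the article's settings.
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(
        hidden_layer_sizes=(300, 300, 300),
        activation="relu",
        solver="adam",
        max_iter=500,
        random_state=0,
    ),
)

# Synthetic stand-in for the EFBI/FWI feature table (large vs small fire labels)
X, y = make_classification(n_samples=445, n_features=10, random_state=0)
scores = cross_val_score(model, X, y, cv=5)  # accuracy per fold
```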

L198-199: What is the standard error in these cases?
Added to the text of the article.
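One common way of reporting the standard error of a cross-validated accuracy is the standard deviation of the fold scores divided by the square root of the number of folds. The sketch below assumes that convention (the function name is hypothetical); because the folds share training data, the value is only an approximation.

```python
import math
from statistics import mean, stdev


def mean_and_standard_error(scores):
    """Mean accuracy and standard error of the mean across CV folds."""
    if len(scores) < 2:
        raise ValueError("need at least two fold scores")
    # Sample standard deviation of the fold scores, scaled by sqrt(k)
    return mean(scores), stdev(scores) / math.sqrt(len(scores))
```

For fold accuracies of 0.7, 0.8, 0.9 and 0.8 this gives a mean of 0.8 and a standard error of about 0.041.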

L199: I assume FWI is referring to the set of FWI components in percentile form, please clarify if this is the case.
Yes, we are using the FWI and its components in percentiles because, except for the DC, they give more information.
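As an illustration of what "in percentiles" means here, a minimal sketch that ranks a daily FWI (or component) value against a climatological sample for the same location. The choice of climatology and the non-exceedance definition used below are assumptions, and the function name is hypothetical.

```python
from bisect import bisect_right


def percentile_rank(value, climatology):
    """Percentage (0-100) of climatological values less than or equal to value."""
    if not climatology:
        raise ValueError("climatology must not be empty")
    ordered = sorted(climatology)
    return 100.0 * bisect_right(ordered, value) / len(ordered)
```

Expressing each component relative to its own local climatology makes values comparable between regions with very different absolute fire danger levels.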
L203-205: It is certainly expected that by removing the 50 most often misclassified cases, out of a total of 445 cases, the accuracy would rise substantially. Unless there is some manual checking of these 50 events, I don't see the point of this exercise.
The removed events are not only removed from the dataset used for validation; they are removed from the entire process, including training. Repeating the process with randomly removed cases should not show any significant improvement, but it does in this case when the most misclassified cases are removed. Removing these cases does not necessarily improve the accuracy. At the same time, the reviewer is entirely right that it is not correct simply to remove these cases. The correct step, which is future work, would be to analyse them and create new classes for them. These fires could be large for reasons that are not depicted by the EFBI and FWI data, or there may be no suitable class for them with the data used. However, the approach is applied at global scale, using only burnt area products based on remote sensing and weather reanalysis, to demonstrate that the EFBI provides information. Instead of classifying fires as big or small, we would have to classify them by typology and speed, which may be even more complex at global scale and would nowadays require manual classification and verification. This exercise can be removed from the paper. It only shows that the data are not random: these cases are misclassified and are potentially fire types that cannot be classified with only two classes (large or small) using the FWI and EFBI.
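The control experiment described above can be sketched as follows: count how often each case is misclassified across repeated runs, take the k most frequent ones, and compare against an equally sized random subset. The function names and the representation of each run as a list of misclassified case indices are hypothetical.

```python
import random
from collections import Counter


def most_misclassified(runs, k):
    """Indices of the k cases misclassified most often across repeated runs.

    Each run is a list of misclassified case indices (hypothetical format).
    """
    counts = Counter(i for run in runs for i in set(run))
    return [i for i, _ in counts.most_common(k)]


def random_control(n_cases, k, seed=0):
    """Equally sized random subset, removed instead to check the effect is not chance."""
    return random.Random(seed).sample(range(n_cases), k)
```

Retraining once without `most_misclassified(runs, 50)` and once without `random_control(445, 50)` gives the two accuracies to compare.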

L231-232:
The results of the case study are interesting; I would further comment that speeds greater than about 1 km/h are only present for EFBI values above ~220. This result is close to the threshold of about 200 in Figure 5 for "Index_max".
Comment added. Done.
L270: I suggest adding some comment regarding the need for future research towards constructing datasets of fire behaviour type and higher temporal resolution fire progression.
Comment added. Done.

L291-292: Does the 4-hour computation time consider the time to download the GFS data?
Yes. Computation is done simultaneously with the download (pipeline approach), except for the first step, using a single node with 16 cores.
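A minimal sketch of the pipeline idea, assuming hypothetical `download` and `compute` callables for each GFS forecast step: one thread fetches data while the main thread computes on the steps already received. It only illustrates the overlap and is not the operational code.

```python
import queue
import threading


def run_pipeline(steps, download, compute, buffer_size=4):
    """Compute on forecast steps while later steps are still downloading."""
    q = queue.Queue(maxsize=buffer_size)  # bounded: downloads cannot run far ahead
    done = object()  # sentinel marking the end of the stream
    errors = []

    def producer():
        try:
            for step in steps:
                q.put((step, download(step)))
        except Exception as exc:  # keep the error to re-raise in the caller
            errors.append(exc)
        finally:
            q.put(done)

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()
    results = {}
    while True:
        item = q.get()
        if item is done:
            break
        step, data = item
        results[step] = compute(data)
    worker.join()
    if errors:
        raise errors[0]
    return results
```

A single download thread is enough because downloading is I/O bound; spreading the computation over the 16 cores (e.g. with a process pool) would be a separate, assumed extension.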