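[Keywords]
[Abstract]
Selecting predictors empirically with the random forest method leaves the false discovery rate uncontrolled. To address this, false discovery rate control methods from multiple hypothesis testing are introduced to quality-control predictor screening, which turns predictor selection from an experience-dependent procedure into a data-dependent one. On this basis, a long-term precipitation prediction method that combines random forest with multiple hypothesis testing is proposed. Taking the upper Parana River basin in Brazil as the study area, the method is applied to monthly climate system indices to simulate, predict, test, and cross-validate monthly precipitation at 54 rain gauge stations in the study area for 2018-2020. The results show that, compared with the traditional random forest method, the proposed method achieves higher forecast accuracy. Its average pass rate across stations for January to December reaches 64%, and its pass rate in June reaches 84%, indicating that the method can serve as an effective tool for long-term precipitation prediction in river basins.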
[Keywords]
[Abstract]
Long-term precipitation prediction refers to forecasting precipitation over a period longer than one month and is a crucial component of integrated water resources management. Owing to multiple sources of uncertainty, its accuracy remains low. Traditional long-term precipitation prediction methods fall into two main groups: dynamical numerical methods and mathematical-statistical methods. Dynamical numerical methods simulate future weather conditions with sea-land thermodynamic models; they have a clear physical basis but are computationally complex. Data-driven mathematical-statistical methods model the statistical relationship between precipitation and its predictors to build long-term prediction models. However, research on statistical precipitation prediction has focused mainly on improving the models, with relatively little attention paid to how the predictors are selected, even though the choice of predictors directly affects prediction accuracy. The key challenge in precipitation prediction therefore lies in selecting, from the many relevant factors, the predictors needed for modeling. Random forest, a flexible, efficient, and easy-to-use machine learning algorithm, has been widely applied in hydrological prediction. The random forest method computes importance scores for the candidate factors and then selects predictors from these scores by experience, a step that offers no control over the proportion of falsely selected predictors. To address this false discovery rate problem in selecting key predictors, this study applies false discovery rate control from multiple hypothesis testing to quality-control predictor selection, shifting it from an experience-dependent to a data-dependent procedure.
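To make "data-dependent" predictor selection concrete, the sketch below shows one well-known FDR-controlling rule, the knockoff+ threshold used with Model-X knockoffs. It is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that each candidate climate index already has a random forest importance score and that its knockoff copy has one too (generating the knockoff copies is not shown). It also assumes the statistic W_j is the importance of the original factor minus that of its knockoff. The function names, the index names, and the target FDR level `q` are all hypothetical.

```python
"""Knockoff+ predictor selection: a minimal sketch under stated assumptions."""


def knockoff_statistics(importance, knockoff_importance):
    # ASSUMED statistic: W_j = importance(original j) - importance(knockoff j).
    # A large positive W_j suggests factor j carries real predictive signal.
    return [z - zk for z, zk in zip(importance, knockoff_importance)]


def knockoff_plus_threshold(w, q=0.1):
    # Scan candidate thresholds (the nonzero |W_j|) from small to large and
    # return the first one whose estimated false discovery proportion <= q.
    for t in sorted({abs(x) for x in w if x != 0}):
        est_false = 1 + sum(1 for x in w if x <= -t)  # the "+1" makes it knockoff+
        n_selected = sum(1 for x in w if x >= t)
        if est_false / max(1, n_selected) <= q:
            return t
    return float("inf")  # no threshold meets the target: select nothing


def select_predictors(w, names, q=0.1):
    # Keep every factor whose statistic reaches the data-dependent threshold.
    t = knockoff_plus_threshold(w, q)
    return [name for name, x in zip(names, w) if x >= t]


if __name__ == "__main__":
    names = [f"index_{i}" for i in range(10)]  # hypothetical climate indices
    w = [5.0, 4.0, 3.5, 3.0, 2.5, 2.0, 1.5, 1.0, -0.5, 0.2]
    # Threshold is 1.0 here, so index_0 through index_7 are selected.
    print(select_predictors(w, names, q=0.2))
```

Unlike a fixed "top k" cut-off, the threshold here moves with the data. The "+1" offset also makes the rule conservative: at q = 0.1 it can select anything only when at least ten factors clear the threshold.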
Finally, a long-term precipitation prediction model is constructed with the random forest algorithm from the selected predictors. Taking the upper Parana River basin in Brazil as the study area, precipitation records from 54 rain gauge stations and 130 climate system indices were analyzed. The "Model-X Knockoff" method was used to identify the predictors that influence precipitation in the corresponding month of the following year, and a monthly precipitation prediction model was built from these predictors. For comparison, the traditional random forest method directly selects the five predictors with the highest importance scores for modeling. The validity of the proposed method was then verified by 10-fold cross-validation and by testing the monthly precipitation predictions for 2018-2020. The 10-fold cross-validation results for the 54 stations show that the proposed method achieves a higher pass rate than the traditional random forest method in every month from January to December, with the highest pass rate, 77%, in June. For 2018-2020, the proposed method achieved an average pass rate of 66% from January to December, outperforming the traditional random forest method (64%). In summary, unlike the traditional random forest method, this study combines multiple hypothesis testing with predictor selection and quality control to build a long-term precipitation prediction model. The model achieves a higher prediction pass rate and better stability, suggesting that the approach can serve as an effective tool for long-term precipitation prediction in river basins.
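The comparison above rests on three pieces of bookkeeping: the top-5 baseline selection, the 10-fold split, and the pass rate. The sketch below illustrates them in plain Python, and it relies on several assumptions. The folds are contiguous and unshuffled. The random forest model itself is omitted. Most importantly, the pass criterion (absolute relative error at most 20%) is a hypothetical placeholder, because the abstract does not state the tolerance the study used. All function names are illustrative.

```python
"""Evaluation bookkeeping sketch: top-k baseline, k-fold splits, pass rate."""


def top_k_predictors(importances, names, k=5):
    # Baseline selection: keep the k factors with the highest importance scores.
    order = sorted(range(len(names)), key=lambda i: importances[i], reverse=True)
    return [names[i] for i in order[:k]]


def kfold_indices(n, k=10):
    # Contiguous, unshuffled folds; the first n % k folds get one extra sample.
    splits = []
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        splits.append((train, test))
        start += size
    return splits


def pass_rate(observed, predicted, tol=0.2):
    # HYPOTHETICAL criterion: a forecast passes if |pred - obs| / obs <= tol.
    # A zero observation passes only when the prediction is also zero.
    def passes(o, p):
        if o == 0:
            return p == 0
        return abs(p - o) / abs(o) <= tol

    hits = sum(1 for o, p in zip(observed, predicted) if passes(o, p))
    return hits / len(observed)


if __name__ == "__main__":
    print(top_k_predictors([0.1, 0.5, 0.3, 0.05, 0.2, 0.4], list("abcdef")))
    print(len(kfold_indices(25, 10)))  # 10 folds
    print(pass_rate([100, 50, 80, 0], [110, 70, 79, 0]))  # 0.75
```

Per station and month, the model would be trained on each fold's `train` indices and scored on its `test` indices, and the resulting pass rates would then be averaged. Contiguous folds are used here only for simplicity.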
[CLC number]
[Funding]