Original paper information:
Prediction and explanation of the formation of the Spanish day-ahead electricity price through machine learning regression
Original link: https://www.sciencedirect.com/science/article/pii/S0306261919302260
Highlights
We propose a regression-tree-based method for modeling electricity price formation.
The explanatory variables are extracted from publicly accessible energy-related data.
The energy-related data are free and published by the TSO in a graphical interface.
The model shows good accuracy in predicting the price formation.
It also allows for a non-linear analysis of the dependence of price on predictors.
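Overview
Until recently, the detailed information on the power system state needed to estimate future spot prices by regression analysis was largely restricted to qualified institutions. However, to ensure operational transparency, the Spanish transmission system operator has launched an information website where a large amount of real-time energy-related data can be consulted through a graphical interface. This gives non-qualified parties the opportunity to develop applications and algorithms that need price forecasts and information on how the price is determined.
The paper explores the use of data extracted from that interface with two aims: predicting the day-ahead price in a simple way, and exploring the influence of the underlying energy drivers on it. For the prediction, the authors specify a quantile regression model based on Gradient Boosted Regression Trees. It improves on the accuracy of multiple linear regression models at the cost of more complexity, while still being simpler to specify than other machine learning methods. The computed metrics show that, when the median is used as the point prediction, the model yields very low prediction errors (RMSE = 2.78 €/MWh, MAE = 1.94 €/MWh, MAPE = 0.059). Interestingly, the quantile regression model also makes it possible to define prediction intervals inherently, which gives a different interpretation of accuracy. The results show that, on average, the prediction error does not exceed 6.8 €/MWh 90% of the time.
The paper also carries out a partial dependence analysis on the model. This implementation, which as far as the authors know is the first used to analyze the formation of electricity prices, proved very useful for detecting highly non-linear relationships.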
Abstract
Until recently, detailed information on the power system state to estimate future spot prices by regression analysis was generally restricted to qualified parties. However, to ensure transparency in operation, the Spanish Transmission System Operator has launched an informative web in which a sizable amount of real-time energy-related data can be consulted through a graphical interface. Undoubtedly, this provides the opportunity for non-qualified parties to develop applications and algorithms in which price forecast and maybe knowledge about how price is determined are required.
This paper approaches the use of data extracted from that interface with two aims: the prediction of the day-ahead price in a simple way, and the exploration of the influence that the underlying energy drivers have on it. For the prediction we specified a quantile regression model based on Gradient Boosted Regression Trees. It improves the accuracy over multiple linear regression models at the cost of more complexity, and still it has simpler specification and tuning compared to other machine learning approaches. The calculated metrics show that our model produces remarkably low prediction errors when using the median as point prediction method (RMSE = 2.78 €/MWh, MAE = 1.94 €/MWh, and MAPE = 0.059). Interestingly, the quantile regression model also allows to inherently define prediction intervals, with a different interpretation of accuracy. Our results show that on average 90% of times the prediction error will not exceed 6.8 €/MWh.
We also implemented a partial dependence analysis on that model. This implementation, as far as we know the first time employed to analyze the formation of electricity prices, has shown to be of significant usefulness in detecting highly non-linear relationships.
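To make the idea of quantile regression with gradient boosted trees more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes a single predictor, depth-1 trees (stumps), and synthetic data. The feature ("demand forecast"), the data-generating formula, and all function names are hypothetical and chosen only for illustration. Each boosting round fits a stump to the negative gradient of the pinball (quantile) loss and sets each leaf to the sample quantile of the residuals that fall in it. The median model serves as the point prediction, and the 5% and 95% models form a 90% prediction interval.

```python
import math
import random

def sample_quantile(values, q):
    """Inverse-CDF sample quantile, a minimiser of the pinball loss."""
    s = sorted(values)
    return s[max(0, math.ceil(q * len(s)) - 1)]

def fit_stump(x, grad):
    """Threshold on x that best fits `grad` in the least-squares sense."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    total, left_sum = sum(grad), 0.0
    best_gain, best_thr = -1.0, None
    for n_left, i in enumerate(order[:-1], start=1):
        left_sum += grad[i]
        if x[i] == x[order[n_left]]:
            continue  # no threshold fits between equal values
        gain = left_sum ** 2 / n_left + (total - left_sum) ** 2 / (len(x) - n_left)
        if gain > best_gain:
            best_gain, best_thr = gain, x[i]
    return best_thr

def fit_quantile_gbrt(x, y, q, n_rounds=60, learning_rate=0.1):
    """Boosted stumps on one feature, trained on the pinball loss at level q."""
    base = sample_quantile(y, q)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the pinball loss: q above the prediction, q - 1 otherwise.
        grad = [q if t > p else q - 1 for t, p in zip(y, pred)]
        thr = fit_stump(x, grad)
        if thr is None:
            break
        resid = [t - p for t, p in zip(y, pred)]
        # Leaf values: shrunken q-quantile of the residuals inside each leaf.
        left = learning_rate * sample_quantile([r for xi, r in zip(x, resid) if xi <= thr], q)
        right = learning_rate * sample_quantile([r for xi, r in zip(x, resid) if xi > thr], q)
        stumps.append((thr, left, right))
        pred = [p + (left if xi <= thr else right) for xi, p in zip(x, pred)]
    return base, stumps

def predict(model, x):
    base, stumps = model
    return [base + sum(l if xi <= t else r for t, l, r in stumps) for xi in x]

# Synthetic data (NOT the paper's data): hypothetical demand forecast vs. price.
random.seed(0)
x = [random.uniform(20, 40) for _ in range(200)]
y = [1.5 * xi + random.gauss(0, 3) for xi in x]

models = {q: fit_quantile_gbrt(x, y, q) for q in (0.05, 0.5, 0.95)}
lower, median, upper = (predict(models[q], x) for q in (0.05, 0.5, 0.95))

# Point-prediction error of the median model versus a constant median baseline.
mae = sum(abs(t - p) for t, p in zip(y, median)) / len(y)
const = sample_quantile(y, 0.5)
baseline_mae = sum(abs(t - const) for t in y) / len(y)
# Share of observations inside the 5%-95% band (in-sample).
coverage = sum(lo <= t <= hi for t, lo, hi in zip(y, lower, upper)) / len(y)
print(f"MAE={mae:.2f} baseline MAE={baseline_mae:.2f} coverage={coverage:.2f}")
```

The numbers printed here say nothing about the paper's results, since the data are synthetic and the coverage is measured in-sample. In practice a library implementation would normally be preferred. For example, scikit-learn's `GradientBoostingRegressor` accepts `loss="quantile"` together with an `alpha` parameter for the quantile level.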
Keywords
Linear regression; Principal components; Quantile regression; Gradient boosting regression; Day-ahead electricity price
Schematics

Fig. 1. Summary of the methodology.
Fig. 2. Pearson correlation between the variables employed in the GBRT and PCR models.
Fig. 8. Variable importance (only the most important, 20 out of 66, are represented) of the percentile 50 prediction model.
Fig. 10. Partial dependence (indeed the deviation, as in Fig. 9) between non-categorical predictors and the predicted day-ahead price. The four categories of predictors described in Section 2.2 are separately plotted. From top to bottom: forecasts, availability of international links, available dispatchable generation, and power generation at 11.00 a.m.
Fig. 11. Partial dependence between coal-based generation and the day-ahead price, depending on the values ahead of 11.00 a.m.
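
The partial dependence curves in Fig. 10 and Fig. 11 can be read with the following sketch in mind. For each grid value of one predictor, that predictor is overwritten in every observation, the model is evaluated, and the predictions are averaged. The model below is a hypothetical stand-in and not the fitted model from the paper. The feature names, coefficients, and data rows are invented for illustration.

```python
def partial_dependence(model, rows, feature, grid):
    """Average model output when `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        preds = [model({**row, feature: value}) for row in rows]
        curve.append(sum(preds) / len(preds))
    return curve

def toy_price_model(row):
    """Hypothetical price model: linear in demand, saturating in wind."""
    return 20.0 + 1.0 * row["demand"] - 2.0 * min(row["wind"], 8.0)

# Invented observations (not from the paper).
rows = [
    {"demand": 25.0, "wind": 2.0},
    {"demand": 30.0, "wind": 6.0},
    {"demand": 35.0, "wind": 10.0},
]
wind_grid = [0.0, 4.0, 8.0, 12.0]
wind_curve = partial_dependence(toy_price_model, rows, "wind", wind_grid)
for value, pd_value in zip(wind_grid, wind_curve):
    print(f"wind={value:5.1f}  partial dependence={pd_value:6.2f}")
```

The curve flattens once the toy model saturates, which is the kind of non-linear shape a partial dependence plot is meant to expose. Subtracting a reference value from the curve would turn it into a deviation. Whether that matches the exact definition used for Fig. 9 and Fig. 10 is not stated in this summary. For fitted scikit-learn estimators, `sklearn.inspection.partial_dependence` offers a ready-made version of the same computation.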