Easily adaptable non-linear global temperature reconstructions

Abstract

Understanding monthly-to-annual climate variability is essential for adapting to future climate extremes. Key ways to do this are through analysing climate field reconstructions and reanalyses. However, producing such reconstructions can be limited by high production costs, unrealistic linearity assumptions, or the uneven distribution of local climate records. Here, we present a machine learning-based non-linear climate variability reconstruction method using a recurrent neural network that is able to learn from existing model outputs and reanalysis data. As a proof-of-concept, we reconstructed more than 400 years of global, monthly temperature anomalies based on sparse, realistically distributed pseudo-station data and show the impact of different training data sets. Our reconstructions show realistic temperature patterns and magnitude reproduction, costing about 1 hour on a mid-range laptop. We highlight the method's capability in terms of mean statistics compared to more established methods and find that it is also suited to reconstructing specific climate events. This approach can easily be adapted for a wide range of regions, periods and variables.

Introduction

As global warming persists, extreme weather and climate events have been highlighted as high-impact threats to our society^1,2. With the Earth's energy imbalance, climate variability is projected to increase non-linearly in response to anthropogenic global warming, a process that governs the occurrence and distribution of extreme climate events^3,4,5,6,7. Unfortunately, state-of-the-art general circulation models (GCMs) lack the ability to represent the correct amount of variability in crucial parts of the climate system. Gaining further insight into the real range of climate variability is therefore of high importance.

To examine real-world climate variability on different temporal and spatial scales under varying climate background conditions, long and global time series are needed. However, the further back in time one goes, the fewer weather and climate observations are available, creating a data scarcity problem. Near-surface temperature, the variable with the longest records, was initially recorded in only a handful of European cities, going back to the late 18th century^10,11,12. Other variables and Southern Hemisphere measurements are considerably scarcer. Indeed, to go further back in time, it is necessary to use a collection of indirect proxy measurements from paleoclimate archives. Compared to observations, climate proxies from paleoclimate archives suffer from reduced temporal resolution and a considerable increase in noise. Moreover, most available documentary and proxy data record summer or growing-season climate, whereas winter climate is usually less well represented.

As such, the climate community has made a concerted effort to explore tools for the temporal and spatial reconstruction of weather and climate data, including methods such as kriging, principal component analysis (regression) and Bayesian algorithms. Outcomes of these efforts include data sets with high spatial and temporal resolution that focus on specific regions rather than global coverage^14,15,16. Climate reconstructions can be improved by optimising the location of input data using metaheuristic methods such as genetic and evolutionary algorithms^17,18,19. Nevertheless, most climate reconstructions today rely on stationarity within and between climate records and often operate under the assumption of linearity in time and space. Climate reanalyses are another possible way to create spatially resolved climate reconstructions^20,21,22,23,24. While they provide a four-dimensional data set for studying climate variability at high resolution for a wide range of variables, they are also limited by the availability of input data (such as stations, documentary data and proxy data). It should be noted that while the assimilation scheme itself can be simple, most climate reanalyses require an expensive ensemble of GCM outputs for the background climate state and the covariance matrix. This prerequisite makes producing a climate reanalysis rather costly, as large volumes of data need to be generated and stored.

Recently, rapid progress has been made in implementing artificial intelligence tools for climate science^25,26,27. In particular, deep learning tools show promise in extracting features of interest from gridded data, forecasting time series and representing physical systems. So far, these deep learning algorithms offer a good compromise between skill and cost, while coping well with the non-linear properties of the data at hand. Their application to reconstructing climate information has been limited to studies of gridded data^31,32 or time series reconstructions^33,34.

Here, we present a novel approach based on a simple recurrent neural network that reconstructs global fields from sparse local data. Our main goal is to demonstrate the effectiveness of basic recurrent neural networks in generating fast, robust climate reconstructions using minimal computational power. As such, we stayed extremely conservative in both the spatial and temporal availability of data and therefore operated on very small sample sizes for a deep learning approach. Although temperature anomalies are reconstructed in this study, our method is flexible and can be applied to reconstruct multiple variables from differing local coordinates and input data types. In support of the United Nations Sustainable Development Goals^35,36, the whole reconstruction process of training and generating more than 4800 months of global temperature anomalies takes about one hour on an average-priced laptop, making climate research more accessible and energy efficient.

Results and discussion

Figure 1 highlights the workflow of our approach. We use monthly near-surface temperature data from three different gridded climate data sets as training data for our global reconstruction: the National Oceanic and Atmospheric Administration (NOAA) 20th Century Reanalysis Version 3 (20CRv3)²² (1836–2015 CE, with 1851–2015 CE used for the training), the Max Planck Institute for Meteorology Grand Ensemble (MPI-GE) GCM³⁷ (1850–2005 CE), and the National Center for Atmospheric Research Community Earth System Model Last Millennium Ensemble (CESM-LME)³⁸ (850–2005 CE). We do so in order to understand the impact that training data can have on our approach and to highlight the strengths and weaknesses of these often-used training data in the climate science space. Furthermore, it helps us understand the amount and variability of data needed for the training. The three data sets differ enough in origin, concept and time frame covered to reveal meaningful differences. 20CRv3 only assimilates surface pressure data and uses sea surface temperature (SST) and sea ice reconstructions as boundary conditions, whereas MPI-GE and CESM-LME are free-running models with transient forcing. All three data sets consist of multiple ensemble members. For most of our study, we focus on the ensemble mean of 20CRv3, both because this is the information most users will use and because it highlights the capabilities of small training sizes for this reconstruction procedure. For all training data sets we calculate monthly temperature anomalies with respect to the period 1951–1980 CE. We utilized these three data sets because they are commonly used in the climate science community, are fully open-access and cover climate information and variables close to the reconstruction goal. Moreover, they are independent of each other, allowing us to study how strongly the choice of training data affects the final reconstructions.
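The anomaly calculation described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `monthly_anomalies`, the `(time, lat, lon)` array layout and the synthetic input are assumptions made for the example; only the 1951–1980 reference period comes from the text.

```python
import numpy as np

def monthly_anomalies(temps, years, months, ref_start=1951, ref_end=1980):
    """Subtract each calendar month's reference-period mean from a (time, lat, lon) field."""
    temps = np.asarray(temps, dtype=float)
    years = np.asarray(years)
    months = np.asarray(months)
    anomalies = np.empty_like(temps)
    in_ref = (years >= ref_start) & (years <= ref_end)
    for m in range(1, 13):
        is_month = months == m
        # Climatology of this calendar month, averaged over the reference years only.
        climatology = temps[is_month & in_ref].mean(axis=0)
        anomalies[is_month] = temps[is_month] - climatology
    return anomalies

# Synthetic input (assumption): 1941-1990 on a 2x3 grid, seasonal cycle plus a linear trend.
years = np.repeat(np.arange(1941, 1991), 12)
months = np.tile(np.arange(1, 13), 50)
seasonal = 10.0 * np.cos(2 * np.pi * (months - 7) / 12)
trend = 0.01 * (years - 1941)
temps = (seasonal + trend)[:, None, None] + np.zeros((1, 2, 3))
anoms = monthly_anomalies(temps, years, months)
```

Because the climatology is computed per calendar month, the seasonal cycle is removed and only the departure from the 1951–1980 mean remains.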

**Fig. 1: Concept of the reconstruction process.**

In each of the gridded products, 25 pseudo-station locations are chosen based on a realistic distribution of historical meteorological station data. All locations are situated in the Northern Hemisphere, with the majority in Western Europe. Using a nearest-neighbour approach, we then extract the grid temperature data for each location in the specific training data set, resulting in 25 monthly near-surface temperature anomaly time series.
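A nearest-neighbour extraction of this kind might look like the sketch below. It assumes a regular latitude-longitude grid, so the nearest cell can be found separately along each axis. The function name, the 5° grid and the four station coordinates are hypothetical and are not the 25 locations used in the study.

```python
import numpy as np

def extract_pseudo_stations(field, grid_lats, grid_lons, station_lats, station_lons):
    """Return a (time, n_stations) array of nearest-grid-cell series from a (time, lat, lon) field."""
    field = np.asarray(field)
    grid_lats = np.asarray(grid_lats, dtype=float)
    grid_lons = np.asarray(grid_lons, dtype=float) % 360.0
    series = []
    for lat, lon in zip(station_lats, station_lons):
        i = np.abs(grid_lats - lat).argmin()
        # Longitude distance has to wrap around 0/360 degrees.
        dlon = np.abs(grid_lons - (lon % 360.0))
        j = np.minimum(dlon, 360.0 - dlon).argmin()
        series.append(field[:, i, j])
    return np.stack(series, axis=1)

# Hypothetical 5-degree grid and four illustrative European coordinates.
grid_lats = np.arange(-87.5, 90.0, 5.0)
grid_lons = np.arange(0.0, 360.0, 5.0)
field = grid_lats[None, :, None] + 0.001 * grid_lons[None, None, :] + np.zeros((24, 1, 1))
station_lats = [52.1, 47.8, 59.3, 51.5]
station_lons = [5.2, 11.0, 18.1, -0.1]
stations = extract_pseudo_stations(field, grid_lats, grid_lons, station_lats, station_lons)
```

The resulting array has one column per pseudo-station and can be passed directly to a sequence model as its input features.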

For the RNN models, we found that most models overfit for the small training size of N = 1980 no matter the training data set at hand. In order to avoid overfitting, dropout values would need to be at least 20%. Recurrent dropout will further reduce the chance for overfitting, but we see no additional benefit from recurrent dropout in the RNN-type models. Increasing the training sample size to N = 20,000 will drastically reduce the chance for overfitting and in most cases a dropout of just 5% is enough to prevent any kind of overfitting. We performed a small sensitivity test with the MPI-GE training set and found that a training size of N = 10,000 with a dropout of 5% is enough to avoid overfitting. A training size of N = 5000 still needs a dropout of 20% to do so.

Overfitting is just one metric to assess the usefulness of a model. In terms of the lowest validation loss, we found that very simple models show the best results. On average over 140 models, we found that one layer GRU and LSTM models with 32 or 64 neurons for smaller training sizes and up to 256 neurons for larger sample sizes beat models with more neurons, more layers or convolutional structures. In cases of large training sizes, a second layer to the LSTM did not show worse results, but due to increased complexity showed substantially higher computational costs. Moreover, dropout rates needed to be slightly higher in two layer models to avoid overfitting. During the training phase, we could not identify a distinct preference of models for specific training data sets.

In the evaluation phase of our study we let the trained models reconstruct 1000 unseen time steps of the training data set. We then assessed their skill for the reconstruction using mean squared error (MSE) and Pearson correlation coefficient (R) metrics, common metrics for regression problems. Due to the fact that the 1000 time steps used for evaluation are still part of a population with similar characteristics, the best MSE scores were achieved by simpler models that would generally overfit, namely one layer RNN, GRU and LSTM models with no dropout. However, adding 5 or 10% of dropout to these models still yields very performant MSE values. For models trained on N = 20,000 samples, more neurons seem to be appropriate, as can be expected. The only training data for which the CNNs showed an acceptable performance was the 20CRv3 multi-member data set. All other data sets preferred any of the RNN architectures.
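The two evaluation metrics can be written in a few lines. The sketch below uses synthetic data in place of the 1000 held-out time steps; the array shapes and the noise level are assumptions for illustration only.

```python
import numpy as np

def mse(truth, recon):
    """Mean squared error over all time steps and grid points."""
    truth = np.asarray(truth, dtype=float)
    recon = np.asarray(recon, dtype=float)
    return float(np.mean((truth - recon) ** 2))

def pearson_r(truth, recon):
    """Pearson correlation coefficient over all flattened values."""
    t = np.asarray(truth, dtype=float).ravel()
    r = np.asarray(recon, dtype=float).ravel()
    t = t - t.mean()
    r = r - r.mean()
    return float((t @ r) / np.sqrt((t @ t) * (r @ r)))

# Synthetic stand-in for 1000 unseen time steps on a small grid.
rng = np.random.default_rng(0)
truth = rng.normal(size=(1000, 4, 8))
recon = truth + 0.5 * rng.normal(size=truth.shape)
print(f"MSE = {mse(truth, recon):.3f}, R = {pearson_r(truth, recon):.3f}")
```

MSE penalises amplitude errors, whereas R is insensitive to a constant offset or scaling, which is why the two metrics are reported side by side.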

For the correlation evaluation, a virtually identical outcome emerges. Simple GRU and LSTM models dominate the performance assessment, small dropout rates have little impact on that performance and models with larger training sizes benefit from more neurons. The robustness of the simple GRU and LSTM architectures for this task was also highlighted by an experiment for which we reduced the time dependence of the prediction in the 20CRv3 data set. We randomized parts of the training data set, reducing the skill that can be learned from the previous time step. In that case, one layer, 64-neuron GRU and LSTM models still outperformed the rest of the models. Naturally, chances of overfitting were reduced in that case and validation losses increased.

To compare the deep learning climate reconstructions with a more established, linear statistical tool, we created a Principal Component Regression (PCR) reconstruction^39,40 trained with the MPI-GE data set. The PCR training sample size is N = 20,000. Compared to the deep learning models, the PCR needs a calibration period which is a time range where pseudo-location information (i.e., testing data) and training data overlap. As the testing data set spans 1602–2003 CE and the MPI-GE data spans 1850–2005 CE, the years in common are 1850–2003, which were chosen as the calibration period for the linear regression. As such we will evaluate the reconstruction for the full period 1602–2003, for months outside the calibration period (1602–1849) and just for the calibration period (1850–2003). Note that the deep learning methods do not require such a calibration period and can be trained purely on an independent data set. That said, due to the fact that the calibration of the PCR substantially removes the bias for the calibration period, we might expect a higher performance of the PCR compared to the deep learning models in those years. We refrain from performing the PCR for all training sizes and training data sets since our point here is to just provide a reference point for the interested reader rather than creating as many reconstructions as possible.
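For readers unfamiliar with PCR, the following sketch shows one generic variant: the target field is reduced to its leading principal components, and these are regressed on the station series over a calibration period. It is not the configuration used in this study. The number of retained modes, the function names and the synthetic data are all assumptions.

```python
import numpy as np

def fit_pcr(stations, field, n_modes=5):
    """Fit a regression from (time, n_stations) predictors to the leading PCs of a (time, n_grid) field."""
    x_mean, y_mean = stations.mean(axis=0), field.mean(axis=0)
    # EOFs (rows of vt) and principal components of the centred target field.
    u, s, vt = np.linalg.svd(field - y_mean, full_matrices=False)
    eofs = vt[:n_modes]
    pcs = u[:, :n_modes] * s[:n_modes]
    # Ordinary least squares from the centred station series to the retained PCs.
    coefs, *_ = np.linalg.lstsq(stations - x_mean, pcs, rcond=None)
    return {"x_mean": x_mean, "y_mean": y_mean, "eofs": eofs, "coefs": coefs}

def predict_pcr(model, stations):
    """Reconstruct the full field from station series with a fitted model."""
    pcs = (stations - model["x_mean"]) @ model["coefs"]
    return pcs @ model["eofs"] + model["y_mean"]

# Synthetic field with three underlying modes plus noise; six grid cells act as stations.
rng = np.random.default_rng(0)
n_time, n_grid = 400, 60
latent = rng.normal(size=(n_time, 3))
patterns = rng.normal(size=(3, n_grid))
field = latent @ patterns + 0.05 * rng.normal(size=(n_time, n_grid))
stations = field[:, [2, 11, 23, 37, 48, 59]]

calib, valid = slice(0, 300), slice(300, None)
model = fit_pcr(stations[calib], field[calib], n_modes=3)
recon = predict_pcr(model, stations[valid])
rel_err = np.mean((recon - field[valid]) ** 2) / field[valid].var()
```

The calibration slice plays the role of the overlap period described above: the regression coefficients are only defined where predictors and target field are both available.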

As we move on to the testing phase of our study, we continue with only one of the performant model architectures in order to highlight the impact of the different training data sets. Out of 140 models, we focus on the output of an LSTM architecture model with 64 neurons and a dropout of 5% to compare our reconstructions with the independent test data sets. That said, as this study is designed as proof-of-concept, the reader can adapt this architecture easily for the task and training sample size at hand. Our goal here is not to tune one model on one data set to produce the most specific performance, but to show a range of outcomes possible for this reconstruction task.

For testing, we extract 2m temperature data from the 25 locations in the updated global atmospheric paleo-reanalysis for the last 400 years (EKF400v2)²⁴. The task of the trained neural network and the PCR is to reconstruct global temperature anomaly fields for 4824 months covering 1602–2003 CE based solely on the 25 local temperature time series. The reconstruction created by the MPI-GE trained LSTM is henceforth called MPI-GE-REC, same for the 20CRv3 and CESM-LME trained reconstructions.

By using temperature anomalies rather than absolute temperature values, no information is given about seasonality. The difficulty for the neural network is thus increased, as it must learn the physical relationships within the spatial domain. Given this a-priori setup, we expect a higher skill in boreal winter temperature anomaly reconstruction due to stronger Northern Hemisphere planetary wave interactions in that season⁴¹. We further expect boreal summer temperature anomalies to be better represented in 20CRv3 than in the coupled climate models due to the assimilated data. With the impact of large-scale circulation prevailing in winter, the different reconstructions should perform rather similarly in boreal winter and differently in boreal summer. We utilize anomaly correlation, mean temperature biases and the ability to reconstruct patterns of variability to assess the skill of the reconstruction. Since we use EKF400v2 as the baseline comparison data set, it is worth noting that this paleo-reanalysis is itself an imperfect data set. Differences between our reconstruction and EKF400v2 must thus be seen as relative differences. We provide context and explanations for stark dissimilarities whenever possible. Moreover, where applicable, we compare the RNN reconstructions to completely independent products or time intervals. Details of the reanalysis and climate models are described in the Methods section.

Reconstructed temporal and spatial variability

Figure 2 shows the temporal correlation between EKF400v2 and the MPI-GE reconstruction, highlighting regions of weak and strong correlation skill. As expected, correlations for the Southern Hemisphere, where no observations were available, are on average substantially lower than for the Northern Hemisphere.

**Fig. 2: Global distribution of reconstruction performance.**

For the tropics, near-surface temperature in the El Niño domain shows only weak correlations. We find that all reconstructions, independent of the training data set, struggle in this region. The Eastern Pacific is a region of high temperature variability and a driver of non-stationary climate teleconnections around the globe. In our context, we define teleconnections as the relationship between a variable or variables over large distances. The values of the correlation coefficients here depend not only on how these climate teleconnections are reproduced by, and learned from, MPI-GE, but also on the quality of the EKF400v2 SST boundary conditions. Especially for the early centuries, EKF400v2 intra-annual SST variability is represented by climatological values combined with El Niño regressions⁴². A region of pronounced negative correlations appears in equatorial West Africa. We find that, in this case, those discrepancies are specific to MPI-GE as the training data set and do not occur for 20CRv3 or CESM-LME. As such, they are a result of data artifacts in MPI-GE (see Supplementary Figs. S1 and S2 for the correlation analysis with the 20CRv3- and CESM-LME-based reconstructions).

For the Northern Hemisphere, central North Atlantic temperature variability is difficult for the RNN, a feature that occurs for every training data set but is less pronounced for 20CRv3. Ship-based pressure observations in 20CRv3 might help to overcome this weakness. An interesting region with lower reconstruction skill is the Western United States, where all training data sets show reduced performance. We find that EKF400v2 does not have high confidence in this region except for boreal summer (JJA), when most data are assimilated there.

As expected, the global mean of the boreal summer correlation coefficients and the associated significance (Fig. 2a) is generally lower than for boreal winter (Fig. 2b). Summer climate patterns are often driven by local, dynamic processes between the surface and the free atmosphere, making them hard to predict via large-scale climate interactions. An example is the Eastern Mediterranean region, which is well covered by assimilated data but governed by rather local climate variability, inhibiting the reconstruction skill of the MPI-GE-trained RNN. Northern Hemisphere winter, on the other hand, is characterised by strong latitudinal temperature differences, producing stronger atmospheric waves that can be used to predict local temperature anomalies. High-latitude regions, where near-surface temperature variability is mostly dominated by large-scale atmospheric circulation anomalies, are well reconstructed. Surprisingly, Southern Hemisphere summer variability is also captured well in our reconstruction.

Furthermore, correlating monthly 2m temperature anomalies (Fig. 2c) results in generally lower correlation coefficients than those produced for seasonal means. This is partly explained by the increased sample size. That said, annual mean correlations (Fig. 2d) show high skill across the globe, taking advantage of smoothing intra-annual variability. Figure 2e, f exhibits the impact of increased noise in the time series. Instead of extracting local data from the EKF400v2 ensemble mean, smoothing inter-member differences (or sampling uncertainty), we extracted pseudo-station data from each of the 30 EKF400v2 members, creating 30 noisy MPI-GE-REC realisations. The ensemble mean of those 30 reconstructions is then employed to calculate the correlation with the EKF400v2 ensemble mean. As anticipated, correlation coefficients are overall lower in this case.

The reconstruction skill can be improved by increasing the size of the training data set. Figure 2g shows the impact of increasing the training sample size by a factor of 10. Among the improved regions are the aforementioned Western United States, equatorial Western Africa and the Eastern Mediterranean. Southern Hemisphere regions profit less from prolonged training. Lastly, Fig. 2h investigates the annual correlation skill with the completely independent Last Millennium Reanalysis (LMRv2, available in annual resolution only)⁴³. Since LMRv2 uses a different approach than EKF400v2 to reproduce annual climate variability and is completely independent of the pseudo-station data used to generate the reconstruction, global correlation skill is lower, as can be expected (for a direct comparison between EKF400v2 and LMRv2 see Fig. S7). Similar regions appear to be challenging for the RNN reconstruction, such as the Southern Ocean, the El Niño domain, the central North Atlantic and the Eastern Mediterranean. However, no additional inconsistencies or artifacts appear in this analysis. It is worth mentioning that both baseline data sets, LMRv2 and EKF400v2, are imperfect reconstructions and carry uncertainties into the correlation analysis.

To investigate the time-dependent performance of RNN reconstructions, Fig. 3 displays field correlations for boreal summer and winter. Compared to the previous temporal correlation analysis, field correlation uses co-variance in space across the full grid.
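The difference between the two correlation measures can be made concrete with a short sketch. Temporal correlation yields one value per grid point (a map), whereas field correlation yields one value per time step (a time series). The cos(latitude) area weighting and the removal of the spatial mean are assumptions made for this illustration and are not stated in the text.

```python
import numpy as np

def temporal_correlation(a, b):
    """Pearson correlation along time at each grid point; inputs are (time, lat, lon)."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    return (a * b).sum(axis=0) / np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))

def field_correlation(a, b, lats):
    """Area-weighted pattern correlation across the grid at each time step."""
    # Assumed weighting: grid-cell area is proportional to cos(latitude).
    w = np.cos(np.deg2rad(lats))[None, :, None] * np.ones_like(a)
    w = w / w.sum(axis=(1, 2), keepdims=True)
    a = a - (w * a).sum(axis=(1, 2), keepdims=True)
    b = b - (w * b).sum(axis=(1, 2), keepdims=True)
    cov = (w * a * b).sum(axis=(1, 2))
    return cov / np.sqrt((w * a ** 2).sum(axis=(1, 2)) * (w * b ** 2).sum(axis=(1, 2)))

# Synthetic reference and reconstruction on a small grid.
rng = np.random.default_rng(1)
lats = np.linspace(-75, 75, 6)
truth = rng.normal(size=(120, 6, 12))
recon = truth + 0.5 * rng.normal(size=truth.shape)
r_time = temporal_correlation(truth, recon)      # map, shape (6, 12)
r_field = field_correlation(truth, recon, lats)  # series, shape (120,)
```

Averaging `r_field` over decades would give a time-dependent skill curve of the kind discussed for boreal summer and winter.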

**Fig. 3: Reconstruction performance over time.**

Looking at the time series for boreal summer (Fig. 3a), the 20CRv3-REC exhibits a substantially higher correlation skill compared to the MPI-GE-REC and CESM-LME-REC, with MPI-GE-REC outperforming CESM-LME-REC for most decades. This can be explained by the more realistic depiction of summer climate in 20CRv3 (and EKF400v2) due to assimilation of surface pressure data. The CESM-LME-REC shows negative or insignificant correlation coefficients for two thirds of the reconstructed era. This behaviour could be due to a substantial weakness of CESM-LME in reproducing JJA climate patterns, the lower spatial resolution of the original product, or the fact that CESM-LME covers a much wider range of climate background states (going back to 850 CE), from which we randomly sample only 1980 months. As such, it is possible that during the training of the LSTM we sampled climate conditions that are very different from the ones in the period 1602–2003. However, decadal variability and a general increase in correlation skill over time are visible in all three reconstructions. Decadal variability is determined by how well the three climate data sets reproduce temperature impacts of events such as volcanic eruptions or the El Niño–Southern Oscillation. Generally, the reconstructions based on ensemble mean pseudo-stations show higher correlation than those based on the noisier input data. This difference decreases with time due to higher certainty in the EKF400v2 ensemble.

The positive trend in correlation skill over time is mostly governed by better representation of climate patterns with time in EKF400v2 as well as more familiar climate background conditions for the training data sets (with MPI-GE and 20CRv3 starting in 1851). The remarkable drop of skill for the 20CRv3-REC in the 20th century is an artifact due to a decrease of global mean co-variance in both 20CRv3-REC and EKF400v2. Using anomalies with respect to 1951–1980 reduces the spatial co-variance in both ensemble mean products for this time period, resulting in reduced correlation skill. This feature is less prevalent in the model-based reconstructions since we computed anomalies with respect to an all-member average climatology, leaving individual members with a higher co-variance for that time period. We speculate that this feature appears only in JJA due to the higher spatial climate variability in individual December–January–February (DJF) seasons.

In winter, EKF400v2 assimilates less data than in summer. Nevertheless, like all training data sets, EKF400v2 winter climate is more influenced by planetary wave interactions which act as a source of skill for inter-continental climate reconstructions. This leads to all three reconstructions showing very similar overall skill, decadal variability and improvement over time (Fig. 3b) for boreal winter patterns. Therefore all three training data sets provide a similar picture of DJF near-surface temperature variability, with EKF400v2 itself being more model-like. Here, MPI-GE-REC even outperforms the 20CRv3-REC consistently over all decades. This is a testament to how well MPI-GE manages to represent boreal winter climate tele-connections. The improvement of correlation skill over time can most likely be attributed to the incorporation of instrumental data in EKF400v2 (rather than warm season proxies and documentary data) as well as to the aforementioned familiarity of the training data sets with the reconstructed era.

Performing the same field correlation analysis using the independent period of 1836–1850, and comparing to the 20CRv3 ensemble mean rather than to EKF400v2, we see generally the same behaviour as in Fig. 3, with our 20CRv3-REC outperforming EKF400v2. During DJF, all reconstructions show equal performance (see Supplementary Fig. S3). That said, 20CRv3 in these early periods can be assumed to be highly uncertain.

Finally, we evaluate reconstructed climate variability by examining the leading principal components of 2m temperature anomalies in the global reconstructions. We find realistic first and second rank empirical orthogonal functions (EOFs) for monthly 2m temperature anomalies in all reconstructions compared to EKF400v2 (see Supplementary Fig. S4). The 20CRv3-REC EOFs show the closest resemblance to EKF400v2 due to improved representation of summer near-surface temperature variability (Fig. 3a). Increasing the number of observations would naturally improve the representation of EOFs, especially for the Southern Hemisphere.
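As an illustration of how leading EOFs can be obtained from an anomaly field, the sketch below uses a singular value decomposition with sqrt(cos(latitude)) weighting. The weighting, the function name and the synthetic single-mode test field are assumptions. The sign of each EOF is arbitrary, so patterns from different reconstructions should be compared up to a sign flip.

```python
import numpy as np

def leading_eofs(anoms, lats, n_modes=2):
    """Leading EOF patterns, PCs and explained-variance fractions of a (time, lat, lon) anomaly field."""
    nt, nlat, nlon = anoms.shape
    # sqrt(cos(lat)) weighting so that the covariance is weighted by grid-cell area.
    w = np.sqrt(np.cos(np.deg2rad(lats)))[None, :, None]
    x = ((anoms - anoms.mean(axis=0)) * w).reshape(nt, nlat * nlon)
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    eofs = vt[:n_modes].reshape(n_modes, nlat, nlon)
    pcs = u[:, :n_modes] * s[:n_modes]
    return eofs, pcs, explained[:n_modes]

# Synthetic field dominated by a single spatial pattern plus weak noise.
rng = np.random.default_rng(2)
lats = np.linspace(-75, 75, 7)
lons = np.arange(0, 360, 30)
pattern = np.sin(np.deg2rad(lats))[:, None] * np.cos(np.deg2rad(lons))[None, :]
amplitude = rng.normal(size=240)
anoms = amplitude[:, None, None] * pattern + 0.1 * rng.normal(size=(240, 7, 12))
eofs, pcs, explained = leading_eofs(anoms, lats)
```

In this synthetic case the first EOF recovers the imposed pattern and its principal component follows the imposed amplitude series.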

We can thus argue that, provided the training data set exhibits all the properties of realistic temperature variability, the neural network is able to pick up and reproduce those properties accordingly.

Mean biases of reconstructed fields

The evaluation of temperature magnitude biases depends on the training data set, the number of stations and the imperfections of the baseline data set. Figure 4 examines temperature biases for two time intervals: an overall assessment for the 400 reconstructed years and a shorter period of 15 years (1836–1850) to allow comparison with 20CRv3. The MPI-GE-REC shows widespread positive temperature deviations for most continental and marine grid points with respect to EKF400v2 (Fig. 4a). The strongest biases occur in the Southern Ocean, probably as a result of differences between the coupled ocean in MPI-GE and the reconstructed SSTs in EKF400v2. Polar regions, where EKF400v2 has almost no assimilated data, are slightly cooler in the MPI-GE-REC. On the other hand, the comparison with LMRv2 highlights the Arctic as the main region with positive temperature biases in the MPI-GE-REC (Fig. 4b). As such, EKF400v2 itself might tend towards overly warm temperature anomalies in the Arctic. Globally, the comparison with LMRv2 shows reduced biases for all regions outside the Arctic (Fig. 4c). Comparing both paleo-reanalyses reveals strong differences in the Southern Ocean and higher temperature anomalies in LMRv2 globally, except for the Arctic.

Analysing a much shorter period in the 19th century reveals similar biases between EKF400v2 and the MPI-GE-REC (Fig. 4d): positive temperature biases globally except over the Arctic. This pattern changes when compared to 20CRv3, apart from mismatches in the Southern Ocean. Strong positive temperature biases occur over Northern Eurasia, with strong negative biases over large parts of North America. Meanwhile, weak negative temperature biases dominate most of the remaining continents (Fig. 4e). Those discrepancies are generally amplified in the comparison between EKF400v2 and 20CRv3 (Fig. 4f). These results support the notion that EKF400v2 might be too warm in the European Arctic and too cold for many continental regions. It is not surprising that Fig. 4e, f shows generally similar patterns, since the MPI-GE-REC uses pseudo-stations based on EKF400v2 grid values (spatial biases for the 20CRv3-REC and CESM-LME-REC reconstructions are shown in Supplementary Figs. S5 and S6).

To examine the impact of the different training data sets on the temperature bias, Fig. 4g displays the kernel density distribution of all temperature values over all grid points and months for EKF400v2 and the different RNN reconstructions. Choosing a reference period in the second half of the 20th century results in the distribution median for the whole time period being located in the slightly negative temperature anomaly range. In general, the RNN reconstructions show a narrower distribution with fewer extreme values. This behaviour is expected when comparing few pseudo-stations to a product assimilating much more data. Interestingly, the 20CRv3-REC achieves a wider distribution compared to the model-based reconstructions, probably due to better representation of summer temperature patterns. Another feature present in Fig. 4g is the overall shift of the distribution towards more positive temperature values in the RNN reconstructions. This might reflect the challenges of the RNN reconstructions in capturing cold extremes, but could also be a result of biases in EKF400v2 (Fig. 4c, f).

Focusing on the Northern Hemisphere, the distribution of RNN-reconstructed temperature anomalies becomes more aligned with EKF400v2 (Fig. 4h). Interestingly, the MPI-GE-REC's median is now closest to EKF400v2, although extreme values are still undersampled. Nevertheless, the positive temperature bias in the RNN reconstructions persists for the Northern Hemisphere sub-sample. Overall, the RNN reconstructions are able to reproduce a realistic distribution, where the exact shape and median depend on the training data sets and the number of pseudo-stations.

So far we have analysed the overall statistical performance of the MPI-GE-REC. However, climate reconstructions are often used to investigate case studies and as such should reproduce individual events with the correct magnitude and location. In order to have a good amount of independent data to compare to, we analysed several case studies from early 19th century cold seasons (Oct–May) (see Supplementary Figs. S8–S12). As independent data sets we utilize (an experimental version of) 20CRv3, EKF400v2 and a recently released Bayesian cold season reconstruction for the 19th century (CSR)¹⁶ in order to compare temperature anomalies to MPI-GE-REC. We find that 20CRv3 is often the outlier among all four data sets, with the least sophisticated product, MPI-GE-REC, placing maxima and minima in locations reasonably similar to those in EKF400v2 and CSR. In general, MPI-GE-REC shows a slightly reduced anomaly magnitude compared to the other three products. This behaviour is a result of the temperature anomaly magnitudes represented in the training data sets, as well as the lower magnitudes of the pseudo-station anomalies compared to the magnitude of the assimilated data in the reanalyses. Still, the magnitude of anomalies depends on the setup of the RNN and can be improved with different input and training data sets. Note that due to the temporal limitations of the CSR, we are forced to use a different reference period. Thus CSR anomalies should mostly be seen as a qualitative check for the location of anomalies, since we would expect positive anomalies to be weaker and negative anomalies to be stronger when compared to 1951–1980.

Linear versus non-linear reconstructions

In order to assess the performance of MPI-GE-REC compared to a PCR reconstruction, Fig. 5 shows correlation coefficients between the two reconstructions and EKF400v2 temperature anomalies. Since the PCR reconstruction requires a calibration period, we split Fig. 5 into three time slices: (1) the complete time period 1602–2003 CE, (2) the pre-calibration period 1602–1849 CE and (3) the calibration period 1850–2003 CE (which is the overlap of the time span covered by MPI-GE and EKF400v2). Note that we expect the PCR to benefit greatly from this bias-reduction procedure, which the LSTM cannot access. We show this performance assessment for the MPI-GE training data set only, due to time and space constraints. For all three time periods, we see a slightly better performance of the LSTM on a global level. Interestingly, the strongest difference in performance can be found for the pre-calibration period, possibly due to more flexibility and non-linearity in the neural network. That said, striking regional differences appear. The deep learning reconstruction substantially outperforms the PCR reconstruction in the northern extra-tropics. The PCR reconstruction, on the other hand, shows a better performance over the tropics and parts of the extra-tropical oceans. While MPI-GE artifacts have stronger negative impacts for the PCR reconstruction, the Southern Ocean is better represented by these linear models. This seems to hint towards a simpler, systematic bias in the SSTs that can be reduced by learning from the calibration method, whereas the artifacts over western Africa and Northern India are less systematic.

**Fig. 5: Performance compared to an established reconstruction method.**

We explain the strong performance of the deep learning-based reconstruction by its ability to evaluate the evolution from month to month and from location to location, as well as its ability to better reconstruct non-linear features. Atmospheric dynamics in the northern extra-tropics are driven by large planetary waves that move information and energy across longitudes and through time. This temporal dependence is better captured by the LSTM, as is the extrapolation to new atmospheric conditions from training to prediction. Furthermore, month-to-month signal propagation is much stronger in the extra-tropics than in the tropics, where seasonality is considerably weaker.

Regarding the strong performance of the PCR in the tropics, we hypothesize that the calibration between MPI-GE ocean and EKF400v2 SSTs allows the PCR to obtain the "correct" climate teleconnections and tropical features (correct meaning close to the target, in our case EKF400v2) and to remove over- or underestimation of climate signals. Furthermore, signal transport in the oceans can be much slower than in the atmosphere, meaning that multiple timescales probably need to be implemented in the training process to reconstruct both oceans and atmosphere in the best possible way. On the other hand, tropical climate over land is mostly defined by daily weather, which the LSTM cannot learn from monthly anomalies.

Nevertheless, global performance can be improved with different pseudo-station locations. Due to the nature of the method, the PCR is allowed to have a glimpse at the test data set (something that is not permitted for neural network-based models), which in turn is constrained to reality by the assimilation of real-world data. We find the LSTM to be an excellent complement to the PCR, being especially skilful in reconstructing extra-tropical regions.

Conclusion

With the recent convergence of open-access big-data availability and user-friendly, easy-to-implement machine learning software packages, there is a huge untapped potential for providing non-linear, flexible climate field reconstructions and services to the community. By utilising existing data, users can keep storage and energy consumption to a minimum, while at the same time contributing to the United Nations Sustainable Development Goals.

To prepare for the challenges of the 21st century, equalizing access to and production of knowledge needs to be a key goal for the upcoming decades. Our approach shows that machine learning generated knowledge can help decentralise climate expertise. The RNN reconstructions can be created in less than an hour on an average-priced laptop, operating solely with open-access, open-source software and data. Adding GPU access will increase the speed of the reconstruction, but is not crucial.

We focus here on a grid reconstruction problem, a typical issue in the (paleo-) climate community. By using a very conservative approach with small sample sizes (especially for deep learning), we could produce a realistic, robust global temperature reconstruction. Compared to previous gridded climate reconstruction techniques³¹,³², we use sparse time series to reconstruct a grid of temperatures with four orders of magnitude more data. However, our methodology can be improved in a few ways: (a) more data can be incorporated; (b) meta-heuristic algorithms can be applied to find optimal measuring locations; (c) seasonal information and co-variate data can be incorporated as an additional layer; (d) longer training periods can be performed; (e) hyper-parameters and loss function can be tuned more vigorously and (f) larger ensembles of reconstruction can be computed.

In terms of training data set selection, we could see that data assimilation in 20CRv3 helps to reconstruct summer features correctly. This comes with the drawback of smaller training sizes. In the end, it is up to the user to decide which features are most important for the reconstruction and to choose the training data set accordingly. In our case, we would refrain from recommending CESM-LME for reconstructing EKF400v2, since it shows apparent flaws compared to 20CRv3 and MPI-GE.

On the other hand, additional challenges could occur when modifying our method. 2m temperature is one of the easiest climate variables to reconstruct. Short-term, local processes such as precipitation will be more challenging for the RNN to learn, while large-scale, long-term processes like SSTs will be easier to reconstruct. In-situ data like proxy or station data will be less homogeneous, less complete, and noisier than our pseudo-station data. We tried to evaluate the impact of additional noise in the input data and found an expected decrease in reconstruction skill. To counteract the impact of noise, we find that training with noisy data, such as using 20CRv3 individual members rather than the ensemble mean, and then creating large ensembles of reconstructions is a good solution (not shown). Moreover, we find that both long-term variability patterns expressed in EOFs and specific, single-season case study temperature anomaly patterns can be reconstructed by an RNN.

We see potential for this approach to reconstruct climate variables from (paleo) climate archives such as tree rings, coral and ice cores, even without forward-modelling temperature from the proxy in the first place. Future studies will investigate different types of input data in time and space, with varying uncertainty and resolution. Moreover, with novel deep learning architectures, non-numeric historical documents could also be used as input information, reconstructing climate fields from text and image data sets. Needless to say, reconstructions do not need to be global, but can focus on a region of interest. It is also possible to reconstruct multiple variables from single variable input data. In the context of paleo-reanalyses, spatial RNN reconstructions of specific variables or regions could then be assimilated, reducing the uncertainty of the reanalysis outcome. With paleo-reanalyses performing best during warm seasons, the tested PCR performing best over the tropics and our promising RNN performing best for boreal winter and the extra-tropics, all three spatial reconstruction methods complement each other excellently.

Methods

Neural Network Architecture

Overall we investigated 20 different models using three different data sets and two different training sizes for this study. The models used were the simple Recurrent Neural Network (RNN), Long Short-Term Memory models (LSTM), Gated Recurrent Unit models (GRU) and 1-dimensional Convolutional Neural Networks (CNN). Each of these models was then executed with a variety of dropout and layer architectures. Summed up over all these different contributors, we analysed 140 different deep learning models for this climate reconstruction task (a list of those models can be found in the Code Availability git repository). After evaluation of those models, we decided to focus on one long short-term memory (LSTM) recurrent neural network with 955,232 trainable parameters, one layer of 64 neurons and 5% dropout. An LSTM is a type of neural network designed to memorize sequences of data for multiple time steps⁴⁴. Through a series of feedback connections, the LSTM processes the input information sequentially, storing the previous hidden state through time. This makes it an excellent architecture to reconstruct the climate of the past using observational and proxy time series. In our case, we utilized a small LSTM with an output dimensionality of 64 neurons and the hyperbolic tangent as activation function, followed by a dense layer whose 18,432 outputs were reshaped into a grid of 96 × 192 temperature points. The neural network was trained with three features (i.e., latitude, longitude, and 2m temperature anomaly) using the ADAM optimizer⁴⁵ with a learning rate of 10⁻⁴ and mean squared error as loss function. Five ensemble members were then produced and subsequently averaged for each neural network output, creating a total of 4824 reconstructed months of mean 2m temperature anomalies. Note that the training and validation data sets comprise 80% and 20% of the available information, respectively. With this setup, the training process takes less than one hour on the CPU of an average-priced laptop and five to ten minutes on a standard GPU (such as an NVIDIA GeForce RTX 3050 Ti).
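The bookkeeping around the network (80%/20% split, mean squared error loss, averaging of five ensemble members) can be sketched in plain Python. This is a minimal illustration under stated assumptions: the split is taken as sequential, which the text does not specify, and all function names and toy values are hypothetical rather than the study's implementation.

```python
def train_validation_split(samples, train_fraction=0.8):
    """Split a sequence into a leading training and a trailing validation part.

    Assumption: a sequential split; the study may have split differently.
    """
    cut = int(round(len(samples) * train_fraction))
    return samples[:cut], samples[cut:]

def mean_squared_error(predicted, observed):
    """Mean of the squared differences between two equally long sequences."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)

def ensemble_mean(members):
    """Average several equally shaped member outputs element by element."""
    n = len(members)
    return [sum(values) / n for values in zip(*members)]

# 1980 training months, as used for the smaller training size.
train, validation = train_validation_split(list(range(1980)))
print(len(train), len(validation))

# Five toy ensemble members, each predicting two grid points.
members = [[0.1, -0.2], [0.3, 0.0], [0.2, -0.1], [0.0, -0.3], [0.4, 0.1]]
print(ensemble_mean(members))
```

In practice each member would be a full 96 × 192 field per month; the averaging step is identical.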

Selection of pseudo-station data

To select realistic locations for possible temperature records, we use the International Surface Temperature Initiative (ISTI) data bank⁴⁶ and select 25 stations with records that continuously span multiple centuries and thus go back to the 19th or 18th century (Table 1). Since this study represents a proof of concept, the absolute number of locations is not critical for the reconstruction process. That said, the more locations are available, the better the reconstruction will be. Our sample selection was based on a semi-realistic distribution of historical temperature data in time and space. This conservative approach means that all of our station locations are in the Northern Hemisphere, most of them in Europe or North America. Note that these locations by no means sample the best possible co-variability in space for 2m temperature, especially since many stations are clustered in Central Europe. As such, not only more data but also optimized data locations will improve the results.

Table 1 Selection of long-term ISTI stations used to sample pseudo-station locations in EKF400v2.

Since the selected ISTI stations contain missing data and other inhomogeneities in their records, we sample EKF400v2 2m temperature values with a nearest-neighbour approach at the respective ISTI locations. This approach gives us 25 continuous and homogeneous 2m temperature time series. That said, this approach also smooths the temperature record, since the temperature of a model grid cell represents the average temperature over a larger area. We try to assess the impact of the smoothing on the reconstruction process by looking at individual ensemble members in EKF400v2 (see below).
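Nearest-neighbour sampling of a gridded field at a station location can be sketched as follows. This is a minimal illustration, assuming a regular 96 × 192 latitude-longitude grid with 1.875° spacing (the true T63 Gaussian grid has slightly uneven latitudes) and a hypothetical station coordinate; the function names are illustrative.

```python
def nearest_index(axis_values, value):
    """Index of the axis entry closest to value."""
    return min(range(len(axis_values)), key=lambda i: abs(axis_values[i] - value))

def nearest_lon_index(lons, lon):
    """Nearest index on a periodic longitude axis given in 0-360 degrees."""
    lon = lon % 360.0

    def circular_distance(i):
        d = abs(lons[i] - lon) % 360.0
        return min(d, 360.0 - d)

    return min(range(len(lons)), key=circular_distance)

def sample_pseudo_station(field, lats, lons, station_lat, station_lon):
    """Return the grid-cell value nearest to the station location."""
    i = nearest_index(lats, station_lat)
    j = nearest_lon_index(lons, station_lon)
    return field[i][j]

# Assumed regular grid (illustration only).
lats = [-89.0625 + 1.875 * k for k in range(96)]
lons = [1.875 * k for k in range(192)]

# Toy field that encodes its own indices, so the selected cell is visible.
field = [[1000 * i + j for j in range(192)] for i in range(96)]

# Hypothetical station coordinates, not taken from Table 1.
value = sample_pseudo_station(field, lats, lons, station_lat=47.8, station_lon=11.0)
print(value)
```

Repeating the lookup for every month of the reanalysis would produce one continuous pseudo-station time series per location.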

Paleo-reanalyses and cold season reconstruction

As the source data set for the pseudo-station data as well as the benchmark data set for global 2m temperature anomalies, we use the updated global atmospheric paleo-reanalysis covering the last 400 years (EKF400v2)²⁴. EKF400v2 is based on the ECHAM5.4 general circulation model in an atmosphere-only configuration with reconstructed sea surface temperatures and provides a monthly, fully global reanalysis for the years 1602–2003 CE. It consists of 30 ensemble members and assimilates instrumental, documentary and proxy data using an offline Ensemble Kalman Filter approach. EKF400v2 performed well in comparison with independent records²⁴. We downloaded all 30 individual members for monthly 2m temperature fields at T63 resolution from https://cera-www.dkrz.de/WDCC/ui/cerasearch/entry?acronym=EKF400v2.0. We compute monthly, member-specific 2m temperature anomalies with respect to the 1951–1980 CE monthly climatology. We use nearest-neighbour selection to obtain pseudo-station data from the EKF400v2 grids, resulting in 25 time series with N = 4824 months for the reconstruction. To investigate the impact of uncertainty in the data, we use both single-member and ensemble-mean pseudo-proxy data for the neural network reconstruction.
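The anomaly calculation relative to a 1951–1980 monthly climatology can be sketched for a single time series. This is a minimal illustration, assuming the series starts in January of its first year and using synthetic values; the function names are hypothetical.

```python
import math

def monthly_climatology(values, start_year, ref_start=1951, ref_end=1980):
    """Mean of each calendar month over the reference years (inclusive).

    Assumes values[0] is January of start_year.
    """
    sums, counts = [0.0] * 12, [0] * 12
    for k, v in enumerate(values):
        year, month = start_year + k // 12, k % 12
        if ref_start <= year <= ref_end:
            sums[month] += v
            counts[month] += 1
    return [s / c for s, c in zip(sums, counts)]

def monthly_anomalies(values, start_year, ref_start=1951, ref_end=1980):
    """Subtract the calendar-month climatology from every value."""
    clim = monthly_climatology(values, start_year, ref_start, ref_end)
    return [v - clim[k % 12] for k, v in enumerate(values)]

# Synthetic monthly series for 1602-2003 CE: seasonal cycle plus a small trend.
n_months = (2003 - 1602 + 1) * 12
series = [10.0 * math.cos(2.0 * math.pi * (k % 12) / 12.0) + 0.001 * k
          for k in range(n_months)]

anoms = monthly_anomalies(series, start_year=1602)
print(n_months, len(anoms))
```

The series length of 402 years × 12 months reproduces the N = 4824 months quoted in the text.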

For an independent evaluation of annual 2m temperature anomalies, we use the updated Last Millennium Reanalysis (LMRv2)⁴³. LMRv2 also uses an offline Ensemble Kalman Filter approach, but key differences exist compared to EKF400v2. Firstly, LMRv2 only assimilates proxy data, rather than instrumental or documentary data. It further uses the CCSM4 Last Millennium simulation as the source of the prior, where the prior is generated by randomly drawing 100 years. LMRv2 covers the time period 0–2000 CE, and we use an ensemble mean of the years 1602–2000 for comparison with EKF400v2 and the neural network reconstruction. We downloaded the LMRv2.1 data on T42 resolution from https://atmos.washington.edu/~hakim/lmr/LMRv2/. LMRv2 2m temperature data is only available as anomalies with respect to the 1951–1980 CE climatology.

For an independent evaluation of cold season 2m temperature anomalies, we use a newly published climate field reconstruction for the years 1701–1905 (CSR)¹⁶. The October–May average temperature field reconstructions were computed using a Bayesian reweighting method with the same ECHAM5.4 model run of EKF400v2 as prior, however using completely independent, newly digitized plant and ice phenology data to constrain the weights of each field. We downloaded the CSR data on T63 resolution from https://doi.org/10.1594/PANGAEA.934288.

Training data

For training we investigate three different gridded climate data sets; one climate reanalysis and two coupled general circulation models.

For reanalysis data, we use the National Oceanic and Atmospheric Administration 20th Century Reanalysis Version 3 (20CRv3)²². Monthly mean, ensemble mean (of 80 ensemble members) 2m temperature data for 1836–2015 CE on a 1 degree spatial resolution were downloaded from https://psl.noaa.gov/data/gridded/data.20thC_ReanV3.html. This reanalysis assimilates surface pressure observations only, with sea surface temperature and sea ice reconstructions as boundary conditions. We use the years 1851–2015 CE (1980 months) for training purposes and leave out the years 1836–1850 for independent evaluation. 20CRv3 was found to represent global climate variability in terms of magnitude, timing and spatial precision, with substantial improvements over previous versions⁴⁷. 2m temperature is among the most skilful variables in the data set, and as such we assume a realistic representation of near surface temperature inter-continental correlations for the neural network to learn.

The second data set we use for training purposes is the Max Planck Institute for Meteorology Grand Ensemble (MPI-GE)³⁷. MPI-GE consists of 100 members of the well-tested MPI-ESM1.1⁴⁸ model, run on four different forcing scenarios, including a historical forcing (1850–2005 CE) as well as RCP2.6, RCP4.5, and RCP8.5 (2006–2099 CE). We only use the historical forcing ensemble data on a monthly resolution and download 2m temperature data on a T63 resolution from the DKRZ ESGF-CoG Node https://esgf-data.dkrz.de/search/mpi-ge/. MPI-GE was found to capture temperature variability skillfully⁴⁹. The total amount of available months for training in the historical forcing setup is N = 187,200. Out of this data pool, we randomly sub-sample the data to either 1980 or 20,000 months for our training routine.
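The size of the data pool and the random sub-sampling can be sketched as follows. This is a minimal illustration, assuming sampling without replacement from a flat index over member, year and month; the text does not state the sampling details, and the function names and seed are hypothetical.

```python
import random

N_MEMBERS = 100
FIRST_YEAR, LAST_YEAR = 1850, 2005
N_YEARS = LAST_YEAR - FIRST_YEAR + 1          # 156 years of historical forcing
POOL_SIZE = N_MEMBERS * N_YEARS * 12          # all available months

def subsample_months(pool_size, n_samples, seed=0):
    """Draw n_samples distinct flat indices from the pool (assumed scheme)."""
    rng = random.Random(seed)
    return rng.sample(range(pool_size), n_samples)

def decode_index(idx, n_years=N_YEARS, first_year=FIRST_YEAR):
    """Map a flat index back to (member, year, month), with month in 1..12."""
    member, rest = divmod(idx, n_years * 12)
    year_offset, month = divmod(rest, 12)
    return member, first_year + year_offset, month + 1

small = subsample_months(POOL_SIZE, 1980)
large = subsample_months(POOL_SIZE, 20000)
print(POOL_SIZE, len(small), len(large))
print(decode_index(small[0]))
```

The pool size of 100 members × 156 years × 12 months matches the N = 187,200 quoted above.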

The third data set we use for training purposes is the National Center for Atmospheric Research Community Earth System Model Last Millennium Ensemble (CESM-LME)³⁸. This data set includes 36 last millennium simulations for the years 850–2005 CE from the NCAR CESM-CAM5_CN general circulation model, of which 13 members include the full transient forcing (solar radiation, aerosols, greenhouse gases, land-use/land-cover conditions and orbital parameters). We downloaded monthly 2m temperature fields at a spatial resolution of 1.9° × 2.5° for the CESM-LME period 851–2005 CE from https://www.cesm.ucar.edu/projects/community-projects/LME/data-sets.html. This data set has been used successfully in various studies that try to disentangle the contributions of internal and external climate variability⁵⁰. The total amount of available months for training in this forcing setup is N = 180,024. Out of this data pool, we randomly sub-sample the data to either 1980 or 20,000 months for our training routine.

For all training data sets, we compute monthly 2m temperature anomalies with respect to the 1951–1980 CE monthly climatology. We use bilinear interpolation to regrid all training data to the EKF400v2 grid, i.e., T63 with a spatial resolution of 1.875°. We also use two-sided t-tests to assess the statistical significance of the results shown in Figs. 2 and 3.
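Bilinear regridding between two rectilinear grids can be sketched in a few lines. This is a minimal illustration, assuming ascending coordinate axes and target points inside the source grid; it ignores longitude wrap-around and pole handling, which a production regridding tool would treat explicitly. The function names and toy grids are hypothetical.

```python
import bisect

def bilinear(field, lats, lons, lat, lon):
    """Bilinearly interpolate field (indexed [lat][lon]) at one point.

    Assumes ascending axes and a point inside the grid.
    """
    i = min(max(bisect.bisect_right(lats, lat) - 1, 0), len(lats) - 2)
    j = min(max(bisect.bisect_right(lons, lon) - 1, 0), len(lons) - 2)
    ty = (lat - lats[i]) / (lats[i + 1] - lats[i])
    tx = (lon - lons[j]) / (lons[j + 1] - lons[j])
    south = (1 - tx) * field[i][j] + tx * field[i][j + 1]
    north = (1 - tx) * field[i + 1][j] + tx * field[i + 1][j + 1]
    return (1 - ty) * south + ty * north

def regrid(field, src_lats, src_lons, dst_lats, dst_lons):
    """Interpolate a source field onto every point of a target grid."""
    return [[bilinear(field, src_lats, src_lons, la, lo) for lo in dst_lons]
            for la in dst_lats]

# Toy source grid with a field that is linear in latitude and longitude,
# which bilinear interpolation reproduces exactly.
src_lats = [-2.0, -1.0, 0.0, 1.0, 2.0]
src_lons = [0.0, 1.0, 2.0, 3.0]
src = [[2.0 * la + 3.0 * lo for lo in src_lons] for la in src_lats]

dst_lats = [-1.5, 0.25]
dst_lons = [0.5, 2.75]
out = regrid(src, src_lats, src_lons, dst_lats, dst_lons)
print(out)
```

For real data the target axes would be the 96 latitudes and 192 longitudes of the EKF400v2 grid.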

Data availability

All gridded data products as well as the ISTI data bank are freely available for research and educational purposes. EKF400v2 can be downloaded at https://cera-www.dkrz.de/WDCC/ui/cerasearch/entry?acronym=EKF400_v2.0. The 20CRv3 ensemble mean can be downloaded at https://psl.noaa.gov/data/gridded/data.20thC_ReanV3.html. MPI-GE can be downloaded at https://esgf-data.dkrz.de/search/mpi-ge/. CESM-LME can be downloaded at https://www.cesm.ucar.edu/projects/community-projects/LME/data-sets.html. The cold season reconstruction can be downloaded at https://doi.org/10.1594/PANGAEA.934288.

References

Perera, A., Nik, V. M., Chen, D., Scartezzini, J.-L. & Hong, T. Quantifying the impacts of climate change and extreme climate events on energy systems. Nat. Energy 5, 150–159 (2020).

Hasegawa, T. et al. Extreme climate events increase risk of global food insecurity and adaptation needs. Nat. Food 2, 587–595 (2021).

Salinger, M. J. Climate variability and change: past, present and future – an overview. Clim. Change 70, 9–29 (2005).

Hansen, J., Sato, M., Kharecha, P. & von Schuckmann, K. Earth's energy imbalance and implications. Atmos. Chem. Phys. 11, 13421–13449 (2011).

Pendergrass, A. G., Knutti, R., Lehner, F., Deser, C. & Sanderson, B. M. Precipitation variability increases in a warmer climate. Sci. Rep. 7, 1–9 (2017).

Bathiany, S., Dakos, V., Scheffer, M. & Lenton, T. M. Climate models predict increasing temperature variability in poor countries. Sci. Adv. 4, eaar5809 (2018).

Rehfeld, K., Hébert, R., Lora, J. M., Lofverstrom, M. & Brierley, C. M. Variability of surface climate in simulations of past and future. Earth Syst. Dyn. 11, 447–468 (2020).

Parsons, L. A., Brennan, M. K., Wills, R. C. & Proistosescu, C. Magnitudes and spatial patterns of interdecadal temperature variability in CMIP6. Geophys. Res. Lett. 47, e2019GL086588 (2020).

Coburn, J. & Pryor, S. Differential credibility of climate modes in CMIP6. J. Clim. 34, 8145–8164 (2021).

Moberg, A. et al. Day-to-day temperature variability trends in 160- to 275-year-long European instrumental records. J. Geophys. Res. Atmos. 105, 22849–22868 (2000).

Dobrovolný, P. et al. Monthly, seasonal and annual temperature reconstructions for Central Europe derived from documentary evidence and instrumental records since AD 1500. Clim. Change 101, 69–107 (2010).

Pappert, D. et al. Unlocking weather observations from the Societas Meteorologica Palatina (1781–1792). Clim. Past 17, 2361–2379 (2021).

Emile-Geay, J. et al. A global multiproxy database for temperature reconstructions of the Common Era. Sci. Data 4, 170088 (2017).

Luterbacher, J., Dietrich, D., Xoplaki, E., Grosjean, M. & Wanner, H. European seasonal and annual temperature variability, trends, and extremes since 1500. Science 303, 1499–1503 (2004).

Pauling, A., Luterbacher, J., Casty, C. & Wanner, H. Five hundred years of gridded high-resolution precipitation reconstructions over Europe and the connection to large-scale circulation. Clim. Dyn. 26, 387–405 (2006).

Reichen, L. et al. A decade of cold Eurasian winters reconstructed for the early 19th century. Nat. Commun. 13, 2116 (2022).

Salcedo-Sanz, S. et al. Near-optimal selection of representative measuring points for robust temperature field reconstruction with the CRO-SL and analogue methods. Glob. Planet. Change 178, 15–34 (2019).

Jaume-Santero, F., Barriopedro, D., García-Herrera, R., Calvo, N. & Salcedo-Sanz, S. Selection of optimal proxy locations for temperature field reconstructions using evolutionary algorithms. Sci. Rep. https://doi.org/10.1038/s41598-020-64459-6 (2020).

Jaume-Santero, F., Barriopedro, D., García-Herrera, R. & Luterbacher, J. Monthly North Atlantic sea level pressure reconstruction back to 1750 CE using artificial intelligence optimization. J. Clim. https://journals.ametsoc.org/view/journals/clim/aop/JCLI-D-21-0155.1/JCLI-D-21-0155.1.xml (2022).

Hakim, G. J. et al. The last millennium climate reanalysis project: framework and first results. J. Geophys. Res. Atmos. 121, 6745–6764 (2016).

Franke, J., Brönnimann, S., Bhend, J. & Brugnara, Y. A monthly global paleo-reanalysis of the atmosphere from 1600 to 2005 for studying past climatic variations. Sci. Data https://doi.org/10.1038/sdata.2017.76 (2017).

Slivinski, L. C. et al. Towards a more reliable historical reanalysis: improvements for version 3 of the Twentieth Century Reanalysis system. Q. J. R. Meteorol. Soc. 145, 2876–2908 (2019).

Osman, M. B. et al. Globally resolved surface temperatures since the Last Glacial Maximum. Nature 599, 239–244 (2021).

Valler, V., Franke, J., Brugnara, Y. & Brönnimann, S. An updated global atmospheric paleo-reanalysis covering the last 400 years. Geosci. Data J. https://doi.org/10.1002/gdj3.121 (2021).

Karpatne, A., Ebert-Uphoff, I., Ravela, S., Babaie, H. A. & Kumar, V. Machine learning for the geosciences: challenges and opportunities. IEEE Trans. Knowl. Data Eng. 31, 1544–1554 (2018).

Reichstein, M. et al. Deep learning and process understanding for data-driven Earth system science. Nature https://doi.org/10.1038/s41586-019-0912-1 (2019).

Toms, B. A., Barnes, E. A. & Ebert-Uphoff, I. Physically interpretable neural networks for the geosciences: applications to Earth system variability. J. Adv. Model. Earth Syst. 12, e2019MS002002 (2020).

Barnes, E. A., Hurrell, J. W., Ebert-Uphoff, I., Anderson, C. & Anderson, D. Viewing forced climate patterns through an AI lens. Geophys. Res. Lett. 46, 13389–13398 (2019).

Rasp, S. et al. WeatherBench: a benchmark data set for data-driven weather forecasting. J. Adv. Model. Earth Syst. 12, e2020MS002203 (2020).

Beucler, T. et al. Enforcing analytic constraints in neural networks emulating physical systems. Phys. Rev. Lett. 126, 098302 (2021).

Kadow, C., Hall, D. M. & Ulbrich, U. Artificial intelligence reconstructs missing climate information. Nat. Geosci. 13, 408–413 (2020).

Guevara, M., Taufer, M. & Vargas, R. Gap-free global annual soil moisture: 15 km grids for 1991–2018. Earth Syst. Sci. Data 13, 1711–1735 (2021).

Bolibar, J., Rabatel, A., Gouttevin, I. & Galiez, C. A deep learning reconstruction of mass balance series for all glaciers in the French Alps: 1967–2015. Earth Syst. Sci. Data 12, 1973–1983 (2020).

O'Connor, P., Murphy, C., Matthews, T. & Wilby, R. L. Reconstructed monthly river flows for Irish catchments 1766–2016. Geosci. Data J. 8, 34–54 (2021).

Arora, N. K. & Mishra, I. United Nations Sustainable Development Goals 2030 and environmental sustainability: race against time (2019).

Persello, C. et al. Deep learning and earth observation to support the sustainable development goals: current approaches, open challenges, and future opportunities. IEEE Geosci. Remote Sens. Mag. https://doi.org/10.1109/MGRS.2021.3136100 (2022).

Maher, N. et al. The Max Planck Institute Grand Ensemble: enabling the exploration of climate system variability. J. Adv. Model. Earth Syst. 11, 2050–2069 (2019).

Otto-Bliesner, B. L. et al. Climate variability and change since 850 CE: an ensemble approach with the Community Earth System Model (CESM). BAMS https://doi.org/10.1175/bams-d-14-00233.1 (2016).

Cook, E. R., Briffa, K. R. & Jones, P. D. Spatial regression methods in dendroclimatology: a review and comparison of two techniques. Int. J. Climatol. 14, 379–402 (1994).

Tipton, J., Hooten, M. & Goring, S. Reconstruction of spatio-temporal temperature from sparse historical records using robust probabilistic principal component regression. Adv. Stat. Climatol. 3, 1–16 (2017).

Barnston, A. G. & Livezey, R. E. Classification, seasonality and persistence of low-frequency atmospheric circulation patterns. Mon. Weather Rev. 115, 1083–1126 (1987).

Bhend, J., Franke, J., Folini, D., Wild, M. & Brönnimann, S. An ensemble-based approach to climate reconstructions. Clim. Past https://doi.org/10.5194/cp-8-963-2012 (2012).

Tardif, R. et al. Last Millennium Reanalysis with an expanded proxy database and seasonal proxy modeling. Clim. Past 15, 1251–1273 (2019).

Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).

Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. In (eds Bengio, Y. & LeCun, Y.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7–9, 2015, Conference Track Proceedings https://arxiv.org/pdf/1412.6980.pdf (2015).

Rennie, J. J. et al. The international surface temperature initiative global land surface databank: monthly temperature data release description and methods. Geosci. Data J. 1, 75–102 (2014).

Slivinski, L. et al. An evaluation of the performance of the Twentieth Century Reanalysis version 3. J. Clim. 34, 1417–1438 (2021).

Giorgetta, M. A. et al. Climate and carbon cycle changes from 1850 to 2100 in MPI-ESM simulations for the Coupled Model Intercomparison Project phase 5. J. Adv. Model. Earth Syst. 5, 572–597 (2013).

Suarez-Gutierrez, L., Li, C., Müller, W. A. & Marotzke, J. Internal variability in European summer temperatures at 1.5 °C and 2 °C of global warming. Environ. Res. Lett. 13, 064026 (2018).

Neukom, R. et al. Consistent multidecadal variability in global temperature reconstructions and simulations over the Common Era. Nat. Geosci. 12, 643 (2019).
