LIM_CPC is a Python package with functionality for pre-processing data, computing EOFs, training linear inverse models (LIMs) on selected variables with specified PC truncations, pre-processing realtime data to initialize a LIM, and generating forecasts.
LIM_CPC also post-processes the forecast output to generate maps of forecast anomalies, probabilities derived from the LIM error covariance, perfect model expected skill, hovmollers, and projections onto CPC-specified teleconnection loading patterns.
The latest version of LIM_CPC is v1.2.0.
Version changes:
v1.2.0 (1 Feb 2022):
* Post-processes the LIM forecast output with a regression from forecast temperature PCs (JRA space, 2.5 degree grid) to the CPC temperature dataset PCs for the same time period and same truncation. The new forecast PCs are reconstructed on the 2-degree CPC forecast grid over CONUS.
* Verification of the LIM forecasts is done on the 2-degree CPC grid over CONUS, using the CPC Global Temp Dataset from PSL (0.5-degree resolution) area-averaged to the 2-degree grid.
* New products: Loops of anomalies at daily time-steps.
* Added latitude/longitude labels on tropical heating maps.
v1.1.1 (1 Nov 2021):
* Extended LIM forecasts to include upcoming CPC forecast times (weeks 3 and 4, issued each Friday). New maps were added for these verification periods (held constant through the week and advanced by one week every Saturday).
v1.1.0 (25 Oct 2021):
* Changed plotting package from basemap to cartopy.
* Verification using the CPC Global Temp Dataset from PSL (0.5-degree resolution), area-averaged to the 2.5-degree land-masked grid used in the JRA temperature dataset for the LIM.
* Anomalies re-defined with new climatology (1991-2020), as of forecast initialized on 28 May 2021.
* New products: Hovmollers of 500mb heights averaged between 20N-40N and 40N-60N. Hovmollers of tropical heating averaged between 15S-0N and 0N-15N.
v1.0.0 (1 Mar 2021):
* Full details of LIM and package functionality listed below.
v1.0.0beta (1 Oct 2020):
* Same as v1.0.0 without monthly blending.
Operational LIM details:
Trained on JRA-55 data. Realtime LIM uses JRA-55, typically 3 days lagged. All variables are 7-day running averages, with the value assigned to the last day of the 7-day period.
Variable: EOF truncation
(H10, H100): 12, H500: 16, SF750: 15, SLP: 23, colIrr: 23, T2m: 5
LIMs are computed monthly, each trained on data from the 1st of the month through the 28th of the following month.
The error covariance is scaled to add the full variance back in, adjusting probabilities (without affecting the ensemble forecast mean).
Forecasts from adjacent months' LIMs are blended with weights during the first and last weeks of each month.
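The adjacent-month blending can be sketched as a simple linear weighting. The taper window and weights below are illustrative assumptions, not the operational values:

```python
import numpy as np

def blend_forecasts(fcst_this_month, fcst_adjacent, day_of_month, days_in_month=30):
    """Linearly blend forecasts from adjacent-month LIMs near month boundaries.

    Weights are illustrative: full weight goes to the current month's LIM in
    the middle of the month, tapering toward a 50/50 blend at the boundaries.
    """
    # distance (in days) from the nearest month boundary; 0 at the edges
    edge_dist = min(day_of_month - 1, days_in_month - day_of_month)
    # taper over the first/last week; w = 1 beyond 7 days from a boundary
    w = min(1.0, 0.5 + 0.5 * edge_dist / 7.0)
    return w * np.asarray(fcst_this_month) + (1 - w) * np.asarray(fcst_adjacent)
```

At mid-month the adjacent-month LIM contributes nothing; on the first or last day the two LIMs are weighted equally.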
Forecast products:
Anomaly and probability maps for weeks 2, 3, and 4, and at initialization (after pre-processing realtime data, including EOF truncation)
* 2-meter temperature (North America)
* 500mb heights (northern hemisphere >20N)
* Sea level pressure (northern hemisphere >20N)
* Tropical heating (20S - 20N)
These maps are also generated for the upcoming CPC weeks-3/4 forecast period, valid 15-28 days after the upcoming Friday.
Hovmollers for
* 500mb heights 20N-40N, 30N-50N, 40N-60N
* Tropical heating 15S-0N, 7.5S-7.5N, 0N-15N
Teleconnection forecasts for
* North Atlantic Oscillation
* Pacific / North America
* Polar / Eurasian
* E Pacific / N Pacific
* E Atlantic / W Russia
* E Atlantic
* Scandinavia
* Tropical / Northern Hemisphere
* W Pacific
LIM_CPC can be installed from source by cloning the GitHub repository:
git clone https://github.com/splillo/LIM_CPC/
cd LIM_CPC
python setup.py install
LIM_CPC runs on Python 3.7+
Dependencies:
* netCDF4 >= 1.6.0
* cartopy >= 0.17.0
* cfgrib >= 0.9.6
* matplotlib >= 2.2.2
* numpy >= 1.14.3
* scipy >= 1.1.0
* xarray >= 0.10.7
dataset.varDataset
Creates an instance of varDataset object based on requested files & variable.
Parameters
----------
varlabel : str
Label for object, consistent with namelist use_vars dictionary.
path : str
Directory containing files for concatenation.
varname : str
Name of variable in files.
kwargs : dict
see Other Parameters
Other Parameters
----------------
level : float
If file contains data for multiple levels
climoyears : tuple
(start year, end year) to slice data
latbounds : tuple
(south lat, north lat) to slice data
lonbounds : tuple
(west lon, east lon) to slice data
time_window : int
Used for running_mean, number of time units.
time_sample : str
Integer + string indicating the sampling interval, where the string can be
'D' = Day, 'W' = Week, 'M' = Month, 'Y' = Year.
Example: '3D' yields a time sample of one observation every 3 days.
landmask : bool
If True, only use values over landmass.
Returns
-------
Dataset : object
An instance of Dataset.
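A time_sample string like '3D' can be split into its count and unit parts; a minimal sketch of such parsing (a hypothetical helper, not part of the package API):

```python
import re

def parse_time_sample(time_sample):
    """Split a time_sample string like '3D' into (count, unit).

    Units follow the convention above: 'D' day, 'W' week, 'M' month, 'Y' year.
    """
    m = re.fullmatch(r"(\d+)([DWMY])", time_sample)
    if m is None:
        raise ValueError(f"invalid time_sample: {time_sample!r}")
    return int(m.group(1)), m.group(2)
```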
dataset.varDataset.subset
Creates a new instance of varDataset object subset by dates.
Parameters
----------
datebounds : tuple
Start and end dates of season, in form "m/d".
season0 : bool
If True, starts data at beginning of season. If False, starts data at beginning of first year.
Only matters for seasons that cross Jan 1st.
Returns
-------
newobj : object
A new instance of varDataset.
dataset.varDataset.plot_map
Takes a space vector or a time, and plots a map of the data.
Parameters
----------
z : ndarray / list
Vector with same length as number of space points in self.anomaly
time : datetime object
Optional to plot self.anomaly data from specific time in self.time.
ax : axes instance
Can pass in own axes instance to plot in existing axes.
kwargs : dict
see Other Parameters
Other Parameters
----------------
prop : dict
Options for plotting, keywords and defaults include
* 'cmap' - None
* 'levels' - negative abs data max to abs data max
* 'figsize' - (10,6)
* 'dpi' - 150
* 'cbarticks' - every other level, determined in get_cmap_levels
* 'cbarticklabels' - same as cbarticks
* 'cbar_label' - None. Optional string, or True for self.attrs['units']
* 'extend' - 'both'
* 'interpolate' - None. If float value, interpolates to that lat/lon grid resolution.
* 'drawcountries' - False
* 'drawstates' - False
Returns
-------
ax : axes instance
dataset.eofDataset
Creates an instance of eofDataset object based on requested files & variables.
Parameters
----------
varobjs : list
List of variable objects
Max_eofs : int
Maximum number of modes to save to the object. Default is 100. (This differs from the truncation used in the LIM.)
Returns
-------
Dataset : object
An instance of eofDataset.
Saved attributes
----------------
Accessible using a period after the object, followed by the name.
Can also view what attributes are available in any object by typing objname.__dict__.keys()
* eof, pc, totalVar, varExplByEOF
dataset.eofDataset.reconstruct
Method for reconstructing spatial vector from PCs and EOFs
Parameters
----------
pcs : list or tuple or ndarray
List of floats corresponding to the leading PCs
order : int
1 = forecast PCs, 2 = error covariance matrix
num_eofs : int
truncation for reconstruction. Default is to retain all EOFs available.
pc_wt : dict
keys corresponding to PC number. Values corresponding to weight
Returns
-------
recon : dict
dictionary with keys = variable label, and values = reconstructed spatial vector.
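At its core, the reconstruction is a weighted sum of EOF patterns, with the PCs as weights. A minimal numpy sketch on synthetic arrays (not the package's internal layout):

```python
import numpy as np

# synthetic example: 3 retained EOFs over 5 spatial points
eofs = np.array([[1., 0., 0., 0., 0.],
                 [0., 1., 0., 0., 0.],
                 [0., 0., 1., 0., 0.]])   # (num_eofs, space)
pcs = np.array([2.0, -1.0, 0.5])          # leading PCs

# spatial vector = sum over modes of pc_k * eof_k
recon = pcs @ eofs                        # shape (space,)
```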
dataset.eofDataset.plot
Map EOFs and plot PC timeseries in subplots.
Include percent variance explained.
Parameters
----------
num_eofs : int
number of EOFs to plot. Default is 4
return_figs : bool
Return list of the figures. Default is False
Returns
-------
List of figures
model.Model
Parameters
----------
tau0_data : ndarray
Data for calibrating the LIM. Expects
a 2D MxN matrix where M (rows) represents the sampling dimension and
N (columns) represents the feature dimension (e.g. spatial grid
points).
tau1_data : ndarray, optional
Data with lag of tau=1. Used to calculate the mapping term, G1,
going from tau0 to tau1. Must be the same shape as tau0_data. If
not provided, tau0_data is assumed to be sequential, and tau1n and
tau0_data are used to calculate the lag covariance.
tau1n : int, optional
Number of time samples that span tau=1. E.g. for daily data when
a forecast tau is equivalent to 1 week, tau1n should be 7.
Used if tau1_data is not provided.
Saved attributes
----------------
Accessible using a period after the object, followed by the name.
Can also view what attributes are available in any object by typing objname.__dict__.keys()
* tau1n, X0, X1, C0, Ctau, G1, Geigs, L, Leigs, Q
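The saved operators follow the standard LIM calibration: with C0 the lag-0 covariance and Ctau the lag-tau covariance, the propagator is G1 = Ctau @ inv(C0) and the dynamical operator is L = logm(G1) / tau1n. A minimal sketch on synthetic data (the matrix layout is illustrative, not the package's internal one):

```python
import numpy as np
from scipy.linalg import logm

# synthetic AR(1) "truth": x(t+1) = 0.8 x(t) + noise, so G1 should be ~0.8*I
rng = np.random.default_rng(0)
nT, nF = 2000, 3
X = np.zeros((nT, nF))
for t in range(nT - 1):
    X[t + 1] = 0.8 * X[t] + rng.standard_normal(nF)

tau1n = 1
X0 = X[:-tau1n].T                       # (features, time) at tau=0
X1 = X[tau1n:].T                        # (features, time) at tau=1
C0 = X0 @ X0.T / X0.shape[1]            # lag-0 covariance
Ctau = X1 @ X0.T / X0.shape[1]          # lag-tau covariance
G1 = Ctau @ np.linalg.inv(C0)           # one-tau propagator
L = logm(G1) / tau1n                    # dynamical operator, G1 = expm(L*tau1n)
```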
model.Model.forecast
Forecast on provided data.
Performs LIM forecast over the times specified by the fcst_leads.
Forecast can be performed by calculating G for each time
period or by L for a 1-tau lag and then calculating each fcst_lead G
from that L matrix.
Parameters
----------
t0_data : ndarray
Data to forecast from.
Expects a 1D or 2D array where M (rows) represents the
initialization time dimension
and N (columns) represents the feature dimension.
lead_time : List (int)
A list of forecast lead times. Each value is interpreted as a
tau value, for which the forecast matrix is determined as G_1^tau.
Returns
-------
fcst_out : ndarray-like
LIM forecasts in a list of arrays, where the first dimension
corresponds to each lead time. Second dimension corresponds
to initialization times. Third dimension is PCs
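The description above, where each lead's forecast matrix is G_1^tau, can be sketched for a single initialization (the propagator values are illustrative):

```python
import numpy as np

# example propagator for one tau unit (values are illustrative)
G1 = np.array([[0.9, 0.1],
               [0.0, 0.8]])
x0 = np.array([1.0, 2.0])              # initial state in PC space

lead_times = [1, 2, 4]
# forecast at lead tau: x(tau) = G1^tau @ x0
fcst_out = [np.linalg.matrix_power(G1, tau) @ x0 for tau in lead_times]
```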
driver.Driver.get_variables
Load data into variable objects and save to pickle files.
Parameters
----------
read_path : str
If None, compiles data from paths specified in variable namelist file.
save_path : str
If None, does not save variables.
save_netcdf_path : str
If a str, saves netcdf files of variable with running-mean anomalies and climatology to specified directory.
Default is None, which means netcdf files will not be written.
segmentby : str
If save_netcdf_path is not None, segmentby specifies how to chunk the netcdf files. Options are "all" or "year". Default is "all".
driver.Driver.get_eofs
Load data into EOF objects and save to pickle files.
Parameters
----------
read : bool
If True, reads previously saved eof objects, else compiles data from path.
save_netcdf_path : str
If a str, saves netcdf files of EOF object with regridded EOF maps, PCs, and variance explained.
Default is None, which means netcdf files will not be written.
driver.Driver.cross_validation
Cross validation.
Parameters
----------
num_folds : int
Number of times data is subset and model is trained. Default is 10,
Which means each iteration trains on 90% of data, forecasts run on 10% of the data.
lead_times : list, tuple, ndarray
Lead times (in days) to integrate model for output. Default is first 28 lead times.
average : bool
Whether to average the forecasts if multiple lead times are provided.
save_netcdf_path : str
Path to save netcdf files containing forecast and spread.
segmentby : str
If save_netcdf_path is not None, segmentby specifies how to chunk the netcdf files.
Options are "day", "month", "year". Default is "day".
Returns
-------
model_F : dict
Dictionary of model forecast output, with keys corresponding to valid time
model_E : dict
Dictionary of model error output, with keys corresponding to valid time
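The fold logic can be sketched as follows (a hypothetical helper using simple contiguous splits; the package may split differently):

```python
import numpy as np

def fold_indices(n_samples, num_folds=10):
    """Yield (train_idx, test_idx) for contiguous cross-validation folds.

    Each iteration holds out ~1/num_folds of the data for forecasting and
    trains on the remainder (90% train / 10% test for num_folds=10).
    """
    all_idx = np.arange(n_samples)
    for test_idx in np.array_split(all_idx, num_folds):
        train_idx = np.setdiff1d(all_idx, test_idx)
        yield train_idx, test_idx
```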
driver.Driver.pc_to_grid
Convert PC state space vector to variable grid space.
Parameters
----------
F : ndarray
Array with one or two dimensions. LAST axis must be the PC vector.
E : ndarray
Array with one or two dimensions; the error covariance matrix. If None, it is ignored.
LAST TWO axes must be the length of the PC vector.
limkey
Name of LIM / EOF truncation dictionary. Default is first one in namelist.
regrid : bool
Whether to regrid to lat/lon gridded map.
varname : str
If just one variable output is desired.
Default is None, in which case all variables are output in a dictionary.
Returns
-------
Fmap : dict
If F was provided. Dictionary with keys as variable names, and values are ndarrays of
reconstructed gridded space.
Emap : dict
If E was provided. Dictionary with keys as variable names, and values are ndarrays of
reconstructed gridded space.
driver.Driver.get_rhoinf_time
Get timeseries of rho infinity.
Parameters
----------
varname : str
Name of variable of interest.
fcst : ndarray
Forecast array with shape (time, space) or (space). Default is None, using the self.model_F output.
spread : ndarray
Spread array with shape (time, space) or (space). Default is None, using the self.model_E output.
latbounds : tuple
latitude bounds to specify spatial subdomain. Default is None, using full domain.
lonbounds : tuple
longitude bounds to specify spatial subdomain. Default is None, using full domain.
region : str
Using regionmask, can specify country to define the subdomain. Default is None.
Returns
-------
rhoinf : ndarray or float
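Rho infinity is a perfect-model expected anomaly correlation. A common form (an assumption about the exact convention used here) is rho = S / sqrt(1 + S**2), where S**2 is the ratio of forecast signal variance to error variance:

```python
import numpy as np

def rho_inf(fcst, spread_var):
    """Perfect-model expected anomaly correlation for one forecast map.

    Assumes the standard signal-to-noise form rho = S / sqrt(1 + S**2),
    with S**2 the ratio of forecast signal variance to error variance,
    both aggregated over the spatial domain. The exact convention in
    LIM_CPC may differ.
    """
    signal = np.mean(np.asarray(fcst) ** 2)
    noise = np.mean(np.asarray(spread_var))
    s2 = signal / noise
    return np.sqrt(s2 / (1.0 + s2))
```

Larger forecast signal relative to spread drives the expected skill toward 1; a weak signal drives it toward 0.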
driver.Driver.get_model
Get the LIM model object.
Parameters
----------
eof_trunc : dict
Dictionary containing eofs as keys and truncations as values
tau1n : int
Training lag
datebounds : tuple
Strings (m/d) defining start and end of training period. Default is ('1/1','12/31')
load_file : str
Filename for pickle file containing model to load.
Default is None, in which case a new model is trained.
save_file : str
Filename of pickle file to save new model to.
Default is None, in which case the model is not saved.
save_to_netcdf : str
Filename to save netcdf containing model attributes.
Default is None, which does not write a file.
Returns
-------
model : model object
Object of trained model
driver.Driver.prep_realtime_data
Compile realtime data, interpolate to same grid as LIM, and convert into PCs.
If it has already been run before within the defined self object,
self.RT_VARS will be the same, and self.RT_PC can be updated for different LIMs with same vars,
without having to process variables again.
Parameters
----------
limkey
Name of LIM / EOF truncation dictionary. Default is first one in namelist.
Saved to self.RTLIMKEY
Returns
-------
self.RT_VARS : dict
dictionary with keys = variable names + "time", and values = data,
processed to same time res and space res & domain as training variables.
self.RT_PC : dict
dictionary with keys = eof names, and values = truncated PCs, both according to specified limkey
Example: certain PCs can be zeroed out in the initial conditions via
>>> self.RT_PC[eofname] *= 0
driver.Driver.run_forecast
Run forecasts initialized with self.RT_PC, with EOFs & model specified by self.RTLIMKEY
Parameters
----------
t_init : datetime object
Date of initialization for the forecast.
Default is maximum date available in self.RT_VARS['time']
lead_times : list, tuple, or ndarray
Lead times, in increments of the data time step, to integrate the model forward.
Default is first 28 leads.
save_netcdf_path : str
Path to save netcdf files containing forecast and spread.
Default is None, which will not write netcdf files.
Returns
-------
model_F : dict
Dictionary of model forecast output, with keys corresponding to init time.
model_E : dict
Dictionary of model error covariance output, with keys corresponding to init time.
driver.Driver.plot_teleconnection
Plots teleconnection timeseries analysis, forecast, and spread.
Parameters
----------
list_of_teleconnections : list
List of names (str) of teleconnections to plot.
daysback : int
Number of days prior to t_init to plot analysis.
save_to_file : str
Name of file to save figure to. Default is None, which does not save the figure
Other Parameters
----------------
prop : dict
Customization properties for plotting
driver.Driver.plot_timelon
Plots meridionally averaged data with longitude on the x-axis and time on the y-axis.
Parameters
----------
varname : str
Name of variable (consistent with use_var dictionary keys) to plot.
lat_bounds : tuple or list
Latitude boundaries to average data between.
t_init : datetime object
Time of forecast initialization. Default is maximum time in self.model_F.
daysback : int
Number of days prior to t_init to plot hovmoller
save_to_file : str
Name of file to save figure to. Default is None, which does not save the figure
driver.Driver.verif
Plots verifying anomalies, categories, hit/miss maps, and outputs HSS and RPSS skill scores.
Parameters
----------
varname : str
Must be a variable name in the list of keys used in self.RT_VARS.
t_init : datetime object
Time of forecast initialization.
Default is maximum time in self.RT_VARS['time'] minus maximum lead time.
lead_times : ndarray / list / tuple
Lead times. Forecasts and observations WILL be averaged over the given leads.
Fmap : ndarray
Default is None, in which case forecast will be taken from self.model_F[t_init]
Emap : ndarray
Default is None, in which case spread will be taken from self.model_E[t_init]
prob_thresh : int
Probability threshold (50 to 100) above which verification statistics will be calculated and mapped.
Default is 50 (no masking).
regMask : str
Name of region / country to use for masking maps and skill score calculation. Default is None.
Options listed in regionmask.defined_regions.natural_earth.countries_110
Other Parameters
----------------
prop : dict
Customization properties for plotting
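The HSS reported by verif is the Heidke skill score, HSS = (H - E) / (N - E), where H is the number of correct categorical forecasts, N the total, and E the number expected correct by chance. A minimal sketch, assuming equally likely categories (as with above/below-normal):

```python
import numpy as np

def heidke_skill_score(fcst_cat, obs_cat, num_cats=2):
    """Heidke skill score: HSS = (H - E) / (N - E).

    H = number of correct categorical forecasts, N = total forecasts,
    E = number expected correct by chance, taken here as N / num_cats
    for equally likely categories (an assumption; HSS can also use
    expected counts from the observed/forecast marginals).
    """
    fcst_cat = np.asarray(fcst_cat)
    obs_cat = np.asarray(obs_cat)
    n = fcst_cat.size
    hits = np.sum(fcst_cat == obs_cat)
    expected = n / num_cats
    return (hits - expected) / (n - expected)
```

A perfect two-category forecast scores 1, chance scores 0, and an always-wrong forecast scores -1.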