LIM_CPC is a Python package with functionality for pre-processing data, computing EOFs, training linear inverse models (LIMs) on selected variable with specified PC truncations, pre-processing realtime data to initialize a LIM, and generating forecasts. LIM_CPC also post-processes the forecast output to generate maps of forecast anomalies, probabilities derived from the LIM error covariance, perfect model expected skill, hovmollers, and projections onto CPC-specified teleconnection loading patterns. The latest version of LIM_CPC is v1.2.0 Version changes: v1.2.0 (1 Feb 2022): * Post-processes the LIM forecast output with a regression from forecast temperature PCs (JRA space, 2.5 degree grid) to the CPC temperature dataset PCs for the same time period and same truncation. The new forecast PCs are reconstructed on the 2-degree CPC forecast grid over CONUS. * Verification of the LIM forecasts are done on the 2-degree CPC grid over CONUS, using the CPC Global Temp Dataset from PSL (0.5deg resolution) and area-averaging to the 2-degree grid. * New products: Loops of anomalies at daily time-steps. * Added latitude/longitude labels on tropical heating maps. v1.1.1 (1 Nov 2021): * Extended LIM forecasts to include upcoming CPC forecast times (weeks 3 and 4, issued on Friday each week). New maps added for these verification periods (constant through the week and move ahead by one week every Saturday). v1.1.0 (25 Oct 2021): * Changed plotting package from basemap to cartopy. * Verification using the CPC Global Temp Dataset from PSL (0.5deg resolution) area-averaged to 2.5 land-masked grid used in JRA temperature dataset for LIM. * Anomalies re-defined with new climatology (1991-2020), as of forecast initialized on 28 May 2021. * New products: Hovmollers of 500mb heights averaged between 20N-40N and 40N-60N. Hovmollers of tropical heating averaged between 15S-0N and 0N-15N. v1.0.0 (1 Mar 2021): * Full details of LIM and package functionality listed below. v1.0.0beta (1 Oct 2020): * Same as v1.0.0 without monthly blending.

Operational LIM details: Trained on JRA-55 data. Realtime LIM uses JRA-55, typically 3 days lagged. All variables are 7-day running averages, with the value assigned to the last day of the 7-day period. Variable: EOF truncation (H10, H100): 12, H500: 16, SF750: 15, SLP: 23, colIrr: 23, T2m: 5 Computed monthly from MONTH / 1 to MONTH+1 / 28 Error covariance scaled to add full variance back in, adjusting probabilities (not affecting the ensemble forecast mean). Weighted blend adjacent month LIM forecasts for the first and last weeks of each month. Forecast products: Anomaly and probability maps for week-2, 3, and 4, and initialization (after pre-processing realtime data, including EOF truncation) * 2-meter temperature (North America) * 500mb heights (northern hemisphere >20N) * Sea level pressure (northern hemisphere >20N) * Tropical heating (20S - 20N) These maps are also generated for the upcoming CPC weeks-3/4 forecast period, valid 15-28 days after the upcoming Friday. Hovmollers for * 500mb heights 20N-40N, 30N-50N, 40N-60N * Tropical heating 15S-0N, 7.5S-7.5N, 0N-15N Teleconnection forecasts for * North Atlantic Oscillation * Pacific / North America * Polar / Eurasian * E Pacific / N Pacific * E Atlantic / W Russia * E Atlantic * Scandinavia * Tropical / Northern Hemisphere * W Pacific

LIM_CPC can be installed from source by cloning the GitHub repository: git clone https://github.com/splillo/LIM_CPC/ cd tropycal python setup.py install LIM_CPC runs on Python 3.7+ Dependencies: * netCDF4 >= 1.6.0 * cartopy >= 0.17.0 * cfgrib >= 0.9.6 * matplotlib >= 2.2.2 * numpy >= 1.14.3 * scipy >= 1.1.0 * xarray >= 0.10.7

dataset.varDataset

	Creates an instance of varDataset object based on requested files & variable.
    
	Parameters
	----------
	varlabel : str
		Label for object, consistent with namelist use_vars dictionary.
	path : str
		Directory containing files for concatenation.
	varname : str
		Name of variable in files.

	kwargs : dict
		see Other Parameters
        
	Other Parameters
	----------------
	level : float
		If file contains data for multiple levels
	climoyears : tuple
		(start year, end year) to slice data
	latbounds : tuple
		(south lat, north lat) to slice data
	lonbounds : tuple)
		(west lon, east lon) to slice data
	time_window : int
		Used for running_mean, number of time units.
	time_sample : str
		Integer + string, indicating number of time sample which can be 
		'D' = Day, 'W' = Week, 'M' = Month, 'Y' = Year.
		Example: '3D' would be a time sample of one ob every 3 days.
	landmask : bool
		If True, only use values over landmass.    

	Returns
	-------
	Dataset : object
		An instance of Dataset.

dataset.varDataset.subset

	Creates a new instance of varDataset object subset by dates.

	Parameters
	----------
	datebounds : tuple
		Start and end dates of season, in form "m/d".
	season0 : bool
		If True, starts data at beginning of season. If False, starts data at beginning of first year.
		Only matters for seasons that cross Jan 1st.

	Returns
	-------
	newobj : object
		A new instance of varDataset.

dataset.varDataset.plot_map

	Takes space vector or time, and plots a map of the data.

	Parameters
	----------
	z : ndarray / list
		Vector with same length as number of space points in self.anomaly
	time : datetime object
		Optional to plot self.anomaly data from specific time in self.time.
	ax : axes instance
		Can pass in own axes instance to plot in existing axes.

	kwargs : dict
		see Other Parameters

	Other Parameters
	----------------
	prop : dict
		Options for plotting, keywords and defaults include
		* 'cmap' - None
		* 'levels' - negative abs data max to abs data max
		* 'figsize' - (10,6)
		* 'dpi' - 150
		* 'cbarticks' - every other level, determined in get_cmap_levels
		* 'cbarticklabels' - same as cbarticks
		* 'cbar_label' - None. Optional string, or True for self.attrs['units']
		* 'extend' - 'both'
		* 'interpolate' - None. If float value, interpolates to that lat/lon grid resolution.
		* 'drawcountries' - False
		* 'drawstates' - False

	Returns
	-------
	ax : axes instance

dataset.eofDataset

	Creates an instance of eofDataset object based on requested files & variables.

	Parameters
	----------
		varobjs : list
			List of variable objects
		Max_eofs : int
	Maximum number of modes to save to object. Default is 100. (This is different than the truncation in the LIM)
   
	Returns
	-------
	Dataset : object
		An instance of eofDataset.

	Saved attributes
	-------------
	Accessible using a period after the object, followed by the name.
	Can also view what attributes are available in any object by typing objname.__dict__.keys()

		* eof, pc, totalVar, varExplByEOF

dataset.eofDataset.reconstruct

	Method for reconstructing spatial vector from PCs and EOFs

	Parameters
	----------
	pcs : list or tuple or ndarray
		list of floating numbers corresponding to leading PCs
	order : int
		1 = forecast PCs, 2 = error covariance matrix
	num_eofs : int
		truncation for reconstruction. Default is to retain all EOFs available.
	pc_wt : dict
		keys corresponding to PC number. Values corresponding to weight        
	
	Returns
	-------
	recon : dict
		dictionary with keys = variable label, and values = reconstructed spatial vector.

dataset.eofDataset.plot

	Map EOFs and plot PC timeseries in subplots.
	Include percent variance explained.

	Parameters
	----------
	num_eofs : int
		number of EOFs to plot. Default is 4
	return_figs : bool
		Return list of the figures. Default is False
	
	Returns
	-------
	List of figures

model.Model

	Parameters
	----------
	tau0_data : ndarray
		Data for calibrating the LIM.  Expects
		a 2D MxN matrix where M (rows) represent the sampling dimension and
		N(columns) represents the feature dimension (e.g. spatial grid
		points).
	tau1_data : ndarray, optional
		Data with lag of tau=1.  Used to calculate the mapping term, G1,
		going from tau0 to tau1.  Must be the same shape as tau0_data.  If
		not provided, tau0_data is assumed to be sequential and
		nelem_in_tau1 and tau0_data is used to calculate lag covariance.
	tau1n : int, optional
		Number of time samples that span tau=1.  E.g. for daily data when
		a forecast tau is equivalent to 1 week, tau1n should be 7.
		Used if tau1_data is not provided.

	Saved attributes
	-------------
	Accessible using a period after the object, followed by the name.
	Can also view what attributes are available in any object by typing objname.__dict__.keys()

		* tau1n, X0, X1, C0, Ctau, G1, Geigs, L, Leigs, Q

model.Model.forecast

	Forecast on provided data.

	Performs LIM forecast over the times specified by the fcst_leads. 
	Forecast can be performed by calculating G for each time 
	period or by L for a 1-tau lag and then calculating each fcst_lead G 
	from that L matrix.

	Parameters
	----------
	t0_data : ndarray
		Data to forecast from.  
		Expects a 1D or 2D array where M (rows) represent the 
		initialization time dimension
		and N(columns) represents the feature dimension.
	lead_time : List (int)
		A list of forecast lead times.  Each value is interpreted as a
		tau value, for which the forecast matrix is determined as G_1^tau.
	
	Returns
	-----
	fcst_out : ndarray-like
		LIM forecasts in a list of arrays, where the first dimension 
		corresponds to each lead time. Second dimension corresponds 
		to initialization times. Third dimension is PCs

driver.Driver.get_variables

	Load data into variable objects and save to pickle files.

	Parameters
	----------

	read_path : str
		If None, compiles data from paths specified in variable namelist file.
	save_path : str
		If None, does not save variables.
	save_netcdf_path : str
		If a str, saves netcdf files of variable with running-mean anomalies and climatology to specified directory.
		Default is None, which means netcdf files will not be written.
	segmentby : str
		If save_netcdf_path is not None, segmentby specifies how to chunk the netcdf files. Options are "all" or "year". Default is "all".

driver.Driver.get_eofs

	Load data into EOF objects and save to pickle files.

	Parameters
	----------
	read : bool
		If True, reads previously saved eof objects, else compiles data from path.
	save_netcdf_path : str
		If a str, saves netcdf files of EOF object with regridded EOF maps, PCs, and variance explained.
		Default is None, which means netcdf files will not be written.

driver.Driver.cross_validation

	Cross validation.

	Parameters
	----------
	num_folds : int
		Number of times data is subset and model is trained. Default is 10,
		Which means each iteration trains on 90% of data, forecasts run on 10% of the data.
	lead_times : list, tuple, ndarray
		Lead times (in days) to integrate model for output. Default is first 28 lead times.
	average : bool
		Whether to average the forecasts if multiple lead times are provided.
	save_netcdf_path : str
		Path to save netcdf files containing forecast and spread.
	segmentby : str
		If save_netcdf_path is not None, segmentby specifies how to chunk the netcdf files. 
		Options are "day", "month", "year". Default is "day".
	
	Returns
	-------
	model_F : dict
		Dictionary of model forecast output, with keys corresponding to valid time
	model_E : dict
		Dictionary of model error output, with keys corresponding to valid time

driver.Driver.pc_to_grid

	Convect PC state space vector to variable grid space.

	Parameters
	----------
	F : ndarray
		Array with one or two dimensions. LAST axis must be the PC vector.
	E : ndarray
		Array with one or two dimensions. Error covariance matrix. If None, ignores.
		LAST TWO axes must be length of PC vector.
	limkey
		Name of LIM / EOF truncation dictionary. Default is first one in namelist.
	regrid : bool
		Whether to regrid to lat/lon gridded map.
	varname : str
		If just one variable output is desired. 
		Default is None, in which case all variables are output in a dictionary.
	
	Returns
	-------
	Fmap : dict
		If F was provided. Dictionary with keys as variable names, and values are ndarrays of 
		reconstructed gridded space.
	Emap : dict
		If E was provided. Dictionary with keys as variable names, and values are ndarrays of 
		reconstructed gridded space.

driver.Driver.get_rhoinf_time

	Get timeseries of rho infinity.

	Parameters
	----------
	varname : str
		Name of variable of interest.
	fcst : ndarray
		Forecast array with shape (time, space) or (space). Default is None, using the self.model_F output.
	spread : ndarray
		Spread array with shape (time, space) or (space). Default is None, using the self.model_E output.
	latbounds : tuple
		latitude bounds to specify spatial subdomain. Default is None, using full domain.
	lonbounds : tuple
		longitude bounds to specify spatial subdomain. Default is None, using full domain.
	region : str
		Using regionmask, can specify country to define the subdomain. Default is None.
	
	Returns
	-------
	rhoinf : ndarray or float

driver.Driver.get_model

	Get the LIM model object.

	Parameters
	----------
	eof_trunc : dict
		Dictionary containing eofs as keys and truncations as values
	tau1n : int
		Training lag
	datebounds : tuple
		Strings (m/d) defining start and end of training period. Default is (‘1/1’,’12/31’)
	load_file : str
		Filename for pickle file containing model to load.
		Default is None, in which case a new model is trained.
	save_file : str
		Filename of pickle file to save new model to.
		Default is None, in which case the model is not saved.
	save_to_netcdf : str
		Filename to save netcdf containing model attributes. 
		Default is None, which does not write a file.

	Returns
	-------
	model : model object
		Object of trained model

driver.Driver.prep_realtime_data

	Compile realtime data, interpolate to same grid as LIM, and convert into PCs.
	If it has already been run before within the defined self object, 
	self.RT_VARS will be the same, and self.RT_PC can be updated for different LIMs with same vars,
	without having to process variables again.

	Parameters
	----------
	limkey
		Name of LIM / EOF truncation dictionary. Default is first one in namelist.
		Saved to self.RTLIMKEY

	Returns
	-------
	self.RT_VARS : dict
		dictionary with keys = variable names + "time", and values = data,
		processed to same time res and space res & domain as training variables.
	self.RT_PC : dict
		dictionary with keys = eof names, and values = truncated PCs, both according to specified limkey

	
	>>> Example use: Can zero out certain PCs in initial conditions by 
		self.RT_PC[eofname] *= 0

driver.Driver.run_forecast

	Run forecasts initialized with self.RT_PC, with EOFs & model specified by self.RTLIMKEY

	Parameters
	----------
	t_init : datetime object
		Date of initialization for the forecast.
		Default is maximum date available in self.RT_VARS['time']
	lead_times : list, tuple, or ndarray
		Lead_times in increment of data to integrate model forward. 
		Default is first 28 leads.
	save_netcdf_path : str
		Path to save netcdf files containing forecast and spread. 
		Default is None, which will not write netcdf files.

	Returns
	-------
	model_F : dict
		Dictionary of model forecast output, with keys corresponding to init time.
	model_E : dict
		Dictionary of model error covariance output, with keys corresponding to init time.

driver.Driver.plot_teleconnection

	Plots teleconnection timeseries analysis, forecast, and spread.

	Parameters
	----------
	list_of_teleconnections : list
		List of names (str) of teleconnections to plot.
	daysback : int
		Number of days prior to t_init to plot analysis.
	save_to_file : str
		Name of file to save figure to. Default is None, which does not save the figure

	Other Parameters
	----------------
	prop : dict
		Customization properties for plotting

driver.Driver.plot_timelon

	Plots meridionally averaged data with longitude on x-axis and time on y-axis

	Parameters
	----------
	varname : str
		Name of variable (consistent with use_var dictionary keys) to plot.
	lat_bounds : tuple or list
		Latitude boundaries to average data between.
	t_init : datetime object
		Time of forecast initialization. Default is maximum time in self.model_F.
	daysback : int
		Number of days prior to t_init to plot hovmoller
	save_to_file : str
		Name of file to save figure to. Default is None, which does not save the figure

driver.Driver.verif

	Plots verifying anomalies, categories, hit/miss maps, and outputs HSS and RPSS skill scores.

	Parameters
	----------
	varname : str
		Must be a variable name in the list of keys used in self.RT_VARS.
	t_init : datetime object
		Time of forecast initialization. 
		Default is maximum time in self.RT_VARS['time'] minus maximum lead time.
	lead_times : ndarray / list / tuple
		Lead times. Forecasts and obs WILL be averaged for given leads.
	Fmap : ndarray
		Default is None, in which case forecast will be taken from self.model_F[t_init]
	Emap : ndarray
		Default is None, in which case spread will be taken from self.model_F[t_init]
	prob_thresh : int
		Probability threshold (50 to 100) above which verification statistics will be calculated and mapped.
		Default is 50 (no masking).
	regMask : str
	Name of region / country to use for masking maps and skill score calculation. Default is None.
	Options listed in regionmask.defined_regions.natural_earth.countries_110

	Other Parameters
	----------------
	prop : dict
		Customization properties for plotting