API Reference
This section documents the public classes and functions, generated from the in-code docstrings. See Code Structure for how these modules fit together.
Binding layer
Abstract base class
Core code for coupling any hydrodynamic simulation software with the main script for GPE surrogate model construction and Bayesian Active Learning.
Author: Andres Heredia M.Sc.
- class hydroBayesCal.hysim.HydroSimulations(control_file='control.cas', model_dir='', res_dir='', calibration_pts_file_path='', n_cpus=0, init_runs=1, calibration_parameters=None, param_values=None, calibration_quantities=None, extraction_quantities=None, dict_output_name='extraction-data', user_param_values=False, max_runs=1, complete_bal_mode=False, only_bal_mode=False, check_inputs=False, delete_complex_outputs=True, validation=False, multitask_selection='variables', *args, **kwargs)[source]
Bases: ABC
- __init__(control_file='control.cas', model_dir='', res_dir='', calibration_pts_file_path='', n_cpus=0, init_runs=1, calibration_parameters=None, param_values=None, calibration_quantities=None, extraction_quantities=None, dict_output_name='extraction-data', user_param_values=False, max_runs=1, complete_bal_mode=False, only_bal_mode=False, check_inputs=False, delete_complex_outputs=True, validation=False, multitask_selection='variables', *args, **kwargs)[source]
Constructor of the HydroSimulations class to manage and run hydrodynamic simulations within the context of Bayesian Calibration using a Gaussian Process Emulator (GPE). The class is designed to handle simulation setup, execution, and result storage while managing calibration parameters and Bayesian Active Learning (BAL) iterations.
- Parameters:
control_file (str) – Name of the file that controls the full complexity model simulation (default is “control.cas” as an example for Telemac).
model_dir (str) – Full complexity model directory where all simulation files (mesh, control file, boundary conditions) are located.
res_dir (str) – Directory where a subfolder called “auto-saved-results-HydroBayesCal” will be created to store all the result files. In this directory, the results of the calibration process will be stored according to the calibration quantity name. Additionally, subfolders for plots, surrogate models, and restart data will be created.
calibration_pts_file_path (str, optional) – File path to the calibration points data file. Please check the documentation for further details on the file format.
n_cpus (int) – Number of CPUs to be used for parallel processing (if available).
init_runs (int) – Number of initial runs of the full complexity model (before Bayesian Active Learning).
calibration_parameters (list of str) – Names of the considered calibration parameters (e.g. roughness coefficients, empirical constants, turbulent viscosity, etc.).
param_values (list) – Value ranges considered for parameter sampling. Example: [[min1, max1], [min2, max2], …].
calibration_quantities (list of str) – Names of the calibration targets (model outputs) used for calibration. These quantities usually correspond to the measured values for calibration purposes. Example: [‘WATER DEPTH’] for a single quantity. Example: [‘WATER DEPTH’, ‘SCALAR VELOCITY’] for multiple quantities.
extraction_quantities (list of str) – Names of the quantities to be extracted from the model output files. Generally the same as, or a superset of, calibration_quantities. Example: calibration_quantities = ['WATER DEPTH'] (WATER DEPTH as the calibration quantity) and extraction_quantities = ['WATER DEPTH', 'SCALAR VELOCITY', 'TURBULENT ENERG', 'VELOCITY U', 'VELOCITY V']. Any of these additional quantities can be used for calibration when restarting the calibration process with only_bal_mode = True.
dict_output_name (str) – Base name for the output dictionary files, which are saved as .json files. This dictionary is saved in the calibration-data subfolder of the considered calibration target.
parameter_sampling_method (str) –
Method used for sampling parameter values during the calibration process. The available options are:
- “random”: Random sampling.
- “latin_hypercube”: Latin Hypercube Sampling (LHS).
- “sobol”: Sobol sequence sampling.
- “halton”: Halton sequence sampling.
- “hammersley”: Hammersley sequence sampling.
- “chebyshev(FT)”: Chebyshev nodes (Fourier Transform-based).
- “grid(FT)”: Grid-based sampling (Fourier Transform-based).
- “user”: User-defined sampling.
Example:
parameter_sampling_method = "sobol" # Uses Sobol sequence sampling.
If “user” is selected, a .csv file containing user-defined collocation points must be provided in the restart data folder. The file should follow this format (one column per calibration parameter, one row per collocation point):
param1 param2 param3 param4 param5
0.148 0.770 0.014 0.014 0.700
0.066 0.066 0.066 0.066 0.066
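For illustration, the “latin_hypercube” option can be sketched with NumPy alone by stratifying the unit hypercube and scaling the samples to the [[min, max], ...] ranges given in param_values. This is a minimal sketch, not HydroBayesCal's own sampler (which may delegate to another library); the function name latin_hypercube is hypothetical.

```python
import numpy as np


def latin_hypercube(param_values, n_samples, rng=None):
    """Hypothetical LHS sketch: returns an array of shape [n_samples, n_parameters]."""
    rng = np.random.default_rng(rng)
    bounds = np.asarray(param_values, dtype=float)  # [[min1, max1], [min2, max2], ...]
    n_params = bounds.shape[0]
    # One point in each of the n_samples equal-width strata per parameter,
    # jittered uniformly inside its stratum.
    u = (np.arange(n_samples)[:, None] + rng.random((n_samples, n_params))) / n_samples
    # Shuffle the strata independently per parameter so the columns are decoupled.
    for j in range(n_params):
        u[:, j] = rng.permutation(u[:, j])
    lower, upper = bounds[:, 0], bounds[:, 1]
    return lower + u * (upper - lower)


# Example: 8 initial collocation points for two illustrative roughness ranges.
points = latin_hypercube([[0.01, 0.05], [0.5, 2.0]], n_samples=8, rng=42)
print(points.shape)  # (8, 2)
```

Each parameter range is split into n_samples strata and every stratum is hit exactly once, which is what distinguishes LHS from plain “random” sampling.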
max_runs (int) – Maximum (total) number of model simulations, including initial runs and Bayesian Active Learning iterations.
complete_bal_mode (bool, optional (Default: False)) –
If True: Bayesian Active Learning (BAL) is performed after the initial runs, enabling a complete surrogate‐assisted calibration process. This option MUST be selected if you choose to perform only BAL (i.e., when only_bal_mode = True).
If False: Only the initial runs of the full complexity model are executed, and the model outputs are stored as .json files.
only_bal_mode (bool, optional (Default: False)) –
If False: The process will either execute a complete surrogate‐assisted calibration or only the initial runs, depending on the value of complete_bal_mode.
If True: Only the surrogate model construction and Bayesian Active Learning of pre-existing model outputs at predefined collocation points are performed. This mode can be executed only if either a complete process has already been performed (complete_bal_mode = True and only_bal_mode = False) or only the initial runs have been executed (complete_bal_mode = False and only_bal_mode = False).
Shortcut combinations and their corresponding tasks:
complete_bal_mode | only_bal_mode | Task
True | False | Complete surrogate-assisted calibration
False | False | Only initial runs (no surrogate model)
True | True, with init_runs = max_runs | Surrogate construction with predefined runs (no BAL)
True | True, with init_runs < max_runs | Surrogate construction + Bayesian Active Learning
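The mode combinations can be encoded as a small decision function, which also makes the implied arithmetic explicit: because max_runs counts the initial runs, max_runs - init_runs BAL iterations follow them. describe_task is a hypothetical helper written for this reference, not part of HydroBayesCal.

```python
def describe_task(complete_bal_mode: bool, only_bal_mode: bool,
                  init_runs: int, max_runs: int) -> str:
    """Hypothetical helper mapping the mode flags to the task they trigger."""
    if only_bal_mode and not complete_bal_mode:
        raise ValueError("only_bal_mode=True requires complete_bal_mode=True")
    if not complete_bal_mode:
        return "Only initial runs (no surrogate model)"
    if not only_bal_mode:
        return "Complete surrogate-assisted calibration"
    if max_runs < init_runs:
        raise ValueError("max_runs includes the initial runs, so it must be >= init_runs")
    n_bal = max_runs - init_runs  # number of BAL iterations after the predefined runs
    if n_bal == 0:
        return "Surrogate construction with predefined runs (no BAL)"
    return f"Surrogate construction + Bayesian Active Learning ({n_bal} BAL iterations)"


print(describe_task(True, True, init_runs=30, max_runs=50))
# Surrogate construction + Bayesian Active Learning (20 BAL iterations)
```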
validation (bool, optional (Default: False)) – If True, creates output files (inputs and outputs) corresponding to the validation process.
*args (tuple, optional) – Additional positional arguments.
**kwargs (dict, optional) – Additional keyword arguments.
- param_values
Parameter values used in calibration.
- Type:
array
- extraction_quantities
Quantities extracted from the model during calibration (calibration quantities must be included here).
- Type:
list of str
- parameter_sampling_method
Method used for sampling parameters during calibration. Options:
- “random”
- “latin_hypercube”
- “sobol”
- “halton”
- “hammersley”
- “chebyshev(FT)”
- “grid(FT)”
- “user” (requires a CSV file with user-defined collocation points in the restart data folder).
- Type:
str
- max_runs
Maximum number of calibration runs, including Bayesian Active Learning iterations.
- Type:
int
- complete_bal_mode
If True, enables complete surrogate-assisted calibration with Bayesian Active Learning. Must be selected if only_bal_mode = True.
- Type:
bool
- only_bal_mode
If True, only surrogate model construction and Bayesian Active Learning are performed. Requires prior execution of either the full calibration process (complete_bal_mode = True) or initial runs (complete_bal_mode = False).
- Type:
bool
- dict_output_name
Name of the output dictionary file. Appends “-validation” if validation mode is enabled.
- Type:
str
- observations
Observed values at each calibration point.
- Type:
array
- measurement_errors
Measurement errors associated with each calibration point.
- Type:
array
- calibration_pts_df
Contains calibration point information. Header format:
Point | X | Y | <quantity>_DATA | <quantity>_ERROR | ...
- Type:
pandas.DataFrame
- user_collocation_points
User-defined collocation points loaded from a CSV file (only applicable when parameter_sampling_method=”user”).
- Type:
array
- model_evaluations
2D array of processed model outputs with shape [num_runs, nloc * num_calibration_quantities], where num_runs is the total number of evaluations (initial runs plus BAL iterations) and the columns interleave the calibration quantities per location. For example, with two quantities and two locations, columns 1-2 hold the two quantities at the first location and columns 3-4 the second.
- Type:
array
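The interleaved column layout can be illustrated with NumPy slicing: with n_q calibration quantities, the 0-based column of quantity q at location loc is loc * n_q + q, so a stride-n_q slice recovers one quantity at all locations. Both helper names are hypothetical and the values are illustrative.

```python
import numpy as np


def column_index(location: int, quantity: int, n_quantities: int) -> int:
    """0-based column of (location, quantity) in the interleaved layout."""
    return location * n_quantities + quantity


def split_by_quantity(model_evaluations, n_quantities):
    """Return one [num_runs, nloc] array per calibration quantity."""
    evals = np.asarray(model_evaluations)
    return [evals[:, q::n_quantities] for q in range(n_quantities)]


# Two runs, two locations, quantities ['WATER DEPTH', 'SCALAR VELOCITY'].
evals = np.array([[0.51, 1.20, 0.48, 1.05],
                  [0.55, 1.10, 0.50, 0.98]])
# depth takes columns 0 and 2; velocity takes columns 1 and 3.
depth, velocity = split_by_quantity(evals, n_quantities=2)
```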
- extract_data_point(input_file, calibration_pts_df, output_name, extraction_quantity, simulation_number, model_directory, results_folder_directory, *args, **kwargs)[source]
Extract data from a specified coordinate in a hydrodynamic model output file.
This generic method is designed for use with various hydrodynamic models (e.g., Telemac, OpenFOAM). It extracts data from an input file at the target points whose coordinates are given in calibration_pts_df (typically loaded from the calibration points CSV file).
- Parameters:
input_file (str) – Path to the hydrodynamic model output file from which data will be extracted.
calibration_pts_df (pd.DataFrame) –
Contains the coordinates of the points where data extraction is required. It must include:
- Point descriptions (e.g., “P1”).
- X and Y coordinates of the measurement points.
- Measured values and errors for the calibration quantities.
Expected columns:
- For a single calibration quantity: [‘Point Name’, ‘X’, ‘Y’, ‘Measured Value’, ‘Measured Error’]
- For two calibration quantities: [‘Point Name’, ‘X’, ‘Y’, ‘Measured Value 1’, ‘Measured Error 1’, ‘Measured Value 2’, ‘Measured Error 2’]
output_name (str) – Base name for the output file where extracted data will be stored.
extraction_quantity (list of str) – List of variables or quantities to be extracted. Example: extraction_quantity=[“WATER DEPTH”, “SCALAR VELOCITY”, “TURBULENT ENERG”]
simulation_number (int) – The current simulation number, used to manage and organize data extraction.
model_directory (str) – Path to the directory containing the model output files.
results_folder_directory (str) – Path to the directory where the extracted data will be saved.
*args – Additional positional arguments defining specific extraction criteria, such as data indices or custom processing parameters.
**kwargs – Additional keyword arguments for flexible data extraction criteria, such as:
- time: Specific time step for extraction.
- location: Specific coordinate or region of interest.
- variable_name: Name of the variable to extract.
- Any other model-specific parameters required for data extraction.
- Returns:
The extracted data is saved to output files in the specified results directory.
- Return type:
None
- abstractmethod run_multiple_simulations(collocation_points=None, bal_new_set_parameters=None, bal_iteration=0, complete_bal_mode=True, validation=False, *args, **kwargs)[source]
Run the full-complexity model for a set of collocation points (BAL).
Executes multiple hydrodynamic simulations in the context of Bayesian Active Learning (BAL). A new set of calibration parameters may be added as an array during BAL iterations.
If complete_bal_mode=True, the process includes initial runs, surrogate-model construction and BAL iterations.
If complete_bal_mode=False, only the initial model runs are performed.
If validation=True, a separate set of runs is executed for validation (e.g. assessing surrogate-model performance).
The number of processors is defined by self.nproc at initialisation.
- Parameters:
collocation_points (numpy.ndarray, optional) – Array of shape [init_runs, n_parameters] with the initial collocation points for the iterative runs. None during the BAL phase.
bal_new_set_parameters (numpy.ndarray, optional) – Array of shape [1, n_parameters] with the new parameter set for a BAL iteration. None during the initial runs.
bal_iteration (int, optional) – BAL iteration number (default 0).
complete_bal_mode (bool, optional) – True (default) to run the full process (initial runs, surrogate construction and BAL); False for initial runs only.
validation (bool, optional) – True to run a separate set of validation simulations.
*args – Binding-specific options (e.g. Telemac’s output_extraction, output_extraction_time and n). See the concrete subclass.
**kwargs – Binding-specific options (e.g. Telemac’s output_extraction, output_extraction_time and n). See the concrete subclass.
- Returns:
2D array of processed model outputs with shape [num_runs, nloc * num_calibration_quantities], where num_runs is the total number of evaluations (initial runs plus BAL iterations) and the columns interleave the calibration quantities per location. For example, with two quantities and two locations, columns 1-2 hold the two quantities at the first location and columns 3-4 the second.
- Return type:
array
- run_single_simulation(control_file='control_file.hydro')[source]
Executes a single model run using a specified script or launcher file.
This method is intended to handle the execution of a single simulation for various models (e.g., Telemac, OpenFOAM, Basement) by calling the appropriate launcher script.
- Parameters:
control_file (str) – The name of the control file used to launch the simulation. Defaults to “control_file.hydro” as an example. This file should be present in the appropriate directory and executable through a terminal.
- Returns:
The method executes the model run using a launcher command.
- Return type:
None
- set_observations_and_variances(calibration_pts_file_path, calibration_quantities, extraction_quantities, gpe_error=0.1, measurement_error=0.1)[source]
Reads calibration point data and constructs observation variances.
Total variance is computed as:
variance = measurement_error**2 + gpe_error**2 + site_specific_error**2
where:
- measurement_error is assigned as a percentage of the measured value.
- gpe_error is assigned as a percentage of the measured value.
- site_specific_error is read from the <quantity>_ERROR columns and should already be in the physical units of the corresponding calibration quantity.
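A minimal sketch of the variance formula, assuming that measurement_error and gpe_error are relative fractions (0.1 = 10 %) multiplied by the measured value, and that site_specific_error is already absolute. observation_variances is a hypothetical name; how HydroBayesCal scales the relative terms internally should be checked against the source.

```python
import numpy as np


def observation_variances(measured, site_specific_error,
                          measurement_error=0.1, gpe_error=0.1):
    """Hypothetical sketch of
    variance = measurement_error**2 + gpe_error**2 + site_specific_error**2.

    Assumption: measurement_error and gpe_error are relative (fractions of the
    measured value); site_specific_error (the <quantity>_ERROR column) is absolute.
    """
    measured = np.asarray(measured, dtype=float)
    site = np.asarray(site_specific_error, dtype=float)
    meas_abs = measurement_error * measured  # relative -> absolute units
    gpe_abs = gpe_error * measured           # relative -> absolute units
    return meas_abs**2 + gpe_abs**2 + site**2


# Two WATER DEPTH observations (m) with their site-specific errors (m).
variances = observation_variances([1.0, 2.0], [0.05, 0.1])
# -> approximately [0.0225, 0.09]
```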
- static read_data(results_folder, file_name)[source]
Reads and extracts data from various file types based on the provided file name.
The function supports file types such as .csv, .json, .txt, .pkl, and .pickle.
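- Parameters:
results_folder (str) – Directory containing the file to be read.
file_name (str) – Name of the file to be read; its extension determines how it is parsed.
- Returns:
data – The extracted data, which can be a DataFrame, dictionary, list, or other object depending on the file type. Returns None if the file type is unsupported or an error occurs while reading the file.
- Return type:
object (e.g. pandas.DataFrame, dict or list) or None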
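The extension-based dispatch described for read_data can be sketched as follows. read_results_file is a hypothetical stand-in, not the actual method; returning .txt files as a list of lines and importing pandas only for .csv files are assumptions of this sketch.

```python
import json
import os
import pickle


def read_results_file(results_folder, file_name):
    """Hypothetical sketch: parse a results file according to its extension."""
    path = os.path.join(results_folder, file_name)
    ext = os.path.splitext(file_name)[1].lower()
    try:
        if ext == ".csv":
            import pandas as pd  # only needed for CSV files
            return pd.read_csv(path)
        if ext == ".json":
            with open(path) as fh:
                return json.load(fh)
        if ext == ".txt":
            with open(path) as fh:
                return fh.read().splitlines()  # assumption: one entry per line
        if ext in (".pkl", ".pickle"):
            with open(path, "rb") as fh:
                return pickle.load(fh)
    except (OSError, ValueError, pickle.UnpicklingError):
        # Missing files and malformed content both yield None, as documented.
        return None
    return None  # unsupported extension
```

Only unpickle files you trust: pickle.load can execute arbitrary code embedded in the file.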
- set_calibration_parameters(params, values)[source]
Create a dictionary from calibration parameters and their value ranges if both params and values exist. If only one of them exists, compute the number of dimensions.
- Parameters:
params – List of parameter names.
values – List of value ranges corresponding to the parameter names.
- Returns:
Dictionary with parameter names as keys and value ranges as values, and the number of dimensions.
- Raises:
ValueError – If the number of parameters does not match the number of values when both are provided.
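A minimal sketch of the documented behaviour: pair names with ranges when both are given, raise ValueError on a length mismatch, and otherwise derive only the number of dimensions. calibration_parameter_dict is a hypothetical name, and returning an empty dictionary when only one input exists is an assumption.

```python
def calibration_parameter_dict(params=None, values=None):
    """Hypothetical sketch: returns (dict of name -> [min, max], number of dimensions)."""
    if params and values:
        if len(params) != len(values):
            raise ValueError(
                f"{len(params)} parameter names but {len(values)} value ranges"
            )
        return dict(zip(params, values)), len(params)
    # Only one of the two is given: no dictionary can be built (assumption),
    # but the dimensionality still follows from whichever list exists.
    present = params or values or []
    return {}, len(present)


param_dict, ndim = calibration_parameter_dict(
    ["zone1", "zone2"], [[0.01, 0.05], [0.02, 0.08]]
)
```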
- update_model_controls(collocation_point_values, calibration_parameters, auxiliary_file_path, simulation_id=0)[source]
Updates the model control files for Bayesian calibration. Incorporates new parameter values, ensuring that the model runs with the specified settings during Bayesian calibration.
- Parameters:
collocation_point_values (array) – Contains values for the calibration parameters. These values are used to update the model control files.
calibration_parameters (list of str) – Calibration parameter names that are to be updated in the model control files. Each string in the list should correspond to a parameter used in the model.
auxiliary_file_path (str) – Path to an auxiliary file that may be required for running the model controls (e.g., the .tbl friction file in Telemac).
simulation_id (int) – An optional identifier for the simulation. The default is 0. This ID can be used to distinguish different simulations or runs.
- Returns:
This method does not return any value. It modifies the model control files.
- Return type:
None
- abstractmethod output_processing(output_data='', delete_complex_outputs=False, validation=False, *args, **kwargs)[source]
Extracts model outputs from a file (.txt, .json, etc.) into a 2D array ready for use in Bayesian calibration, and saves the results to a CSV file.
- Parameters:
output_data (str) – Path to the file (.json) containing the model outputs. The file should be structured such that its keys correspond to calibration points and its values are lists of nested dictionaries holding the output values for each run and quantity.
delete_complex_outputs (bool, optional, default: False) – Delete complex model output files from the results folder (e.g. auto-saved-results-HydroBayesCal/<variable>). Recommended when running many simulations of the full complexity model.
validation (bool, optional, default: False) – If True, new files for collocation points and model results are created, so that the collocation points and model results obtained during the calibration process are kept.
- Returns:
model_results – A 2D array containing the processed model outputs. The shape of the array is [No. of total runs, No. of calibration points x No. of quantities], where ‘No. of quantities’ is the number of calibration quantities being processed, and ‘No. of total runs’ is the sum of initial runs and Bayesian active learning iterations. The array is also saved to a CSV file in the specified directory.
- Return type:
array
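The JSON-to-array step can be sketched as below. The JSON layout (one flat dictionary of quantity to value per run) is an assumption standing in for the “nested dictionaries” described above, and json_to_model_results is a hypothetical name.

```python
import json

import numpy as np


def json_to_model_results(json_text, calibration_quantities):
    """Hypothetical sketch: interleave per-point, per-run outputs into a 2D array.

    Assumed layout: {"P1": [{"WATER DEPTH": 0.50, "SCALAR VELOCITY": 1.20}, ...], "P2": [...]}
    """
    data = json.loads(json_text)  # dicts keep file order, so point order is preserved
    points = list(data)
    n_q = len(calibration_quantities)
    n_runs = len(data[points[0]])
    results = np.empty((n_runs, len(points) * n_q))
    for loc, point in enumerate(points):
        for run, outputs in enumerate(data[point]):
            for q, name in enumerate(calibration_quantities):
                # Column layout: all quantities of location 0, then location 1, ...
                results[run, loc * n_q + q] = outputs[name]
    return results
```

The resulting array can then be written with numpy.savetxt(path, results, delimiter=",").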
TELEMAC binding
Functional core for controlling Telemac simulations for coupling with the Surrogate-Assisted Bayesian inversion technique.
Authors: Andres Heredia, Sebastian Schwindt
- class hydroBayesCal.telemac.control_telemac.TelemacModel(friction_file='', tm_xd='Telemac2d', gaia_steering_file=None, fortran_file=None, results_filename_base='', gaia_results_filename_base=None, stdout=6, python_shebang='#!/usr/bin/env python3', *args, **kwargs)[source]
Bases: HydroSimulations
- __init__(friction_file='', tm_xd='Telemac2d', gaia_steering_file=None, fortran_file=None, results_filename_base='', gaia_results_filename_base=None, stdout=6, python_shebang='#!/usr/bin/env python3', *args, **kwargs)[source]
Constructor for the TelemacModel class. The class contains all methods needed for Telemac simulations, extraction of simulation outputs and iterative updating of the control files.
- Parameters:
friction_file (str, optional) – Name of the friction file to be used in Telemac simulations (should end with “.tbl”); do not include the directory path.
tm_xd (str, optional) – Specifies the dimension of the Telemac hydrodynamic solver, either ‘Telemac2d’ (default) or ‘Telemac3d’.
gaia_steering_file (str, optional) – Name of the Gaia steering file, if required. Not implemented in this HydroBayesCal version.
results_filename_base (str, optional) – Base name for the results file, which will be iteratively updated in the .cas file.
python_shebang (str, optional) – Shebang line for Python scripts (default is “#!/usr/bin/env python3”).
*args (tuple) – Additional positional arguments.
**kwargs (dict) – Additional keyword arguments.
- gaia_steering_file
Gaia steering file name if provided; otherwise, None. Not implemented in this HydroBayesCal version.
- Type:
str or None
- comm
MPI communicator for parallel processing.
- Type:
MPI.Comm
- tm_xd_dict
Dictionary mapping ‘Telemac2d’ and ‘Telemac3d’ to their respective script names.
- Type:
dict
Note
The attributes specific to Telemac are listed above. For attributes inherited from the HydroSimulations class, please refer to its documentation.
- update_model_controls(collocation_point_values, calibration_parameters, auxiliary_file_path=None, gaia_file_path=None, simulation_id=0)[source]
Modifies the .cas steering file for each Telemac run according to the values of the collocation points and the calibration parameters. This method is called whenever the .cas, .tbl, Gaia or Fortran files must be modified. Each calibration parameter is routed by its name:
- Parameters starting with the prefix “zone” followed by the number of the friction zone: if a “FRICTION DATA FILE” is provided, any friction zone can be used as a calibration parameter, and the .tbl file is modified accordingly.
- Parameters starting with the prefix “gaia”: the method looks for the parameter in the Gaia steering file and updates it with the new value.
- Parameters starting with “f.”: the method looks for the parameter in the Fortran file and updates it with the new value.
- All other parameters are updated in the Telemac steering file (.cas).
- Parameters:
collocation_point_values (list) – Values for each of the calibration parameters.
calibration_parameters (list) – Names of the calibration parameters.
auxiliary_file_path (str, optional) – Path to the friction file (.tbl).
gaia_file_path (str, optional) – Path to the GAIA steering file (.cas). If provided, GAIA calibration parameters will also be updated.
simulation_id (int, optional) – Identifier of the current simulation. Used when generating or updating control files for multiple simulations. Default is 0.
- Returns:
Modified control files (telemac.cas, gaia.cas, fortran file, and/or friction .tbl) for Telemac simulations.
- Return type:
None
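The prefix routing can be sketched as a small classifier. route_parameter and its return labels are hypothetical, and the example parameter names are illustrative only.

```python
def route_parameter(name: str) -> str:
    """Hypothetical sketch: which file a calibration parameter is written to."""
    if name.startswith("zone"):
        return "tbl"      # friction data file, e.g. "zone2" -> friction zone 2
    if name.startswith("gaia"):
        return "gaia"     # Gaia steering file
    if name.startswith("f."):
        return "fortran"  # Fortran (user subroutine) file
    return "cas"          # Telemac steering file


targets = {p: route_parameter(p)
           for p in ["zone3", "gaia_example", "f.example", "FRICTION COEFFICIENT"]}
```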
- static create_cas_string(param_name, value)[source]
Create the keyword string, with its new value, to be written into Telemac2d / Gaia steering files.
- rewrite_steering_file(param_name, updated_string, steering_module='telemac')[source]
Rewrite the .cas steering file with updated parameters.
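Steering files are plain text with one “KEYWORD = value” entry per line, so both steps can be sketched with string formatting and a line-anchored regular expression. create_cas_line and rewrite_cas_text are hypothetical stand-ins for create_cas_string and rewrite_steering_file; joining list values with “;” and appending keywords that are absent are assumptions of this sketch.

```python
import re


def create_cas_line(param_name, value):
    """Hypothetical sketch: 'KEYWORD = value', joining list values with ';'."""
    if isinstance(value, (list, tuple)):
        value_str = ";".join(str(v) for v in value)
    else:
        value_str = str(value)
    return f"{param_name} = {value_str}"


def rewrite_cas_text(cas_text, param_name, updated_line):
    """Replace every 'param_name = ...' or 'param_name : ...' line; append if absent."""
    pattern = re.compile(
        rf"^[ \t]*{re.escape(param_name)}[ \t]*[=:].*$", re.MULTILINE
    )
    # A function replacement keeps backslashes in updated_line from being interpreted.
    new_text, n_subs = pattern.subn(lambda _match: updated_line, cas_text)
    if n_subs == 0:
        new_text = cas_text.rstrip("\n") + "\n" + updated_line + "\n"
    return new_text


cas = "TITLE = 'demo'\nLAW OF BOTTOM FRICTION = 4\nFRICTION COEFFICIENT = 0.03\n"
cas = rewrite_cas_text(cas, "FRICTION COEFFICIENT",
                       create_cas_line("FRICTION COEFFICIENT", 0.045))
```

Anchoring the pattern at the start of the line and requiring “=” or “:” after the keyword prevents a keyword from matching a longer keyword that merely contains it.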
- run_single_simulation(control_file='tel.cas')[source]
Runs a Telemac2D or Telemac3D simulation with one or more processors. The number of processors to use is defined by self.nproc.
- Parameters:
control_file (str) – The name of the control file used to launch the simulation. Default is “tel.cas”. This file should be located in the model directory.
- Returns:
The method executes the model run using a launcher command.
- Return type:
None
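The launcher call can be sketched as building an argument list. This assumes that tm_xd_dict maps to the standard telemac2d.py / telemac3d.py launcher scripts and that parallel runs are requested with the launcher's --ncsize option; build_telemac_command is a hypothetical name.

```python
def build_telemac_command(control_file="tel.cas", tm_xd="Telemac2d", nproc=1):
    """Hypothetical sketch of the launcher call for a single Telemac run."""
    # Assumed contents of tm_xd_dict: solver name -> launcher script.
    launchers = {"Telemac2d": "telemac2d.py", "Telemac3d": "telemac3d.py"}
    cmd = [launchers[tm_xd], control_file]
    if nproc > 1:
        # Assumption: parallel runs use the launcher's --ncsize option.
        cmd.append(f"--ncsize={nproc}")
    return cmd


# Usage (requires a configured Telemac environment):
# import subprocess
# subprocess.run(build_telemac_command("tel.cas", "Telemac2d", nproc=4),
#                cwd=model_dir, check=True)
```

Passing a list instead of a shell string avoids quoting problems with file names that contain spaces.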
- run_multiple_simulations(collocation_points=None, bal_new_set_parameters=None, bal_iteration=0, complete_bal_mode=True, output_extraction='interpolated', output_extraction_time='last', n=40, validation=False, kill_process=True)[source]
Runs multiple Telemac2d or Telemac3d simulations with a set of collocation points and a new set of calibration parameters when BAL mode is chosen. The number of processors to use is defined by self.nproc in user_inputs.
- Parameters:
collocation_points (array) – Numpy array of shape [No. init_runs x No. calibration parameters] which contains the initial collocation points (parameter combinations) for iterative Telemac runs. Default is None, and it is filled with values for the initial surrogate model phase. It remains None during the BAL phase.
bal_new_set_parameters (array) – 2D array of shape [1 x No. parameters] containing the new set of values after each BAL iteration.
bal_iteration (int) – The number of the BAL iteration. Default is 0.
complete_bal_mode (bool) – Default is True when the code accounts for initial runs, surrogate construction and BAL phase. False when only initial runs are required.
output_extraction (str) – The mode for extracting model outputs. Options are “nearest”, “index” or “interpolated”.
output_extraction_time (str) – The time mode for extracting model outputs. Options are “last”, “index”, or “mean_last”.
n (int) – The number of last time steps to consider when output_extraction_time is set to “mean_last”. Default is 40.
validation – If True, the method runs a separate set of simulations for validation purposes, and saves the collocation points used for validation in a separate CSV file.
kill_process (bool) – If True, the method attempts to kill any remaining Telemac processes after running the simulations. This is useful to prevent BAL from running after the initial runs.
- Returns:
model_evaluations – 2D array containing processed model outputs. Shape: [num_runs, nloc * num_calibration_quantities], where:
- num_runs is the total number of model evaluations, including both initial runs and Bayesian Active Learning iterations.
- nloc * num_calibration_quantities is the total number of outputs, with results interleaved in columns.
Example: For two calibration quantities and two calibration locations:
- Columns 1 and 2 correspond to the outputs (2 quantities) of the first calibration location.
- Columns 3 and 4 correspond to the outputs of the second location, and so on.
- Return type:
array
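The three output_extraction_time modes reduce an extracted time series along its first (time) axis. aggregate_time is a hypothetical helper that mirrors the documented options; it is a sketch, not the binding's code.

```python
import numpy as np


def aggregate_time(series, mode="last", time_index=0, n=40):
    """Hypothetical sketch of the 'last', 'index' and 'mean_last' time modes."""
    values = np.asarray(series, dtype=float)  # axis 0 is time
    if mode == "last":
        return values[-1]
    if mode == "index":
        return values[time_index]
    if mode == "mean_last":
        # If n exceeds the number of time steps, all available steps are averaged.
        return values[-n:].mean(axis=0)
    raise ValueError(f"unknown output_extraction_time: {mode!r}")


depth_series = [0.40, 0.48, 0.50, 0.52]  # illustrative WATER DEPTH at one point
```

Averaging the last n steps smooths residual oscillations when the run has not fully reached a steady state.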
- output_processing(output_data_path='', calibration_quantities='', delete_slf_files=False, validation=False, save_extraction_outputs=False, filter_outputs=False, run_range_filtering=None, extraction_mode=False, calibration_mode=False)[source]
Processes model output data from a JSON file into a 2D array format for Bayesian calibration and saves the results to a CSV file.
This method reads a JSON file specified by output_data_path, extracts and processes the model outputs, and saves them in a CSV file format suitable for Bayesian calibration.
- Parameters:
output_data_path (str) – Path to the file (.json) containing the model outputs. The file should be structured such that its keys correspond to calibration points, and its values are lists of nested dictionaries having the output values for each run and quantity/ies.
delete_slf_files (bool, optional, default: False) – Delete complex model output files (.slf) from the results folder (e.g. auto-saved-results-HydroBayesCal/<variable>). Recommended when running many simulations of the full complexity model.
validation (Boolean, Default: False) – If True, new files for collocation points and model results are created. This is done to keep the collocation points and model results obtained during the calibration process.
- Returns:
model_results – A 2D array containing the processed model outputs. The shape of the array is [number of total runs, number of calibration points x number of quantities], where ‘number of quantities’ represents the calibration quantities processed, and ‘number of total runs’ is the sum of initial runs and Bayesian active learning iterations. The columns interleave the outputs of the calibration quantities for each calibration point. This array is also saved to a CSV file in the specified directory.
- Return type:
array
- extract_data_point(input_file, calibration_pts_df, output_name, extraction_quantity, simulation_number, model_directory, results_folder_directory, validation=False, user_param_values=False, output_extraction='interpolated', k=3, output_extraction_time='last', time_index=0, n=5, compute_wall_law_diagnostics=False)[source]
Extract model results at specified calibration or validation points from TELEMAC and/or GAIA SELAFIN result files.
The method supports extraction of scalar variables, vertical layer selection based on measurement height, inverse-distance interpolation, and optional wall-law diagnostics. Extracted values are written to JSON files and result files are moved to the designated results directory.
- Parameters:
input_file (str) – Name of the TELEMAC result file (.slf) to extract data from.
calibration_pts_df (pandas.DataFrame) –
DataFrame containing extraction locations. The first column must contain point identifiers. The following columns are expected to be:
column 1: x-coordinate
column 2: y-coordinate
column 3: vertical measurement offset (z)
output_name (str) – Base name used for generated JSON output files.
extraction_quantity (list of str) – Quantities to extract from the model results. Variables may originate from TELEMAC or GAIA according to the configuration mapping classification_tm_gaia_dict.
simulation_number (int) – Current simulation number within the calibration workflow.
model_directory (str) – Directory containing TELEMAC and GAIA result files.
results_folder_directory (str) – Directory where extracted results and moved result files are stored.
validation (bool, optional) – If True, extracted values are treated as validation results and are written to validation-specific JSON files. Default is False.
user_param_values (bool, optional) – Flag controlling restart-data generation. Default is False.
output_extraction ({"nearest", "interpolated"}, optional) –
Spatial extraction method.
"nearest": use the closest model node.
"interpolated": perform inverse-distance-weighted interpolation using the k nearest nodes.
Default is "interpolated".
k (int, optional) – Number of nearest nodes used for interpolation when output_extraction="interpolated". Ignored when using nearest-node extraction. Default is 3.
output_extraction_time ({"last", "index", "mean_last"}, optional) –
Temporal aggregation mode applied to the extracted time series.
"last": use the final time step.
"index": use the time step specified by time_index.
"mean_last": average the last n time steps.
Default is "last".
time_index (int, optional) – Time-step index used when output_extraction_time="index". Default is 0.
n (int, optional) – Number of final time steps used when output_extraction_time="mean_last". Default is 5.
compute_wall_law_diagnostics (bool, optional) – If True, compute wall-law diagnostic quantities from TELEMAC 3D results and the generated 2D result file. Diagnostics include friction velocity, y-plus values, bottom friction parameters, near-bed velocity information, and the complete modeled vertical velocity profile. Default is False.
- Returns:
Results are written to JSON files and model result files are moved to the results directory.
- Return type:
None
Notes
If "3D VELOCITY MAGNITUDE" is requested, it is computed from VELOCITY U, VELOCITY V, and VELOCITY W.
For 3D simulations, the vertical layer closest to the measurement elevation is automatically selected using ELEVATION Z.
Wall-law diagnostics require at least two vertical planes (NPLAN >= 2).
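The "interpolated" option can be sketched as inverse-distance weighting over the k nearest mesh nodes. The weighting exponent (power=2) is an assumption, since the exponent is not documented here, and idw_interpolate is a hypothetical name. With k=1 the sketch reduces to the "nearest" option.

```python
import numpy as np


def idw_interpolate(node_xy, node_values, point_xy, k=3, power=2.0):
    """Hypothetical IDW sketch over the k nearest nodes (power is an assumption)."""
    node_xy = np.asarray(node_xy, dtype=float)
    node_values = np.asarray(node_values, dtype=float)
    offsets = node_xy - np.asarray(point_xy, dtype=float)
    dist = np.hypot(offsets[:, 0], offsets[:, 1])
    nearest = np.argsort(dist)[:k]
    if dist[nearest[0]] == 0.0:
        # The point coincides with a node: return the node value to avoid 1/0.
        return float(node_values[nearest[0]])
    weights = 1.0 / dist[nearest] ** power
    return float(np.sum(weights * node_values[nearest]) / np.sum(weights))


nodes = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
depths = [1.0, 2.0, 3.0, 100.0]  # illustrative node values
value = idw_interpolate(nodes, depths, (0.5, 0.5), k=3)
```

Restricting the weights to the k nearest nodes keeps distant nodes (here the one at (5, 5)) from influencing the result.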
- static tbl_creator(zone_identifier, val, friction_file_path, veg_param_number=None, veg_indicator=False)[source]
Modifies the FRICTION DATA FILE (.tbl) for Telemac simulations based on the specified zone, value, and optional vegetation parameters. This method updates the friction values in the table for different zones as part of the calibration process, and also updates the friction parameters of a previously selected vegetation friction rule.
- Parameters:
zone_identifier (str) – Identifier for the friction zone to be updated in the friction table.
val (str) – The new friction value to be set for the specified zone.
friction_file_path (str) – The file path to the existing friction file (.tbl) that will be modified.
veg_param_number (str, optional) – The vegetation parameter number associated with the zone, if applicable. Default is None, indicating no vegetation parameter is to be updated.
veg_indicator (bool, optional) – Indicator whether vegetation parameters should be modified in the friction file. Default is False, which means only friction values are updated.
- Returns:
The function updates the friction file in place and does not return any value.
- Return type:
None
OpenFOAM binding
- class hydroBayesCal.openfoam.control_openfoam.OpenFOAMController(case_dir)[source]
Bases: object
- Parameters:
case_dir (str)
- extract_fields_from_vtk(alpha_threshold=0.5, n_avg_timesteps=1)[source]
Extract velocity (U) and turbulent kinetic energy (k) fields from VTK output, averaged over the last n_avg_timesteps timesteps, filtered to water phase only.
k is read directly from the OpenFOAM k field (k-epsilon RANS turbulent kinetic energy).
If n_avg_timesteps=1 (default), only the last timestep is used (original behaviour). If n_avg_timesteps=N, the last N VTK files are averaged, giving a time-averaged result over N * writeInterval seconds.
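The averaging and water-phase filter can be sketched on arrays that have already been read from the VTK files. The snapshot dictionaries, the choice of masking on the time-averaged alpha, and the “>=” comparison are all assumptions; average_water_fields is a hypothetical name.

```python
import numpy as np


def average_water_fields(snapshots, alpha_threshold=0.5, n_avg_timesteps=1):
    """Hypothetical sketch: average the last N snapshots, keep water cells only.

    snapshots: time-ordered list of dicts with per-cell arrays
    'alpha' (n_cells,), 'U' (n_cells, 3) and 'k' (n_cells,).
    """
    last = snapshots[-n_avg_timesteps:]
    alpha = np.mean([s["alpha"] for s in last], axis=0)
    U = np.mean([s["U"] for s in last], axis=0)
    k = np.mean([s["k"] for s in last], axis=0)
    # Assumption: a cell is water if its time-averaged alpha reaches the threshold.
    water = alpha >= alpha_threshold
    return U[water], k[water]
```

Masking on the averaged alpha keeps the selected cells identical for U and k, even when the free surface moves slightly between snapshots.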
- class hydroBayesCal.openfoam.control_openfoam.OpenFOAMModel(case_template_dir, solver_name='interFoam', n_processors=8, results_filename_base='results_interfoam', alpha_water_name='alpha.water', water_surface_alpha=0.5, reference_z=0.0, control_file='system/controlDict', model_dir='', res_dir='', calibration_pts_file_path='', n_cpus=8, init_runs=5, calibration_parameters=None, param_values=None, extraction_quantities=None, calibration_quantities=None, dict_output_name='extraction-data', user_param_values=False, max_runs=50, complete_bal_mode=True, only_bal_mode=False, delete_complex_outputs=False, validation=False, multitask_selection='variables', n_avg_timesteps=1, *args, **kwargs)[source]
Bases:
HydroSimulations
BAL-compatible wrapper around OpenFOAMController.
Provides the interface expected by bal_openfoam.py, while delegating the actual OpenFOAM operations to the existing OpenFOAMController.
- run_multiple_simulations(collocation_points, complete_bal_mode=True, validation=False, bal_iteration=None, bal_new_set_parameters=None)[source]
Run multiple simulations (BAL interface).
- save_calibration_data(it, collocation_points, bayesian_dict)[source]
Write per-iteration CSV files to calibration-data/<quantities>/.
Called once per BAL iteration from bal_openfoam.py after estimate_bme(). Produces three files per iteration:
collocation_points_N{n_tp}.csv – parameter values tested so far
model_results_N{n_tp}.csv – simulation outputs (model_evaluations)
bayesian_scores.csv – BME, RE, IE and ELPD for all iterations
bayesian_scores.csv is appended on each call (one row per iteration). The posterior is saved as a separate .npy file because it is a variable-length array (rejection sampling keeps only accepted samples).
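The "one row per iteration" behaviour of bayesian_scores.csv can be sketched with the standard csv module. The helper name `append_bayesian_scores` and the `iteration` column are hypothetical. The sketch only shows the append-with-header-once pattern, not the exact columns the package writes.

```python
import csv
import os
import tempfile

def append_bayesian_scores(csv_path, iteration, scores):
    """Append one row of Bayesian scores for a BAL iteration (hypothetical helper).

    scores: dict such as {"BME": ..., "RE": ..., "IE": ..., "ELPD": ...}
    """
    fieldnames = ["iteration", "BME", "RE", "IE", "ELPD"]
    write_header = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if write_header:            # header only on the first call
            writer.writeheader()
        writer.writerow({"iteration": iteration, **scores})

# usage: two BAL iterations written to a temporary directory
csv_file = os.path.join(tempfile.mkdtemp(), "bayesian_scores.csv")
append_bayesian_scores(csv_file, 0, {"BME": 0.1, "RE": 1.0, "IE": 2.0, "ELPD": -3.0})
append_bayesian_scores(csv_file, 1, {"BME": 0.2, "RE": 1.5, "IE": 2.5, "ELPD": -2.0})
with open(csv_file, newline="") as f:
    rows = list(csv.DictReader(f))
```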
Delft3D-FLOW binding (planned)
Delft3D-FLOW binding for HydroBayesCal – planned, not yet implemented.
This module is a placeholder that mirrors the TELEMAC
(hydroBayesCal.telemac.control_telemac) and OpenFOAM
(hydroBayesCal.openfoam.control_openfoam) bindings. It defines the
intended public interface for coupling HydroBayesCal to the structured-grid
Delft3D-FLOW engine (Deltares) so that the coupling can be implemented
incrementally without changing the surrogate / Bayesian-active-learning layer.
The Delft3DModel class subclasses
hydroBayesCal.hysim.HydroSimulations; the Python attribute names are
shared across solvers, while the string and file conventions below are
Delft3D-specific and must be preserved when the binding is filled in:
<case>.mdf – master definition FLOW file (the control file); the engine is launched through config_d_hydro.xml and the d_hydro executable.
Bed roughness via Chézy / Manning / White-Colebrook (.rgh file or Roughness keywords in the .mdf); eddy viscosity/diffusivity Vicouv/Dicouv.
trim-<case>.dat / trim-<case>.def – NEFIS map (field) output.
trih-<case>.dat / trih-<case>.def – NEFIS history (monitoring-point) output.
See the usage-delft3d page for the planned workflow.
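The Delft3D-FLOW naming conventions listed above can be made concrete with a small helper. `delft3d_case_files` is hypothetical and not part of HydroBayesCal. It only composes the documented file names from a case name and the `map_file_base` / `history_file_base` defaults. It does not check that the files exist or that d_hydro expects them in this directory layout.

```python
from pathlib import Path

def delft3d_case_files(case_dir, case, map_file_base="trim", history_file_base="trih"):
    """Expected Delft3D-FLOW input/output paths for one case (hypothetical helper)."""
    d = Path(case_dir)
    return {
        "mdf": d / f"{case}.mdf",                      # master definition FLOW file
        "config": d / "config_d_hydro.xml",            # d_hydro runtime configuration
        # NEFIS map (field) output: .dat holds the data, .def the definitions
        "map": [d / f"{map_file_base}-{case}.{ext}" for ext in ("dat", "def")],
        # NEFIS history (monitoring-point) output
        "history": [d / f"{history_file_base}-{case}.{ext}" for ext in ("dat", "def")],
    }

files = delft3d_case_files("run01", "river")
```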
- hydroBayesCal.delft3d.control_delft3d.DELFT3D_BINDING_IMPLEMENTED = False
Marker so callers / tests can detect that the binding is not ready yet.
- class hydroBayesCal.delft3d.control_delft3d.Delft3DModel(control_file='control.mdf', d_hydro_config='config_d_hydro.xml', flow_executable='d_hydro', roughness_formulation='Manning', map_file_base='trim', history_file_base='trih', *args, **kwargs)[source]
Bases:
HydroSimulations
Placeholder Delft3D-FLOW model wrapper (planned).
Defines the intended constructor signature and interface but raises NotImplementedError. Instantiating it documents the Delft3D-specific configuration the binding will need; it does not run a simulation.
- Parameters:
control_file (str) – Master definition FLOW file, default "control.mdf" (Delft3D-FLOW convention <case>.mdf).
d_hydro_config (str) – Runtime configuration passed to the d_hydro launcher, default "config_d_hydro.xml".
flow_executable (str) – Name of the Delft3D-FLOW launcher on PATH, default "d_hydro".
roughness_formulation (str) – Bed-roughness law used for the calibration parameters ("Chezy", "Manning" or "WhiteColebrook"), default "Manning".
map_file_base (str) – Base name of the NEFIS map output files (trim-<case>), default "trim".
history_file_base (str) – Base name of the NEFIS history output files (trih-<case>), default "trih".
**kwargs – Common HydroSimulations parameters (model_dir, res_dir, calibration_pts_file_path, calibration_parameters, param_values, calibration_quantities, init_runs, max_runs…).
- Raises:
NotImplementedError – Always – the binding is not implemented yet.
Surrogate model and Bayesian Active Learning
Gaussian Process Emulators
This module builds on GPyTorch (a PyTorch-based library) to train a Gaussian Process Emulator (GPE). Its classes subclass the ExactGP base class from GPyTorch and extend it by customizing the mean function, likelihood and kernel (covariance function). The MultitaskGPModel class also subclasses ExactGP to handle multitask (multiple-output) learning: it models several related tasks simultaneously, sharing information across them within a common GP framework (https://docs.gpytorch.ai/en/stable/examples/03_Multitask_Exact_GPs/Multitask_GP_Regression.html). Author: Andres Heredia (2024)
- class hydroBayesCal.surrogate.gpe_gpytorch.MyExactGPyModel(*args, **kwargs)[source]
Bases:
ExactGP
Subclass of GPyTorch's ExactGP class, with a custom likelihood, kernel and training points.
The likelihood is kept constant: Gaussian Likelihood (https://docs.gpytorch.ai/en/latest/likelihoods.html)
- Parameters:
train_x (np.array) – [n_tp, n_p] parameter sets used to train the GPR.
train_y (np.array) – [n_tp, n_obs] forward-model outputs used to train the GPR.
kernel – kernel instance used in the GPR.
likelihood – likelihood instance used to train the noise in the GPR.
- class hydroBayesCal.surrogate.gpe_gpytorch.GPyTraining(collocation_points, model_evaluations, kernel, training_iter, likelihood, y_normalization=True, tp_normalization=False, optimizer='adam', lr=0.5, loss='exact', n_restarts=1, weight_decay=0, gradient_free_start=False, verbose=True, parallelize=False)[source]
Bases:
object
Train a single-output Gaussian Process Emulator with GPyTorch.
Uses GPyTorch’s exact GP regression to build a GPE for a forward model, from collocation points produced by that model.
- Parameters:
collocation_points (numpy.ndarray) – Training points (parameter sets), shape [n_tp, n_p].
model_evaluations (numpy.ndarray) – Model outputs at each evaluated location, shape [n_tp, n_obs].
kernel (gpytorch.kernels.Kernel) – Kernel used to train the GPE. May use default values or a user-defined anisotropy (ard_num_dims = n_parameters).
likelihood (gpytorch.likelihoods.GaussianLikelihood) – Likelihood used to optimise the GPR. May use default constraints/initial values or be customised by the caller.
training_iter (int) – Number of optimiser iterations used to train the GPE.
optimizer (str, optional) – Optimiser to use: "adam" (default) or "lbfgs".
loss (str, optional) – Loss function: "exact" or "loo".
n_restarts (int, optional) – Number of optimisation restarts.
tp_normalization (bool, optional) – True to normalise training-point parameter values before training (default False).
y_normalization (bool, optional) – True to normalise model outputs before training (predictions are de-normalised afterwards).
parallelize (bool, optional) – True to parallelise surrogate training, False to train sequentially.
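GPyTraining relies on GPyTorch's exact GP regression. As a dependency-light illustration of what an exact GP computes, the sketch below evaluates the closed-form posterior mean and standard deviation with NumPy. It assumes a squared-exponential (RBF) kernel with fixed hyperparameters. Unlike GPyTraining, it does not optimise the hyperparameters. The names `rbf_kernel` and `gp_predict` are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    # squared-exponential covariance between the rows of a and b
    sq = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def gp_predict(x_train, y_train, x_test, noise=1e-6, **kern):
    """Exact GP posterior mean and std for one output (fixed hyperparameters)."""
    K = rbf_kernel(x_train, x_train, **kern) + noise * np.eye(len(x_train))
    L = np.linalg.cholesky(K)                          # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    K_s = rbf_kernel(x_train, x_test, **kern)
    mean = K_s.T @ alpha                               # posterior mean
    v = np.linalg.solve(L, K_s)
    var = np.diag(rbf_kernel(x_test, x_test, **kern)) - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))         # clip tiny negative round-off

x_train = np.array([[0.0], [0.5], [1.0]])
y_train = np.sin(x_train[:, 0])
mean, std = gp_predict(x_train, y_train, np.array([[0.25], [0.75]]))
```

At the training points the predictive mean reproduces the data and the standard deviation collapses to about the noise level, which is the behaviour expected of an exact (interpolating) GP.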
Notes
Todo
Accept the evaluation location as input and add a function that extracts the GPE predictions at the observation point (used in BAL).
Todo
For GPyTorch, check the GPU settings and other gpytorch.settings for prediction.
- static convert_to_tensor(array)[source]
Transform a np.array into a tensor.
- Parameters:
array (np.array) – array to convert to a tensor.
- Returns:
<tensor> data from the np.array in tensor format.
- normalize_tp(train_y)[source]
Normalize the training-point outputs before training.
- Parameters:
train_y (np.array) – [tp_size, n_obs] model output values to normalize.
- Returns:
<tensor> with the normalized values.
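A common way to normalise outputs before GP training, and to de-normalise predictions afterwards, is the per-column standard score. The sketch assumes that scheme; the package's normalisation may differ. `normalize` and `denormalize` are hypothetical helpers using NumPy arrays rather than tensors.

```python
import numpy as np

def normalize(y):
    """Standard-score normalisation per output column; returns values plus the stats to undo it."""
    mean = y.mean(axis=0)
    std = y.std(axis=0)
    std = np.where(std > 0, std, 1.0)   # avoid division by zero for constant outputs
    return (y - mean) / std, mean, std

def denormalize(y_norm, mean, std, y_std_norm=None):
    """Map normalised predictions (and optionally their std) back to physical units."""
    y = y_norm * std + mean
    return y if y_std_norm is None else (y, y_std_norm * std)  # stds scale, they do not shift

y = np.array([[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]])
y_norm, y_mean, y_scale = normalize(y)
```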
- train_()[source]
Train the surrogate model with GPyTorch, using the selected optimizer.
Todo
Parallelize training.
- hydroBayesCal.surrogate.gpe_gpytorch.validation_error(true_y, sim_y, output_names, n_per_type)[source]
Estimate validation criteria for a surrogate model per output location.
Results for each output type are stored under separate keys in a dictionary.
- Parameters:
true_y (numpy.ndarray) – Simulator outputs for the validation samples, shape [mc_valid, n_obs].
sim_y (numpy.ndarray or dict) – Surrogate/emulator outputs for the validation samples, shape [mc_valid, n_obs]. If a dict, it holds output and std keys.
output_names (array-like of str) – Name of each output type, shape [n_types].
n_per_type (int) – Number of observations per output type.
- Returns:
Validation criteria for each output location and output type.
- Return type:
dict
Notes
Todo
As in BayesValidRox, optionally estimate the surrogate predictions here by passing a surrogate object.
Todo
Move into the GPR class and return a dictionary keyed by output type.
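The per-location, per-type layout of the validation results can be illustrated with a single criterion. The sketch below computes the root-mean-square error (RMSE) per output location and splits it by output type. It assumes the columns are ordered type by type, with `n_per_type` columns each. RMSE is only an example here; the criteria the package actually reports are not listed in this docstring. `rmse_per_type` is hypothetical.

```python
import numpy as np

def rmse_per_type(true_y, sim_y, output_names, n_per_type):
    """Per-location RMSE, one dict entry per output type (hypothetical criterion set).

    true_y, sim_y: arrays [mc_valid, n_obs] with columns ordered type by type;
    sim_y may also be a dict with an 'output' key.
    """
    if isinstance(sim_y, dict):
        sim_y = sim_y["output"]
    rmse = np.sqrt(np.mean((true_y - sim_y) ** 2, axis=0))   # one value per location
    return {"rmse": {name: rmse[i * n_per_type:(i + 1) * n_per_type]
                     for i, name in enumerate(output_names)}}

true_vals = np.zeros((2, 4))
sim_vals = np.array([[1.0, 1.0, 2.0, 2.0], [1.0, 1.0, 2.0, 2.0]])
criteria = rmse_per_type(true_vals, sim_vals, ["water_depth", "velocity"], 2)
```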
- hydroBayesCal.surrogate.gpe_gpytorch.save_valid_criteria(new_dict, old_dict, n_tp)[source]
Append the current iteration’s validation criteria to a results dict.
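Stores the validation criteria for the current iteration (n_tp) in an existing dictionary, so the results for all iterations live in one file. Each validation criterion has a key per output type, holding a vector with one value per output location.
- Parameters:
new_dict (dict) – validation criteria of the current iteration.
old_dict (dict) – dictionary with the validation criteria of the previous iterations.
n_tp (int) – number of training points of the current iteration.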
- Returns:
The updated dictionary including the current iteration.
- Return type:
dict
- class hydroBayesCal.surrogate.gpe_gpytorch.MultiGPyTraining(collocation_points, model_evaluations, kernel, training_iter, likelihood, optimizer='adam', lr=0.5, n_restarts=1, parallelize=False, number_quantities=2, noise_constraint=gpytorch.constraints.GreaterThan)[source]
Bases:
object
Class to train multiple Gaussian Process models using given collocation points and model evaluations. It uses the MultitaskGPModel class for multitask regression with GPyTorch.
- __init__(collocation_points, model_evaluations, kernel, training_iter, likelihood, optimizer='adam', lr=0.5, n_restarts=1, parallelize=False, number_quantities=2, noise_constraint=gpytorch.constraints.GreaterThan)[source]
- Parameters:
See the original class docstring for parameter details.
- train_tasks_variables()[source]
Train multitask Gaussian Process models using the provided collocation points and model evaluations.
- train_tasks_locations()[source]
Train multitask Gaussian Process models using the provided collocation points and model evaluations. Trains the model for each output variable (water depth OR velocity) at all locations simultaneously.
- class hydroBayesCal.surrogate.gpe_gpytorch.MultitaskGPModel(*args, **kwargs)[source]
Bases:
ExactGP
Gaussian Process model for multitask regression using the GPyTorch library. This model handles multiple tasks (or quantities) simultaneously by using a multitask kernel and a multitask mean function.
- __init__(train_x, train_y, likelihood, kernel, number_tasks)[source]
- Parameters:
train_x (torch.Tensor) – Input training data, shape (n_samples, n_params), where n_samples is the number of samples and n_params the number of input model parameters.
train_y (torch.Tensor) – Output training data, shape (n_samples, n_tasks), where n_tasks is the number of tasks or quantities. Typically each column corresponds to a different task.
likelihood (gpytorch.likelihoods.MultitaskGaussianLikelihood) – Multitask likelihood used with the GP model.
kernel (tuple(gpytorch.kernels.Kernel, gpytorch.kernels.Kernel)) – Tuple of the two kernel components used in the GP model.
A new class, MySklGPR, inherits all attributes from scikit-learn's GaussianProcessRegressor class. This is done to set the "max_iter" and "gtol" values for the optimization of the GPR kernel hyperparameters manually.
Todo
Check whether results with GPyTorch + L-BFGS can be improved by changing the initial values, or by using Adam instead.
Todo
Save each GP (one per location) in a list, so it can be called later for BAL+MCMC methods.
- class hydroBayesCal.surrogate.gpe_skl.MyGeneralGPR(collocation_points, model_evaluations)[source]
Bases:
object
Class that assigns/creates the attributes that are common to all GPR-library classes (such as SklTraining and GPyTraining).
- Parameters:
collocation_points (np.array) – training points (parameter sets) at which the full-complexity model (fcm) was evaluated.
model_evaluations (np.array) – model outputs in each location where the fcm was evaluated, or in the locations being considered.
- self.prior_samples = np.array
parameter sets at which the trained GPE is evaluated.
- self.n_obs = int, number of locations of the fcm where the GPE is to be trained. It is not necessarily the same as the number of true observations, since one could train the GPE in given locations (e.g. all grid points), of which only some coincide with the observation points.
- self.surrogate_prediction = np.array
mean prediction of each GPE (n_obs) for each parameter set in prior_samples
- Type:
np.array of shape [self.n_obs, self.prior_samples.shape[0]]
- self.surrogate_std = np.array
standard deviation of each GPE (n_obs) for each parameter set in prior_samples
- Type:
np.array of shape [self.n_obs, self.prior_samples.shape[0]]
- self.surrogate_up = np.array
upper confidence bound of each GPE (n_obs) for each parameter set in prior_samples
- Type:
np.array of shape [self.n_obs, self.prior_samples.shape[0]]
- self.surrogate_lc = np.array
lower confidence bound of each GPE (n_obs) for each parameter set in prior_samples
- Type:
np.array of shape [self.n_obs, self.prior_samples.shape[0]]
- class hydroBayesCal.surrogate.gpe_skl.MySklGPR(*args, **kwargs)[source]
Bases:
GaussianProcessRegressor
- class hydroBayesCal.surrogate.gpe_skl.SklTraining(collocation_points, model_evaluations, kernel, alpha, n_restarts, noise=True, y_normalization=True, y_log=False, tp_normalization=False, optimizer='fmin_l_bfgs_b', parallelize=False, n_jobs=-2)[source]
Bases:
MyGeneralGPR
Train a single-output Gaussian Process Emulator with scikit-learn.
Uses scikit-learn’s GP regression to build a GPE for a forward model, from collocation points produced by that model. See the scikit-learn GaussianProcessRegressor for the underlying estimator.
- Parameters:
collocation_points (numpy.ndarray) – Training points (parameter sets), shape [n_tp, n_params].
model_evaluations (numpy.ndarray) – Full-complexity model outputs at each evaluated location, shape [n_tp, n_locations].
kernel (object or list of objects) – sklearn.gaussian_process.kernels instance(s) used to train the GPE; converted internally to a list.
alpha (float or list of float) – Value added to the diagonal to avoid numerical errors. A scalar is broadcast to a list.
n_restarts (int) – Number of optimiser restarts used to find the kernel hyper-parameters (avoids local minima).
noise (bool, optional) – True (default) to add a white-noise kernel to the input kernel.
y_normalization (bool, optional) – True (default) to normalise model outputs before training.
tp_normalization (bool, optional) – True to normalise training-point parameter values before training (default False).
optimizer (str, optional) – Name of the optimiser to use (default "fmin_l_bfgs_b", the scikit-learn default optimiser).
parallelize (bool, optional) – True to parallelise surrogate training, False to train sequentially.
Notes
Todo
Accept the evaluation location as input and add a function that extracts the GPE predictions at the observation point (used in BAL).
- predict_(input_sets, get_conf_int=False)[source]
Evaluate the per-location surrogate models on all input sets.
- Parameters:
input_sets (numpy.ndarray) – Parameter sets to evaluate the surrogate models on, shape [MC, n_params].
get_conf_int (bool, optional) – True to also estimate the upper and lower confidence intervals.
- Returns:
Surrogate-model mean (output) and standard deviation (std) for each location, each of shape [n_obs, MC].
- Return type:
dict
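With `get_conf_int=True`, upper and lower bounds are estimated in addition to the mean and standard deviation. The confidence level used by the package is not stated here. The sketch assumes a Gaussian predictive distribution and a two-sided interval of `z` standard deviations (z = 1.96 for roughly 95 %). `confidence_bounds` is a hypothetical helper applied to a predict_-style dict.

```python
import numpy as np

def confidence_bounds(mean, std, z=1.96):
    """Upper/lower bounds of a Gaussian predictive interval (z=1.96 gives roughly 95 %)."""
    mean, std = np.asarray(mean), np.asarray(std)
    return mean + z * std, mean - z * std

# hypothetical predict_-style output: arrays of shape [n_obs, MC]
pred = {"output": np.array([[1.0, 2.0]]), "std": np.array([[0.5, 0.0]])}
upper, lower = confidence_bounds(pred["output"], pred["std"])
```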
- hydroBayesCal.surrogate.gpe_skl.validation_error(true_y, sim_y, output_names, n_per_type)[source]
Estimate validation criteria for a surrogate model per output location.
Results for each output type are stored under separate keys in a dictionary.
- Parameters:
true_y (numpy.ndarray) – Simulator outputs for the validation samples, shape [mc_valid, n_obs].
sim_y (numpy.ndarray or dict) – Surrogate/emulator outputs for the validation samples, shape [mc_valid, n_obs]. If a dict, it holds output and std keys.
output_names (array-like of str) – Name of each output type, shape [n_types].
n_per_type (int) – Number of observations per output type.
- Returns:
Validation criteria for each output location and output type.
- Return type:
dict
Notes
Todo
As in BayesValidRox, optionally estimate the surrogate predictions here by passing a surrogate object.
Todo
Move into the GPR class and return a dictionary keyed by output type.
- hydroBayesCal.surrogate.gpe_skl.save_valid_criteria(new_dict, old_dict, n_tp)[source]
Append the current iteration’s validation criteria to a results dict.
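Stores the validation criteria for the current iteration (n_tp) in an existing dictionary, so the results for all iterations live in one file. Each validation criterion has a key per output type, holding a vector with one value per output location.
- Parameters:
new_dict (dict) – validation criteria of the current iteration.
old_dict (dict) – dictionary with the validation criteria of the previous iterations.
n_tp (int) – number of training points of the current iteration.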
- Returns:
The updated dictionary including the current iteration.
- Return type:
dict
Bayesian inference and sequential design
Todo
Add a module description.
- class hydroBayesCal.surrogate.bal_functions.BayesianInference(model_predictions, observations, error, prior=None, prior_log_pdf=None, model_error=None, sampling_method='rejection_sampling')[source]
Bases:
object
Bayesian inference of model parameters from surrogate predictions.
Computes the likelihood of the observations under the surrogate-model predictions, the Bayesian model evidence (BME) and relative entropy (RE), and draws a posterior parameter sample.
- Parameters:
model_predictions (numpy.ndarray) – Array of shape [MC_size, n_observations] with the surrogate-model predictions.
observations (numpy.ndarray) – Array of shape [1, n_observations] with the measured observations.
error (numpy.ndarray) – Array of shape [n_observations] with the error/noise variances, inserted as-is on the diagonal of the covariance matrix.
prior (numpy.ndarray, optional) – Array of shape [MC_size, n_parameters] with the prior parameter sets. If None, no posterior parameter set is saved.
prior_log_pdf (numpy.ndarray, optional) – Array of shape [MC_size] with the prior log-probabilities of each parameter sample in prior. If None (default), the information entropy (IE) is not estimated. May be supplied with or without prior.
model_error (optional) – Additional model-error term (default None).
sampling_method (str, optional) – Method used to sample from the posterior distribution, one of "rejection_sampling" (default) or "bayesian_weighting".
- likelihood
Prior likelihood values (shape [MC_size]) under a multivariate Gaussian distribution.
- Type:
numpy.ndarray
- cov_mat
Covariance matrix of shape [n_observations, n_observations] with the variances on the diagonal.
- Type:
numpy.ndarray
- post_likelihood
Likelihood values of the posterior samples.
- Type:
numpy.ndarray
- posterior
Posterior parameter sets.
- Type:
numpy.ndarray
Notes
Posterior sampling options:
Bayesian weighting obtains the posterior likelihoods as a weighted average of the prior-based likelihood values, avoiding small posterior sample sizes. Results are similar to rejection sampling, but the posterior set is not easily available.
Rejection sampling divides all likelihoods by the maximum and accepts sample i when likelihood(i) / max(likelihood) > U[0, 1]. It yields a posterior distribution directly, but needs a larger Monte Carlo sample when the output dimension is large.
Todo
Add posterior MCMC sampling methods.
- calculate_constants()[source]
Calculates the covariance matrix based on the input variable “error”, which is a vector of variances, one for each observation point.
- Returns:
None
- calculate_likelihood()[source]
Function calculates likelihood between measured data and the model output using the stats module equations.
Notes:
* Generates a likelihood array of size [MC x 1].
* The likelihood function is a multivariate normal distribution, assuming independent and Gaussian-distributed errors.
- calculate_likelihood_manual()[source]
Function calculates likelihood between observations and the model output manually, using numpy calculations.
Notes:
* Generates a likelihood array of size [MC x N], where N is the number of measurement data sets.
* The likelihood function is a multivariate normal distribution, assuming independent and Gaussian-distributed errors.
* This method is faster than using the stats module (calculate_likelihood function).
- calculate_likelihood_with_error()[source]
Function calculates the likelihood between observations and the model output manually, using numpy calculations. It considers model error, with an error associated with each model prediction.
Notes:
* Generates a likelihood array of size [MC x N], where N is the number of measurement data sets.
* The likelihood function is a multivariate normal distribution, assuming independent and Gaussian-distributed errors.
* This method is faster than using the stats module (calculate_likelihood function).
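With independent errors the covariance matrix is diagonal (the variances in `error`), so the multivariate normal likelihood can be evaluated with plain NumPy in log space. The sketch is a vectorised illustration of that calculation for a single observation set, not the package's code; `gaussian_likelihood` is hypothetical.

```python
import numpy as np

def gaussian_likelihood(model_predictions, observations, error):
    """Multivariate normal likelihood with independent errors (diagonal covariance).

    model_predictions: [MC, n_obs]; observations: [1, n_obs] or [n_obs];
    error: [n_obs] variances placed on the covariance diagonal.
    """
    var = np.asarray(error, dtype=float)
    resid = model_predictions - np.reshape(observations, (1, -1))
    n_obs = var.size
    # log of the normalising constant: -0.5 * (n log(2 pi) + log det(Sigma))
    log_norm = -0.5 * (n_obs * np.log(2 * np.pi) + np.sum(np.log(var)))
    log_like = log_norm - 0.5 * np.sum(resid**2 / var, axis=1)
    return np.exp(log_like)   # shape [MC]

like = gaussian_likelihood(np.array([[0.0], [1.0]]), np.array([[0.0]]), np.array([1.0]))
```

Working in log space before exponentiating avoids underflow in the intermediate products when there are many observations.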
- rejection_sampling()[source]
Run rejection sampling.
Generates MC uniformly distributed random numbers (RN). If the normalised likelihood likelihood / max(likelihood) is smaller than the corresponding RN, the prior sample is rejected; the remaining samples form the posterior.
Notes
Generates the posterior likelihood, posterior values and posterior density arrays.
If max(likelihood) == 0, there is no posterior distribution, or the posterior equals the prior.
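The acceptance rule above can be written in a few lines of NumPy. In this sketch the hypothetical `rejection_sample` returns an empty posterior when max(likelihood) == 0. That is one of the two outcomes the note mentions; the package may instead fall back to the prior.

```python
import numpy as np

def rejection_sample(prior, likelihood, rng=None):
    """Accept prior sample i when likelihood[i] / max(likelihood) > U[0, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    max_like = np.max(likelihood)
    if max_like == 0:                         # no information: empty posterior (assumption)
        return prior[:0], likelihood[:0]
    u = rng.uniform(size=likelihood.shape[0])
    accept = likelihood / max_like > u
    return prior[accept], likelihood[accept]

prior = np.arange(5.0).reshape(-1, 1)
like = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
post, post_like = rejection_sample(prior, like, np.random.default_rng(0))
```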
- estimate_bme()[source]
Function calculates likelihood and BME (prior based) and then, based on the given posterior sampling criteria, obtains a posterior likelihood, ELPD and RE.
- Returns:
Note
If BME = 0, the model was not able to reproduce the observed data, so we assume BME = ELPD, and thus RE is also 0, since nothing was learned.
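The quantities returned by estimate_bme are linked by the relations of Oladyshkin and Nowak (2019), cited further below: BME is the prior-based mean likelihood, ELPD is the posterior mean of the log-likelihood, and RE = ELPD - ln(BME). The sketch below applies these relations to given prior and posterior likelihood values. The zero-BME branch mirrors the note above. `bme_elpd_re` is hypothetical.

```python
import numpy as np

def bme_elpd_re(prior_likelihood, posterior_likelihood):
    """Prior-based BME, posterior ELPD and relative entropy RE = ELPD - ln(BME)."""
    bme = np.mean(prior_likelihood)                      # Monte Carlo estimate over prior samples
    if bme == 0 or posterior_likelihood.size == 0:
        return 0.0, 0.0, 0.0                             # nothing learned: ELPD set to BME, RE = 0
    elpd = np.mean(np.log(posterior_likelihood))         # expected log-likelihood under the posterior
    return bme, elpd, elpd - np.log(bme)

bme, elpd, re = bme_elpd_re(np.array([1.0, 3.0]), np.array([3.0]))
```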
- class hydroBayesCal.surrogate.bal_functions.SequentialDesign(exp_design, sm_object, obs, n_cand_groups=4, secondary_sm=None, parallel=True, n_jobs=-1, backend='loky', errors=None, do_tradeoff=False, gaussian_assumption=False, mc_samples=1000, mc_exploration=10000, multitask=False)[source]
Bases:
object
Class runs the optimal design of experiments (sequential design) to select the new training points, to add to the existing training points for surrogate model training.
Todo
Double-check against the doepy classes and functions.
- Parameters:
exp_design (ExpDesign) – used to sample from the prior distribution and to extract the exploit and explore methods.
sm_object (object) – surrogate model class object, either SklTraining or GPyTraining; must have a self.predict_(input_params) function to evaluate the surrogate.
n_cand_groups (int) – number of lists the candidate set is split into, for multiprocessing.
parallel (bool) – True to use multiprocessing (parallelize tasks); False sets n_cand_groups=1.
obs (array [n_obs, ]) – array with observation values. Todo: dict, with a key for each output type.
errors (array [n_obs, ]) – array with the measurement error of each observation. Default is None. Todo: dict, with a key for each output type.
do_tradeoff (bool) – True to use a total score that combines the exploration and exploitation scores; False to use only one of them, depending on the exploitation method.
secondary_sm (object) – surrogate model class object, either SklTraining or GPyTraining, with a self.predict_(input_params) function. It corresponds to the secondary, or error, model, which is added to the main surrogate sm_object.
gaussian_assumption (bool) – True to assume a Gaussian prior and likelihood, so the analytical BAL equations are used; False to follow the traditional sampling approach.
Attributes:
- bayesian_active_learning(y_mean, y_std, observations, error, utility_function='dkl')[source]
Computes scores based on Bayesian active design criterion (utility_criteria).
It is based on the following paper: Oladyshkin, Sergey, Farid Mohammadi, Ilja Kroeker, and Wolfgang Nowak. “Bayesian3 active learning for the gaussian process emulator using information theory.” Entropy 22, no. 8 (2020): 890.
- Parameters:
y_mean (array [n_samples, n_obs]) – array with the surrogate model outputs (mean). Todo: dictionary with a key for each output type, each array [mc_size, n_obs].
y_std (array [n_samples, n_obs]) – array with the output standard deviation. Todo: dictionary with a key for each output type, each array [mc_size, n_obs].
observations (array [n_obs, ]) – array with the measured observations. Todo: dictionary with a key for each output type, each array [1, n_obs].
error (array [n_obs]) – array with the observation errors associated with each output. Todo: dictionary with the measurement errors (sigma^2), one entry per output type.
utility_function (str, optional) – BAL design criterion. The default is 'dkl'.
- Returns:
Score.
- Return type:
- analytical_bal(y_mean, y_std, observations, error, utility_function='dkl')[source]
Function computes the analytical BAL criteria (IE or DKL) when the prior and likelihood are both Gaussian distributions. It first estimates the posterior distribution, and then either the DKL or the IE. For ill-posed priors, we check whether the prior and posterior multivariate Gaussian (MG) distributions overlap in any dimension; if they do not, the BAL criteria are not estimated.
The post logBME equation was obtained from Oladyshkin and Nowak (2019) (doi: 10.3390/e21111081), eq.(28)
- Parameters:
y_mean – array [n_samples, n_obs] with the surrogate model outputs (mean)
y_std – array [n_samples, n_obs] with the output standard deviation
observations – array [n_obs, ] with the measured observations
error – array [n_obs] with the observation errors associated with each output
utility_function – string, optional, BAL design criterion. The default is 'dkl'.
- Returns:
analytical BAL criteria for the given input distribution
- Return type:
- run_al_functions(exploit_method, candidates, index, m_error, utility_func)[source]
Run the utility (active-learning) function for the given method.
- Parameters:
exploit_method (str) – Exploitation method. Currently supported: "bal" (Bayesian active learning).
candidates (numpy.ndarray) – Array [mc_size, n_params] of candidate parameter sets to explore, each scored by the utility function.
index (array-like) – Indices of the candidate samples within the prior pool.
m_error (numpy.ndarray) – Array [n_obs] with the measurement error of each observation.
utility_func (str) – Name of the utility function / active-learning criterion used to score each candidate set.
- Returns:
Array [n_candidates] with the score assigned to each candidate, in descending order.
- Return type:
- static gaussian_overlap(mu1, cov1, mu2, cov2)[source]
Function to determine whether 2 multivariate Gaussian distributions overlap in any dimension. If they do, the analytical posterior-based criteria can be estimated. As the overlap criterion, we arbitrarily chose that the 2 distributions overlap if they overlap anywhere within their 99% confidence intervals.
- Parameters:
mu1 (np.array [n_dim, ]) – array with mean values for distribution 1 (prior)
cov1 (np.array [n_dim, n_dim]) – diagonal matrix with the variances for distribution 1 (prior)
mu2 (np.array [n_dim, ]) – array with mean values for distribution 2 (posterior)
cov2 (np.array [n_dim, n_dim]) – diagonal matrix with the variances for distribution 2 (posterior)
- Returns:
True if they overlap, False if they don’t.
- Return type:
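The overlap test can be sketched by comparing per-dimension intervals of mu ± z·sigma. This assumes z = 2.576, the two-sided 99 % quantile of the standard normal. It also reads "overlap in any dimension" as "the intervals intersect in at least one dimension"; the package may implement the check differently. `gaussians_overlap` is hypothetical.

```python
import numpy as np

def gaussians_overlap(mu1, cov1, mu2, cov2, z=2.576):
    """True if the per-dimension 99 % intervals (mu +/- z*sigma) intersect in any dimension.

    cov1 and cov2 are diagonal covariance matrices.
    """
    s1, s2 = np.sqrt(np.diag(cov1)), np.sqrt(np.diag(cov2))
    lo1, hi1 = mu1 - z * s1, mu1 + z * s1
    lo2, hi2 = mu2 - z * s2, mu2 + z * s2
    return bool(np.any((lo1 <= hi2) & (lo2 <= hi1)))   # interval intersection per dimension

close_pair = gaussians_overlap(np.zeros(2), np.eye(2), np.array([1.0, 1.0]), np.eye(2))
far_pair = gaussians_overlap(np.zeros(2), 0.01 * np.eye(2), np.array([10.0, 10.0]), 0.01 * np.eye(2))
```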
- static multivariate_gaussian_kl_divergence(mu_p, cov_p, mu_q, cov_q)[source]
Function estimates the analytical solution for the Kullback-Leibler divergence when going from the prior (q) to the posterior (p), when both prior and posterior are Gaussian distributions.
- Parameters:
mu_p (np.array [n_obs, ]) – array with the mean values for the posterior distribution
cov_p (np.array [n_obs, n_obs]) – diagonal matrix with the variance for the posterior distribution
mu_q (np.array [n_obs, ]) – array with the mean values for the prior distribution
cov_q (np.array [n_obs, n_obs]) – diagonal matrix with the variance for the prior distribution
- Returns:
Kullback-Leibler divergence between prior and posterior
- Return type:
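For two multivariate normals p = N(mu_p, Sigma_p) and q = N(mu_q, Sigma_q) in k dimensions, the closed form is KL(p || q) = 0.5 [tr(Sigma_q^-1 Sigma_p) + (mu_q - mu_p)^T Sigma_q^-1 (mu_q - mu_p) - k + ln det Sigma_q - ln det Sigma_p]. The sketch below evaluates this standard formula with NumPy. It is an illustration, not the method's source. `gaussian_kl` is hypothetical, and it also accepts full (non-diagonal) covariances.

```python
import numpy as np

def gaussian_kl(mu_p, cov_p, mu_q, cov_q):
    """Closed-form KL(p || q) between p = N(mu_p, cov_p) and q = N(mu_q, cov_q)."""
    k = mu_p.shape[0]
    cov_q_inv = np.linalg.inv(cov_q)
    diff = mu_q - mu_p
    _, logdet_p = np.linalg.slogdet(cov_p)    # log-determinants, stable for small variances
    _, logdet_q = np.linalg.slogdet(cov_q)
    return 0.5 * (np.trace(cov_q_inv @ cov_p) + diff @ cov_q_inv @ diff - k + logdet_q - logdet_p)

kl_same = gaussian_kl(np.zeros(2), np.eye(2), np.zeros(2), np.eye(2))
kl_shift = gaussian_kl(np.array([0.0]), np.eye(1), np.array([1.0]), np.eye(1))
```

For identical distributions the divergence is 0. Shifting a unit-variance mean by 1 gives 0.5.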
- static posterior_log_likelihood(samples, mean, cov_mat)[source]
Function estimates the log pdf of a Gaussian distribution manually (faster than using stats)
- Parameters:
samples (np.array [mc_exploration, n_obs]) – array with samples to get the pdf from
mean (np.array [1, n_obs]) – mean array of Gaussian distribution
cov_mat (np.array [n_obs, n_obs]) – covariance of the Gaussian distribution
- Returns:
array with the log-pdf value of each sample
- Return type:
np.array [mc_exploration, ]
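A manual Gaussian log-pdf, as described above, can be computed without scipy.stats by using a Cholesky factor of the covariance. With Sigma = L L^T, the quadratic form equals the squared norm of L^-1 (x - mean), and ln det Sigma = 2 Σ ln diag(L). The sketch below is an illustration of that approach; `gaussian_log_pdf` is hypothetical.

```python
import numpy as np

def gaussian_log_pdf(samples, mean, cov_mat):
    """Log density of N(mean, cov_mat) at each row of samples, without scipy.stats."""
    samples = np.atleast_2d(samples)
    n_obs = samples.shape[1]
    diff = samples - np.reshape(mean, (1, -1))
    chol = np.linalg.cholesky(cov_mat)
    z = np.linalg.solve(chol, diff.T)               # whitened residuals, shape [n_obs, n_samples]
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (n_obs * np.log(2 * np.pi) + log_det + np.sum(z**2, axis=0))

lp_std = gaussian_log_pdf(np.zeros((1, 2)), np.zeros((1, 2)), np.eye(2))
lp_var4 = gaussian_log_pdf(np.array([[2.0]]), np.array([[0.0]]), np.array([[4.0]]))
```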
- select_indexes(prior_samples, collocation_points)[source]
- Parameters:
prior_samples – array [mc_size, n_params] Pre-defined samples from the parameter space, out of which the sample sets should be extracted.
collocation_points – [tp_size, n_params] array with training points which were already used to train the surrogate model, and should therefore not be re-explored.
- Returns: array[self.mc_size,]
With indexes of the new candidate parameter sets, to be read from the prior_samples array
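The selection described above (draw candidates from prior_samples but skip parameter sets already used for training) can be sketched as follows. The sketch assumes "already used" means an exact (floating-point close) row match and that candidates are drawn uniformly without replacement. `select_candidate_indexes` is a hypothetical helper, not the method's code.

```python
import numpy as np

def select_candidate_indexes(prior_samples, collocation_points, mc_size, rng=None):
    """Pick up to mc_size row indexes of prior_samples that are not already collocation points."""
    rng = np.random.default_rng() if rng is None else rng
    # [mc, 1, n_params] vs [1, tp, n_params] -> True where a prior row equals a training point
    used = np.any(np.all(np.isclose(prior_samples[:, None, :], collocation_points[None, :, :]), axis=2), axis=1)
    available = np.flatnonzero(~used)
    return rng.choice(available, size=min(mc_size, available.size), replace=False)

prior_pool = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
train_pts = np.array([[1.0, 1.0]])
idx = select_candidate_indexes(prior_pool, train_pts, 3, np.random.default_rng(1))
```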
- class hydroBayesCal.surrogate.exploration.Exploration(n_candidate, old_tp, exp_design=None, mc_criterion='mc-intersite-proj-th', w=100)[source]
Bases:
object
Generates samples from the prior distribution using the ExpDesign class attributes and functions. Two strategies are available:
Voronoi sampling (to be defined).
mc_samples – Monte Carlo sampling using random, Sobol or Latin-hypercube samples. Previously sampled candidates may also be passed, in which case no new sampling is done and only the scores are estimated.
Each candidate is scored by its distance to the existing training points so that the whole domain is explored. Scores are normalised to [0, 1]; the highest values are the best.
Based on the Surrogate Modeling Toolbox (SUMO) [1] and modelled after code in BayesValidRox [2].
- exp_design
ExpDesign object, needed to sample from the prior distribution
- Type:
obj
- mc_criterion
Selection criterion. The default is ‘mc-intersite-proj-th’. Another option is ‘mc-intersite-proj’.
- Type:
str
- get_exploration_samples(prior_candidates=None)[source]
This function generates candidates to be selected as new design and their associated exploration scores.
- Returns:
all_candidates (array of shape (n_candidate, n_params)) – A list of samples.
exploration_scores (arrays of shape (n_candidate)) – Exploration scores.
- get_vornoi_samples()[source]
This function generates samples based on Voronoi cells and their corresponding scores
- Returns:
new_samples (array of shape (n_candidate, n_params)) – A list of samples.
exploration_scores (arrays of shape (n_candidate)) – Exploration scores.
- get_mc_samples(all_candidates=None)[source]
This function generates random samples based on Global Monte Carlo methods and their corresponding scores, based on [1].
- [1] Crombecq, K., Laermans, E. and Dhaene, T., 2011. Efficient space-filling and non-collapsing sequential design strategies for simulation-based modeling. European Journal of Operational Research, 214(3), pp. 683-696. DOI: https://doi.org/10.1016/j.ejor.2011.05.032
- Implemented methods to compute scores:
mc-intersite-proj
mc-intersite-proj-th
- Parameters:
all_candidates (array, optional) – Samples to compute the scores for. The default is None. In this case, samples will be generated by defined model input marginals.
- Returns:
new_samples (array of shape (n_candidate, n_params)) – A list of samples.
exploration_scores (arrays of shape (n_candidate)) – Exploration scores.
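The "mc-intersite-proj-th" criterion from Crombecq et al. (2011) combines a projected-distance threshold with the intersite (Euclidean) distance. The sketch below is one reading of it, not the package's code. It assumes parameters scaled to [0, 1] and a threshold d_min = 2·alpha/n with alpha = 0.5 (n = number of existing training points). Candidates whose smallest per-coordinate distance to any training point falls below d_min score 0; the others score their minimum Euclidean distance, and all scores are normalised to [0, 1]. `intersite_proj_th_scores` is hypothetical.

```python
import numpy as np

def intersite_proj_th_scores(candidates, old_tp, alpha=0.5):
    """Exploration scores in the spirit of the 'mc-intersite-proj-th' criterion (assumed form)."""
    n = old_tp.shape[0]
    d_min = 2.0 * alpha / n                                   # projected-distance threshold
    diff = candidates[:, None, :] - old_tp[None, :, :]        # [n_cand, n, n_params]
    euclid = np.min(np.linalg.norm(diff, axis=2), axis=1)     # intersite distance
    projected = np.min(np.abs(diff), axis=(1, 2))             # smallest coordinate gap
    scores = np.where(projected >= d_min, euclid, 0.0)
    return scores / scores.max() if scores.max() > 0 else scores

old_points = np.array([[0.0, 0.0], [1.0, 1.0]])               # d_min = 0.5
cands = np.array([[0.5, 0.5], [0.1, 0.9]])
expl_scores = intersite_proj_th_scores(cands, old_points)
```

The second candidate is rejected because its first coordinate lies within 0.1 of an existing point. The projective component is what keeps new samples from collapsing onto existing ones when projected onto a single parameter axis.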
- approximate_voronoi(w, samples)[source]
An approximate (Monte Carlo) version of MATLAB's voronoi command.
- Parameters:
samples (array) – Old experimental design to be used as center points for voronoi cells.
- Returns:
areas (array) – An approximation of the voronoi cells’ areas.
all_candidates (list of arrays) – A list of samples in each voronoi cell.
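A Monte Carlo approximation of Voronoi cell sizes samples the domain uniformly and assigns each point to its nearest center; the fraction of points a center receives estimates its cell volume. The sketch assumes the unit hypercube as domain and interprets `w` as the number of random points per existing sample. Both are assumptions, since `w` is not documented here. `approximate_voronoi_areas` is hypothetical.

```python
import numpy as np

def approximate_voronoi_areas(w, samples, rng=None):
    """Monte Carlo estimate of Voronoi cell volumes in the unit hypercube."""
    rng = np.random.default_rng() if rng is None else rng
    n_points, n_params = samples.shape
    points = rng.uniform(size=(w * n_points, n_params))       # w random points per center (assumed)
    dist = np.linalg.norm(points[:, None, :] - samples[None, :, :], axis=2)
    owner = np.argmin(dist, axis=1)                           # nearest center of each point
    areas = np.bincount(owner, minlength=n_points) / points.shape[0]
    cells = [points[owner == i] for i in range(n_points)]     # random points inside each cell
    return areas, cells

centers = np.array([[0.25, 0.5], [0.75, 0.5]])
areas, cells = approximate_voronoi_areas(2000, centers, np.random.default_rng(0))
```

For the two symmetric centers above, each cell covers half of the unit square, so both estimates should be close to 0.5.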
Shared utilities
Function pool for usage at different package levels
- hydroBayesCal.function_pool.append_new_line(file_name, text_to_append)[source]
Add new line to steering file
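A typical way to append a line to a text file such as a steering file is to insert a line break first only when the file already has content. The sketch shows that pattern; the package's exact behaviour is not documented here. For instance, the sketch does not check whether the file already ends with a newline, so such a file would gain an empty line. `append_line_sketch` is hypothetical.

```python
import os
import tempfile

def append_line_sketch(file_name, text_to_append):
    """Append text as a new line, adding a line break first if the file is not empty."""
    with open(file_name, "a+") as file_object:
        file_object.seek(0)
        has_content = len(file_object.read(1)) > 0   # peek at the start of the file
        if has_content:
            file_object.write("\n")                  # in "a+" mode, writes always go to the end
        file_object.write(text_to_append)

steering = os.path.join(tempfile.mkdtemp(), "steering.cas")
append_line_sketch(steering, "A = 1")
append_line_sketch(steering, "B = 2")
with open(steering) as f:
    content = f.read()
```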
- hydroBayesCal.function_pool.call_process(bash_command, environment=None)[source]
Call a terminal process via subprocess and return its exit status.
The process return code is checked and reported: a non-zero code (e.g. a failed Telemac/OpenFOAM run) is logged together with the captured stderr and returned to the caller, instead of silently reporting success.
- Parameters:
bash_command (str) – terminal command to run
environment – optional environment mapping to run the process in
- Return int:
the process return code (0 on success, non-zero on failure, -1 if the process could not be started)
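The documented behaviour can be sketched with the standard library's `subprocess.run`. The logger setup and the use of `shell=True` are assumptions. `shell=True` should only be used with trusted command strings.

```python
import logging
import subprocess

logger = logging.getLogger(__name__)


def call_process(bash_command, environment=None):
    """Run bash_command; return its exit code, or -1 if it cannot start."""
    try:
        result = subprocess.run(
            bash_command, shell=True, env=environment,
            capture_output=True, text=True,
        )
    except OSError as err:
        logger.error("Could not start process %r: %s", bash_command, err)
        return -1
    if result.returncode != 0:
        # report the failure together with the captured stderr
        logger.error("Command %r failed with code %d:\n%s",
                     bash_command, result.returncode, result.stderr)
    return result.returncode
```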
- hydroBayesCal.function_pool.calculate_settling_velocity(diameters)[source]
Calculate the particle settling velocity as a function of grain diameter, the densities of water and sediment, and the kinematic viscosity of water.
- Parameters:
diameters (np.array) – Sediment grain diameters in meters (floats).
- Return np.array settling_velocity:
settling velocities in m/s for every diameter in the diameters list
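The docstring does not state which settling-velocity formula is used. For illustration only, the sketch below uses the piecewise formula of Van Rijn (1984). The constants are assumptions: water density 1000 kg/m³, sediment density 2650 kg/m³, and kinematic viscosity 1e-6 m²/s. The package may use a different formula or different constants.

```python
import numpy as np


def settling_velocity_van_rijn(diameters, rho_w=1000.0, rho_s=2650.0,
                               nu=1.0e-6, g=9.81):
    """Settling velocity (m/s) after Van Rijn (1984); diameters in meters."""
    d = np.asarray(diameters, dtype=float)
    s = rho_s / rho_w  # relative sediment density
    # Stokes range, d <= 0.1 mm
    stokes = (s - 1) * g * d**2 / (18 * nu)
    # transition range, 0.1 mm < d <= 1 mm
    transition = 10 * nu / d * (np.sqrt(1 + 0.01 * (s - 1) * g * d**3 / nu**2) - 1)
    # coarse range, d > 1 mm
    coarse = 1.1 * np.sqrt((s - 1) * g * d)
    return np.select([d <= 1e-4, d <= 1e-3], [stokes, transition], default=coarse)
```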
- hydroBayesCal.function_pool.concatenate_csv_pts(file_directory, *args)[source]
Concatenate CSV files containing lists of XYZ points into a single CSV file, which is saved in the same directory as the first CSV file provided. The merged file name starts with merged_ and ends with the name of the first CSV file provided.
- Parameters:
file_directory – os.path of the directory where the CSV files are located; it must NOT end with ‘/’ or ‘\’
args – CSV file name(s) (names only, as a string or a list) containing comma-separated XYZ coordinates without a header
- Return pandas.DataFrame:
merged points
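The merging step can be sketched with pandas as shown below. This is an illustration, not the package's code. The column names X, Y, Z are assumptions. `os.path.join` is used so that the sketch itself does not depend on the trailing-separator rule.

```python
import os

import pandas as pd


def concatenate_csv_pts(file_directory, *args):
    """Merge header-less XYZ CSV files into 'merged_<first file name>'."""
    # accept both single file names and lists of file names
    names = []
    for arg in args:
        names.extend([arg] if isinstance(arg, str) else list(arg))
    frames = [
        pd.read_csv(os.path.join(file_directory, name), header=None,
                    names=["X", "Y", "Z"])
        for name in names
    ]
    merged = pd.concat(frames, ignore_index=True)
    merged.to_csv(os.path.join(file_directory, "merged_" + names[0]),
                  header=False, index=False)
    return merged
```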
- hydroBayesCal.function_pool.lookahead(iterable)[source]
Pass through all values of an iterable, each paired with a flag that indicates whether more values follow the current one (True) or whether it is the last value (False).
Source: Ferdinand Beyer (2015) on https://stackoverflow.com/questions/1630320/what-is-the-pythonic-way-to-detect-the-last-element-in-a-for-loop
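The generator pattern from the cited answer can be written as follows. This is a sketch and may differ in detail from the package's version.

```python
def lookahead(iterable):
    """Yield (value, has_more) pairs; has_more is False for the last value."""
    it = iter(iterable)
    try:
        last = next(it)
    except StopIteration:
        return  # empty iterable: nothing to yield
    for value in it:
        yield last, True
        last = value
    yield last, False


# usage: treat the last element of a loop differently
for value, has_more in lookahead(["a", "b", "c"]):
    print(value, "more to come" if has_more else "last")
```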
- hydroBayesCal.function_pool.str2seq(list_like_string, separator=',', return_type='tuple')[source]
Convert a list-like string into a tuple or list, based on a separator such as a comma or semicolon.
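The following is a minimal sketch. It assumes that surrounding brackets or parentheses are stripped and that items are returned as stripped strings. Whether the package converts items to numbers is not documented here.

```python
def str2seq(list_like_string, separator=",", return_type="tuple"):
    """Split a list-like string such as '(a, b, c)' into a tuple or list."""
    inner = list_like_string.strip().strip("()[]")
    # drop empty items, e.g. from a trailing separator
    items = [item.strip() for item in inner.split(separator) if item.strip()]
    return tuple(items) if return_type == "tuple" else items
```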
- hydroBayesCal.function_pool.log_actions(func)[source]
Logging wrapper (decorator).
- Parameters:
func – the function to wrap
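Because the docstring is only a placeholder, here is a generic sketch of a logging decorator of this kind, built on `functools.wraps`. What the package actually logs is an assumption.

```python
import functools
import logging

logger = logging.getLogger(__name__)


def log_actions(func):
    """Log entry, exit, and exceptions of func (illustrative sketch)."""
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        logger.info("Calling %s", func.__name__)
        try:
            result = func(*args, **kwargs)
        except Exception:
            logger.exception("%s raised an exception", func.__name__)
            raise
        logger.info("Finished %s", func.__name__)
        return result
    return wrapper


@log_actions
def add(a, b):
    return a + b
```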
- hydroBayesCal.function_pool.update_collocation_pts_file(file_path, new_collocation_point, mode='update')[source]
Append a new row to a CSV file or create a new file depending on the mode.
- Parameters:
file_path – Path to the CSV file.
new_collocation_point – List of values to be added as a new row.
mode – Mode to determine whether to ‘update’ (append) or ‘generate’ (overwrite) the file.
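The two modes map naturally onto the file modes "a" and "w" of the standard `csv` module. The sketch below illustrates this. Raising `ValueError` for an unknown mode is an assumption.

```python
import csv


def update_collocation_pts_file(file_path, new_collocation_point, mode="update"):
    """Append a row ('update') or start a new file with it ('generate')."""
    if mode not in ("update", "generate"):
        raise ValueError("mode must be 'update' or 'generate'")
    file_mode = "a" if mode == "update" else "w"
    # newline="" prevents blank lines between rows on Windows
    with open(file_path, file_mode, newline="") as f:
        csv.writer(f).writerow(new_collocation_point)
```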
- hydroBayesCal.function_pool.save_data(file_path, data)[source]
Save NumPy array data to a file based on the file extension in the file path.
- Parameters:
file_path – Path to the file where data should be saved.
data – NumPy array data to be saved.
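The extension-based dispatch can be sketched as follows. The supported extensions and text formats are assumptions: .npy through `np.save`, comma-delimited .csv, and whitespace-delimited .txt.

```python
import os

import numpy as np


def save_data(file_path, data):
    """Save a NumPy array in a format chosen from the file extension."""
    ext = os.path.splitext(file_path)[1].lower()
    if ext == ".npy":
        np.save(file_path, data)
    elif ext == ".csv":
        np.savetxt(file_path, data, delimiter=",")
    elif ext == ".txt":
        np.savetxt(file_path, data)
    else:
        raise ValueError(f"Unsupported file extension: {ext!r}")
```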
- hydroBayesCal.function_pool.rearrange_array(data, num_quantities)[source]
Rearrange a NumPy array such that data from multiple quantities is interleaved by columns.
- Parameters:
data – A NumPy array of shape (num_quantities * n, m) where n is the number of data points per quantity.
num_quantities – An integer indicating the number of quantities (e.g., velocity, water depth, etc.).
- Returns:
A NumPy array with interleaved columns for all quantities.
- hydroBayesCal.function_pool.update_json_file(json_path, modeled_values_dict=None, detailed_dict=False, save_dict=False, saving_path=None)[source]
Updates the JSON file at json_path with data from modeled_values_dict.
If the file exists, it appends new values to the existing data. If the file does not exist, it creates a new file with the initial data.
- Parameters:
json_path (str) – The path to the JSON file to be updated or created.
modeled_values_dict (dict) – A dictionary with data to be added or updated in the JSON file.
detailed_dict (bool, optional) – Whether to handle the data as nested lists for detailed structures.
save_dict (bool, optional) – If True, saves the entire updated data (output_data) to saving_path.
saving_path (str, optional) – The path to save the final JSON file when save_dict is True. If not provided, defaults to json_path.
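The append-or-create logic can be sketched in simplified form as shown below. The sketch assumes that modeled_values_dict maps keys to lists of values and that existing lists are extended. The detailed_dict and save_dict options are omitted.

```python
import json
import os


def update_json_file(json_path, modeled_values_dict=None):
    """Append new values to the JSON file at json_path, creating it if needed."""
    modeled_values_dict = modeled_values_dict or {}
    data = {}
    if os.path.exists(json_path):
        with open(json_path) as f:
            data = json.load(f)
    for key, values in modeled_values_dict.items():
        # extend existing lists; start a new list for new keys
        data.setdefault(key, []).extend(values)
    with open(json_path, "w") as f:
        json.dump(data, f, indent=2)
    return data
```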
- hydroBayesCal.function_pool.delete_slf(folder_path)[source]
Deletes all files with the .slf extension in the specified folder.
- Parameters:
folder_path (str) – The path to the folder where the .slf files will be deleted.
- Return type:
None
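The operation reduces to a glob over the folder, as in the sketch below. The sketch assumes a non-recursive search, and the pattern match is case-sensitive on POSIX systems.

```python
from pathlib import Path


def delete_slf(folder_path):
    """Delete all *.slf files directly inside folder_path (not recursive)."""
    for slf_file in Path(folder_path).glob("*.slf"):
        slf_file.unlink()
```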
- hydroBayesCal.function_pool.filter_model_outputs(data_dict, quantities, run_range_filtering=None)[source]
Filters the data from the model outputs dictionary based on desired quantities and optionally limits the runs included to a specific range.
- Parameters:
data_dict (dict) – Dictionary containing model outputs with points as keys and lists of run outputs as values.
quantities (list of str) – List of quantities to extract from the model outputs.
run_range_filtering (tuple of int, optional) – Range of runs to include (start, end). If None, includes all runs. The range is inclusive of the start index and exclusive of the end index.
- Returns:
Filtered dictionary containing only the selected quantities and runs within the specified range.
- Return type:
dict
- hydroBayesCal.function_pool.interpolate_values(coords, values, point)[source]
Interpolates values at a given point using Inverse Distance Weighting.
- Parameters:
coords (np.ndarray) – Coordinates of the triangle’s vertices, shape (3, 2), where each row is [X, Y] for a vertex.
values (np.ndarray) – Values at each vertex for each variable, shape (3, num_variables).
point (tuple) – Coordinates of the point where interpolation is desired, (px, py).
- Returns:
Interpolated values at the given point for each variable, shape (num_variables,).
- Return type:
np.ndarray
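Inverse distance weighting over the three vertices can be sketched as follows. The sketch assumes a power of 2. When the point coincides with a vertex, it returns that vertex's values directly to avoid division by zero. The package's exponent and edge-case handling may differ.

```python
import numpy as np


def interpolate_values(coords, values, point, power=2):
    """IDW interpolation of (3, num_variables) vertex values at point."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    dists = np.linalg.norm(coords - np.asarray(point, dtype=float), axis=1)
    hit = np.isclose(dists, 0.0)
    if hit.any():
        # point lies on a vertex: return that vertex's values exactly
        return values[np.argmax(hit)]
    weights = 1.0 / dists**power
    # weighted average over vertices, one result per variable
    return weights @ values / weights.sum()
```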
- hydroBayesCal.function_pool.rasterize(saving_folder, slf_file_name, desired_variables, spacing)[source]
- hydroBayesCal.function_pool.classify_mu(raster_data, classification, output_folder, output_filename)[source]
Classify the morphological units (MU) based on velocity and depth and save as a raster file.
- Parameters:
raster_data (dict) – Dictionary containing ‘velocity’ and ‘depth’ raster data as NumPy arrays.
classification (dict) – Dictionary of classification criteria for the different MUs.
output_folder (str) – Path of the folder where the output file will be saved.
output_filename (str) – File name for the output raster file (without extension).
- Returns:
None. The classified MU raster is saved as an ASCII file in the output folder.
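The structure of the classification argument is not specified here. The sketch below assumes a hypothetical format that maps each MU name to (min, max) ranges for velocity and depth. It shows the classification step only. Writing the ASCII raster, including its georeferencing header, is omitted. Cells that match no class are set to 0, and classes are numbered from 1 in dictionary order.

```python
import numpy as np


def classify_mu_array(velocity, depth, classification):
    """Return an integer MU code per cell (0 = unclassified)."""
    mu_codes = np.zeros(velocity.shape, dtype=int)
    for code, crit in enumerate(classification.values(), start=1):
        v_min, v_max = crit["velocity"]
        d_min, d_max = crit["depth"]
        mask = ((velocity >= v_min) & (velocity < v_max)
                & (depth >= d_min) & (depth < d_max) & (mu_codes == 0))
        mu_codes[mask] = code  # the first matching class wins
    return mu_codes


criteria = {  # hypothetical thresholds (m/s, m), for illustration only
    "pool": {"velocity": (0.0, 0.3), "depth": (0.6, np.inf)},
    "riffle": {"velocity": (0.6, np.inf), "depth": (0.0, 0.4)},
}
```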