API Reference

This section documents the public classes and functions, generated from the in-code docstrings. See Code Structure for how these modules fit together.

Binding layer

Abstract base class

Core code for coupling any hydrodynamic simulation software with the main script for GPE surrogate model construction and Bayesian Active Learning.

Author: Andres Heredia M.Sc.

class hydroBayesCal.hysim.HydroSimulations(control_file='control.cas', model_dir='', res_dir='', calibration_pts_file_path='', n_cpus=0, init_runs=1, calibration_parameters=None, param_values=None, calibration_quantities=None, extraction_quantities=None, dict_output_name='extraction-data', user_param_values=False, max_runs=1, complete_bal_mode=False, only_bal_mode=False, check_inputs=False, delete_complex_outputs=True, validation=False, multitask_selection='variables', *args, **kwargs)[source]

Bases: ABC

__init__(control_file='control.cas', model_dir='', res_dir='', calibration_pts_file_path='', n_cpus=0, init_runs=1, calibration_parameters=None, param_values=None, calibration_quantities=None, extraction_quantities=None, dict_output_name='extraction-data', user_param_values=False, max_runs=1, complete_bal_mode=False, only_bal_mode=False, check_inputs=False, delete_complex_outputs=True, validation=False, multitask_selection='variables', *args, **kwargs)[source]

Constructor of the HydroSimulations class to manage and run hydrodynamic simulations within the context of Bayesian Calibration using a Gaussian Process Emulator (GPE). The class is designed to handle simulation setup, execution, and result storage while managing calibration parameters and Bayesian Active Learning (BAL) iterations.

Parameters:
  • control_file (str) – Name of the file that controls the full complexity model simulation (default is “control.cas” as an example for Telemac).

  • model_dir (str) – Full complexity model directory where all simulation files (mesh, control file, boundary conditions) are located.

  • res_dir (str) – Directory where a subfolder called “auto-saved-results-HydroBayesCal” will be created to store all the result files. In this directory, the results of the calibration process will be stored according to the calibration quantity name. Additionally, subfolders for plots, surrogate models, and restart data will be created.

  • calibration_pts_file_path (str, optional) – File path to the calibration points data file. Please check the documentation for further details of the file format.

  • n_cpus (int) – Number of CPUs to be used for parallel processing (if available).

  • init_runs (int) – Initial runs of the full complexity model (before Bayesian Active Learning).

  • calibration_parameters (list of str) – Names of the considered calibration parameters (e.g. roughness coefficients, empirical constants, turbulent viscosity, etc).

  • param_values (list) – Value ranges considered for parameter sampling. Example: [[min1, max1], [min2, max2], …].

  • calibration_quantities (list of str) – Names of the calibration targets (model outputs) used for calibration. These quantities usually correspond to the measured values for calibration purposes. Example: [‘WATER DEPTH’] for a single quantity. Example: [‘WATER DEPTH’, ‘SCALAR VELOCITY’] for multiple quantities.

  • extraction_quantities (list of str) – Names of the quantities to be extracted from the model output files; generally the calibration_quantities plus any additional quantities of interest. Example: calibration_quantities = ['WATER DEPTH'] (WATER DEPTH as calibration quantity) with extraction_quantities = ['WATER DEPTH', 'SCALAR VELOCITY', 'TURBULENT ENERG', 'VELOCITY U', 'VELOCITY V']. Any of the additional quantities can be used for calibration when restarting the calibration process with only_bal_mode = True.

  • dict_output_name (str) – Base name for output dictionary files where the outputs are saved as .json files. This dictionary will be saved in the calibration-data subfolder for the considered calibration target.

  • parameter_sampling_method (str) –

    Method used for sampling parameter values during the calibration process. The available options are:

    • “random” : Random sampling.

    • “latin_hypercube” : Latin Hypercube Sampling (LHS).

    • “sobol” : Sobol sequence sampling.

    • “halton” : Halton sequence sampling.

    • “hammersley” : Hammersley sequence sampling.

    • “chebyshev(FT)” : Chebyshev nodes (Fourier Transform-based).

    • “grid(FT)” : Grid-based sampling (Fourier Transform-based).

    • “user” : User-defined sampling.

    Example:

    parameter_sampling_method = "sobol"  # Uses Sobol sequence sampling.
    

    If “user” is selected, a .csv file containing user-defined collocation points must be provided in the restart data folder. The file should follow this format:

    param1    param2    param3    param4    param5
    0.148     0.770     0.014     0.014     0.700
    0.066     0.066     0.066     0.066     0.066
    

  • max_runs (int) – Maximum (total) number of model simulations, including initial runs and Bayesian Active Learning iterations.

  • complete_bal_mode (bool, optional (Default: False)) –

    • If True: Bayesian Active Learning (BAL) is performed after the initial runs, enabling a complete surrogate‐assisted calibration process. This option MUST be selected if you choose to perform only BAL (i.e., when only_bal_mode = True).

    • If False: Only the initial runs of the full complexity model are executed, and the model outputs are stored as .json files.

  • only_bal_mode (bool, optional (Default: False)) –

    • If False: The process will either execute a complete surrogate‐assisted calibration or only the initial runs, depending on the value of complete_bal_mode.

    • If True: Only the surrogate model construction and Bayesian Active Learning of preexisting model outputs at predefined collocation points are performed. This mode can be executed only if either a complete process has already been performed (complete_bal_mode = True and only_bal_mode = False) or only the initial runs have been executed (complete_bal_mode = False and only_bal_mode = False).

  • Shortcut combinations and their corresponding tasks –

    complete_bal_mode | only_bal_mode                   | task
    ------------------+---------------------------------+-----------------------------------------------------
    True              | False                           | Complete surrogate-assisted calibration
    False             | False                           | Only initial runs (no surrogate model)
    True              | True, with init_runs = max_runs | Surrogate construction with predefined runs (no BAL)
    True              | True, with init_runs < max_runs | Surrogate construction + Bayesian Active Learning

  • validation (bool, optional (Default: False)) – If True, creates output files (inputs and outputs) corresponding to validation process.

  • *args (tuple, optional) – Additional positional arguments.

  • **kwargs (dict, optional) – Additional keyword arguments.
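The interplay of complete_bal_mode, only_bal_mode, init_runs and max_runs described in the parameter list above can be sketched as a small lookup. This is only an illustration: resolve_task and its return labels are hypothetical names, not part of hydroBayesCal, and the rejection of only_bal_mode = True without complete_bal_mode = True is an assumption derived from the "MUST be selected" remark.

```python
def resolve_task(complete_bal_mode: bool, only_bal_mode: bool,
                 init_runs: int, max_runs: int) -> str:
    """Return a label for the task implied by the mode flags (hypothetical helper)."""
    if only_bal_mode and not complete_bal_mode:
        # Assumption: the documentation requires complete_bal_mode=True here.
        raise ValueError("only_bal_mode=True requires complete_bal_mode=True")
    if not complete_bal_mode:
        return "initial runs only"
    if not only_bal_mode:
        return "complete surrogate-assisted calibration"
    if init_runs == max_runs:
        # Existing runs are reused to build the surrogate, no BAL iterations.
        return "surrogate construction only"
    return "surrogate construction + BAL"


print(resolve_task(True, False, init_runs=10, max_runs=30))
```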

asr_dir

Directory for auto-saved results.

Type:

str

model_dir

Path to the directory containing the model files.

Type:

str

res_dir

Path to the directory where results will be stored.

Type:

str

control_file

Path to the control file used in the calibration process.

Type:

str

calibration_pts_file_path

Path to the file containing the calibration points data.

Type:

str

nproc

Number of processors (CPUs) to be used.

Type:

int

param_values

Parameter values used in calibration.

Type:

array

calibration_quantities

Calibration quantities to be evaluated.

Type:

list

extraction_quantities

Quantities extracted from the model during calibration (calibration quantities must be included here).

Type:

list

calibration_parameters

Parameters involved in the calibration process.

Type:

list

parameter_sampling_method

Method used for sampling parameters during calibration. Options: “random”, “latin_hypercube”, “sobol”, “halton”, “hammersley”, “chebyshev(FT)”, “grid(FT)”, “user” (requires a CSV file with user-defined collocation points in the restart data folder).

Type:

str
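To make the sampling options more concrete, the following is a minimal, standard-library sketch of Latin Hypercube Sampling over ranges given in the param_values format ([[min1, max1], [min2, max2], …]). It is not the sampler used by the package; latin_hypercube, its arguments and the example ranges are hypothetical.

```python
import random


def latin_hypercube(param_values, n_samples, seed=None):
    """Draw n_samples points with one sample per stratum in every dimension."""
    rng = random.Random(seed)
    columns = []
    for low, high in param_values:
        # One uniform draw inside each of n_samples equal-width strata of [0, 1).
        strata = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(strata)  # decouple the strata order between dimensions
        columns.append([low + u * (high - low) for u in strata])
    # Transpose: one row per collocation point, one column per parameter.
    return [list(row) for row in zip(*columns)]


# Illustrative ranges for two parameters (values are made up).
points = latin_hypercube([[0.01, 0.1], [0.5, 2.0]], n_samples=4, seed=1)
```

Each row of points is one collocation point, so the result has the [init_runs, n_parameters] layout described for collocation_points.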

init_runs

Number of initial runs before surrogate-assisted calibration.

Type:

int

max_runs

Maximum number of calibration runs, including Bayesian Active Learning iterations.

Type:

int

complete_bal_mode

If True, enables complete surrogate-assisted calibration with Bayesian Active Learning. Must be selected if only_bal_mode = True.

Type:

bool

only_bal_mode

If True, only surrogate model construction and Bayesian Active Learning are performed. Requires prior execution of either the full calibration process (complete_bal_mode = True) or initial runs (complete_bal_mode = False).

Type:

bool

delete_complex_outputs

If True, deletes complex model outputs after processing.

Type:

bool

validation

If True, the model is run in validation mode.

Type:

bool

multitask_selection

Selection mode for multitask surrogate modeling. The constructor default is 'variables'.

Type:

str

dict_output_name

Name of the output dictionary file. Appends “-validation” if validation mode is enabled.

Type:

str

nloc

Number of calibration points where measured data are available.

Type:

int

ndim

Number of model parameters used in the calibration.

Type:

int

param_dic

Dictionary containing calibration parameters and their respective ranges.

Type:

dict

num_calibration_quantities

Number of quantities used for calibration.

Type:

int

num_extraction_quantities

Number of additional quantities extracted from the model.

Type:

int

observations

Observed values at each calibration point.

Type:

array

measurement_errors

Measurement errors associated with each calibration point.

Type:

array

calibration_pts_df

Contains calibration point information. Header format:

Point | X | Y | <quantity>_DATA | <quantity>_ERROR | ...

Type:

pandas.DataFrame
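As an illustration of the header format above, the sketch below reads observations and errors for one quantity from an in-memory table. It assumes a comma-delimited file and column names built exactly as <quantity>_DATA and <quantity>_ERROR; the sample values and the helper read_calibration_points are hypothetical, and the package itself loads the file into a pandas.DataFrame rather than plain lists.

```python
import csv
import io

# Hypothetical file content; the numbers are made up for illustration.
sample = """Point,X,Y,WATER DEPTH_DATA,WATER DEPTH_ERROR
P1,10.0,5.0,1.20,0.02
P2,12.5,6.0,0.95,0.03
"""


def read_calibration_points(text, quantity):
    """Return (names, observations, errors) for one calibration quantity."""
    rows = list(csv.DictReader(io.StringIO(text)))
    names = [r["Point"] for r in rows]
    obs = [float(r[f"{quantity}_DATA"]) for r in rows]
    err = [float(r[f"{quantity}_ERROR"]) for r in rows]
    return names, obs, err


names, observations, errors = read_calibration_points(sample, "WATER DEPTH")
nloc = len(names)  # number of calibration points with measured data
```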

user_collocation_points

User-defined collocation points loaded from a CSV file (only applicable when parameter_sampling_method="user").

Type:

array

calibration_folder

Directory where calibration data are stored.

Type:

str

restart_data_folder

Directory for restart data, used for resuming calibration runs.

Type:

str

model_evaluations

2D array of processed model outputs, shape [num_runs, nloc * num_calibration_quantities]: num_runs is the total number of evaluations (initial runs plus BAL iterations) and the columns interleave the calibration quantities per location. For example, with two quantities and two locations, columns 1-2 hold the two quantities at the first location and columns 3-4 the second.

Type:

numpy.ndarray

extract_data_point(input_file, calibration_pts_df, output_name, extraction_quantity, simulation_number, model_directory, results_folder_directory, *args, **kwargs)[source]

Extract data from a specified coordinate in a hydrodynamic model output file.

This generic method is designed for use with various hydrodynamic models (e.g., Telemac, OpenFOAM, etc.). It extracts data from an input file based on a provided CSV file containing the coordinates of the target points.

Parameters:
  • input_file (str) – Path to the hydrodynamic model output file from which data will be extracted.

  • calibration_pts_df (pd.DataFrame) –

    Contains the coordinates of the points where data extraction is required. It must include:

    • Point descriptions (e.g., “P1”).

    • X and Y coordinates of the measurement points.

    • Measured values and errors for the calibration quantities.

    Expected columns:

    • For a single calibration quantity: [‘Point Name’, ‘X’, ‘Y’, ‘Measured Value’, ‘Measured Error’]

    • For two calibration quantities: [‘Point Name’, ‘X’, ‘Y’, ‘Measured Value 1’, ‘Measured Error 1’, ‘Measured Value 2’, ‘Measured Error 2’]

  • output_name (str) – Base name for the output file where extracted data will be stored.

  • extraction_quantity (list of str) – List of variables or quantities to be extracted. Example: extraction_quantity=[“WATER DEPTH”, “SCALAR VELOCITY”, “TURBULENT ENERG”]

  • simulation_number (int) – The current simulation number, used to manage and organize the extracted data per run.

  • model_directory (str) – Path to the directory containing the model output files.

  • results_folder_directory (str) – Path to the directory where the extracted data will be saved.

  • *args – Additional positional arguments defining specific extraction criteria, such as data indices or custom processing parameters.

  • **kwargs – Additional keyword arguments for flexible data extraction criteria, such as:

    • time: Specific time step for extraction.

    • location: Specific coordinate or region of interest.

    • variable_name: Name of the variable to extract.

    • Any other model-specific parameters required for data extraction.

Returns:

The extracted data is saved to output files in the specified results directory.

Return type:

None

abstractmethod run_multiple_simulations(collocation_points=None, bal_new_set_parameters=None, bal_iteration=0, complete_bal_mode=True, validation=False, *args, **kwargs)[source]

Run the full-complexity model for a set of collocation points (BAL).

Executes multiple hydrodynamic simulations in the context of Bayesian Active Learning (BAL). A new set of calibration parameters may be added as an array during BAL iterations.

  • If complete_bal_mode=True, the process includes initial runs, surrogate-model construction and BAL iterations.

  • If complete_bal_mode=False, only the initial model runs are performed.

  • If validation=True, a separate set of runs is executed for validation (e.g. assessing surrogate-model performance).

The number of processors is defined by self.nproc at initialisation.

Parameters:
  • collocation_points (numpy.ndarray, optional) – Array of shape [init_runs, n_parameters] with the initial collocation points for the iterative runs. None during the BAL phase.

  • bal_new_set_parameters (numpy.ndarray, optional) – Array of shape [1, n_parameters] with the new parameter set for a BAL iteration. None during the initial runs.

  • bal_iteration (int, optional) – BAL iteration number (default 0).

  • complete_bal_mode (bool, optional) – True (default) to run the full process (initial runs, surrogate construction and BAL); False for initial runs only.

  • validation (bool, optional) – True to run a separate set of validation simulations.

  • *args – Binding-specific positional arguments. See the concrete subclass.

  • **kwargs – Binding-specific options (e.g. Telemac’s output_extraction, output_extraction_time and n). See the concrete subclass.

Returns:

2D array of processed model outputs, shape [num_runs, nloc * num_calibration_quantities], where num_runs is the total number of evaluations (initial runs plus BAL iterations) and the columns interleave the calibration quantities per location. For example, with two quantities and two locations, columns 1-2 hold the two quantities at the first location and columns 3-4 the second.

Return type:

numpy.ndarray
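The interleaved column layout of the returned array can be sketched with plain lists. The helper interleave_quantities and the numbers are hypothetical; the sketch only demonstrates the column ordering described above (all quantities of the first location, then all quantities of the second, and so on).

```python
def interleave_quantities(per_quantity):
    """per_quantity[q][run][loc] -> rows[run], columns ordered location by location."""
    n_runs = len(per_quantity[0])
    n_loc = len(per_quantity[0][0])
    rows = []
    for run in range(n_runs):
        row = []
        for loc in range(n_loc):
            # Append every quantity of this location before moving on.
            for q_values in per_quantity:
                row.append(q_values[run][loc])
        rows.append(row)
    return rows


depth = [[1.0, 2.0]]      # one run, two locations (made-up values)
velocity = [[0.3, 0.4]]
model_evaluations = interleave_quantities([depth, velocity])
# model_evaluations == [[1.0, 0.3, 2.0, 0.4]]
```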

run_single_simulation(control_file='control_file.hydro')[source]

Executes a single model run using a specified script or launcher file.

This method is intended to handle the execution of a single simulation for various models (e.g., Telemac, OpenFOAM, Basement) by calling the appropriate launcher script.

Parameters:

control_file (str) – The name of the control file used to launch the simulation. Defaults to “control_file.hydro” as an example. This file should be present in the appropriate directory and executable through a terminal.

Returns:

The method executes the model run using a launcher command.

Return type:

None

set_observations_and_variances(calibration_pts_file_path, calibration_quantities, extraction_quantities, gpe_error=0.1, measurement_error=0.1)[source]

Reads calibration point data and constructs observation variances.

Total variance is computed as:

variance = measurement_error**2 + gpe_error**2 + site_specific_error**2

where:

  • measurement_error is assigned as a percentage of the measured value.

  • gpe_error is assigned as a percentage of the measured value.

  • site_specific_error is read from <quantity>_ERROR columns and should already be in the physical units of the corresponding calibration quantity.
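A worked instance of the variance formula, as a minimal sketch. It assumes that the "percentage" arguments are given as fractions (so the default 0.1 means 10 % of the measured value); total_variance and the numbers are hypothetical.

```python
def total_variance(measured, site_specific_error,
                   gpe_error=0.1, measurement_error=0.1):
    """Combine the three error sources into one variance (illustrative only)."""
    # Assumption: relative errors are fractions of the measured value.
    meas_abs = measurement_error * measured
    gpe_abs = gpe_error * measured
    # site_specific_error is already in the physical units of the quantity.
    return meas_abs ** 2 + gpe_abs ** 2 + site_specific_error ** 2


# Measured water depth of 2.0 m with a site-specific error of 0.05 m:
variance = total_variance(measured=2.0, site_specific_error=0.05)
# 0.2**2 + 0.2**2 + 0.05**2 = 0.0825
```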

static read_data(results_folder, file_name)[source]

Reads and extracts data from various file types based on the provided file name.

The function supports file types such as .csv, .json, .txt, .pkl, and .pickle.

Parameters:
  • results_folder (str) – The base directory where the results files are stored.

  • file_name (str) – The name of the file, including its extension (e.g., ‘data.csv’, ‘output.json’).

Returns:

data – The extracted data, which can be a DataFrame, dictionary, list, or other object depending on the file type. Returns None if the file type is unsupported or an error occurs while reading the file.

Return type:

object
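The extension-based dispatch described for read_data could look roughly like the sketch below. It stays within the standard library, so a .csv file is returned as a list of rows rather than a DataFrame; read_data_sketch is a hypothetical name and the error handling is an assumption, not the package's code.

```python
import csv
import json
import os
import pickle


def read_data_sketch(results_folder, file_name):
    """Load a file according to its extension; return None if that fails."""
    path = os.path.join(results_folder, file_name)
    ext = os.path.splitext(file_name)[1].lower()
    try:
        if ext == ".csv":
            with open(path, newline="") as f:
                return list(csv.reader(f))
        if ext == ".json":
            with open(path) as f:
                return json.load(f)
        if ext == ".txt":
            with open(path) as f:
                return f.read().splitlines()
        if ext in (".pkl", ".pickle"):
            with open(path, "rb") as f:
                return pickle.load(f)
    except (OSError, ValueError, pickle.UnpicklingError):
        # Missing file, malformed JSON, or corrupt pickle.
        return None
    return None  # unsupported file type
```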

set_calibration_parameters(params, values)[source]

Create a dictionary from calibration parameters and their value ranges if both params and values exist. If only one of them exists, compute the number of dimensions.

Parameters:
  • params – List of parameter names.

  • values – List of value ranges corresponding to the parameter names.

Returns:

Dictionary with parameter names as keys and value ranges as values, and the number of dimensions.

Raises:

ValueError – If the number of parameters does not match the number of values when both are provided.
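A minimal sketch of the behaviour described for set_calibration_parameters. The function name build_param_dict and the choice to return None in place of the dictionary when only one argument is given are assumptions; the source only states that the number of dimensions is computed in that case.

```python
def build_param_dict(params=None, values=None):
    """Return (param_dic, ndim) from parameter names and value ranges."""
    if params and values:
        if len(params) != len(values):
            raise ValueError(
                f"{len(params)} parameter names but {len(values)} value ranges"
            )
        return dict(zip(params, values)), len(params)
    # Only one of the two is available: no dictionary, but ndim is still known.
    given = params or values or []
    return None, len(given)


# Hypothetical parameter names and ranges.
param_dic, ndim = build_param_dict(["zone1", "zone2"], [[0.01, 0.1], [0.02, 0.2]])
```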

update_model_controls(collocation_point_values, calibration_parameters, auxiliary_file_path, simulation_id=0)[source]

Updates the model control files for Bayesian calibration. Incorporates new parameter values, ensuring that the model runs with the specified settings during Bayesian calibration.

Parameters:
  • collocation_point_values (array) – Contains values for the calibration parameters. These values are used to update the model control files.

  • calibration_parameters (list of str) – Calibration parameter names that are to be updated in the model control files. Each string in the list should correspond to a parameter used in the model.

  • auxiliary_file_path (str) – Path to an auxiliary file that may be required for running the model controls (e.g., the .tbl file in Telemac).

  • simulation_id (int) – An optional identifier for the simulation. The default is 0. This ID can be used to distinguish different simulations or runs.

Returns:

This method does not return any value. It modifies the model control files.

Return type:

None

abstractmethod output_processing(output_data='', delete_complex_outputs=False, validation=False, *args, **kwargs)[source]

Extracts data from a file (.txt, .json, etc.) containing model outputs into a 2D array ready for use in Bayesian calibration and saves the results to a CSV file.

Parameters:
  • output_data_path (str) – Path to the file (.json) containing the model outputs. The file should be structured such that its keys correspond to calibration points, and its values are lists of nested dictionaries having the output values for each run and quantity/ies.

  • delete_complex_outputs (Boolean, Default: False) – Delete complex model output files from the results folder (e.g. auto-saved-results-HydroBayesCal/<variable>). Recommended when running several simulations of the full complexity model.

  • validation (Boolean, Default: False) – If True, new files for collocation points and model results are created. This is done to keep the collocation points and model results obtained during the calibration process.

Returns:

model_results – A 2D array containing the processed model outputs. The shape of the array is [No. of total runs, No. of calibration points x No. of quantities], where ‘No. of quantities’ is the number of calibration quantities being processed, and ‘No. of total runs’ is the sum of initial runs and Bayesian active learning iterations. The array is also saved to a CSV file in the specified directory.

Return type:

numpy.ndarray

TELEMAC binding

Functional core for controlling Telemac simulations for coupling with the Surrogate-Assisted Bayesian inversion technique.

Authors: Andres Heredia, Sebastian Schwindt

class hydroBayesCal.telemac.control_telemac.TelemacModel(friction_file='', tm_xd='Telemac2d', gaia_steering_file=None, fortran_file=None, results_filename_base='', gaia_results_filename_base=None, stdout=6, python_shebang='#!/usr/bin/env python3', *args, **kwargs)[source]

Bases: HydroSimulations

__init__(friction_file='', tm_xd='Telemac2d', gaia_steering_file=None, fortran_file=None, results_filename_base='', gaia_results_filename_base=None, stdout=6, python_shebang='#!/usr/bin/env python3', *args, **kwargs)[source]

Constructor for the TelemacModel class. The class contains all necessary methods for Telemac simulations, extraction of simulation outputs, and iterative updating of the control files.

Parameters:
  • friction_file (str, optional) – Name of the friction file to be used in Telemac simulations (should end with “.tbl”); do not include the directory path.

  • tm_xd (str) – Specifies the dimension of the Telemac hydrodynamic solver, either ‘Telemac2d’ or ‘Telemac3d’.

  • gaia_steering_file (str, optional) – Name of the Gaia steering file; should be provided if required. Not implemented in this HydroBayesCal version.

  • results_filename_base (str, optional) – Base name for the results file, which will be iteratively updated in the .cas file.

  • python_shebang (str, optional) – Shebang line for Python scripts (default is “#!/usr/bin/env python3”).

  • *args (tuple) – Additional positional arguments.

  • **kwargs (dict) – Additional keyword arguments.

friction_file

Name of the Telemac friction file .tbl.

Type:

str

tm_xd

Dimension of the Telemac simulation (‘Telemac2d’ or ‘Telemac3d’).

Type:

str

gaia_steering_file

Gaia steering file name if provided; otherwise, None. Not implemented in this HydroBayesCal version.

Type:

str or None

results_filename_base

Base name for the Telemac results file.

Type:

str

python_shebang

Shebang line for Python scripts.

Type:

str

tm_cas

Full path to the Telemac steering file (.cas).

Type:

str

fr_tbl

Full path to the friction file (.tbl).

Type:

str

comm

MPI communicator for parallel processing.

Type:

MPI.Comm

shebang

Shebang line for Python scripts.

Type:

str

tm_xd_dict

Dictionary mapping ‘Telemac2d’ and ‘Telemac3d’ to their respective script names.

Type:

dict

bal_iteration

Bayesian Active Learning iteration number based on max_runs.

Type:

int

num_run

Simulation number; iteratively updated based on collocation points.

Type:

int

tm_results_filename

File path for storing output data from Telemac simulations.

Type:

str

Note

The attributes specific to Telemac are listed above. For attributes inherited from the HydroSimulations class, please refer to its documentation.

update_model_controls(collocation_point_values, calibration_parameters, auxiliary_file_path=None, gaia_file_path=None, simulation_id=0)[source]

Modifies the .cas steering file for each Telemac run according to the values of the collocation points and the calibration parameters. The method is called every time the .cas or .tbl files need to be modified. The prefix of each parameter name determines which file is updated:

  • Prefix “zone” followed by the number of the friction zone: the value is written to the .tbl file. This requires a “FRICTION DATA FILE” to be provided for the Telemac simulations and makes it possible to treat any friction zone as a calibration parameter.

  • Prefix “gaia”: the parameter is looked up in the Gaia .cas file and updated with the new value.

  • Prefix “f.”: the parameter is looked up in the Fortran file and updated with the new value.

  • Any other parameter is updated in the Telemac .cas file.

Parameters:
  • collocation_point_values (list) – Values for each of the calibration parameters.

  • calibration_parameters (list) – Names of the calibration parameters.

  • auxiliary_file_path (str, optional) – Path to the friction file (.tbl).

  • gaia_file_path (str, optional) – Path to the GAIA steering file (.cas). If provided, GAIA calibration parameters will also be updated.

  • simulation_id (int, optional) – Identifier of the current simulation. Used when generating or updating control files for multiple simulations. Default is 0.

Returns:

Modified control files (telemac.cas, gaia.cas, fortran file, and/or friction .tbl) for Telemac simulations.

Return type:

None
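The prefix-based routing described for update_model_controls can be sketched as a small dispatcher. target_file, its return labels, the order of the checks and the example parameter names are hypothetical; only the prefixes “zone”, “gaia” and “f.” come from the description above.

```python
def target_file(param_name: str) -> str:
    """Name the control file a calibration parameter would be written to."""
    if param_name.startswith("zone"):
        return "friction table (.tbl)"
    if param_name.startswith("gaia"):
        return "gaia steering file (.cas)"
    if param_name.startswith("f."):
        return "fortran file"
    # Everything else is treated as a Telemac steering-file keyword.
    return "telemac steering file (.cas)"


for name in ["zone3", "gaia_example", "f.example", "FRICTION COEFFICIENT"]:
    print(f"{name!r} -> {target_file(name)}")
```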

static create_cas_string(param_name, value)[source]

Create string names with new values to be used in Telemac2d / Gaia steering files.

Parameters:
  • param_name (string) – Name of parameter to update.

  • value (int, float or string) – Value to be assigned to param_name.

Returns:

Updated parameter line for a steering file.

Return type:

None

rewrite_steering_file(param_name, updated_string, steering_module='telemac')[source]

Rewrite the .cas steering file with updated parameters.

Parameters:
  • param_name (str) – Name of the calibration parameter.

  • updated_string (str) – Updated string written into the .cas file with the new value.

  • steering_module (str, optional) – Steering module to rewrite: "telemac" (default) or "gaia".

Returns:

0 on success, -1 on error.

Return type:

int
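How create_cas_string and rewrite_steering_file fit together can be illustrated on an in-memory steering text. This is a sketch under assumptions: make_cas_line and rewrite_cas_text are hypothetical helpers that operate on strings instead of files, the "KEYWORD = value" line format is assumed, and the 0 / -1 status merely mirrors the documented return values.

```python
def make_cas_line(param_name, value):
    """Build a 'KEYWORD = value' line for a steering file (assumed format)."""
    return f"{param_name} = {value}"


def rewrite_cas_text(cas_text, param_name, updated_string):
    """Replace the line of param_name; return (new_text, 0) or (old_text, -1)."""
    lines = cas_text.splitlines()
    status = -1
    for i, line in enumerate(lines):
        # Keyword is the text before the first '=' or ':' separator.
        keyword = line.split("=")[0].split(":")[0].strip()
        if keyword == param_name:
            lines[i] = updated_string
            status = 0
    return "\n".join(lines), status


cas = "LAW OF BOTTOM FRICTION = 3\nFRICTION COEFFICIENT : 0.03"
new_cas, status = rewrite_cas_text(
    cas, "FRICTION COEFFICIENT", make_cas_line("FRICTION COEFFICIENT", 0.05)
)
```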

run_single_simulation(control_file='tel.cas')[source]

Runs a Telemac2D or Telemac3D simulation with one or more processors. The number of processors to use is defined by self.nproc.

Parameters:

control_file (str) – The name of the control file used to launch the simulation. Default is “tel.cas”. This file should be located in the model directory.

Returns:

The method executes the model run using a launcher command.

Return type:

None

run_multiple_simulations(collocation_points=None, bal_new_set_parameters=None, bal_iteration=0, complete_bal_mode=True, output_extraction='interpolated', output_extraction_time='last', n=40, validation=False, kill_process=True)[source]

Runs multiple Telemac2d or Telemac3d simulations with a set of collocation points and a new set of calibration parameters when BAL mode is chosen. The number of processors to use is defined by self.nproc in user_inputs.

Parameters:
  • collocation_points (array) – Numpy array of shape [No. init_runs x No. calibration parameters] which contains the initial collocation points (parameter combinations) for iterative Telemac runs. Default is None, and it is filled with values for the initial surrogate model phase. It remains None during the BAL phase.

  • bal_new_set_parameters (array) – 2D array of shape [1 x No. parameters] containing the new set of values after each BAL iteration.

  • bal_iteration (int) – The number of the BAL iteration. Default is 0.

  • complete_bal_mode (bool) – Default is True when the code accounts for initial runs, surrogate construction and BAL phase. False when only initial runs are required.

  • validation (bool) – If True, the method runs a separate set of simulations for validation purposes and saves the collocation points used for validation in a separate CSV file.

  • output_extraction (str) – The mode for extracting model outputs. Options are “nearest”, “index” or “interpolated”.

  • output_extraction_time (str) – The time mode for extracting model outputs. Options are “last”, “index”, or “mean_last”.

  • n (int) – The number of last time steps to consider when output_extraction_time is set to “mean_last”. Default is 40.

  • kill_process (bool) – If True, the method attempts to kill any remaining Telemac processes after running the simulations. This is useful for preventing BAL from running after the initial runs.

Returns:

model_evaluations – 2D array containing processed model outputs. Shape: [num_runs, nloc * num_calibration_quantities], where:

  • num_runs is the total number of model evaluations, including both initial runs and Bayesian Active Learning iterations.

  • nloc * num_calibration_quantities represents the total number of outputs, with results interleaved in columns.

Example: For two calibration quantities and two calibration locations:

  • Columns 1 and 2 correspond to the outputs (2 quantities) of the first calibration location.

  • Columns 3 and 4 correspond to the outputs of the second location, and so on.

Return type:

array
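The three output_extraction_time modes can be sketched on a plain list holding one variable's time series at one point. aggregate_in_time is a hypothetical helper; its time_index argument corresponds to the parameter of the same name documented for extract_data_point.

```python
def aggregate_in_time(series, mode="last", time_index=0, n=40):
    """Reduce a time series to a single value according to the time mode."""
    if mode == "last":
        return series[-1]
    if mode == "index":
        return series[time_index]
    if mode == "mean_last":
        tail = series[-n:]  # fewer than n steps: use the whole series
        return sum(tail) / len(tail)
    raise ValueError(f"unknown output_extraction_time: {mode!r}")


depths = [1.0, 2.0, 3.0, 4.0]  # made-up values
print(aggregate_in_time(depths, "last"))            # 4.0
print(aggregate_in_time(depths, "index", 1))        # 2.0
print(aggregate_in_time(depths, "mean_last", n=2))  # 3.5
```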

output_processing(output_data_path='', calibration_quantities='', delete_slf_files=False, validation=False, save_extraction_outputs=False, filter_outputs=False, run_range_filtering=None, extraction_mode=False, calibration_mode=False)[source]

Processes model output data from a JSON file into a 2D array format for Bayesian calibration and saves the results to a CSV file.

This method reads a JSON file specified by output_data_path, extracts and processes the model outputs, and saves them in a CSV file format suitable for Bayesian calibration.

Parameters:
  • output_data_path (str) – Path to the file (.json) containing the model outputs. The file should be structured such that its keys correspond to calibration points, and its values are lists of nested dictionaries having the output values for each run and quantity/ies.

  • delete_complex_outputs (Boolean, Default: False) – Delete complex model output files from the results folder (e.g. auto-saved-results-HydroBayesCal/<variable>). Recommended when running several simulations of the full complexity model.

  • validation (Boolean, Default: False) – If True, new files for collocation points and model results are created. This is done to keep the collocation points and model results obtained during the calibration process.

Returns:

model_results – A 2D array containing the processed model outputs. The shape of the array is [number of total runs, number of calibration points x number of quantities], where ‘number of quantities’ represents the calibration quantities processed, and ‘number of total runs’ is the sum of initial runs and Bayesian active learning iterations. The columns interleave the quantity outputs per calibration point. This array is also saved to a CSV file in the specified directory.

Return type:

numpy.ndarray

extract_data_point(input_file, calibration_pts_df, output_name, extraction_quantity, simulation_number, model_directory, results_folder_directory, validation=False, user_param_values=False, output_extraction='interpolated', k=3, output_extraction_time='last', time_index=0, n=5, compute_wall_law_diagnostics=False)[source]

Extract model results at specified calibration or validation points from TELEMAC and/or GAIA SELAFIN result files.

The method supports extraction of scalar variables, vertical layer selection based on measurement height, inverse-distance interpolation, and optional wall-law diagnostics. Extracted values are written to JSON files and result files are moved to the designated results directory.

Parameters:
  • input_file (str) – Name of the TELEMAC result file (.slf) to extract data from.

  • calibration_pts_df (pandas.DataFrame) –

    DataFrame containing extraction locations. The first column must contain point identifiers. The following columns are expected to be:

    • column 1: x-coordinate

    • column 2: y-coordinate

    • column 3: vertical measurement offset (z)

  • output_name (str) – Base name used for generated JSON output files.

  • extraction_quantity (list of str) – Quantities to extract from the model results. Variables may originate from TELEMAC or GAIA according to the configuration mapping classification_tm_gaia_dict.

  • simulation_number (int) – Current simulation number within the calibration workflow.

  • model_directory (str) – Directory containing TELEMAC and GAIA result files.

  • results_folder_directory (str) – Directory where extracted results and moved result files are stored.

  • validation (bool, optional) – If True, extracted values are treated as validation results and are written to validation-specific JSON files. Default is False.

  • user_param_values (bool, optional) – Flag controlling restart-data generation. Default is False.

  • output_extraction ({"nearest", "interpolated"}, optional) –

    Spatial extraction method.

    • "nearest": use the closest model node.

    • "interpolated": perform inverse-distance-weighted interpolation using the k nearest nodes.

    Default is "interpolated".

  • k (int, optional) – Number of nearest nodes used for interpolation when output_extraction="interpolated". Ignored when using nearest-node extraction. Default is 3.

  • output_extraction_time ({"last", "index", "mean_last"}, optional) –

    Temporal aggregation mode applied to the extracted time series.

    • "last": use the final time step.

    • "index": use the time step specified by time_index.

    • "mean_last": average the last n time steps.

    Default is "last".

  • time_index (int, optional) – Time-step index used when output_extraction_time="index". Default is 0.

  • n (int, optional) – Number of final time steps used when output_extraction_time="mean_last". Default is 5.

  • compute_wall_law_diagnostics (bool, optional) – If True, compute wall-law diagnostic quantities from TELEMAC 3D results and the generated 2D result file. Diagnostics include friction velocity, y-plus values, bottom friction parameters, near-bed velocity information, and the complete modeled vertical velocity profile. Default is False.

Returns:

Results are written to JSON files and model result files are moved to the results directory.

Return type:

None

Notes

  • If "3D VELOCITY MAGNITUDE" is requested, it is computed from VELOCITY U, VELOCITY V, and VELOCITY W.

  • For 3D simulations, the vertical layer closest to the measurement elevation is automatically selected using ELEVATION Z.

  • Wall-law diagnostics require at least two vertical planes (NPLAN >= 2).

static tbl_creator(zone_identifier, val, friction_file_path, veg_param_number=None, veg_indicator=False)[source]

Modifies the FRICTION DATA FILE (.tbl) for Telemac simulations based on the specified zone, value, and optional vegetation parameters. This method updates the friction values in the table for different zones as part of the calibration process, and also the friction parameters of a previously selected vegetation friction rule.

Parameters:
  • zone_identifier (str) – Identifier for the friction zone to be updated in the friction table.

  • val (str) – The new friction value to be set for the specified zone.

  • friction_file_path (str) – The file path to the existing friction file (.tbl) that will be modified.

  • veg_param_number (str, optional) – The vegetation parameter number associated with the zone, if applicable. Default is None, indicating no vegetation parameter is to be updated.

  • veg_indicator (bool, optional) – Indicator whether vegetation parameters should be modified in the friction file. Default is False, which means only friction values are updated.

Returns:

The function updates the friction file in place and does not return any value.

Return type:

None

static check_tm_inputs(user_inputs)[source]

OpenFOAM binding

class hydroBayesCal.openfoam.control_openfoam.OpenFOAMController(case_dir)[source]

Bases: object

Parameters:

case_dir (str)

decompose_parallel_case(nprocs)[source]
Parameters:

nprocs (int)

Return type:

None

reconstruct_parallel_case()[source]
Return type:

None

update_boundary_condition(file, patch, field_type, bc_type, value)[source]
Parameters:
Return type:

None

update_dictionary_entry(file, subdict, key, value)[source]
Parameters:
Return type:

None

update_model_controls(params)[source]
Parameters:

params (Dict[str, Dict[str, Any]])

Return type:

None

run_simulation(nprocs=8, solver='interFoam')[source]
Parameters:
Return type:

None

convert_to_vtk()[source]
Return type:

None

extract_fields_from_vtk(alpha_threshold=0.5, n_avg_timesteps=1)[source]

Extract velocity (U) and turbulent kinetic energy (k) fields from VTK output, averaged over the last n_avg_timesteps timesteps, filtered to water phase only.

k is read directly from the OpenFOAM k field (k-epsilon RANS turbulent kinetic energy).

If n_avg_timesteps=1 (default), only the last timestep is used (original behaviour). If n_avg_timesteps=N, the last N VTK files are averaged, giving a time-averaged result over N * writeInterval seconds.

Parameters:
  • alpha_threshold (float) – Only include points where alpha.water > threshold (default 0.5)

  • n_avg_timesteps (int)

Returns:

Tuple of (coordinates, U_mean, k_mean) for water-phase points only. k is None if not found in the VTK files.

Return type:

Tuple[Any, Any, Any]
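The averaging and water-phase filtering can be sketched on plain arrays. This is an illustration only: it assumes the fields of all timesteps are already loaded into NumPy arrays and that the water mask is built from the time-averaged alpha.water field, which may differ from the actual implementation. The function name average_water_phase is hypothetical.

```python
import numpy as np


def average_water_phase(u_steps, alpha_steps, alpha_threshold=0.5, n_avg_timesteps=1):
    """Average the last n_avg_timesteps snapshots and keep water-phase points.

    u_steps: [n_steps, n_points, 3]; alpha_steps: [n_steps, n_points].
    """
    u_last = np.asarray(u_steps, dtype=float)[-n_avg_timesteps:]
    alpha_last = np.asarray(alpha_steps, dtype=float)[-n_avg_timesteps:]
    u_mean = u_last.mean(axis=0)
    alpha_mean = alpha_last.mean(axis=0)
    # Assumption: the mask uses the time-averaged alpha.water value.
    water = alpha_mean > alpha_threshold
    return u_mean[water], water
```

With n_avg_timesteps=1 the slice keeps only the final snapshot, which matches the default behaviour described above.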

class hydroBayesCal.openfoam.control_openfoam.OpenFOAMModel(case_template_dir, solver_name='interFoam', n_processors=8, results_filename_base='results_interfoam', alpha_water_name='alpha.water', water_surface_alpha=0.5, reference_z=0.0, control_file='system/controlDict', model_dir='', res_dir='', calibration_pts_file_path='', n_cpus=8, init_runs=5, calibration_parameters=None, param_values=None, extraction_quantities=None, calibration_quantities=None, dict_output_name='extraction-data', user_param_values=False, max_runs=50, complete_bal_mode=True, only_bal_mode=False, delete_complex_outputs=False, validation=False, multitask_selection='variables', n_avg_timesteps=1, *args, **kwargs)[source]

Bases: HydroSimulations

BAL-compatible wrapper around OpenFOAMController.

Provides the interface expected by bal_openfoam.py while using your existing OpenFOAMController for the actual OpenFOAM operations.

run_multiple_simulations(collocation_points, complete_bal_mode=True, validation=False, bal_iteration=None, bal_new_set_parameters=None)[source]

Run multiple simulations - BAL interface.

save_calibration_data(it, collocation_points, bayesian_dict)[source]

Write per-iteration CSV files to calibration-data/<quantities>/.

Called once per BAL iteration from bal_openfoam.py after estimate_bme(). Produces three files per iteration:

collocation_points_N{n_tp}.csv   parameter values tested so far
model_results_N{n_tp}.csv        simulation outputs (model_evaluations)
bayesian_scores.csv              BME, RE, IE, ELPD for all iterations

bayesian_scores.csv is appended on each call (one row per iteration). The posterior is saved as a separate .npy file because it is a variable-length array (rejection sampling keeps only accepted samples).
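The append-per-iteration behaviour of bayesian_scores.csv can be sketched with the standard library. The column names and the helper names append_scores and read_scores are assumptions made for this illustration; the actual file layout may differ.

```python
import csv
import os
import tempfile


def append_scores(csv_path, iteration, scores):
    """Append one row of scores, writing the header only for a new file."""
    fieldnames = ["iteration", "BME", "RE", "IE", "ELPD"]  # assumed column names
    new_file = not os.path.exists(csv_path)
    with open(csv_path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        if new_file:
            writer.writeheader()
        writer.writerow({"iteration": iteration, **scores})


def read_scores(csv_path):
    """Read all rows back as a list of dictionaries (values are strings)."""
    with open(csv_path, newline="") as handle:
        return list(csv.DictReader(handle))


demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "bayesian_scores.csv")
append_scores(demo_path, 0, {"BME": 0.1, "RE": 0.5, "IE": 1.2, "ELPD": -2.0})
append_scores(demo_path, 1, {"BME": 0.2, "RE": 0.7, "IE": 1.0, "ELPD": -1.5})
rows = read_scores(demo_path)
```

Opening the file in append mode keeps earlier iterations intact, so a restarted run can continue the same score history.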

output_processing(output_data_path=None, **kwargs)[source]

Load existing results for BAL restart.

Delft3D-FLOW binding (planned)

Delft3D-FLOW binding for HydroBayesCal – planned, not yet implemented.

This module is a placeholder that mirrors the TELEMAC (hydroBayesCal.telemac.control_telemac) and OpenFOAM (hydroBayesCal.openfoam.control_openfoam) bindings. It defines the intended public interface for coupling HydroBayesCal to the structured-grid Delft3D-FLOW engine (Deltares) so that the coupling can be implemented incrementally without changing the surrogate / Bayesian-active-learning layer.

The Delft3DModel class subclasses hydroBayesCal.hysim.HydroSimulations; the Python attribute names are shared across solvers, while the string and file conventions below are Delft3D-specific and must be preserved when the binding is filled in:

  • <case>.mdf – master definition FLOW file (the control file); the engine is launched through config_d_hydro.xml and the d_hydro executable.

  • Bed roughness via Chézy / Manning / White-Colebrook (.rgh file or Roughness keywords in the .mdf); eddy viscosity/diffusivity Vicouv / Dicouv.

  • trim-<case>.dat / trim-<case>.def – NEFIS map (field) output.

  • trih-<case>.dat / trih-<case>.def – NEFIS history (monitoring-point) output.

See the usage-delft3d page for the planned workflow.

hydroBayesCal.delft3d.control_delft3d.DELFT3D_BINDING_IMPLEMENTED = False

Marker so callers / tests can detect that the binding is not ready yet.

class hydroBayesCal.delft3d.control_delft3d.Delft3DModel(control_file='control.mdf', d_hydro_config='config_d_hydro.xml', flow_executable='d_hydro', roughness_formulation='Manning', map_file_base='trim', history_file_base='trih', *args, **kwargs)[source]

Bases: HydroSimulations

Placeholder Delft3D-FLOW model wrapper (planned).

Defines the intended constructor signature and interface but raises NotImplementedError. Instantiating it documents the Delft3D-specific configuration the binding will need; it does not run a simulation.

Parameters:
  • control_file (str) – Master definition FLOW file, default "control.mdf" (Delft3D-FLOW convention <case>.mdf).

  • d_hydro_config (str) – Runtime configuration passed to the d_hydro launcher, default "config_d_hydro.xml".

  • flow_executable (str) – Name of the Delft3D-FLOW launcher on PATH, default "d_hydro".

  • roughness_formulation (str) – Bed-roughness law used for the calibration parameters ("Chezy", "Manning" or "WhiteColebrook").

  • map_file_base (str) – Base name of the NEFIS map output files (trim-<case>), default "trim".

  • history_file_base (str) – Base name of the NEFIS history output files (trih-<case>), default "trih".

  • **kwargs – Common HydroSimulations parameters (model_dir, res_dir, calibration_pts_file_path, calibration_parameters, param_values, calibration_quantities, init_runs, max_runs …).

Raises:

NotImplementedError – Always – the binding is not implemented yet.

run_multiple_simulations(*args, **kwargs)[source]

Run the Delft3D-FLOW experimental-design simulations (planned).

output_processing(*args, **kwargs)[source]

Extract calibration quantities from NEFIS map/history output (planned).

Surrogate model and Bayesian Active Learning

Gaussian Process Emulators

This module builds on the GPyTorch library (which runs on PyTorch) to train a Gaussian Process Emulator (GPE). It subclasses the ExactGP base class from GPyTorch and extends its functionality by customizing the mean function, likelihood and kernel (covariance function). The MultitaskGPModel class also extends the ExactGP base class to handle multitask (multiple-output) learning scenarios. It is designed to model multiple related tasks simultaneously, especially when they are similar, by sharing information across them in a common GP framework (https://docs.gpytorch.ai/en/stable/examples/03_Multitask_Exact_GPs/Multitask_GP_Regression.html).

Author: Andres Heredia (2024)

class hydroBayesCal.surrogate.gpe_gpytorch.MyExactGPyModel(*args, **kwargs)[source]

Bases: ExactGP

Subclass of GPyTorch’s ExactGP with a custom likelihood, kernel and training points.

The likelihood is kept constant: Gaussian Likelihood (https://docs.gpytorch.ai/en/latest/likelihoods.html)

Parameters:
  • train_x (np.array [n_tp, n_p]) – Parameter sets used to train the GPR.

  • train_y (np.array [n_tp, n_obs]) – Forward model outputs used to train the GPR.

  • kernel (kernel instance) – Kernel used in the GPR.

  • likelihood (likelihood instance) – Likelihood used to train the noise in the GPR.

forward(x)[source]

Takes in the training data (x) and returns a multivariate normal distribution with the mean and covariance (kernel) set in “__init__()”.

Parameters:

x – Training data (parameter sets).

class hydroBayesCal.surrogate.gpe_gpytorch.GPyTraining(collocation_points, model_evaluations, kernel, training_iter, likelihood, y_normalization=True, tp_normalization=False, optimizer='adam', lr=0.5, loss='exact', n_restarts=1, weight_decay=0, gradient_free_start=False, verbose=True, parallelize=False)[source]

Bases: object

Train a single-output Gaussian Process Emulator with GPyTorch.

Uses GPyTorch’s exact GP regression to build a GPE for a forward model, from collocation points produced by that model.

Parameters:
  • collocation_points (numpy.ndarray) – Training points (parameter sets), shape [n_tp, n_p].

  • model_evaluations (numpy.ndarray) – Model outputs at each evaluated location, shape [n_tp, n_obs].

  • kernel (gpytorch.kernels.Kernel) – Kernel used to train the GPE. May use default values or a user-defined anisotropy (ard_num_dims = n_parameters).

  • likelihood (gpytorch.likelihoods.GaussianLikelihood) – Likelihood used to optimise the GPR. May use default constraints/initial values or be customised by the caller.

  • training_iter (int) – Number of optimiser iterations used to train the GPE.

  • optimizer (str, optional) – Optimiser to use: "adam" (default) or "lbfgs".

  • loss (str, optional) – Loss function: "exact" or "loo".

  • n_restarts (int, optional) – Number of optimisation restarts.

  • tp_normalization (bool, optional) – True to normalise training-point parameter values before training (default False).

  • y_normalization (bool, optional) – True to normalise model outputs before training (predictions are de-normalised afterwards).

  • parallelize (bool, optional) – True to parallelise surrogate training, False to train sequentially.

Notes

Todo

Accept the evaluation location as input and add a function that extracts the GPE predictions at the observation point (used in BAL).

Todo

For GPyTorch, check the GPU settings and other gpytorch.settings for prediction.

static convert_to_tensor(array)[source]

Transform a np.array into a tensor.

Parameters:

array (np.array) – Array to convert to a tensor.

Returns:

Data from the np.array in tensor format.

Return type:

tensor

normalize_tp(train_y)[source]

Normalize the training-point outputs before training.

Parameters:

train_y (np.array [tp_size, n_obs]) – Model output values to normalize.

Returns:

Normalized input values.

Return type:

tensor
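Output normalization and the matching de-normalization of predictions can be sketched as follows. The sketch assumes a z-score (zero mean, unit variance) scaling per output column and works on NumPy arrays rather than tensors; the scaling actually applied by the class is not specified here, and the function names are hypothetical.

```python
import numpy as np


def normalize_outputs(train_y):
    """Standardise each output column; return the scaled array and its statistics."""
    train_y = np.asarray(train_y, dtype=float)
    mean = train_y.mean(axis=0)
    std = train_y.std(axis=0)
    std = np.where(std == 0.0, 1.0, std)  # avoid division by zero for constant outputs
    return (train_y - mean) / std, mean, std


def denormalize_prediction(pred_mean, pred_std, mean, std):
    """Map a normalised prediction (mean and standard deviation) back to model units."""
    return pred_mean * std + mean, pred_std * std
```

The standard deviation is rescaled by the column scale only, because adding the mean shifts a distribution without changing its spread.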

train_()[source]

Train the surrogate model with the GPyTorch library, using the given optimizer.

Todo

Parallelize training.

static init_model_params(model)[source]

Initialize the model hyperparameters, for multi-start optimization.

Parameters:

model – GPyTorch instance.

Returns:

predict_(input_sets, get_conf_int=False)[source]

TO DO: DESCRIPTION TO BE COMPLETED

Parameters:
  • input_sets

  • get_conf_int

Returns:

hydroBayesCal.surrogate.gpe_gpytorch.validation_error(true_y, sim_y, output_names, n_per_type)[source]

Estimate validation criteria for a surrogate model per output location.

Results for each output type are stored under separate keys in a dictionary.

Parameters:
  • true_y (numpy.ndarray) – Simulator outputs for the validation samples, shape [mc_valid, n_obs].

  • sim_y (numpy.ndarray or dict) – Surrogate/emulator outputs for the validation samples, shape [mc_valid, n_obs]. If a dict, it holds output and std keys.

  • output_names (array-like of str) – Name of each output type, shape [n_types].

  • n_per_type (int) – Number of observations per output type.

Returns:

Validation criteria for each output location and output type.

Return type:

tuple
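The grouping of per-location criteria by output type can be sketched as follows. This is an illustration under two assumptions: the criterion shown is the root-mean-square error (the criteria actually computed are not listed here), and the columns of each output type are stored contiguously in blocks of n_per_type. The function name rmse_per_type is hypothetical.

```python
import numpy as np


def rmse_per_type(true_y, sim_y, output_names, n_per_type):
    """Return {output_name: RMSE per location} for column blocks of n_per_type."""
    true_y = np.asarray(true_y, dtype=float)
    # A dict input carries the surrogate mean under the "output" key.
    sim_mean = np.asarray(sim_y["output"] if isinstance(sim_y, dict) else sim_y, dtype=float)
    rmse = np.sqrt(np.mean((true_y - sim_mean) ** 2, axis=0))  # one value per location
    return {
        name: rmse[i * n_per_type:(i + 1) * n_per_type]
        for i, name in enumerate(output_names)
    }
```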

Notes

Todo

As in BayesValidRox, optionally estimate the surrogate predictions here by passing a surrogate object.

Todo

Move into the GPR class and return a dictionary keyed by output type.

hydroBayesCal.surrogate.gpe_gpytorch.save_valid_criteria(new_dict, old_dict, n_tp)[source]

Append the current iteration’s validation criteria to a results dict.

Stores the validation criteria for the current iteration (n_tp) in an existing dictionary, so the results for all iterations live in one file. Each validation criterion has a key per output type, holding a vector with one value per output location.

Parameters:
  • new_dict (dict) – Validation criteria for the current iteration.

  • old_dict (dict) – Validation criteria for all previous iterations, including an N_tp key that tracks the iteration number.

  • n_tp (int) – Number of training points for the current BAL iteration.

Returns:

The updated dictionary including the current iteration.

Return type:

dict
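The accumulation of criteria across iterations can be sketched as follows. The nested layout (criterion, then output type, then one row per iteration) is an assumption made for this illustration, as is the helper name append_criteria; only the N_tp key is taken from the description above.

```python
import numpy as np


def append_criteria(new_dict, old_dict, n_tp):
    """Stack the current iteration's criteria under those of earlier iterations."""
    updated = {"N_tp": list(old_dict.get("N_tp", [])) + [n_tp]}
    for criterion, per_type in new_dict.items():
        updated[criterion] = {}
        for output_type, values in per_type.items():
            row = np.atleast_2d(values)
            previous = old_dict.get(criterion, {}).get(output_type)
            # One row per iteration, one column per output location.
            updated[criterion][output_type] = (
                row if previous is None else np.vstack([previous, row])
            )
    return updated
```

Calling the helper with an empty old_dict starts a new history, so the first BAL iteration needs no special case.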

class hydroBayesCal.surrogate.gpe_gpytorch.MultiGPyTraining(collocation_points, model_evaluations, kernel, training_iter, likelihood, optimizer='adam', lr=0.5, n_restarts=1, parallelize=False, number_quantities=2, noise_constraint=gpytorch.constraints.GreaterThan)[source]

Bases: object

Class to train multiple Gaussian Process models using given collocation points and model evaluations. It uses the MultitaskGPModel class for multitask regression using GPyTorch.

__init__(collocation_points, model_evaluations, kernel, training_iter, likelihood, optimizer='adam', lr=0.5, n_restarts=1, parallelize=False, number_quantities=2, noise_constraint=gpytorch.constraints.GreaterThan)[source]
Parameters:

(See the original class docstring for parameter details.)

train_tasks_variables()[source]

Train multitask Gaussian Process models using the provided collocation points and model evaluations.

train_tasks_locations()[source]

Train multitask Gaussian Process models using the provided collocation points and model evaluations. Trains the model for each output variable (water depth OR velocity) at all locations simultaneously.

train_tasks_all()[source]

Train a single multitask Gaussian Process model using all outputs (water depth and velocity) at all locations simultaneously.

predict_(input_sets, get_conf_int=False, multitask_cov=False)[source]

Predict the outputs and their standard deviations for given input sets using the trained GP models. Automatically selects the appropriate method based on the structure of self.gp_list.

class hydroBayesCal.surrogate.gpe_gpytorch.MultitaskGPModel(*args, **kwargs)[source]

Bases: ExactGP

Gaussian Process model for multitask regression using the GPyTorch library. This model handles multiple tasks (or quantities) simultaneously by using a multitask kernel and multitask mean function.

__init__(train_x, train_y, likelihood, kernel, number_tasks)[source]
Parameters:
  • train_x – torch.Tensor The input training data. A tensor of shape (n_samples, n_params) where n_samples is the number of samples and n_params is the number of input model parameters.

  • train_y – torch.Tensor The output training data. A tensor of shape (n_samples, n_tasks) where n_samples is the number of samples and n_tasks is the number of tasks or quantities. The output is typically organized so that each column corresponds to a different task.

  • likelihood – gpytorch.likelihoods.MultitaskGaussianLikelihood A multitask likelihood function used with the GP model.

  • kernel – tuple(gpytorch.kernels.Kernel, gpytorch.kernels.Kernel) A tuple of kernel components to be used in the GP model. The tuple should contain two kernel components.

forward(x)[source]

Computes the mean and covariance of the Gaussian Process given the input data x.

Parameters:

x – torch.Tensor A tensor containing the training data for the model.

Returns:

gpytorch.distributions.MultitaskMultivariateNormal

A new class is generated, which inherits all attributes from the GaussianProcessRegressor class of scikit-learn. This is done to manually set the “max_iter” and “gtol” values for the optimization of the hyperparameters in the GPR kernel.

Todo

Check GPyTorch with L-BFGS to see whether results can be improved by changing the initial values or by using Adam.

Todo

Save each GP (one per location) in a list, so that it can be called later for BAL and MCMC methods.

class hydroBayesCal.surrogate.gpe_skl.MyGeneralGPR(collocation_points, model_evaluations)[source]

Bases: object

Class assigns/creates the attributes which are constant for all GPR-library classes (such as SklTraining and GPyTraining)

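Parameters:
  • collocation_points (np.array) – Training points (parameter sets).

  • model_evaluations (np.array) – Model outputs at each location where the full-complexity model (fcm) was evaluated, or at the locations being considered.

  • prior_samples (np.array)

  • GPE (trained)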

n_obs

Number of locations from the fcm where the GPE is to be trained. It is not necessarily the same as the number of true observations, since one could train the GPE at given locations (e.g. all grid points), where some locations coincide with the observation points.

Type:

int

surrogate_prediction

Array of shape [n_obs, prior_samples.shape[0]], with one entry for each GPE (n_obs) and each parameter set in prior_samples.

Type:

np.array

surrogate_std

Array of shape [n_obs, prior_samples.shape[0]], with one entry for each GPE (n_obs) and each parameter set in prior_samples.

Type:

np.array

surrogate_up

Array of shape [n_obs, prior_samples.shape[0]], with one entry for each GPE (n_obs) and each parameter set in prior_samples.

Type:

np.array

surrogate_lc

Array of shape [n_obs, prior_samples.shape[0]], with one entry for each GPE (n_obs) and each parameter set in prior_samples.

Type:

np.array

class hydroBayesCal.surrogate.gpe_skl.MySklGPR(*args, **kwargs)[source]

Bases: GaussianProcessRegressor

class hydroBayesCal.surrogate.gpe_skl.SklTraining(collocation_points, model_evaluations, kernel, alpha, n_restarts, noise=True, y_normalization=True, y_log=False, tp_normalization=False, optimizer='fmin_l_bfgs_b', parallelize=False, n_jobs=-2)[source]

Bases: MyGeneralGPR

Train a single-output Gaussian Process Emulator with scikit-learn.

Uses scikit-learn’s GP regression to build a GPE for a forward model, from collocation points produced by that model. See the scikit-learn GaussianProcessRegressor for the underlying estimator.

Parameters:
  • collocation_points (numpy.ndarray) – Training points (parameter sets), shape [n_tp, n_params].

  • model_evaluations (numpy.ndarray) – Full-complexity model outputs at each evaluated location, shape [n_tp, n_locations].

  • kernel (object or list of objects) – sklearn.gaussian_process.kernels instance(s) used to train the GPE; converted internally to a list.

  • alpha (float or list of float) – Value added to the diagonal to avoid numerical errors. A scalar is broadcast to a list.

  • n_restarts (int) – Number of optimiser restarts used to find the kernel hyper-parameters (avoids local minima).

  • noise (bool, optional) – True (default) to add a white-noise kernel to the input kernel.

  • y_normalization (bool, optional) – True (default) to normalise model outputs before training.

  • tp_normalization (bool, optional) – True to normalise training-point parameter values before training (default False).

  • optimizer (str, optional) – Name of the optimiser to use (default scikit-learn optimiser).

  • parallelize (bool, optional) – True to parallelise surrogate training, False to train sequentially.

Notes

Todo

Accept the evaluation location as input and add a function that extracts the GPE predictions at the observation point (used in BAL).

train_()[source]

Todo

Use joblib to parallelize training.

predict_(input_sets, get_conf_int=False)[source]

Evaluate the per-location surrogate models on all input sets.

Parameters:
  • input_sets (numpy.ndarray) – Parameter sets to evaluate the surrogate models on, shape [MC, n_params].

  • get_conf_int (bool, optional) – True to also estimate the upper and lower confidence intervals.

Returns:

Surrogate-model mean (output) and standard deviation (std) for each location, each of shape [n_obs, MC].

Return type:

dict

hydroBayesCal.surrogate.gpe_skl.validation_error(true_y, sim_y, output_names, n_per_type)[source]

Estimate validation criteria for a surrogate model per output location.

Results for each output type are stored under separate keys in a dictionary.

Parameters:
  • true_y (numpy.ndarray) – Simulator outputs for the validation samples, shape [mc_valid, n_obs].

  • sim_y (numpy.ndarray or dict) – Surrogate/emulator outputs for the validation samples, shape [mc_valid, n_obs]. If a dict, it holds output and std keys.

  • output_names (array-like of str) – Name of each output type, shape [n_types].

  • n_per_type (int) – Number of observations per output type.

Returns:

Validation criteria for each output location and output type.

Return type:

tuple

Notes

Todo

As in BayesValidRox, optionally estimate the surrogate predictions here by passing a surrogate object.

Todo

Move into the GPR class and return a dictionary keyed by output type.

hydroBayesCal.surrogate.gpe_skl.save_valid_criteria(new_dict, old_dict, n_tp)[source]

Append the current iteration’s validation criteria to a results dict.

Stores the validation criteria for the current iteration (n_tp) in an existing dictionary, so the results for all iterations live in one file. Each validation criterion has a key per output type, holding a vector with one value per output location.

Parameters:
  • new_dict (dict) – Validation criteria for the current iteration.

  • old_dict (dict) – Validation criteria for all previous iterations, including an N_tp key that tracks the iteration number.

  • n_tp (int) – Number of training points for the current BAL iteration.

Returns:

The updated dictionary including the current iteration.

Return type:

dict

Bayesian inference and sequential design

TO DO: ADD DESCRIPTION

class hydroBayesCal.surrogate.bal_functions.BayesianInference(model_predictions, observations, error, prior=None, prior_log_pdf=None, model_error=None, sampling_method='rejection_sampling')[source]

Bases: object

Bayesian inference of model parameters from surrogate predictions.

Computes the likelihood of the observations under the surrogate-model predictions, the Bayesian model evidence (BME) and relative entropy (RE), and draws a posterior parameter sample.

Parameters:
  • model_predictions (numpy.ndarray) – Array of shape [MC_size, n_observations] with the surrogate-model predictions.

  • observations (numpy.ndarray) – Array of shape [1, n_observations] with the measured observations.

  • error (numpy.ndarray) – Array of shape [n_observations] with the error/noise variances, inserted as-is on the diagonal of the covariance matrix.

  • prior (numpy.ndarray, optional) – Array of shape [MC_size, n_parameters] with the prior parameter sets. If None, no posterior parameter set is saved.

  • prior_log_pdf (numpy.ndarray, optional) – Array of shape [MC_size] with the prior log-probabilities of each parameter sample in prior. If None (default), the information entropy (IE) is not estimated. May be supplied with or without prior.

  • model_error (optional) – Additional model-error term (default None).

  • sampling_method (str, optional) – Method used to sample from the posterior distribution, one of "rejection_sampling" (default) or "bayesian_weighting".

likelihood

Prior likelihood values (shape [MC_size]) under a multivariate Gaussian distribution.

Type:

numpy.ndarray

cov_mat

Covariance matrix of shape [n_observations, n_observations] with the variances on the diagonal.

Type:

numpy.ndarray

post_likelihood

Likelihood values of the posterior samples.

Type:

numpy.ndarray

posterior

Posterior parameter sets.

Type:

numpy.ndarray

BME

Bayesian model evidence, computed from prior Monte Carlo sampling.

Type:

float

ELPD

Expected log-predictive density (expected value of the posterior likelihoods).

Type:

float

RE

Relative entropy between prior and posterior.

Type:

float

IE

Information entropy (not currently computed).

Type:

float

Notes

Posterior sampling options:

  • Bayesian weighting obtains the posterior likelihoods as a weighted average of the prior-based likelihood values, avoiding small posterior sample sizes. Results are similar to rejection sampling, but the posterior set is not easily available.

  • Rejection sampling divides all likelihoods by the maximum and accepts sample i when likelihood(i) / max(likelihood) > U[0, 1]. It yields a posterior distribution directly, but needs a larger Monte Carlo sample when the output dimension is large.
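The Bayesian-weighting option can be sketched as a likelihood-weighted average over the prior samples. This is an assumption about how the weighted average is formed, shown here for the log-likelihood; the function name is hypothetical.

```python
import numpy as np


def weighted_expected_log_likelihood(likelihood):
    """Likelihood-weighted average of the log-likelihood over the prior samples."""
    likelihood = np.asarray(likelihood, dtype=float)
    positive = likelihood > 0.0  # zero-likelihood samples carry zero weight
    weights = likelihood[positive] / likelihood[positive].sum()
    return float(np.sum(weights * np.log(likelihood[positive])))
```

Because every prior sample contributes with its weight, no samples are discarded, which is why this option avoids the small posterior sample sizes of rejection sampling.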

Todo

Add posterior MCMC sampling methods.

calculate_constants()[source]

Calculates the covariance matrix based on the input variable “error”, which is a vector of variances, one for each observation point.

Returns:

None

calculate_likelihood()[source]

Function calculates likelihood between measured data and the model output using the stats module equations.

Notes

  • Generates a likelihood array of size [MC x 1].

  • The likelihood function is a multivariate normal distribution, assuming independent and Gaussian-distributed errors.

calculate_likelihood_manual()[source]

Function calculates likelihood between observations and the model output manually, using numpy calculations.

Notes

  • Generates a likelihood array of size [MC x N], where N is the number of measurement data sets.

  • The likelihood function is a multivariate normal distribution, assuming independent and Gaussian-distributed errors.

  • This method is faster than using the stats module (the calculate_likelihood function).
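A manual Gaussian likelihood with independent errors can be sketched in NumPy. The sketch handles a single measurement data set and assumes a diagonal covariance matrix built from the error variances (equivalent to np.diag(error)); the function name gaussian_likelihood is hypothetical and the code is not taken from the class.

```python
import numpy as np


def gaussian_likelihood(model_predictions, observations, error):
    """Likelihood of the observations for each prediction row, shape [MC_size]."""
    predictions = np.asarray(model_predictions, dtype=float)
    obs = np.asarray(observations, dtype=float).reshape(1, -1)
    variances = np.asarray(error, dtype=float)
    residuals = obs - predictions  # [MC_size, n_observations]
    # A diagonal covariance reduces the quadratic form to a weighted sum of squares.
    mahalanobis = np.sum(residuals ** 2 / variances, axis=1)
    log_norm = -0.5 * (variances.size * np.log(2.0 * np.pi) + np.sum(np.log(variances)))
    return np.exp(log_norm - 0.5 * mahalanobis)
```

Working with the variances directly avoids building and inverting the full covariance matrix, which is one plausible reason a manual version is faster.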

calculate_likelihood_with_error()[source]

Function calculates likelihood between observations and the model output manually, using numpy calculations. It considers model error, with an error associated to each model prediction.

Notes

  • Generates a likelihood array of size [MC x N], where N is the number of measurement data sets.

  • The likelihood function is a multivariate normal distribution, assuming independent and Gaussian-distributed errors.

  • This method is faster than using the stats module (the calculate_likelihood function).

rejection_sampling()[source]

Run rejection sampling.

Generates MC uniformly distributed random numbers (RN). If the normalised likelihood likelihood / max(likelihood) is smaller than the corresponding RN, the prior sample is rejected; the remaining samples form the posterior.

Notes

  • Generates the posterior likelihood, posterior values and posterior density arrays.

  • If max(likelihood) == 0 there is no posterior distribution, or the posterior equals the prior.
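The acceptance rule can be sketched as follows. The handling of the degenerate case (returning the prior unchanged when all likelihoods are zero) is one of the two readings given in the notes and is an assumption here; the function name rejection_sample is hypothetical.

```python
import numpy as np


def rejection_sample(prior, likelihood, rng=None):
    """Keep prior samples whose normalised likelihood exceeds a uniform draw."""
    rng = np.random.default_rng() if rng is None else rng
    prior = np.asarray(prior, dtype=float)
    likelihood = np.asarray(likelihood, dtype=float)
    max_likelihood = likelihood.max()
    if max_likelihood == 0.0:
        # Degenerate case: treated here as "posterior equals prior" (assumption).
        return prior, likelihood
    accepted = likelihood / max_likelihood > rng.uniform(size=likelihood.size)
    return prior[accepted], likelihood[accepted]
```

Passing a seeded generator, for example np.random.default_rng(0), makes the accepted set reproducible between runs.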

estimate_bme()[source]

Function calculates likelihood and BME (prior based) and then, based on the given posterior sampling criteria, obtains a posterior likelihood, ELPD and RE.

Returns:

Note

If BME = 0, the model was not able to reproduce the observed data. In that case we assume BME = ELPD, and thus RE is also 0, since nothing was learned.
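The relation between the three scores can be sketched as follows. The formulas used (BME as the mean prior-based likelihood, ELPD as the mean log of the posterior likelihoods, and RE = ELPD - ln(BME)) are assumptions based on the attribute descriptions of this class, not a copy of its code; the function name bayesian_scores is hypothetical.

```python
import numpy as np


def bayesian_scores(likelihood, post_likelihood):
    """Return (BME, ELPD, RE) from prior-based and posterior likelihood values."""
    bme = float(np.mean(likelihood))
    if bme == 0.0:
        # Mirrors the note on BME = 0: nothing was learned, so RE is set to 0.
        return bme, bme, 0.0
    elpd = float(np.mean(np.log(post_likelihood)))
    return bme, elpd, elpd - float(np.log(bme))
```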

class hydroBayesCal.surrogate.bal_functions.SequentialDesign(exp_design, sm_object, obs, n_cand_groups=4, secondary_sm=None, parallel=True, n_jobs=-1, backend='loky', errors=None, do_tradeoff=False, gaussian_assumption=False, mc_samples=1000, mc_exploration=10000, multitask=False)[source]

Bases: object

Class runs the optimal design of experiments (sequential design) to select the new training points, to add to the existing training points for surrogate model training.

TO DO: DOUBLE-CHECK WITH DOEPY CLASSES AND FUNCTIONS

Parameters:
  • exp_design (ExpDesign) – Used to sample from the prior distribution and to extract the exploit and explore methods.

  • sm_object (object) – Surrogate model class object, either SklTraining or GPyTraining. It must have a predict_(input_params) method to evaluate the surrogate.

  • n_cand_groups (int) – Number of lists into which the candidate set is split for multiprocessing.

  • parallel (bool) – True to use multiprocessing (parallelize tasks). False to set n_cand_groups=1.

  • obs (array [n_obs, ]) – Observation values. (ToDo: dict, with a key for each output type.)

  • errors (array [n_obs, ]) – Measurement error for each observation. Default is None. (ToDo: dict, with a key for each output type.)

  • do_tradeoff (bool) – True to compute the total score as a combination of the exploration and exploitation scores. False to use only one of them, depending on the exploitation method.

  • secondary_sm (object) – Surrogate model class object, either SklTraining or GPyTraining. It must have a predict_(input_params) method to evaluate the surrogate. It corresponds to the secondary, or error, model, which is added to the main surrogate sm_object.

  • gaussian_assumption (bool) – True to assume a Gaussian prior and likelihood, so that the analytical equations for BAL are used. False to follow the traditional sampling approach.

Attributes:

check_inputs()[source]
run_sequential_design(prior_samples=None)[source]
bayesian_active_learning(y_mean, y_std, observations, error, utility_function='dkl')[source]

Computes scores based on Bayesian active design criterion (utility_criteria).

It is based on the following paper: Oladyshkin, Sergey, Farid Mohammadi, Ilja Kroeker, and Wolfgang Nowak. “Bayesian³ active learning for the Gaussian process emulator using information theory.” Entropy 22, no. 8 (2020): 890.

Parameters:
  • y_mean (array [n_samples, n_obs]; ToDo: dict with a key for each output type, each an array [mc_size, n_obs]) – Array with surrogate model outputs (mean).

  • y_std (array [n_samples, n_obs]; ToDo: dict with a key for each output type, each an array [mc_size, n_obs]) – Array with output standard deviations.

  • observations (array [n_obs, ]; ToDo: dict with a key for each output type, each an array [1, n_obs]) – Array with measured observations.

  • error (array [n_obs]; ToDo: dict containing the measurement errors (sigma^2), one dictionary for each output type) – Array with the observation errors associated with each output.

  • utility_function (string, optional) – BAL design criterion. The default is ‘dkl’.

Returns:

Score.

Return type:

float

analytical_bal(y_mean, y_std, observations, error, utility_function='dkl')[source]

Computes the analytical BAL criteria (IE or DKL) when the prior and likelihood are both Gaussian distributions. It first estimates the posterior distribution and then estimates either the DKL or the IE. For ill-posed priors, it checks whether the prior and posterior multivariate Gaussian (MG) distributions overlap in any dimension; if they do not, the BAL criteria are not estimated.

The post logBME equation was obtained from Oladyshkin and Nowak (2019) (doi: 10.3390/e21111081), eq.(28)

Parameters:
  • y_mean (array [n_samples, n_obs]) – Array with the i-th surrogate model outputs (mean).

  • y_std (array [n_samples, n_obs]) – Array with output standard deviations.

  • observations (array [n_obs, ]) – Array with measured observations.

  • error (array [n_obs]) – Array with the observation errors associated with each output.

  • utility_function (string, optional) – BAL design criterion. The default is ‘dkl’.

Returns:

analytical BAL criteria for the given input distribution

Return type:

float
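
The posterior step described for analytical_bal can be illustrated for the simplest case of independent observations (diagonal covariances). This is a minimal sketch and not the package's implementation: the name gaussian_posterior_diag is hypothetical, and the error argument is assumed to hold variances (sigma^2).

```python
import numpy as np


def gaussian_posterior_diag(prior_mean, prior_std, observations, error_var):
    """Conjugate Gaussian update, treating every observation independently.

    Hypothetical helper for illustration; error_var is assumed to be a variance.
    """
    prior_mean = np.asarray(prior_mean, dtype=float)
    prior_var = np.asarray(prior_std, dtype=float) ** 2
    observations = np.asarray(observations, dtype=float)
    error_var = np.asarray(error_var, dtype=float)

    # For a Gaussian prior and a Gaussian likelihood the precisions add up.
    post_var = 1.0 / (1.0 / prior_var + 1.0 / error_var)
    # The posterior mean is the precision-weighted mean of prior and data.
    post_mean = post_var * (prior_mean / prior_var + observations / error_var)
    return post_mean, post_var


post_mean, post_var = gaussian_posterior_diag(
    prior_mean=[0.0, 1.0],
    prior_std=[1.0, 2.0],
    observations=[2.0, 1.0],
    error_var=[1.0, 4.0],
)
```

With equal prior and error variances, the posterior mean lies halfway between the surrogate mean and the observation, and the posterior variance is half the prior variance.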

run_al_functions(exploit_method, candidates, index, m_error, utility_func)[source]

Run the utility (active-learning) function for the given method.

Parameters:
  • exploit_method (str) – Exploitation method. Currently supported: "bal" (Bayesian active learning).

  • candidates (numpy.ndarray) – Array [mc_size, n_params] of candidate parameter sets to explore, each scored by the utility function.

  • index (array-like) – Indices of the candidate samples within the prior pool.

  • m_error (numpy.ndarray) – Array [n_obs] with the measurement error of each observation.

  • utility_func (str) – Name of the utility function / active-learning criterion used to score each candidate set.

Returns:

Array [n_candidates] with the score assigned to each candidate, in descending order.

Return type:

numpy.ndarray

static gaussian_overlap(mu1, cov1, mu2, cov2)[source]

Determines whether two multivariate Gaussian distributions overlap in any dimension. If they do, the analytical posterior-based criteria can be estimated. The overlap criterion is an arbitrary choice: the two distributions are considered to overlap if their 99% confidence intervals intersect anywhere.

Parameters:
  • mu1 (np.array [n_dim, ]) – array with mean values for distribution 1 (prior)

  • cov1 (np.array [n_dim, n_dim]) – diagonal matrix with the variances for distribution 1 (prior)

  • mu2 (np.array [n_dim, ]) – array with mean values for distribution 2 (posterior)

  • cov2 (np.array [n_dim, n_dim]) – diagonal matrix with the variances for distribution 2 (posterior)

Returns:

True if they overlap, False if they don’t.

Return type:

bool
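
One way to read the overlap criterion is as a per-dimension interval test. The following is a sketch under that assumption, not the package's code: the name gaussian_overlap_sketch and the level argument are hypothetical, and the covariance matrices are assumed to be diagonal as documented.

```python
from statistics import NormalDist

import numpy as np


def gaussian_overlap_sketch(mu1, cov1, mu2, cov2, level=0.99):
    """Return True if the central `level` intervals overlap in any dimension."""
    mu1 = np.asarray(mu1, dtype=float)
    mu2 = np.asarray(mu2, dtype=float)
    # Two-sided quantile of the standard normal (about 2.576 for 99 %).
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    sd1 = np.sqrt(np.diag(cov1))
    sd2 = np.sqrt(np.diag(cov2))
    lo1, hi1 = mu1 - z * sd1, mu1 + z * sd1
    lo2, hi2 = mu2 - z * sd2, mu2 + z * sd2
    # Two intervals overlap when each one starts before the other one ends.
    return bool(np.any((lo1 <= hi2) & (lo2 <= hi1)))


far_apart = gaussian_overlap_sketch([0.0, 0.0], np.eye(2), [10.0, 10.0], np.eye(2))
one_dim_close = gaussian_overlap_sketch([0.0, 0.0], np.eye(2), [10.0, 1.0], np.eye(2))
```

In the second call only the second dimension overlaps, which is enough under the "any dimension" rule.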

static multivariate_gaussian_kl_divergence(mu_p, cov_p, mu_q, cov_q)[source]

Estimates the analytical solution for the Kullback-Leibler divergence when going from the prior (q) to the posterior (p), when both the prior and the posterior are Gaussian distributions.

Parameters:
  • mu_p (np.array [n_obs, ]) – array with the mean values for the posterior distribution

  • cov_p (np.array [n_obs, n_obs]) – diagonal matrix with the variance for the posterior distribution

  • mu_q (np.array [n_obs, ]) – array with the mean values for the prior distribution

  • cov_q (np.array [n_obs, n_obs]) – diagonal matrix with the variance for the prior distribution

Returns:

Kullback-Leibler divergence between prior and posterior

Return type:

float
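
For reference, the standard closed-form Kullback-Leibler divergence KL(p || q) between two multivariate Gaussians can be written in a few lines of NumPy. This is an illustrative sketch with a hypothetical function name (gaussian_kl_sketch); it is not claimed to match the package's implementation line by line.

```python
import numpy as np


def gaussian_kl_sketch(mu_p, cov_p, mu_q, cov_q):
    """Closed-form KL(p || q) for two multivariate Gaussian distributions."""
    mu_p = np.asarray(mu_p, dtype=float)
    mu_q = np.asarray(mu_q, dtype=float)
    cov_p = np.atleast_2d(cov_p).astype(float)
    cov_q = np.atleast_2d(cov_q).astype(float)

    k = mu_p.size
    diff = mu_q - mu_p
    cov_q_inv = np.linalg.inv(cov_q)
    # slogdet avoids under/overflow that log(det(...)) can suffer from.
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)

    trace_term = np.trace(cov_q_inv @ cov_p)
    mahalanobis_term = diff @ cov_q_inv @ diff
    return 0.5 * (trace_term + mahalanobis_term - k + logdet_q - logdet_p)


kl_same = gaussian_kl_sketch([0.0], [[1.0]], [0.0], [[1.0]])
kl_shifted = gaussian_kl_sketch([0.0], [[1.0]], [1.0], [[1.0]])
```

The divergence is zero for identical distributions and grows as the posterior moves away from, or becomes narrower than, the prior.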

static posterior_log_likelihood(samples, mean, cov_mat)[source]

Function estimates the log pdf of a Gaussian distribution manually (faster than using stats)

Parameters:
  • samples (np.array [mc_exploration, n_obs]) – array with samples to get the pdf from

  • mean (np.array [1, n_obs]) – mean array of Gaussian distribution

  • cov_mat (np.array [n_obs, n_obs]) – covariance of the Gaussian distribution

Returns:

array with the log-pdf value of each sample

Return type:

np.array [mc_exploration, ]
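
A manual Gaussian log-pdf of the kind described here could look as follows. This is a sketch with a hypothetical name (gaussian_log_pdf_sketch); the package may organise the computation differently.

```python
import numpy as np


def gaussian_log_pdf_sketch(samples, mean, cov_mat):
    """Log-pdf of a multivariate Gaussian, evaluated for every row of samples."""
    samples = np.atleast_2d(samples).astype(float)
    cov_mat = np.atleast_2d(cov_mat).astype(float)
    diff = samples - np.reshape(mean, (1, -1))
    n_obs = diff.shape[1]

    _, logdet = np.linalg.slogdet(cov_mat)
    # Solve cov_mat @ x = diff^T instead of forming the inverse explicitly.
    solved = np.linalg.solve(cov_mat, diff.T).T
    # Row-wise Mahalanobis term diff_i^T cov_mat^-1 diff_i.
    mahalanobis = np.einsum("ij,ij->i", diff, solved)
    return -0.5 * (n_obs * np.log(2.0 * np.pi) + logdet + mahalanobis)


log_pdf = gaussian_log_pdf_sketch(
    samples=[[0.0, 0.0], [1.0, 0.0]],
    mean=[0.0, 0.0],
    cov_mat=np.eye(2),
)
```

The result has one entry per sample row, matching the documented return shape [mc_exploration, ].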

select_indexes(prior_samples, collocation_points)[source]
Parameters:
  • prior_samples (array [mc_size, n_params]) – Pre-defined samples from the parameter space, from which the sample sets are extracted.

  • collocation_points (array [tp_size, n_params]) – Training points that were already used to train the surrogate model and should therefore not be re-explored.

Returns:

Indexes of the new candidate parameter sets, to be read from the prior_samples array.

Return type:

array [self.mc_size, ]

class hydroBayesCal.surrogate.exploration.Exploration(n_candidate, old_tp, exp_design=None, mc_criterion='mc-intersite-proj-th', w=100)[source]

Bases: object

Generates samples from the prior distribution using the ExpDesign class attributes and functions. Two strategies are available:

  • Voronoi sampling (to be defined).

  • mc_samples – Monte Carlo sampling using random, Sobol or Latin-hypercube samples. Previously sampled candidates may also be passed, in which case no new sampling is done and only the scores are estimated.

Each candidate is scored by its distance to the existing training points so that the whole domain is explored. Scores are normalised to [0, 1]; the highest values are the best.

Based on the Surrogate Modeling Toolbox (SUMO) [1] and modelled after code in BayesValidRox [2].

exp_design

ExpDesign object, needed to sample from the prior distribution

Type:

obj

n_candidate

Number of candidate samples.

Type:

int

mc_criterion

Selection criterion. The default is ‘mc-intersite-proj-th’. Another option is ‘mc-intersite-proj’.

Type:

str

w

Number of random points in the domain for each sample of the training set.

Type:

int

get_exploration_samples(prior_candidates=None)[source]

Generates candidates for the new design points, together with their associated exploration scores.

Returns:

  • all_candidates (array of shape (n_candidate, n_params)) – A list of samples.

  • exploration_scores (arrays of shape (n_candidate)) – Exploration scores.

get_vornoi_samples()[source]

Generates samples based on Voronoi cells, together with their corresponding scores.

Returns:

  • new_samples (array of shape (n_candidate, n_params)) – A list of samples.

  • exploration_scores (arrays of shape (n_candidate)) – Exploration scores.

get_mc_samples(all_candidates=None)[source]

This function generates random samples based on Global Monte Carlo methods and their corresponding scores, based on [1].

[1] Crombecq, K., Laermans, E. and Dhaene, T., 2011. Efficient space-filling and non-collapsing sequential design strategies for simulation-based modeling. European Journal of Operational Research, 214(3), pp.683-696. DOI: https://doi.org/10.1016/j.ejor.2011.05.032

Implemented methods to compute scores:
  1. mc-intersite-proj

  2. mc-intersite-proj-th

Parameters:

all_candidates (array, optional) – Samples to compute the scores for. The default is None, in which case the samples are generated from the defined model input marginals.

Returns:

  • new_samples (array of shape (n_candidate, n_params)) – A list of samples.

  • exploration_scores (arrays of shape (n_candidate)) – Exploration scores.
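
The idea of scoring candidates by their distance to the existing training points can be sketched with a plain nearest-neighbour distance. Note that this is a deliberately simplified stand-in and not the mc-intersite-proj or mc-intersite-proj-th criteria of Crombecq et al.; the function name distance_scores_sketch is hypothetical.

```python
import numpy as np


def distance_scores_sketch(candidates, training_points):
    """Score each candidate by its distance to the nearest training point.

    Scores are normalised to [0, 1]; the highest value marks the candidate
    that lies farthest from every existing training point.
    """
    candidates = np.atleast_2d(candidates).astype(float)
    training_points = np.atleast_2d(training_points).astype(float)
    # Pairwise Euclidean distances, shape (n_candidate, n_training).
    dists = np.linalg.norm(
        candidates[:, None, :] - training_points[None, :, :], axis=2
    )
    nearest = dists.min(axis=1)
    top = nearest.max()
    # Guard against all candidates coinciding with training points.
    return nearest / top if top > 0 else np.zeros_like(nearest)


scores = distance_scores_sketch(
    candidates=[[1.0, 0.0], [2.0, 0.0], [0.0, 0.0]],
    training_points=[[0.0, 0.0]],
)
```

A candidate that coincides with a training point receives a score of zero and is therefore never preferred.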

approximate_voronoi(w, samples)[source]

An approximate (Monte Carlo) version of MATLAB’s voronoi command.

Parameters:
  • w (int) – Number of random points in the domain for each sample of the training set.

  • samples (array) – Old experimental design to be used as center points for Voronoi cells.

Returns:

  • areas (array) – An approximation of the voronoi cells’ areas.

  • all_candidates (list of arrays) – A list of samples in each voronoi cell.
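
A Monte Carlo approximation of Voronoi cells can be sketched as follows: draw random points in the domain, assign each to its nearest center, and use the share of points per center as the cell's relative area. The arguments lower, upper and seed, as well as the function name, are hypothetical additions for this example; the actual method presumably takes the domain from the experimental design.

```python
import numpy as np


def approximate_voronoi_sketch(w, samples, lower, upper, seed=0):
    """Approximate relative Voronoi cell areas by random sampling."""
    samples = np.atleast_2d(samples).astype(float)
    n_samples, n_params = samples.shape
    rng = np.random.default_rng(seed)

    # w random points per center, drawn uniformly in the box [lower, upper].
    points = rng.uniform(lower, upper, size=(w * n_samples, n_params))
    dists = np.linalg.norm(points[:, None, :] - samples[None, :, :], axis=2)
    # Index of the nearest center for every random point.
    owner = dists.argmin(axis=1)

    # Fraction of random points that fell into each cell.
    areas = np.bincount(owner, minlength=n_samples) / points.shape[0]
    all_candidates = [points[owner == i] for i in range(n_samples)]
    return areas, all_candidates


areas, cells = approximate_voronoi_sketch(
    w=100,
    samples=[[0.25, 0.5], [0.75, 0.5]],
    lower=[0.0, 0.0],
    upper=[1.0, 1.0],
)
```

Large cells indicate sparsely sampled regions, so the random points inside them are natural candidates for exploration.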

Shared utilities

Function pool for usage at different package levels

hydroBayesCal.function_pool.append_new_line(file_name, text_to_append)[source]

Add new line to steering file

Parameters:
  • file_name (str) – path and name of the file to which the line should be appended

  • text_to_append (str) – text of the line to append

Return type:

None

hydroBayesCal.function_pool.call_process(bash_command, environment=None)[source]

Call a terminal process via subprocess and return its exit status.

The process return code is checked and reported: a non-zero code (e.g. a failed Telemac/OpenFOAM run) is logged together with the captured stderr and returned to the caller, instead of silently reporting success.

Parameters:
  • bash_command (str) – terminal command to run

  • environment – optional environment mapping to run the process in

Return int:

the process return code (0 on success, non-zero on failure, -1 if the process could not be started)
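
The return-code handling described above can be sketched with the standard library. This is an illustration under assumptions, not the package's code: the name call_process_sketch is hypothetical, and the command is split with shlex and run without a shell, which may differ from what call_process does.

```python
import shlex
import subprocess
import sys


def call_process_sketch(command, environment=None):
    """Run a command and return its exit code (-1 if it could not be started)."""
    try:
        result = subprocess.run(
            shlex.split(command),
            env=environment,
            capture_output=True,
            text=True,
        )
    except OSError as exc:
        # Raised, for example, when the executable does not exist.
        print(f"could not start process: {exc}", file=sys.stderr)
        return -1
    if result.returncode != 0:
        # Report the failure together with the captured stderr.
        print(f"command failed ({result.returncode}): {result.stderr}", file=sys.stderr)
    return result.returncode


# A child process that deliberately exits with status 3.
failing_code = call_process_sketch(
    f'{shlex.quote(sys.executable)} -c "import sys; sys.exit(3)"'
)
missing_code = call_process_sketch("definitely-not-an-existing-command-xyz")
```

Callers can then stop a calibration loop as soon as a simulation returns a non-zero code instead of reading incomplete result files.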

hydroBayesCal.function_pool.calculate_settling_velocity(diameters)[source]

Calculate particle settling velocity as a function of diameter, densities of water and sediment, and kinematic viscosity

Parameters:

diameters (np.array) – floats of sediment diameter in meters

Return np.array settling_velocity:

settling velocities in m/s for every diameter in the diameters list

hydroBayesCal.function_pool.concatenate_csv_pts(file_directory, *args)[source]

Concatenate CSV files with lists of XYZ points into one CSV file, which is saved to the directory where the first CSV file provided lives. The merged CSV file name starts with merged_ and ends with the name of the first CSV file provided.

Parameters:
  • file_directory – os.path of the directory where the CSV files live, and which must NOT end on ‘/’ or ‘\’

  • args – string or list of CSV files (only names) containing comma-separated XYZ coordinates without header

Return pandas.DataFrame:

merged points

hydroBayesCal.function_pool.lookahead(iterable)[source]

Pass through all values of an iterable, augmented by the information if there are more values to come after the current one (True), or if it is the last value (False).

Source: Ferdinand Beyer (2015) on https://stackoverflow.com/questions/1630320/what-is-the-pythonic-way-to-detect-the-last-element-in-a-for-loop
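
The look-ahead pattern can be written as a small generator. The following sketch (hypothetical name lookahead_sketch) shows the general technique; it is not copied from the package.

```python
def lookahead_sketch(iterable):
    """Yield (value, has_more) pairs; has_more is False only for the last value."""
    iterator = iter(iterable)
    try:
        previous = next(iterator)
    except StopIteration:
        # An empty iterable yields nothing.
        return
    for value in iterator:
        # A further value exists, so the previous one was not the last.
        yield previous, True
        previous = value
    yield previous, False


flags = list(lookahead_sketch(["a", "b", "c"]))
```

This is useful, for example, when writing a delimiter after every item except the last one.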

hydroBayesCal.function_pool.str2seq(list_like_string, separator=',', return_type='tuple')[source]

Convert a list-like string into a tuple or list based on a separator such as a comma or semicolon

Parameters:
  • list_like_string (str) – string to convert

  • separator (str) – separator to use

  • return_type (str) – defines if a list or tuple is returned (default: tuple)

Returns:

list or tuple
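
A minimal sketch of such a conversion is shown below. It assumes that items are returned as whitespace-stripped strings and that empty items are dropped; whether str2seq also casts items to numbers is not stated here, so the hypothetical str2seq_sketch does not.

```python
def str2seq_sketch(list_like_string, separator=",", return_type="tuple"):
    """Split a list-like string into a tuple (default) or a list of strings."""
    # Strip surrounding whitespace and skip empty items such as a trailing comma.
    items = [
        item.strip()
        for item in list_like_string.split(separator)
        if item.strip()
    ]
    return items if return_type == "list" else tuple(items)


as_tuple = str2seq_sketch("velocity, depth,bottom")
as_list = str2seq_sketch("1; 2; 3", separator=";", return_type="list")
```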

hydroBayesCal.function_pool.log_actions(func)[source]

Logging wrapper for the function func (TODO: documentation to be completed).

hydroBayesCal.function_pool.update_collocation_pts_file(file_path, new_collocation_point, mode='update')[source]

Append a new row to a CSV file or create a new file depending on the mode.

Parameters:
  • file_path – Path to the CSV file.

  • new_collocation_point – List of values to be added as a new row.

  • mode – Mode to determine whether to ‘update’ (append) or ‘generate’ (overwrite) the file.

hydroBayesCal.function_pool.save_data(file_path, data)[source]

Save NumPy array data to a file based on the file extension in the file path.

Parameters:
  • file_path – Path to the file where data should be saved.

  • data – NumPy array data to be saved.

hydroBayesCal.function_pool.rearrange_array(data, num_quantities)[source]

Rearrange a NumPy array such that data from multiple quantities is interleaved by columns.

Parameters:
  • data – A NumPy array of shape (num_quantities * n, m) where n is the number of data points per quantity.

  • num_quantities – An integer indicating the number of quantities (e.g., velocity, water depth, etc.).

Returns:

A NumPy array with interleaved columns for all quantities.

hydroBayesCal.function_pool.update_json_file(json_path, modeled_values_dict=None, detailed_dict=False, save_dict=False, saving_path=None)[source]

Updates the JSON file at json_path with data from modeled_values_dict.

If the file exists, it appends new values to the existing data. If the file does not exist, it creates a new file with the initial data.

Parameters:
  • json_path (str) – The path to the JSON file to be updated or created.

  • modeled_values_dict (dict) – A dictionary with data to be added or updated in the JSON file.

  • detailed_dict (bool, optional) – Whether to handle the data as nested lists for detailed structures.

  • save_dict (bool, optional) – If True, saves the entire output_data to the saving_path.

  • saving_path (str, optional) – The path to save the final JSON file when save_dict is True. If not provided, defaults to json_path.

hydroBayesCal.function_pool.delete_slf(folder_path)[source]

Deletes all files with the .slf extension in the specified folder.

Parameters:

folder_path (str) – The path to the folder where the .slf files will be deleted.

Return type:

None
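
Removing result files by extension can be sketched with pathlib. The helper name delete_slf_sketch is hypothetical, and the example works in a temporary directory so that no real simulation output is touched.

```python
import tempfile
from pathlib import Path


def delete_slf_sketch(folder_path):
    """Delete every file ending in .slf directly inside folder_path."""
    for slf_file in Path(folder_path).glob("*.slf"):
        # Skip directories that happen to match the pattern.
        if slf_file.is_file():
            slf_file.unlink()


with tempfile.TemporaryDirectory() as tmp:
    for name in ("run1.slf", "run2.slf", "points.csv"):
        (Path(tmp) / name).write_text("dummy")
    delete_slf_sketch(tmp)
    remaining = sorted(p.name for p in Path(tmp).iterdir())
```

Only the non-.slf file is left in the folder after the call.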

hydroBayesCal.function_pool.filter_model_outputs(data_dict, quantities, run_range_filtering=None)[source]

Filters the data from the model outputs dictionary based on desired quantities and optionally limits the runs included to a specific range.

Parameters:
  • data_dict (dict) – Dictionary containing model outputs with points as keys and lists of run outputs as values.

  • quantities (list of str) – List of quantities to extract from the model outputs.

  • run_range_filtering (tuple of int, optional) – Range of runs to include (start, end). If None, includes all runs. The range is inclusive of the start index and exclusive of the end index.

Returns:

Filtered dictionary containing only the selected quantities and runs within the specified range.

Return type:

dict

hydroBayesCal.function_pool.interpolate_values(coords, values, point)[source]

Interpolates values at a given point using Inverse Distance Weighting.

Parameters:
  • coords (np.ndarray) – Coordinates of the triangle’s vertices, shape (3, 2), where each row is [X, Y] for a vertex.

  • values (np.ndarray) – Values at each vertex for each variable, shape (3, num_variables).

  • point (tuple) – Coordinates of the point where interpolation is desired, (px, py).

Returns:

Interpolated values at the given point for each variable, shape (num_variables,).

Return type:

np.ndarray
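
Inverse Distance Weighting over the three vertices of a triangle can be sketched as below. The power argument (set to 2 here) and the function name are assumptions made for illustration; the exponent used by interpolate_values is not documented in this section.

```python
import numpy as np


def idw_interpolate_sketch(coords, values, point, power=2.0):
    """Inverse-distance-weighted interpolation from triangle vertices."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    dists = np.linalg.norm(coords - np.asarray(point, dtype=float), axis=1)

    # A point that coincides with a vertex takes that vertex's values exactly,
    # which also avoids a division by zero.
    on_vertex = np.isclose(dists, 0.0)
    if on_vertex.any():
        return values[on_vertex][0]

    # Closer vertices receive larger weights.
    weights = 1.0 / dists ** power
    return weights @ values / weights.sum()


triangle = [[0.0, 0.0], [2.0, 0.0], [1.0, np.sqrt(3.0)]]
vertex_values = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
at_centroid = idw_interpolate_sketch(triangle, vertex_values, (1.0, np.sqrt(3.0) / 3.0))
at_vertex = idw_interpolate_sketch(triangle, vertex_values, (0.0, 0.0))
```

At the centroid of an equilateral triangle all three vertices are equally distant, so the result is the plain average of the vertex values.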

hydroBayesCal.function_pool.rasterize(saving_folder, slf_file_name, desired_variables, spacing)[source]
hydroBayesCal.function_pool.classify_mu(raster_data, classification, output_folder, output_filename)[source]

Classify the morphological units (MU) based on velocity and depth and save as a raster file.

Parameters:
  • raster_data (dict) – Dictionary containing ‘velocity’ and ‘depth’ raster data as numpy arrays.

  • classification (dict) – Dictionary of classification criteria for different MUs.

  • output_folder (str) – Folder path where the output file will be saved.

  • output_filename (str) – The filename for the output raster file (without extension).

Returns:

The classified MU raster is saved as an ASCII file in the output folder.

Return type:

None

hydroBayesCal.function_pool.parse_classes_keyword(file_path, keyword)[source]
hydroBayesCal.function_pool.update_gaia_class_line(line, index, new_value)[source]
hydroBayesCal.function_pool.classify_parameters_tm_gaia(elements, classification_dict)[source]
hydroBayesCal.function_pool.vtk_to_2dm(input_file, output_file)[source]

Convert a VTK/VTP mesh to a 2DM file.

Parameters:
  • input_file (str) – Path to the input VTK or VTP file (ASCII, Binary, or Compressed).

  • output_file (str) – Path to the output 2DM file.

hydroBayesCal.function_pool.twodm2SLF(input_file_2dm, output_file_adcirc, output_file_slf)[source]