ReducedComplexityModeling

Documentation for ReducedComplexityModeling.

ReducedComplexityModeling.AutoEncoderModel
ReducedComplexityModeling.Batch
ReducedComplexityModeling.CartesianParameterSampler
ReducedComplexityModeling.DataLoader
ReducedComplexityModeling.Parameter
ReducedComplexityModeling.ParameterSampler
ReducedComplexityModeling.ParameterSpace
ReducedComplexityModeling.ReducedBasisModel
ReducedComplexityModeling.accuracy
ReducedComplexityModeling.assign_output_estimate
ReducedComplexityModeling.convert_input_and_batch_indices_to_array
ReducedComplexityModeling.h5load
ReducedComplexityModeling.h5save
ReducedComplexityModeling.learn
ReducedComplexityModeling.number_of_batches
ReducedComplexityModeling.onehotbatch
ReducedComplexityModeling.read_parameters
ReducedComplexityModeling.sample
ReducedComplexityModeling.save_parameters
ReducedComplexityModeling.split_and_flatten

ReducedComplexityModeling.AutoEncoderModel — Type

Autoencoders

Uses Variational Autoencoders to represent the data in its reduced state.

ReducedComplexityModeling.Batch — Type

Batch(batch_size, seq_length)

Make an instance of Batch for a specific batch size and sequence length.

This is used to train neural networks of GeometricMachineLearning.TransformerIntegrator type.

Optionally the prediction window can also be specified by calling:

using ReducedComplexityModeling
using ReducedComplexityModeling: Batch

batch_size = 2
seq_length = 3
prediction_window = 2

Batch(batch_size, seq_length, prediction_window)

# output

Batch{:Transformer}(2, 3, 2)

Note that here the batch is of type :Transformer.

ReducedComplexityModeling.Batch — Type

Batch

Batch is a struct whose functor acts on an instance of DataLoader to produce a sequence of training samples that covers one epoch.

See Batch(::Int) and Batch(::Int, ::Int, ::Int) for the different constructors.

The functor

An instance of Batch can be called on an instance of DataLoader to produce a sequence of samples that together contain all the input data, i.e. one epoch of training.

The output of applying batch::Batch to dl::DataLoader is a tuple of vectors of tuples of integers. Each of these tuples contains two integers: the first is the time index and the second is the parameter index.

Examples

Consider the following example for drawing batches of size 2 for an instance of DataLoader constructed with a vector:

using ReducedComplexityModeling
using ReducedComplexityModeling: Batch
import Random

rng = Random.TaskLocalRNG()
Random.seed!(rng, 123)

dl = DataLoader(rand(rng, 5))
batch = Batch(2)

batch(dl)

# output

[ Info: You have provided a matrix as input. The axes will be interpreted as (i) system dimension and (ii) number of parameters.
([(1, 5), (1, 3)], [(1, 4), (1, 1)], [(1, 2)])

Here the first index is always 1 (the time dimension). We get a total of 3 batches. The last batch contains only one sample because we sample without replacement and 5 samples do not divide evenly into batches of size 2. Also see the docstring for DataLoader(::AbstractVector).
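
Sampling without replacement into batches can be sketched in plain Julia as follows. This is a minimal sketch assuming the autoencoder-like case, where the time index is always 1; the helper draw_batches_sketch is hypothetical and not part of ReducedComplexityModeling, and it does not reproduce the package's random stream, so the concrete indices differ from the output above.

```julia
using Random

# Hypothetical helper (not part of ReducedComplexityModeling): shuffle the
# parameter indices 1:n_params once and cut them into consecutive chunks of
# at most `batch_size`, i.e. sampling without replacement for one epoch.
function draw_batches_sketch(rng::AbstractRNG, n_params::Integer, batch_size::Integer)
    perm = randperm(rng, n_params)                  # every index appears exactly once
    chunks = Iterators.partition(perm, batch_size)  # the last chunk may be shorter
    # the time index is fixed to 1, as in the autoencoder-like example above
    return Tuple([(1, p) for p in chunk] for chunk in chunks)
end

batches = draw_batches_sketch(Random.Xoshiro(123), 5, 2)
```

With 5 samples and a batch size of 2 this yields three batches of sizes 2, 2 and 1, matching the structure (though not the values) of the doctest output.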

ReducedComplexityModeling.Batch — Method

Batch(batch_size)

Make an instance of Batch for a specific batch size.

This is used, among other things, to train neural networks of GeometricMachineLearning.NeuralNetworkIntegrator type (as opposed to GeometricMachineLearning.TransformerIntegrator).

ReducedComplexityModeling.CartesianParameterSampler — Type

ReducedComplexityModeling.DataLoader — Type

DataLoader(data)

Make an instance based on a data set.

This is designed to make training convenient.

Fields of DataLoader

The fields of the DataLoader struct are the following:

input: The input data with axes (i) system dimension, (ii) number of time steps and (iii) number of parameters.
output: The tensor that contains the output (supervised learning) - this may be of type Nothing if the constructor is only called with one tensor (unsupervised learning).
input_dim: The dimension of the system, i.e. what is taken as input by a regular neural network.
input_time_steps: The length of the entire time series (length of the second axis).
n_params: The number of parameters that are present in the data set (length of the third axis).
output_dim: The dimension of the output tensor (first axis). If output is of type Nothing, then this is also of type Nothing.
output_time_steps: The size of the second axis of the output tensor. If output is of type Nothing, then this is also of type Nothing.

Implementation

Even though DataLoader can be called with inputs of various forms, internally it always stores tensors with three axes.

using ReducedComplexityModeling

data = [1 2 3; 4 5 6]
dl = DataLoader(data)
dl.input

# output

[ Info: You have provided a matrix as input. The axes will be interpreted as (i) system dimension and (ii) number of parameters.
2×1×3 Array{Int64, 3}:
[:, :, 1] =
 1
 4

[:, :, 2] =
 2
 5

[:, :, 3] =
 3
 6

ReducedComplexityModeling.DataLoader — Method

DataLoader(data::AbstractVector)

Make an instance of DataLoader based on a vector.

Extended help

If the input to DataLoader is a vector, it is assumed that this vector represents one-dimensional time-series data and is therefore processed as:

    DataLoader(data::AbstractVector; autoencoder=true) = DataLoader(reshape(data, 1, length(data)); autoencoder = autoencoder)

ReducedComplexityModeling.DataLoader — Method

DataLoader(data::QPT)

Make an instance of DataLoader based on $(q, p)$ data.

Implementation

In this case the field input_dim of DataLoader is interpreted as the sum of the $q$- and $p$-dimensions, i.e. if $q$ and $p$ both evolve on $\mathbb{R}^n$, then input_dim is $2n$.

Apart from this the input is treated as if it were an Array, i.e. everything is converted to tensors internally. See e.g. DataLoader(::AbstractArray{<:Number, 3}).

ReducedComplexityModeling.DataLoader — Method

DataLoader(data::AbstractArray{<:Number, 3})

Make an instance of DataLoader for data that are in tensor format.

Arguments

There are two optional keyword arguments:

autoencoder = false and
suppress_info = false.

By default the data are stored as TimeSeries type. If you want to train a GeometricMachineLearning.AutoEncoder with your data, call:

    DataLoader(data; autoencoder = true)

The default is equivalent to autoencoder = false.

By default we have:

using ReducedComplexityModeling

data = [ 1;  2;  3;;
         4;  5;  6;;;
         7;  8;  9;;
        10; 11; 12]

DataLoader(data)

# output

┌ Info: You have provided a tensor with three axes as input. They will be interpreted as
└  (i) system dimension, (ii) number of time steps and (iii) number of params.
DataLoader{Int64, Array{Int64, 3}, Nothing, :TimeSeries}([1 4; 2 5; 3 6;;; 7 10; 8 11; 9 12], nothing, 3, 2, 2, nothing, nothing)

But if we write

using ReducedComplexityModeling

data = [ 1;  2;  3;;
         4;  5;  6;;;
         7;  8;  9;;
        10; 11; 12]

DataLoader(data; suppress_info = true)

# output

DataLoader{Int64, Array{Int64, 3}, Nothing, :TimeSeries}([1 4; 2 5; 3 6;;; 7 10; 8 11; 9 12], nothing, 3, 2, 2, nothing, nothing)

the @info statement is not printed.
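
The integers 3, 2, 2 in the printed DataLoader are input_dim, input_time_steps and n_params, i.e. the lengths of the three axes. A minimal sketch in plain Julia (without the package) of how these values relate to the tensor used above:

```julia
# The same 3 × 2 × 2 tensor as in the examples above (`;;` needs Julia ≥ 1.7)
data = [1; 2; 3;; 4; 5; 6;;; 7; 8; 9;; 10; 11; 12]

# Axes: (i) system dimension, (ii) number of time steps, (iii) number of parameters
input_dim, input_time_steps, n_params = size(data)

# data[:, :, k] holds the time series for the k-th parameter
first_parameter = data[:, :, 1]
```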

ReducedComplexityModeling.DataLoader — Method

DataLoader(data::AbstractMatrix)

Make an instance of DataLoader based on a matrix.

Arguments

See DataLoader(::AbstractArray{<:Number, 3}) for details.

Implementation

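Internally the data are reshaped to a tensor with three axes to make for a consistent representation. When the columns are read as time steps, the tensor has shape (size(data)..., 1); when they are read as parameters (as in the DataLoader(data) example, where a 2×3 matrix becomes a 2×1×3 tensor), the singleton axis is the second one.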
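
The reshaping can be sketched in plain Julia. The first reshape produces the (size(data)..., 1) layout; the second produces the layout seen in the 2×1×3 output of the DataLoader(data) example, where columns are read as parameters. Variable names are illustrative only and not part of the package.

```julia
data = [1 2 3; 4 5 6]   # 2 × 3 matrix

# Columns read as time steps: (system dimension, time steps, one parameter)
as_time_series = reshape(data, size(data)..., 1)

# Columns read as parameters: (system dimension, one time step, parameters)
as_parameters = reshape(data, size(data, 1), 1, size(data, 2))
```

Both are views of the same memory, so reshaping itself does not copy the data.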

ReducedComplexityModeling.DataLoader — Method

DataLoader(ensemble_solution)

Make an instance of DataLoader for an EnsembleSolution.

EnsembleSolutions are imported from the package GeometricSolutions.jl.

Arguments

This constructor for DataLoader also has the keyword arguments

autoencoder = false and
suppress_info = false.

See the docstring for DataLoader(::AbstractArray{<:Number, 3}).

Implementation

Internally this stores the data as a tensor where the third axis has length equal to the number of solutions in the ensemble.
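
The storage layout can be pictured with plain matrices. This sketch assumes, for illustration only, that each solution in the ensemble has already been converted to a matrix of size (system dimension, number of time steps); it does not use GeometricSolutions.jl and is not the package's implementation.

```julia
# Hypothetical stand-ins for three solutions, each of size (dim = 2, time steps = 4)
solutions = [rand(2, 4) for _ in 1:3]

# Stack along a new third axis: one slice per solution in the ensemble
tensor = cat(solutions...; dims = 3)
```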

ReducedComplexityModeling.DataLoader — Method

DataLoader(solution)

Make an instance of DataLoader for a GeometricSolution.

GeometricSolutions are imported from the package GeometricSolutions.jl.

Arguments

This constructor for DataLoader also has the keyword arguments

autoencoder = false and
suppress_info = false.

See the docstring for DataLoader(::AbstractArray{<:Number, 3}).

Implementation

Internally this stores the data as a tensor where the third axis has length 1.

ReducedComplexityModeling.DataLoader — Method

DataLoader(dl, backend)

Make a new instance of DataLoader based on an existing instance of DataLoader for a new backend.

This allocates new memory of the same size as is used for the original dl and then copies the data.

By default the data type remains unchanged, i.e. eltype(DataLoader(dl, backend)) == eltype(dl) is true.

If you want to change the data type write e.g.

    DataLoader(dl, backend, Float32)
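
On the CPU, the allocate-then-copy step with an optional change of element type can be illustrated with Base functions alone. This is a minimal sketch, not the package's implementation; a GPU backend would allocate through its own array type, which is not shown here.

```julia
input = rand(Float64, 3, 2, 2)   # stand-in for dl.input

# Allocate new memory of the same size, here with a new element type ...
new_input = similar(input, Float32)
# ... and copy the data over, converting Float64 to Float32 element-wise
copyto!(new_input, input)
```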

Arguments

There is an optional keyword argument

autoencoder = nothing.

By default this inherits the autoencoder property from dl.

See the docstring for DataLoader(data::AbstractArray{<:Number, 3}).

ReducedComplexityModeling.DataLoader — Method

DataLoader(data::AbstractArray{T, 3}, target::AbstractVector)

Make an instance of DataLoader for a classification problem.

Here target is a vector of labels. This is tailored towards being used with the package MLDatasets.jl.

Arguments

There are two keyword arguments:

patch_length = 7. This is the length of the patch in the $x$ and the $y$ direction;
suppress_info = false.

For the example of the MNIST data set all images are of size $28\times28$. For patch_length = 7 the image is therefore split into 16 $7\times7$ patches [1].
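
The patch count follows from simple arithmetic on the image and patch sizes:

```julia
image_side   = 28                                  # MNIST images are 28 × 28 pixels
patch_length = 7                                   # default keyword value
patches_per_side  = image_side ÷ patch_length      # 4 patches along each direction
number_of_patches = patches_per_side^2             # 16 patches in total
pixels_per_patch  = patch_length^2                 # 49 pixels in each 7 × 7 patch
```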

ReducedComplexityModeling.Parameter — Type

ReducedComplexityModeling.ParameterSampler — Type

ReducedComplexityModeling.ParameterSpace — Type

ParameterSpace collects all parameters of a system as well as samples in the parameter space.

ReducedComplexityModeling.ReducedBasisModel — Type

Reduced Basis

Uses ModelOrderReduction methods from ReducedBasisMethods.jl to represent the data in latent space.

ReducedComplexityModeling.accuracy — Method

accuracy(nn, dl)

Compute the accuracy of a neural network classifier.

This is like accuracy(::Chain, ::Tuple, ::DataLoader), but for a NeuralNetwork.

ReducedComplexityModeling.accuracy — Method

accuracy(model, ps, dl)

Compute the accuracy of a neural network classifier.

This needs an instance of DataLoader that stores the test data.
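
The docstrings do not spell out the formula. A common definition, assumed here, is the fraction of samples whose highest-scoring class equals the label. The following minimal sketch uses a hypothetical accuracy_sketch on a plain score matrix (classes × samples) with labels starting at 0, as for onehotbatch; it is not the package's implementation.

```julia
# Hypothetical sketch: fraction of columns whose argmax matches the label
function accuracy_sketch(scores::AbstractMatrix, labels::AbstractVector{<:Integer})
    # row index k corresponds to label k - 1, since labels start at 0
    predicted = [argmax(view(scores, :, k)) - 1 for k in axes(scores, 2)]
    return count(predicted .== labels) / length(labels)
end

scores = [0.9 0.1 0.2;
          0.1 0.8 0.7;
          0.0 0.1 0.1]   # 3 classes, 3 samples
labels = [0, 1, 2]
acc = accuracy_sketch(scores, labels)
```

The third sample is predicted as class 1 instead of 2, so two out of three predictions are correct.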

ReducedComplexityModeling.assign_output_estimate — Method

assign_output_estimate(full_output, prediction_window)

Crop the output to get the correct number of output vectors.

The function assign_output_estimate is closely related to the GeometricMachineLearning.Transformer. It takes the last prediction_window columns of the output and uses them for the prediction.

In symbols, writing $\mathtt{pw}$ for prediction_window:

\[\mathbb{R}^{N\times{}T}\to\mathbb{R}^{N\times\mathtt{pw}}, \quad \begin{bmatrix} z^{(1)}_1 & \cdots & z^{(T)}_1 \\ \vdots & \ddots & \vdots \\ z^{(1)}_N & \cdots & z^{(T)}_N \end{bmatrix} \mapsto \begin{bmatrix} z^{(T - \mathtt{pw} + 1)}_1 & \cdots & z^{(T)}_1 \\ \vdots & \ddots & \vdots \\ z^{(T - \mathtt{pw} + 1)}_N & \cdots & z^{(T)}_N \end{bmatrix}\]

If prediction_window is equal to sequence_length, then this is not needed.
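
For matrices and three-axis batch tensors, the cropping amounts to keeping the last prediction_window entries along the second (time) axis. A minimal sketch with a hypothetical assign_output_estimate_sketch, which is not the package's implementation:

```julia
# Hypothetical sketch: keep the last `pw` columns (time steps) of the output.
assign_output_estimate_sketch(full_output::AbstractMatrix, pw::Integer) =
    full_output[:, (end - pw + 1):end]

# Same for a batch tensor with axes (dimension, time steps, batch).
assign_output_estimate_sketch(full_output::AbstractArray{<:Any, 3}, pw::Integer) =
    full_output[:, (end - pw + 1):end, :]

Z = reshape(1:10, 2, 5)                        # N = 2, T = 5
cropped = assign_output_estimate_sketch(Z, 2)  # keeps columns 4 and 5
```

With pw equal to T the whole input is returned, which is why the step is unnecessary in that case.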

ReducedComplexityModeling.convert_input_and_batch_indices_to_array — Method

convert_input_and_batch_indices_to_array(dl, batch, batch_indices)

Assign batch data based on (i) input and (ii) batch indices.

Examples

using ReducedComplexityModeling
using ReducedComplexityModeling: Batch

dl = DataLoader([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]; suppress_info = true)
batch = Batch(3)
batch_indices = [(1, 1), (1, 3), (1, 5)]

ReducedComplexityModeling.convert_input_and_batch_indices_to_array(dl, batch, batch_indices)

# output

1×1×3 Array{Float64, 3}:
[:, :, 1] =
 0.1

[:, :, 2] =
 0.3

[:, :, 3] =
 0.5
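
For the autoencoder-like case with one time step per sample, this assignment is a gather over the (time index, parameter index) pairs. This is a minimal sketch on a plain array with a hypothetical gather_batch_sketch; it does not cover time-series batches with longer sequences and is not the package's implementation.

```julia
# Stand-in for dl.input: 1 × 1 × 9 tensor built from the vector in the example above
input = reshape([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9], 1, 1, 9)
batch_indices = [(1, 1), (1, 3), (1, 5)]

# Hypothetical sketch: one output slice per (time index, parameter index) pair
function gather_batch_sketch(input::AbstractArray{T, 3}, batch_indices) where {T}
    output = similar(input, size(input, 1), 1, length(batch_indices))
    for (k, (t, p)) in enumerate(batch_indices)
        output[:, 1, k] = input[:, t, p]
    end
    return output
end

batch_array = gather_batch_sketch(input, batch_indices)
```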

ReducedComplexityModeling.h5load — Method

Load ParameterSpace

ReducedComplexityModeling.h5save — Method

Save ParameterSpace.

ReducedComplexityModeling.learn — Method

learn(prob)

Uses data from high-fidelity simulations/experiments and learns the latent representation of the problem.

ReducedComplexityModeling.number_of_batches — Method

number_of_batches(dl, batch)

Compute the number of batches.

This distinguishes between time-series-like data and autoencoder-like data.

Examples

using ReducedComplexityModeling
using ReducedComplexityModeling: number_of_batches
using ReducedComplexityModeling: Batch
import Random

Random.seed!(123)

dat = [1, 2, 3, 4, 5]
dl₁ = DataLoader(dat; autoencoder = false, suppress_info = true) # time series-like
dl₂ = DataLoader(dat; autoencoder = true, suppress_info = true) # autoencoder-like
batch = Batch(3)

nob₁ = number_of_batches(dl₁, batch)
nob₂ = number_of_batches(dl₂, batch)
println(stdout, "Number of batches of dl₁: ", nob₁)
println(stdout, "Number of batches of dl₂: ", nob₂)
println(stdout, batch(dl₁), "\n", batch(dl₂))

# output

Number of batches of dl₁: 2
Number of batches of dl₂: 2
([(1, 1), (4, 1), (2, 1)], [(3, 1)])
([(1, 3), (1, 2), (1, 4)], [(1, 1), (1, 5)])

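Here we see that in the autoencoder case the last minibatch has an additional element: the time-series-like loader draws only four starting indices (1 to 4) from the five time steps, whereas the autoencoder-like loader uses all five entries.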
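
Both batch counts above are ceiling divisions of the number of samples by the batch size. In this sketch the number of samples in the time-series case is taken to be the number of time steps minus one, which is read off the doctest output (the starting time indices are 1 to 4); this is an assumption, not a documented rule. number_of_batches_sketch is hypothetical.

```julia
# Hypothetical sketch: ceiling division of the number of samples by the batch size
number_of_batches_sketch(n_samples::Integer, batch_size::Integer) = cld(n_samples, batch_size)

n_entries  = 5   # length of `dat` in the example above
batch_size = 3

# Time-series-like: starting indices 1:(n_entries - 1), as seen in the output above (assumption)
nob₁ = number_of_batches_sketch(n_entries - 1, batch_size)
# Autoencoder-like: each of the 5 entries is a separate sample
nob₂ = number_of_batches_sketch(n_entries, batch_size)
```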

ReducedComplexityModeling.onehotbatch — Method

onehotbatch(target)

Performs a one-hot-batch encoding of a vector of integers $\mathtt{input}\in\{0,1,\ldots,9\}^\ell$.

The output is a tensor of shape $10\times1\times\ell$.

If the input is $0$, this function produces:

\[0 \mapsto \begin{bmatrix} 1 & 0 & \ldots & 0 \end{bmatrix}^T.\]

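In more abstract terms: $i \mapsto e_{i+1}$, where $e_j$ denotes the $j$-th standard basis vector of $\mathbb{R}^{10}$ (the shift by one is due to the labels starting at $0$).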

Examples

using ReducedComplexityModeling: onehotbatch

target = [0]
onehotbatch(target)

# output

10×1×1 Array{Int64, 3}:
[:, :, 1] =
 1
 0
 0
 0
 0
 0
 0
 0
 0
 0

ReducedComplexityModeling.read_parameters — Function

Read parameters.

ReducedComplexityModeling.sample — Method

ReducedComplexityModeling.save_parameters — Method

Save parameters.

ReducedComplexityModeling.split_and_flatten — Method

split_and_flatten(input::AbstractArray)::AbstractArray

Perform a preprocessing of an image into flattened patches.

This rearranges the input data so that it can easily be processed with a transformer.

Examples

Consider a matrix of size $6\times6$ which we want to divide into patches of size $3\times3$.

using ReducedComplexityModeling
using ReducedComplexityModeling: split_and_flatten

input = [ 1  2  3  4  5  6;
          7  8  9 10 11 12;
         13 14 15 16 17 18;
         19 20 21 22 23 24;
         25 26 27 28 29 30;
         31 32 33 34 35 36]

split_and_flatten(input; patch_length = 3, number_of_patches = 4)

# output

9×4 Matrix{Int64}:
  1  19   4  22
  7  25  10  28
 13  31  16  34
  2  20   5  23
  8  26  11  29
 14  32  17  35
  3  21   6  24
  9  27  12  30
 15  33  18  36

Here we see that split_and_flatten:

splits the original matrix into four $3\times3$ matrices and then
flattens each matrix into a column vector of size $9$.

After this all the vectors are put together again to yield a $9\times4$ matrix.
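
The two steps above can be sketched in plain Julia. This minimal sketch, with a hypothetical split_and_flatten_sketch, reproduces the $6\times6$ example; it assumes a square image and a square grid of patches, and the patch ordering (down the patch rows first, then across) and the column-major flattening are inferred from the example output rather than from the package source.

```julia
# Hypothetical sketch (not the package implementation)
function split_and_flatten_sketch(input::AbstractMatrix; patch_length::Integer = 7,
                                  number_of_patches::Integer = 16)
    per_side = isqrt(number_of_patches)   # assumes a square grid of patches
    @assert per_side^2 == number_of_patches
    @assert size(input) == (per_side * patch_length, per_side * patch_length)
    output = similar(input, patch_length^2, number_of_patches)
    k = 1
    for j in 1:per_side, i in 1:per_side  # patch-row index i varies fastest
        rows = (i - 1) * patch_length + 1 : i * patch_length
        cols = (j - 1) * patch_length + 1 : j * patch_length
        output[:, k] = vec(input[rows, cols])   # column-major flattening of one patch
        k += 1
    end
    return output
end

input = reshape(1:36, 6, 6)'   # same 6 × 6 matrix as above (row-wise 1 to 36)
patches = split_and_flatten_sketch(input; patch_length = 3, number_of_patches = 4)
```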

Arguments

The optional keyword arguments are:

patch_length: by default this is 7.
number_of_patches: by default this is 16.

The sizes of the first and second axes of the output of split_and_flatten are

$\mathtt{patch\_length}^2$ and
number_of_patches.

[1]: B. Brantner. Generalizing Adam To Manifolds For Efficiently Training Transformers, arXiv preprint arXiv:2305.16901 (2023).