ReducedComplexityModeling
Documentation for ReducedComplexityModeling.
- ReducedComplexityModeling.AutoEncoderModel
- ReducedComplexityModeling.Batch
- ReducedComplexityModeling.CartesianParameterSampler
- ReducedComplexityModeling.DataLoader
- ReducedComplexityModeling.Parameter
- ReducedComplexityModeling.ParameterSampler
- ReducedComplexityModeling.ParameterSpace
- ReducedComplexityModeling.ReducedBasisModel
- ReducedComplexityModeling.accuracy
- ReducedComplexityModeling.assign_output_estimate
- ReducedComplexityModeling.convert_input_and_batch_indices_to_array
- ReducedComplexityModeling.h5load
- ReducedComplexityModeling.h5save
- ReducedComplexityModeling.learn
- ReducedComplexityModeling.number_of_batches
- ReducedComplexityModeling.onehotbatch
- ReducedComplexityModeling.read_parameters
- ReducedComplexityModeling.sample
- ReducedComplexityModeling.save_parameters
- ReducedComplexityModeling.split_and_flatten
ReducedComplexityModeling.AutoEncoderModel — Type
Autoencoders
Uses variational autoencoders to represent the data in its reduced state.
ReducedComplexityModeling.Batch — Type
Batch(batch_size, seq_length)
Make an instance of Batch for a specific batch size and sequence length.
This is used to train neural networks of GeometricMachineLearning.TransformerIntegrator type.
Optionally the prediction window can also be specified by calling:
using ReducedComplexityModeling
using ReducedComplexityModeling: Batch
batch_size = 2
seq_length = 3
prediction_window = 2
Batch(batch_size, seq_length, prediction_window)
# output
Batch{:Transformer}(2, 3, 2)
Note that here the batch is of type :Transformer.
ReducedComplexityModeling.Batch — Type
Batch
Batch is a struct whose functor acts on an instance of DataLoader to produce a sequence of training samples for one epoch.
See Batch(::Int) and Batch(::Int, ::Int, ::Int) for the different constructors.
The functor
An instance of Batch can be called on an instance of DataLoader to produce a sequence of samples that together contain all the input data, i.e. the samples needed to train for one epoch.
The output of applying batch::Batch to dl::DataLoader is a tuple of vectors of integer tuples. Each tuple contains two integers: the first is the time index and the second is the parameter index.
Examples
Consider the following example for drawing batches of size 2 for an instance of DataLoader constructed with a vector:
using ReducedComplexityModeling
using ReducedComplexityModeling: Batch
import Random
rng = Random.TaskLocalRNG()
Random.seed!(rng, 123)
dl = DataLoader(rand(rng, 5))
batch = Batch(2)
batch(dl)
# output
[ Info: You have provided a matrix as input. The axes will be interpreted as (i) system dimension and (ii) number of parameters.
([(1, 5), (1, 3)], [(1, 4), (1, 1)], [(1, 2)])
Here the first index is always 1 (the time index). We get a total of 3 batches. The last batch contains only one sample because the 5 samples are drawn without replacement in batches of 2. Also see the docstring for DataLoader(::AbstractVector).
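The batching behavior in this example can be sketched in plain Julia. This is a minimal sketch, not the package's implementation: the helper toy_batches is hypothetical, and it only covers the situation of the example above (a single time step, so every sample has the form (1, p)).

```julia
using Random

# Hypothetical helper (not part of ReducedComplexityModeling): draw every
# parameter index exactly once, in random order, and cut the result into batches.
function toy_batches(rng::AbstractRNG, n_params::Int, batch_size::Int)
    perm = randperm(rng, n_params)                 # sampling without replacement
    chunks = Iterators.partition(perm, batch_size) # the last chunk may be shorter
    # Each sample is a (time_index, parameter_index) tuple; the time index is 1 here.
    return Tuple([(1, p) for p in chunk] for chunk in chunks)
end

batches = toy_batches(Random.Xoshiro(123), 5, 2)
length(batches)    # 3
length.(batches)   # (2, 2, 1)
```

The concrete parameter indices depend on the random number generator, but the batch sizes do not: 5 samples in batches of 2 always leave a final batch of size 1.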
ReducedComplexityModeling.Batch — Method
Batch(batch_size)
Make an instance of Batch for a specific batch size.
This is used, among other things, to train neural networks of GeometricMachineLearning.NeuralNetworkIntegrator type (as opposed to GeometricMachineLearning.TransformerIntegrator).
ReducedComplexityModeling.DataLoader — Type
DataLoader(data)
Make an instance based on a data set.
This is designed to make training convenient.
Fields of DataLoader
The fields of the DataLoader struct are the following:
- input: The input data with axes (i) system dimension, (ii) number of time steps and (iii) number of parameters.
- output: The tensor that contains the output (supervised learning); this may be of type Nothing if the constructor is called with only one tensor (unsupervised learning).
- input_dim: The dimension of the system, i.e. what is taken as input by a regular neural network.
- input_time_steps: The length of the entire time series (length of the second axis).
- n_params: The number of parameters that are present in the data set (length of the third axis).
- output_dim: The dimension of the output tensor (first axis). If output is of type Nothing, then this is also of type Nothing.
- output_time_steps: The size of the second axis of the output tensor. If output is of type Nothing, then this is also of type Nothing.
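The field layout can be illustrated with a simplified stand-in. ToyDataLoader below is hypothetical: it covers only the unsupervised case (output, output_dim and output_time_steps are nothing) and omits the additional type parameters of the real DataLoader.

```julia
# Hypothetical stand-in for DataLoader, restricted to unsupervised data.
struct ToyDataLoader{T, AT <: AbstractArray{T, 3}}
    input::AT
    output::Nothing
    input_dim::Int
    input_time_steps::Int
    n_params::Int
    output_dim::Nothing
    output_time_steps::Nothing
end

# The sizes of the three axes fill input_dim, input_time_steps and n_params.
function ToyDataLoader(data::AbstractArray{T, 3}) where {T}
    input_dim, input_time_steps, n_params = size(data)
    return ToyDataLoader{T, typeof(data)}(data, nothing, input_dim, input_time_steps,
                                          n_params, nothing, nothing)
end

data = reshape(collect(1:12), 3, 2, 2)   # same values as the tensor example further down
dl = ToyDataLoader(data)
(dl.input_dim, dl.input_time_steps, dl.n_params)   # (3, 2, 2)
```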
Implementation
Even though DataLoader can be called with inputs of various forms, internally it always stores tensors with three axes.
using ReducedComplexityModeling
data = [1 2 3; 4 5 6]
dl = DataLoader(data)
dl.input
# output
[ Info: You have provided a matrix as input. The axes will be interpreted as (i) system dimension and (ii) number of parameters.
2×1×3 Array{Int64, 3}:
[:, :, 1] =
1
4
[:, :, 2] =
2
5
[:, :, 3] =
3
6
ReducedComplexityModeling.DataLoader — Method
DataLoader(data::AbstractVector)
Make an instance of DataLoader based on a vector.
Extended help
If the input to DataLoader is a vector, it is assumed that this vector represents one-dimensional time-series data and is therefore processed as:
DataLoader(data::AbstractVector; autoencoder=true) = DataLoader(reshape(data, 1, length(data)); autoencoder = autoencoder)
ReducedComplexityModeling.DataLoader — Method
DataLoader(data::QPT)
Make an instance of DataLoader based on $(q, p)$ data.
Implementation
In this case the field input_dim of DataLoader is interpreted as the sum of the $q$- and $p$-dimensions, i.e. if $q$ and $p$ both evolve on $\mathbb{R}^n$, then input_dim is $2n$.
Apart from this, the input is treated as if it were an Array, i.e. everything is converted to tensors internally. See e.g. DataLoader(::AbstractArray{<:Number, 3}).
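How DataLoader combines $q$ and $p$ internally is not spelled out here. One natural representation, assumed purely for illustration, stacks the two tensors along the first (system) axis, which makes input_dim equal to $2n$:

```julia
# Assumed layout for illustration: q and p are each (n, time steps, parameters) tensors.
n, T, P = 2, 4, 3
q = rand(n, T, P)
p = rand(n, T, P)

# Stack q on top of p along the first axis.
qp = vcat(q, p)
input_dim = size(qp, 1)   # 2n = 4
size(qp)                  # (4, 4, 3)
```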
ReducedComplexityModeling.DataLoader — Method
DataLoader(data::AbstractArray{<:Number, 3})
Make an instance of DataLoader for data that are in tensor format.
Arguments
There are two optional keyword arguments:
- autoencoder = false and
- suppress_info = false.
By default the data are stored as TimeSeries type. If you want to train a GeometricMachineLearning.AutoEncoder with your data, call:
DataLoader(data; autoencoder = true)
The default is equivalent to autoencoder = false.
By default we have:
using ReducedComplexityModeling
data = [ 1; 2; 3;;
4; 5; 6;;;
7; 8; 9;;
10; 11; 12]
DataLoader(data)
# output
┌ Info: You have provided a tensor with three axes as input. They will be interpreted as
└ (i) system dimension, (ii) number of time steps and (iii) number of params.
DataLoader{Int64, Array{Int64, 3}, Nothing, :TimeSeries}([1 4; 2 5; 3 6;;; 7 10; 8 11; 9 12], nothing, 3, 2, 2, nothing, nothing)
But if we write
using ReducedComplexityModeling
data = [ 1; 2; 3;;
4; 5; 6;;;
7; 8; 9;;
10; 11; 12]
DataLoader(data; suppress_info = true)
# output
DataLoader{Int64, Array{Int64, 3}, Nothing, :TimeSeries}([1 4; 2 5; 3 6;;; 7 10; 8 11; 9 12], nothing, 3, 2, 2, nothing, nothing)
the @info statement is not printed.
ReducedComplexityModeling.DataLoader — Method
DataLoader(data::AbstractMatrix)
Make an instance of DataLoader based on a matrix.
Arguments
See DataLoader(::AbstractArray{<:Number, 3}) for details.
Implementation
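Internally the data are reshaped to a tensor with three axes to make for a consistent representation. As the example in the docstring of DataLoader(data) shows, with the default settings the columns are interpreted as parameters, i.e. a matrix of size $n\times{}m$ is stored as a tensor of size $n\times1\times{}m$. If the columns are instead interpreted as time steps, the tensor has shape (size(data)..., 1).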
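The example in the docstring of DataLoader(data) shows a $2\times3$ matrix being stored as a $2\times1\times3$ tensor. A minimal sketch of that reshaping, with hypothetical helper names (matrix_to_tensor and vector_to_tensor are not part of the package), also covering the vector case that first views the data as a $1\times\ell$ matrix:

```julia
# Hypothetical helpers mirroring the default interpretation (columns = parameters).
matrix_to_tensor(data::AbstractMatrix) = reshape(data, size(data, 1), 1, size(data, 2))

# A vector is first viewed as a 1 × length(data) matrix, as in DataLoader(data::AbstractVector).
vector_to_tensor(data::AbstractVector) = matrix_to_tensor(reshape(data, 1, length(data)))

size(matrix_to_tensor([1 2 3; 4 5 6]))   # (2, 1, 3)
size(vector_to_tensor(rand(5)))          # (1, 1, 5)
```

Since reshape does not copy, the tensor shares memory with the original matrix.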
ReducedComplexityModeling.DataLoader — Method
DataLoader(ensemble_solution)
Make an instance of DataLoader for an EnsembleSolution.
EnsembleSolutions are imported from the package GeometricSolutions.jl.
Arguments
This constructor for DataLoader also has the keyword arguments autoencoder = false and suppress_info = false.
See the docstring for DataLoader(::AbstractArray{<:Number, 3}).
Implementation
Internally this stores the data as a tensor where the third axis has length equal to the number of solutions in the ensemble.
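A minimal sketch of this layout, assuming for illustration that each solution of the ensemble is available as a plain (system dimension) × (time steps) matrix; the actual GeometricSolutions.jl types are not modeled here:

```julia
# Assumed stand-in for an ensemble: 4 solutions, each a 3 × 10 matrix
# (system dimension 3, 10 time steps).
solutions = [rand(3, 10) for _ in 1:4]

# One slice along the third axis per solution.
tensor = cat(solutions...; dims = 3)
size(tensor)   # (3, 10, 4)
```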
ReducedComplexityModeling.DataLoader — Method
DataLoader(solution)
Make an instance of DataLoader for a GeometricSolution.
GeometricSolutions are imported from the package GeometricSolutions.jl.
Arguments
This constructor for DataLoader also has the keyword arguments autoencoder = false and suppress_info = false.
See the docstring for DataLoader(::AbstractArray{<:Number, 3}).
Implementation
Internally this stores the data as a tensor where the third axis has length 1.
ReducedComplexityModeling.DataLoader — Method
DataLoader(dl, backend)
Make a new instance of DataLoader based on an existing instance of DataLoader for a new backend.
This allocates new memory of the same size as is used for the original dl and then copies the data.
By default the data type remains unchanged, i.e. eltype(DataLoader(dl, backend)) == eltype(dl) is true.
To change the data type, write e.g.
DataLoader(dl, backend, Float32)
Arguments
There is an optional keyword argument
autoencoder = nothing.
By default this inherits the autoencoder property from dl.
See the docstring for DataLoader(data::AbstractArray{<:Number, 3}).
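The "allocate memory of the same size, then copy" step can be sketched for ordinary CPU arrays. copy_with_eltype is a hypothetical helper; device backends (e.g. GPUs) are not modeled.

```julia
# Hypothetical CPU-only sketch: allocate an array of the same size with element
# type T, then copy (and convert) the data into it.
function copy_with_eltype(x::AbstractArray, ::Type{T}) where {T}
    y = similar(x, T)   # new memory, same size as x
    copyto!(y, x)       # element-wise copy, converting to T
    return y
end

# By default the element type is kept.
copy_with_eltype(x::AbstractArray) = copy_with_eltype(x, eltype(x))

x = rand(2, 3, 4)
eltype(copy_with_eltype(x))            # Float64
eltype(copy_with_eltype(x, Float32))   # Float32
```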
ReducedComplexityModeling.DataLoader — Method
DataLoader(data::AbstractArray{T, 3}, target::AbstractVector)
Make an instance of DataLoader for a classification problem.
Target here is a vector of labels. This is tailored towards being used with the package MLDatasets.jl.
Arguments
There are two keyword arguments:
- patch_length = 7: the length of the patch in the $x$ and the $y$ direction;
- suppress_info = false.
For the MNIST data set, all images are of size $28\times28$. For patch_length = 7, each image is therefore split into 16 $7\times7$ patches [1].
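As a quick check of the patch arithmetic for a $28\times28$ MNIST image and patch_length = 7:
\[\frac{28}{7} = 4 \ \text{patches per axis}, \qquad 4^2 = 16 \ \text{patches}, \qquad 7^2 = 49 \ \text{entries per flattened patch}.\]
If such an image is processed with split_and_flatten (whose defaults are patch_length = 7 and number_of_patches = 16), it therefore becomes a $49\times16$ matrix.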
ReducedComplexityModeling.ParameterSpace — Type
ParameterSpace collects all parameters of a system as well as samples in the parameter space.
ReducedComplexityModeling.ReducedBasisModel — Type
Reduced Basis
Uses ModelOrderReduction methods from ReducedBasisMethods.jl to represent the data in latent space.
ReducedComplexityModeling.accuracy — Method
accuracy(nn, dl)
Compute the accuracy of a neural network classifier.
This is like accuracy(::Chain, ::Tuple, ::DataLoader), but for a NeuralNetwork.
ReducedComplexityModeling.accuracy — Method
accuracy(model, ps, dl)
Compute the accuracy of a neural network classifier.
This needs an instance of DataLoader that stores the test data.
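Classification accuracy is the fraction of samples whose predicted class matches the label. A minimal sketch of that computation on plain matrices, assuming one column per sample and one-hot labels; toy_accuracy is hypothetical and does not take a DataLoader:

```julia
# Hypothetical helper: the predicted class is the row with the largest score,
# the true class is the row that holds the 1 in the one-hot label.
function toy_accuracy(predictions::AbstractMatrix, onehot_labels::AbstractMatrix)
    predicted = [argmax(col) for col in eachcol(predictions)]
    actual = [argmax(col) for col in eachcol(onehot_labels)]
    return count(predicted .== actual) / length(actual)
end

predictions = [0.9 0.1 0.2;
               0.1 0.9 0.8]   # 2 classes, 3 samples
labels      = [1 0 1;
               0 1 0]
toy_accuracy(predictions, labels)   # 2/3: the third sample is misclassified
```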
ReducedComplexityModeling.assign_output_estimate — Method
assign_output_estimate(full_output, prediction_window)
Crop the output to get the correct number of output vectors.
The function assign_output_estimate is closely related to the GeometricMachineLearning.Transformer. It takes the last prediction_window columns of the output and uses them for the prediction.
In other words, for an output with $N$ rows and $T$ columns:
\[\mathbb{R}^{N\times{}T}\to\mathbb{R}^{N\times\mathtt{pw}}, \quad \begin{bmatrix} z^{(1)}_1 & \cdots & z^{(T)}_1 \\ \vdots & \ddots & \vdots \\ z^{(1)}_N & \cdots & z^{(T)}_N \end{bmatrix} \mapsto \begin{bmatrix} z^{(T - \mathtt{pw} + 1)}_1 & \cdots & z^{(T)}_1 \\ \vdots & \ddots & \vdots \\ z^{(T - \mathtt{pw} + 1)}_N & \cdots & z^{(T)}_N \end{bmatrix}\]
If prediction_window is equal to sequence_length, then this is not needed.
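The cropping amounts to keeping the columns $T - \mathtt{pw} + 1, \ldots, T$. A minimal sketch with a hypothetical function name, for a single output matrix and for a batch stored along a third axis:

```julia
# Hypothetical sketch: keep the last pw columns (time steps).
toy_assign_output_estimate(x::AbstractMatrix, pw::Int) = x[:, end-pw+1:end]
# Batched version: the third axis indexes samples and is kept as is.
toy_assign_output_estimate(x::AbstractArray{<:Any, 3}, pw::Int) = x[:, end-pw+1:end, :]

z = [1 3 5 7;
     2 4 6 8]                       # N = 2, T = 4
toy_assign_output_estimate(z, 2)    # [5 7; 6 8]
```

With pw equal to the sequence length the input is returned unchanged (as a copy), which is why the step is not needed in that case.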
ReducedComplexityModeling.convert_input_and_batch_indices_to_array — Method
convert_input_and_batch_indices_to_array(dl, batch, batch_indices)
Assign batch data based on (i) input and (ii) batch indices.
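For autoencoder-like data with a single time step per sample, assigning batch data amounts to gathering input[:, t, p] for every (t, p) in the batch indices. A minimal sketch of this special case; toy_gather is hypothetical, and the time-series case with longer sequences is not covered:

```julia
# Hypothetical sketch: collect input[:, t, p] for each (t, p) into a
# (system dimension) × 1 × (batch size) tensor.
function toy_gather(input::AbstractArray{T, 3}, batch_indices::AbstractVector) where {T}
    output = similar(input, size(input, 1), 1, length(batch_indices))
    for (k, (t, p)) in enumerate(batch_indices)
        output[:, 1, k] = input[:, t, p]
    end
    return output
end

input = reshape([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9], 1, 1, 9)
toy_gather(input, [(1, 1), (1, 3), (1, 5)])   # entries 0.1, 0.3, 0.5
```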
Examples
using ReducedComplexityModeling
using ReducedComplexityModeling: Batch
dl = DataLoader([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]; suppress_info = true)
batch = Batch(3)
batch_indices = [(1, 1), (1, 3), (1, 5)]
ReducedComplexityModeling.convert_input_and_batch_indices_to_array(dl, batch, batch_indices)
# output
1×1×3 Array{Float64, 3}:
[:, :, 1] =
0.1
[:, :, 2] =
0.3
[:, :, 3] =
0.5
ReducedComplexityModeling.h5load — Method
Load ParameterSpace
ReducedComplexityModeling.h5save — Method
Save ParameterSpace
ReducedComplexityModeling.learn — Method
learn(prob)
Use data from high-fidelity simulations/experiments to learn the latent representation of the problem.
ReducedComplexityModeling.number_of_batches — Method
number_of_batches(dl, batch)
Compute the number of batches.
The result depends on whether the data are time-series-like or autoencoder-like.
Examples
using ReducedComplexityModeling
using ReducedComplexityModeling: number_of_batches
using ReducedComplexityModeling: Batch
import Random
Random.seed!(123)
dat = [1, 2, 3, 4, 5]
dl₁ = DataLoader(dat; autoencoder = false, suppress_info = true) # time series-like
dl₂ = DataLoader(dat; autoencoder = true, suppress_info = true) # autoencoder-like
batch = Batch(3)
nob₁ = number_of_batches(dl₁, batch)
nob₂ = number_of_batches(dl₂, batch)
println(stdout, "Number of batches of dl₁: ", nob₁)
println(stdout, "Number of batches of dl₂: ", nob₂)
println(stdout, batch(dl₁), "\n", batch(dl₂))
# output
Number of batches of dl₁: 2
Number of batches of dl₂: 2
([(1, 1), (4, 1), (2, 1)], [(3, 1)])
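([(1, 3), (1, 2), (1, 4)], [(1, 1), (1, 5)])
Here we see that the last minibatch in the autoencoder case has one more element than in the time-series case: the time series of length 5 only yields 4 samples (time indices 1 to 4), whereas in the autoencoder case all 5 entries are used.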
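The counts in this example follow from ceiling division of the number of samples by the batch size. A minimal sketch with hypothetical helper names, assuming (as the output above suggests) that a time series with 5 time steps yields 4 samples:

```julia
# Hypothetical helpers: the last batch holds whatever remains.
toy_number_of_batches(n_samples::Int, batch_size::Int) = cld(n_samples, batch_size)
toy_last_batch_size(n_samples::Int, batch_size::Int) =
    n_samples - (toy_number_of_batches(n_samples, batch_size) - 1) * batch_size

n_time_series = 5 - 1   # assumed: 5 time steps give 4 (input, target) pairs
n_autoencoder = 5       # one sample per entry

toy_number_of_batches(n_time_series, 3), toy_last_batch_size(n_time_series, 3)   # (2, 1)
toy_number_of_batches(n_autoencoder, 3), toy_last_batch_size(n_autoencoder, 3)   # (2, 2)
```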
ReducedComplexityModeling.onehotbatch — Method
onehotbatch(target)
Perform a one-hot batch encoding of a vector of integers $\mathtt{target}\in\{0,1,\ldots,9\}^\ell$.
The output is a tensor of shape $10\times1\times\ell$.
If the input is $0$, this function produces:
\[0 \mapsto \begin{bmatrix} 1 & 0 & \ldots & 0 \end{bmatrix}^T.\]
In more abstract terms: $i \mapsto e_{i+1}$, where $e_j$ denotes the $j$-th standard basis vector of $\mathbb{R}^{10}$.
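A minimal sketch of this encoding (toy_onehotbatch is a hypothetical name, not the package function): label $i$ sets entry $i+1$ of its slice to 1, because Julia arrays are 1-based.

```julia
# Hypothetical sketch: map each label i ∈ {0, …, 9} to e_{i+1} in ℝ^10,
# stored as a 10 × 1 × ℓ tensor with one slice per label.
function toy_onehotbatch(target::AbstractVector{<:Integer})
    output = zeros(Int, 10, 1, length(target))
    for (k, label) in enumerate(target)
        output[label + 1, 1, k] = 1
    end
    return output
end

toy_onehotbatch([0, 3])   # slice 1 is e_1, slice 2 is e_4
```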
Examples
using ReducedComplexityModeling: onehotbatch
target = [0]
onehotbatch(target)
# output
10×1×1 Array{Int64, 3}:
[:, :, 1] =
1
0
0
0
0
0
0
0
0
0
ReducedComplexityModeling.read_parameters — Function
Read parameters
ReducedComplexityModeling.save_parameters — Method
Save parameters
ReducedComplexityModeling.split_and_flatten — Method
split_and_flatten(input::AbstractArray)::AbstractArray
Preprocess an image by splitting it into patches and flattening each patch.
This rearranges the input data so that it can easily be processed with a transformer.
Examples
Consider a matrix of size $6\times6$ which we want to divide into patches of size $3\times3$.
using ReducedComplexityModeling
using ReducedComplexityModeling: split_and_flatten
input = [ 1 2 3 4 5 6;
7 8 9 10 11 12;
13 14 15 16 17 18;
19 20 21 22 23 24;
25 26 27 28 29 30;
31 32 33 34 35 36]
split_and_flatten(input; patch_length = 3, number_of_patches = 4)
# output
9×4 Matrix{Int64}:
1 19 4 22
7 25 10 28
13 31 16 34
2 20 5 23
8 26 11 29
14 32 17 35
3 21 6 24
9 27 12 30
15 33 18 36
Here we see that split_and_flatten:
- splits the original matrix into four $3\times3$ matrices and then
- flattens each matrix into a column vector of size $9.$
After this all the vectors are put together again to yield a $9\times4$ matrix.
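The two steps above can be reproduced with a short sketch. toy_split_and_flatten is a hypothetical re-implementation for square inputs, not the package code; it orders the patches column-major over the patch grid (top-left, bottom-left, top-right, bottom-right), which matches the $9\times4$ output shown above.

```julia
# Hypothetical re-implementation for square inputs.
function toy_split_and_flatten(input::AbstractMatrix; patch_length::Int = 7, number_of_patches::Int = 16)
    side = isqrt(number_of_patches)               # patches per row and per column
    @assert side^2 == number_of_patches
    @assert size(input) == (side * patch_length, side * patch_length)
    output = similar(input, patch_length^2, number_of_patches)
    for j in 0:side-1, i in 0:side-1              # i: patch row, j: patch column
        rows = (i * patch_length + 1):((i + 1) * patch_length)
        cols = (j * patch_length + 1):((j + 1) * patch_length)
        # Flatten each patch column by column into one output column.
        output[:, i + j * side + 1] = vec(input[rows, cols])
    end
    return output
end

input = permutedims(reshape(collect(1:36), 6, 6))   # rows 1:6, 7:12, ..., 31:36
toy_split_and_flatten(input; patch_length = 3, number_of_patches = 4)
```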
Arguments
The optional keyword arguments are:
- patch_length: by default this is 7;
- number_of_patches: by default this is 16.
The sizes of the first and second axes of the output of split_and_flatten are
- $\mathtt{patch\_length}^2$ and
- number_of_patches.
- [1] B. Brantner. Generalizing Adam To Manifolds For Efficiently Training Transformers, arXiv preprint arXiv:2305.16901 (2023).