`UncertainData.jl`

is a Julia package for dealing with data values with associated uncertainties and datasets consisting of uncertain values. The main goals are:

Systematic and intuitive ways of representing uncertain data.

Easy and robust resampling of uncertain data, given pre-defined or custom user-defined constraints.

Provide a framework for robust computation of ensemble statistics for uncertain data.

`UncertainData.jl`

was originally designed to form the backbone of the uncertainty handling in the `CausalityTools.jl`

package.

Source code can be found at https://github.com/kahaaga/UncertainData.jl.

For more information, see the package documentation at https://kahaaga.github.io/UncertainData.jl/dev/.

If you use `UncertainData.jl`

for any of your projects or scientific publications, please cite this small Journal of Open Source Software (JOSS) publication.

# An example

Here’s an example illustrating how the functionality of the `UncertainData.jl`

package is integrated with the `CausalityTools.jl`

package. The `jupyter`

notebook was run with a Julia 1.2 kernel.

```
using UncertainData, CausalityTools, Distributions, StatsBase,
Test, Interpolations, Measures, LaTeXStrings
using Plots; pyplot();
```

## Time series with uncertainty

Let’s consider a case of two unidirectionally coupled first-order autoregressive (AR1) processes. We record `N`

points from the built-in `ar1_unidir`

system, collect the 1st and 2nd
variables as `X`

and `Y`

and add some uncertainties to both the indices and the values.

```
N = 100
sys = ar1_unidir(c_xy = 0.8)
X, Y = example_uncertain_indexvalue_datasets(sys, N, (1, 2), tstep = 5,
d_xval = Uniform(0.1, 0.3), d_yval = Uniform(0.1, 0.3),
d_xind = Uniform(1, 3), d_yind = Uniform(1, 3)
);
```

```
qs = [0.05, 0.95] #quantiles
px = plot(X, ylabel = "X", marker = stroke(0.01, :black), ms = 1, c = :black, qs, qs)
py = plot(Y, xlabel = "Index", ylabel = "Y", marker = stroke(0.01, :red), ms = 1, c = :red, qs, qs)
plot(px, py, layout = grid(2, 1), legend = false)
```

Create a fine-grained interpolation grid over the range of the data.

```
tmin = max(minimum(mean.(X.indices)), minimum(mean.(Y.indices)))
tmax = max(maximum(mean.(X.indices)), maximum(mean.(Y.indices)))
intp_grid = tmin:0.01:tmax
```

1.0:0.01:496.0

Decide on a coarser resolution at which we want to analyse the data. Pick all the linearly interpolated data that falls in each bin and summarise it using the `mean`

function.

```
binstep = 5
left_bin_edges = tmin:binstep:tmax
r = InterpolateAndBin(mean, left_bin_edges, Linear(), intp_grid, Flat(OnGrid()))
```

InterpolateAndBin{Linear}(Statistics.mean, 1.0:5.0:496.0, Linear(), 1.0:0.01:496.0, Flat(OnGrid()))

Draw some interpolated-and-binned realisations and plot them:

```
g = left_bin_edges[1:end-1] .+ step(left_bin_edges)/2
#pX = plot(legend = false)
pX = plot(mean.(X.indices), mean.(X.values), label = "X", c = :black)
plot!(X, ylabel = "X", marker = stroke(0.01), c = :black)
vline!(left_bin_edges, c = :black, alpha = 0.5, lw = 0.5, ls = :dot)
for i = 1:50
plot!(g, resample(X, r), label = "", alpha = 0.3, lw = 0.3, c = :black)
end
#pY = plot(legend = false)
pY = plot(mean.(Y.indices), mean.(Y.values), label = "Y", c = :red)
plot!(Y, xlabel = "Index", ylabel = "Y", marker = stroke(0.01), c = :red)
vline!(left_bin_edges, c = :black, alpha = 0.5, lw = 0.5, ls = :dot)
for i = 1:50
plot!(g, resample(Y, r), label = "", alpha = 0.3, lw = 0.3, c = :red)
end
plot(pX, pY, layout = grid(2, 1), legend = false, size = (700, 700),
bottom_margin = 5mm, left_margin = 5mm, tickfont = font(12),
guidefont = font(12))
```

Okay, that looks reasonable! Let’s apply a causality test.

## Defining an `InterpolateBinTest`

We’ll use a predictive asymmetry test as the causality test. Then we just couple that test with the interpolation-and-binning scheme, and compute the causality between `X`

and `Y`

over 100 independent realisations.

```
# Causality test
k, l, m = 1, 1, 1 # embedding parameters
n_subdivisions = floor(Int, N^(1/(k + l + m + 1)))
state_space_binning = RectangularBinning(n_subdivisions)
ηs = -8:8
te_test = VisitationFrequencyTest(k = k, l = l, m = m,
binning = state_space_binning,
ηs = ηs, b = 2) # use base-2 logarithms
pa_test = PredictiveAsymmetryTest(predictive_test = te_test)
# Define interpolate-and-bin test. Draw 100 independent realisations.
ib_test = InterpolateBinTest(pa_test, r, 100)
```

InterpolateBinTest{PredictiveAsymmetryTest{VisitationFrequencyTest}, InterpolateAndBin{Linear}}(PredictiveAsymmetryTest{VisitationFrequencyTest}(predictive_test = VisitationFrequencyTest(k = 1, l = 1, m = 1, n = 1, τ = 1, b = 2, binning_summary_statistic = mean, estimator = VisitationFrequency(), binning = RectangularBinning(3), ηs = -8:8)), InterpolateAndBin{Linear}(Statistics.mean, 1.0:5.0:496.0, Linear(), 1.0:0.01:496.0, Flat(OnGrid())), 100)

## Compute causality statistic

```
res_xy = causality(X, Y, ib_test);
res_yx = causality(Y, X, ib_test);
```

Summarise and get the mean and standard deviation for each prediction lag $\eta$. Plot it.

```
RES_xy = hcat(res_xy...)
RES_yx = hcat(res_yx...)
plot(xlabel = "Prediction lag", ylabel = L"\mathbb{A} (bits)", legend = :topleft)
plot!(1:maximum(ηs), mean(RES_xy, dims = 2), ribbon = std(RES_xy, dims = 2),
label = L"x \to y", c = :black)
plot!(1:maximum(ηs), mean(RES_yx, dims = 2), ribbon = std(RES_yx, dims = 2),
label = L"y \to x", c = :red)
hline!([0], lw = 1, ls = :dot, c = :black, label = "")
```

We find that the predictive asymmetry in the causal direction ($\mathbb{A}_{x \to y}$) is positive, whereas the asymmetry in the non-causal direction ($\mathbb{A}_{y \to x}$) is negative. We thus recover the expected unidirectional causal relationship $X \to Y$. Note that because we defined our time series as `UncertainIndexValueDataset`

, the `causality()`

test function returns a *distribution* of predictive asymmetry values for each prediction lag.

```
```