# Uncertain Data

UncertainData.jl provides tools to systematically deal with uncertain data, and to sample from uncertain datasets more rigorously. It makes workflows involving uncertain data of different types and from different sources significantly easier. It was originally designed to form the backbone of the uncertainty handling in CausalityTools.jl.

If you use UncertainData.jl for any of your projects or scientific publications, please cite this Journal of Open Source Software (JOSS) publication.

# An example

Here’s an example illustrating how the functionality of the UncertainData.jl package is integrated with the CausalityTools.jl package. The jupyter notebook was run with a Julia 1.2 kernel. (Note: Pending a 1.0 release of CausalityTools.jl, there may be breaking changes).

using UncertainData, CausalityTools, Distributions, StatsBase,
Test, Interpolations, Measures, LaTeXStrings
using Plots; pyplot();


## Time series with uncertainty

Let’s consider a case of two unidirectionally coupled first-order autoregressive (AR1) processes. We record N points from the built-in ar1_unidir system, collect the 1st and 2nd variables as X and Y and add some uncertainties to both the indices and the values.

N = 100
sys = ar1_unidir(c_xy = 0.8)
X, Y = example_uncertain_indexvalue_datasets(sys, N, (1, 2), tstep = 5,
d_xval = Uniform(0.1, 0.3), d_yval = Uniform(0.1, 0.3),
d_xind = Uniform(1, 3), d_yind = Uniform(1, 3)
);

qs = [0.05, 0.95] #quantiles

px = plot(X, ylabel = "X", marker = stroke(0.01, :black), ms = 1, c = :black, qs, qs)
py = plot(Y, xlabel = "Index", ylabel = "Y", marker = stroke(0.01, :red), ms = 1, c = :red, qs, qs)
plot(px, py, layout = grid(2, 1), legend = false) Create a fine-grained interpolation grid over the range of the data.

tmin = max(minimum(mean.(X.indices)), minimum(mean.(Y.indices)))
tmax = max(maximum(mean.(X.indices)), maximum(mean.(Y.indices)))
intp_grid = tmin:0.01:tmax


1.0:0.01:496.0

Decide on a coarser resolution at which we want to analyse the data. Pick all the linearly interpolated data that falls in each bin and summarise it using the mean function.

binstep = 5
left_bin_edges = tmin:binstep:tmax
r = InterpolateAndBin(mean, left_bin_edges, Linear(), intp_grid, Flat(OnGrid()))


InterpolateAndBin{Linear}(Statistics.mean, 1.0:5.0:496.0, Linear(), 1.0:0.01:496.0, Flat(OnGrid()))

Draw some interpolated-and-binned realisations and plot them:

g = left_bin_edges[1:end-1] .+ step(left_bin_edges)/2

#pX = plot(legend = false)
pX = plot(mean.(X.indices), mean.(X.values), label = "X", c = :black)
plot!(X, ylabel = "X", marker = stroke(0.01), c = :black)
vline!(left_bin_edges, c = :black, alpha = 0.5, lw = 0.5, ls = :dot)

for i = 1:50
plot!(g, resample(X, r), label = "", alpha = 0.3, lw = 0.3, c = :black)
end

#pY = plot(legend = false)
pY = plot(mean.(Y.indices), mean.(Y.values), label = "Y", c = :red)
plot!(Y, xlabel = "Index", ylabel = "Y", marker = stroke(0.01), c = :red)
vline!(left_bin_edges, c = :black, alpha = 0.5, lw = 0.5, ls = :dot)

for i = 1:50
plot!(g, resample(Y, r), label = "", alpha = 0.3, lw = 0.3, c = :red)
end

plot(pX, pY, layout = grid(2, 1), legend = false, size = (700, 700),
bottom_margin = 5mm, left_margin = 5mm, tickfont = font(12),
guidefont = font(12)) Okay, that looks reasonable! Let’s apply a causality test.

## Defining an InterpolateBinTest

We’ll use a predictive asymmetry test as the causality test. Then we just couple that test with the interpolation-and-binning scheme, and compute the causality between X and Y over 100 independent realisations.

# Causality test
k, l, m = 1, 1, 1 # embedding parameters
n_subdivisions = floor(Int, N^(1/(k + l + m + 1)))
state_space_binning = RectangularBinning(n_subdivisions)
ηs = -8:8
te_test = VisitationFrequencyTest(k = k, l = l, m = m,
binning = state_space_binning,
ηs = ηs, b = 2) # use base-2 logarithms
pa_test = PredictiveAsymmetryTest(predictive_test = te_test)

# Define interpolate-and-bin test. Draw 100 independent realisations.
ib_test = InterpolateBinTest(pa_test, r, 100)


InterpolateBinTest{PredictiveAsymmetryTest{VisitationFrequencyTest}, InterpolateAndBin{Linear}}(PredictiveAsymmetryTest{VisitationFrequencyTest}(predictive_test = VisitationFrequencyTest(k = 1, l = 1, m = 1, n = 1, τ = 1, b = 2, binning_summary_statistic = mean, estimator = VisitationFrequency(), binning = RectangularBinning(3), ηs = -8:8)), InterpolateAndBin{Linear}(Statistics.mean, 1.0:5.0:496.0, Linear(), 1.0:0.01:496.0, Flat(OnGrid())), 100)

## Compute causality statistic

res_xy = causality(X, Y, ib_test);
res_yx = causality(Y, X, ib_test);


Summarise and get the mean and standard deviation for each prediction lag $\eta$. Plot it.

RES_xy = hcat(res_xy...)
RES_yx = hcat(res_yx...)

plot(xlabel = "Prediction lag", ylabel = L"\mathbb{A} (bits)", legend = :topleft)
plot!(1:maximum(ηs), mean(RES_xy, dims = 2), ribbon = std(RES_xy, dims = 2),
label = L"x \to y", c = :black)
plot!(1:maximum(ηs), mean(RES_yx, dims = 2), ribbon = std(RES_yx, dims = 2),
label = L"y \to x", c = :red)
hline!(, lw = 1, ls = :dot, c = :black, label = "") We find that the predictive asymmetry in the causal direction ($\mathbb{A}_{x \to y}$) is positive, whereas the asymmetry in the non-causal direction ($\mathbb{A}_{y \to x}$) is negative. We thus recover the expected unidirectional causal relationship $X \to Y$. Note that because we defined our time series as UncertainIndexValueDataset, the causality() test function returns a distribution of predictive asymmetry values for each prediction lag.