Added some tests and full documentation

9d105182 · Alberto Ramos · 710252ae · 9d105182 · 9d105182 · 9d105182
Commit 9d105182 authored Jul 14, 2020 by Alberto Ramos
5 changed files
--- a/docs/src/api.md
+++ b/docs/src/api.md
+# API
+
+
+## Creating `uwerr` data types
+
+```@docs
+uwreal
+cobs
+```
+
+## I/O
+
+```@docs
+write_uwreal
+read_uwreal
+```
+
+## Error analysis 
+
+```@docs
+uwerr
+cov
+trcov
+```
+
+## Information on error analysis
+
+There are several methods to get information about the error analysis of a `uwreal` variable. With the exception of `value`, these methods require that a proper error analysis has been performed via a call to the `uwerr` method.
+
+```@docs
+value
+err
+derror
+taui
+dtaui
+rho
+drho
+window
+details
+neid
+```
+
+## Error propagation in iterative algorithms
+
+### Finding root of functions
+
+```@docs
+root_error
+```
+
+### Fits
+
+`ADerrors.jl` is agnostic about how you approach a fit (i.e. minimize a ``\chi^2`` function), but once the central values of the fit parameters have been found (i.e. using `LsqFit.jl`, `LeastSquaresOptim.jl`, etc... ) `ADerrors.jl` is able to determine the error of the fit parameters, returning them as `uwreal` type. It can also determine the expected value of your ``\chi^2`` given the correlation in your fit parameters.
+
+```@docs
+fit_error
+chiexp
+```
+
+### Integrals
+
+```@docs
+int_error
+```
+
--- a/docs/src/tutorial.md
+++ b/docs/src/tutorial.md
+# Getting started
+
+## Basic tutorial
+`ADerrors.jl` is a package for error propagation and analysis of Monte carlo data. At the core of the package is the `uwreal` data type, that is able to store variables with uncertainties
+```@repl gs
+using ADerrors
+a = uwreal([1.0, 0.1], 1) # 1.0 +/- 0.1
+```
+It can also store MC data
+```@repl gs
+# Generate some correlated data
+eta  = randn(1000);
+x    = Vector{Float64}(undef, 1000);
+x[1] = 0.0;
+for i in 2:1000
+    x[i] = x[i-1] + eta[i]
+    if abs(x[i]) > 1.0
+        x[i] = x[i-1]
+    end
+end
+
+b = uwreal(x.^2, 200)
+c = uwreal(x.^4, 200)
+```
+Correlations between variables are treated consistently. This requires that each variable that is defined with `uwreal` contains an ensemble `ID` tag. In the previous examples `a` has been measured on ensemble `ID` 1, while both `b` and `c` have been measured on ensemble `ID` 200. This will treat the measurements in `b` and `c` as statistically correlated, while `a` will be uncorrelated with both `b` and `c`.
+
+One can perform operations with `uwreal` variables as if there were normal floats
+```@repl gs
+d = 2.0 + sin(b)/a + 2.0*log(c)/(a+b+1.0)
+```
+Error propagation is done automatically, and correlations are consistently taken into account. Note that once complex derived `uwreal` variables are defined, as for example `d` above, they will in general depend on more than one ensemble `ID`. 
+
+In order to perform the error analysis of a variable, one should use the `uwerr` function
+```@repl gs
+uwerr(d);
+println("d: ", d)
+```
+One can get detailed information on the error of a variables 
+```@repl gs
+uwerr(d);
+println("Details on variable d: ")
+details(d)
+```
+where we can clearly see that there are two ensembles contributing to the uncertainty in `d`. We recognize this as `d` being a function of both `a` (measured on ensemble 1), and `b` and `c`, measured on ensemble 200.
+
+Note that one does not need to use `uwerr` on a variable unless one is interested in the error on that variable. For example
+```@repl gs
+global x = 1.0
+for i in 1:100
+	global x = x - (d*cos(x) - x)/(-d*sin(x) - 1.0)
+end
+uwerr(x)
+print("Root of d*cos(x) - x:" )
+details(x)
+```
+determines the root on ``d\cos(x) - x`` by using Newton's method, and propagates the error on `d` into an error on the root. The error on `x` is only determined once, after the 100 iterations are over. 
+
+## Advanced topics
+
+### Errors in fit parameters
+
+`ADerrors.jl` does not provide an interface to perform fits, but once the minima of the ``\chi^2`` has been found, it can help propagating the error from the data to the fit parameters. 
+
+Here we are going to [repeat the example](https://github.com/JuliaNLSolvers/LsqFit.jl) of the package `LsqFit.jl` using `ADerrors.jl` for error propagation.
+```@repl fits
+using LsqFit, ADerrors
+
+# a two-parameter exponential model
+# x: array of independent variables
+# p: array of model parameters
+# model(x, p) will accept the full data set as the first argument `x`.
+# This means that we need to write our model function so it applies
+# the model to the full dataset. We use `@.` to apply the calculations
+# across all rows.
+@. model(x, p) = p[1]*exp(-x*p[2])
+
+# some example data
+# xdata: independent variables
+# ydata: dependent variable as uwreal
+xdata = range(0, stop=10, length=20);
+ydata = Vector{uwreal}(undef, length(xdata));
+for i in eachindex(ydata)
+	ydata[i] = uwreal([model(xdata[i], [1.0 2.0]) + 0.01*getindex(randn(1),1), 0.01], i)
+	uwerr(ydata[i]) # We will need the errors for the weights of the fit
+end
+p0 = [0.5, 0.5];
+```
+We are ready to fit `(xdata, ydata)` to our model using `LsqFit.jl`, but for error propagation using `ADerrors.jl` we need the ``\chi^2`` function.
+```@repl fits
+# ADerrors will need the chi^2 as a function of the fit parameters and 
+# the data. This can be constructed easily from the model above
+chisq(p, d) = sum( (d .- model(xdata, p)) .^2 ./ 0.01^2)
+```
+Now we can fit the data and compute the uncertainties in our fit parameters
+```@repl fits
+fit = curve_fit(model, xdata, value.(ydata), 1.0 ./ err.(ydata).^2, p0) # This is LsqFit.jl
+(fitp, cexp) = fit_error(chisq, coef(fit), ydata); # This is error propagation with ADerrors.jl
+uwerr.(fitp);
+
+println("chi^2 / chi^2_exp: ", sum(fit.resid .^2), " / ", cexp, " (dof: ", dof(fit), ")")
+for i in 1:2
+	println("Fit parameter: ", i, ": ", fitp[i])
+end
+```
+
+!!! note 
+    `ADerrors.jl` is completely agnostic about fitting the data. Error propagation is performed once the minimum of the ``\chi^2`` function is known, and it does not matter how this minima is found. One is not constrained to use `LsqFit.jl`, as there are many alternatives in the `Julia` ecosystem: `LeastSquaresOptim.jl`, `MINPACK.jl`, `Optim.jl`, ...
+
+
+### Missing measurements 
+
+`ADerrors.jl` can deal with observables that are not measured on every configuration, and still take correctly the correlations/autocorrelations into account. Here we show an extreme case where one observable is measured only in the even configurations, while the oher is measured on the odd coficurations.
+```@repl gaps
+using ADerrors, Plots
+pgfplotsx();
+
+# Generate some correlated data
+eta  = randn(10000);
+x    = Vector{Float64}(undef, 10000);
+x[1] = 0.0;
+for i in 2:10000
+    x[i] = x[i-1] + 0.2*eta[i]
+    if abs(x[i]) > 1.0
+        x[i] = x[i-1]
+    end
+end
+
+# x^2 only measured on odd configurations
+x2 = uwreal(x[1:2:9999].^2,  1001, collect(1:2:9999),  10000)
+# x^4 only measured on even configurations
+x4 = uwreal(x[2:2:10000].^4, 1001, collect(2:2:10000), 10000)
+
+rat = x2/x4
+uwerr(rat);
+```
+In this case `uwerr` complains because with the chosen window the variance for ensemble with ID 1001 is negative. Looking at the normalized autocorrelation function we can easily spot the problem
+```@repl gaps
+iw = window(rat, 1001)
+r  = rho(rat, 1001);
+dr = drho(rat, 1001);
+plot(collect(1:100), 
+	r[1:100], 
+	yerr = dr[1:100], 
+	seriestype = :scatter, title = "Chosen Window: " * string(iw))
+savefig("rat_cf.png") # hide
+```
+![rat plot](rat_cf.png)
+The normalized autocorrelation function is oscillating, and the chosen window is 2! We better fix the window to 50 for this case
+```@repl gaps
+wpm = Dict{Int64, Vector{Float64}}()
+wpm[1001] = [50.0, -1.0, -1.0, -1.0]
+uwerr(rat, wpm)
+println("Ratio: ", rat)
+```
+
+Note however that this is very observable dependent
+```@repl gaps
+prod = x2*x4
+uwerr(prod)
+iw = window(prod, 1001)
+r  = rho(prod, 1001);
+dr = drho(prod, 1001);
+plot(collect(1:2*iw), 
+	r[1:2*iw], 
+	yerr = dr[1:2*iw], 
+	seriestype = :scatter, title = "Chosen Window: " * string(iw))
+savefig("prod_cf.png") # hide
+```
+![pod plot](prod_cf.png)
+In general error analysis with data with arbitrary gaps is possible, and fully supported in `ADerrors.jl`, but it can be tricky, and certaiinly requires to examine the data carefully.
+
+### `BDIO` Native format
+
+`ADerrors.jl` can save/read observables (i.e. `uwreal`'s) in `BDIO` records. Here we describe the format of these records.
+
+All observables are stored in a single `BDIO` record of type `BDIO_BIN_GENERIC`. The record has the following data in binary little endian format
+- mean (`Float64`): The mean value of the observable.
+- neid (`Int32`): The number of ensembles contributing to the observable.
+- ndata (`Vector{Int32}(neid)`): The length of each ensemble.
+- nrep (`Vector{Int32}(neid)`): The number of replica of each enemble.
+- vrep (`Vector{Int32}`): The replica vector for each ensemble.
+- ids (`Vector{Int32}(neid)`): The ensemble `ID`'s.
+- nt (`Vector{Int32}(neid)`): Half the largest replica of each ensemble.
+- zero (`Vector{Float64}(neid)`): just `neid` zeros.
+- four (`Vector{Float64}(neid)`): just `neid` fours.
+- delta  (`Vector{Float64}`): The fluctuations for each ensemble.
+
+!!! alert
+    Obviously this weird format is what it is for some legacy reasons, but it is strongly encouraged that new implementations respect this standard with all its weirdness. 
+
+
+
+
--- a/test/test_cov1.jl
+++ b/test/test_cov1.jl
+using ADerrors, QuadGK, ForwardDiff
+
+# Average values and covariance from
+# https://inspirehep.net/literature/1477411
+# here we try to reproduce the result
+# Scale ratio = 21.86(42) Eq.5.6
+avg = [16.26, 0.12, -0.0038]
+Mcov = [0.478071 -0.176116 0.0135305
+        -0.176116 0.0696489 -0.00554431
+        0.0135305 -0.00554431 0.000454180]
+
+p = cobs(avg, Mcov, [1, 2001, 32])
+g1s = uwreal([2.6723, 0.0064], 4)
+g2s = 11.31
+
+fs(a, b, p) = -p[1]/2.0 * (1.0/b-1.0/a) +
+    p[2]/2.0 * log(b/a) +
+    p[3]/2.0 * (b - a) 
+srat = 2.0*exp(fs(g1s, g2s, p))
+uwerr(srat)
+println("Computation of scale factor (should be 21.86(42))")
+println("  From direct evaluation:   ", srat)
+
+fint(x, p) = - (p[1] + p[2]*x^2 + p[3]*x^4)/x^3
+g1 = sqrt(g1s)
+g2 = sqrt(g2s)
+sint = 2.0*exp(-int_error(fint, g1, g2, p))
+uwerr(sint)
+print("  From integral evaluation: ")
+details(sint)
+( (abs(err(srat)-err(sint)) < 1.0E-10) && (abs(value(srat)-value(sint)) < 1.0E-10) )
+
+
+v = value.(p)
+ff(x) = fint(x, v)
+
+a = value(g1)
+b = g2
+(di,foo) = quadgk(ff, a, b)
+println(di)
+
+function ftest(p)
+    ff(x) = fint(x, p)
+    (di,foo) = quadgk(ff, a, b)
+    return di
+end
+
+function simps(f::Function, a, b, tol::Float64 = 1.0E-8)
+    function trap(al, bl, f::Function, sum, n)
+        nt = 2^(n-2)
+        hh = (bl-al)/nt
+        
+        x = al+hh/2.0
+        val = al + f(x)
+        for i in 2:nt
+            x    = x + hh
+            val += f(x)
+        end
+        return ( sum + (bl-al)*val/nt ) / 2.0
+    end
+    al = min(a,b)
+    bl = max(a,b)
+    s1 = (bl-al)*(f(al) + f(bl))/2.0
+    s  = s1
+    for i in 2:1000
+        s2 = trap(al, bl, f, s1, i)
+        err = s - (4.0*s2 - s1)/3.0
+        s   = s - err
+        println(s1, " ", s2, " ", err)
+        if (abs(err)<tol)
+            if (b<a)
+                return -s
+            else
+                return s
+            end
+        end
+        s1 = s2
+    end
+end
+
+function ftest2(p)
+    ff(x) = fint(x, p)
+    di = simps(ff, p[4], p[5])
+    return di
+end
+
+vv = copy(v)
+push!(vv, a)
+push!(vv, b)
+println(ftest(vv))
+println(ftest2(vv))
+hess = ForwardDiff.hessian(ftest, v)
+println(hess)
+hess = ForwardDiff.hessian(ftest2, vv)
+println(hess)
--- a/test/test_cov2.jl
+++ b/test/test_cov2.jl
+using ADerrors, LinearAlgebra # hide
+a = uwreal([1.3, 0.01], 1) # 1.3 +/- 0.01
+b = uwreal([5.3, 0.23], 2) # 5.3 +/- 0.23
+uwerr(a)
+uwerr(b)
+
+x = [a+b, a-b]
+mat = ADerrors.cov(x)
+( (mat[1,1] - mat[2,2]) < 1.0E-10) && (abs(mat[1,2] - (err(a)^2-err(b)^2)) < 1.0E-10) )
--- a/test/test_trcov.jl
+++ b/test/test_trcov.jl
+using ADerrors, LinearAlgebra # hide
+a = uwreal([1.3, 0.01], 1) # 1.3 +/- 0.01
+b = uwreal([5.3, 0.23], 2) # 5.3 +/- 0.23
+c = uwreal(rand(2000), 3)
+
+x = [a+b+sin(c), a-b+cos(c), c-b/a]
+M = [1.0 0.2 0.1
+     0.2 2.0 0.3
+     0.1 0.3 1.0]
+
+mcov = cov(x)
+d = tr(mcov * M)
+(abs(d -trcov(M, x)) < 1.0E-10)