tutorial.md 9.46 KB
 Alberto Ramos committed Jul 14, 2020 1 2 3 # Getting started ## Basic tutorial  Alberto Ramos committed Jul 17, 2020 4 5 6 7 ADerrors.jl is a package for error propagation and analysis of Monte carlo data. At the core of the package is the uwreal data type. It contains variables with uncertainties. In very simple cases this is just a central value and an error  Alberto Ramos committed Jul 14, 2020 8 9 10 11 @repl gs using ADerrors a = uwreal([1.0, 0.1], 1) # 1.0 +/- 0.1   Alberto Ramos committed Jul 17, 2020 12 13 But it also supports the case where the uncertainty comes from Monte Carlo measurements  Alberto Ramos committed Jul 14, 2020 14 @repl gs  Alberto Ramos committed Jul 17, 2020 15 # Generate some MC correlated data  Alberto Ramos committed Jul 14, 2020 16 17 18 19 20 21 22 23 24 25 26 27 28 eta = randn(1000); x = Vector{Float64}(undef, 1000); x[1] = 0.0; for i in 2:1000 x[i] = x[i-1] + eta[i] if abs(x[i]) > 1.0 x[i] = x[i-1] end end b = uwreal(x.^2, 200) c = uwreal(x.^4, 200)   Alberto Ramos committed Jul 17, 2020 29 Correlations between variables are treated consistently. Each call to uwreal contains an ensemble ID tag. In the previous examples a has been measured on ensemble ID 1, while both b and c have been measured on ensemble ID 200. Monte Carlo measurements with the same ID tag are assumed to be measured on the same configurations: b and c above are statistically correlated, while a will be uncorrelated with both b and c.  Alberto Ramos committed Jul 14, 2020 30 31 32 33 34 35 36  One can perform operations with uwreal variables as if there were normal floats @repl gs d = 2.0 + sin(b)/a + 2.0*log(c)/(a+b+1.0)  Error propagation is done automatically, and correlations are consistently taken into account. Note that once complex derived uwreal variables are defined, as for example d above, they will in general depend on more than one ensemble ID.  Alberto Ramos committed Jul 17, 2020 37 38 The message "(Error not available... maybe run uwerr)" in all the above statements suggests that newly defined uwreal variables do not know their error yet. In order to perform the error analysis of a variable, one should use the uwerr function  Alberto Ramos committed Jul 14, 2020 39 40 41 42 @repl gs uwerr(d); println("d: ", d)   Alberto Ramos committed Jul 17, 2020 43 A call to uwerr will apply the \Gamma-method to MC ensembles. ADerrors.jl can give detailed information on the error of a variable (this requires that uwerr has been called)  Alberto Ramos committed Jul 14, 2020 44 45 46 47 @repl gs println("Details on variable d: ") details(d)   Alberto Ramos committed Jul 17, 2020 48 Here we can see that there are two ensembles contributing to the uncertainty in d. We recognize this as d being a function of both a (measured on ensemble 1), and b and c, measured on ensemble 200. Most of the error in d comes in fact from ensemble 200.  Alberto Ramos committed Jul 14, 2020 49   Alberto Ramos committed Jul 17, 2020 50 Note that one does not need to use uwerr on a variable unless one is interested in the error on that variable. For example, continuing with the previous example  Alberto Ramos committed Jul 14, 2020 51 52 53 54 55 56 57 58 59 @repl gs global x = 1.0 for i in 1:100 global x = x - (d*cos(x) - x)/(-d*sin(x) - 1.0) end uwerr(x) print("Root of d*cos(x) - x:" ) details(x)   Alberto Ramos committed Jul 17, 2020 60 determines the root on d\cos(x) - x by using Newton's method, and propagates the error on d into an error on the root. The error on x is only determined once, after the 100 iterations are over. This feature is useful becase calls to uwerr are typically numerically expensive in comparison with mathematical operations of uwreal's.  Alberto Ramos committed Jul 14, 2020 61 62 63 64 65  ## Advanced topics ### Errors in fit parameters  Alberto Ramos committed Jul 17, 2020 66 ADerrors.jl does not provide an interface to perform fits, but once the minima of the \chi^2 has been found, it can return the fit parameters as uwreal variables.  Alberto Ramos committed Jul 14, 2020 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86  Here we are going to [repeat the example](https://github.com/JuliaNLSolvers/LsqFit.jl) of the package LsqFit.jl using ADerrors.jl for error propagation. @repl fits using LsqFit, ADerrors # a two-parameter exponential model # x: array of independent variables # p: array of model parameters # model(x, p) will accept the full data set as the first argument x. # This means that we need to write our model function so it applies # the model to the full dataset. We use @. to apply the calculations # across all rows. @. model(x, p) = p[1]*exp(-x*p[2]) # some example data # xdata: independent variables # ydata: dependent variable as uwreal xdata = range(0, stop=10, length=20); ydata = Vector{uwreal}(undef, length(xdata)); for i in eachindex(ydata)  Alberto Ramos committed Jul 17, 2020 87 88  # Note that we assign to point i the ID 1000*i ydata[i] = uwreal([model(xdata[i], [1.0 2.0]) + 0.01*getindex(randn(1),1), 0.01], 1000*i)  Alberto Ramos committed Jul 14, 2020 89 90 91 92  uwerr(ydata[i]) # We will need the errors for the weights of the fit end p0 = [0.5, 0.5];   Alberto Ramos committed Jul 17, 2020 93 94 95 96 97 98 We are ready to fit (xdata, ydata) to our model using LsqFit.jl @repl fits # Fit the data with LsqFit.jl fit = curve_fit(model, xdata, value.(ydata), 1.0 ./ err.(ydata).^2, p0)  In order to use ADerrors.jl we need the \chi^2 function.  Alberto Ramos committed Jul 14, 2020 99 100 @repl fits # ADerrors will need the chi^2 as a function of the fit parameters and  Alberto Ramos committed Jul 17, 2020 101 # the data. This can be constructed easily from the model function above  Alberto Ramos committed Jul 14, 2020 102 103 chisq(p, d) = sum( (d .- model(xdata, p)) .^2 ./ 0.01^2)   Alberto Ramos committed Jul 17, 2020 104 A call to fit_error will return the fit parameters as a Vector{uwreal} and also the expected value of the \chi^2 as a Float64  Alberto Ramos committed Jul 14, 2020 105 @repl fits  Alberto Ramos committed Jul 17, 2020 106 107 108 109 110 111 112 113 114 # This is error propagation with ADerrors.jl (fitp, cexp) = fit_error(chisq, coef(fit), ydata); println("Fit results: \n chi^2 / chi^2_exp: ", sum(fit.resid .^2), " / ", cexp, " (dof: ", dof(fit), ")") # fitp are the fit parameters in uwreal # format. One can use them as any other uwreal # For example to determine the main contribution # to its uncertainty  Alberto Ramos committed Jul 14, 2020 115 for i in 1:2  Alberto Ramos committed Jul 17, 2020 116 117 118  uwerr(fitp[i]) print("Fit parameter: ", i, ": ") details(fitp[i])  Alberto Ramos committed Jul 14, 2020 119 120 121 122 123 124 125 126 127 end  !!! note ADerrors.jl is completely agnostic about fitting the data. Error propagation is performed once the minimum of the \chi^2 function is known, and it does not matter how this minima is found. One is not constrained to use LsqFit.jl, as there are many alternatives in the Julia ecosystem: LeastSquaresOptim.jl, MINPACK.jl, Optim.jl, ... ### Missing measurements  Alberto Ramos committed Jul 17, 2020 128 ADerrors.jl can deal with observables that are not measured on every configuration, and still take correctly the correlations/autocorrelations into account. Here we show an extreme case where one observable is measured only in the even configurations, while the other is measured on the odd configurations.  Alberto Ramos committed Jul 14, 2020 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 @repl gaps using ADerrors, Plots pgfplotsx(); # Generate some correlated data eta = randn(10000); x = Vector{Float64}(undef, 10000); x[1] = 0.0; for i in 2:10000 x[i] = x[i-1] + 0.2*eta[i] if abs(x[i]) > 1.0 x[i] = x[i-1] end end # x^2 only measured on odd configurations x2 = uwreal(x[1:2:9999].^2, 1001, collect(1:2:9999), 10000) # x^4 only measured on even configurations x4 = uwreal(x[2:2:10000].^4, 1001, collect(2:2:10000), 10000)   Alberto Ramos committed Jul 17, 2020 149 150 151 152 153 154 the variables x2 and x4 are normal uwreal variables, and one can work with them as with any other variable. Note however that the automatic determination of the summation window can fail in these cases, because the autocorrelation function with missing measurements can be very complicated. For example @repl gaps rat = x2/x4 # This is perfectly fine uwerr(rat); # Use automatic window: might fail!  In this case uwerr complains because with the automatically chosen window the variance for ensemble with ID 1001 is negative. Let's have a look at the normalized autocorrelation function  Alberto Ramos committed Jul 14, 2020 155 156 157 158 159 160 161 @repl gaps iw = window(rat, 1001) r = rho(rat, 1001); dr = drho(rat, 1001); plot(collect(1:100), r[1:100], yerr = dr[1:100],  Alberto Ramos committed Jul 17, 2020 162  seriestype = :scatter, title = "Chosen Window: " * string(iw), label="autoCF")  Alberto Ramos committed Jul 14, 2020 163 164 165 savefig("rat_cf.png") # hide  ![rat plot](rat_cf.png)  Alberto Ramos committed Jul 17, 2020 166 The normalized autocorrelation function is oscillating, and the automatically chosen window is 2, producing a negative variance! We better fix the window to 50 for this case  Alberto Ramos committed Jul 14, 2020 167 168 169 @repl gaps wpm = Dict{Int64, Vector{Float64}}() wpm[1001] = [50.0, -1.0, -1.0, -1.0]  Alberto Ramos committed Jul 17, 2020 170 uwerr(rat, wpm) # repeat error analysis with our choice (window=50)  Alberto Ramos committed Jul 14, 2020 171 172 println("Ratio: ", rat)   Alberto Ramos committed Jul 17, 2020 173 174 After a visual examination of the autocorrelation function and a reasonable choice of the summation window, the error in rat is determined. Note however that it is very difficult to anticipate which observables will have a strange autocorrelation function. For example the product of x2 and x4 shows a reasonably well behaved autocorrelation function.  Alberto Ramos committed Jul 14, 2020 175 176 177 178 179 180 181 182 183 @repl gaps prod = x2*x4 uwerr(prod) iw = window(prod, 1001) r = rho(prod, 1001); dr = drho(prod, 1001); plot(collect(1:2*iw), r[1:2*iw], yerr = dr[1:2*iw],  Alberto Ramos committed Jul 17, 2020 184  seriestype = :scatter, title = "Chosen Window: " * string(iw), label="autoCF")  Alberto Ramos committed Jul 14, 2020 185 186 187 savefig("prod_cf.png") # hide  ![pod plot](prod_cf.png)  Alberto Ramos committed Jul 17, 2020 188 In general error analysis with data with arbitrary gaps is possible, and fully supported in ADerrors.jl. However in cases where many gaps are present, data analysis can be tricky, and certainly requires some care.  Alberto Ramos committed Jul 14, 2020 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211  ### BDIO Native format ADerrors.jl can save/read observables (i.e. uwreal's) in BDIO records. Here we describe the format of these records. All observables are stored in a single BDIO record of type BDIO_BIN_GENERIC. The record has the following data in binary little endian format - mean (Float64): The mean value of the observable. - neid (Int32): The number of ensembles contributing to the observable. - ndata (Vector{Int32}(neid)): The length of each ensemble. - nrep (Vector{Int32}(neid)): The number of replica of each enemble. - vrep (Vector{Int32}): The replica vector for each ensemble. - ids (Vector{Int32}(neid)): The ensemble ID's. - nt (Vector{Int32}(neid)): Half the largest replica of each ensemble. - zero (Vector{Float64}(neid)): just neid zeros. - four (Vector{Float64}(neid)): just neid fours. - delta (Vector{Float64}): The fluctuations for each ensemble. !!! alert Obviously this weird format is what it is for some legacy reasons, but it is strongly encouraged that new implementations respect this standard with all its weirdness.