diff --git a/Convergence/.ipynb_checkpoints/Effect of distance metric and dimension of summary statistic on computational cost of ABC model-checkpoint.ipynb b/Convergence/.ipynb_checkpoints/Effect of distance metric and dimension of summary statistic on computational cost of ABC model-checkpoint.ipynb new file mode 100644 index 0000000..677139c --- /dev/null +++ b/Convergence/.ipynb_checkpoints/Effect of distance metric and dimension of summary statistic on computational cost of ABC model-checkpoint.ipynb @@ -0,0 +1,713 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Effect of distance metric and dimension of summary statistic on computational cost of ABC model\n", + "\n", + "The aim of this notebook is to study how the ELFI environment can be used to investigate how the dimension of the summary statistic and the chosen distance metric affect the computational cost of an ABC model.\n", + "\n", + "This notebook was done as part of the course CS-E4070 - Special Course in Machine Learning and Data Science: Seminar Course on Approximate Bayesian Computation, in spring 2018 at Aalto University.\n", + "\n", + "Author: [Ossi Galkin](https://github.com/OssiGalkin)\n", + "\n", + "## Theoretical background\n", + "\n", + "According to the paper \"The Rate of Convergence for Approximate Bayesian Computation\" (Stuart Barber, Jochen Voss, Mark Webster, 2015, https://arxiv.org/abs/1311.2038), the error of an ABC estimate is affected both by the bias of the ABC samples, controlled by the tolerance parameter $\\delta$, and by the Monte Carlo error, controlled by the number $n$ of accepted ABC samples.\n", + "\n", + "The computational cost of an ABC model satisfies\n", + "\n", + "$$ cost \\sim n \\cdot \\delta^{-q} $$\n", + "\n", + "where $n$ is the number of accepted samples, $q$ is the dimension of the summary statistic, and $\\delta$ is the tolerance parameter. 
Under the optimal choice of $\\delta$, the error satisfies\n", + "\n", + "$$ error \\sim cost^{-2/(q+4)} $$\n", + "\n", + "The tolerance and the number of accepted samples are plain numbers that can be chosen freely. The dimension of the summary statistic, however, is more complex, as it depends on what kind of ABC model is used. Therefore, this notebook focuses on the dimension of the summary statistic.\n", + "\n", + "\n", + "## How dimension of summary statistic affects the cost of ABC\n", + "A summary statistic needs to be something that truly describes the phenomenon being investigated, so to study how its dimension affects the computational cost of an ABC model, the model must concern something that can be summarized in multiple dimensions. Therefore, in this notebook, the ABC models try to infer the means of a 6-variate normal distribution, and the summary statistics are sample means computed over different numbers of axes. The 6-dimensional summary statistic computes a sample mean over each of the six axes separately. A 3-dimensional one computes 3 different means, so at least one of those three means must be computed over more than one axis. Because there are multiple ways to choose over which particular axes those 3 means are computed, more than one 3-dimensional summary statistic is investigated.\n", + "\n", + "This example follows the same conventions that are used in the [ELFI tutorial](https://elfi.readthedocs.io/en/latest/usage/tutorial.html). 
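As a quick numerical illustration of the cost relation above (a back-of-the-envelope sketch that uses only $cost \sim n \cdot \delta^{-q}$ and ignores the proportionality constant, which depends on the model):

```python
# Sketch of the scaling cost ~ n * delta**(-q): for a fixed tolerance delta,
# the expected number of simulations needed for n accepted samples grows
# exponentially in the dimension q of the summary statistic.
def expected_cost(n, delta, q):
    """Expected simulation count for n accepted samples, up to a constant factor."""
    return n * delta ** (-q)

for q in (1, 2, 3, 6):
    # n=100 accepted samples, tolerance delta=0.5, as in the experiments below
    print(q, expected_cost(100, 0.5, q))  # 200.0, 400.0, 800.0, 6400.0
```

Only the growth rate is meaningful here; the constant in front depends on the model and the summary statistic, so the absolute simulation counts observed below will differ.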
Basically, a 6-variate normal distribution is created with means [1, 3, -2, 0, -4, 0] and covariance matrix:\n", + "\n", + "    [[ 0.54283214, -0.01897608, 0.09250538, -0.29508488, 0.20071569, 0.10224879],\n", + "    [-0.01897608, 3.17949029, -0.16261362, 3.01116863, -0.05390658, 0.01796679],\n", + "    [ 0.09250538, -0.16261362, 0.60222194, 0.0543998 , 0.0066263 , -0.01040673],\n", + "    [-0.29508488, 3.01116863, 0.0543998 , 3.79268361, -0.1132359 , -0.07189611],\n", + "    [ 0.20071569, -0.05390658, 0.0066263 , -0.1132359 , 0.46941068, 0.14608359],\n", + "    [ 0.10224879, 0.01796679, -0.01040673, -0.07189611, 0.14608359, 0.49294955]]\n", + "    \n", + "and the ABC model is set to generate samples from that distribution. This is not a very complicated problem, and it could be solved analytically without ABC methods, but it makes it easy to investigate how the dimension of the summary statistic affects the results.\n", + "\n", + "In each case, 100 samples are drawn from the model with tolerance 0.5. This is done for 6-, 3-, 2-, and 1-dimensional summary statistics. 
In addition, two different summary statistics are used for the 3- and 2-dimensional cases.\n", + "\n", + "The following code cell defines the model and the 6-dimensional summary statistic, and performs all the necessary imports.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "# -*- coding: utf-8 -*-\n", + "\"\"\"\n", + "Created on Fri Apr 13 09:34:36 2018\n", + "\n", + "@author: ossig\n", + "\"\"\"\n", + "import elfi\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "import logging\n", + "import scipy as sp\n", + "from sklearn import datasets\n", + "import scipy.stats as ss\n", + "\n", + "logging.basicConfig(level=logging.INFO) # sometimes this is required to enable logging inside Jupyter\n", + "\n", + "#elfi.set_client('multiprocessing')\n", + "\n", + "%matplotlib inline\n", + "%precision 2\n", + "\n", + "# Set an arbitrary seed and a global random state to keep the randomly generated quantities the same between runs\n", + "# NoteToSelf: Statement above is not true, see bug 268\n", + "seed = 20170530\n", + "np.random.seed(seed)\n", + "\n", + "\n", + "# Prior indexing starts at 1, as in the ELFI introduction; otherwise indexing starts at 0, though\n", + "prior1 = elfi.Prior('uniform', -2, 4)\n", + "prior2 = elfi.Prior('uniform', 1, 4)\n", + "prior3 = elfi.Prior('uniform', -3, 4)\n", + "prior4 = elfi.Prior('uniform', -2, 4)\n", + "prior5 = elfi.Prior('uniform', -6, 4)\n", + "prior6 = elfi.Prior('uniform', -1, 4)\n", + "\n", + "# Set the generating parameters that we will try to infer\n", + "mean0 = [1]\n", + "mean1 = [3]\n", + "mean2 = [-2]\n", + "mean3 = [0]\n", + "mean4 = [-4]\n", + "mean5 = [0]\n", + "\n", + "cov = datasets.make_spd_matrix(6)\n", + "\n", + "\n", + "# Simulator for the 6-variate distribution\n", + "def simulator(mean0, mean1, mean2, mean3, mean4, mean5, random_state=None, batch_size=1):\n", + "    assert batch_size == 1, \" Batch sizes other than 1 won't work 
:(\"\n", + "    mean0, mean1, mean2, mean3, mean4, mean5 = np.atleast_1d(mean0, mean1, mean2, mean3, mean4, mean5)\n", + "    \n", + "    # wrap the means into a 1d list\n", + "    means = []\n", + "    means.append(mean0[0])\n", + "    means.append(mean1[0])\n", + "    means.append(mean2[0])\n", + "    means.append(mean3[0])\n", + "    means.append(mean4[0])\n", + "    means.append(mean5[0])\n", + "    \n", + "    # the following takes the global variable cov as input, as I didn't figure out how to pass constants in ELFI\n", + "    return sp.stats.multivariate_normal.rvs(mean=means, cov=cov, size=(batch_size, 30), random_state=random_state)\n", + "    \n", + "\n", + "def mean(y, i=0, j=None):\n", + "    # NoteToSelf: y is not a numpy array\n", + "    # NoteToSelf: if batch_size = 1 elfi uses different dimensions.\n", + "    #print(np.shape(np.mean(y[i, :])), np.shape(np.mean(np.mean(y[i:j, :], axis=1))))\n", + "    #print(\"debug\")\n", + "    if j is None:\n", + "        return np.mean(y[i, :])  # this needs something\n", + "    else:\n", + "        return np.mean(np.mean(y[i:j, :], axis=1))  # NoteToSelf: axis=0 or axis=1 ?\n", + "\n", + "    \n", + "# Generate some data\n", + "y0 = simulator(mean0, mean1, mean2, mean3, mean4, mean5)\n", + "\n", + "# Add the simulator node and observed data to the model\n", + "sim = elfi.Simulator(simulator, prior1, prior2, prior3, prior4, prior5, prior6, observed=y0)\n", + "\n", + "# Add summary statistics to the model\n", + "S1 = elfi.Summary(mean, sim, 0)\n", + "S2 = elfi.Summary(mean, sim, 1)\n", + "S3 = elfi.Summary(mean, sim, 2)\n", + "S4 = elfi.Summary(mean, sim, 3)\n", + "S5 = elfi.Summary(mean, sim, 4)\n", + "S6 = elfi.Summary(mean, sim, 5)\n", + "\n", + "# Specify the distance as euclidean between the summary vectors (S1, ..., S6) from simulated and\n", + "# observed data\n", + "\n", + "d = elfi.Distance('euclidean', S1, S2, S3, S4, S5, S6)\n", + "\n", + "pool = elfi.OutputPool(['prior1', 'prior2', 'prior3', 'prior4', 'prior5', 'prior6', 'S1', 'S2', 'S3', 'S4', 'S5', 'S6'])\n", + "\n", + "amountOfSimulations = []\n", + 
"amountOfDimensions = [] \n", + "\n", + "tolerance = 0.5\n", + "N = 100\n" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "6 Method: Rejection\n", + "Number of samples: 100\n", + "Number of simulations: 199485\n", + "Threshold: 0.5\n", + "Sample means: prior1: -0.31, prior2: 2.7, prior3: -1.35, prior4: -0.203, prior5: -4.39, prior6: 0.626\n", + "\n" + ] + } + ], + "source": [ + "# NoteToSelf: why seed is passed again?\n", + "rej = elfi.Rejection(d, batch_size=1, seed=30052017, pool=pool) \n", + "res = rej.sample(N, threshold=tolerance)\n", + "print(\"6\", res)\n", + "\n", + "amountOfSimulations.append(res.n_sim)\n", + "amountOfDimensions.append(6)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now change summary statistic and generate samples for them." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "3 Method: Rejection\n", + "Number of samples: 100\n", + "Number of simulations: 4337\n", + "Threshold: 0.498\n", + "Sample means: prior1: -0.519, prior2: 2.74, prior3: -1.58, prior4: -0.345, prior5: -4.79, prior6: 0.516\n", + "\n", + "3 Method: Rejection\n", + "Number of samples: 100\n", + "Number of simulations: 1862\n", + "Threshold: 0.499\n", + "Sample means: prior1: -0.299, prior2: 3.41, prior3: -1.37, prior4: 0.11, prior5: -4.51, prior6: 0.653\n", + "\n", + "2 Method: Rejection\n", + "Number of samples: 100\n", + "Number of simulations: 570\n", + "Threshold: 0.496\n", + "Sample means: prior1: -0.527, prior2: 2.99, prior3: -1.65, prior4: -0.314, prior5: -4.6, prior6: 0.588\n", + "\n", + "2 Method: Rejection\n", + "Number of samples: 100\n", + "Number of simulations: 333\n", + "Threshold: 0.488\n", + "Sample means: prior1: -0.411, prior2: 2.64, prior3: -1.3, prior4: -0.401, prior5: -4.47, prior6: 0.361\n", + "\n", + 
"1 Method: Rejection\n", + "Number of samples: 100\n", + "Number of simulations: 191\n", + "Threshold: 0.499\n", + "Sample means: prior1: -0.373, prior2: 2.77, prior3: -1.3, prior4: -0.414, prior5: -4.24, prior6: 0.555\n", + "\n" + ] + } + ], + "source": [ + "# Three dimension summary statistic\n", + "\n", + "S12 = elfi.Summary(mean, sim, 0, 1)\n", + "S34 = elfi.Summary(mean, sim, 2, 3)\n", + "S56 = elfi.Summary(mean, sim, 4, 5)\n", + "\n", + "d.become(elfi.Distance('euclidean', S12, S34, S56))\n", + "\n", + "rej = elfi.Rejection(d, batch_size=1, seed=30052017, pool=pool) \n", + "res = rej.sample(N, threshold=tolerance)\n", + "print(\"3\", res)\n", + "\n", + "amountOfSimulations.append(res.n_sim)\n", + "amountOfDimensions.append(3)\n", + "\n", + "# Three dimension summary statistic ((different than previous)\n", + "\n", + "S1 = elfi.Summary(mean, sim, 0)\n", + "S2345 = elfi.Summary(mean, sim, 1, 4)\n", + "S6 = elfi.Summary(mean, sim, 5)\n", + "\n", + "\n", + "d.become(elfi.Distance('euclidean', S1, S2345, S6))\n", + "\n", + "rej = elfi.Rejection(d, batch_size=1, seed=30052017, pool=pool) \n", + "res = rej.sample(N, threshold=tolerance)\n", + "print(\"3\", res)\n", + "\n", + "amountOfSimulations.append(res.n_sim)\n", + "amountOfDimensions.append(3)\n", + "\n", + "# Two dimension summary statistic\n", + "\n", + "S12 = elfi.Summary(mean, sim, 0, 1)\n", + "S3456 = elfi.Summary(mean, sim, 2, 5)\n", + "\n", + "d.become(elfi.Distance('euclidean', S12, S3456))\n", + "\n", + "rej = elfi.Rejection(d, batch_size=1, seed=30052017, pool=pool) \n", + "res = rej.sample(N, threshold=tolerance)\n", + "print(\"2\", res)\n", + "\n", + "amountOfSimulations.append(res.n_sim)\n", + "amountOfDimensions.append(2)\n", + "\n", + "\n", + "# Two dimension summary statistic (different than previous)\n", + "\n", + "S123 = elfi.Summary(mean, sim, 0, 2)\n", + "S456 = elfi.Summary(mean, sim, 3, 5)\n", + "\n", + "d.become(elfi.Distance('euclidean', S123, S456))\n", + "\n", + "rej = 
elfi.Rejection(d, batch_size=1, seed=30052017, pool=pool) \n", + "res = rej.sample(N, threshold=tolerance)\n", + "print(\"2\", res)\n", + "\n", + "amountOfSimulations.append(res.n_sim)\n", + "amountOfDimensions.append(2)\n", + "\n", + "# One dimension summary statistic\n", + "\n", + "S1233456 = elfi.Summary(mean, sim, 0, 5)\n", + "\n", + "d.become(elfi.Distance('euclidean', S1233456))\n", + "\n", + "rej = elfi.Rejection(d, batch_size=1, seed=30052017, pool=pool) \n", + "res = rej.sample(N, threshold=tolerance)\n", + "print(\"1\", res)\n", + "\n", + "amountOfSimulations.append(res.n_sim)\n", + "amountOfDimensions.append(1)\n" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[]" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXwAAAEOCAYAAACKDawAAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAGRFJREFUeJzt3X10XNV57/HvT0hILZFxCjaNQ7F5CRBKgqHCkMBNMDeLQpKaQItbXkqhNDRNmkC4CZCWNCSX3kK5ZFHatMWFQMJbLxdIA128FZcXJ6UYGQg4AeMWHOKEW8tAsCGxQOi5f5wjGIQ02pJmSzM6v89as2bmnH32eWaO9MyePfvso4jAzMxmvrbpDsDMzKaGE76ZWUU44ZuZVYQTvplZRTjhm5lVhBO+mVlFOOGbmVWEE76ZWUU44ZuZVYQTvplZRbRPdwC1tt9++1iwYMF0h2Fm1jJWrVq1MSLmpJRtqoS/YMECent7pzsMM7OWIemHqWXdpWNmVhFO+GZmFeGEb2ZWEU74ZmYVkTXhS5ot6QZJT0h6XNL7cu7PzKzVbNi0haWX3s+GzVuy7yt3C/+vgNsjYk9gH+DxzPszM2splyxfy4PrnueSu9Zm35dyXeJQ0izge8AukbiTnp6e8LBMM6uCPc65jf6Bwbcs72xvY815RyTXI2lVRPSklM3Zwt8F6AOukPSwpMskbTO8kKRTJfVK6u3r68sYjplZ81hx5mKWLJxHV0eRhrs62jhy4TxWnLU42z5zJvx2YD/g7yJiX+Bl4OzhhSJiWUT0RETPnDlJJ4uZmbW8ubO66O5sp39gkM72NvoHBunubGdud1e2feY803Y9sD4iHiif38AICd/MrKo2vtTP8QfM57hFO3Htymfoy/zDbbaEHxH/T9KPJO0REWuA/w78INf+zMxazaW/+0bX+3kf2zv7/nLPpfNp4BpJWwNPASdn3p+ZmY0ia8KPiEeApF+PzcwsL59pa2ZWEU74ZmYV4YRvZlYRTvhmZhXhhG9mVhFO+GZmFeGEb2ZWEU74ZmYV4YRvZlYRTvhmZhXhhG9mVhFO+GZmFeGEb2ZWEU74ZmYV4YRvZlY
RTvhmZhXhhG9mVhFjJnxJO0vqqnn+C5IW5AzKzMwaL6WF/3+BwZrnr5XLzMyshaQk/PaIeGXoSfl463whmZlZDikJv0/SkqEnko4ENuYLyczMcmhPKPMJ4BpJfwMI+BFwYtaozMys4cZM+BHxn8CBkt4GKCI25w/LzMwabdSEL+mEiLha0hnDlgMQEV/NHJuZmTVQvRb+NuV991QEYmZmeY2a8CPi0vL+y1MXjpmZ5ZJy4tVfSpolqUPSckkbJZ0wFcGZmVnjpAzLPCwiNgEfBdYDuwOfzxqVmZk1XErC7yjvPwxcFxHPZ4zHzMwySRmHf4ukJ4CfA5+UNAfYkjcsMzNrtJRx+GdLugDYFBGvSXoZODKlcknrgM0U8+8MRETPZII1M7OJqzcO/9CI+FdJR9csqy1yU+I+FkeEp2IwM5tm9Vr4HwT+FfiNEdYF6QnfzMyaQL1x+F8qH34lIp6uXSdp58T6A7hTUgCXRsSyiYVpZmaTlTJK58YRlt2QWP9BEbEfcATwKUkfGF5A0qmSeiX19vX1JVZrZmbjVa8Pf0/gV4Fta/vxgVlA18hbvVlE/KS83yDpW8Ai4L5hZZYBywB6enpiXNGbmVmyen34e1CcbDWbN/fjbwY+PlbFkrYB2iJic/n4MOArk4jVzMwmoV4f/reBb0t6X0TcP4G6dwC+VY7saQeujYjbJxammZlNVsqJVw9L+hRF987rXTkR8fv1NoqIp4B9JheemZk1SsqPtlcBvwz8OnAvsCNFt46ZmbWQlIS/W0R8EXg5Ir4BfAR4T96wzMys0VIS/qvl/U8l7Q1sCyzIFpGZmWWR0oe/TNLbgXOAm4G3AV/MGpWZmTVcSsJfHhEvUIyf3wXGdaatmZk1idxn2pqZWZPIeqatmZk1j2xn2pqZWXPJeaatmZk1kZQ+/KMkzZLUIWm5pI2STsgemZmZNVRKwj8sIjZRdO+sB3YHPp81KjMza7iUhN9R3n8YuC4ins8Yj5mZZZIyDv8WSU8APwc+KWkOsCVvWGZm1mhjtvAj4mzgfUBPRLwK/Aw4MndgZmbWWCktfMozbYcevwy8nC0iMzPLIqUP38zMZgAnfDOzihgz4Uu6UdJHJPnDwcyshaUk8b8DjgPWSjq/nGPHzMxaTMoonbsi4nhgP2Ad8C+S/k3SyZI66m9tZmbNIqmbRtJ2wEnAHwAPA39F8QHwL9kiMzOzhhpzWKakm4A9KS5m/hsR8Wy56v9I6s0ZnJmZNU7dhF/+UPtIRBw90vqI6MkSlZmZNVzdLp2IGASOmKJYzMwso5Q+/Dsl/aYkZY/GzMyySZla4QxgG2BA0hZAQETErKyRmZlZQ42Z8COieyoCMTOzvJImT5P0duBd1Fy8PCLuyxWUmZk1XsqwzD8ATgN2BB4BDgTuBw7NG5qZmTVSyo+2pwH7Az+MiMXAvkBf1qjMzKzhUhL+lojYAiCpMyKeAPbIG5aZmTVaSh/+ekmzgX+imEfnBeAnqTuQtBXQC/w4Ij46sTDNzGyyUkbpHFU+PFfS3cC2wO3j2MdpwOOAh3GamU2j1MnT3i7pvcBmYD2wd+J2OwIfAS6bcIRmZtYQKaN0/ifFTJlPAYPl4iBtlM7FwJmAx/KbmU2zlD78pcCuEfHKeCqW9FFgQ0SsknRInXKnAqcC7LTTTuPZhZmZjUNKl85qYPYE6j4IWCJpHfCPwKGSrh5eKCKWRURPRPTMmTNnArsxM7MUKS38vwAelrQa6B9aGBFL6m0UEV8AvgBQtvA/FxEnTDxUMzObjJSE/w3gAuAx3ujDNzOzFpOS8DdGxCWT2UlE3APcM5k6zMxsclIS/ipJfwHczJu7dB7KFpWZmTVcSsLft7w/sGZZ6rBMMzNrEiln2i6eikDMzCyvlBOvZgMnAgtqy0fEZ/KFZWZmjZbSpXMr8O94lI6ZWUtLSfhdEXFG9kjMzCyrlDNtr5L0cUnvkPRLQ7fskZmZWUOltPBfAS4E/pRidA7l/S6
5gjIzs8ZLSfhnALtFxMbcwZiZWT4pXTrfB36WOxAzM8srpYX/GvBIebWr2jNtPSzTzKyFpCT8fypvZmbWwlLOtP3GVARiZmZ5pZxp+zRvjM55XUR4lI6ZWQtJ6dLpqXncBRwDeBy+mVmLGXOUTkQ8V3P7cURcjGfKNDNrOSldOvvVPG2jaPF3Z4vIzMyySOnSuajm8QCwDliaJRozM8vG8+GbmVXEmH34kk6TNEuFyyQ9JOmwqQjOzMwaJ2Vqhd+PiE3AYcBc4GTg/KxRmZlZw6UkfJX3HwauiIjv1SwzM7MWkZLwV0m6kyLh3yGpG1/5ysys5aSM0jkFWAg8FRE/k7QdRbeOmZm1kJRROoPAQzXPnwOeyxmUmZk1XkqXjpmZzQCjJnxJO09lIGZmlle9Fv4NAJKWT1EsZmaWUb0+/DZJXwJ2l3TG8JUR8dV8YZmZWaPVa+H/DrCF4kOhe4SbmZm1kFFb+BGxBrhA0qMRcdt4K5bUBdwHdJb7uSEivjThSM3MbFJSRun8m6SvSuotbxdJ2jZhu37g0IjYh2Ic/+GSDpxUtNb0NmzawtJL72fD5i3THcqbNGtcZlMpJeF/HdhMMSXyUmATcMVYG0XhpfJpR3l7y6USbWa5ZPlaHlz3PJfctXa6Q3mTZo3LbCopon4OlvRIRCwca9ko224FrAJ2A74WEWfVK9/T0xO9vb1jR21NZ49zbqN/4K0zbnS2t7HmvCOmIaJCs8Zl1iiSVkVEz9gl01r4P5d0cE3lBwE/T6k8Il4rPxh2BBZJ2nuEYE8d6i7q6+tLqdaa0IozF7Nk4Ty6Ooo/qa6ONo5cOI8VZ03v5RSaNS6z6ZAyl84ngG/W9Nu/APzeeHYSET+VdA9wOLB62LplwDIoWvjjqdeax9xZXXR3ttM/MEhnexv9A4N0d7Yzt7vLcZk1iZS5dL4H7CNpVvl8U0rFkuYAr5bJ/heADwEXTCZYa24bX+rn+APmc9yinbh25TP0NckPpM0al9lUG7MPf8IVS+8FvgFsRdF1dH1EfKXeNu7DNzMbn/H04ad06UxIRDwK7JurfjMzGx/PlmlmVhFJLXxJ7wcW1JaPiG9misnMzDIYM+FLugrYFXgEeK1cHIATvplZC0lp4fcAe0WuX3fNzGxKpPThrwZ+OXcgZmaWV0oLf3vgB5JWUkyIBkBELMkWlZmZNVxKwj83dxBmZpZfypm290raAdi/XLQyIjbkDcvMzBptzD58SUuBlcAxFNMjPyDpt3IHZmZmjZXSpfOnwP5Drfpyjpy7KC9ybmZmrSFllE7bsC6c5xK3MzOzJpLSwr9d0h3AdeXz3wZuzReSmZnlkPKj7ecl/SZwECBgWUR8K3tkZmbWUElz6UTEjcCNmWMxM7OMRk34kr4TEQdL2sybLz4uimuUz8oenZmZNcyoCT8iDi7vu6cuHDMzyyVlHP6ukjrLx4dI+oyk2flDMzOzRkoZXnkj8Jqk3YDLgZ2Ba7NGZWZmDZeS8AcjYgA4Crg4Ij4LvCNvWGZm1mgpCf9VSccCvwf8c7msI19IZmaWQ0rCPxl4H/DnEfG0pJ2Bq/OGZWZmjZZy4tUPgM/UPH8aOD9nUGZm1ngp17Q9iGJO/Pll+aFx+LvkDc3MzBop5Uzby4HPAqt44yLmZmbWYlIS/osRcVv2SMzMLKuUhH+3pAuBm3jzNW0fyhaVmZk1XErCP6C876lZFsChjQ/HzMxySRmls3gqAjEzs7xS5tLZQdLlkm4rn+8l6ZT8oZmZWSOlnHh1JXAHMK98/iRweq6AzMwsj5SEv31EXA8MApTz6nh4pplZi0lJ+C9L2o7yIiiSDgReHGsjSb8i6W5Jj0v6vqTTJhmrmZlNQkrCPwO4GdhV0neBbwKfTthuAPgfEfFu4EDgU5L2mnCk02jDpi0svfR+Nmze0pL1m5lBQsIvx9t/EHg/8IfAr0bEownbPTs0Vj8iNgOPA++cXLjT45Lla3lw3fNcctfalqzfzAx
AEVG/gLQV8BFgATXDOCPiq8k7kRYA9wF7R8Sm0cr19PREb29varXZ7XHObfQPDL5leWd7G2vOO6Lp6zezmU/SqojoGbtkWpfOLcBJwHZAd80tNZi3UVw16/SRkr2kUyX1Surt6+tLrXZKrDhzMUsWzqOro3ibujraOHLhPFac1ZhTE3LXb2ZWK+VM2x0j4r0TqVxSB0WyvyYibhqpTEQsA5ZB0cKfyH5ymTuri+7OdvoHBulsb6N/YJDuznbmdne1RP1mZrVSEv5tkg6LiDvHU7EkUcy0+fh4un+azcaX+jn+gPkct2gnrl35DH0N/mE1d/1mZkNS+vCPorjCVRvwKm/Mhz9rjO0OBlYAj1GO4Qf+JCJuHW2bZuvDNzNrduPpw09p4V9EcYnDx2KsT4caEfEdig8HMzNrAik/2q4FVo8n2ZuZWfNJaeE/C9xTTp5WOx9+y/bLm5lVUUrCf7q8bV3ezMysBaXMh//lqQjEzMzyGjXhS7o4Ik6XdAvlxGm1ImJJ1sjMzKyh6rXwryrv//dUBGJmZnmNmvAjYlV5f6+kOeXj5pr7wMzMko06LFOFcyVtBJ4AnpTUJ+nPpi48MzNrlHrj8E8HDgL2j4jtIuLtwAHAQZI+OyXRmZlZw9RL+CcCx0bE00MLIuIp4IRynZmZtZB6Cb8jIjYOX1j243fkC8nMzHKol/BfmeA6MzNrQvWGZe4jaaSrUwnwhO1mZi2m3rDMraYyEDMzyytltkwzM5sBnPDNzCrCCd/MrCKc8M3MKsIJ38ysIpzwzcwqwgnfzKwinPDNzCrCCd/MrCKc8M3MKsIJ38ysIpzwzcwqwgnfzKwinPDNzCrCCd/MrCKc8M3MKiJbwpf0dUkbJK3OtQ8zM0uXs4V/JXB4xvpft2HTFpZeej8bNm+Zit2ZmbWkbAk/Iu4Dns9Vf61Llq/lwXXPc8lda6did2ZmLaneRcyb3h7n3Eb/wODrz69+4BmufuAZOtvbWHPeEdMYmZlZ85n2H20lnSqpV1JvX1/fuLZdceZiliycR1dH8TK6Oto4cuE8Vpy1OEeoZmYtbdoTfkQsi4ieiOiZM2fOuLadO6uL7s52+gcG6Wxvo39gkO7OduZ2d2WK1sysdbV0lw7Axpf6Of6A+Ry3aCeuXfkMff7h1sxsRIqIPBVL1wGHANsD/wV8KSIur7dNT09P9Pb2ZonHzGwmkrQqInpSymZr4UfEsbnqNjOz8Zv2PnwzM5saTvhmZhXhhG9mVhFO+GZmFeGEb2ZWEdmGZU6EpD7gh5OsZlvgxQaEM956UsuPVa7e+tHWjbR8+LLtgY0J8eXQiGMykTpyH5PJHA+YvmMyXf8j49nGxyTd/IhIO2s1ImbUDVg2HfWklh+rXL31o60bafnwZUBvKx+TidSR+5hM5nhM5zGZrv8RH5P8x2Ss20zs0rllmupJLT9WuXrrR1s30vJGvQ+N0IhYJlJH7mNS5eMx0Xp8TEY2JbE0VZeO5SOpNxLPxrOp4WPSfGb6MZmJLXwb2bLpDsDewsek+czoY+IWvplZRbiFb2ZWEU74ZmYV4YRvZlYRTvgVJendkv5e0g2S/mi647GCpG0krZL00emOxUDSIZJWlP8rh0x3PJPlhD+DSPq6pA2SVg9bfrikNZL+Q9LZABHxeER8AlgKzNhhaNNtPMekdBZw/dRGWS3jPCYBvAR0AeunOtZGc8KfWa4EDq9dIGkr4GvAEcBewLGS9irXLQG+Ayyf2jAr5UoSj4mkDwE/oLhCnOVzJen/Jysi4giKD+IvT3GcDeeEP4NExH3A88MWLwL+IyKeiohXgH8EjizL3xwR7weOn9pIq2Ocx2QxcCBwHPBxSf7/zGA8xyQiBsv1LwCdUxhmFi1/EXMb0zuBH9U8Xw8cUPZHHk3xR3zrNMRVZSMek4j4YwBJJwEba5KN5Tfa/8nRwK8Ds4G/mY7AGskJf+bTCMsiIu4B7pnaUKw04jF5/UHElVMXipVG+z+5Cbh
pqoPJxV8ZZ771wK/UPN8R+Mk0xWIFH5PmU4lj4oQ/8z0IvEvSzpK2Bn4HuHmaY6o6H5PmU4lj4oQ/g0i6Drgf2EPSekmnRMQA8MfAHcDjwPUR8f3pjLNKfEyaT5WPiSdPMzOrCLfwzcwqwgnfzKwinPDNzCrCCd/MrCKc8M3MKsIJ38ysIpzwrSEkhaSLap5/TtK5Dar7Skm/1Yi6xtjPMZIel3R3YvlbJc1u0L5fGmP9bEmfrHk+T9INjdi3VYcTvjVKP3C0pO2nO5Ba5bS3qU4BPhkRi1MKR8SHI+KnE4ts3GYDryf8iPhJRGT/ELSZxQnfGmUAWAZ8dviK4S30odZseTWheyVdL+lJSedLOl7SSkmPSdq1ppoPlVceenLoalCStpJ0oaQHJT0q6Q9r6r1b0rXAYyPEc2xZ/2pJF5TL/gw4GPh7SRcOK/8OSfdJeqTc5r+Vy9dJ2l7SAklPSLqsXH+NpA9J+q6ktZIWleXPlfS5mnpXS1owbF9vk7Rc0kNljEeWq84Hdi1juLDc5+pymy5JV5TlH5a0uFx+kqSbJN1exvGXNe/bleX+H5P0lmNmM5Nny7RG+hrw6FBiSbQP8G6K+cmfAi6LiEWSTgM+DZxellsAfBDYFbhb0m7AicCLEbG/pE7gu5LuLMsvAvaOiKdrdyZpHnAB8GsUc5zfKeljEfEVSYcCn4uI3mExHgfcERF/Xn5j+MURXsduwDHAqRTzshxH8QGyBPgT4GOJ78cW4KiI2FR+W/p3STcDZ5evZ2H5OhbUbPMpgIh4j6Q9y9e0e7luIbAvxTewNZL+GpgLvDMi9i7raki3lDU/t/CtYSJiE/BN4DPj2OzBiHg2IvqB/wSGEvZjFEl+yPURMRgRayk+GPYEDgNOlPQI8ACwHfCusvzK4cm+tD9wT0T0lfOnXAN8YKwYgZPL3yTeExGbRyjzdEQ8Vs5h/31geRTzlgx/HWMR8L8kPQrcRTFP+w5jbHMwcBVARDwB/BAYSvjLI+LFiNhCcTWt+RTv3y6S/lrS4cCmccRnLcwJ3xrtYoq+8G1qlg1Q/q1JErB1zbr+mseDNc8HefM30OGTPgVFcvx0RCwsbztHxNAHxsujxDfSvOd1lVdI+gDwY+AqSSeOUCzldbz+PpS6RqjneGAO8Gtla/6/RilXq95rqo3rNaA9Il6g+GZ1D8W3g8vGqN9mCCd8a6iIeJ7iItyn1CxeR9GFAsWl/DomUPUxktrKfv1dgDUUMxv+kaQOAEm7S9qmXiUU3wQ+WPa9bwUcC9xbbwNJ84ENEfEPwOXAfhOIH4r3Yb+yzv2AnUcos225r1fLvvj55fLNQPco9d5HeZnKsitnJ4r3Z0RlV1FbRNwIfJGJvx5rMe7Dtxwuophqdsg/AN+WtJLigumjtb7rWUORmHcAPhERWyRdRtFd8lD5zaGPMfrKI+JZSV8A7qZoGd8aEd8eY9+HAJ+X9CrwEsVvBxNxI290QT0IPDlCmWuAWyT1Ao8AT5RxP1f+CLwauI3i95Ihf0vxY/NjFN8iToqI/uItGdE7gSv0xjVzvzDB12MtxtMjm5lVhLt0zMwqwgnfzKwinPDNzCrCCd/MrCKc8M3MKsIJ38ysIpzwzcwqwgnfzKwi/j8Eosyw8DlZQwAAAABJRU5ErkJggg==\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plt.figure()\n", + "plt.xlabel('Number of simulations')\n", + "plt.ylabel('Dimension of summary statisic')\n", + "plt.semilogx(amountOfSimulations, amountOfDimensions, \"*\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Over all One 6 dimensional, two 3-dimensional, two 2-dimensional and one 1-dimensional summary statistic were used. Results are illustrated in plot above and each simulation are marked with star. As x-axis is logarithmic in plot above there is exponential relationship, as theory suggest, between dimension of summary statistic and number of simulations. It was intuitive for me at start, but using summary statistic that summarize more dimensions (and loose much more information) were incredibly faster, while maintaining same threshold level. \n", + "\n", + "## Different distance\n", + "\n", + "Sometimes it is not trivial or even possible to figure out different summary statistic for ABC model. However, it is possible to use different distances. Let’s make a brief scratch on those! \n", + "\n", + "Following defines same MA2 model and rejecting sampling similarly as they were used in ELFI introduction. It will also define autocorrelation as an alternative for autocovariance. For both summary statistic both Euclidian and Manhattan distance is used.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "import scipy.stats\n", + "\n", + "seed = 20170530\n", + "np.random.seed(seed)\n", + "\n", + "N = 100\n", + "\n", + "threshold = 0.5\n", + "\n", + "def MA2(t1, t2, n_obs=100, batch_size=1, random_state=None):\n", + " # Make inputs 2d arrays for numpy broadcasting with w\n", + " t1 = np.asanyarray(t1).reshape((-1, 1))\n", + " t2 = np.asanyarray(t2).reshape((-1, 1))\n", + " random_state = random_state or np.random\n", + "\n", + " w = random_state.randn(batch_size, n_obs+2) # i.i.d. 
sequence ~ N(0,1)\n", + "    x = w[:, 2:] + t1*w[:, 1:-1] + t2*w[:, :-2]\n", + "    return x\n", + "\n", + "\n", + "# define prior for t1 as in Marin et al., 2012 with t1 in range [-b, b]\n", + "class CustomPrior_t1(elfi.Distribution):\n", + "    def rvs(b, size=1, random_state=None):\n", + "        u = sp.stats.uniform.rvs(loc=0, scale=1, size=size, random_state=random_state)\n", + "        t1 = np.where(u<0.5, np.sqrt(2.*u)*b-b, -np.sqrt(2.*(1.-u))*b+b)\n", + "        \n", + "        return t1\n", + "\n", + "\n", + "# define prior for t2 conditionally on t1 as in Marin et al., 2012, in range [-a, a]\n", + "class CustomPrior_t2(elfi.Distribution):\n", + "    def rvs(t1, a, size=1, random_state=None):\n", + "        locs = np.maximum(-a-t1, t1-a)\n", + "        scales = a - locs\n", + "        t2 = sp.stats.uniform.rvs(loc=locs, scale=scales, size=size, random_state=random_state)\n", + "        return t2\n", + "\n", + "\n", + "def autocov(x, lag=1):\n", + "    # BOF: the introduction doesn't follow PEP 8; this happens elsewhere too\n", + "    C = np.mean(x[:,lag:] * x[:,:-lag], axis=1)\n", + "    #print(C.shape)\n", + "\n", + "    \n", + "    return C\n", + "\n", + "\n", + "def autocorr(x):\n", + "    # print(x.shape)\n", + "    #result = np.correlate(x[0,:], x[0,:], mode='full')\n", + "    #return result[result.size/2:]\n", + "    \n", + "    #print(x.shape)\n", + "    ret = np.correlate(x[0,:], x[0,:], mode='same')\n", + "    \n", + "    #print(ret.shape)\n", + "\n", + "    return np.mean(ret, axis=0)\n", + "\n", + "\n", + "\n", + "# true parameters\n", + "t1_true = 0.6\n", + "t2_true = 0.2\n", + "\n", + "y_obs = MA2(t1_true, t2_true)\n", + "\n", + "\n", + "t1 = elfi.Prior(CustomPrior_t1, 2)\n", + "t2 = elfi.Prior(CustomPrior_t2, t1, 1)\n", + "\n", + "Y = elfi.Simulator(MA2, t1, t2, observed=y_obs)\n", + "\n", + "pool = elfi.OutputPool(['t1', 't2', 'S1', 'S2'])\n", + "resultsArray = []\n", + "\n", + "S1 = elfi.Summary(autocov, Y)\n", + "S2 = elfi.Summary(autocov, Y, 2)  # the optional keyword lag is given the value 2\n", + "d = elfi.Distance('euclidean', S1, S2)\n", + 
"batch_size = 1" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Starting rejection sampling...\n", + "One rejection sampling done.\n", + "Two rejection sampling done.\n", + "\n", + "Three rejection sampling done.\n", + "Four rejection sampling done.\n", + "\n" + ] + } + ], + "source": [ + "print(\"Starting rejection sampling...\")\n", + "\n", + "rej = elfi.Rejection(d, pool=pool, batch_size=batch_size, seed=seed)\n", + "result = rej.sample(N, threshold=threshold)\n", + "resultsArray.append(result.n_sim)\n", + "\n", + "print(\"One rejection sampling done.\")\n", + "\n", + "# Replace the current distance with a cityblock (manhattan) distance and recreate the inference\n", + "\n", + "d.become(elfi.Distance('cityblock', S1, S2, p=1))\n", + "rej = elfi.Rejection(d, pool=pool, batch_size=batch_size, seed=seed)  # pool =10000\n", + "result = rej.sample(N, threshold=threshold)\n", + "resultsArray.append(result.n_sim)\n", + "\n", + "print(\"Two rejection sampling done.\")\n", + "\n", + "S1.become(elfi.Summary(autocorr, Y))\n", + "S2.become(elfi.Summary(autocorr, Y))\n", + "d.become(elfi.Distance('euclidean', S1, S2, p=1))\n", + "rej = elfi.Rejection(d, pool=pool, batch_size=batch_size, seed=seed)\n", + "result = rej.sample(N, threshold=threshold)\n", + "print(result.summary)\n", + "\n", + "print(\"Three rejection sampling done.\")\n", + "\n", + "d.become(elfi.Distance('cityblock', S1, S2, p=1))\n", + "rej = elfi.Rejection(d, pool=pool, batch_size=batch_size, seed=seed)\n", + "result = rej.sample(N, threshold=threshold)\n", + "resultsArray.append(result.n_sim)\n", + "\n", + "print(\"Four rejection sampling done.\")\n", + "print(result.summary)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As can be seen, the model with autocorrelation as the summary statistic produces wrong means. 
Clearly one cannot use just any summary statistic and expect convergence.\n", + "\n", + "Next, Sequential Monte Carlo ABC is used with different distance metrics." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:elfi.methods.parameter_inference:---------------- Starting round 0 ----------------\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Starting Sequential Monte Carlo sampling...\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:elfi.methods.parameter_inference:---------------- Starting round 1 ----------------\n", + "WARNING:elfi.methods.utils:SMC: It appears to be difficult to find enough valid proposals with prior pdf > 0. ELFI will keep trying, but you may wish to kill the process and adjust the model priors.\n", + "WARNING:elfi.methods.utils:SMC: It appears to be difficult to find enough valid proposals with prior pdf > 0. ELFI will keep trying, but you may wish to kill the process and adjust the model priors.\n", + "INFO:elfi.methods.parameter_inference:---------------- Starting round 2 ----------------\n", + "WARNING:elfi.methods.utils:SMC: It appears to be difficult to find enough valid proposals with prior pdf > 0. ELFI will keep trying, but you may wish to kill the process and adjust the model priors.\n", + "INFO:elfi.methods.parameter_inference:---------------- Starting round 0 ----------------\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "One SMC sampling done.\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:elfi.methods.parameter_inference:---------------- Starting round 1 ----------------\n", + "WARNING:elfi.methods.utils:SMC: It appears to be difficult to find enough valid proposals with prior pdf > 0. 
ELFI will keep trying, but you may wish to kill the process and adjust the model priors.\n", + "WARNING:elfi.methods.utils:SMC: It appears to be difficult to find enough valid proposals with prior pdf > 0. ELFI will keep trying, but you may wish to kill the process and adjust the model priors.\n", + "INFO:elfi.methods.parameter_inference:---------------- Starting round 2 ----------------\n", + "WARNING:elfi.methods.utils:SMC: It appears to be difficult to find enough valid proposals with prior pdf > 0. ELFI will keep trying, but you may wish to kill the process and adjust the model priors.\n", + "WARNING:elfi.methods.utils:SMC: It appears to be difficult to find enough valid proposals with prior pdf > 0. ELFI will keep trying, but you may wish to kill the process and adjust the model priors.\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Two SMC sampling done.\n" + ] + } + ], + "source": [ + "# define prior for t1 as in Marin et al., 2012 with t1 in range [-b, b]\n", + "class CustomPrior_t1(elfi.Distribution):\n", + "    def rvs(b, size=1, random_state=None):\n", + "        u = scipy.stats.uniform.rvs(loc=0, scale=1, size=size, random_state=random_state)\n", + "        t1 = np.where(u<0.5, np.sqrt(2.*u)*b-b, -np.sqrt(2.*(1.-u))*b+b)\n", + "        return t1\n", + "\n", + "    def pdf(x, b):\n", + "        p = 1./b - np.abs(x) / (b*b)\n", + "        p = np.where(p < 0., 0., p)  # disallow values outside of [-b, b] (affects weights only)\n", + "        return p\n", + "\n", + "\n", + "# define prior for t2 conditionally on t1 as in Marin et al., 2012, in range [-a, a]\n", + "class CustomPrior_t2(elfi.Distribution):\n", + "    def rvs(t1, a, size=1, random_state=None):\n", + "        locs = np.maximum(-a-t1, t1-a)\n", + "        scales = a - locs\n", + "        t2 = scipy.stats.uniform.rvs(loc=locs, scale=scales, size=size, random_state=random_state)\n", + "        return t2\n", + "\n", + "    def pdf(x, t1, a):\n", + "        locs = np.maximum(-a-t1, t1-a)\n", + "        scales = a - locs\n", + "        p = scipy.stats.uniform.pdf(x, 
loc=locs, scale=scales)\n", + "        p = np.where(scales>0., p, 0.)  # disallow values outside of [-a, a] (affects weights only)\n", + "        return p\n", + "\n", + "schedule = [0.7, 0.2, 0.05]\n", + "N2 = 100  # 1000000\n", + "\n", + "    \n", + "print(\"Starting Sequential Monte Carlo sampling...\")\n", + "\n", + "# Redefine the priors\n", + "t1.become(elfi.Prior(CustomPrior_t1, 2, model=t1.model))\n", + "t2.become(elfi.Prior(CustomPrior_t2, t1, 1))\n", + "S1.become(elfi.Summary(autocov, Y))\n", + "S2.become(elfi.Summary(autocov, Y))\n", + "d.become(elfi.Distance('euclidean', S1, S2))\n", + "smc = elfi.SMC(d, pool=pool, batch_size=1, seed=seed)\n", + "result_smc = smc.sample(N2, schedule)\n", + "# result_smc.sample_means_summary(all=True)\n", + "resultsArray.append(result_smc.n_sim)\n", + "\n", + "print(\"One SMC sampling done.\")\n", + "\n", + "d.become(elfi.Distance('cityblock', S1, S2))\n", + "smc = elfi.SMC(d, pool=pool, batch_size=1, seed=seed)\n", + "result_smc = smc.sample(N2, schedule)\n", + "resultsArray.append(result_smc.n_sim)\n", + "\n", + "print(\"Two SMC sampling done.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Finally, print how many simulations were needed to achieve samples with the same threshold. The first two are for rejection sampling, using first the Euclidean and then the Manhattan distance. The third is rejection sampling with autocorrelation as the summary statistic. The last two are for SMC, using first the Euclidean and then the Manhattan distance." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[614, 1042, 9498, 3076, 3792]\n" + ] + } + ], + "source": [ + "print(resultsArray)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As can be seen from the saved numbers of simulations, a model with a poor distance metric uses many more simulations. 
And with a silly, non-working summary statistic, even more runs are needed to get wrong results. \n", + "\n", + "## Final notes on ELFI\n", + "\n", + "In general, ELFI feels like a good but unstable environment for ABC computing. For example, enabling multicore computing for this notebook produces the following error on the server's console, and the Python kernel needs a reboot to recover. I didn't figure out a clear reason for this, but in general, using pools and multiple cores felt like a road to misery:\n", + "\n", + "```\n", + "Process SpawnPoolWorker-7:\n", + "Traceback (most recent call last):\n", + " File \"C:\\ProgramData\\Anaconda3\\lib\\multiprocessing\\process.py\", line 258, in _bootstrap\n", + " self.run()\n", + " File \"C:\\ProgramData\\Anaconda3\\lib\\multiprocessing\\process.py\", line 93, in run\n", + " self._target(*self._args, **self._kwargs)\n", + " File \"C:\\ProgramData\\Anaconda3\\lib\\multiprocessing\\pool.py\", line 108, in worker\n", + " task = get()\n", + " File \"C:\\ProgramData\\Anaconda3\\lib\\multiprocessing\\queues.py\", line 337, in get\n", + " return _ForkingPickler.loads(res)\n", + "AttributeError: Can't get attribute 'simulator' on \n", + "```\n", + "\n", + "In addition, I encountered other ELFI-related bugs and problems in basic operations, like running a Python file with IPython (see issues [267](https://github.com/elfi-dev/elfi/issues/267) and [268](https://github.com/elfi-dev/elfi/issues/268)).\n", + "\n", + "Also, many times it is not evident what the data type or the dimensions of the various objects manipulated in ELFI operations are. Examples of these are marked with NoteToSelf in comments in the code cells. 
" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.5" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/Convergence/Effect of distance metric and dimension of summary statistic on computational cost of ABC model.ipynb b/Convergence/Effect of distance metric and dimension of summary statistic on computational cost of ABC model.ipynb new file mode 100644 index 0000000..677139c --- /dev/null +++ b/Convergence/Effect of distance metric and dimension of summary statistic on computational cost of ABC model.ipynb @@ -0,0 +1,713 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Effect of distance metric and dimension of summary statistic on computational cost of ABC model\n", + "\n", + "The aim of this notebook is to study how ELFI environment can be used to investigate how dimension of summary static and used distance metric affects to computational cost of ABC model. 
\n", + "\n", + "This notebook was done as part of the course CS-E4070 - Special Course in Machine Learning and Data Science: Seminar Course on Approximate Bayesian Computation, in spring 2018 at Aalto University.\n", + "\n", + "Author: [Ossi Galkin](https://github.com/OssiGalkin)\n", + "\n", + "## Theoretical background\n", + "\n", + "According to the paper \"The Rate of Convergence for Approximate Bayesian Computation\" (Stuart Barber, Jochen Voss, Mark Webster, 2015, https://arxiv.org/abs/1311.2038):\n", + "\n", + "The error of an ABC estimate is affected both by the bias of the ABC samples, controlled by the tolerance parameter $\\delta$, and by Monte Carlo error, controlled by the number $n$ of accepted ABC samples.\n", + "\n", + "The computational cost of an ABC model satisfies:\n", + "\n", + "$$ cost \\sim n \\delta^{-q} $$\n", + "\n", + "where $n$ is the number of accepted samples, $q$ is the dimension of the summary statistic, and $\\delta$ is the tolerance parameter. Under the optimal choice of $\\delta$:\n", + "\n", + "$$ error \\sim cost^{-2/(q+4)} $$\n", + "\n", + "The tolerance and the number of accepted samples are plain numbers that can be freely chosen. The dimension of the summary statistic, however, is more subtle, as it depends on the kind of ABC model used. Therefore, this notebook focuses on the dimension of the summary statistic.\n", + "\n", + "\n", + "## How dimension of summary statistic affects the cost of ABC\n", + "The summary statistic needs to truly describe the phenomenon being investigated, so to study how its dimension affects the computational cost of the ABC model, the model needs to be something that can be summarized in multiple dimensions. Therefore, in this notebook, the various ABC models try to predict the means of a 6-variate normal distribution. The summary statistics are sample means calculated over different numbers of axes. The 6-dimensional summary statistic calculates sample means over all six axes. 
A 3-dimensional summary statistic computes 3 different means, so at least one of those three means must be calculated over more than one axis. Because there are multiple ways to choose over which particular axes those 3 means are calculated, more than one 3-dimensional summary statistic is investigated. \n", + "\n", + "This example follows the same conventions as the [ELFI tutorial](https://elfi.readthedocs.io/en/latest/usage/tutorial.html). Basically, a 6-variate normal distribution is created with means [1, 3, -2, 0, -4, 0] and covariance matrix:\n", + "\n", + " [[ 0.54283214, -0.01897608, 0.09250538, -0.29508488, 0.20071569, 0.10224879],\n", + " [-0.01897608, 3.17949029, -0.16261362, 3.01116863, -0.05390658, 0.01796679],\n", + " [ 0.09250538, -0.16261362, 0.60222194, 0.0543998 , 0.0066263 , -0.01040673],\n", + " [-0.29508488, 3.01116863, 0.0543998 , 3.79268361, -0.1132359 , -0.07189611],\n", + " [ 0.20071569, -0.05390658, 0.0066263 , -0.1132359 , 0.46941068, 0.14608359],\n", + " [ 0.10224879, 0.01796679, -0.01040673, -0.07189611, 0.14608359, 0.49294955]]\n", + " \n", + "and the ABC model is set to generate samples from that distribution. This is not a very complicated problem, and it could be solved with standard statistical methods without ABC, but it makes it easy to investigate how the dimension of the summary statistic affects the results.\n", + "\n", + "In each case, 100 samples are drawn from the model with tolerance 0.5. This is done for 6-, 3-, 2-, and 1-dimensional summary statistics. 
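\n", + "\n", + "Under the cost relation above, holding $n$ and $\\delta$ fixed, the relative number of simulations can be sketched with a quick back-of-the-envelope check (purely illustrative, not part of the experiment itself):\n", + "\n", + "```python\n", + "n, delta = 100, 0.5\n", + "for q in [1, 2, 3, 6]:\n", + "    print(q, n * delta**-q)  # with delta = 0.5, the cost doubles per extra dimension\n", + "```\n", + "\n", + "So with $\\delta = 0.5$, each additional summary dimension roughly doubles the expected number of simulations.\n", + "\n", + "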
In addition, two different summary statistics are used for the 3- and 2-dimensional cases.\n", + "\n", + "The following code cell defines the model and the 6-dimensional summary statistic, and performs all the necessary imports.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "# -*- coding: utf-8 -*-\n", + "\"\"\"\n", + "Created on Fri Apr 13 09:34:36 2018\n", + "\n", + "@author: ossig\n", + "\"\"\"\n", + "import elfi\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "import logging\n", + "import scipy as sp\n", + "from sklearn import datasets\n", + "import scipy.stats as ss\n", + "\n", + "logging.basicConfig(level=logging.INFO) # sometimes this is required to enable logging inside Jupyter\n", + "\n", + "#elfi.set_client('multiprocessing')\n", + "\n", + "%matplotlib inline\n", + "%precision 2\n", + "\n", + "# Set an arbitrary seed and a global random state to keep the randomly generated quantities the same between runs\n", + "# NoteToSelf: Statement above is not true, see bug 268\n", + "seed = 20170530\n", + "np.random.seed(seed)\n", + "\n", + "\n", + "# Prior numbering starts at 1, as in the ELFI introduction; other indexing starts at 0, though\n", + "prior1 = elfi.Prior('uniform', -2, 4)\n", + "prior2 = elfi.Prior('uniform', 1, 4)\n", + "prior3 = elfi.Prior('uniform', -3, 4)\n", + "prior4 = elfi.Prior('uniform', -2, 4)\n", + "prior5 = elfi.Prior('uniform', -6, 4)\n", + "prior6 = elfi.Prior('uniform', -1, 4)\n", + "\n", + "# Set the generating parameters that we will try to infer\n", + "mean0 = [1]\n", + "mean1 = [3]\n", + "mean2 = [-2]\n", + "mean3 = [0]\n", + "mean4 = [-4]\n", + "mean5 = [0]\n", + "\n", + "cov = datasets.make_spd_matrix(6)\n", + "\n", + "\n", + "# Simulator for 6 variate distribution\n", + "def simulator(mean0, mean1, mean2, mean3, mean4, mean5, random_state=None, batch_size=1):\n", + " assert batch_size == 1, \" Batch sizes other than 1 won't work 
:(\"\n", + " mean0, mean1, mean2, mean3, mean4, mean5 = np.atleast_1d(mean0, mean1, mean2, mean3, mean4, mean5)\n", + " \n", + " # wrap means into a 1d list\n", + " means = [] \n", + " means.append(mean0[0])\n", + " means.append(mean1[0])\n", + " means.append(mean2[0])\n", + " means.append(mean3[0])\n", + " means.append(mean4[0])\n", + " means.append(mean5[0])\n", + " \n", + " # the following reads the global variable cov, as I didn't figure out how to pass constants in ELFI\n", + " return sp.stats.multivariate_normal.rvs(mean=means, cov=cov, size=(batch_size, 30), random_state=random_state)\n", + " \n", + "\n", + "def mean(y, i=0, j=None):\n", + " # NoteToSelf: y is not a numpy array\n", + " # NoteToSelf: if batch_size = 1 elfi uses different dimensions.\n", + " #print(np.shape(np.mean(y[i, :])), np.shape(np.mean(np.mean(y[i:j, :], axis=1))))\n", + " #print(\"debug\")\n", + " if j is None:\n", + " return np.mean(y[i, :]) # this needs something \n", + " else:\n", + " return np.mean(np.mean(y[i:j, :], axis=1)) # NoteToSelf: axis=0 or axis=1 ? (note: j is an exclusive upper bound, i.e. rows y[i:j])\n", + "\n", + " \n", + "# Generate some data\n", + "y0 = simulator(mean0, mean1, mean2, mean3, mean4, mean5)\n", + "\n", + "# Add the simulator node and observed data to the model\n", + "sim = elfi.Simulator(simulator, prior1, prior2, prior3, prior4, prior5, prior6, observed=y0)\n", + "\n", + "# Add summary statistics to the model\n", + "S1 = elfi.Summary(mean, sim, 0)\n", + "S2 = elfi.Summary(mean, sim, 1)\n", + "S3 = elfi.Summary(mean, sim, 2)\n", + "S4 = elfi.Summary(mean, sim, 3)\n", + "S5 = elfi.Summary(mean, sim, 4)\n", + "S6 = elfi.Summary(mean, sim, 5)\n", + "\n", + "# Specify the distance as Euclidean between the summary vectors (S1, ..., S6) from simulated and\n", + "# observed data\n", + "\n", + "d = elfi.Distance('euclidean', S1, S2, S3, S4, S5, S6)\n", + "\n", + "pool = elfi.OutputPool(['prior1', 'prior2', 'prior3', 'prior4', 'prior5', 'prior6', 'S1', 'S2', 'S3', 'S4', 'S5', 'S6'])\n", + "\n", + "amountOfSimulations = []\n", + 
"amountOfDimensions = [] \n", + "\n", + "tolerance = 0.5\n", + "N = 100\n" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "6 Method: Rejection\n", + "Number of samples: 100\n", + "Number of simulations: 199485\n", + "Threshold: 0.5\n", + "Sample means: prior1: -0.31, prior2: 2.7, prior3: -1.35, prior4: -0.203, prior5: -4.39, prior6: 0.626\n", + "\n" + ] + } + ], + "source": [ + "# NoteToSelf: why seed is passed again?\n", + "rej = elfi.Rejection(d, batch_size=1, seed=30052017, pool=pool) \n", + "res = rej.sample(N, threshold=tolerance)\n", + "print(\"6\", res)\n", + "\n", + "amountOfSimulations.append(res.n_sim)\n", + "amountOfDimensions.append(6)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now change the summary statistic and generate samples for each case." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "3 Method: Rejection\n", + "Number of samples: 100\n", + "Number of simulations: 4337\n", + "Threshold: 0.498\n", + "Sample means: prior1: -0.519, prior2: 2.74, prior3: -1.58, prior4: -0.345, prior5: -4.79, prior6: 0.516\n", + "\n", + "3 Method: Rejection\n", + "Number of samples: 100\n", + "Number of simulations: 1862\n", + "Threshold: 0.499\n", + "Sample means: prior1: -0.299, prior2: 3.41, prior3: -1.37, prior4: 0.11, prior5: -4.51, prior6: 0.653\n", + "\n", + "2 Method: Rejection\n", + "Number of samples: 100\n", + "Number of simulations: 570\n", + "Threshold: 0.496\n", + "Sample means: prior1: -0.527, prior2: 2.99, prior3: -1.65, prior4: -0.314, prior5: -4.6, prior6: 0.588\n", + "\n", + "2 Method: Rejection\n", + "Number of samples: 100\n", + "Number of simulations: 333\n", + "Threshold: 0.488\n", + "Sample means: prior1: -0.411, prior2: 2.64, prior3: -1.3, prior4: -0.401, prior5: -4.47, prior6: 0.361\n", + "\n", + 
"1 Method: Rejection\n", + "Number of samples: 100\n", + "Number of simulations: 191\n", + "Threshold: 0.499\n", + "Sample means: prior1: -0.373, prior2: 2.77, prior3: -1.3, prior4: -0.414, prior5: -4.24, prior6: 0.555\n", + "\n" + ] + } + ], + "source": [ + "# Three-dimensional summary statistic\n", + "\n", + "S12 = elfi.Summary(mean, sim, 0, 1)\n", + "S34 = elfi.Summary(mean, sim, 2, 3)\n", + "S56 = elfi.Summary(mean, sim, 4, 5)\n", + "\n", + "d.become(elfi.Distance('euclidean', S12, S34, S56))\n", + "\n", + "rej = elfi.Rejection(d, batch_size=1, seed=30052017, pool=pool) \n", + "res = rej.sample(N, threshold=tolerance)\n", + "print(\"3\", res)\n", + "\n", + "amountOfSimulations.append(res.n_sim)\n", + "amountOfDimensions.append(3)\n", + "\n", + "# Three-dimensional summary statistic (different from the previous one)\n", + "\n", + "S1 = elfi.Summary(mean, sim, 0)\n", + "S2345 = elfi.Summary(mean, sim, 1, 4)\n", + "S6 = elfi.Summary(mean, sim, 5)\n", + "\n", + "\n", + "d.become(elfi.Distance('euclidean', S1, S2345, S6))\n", + "\n", + "rej = elfi.Rejection(d, batch_size=1, seed=30052017, pool=pool) \n", + "res = rej.sample(N, threshold=tolerance)\n", + "print(\"3\", res)\n", + "\n", + "amountOfSimulations.append(res.n_sim)\n", + "amountOfDimensions.append(3)\n", + "\n", + "# Two-dimensional summary statistic\n", + "\n", + "S12 = elfi.Summary(mean, sim, 0, 1)\n", + "S3456 = elfi.Summary(mean, sim, 2, 5)\n", + "\n", + "d.become(elfi.Distance('euclidean', S12, S3456))\n", + "\n", + "rej = elfi.Rejection(d, batch_size=1, seed=30052017, pool=pool) \n", + "res = rej.sample(N, threshold=tolerance)\n", + "print(\"2\", res)\n", + "\n", + "amountOfSimulations.append(res.n_sim)\n", + "amountOfDimensions.append(2)\n", + "\n", + "\n", + "# Two-dimensional summary statistic (different from the previous one)\n", + "\n", + "S123 = elfi.Summary(mean, sim, 0, 2)\n", + "S456 = elfi.Summary(mean, sim, 3, 5)\n", + "\n", + "d.become(elfi.Distance('euclidean', S123, S456))\n", + "\n", + "rej = 
elfi.Rejection(d, batch_size=1, seed=30052017, pool=pool) \n", + "res = rej.sample(N, threshold=tolerance)\n", + "print(\"2\", res)\n", + "\n", + "amountOfSimulations.append(res.n_sim)\n", + "amountOfDimensions.append(2)\n", + "\n", + "# One dimension summary statistic\n", + "\n", + "S1233456 = elfi.Summary(mean, sim, 0, 5)\n", + "\n", + "d.become(elfi.Distance('euclidean', S1233456))\n", + "\n", + "rej = elfi.Rejection(d, batch_size=1, seed=30052017, pool=pool) \n", + "res = rej.sample(N, threshold=tolerance)\n", + "print(\"1\", res)\n", + "\n", + "amountOfSimulations.append(res.n_sim)\n", + "amountOfDimensions.append(1)\n" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[]" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXwAAAEOCAYAAACKDawAAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAGRFJREFUeJzt3X10XNV57/HvT0hILZFxCjaNQ7F5CRBKgqHCkMBNMDeLQpKaQItbXkqhNDRNmkC4CZCWNCSX3kK5ZFHatMWFQMJbLxdIA128FZcXJ6UYGQg4AeMWHOKEW8tAsCGxQOi5f5wjGIQ02pJmSzM6v89as2bmnH32eWaO9MyePfvso4jAzMxmvrbpDsDMzKaGE76ZWUU44ZuZVYQTvplZRTjhm5lVhBO+mVlFOOGbmVWEE76ZWUU44ZuZVYQTvplZRbRPdwC1tt9++1iwYMF0h2Fm1jJWrVq1MSLmpJRtqoS/YMECent7pzsMM7OWIemHqWXdpWNmVhFO+GZmFeGEb2ZWEU74ZmYVkTXhS5ot6QZJT0h6XNL7cu7PzKzVbNi0haWX3s+GzVuy7yt3C/+vgNsjYk9gH+DxzPszM2splyxfy4PrnueSu9Zm35dyXeJQ0izge8AukbiTnp6e8LBMM6uCPc65jf6Bwbcs72xvY815RyTXI2lVRPSklM3Zwt8F6AOukPSwpMskbTO8kKRTJfVK6u3r68sYjplZ81hx5mKWLJxHV0eRhrs62jhy4TxWnLU42z5zJvx2YD/g7yJiX+Bl4OzhhSJiWUT0RETPnDlJJ4uZmbW8ubO66O5sp39gkM72NvoHBunubGdud1e2feY803Y9sD4iHiif38AICd/MrKo2vtTP8QfM57hFO3Htymfoy/zDbbaEHxH/T9KPJO0REWuA/w78INf+zMxazaW/+0bX+3kf2zv7/nLPpfNp4BpJWwNPASdn3p+ZmY0ia8KPiEeApF+PzcwsL59pa2ZWEU74ZmYV4YRvZlYRTvhmZhXhhG9mVhFO+GZmFeGEb2ZWEU74ZmYV4YRvZlYRTvhmZhXhhG9mVhFO+GZmFeGEb2ZWEU74ZmYV4YRvZlY
RTvhmZhXhhG9mVhFjJnxJO0vqqnn+C5IW5AzKzMwaL6WF/3+BwZrnr5XLzMyshaQk/PaIeGXoSfl463whmZlZDikJv0/SkqEnko4ENuYLyczMcmhPKPMJ4BpJfwMI+BFwYtaozMys4cZM+BHxn8CBkt4GKCI25w/LzMwabdSEL+mEiLha0hnDlgMQEV/NHJuZmTVQvRb+NuV991QEYmZmeY2a8CPi0vL+y1MXjpmZ5ZJy4tVfSpolqUPSckkbJZ0wFcGZmVnjpAzLPCwiNgEfBdYDuwOfzxqVmZk1XErC7yjvPwxcFxHPZ4zHzMwySRmHf4ukJ4CfA5+UNAfYkjcsMzNrtJRx+GdLugDYFBGvSXoZODKlcknrgM0U8+8MRETPZII1M7OJqzcO/9CI+FdJR9csqy1yU+I+FkeEp2IwM5tm9Vr4HwT+FfiNEdYF6QnfzMyaQL1x+F8qH34lIp6uXSdp58T6A7hTUgCXRsSyiYVpZmaTlTJK58YRlt2QWP9BEbEfcATwKUkfGF5A0qmSeiX19vX1JVZrZmbjVa8Pf0/gV4Fta/vxgVlA18hbvVlE/KS83yDpW8Ai4L5hZZYBywB6enpiXNGbmVmyen34e1CcbDWbN/fjbwY+PlbFkrYB2iJic/n4MOArk4jVzMwmoV4f/reBb0t6X0TcP4G6dwC+VY7saQeujYjbJxammZlNVsqJVw9L+hRF987rXTkR8fv1NoqIp4B9JheemZk1SsqPtlcBvwz8OnAvsCNFt46ZmbWQlIS/W0R8EXg5Ir4BfAR4T96wzMys0VIS/qvl/U8l7Q1sCyzIFpGZmWWR0oe/TNLbgXOAm4G3AV/MGpWZmTVcSsJfHhEvUIyf3wXGdaatmZk1idxn2pqZWZPIeqatmZk1j2xn2pqZWXPJeaatmZk1kZQ+/KMkzZLUIWm5pI2STsgemZmZNVRKwj8sIjZRdO+sB3YHPp81KjMza7iUhN9R3n8YuC4ins8Yj5mZZZIyDv8WSU8APwc+KWkOsCVvWGZm1mhjtvAj4mzgfUBPRLwK/Aw4MndgZmbWWCktfMozbYcevwy8nC0iMzPLIqUP38zMZgAnfDOzihgz4Uu6UdJHJPnDwcyshaUk8b8DjgPWSjq/nGPHzMxaTMoonbsi4nhgP2Ad8C+S/k3SyZI66m9tZmbNIqmbRtJ2wEnAHwAPA39F8QHwL9kiMzOzhhpzWKakm4A9KS5m/hsR8Wy56v9I6s0ZnJmZNU7dhF/+UPtIRBw90vqI6MkSlZmZNVzdLp2IGASOmKJYzMwso5Q+/Dsl/aYkZY/GzMyySZla4QxgG2BA0hZAQETErKyRmZlZQ42Z8COieyoCMTOzvJImT5P0duBd1Fy8PCLuyxWUmZk1XsqwzD8ATgN2BB4BDgTuBw7NG5qZmTVSyo+2pwH7Az+MiMXAvkBf1qjMzKzhUhL+lojYAiCpMyKeAPbIG5aZmTVaSh/+ekmzgX+imEfnBeAnqTuQtBXQC/w4Ij46sTDNzGyyUkbpHFU+PFfS3cC2wO3j2MdpwOOAh3GamU2j1MnT3i7pvcBmYD2wd+J2OwIfAS6bcIRmZtYQKaN0/ifFTJlPAYPl4iBtlM7FwJmAx/KbmU2zlD78pcCuEfHKeCqW9FFgQ0SsknRInXKnAqcC7LTTTuPZhZmZjUNKl85qYPYE6j4IWCJpHfCPwKGSrh5eKCKWRURPRPTMmTNnArsxM7MUKS38vwAelrQa6B9aGBFL6m0UEV8AvgBQtvA/FxEnTDxUMzObjJSE/w3gAuAx3ujDNzOzFpOS8DdGxCWT2UlE3APcM5k6zMxsclIS/ipJfwHczJu7dB7KFpWZmTVcSsLft7w/sGZZ6rBMMzNrEiln2i6eikDMzCyvlBOvZgMnAgtqy0fEZ/KFZWZmjZbSpXMr8O94lI6ZWUtLSfhdEXFG9kjMzCyrlDNtr5L0cUnvkPRLQ7fskZmZWUOltPBfAS4E/pRidA7l/S6
5gjIzs8ZLSfhnALtFxMbcwZiZWT4pXTrfB36WOxAzM8srpYX/GvBIebWr2jNtPSzTzKyFpCT8fypvZmbWwlLOtP3GVARiZmZ5pZxp+zRvjM55XUR4lI6ZWQtJ6dLpqXncBRwDeBy+mVmLGXOUTkQ8V3P7cURcjGfKNDNrOSldOvvVPG2jaPF3Z4vIzMyySOnSuajm8QCwDliaJRozM8vG8+GbmVXEmH34kk6TNEuFyyQ9JOmwqQjOzMwaJ2Vqhd+PiE3AYcBc4GTg/KxRmZlZw6UkfJX3HwauiIjv1SwzM7MWkZLwV0m6kyLh3yGpG1/5ysys5aSM0jkFWAg8FRE/k7QdRbeOmZm1kJRROoPAQzXPnwOeyxmUmZk1XkqXjpmZzQCjJnxJO09lIGZmlle9Fv4NAJKWT1EsZmaWUb0+/DZJXwJ2l3TG8JUR8dV8YZmZWaPVa+H/DrCF4kOhe4SbmZm1kFFb+BGxBrhA0qMRcdt4K5bUBdwHdJb7uSEivjThSM3MbFJSRun8m6SvSuotbxdJ2jZhu37g0IjYh2Ic/+GSDpxUtNb0NmzawtJL72fD5i3THcqbNGtcZlMpJeF/HdhMMSXyUmATcMVYG0XhpfJpR3l7y6USbWa5ZPlaHlz3PJfctXa6Q3mTZo3LbCopon4OlvRIRCwca9ko224FrAJ2A74WEWfVK9/T0xO9vb1jR21NZ49zbqN/4K0zbnS2t7HmvCOmIaJCs8Zl1iiSVkVEz9gl01r4P5d0cE3lBwE/T6k8Il4rPxh2BBZJ2nuEYE8d6i7q6+tLqdaa0IozF7Nk4Ty6Ooo/qa6ONo5cOI8VZ03v5RSaNS6z6ZAyl84ngG/W9Nu/APzeeHYSET+VdA9wOLB62LplwDIoWvjjqdeax9xZXXR3ttM/MEhnexv9A4N0d7Yzt7vLcZk1iZS5dL4H7CNpVvl8U0rFkuYAr5bJ/heADwEXTCZYa24bX+rn+APmc9yinbh25TP0NckPpM0al9lUG7MPf8IVS+8FvgFsRdF1dH1EfKXeNu7DNzMbn/H04ad06UxIRDwK7JurfjMzGx/PlmlmVhFJLXxJ7wcW1JaPiG9misnMzDIYM+FLugrYFXgEeK1cHIATvplZC0lp4fcAe0WuX3fNzGxKpPThrwZ+OXcgZmaWV0oLf3vgB5JWUkyIBkBELMkWlZmZNVxKwj83dxBmZpZfypm290raAdi/XLQyIjbkDcvMzBptzD58SUuBlcAxFNMjPyDpt3IHZmZmjZXSpfOnwP5Drfpyjpy7KC9ybmZmrSFllE7bsC6c5xK3MzOzJpLSwr9d0h3AdeXz3wZuzReSmZnlkPKj7ecl/SZwECBgWUR8K3tkZmbWUElz6UTEjcCNmWMxM7OMRk34kr4TEQdL2sybLz4uimuUz8oenZmZNcyoCT8iDi7vu6cuHDMzyyVlHP6ukjrLx4dI+oyk2flDMzOzRkoZXnkj8Jqk3YDLgZ2Ba7NGZWZmDZeS8AcjYgA4Crg4Ij4LvCNvWGZm1mgpCf9VSccCvwf8c7msI19IZmaWQ0rCPxl4H/DnEfG0pJ2Bq/OGZWZmjZZy4tUPgM/UPH8aOD9nUGZm1ngp17Q9iGJO/Pll+aFx+LvkDc3MzBop5Uzby4HPAqt44yLmZmbWYlIS/osRcVv2SMzMLKuUhH+3pAuBm3jzNW0fyhaVmZk1XErCP6C876lZFsChjQ/HzMxySRmls3gqAjEzs7xS5tLZQdLlkm4rn+8l6ZT8oZmZWSOlnHh1JXAHMK98/iRweq6AzMwsj5SEv31EXA8MApTz6nh4pplZi0lJ+C9L2o7yIiiSDgReHGsjSb8i6W5Jj0v6vqTTJhmrmZlNQkrCPwO4GdhV0neBbwKfTthuAPgfEfFu4EDgU5L2mnCk02jDpi0svfR+Nmze0pL1m5lBQsIvx9t/EHg/8IfAr0bEownbPTs0Vj8iNgOPA++cXLjT45Lla3lw3fNcctfalqzfzAx
AEVG/gLQV8BFgATXDOCPiq8k7kRYA9wF7R8Sm0cr19PREb29varXZ7XHObfQPDL5leWd7G2vOO6Lp6zezmU/SqojoGbtkWpfOLcBJwHZAd80tNZi3UVw16/SRkr2kUyX1Surt6+tLrXZKrDhzMUsWzqOro3ibujraOHLhPFac1ZhTE3LXb2ZWK+VM2x0j4r0TqVxSB0WyvyYibhqpTEQsA5ZB0cKfyH5ymTuri+7OdvoHBulsb6N/YJDuznbmdne1RP1mZrVSEv5tkg6LiDvHU7EkUcy0+fh4un+azcaX+jn+gPkct2gnrl35DH0N/mE1d/1mZkNS+vCPorjCVRvwKm/Mhz9rjO0OBlYAj1GO4Qf+JCJuHW2bZuvDNzNrduPpw09p4V9EcYnDx2KsT4caEfEdig8HMzNrAik/2q4FVo8n2ZuZWfNJaeE/C9xTTp5WOx9+y/bLm5lVUUrCf7q8bV3ezMysBaXMh//lqQjEzMzyGjXhS7o4Ik6XdAvlxGm1ImJJ1sjMzKyh6rXwryrv//dUBGJmZnmNmvAjYlV5f6+kOeXj5pr7wMzMko06LFOFcyVtBJ4AnpTUJ+nPpi48MzNrlHrj8E8HDgL2j4jtIuLtwAHAQZI+OyXRmZlZw9RL+CcCx0bE00MLIuIp4IRynZmZtZB6Cb8jIjYOX1j243fkC8nMzHKol/BfmeA6MzNrQvWGZe4jaaSrUwnwhO1mZi2m3rDMraYyEDMzyytltkwzM5sBnPDNzCrCCd/MrCKc8M3MKsIJ38ysIpzwzcwqwgnfzKwinPDNzCrCCd/MrCKc8M3MKsIJ38ysIpzwzcwqwgnfzKwinPDNzCrCCd/MrCKc8M3MKiJbwpf0dUkbJK3OtQ8zM0uXs4V/JXB4xvpft2HTFpZeej8bNm+Zit2ZmbWkbAk/Iu4Dns9Vf61Llq/lwXXPc8lda6did2ZmLaneRcyb3h7n3Eb/wODrz69+4BmufuAZOtvbWHPeEdMYmZlZ85n2H20lnSqpV1JvX1/fuLZdceZiliycR1dH8TK6Oto4cuE8Vpy1OEeoZmYtbdoTfkQsi4ieiOiZM2fOuLadO6uL7s52+gcG6Wxvo39gkO7OduZ2d2WK1sysdbV0lw7Axpf6Of6A+Ry3aCeuXfkMff7h1sxsRIqIPBVL1wGHANsD/wV8KSIur7dNT09P9Pb2ZonHzGwmkrQqInpSymZr4UfEsbnqNjOz8Zv2PnwzM5saTvhmZhXhhG9mVhFO+GZmFeGEb2ZWEdmGZU6EpD7gh5OsZlvgxQaEM956UsuPVa7e+tHWjbR8+LLtgY0J8eXQiGMykTpyH5PJHA+YvmMyXf8j49nGxyTd/IhIO2s1ImbUDVg2HfWklh+rXL31o60bafnwZUBvKx+TidSR+5hM5nhM5zGZrv8RH5P8x2Ss20zs0rllmupJLT9WuXrrR1s30vJGvQ+N0IhYJlJH7mNS5eMx0Xp8TEY2JbE0VZeO5SOpNxLPxrOp4WPSfGb6MZmJLXwb2bLpDsDewsek+czoY+IWvplZRbiFb2ZWEU74ZmYV4YRvZlYRTvgVJendkv5e0g2S/mi647GCpG0krZL00emOxUDSIZJWlP8rh0x3PJPlhD+DSPq6pA2SVg9bfrikNZL+Q9LZABHxeER8AlgKzNhhaNNtPMekdBZw/dRGWS3jPCYBvAR0AeunOtZGc8KfWa4EDq9dIGkr4GvAEcBewLGS9irXLQG+Ayyf2jAr5UoSj4mkDwE/oLhCnOVzJen/Jysi4giKD+IvT3GcDeeEP4NExH3A88MWLwL+IyKeiohXgH8EjizL3xwR7weOn9pIq2Ocx2QxcCBwHPBxSf7/zGA8xyQiBsv1LwCdUxhmFi1/EXMb0zuBH9U8Xw8cUPZHHk3xR3zrNMRVZSMek4j4YwBJJwEba5KN5Tfa/8nRwK8Ds4G/mY7AGskJf+bTCMsiIu4B7pnaUKw04jF5/UHElVMXipVG+z+5Cbh
pqoPJxV8ZZ771wK/UPN8R+Mk0xWIFH5PmU4lj4oQ/8z0IvEvSzpK2Bn4HuHmaY6o6H5PmU4lj4oQ/g0i6Drgf2EPSekmnRMQA8MfAHcDjwPUR8f3pjLNKfEyaT5WPiSdPMzOrCLfwzcwqwgnfzKwinPDNzCrCCd/MrCKc8M3MKsIJ38ysIpzwrSEkhaSLap5/TtK5Dar7Skm/1Yi6xtjPMZIel3R3YvlbJc1u0L5fGmP9bEmfrHk+T9INjdi3VYcTvjVKP3C0pO2nO5Ba5bS3qU4BPhkRi1MKR8SHI+KnE4ts3GYDryf8iPhJRGT/ELSZxQnfGmUAWAZ8dviK4S30odZseTWheyVdL+lJSedLOl7SSkmPSdq1ppoPlVceenLoalCStpJ0oaQHJT0q6Q9r6r1b0rXAYyPEc2xZ/2pJF5TL/gw4GPh7SRcOK/8OSfdJeqTc5r+Vy9dJ2l7SAklPSLqsXH+NpA9J+q6ktZIWleXPlfS5mnpXS1owbF9vk7Rc0kNljEeWq84Hdi1juLDc5+pymy5JV5TlH5a0uFx+kqSbJN1exvGXNe/bleX+H5P0lmNmM5Nny7RG+hrw6FBiSbQP8G6K+cmfAi6LiEWSTgM+DZxellsAfBDYFbhb0m7AicCLEbG/pE7gu5LuLMsvAvaOiKdrdyZpHnAB8GsUc5zfKeljEfEVSYcCn4uI3mExHgfcERF/Xn5j+MURXsduwDHAqRTzshxH8QGyBPgT4GOJ78cW4KiI2FR+W/p3STcDZ5evZ2H5OhbUbPMpgIh4j6Q9y9e0e7luIbAvxTewNZL+GpgLvDMi9i7raki3lDU/t/CtYSJiE/BN4DPj2OzBiHg2IvqB/wSGEvZjFEl+yPURMRgRayk+GPYEDgNOlPQI8ACwHfCusvzK4cm+tD9wT0T0lfOnXAN8YKwYgZPL3yTeExGbRyjzdEQ8Vs5h/31geRTzlgx/HWMR8L8kPQrcRTFP+w5jbHMwcBVARDwB/BAYSvjLI+LFiNhCcTWt+RTv3y6S/lrS4cCmccRnLcwJ3xrtYoq+8G1qlg1Q/q1JErB1zbr+mseDNc8HefM30OGTPgVFcvx0RCwsbztHxNAHxsujxDfSvOd1lVdI+gDwY+AqSSeOUCzldbz+PpS6RqjneGAO8Gtla/6/RilXq95rqo3rNaA9Il6g+GZ1D8W3g8vGqN9mCCd8a6iIeJ7iItyn1CxeR9GFAsWl/DomUPUxktrKfv1dgDUUMxv+kaQOAEm7S9qmXiUU3wQ+WPa9bwUcC9xbbwNJ84ENEfEPwOXAfhOIH4r3Yb+yzv2AnUcos225r1fLvvj55fLNQPco9d5HeZnKsitnJ4r3Z0RlV1FbRNwIfJGJvx5rMe7Dtxwuophqdsg/AN+WtJLigumjtb7rWUORmHcAPhERWyRdRtFd8lD5zaGPMfrKI+JZSV8A7qZoGd8aEd8eY9+HAJ+X9CrwEsVvBxNxI290QT0IPDlCmWuAWyT1Ao8AT5RxP1f+CLwauI3i95Ihf0vxY/NjFN8iToqI/uItGdE7gSv0xjVzvzDB12MtxtMjm5lVhLt0zMwqwgnfzKwinPDNzCrCCd/MrCKc8M3MKsIJ38ysIpzwzcwqwgnfzKwi/j8Eosyw8DlZQwAAAABJRU5ErkJggg==\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plt.figure()\n", + "plt.xlabel('Number of simulations')\n", + "plt.ylabel('Dimension of summary statisic')\n", + "plt.semilogx(amountOfSimulations, amountOfDimensions, \"*\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Over all One 6 dimensional, two 3-dimensional, two 2-dimensional and one 1-dimensional summary statistic were used. Results are illustrated in plot above and each simulation are marked with star. As x-axis is logarithmic in plot above there is exponential relationship, as theory suggest, between dimension of summary statistic and number of simulations. It was intuitive for me at start, but using summary statistic that summarize more dimensions (and loose much more information) were incredibly faster, while maintaining same threshold level. \n", + "\n", + "## Different distance\n", + "\n", + "Sometimes it is not trivial or even possible to figure out different summary statistic for ABC model. However, it is possible to use different distances. Let’s make a brief scratch on those! \n", + "\n", + "Following defines same MA2 model and rejecting sampling similarly as they were used in ELFI introduction. It will also define autocorrelation as an alternative for autocovariance. For both summary statistic both Euclidian and Manhattan distance is used.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "import scipy.stats\n", + "\n", + "seed = 20170530\n", + "np.random.seed(seed)\n", + "\n", + "N = 100\n", + "\n", + "threshold = 0.5\n", + "\n", + "def MA2(t1, t2, n_obs=100, batch_size=1, random_state=None):\n", + " # Make inputs 2d arrays for numpy broadcasting with w\n", + " t1 = np.asanyarray(t1).reshape((-1, 1))\n", + " t2 = np.asanyarray(t2).reshape((-1, 1))\n", + " random_state = random_state or np.random\n", + "\n", + " w = random_state.randn(batch_size, n_obs+2) # i.i.d. 
sequence ~ N(0,1)\n", + " x = w[:, 2:] + t1*w[:, 1:-1] + t2*w[:, :-2]\n", + " return x\n", + "\n", + "\n", + "# define prior for t1 as in Marin et al., 2012 with t1 in range [-b, b]\n", + "class CustomPrior_t1(elfi.Distribution):\n", + " def rvs(b, size=1, random_state=None):\n", + " u = sp.stats.uniform.rvs(loc=0, scale=1, size=size, random_state=random_state)\n", + " t1 = np.where(u<0.5, np.sqrt(2.*u)*b-b, -np.sqrt(2.*(1.-u))*b+b)\n", + " \n", + " return t1\n", + "\n", + "\n", + "# define prior for t2 conditionally on t1 as in Marin et al., 2012, in range [-a, a]\n", + "class CustomPrior_t2(elfi.Distribution):\n", + " def rvs(t1, a, size=1, random_state=None):\n", + " locs = np.maximum(-a-t1, t1-a)\n", + " scales = a - locs\n", + " t2 = sp.stats.uniform.rvs(loc=locs, scale=scales, size=size, random_state=random_state)\n", + " return t2\n", + "\n", + "\n", + "def autocov(x, lag=1):\n", + " # BOF: the introduction doesn't follow PEP 8; this happens elsewhere too\n", + " C = np.mean(x[:,lag:] * x[:,:-lag], axis=1)\n", + " #print(C.shape)\n", + "\n", + " \n", + " return C\n", + "\n", + "\n", + "def autocorr(x):\n", + " # print(x.shape)\n", + " #result = np.correlate(x[0,:], x[0,:], mode='full')\n", + " #return result[result.size/2:]\n", + " \n", + " #print(x.shape)\n", + " ret = np.correlate(x[0,:], x[0,:], mode='same')\n", + " \n", + " #print(ret.shape)\n", + "\n", + " # note: this collapses to a single scalar and takes no lag argument\n", + " return np.mean(ret, axis=0)\n", + "\n", + "\n", + "\n", + "# true parameters\n", + "t1_true = 0.6\n", + "t2_true = 0.2\n", + "\n", + "y_obs = MA2(t1_true, t2_true)\n", + "\n", + "\n", + "t1 = elfi.Prior(CustomPrior_t1, 2)\n", + "t2 = elfi.Prior(CustomPrior_t2, t1, 1)\n", + "\n", + "Y = elfi.Simulator(MA2, t1, t2, observed=y_obs)\n", + "\n", + "pool = elfi.OutputPool(['t1', 't2', 'S1', 'S2'])\n", + "resultsArray = []\n", + "\n", + "S1 = elfi.Summary(autocov, Y)\n", + "S2 = elfi.Summary(autocov, Y, 2) # the optional keyword lag is given the value 2\n", + "d = elfi.Distance('euclidean', S1, S2)\n", + "\n", + 
"batch_size = 1" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Starting rejection sampling...\n", + "One rejection sampling done.\n", + "Two rejection sampling done.\n", + "\n", + "Three rejection sampling done.\n", + "Four rejection sampling done.\n", + "\n" + ] + } + ], + "source": [ + "print(\"Starting rejection sampling...\")\n", + "\n", + "rej = elfi.Rejection(d, pool=pool, batch_size=batch_size, seed=seed)\n", + "result = rej.sample(N, threshold=threshold)\n", + "resultsArray.append(result.n_sim)\n", + "\n", + "print(\"One rejection sampling done.\")\n", + "\n", + "# Replace the current distance with a cityblock (manhattan) distance and recreate the inference\n", + "\n", + "d.become(elfi.Distance('cityblock', S1, S2, p=1))\n", + "rej = elfi.Rejection(d, pool=pool, batch_size=batch_size, seed=seed) # pool =10000\n", + "result = rej.sample(N, threshold=threshold)\n", + "resultsArray.append(result.n_sim)\n", + "\n", + "print(\"Two rejection sampling done.\")\n", + "\n", + "S1.become(elfi.Summary(autocorr, Y))\n", + "S2.become(elfi.Summary(autocorr, Y))\n", + "d.become(elfi.Distance('euclidean', S1, S2, p=1))\n", + "rej = elfi.Rejection(d, pool=pool, batch_size=batch_size, seed=seed)\n", + "result = rej.sample(N, threshold=threshold)\n", + "print(result.summary)\n", + "\n", + "print(\"Three rejection sampling done.\")\n", + "\n", + "d.become(elfi.Distance('cityblock', S1, S2, p=1))\n", + "rej = elfi.Rejection(d, pool=pool, batch_size=batch_size, seed=seed)\n", + "result = rej.sample(N, threshold=threshold)\n", + "resultsArray.append(result.n_sim)\n", + "\n", + "print(\"Four rejection sampling done.\")\n", + "print(result.summary)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As can be seen, the model with autocorrelation as the summary statistic produces wrong means (note that both S1 and S2 are replaced here by the same lag-free autocorrelation summary). 
Clearly, one cannot use just any summary statistic and expect convergence.\n", + "\n", + "Next, Sequential Monte Carlo ABC is used with different distance metrics." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:elfi.methods.parameter_inference:---------------- Starting round 0 ----------------\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Starting Sequential Monte Carlo sampling...\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:elfi.methods.parameter_inference:---------------- Starting round 1 ----------------\n", + "WARNING:elfi.methods.utils:SMC: It appears to be difficult to find enough valid proposals with prior pdf > 0. ELFI will keep trying, but you may wish to kill the process and adjust the model priors.\n", + "WARNING:elfi.methods.utils:SMC: It appears to be difficult to find enough valid proposals with prior pdf > 0. ELFI will keep trying, but you may wish to kill the process and adjust the model priors.\n", + "INFO:elfi.methods.parameter_inference:---------------- Starting round 2 ----------------\n", + "WARNING:elfi.methods.utils:SMC: It appears to be difficult to find enough valid proposals with prior pdf > 0. ELFI will keep trying, but you may wish to kill the process and adjust the model priors.\n", + "INFO:elfi.methods.parameter_inference:---------------- Starting round 0 ----------------\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "One SMC sampling done.\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:elfi.methods.parameter_inference:---------------- Starting round 1 ----------------\n", + "WARNING:elfi.methods.utils:SMC: It appears to be difficult to find enough valid proposals with prior pdf > 0. 
ELFI will keep trying, but you may wish to kill the process and adjust the model priors.\n", + "WARNING:elfi.methods.utils:SMC: It appears to be difficult to find enough valid proposals with prior pdf > 0. ELFI will keep trying, but you may wish to kill the process and adjust the model priors.\n", + "INFO:elfi.methods.parameter_inference:---------------- Starting round 2 ----------------\n", + "WARNING:elfi.methods.utils:SMC: It appears to be difficult to find enough valid proposals with prior pdf > 0. ELFI will keep trying, but you may wish to kill the process and adjust the model priors.\n", + "WARNING:elfi.methods.utils:SMC: It appears to be difficult to find enough valid proposals with prior pdf > 0. ELFI will keep trying, but you may wish to kill the process and adjust the model priors.\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Two SMC sampling done.\n" + ] + } + ], + "source": [ + "# define prior for t1 as in Marin et al., 2012 with t1 in range [-b, b]\n", + "class CustomPrior_t1(elfi.Distribution):\n", + " def rvs(b, size=1, random_state=None):\n", + " u = scipy.stats.uniform.rvs(loc=0, scale=1, size=size, random_state=random_state)\n", + " t1 = np.where(u<0.5, np.sqrt(2.*u)*b-b, -np.sqrt(2.*(1.-u))*b+b)\n", + " return t1\n", + "\n", + " def pdf(x, b):\n", + " p = 1./b - np.abs(x) / (b*b)\n", + " p = np.where(p < 0., 0., p) # disallow values outside of [-b, b] (affects weights only)\n", + " return p\n", + "\n", + "\n", + "# define prior for t2 conditionally on t1 as in Marin et al., 2012, in range [-a, a]\n", + "class CustomPrior_t2(elfi.Distribution):\n", + " def rvs(t1, a, size=1, random_state=None):\n", + " locs = np.maximum(-a-t1, t1-a)\n", + " scales = a - locs\n", + " t2 = scipy.stats.uniform.rvs(loc=locs, scale=scales, size=size, random_state=random_state)\n", + " return t2\n", + "\n", + " def pdf(x, t1, a):\n", + " locs = np.maximum(-a-t1, t1-a)\n", + " scales = a - locs\n", + " p = scipy.stats.uniform.pdf(x, 
loc=locs, scale=scales)\n", + " p = np.where(scales>0., p, 0.) # disallow values outside of [-a, a] (affects weights only)\n", + " return p\n", + "\n", + "schedule = [0.7, 0.2, 0.05]\n", + "N2 = 100 # 1000000\n", + "\n", + "print(\"Starting Sequential Monte Carlo sampling...\")\n", + "\n", + "# Redefine the priors\n", + "t1.become(elfi.Prior(CustomPrior_t1, 2, model=t1.model))\n", + "t2.become(elfi.Prior(CustomPrior_t2, t1, 1))\n", + "S1.become(elfi.Summary(autocov, Y))\n", + "S2.become(elfi.Summary(autocov, Y))\n", + "d.become(elfi.Distance('euclidean', S1, S2))\n", + "smc = elfi.SMC(d, pool=pool, batch_size=1, seed=seed)\n", + "result_smc = smc.sample(N2, schedule)\n", + "# result_smc.sample_means_summary(all=True)\n", + "resultsArray.append(result_smc.n_sim)\n", + "\n", + "print(\"One SMC sampling done.\")\n", + "\n", + "d.become(elfi.Distance('cityblock', S1, S2))\n", + "smc = elfi.SMC(d, pool=pool, batch_size=1, seed=seed)\n", + "result_smc = smc.sample(N2, schedule)\n", + "resultsArray.append(result_smc.n_sim)\n", + "\n", + "print(\"Two SMC sampling done.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Finally, print how many simulations were needed to obtain samples with the same threshold. The first two counts are for rejection sampling, first with Euclidean distance and then with Manhattan distance. The third is rejection sampling with autocorrelation as a summary statistic. The last two are for SMC, first with Euclidean distance and then with Manhattan distance." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[614, 1042, 9498, 3076, 3792]\n" + ] + } + ], + "source": [ + "print(resultsArray)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As can be seen from the saved simulation counts, a model with a poor distance metric uses many more simulations. 
And with a silly, non-working summary statistic, even more runs are needed just to get wrong results. \n", + "\n", + "## Final notes on ELFI\n", + "\n", + "In general, ELFI feels like a good but unstable environment for ABC computing. For example, enabling multicore computing for this notebook produces the following error on the server's console, and the Python kernel needs a restart to recover. I did not find a clear reason for this, but in general, using pools and multiple cores felt like a road to misery:\n", + "\n", + "```\n", + "Process SpawnPoolWorker-7:\n", + "Traceback (most recent call last):\n", + " File \"C:\\ProgramData\\Anaconda3\\lib\\multiprocessing\\process.py\", line 258, in _bootstrap\n", + " self.run()\n", + " File \"C:\\ProgramData\\Anaconda3\\lib\\multiprocessing\\process.py\", line 93, in run\n", + " self._target(*self._args, **self._kwargs)\n", + " File \"C:\\ProgramData\\Anaconda3\\lib\\multiprocessing\\pool.py\", line 108, in worker\n", + " task = get()\n", + " File \"C:\\ProgramData\\Anaconda3\\lib\\multiprocessing\\queues.py\", line 337, in get\n", + " return _ForkingPickler.loads(res)\n", + "AttributeError: Can't get attribute 'simulator' on \n", + "```\n", + "\n", + "In addition, I encountered other ELFI-related bugs and problems in basic operations, such as running a Python file with IPython (see issues [267](https://github.com/elfi-dev/elfi/issues/267) and [268](https://github.com/elfi-dev/elfi/issues/268)).\n", + "\n", + "Also, it is often not evident what the data type or dimension of the various objects manipulated in ELFI operations is. Examples of these are marked with NoteToSelf in comments in the code cells. 
" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.5" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}