papers-study-group

Papers, historical and otherwise.

  • 1948 The Mathematical Theory of Communication by Claude Shannon. "The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem. The significant aspect is that the actual message is one selected from a set of possible messages. The system must be designed to operate for each possible selection, not just the one which will actually be chosen since this is unknown at the time of design."

  • 1949 The Monte Carlo Method by Metropolis and Ulam. Published in September 1949 in the Journal of the American Statistical Association.

  • 1953 Equation of State Calculations by Fast Computing Machines by Metropolis et al. The method consists of a modified Monte Carlo integration over configuration space. (A random-walk Metropolis sketch follows the list below.)

  • 1960 A New Approach to Linear Filtering and Prediction Problems by R.E. Kalman. The classical filtering and prediction problem is re-examined using the Bode-Shannon representation of random processes and the “state transition” method of analysis of dynamic systems. New results are: (1) The formulation and methods of solution of the problem apply without modification to stationary and nonstationary statistics and to growing-memory and infinite-memory filters. (2) A nonlinear difference (or differential) equation is derived for the covariance matrix of the optimal estimation error. From the solution of this equation the coefficients of the difference (or differential) equation of the optimal linear filter are obtained without further calculations. (3) The filtering problem is shown to be the dual of the noise-free regulator problem. The new method developed here is applied to two well-known problems, confirming and extending earlier results. The discussion is largely self-contained and proceeds from first principles; basic concepts of the theory of random processes are reviewed in the Appendix. (A minimal predict/update sketch follows the list below.)

  • 1974 Spurious Regressions in Econometrics by Granger and Newbold, Journal of Econometrics. The point of view we intend to take is that of the statistical time series analyst, rather than the more classic econometric approach. In this way it is hoped that we might be able to illuminate the problem from a new angle, and hence perhaps present new insights. Accordingly, in the following section we summarize some relevant results in time series analysis. In sect. 3 we indicate how nonsense regressions relating economic time series can arise, and illustrate these points in sect. 4 with the results of a simulation study. Finally, in sect. 5, we re-emphasize the importance of error specification and draw a distinction between the philosophy of time series analysis and econometric methodology, which we feel to be of great importance to practitioners of the latter. (A small simulation sketch follows the list below.)

  • 1990 Why Functional Programming Matters by John Hughes. As software becomes more and more complex, it is more and more important to structure it well. Well-structured software is easy to write and to debug, and provides a collection of modules that can be reused to reduce future programming costs. In this paper we show that two features of functional languages in particular, higher-order functions and lazy evaluation, can contribute significantly to modularity. As examples, we manipulate lists and trees, program several numerical algorithms, and implement the alpha-beta heuristic (an algorithm from Artificial Intelligence used in game-playing programs). We conclude that since modularity is the key to successful programming, functional programming offers important advantages for software development. (A Python transcription of the paper's square-root example follows the list below.)

  • 2001 A preprocessing scheme for high-cardinality categorical attributes in classification and prediction problems by Daniele Micci-Barreca. Categorical data fields characterized by a large number of distinct values represent a serious challenge for many classification and regression algorithms that require numerical inputs. On the other hand, these types of data fields are quite common in real-world data mining applications and often contain potentially relevant information that is difficult to represent for modeling purposes. This paper presents a simple preprocessing scheme for high-cardinality categorical data that allows this class of attributes to be used in predictive models such as neural networks, linear and logistic regression. The proposed method is based on a well-established statistical method (empirical Bayes) that is straightforward to implement as an in-database procedure. Furthermore, for categorical attributes with an inherent hierarchical structure, like ZIP codes, the preprocessing scheme can directly leverage the hierarchy by blending statistics at the various levels of aggregation. While the statistical methods discussed in this paper were first introduced in the mid-1950s, the use of these methods as a preprocessing step for complex models, like neural networks, has not been previously discussed in any literature. (A target-encoding sketch follows the list below.)

  • 2002 The Origins of Logistic Regression by J.S. Cramer. This paper describes the origins of the logistic function, its adoption in bio-assay, and its wider acceptance in statistics. Its roots spread far back to the early 19th century; the survival of the term logistic and the wide application of the device have been determined decisively by the personal history and individual action of a few scholars.

  • 2011 Category Theoretic Analysis of Hierarchical Protein Materials and Social Networks by David I. Spivak, Tristan Giesa, Elizabeth Wood, Markus J. Buehler. "Here we describe an application of category theory to describe structural and resulting functional properties of biological protein materials by developing so-called ologs. An olog is like a 'concept web' or 'semantic network' except that it follows a rigorous mathematical formulation based on category theory."

  • 2014 A Tutorial on Principal Component Analysis by Jonathon Shlens. Principal component analysis (PCA) is a mainstay of modern data analysis - a black box that is widely used but (sometimes) poorly understood. The goal of this paper is to dispel the magic behind this black box. This manuscript focuses on building a solid intuition for how and why principal component analysis works. This manuscript crystallizes this knowledge by deriving from simple intuitions, the mathematics behind PCA. This tutorial does not shy away from explaining the ideas informally, nor does it shy away from the mathematics. The hope is that by addressing both aspects, readers of all levels will be able to gain a better understanding of PCA as well as the when, the how and the why of applying this technique. (An SVD-based sketch follows the list below.)
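
Code sketches

A few short sketches of the methods above, in the order the papers appear. First, the 1953 paper's modified Monte Carlo integration in its modern form as a random-walk Metropolis sampler. This is a minimal illustration, not the paper's original configuration-space calculation; the Gaussian proposal, step size, and standard-normal target are all illustrative assumptions.

```python
import numpy as np

def metropolis(log_prob, x0, n_steps, step_size=1.0, seed=0):
    """Random-walk Metropolis: propose a symmetric move and accept it with
    probability min(1, p(x') / p(x))."""
    rng = np.random.default_rng(seed)
    x, lp = x0, log_prob(x0)
    samples = np.empty(n_steps)
    for i in range(n_steps):
        proposal = x + rng.normal(scale=step_size)
        lp_new = log_prob(proposal)
        if np.log(rng.uniform()) < lp_new - lp:   # accept, otherwise keep the current state
            x, lp = proposal, lp_new
        samples[i] = x
    return samples

# Example: sample from a standard normal target density.
samples = metropolis(lambda x: -0.5 * x**2, x0=0.0, n_steps=10_000)
print(samples.mean(), samples.std())   # roughly 0 and 1
```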
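
For the 1960 Kalman paper, a minimal linear predict/update filter in NumPy. The notation (F, H, Q, R) and the constant-velocity example are the usual textbook conventions rather than the paper's, and the noise values are made up for illustration.

```python
import numpy as np

def kalman_filter(measurements, F, H, Q, R, x0, P0):
    """Minimal linear Kalman filter: alternate predict and update steps."""
    x, P = x0, P0
    estimates = []
    for z in measurements:
        # Predict: propagate the state and covariance through the dynamics.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update: correct with the new measurement.
        S = H @ P @ H.T + R                # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
        x = x + K @ (z - H @ x)
        P = (np.eye(len(x)) - K @ H) @ P
        estimates.append(x.copy())
    return estimates

# Example: track position and velocity from noisy position readings.
dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])   # constant-velocity dynamics
H = np.array([[1.0, 0.0]])              # only position is observed
Q = 1e-4 * np.eye(2)                    # process noise
R = np.array([[0.25]])                  # measurement noise
rng = np.random.default_rng(1)
zs = [np.array([t + rng.normal(scale=0.5)]) for t in range(10)]
est = kalman_filter(zs, F, H, Q, R, x0=np.zeros(2), P0=np.eye(2))
```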
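
For the 1974 spurious-regressions paper, a small simulation in the spirit of its sect. 4 study: regressing one independent random walk on another yields nominally "significant" slope t-statistics far more often than the 5% the test promises. The sample size and trial count here are arbitrary choices.

```python
import numpy as np

def spurious_regression_demo(n=100, trials=1000, seed=0):
    """Regress one independent random walk on another and count how often
    the slope t-statistic looks 'significant' (|t| > 1.96)."""
    rng = np.random.default_rng(seed)
    significant = 0
    for _ in range(trials):
        y = np.cumsum(rng.normal(size=n))    # two independent random walks
        x = np.cumsum(rng.normal(size=n))
        X = np.column_stack([np.ones(n), x])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        s2 = resid @ resid / (n - 2)
        se_slope = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
        significant += abs(beta[1] / se_slope) > 1.96
    return significant / trials

print(spurious_regression_demo())   # far above the nominal 5% rate
```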
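
For Why Functional Programming Matters, a transcription of the paper's Newton-Raphson square-root example into Python generators: `iterate` builds a lazy stream of approximations and `within` decides when to stop, so the two concerns compose as separate modules. The names mirror the paper's `repeat` and `within` combinators, but the code is a sketch, not the paper's own notation.

```python
def iterate(f, a):
    """Lazily generate the infinite sequence a, f(a), f(f(a)), ..."""
    while True:
        yield a
        a = f(a)

def within(eps, approximations):
    """Consume successive approximations until two neighbours agree to within eps."""
    prev = next(approximations)
    for x in approximations:
        if abs(x - prev) <= eps:
            return x
        prev = x

def newton_sqrt(n, x0=1.0, eps=1e-10):
    """Square root as the composition of a lazy stream of Newton iterates with `within`."""
    return within(eps, iterate(lambda x: (x + n / x) / 2, x0))

print(newton_sqrt(2.0))   # ~1.4142135623730951
```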
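
For the 2001 Micci-Barreca paper, a sketch of smoothed target encoding with pandas: each category's mean target is blended with the global mean using a sigmoid weight lambda(n) = 1 / (1 + exp(-(n - k) / f)), following the smoothing function described in the paper, with k and f as tunable constants. The column names are made up, and in practice the encoding should be fit on training folds only to avoid target leakage.

```python
import numpy as np
import pandas as pd

def target_encode(train, col, target, k=20.0, f=10.0):
    """Blend each category's mean target with the global mean, weighted by
    how many rows the category has (more rows -> trust the category mean more)."""
    prior = train[target].mean()
    stats = train.groupby(col)[target].agg(['mean', 'count'])
    lam = 1.0 / (1.0 + np.exp(-(stats['count'] - k) / f))   # sigmoid blending weight
    encoding = lam * stats['mean'] + (1.0 - lam) * prior
    return train[col].map(encoding).fillna(prior)

# Example with a toy frame; the column names are illustrative only.
df = pd.DataFrame({'zip': ['a', 'a', 'b', 'c', 'c', 'c'],
                   'y':   [1,   0,   1,   0,   0,   1]})
df['zip_encoded'] = target_encode(df, 'zip', 'y')
```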
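
For the 2014 PCA tutorial, a minimal implementation via SVD of the mean-centered data, which is one standard way to compute what the tutorial derives; the toy two-feature data set in the example is illustrative.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA via SVD of the mean-centered data matrix."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]           # principal directions (rows)
    scores = Xc @ components.T               # data projected onto those directions
    explained_variance = (S**2) / (len(X) - 1)
    return scores, components, explained_variance[:n_components]

# Example: two strongly correlated features; the first component dominates.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
X = np.column_stack([x, 2 * x + rng.normal(scale=0.1, size=500)])
scores, components, var = pca(X, n_components=2)
print(var / var.sum())   # most of the variance lies along the first component
```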

Notable blog posts
