Jun 30, 2018
by Che-Yu Liu; Sébastien Bubeck

We study the problem of finding the most mutually correlated arms among many arms. We show that adaptive arms sampling strategies can have significant advantages over the non-adaptive uniform sampling strategy. Our proposed algorithms rely on a novel correlation estimator. The use of this accurate estimator allows us to get improved results for a wide range of problem instances.

Topics: Machine Learning, Computing Research Repository, Statistics, Learning

Source: http://arxiv.org/abs/1404.5903
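As a point of reference for the problem above, the following sketch finds the most mutually correlated pair of arms from uniformly (non-adaptively) collected samples — the baseline strategy the paper improves on, not its adaptive algorithm or its novel estimator; the function name and setup are illustrative assumptions.

```python
import numpy as np

def most_correlated_pair(samples):
    """Return indices of the most correlated pair of arms.

    samples: (n_pulls, n_arms) array of rewards gathered by uniform
    (non-adaptive) sampling of every arm.
    """
    C = np.corrcoef(samples, rowvar=False)   # pairwise Pearson correlations
    np.fill_diagonal(C, -np.inf)             # exclude self-correlation
    i, j = np.unravel_index(np.argmax(C), C.shape)
    return int(i), int(j)
```

With arms 0 and 2 generated from a shared signal, the function recovers that pair from the empirical correlation matrix.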

Jun 30, 2018
by Hana Ajakan; Pascal Germain; Hugo Larochelle; François Laviolette; Mario Marchand

We introduce a new representation learning algorithm suited to the context of domain adaptation, in which data at training and test time come from similar but different distributions. Our algorithm is directly inspired by theory on domain adaptation suggesting that, for effective domain transfer to be achieved, predictions must be made based on a data representation that cannot discriminate between the training (source) and test (target) domains. We propose a training objective that implements...

Topics: Machine Learning, Neural and Evolutionary Computing, Computing Research Repository, Statistics,...

Source: http://arxiv.org/abs/1412.4446

Jun 29, 2018
by Arash Shahriari

Pruning of redundant or irrelevant instances of data is a key to every successful solution for pattern recognition. In this paper, we present a novel ranking-selection framework for low-length but highly correlated instances. Instead of working in the low-dimensional instance space, we learn a supervised projection to high-dimensional space spanned by the number of classes in the dataset under study. Imposing higher distinctions by exposing the notion of labels to the instances lets us deploy...

Topics: Machine Learning, Learning, Computer Vision and Pattern Recognition, Computing Research Repository,...

Source: http://arxiv.org/abs/1606.07575

Jun 30, 2018
by Jiashi Feng; Huan Xu; Shie Mannor

We propose a framework for distributed robust statistical learning on {\em big contaminated data}. The Distributed Robust Learning (DRL) framework can reduce the computational time of traditional robust learning methods by several orders of magnitude. We analyze the robustness property of DRL, showing that DRL not only preserves the robustness of the base robust learning method, but also tolerates contaminations on a constant fraction of results from computing nodes (node failures). More...

Topics: Machine Learning, Computing Research Repository, Statistics, Learning

Source: http://arxiv.org/abs/1409.5937

Jun 29, 2018
by Michael C. Hughes; Huseyin Melih Elibol; Thomas McCoy; Roy Perlis; Finale Doshi-Velez

Supervised topic models can help clinical researchers find interpretable co-occurrence patterns in count data that are relevant for diagnostics. However, standard formulations of supervised Latent Dirichlet Allocation have two problems. First, when documents have many more words than labels, the influence of the labels will be negligible. Second, due to conditional independence assumptions in the graphical model, the impact of supervised labels on the learned topic-word probabilities is often...

Topics: Machine Learning, Statistics

Source: http://arxiv.org/abs/1612.01678

Jun 29, 2018
by Nikolaus Hansen; Anne Auger; Olaf Mersmann; Tea Tusar; Dimo Brockhoff

COCO is a platform for Comparing Continuous Optimizers in a black-box setting. It aims at automating, to the greatest possible extent, the tedious and repetitive task of benchmarking numerical optimization algorithms. We present the rationale behind the development of the platform as a general proposition for a guideline towards better benchmarking. We detail underlying fundamental concepts of COCO such as its definition of a problem, the idea of instances, the relevance of target values, and...

Topics: Machine Learning, Artificial Intelligence, Numerical Analysis, Computing Research Repository,...

Source: http://arxiv.org/abs/1603.08785

Jun 29, 2018
by Marco Scutari

Bayesian network structure learning is often performed in a Bayesian setting, by evaluating candidate structures using their posterior probabilities for a given data set. Score-based algorithms then use those posterior probabilities as an objective function and return the maximum a posteriori network as the learned model. For discrete Bayesian networks, the canonical choice for a posterior score is the Bayesian Dirichlet equivalent uniform (BDeu) marginal likelihood with a uniform (U) graph...

Topics: Machine Learning, Methodology, Statistics

Source: http://arxiv.org/abs/1605.03884

Jun 30, 2018
by Anshumali Shrivastava; Ping Li

Minwise hashing (Minhash) is a widely popular indexing scheme in practice. Minhash is designed for estimating set resemblance and is known to be suboptimal in many applications where the desired measure is set overlap (i.e., inner product between binary vectors) or set containment. Minhash has inherent bias towards smaller sets, which adversely affects its performance in applications where such a penalization is not desirable. In this paper, we propose asymmetric minwise hashing (MH-ALSH), to...

Topics: Statistics, Computing Research Repository, Data Structures and Algorithms, Machine Learning,...

Source: http://arxiv.org/abs/1411.3787
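For context on the abstract above, here is a minimal sketch of classical minwise hashing estimating set resemblance (Jaccard similarity) — the quantity the paper argues is the wrong target when overlap or containment is desired. The salted-hash simulation of random permutations is an illustrative assumption; real implementations use proper universal hash families, and this is not the paper's asymmetric MH-ALSH scheme.

```python
import random

def minhash_signature(s, num_hashes=128, seed=0):
    # Each salt simulates one random permutation of the universe;
    # hashing (salt, x) tuples is a stand-in for a universal hash family.
    rng = random.Random(seed)
    salts = [rng.getrandbits(32) for _ in range(num_hashes)]
    return [min(hash((salt, x)) for x in s) for salt in salts]

def estimate_resemblance(sig_a, sig_b):
    # P[minima match] equals the Jaccard similarity |A ∩ B| / |A ∪ B|,
    # so the fraction of matching slots is an unbiased estimate of it.
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)
```

For two sets with true Jaccard similarity 1/3, the estimate concentrates near 1/3 as the number of hashes grows.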

Jun 28, 2018
by Pushpendre Rastogi; Benjamin Van Durme

The output scores of a neural network classifier are converted to probabilities via normalizing over the scores of all competing categories. Computing this partition function, $Z$, is then linear in the number of categories, which is problematic as real-world problem sets continue to grow in categorical types, such as in visual object recognition or discriminative language modeling. We propose three approaches for sublinear estimation of the partition function, based on approximate nearest...

Topics: Statistics, Computing Research Repository, Machine Learning, Learning

Source: http://arxiv.org/abs/1508.01596
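The exact computation the paper seeks to approximate sublinearly can be sketched in a few lines: the partition function $Z$ is a sum over all category scores, so its cost grows linearly with the number of categories. This is the standard softmax normalization, not the paper's nearest-neighbour-based estimators.

```python
import numpy as np

def softmax_probs(scores):
    # Z = sum of exponentiated scores over ALL categories: linear in len(scores).
    z = np.exp(scores - scores.max())  # subtract max for numerical stability
    return z / z.sum()
```

The probabilities sum to one by construction, and the highest-scoring category receives the largest probability.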

Jun 28, 2018
by Eunho Yang; Aurélie C. Lozano

Gaussian Graphical Models (GGMs) are popular tools for studying network structures. However, many modern applications such as gene network discovery and social interactions analysis often involve high-dimensional noisy data with outliers or heavier tails than the Gaussian distribution. In this paper, we propose the Trimmed Graphical Lasso for robust estimation of sparse GGMs. Our method guards against outliers by an implicit trimming mechanism akin to the popular Least Trimmed Squares method...

Topics: Statistics, Machine Learning

Source: http://arxiv.org/abs/1510.08512

Jun 30, 2018
by Lester Mackey; Jordan Bryan; Man Yue Mo

We introduce a minorization-maximization approach to optimizing common measures of discovery significance in high energy physics. The approach alternates between solving a weighted binary classification problem and updating class weights in a simple, closed-form manner. Moreover, an argument based on convex duality shows that an improvement in weighted classification error on any round yields a commensurate improvement in discovery significance. We complement our derivation with experimental...

Topics: Machine Learning, Computing Research Repository, Statistics, Learning

Source: http://arxiv.org/abs/1409.2655

Jun 29, 2018
by Konrad Zolna

The method presented extends a given regression neural network to improve its performance. The modification affects the learning procedure only, so the extension may easily be omitted during evaluation without any change in prediction. This means that the modified model may be evaluated as quickly as the original one but tends to perform better. This improvement is possible because the modification gives better expressive power, provides better behaved gradients and works as a...

Topics: Machine Learning, Artificial Intelligence, Statistics, Learning, Neural and Evolutionary Computing,...

Source: http://arxiv.org/abs/1612.01589

Jun 28, 2018
by Niko Brümmer

The EM training algorithm of the classical i-vector extractor is often incorrectly described as a maximum-likelihood method. The i-vector model is however intractable: the likelihood itself and the hidden-variable posteriors needed for the EM algorithm cannot be computed in closed form. We show here that the classical i-vector extractor recipe is actually a mean-field variational Bayes (VB) recipe. This theoretical VB interpretation turns out to be of further use, because it also offers an...

Topics: Statistics, Learning, Machine Learning, Computing Research Repository

Source: http://arxiv.org/abs/1510.03203

Feb 23, 2021
by Changelog Master Feed

Production ML systems include more than just the model. In these complicated systems, how do you ensure quality over time, especially when you are constantly updating your infrastructure, data and models? Tania Allard joins us to discuss the ins and outs of testing ML systems. Among other things, she presents a simple formula that helps you score your progress towards a robust system and identify problem areas.

Topics: Podcast, changelog, open source, oss, software, development, developer, hackerchangelog, ai,...

Jun 29, 2018
by Alexander Cloninger; Stefan Steinerberger

Spectral embedding uses eigenfunctions of the discrete Laplacian on a weighted graph to obtain coordinates for an embedding of an abstract data set into Euclidean space. We propose a new pre-processing step of first using the eigenfunctions to simulate a low-frequency wave moving over the data and using both position as well as change in time of the wave to obtain a refined metric to which classical methods of dimensionality reduction can then be applied. This is motivated by the behavior of...

Topics: Machine Learning, Statistics

Source: http://arxiv.org/abs/1607.04566
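The classical spectral embedding that the abstract builds on can be sketched directly: form the graph Laplacian, take its low eigenvectors, and use them as coordinates. This is the standard baseline, not the paper's wave-based pre-processing step; the function name is an illustrative assumption.

```python
import numpy as np

def spectral_embedding(W, dim=2):
    # W: symmetric weighted adjacency matrix of the data graph
    d = W.sum(axis=1)
    L = np.diag(d) - W                 # combinatorial graph Laplacian
    vals, vecs = np.linalg.eigh(L)     # eigenvalues in ascending order
    # skip the constant eigenvector (eigenvalue 0); the next `dim`
    # eigenvectors give Euclidean coordinates for the nodes
    return vecs[:, 1:dim + 1]
```

On a path graph the one-dimensional embedding (the Fiedler vector) orders the nodes monotonically along the path.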

01:00AM–06:00AM BST, 26/09/2019: BBC Radio Guernsey joins BBC Radio 5 live.

Topics: Radio Program, Artificial intelligence, Learning, Cybernetics, Geodesy, Machine learning, Climate...


Jun 30, 2018
by Fredrik Lindsten; Adam M. Johansen; Christian A. Naesseth; Bonnie Kirkpatrick; Thomas B. Schön; John Aston; Alexandre Bouchard-Côté

We propose a novel class of Sequential Monte Carlo (SMC) algorithms, appropriate for inference in probabilistic graphical models. This class of algorithms adopts a divide-and-conquer approach based upon an auxiliary tree-structured decomposition of the model of interest, turning the overall inferential task into a collection of recursively solved sub-problems. The proposed method is applicable to a broad class of probabilistic graphical models, including models with loops. Unlike a standard SMC...

Topics: Computation, Machine Learning, Statistics

Source: http://arxiv.org/abs/1406.4993


Jun 30, 2018
by Yu-Xiang Wang; Alex Smola; Ryan J. Tibshirani

We study a novel spline-like basis, which we name the "falling factorial basis", bearing many similarities to the classic truncated power basis. The advantage of the falling factorial basis is that it enables rapid, linear-time computations in basis matrix multiplication and basis matrix inversion. The falling factorial functions are not actually splines, but are close enough to splines that they provably retain some of the favorable properties of the latter functions. We examine...

Topics: Machine Learning, Statistics

Source: http://arxiv.org/abs/1405.0558

Jun 29, 2018
by Jonathan Bates

Any closed, connected Riemannian manifold $M$ can be smoothly embedded by its Laplacian eigenfunction maps into $\mathbb{R}^m$ for some $m$. We call the smallest such $m$ the maximal embedding dimension of $M$. We show that the maximal embedding dimension of $M$ is bounded from above by a constant depending only on the dimension of $M$, a lower bound for injectivity radius, a lower bound for Ricci curvature, and a volume bound. We interpret this result for the case of surfaces isometrically...

Topics: Computer Vision and Pattern Recognition, Machine Learning, Mathematics, Differential Geometry,...

Source: http://arxiv.org/abs/1605.01643

Jun 30, 2018
by Faicel Chamroukhi

Regression mixture models are widely studied in statistics, machine learning and data analysis. Fitting regression mixtures is challenging and is usually performed by maximum likelihood by using the expectation-maximization (EM) algorithm. However, it is well-known that the initialization is crucial for EM. If the initialization is inappropriately performed, the EM algorithm may lead to unsatisfactory results. The EM algorithm also requires the number of clusters to be given a priori; the...

Topics: Machine Learning, Computing Research Repository, Statistics, Learning, Methodology

Source: http://arxiv.org/abs/1409.6981

Jun 28, 2018
by Weici Hu; Peter I. Frazier

We consider effort allocation in crowdsourcing, where we wish to assign labeling tasks to imperfect homogeneous crowd workers to maximize overall accuracy in a continuous-time Bayesian setting, subject to budget and time constraints. The Bayes-optimal policy for this problem is the solution to a partially observable Markov decision process, but the curse of dimensionality renders the computation infeasible. Based on the Lagrangian Relaxation technique in Adelman & Mersereau (2008), we...

Topics: Learning, Statistics, Machine Learning, Computing Research Repository, Artificial Intelligence

Source: http://arxiv.org/abs/1512.09204

Jun 29, 2018
by Jianwen Xie; Pamela K. Douglas; Ying Nian Wu; Arthur L. Brody; Ariana E. Anderson

Brain networks in fMRI are typically identified using spatial independent component analysis (ICA), yet mathematical constraints such as sparse coding and positivity both provide alternate biologically-plausible frameworks for generating brain networks. Non-negative Matrix Factorization (NMF) would suppress negative BOLD signal by enforcing positivity. Spatial sparse coding algorithms ($L1$ Regularized Learning and K-SVD) would impose local specialization and a discouragement of multitasking,...

Topics: Machine Learning, Neurons and Cognition, Statistics, Quantitative Biology, Learning, Computing...

Source: http://arxiv.org/abs/1607.00435

Jun 28, 2018
by Hideaki Kim; Hiroshi Sawada

The histogram method is a powerful non-parametric approach for estimating the probability density function of a continuous variable. But the construction of a histogram, compared to the parametric approaches, demands a large number of observations to capture the underlying density function. Thus it is not suitable for analyzing a sparse data set, a collection of units with a small size of data. In this paper, by employing the probabilistic topic model, we develop a novel Bayesian approach to...

Topics: Statistics, Machine Learning

Source: http://arxiv.org/abs/1512.07960
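The classical estimator discussed above is easy to make concrete: a histogram density normalizes bin counts by total count and bin width so the estimate integrates to one. This is the baseline the paper's Bayesian topic-model approach improves on for sparse data; the helper name is an illustrative assumption.

```python
import numpy as np

def histogram_density(data, bins=10):
    # classical histogram estimate of a pdf: counts normalised so that
    # sum(density * bin_width) == 1
    counts, edges = np.histogram(data, bins=bins)
    widths = np.diff(edges)
    density = counts / (counts.sum() * widths)
    return density, edges
```

The normalization guarantees a valid density regardless of the sample or bin count.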

Jun 29, 2018
by Pedro A. Ortega; Naftali Tishby

There is a consensus that human and non-human subjects experience temporal distortions in many stages of their perceptual and decision-making systems. Similarly, intertemporal choice research has shown that decision-makers undervalue future outcomes relative to immediate ones. Here we combine techniques from information theory and artificial intelligence to show how both temporal distortions and intertemporal choice preferences can be explained as a consequence of the coding efficiency of...

Topics: Machine Learning, Artificial Intelligence, Neurons and Cognition, Statistics, Quantitative Biology,...

Source: http://arxiv.org/abs/1604.05129


Jun 29, 2018
by Alexander Cloninger

We consider the problem of constructing diffusion operators on high dimensional data $X$ to address counterfactual functions $F$, such as individualized treatment effectiveness. We propose and construct a new diffusion metric $K_F$ that captures both the local geometry of $X$ and the directions of variance of $F$. The resulting diffusion metric is then used to define a localized filtration of $F$ and answer counterfactual questions pointwise, particularly in situations such as drug trials where an...

Topics: Machine Learning, Statistics

Source: http://arxiv.org/abs/1610.10025


Jun 30, 2018
by Ishanu Chattopadhyay

While correlation measures are used to discern statistical relationships between observed variables in almost all branches of data-driven scientific inquiry, what we are really interested in is the existence of causal dependence. Designing an efficient causality test, that may be carried out in the absence of restrictive pre-suppositions on the underlying dynamical structure of the data at hand, is non-trivial. Nevertheless, ability to computationally infer statistical prima facie evidence of...

Topics: Statistics, Mathematics, Computing Research Repository, Information Theory, Statistical Finance,...

Source: http://arxiv.org/abs/1406.6651

Jun 29, 2018
by Tongliang Liu; Dacheng Tao; Dong Xu

The $k$-dimensional coding schemes refer to a collection of methods that attempt to represent data using a set of representative $k$-dimensional vectors, and include non-negative matrix factorization, dictionary learning, sparse coding, $k$-means clustering and vector quantization as special cases. Previous generalization bounds for the reconstruction error of the $k$-dimensional coding schemes are mainly dimensionality independent. A major advantage of these bounds is that they can be used to...

Topics: Machine Learning, Learning, Computing Research Repository, Statistics

Source: http://arxiv.org/abs/1601.00238

Jun 30, 2018
by Terrance DeVries; Graham W. Taylor

Dataset augmentation, the practice of applying a wide array of domain-specific transformations to synthetically expand a training set, is a standard tool in supervised learning. While effective in tasks such as visual recognition, the set of transformations must be carefully designed, implemented, and tested for every new domain, limiting its re-use and generality. In this paper, we adopt a simpler, domain-agnostic approach to dataset augmentation. We start with existing data points and apply...

Topics: Learning, Machine Learning, Statistics, Computing Research Repository

Source: http://arxiv.org/abs/1702.05538

Jun 29, 2018
by Jesse H. Krijthe; Marco Loog

For the supervised least squares classifier, when the number of training objects is smaller than the dimensionality of the data, adding more data to the training set may first increase the error rate before decreasing it. This, possibly counterintuitive, phenomenon is known as peaking. In this work, we observe that a similar but more pronounced version of this phenomenon also occurs in the semi-supervised setting, where instead of labeled objects, unlabeled objects are added to the training...

Topics: Machine Learning, Learning, Computing Research Repository, Statistics

Source: http://arxiv.org/abs/1610.05160
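The base learner in which peaking is observed is a plain supervised least squares classifier; a minimal sketch follows. The pseudo-inverse gives the minimum-norm solution, which remains well-defined even when there are fewer training objects than dimensions — exactly the regime where peaking occurs. This is only the supervised baseline, not the paper's semi-supervised variant.

```python
import numpy as np

def least_squares_classifier(X, y):
    # y in {-1, +1}; fit weights by minimum-norm least squares
    # (pinv handles the underdetermined case n < d)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # append bias column
    w = np.linalg.pinv(Xb) @ y
    def predict(Xnew):
        Xnb = np.hstack([Xnew, np.ones((Xnew.shape[0], 1))])
        return np.sign(Xnb @ w)
    return predict
```

On linearly separable data the fitted classifier reproduces the training labels.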

Jun 30, 2018
by Kaspar Märtens; Michalis K Titsias; Christopher Yau

Bayesian inference for complex models is challenging due to the need to explore high-dimensional spaces and multimodality and standard Monte Carlo samplers can have difficulties effectively exploring the posterior. We introduce a general purpose rejection-free ensemble Markov Chain Monte Carlo (MCMC) technique to improve on existing poorly mixing samplers. This is achieved by combining parallel tempering and an auxiliary variable move to exchange information between the chains. We demonstrate...

Topics: Computation, Statistics, Machine Learning, Methodology

Source: http://arxiv.org/abs/1703.08520

Jun 29, 2018
by Julien Mairal

In this paper, we introduce a new image representation based on a multilayer kernel machine. Unlike traditional kernel methods where data representation is decoupled from the prediction task, we learn how to shape the kernel with supervision. We proceed by first proposing improvements of the recently-introduced convolutional kernel networks (CKNs) in the context of unsupervised learning; then, we derive backpropagation rules to take advantage of labeled training data. The resulting model is a...

Topics: Machine Learning, Learning, Computer Vision and Pattern Recognition, Computing Research Repository,...

Source: http://arxiv.org/abs/1605.06265

Jun 29, 2018
by Ganzhao Yuan; Yin Yang; Zhenjie Zhang; Zhifeng Hao

Differential privacy enables organizations to collect accurate aggregates over sensitive data with strong, rigorous guarantees on individuals' privacy. Previous work has found that under differential privacy, computing multiple correlated aggregates as a batch, using an appropriate \emph{strategy}, may yield higher accuracy than computing each of them independently. However, finding the best strategy that maximizes result accuracy is non-trivial, as it involves solving a complex constrained...

Topics: Machine Learning, Statistics, Databases, Computing Research Repository, Learning

Source: http://arxiv.org/abs/1602.04302

Jun 30, 2018
by Philipp Geiger; Kun Zhang; Mingming Gong; Dominik Janzing; Bernhard Schölkopf

A widely applied approach to causal inference from a non-experimental time series $X$, often referred to as "(linear) Granger causal analysis", is to regress present on past and interpret the regression matrix $\hat{B}$ causally. However, if there is an unmeasured time series $Z$ that influences $X$, then this approach can lead to wrong causal conclusions, i.e., distinct from those one would draw if one had additional information such as $Z$. In this paper we take a different...

Topics: Machine Learning, Statistics

Source: http://arxiv.org/abs/1411.3972
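The "regress present on past" step that the abstract warns about can be sketched in a few lines. With no hidden confounder $Z$, the regression matrix $\hat{B}$ recovers the true transition matrix of a VAR(1) process; the paper's point is that this causal reading breaks down when $Z$ exists. The function name and the row-vector convention (X[t] ≈ X[t-1] @ B) are illustrative assumptions.

```python
import numpy as np

def granger_matrix(X):
    # X: (T, d) time series; regress present on past, X[t] ≈ X[t-1] @ B,
    # the (linear) Granger step whose matrix B-hat is interpreted causally
    past, present = X[:-1], X[1:]
    B, *_ = np.linalg.lstsq(past, present, rcond=None)
    return B
```

On a simulated confounder-free VAR(1) series the estimate is close to the generating matrix.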

Jun 26, 2018
by Emanuele Frandi; Ricardo Nanculef; Johan A. K. Suykens

Frank-Wolfe algorithms have recently regained the attention of the Machine Learning community. Their solid theoretical properties and sparsity guarantees make them a suitable choice for a wide range of problems in this field. In addition, several variants of the basic procedure exist that improve its theoretical properties and practical performance. In this paper, we investigate the application of some of these techniques to Machine Learning, focusing in particular on a Parallel Tangent...

Topics: Mathematics, Optimization and Control, Statistics, Computing Research Repository, Learning, Machine...

Source: http://arxiv.org/abs/1502.01563

Jun 28, 2018
by Andrew M. McDonald; Massimiliano Pontil; Dimitris Stamos

We study a regularizer which is defined as a parameterized infimum of quadratics, and which we call the box-norm. We show that the k-support norm, a regularizer proposed by [Argyriou et al, 2012] for sparse vector prediction problems, belongs to this family, and the box-norm can be generated as a perturbation of the former. We derive an improved algorithm to compute the proximity operator of the squared box-norm, and we provide a method to compute the norm. We extend the norms to matrices,...

Topics: Learning, Statistics, Machine Learning, Computing Research Repository

Source: http://arxiv.org/abs/1512.08204

Jun 29, 2018
by Camille Jandot; Patrice Simard; Max Chickering; David Grangier; Jina Suh

In text classification, dictionaries can be used to define human-comprehensible features. We propose an improvement to dictionary features called smoothed dictionary features. These features recognize document contexts instead of n-grams. We describe a principled methodology to solicit dictionary features from a teacher, and present results showing that models built using these human-comprehensible features are competitive with models trained with Bag of Words features.

Topics: Machine Learning, Computation and Language, Computing Research Repository, Statistics

Source: http://arxiv.org/abs/1606.07545

Jun 29, 2018
by Hafiz Tiomoko Ali; Romain Couillet

In this article, we study spectral methods for community detection based on $ \alpha$-parametrized normalized modularity matrix hereafter called $ {\bf L}_\alpha $ in heterogeneous graph models. We show, in a regime where community detection is not asymptotically trivial, that $ {\bf L}_\alpha $ can be well approximated by a more tractable random matrix which falls in the family of spiked random matrices. The analysis of this equivalent spiked random matrix allows us to improve spectral methods...

Topics: Machine Learning, Statistics

Source: http://arxiv.org/abs/1611.01096

Jun 29, 2018
by Yu-Xiang Wang; Jing Lei; Stephen E. Fienberg

In adaptive data analysis, the user makes a sequence of queries on the data, where at each step the choice of query may depend on the results in previous steps. The releases are often randomized in order to reduce overfitting for such adaptively chosen queries. In this paper, we propose a minimax framework for adaptive data analysis. Assuming Gaussianity of queries, we establish the first sharp minimax lower bound on the squared error in the order of $O(\frac{\sqrt{k}\sigma^2}{n})$, where $k$...

Topics: Machine Learning, Learning, Computing Research Repository, Statistics

Source: http://arxiv.org/abs/1602.04287

Jun 29, 2018
by Brijnesh J. Jain

Condorcet's Jury Theorem has been invoked for ensemble classifiers to indicate that the combination of many classifiers can have better predictive performance than a single classifier. Such a theoretical underpinning is unknown for consensus clustering. This article extends Condorcet's Jury Theorem to the mean partition approach under the additional assumptions that a unique ground-truth partition exists and sample partitions are drawn from a sufficiently small ball containing the ground-truth....

Topics: Machine Learning, Learning, Computing Research Repository, Statistics

Source: http://arxiv.org/abs/1604.07711

Jun 29, 2018
by Jonathan Scarlett; Volkan Cevher

In this paper, we study the information-theoretic limits of community detection in the symmetric two-community stochastic block model, with intra-community and inter-community edge probabilities $\frac{a}{n}$ and $\frac{b}{n}$ respectively. We consider the sparse setting, in which $a$ and $b$ do not scale with $n$, and provide upper and lower bounds on the proportion of community labels recovered on average. We provide a numerical example for which the bounds are near-matching for moderate...

Topics: Machine Learning, Mathematics, Information Theory, Statistics, Computing Research Repository,...

Source: http://arxiv.org/abs/1602.00877
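The generative model studied above is simple to sample from: in the symmetric two-community stochastic block model, each edge appears independently with probability $a/n$ inside a community and $b/n$ across communities. The sketch below is just the sampler (with an assumed even $n$ and illustrative function name), not the paper's information-theoretic analysis.

```python
import numpy as np

def sample_sbm(n, a, b, seed=0):
    """Symmetric two-community SBM on n nodes (n even): edge probability
    a/n within a community, b/n across communities."""
    rng = np.random.default_rng(seed)
    labels = np.repeat([0, 1], n // 2)
    same = labels[:, None] == labels[None, :]
    P = np.where(same, a / n, b / n)
    upper = np.triu(rng.random((n, n)) < P, k=1)  # sample upper triangle only
    A = (upper | upper.T).astype(int)             # symmetrise; diagonal stays 0
    return A, labels
```

For a > b the sampled graph is denser inside communities than across them, which is what recovery algorithms exploit.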

Jun 29, 2018
by Hervé Bredin

TristouNet is a neural network architecture based on Long Short-Term Memory recurrent networks, meant to project speech sequences into a fixed-dimensional euclidean space. Thanks to the triplet loss paradigm used for training, the resulting sequence embeddings can be compared directly with the euclidean distance, for speaker comparison purposes. Experiments on short (between 500ms and 5s) speech turn comparison and speaker change detection show that TristouNet brings significant improvements...

Topics: Machine Learning, Statistics, Sound, Computing Research Repository

Source: http://arxiv.org/abs/1609.04301
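The triplet loss paradigm mentioned above can be sketched on precomputed embeddings: it penalizes a triplet whenever the anchor-positive distance is not smaller than the anchor-negative distance by at least a margin. This is the generic loss, not TristouNet's LSTM architecture; the margin value is an illustrative assumption.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    # hinge on euclidean distances: same-speaker pairs should end up
    # closer than different-speaker pairs by at least `margin`
    d_ap = np.linalg.norm(anchor - positive)
    d_an = np.linalg.norm(anchor - negative)
    return max(0.0, d_ap - d_an + margin)
```

A well-separated triplet incurs zero loss, while a negative inside the margin is penalized by exactly the margin violation.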

Jun 26, 2018
by Milad Kharratzadeh; Mark Coates

We consider the problem of multivariate regression in a setting where the relevant predictors could be shared among different responses. We propose an algorithm which decomposes the coefficient matrix into the product of a long matrix and a wide matrix, with an elastic net penalty on the former and an $\ell_1$ penalty on the latter. The first matrix linearly transforms the predictors to a set of latent factors, and the second one regresses the responses on these factors. Our algorithm...

Topics: Machine Learning, Statistics

Source: http://arxiv.org/abs/1502.07334

Jun 30, 2018
by Marie Schrynemackers; Louis Wehenkel; M. Madan Babu; Pierre Geurts

Networks are ubiquitous in biology and computational approaches have been largely investigated for their inference. In particular, supervised machine learning methods can be used to complete a partially known network by integrating various measurements. Two main supervised frameworks have been proposed: the local approach, which trains a separate model for each network node, and the global approach, which trains a single model over pairs of nodes. Here, we systematically investigate,...

Topics: Machine Learning, Computing Research Repository, Statistics, Learning

Source: http://arxiv.org/abs/1404.6074

Jun 29, 2018
by Dipayan Maiti; Mohammad Raihanul Islam; Scotland Leman; Naren Ramakrishnan

Storytelling algorithms aim to 'connect the dots' between disparate documents by linking starting and ending documents through a series of intermediate documents. Existing storytelling algorithms are based on notions of coherence and connectivity, and thus the primary way by which users can steer the story construction is via design of suitable similarity functions. We present an alternative approach to storytelling wherein the user can interactively and iteratively provide 'must use'...

Topics: Machine Learning, Statistics, Artificial Intelligence, Computing Research Repository, Learning

Source: http://arxiv.org/abs/1602.06566

Jun 30, 2018
by Sherjil Ozair; Yoshua Bengio

For discrete data, the likelihood $P(x)$ can be rewritten exactly and parametrized into $P(X = x) = P(X = x | H = f(x)) P(H = f(x))$ if $P(X | H)$ has enough capacity to put no probability mass on any $x'$ for which $f(x')\neq f(x)$, where $f(\cdot)$ is a deterministic discrete function. The log of the first factor gives rise to the log-likelihood reconstruction error of an autoencoder with $f(\cdot)$ as the encoder and $P(X|H)$ as the (probabilistic) decoder. The log of the second term can be...

Topics: Machine Learning, Neural and Evolutionary Computing, Computing Research Repository, Statistics,...

Source: http://arxiv.org/abs/1410.0630

Jun 28, 2018
by Olivier Francois

The principle of peer review is central to the evaluation of research, by ensuring that only high-quality items are funded or published. But peer review has also received criticism, as the selection of reviewers may introduce biases in the system. In 2014, the organizers of the "Neural Information Processing Systems" conference conducted an experiment in which $10\%$ of submitted manuscripts (166 items) went through the review process twice. Arbitrariness was measured as the conditional...

Topics: Statistics, Digital Libraries, Computing Research Repository, Other Statistics, Machine Learning

Source: http://arxiv.org/abs/1507.06411