Master thesis, Zurich

Finally, we discuss SMOTE as an interesting but not fully satisfactory method for dealing with highly unbalanced data. Different approaches to estimating the involved parameters via the maximum likelihood method, and their pitfalls, are discussed. For classification, we use prefiltering with the Mann-Whitney U test, followed by modern high-dimensional classification methods. One way to solve this problem is to use the lasso (least absolute shrinkage and selection operator) to estimate the regression coefficients. In this master thesis, an attempt was made to prove consistency of the PC algorithm applied to categorical data in low- and high-dimensional settings. The aim of this thesis was (i) to analyze the genetic variation in the timing of bud set of Norway spruce (Picea abies L.) within and among 15 populations covering the natural range of the species and (ii) to relate the variation among populations in the timing of bud set to the variation observed at both neutral and candidate genes. In this thesis, we have tested whether model selection techniques can be used to improve the performance of existing statistical methods for detecting differential gene splicing in RNA-seq data sets. First, a lasso-like method that uses additional weights in the penalty term reduces the correlation, in some cases considerably. Each chair and individual laboratory at ETH offers semester projects and master theses: check their websites and noticeboards for a list of topics. You can consult any authorised supervisor of master’s theses in D-USYS/Environmental Sciences (see “Who can supervise my master’s thesis?”). In our second measure we consider the reliability of each method in not declaring a node to be a parent when it is not one. In this thesis, groundwater temperature is used as an indicator of groundwater quality. The optimal choice of method depends on the objective and on the specific parameter combination.
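
The Mann-Whitney U prefiltering step mentioned above can be sketched as follows. This is a minimal illustration, not the thesis's code. It assumes a binary label coded 1/0 and uses the normal approximation to the distribution of U without a tie correction. The function names and the default cutoff alpha=0.05 are hypothetical choices.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, shifted to 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U p-value via the normal approximation (no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))


def prefilter(features, labels, alpha=0.05):
    """Keep the indices of features (given as columns) whose two class samples differ at level alpha."""
    kept = []
    for j, column in enumerate(features):
        x = [v for v, c in zip(column, labels) if c == 1]
        y = [v for v, c in zip(column, labels) if c == 0]
        if mann_whitney_p(x, y) < alpha:
            kept.append(j)
    return kept
```

In a genuinely high-dimensional setting, the cutoff would usually be adjusted for multiple testing before the surviving features are passed to the classifier.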
Finally, when the companies belonging to the S&P 100 were clustered, PAM applied to the dissimilarity matrix estimated with Hoeffding’s D gave the best solution compared to the clustering methods AGNES, DIANA and DSC. This agreed with the results of the simulation studies and the reviewed theory. I have tried to make this thesis as self-contained and as comprehensive as possible, while keeping to the essentials. Students must have met all the conditions for admission to the master’s degree programme (any additional requirements and/or restricted choice of subjects); this applies to students who do not hold an ETH bachelor's degree. The prediction interval methods are based on the Wald test and are derived separately for observed and unobserved groups. In this thesis, we use the proven properties and propose a semiparametric algorithm for optimal bandwidth selection. This model can be fitted either by a method-of-moments approach or by a maximum likelihood approach. The new method was tested on an SNP phenotype association dataset, which also allowed different approaches to bias correction to be investigated. The thesis first discusses the statistical challenges that arise when estimating the cost efficiency of MC plans. Score-based method for inferring structural equation models with additive noise. Furthermore, existing estimation methods are to be compared with Clest. As our method is based on local properties of the distribution, it extends without constraints to high-dimensional settings. This thesis studies blind deconvolution from a theoretical and a practical point of view.

Master thesis, ETH Zurich

First of all, I have introduced the basic notions of stochastic theory and a special and unusual limit theorem that I will use throughout the thesis. A discussion of the advantages and disadvantages of semi-supervised label propagation and its applicability concludes this thesis. After an introduction to the problem of spike sorting and the data encountered in such settings, we review two different modern spike sorting frameworks. One is binary pursuit (Pillow, Shlens, Chichilnisky, and Simoncelli, 2013), and the other relies on a method called continuous basis pursuit (Ekanadham, Tranchina, and Simoncelli, 2014). Second, the three newly generated ground truth datasets were used to learn the semantic segmentation of aerial images with fully convolutional networks (FCNs), which have recently been introduced for accurate pixel-dense semantic segmentation tasks. Since discovering structure can be difficult, especially in high-dimensional settings, we combine the IDA algorithm with stability selection, a subsampling method, to select the most stable causal effects. The new methods were successful and have been implemented as an R package available on GitHub. Finally, we evaluate the performance of these methods in a small simulation study. Using the classical Mallows' $C_p$ criterion in an example, we discuss the importance of using robust methods for model selection. This master's thesis considers local polynomial matching, a popular method in econometrics for estimating counterfactual outcomes and average treatment effects. You must have at least requested the issuing of your bachelor’s diploma (this applies only to students taking an ETH bachelor's degree). For this variable selection process we use the approach of Maathuis, Kalisch and Bühlmann, which uses graph estimation techniques in combination with a causal method called back-door adjustment. Second, the chain ladder method is not able to deal with diagonal effects (i.
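
Mallows' $C_p$, mentioned above, scores a candidate linear model with p coefficients as $C_p = \mathrm{RSS}_p/\hat\sigma^2 - n + 2p$, where $\hat\sigma^2$ is usually estimated from the largest model. The sketch below assumes that the residual sums of squares have already been computed. The function names and the numbers in the example are made up for illustration. Every term is built from squared residuals, so a single gross outlier can dominate the criterion. This is one reason robust versions are of interest.

```python
def mallows_cp(rss, p, sigma2, n):
    """Mallows' C_p = RSS_p / sigma^2 - n + 2p.

    rss    -- residual sum of squares of the candidate model
    p      -- number of estimated coefficients, including the intercept
    sigma2 -- error-variance estimate, here taken from the full model
    n      -- number of observations
    """
    return rss / sigma2 - n + 2 * p


def select_by_cp(candidates, rss_full, p_full, n):
    """Return (name with smallest C_p, all scores) for a dict name -> (rss, p)."""
    sigma2 = rss_full / (n - p_full)  # unbiased variance estimate from the full model
    scores = {name: mallows_cp(rss, p, sigma2, n) for name, (rss, p) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


best, scores = select_by_cp(
    {"full": (90.0, 5), "small": (96.0, 3), "tiny": (150.0, 2)},
    rss_full=90.0, p_full=5, n=50,
)
```

By construction, the full model always receives $C_p = p$. A submodel is preferred when its extra residual error costs less than the 2p penalty it saves.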
The method of dictionary learning was introduced by Olshausen and Field (1997) as a model for images based on the primary visual cortex. This thesis investigates the performance of various estimators in density estimation and mode estimation for bounded data. Our method is theoretically sound, even for nonlinear or non-additive AR(p) models, and computationally efficient, requiring on average 0. This is followed by a review of goodness-of-fit methods for copulas, including tests based on empirical copulas, the Rosenblatt transform, the Kendall transform and the Hering-Hofert transform. MCMC methods are given for efficient sampling from the posterior of this model. The regression methods lasso, relaxed lasso and boosting are used to predict and classify both simulated and real high-dimensional data. In order to assess how accurate the output of an estimation method is, we would like to be able to compare causal structures in terms of their causal inference statements. It was shown that this method outperforms the conventional methods in simulations. The methods were first tested on simulated data before being applied to the Dow Jones Industrial Average. Two estimation methods are introduced: minimum contrast and estimating functions. It is shown that, under certain assumptions, the resulting estimators are consistent and asymptotically normal. In this thesis we investigate several classifiers to discriminate between autistic and typically developing subjects based on resting-state fMRI data. For this purpose, a number of statistical learning methods are employed and models are fit to publicly available data. In this thesis, the demand for personalized mobility by Swiss households has been studied using vehicle stock parameters and geographic and socio-economic characteristics.

Master thesis, IBM Zurich

It is shown that the local polynomial based method has no significant boundary bias in the considered examples. The method is based on the PC algorithm combined with a Bayesian-style MCMC search. However, if these estimates are used for purposes such as inference or denoising, their performance is comparable to that of exact methods. For both importance measures we explore whether the importance of clusters of highly correlated variables can be identified correctly. In simulation studies, the methods are examined on various multivariate normal models. In general, we observe that low-variance (higher-bias) methods perform better at this sample size. These new variants of the PC algorithm are compared with the standard variant using ROC plots and other graphical comparison methods. In this master thesis we develop prediction algorithms that optimize a performance measure over a specified set of wavelet packet trees and smoothing parameters. On the other hand, the computational effort is tremendous compared to classical methods, which is a serious drawback. This thesis studies the theoretical and empirical behaviour of the bias and the variance of the estimator. The first goal of the thesis is to demonstrate the above-mentioned problem and to present some alternative techniques that can be useful in the estimation of causal effects, such as the instrumental variables technique and a new identification method (Chapter 2). Opportunities for a master's thesis are displayed on this site; please see the open master's thesis positions. In this master thesis, a few other graphical model fitting algorithms are compared to the logilasso. Based on longitudinal data from spinal cord injured patients participating in the European Multicenter Study about Spinal Cord Injury, this thesis focuses on the assessment of lower-limb performance.
For this reason, many statistical methods have been developed to extract the most relevant information from these data. It was important for the analysis that some patients had a PPI administered together with clopidogrel but had no prior prescription. With new data at hand, the traffic engineering department of ETH Zurich was interested in finding out which factors determine the severity of a car accident. The main concern is to give a general methodology for estimating the unknown parameters from a discrete set of observations of the stock price. All the practical methodologies discussed in the paper are coded in the R programming language and are contained in Appendix E. This master thesis deals with the most important challenges facing practitioners in portfolio and risk management. Additionally, a relatively new method of diversification optimization is implemented and compared against return maximization subject to a CVaR constraint. The author conducts a large simulation experiment to investigate the effect of the dimension on the level and power of goodness-of-fit tests for various combinations of null-hypothesis copulas and alternative copulas. In the second part of this thesis, we consider mode estimation and examine several methods for bounded data. The focus of this thesis lies on comparing the lasso, the elastic net and ridge regression with the lava method, in theory and application. The thesis can be split into five main parts. The goal of this thesis is to give the reader an introduction to cross-over trials.
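
For readers unfamiliar with the CVaR constraint mentioned above, two definitions help. For a confidence level alpha, the value-at-risk (VaR) is a high quantile of the loss distribution. The conditional value-at-risk (CVaR, or expected shortfall) is the average loss at or beyond that quantile. The sketch below computes simple empirical versions from a sample of losses. Several quantile conventions exist. The one used here, taking the ceil(alpha·n)-th smallest loss, is an assumption and not necessarily the convention used in the thesis.

```python
import math


def var_cvar(losses, alpha=0.95):
    """Empirical VaR and CVaR (expected shortfall) of a loss sample.

    VaR  -- the ceil(alpha * n)-th smallest loss (one of several conventions)
    CVaR -- the mean of all order statistics from the VaR position upwards
    """
    ordered = sorted(losses)
    n = len(ordered)
    k = max(math.ceil(alpha * n) - 1, 0)  # 0-based index of the empirical alpha-quantile
    var = ordered[k]
    tail = ordered[k:]  # losses at or beyond the VaR
    cvar = sum(tail) / len(tail)
    return var, cvar


var, cvar = var_cvar(list(range(1, 21)), alpha=0.875)
```

CVaR is never smaller than VaR and, unlike VaR, it reflects how severe the tail losses are. This is one reason it is popular as an optimization constraint.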

An advantage of this method over a correlation test is that mutual information also measures non-linear dependence. At each observation time, it is recorded whether an event has happened or not. The goal is to estimate the distribution function of the time to such an event, also called failure. In some studies the delta method has been used, whereas in others nonparametric bootstrapping was favored. In this thesis, post-processing of runoff forecasts from summer 2007 to the end of 2009 for the river Alp in Switzerland is done by applying Bayesian model averaging (BMA). This problem is being addressed in the Netherlands, where research with a hospital has been developing logistic regression models that may help train nurses to diagnose skin cancer and that are accessible via a mobile application. The author finds an algorithm to choose appropriate methods under various conditions. This work presents a method that uses publicly available remote sensing data to generate large and diverse new ground truth datasets. These can be used to train neural networks for the pixel-wise semantic segmentation of aerial images. In the last part of this thesis we apply the time series version of the FCI algorithm to a dataset from molecular biology. The goal is to infer causal relations among the factors of interest and thereby to better understand how the transcription process of genes works. The two methods using nlme() do not differ much in their estimates. The main achievements of this thesis are step-by-step derivations of the confidence intervals of Edwards-Hsu’s MCB method in the balanced and unbalanced one-way ANOVA model, as well as a successful implementation in R. The main purpose of this thesis was to implement Edwards-Hsu’s MCB method in R, as it is not part of the R package multcomp. Before starting the thesis, it must be registered online via myStudies.
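
The claim that mutual information, unlike correlation, also picks up non-linear dependence can be checked on a toy example. The sketch below uses the plug-in estimator for discrete samples. Continuous data would first have to be binned, which is an extra assumption. It compares this estimate with Pearson's correlation for y = x².

```python
import math
from collections import Counter


def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mutual_information(x, y):
    """Plug-in mutual information (in nats) between two discrete samples."""
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(
        c / n * math.log(c * n / (px[a] * py[b]))  # p(a,b) * log(p(a,b) / (p(a) p(b)))
        for (a, b), c in joint.items()
    )


x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]  # y is a deterministic but non-monotone function of x
```

Here the correlation is exactly zero, while the mutual information equals the entropy of y, about 1.05 nats. The plug-in estimator is biased upwards in small samples, so a permutation reference distribution is commonly used when MI serves as a test statistic.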
We study the work of Sen and Banerjee (2007), focusing on their method for obtaining point-wise confidence intervals for null hypotheses on the distribution function of the survival time in a mixed-case interval censoring model. The method is based on a pseudo-likelihood-ratio statistic. We then examine methods for dealing with data from causal systems without hidden variables (PC), as opposed to those with hidden variables (FCI, FCI+). Our research is based on a microsimulation of the city of Zurich implemented in VISSIM. For this newly developed methodology, we provide implementations and an empirical comparison to state-of-the-art methods on synthetic as well as real-world high-dimensional data sets. The programme regulations address the subject of the master’s thesis in Article 38. Nonetheless, comparing the two different approaches for small numbers of crimes, we get very similar results. The clinical score of coma patients: a comparison of model selection methods. In this thesis, we perform a statistical analysis in order to characterize wind turbulence. The main objective of this thesis is to develop a Markov chain Monte Carlo (MCMC) method within the Bayesian inference framework for estimating meta-t copula functions for modelling financial market risks. In this thesis, risk models are evaluated in a joint project with Swissquant Group AG and a major Swiss bank. In the second part of this thesis, the correlation among the model variables selected by the three regression methods is examined, and attempts are made to reduce it. The goals of this master thesis were to compare the performance (measured as level and power) of widely used partial Mantel tests using state-of-the-art simulation techniques, and to describe new implementations with improved performance. Your tutor will also be able to help you make the right choice of a semester project or master thesis.

PhD thesis, ETH Zurich

Concerning the CF study, the best-performing classification method was principal component analysis followed by linear discriminant analysis (PCA-LDA), with an AUC value of 0. The thesis is a six-month full-time workload, including an oral presentation and a written report (the master thesis). The performance of all the methods is evaluated for all three types of parameters: the fixed effects, the variance-covariance components and the within-group standard deviation. Non-parametric methods for estimation of Hawkes processes for high-frequency financial data. Although many research papers present extensions of the classical chain ladder method, none has addressed the issue of using constrained optimization with lasso-type estimators. However, under some fairly general assumptions, IDA (intervention calculus when the DAG is absent) is a methodology that can deduce information on causal effects from observational data. The results agree with previous studies and confirm that the method proposed by Freedman & Lane has the best overall performance. Hand in your thesis by the agreed date (6 months), including the signed declaration of originality. In this thesis we explore the problem of learning the structure of an undirected Gaussian graphical model. The aim of this study is to compare the accuracy of commonly used ROC curve estimation methods. My primary goal is to find a set of methods (classifiers) that classify o ser and death with high accuracy. This thesis studies the performance of random projection, one of the relatively new dimensionality reduction techniques, when applied to clustering, classification and regression. It does so by reproducing or testing the results of three papers, one from each of these domains: Boutsidis and Zouzias (2010), Paul and Boutsidis (2013) and Kaban (2014).
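
As background for the ROC comparison above, consider the simplest, nonparametric ROC estimator. It plots the empirical true- and false-positive rates over all observed thresholds. The area under this curve equals the proportion of positive-negative pairs that are ranked correctly, with ties counted as one half. The sketch below computes the AUC both ways on made-up scores. It is an illustration only. Smoothed or binormal estimators, which the compared methods may include, are not shown.

```python
def empirical_auc(pos, neg):
    """Share of (positive, negative) pairs ranked correctly; ties count 1/2."""
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def roc_points(pos, neg):
    """Empirical ROC curve as (FPR, TPR) pairs, using every observed score as a threshold."""
    points = [(0.0, 0.0)]
    for t in sorted(set(pos) | set(neg), reverse=True):
        tpr = sum(s >= t for s in pos) / len(pos)
        fpr = sum(s >= t for s in neg) / len(neg)
        points.append((fpr, tpr))
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


pos = [0.9, 0.8, 0.6, 0.4]  # hypothetical classifier scores of positive cases
neg = [0.7, 0.5, 0.3, 0.2]  # hypothetical classifier scores of negative cases
```

Both routes give 13/16 = 0.8125 for these scores. Their agreement is the Mann-Whitney interpretation of the empirical AUC.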
To understand how this algorithm works, a short introduction to causality is given and some methods of dealing with causal data are examined. Non-parametric models, such as regression trees, are often used as a primary estimation method in prediction problems. In this thesis we study a special case of a restricted structural equation model (SEM). We present results for the three methods. They show that the rate of decay of the prediction error depends on the curvature of the loss function over the space of the predictor's choices. In this thesis we study the quality of inference performed by two state-of-the-art robust regression procedures. We present a method that efficiently finds a maximum likelihood estimator of the causal structure among all trees. The attention of statisticians and computer scientists to variational methods has increased considerably in the last few decades. Standard methods such as the cutoff method and logistic regression may yield biased efficacy estimates. The primary goal of this thesis is to investigate the relation between the genome, represented by SNPs, and the behavioural characteristics (such as risk aversion) of an individual, using supervised learning techniques. After five months of intense work, I am proud to submit my master’s thesis. We also investigate the robustness of our method with a simulation study in which we look at violations of assumptions. It turns out that the missregr method achieves the best results. In contrast to conventional methods like the lasso or ridge regression, this method is able to discover signals that are neither sparse nor dense.

Master's thesis – Department of Environmental Systems Science

Furthermore, to check whether a class satisfies a ULLN or UCLT, it is convenient to use the Vapnik-Chervonenkis index (VC index). The thesis aims to enhance the student's ability to work independently towards the solution of a theoretical or applied engineering or techno-economic problem. The effectiveness of the methods is validated through a simulation study based on real-world datasets. There are several methods for multiple comparisons, in particular multiple comparisons with the best (MCB), which is the main focus of this thesis. In this master's thesis, some boundary corrections were combined with the smooth distribution estimates for current status censored data. Within this thesis we describe the structure of the booking data in the airline industry that needs to be forecast. We also discuss the current Bayesian forecasting model implemented by Swiss revenue management. Furthermore, we summarize the theory of selective inference and of the methods used to construct confidence intervals. We have a sequence of observations and would like to detect whether their generating distribution has remained stable or has undergone an abrupt change. We propose four different estimation methods for θ∗ and study their asymptotic properties under appropriate distributional assumptions, as well as under model assumptions on the concentration matrix θ∗. Among the considered methods, mode estimation based on local polynomials shows superior performance and does not appear to suffer from a considerable boundary bias problem. This extension of CLSR performs similarly to the competing methods on a variety of real data sets. We perform an empirical comparison of predictive performance between CLSR and other widely used methods for high-dimensional least squares estimation, such as ridge regression, principal component regression and the lasso. This thesis applies a Bayesian hierarchical model as developed by Buser et al.
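
The change-detection problem described above can be illustrated with the classical CUSUM-type statistic for a single change in mean. The sketch assumes independent observations with a common, known variance, so no variance scaling is applied, and at most one change point. It only locates the most likely change. It does not compute a critical value for testing whether a change occurred at all.

```python
import math


def cusum_change_point(x):
    """Locate a single mean shift.

    For every split k (first k observations vs. the rest), compute
    sqrt(k (n - k) / n) * |mean(x[:k]) - mean(x[k:])| and return the
    maximizing k together with the statistic.
    """
    n = len(x)
    total = sum(x)
    left = 0.0
    best_k, best_stat = None, -1.0
    for k in range(1, n):
        left += x[k - 1]  # running sum of the first k observations
        mean_left = left / k
        mean_right = (total - left) / (n - k)
        stat = math.sqrt(k * (n - k) / n) * abs(mean_left - mean_right)
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, best_stat


k, stat = cusum_change_point([0.0] * 50 + [5.0] * 50)
```

The weight sqrt(k(n - k)/n) makes the mean differences at different split points comparable. Without it, splits near the ends of the series, where one sample mean rests on very few points, would be favoured spuriously.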
However, this method has several disadvantages, an important one being the inability to handle overlapping spikes. We prove consistency and derive the asymptotic limit distribution of the naive estimators. We also present a method to draw point-wise confidence intervals for these sub-distribution functions based on the pseudo-likelihood-ratio statistic introduced by Sen and Banerjee (2007). Three methods used a closed-form analytical solution of a system of ordinary differential equations (ODEs); the fourth method used the system of ODEs directly. We provide an assessment of the computational complexity of a new approach. It grows polynomially with the size of the network and exponentially with the size of the maximal neighborhood, which is the main limitation of the method. The methods were chosen so that the results would best help answer the respective research questions. We relied on non-parametric two-sample testing, canonical correlation analysis, principal component analysis, latent class factor analysis, linear mixed-effects models and random forests. In this thesis, we show that the method can also be derived from an information-theoretic point of view. We evaluate the performance of the algorithm in a simulation study and compare it to the performance of two other existing methods: the PC algorithm and greedy equivalence search. In this paper we introduce and compare several methods that can support the decision of whether or not to offer a product to a customer. For the vaccine we considered, the probabilistic method needs 3 to 12 times as many individuals as the cutoff method to achieve a power of 80%. Furthermore, an artificial outlier study is conducted to assess the outlier detection power of the four considered regularization methods. The development and improvement of anomaly detection methods is therefore of ever-increasing importance. While the filtering was handled well by the different methods, the estimation of the parameters was much more difficult.

Dissertations and reports – Wissensportal ETH

Three such methods are presented in the technical report by Fridlyand and Dudoit. Finally, partially based on the theoretical results derived earlier in the thesis, the question is addressed in a simulation study in Chapter 4. As the title suggests, this thesis belongs to the domain of robust statistics. Chapter 2, "Microarray predictors", presents various methods that we later use to classify microarrays. It has the lowest confidence interval mp among all methods, and its coverage rate is closest to the nominal rate (α). In addition, other specialists, normally lecturers on the Environmental Sciences study programme, can be authorised to supervise a master’s thesis. However, robust statistics is not restricted to the use of robust estimation methods alone. For both methods, the theoretical foundations are presented and the algorithms are analysed and described in detail. The next goal, and the main theme of the thesis, is to answer the question: "How restrictive is it if we restrict causal inference to adjustment methods?" Methods to adjust for confounders in observational data were applied, and new graph theory developments in combination with probability theory were evaluated. We study the work of Durot and Tocquet (2001), who proposed a new test of the hypothesis H0: "f = f0" versus the composite alternative Hn: "f ! myStudies), who is going to provide the second assessment, must receive the completed written thesis for assessment by that date (any corrections made after the deadline may not be taken into account in the assessment). Regarding the submission date, you must hand in the title page of your thesis at the Study Administration Office, either as a hard copy delivered in person or sent by post, or as a PDF/scan sent by e-mail. The best-performing methods for both biomarker detection and classification differ between the two studies.
In this thesis, we take a look at some market models and studies of market efficiency. In addition, the behaviour of the regularization methods on the temperature anomaly data is studied for some examples, in order to compare the methods and to understand the distributions of the residuals. The experiments concentrate on a comparison of the complete identification algorithm and the covariate adjustment method in terms of the proportions of identifiable causal effects. We try to answer this question in the last chapter of this thesis (Chapter 5). In the first part of the thesis, the classical selection methods forward stepwise, backward stepwise and "all subsets" are applied. The second chapter of the thesis is dedicated to explaining the dataset constructed from these variables and their relationship with the response variables. In this thesis, we investigate random projection ensemble methods for multiclass classification. They combine arbitrary base classifiers operating on appropriately chosen low-dimensional random projections of the feature space. Firstly, there is as yet no proof of the consistency of the method of moments estimator. In the first part of this thesis, we apply this method and Parzen's method to density estimation for some bounded univariate and bivariate data examples. The problem is that the covariate adjustment method is not complete: it may fail to identify a causal effect even if the effect is identifiable by some other method. In our first measure we consider the probability that each of the considered methods finds exactly all the parents of a randomly chosen target variable.
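
A stripped-down version of the random projection ensemble idea described above is sketched below. It rests on three simplifying assumptions. The base classifier is a nearest-centroid rule. The projections are plain Gaussian matrices that are not screened or "appropriately chosen" as in the thesis. The ensemble aggregates by majority vote. All function names and the parameters d, n_proj and seed are hypothetical.

```python
import random
from collections import Counter


def project(matrix, x):
    """Multiply a d x D matrix (list of rows) with a length-D vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]


def nearest_centroid_fit(X, y):
    """Class label -> mean feature vector."""
    sums, counts = {}, Counter(y)
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def nearest_centroid_predict(centroids, x):
    """Label of the centroid with the smallest squared Euclidean distance to x."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], x)))


def rp_ensemble_predict(X, y, x_new, d=2, n_proj=25, seed=0):
    """Majority vote of nearest-centroid classifiers, each fitted in a random d-dimensional projection."""
    rng = random.Random(seed)
    D = len(X[0])
    votes = Counter()
    for _ in range(n_proj):
        A = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(d)]  # Gaussian projection
        centroids = nearest_centroid_fit([project(A, xi) for xi in X], y)
        votes[nearest_centroid_predict(centroids, project(A, x_new))] += 1
    return votes.most_common(1)[0][0]
```

Voting over many projections protects against an individual projection that happens to collapse two class centroids onto each other. Choosing projections by estimated error, as the thesis's procedure does, is a further refinement of this idea.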

Master's theses – Seminar for Statistics | ETH Zurich

In the last part of the thesis, these residuals are used to detect outliers in the temperature anomaly data. This thesis deals with Bayesian inference for a mixture of Gaussian distributions. The present work investigated whether the logistic regression models could be improved upon or outperformed. First, we consider two methods for estimating the parameters in BAP models. The chain ladder method is by far the most popular method for predicting non-life claims reserves in the insurance industry. In recent years, two methods have been refined in order to study the mechanism of gene regulation: microarray and ChIP-on-chip experiments. In the chapter on model selection, we introduce an important method, the classical Mallows' $C_p$ criterion. This thesis focuses on how to detect the presence of such sparse information with the aid of a method called higher criticism. This thesis investigates the estimation of parameters for discretely observed stochastic volatility models. The methods are compared by testing for variable importance in linear models for a variety of test setups, including real datasets. Patients with both drugs prescribed have a higher risk of such an event, but it is an open question whether this is due to individual risk factors or to a reduced effect of clopidogrel. For simulated data sets, we often achieve better prediction accuracy with lasso-type estimators than with the chain ladder method, especially in situations where the chain ladder model assumptions are not fulfilled. Comparisons with the best: methods and their implementations in R. The test statistic is based on the $L_1$-distance between the isotonic estimator $\hat f_n$ of $f$ and the given function $f_0$. Under the null hypothesis $H_0$, a centered and normalized version of this distance is asymptotically standard normal, provided that $f_0$ satisfies some regularity conditions. This thesis gives a general approach to deriving screening rules for convex optimization problems.
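
The basic chain ladder algorithm has two steps. It first estimates volume-weighted development factors from a cumulative run-off triangle. It then uses these factors to project each accident year to its ultimate value. The reserve is the projected ultimate minus the latest observed value. The sketch below is the textbook deterministic version on a made-up 3×3 triangle. It assumes cumulative rather than incremental claims. It also shares the limitation noted earlier: it cannot model diagonal (calendar-year) effects.

```python
def chain_ladder(triangle):
    """Deterministic chain ladder on a cumulative run-off triangle.

    triangle[i] holds the cumulative claims of accident year i for
    development years 0 .. n-1-i, so row i has n - i entries.
    Returns (development factors, completed rectangle, reserve per accident year).
    """
    n = len(triangle)
    factors = []
    for j in range(n - 1):
        rows = [row for row in triangle if len(row) > j + 1]  # years observed at j and j+1
        factors.append(sum(row[j + 1] for row in rows) / sum(row[j] for row in rows))
    completed, reserves = [], []
    for row in triangle:
        full = list(row)
        while len(full) < n:
            full.append(full[-1] * factors[len(full) - 1])  # roll forward one development year
        completed.append(full)
        reserves.append(full[-1] - row[-1])
    return factors, completed, reserves


factors, completed, reserves = chain_ladder([[100, 150, 165], [110, 165], [120]])
```

For this toy triangle, the factors are 1.5 and 1.1 and the total reserve is 94.5. The lasso-type alternatives discussed in the thesis replace this fixed multiplicative structure with a penalized regression on the triangle.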
After a theoretical study of the methods, we run simulations to assess their quality. Conceptually, this method weights each data point by its inverse probability of treatment weight (IPTW), creating data for an unconfounded pseudo-population. In this thesis we compare different approaches to sparse principal component analysis (sparse PCA) and then extend our investigation to sparse canonical correlation analysis (sparse CCA). The methods are demonstrated on a real data set, and large simulations on synthetic and semi-synthetic data sets are carried out. We also explore assumptions based on the properties of an estimator, such as homogeneous Type I and Type II discrepancies. Another such assumption is the underlying logistic model, viewed as a function of an estimator's output and the output of the method. It can therefore be concluded that the goals set for this thesis have been achieved by providing substantial insight into theoretical and applied aspects of statistical models for forecasting futures closing prices. In this thesis, we focus on the problem of how to derive the sign of the causal effects. The most prominent robust tests for the composite hypothesis are the robust Wald test and the $\tau$-test. The last part of the thesis runs experiments to examine whether these assumptions on the random matrices are necessary. It finds that each of them could be loosened without breaking the bounds. This thesis investigates the estimation of the volatility of an asset return process.
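
The IPTW idea described above can be made concrete on a toy example with one discrete confounder. Treated units are weighted by 1/e and controls by 1/(1 - e), where e is the propensity score. The weighted means then estimate what the outcome would be if everyone, or no one, were treated. The sketch makes two simplifying assumptions. First, it estimates the propensity as the treated fraction within each covariate stratum; in practice a model such as logistic regression is common. Second, it uses the normalized (Hajek-type) form of the estimator. The data and function names are made up for illustration.

```python
from collections import defaultdict


def stratum_propensity(z, t):
    """Estimate e(z) = P(T = 1 | Z = z) as the treated fraction in each stratum of a discrete covariate."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [number treated, stratum size]
    for zi, ti in zip(z, t):
        counts[zi][0] += ti
        counts[zi][1] += 1
    return [counts[zi][0] / counts[zi][1] for zi in z]


def iptw_ate(y, t, e):
    """Normalized IPTW estimate of the average treatment effect.

    Treated units get weight 1/e, controls 1/(1 - e); requires 0 < e < 1 (positivity).
    """
    w = [1.0 / ei if ti == 1 else 1.0 / (1.0 - ei) for ti, ei in zip(t, e)]

    def weighted_mean(group):
        num = sum(wi * yi for wi, yi, ti in zip(w, y, t) if ti == group)
        den = sum(wi for wi, ti in zip(w, t) if ti == group)
        return num / den

    return weighted_mean(1) - weighted_mean(0)


z = [0] * 8 + [1] * 8                                  # confounder
t = [1, 1, 0, 0, 0, 0, 0, 0] + [1, 1, 1, 1, 1, 1, 0, 0]  # treatment more common when z = 1
y = [2 * ti + 4 * zi for ti, zi in zip(t, z)]          # true treatment effect is 2
e = stratum_propensity(z, t)
```

The naive difference in means is 4 here, because treated units are concentrated in the stratum with the higher baseline outcome. The weighted estimate recovers the effect of 2 built into the toy data.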

Master Thesis – Department of Materials | ETH Zurich

The period allowed for a master’s thesis is set at a maximum of 28 weeks (6 months plus 2 weeks). First, the number of clusters is to be estimated with the resampling method Clest. The three methods were nlme() from the package of the same name, nlmer() from package lme4a and nlmer() from package lme4. This made it necessary to have methods that can handle such large data sets while making as few errors as possible. Tibshirani (1996) introduced a method called the lasso, which deals precisely with this problem and sets the coefficients of some variables exactly to zero. One of them results from the proof of the identifiability of BAP models and is implemented in this thesis. In this thesis we consider a more general subclass of linear structural equation models for structure learning. In this subclass, the errors may be correlated unless the corresponding random variables are in a direct causal relation. Non-parametric tests are performed to determine whether dependence exists between the increments of wind velocities and block mean velocities. To further improve the probability predictions, two probability calibration methods are each applied to the above-mentioned best classifiers. In the majority of the twelve examined cases, the probability calibration yields some improvement. Firstly, all considered regularization methods are described for a multiple linear regression setting and are related to each other in the orthonormal design case. Two simulation studies were carried out to compare in advance the performance of the different clustering methods and validation statistics under different levels of cluster overlap. You have to register the start of your master’s thesis in myStudies under > Functions > Theses (Register new thesis). The methods were tested on synthetic data generated from the Hawkes model with different kernels and different parameters.
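
How the lasso "sets some coefficients exactly to zero" can be seen from the standard cyclic coordinate descent algorithm. Each coefficient is updated by soft-thresholding its partial-residual correlation. The sketch below minimizes (1/(2n))·||y - Xb||² + λ·||b||₁ without an intercept, using a small orthogonal design. In the orthonormal case mentioned above, the solution is simply the soft-thresholded least-squares estimate, and the example checks this. The function names and the data are illustrative assumptions, not the thesis's implementation.

```python
def soft_threshold(z, lam):
    """S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=100, tol=1e-10):
    """Lasso by cyclic coordinate descent (no intercept, columns must be non-zero).

    Minimizes (1/(2n)) * ||y - X b||^2 + lam * ||b||_1; X is a list of n rows of length p.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    r = [float(v) for v in y]  # current residuals y - X b
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # correlation of column j with the partial residual that excludes feature j
            rho = sum(X[i][j] * (r[i] + X[i][j] * b[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                b[j] = new
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]  # orthogonal columns with X^T X / n = I
y = [3, 1, -1, -3]                          # least-squares estimate is (2, 1)
```

With λ = 1.5, the first coefficient shrinks from 2 to 0.5 and the second is set exactly to zero, because its least-squares value 1 lies below the threshold.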
In this thesis, the observational data are assumed to be generated from an unknown directed acyclic graph (DAG). This thesis is mainly based on their work as well as the work of Cai et al. The goal of the thesis was to investigate, understand and implement the so-called particle Markov chain Monte Carlo (PMCMC) algorithms introduced by Andrieu, Doucet, and Holenstein (2010), and to compare them to classical MCMC algorithms. The study guide (PDF, 367 KB) addresses the subject of the master’s thesis in Chapter 6. The master’s degree programme culminates in the master’s thesis (6 months full time). Wainwright (2006) proved that parameter estimates obtained with this method are asymptotically normal but do not converge to the true parameter. Monte Carlo methods for a dynamical model of stock prices. We investigate the possibilities offered by subsampling to estimate the distribution of the lasso estimator and to construct confidence intervals and hypothesis tests. A method was proposed to discover the causal structure from observational data in linear non-Gaussian acyclic models, abbreviated LiNGAM (see Shimizu et al.). Since a large number of genes is involved, we apply regression methods that are suitable for high-dimensional problems and can select variables. We therefore simulated several datasets and applied various methods to find the most appropriate one.

In light of these findings, we question the practice of using the traditional method of return maximization, as the cost of ignoring estimation error in the optimization seems to be significant. Finally, we also test a logistic regression model to investigate whether testing in generalized linear models is reliable with state-of-the-art methods. These two issues were addressed by combining Bühlmann's (2012) method of constructing p-values based on ridge estimation with an additional bias correction term, and Meinshausen's (2008) proposal of a hierarchical testing procedure that controls the FWER (family-wise error rate) at all levels. I present a structured way to prepare this challenging dataset for statistical learning methods. With the approval of your supervisor and the study administration, you may start your thesis. The objects of study of the present thesis are the daily closing prices of the futures contract for the base-13. For confidence interval methods, a general finding is that most of the intervals are too narrow. After describing the classical methods, we introduce robust estimation, namely the SMDM estimator. The aim of this thesis is to develop, by means of linear regression and various variable selection methods, models that are as generally valid as possible and that predict the glucose concentration as a function of the impedance signals and other influencing factors. The aim of this master's thesis is to investigate general properties of random ferns and compare them with random forests. We test them on two studies done recently in the Zenobi research group at ETH Zürich on chronic obstructive pulmonary disease (COPD) and cystic fibrosis (CF). We also illustrate the developed methodology on a real-life example of stock returns. In this thesis, some variants of stability selection are introduced.
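
The basic idea of stability selection by Meinshausen and Bühlmann (fit a sparse estimator on many random subsamples and keep the variables that are selected frequently) can be sketched in Python as follows. This is a minimal sketch under our own assumptions: the lasso at one fixed $\lambda$ as base selector, half-subsamples, 100 repetitions and a selection-frequency threshold of 0.6. The coordinate-descent solver `lasso_cd` is written here for illustration only, and none of the variants introduced in the thesis is reproduced.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()  # current residual y - X b (b starts at zero)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of x_j with the partial residual that excludes x_j.
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new_bj = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * (new_bj - b[j])  # keep the residual up to date
            b[j] = new_bj
    return b

def stability_selection(X, y, lam, n_subsamples=100, seed=0):
    """Fraction of random half-subsamples in which each variable is selected."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=n // 2, replace=False)
        counts += lasso_cd(X[idx], y[idx], lam) != 0
    return counts / n_subsamples

# Simulated data (assumed): only the first of ten predictors matters.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 10))
y = 5.0 * X[:, 0] + 0.5 * rng.standard_normal(100)

freq = stability_selection(X, y, lam=0.5)
stable = np.flatnonzero(freq >= 0.6)  # assumed threshold pi_thr = 0.6
```

The design choice is that the final selection depends on selection frequencies rather than on a single fit, which is what makes the selected set less sensitive to the particular sample.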
This thesis presents the theory and main ideas behind some of the most popular current methods for causal structure learning, as well as the ICP algorithm, a new algorithm based on a method recently developed at ETH Zurich. Different confidence interval methods for linear mixed-effects models. We then look at a strategy based on our result and try to answer whether or not we found a market inefficiency. Some methods were able to approximately reach the desired level in the corresponding tests for most tested scenarios, while others produced estimates that were only useful in specific large-sample settings. Further methods for estimating the number of clusters are presented in Chapter 3. This study showed that the transformation of the data, either from dependence measures or distances to (dis)similarities, has an impact on the performance of the clustering methods. The primary focus of this thesis was to understand and implement the FCI+ algorithm as described in “Learning sparse causal models is not NP-hard” by Claassen, Mooij, and Heskes (2013a). One problem with this probabilistic approach is that it is not clear how big a trial would need to be in order to have power comparable to that of the cutoff method. Furthermore, in the same section, the random walk hypothesis of independent and random price changes could not be confirmed for the financial contracts in the case of the base-13. Do I have to submit the thesis, and in what form? This master thesis focuses on the log returns and volatilities of very long financial time series.

This study characterizes the behaviour of four different methods for estimating a nonlinear mixed-effects model in R. Matrices and the statistical methods used to analyse them are of growing importance in science because of the increasing number of systems that are represented by networks. Both methods lead to very similar and reasonable models with an R² of 0. -supervised learning methods for problems having positive and unlabeled examples. The thesis concludes with a brief qualitative analysis of the limits of deconvolution with regard to image restoration. The chapter shows that converting a DAG to the corresponding latent projection cannot destroy the possibility of identifying a causal effect by covariate adjustment, and it provides a criterion that characterizes when a given causal effect in an ADMG G is identifiable at all (by any method) but not by covariate adjustment. This thesis describes the development of robust algorithms for the analysis of geostatistical data. Secondly, the IDA methodology is extended to cases where one seeks information about the causal effect of joint outside interventions on a system. First, we study sparse PCA methods, in which regularization techniques are included in classical PCA to obtain sparse loadings. We argue that these four methodologies, which have often been considered separately in the empirical literature, are paramount for the factor model to achieve better forecasting performance than lower-dimensional models. Our work consists of the implementation of these two methods in R. In this master's thesis, two approaches to causal inference for time series data are studied. To improve accuracy, they propose two bagging methods for clustering algorithms. Ideas to relax certain model restrictions and to expand their applicability are outlined, together with a suggested measure of unlabeled node importance (MIUN statistic).
Dissertations from ETH Zurich are available in print and electronic form (full text or summary version). The first method considered is the EM algorithm, widely used in non-parametric statistics, adapted to the case of a Hawkes process. Robust statistics aims for methods that are based on weaker assumptions and thus allow small deviations from the classical model. The purpose of this thesis is to study a set of companies from the S&P 100 and determine whether share closing prices that move together correspond to companies belonging to the same economic sector. A graph was constructed based on prior knowledge to derive theoretically whether the effect was identifiable. Hospitalization due to a cardiac event and death were used as the clinical endpoints to assess whether proton pump inhibitor prescription was associated with a higher risk of rehospitalization. Our contribution to this study includes the definition of other score methods that take into account the quantitative data given by the SCMD dataset, in contrast to the dichotomization of these data by McGary et al. Starting with a brief overview of the topic, a simulation study is carried out that is intended to shed light on how such a model works and to highlight some potential flaws of the method. The subject of this thesis is to model and predict the outcome of professional football matches played in the premier leagues around the globe. The main focus of this thesis lies on confidence intervals adjusted for selective inference in the high-dimensional case. The master’s thesis can also be carried out and supervised outside D-USYS: please contact the administration office.
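
To illustrate what an EM algorithm for a Hawkes process does, here is a minimal Python sketch for a simplified parametric case that we assume purely for illustration: an exponential kernel $\phi(t) = \alpha\beta e^{-\beta t}$ with a known decay rate $\beta$, so that only the baseline rate $\mu$ and the branching ratio $\alpha$ are estimated. The E-step computes, for each event, the probability that it is a background event or was triggered by a particular earlier event; the M-step updates $\mu$ and $\alpha$ in closed form. The non-parametric version adapted in the thesis estimates the kernel itself and is not reproduced here; all function names are hypothetical.

```python
import numpy as np

def intensity_at_events(times, mu, alpha, beta):
    """lambda(t_i) = mu + sum_{j<i} alpha * beta * exp(-beta * (t_i - t_j))."""
    lam = np.full(len(times), float(mu))
    for i in range(len(times)):
        lam[i] += alpha * beta * np.exp(-beta * (times[i] - times[:i])).sum()
    return lam

def log_likelihood(times, T, mu, alpha, beta):
    """Hawkes log-likelihood on [0, T]: sum of log-intensities minus the compensator."""
    lam = intensity_at_events(times, mu, alpha, beta)
    compensator = mu * T + alpha * np.sum(1.0 - np.exp(-beta * (T - times)))
    return np.log(lam).sum() - compensator

def em_hawkes(times, T, beta, mu=1.0, alpha=0.5, n_iter=50):
    """EM updates for (mu, alpha) of an exponential-kernel Hawkes process, beta fixed."""
    times = np.asarray(times, dtype=float)
    n = len(times)
    for _ in range(n_iter):
        # E-step: P[i, j] (j < i) = prob. that event i was triggered by event j;
        # P[i, i] = prob. that event i is a background event.
        P = np.zeros((n, n))
        for i in range(n):
            trig = alpha * beta * np.exp(-beta * (times[i] - times[:i]))
            total = mu + trig.sum()
            P[i, i] = mu / total
            P[i, :i] = trig / total
        # M-step: closed-form maximizers of the expected complete-data log-likelihood.
        background = np.trace(P)
        mu = background / T
        alpha = (n - background) / np.sum(1.0 - np.exp(-beta * (T - times)))
    return mu, alpha

# Toy event times on [0, 8] (assumed), sorted in increasing order.
times = np.array([0.5, 0.6, 0.7, 2.0, 2.1, 4.0, 4.05, 4.1, 7.0])
mu_hat, alpha_hat = em_hawkes(times, T=8.0, beta=5.0)
```

As for any EM algorithm, each iteration cannot decrease the log-likelihood, which gives a simple sanity check via `log_likelihood`.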

Master thesis ETH Zurich

In almost all studies, the choice of a certain method was neither justified nor discussed, and when the bootstrap was used, the choice of a particular type of bootstrap strategy was not justified either. In this master thesis we study the isotonic regression model for one or more covariates. This master thesis first reviews the basic ideas related to the lasso (least absolute shrinkage and selection operator) introduced by Tibshirani in 1996 and the basics of convex optimization. High-dimensional inference: presenting the major inference methods, introducing the unbalanced multi sample splitting method and comparing them all in an empirical study. Analyzing data using statistical methods means breaking reality down to a mathematical framework, a model. That is, one only obtains the information whether or not a failure occurred between two successive observation time points. Six different confidence interval methods from the packages lme4, nlme, lmerTest and boot are studied and compared. If the vaccine type is not well defined, standard methods for estimating vaccine efficacy could produce strongly biased estimates, which can result in the rejection of the vaccine. To approximate the maximum likelihood estimator, the EM algorithm is combined with Monte Carlo or Markov chain Monte Carlo methods. -based variational methods for parameter estimation, inference and denoising on Markov random fields. Students should demonstrate in their master’s thesis that they are capable of working independently in a scientifically structured way. It combines a method for model selection with subsampling to deliver a form of error control. I investigated how the size of the sample and the overlap of point clusters influence the performance of different estimation methods.
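
For a single covariate, the least-squares isotonic regression fit can be computed with the pool adjacent violators algorithm (PAVA). The Python sketch below assumes that the observations are already sorted by the covariate and accepts optional positive weights; it does not cover the multi-covariate case also studied in the thesis.

```python
def isotonic_regression(y, w=None):
    """Weighted least-squares nondecreasing fit via pool adjacent violators.

    Assumes y is already ordered by the (single) covariate.
    """
    if w is None:
        w = [1.0] * len(y)
    blocks = []  # each block: [weighted mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Pool backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(w1 * m1 + w2 * m2) / total, total, c1 + c2])
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit

fitted = isotonic_regression([1, 3, 2, 4, 3, 5])
```

Here `fitted` is `[1.0, 2.5, 2.5, 3.5, 3.5, 5.0]`: each of the two violating pairs is replaced by its mean.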
In the last part of this study, factor-augmented models, with and without the above-mentioned methodologies, are implemented using the Stock and Watson (2006) dataset of macroeconomic and financial predictors to forecast the time series of monthly returns of the Standard & Poor's 500 index. Lee and Marcotte have related a gene network of baker's yeast, called YeastNet, to a dataset of morphological trait variation, the SCMD, and have defined a method that assigns scores to each gene of the network in order to predict its activity. This is a hypothesis test to determine whether there is a very small fraction of non-null hypotheses among many null hypotheses or whether this fraction is indeed zero. A new algorithm (logilasso) to learn network structures from data has been introduced in “Penalized likelihood and Bayesian methods for sparse contingency tables with an application to full-length cDNA libraries” (Dahinden, Parmigiani, Emerick and Bühlmann, 2007). In this thesis we also discuss survival analysis: this branch of statistics studies the failure times of an individual (or of a group of individuals) to conclude, for example, whether a new treatment is effective or whether a certain group of individuals has a higher survival probability than another. We analyze the performance of this method in terms of ROC curves, compared with the well-known method of estimating the correlation graph by thresholding the mutual information. The proposed method is able to detect various connections between proximal and distal variables, taking the provided sample weights into account. Some reports from ETH Zurich are available as an electronic version with free access via ETH E-Collection. The present work extends the IDA methodology in two ways. If approved, the student then elaborates the thesis plan together with their thesis supervisor.
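
The thresholding approach mentioned above, which puts an edge between two variables whenever their estimated mutual information exceeds a threshold, can be sketched for discrete data as follows. We assume the simple plug-in (empirical frequency) estimator of mutual information in nats; sweeping the threshold and comparing the resulting edge sets with a known graph would trace out an ROC curve. The function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in nats from two paired discrete samples."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # Sum of p(a,b) * log(p(a,b) / (p(a) p(b))), written in terms of counts.
    return sum(c / n * math.log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())

def mi_threshold_graph(columns, threshold):
    """Undirected edges (j, k) whose estimated mutual information exceeds threshold."""
    return [(j, k) for j, k in combinations(range(len(columns)), 2)
            if mutual_information(columns[j], columns[k]) > threshold]

# Toy example (assumed): x and y are identical, z is independent of both.
x = [0, 0, 1, 1] * 25
y = list(x)
z = [0, 1, 0, 1] * 25
edges = mi_threshold_graph([x, y, z], threshold=0.1)
```

In this toy example only the pair $(x, y)$ is connected, with an estimated mutual information of $\log 2$.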
Dissertations from ETH Zurich can also be searched in the ETH E-Collection, the institutional repository for ETH Zurich. By means of appropriate methods and procedures, the most important of numerous variables are selected and five different models are developed to describe the base-13.

Variable selection and model fitting methods are compared by cross-validation. The significance of the estimates is assessed using permutation tests and a method called stability selection presented by Meinshausen and Bühlmann. It is obtained by first determining the bit precision when using the benchmark method DSSIB. This thesis studies in particular the unit simplex, the unit box and polytopes as domains. The widely accepted method for boundary bias correction in regression and density estimation is automatic boundary correction [1]. The comparison on uniformly sampled ADMGs shows a big advantage of the former method. To verify this, different clustering methods were applied to a dissimilarity matrix corresponding to the degree of dependence between the companies. To answer this first set of questions, parametric and non-parametric methods were used and then compared in terms of misclassification errors and variable ranking. In this work, recently published methods for hypothesis testing in high-dimensional statistics are studied. The profile method is not better than lmerTest, and bootstrap-type methods perform worst. Search the database NTIS and order the report via the link "Artikel bestellen" (order article) from ETH-Bibliothek. Next, we use these two fitting methods in a greedy search algorithm for structure learning, which repeatedly fits and scores BAP models and chooses the model with the highest score. Any professor in D-USYS is entitled to supervise a master’s thesis, including associate professors in D-USYS and Privatdozenten (senior lecturers) in D-USYS. In the recent past, the development of statistical methods for high-dimensional problems has greatly advanced, leading to model selection methods such as the lasso. Proximal methods handle these optimization problems effectively and in a variety of ways, and they are of considerable computational interest because the resulting algorithms have good convergence rates.
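
As an example of a proximal method, the proximal gradient algorithm (ISTA) for the lasso alternates a gradient step on the smooth least-squares term with the proximal operator of the $\ell_1$ penalty, which is componentwise soft-thresholding. The Python sketch below assumes the objective $\frac12\|y - X\beta\|_2^2 + \lambda\|\beta\|_1$ and a fixed step size $1/L$, where $L$ is the largest eigenvalue of $X^\top X$; accelerated variants such as FISTA are not shown, and the data are simulated for illustration.

```python
import numpy as np

def ista_lasso(X, y, lam, n_iter=500):
    """Proximal gradient (ISTA) for 0.5 * ||y - X b||^2 + lam * ||b||_1."""
    L = np.linalg.norm(X, ord=2) ** 2  # largest eigenvalue of X.T @ X (Lipschitz constant)
    step = 1.0 / L
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y)  # gradient of the least-squares term
        z = b - step * grad       # forward (gradient) step
        # Backward (proximal) step: soft-thresholding is the prox of step * lam * ||.||_1.
        b = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return b

# Simulated data (assumed) with two active coefficients.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
y = X @ np.array([2.0, 0.0, -1.0, 0.0, 0.0]) + 0.1 * rng.standard_normal(50)
beta_hat = ista_lasso(X, y, lam=5.0)
```

Convergence can be checked through the optimality conditions: $X^\top(y - X\hat\beta)$ equals $\lambda\,\mathrm{sign}(\hat\beta_j)$ on the active coefficients and is at most $\lambda$ in absolute value elsewhere.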
In my thesis I first give a selective overview of the high-dimensional inference methods that have been developed to assign p-values and confidence intervals in linear models, including a graphical survey of every presented inference method. Therefore, the aim of this master thesis in statistics is to analyze, with statistical learning techniques, the performance of the newly developed GeneReader instrument, which carries out the sequencing substep of the workflow. Upon successful completion of the thesis, students are awarded 30 ECTS credits. In this work, some empirical evidence confirming these properties for an Ising model on a grid graph was produced, and general definitions and results about graphical models and variational methods were summarized. The chapter describes implementation issues, methodology and several different experiments. All methods are implemented in R, and we give experimental results for synthetic data and one set of real high-dimensional data. The master’s thesis is written in the subject area of the specialisation you have chosen. This thesis aims to quantify the subject-level uncertainties of the classification between subjects with and without autism spectrum disorder using a type of brain image data, namely resting-state functional magnetic resonance imaging data. Nonetheless, this investigation demonstrates the largely untapped potential of systematically applying statistical analysis to assess and guarantee high quality and stability in QIAGEN’s development and production processes. In this thesis we want to estimate the cost reduction effects of the managed care plans that were introduced in Swiss health insurance in 1996.

For biomarker detection, we use the Mann-Whitney U test, as well as subsampling with either the Mann-Whitney U test or elastic net regression as the selection method. The student discusses the subject of the thesis with their tutor. These methods are particularly intended for high-dimensional data sets where the number of variables is comparable to or even greater than the number of available training samples. In connection with microarray experiments, new methods of cluster analysis are constantly being developed. If the expected shortfall and the value-at-risk at higher confidence levels are considered, however, the sophisticated methods improve the risk estimates. The aim of this thesis is to explore the possibility of estimating coma patients' clinical awareness score from objective clinical measurements, in order to replace the rather subjective doctors' examination, which is expensive and time-consuming. In addition to the OLS approach, robust methods using MM-estimators have been incorporated to obtain improved model fits and estimation results. Such an approach leads to a simple but very effective improvement of the covariate adjustment method that can significantly increase the proportion of identifiable causal effects. There are many methods and models for setting the predicted claims reserves. As alternatives to PPIs are available, patients should not take these two drugs together. A widely used approach is a so-called clustering method consisting of a thresholding step to detect the occurrence of a spike, a feature-reduction step (e. To improve the speed of our new robust score test, we develop methods to estimate the scale parameter $\sigma$ from the reduced model. The overview is split into two parts: methods for detecting single predictor variables and methods for detecting groups of predictor variables.
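
A Mann-Whitney U prefiltering step of the kind described above can be sketched in plain Python. The U statistic counts the pairs in which an observation from the first group exceeds one from the second (ties count 1/2). The p-value uses the large-sample normal approximation without tie or continuity correction, and the cutoff `alpha = 0.05` without multiple-testing adjustment is an assumption made for illustration, as are the function names and toy data.

```python
import math

def mann_whitney_u(x, y):
    """U statistic of x versus y: pairs with x_i > y_j, ties counted as 1/2."""
    return sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)

def mwu_pvalue(x, y):
    """Two-sided p-value from the normal approximation (no tie/continuity correction)."""
    n1, n2 = len(x), len(y)
    u = mann_whitney_u(x, y)
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2.0))

def prefilter(features, labels, alpha=0.05):
    """Indices of features whose Mann-Whitney p-value between groups is below alpha."""
    keep = []
    for j, values in enumerate(features):
        group1 = [v for v, lab in zip(values, labels) if lab == 1]
        group0 = [v for v, lab in zip(values, labels) if lab == 0]
        if mwu_pvalue(group1, group0) < alpha:
            keep.append(j)
    return keep

# Toy example (assumed): feature 0 separates the groups, feature 1 does not.
labels = [1] * 10 + [0] * 10
features = [
    list(range(10, 20)) + list(range(10)),
    [0, 1] * 5 + [0, 1] * 5,
]
selected = prefilter(features, labels)
```

Only the retained features would then be passed on to the high-dimensional classification methods.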
Then, different methods for constructing confidence curves are given for cases without and with nuisance parameters, respectively. For this purpose I tested the estimation methods on high-frequency data of price changes of E-mini S&P 500 and Brent crude futures contracts. This study is a simulation analysis of different confidence interval methods for the fixed-effect parameters in linear mixed-effects models. In this thesis we pursue the goal of high-dimensional covariance matrix estimation for data with abrupt structural changes. To address some of the difficulties in the statistical estimation of quantities measured in the environmental sciences, a combination of different statistical methods is used, in particular geostatistics and robust statistics. The goal of this thesis is to provide and test a method for causal inference from binary data. Various sequential Monte Carlo methods were applied to the problem at hand. The results of the study indicate that several methods have great potential. To evaluate the performance of these methods, a sensible way to simulate datasets is indispensable. Our results indicate that switching to second-line treatment is beneficial, slightly more so than an analysis with classical methods would imply. In this thesis we investigate these issues by focusing on four different models: the distribution-free chain ladder model, the cumulative log-normal model, a Bornhuetter-Ferguson model and generalized linear models. The aim of this master thesis is to give an overview of SSL (semi-supervised learning) and to study two different methods, the transductive support vector machine and anchor graph regularization.
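
The point estimates of the distribution-free chain ladder model come from volume-weighted development factors that are applied to the latest cumulative claims of each accident year. A minimal Python sketch, assuming a complete upper-left run-off triangle of cumulative claims and no tail factor; the stochastic part of the model (e.g. prediction errors) is not covered, and the numbers are invented for illustration.

```python
def chain_ladder(triangle):
    """Chain-ladder development factors and reserves from a cumulative triangle.

    triangle[i] holds the observed cumulative claims of accident year i;
    row i has len(triangle) - i entries (complete upper-left triangle assumed).
    """
    n = len(triangle)
    factors = []
    for k in range(n - 1):
        # Volume-weighted factor over all accident years observed at k and k + 1.
        rows = [r for r in triangle if len(r) > k + 1]
        factors.append(sum(r[k + 1] for r in rows) / sum(r[k] for r in rows))
    reserves = []
    for r in triangle:
        ultimate = r[-1]
        for k in range(len(r) - 1, n - 1):
            ultimate *= factors[k]  # project the latest diagonal to ultimate
        reserves.append(ultimate - r[-1])
    return factors, reserves

# Toy triangle (assumed numbers, three accident years).
triangle = [[100.0, 150.0, 165.0], [110.0, 165.0], [120.0]]
factors, reserves = chain_ladder(triangle)
```

Here the development factors are $315/210 = 1.5$ and $165/150 = 1.1$, giving reserves of 0, 16.5 and 78 for the three accident years.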
ETH-Bibliothek holds a collection of more than 2 million reports on microfiche. The most important insight from this thesis is that a good design choice is always a multifactorial trade-off between subject recruitment, study duration and design complexity. In this thesis, we use methods of extreme value theory to investigate the temporal evolution of heavy precipitation at 104 measuring stations in Switzerland. All other components of the master’s degree programme should be completed before the beginning of the master’s thesis. This thesis is about modeling growth and mortality of Picea abies. In particular, we investigate two insurance-specific methods: the extension procedure, which is a generalization of the maximum likelihood method, and the join operation, which can be interpreted as a first "rough" fit of a portfolio claim-size distribution to an individual risk. This thesis is also a continuation of the CH2011 initiative, which aims to provide scientifically grounded information on a changing climate in Switzerland to aid decision-making and planning with regard to climate change strategies. The method presented here makes it possible to generate huge and, in particular, diverse ground-truth datasets that enable neural networks to generalize their predictions to geographic regions that have not been used for training. The method is easy to use, has a well-understood theory and can be combined with other statistical techniques for efficient estimation of a given causal effect. Two of the three methods proposed to construct Bayesian confidence intervals based on ridge regression perform well only in some set-ups. The potential advantages of semi-supervised learning over more traditional learning methods such as supervised and unsupervised learning have attracted many researchers in the recent past.
This work consists of modifying Motif Regressor, an existing method to analyze data from microarray experiments, and using the resulting algorithm to search for the transcription factor DNA-binding motifs of HIF1-alpha, a protein involved in gene regulation under hypoxia. However, the proposed method has difficulties determining causal directions, as these are not robust under resampling. In this setting, we study two group lasso methods: the structured group lasso and the weighted group lasso. We show that many nonparametric mode estimation methods have boundary bias if the true global mode is located in the boundary region. It is therefore the aim of this thesis to identify causal associations between proximal and distal factors that may drive the HIV epidemic. Before you start your thesis, find an authorized supervisor (if not sure, ask the study administration) and discuss the procedure with him or her. The goal count estimates of the regression models are translated into the same categorical results as those modelled by the classification models, so that all methods can be compared. We use this expression to show how the lower bound of a PWEA minimax regret can give us some information about the form of the expert class being used, in particular whether or not it is a VC class. I introduce a new inference method in the high-dimensional setting, called unbalanced multi sample splitting, which is a modification of the multi sample splitting method of Meinshausen, Meier, and Bühlmann (2009). The MFD of Zurich: identifying and evaluating strategies for an efficient placement of detectors. In this thesis we study methods to infer causal relationships from observational data. In this work, the relaxation was made by considering all combinations of locally consistent marginal distributions, and the objective function was approximated with a convex combination of Bethe entropy approximations based on the spanning trees of the underlying graph.
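
Group lasso methods penalize the Euclidean norm of each coefficient group, so the corresponding proximal operator shrinks whole groups and sets them to zero together. The Python sketch below implements this block soft-thresholding for the penalty $\lambda \sum_g w_g \|\beta_g\|_2$ with non-overlapping groups. The default weights $w_g = \sqrt{|g|}$ are a common convention that we assume here, and the sketch does not reproduce the specific structured or weighted group lasso variants studied in the thesis. In an orthonormal design, applying it to $z = X^\top y$ gives the group lasso estimate.

```python
import numpy as np

def group_soft_threshold(z, groups, lam, weights=None):
    """Prox of lam * sum_g w_g * ||b_g||_2 for non-overlapping index groups."""
    b = np.zeros_like(z, dtype=float)
    if weights is None:
        # Assumed default: w_g = sqrt(group size).
        weights = [np.sqrt(len(g)) for g in groups]
    for g, w in zip(groups, weights):
        idx = np.asarray(g)
        norm = np.linalg.norm(z[idx])
        if norm > lam * w:
            # Shrink the whole group toward zero by the same factor.
            b[idx] = (1.0 - lam * w / norm) * z[idx]
        # Otherwise the whole group stays exactly zero.
    return b

z = np.array([3.0, 4.0, 0.1, 0.1])
groups = [[0, 1], [2, 3]]
b = group_soft_threshold(z, groups, lam=1.0, weights=[1.0, 1.0])
```

With unit weights, the first group (norm 5) is scaled by $1 - 1/5 = 0.8$, while the second group (norm about 0.14) is removed entirely.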
Secondly, for the application to the temperature data, the lava method and a corresponding cross-validation approach had to be implemented in R. Searching directly in the ETH E-Collection includes the following additional options: browsing by subject field, author, organisational unit, document type and year.