Virtual conferences are set up in two stages: 1 - the speaker records the presentation and publishes it online; 2 - a virtual meeting to discuss it together. You can submit questions under themes pre-defined by the organizer(s). To do so, select your name in the list, then select the theme and enter your question.

- List of questions already posted:
- A general message board: here you can post things of general interest to everyone at the conference
- Abramovich: High-dimensional classification by sparse logistic regression
*Thank you for your talk. You assume that your data follow the logistic model. Suppose that this condition is not satisfied, but I use penalized logistic regression. It leads to a misspecified model. Could you obtain results analogous to yours in that case? How can we bound the approximation risk (i.e., how close my data are to the logistic model) in the misspecified case?* (Wojciech Rejchel)

*Thanks, Felix, for your talk. I have a naive question about VC-dimension. In the L^0 case, the error bounds rely on VC-dimension. In the L^1 case (LASSO/SLOPE), does the proof partially rely on the bounds obtained in the L^0 case, or is it possible to obtain estimates of an "L^1" VC-dimension?* (Fabien Panloup)

- Bardet: Consistent model selection criteria and goodness-of-fit test for common time series models
*Dear Jean-Marc, thank you for your talk. My question is related to Frederic's. Shouldn't the choice of K in Q_hat be crucial when analysing the gof test? Does it appear in the asymptotics under H0 in some way? In practice under H1? Could you give a recommendation for practice (why K=10 on SP)?* (Olivier Wintenberger)

- Bellec: De-biasing arbitrary convex regularizers and asymptotic normality
*Dear Pierre, thank you for your talk! Your results giving asymptotic normality and confidence intervals derived from penalized estimators are very nice. My first question concerns the case where p/n tends to c>1. In this case, you stress that the penalty term should be strongly convex (the same assumption also appears in the theorem of Zhang et al., 2019). In high dimension (p/n>1), do you think that your results still hold when the penalty term is the L1 norm or the SLOPE norm, which are not strongly convex?* (Patrick Tardivel)

*Dear Pierre, thank you for your talk; I think the general direction of your research is very exciting. Could you expand a little on the relation of second-order Poincare inequalities to the condition that \frac{\mathbb{E} \Vert \nabla f(z_0) \Vert_F^2}{\mathbb{E} \Vert f(z_0) \Vert^2} is small? The second-order Poincare inequalities that I know of include fourth derivatives on the RHS of the bound, which I don't see here. Could you sketch the idea of the proof of this fact? I was also wondering if you have considered the problem of joint normality of several estimators, or maybe even a countable number of estimators (given that the model grows asymptotically to infinity).* (Szymon Jan Majewski)

*Thank you for the talk. Could you tell us something about possible extensions of your paper? I mean, what about going beyond normality of predictors and/or noise variables? Or considering a logistic model (or other GLMs or nonlinear models) instead of linear regression?* (Wojciech Rejchel)

- Gneiting: Isotonic Distributional Regression (IDR): Leveraging Monotonicity, Uniquely So!
*Deep thinking: could one replace this with an optimisation together with bagging?* (Jonas Wallin)

*A general question about the PIT: have you considered transforming to N(0,1) instead, to make it easier to interpret tail behaviour?* (Jonas Wallin)

*I viewed and discussed your passionate talk with Dr M. Truchan, medical scientist and doctor at CHU Angers/Saumur. We have a common question: could IDR be used to study probabilistic viral morbidity and mortality forecasts from a leading numerical long-term weather prediction model, e.g. for the influenza virus?* (Piotr Graczyk)

*IDR is optimal under just any proper scoring rule that depends on quantile or binary probability assessments only. Could you elaborate further on what this means? Does the Dawid-Sebastiani score satisfy this?* (Jonas Wallin)

*"Isotonic Distributional Regression (IDR) . . . in Pictures": why is the cutoff (almost) always at the same place?* (Jonas Wallin)

*Mixture (Choquet) representations of the CRPS: is there any intuition for the mixture representation?* (Jonas Wallin)

*Synthetic example, subset aggregation: the full sample looks biased, while the subagging looks unbiased; is this the case?* (Jonas Wallin)

- Heller: Optimal control of false discovery criteria in the general two-group model
*How does your method relate to DESeq by Simon Anders?* (Carine Legrand)

*Thank you for the great talk! Did you compare your procedure with the procedure that simply averages the multivariate local FDR? Also, could your analysis be extended to the HMM multiple testing setting (hidden states = true null/false null)?* (Etienne Roquain)

- Ishi: On Cholesky structures on real symmetric matrices and their applications
*1. Can we introduce "dual" algebras of Cholesky algebras? 2. What is a key idea for finding $A$ in Theorem 4?* (Hideto Nakashima)

*1. Why is the Cholesky structure useful in the estimation of the parameter? 2. It seems to me that the application is parameter estimation rather than model selection in the class of decomposable graphical models. Again, would the Cholesky structure help in model selection in the class of decomposable graphs, that is, when we have data but the graph is unknown?* (Helene Massam)

*1. Is it easy to compute in practice the function \Gamma_Z(s) for any RCOP colored decomposable graph?* (Piotr Graczyk)

*2. Consider the case of the complete graph G. Do the results from your talk improve the previously known results in this case?* (Piotr Graczyk)

*Can you give explicit formulas for gamma integrals without the invariant measure $\varphi_{\mathcal{Z}}$?* (Bartosz Kołodziejek)

*Thank you for your talk! I have a question about the difference between dim span(...) and dim Z1. Which information can be obtained from its value when it is larger than zero?* (Tomasz Skalski)

*Thank you for your talk. I have a question on Theorem 6 of page 18. Do you have distribution formulae for the elements of $T_y$ when $x$ is the identity matrix?* (Yoshihiko Konno)

- Janson: Floodgate: Inference for Model-Free Variable Importance
*For sample splitting, I am wondering if balanced splits (n/2, n/2) produce the best results or if unbalanced splits (say n/10, 9n/10) perform better empirically, and which task should use the larger part of the data.* (Pierre Bellec)

*Thanks, Lucas, for the talk! In a Gaussian linear model with Gaussian noise and p/n = delta < 1 (for simplicity), how powerful is a test based on the lower confidence bound L_n^\alpha for testing H_0: beta^*_j = 0 against |beta^*_j| > C n^{-1/2}, with \hat\mu being Ridge or Least-Squares? Say, how does the detection boundary compare empirically with testing using least-squares and the classical F-test?* (Pierre Bellec)

- Josse: Treatment effect estimation with missing attributes
*Particularly for MNAR, are there any consistency results for the AIPW, with MIA for instance? Unlike the classical unconfoundedness assumption, it seems that the missingness mechanism can break the latent confounding assumption. How do you intend to extend the MDC to this context?* (Mikael Escobar-Bach)

*Thank you for the talk! I might have missed something: most of the talk is about estimating $\tau$, but confidence intervals are built on slides 19-20. Could you explain how they have been obtained?* (Etienne Roquain)

*The AIPW estimator features this nice property of double robustness. Could you give the intuition behind it?* (Mikael Escobar-Bach)

- Konno: Shrinkage estimation of mean for complex multivariate normal distribution...
*1. Are there practical problems where complex normal distributions appear? 2. Is it possible to generalize the results to a situation where the space of concentration matrices has an invariant structure? 3. What is an intuitive explanation of shrinkage estimation? 4. We have many Baranchik-like estimators with a smooth function r satisfying some conditions. The JSL estimator is a special case of a Baranchik-like estimator with constant r. Are there any nice Baranchik-like estimators with non-constant r?* (Hideyuki Ishi)

*1. The mean is unknown. Wouldn't considering a NON-CENTRAL Wishart matrix S be more natural?* (Piotr Graczyk)

*2. Do the Baranchik estimators \hat\theta_r converge to the MLE estimator \hat\theta_0 when r tends to 0, or is it an accidental coincidence of notation?* (Piotr Graczyk)

*Can this result be generalized to the space Herm(n,H) of quaternion Hermitian matrices?* (Hideto Nakashima)

*Thank you for your talk! Could you please tell us about some examples in which the Baranchik-like estimators would be good to use?* (Tomasz Skalski)

- Letac: Quasi logistic distributions and Gaussian scale mixing
*1. Is there any interesting phenomenon when we consider $a=\pi/2$ in Theorem 1 (the case of the Mellin transform being almost the Riemann zeta function)? 2. Does the weight $(-1)^{n-1} \sin(na) / na$ of the Mellin transform have any special meaning for quasi logistic distributions (like moments)?* (Hideto Nakashima)

*Do you know why the anonymous physicist was interested in the existence of $\mu_{a,b}$? Was he satisfied with your answer?* (Bartosz Kołodziejek)

*Thank you for the inspiring talk! Could you please tell us something more about the generalized square roots of a matrix (apart from the one obtained with the Cholesky decomposition)?* (Tomasz Skalski)

*You say that the Kolmogorov-Smirnov law is the law of the sup of the absolute value of the Brownian bridge, and that you have some idea of a probabilistic interpretation in terms of the Brownian bridge for the quasi Kolmogorov-Smirnov laws. Could you tell us some more?* (Helene Massam)

- Miasojedow: Structure learning for CTBN's
*For large dimension, have you thought of any way to reduce the model so one gets fewer parameters?* (Jonas Wallin)

*For the examples you had, what is the run time? How large data sets are feasible?* (Jonas Wallin)

*Slides 10-11: previous approaches had local minima; what formulation of Q did they use, and how does it differ from yours?* (Jonas Wallin)

*Slide 11: this problem looks ideal for strong rules; have you considered this for improving computational time?* (Jonas Wallin)

*Slide 18: could you give an intuition for how the cone differs from the regular Hessian?* (Jonas Wallin)

*Slide 22: is it possible to consider other variations of the LASSO (e.g., SLOPE) that would control the FDR below some pre-specified level, say 0.1?* (Etienne Roquain)

- Neuvial: Post hoc bounds on false positives using reference families
*Dear Pierre, thank you for your nice talk. This topic is really interesting. On slide 4, I spent a few minutes understanding the main expression. Do you mean for all subsets S of R instead of for all subsets S of {1,...,m}?* (Patrick Tardivel)

- Picard: How to estimate a density on a spider web?
*Dear Dominique, thank you for the talk! I guess that, in the case of your kernel or wavelet estimators, as those are not always positive, it may be advantageous for small sample sizes to 'project' them onto the set of densities?* (Ismaël Castillo)

*On the upper-bound rates: apart from the dimension 'd', is it possible to see influences of the 'geometry' in terms of the constants appearing in front of the rates? For instance, is it known how those depend on the constants coming from the Ahlfors condition?* (Ismaël Castillo)

*Suppose I am not the spider, or that I have not attended an exhibition on spider webs (so that I do not know those very well). Is there a way I could propose an estimator of the density without precise knowledge of M? Related to this, if one is given a graph as in slide 7, are there known graph structures on which conditions C1-C5 are satisfied for the geodesic distance?* (Ismaël Castillo)

*Thank you for your talk! Do you have some guesses about what could be a hint for answering the question about the variety of dimensions in a real spider's web, for example about its behaviour between two regions of different dimension?* (Tomasz Skalski)

*Thank you for your talk. I would like to ask a naive question: can we apply your work to the study of the World Wide Web (= Internet)?* (Hideyuki Ishi)

- Ramdas: Universal inference using the split likelihood ratio test
*Usually, sample splitting comes with a power loss. Did you measure the impact of the split on the power (say, in a toy setting)?* (Etienne Roquain)

- Rockova: Bayesian Spatial Adaptation
*Thank you for your talk. Do you think that the results may be extended to the case of spatial regression, since inhomogeneities are usual in such models? Namely, the variables of interest are observations of spatial processes; for instance, the index i is bivariate (or multivariate) in the model you consider on page 4.* (Sophie Dabo)

- Rockova: High dim / non parametric Bayesian
- Roquain: Sparse multiple testing: can one estimate the null distribution?
- Rosset: Optimal and Maximin Procedures for Multiple Testing Problems
*Thank you for your talk. I am wondering how to choose L in your optimal procedure.* (Etienne Roquain)

- Sabatti: Knockoff genotypes: value in counterfeit
*If I understand slide #31 correctly, there is a relation between a low correlation between X_2 and \tilde{X}_2 and the power. Could you give an intuition for this? More generally, linkage disequilibrium (LD) may display complex patterns that do not fit the Markov assumption. How robust is the proposed approach to such departures? (For the statistical audience, we may recall that LD just means correlation.) One purpose of knockoffs is to guarantee the replicability of the results across studies. Does it also mean across populations? What if the LD structure varies from one population to another, or the LD structure is due to the existence of sub-populations? The group knockoff approach is very nice and frames genetic association studies on large data sets in a sensible way. Building the blocks is certainly both delicate and critical. Could you comment on this? Is the clustering algorithm required to be consistent with the ordering of the loci along the genome?* (Stéphane Robin)

*For constructing HMM knockoffs, a valid but less powerful option would be to resample \tilde{X} conditional on Z without resampling Z. Empirically, how much power is lost with this version? And is there any qualitative difference in terms of which types of discoveries have their power most affected? For example, would the power loss be particularly high/low for discoveries of genetic variants that are highly correlated with ethnicity?* (Rina Barber)

*For handling the issues arising from population structure: if it were the case that the individuals in the sample could be clustered by population, would it be valid to construct cluster-specific knockoffs (i.e., using a HMM fitted to that cluster) and then pool the data across all clusters to define the statistics / run the selection procedure? Or would that still have the same issues of long-range dependence?* (Rina Barber)

- Salmon: The smoothed multivariate square-root Lasso: an optimization lens on concomitant estimation
*Dear Joseph, thank you for a very nice presentation. Your talk on the smoothed multivariate square-root LASSO, with your application to brain imaging, is really nice. As you explained, the square-root LASSO is closely related to the concomitant LASSO, and I understand there is an issue in the optimization problem when sigma is small, which motivates your intuitive smoothing approach. With your smoothing approach, do you keep the sparsity property of LASSO-type estimators?* (Patrick Tardivel)

*Thank you for the nice talk. I have two questions concerning the Proposition at the end of your presentation. You stress the fact that lambda is proportional to an expression which does not depend on the unknown variance sigma. However, is the proportionality constant known/useful in practice? If not, then it does not help much in choosing lambda. Could you comment on it? Besides, the probability in the Proposition is close to zero if the time 'T' is much larger than 'n'. Is it an artifact of proof methods or experiments with 'T* (Wojciech Rejchel)

- Samworth: High-dimensional, multiscale online changepoint detection
*Could you please outline the main proof ingredients specific to the sequential nature of the problem (e.g., martingale arguments)? Thank you!* (Sébastien Gerchinovitz)

*Dear Richard, on slide 22, second plot: could you please explain the intuition behind the elbow effect?* (Sébastien Gerchinovitz)

*Do you know if the theoretical bounds are tight? (E.g., are there lower bounds that match the response delay upper bounds from slide 21 for all algorithms with patience at least \gamma?) Thank you.* (Sébastien Gerchinovitz)

- Siegmund: Change: Detection, Estimation, Segmentation
*In the simplest case, the process Y_t seems to be an Ornstein-Uhlenbeck process, possibly with some irregular drift. It seems that in this case Z_t is also closely related to the O-U process, which is not smooth. Thus the smoothness of the covariance Sigma(s,t) is not that evident. Does the smoothness needed for the Rice formula require some filtering of the score process?* (Krzysztof Podgorski)

*Formula (5) on the fourth slide approximates the distribution of the maximum by the average number of crossings of the level b. The normal density phi(b) typically appears there because, for a stationary smooth Gaussian process, the derivative at a given time t is independent of the value at that point. However, if Z_t is not stationary, this would not be true anymore. Does it mean that Z_t is stationary, or rather that this independence is approximately valid for the specific likelihood process?* (Krzysztof Podgorski)

- Su: Gaussian Differential Privacy
- Wager: High dimensional statistics
*By modifying the noise, we could probably try to learn the second derivative, which can be of interest for the rate of convergence of the gradient descent. Do you think that such a direction could be reasonable?* (Fabien Panloup)

*Dear Stefan, thanks for your very interesting talk. I do not see which assumption guarantees the uniqueness of the equilibrium. Could you come back to this point?* (Fabien Panloup)

*In standard multi-armed-bandit theory, I remember that the $\frac{1}{\sqrt{T}}$ order is usually related to pessimistic bounds, i.e., related to the worst-case regret. Here, does the $\frac{1}{\sqrt{T}}$ come from interference of "pessimism"?* (Fabien Panloup)

- Wallin: Scaling of scoring rules
*Could you provide some intuition behind the form of the SCRPS? And, more generally, what kind of analytical property of the function h, or of the pair (h,g) from the theorem about proper scoring rules, guarantees locally scale invariant scores?* (Blazej Miasojedow)

*Did you consider the possible application of the SCRPS to estimate parameters when the likelihood is intractable?* (Blazej Miasojedow)

- Wang: The Price of Competition: Effect Size Heterogeneity Matters in High Dimensions!
- Yekutieli: Hierarchical Bayes Modeling for Large-Scale Inference
*Thank you for the wonderful talk. Slide 18 compares with the adjusted MLE, which scales the original MLE by a specific constant \alpha to obtain asymptotic unbiasedness. If we care about the MSE, then for any linear transform c_1 \hat{\beta} + c_2 of the MLE, the asymptotic MSE can be quantified (Sur and Candes, '19). Then c_1, c_2 may be picked to minimize this MSE. Have you compared with this estimator? A similar thing can also be done for regularized logistic regression estimators for any convex penalty. How would your approach compare with these estimators?*(Pragya Sur)