Question about PLS-DA hyperparameter tuning [R]

Reddit r/MachineLearning / 5/6/2026


Key Points

  • The post asks why, in a sparse PLS-DA model in R (tuned with two latent components using centroids.dist), performance is worse than expected, given the common expectation that adding components should reduce error rates.
  • The author describes first fitting a non-sparse “global” PLS-DA model to choose starting settings (including selecting two components) before applying sparsity and running the final model.
  • After fitting the sparse final model, the user reports that error/performance looks less favorable than anticipated, creating confusion about the relationship between component count, feature selection, and classification error.
  • The author seeks guidance on what might be causing the counterintuitive results, noting they believe the selected features should best separate two conditions.
  • Overall, the content is a troubleshooting question focused on interpreting and debugging hyperparameter tuning and performance assessment for sparse PLS-DA in a bioinformatics setting.

Hi all! I am a bioinformatician and I am working on learning some ML tools for disease/biomarker work. I am working with sparse PLS-DA at the moment. Before actually tuning the model, I run an overall global model (without sparsity) to get an idea of what my data looks like and to get a starting point. Here is what that global model ends up looking like:

[image: global model performance]

So from this, I'm seeing that I should include 2 latent components in my model tuning and I chose to use the centroids.dist. So I tune the model with two components, it gives me the # of features to keep on each component and then I run the final model. However, when I do performance assessment on the final model, it looks like this:

[image: final model (sparse) performance]

I guess I am a little confused. From what I am reading online, and from my own data, error rates should go down as components are added. It also doesn't make a ton of sense to me because tuning should have kept only the features that best distinguish the two conditions, so again, I would expect error rates to decrease.

Can someone please help me understand what I'm seeing here and what could be causing this? I am still learning how all of this works, so any sort of guidance is appreciated. Thank you!

submitted by /u/dacherrr
