I am curious whether others observed the same thing.
At ICML 2026, papers could be reviewed under one of two LLM-review policies: a stricter one (Policy A) where reviewers were not supposed to use LLMs, and a more permissive one (Policy B) where limited LLM assistance was allowed. I chose Policy A for my paper.
My impression, based on a small sample from:
- our batch,
- comments I have seen on Reddit and X,
- and discussions with professors / ACs around me,
is that Policy A papers ended up with harsher scores on average than Policy B papers.
Of course, this is anecdotal and I am not claiming it as a proven fact. But honestly, it would be frustrating if true: I spent nearly a week doing every review as carefully as I could, only to suspect that papers under the stricter policy may have been judged more harshly than those reviewed under the more permissive one.
My take is that this outcome would not even be that surprising. In practice, LLM-assisted reviewing may lead to:
- more lenient tone,
- broader background knowledge being injected into reviews,
- cleaner and more polished reviewer text,
- and possibly a higher tendency to give the benefit of the doubt.
In my local sample of about 15 Policy A papers that we know of (ones we reviewed or heard about from peers), our score is apparently one of the highest. But compared to what people report online, it feels much closer to average (though of course, people who post their scores tend to have average or above-average scores). That is what made me wonder whether the score distributions differ by policy.
One professor believes that ICML will normalize or z-score the scores across the two groups, but I do not want to assume that.
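For readers unfamiliar with the idea, per-group z-scoring would mean judging each paper's score relative to the mean and spread of its own policy group, rather than on a single raw scale. Here is a minimal sketch of that kind of adjustment; the function name and the example scores are made up for illustration, and this is not how ICML actually calibrates anything as far as I know:

```python
from statistics import mean, stdev

def z_score_by_group(scores, groups):
    """Normalize each score against the mean/std of its own group,
    so a score is compared only to other scores under the same policy."""
    result = []
    for s, g in zip(scores, groups):
        group_scores = [x for x, h in zip(scores, groups) if h == g]
        mu = mean(group_scores)
        sigma = stdev(group_scores) if len(group_scores) > 1 else 1.0
        result.append((s - mu) / sigma if sigma else 0.0)
    return result

# Made-up example: Policy A scores trend a point lower overall,
# but after normalization each score is judged within its own group.
scores = [4, 5, 3, 6, 7, 5]
groups = ["A", "A", "A", "B", "B", "B"]
print(z_score_by_group(scores, groups))  # [0.0, 1.0, -1.0, 0.0, 1.0, -1.0]
```

Under this kind of scheme, a 5 under Policy A and a 6 under Policy B would come out identical, which is exactly why I would want to know whether anything like it is applied.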
So I wanted to ask:
Did you notice any difference in scores or review style between Policy A and Policy B papers? It would be helpful if you could comment with:
- which policy your paper used,
- your score vector,
- the scores of the papers you reviewed,
- and whether the reviews felt unusually harsh / lenient / polished.
I know this will not be a clean sample, but even a rough community snapshot would be interesting.
I made an anonymous informal poll to get a rough snapshot of scores by ICML 2026 review policy:
https://docs.google.com/forms/d/e/1FAIpQLSdQilhiCx_dGLgx0tMVJ1NDX1URdJoUGIscFoPCpe6qE2Ph8w/viewform?usp=publish-editor
Please do not include identifying details.
Obviously the poll will be noisy and self-selected, so I am not treating it as evidence, only as a rough community snapshot.
If enough responses come in, I may summarize the aggregate patterns back on Reddit without sharing raw identifying text responses.

