Skip to content

feat: add validation for reserved MPI environment variables#3145

Open
VishalPainjane wants to merge 1 commit intokubeflow:masterfrom
VishalPainjane:feat/mpi-env-validation
Open

feat: add validation for reserved MPI environment variables#3145
VishalPainjane wants to merge 1 commit intokubeflow:masterfrom
VishalPainjane:feat/mpi-env-validation

Conversation

@VishalPainjane
Copy link

This PR implements validation logic in the MPI runtime plugin to prevent users from manually setting internal MPI environment variables in TrainJob.spec.trainer.env.

Currently, users can accidentally set environment variables like OMPI_MCA_orte_default_hostfile which conflicts with the operator's internal configuration. This can cause the MPI launcher to fail to discover workers correctly, leading to training failures that are difficult to debug.

Changes

  1. Define the Forbidden List (pkg/constants/constants.go):

    • Added MPIReservedEnvNames constant containing the reserved MPI environment variables:
      • OMPI_MCA_orte_default_hostfile
      • OMPI_MCA_plm_rsh_args
      • OMPI_MCA_orte_keep_fqdn_hostnames
      • OMPI_MCA_orte_set_default_slots
  2. Enforce the Rule (pkg/runtime/framework/plugins/mpi/mpi.go):

    • Updated the Validate function to check user-provided environment variables
    • Rejects job creation with a clear error message when reserved envs are detected
    • Follows the same pattern used in the Torch plugin for TorchRunReservedEnvNames
  3. Unit Tests (pkg/runtime/framework/plugins/mpi/mpi_test.go):

    • Added test case for single reserved env detection
    • Added test case for multiple reserved envs detection
    • Added test case confirming non-reserved envs are still allowed

Which issue(s) this PR fixes

Fixes #3126

Checklist

  • Code follows the existing patterns in the codebase (mirrors Torch plugin validation)
  • Unit tests added and passing
  • go vet passes
  • go fmt passes
  • Docs included if any changes are user facing

@google-oss-prow google-oss-prow bot requested a review from jinchihe January 28, 2026 11:20
@google-oss-prow
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign tenzen-y for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: Vishal Painjane <painjanevishal2204@gmail.com>
@VishalPainjane VishalPainjane force-pushed the feat/mpi-env-validation branch from 17263fe to 705467e Compare January 28, 2026 11:25
@VishalPainjane VishalPainjane changed the title feat(mpi): add validation for reserved MPI environment variables feat: add validation for reserved MPI environment variables Jan 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature]: Implement MPI Environment Variable Validation

1 participant