feat: add TTLSecondsAfterFinished and ActiveDeadlineSeconds fields to TrainJob CRD #3065

Open
XploY04 wants to merge 10 commits into kubeflow:master from XploY04:master

@XploY04 XploY04 commented Jan 4, 2026

KEP PR #3068
FIXES ISSUE #2899

…ment.

Signed-off-by: XploY04 <2004agarwalyash@gmail.com>
@google-oss-prow

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign astefanutti for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

github-actions bot commented Jan 4, 2026

🎉 Welcome to the Kubeflow Trainer! 🎉

Thanks for opening your first PR! We're happy to have you as part of our community 🚀

Here's what happens next:

  • If you haven't already, please check out our Contributing Guide for repo-specific guidelines and the Kubeflow Contributor Guide for general community standards.
  • Our team will review your PR soon! cc @kubeflow/kubeflow-trainer-team

Feel free to ask questions in the comments if you need any help or clarification!
Thanks again for contributing to Kubeflow! 🙏

Signed-off-by: XploY04 <2004agarwalyash@gmail.com>
@google-oss-prow google-oss-prow bot added size/XL and removed size/L labels Jan 5, 2026
…ment

Signed-off-by: XploY04 <2004agarwalyash@gmail.com>
…or deadline enforcement

Signed-off-by: XploY04 <2004agarwalyash@gmail.com>
…and TTL enforcement in TrainJob

Signed-off-by: XploY04 <2004agarwalyash@gmail.com>
Signed-off-by: XploY04 <2004agarwalyash@gmail.com>
@XploY04 XploY04 marked this pull request as ready for review January 5, 2026 19:58
Copilot AI review requested due to automatic review settings January 5, 2026 19:58

Copilot AI left a comment

Pull request overview

This PR adds TTLSecondsAfterFinished and ActiveDeadlineSeconds fields to the TrainJob CRD to enable automatic cleanup of finished jobs and enforce maximum runtime limits. The implementation follows Kubernetes Job/JobSet patterns and includes comprehensive test coverage across unit, integration, and e2e tests.

Key changes:

  • Added two new optional fields to TrainJobSpec: TTLSecondsAfterFinished (int32) and ActiveDeadlineSeconds (int64) with immutability constraints
  • Implemented controller logic for TTL-based deletion and deadline enforcement with appropriate requeue mechanisms
  • Added RBAC permissions for TrainJob deletion and webhook validation for TTL warnings
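The TTL and deadline checks summarized above can be illustrated with a minimal Go sketch. It is not the PR's controller code: `jobSpec`, `ttlRemaining`, `deadlineRemaining`, and `exampleStart` are hypothetical names, and the sketch assumes that TTL is measured from the time the job finished and the deadline from the time it started. A non-positive remaining duration means the action (deletion or failure) is due now; a positive one is the delay a reconciler could requeue with.

```go
package main

import (
	"fmt"
	"time"
)

// jobSpec is a hypothetical stand-in for the two new optional TrainJobSpec
// fields; the pointer types mirror the int32/int64 types named in the review.
type jobSpec struct {
	TTLSecondsAfterFinished *int32
	ActiveDeadlineSeconds   *int64
}

// exampleStart is arbitrary sample data used by main below.
var exampleStart = time.Date(2026, 1, 5, 12, 0, 0, 0, time.UTC)

// ttlRemaining returns the time left before a finished job becomes eligible
// for deletion. ok is false when no TTL is set; left <= 0 means "delete now".
func ttlRemaining(spec jobSpec, finishedAt, now time.Time) (left time.Duration, ok bool) {
	if spec.TTLSecondsAfterFinished == nil {
		return 0, false
	}
	expiry := finishedAt.Add(time.Duration(*spec.TTLSecondsAfterFinished) * time.Second)
	return expiry.Sub(now), true
}

// deadlineRemaining returns the time left before a running job exceeds its
// deadline. ok is false when no deadline is set; left <= 0 means "exceeded".
func deadlineRemaining(spec jobSpec, startedAt, now time.Time) (left time.Duration, ok bool) {
	if spec.ActiveDeadlineSeconds == nil {
		return 0, false
	}
	deadline := startedAt.Add(time.Duration(*spec.ActiveDeadlineSeconds) * time.Second)
	return deadline.Sub(now), true
}

func main() {
	ttl, deadline := int32(3), int64(60)
	spec := jobSpec{TTLSecondsAfterFinished: &ttl, ActiveDeadlineSeconds: &deadline}

	// 90 seconds after start, the 60-second deadline has passed.
	now := exampleStart.Add(90 * time.Second)
	if left, ok := deadlineRemaining(spec, exampleStart, now); ok && left <= 0 {
		fmt.Println("deadline exceeded: mark the job failed")
	}

	// One second after finishing, a 3-second TTL has time left, so requeue.
	if left, ok := ttlRemaining(spec, now, now.Add(time.Second)); ok && left > 0 {
		fmt.Printf("requeue after %s\n", left)
	}
}
```

Returning the remaining duration rather than a boolean lets one function serve both decisions: act immediately when nothing is left, or schedule the next check for exactly when the limit is reached.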

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 7 comments.

Summary per file:

| File | Description |
| --- | --- |
| pkg/apis/trainer/v1alpha1/trainjob_types.go | Added TTLSecondsAfterFinished and ActiveDeadlineSeconds fields to TrainJobSpec with validation markers and DeadlineExceeded condition reason |
| pkg/controller/trainjob_controller.go | Implemented reconcileTTL and reconcileDeadline functions to handle automatic deletion and deadline enforcement; added delete RBAC permission |
| pkg/webhooks/trainjob_webhook.go | Added validateTTLSecondsAfterFinished function to warn users about short TTL values (<60s) |
| pkg/util/testing/wrapper.go | Added helper methods TTLSecondsAfterFinished and ActiveDeadlineSeconds to TrainJobWrapper for testing |
| pkg/controller/trainjob_ttl_test.go | Added comprehensive unit tests for TTL cleanup and deadline enforcement logic |
| test/integration/controller/trainjob_controller_test.go | Added integration tests for TTL deletion (no TTL, TTL=3s, TTL=0) and deadline enforcement scenarios |
| test/e2e/e2e_test.go | Added e2e tests for real-world TTL deletion and deadline exceeded scenarios |
| manifests/base/rbac/role.yaml | Split trainjobs resource into separate rule entries with delete verb added for TTL cleanup |
| manifests/base/crds/trainer.kubeflow.org_trainjobs.yaml | Added activeDeadlineSeconds and ttlSecondsAfterFinished to CRD schema with validation and immutability constraints |
| charts/kubeflow-trainer/crds/trainer.kubeflow.org_trainjobs.yaml | Mirrored CRD changes for Helm chart deployment |
| pkg/apis/trainer/v1alpha1/zz_generated.deepcopy.go | Generated DeepCopyInto methods for new pointer fields |
| pkg/apis/trainer/v1alpha1/zz_generated.openapi.go | Generated OpenAPI schema definitions for new fields |
| pkg/client/applyconfiguration/trainer/v1alpha1/trainjobspec.go | Added WithTTLSecondsAfterFinished and WithActiveDeadlineSeconds methods to apply configuration builder |
| api/python_api/kubeflow_trainer_api/models/trainer_v1alpha1_train_job_spec.py | Added active_deadline_seconds and ttl_seconds_after_finished fields to Python API model |
| api/openapi-spec/swagger.json | Updated OpenAPI specification with new field definitions |
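
The short-TTL warning listed for pkg/webhooks/trainjob_webhook.go can be sketched roughly as follows. This is an assumption-laden illustration, not the PR's validateTTLSecondsAfterFinished: the function name `ttlWarnings`, the constant `shortTTLThresholdSeconds`, and the warning text are all hypothetical. Only the 60-second threshold and the fact that short values produce a warning rather than a rejection come from the summary above.

```go
package main

import "fmt"

// shortTTLThresholdSeconds is the cutoff below which a TTL is considered
// short. The value 60 comes from the review summary; the name is hypothetical.
const shortTTLThresholdSeconds int32 = 60

// ttlWarnings returns admission-style warnings for a TTL value. A nil pointer
// means the field is unset. The request is never rejected here: a short TTL
// is allowed, the user is only told about the consequence.
func ttlWarnings(ttl *int32) []string {
	if ttl == nil || *ttl >= shortTTLThresholdSeconds {
		return nil
	}
	return []string{fmt.Sprintf(
		"ttlSecondsAfterFinished is %d, below %d seconds: the TrainJob may be deleted before its results can be inspected",
		*ttl, shortTTLThresholdSeconds)}
}

func main() {
	for _, seconds := range []int32{0, 30, 60, 3600} {
		v := seconds // take the address of a per-iteration copy
		fmt.Printf("ttl=%d warnings=%d\n", v, len(ttlWarnings(&v)))
	}
}
```

Emitting a warning instead of an error keeps values such as TTL=0 usable, which the integration tests in this PR exercise.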

Signed-off-by: XploY04 <2004agarwalyash@gmail.com>
Signed-off-by: XploY04 <2004agarwalyash@gmail.com>
…w runtime-level ActiveDeadlineSeconds with TrainJob override.

Signed-off-by: XploY04 <2004agarwalyash@gmail.com>