chore(storage): add chunking, composing and cleanup logic for PCU #13600

Open
cpriti-os wants to merge 6 commits into googleapis:main from cpriti-os:pcu

Conversation

cpriti-os (Contributor) commented Jan 14, 2026

Asynchronous Chunking and Uploads

  • Data written by the user is buffered into fixed-size parts. Once a part buffer is full, or during Close(), it's dispatched for asynchronous upload via a dedicated pool of worker goroutines.
  • The pcuState.write() method handles buffering incoming data, dispatching full buffers, and acquiring new buffers from a pool (bufferCh).
  • pcuState.flushCurrentBuffer() encapsulates the logic for taking a filled buffer, incrementing the part number (s.partNum++), creating an uploadTask, and sending it to the uploadCh for processing by a worker (see the sketch below).
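
A minimal sketch of this buffering-and-dispatch flow, under assumed simplified signatures; the real pcuState carries more state (error tracking, options, naming), so the types and method bodies below are illustrative only:

    type uploadTask struct {
        partNum int
        buf     []byte
    }

    type pcuState struct {
        partSize      int
        partNum       int
        currentBuffer []byte
        bufferCh      chan []byte     // pool of reusable part buffers, each pre-allocated to partSize
        uploadCh      chan uploadTask // consumed by the upload worker goroutines
    }

    // write buffers incoming data and dispatches each full part for upload.
    func (s *pcuState) write(p []byte) (int, error) {
        written := 0
        for len(p) > 0 {
            if s.currentBuffer == nil {
                // Acquire a buffer from the pool; blocks if all buffers are in flight.
                s.currentBuffer = (<-s.bufferCh)[:0]
            }
            n := copy(s.currentBuffer[len(s.currentBuffer):s.partSize], p)
            s.currentBuffer = s.currentBuffer[:len(s.currentBuffer)+n]
            p, written = p[n:], written+n
            if len(s.currentBuffer) == s.partSize {
                s.flushCurrentBuffer() // part is full: hand it to a worker
            }
        }
        return written, nil
    }

    // flushCurrentBuffer sends the filled buffer to the workers; Close() would
    // call it once more for the final, partially filled part.
    func (s *pcuState) flushCurrentBuffer() {
        if len(s.currentBuffer) == 0 {
            return
        }
        s.partNum++
        s.uploadCh <- uploadTask{partNum: s.partNum, buf: s.currentBuffer}
        s.currentBuffer = nil // the next write acquires a fresh buffer from bufferCh
    }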

Recursive Composition for Large Objects

  • To handle objects larger than the maximum allowed number of composite components (currently 32), a multi-level composition strategy is implemented.
  • pcuState.composeParts() manages the recursive composition process, creating intermediate objects (up to 32 components per compose operation) until a final object is formed (see the sketch after this list).
  • Each level of intermediate composition can run in parallel using goroutines, with errors aggregated and handled.
  • Temporary object names for parts and intermediates are generated using a default naming strategy that includes random bytes (generateRandomBytes) to prevent hot-spotting.
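
An illustrative sketch of the multi-level strategy, written against the public Composer API rather than the PR's internal helpers; the intermediate naming scheme and the composeAll function below are hypothetical, and parallelism, retries, and single-part handling are omitted:

    import (
        "context"
        "fmt"

        "cloud.google.com/go/storage"
    )

    const maxComposeComponents = 32

    // composeAll composes the uploaded part objects in groups of up to 32 until
    // a single object remains. It assumes len(parts) > 1; a single part would
    // need a copy/rename instead.
    func composeAll(ctx context.Context, bkt *storage.BucketHandle, parts []string, finalName string) error {
        level := 0
        for len(parts) > 1 {
            var next []string
            for i := 0; i < len(parts); i += maxComposeComponents {
                end := i + maxComposeComponents
                if end > len(parts) {
                    end = len(parts)
                }
                // The last level writes the final object; earlier levels write
                // temporary intermediates (hypothetical naming scheme).
                name := finalName
                if len(parts) > maxComposeComponents {
                    name = fmt.Sprintf("%s-intermediate-%d-%d", finalName, level, i/maxComposeComponents)
                }
                srcs := make([]*storage.ObjectHandle, 0, end-i)
                for _, p := range parts[i:end] {
                    srcs = append(srcs, bkt.Object(p))
                }
                if _, err := bkt.Object(name).ComposerFrom(srcs...).Run(ctx); err != nil {
                    return err
                }
                next = append(next, name)
            }
            parts = next
            level++
        }
        return nil
    }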

Robust Cleanup

  • Temporary part and intermediate objects are actively managed and cleaned up on successful completion or on error, based on a configurable cleanupStrategy (e.g., cleanupAlways, cleanupOnSuccess).
  • pcuState.doCleanup() orchestrates the deletion of all temporary parts and intermediate composed objects, performing the deletions concurrently with a semaphore that bounds the number of goroutines (see the sketch below).
  • Deletion errors are recorded in failedDeletes so they can be surfaced later.
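
A rough sketch of that concurrent cleanup using a channel-based semaphore; the function name, tempObjects, and maxCleanupWorkers below are illustrative stand-ins, not the PR's actual identifiers:

    import (
        "context"
        "errors"
        "sync"

        "cloud.google.com/go/storage"
    )

    // doCleanup deletes every temporary object, bounding concurrency with a
    // semaphore and recording failures instead of aborting on the first error.
    func doCleanup(ctx context.Context, bkt *storage.BucketHandle, tempObjects []string) []string {
        const maxCleanupWorkers = 8
        sem := make(chan struct{}, maxCleanupWorkers)

        var (
            mu            sync.Mutex
            wg            sync.WaitGroup
            failedDeletes []string
        )
        for _, name := range tempObjects {
            wg.Add(1)
            sem <- struct{}{} // acquire a worker slot
            go func(name string) {
                defer wg.Done()
                defer func() { <-sem }() // release the slot
                err := bkt.Object(name).Delete(ctx)
                if err != nil && !errors.Is(err, storage.ErrObjectNotExist) {
                    mu.Lock()
                    failedDeletes = append(failedDeletes, name)
                    mu.Unlock()
                }
            }(name)
        }
        wg.Wait()
        return failedDeletes
    }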

@cpriti-os cpriti-os requested a review from a team as a code owner January 14, 2026 11:01
@cpriti-os cpriti-os requested a review from a team January 14, 2026 11:01
@product-auto-label product-auto-label bot added the api: storage label (Issues related to the Cloud Storage API) Jan 14, 2026
gemini-code-assist bot (Contributor)

Summary of Changes

Hello @cpriti-os, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the Parallel Composite Upload (PCU) functionality by integrating comprehensive data chunking, multi-level object composition, and robust cleanup mechanisms. These changes enable the system to efficiently handle large file uploads by breaking them into manageable parts, assembling them intelligently, and ensuring that temporary artifacts are properly managed, all while improving the overall testability of the component.

Highlights

  • Chunking and Buffering Logic: Introduced core logic for chunking incoming data, buffering it, and dispatching full parts for parallel upload. This includes write and flushCurrentBuffer methods within pcuState to manage data flow efficiently.
  • Multi-Level Object Composition: Implemented composeParts to handle the composition of uploaded parts into a final object. This includes a multi-level composition strategy to manage scenarios where the number of parts exceeds the maximum allowed for a single compose operation, creating intermediate composite objects as needed.
  • Robust Cleanup Mechanism: Added doCleanup logic to remove temporary and intermediate objects created during the parallel upload process. This cleanup respects configurable strategies (e.g., cleanupAlways, cleanupOnSuccess, cleanupNever) and handles potential failures during deletion.
  • Enhanced Testability: Refactored pcuState to use injectable function fields (deleteFn, doCleanupFn, composePartsFn, composeFn) for key operations, significantly improving the testability and modularity of the parallel composite upload process (a sketch of this pattern follows the list).
  • Cryptographically Secure Randomness: Switched from math/rand to crypto/rand for generating random bytes used in temporary part names, enhancing security and uniqueness.
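
For readers unfamiliar with function-field injection, here is a small sketch under assumed signatures; the field types on the real pcuState and the newTestState helper are illustrative only:

    import (
        "context"
        "testing"
    )

    type pcuState struct {
        deleteFn       func(ctx context.Context, object string) error
        composeFn      func(ctx context.Context, dst string, srcs []string) error
        composePartsFn func(ctx context.Context, parts []string) (string, error)
        doCleanupFn    func(ctx context.Context) error
    }

    // newTestState wires in fakes so unit tests can observe behavior without
    // touching GCS; only deleteFn is shown, the other fields are stubbed the
    // same way.
    func newTestState(t *testing.T, deleted *[]string) *pcuState {
        t.Helper()
        return &pcuState{
            deleteFn: func(ctx context.Context, object string) error {
                *deleted = append(*deleted, object) // record instead of deleting
                return nil
            },
        }
    }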


Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature              | Command              | Description
Code Review          | /gemini review       | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary      | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist  | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help                 | /gemini help         | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces significant new functionality for parallel composite uploads (PCU), including chunking data into parts, composing parts (recursively if needed), and cleaning up temporary objects. The changes are well-structured, using dependency injection for testability, and are accompanied by a comprehensive suite of unit tests covering many edge cases. The logic for concurrent part uploads, composition, and cleanup appears robust. I have two main points of feedback: a critical issue in generateRandomBytes where a potential error from crypto/rand.Read is ignored, which could lead to non-random data and object name collisions, and a high-severity issue in the close method's error handling, which could cause the cleanup logic to behave incorrectly on composition failures. After addressing these points, the implementation will be much more robust.

    // Acquire a buffer from the pool if we don't have one.
    if s.currentBuffer == nil {
        // Fail-fast check before taking a new buffer.
        s.mu.Lock()
Contributor

Can you move this logic to a function? It is repeated in multiple places.

Contributor Author

Only three lines could be moved into a function, and that would replace just three instances; I don't think extracting such a small piece of logic would add to readability.

krishnamd-jkp (Contributor)

initial review

@cpriti-os cpriti-os requested review from a team and igorbernstein2 as code owners February 12, 2026 04:28

snippet-bot bot commented Feb 12, 2026

No region tags are edited in this PR.

This comment is generated by snippet-bot.
If you find problems with this result, please file an issue at:
https://github.com/googleapis/repo-automation-bots/issues.
To update this comment, add the snippet-bot:force-run label.

cpriti-os (Contributor Author)

/gemini review

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces significant new logic for parallel composite uploads, including asynchronous chunking, recursive composition for large objects, and robust cleanup of temporary parts. The changes are well-structured, leveraging goroutines, channels, and other concurrency primitives effectively. The addition of comprehensive unit tests for the new functionality is also a great inclusion.

I've identified a couple of issues related to error handling that could lead to bugs under specific failure conditions, and one minor anti-pattern in the test code. My feedback includes suggestions to address these points.

Comment on lines +629 to +631
    b := make([]byte, n)
    _, _ = rand.Read(b)
    return b
Contributor

Severity: high

The error returned by rand.Read is ignored. According to the documentation, crypto/rand can return an error if the underlying OS entropy source fails. Ignoring this error could result in the function returning a buffer of zeros, which would lead to predictable temporary object names and potential collisions, defeating the purpose of using a cryptographically secure random generator. This is a critical failure condition, and the application should probably panic to avoid operating in an insecure state.

    b := make([]byte, n)
    if _, err := rand.Read(b); err != nil {
        // This is a catastrophic failure. The OS's entropy source is broken.
        panic(fmt.Sprintf("crypto/rand.Read failed: %v", err))
    }
    return b
