Skip to content

SOLR-15701: Complete configsets api#4264

Open
epugh wants to merge 14 commits intoapache:mainfrom
epugh:complete_configsets_api
Open

SOLR-15701: Complete configsets api#4264
epugh wants to merge 14 commits intoapache:mainfrom
epugh:complete_configsets_api

Conversation

@epugh
Copy link
Copy Markdown
Contributor

@epugh epugh commented Apr 4, 2026

https://issues.apache.org/jira/browse/SOLR-15701

Description

Add Download and GetFile and PutFile to ConfigSets API and SolrJ.

Solution

This was extracted with copilots help from the SchemaDesigner code base. We are going to merge this as part of ConfigSets API. In a future PR, once this is done, we'll finish the currently started SchemaDesigner work to use these APIs.

The PutFile was broken out of the Upload feature into it's own files with it's own parameters. Before it was inter twingled with Upload of a zipped configset.

Tests

Added some tests.

@epugh epugh requested a review from gerlowskija April 4, 2026 00:04
@github-actions github-actions bot added documentation Improvements or additions to documentation tests cat:api labels Apr 4, 2026
Comment thread solr/core/src/java/org/apache/solr/handler/configsets/GetConfigSetFile.java Outdated
Copy link
Copy Markdown
Contributor

@VishnuPriyaChandraSekar VishnuPriyaChandraSekar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have just started reviewing the Solr code base.
I just left a minor comment other than that, PR seems to be fine.

Copy link
Copy Markdown
Contributor

@gerlowskija gerlowskija left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty good overall. I left some inline comments, mostly small things.

I do have one larger question here though: I gather much of this functionality already existed under the aegis of the SchemaDesigner, but I don't see any deletions or modifications to those files in this PR. Shouldn't this PR be deleting the "download-configset" and "get-configset-file" APIs that exist over in schema-designer land?

Comment thread solr/api/src/java/org/apache/solr/client/api/endpoint/ConfigsetsApi.java Outdated
@Path("/configsets/{configSetName}")
interface Download {
@GET
@Path("/download")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[-1] /download isn't very REST-ful. The fact that we're fetching/downloading the resource is already kindof implied by the HTTP 'GET' verb/method.

Could we drop it and make this "just" GET /api/configsets/{configsetName}?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so we could, and I was thinking about it and then asked Claude for soem suggestions, and he kind of came back to "/download" is easy. Here is the write up:
RESTful Alternatives

Option 1: Keep current design ✅ (Recommended)

GET /api/configsets                          → List (JSON)
GET /api/configsets/{name}/download          → Download ZIP
GET /api/configsets/{name}/{filePath}        → Get single file

Pros:

  • Explicit and unambiguous
  • Self-documenting (/download tells you what it does)
  • No conflicts with file paths
  • Common pattern (GitHub uses /archive, GitLab uses /repository/archive, etc.)

Cons:

  • Not purely resource-oriented REST

Option 2: Use /files sub-resource

GET /api/configsets/{name}                   → Metadata (JSON) - NEW
GET /api/configsets/{name}/files             → Download all as ZIP
GET /api/configsets/{name}/files/{path}      → Get single file

Pros:

  • More RESTful hierarchy
  • Room for metadata endpoint

Cons:

  • Breaking change to existing API
  • More complex paths

Option 3: Content negotiation (Pure REST)

GET /api/configsets/{name}
  Accept: application/json  → Metadata
  Accept: application/zip   → ZIP download

Pros:

  • True REST content negotiation

Cons:

  • Solr doesn't typically use Accept headers this way
  • Harder to use with curl/browsers
  • Still conflicts with /{filePath} pattern

Recommendation

Keep the current design (/download) because:

  1. Unambiguous - no path conflicts
  2. Self-documenting - clear what you get back
  3. Pragmatic - works well without complex content negotiation
  4. Common pattern - many APIs use action-style endpoints for different representations
  5. Backward compatible - if we add metadata later, we can use GET /api/configsets/{name} for JSON

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we had another places that we did the zip download, then I could see making them all work the same, with whatever pattern we picked...

Copy link
Copy Markdown
Contributor

@gerlowskija gerlowskija Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about it and then asked Claude for soem suggestions, and he kind of came back to

I don't mean to be glib, but I don't really care what Claude thinks - I care what you think! Maybe by quoting Claude you're implying you agree? But I'd love to hear which option you think is better, and why!

If we had another places that we did the zip download, then I could see making them all work the same

Idk about ZIP downloads, that's admittedly a little more niche 🤷

What is clear to me is that we have a bunch of APIs that let you "download some thing" or "download some part of some thing", and none of those APIs take the /download approach suggested by your LLM.

  • Download a file from the filestore: GET /api/cluster/filestore/files/<path>
  • Download an index file from leader: GET /api/cores/coreName/replication/files/<path>
  • Fetch the data in a ZooKeeper node: GET /api/cluster/zookeeper/data/<path>
  • etc.

We've got an established pattern here: rely on the HTTP verb to imply "download", and offer a "files/" sub-path as a way to support individual file download. If we want to deviate from that there should be a pretty compelling reason.

And while I hear Claude's point that we may want to reserve /api/configsets/<name> for some future metadata at some point...that doesn't feel very compelling to me. "configset metadata" isn't a thing in Solr today. And as Claude pointed out there are options (e.g. Accept header) for adding that pretty seamlessly down the road.

To be transparent, if you personally feel strongly about /download I'm not gonna veto the PR or anything. But Claude's argument for /download doesn't make a ton of sense to me, and I think we should be intentional about these endpoints.

Comment thread solr/api/src/java/org/apache/solr/client/api/endpoint/ConfigsetsApi.java Outdated
Comment thread solr/api/src/java/org/apache/solr/client/api/endpoint/ConfigsetsApi.java Outdated
Comment thread solr/api/src/java/org/apache/solr/client/api/endpoint/ConfigsetsApi.java Outdated
Comment thread solr/core/src/java/org/apache/solr/handler/configsets/DownloadConfigSet.java Outdated
Comment thread solr/solr-ref-guide/modules/configuration-guide/pages/configsets-api.adoc Outdated
Comment thread solr/solr-ref-guide/modules/configuration-guide/pages/configsets-api.adoc Outdated
Comment thread solr/solr-ref-guide/modules/configuration-guide/pages/configsets-api.adoc Outdated
@epugh
Copy link
Copy Markdown
Contributor Author

epugh commented Apr 11, 2026

I have just started reviewing the Solr code base. I just left a minor comment other than that, PR seems to be fine.

@VishnuPriyaChandraSekar by the way, we need more code reviewers for the PR's in Solr. With AI, it's easier then ever to generate code, but reviewing it really isn't any easier. It would be great if you reviewed more PRs, as that would help contributors get some early feedback! I believe you can use the Code Review feature and approve a PR, even if you aren't a committer, you just can't merge.

@epugh
Copy link
Copy Markdown
Contributor Author

epugh commented Apr 11, 2026

Lots of progress, I think I am down to doc fixes.

@epugh
Copy link
Copy Markdown
Contributor Author

epugh commented Apr 12, 2026

Working through the docs.. I am going to split uploading a complete configset from the putting of a single file in an existing configset. Right now the methods are on the same upload interface, and it's a bit confusing.

@epugh
Copy link
Copy Markdown
Contributor Author

epugh commented Apr 12, 2026

@gerlowskija I couldn't help it, I also found some remnants of the trusted configset concept that i could remove in this... Simplified osme of the tests!

I think I'm ready for final review!

@epugh
Copy link
Copy Markdown
Contributor Author

epugh commented Apr 13, 2026

Test all pass locally.

@gerlowskija
Copy link
Copy Markdown
Contributor

I couldn't help it, I also found some remnants of the trusted configset concept

shakes fist at cloud

I think you know what I think about bundling that into this PR 😛 You definitely get a "pass" though IMO since this PR itself was split off from a different one haha

To re-state my concern/issue for others though:

I get that "this should be a separate PR" is really whiny and really pedantic. But in a post-AI world where it's easier than ever to write PRs (but no easier to review them) it's worth doing whatever we can to make the review side of things easier. Including being a little more militant about this PR-scope-creep stuff.

It's not just a bit of change to the diff - now anyone that wants to do a thorough review of this PR has to understand all of the context around trusted configsets, why we removed them recently, and whether there was a reason this code wasn't included in that removal, etc.

(To save others the work: SOLR-17584 removed "trusted configsets", and it was intended to get all the logic, so we're 👍 to remove any vestigial bits here.)

I'll get off my soapbox, and I won't ask that you break things up at this point. Mainly hoping I can win hearts and minds on this for the future 😛

Copy link
Copy Markdown
Contributor

@gerlowskija gerlowskija left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely moving in the right direction, but still a few things that should be tidied up before committing. Hopefully all minor stuff.

One larger PR-level question I'm still hoping for clarification on though: I get the impression that the "configset download" capability in this PR is migrated from an existing schema-designer API. But I don't see any deletions or removals over in the schema-designer code. Am I misunderstanding things here? Or am I correct about the duplication, but it's something you're doing intentionally at this point? Could just use a little more context on that...

@Path("/configsets/{configSetName}")
interface PutFile {
@PUT
@Path("{filePath:.+}")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[-0] We should probably add /files to the path here to align with what we now have for getConfigSetFile above.

}

if (overwriteOnExists || !Files.exists(configsetFilePath)) {
// Create parent directories if they don't exist (similar to ZK's makePath)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Q] Why is this code here?

This feels kindof suspect at a glance, from a security perspective. The risk is a bit lower in ZK, but since we're on the filesystem here we probably shouldn't blindly be creating directories based on a user-specified parameter. If we're 100% certain that there's no path-escape or any other funny-business possible here, maybe this is fine. But I'd think this through really carefully if you haven't already.

* @param configSetId the internal configset name to download (may differ from displayName, e.g.
* for schema-designer's mutable copies)
* @param displayName the user-visible name used to derive the download filename
* @param configSetId the internal configset name to download
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[0] "internal" probably isn't needed here anymore.

Boolean cleanup,
InputStream requestBody)
throws IOException {
public SolrJerseyResponse putConfigSetFile(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[0] I liked the old method name better, fwiw. uploadConfigSetFile is clearer than putConfigSetFile

SolrJerseyResponse putConfigSetFile(
@PathParam("configSetName") String configSetName,
@PathParam("filePath") String filePath,
@QueryParam("overwrite") Boolean overwrite,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Q] Can you please add some context on why these boolean flags are going away?

'Overwrite' in particular seems like a nice safety check that I imagine users would like to use.

assertEquals(filePath, response.path);
assertEquals("", response.content);
// Verify PNG signature is intact
assertEquals((byte) 0x89, responseBytes[0]);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[-0] We already did an 'assertArrayEquals' above to validate that we received exactly what we expected...what do these assertions provide in addition to that?

final var api = new UploadConfigSet(mockCoreContainer, null, null);
final var response = api.putConfigSetFile(configSetName, filePath, fileStream);

assertNotNull(response);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[0] IMO these not-null checks don't provide a ton of value. If you want to validate the response, look at the responseHeader 'status' field, or make sure that the 'error' property is null/empty, or...

}

/** Creates a configset with files on disk for testing overwrites. */
private void createExistingConfigSet(String configSetName, String... filePathAndContent)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[-1] Nothing ever calls this.

// Test with empty string
final var ex =
assertThrows(
SolrException.class, () -> api.putConfigSetFile(configSetName, "", fileStream));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[0] Are we pre-creating 'configSetName' as a configset in some way? (Or is ConfigSetService mocked in some way such that it always exists?)

If not, shouldn't the API be returning a 404 or something bc the configset doesn't exist?


=== API Changes

Solr 10.1 changes the V1 API signature for uploading a single file to remove the overwrite parameter. Previously the default for `overwrite` is false when using the v1 API, but true when using the v2 API.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[-0] I'm not sure I understand what this means. Does "API signature" here mean the method-signature in the Java code? Or are saying that the v1 API no longer supports this param at all?

I must've missed it earlier on in my review, but if we're removing support for a parameter from v1 can you share a bit more context on why that was done?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cat:api cat:cloud documentation Improvements or additions to documentation tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants