
Handling-multi-byte-unicode-characters#1650

Open
teg-atlassian wants to merge 1 commit into main from Handling-multi-byte-unicode-characters


@teg-atlassian (Contributor) commented Feb 24, 2026

What Is This Change?

When the Rovo Dev backend sends a response message and Atlascode receives it, the following happens:

  • HTTP delivers raw bytes in arbitrary-sized chunks
  • The SSE (Server-Sent Events) parser splits on \n\n, but this split happens at the byte level, not the character level
  • If a chunk boundary falls in the middle of a multi-byte UTF-8 character, that character can be corrupted
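A minimal sketch of the chunk-boundary problem described above, assuming the client decodes the byte stream with the standard TextDecoder; the helper names decodeEachChunk and decodeStreaming are hypothetical and not taken from Atlascode. Decoding each chunk on its own turns a character split across two chunks into U+FFFD replacement characters, while one decoder reused with { stream: true } holds the incomplete bytes until the next chunk arrives. The byte 0x0A (\n) never appears inside a multi-byte UTF-8 sequence, so splitting the decoded text on \n\n afterwards cannot cut a character in half.

```typescript
// Sketch only: hypothetical helpers illustrating per-chunk vs streaming decoding.

// Unsafe: each chunk is decoded independently, so a character split
// across two chunks becomes replacement characters (U+FFFD).
function decodeEachChunk(chunks: Uint8Array[]): string {
    return chunks.map((c) => new TextDecoder().decode(c)).join('');
}

// Safe: one decoder carries an incomplete trailing sequence over to the next chunk.
function decodeStreaming(chunks: Uint8Array[]): string {
    const decoder = new TextDecoder('utf-8');
    let text = '';
    for (const chunk of chunks) {
        // stream: true keeps a partial character buffered inside the decoder
        text += decoder.decode(chunk, { stream: true });
    }
    // A final call without arguments flushes anything still buffered
    text += decoder.decode();
    return text;
}

// 'é' (U+00E9) encodes as the two bytes 0xC3 0xA9
const bytes = new TextEncoder().encode('data: \u00e9\n\n');
// Cut between the two bytes of 'é', as an HTTP chunk boundary might
const splitAt = bytes.indexOf(0xc3) + 1;
const chunks = [bytes.slice(0, splitAt), bytes.slice(splitAt)];

console.log(JSON.stringify(decodeEachChunk(chunks))); // contains U+FFFD replacement characters
console.log(JSON.stringify(decodeStreaming(chunks))); // "data: é\n\n"
```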

This sometimes splits tokens such as tool_name into "t" and "ool_name", which makes Atlascode throw an error and the session fail.

A more detailed (somewhat long) discussion of the problem and the solution can be found here.
With this PR, we

  1. safely parse the string: we no longer assume it is valid JSON
  2. when parsing fails, retry by combining consecutive chunks; the trick is simply to try different combinations.
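The two steps in the list can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: tryParseJson and parseWithRetry are hypothetical names, chunks are assumed to be the strings produced by splitting on \n\n, and a failed chunk is rejoined with the following chunks using that same \n\n separator, mirroring how the reviewed snippet puts a chunk back into the buffer.

```typescript
// Sketch only: hypothetical helpers for "safe parse, then retry with combined chunks".

// Returns the parsed value, or undefined instead of throwing on invalid JSON.
// (JSON.parse never yields undefined, so undefined safely signals failure.)
function tryParseJson(text: string): unknown {
    try {
        return JSON.parse(text);
    } catch {
        return undefined;
    }
}

// Parses each chunk; when one fails, joins it with up to maxJoin - 1 following
// chunks (restoring the "\n\n" separator) until the combined text parses.
function parseWithRetry(chunks: string[], maxJoin = 3): { parsed: unknown[]; unparsed: string[] } {
    const parsed: unknown[] = [];
    const unparsed: string[] = [];
    let i = 0;
    while (i < chunks.length) {
        let consumed = 0;
        for (let n = 1; n <= maxJoin && i + n <= chunks.length; n++) {
            const value = tryParseJson(chunks.slice(i, i + n).join('\n\n'));
            if (value !== undefined) {
                parsed.push(value);
                consumed = n;
                break;
            }
        }
        if (consumed === 0) {
            // No combination parsed: keep the raw text instead of throwing
            unparsed.push(chunks[i]);
            consumed = 1;
        }
        i += consumed;
    }
    return { parsed, unparsed };
}

// '{"tool_name":' alone is invalid, but joined with the next chunk it parses
const example = parseWithRetry(['{"tool_name":', '"search"}', '{"ok":true}']);
// example.parsed is [{ tool_name: 'search' }, { ok: true }]; example.unparsed is []
```

Bounding the number of joined chunks (maxJoin) limits how many combinations are tried for a chunk that never becomes valid; such chunks are returned in unparsed instead of being thrown.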

How Has This Been Tested?

Basic checks:

  • npm run lint
  • npm run test
  • new tests

Advanced checks:

  • If Atlassian employee & Bitbucket changes: did you test with DC in mind? See Instructions

Recommendations:

  • Update the CHANGELOG if making a user facing change

Rovo Dev code review: Rovo Dev couldn't review this pull request

@teg-atlassian changed the title from 'ensuring "invalid" jsons are retried again to see if they are valid' to 'Handling-multi-byte-unicode-characters' on Feb 24, 2026
```typescript
} catch {
    // JSON parse failed - likely due to incomplete multi-byte UTF-8 character at chunk boundary
    // Put this chunk back in the buffer and wait for more data
    this.buffer = chunkRaw + '\n\n' + this.buffer;
```
Collaborator


match[2] is the end of chunkRaw.
If match[2] can't be parsed, adding it back to the buffer followed by '\n\n' will keep it broken.

Let's discuss this.

Contributor Author


This code is wrapped in an outer loop. So, as you said, match[2] is the last token in this buffer, but we receive new data on the next iteration because we are inside the loop.
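The exchange above turns on how the outer loop treats a trailing segment that has not been terminated yet. One common structure, sketched here as an assumption rather than as Atlascode's implementation (SseBuffer and extractData are hypothetical names, and the PR's match[2] regex is not reproduced), appends each decoded chunk to a buffer, emits only events that already end in \n\n, and leaves the unterminated tail in the buffer for the next iteration, so nothing has to be put back.

```typescript
// Sketch only: a hypothetical buffer that emits complete SSE events.
// Only "\n\n" separators are handled; the SSE format also allows "\r\n" and "\r".
class SseBuffer {
    private buffer = '';

    // Appends newly decoded text and returns the data payload of every complete event.
    push(text: string): string[] {
        this.buffer += text;
        const events: string[] = [];
        let sep = this.buffer.indexOf('\n\n');
        while (sep !== -1) {
            const rawEvent = this.buffer.slice(0, sep);
            this.buffer = this.buffer.slice(sep + 2);
            events.push(extractData(rawEvent));
            sep = this.buffer.indexOf('\n\n');
        }
        // Whatever remains has no terminating blank line yet; keep it for the next push
        return events;
    }
}

// Joins the values of all "data:" lines in one event, dropping one leading space.
function extractData(rawEvent: string): string {
    return rawEvent
        .split('\n')
        .filter((line) => line.startsWith('data:'))
        .map((line) => line.slice(5).replace(/^ /, ''))
        .join('\n');
}

const sse = new SseBuffer();
sse.push('data: {"tool_na'); // returns [] because the event is not terminated yet
sse.push('me":"search"}\n\ndata: '); // returns ['{"tool_name":"search"}']
```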

@marcomura (Collaborator) left a comment


Based on how Rovo Dev responds, this change should not be necessary.
Let's discuss it.

@bwieger-atlassian-com (Collaborator)

My thought here is that this should be solved at the Rovo Dev Server layer, not at the client level... but that's just a first impression. Worth chatting with Tim Esler on this.

@marcomura (Collaborator)

Agree with @bwieger-atlassian-com that any issue in the response should be fixed at Rovo Dev level.

However, looking at the telemetry, I don't believe the response is split incorrectly; I think the tool response may be in a different format than we expect (e.g., a string instead of JSON).
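If the tool response can arrive either as a JSON object or as a plain string, as suggested above, the client could normalize it before use instead of assuming an object. A minimal sketch under that assumption; the ToolResult shape and the normalizeToolResponse name are hypothetical and do not describe Rovo Dev's actual response schema.

```typescript
// Hypothetical normalized shape; the real Rovo Dev tool-response schema may differ.
type ToolResult =
    | { kind: 'json'; value: unknown }
    | { kind: 'text'; value: string };

// Accepts an already-parsed object, a string containing JSON, or plain text,
// and never throws.
function normalizeToolResponse(content: unknown): ToolResult {
    if (typeof content !== 'string') {
        return { kind: 'json', value: content };
    }
    try {
        return { kind: 'json', value: JSON.parse(content) };
    } catch {
        // Not JSON: keep it as plain text rather than failing the session
        return { kind: 'text', value: content };
    }
}

const fromObject = normalizeToolResponse({ files: 2 }); // { kind: 'json', value: { files: 2 } }
const fromJsonText = normalizeToolResponse('{"files":2}'); // { kind: 'json', value: { files: 2 } }
const fromPlainText = normalizeToolResponse('No files found'); // { kind: 'text', value: 'No files found' }
```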



3 participants