Talk:LEB128

Rationale

Can somebody please explain why this format ever saw the light of day? It seems overly complex, unnecessary and wasteful. A simple zero-byte-terminated little-endian string satisfies the requirement for "variable-length code compression used to store arbitrarily large integers" (signed and unsigned). It would be shorter and, in some cases, would take a single instruction to load. LEB128 is not even a compression - it is an expansion! ArtKocsis (talk) 16:58, 23 February 2023 (UTC)

@ArtKocsis Although this page is meant for discussing the article rather than the topic itself, there are certain number ranges in which LEB128 is shorter than your suggestion, in particular 0-127 (I guess that's where the name is from), where LEB128 needs 1 byte whereas your suggestion needs 2. From 128 to 255 they both need 2 bytes. From 256 to 16,383 (2**14-1) LEB128 needs 2 bytes, whereas your suggestion needs 3. They are the same again from 16,384 to 65,535 (2**16-1), and then LEB128 is shorter again up to 2,097,151 (2**21-1, about 2 million). I think only for numbers larger than 2**63-1 do we start to get a one-byte advantage using your scheme, and for numbers bigger than 2**119-1, two bytes, etc.

I think it is safe to say that most numbers will be in a space where LEB128 has a higher encoding efficiency than your suggestion.

This is not a defence; I see advantages in your suggestion w.r.t. simplicity, but it is not true that LEB128 is an expansion, unless I misunderstood the article (and I might be off on some of the specific numbers, but I don't think I am off on the general argument). --denny vrandečić (talk) 02:46, 14 August 2023 (UTC)
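The byte counts in this comparison can be checked with a short sketch. The function names are made up for illustration, and the zero-terminated scheme is only assumed here to mean "little-endian payload bytes followed by one 0x00 terminator", with at least one payload byte even for the value 0. The sketch only counts bytes; it ignores the separate problem that a payload byte may itself be zero.

```python
def leb128_size(n: int) -> int:
    """Bytes needed by unsigned LEB128: 7 payload bits per byte, at least one byte."""
    return max(1, -(-n.bit_length() // 7))  # ceiling division by 7


def zero_terminated_size(n: int) -> int:
    """Bytes needed by the assumed scheme: 8 payload bits per byte plus a terminator."""
    return max(1, -(-n.bit_length() // 8)) + 1  # ceiling division by 8, plus 0x00


if __name__ == "__main__":
    # Print both sizes at the range boundaries discussed above.
    for n in (0, 127, 128, 255, 256, 16383, 16384, 65535, 65536,
              2**21 - 1, 2**56, 2**63 - 1, 2**63):
        print(n, leb128_size(n), zero_terminated_size(n))
```

Under these assumptions, LEB128 is never the longer of the two below 2**63, and 2**63 is the first value where the zero-terminated form comes out one byte shorter.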

Also, to be clear, I am not defending LEB128. It has a number of other issues, but the particular one you mentioned doesn't hold up, I think. E.g. see this discussion on Hacker News for efficiency arguments w.r.t. encoding and decoding speed. -- denny vrandečić (talk) 02:52, 14 August 2023 (UTC)

Also, a big number could contain a 0 byte, which a zero terminator could not be told apart from. An arbitrary-length format either has to give the size first (but that size is itself a number to encode) or reserve part of each chunk's representation as a flag or delimiter (in this case, the most significant bit of each byte). Musaran (talk) 15:08, 8 October 2023 (UTC)
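To illustrate the point about embedded zero bytes and the per-byte flag, here is a minimal sketch of unsigned LEB128 in Python. The function names are hypothetical and this is not code from the article. It follows the usual scheme: 7 payload bits per byte, with the high bit set on every byte except the last.

```python
def encode_uleb128(n: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if n < 0:
        raise ValueError("unsigned LEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = n & 0x7F          # low 7 bits are the payload
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)


def decode_uleb128(data: bytes) -> tuple[int, int]:
    """Decode one value from the start of data; return (value, bytes consumed)."""
    result = 0
    shift = 0
    for i, byte in enumerate(data):
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:      # high bit clear marks the end of the value
            return result, i + 1
        shift += 7
    raise ValueError("input ended before the final byte of the value")


if __name__ == "__main__":
    n = 65536
    # Plain little-endian bytes of 65536 start with two zero bytes,
    # so a 0x00 terminator could not be distinguished from payload.
    print(n.to_bytes(3, "little").hex())
    print(encode_uleb128(n).hex())
```

Because the end marker lives in a reserved bit rather than in a reserved byte value, every payload bit pattern stays representable, at the cost of one bit per byte.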

"Encode signed integer" correctness

The "Encode signed integer" pseudocode doesn't match the example in the Signed LEB128 section. I think the example is wrong. 198.20.220.54 (talk) 23:32, 6 September 2018 (UTC)

The code from https://github.com/Equim-chan/leb128 decodes the signed example to -123456 properly. Code from LLVM https://llvm.org/doxygen/LEB128_8h_source.html agrees as well. So the example seems to be correct. I haven't verified the pseudocode yet. 88.219.19.145 (talk), preceding undated comment added 17:21, 2 November 2019 (UTC)

The LLVM code does produce the byte sequence 0xC0, 0xBB, 0x78 for -123456. So the example encoding is correct (the log shows the example was changed since the question was raised, but the current example is correct). Comparing the logic of encodeSLEB128() from LLVM with the pseudocode in the article shows that both agree. The LLVM code provides an additional option to append padding bytes, but is otherwise pretty similar.

However, the pseudocode is very vague and hard to understand in its over-abstracted form. The link to the LLVM code is definitely useful for fully understanding the logic of the pseudocode.

Conclusion: Pseudocode is correct, if too vague. The signed number example is correct as well.

--88.219.19.145 (talk) 19:34, 2 November 2019 (UTC)[reply]
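For readers who want to check the example without reading the LLVM source, here is a minimal Python sketch of signed LEB128. It is neither the article's pseudocode nor LLVM's encodeSLEB128(); the function names are hypothetical, and it relies on Python's `>>` being an arithmetic (sign-preserving) shift on integers.

```python
def encode_sleb128(value: int) -> bytes:
    """Encode a signed integer as signed LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F      # low 7 bits (two's complement view)
        value >>= 7              # arithmetic shift keeps the sign
        sign_bit_set = bool(byte & 0x40)
        # Stop once the remaining value is pure sign extension of this byte.
        if (value == 0 and not sign_bit_set) or (value == -1 and sign_bit_set):
            out.append(byte)
            return bytes(out)
        out.append(byte | 0x80)  # high bit set: more bytes follow


def decode_sleb128(data: bytes) -> int:
    """Decode one signed LEB128 value from the start of data."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:      # last byte of the value
            if byte & 0x40:      # sign bit set: sign-extend
                result -= 1 << shift
            return result
    raise ValueError("input ended before the final byte of the value")


if __name__ == "__main__":
    print(encode_sleb128(-123456).hex())
    print(decode_sleb128(bytes([0xC0, 0xBB, 0x78])))
```

With this sketch, -123456 encodes to the bytes 0xC0, 0xBB, 0x78 and decodes back to -123456, which is consistent with the conclusion above.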

JFR use

The article "Get started with JDK Flight Recorder in OpenJDK 8u" [1] states that the LEB128 encoding is used in the binary representation of JFR's recordings. This could be added to the "Uses" section of this article. Maxime.bochon (talk) 18:47, 8 September 2020 (UTC)

References

[1] https://developers.redhat.com/blog/2020/08/25/get-started-with-jdk-flight-recorder-in-openjdk-8u
