Commit 1d919aa

Tweaks.
1 parent 9c97724 commit 1d919aa

README.md: 1 file changed, 24 additions and 16 deletions
@@ -5,19 +5,20 @@

Fast! Tiny milliseconds to compress a 10 MB string. Check out the benchmarks.<br>
Well tested! See the test directory for usage examples and edge cases.

```java
String data = "Assume this is a 100 MB string...";
byte[] c;

// 4-bit compressor -> 50% compression rate
// Max of 16 different chars. Default charset: `0-9`, `;`, `#`, `-`, `+`, `.`, `,`
c = new FourBitAsciiCompressor().compress(data); // c is 50 megabytes.

// 5-bit compressor -> 38% compression rate
// Max of 32 different chars. Default charset: `A-Z`, space, `.`, `,`, `\`, `-`, `@`
c = new FiveBitAsciiCompressor().compress(data); // c is 62 megabytes.

// 6-bit compressor -> 25% compression rate
// Max of 64 different chars. Default charset: `A-Z`, `0-9`, and many punctuation marks.
c = new SixBitAsciiCompressor().compress(data); // c is 75 megabytes.
```
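To see where the 50% figure comes from: with only 16 allowed characters, each one fits in 4 bits, so two characters pack into a single byte. Below is a minimal, self-contained sketch of that nibble-packing idea. It is illustrative only, not the library's actual implementation, and the charset shown is an assumption:

```java
public class FourBitPackingSketch {
    // Illustrative 16-character charset; the library's default may differ.
    static final String CHARSET = "0123456789;#-+.,";

    static byte[] pack(String s) {
        byte[] out = new byte[(s.length() + 1) / 2];
        for (int i = 0; i < s.length(); i++) {
            int code = CHARSET.indexOf(s.charAt(i)); // 0..15, fits in 4 bits
            if (code < 0) throw new IllegalArgumentException("char not in charset: " + s.charAt(i));
            // Even positions fill the high nibble, odd positions the low nibble.
            out[i / 2] |= (i % 2 == 0) ? (code << 4) : code;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] packed = pack("2024-12-25");
        System.out.println(packed.length + " bytes for 10 chars"); // 5 bytes: half the original
    }
}
```

Decompression simply reverses the mapping, reading each nibble back through the charset.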
## Downloads
@@ -50,7 +51,7 @@

This way we can remove those unnecessary bits and store only the ones we need.
And this is exactly what this library does.

Another important feature is searching. This library not only supports compacting, but also binary searching on the
compacted data itself without deflating it, which will be explained later.

To compress a string, you can easily use either `FourBitAsciiCompressor`, `FiveBitAsciiCompressor`, or `SixBitAsciiCompressor`.
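The reason binary search can run on the compacted bytes without deflating them: if the charset assigns smaller codes to characters that sort first, packed arrays compare in the same order as the original strings, so a plain unsigned byte comparison is enough. A self-contained sketch of that idea (not the library's `FourBitBinarySearch` implementation):

```java
import java.util.Arrays;

public class CompressedSearchSketch {
    // Binary search over packed keys, comparing bytes unsigned. Assumes the
    // charset assigns codes in the characters' own sort order, so packed
    // arrays order exactly like the original strings.
    static int search(byte[][] sorted, byte[] key) {
        int lo = 0, hi = sorted.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = Arrays.compareUnsigned(sorted[mid], key);
            if (cmp < 0) lo = mid + 1;
            else if (cmp > 0) hi = mid - 1;
            else return mid;
        }
        return -1; // not found
    }

    public static void main(String[] args) {
        byte[][] data = { {0x12}, {0x34}, {0x56} }; // already-sorted packed keys
        System.out.println(search(data, new byte[]{0x34})); // 1
    }
}
```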

@@ -106,28 +107,35 @@

To extract ASCII bytes from a `String` in the most efficient way (for compression…)
But the overloaded version `compressor.compress(String)` already calls it automatically, so just call the overloaded version.

### Where to store the compressed data?

In its purest form, a `String` is just a byte array (`byte[]`), and a compressed `String` is no different.
You can store it anywhere you would store a `byte[]`.
The most common approach is to store each compressed string ordered in memory using a `byte[][]` (for binary search) or
a B+Tree if you need frequent insertions (coming in the next release).
The frequency of reads and writes, plus business requirements, will determine the best medium and data structure to use.

If the data is ordered before compression and stored in-memory in a `byte[][]`, you can use the full power of binary search
directly on the compressed data through `FourBitBinarySearch`, `FiveBitBinarySearch`, and `SixBitBinarySearch`.

### Binary search

Executing a binary search on compressed data is as simple as:

```java
byte[][] compressedData = new byte[100000000][]; // Data for 100 million customers.

SixBitBinarySearch binary = new SixBitBinarySearch(compressedData, false);
int index = binary.search("key");
```

But this is not a realistic use case. Let's walk through a real-world scenario:
Imagine the company you are working for has 70 million customers. You can't create an array with exactly that number of
elements, because then you would have no space to add further customers to your data pool (IDs are usually incremental, so
new records are never inserted in the middle, only appended at the end of the array). In this case, we can size the array
to accommodate incoming customers by making it bigger, as in the example above:

```java
byte[][] compressedData = new byte[100000000][]; // Data for 100 million customers.
```
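One way to picture the oversized pool: keep a fill counter and always append at the next free slot, so reads only ever touch the filled prefix. A hypothetical sketch, the bookkeeping shown is not something the library manages for you:

```java
public class CustomerPoolSketch {
    private final byte[][] pool; // oversized: capacity > current customer count
    private int size = 0;        // number of slots actually filled

    CustomerPoolSketch(int capacity) {
        pool = new byte[capacity][];
    }

    // Incremental IDs mean new customers always go at the end, never in the middle,
    // so the filled prefix stays in insertion (and therefore ID) order.
    int append(byte[] compressed) {
        pool[size] = compressed;
        return size++;
    }

    int size() { return size; }

    public static void main(String[] args) {
        // Small capacity for the demo; the scenario above would use 100_000_000.
        CustomerPoolSketch pool = new CustomerPoolSketch(100);
        pool.append(new byte[]{0x01});
        pool.append(new byte[]{0x02});
        System.out.println(pool.size() + " of 100 slots used"); // 2 of 100 slots used
    }
}
```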

### B+Tree

Coming in the next release.

### Bulk / Batch compression
@@ -141,10 +149,10 @@

…from handling array positions and bounds. This is why we recommend `ManagedBulkCom`…

Both bulk compressors loop through the data in parallel by calling `IntStream.range().parallel()`.

Let's take `compressedData` from the previous example and show how we can populate it with data from all customers:

```java
byte[][] compressedData = new byte[100000000][]; // Data for 100 million customers.
```
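That parallel loop can be sketched as follows, with a stand-in `compress` function so the example runs on its own (the real bulk compressors invoke the library's compressors instead):

```java
import java.nio.charset.StandardCharsets;
import java.util.stream.IntStream;

public class BulkCompressSketch {
    // Stand-in for compressor.compress(String); here it just returns the
    // ISO-8859-1 bytes so the sketch is self-contained.
    static byte[] compress(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    static byte[][] compressAll(String[] customers) {
        byte[][] out = new byte[customers.length][];
        // Each index writes only its own slot, so the parallel writes are safe.
        IntStream.range(0, customers.length)
                 .parallel()
                 .forEach(i -> out[i] = compress(customers[i]));
        return out;
    }

    public static void main(String[] args) {
        String[] customers = { "ALICE", "BOB", "CAROL" };
        byte[][] compressed = compressAll(customers);
        System.out.println(compressed.length + " customers compressed"); // 3 customers compressed
    }
}
```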
