README.md
Lines changed: 24 additions & 16 deletions
@@ -5,19 +5,20 @@ Fast! Tiny milliseconds to compress a 10 MB string. Check out the benchmarks.<br
 Well tested! See the test directory for usage examples and edge cases.
 
 ```java
-String data = "Assume this is a 100 megabytes string...";
+String data = "Assume this is a 100 MB string...";
+byte[] c;
 
 // 4‑bit compressor -> 50% compression rate
 // Max of 16 different chars. Default charset: `0-9`, `;`, `#`, `-`, `+`, `.`, `,`
-byte[] c = new FourBitAsciiCompressor().compress(data); // c is 50 megabytes.
+c = new FourBitAsciiCompressor().compress(data); /** c is 50 megabytes.*/
 
 // 5‑bit compressor -> 38% compression rate
 // Max of 32 different chars. Default charset: `A-Z`, space, `.`, `,`, `\`, `-`, `@`
-byte[] c = new FiveBitAsciiCompressor().compress(data); // c is 62 megabytes.
+c = new FiveBitAsciiCompressor().compress(data); /** c is 62 megabytes.*/
 
 // 6‑bit compressor -> 25% compression rate
-// Max of 64 different chars. Default charset: `A-Z`, `0-9`, and many punctuation marks defined at SixBitAsciiCompressor.DEFAULT_6BIT_CHARSET.
-byte[] c = new SixBitAsciiCompressor().compress(data); // c is 75 megabytes.
+// Max of 64 different chars. Default charset: `A-Z`, `0-9`, and many punctuation marks.
+c = new SixBitAsciiCompressor().compress(data); /** c is 75 megabytes.*/
 ```
 
 ## Downloads
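
The sizes quoted in the comments of the hunk above (50, 62, and 75 megabytes for a 100 MB input) follow from storing each character in 4, 5, or 6 bits instead of 8. The sketch below is not the library's code: `NibblePackingSketch`, `pack`, and `packedSize` are hypothetical names, and the charset string is only an assumed ordering of the 16 default 4-bit characters listed in the diff. It shows one plausible way to pack two 4-bit codes per byte and checks the size arithmetic.

```java
public final class NibblePackingSketch {
    // Assumption: one possible ordering of the 16 characters named in the README comment.
    static final String CHARSET = "0123456789;#-+.,";

    /**
     * Packs two 4-bit codes into each output byte (high nibble first).
     * For odd-length input the last low nibble is left as zero; a real
     * implementation would need some way to record the original length.
     */
    static byte[] pack(String text) {
        byte[] out = new byte[(text.length() + 1) / 2];
        for (int i = 0; i < text.length(); i++) {
            int code = CHARSET.indexOf(text.charAt(i));
            if (code < 0) {
                throw new IllegalArgumentException("Unsupported char: " + text.charAt(i));
            }
            if ((i & 1) == 0) {
                out[i / 2] = (byte) (code << 4); // even index: high nibble
            } else {
                out[i / 2] |= (byte) code;       // odd index: low nibble
            }
        }
        return out;
    }

    /** Number of bytes needed for n characters at bitsPerChar bits each, rounded up. */
    static long packedSize(long n, int bitsPerChar) {
        return (n * bitsPerChar + 7) / 8;
    }

    public static void main(String[] args) {
        System.out.println(pack("2024-01-15;42.5").length); // 15 chars fit in 8 bytes
        System.out.println(packedSize(100_000_000L, 4));    // 50000000
        System.out.println(packedSize(100_000_000L, 5));    // 62500000
        System.out.println(packedSize(100_000_000L, 6));    // 75000000
    }
}
```

The 5-bit result of 62.5 million bytes is consistent with the "62 megabytes" comment and the rounded "38%" rate in the diff.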
@@ -50,7 +51,7 @@ This way we can remove those unnecessary bits and store only the ones we need.
 And this is exactly what this library does.
 
 Another important feature is searching. This library not only supports compacting, but also binary searching on the
-compacted data itself without deflating it, which will be explained later in this documentation.
+compacted data itself without deflating it, which will be explained later.
 
 To compress a string, you can easily use either `FourBitAsciiCompressor`, `FiveBitAsciiCompressor`, or `SixBitAsciiCompressor`.
 
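
The hunk above mentions binary searching the compacted data without deflating it. The library's own search classes are not shown in this diff, so the following is only a sketch using the JDK: `PackedSearchSketch` and `search` are hypothetical names. It assumes the packed entries are sorted and that unsigned byte-wise order of the packed form matches the intended order of the original strings, which depends on how codes are assigned to characters.

```java
import java.util.Arrays;
import java.util.Comparator;

public final class PackedSearchSketch {
    // Arrays.compareUnsigned (Java 9+) compares byte arrays lexicographically,
    // treating each byte as a value in 0..255.
    static final Comparator<byte[]> UNSIGNED = Arrays::compareUnsigned;

    /** Returns the index of key in sortedEntries, or a negative value if it is absent. */
    static int search(byte[][] sortedEntries, byte[] key) {
        return Arrays.binarySearch(sortedEntries, key, UNSIGNED);
    }

    public static void main(String[] args) {
        // Illustrative packed entries; in practice these would come from a compressor.
        byte[][] entries = {
            {(byte) 0x98, 0x76},
            {0x01, 0x23},
            {0x12, (byte) 0xAB},
        };
        Arrays.sort(entries, UNSIGNED); // binary search requires sorted input

        System.out.println(search(entries, new byte[] {0x12, (byte) 0xAB})); // index of the match
        System.out.println(search(entries, new byte[] {0x55}) < 0);          // true: not present
    }
}
```

Because the comparison runs on the packed bytes directly, no entry is decompressed during the search; only the key has to be compressed once before the lookup.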
@@ -106,28 +107,35 @@ To extract ASCII bytes from a `String` in the most efficient way (for compressio
 But the overloaded version `compressor.compress(String)` already calls it automatically, so just call the overloaded version.
 
 ### Where to store the compressed data?
-
 In its purest form, a `String` is just a byte array (`byte[]`), and a compressed `String` couldn't be different.
 You can store it anywhere you would store a `byte[]`.
 The most common approach is to store each compressed string ordered in memory using a `byte[][]` (for binary search) or
-a B+Tree (coming in the next release).
+a B+Tree if you need frequent insertions (coming in the next release).
 The frequency of reads and writes + business requirements will tell the best media and data structure to use.
 
 If the data is ordered before compression and stored in-memory in a `byte[][]`, you can use the full power of the binary search directly in the compressed data
 through `FourBitBinarySearch`, `FiveBitBinarySearch`, and `SixBitBinarySearch`.
 
 ### Binary search
+Executing a binary search in compressed data is as simple as:
+```java
+byte[][] compressedData = new byte[100000000][]; // Data for 100 million customers.