# What is `csa`?

`csa` is a fully functional compressed self-index with (limited) archiving capabilities. It is heavily based on the work of Huo _et al._[^1], of which it provides an alternative implementation. In addition, `csa` supports Simple9[^2] as a (faster) replacement for Elias gamma[^3] universal coding. When using Simple9 codes, `csa` can make use of a [trick](https://cloud.uvolante.org/index.php/s/DDgNmKineM9Ewr3/download) that nearly halves the random access time to the uncompressed data. The price to pay is a larger compressed self-index than with Elias gamma codes.
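To make the trade-off between the two codes concrete, here is a minimal Python sketch of textbook Elias gamma and Simple9 coding. It is an illustration only: it is not taken from the `csa` sources, it does not show the random access trick linked above, and all function and variable names are hypothetical. The Simple9 layout table follows the usual description of the scheme (a 4-bit selector plus 28 payload bits per 32-bit word).

```python
# Illustrative sketch only; names are hypothetical and not part of csa.

# (count, width) layouts of the 28 payload bits, indexed by the 4-bit selector.
SIMPLE9_LAYOUTS = [(28, 1), (14, 2), (9, 3), (7, 4), (5, 5),
                   (4, 7), (3, 9), (2, 14), (1, 28)]


def gamma_encode(values):
    """Elias gamma: n >= 1 becomes floor(log2 n) zeros, then n in binary."""
    bits = []
    for n in values:
        if n < 1:
            raise ValueError("Elias gamma encodes positive integers only")
        binary = bin(n)[2:]
        bits.append("0" * (len(binary) - 1) + binary)
    return "".join(bits)


def gamma_decode(bits):
    """Decode a string of '0'/'1' characters produced by gamma_encode."""
    values, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":  # the unary prefix gives the payload length
            zeros += 1
            i += 1
        values.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return values


def simple9_encode(values):
    """Greedily pack as many values as possible into each 32-bit word."""
    words, i = [], 0
    while i < len(values):
        for selector, (count, width) in enumerate(SIMPLE9_LAYOUTS):
            chunk = values[i:i + count]
            if len(chunk) == count and all(0 <= v < (1 << width) for v in chunk):
                word = selector << 28
                for k, v in enumerate(chunk):
                    word |= v << (k * width)
                words.append(word)
                i += count
                break
        else:
            raise ValueError("Simple9 stores values in [0, 2**28) only")
    return words


def simple9_decode(words):
    """Unpack each word using only shifts and masks."""
    values = []
    for word in words:
        count, width = SIMPLE9_LAYOUTS[word >> 28]
        mask = (1 << width) - 1
        values.extend((word >> (k * width)) & mask for k in range(count))
    return values


gaps = [1, 3, 2, 1, 1, 7, 300, 1, 2, 2, 1, 1, 40, 1]
assert gamma_decode(gamma_encode(gaps)) == gaps
assert simple9_decode(simple9_encode(gaps)) == gaps
```

Comparing `len(gamma_encode(gaps))` with `32 * len(simple9_encode(gaps))` gives the size of each encoding in bits for a given input. Elias gamma is decoded by scanning a bit stream one codeword at a time, whereas a Simple9 word is word-aligned and unpacked with a fixed number of shifts and masks, which is where the speed advantage of word-aligned codes typically comes from.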
# Licensing information

`csa` is released as is, without any warranty, under a dual licensing scheme.

By default, `csa` is distributed under the [GNU Affero General Public License, version 3](https://www.gnu.org/licenses/agpl-3.0.html).

If you cannot comply with the AGPLv3, please [contact us](mailto:cayre@uvolante.org?Subject=Alternative%20Software%20Licensing%20Inquiry%20for%20CSA) for alternative licensing.
# Debian/Ubuntu repository

We provide pre-compiled binaries for Debian/Ubuntu `amd64` architectures.

Please follow [these instructions](https://www.uvolante.org/apt) to add the repository to your system.

Once the repository is available on your system:

```
sudo apt install csa
```
# Source code

## Requirements

`csa` makes use of the following software:

* `clang`, `make`, `cmake`, `doxygen`, `git`,
* the [`oops`](https://forge.uvolante.org/code/oops/wikis) library.
## Cloning the source repository

Once `oops` is compiled and installed, clone the `git` tree:

```
git clone https://forge.uvolante.org/code/csa.git
```
# References

[^1]: Hongwei Huo, Longgang Chen, Jeffrey Scott Vitter and Yakov Nekrich, _"A Practical Implementation of Compressed Suffix Arrays with Applications to Self-Indexing"_, Proceedings of the Data Compression Conference, IEEE, pp. 292--301, March 2014.

[^2]: Vo Ngoc Anh and Alistair Moffat, _"Inverted Index Compression Using Word-Aligned Binary Codes"_, Information Retrieval, vol. 8, issue 1, pp. 151--166, January 2005.

[^3]: Peter Elias, _"Universal Codeword Sets and Representations of the Integers"_, IEEE Transactions on Information Theory, vol. 21, issue 2, pp. 194--203, March 1975.