Merged
6 changes: 6 additions & 0 deletions .github/workflows/build-docs.yml
@@ -40,6 +40,12 @@ jobs:
  pip install tox
  tox -e docs

+ - name: Edit CSS
+ run: |
+ CSSPATH=./docs/_build/html/_static/styles/furo.css
+ cat ${CSSPATH} | sed "s/ol,ul{\\([^}]*\\);padding-left:1\\.2rem}/ol,ul{\\1}/" > tmp.css
+ mv tmp.css ${CSSPATH}
+
  - name: GH Pages Deployment
  uses: JamesIves/github-pages-deploy-action@v4
  with:
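The `Edit CSS` step rewrites a single rule in the generated Furo stylesheet: it captures everything in the `ol,ul{...}` rule that precedes a trailing `padding-left:1.2rem` and re-emits the rule without that declaration. A minimal Python sketch of the same substitution follows. The function name and the sample rule are hypothetical (the actual contents of `furo.css` are not part of this diff), so treat it as an illustration of the pattern rather than of the real stylesheet.

```python
import re

# Same pattern as the sed expression in the workflow, in Python regex syntax:
# capture everything in the ol,ul rule before a trailing padding-left:1.2rem.
PATTERN = re.compile(r"ol,ul\{([^}]*);padding-left:1\.2rem\}")


def strip_list_padding(css: str) -> str:
    """Drop the trailing padding-left declaration from the ol,ul rule."""
    # sed without the g flag replaces only the first match on each line;
    # count=1 is the equivalent for a single-line (minified) input.
    return PATTERN.sub(r"ol,ul{\1}", css, count=1)


# Hypothetical sample rule; the real stylesheet may differ.
sample = "a{color:red}ol,ul{margin-top:0;margin-bottom:0;padding-left:1.2rem}p{margin:0}"
result = strip_list_padding(sample)
print(result)
```

If the rule is absent or formatted differently, the substitution is a no-op, which is also how the sed command behaves.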
5 changes: 5 additions & 0 deletions .github/workflows/run-tests.yml
@@ -38,6 +38,11 @@ jobs:
  python -m pip install setuptools
  python -c "import setup; print(setup.build_igraph(None));"

+ - name: Extract examples from docstrings
+ run: |
+ echo "def test_docstrings():" > tests/test_docstrings.py
+ cat src/scranpy/*.py | sed "s/^/#/" | sed "s/^# *>>> / /" >> tests/test_docstrings.py
+
  - name: Test with tox
  run: |
  export SCRANPY_INSTALLED_PATH=$(pwd)/installed
2 changes: 2 additions & 0 deletions .gitignore
@@ -60,3 +60,5 @@ extern/build-*
  extern/igraph-*
  extern/*.tar.gz
  src/scranpy/lib_scranpy.py
+
+ tests/test_docstrings.py
4 changes: 2 additions & 2 deletions setup.cfg
@@ -76,8 +76,8 @@ testing =
  setuptools
  pytest
  pytest-cov
- singlecellexperiment>=0.4.0
- summarizedexperiment>=0.4.0
+ singlecellexperiment>=0.6.0
+ summarizedexperiment>=0.6.3
  scrnaseq>=0.3.1
  scipy

31 changes: 17 additions & 14 deletions src/scranpy/adt_quality_control.py
@@ -29,11 +29,13 @@ def compute_adt_qc_metrics(

  - A list of arrays.
  Each array corresponds to an ADT subset and can either contain boolean or integer values.
+
  - For booleans, the sequence should be of length equal to the number of rows, and values should be truthy for rows that belong in the subset.
  If the sequence contains booleans, it should not contain any other type.
  - For integers, the value is the row index of an ADT in the subset.
  - For strings, the value is the name of an ADT in the subset.
  This should match at least one element in ``row_names``.
+
  - A dictionary where keys are the names of each ADT subset and the values are arrays as described above.
  - A :py:class:`~biocutils.NamedList.NamedList` where each element is an array as described above, possibly with names.

@@ -55,7 +57,7 @@ def compute_adt_qc_metrics(
  Each column is a double-precision NumPy array that contains the sum of counts for the corresponding subset in each cell.

  References:
- The ``compute_adt_qc_metrics`` function in the `scran_qc <https://libscran.github.io/scran_qc>`_ C++ library, which describes the rationale behind these QC metrics.
+ The ``compute_adt_qc_metrics`` function in the `scran_qc`_ C++ library, which describes the rationale behind these QC metrics.

  Examples:
  >>> import numpy
@@ -107,23 +109,23 @@ def suggest_adt_qc_thresholds(
  Number of MADs from the median to define the threshold for outliers in each QC metric.

  Returns:
- If ``block = None``, a :py:class:`~biocutils.NamedList.NamedList` is returned, containing:
+ If ``block = None``, a :py:class:`~biocutils.NamedList.NamedList` is returned, containing the following entries.

- - ``detected``, a number specifying the lower threshold on the number of detected ADTs.
- - ``subsets``, a :py:class:`~biocutils.FloatList.FloatList` of length equal to the number of control subsets (and named accordingly).
+ - ``detected``: a number specifying the lower threshold on the number of detected ADTs.
+ - ``subsets``: a :py:class:`~biocutils.FloatList.FloatList` of length equal to the number of control subsets (and named accordingly).
  Each entry represents the upper bound on the sum of counts in the corresponding control subset.

- If ``block`` is provided, the NamedList instead contains:
+ If ``block`` is provided, the ``NamedList`` instead contains:

- - ``detected``, a FloatList of length equal to the number of blocks (and named accordingly).
+ - ``detected``, a ``FloatList`` of length equal to the number of blocks (and named accordingly).
  Each entry represents the lower threshold on the number of detected ADTs in the corresponding block.
- - ``subset_sum``, a NamedList of length equal to the number of control subsets.
- Each entry is another FloatList that contains the upper threshold on the sum of counts for that subset in each block.
+ - ``subset_sum``, a ``NamedList`` of length equal to the number of control subsets.
+ Each entry is another ``FloatList`` that contains the upper threshold on the sum of counts for that subset in each block.
  - ``block_ids``, a list containing the unique levels of the blocking factor.
  This is in the same order as the blocks in ``detected`` and ``subset_sum``.

  References:
- The ``compute_adt_qc_filters`` and ``compute_adt_qc_filters_blocked`` functions in the `scran_qc <https://libscran.github.io/scran_qc>`_ C++ library,
+ The ``compute_adt_qc_filters`` and ``compute_adt_qc_filters_blocked`` functions in the `scran_qc`_ C++ library,
  which describe the rationale behind the suggested filters.

  Examples:
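The docstring in this hunk defines each threshold as a number of MADs from the median. As a generic illustration of that rule, and not the scran_qc implementation, the sketch below computes a lower or upper bound in plain Python. The function name, the default of 3 MADs, the 1.4826 scaling constant and the absence of any log-transformation are all assumptions made for the example.

```python
from statistics import median


def mad_threshold(values, num_mads=3.0, lower=True, scale=1.4826):
    """Return median -/+ num_mads * scaled MAD for a sequence of numbers."""
    med = median(values)
    # Median absolute deviation, scaled (by assumption) to be comparable
    # to a standard deviation under normality.
    mad = median(abs(v - med) for v in values) * scale
    offset = num_mads * mad
    return med - offset if lower else med + offset


# Hypothetical per-cell metric with one obvious outlier.
values = [10, 12, 11, 13, 12, 11, 50]
lower_bound = mad_threshold(values, lower=True)
upper_bound = mad_threshold(values, lower=False)
print(lower_bound, upper_bound)
```

A lower bound of this kind corresponds to the ``detected`` threshold, and an upper bound to the control subset sums.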
@@ -169,8 +171,8 @@ def suggest_adt_qc_thresholds(


  def filter_adt_qc_metrics(
- thresholds: biocframe.BiocFrame,
- metrics: biocutils.NamedList,
+ thresholds: biocutils.NamedList,
+ metrics: biocframe.BiocFrame,
  block: Optional[Sequence] = None
  ) -> numpy.ndarray:
  """
@@ -188,10 +190,10 @@ def filter_adt_qc_metrics(
  The levels should be a subset of those used in :py:func:`~suggest_adt_qc_thresholds`.

  Returns:
- A NumPy vector of length equal to the number of cells in ``metrics``, containing truthy values for putative high-quality cells.
+ A boolean NumPy vector of length equal to the number of cells in ``metrics``, containing truthy values for putative high-quality cells.

  References:
- The ``AdtQcFilters`` and ``AdtQcBlockedFilters`` functions in the `scran_qc <https://libscran.github.io/scran_qc>`_ C++ library.
+ The ``AdtQcFilters`` and ``AdtQcBlockedFilters`` functions in the `scran_qc`_ C++ library.

  Examples:
  >>> import numpy
@@ -200,7 +202,8 @@ def filter_adt_qc_metrics(
  >>> res = scranpy.compute_adt_qc_metrics(mat, { "IgG": [ 1, 10, 20, 40 ] })
  >>> filt = scranpy.suggest_adt_qc_thresholds(res)
  >>> keep = scranpy.filter_adt_qc_metrics(filt, res)
- >>> keep.sum()
+ >>> import biocutils
+ >>> print(biocutils.table(keep))
  """

  dthresh = thresholds["detected"]
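The corrected type hints make the calling convention explicit: thresholds (a ``NamedList``) come first and per-cell metrics (a ``BiocFrame``) second. The comparison implied by the docstrings, a lower threshold on ``detected`` and an upper bound on each control subset sum, can be sketched with plain dictionaries and lists standing in for those containers. The helper name, the ``subset_sum`` key, and the use of inclusive comparisons are assumptions, and blocking is ignored.

```python
def filter_cells(thresholds, metrics):
    """Return one boolean per cell: True for putative high-quality cells."""
    # Lower threshold on the number of detected ADTs.
    keep = [d >= thresholds["detected"] for d in metrics["detected"]]
    # Upper bound on the summed counts of each control subset.
    for name, upper in thresholds["subset_sum"].items():
        sums = metrics["subset_sum"][name]
        keep = [k and s <= upper for k, s in zip(keep, sums)]
    return keep


# Hypothetical thresholds and metrics for four cells.
thresholds = {"detected": 5, "subset_sum": {"IgG": 20.0}}
metrics = {
    "detected": [10, 3, 8, 6],
    "subset_sum": {"IgG": [5.0, 5.0, 30.0, 20.0]},
}
keep = filter_cells(thresholds, metrics)
print(keep)
```

In this toy input the second cell fails the ``detected`` threshold and the third exceeds the control subset bound.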
2 changes: 1 addition & 1 deletion src/scranpy/aggregate_across_cells.py
@@ -31,7 +31,7 @@ def aggregate_across_cells(
  Number of threads to use for aggregation.

  Returns:
- :py:class:`~biocutils.NamedList.NamedList` containing:
+ A :py:class:`~biocutils.named_list.NamedList` containing the following entries.

  - ``sum``: double-precision NumPy matrix where each row corresponds to a gene and each column corresponds to a unique combination of grouping levels.
  Each matrix entry contains the summed expression across all cells with that combination.
6 changes: 3 additions & 3 deletions src/scranpy/aggregate_across_genes.py
@@ -36,10 +36,10 @@ def aggregate_across_genes(
  The first column contains the row names/indices and the second column contains the weights.

  Alternatively, a dictionary may be supplied where each key is the name of a gene set and each value is a sequence/tuple as described above.
- The keys will be used to name the output NamedList.
+ The keys will be used to name the output ``NamedList``.

  Alternatively, a :py:class:`~biocutils.NamedList.NamedList` where each entry is a gene set represented by a sequence/tuple as described above.
- If names are available, they will be used to name the output NamedList.
+ If names are available, they will be used to name the output ``NamedList``.

  row_names:
  Sequence of strings of length equal to the number of rows of ``x``, containing the name of each gene.
@@ -53,7 +53,7 @@ def aggregate_across_genes(
  Number of threads to be used for aggregation.

  Returns:
- List of length equal to that of ``sets``.
+ A :py:class:`~biocutils.NamedList.NamedList` of length equal to that of ``sets``.
  Each entry is a numeric vector of length equal to the number of columns in ``x``,
  containing the (weighted) sum/mean of expression values for the corresponding set across all cells.

4 changes: 2 additions & 2 deletions src/scranpy/build_snn_graph.py
@@ -27,7 +27,7 @@ def build_snn_graph(

  Alternatively, a :py:class:`~knncolle.find_knn.FindKnnResults` object containing existing neighbor search results.

- Alternatively, a :py:class:`~knncolle.Index.Index` object.
+ Alternatively, a :py:class:`~knncolle.classes.Index` object.

  num_neighbors:
  Number of neighbors in the nearest-neighbor graph.
@@ -44,7 +44,7 @@ def build_snn_graph(
  The algorithm to use for the nearest-neighbor search.
  Only used if ``x`` is not a pre-built nearest-neighbor search index or a list of existing nearest-neighbor search results.

- Results:
+ Returns:
  A :py:class:`~biocutils.NamedList.NamedList` containing the components of a (possibly weighted) graph.

  - ``vertices``: integer specifying the number of vertices (i.e., cells) in the graph.
2 changes: 1 addition & 1 deletion src/scranpy/center_size_factors.py
@@ -40,7 +40,7 @@ def center_size_factors(
  This argument only used if ``size_factors`` is double-precision, otherwise a new array is always returned.

  Returns:
- Array containing centered size factors.
+ Double-precision NumPy array containing centered size factors.
  If ``in_place = True``, this is a reference to ``size_factors``.

  References:
2 changes: 1 addition & 1 deletion src/scranpy/choose_highly_variable_genes.py
@@ -37,7 +37,7 @@ def choose_highly_variable_genes(
  Ignored if ``None``.

  Returns:
- Array containing the indices of genes in ``stats`` that are considered to be highly variable.
+ Integer NumPy array containing the indices of genes in ``stats`` that are considered to be highly variable.

  References:
  The ``choose_highly_variable_genes`` function from the `scran_variances <https://libscran.github.io/scran_variances>`_ library,
7 changes: 4 additions & 3 deletions src/scranpy/cluster_graph.py
@@ -45,7 +45,7 @@ def cluster_graph(
  Random seed to use for ``method = "multilevel"`` or ``"leiden"``.

  Returns:
- A :py:class:`~biocutils.NamedList.NamedList` containing:
+ A :py:class:`~biocutils.NamedList.NamedList` containing the following entries.

  - ``membership``: an integer NumPy array containing the cluster assignment for each vertex, i.e., cell.
  All values are in [0, N) where N is the total number of clusters.
@@ -61,7 +61,7 @@ def cluster_graph(

  - ``merges``: an integer NumPy matrix with two columns.
  Each row corresponds to a merge step and specifies the pair of cells or clusters that were merged at that step.
- - ``modularity: a double-precision NumPy array that contains the modularity score at each merge step.
+ - ``modularity``: a double-precision NumPy array that contains the modularity score at each merge step.

  For ``method = "leiden"``, the output also contains:

@@ -79,7 +79,8 @@ def cluster_graph(
  >>> import scranpy
  >>> graph = scranpy.build_snn_graph(pcs)
  >>> clust = scranpy.cluster_graph(graph)
- >>> print(clust["membership"])
+ >>> import biocutils
+ >>> print(biocutils.table(clust["membership"]))
  """

  graph = (x["vertices"], x["edges"], x["weights"])
14 changes: 8 additions & 6 deletions src/scranpy/cluster_kmeans.py
@@ -66,18 +66,19 @@ def cluster_kmeans(
  Number of threads to use.

  Returns:
- A :py:class:`~biocutils.NamedList.NamedList` containing:
+ A :py:class:`~biocutils.NamedList.NamedList` containing the following entries.

  - ``clusters``: an integer NumPy array containing the cluster assignment for each cell.
  Values are integers in [0, N) where N is the total number of clusters.
  - ``centers``: a double-precision NumPy matrix containing the coordinates of the cluster centroids.
  Dimensions are in the rows while centers are in the columns.
  - ``iterations``: integer specifying the number of refinement iterations that were performed.
  - ``status``: convergence status.
- Any non-zero value indicates a convergence failure though the exact meaning depends on the choice of ``refine_method``.
- - For Lloyd, a value of 2 indicates convergence failure.
- - For Hartigan-Wong, a value of 2 indicates convergence failure in the optimal transfer iterations.
- A value of 4 indicates convergence failure in the quick transfer iterations when ``hartigan_wong_quit_quick_transfer_failure = True``.
+ Any non-zero value indicates a convergence failure though the exact meaning depends on the choice of ``refine_method``.
+
+ - For Lloyd, a value of 2 indicates convergence failure.
+ - For Hartigan-Wong, a value of 2 indicates convergence failure in the optimal transfer iterations.
+ A value of 4 indicates convergence failure in the quick transfer iterations when ``hartigan_wong_quit_quick_transfer_failure = True``.

  References:
  https://ltla.github.io/CppKmeans, which describes the various initialization and refinement algorithms in more detail.
@@ -87,7 +88,8 @@ def cluster_kmeans(
  >>> pcs = numpy.random.rand(10, 200)
  >>> import scranpy
  >>> clust = scranpy.cluster_kmeans(pcs, k=3)
- >>> print(clust["clusters"])
+ >>> import biocutils
+ >>> print(biocutils.table(clust["clusters"]))
  """

  out = lib.cluster_kmeans(
6 changes: 4 additions & 2 deletions src/scranpy/combine_factors.py
@@ -30,9 +30,9 @@ def combine_factors(factors: Union[dict, Sequence, biocutils.NamedList, biocfram
  If any entry of ``factors`` is a :py:class:`~biocutils.Factor.Factor` object, any unused levels will also be preserved.

  Returns:
- :py:class:`~biocutils.NamedList.NamedList` containing:
+ :py:class:`~biocutils.NamedList.NamedList` containing the following entries.

- - ``levels``: a :py:func:`~biocframe.BiocFrame.BiocFrame` containing the sorted and unique combinations of levels as a tuple.
+ - ``levels``: a :py:class:`~biocframe.BiocFrame.BiocFrame` containing the sorted and unique combinations of levels as a tuple.
  Each column corresponds to a factor in ``factors`` while each row represents a unique combination.
  Corresponding elements of each column define a single combination, i.e., the ``i``-th combination is defined by taking the ``i``-th element of each column.
  - ``index``: an integer NumPy array specifying the index into ``levels`` for each observation.
@@ -49,6 +49,8 @@ def combine_factors(factors: Union[dict, Sequence, biocutils.NamedList, biocfram
  >>> y = random.choices([True, False], k = 20)
  >>> combined = scranpy.combine_factors({ "foo": x, "bar": y })
  >>> print(combined["levels"])
+ >>> import biocutils
+ >>> print(biocutils.table(combined["index"]))
  """

  if isinstance(factors, biocframe.BiocFrame):
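The ``levels``/``index`` pair described in the docstring can be illustrated without scranpy. The sketch below uses plain dictionaries and lists in place of the ``BiocFrame`` and NumPy array; the function name is hypothetical, and it does not handle ``Factor`` inputs or the preservation of unused levels.

```python
def combine_factors_sketch(factors):
    """Combine equal-length factors into sorted unique combinations plus an index."""
    names = list(factors)
    # One tuple per observation, taking the i-th element of every factor.
    combos = list(zip(*(factors[name] for name in names)))
    unique = sorted(set(combos))
    position = {combo: i for i, combo in enumerate(unique)}
    # One column per factor; row i across all columns is the i-th combination.
    levels = {name: [combo[j] for combo in unique] for j, name in enumerate(names)}
    index = [position[combo] for combo in combos]
    return {"levels": levels, "index": index}


# Hypothetical input with four observations.
combined = combine_factors_sketch({
    "foo": ["B", "A", "B", "A"],
    "bar": [True, False, True, True],
})
print(combined["levels"])
print(combined["index"])
```

Looking up ``index[i]`` in each column of ``levels`` recovers the original combination for observation ``i``.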
2 changes: 1 addition & 1 deletion src/scranpy/compute_clrm1_factors.py
@@ -18,7 +18,7 @@ def compute_clrm1_factors(x: Any, num_threads: int = 1) -> numpy.ndarray:
  Number of threads to use.

  Returns:
- Array containing the CLRm1 size factor for each cell.
+ Double-precision NumPy array containing the CLRm1 size factor for each cell.
  Note that these size factors are not centered and should be passed through, e.g., :py:func:`~scranpy.center_size_factors.center_size_factors` before normalization.

  References:
4 changes: 2 additions & 2 deletions src/scranpy/correct_mnn.py
@@ -80,9 +80,9 @@ def correct_mnn(
  Number of threads to use.

  Returns:
- A :py:class:`~biocutils.NamedList.NamedList` containing:
+ A :py:class:`~biocutils.NamedList.NamedList` containing the following entries.

- - ``corrected``, a double-precision NumPy array of the same dimensions as the ``x`` used in :py:func:`~correct_mnn`, containing the corrected values.
+ - ``corrected``, a double-precision NumPy array of the same dimensions as ``x``, containing the corrected values.

  References:
  https://libscran.github.io/mnncorrect, which describes the MNN correction algorithm in more detail.
16 changes: 8 additions & 8 deletions src/scranpy/crispr_quality_control.py
@@ -21,15 +21,15 @@ def compute_crispr_qc_metrics(x: Any, num_threads: int = 1) -> biocframe.BiocFra

  Returns:
  A :py:class:`~biocframe.BiocFrame.BiocFrame` with number of rows equal to the number of cells (i.e., columns) in ``x``.
- It contains the following columns:
+ It contains the following columns.

  - ``sum``, a double-precision NumPy array containing the sum of counts across all guides for each cell.
  - ``detected``, an integer NumPy array containing the number of guides with non-zero counts in each cell.
  - ``max_value``, a double-precision NumPy array containing the maximum count for each cell.
  - ``max_index``, an integer NumPy array containing the row index of the guide with the maximum count in each cell.

  References:
- The ``compute_crispr_qc_metrics`` function in the `scran_qc <https://github.com/libscran/scran_qc>`_ C++ library, which describes the rationale behind these QC metrics.
+ The ``compute_crispr_qc_metrics`` function in the `scran_qc`_ C++ library, which describes the rationale behind these QC metrics.

  Examples:
  >>> import numpy
@@ -71,19 +71,19 @@ def suggest_crispr_qc_thresholds(
  Number of MADs from the median to define the threshold for outliers in each QC metric.

  Returns:
- If ``block = None``, a :py:class:`~biocutils.NamedList.NamedList` is returned, containing:
+ If ``block = None``, a :py:class:`~biocutils.NamedList.NamedList` is returned, containing the following entries.

  - ``max_value``, a number specifying the lower threshold on the maximum count in each cell.

- If ``block`` is provided, the NamedList instead contains:
+ If ``block`` is provided, the ``NamedList`` instead contains:

  - ``max_value``, a FloatList of length equal to the number of blocks (and named accordingly).
  Each entry represents the lower threshold on the maximum count in the corresponding block.
  - ``block_ids``, a list containing the unique levels of the blocking factor.
  This is in the same order as the blocks in ``detected`` and ``subset_sum``.

  References:
- The ``compute_crispr_qc_filters`` and ``compute_crispr_qc_filters_blocked`` functions in the `scran_qc <https://github.com/libscran/scran_qc>`_ C++ library,
+ The ``compute_crispr_qc_filters`` and ``compute_crispr_qc_filters_blocked`` functions in the `scran_qc`_ C++ library,
  which describes the rationale behind the suggested filters.

  Examples:
@@ -138,16 +138,16 @@ def filter_crispr_qc_metrics(
  The levels should be a subset of those used in :py:func:`~suggest_crispr_qc_thresholds`.

  Returns:
- A NumPy vector of length equal to the number of cells in ``metrics``, containing truthy values for putative high-quality cells.
+ A boolean NumPy vector of length equal to the number of cells in ``metrics``, containing truthy values for putative high-quality cells.

  References:
- The ``CrisprQcFilters`` and ``CrisprQcBlockedFilters`` functions in the `scran_qc <https://libscran.github.io/scran_qc>`_ C++ library.
+ The ``CrisprQcFilters`` and ``CrisprQcBlockedFilters`` functions in the `scran_qc`_ C++ library.

  Examples:
  >>> import numpy
  >>> mat = numpy.reshape(numpy.random.poisson(lam=5, size=1000), (50, 20))
  >>> import scranpy
- >>> res = scranpy.compute_crispr_qc_metrics(mat, { "IgG": [ 1, 10, 20, 40 ] })
+ >>> res = scranpy.compute_crispr_qc_metrics(mat)
  >>> filt = scranpy.suggest_crispr_qc_thresholds(res)
  >>> keep = scranpy.filter_crispr_qc_metrics(filt, res)
  >>> keep.sum()
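The corrected example calls ``compute_crispr_qc_metrics`` with the matrix alone, since the CRISPR metrics take no subsets. The four columns listed in that function's docstring (``sum``, ``detected``, ``max_value``, ``max_index``) can be illustrated in plain Python. The function name is hypothetical, a list of rows stands in for the guide-by-cell matrix, and resolving ties to the first guide is an assumption rather than documented behaviour.

```python
def crispr_qc_metrics_sketch(matrix):
    """matrix: list of rows (guides), each a list of per-cell counts."""
    num_cells = len(matrix[0])
    metrics = {"sum": [], "detected": [], "max_value": [], "max_index": []}
    for cell in range(num_cells):
        column = [row[cell] for row in matrix]
        top = max(column)
        metrics["sum"].append(float(sum(column)))
        metrics["detected"].append(sum(1 for value in column if value != 0))
        metrics["max_value"].append(float(top))
        # list.index returns the first occurrence, so ties go to the first guide.
        metrics["max_index"].append(column.index(top))
    return metrics


# Hypothetical 3-guide by 3-cell count matrix.
counts = [
    [0, 5, 2],
    [3, 0, 2],
    [1, 1, 0],
]
res = crispr_qc_metrics_sketch(counts)
print(res)
```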