Commit 3bc7d37

Update doc and samples for new features.
1 parent 5bef1e4 commit 3bc7d37

10 files changed, 170 additions and 66 deletions

THIRD_PARTY_LICENSES.txt

Lines changed: 26 additions & 0 deletions
@@ -697,3 +697,29 @@ software distributed under the License is distributed on an
 KIND, either express or implied. See the License for the
 specific language governing permissions and limitations
 under the License.
+
+___________________________________________________________________________________________
+
+Python dataframe interchange protocol
+
+MIT License
+
+Copyright (c) 2020 Consortium for Python Data API Standards contributors
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

doc/src/release_notes.rst

Lines changed: 2 additions & 1 deletion
@@ -96,7 +96,8 @@ Common Changes
    :meth:`AsyncConnection.fetch_df_batches()` to fetch data as DataFrames
    compliant with the Python DataFrame Interchange protocol. See
    :ref:`dataframeformat`.
-#) Added support for Oracle Database 23.7 SPARSE vectors.
+#) Added support for Oracle Database 23.7
+   :ref:`SPARSE vectors <sparsevectors>`.
 #) Added support for :ref:`naming and caching connection pools
    <connpoolcache>` during creation, and retrieving them later from the
    python-oracledb pool cache with :meth:`oracledb.get_pool()`.

doc/src/user_guide/sql_execution.rst

Lines changed: 7 additions & 7 deletions
@@ -743,12 +743,13 @@ Fetching using the DataFrame Interchange Protocol
 
 Python-oracledb can fetch directly to the `Python DataFrame Interchange
 Protocol <https://data-apis.org/dataframe-protocol/latest/index.html>`__
-format. This then allows zero-copy data interchanges between Python data frame
-libraries. It is an efficient way to work with data using Python libraries such
-as `Apache Arrow <https://arrow.apache.org/>`__, `Pandas
-<https://pandas.pydata.org>`__, `Polars <https://pola.rs/>`__, `NumPy
-<https://numpy.org/>`__, `PyTorch <https://pytorch.org/>`__, or to write files
-in `Apache Parquet <https://parquet.apache.org/>`__ format.
+format. This can reduce application memory requirements and allow zero-copy
+data interchanges between Python data frame libraries. It is an efficient way
+to work with data using Python libraries such as `Apache Arrow
+<https://arrow.apache.org/>`__, `Pandas <https://pandas.pydata.org>`__, `Polars
+<https://pola.rs/>`__, `NumPy <https://numpy.org/>`__, `PyTorch
+<https://pytorch.org/>`__, or to write files in `Apache Parquet
+<https://parquet.apache.org/>`__ format.
 
 .. note::
 
@@ -914,7 +915,6 @@ org/docs/reference/api/pandas.DataFrame.html#pandas.DataFrame>`__ is:
     odf = connection.fetch_df_all(statement=sql, parameters=[myid], arraysize=1000)
 
     # Get a Pandas DataFrame from the data.
-    # This is a zero copy call
     df = pandas.api.interchange.from_dataframe(odf)
 
     # Perform various Pandas operations on the DataFrame
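
The revised paragraph says the interchange format can allow zero-copy interchange, and the "This is a zero copy call" comment is dropped from the Pandas example. As a language-level illustration of what zero-copy means (it uses neither python-oracledb nor Pandas), the stdlib sketch below contrasts a `memoryview`, which shares a buffer, with `bytes()`, which copies it. The `column` array is a hypothetical stand-in for a data frame column buffer. Pandas' `from_dataframe()` also accepts an `allow_copy` argument, so whether a particular conversion copies is worth checking rather than assuming.

```python
import array

# A float64 column, standing in (hypothetically) for a data frame buffer.
column = array.array("d", [1.5, 2.5, 3.5])

# A memoryview exposes the same underlying buffer: no data is copied.
view = memoryview(column)

# bytes() materializes an independent copy of the buffer contents.
copied = bytes(view)

# A change to the original is visible through the view, but not in the copy.
column[0] = 99.0
print(view[0])                      # 99.0
print(array.array("d", copied)[0])  # 1.5
```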

doc/src/user_guide/vector_data_type.rst

Lines changed: 55 additions & 28 deletions
@@ -218,14 +218,30 @@ Using SPARSE Vectors
 ====================
 
 A Sparse vector is a vector which has zero value for most of its dimensions.
-This vector only physically stores the non-zero values. A sparse vector is
-supported when you are using Oracle Database 23.7 or later.
+This vector only physically stores the non-zero values. For more information
+on sparse vectors, see the `Oracle AI Vector Search User's Guide <https://
+www.oracle.com/pls/topic/lookup?ctx=dblatest&id=GUID-6015566C-3277-4A3C-8DD0-
+08B346A05478>`__.
 
-Sparse vectors can store the total number of dimensions, an array of indices,
-and an array of values. The storage formats that can be used with sparse
-vectors are float32, float64, and int8. Note that the binary storage format
-cannot be used with sparse vectors. You can define a column for a sparse
-vector using the following format::
+Sparse vectors are supported when you are using Oracle Database 23.7 or later.
+
+Sparse vectors are represented by the total number of vector dimensions, an
+array of indices, and an array of values where each value's location in the
+vector is indicated by the corresponding indices array position. All other
+vector values are treated as zero. The storage formats that can be used with
+sparse vectors are float32, float64, and int8. Note that the binary storage
+format cannot be used with sparse vectors.
+
+For example, a string representation could be::
+
+    [25, [5, 8, 11], [25.25, 6.125, 8.25]]
+
+In this example, the sparse vector has 25 dimensions. Only indices 5, 8, and 11
+have values which are 25.25, 6.125, and 8.25 respectively. All of the other
+values are zero.
+
+In Oracle Database, you can define a column for a sparse vector using the
+following format::
 
     VECTOR(number_of_dimensions, dimension_storage_format, sparse)
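
Per the paragraphs above, a sparse vector is the total dimension count plus parallel arrays of indices and values, with every unlisted position treated as zero. A minimal stdlib sketch of that interpretation follows, using the page's example string. `parse_sparse()` and `value_at()` are hypothetical helpers, not python-oracledb APIs, and the sketch deliberately looks values up by index instead of expanding to a dense list, since the page does not say whether indices start at 0 or 1.

```python
import ast


def parse_sparse(text):
    """Parse '[dims, [indices], [values]]' into (dims, {index: value}).

    Hypothetical helper for illustration; not a python-oracledb API.
    """
    dims, indices, values = ast.literal_eval(text)
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    return dims, dict(zip(indices, values))


def value_at(sparse, index):
    """Return the stored value at index, or 0 for any index not listed."""
    _dims, entries = sparse
    return entries.get(index, 0)


sparse = parse_sparse("[25, [5, 8, 11], [25.25, 6.125, 8.25]]")
print(sparse[0])                   # 25 dimensions in total
print(value_at(sparse, 8))         # 6.125
print(value_at(sparse, 9))         # 0, not physically stored
print(sparse[0] - len(sparse[1]))  # 22 implicit zeros
```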

@@ -239,7 +255,7 @@ For example, to create a table with three columns for sparse vectors:
         int8sparsecol vector(35, int8, sparse)
     )
 
-In this example the:
+In this example:
 
 - The float32sparsecol column can store sparse vector data of 25 dimensions
   where each dimension value is a 32-bit floating-point number.
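
The column definitions follow the `VECTOR(number_of_dimensions, dimension_storage_format, sparse)` form, and binary storage is not allowed with sparse. A hypothetical `vector_column_type()` helper can encode those two rules when generating DDL. It only knows the four storage formats named on this page, which is an assumption of the sketch rather than a complete list of what Oracle Database accepts.

```python
# Storage formats named on this page; Oracle Database may accept others.
STORAGE_FORMATS = ("float32", "float64", "int8", "binary")


def vector_column_type(dimensions, storage_format, sparse=False):
    """Build a 'vector(...)' column type string (hypothetical helper)."""
    fmt = storage_format.lower()
    if fmt not in STORAGE_FORMATS:
        raise ValueError(f"unknown storage format: {storage_format!r}")
    if dimensions <= 0:
        raise ValueError("the number of dimensions must be positive")
    if sparse and fmt == "binary":
        # The documentation states binary storage cannot be used with sparse.
        raise ValueError("binary storage cannot be used with sparse vectors")
    parts = [str(dimensions), fmt] + (["sparse"] if sparse else [])
    return "vector(" + ", ".join(parts) + ")"


print(vector_column_type(25, "float32", sparse=True))
print(vector_column_type(24, "binary"))
```

The first call yields the same type as the float32sparsecol column above; asking for `vector_column_type(24, "binary", sparse=True)` raises `ValueError`.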
@@ -256,18 +272,9 @@ Inserting SPARSE Vectors
 ------------------------
 
 With python-oracledb, sparse vector data can be inserted using
-:ref:`SparseVector objects <sparsevectorsobj>`. You can specify the number of
-dimensions, an array of indices, and an array of values as the data for a
-sparse vector. For example, the string representation is::
-
-    [25, [5,8,11], [25.25, 6.125, 8.25]]
-
-In this example, the sparse vector has 25 dimensions. Only indices 5, 8, and
-11 have values 25.25, 6.125, and 8.25 respectively. All of the other values
-are zero.
-
-The SparseVector objects are used as bind values when inserting sparse vector
-columns. For example:
+:ref:`SparseVector objects <sparsevectorsobj>`. The SparseVector objects are
+used when fetching vectors, and as bind values when inserting sparse vector
+columns. For example, to insert data:
 
 .. code-block:: python
 
@@ -289,7 +296,7 @@ columns. For example:
     )
 
     cursor.execute(
-        "insert into vector_sparse_table (:1, :2, :3)",
+        "insert into vector_sparse_table values (:1, :2, :3)",
         [float32_val, float64_val, int8_val]
     )
 
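
This hunk fixes the example's INSERT by adding the missing `values` keyword: without it, the parenthesized list after the table name is read as a column list. The stdlib sketch below reproduces the same failure and fix using SQLite as a stand-in for Oracle Database, purely so it runs without a server. The three text columns are hypothetical, and SQLite's `?` placeholders take the place of the `:1, :2, :3` bind positions.

```python
import sqlite3

# SQLite stands in for Oracle Database so the sketch runs anywhere; its "?"
# placeholders play the role of Oracle's ":1, :2, :3" bind positions.
conn = sqlite3.connect(":memory:")
conn.execute("create table vector_sparse_table (a text, b text, c text)")
row = ("x", "y", "z")

# Without VALUES, "(?, ?, ?)" is parsed as a column list and rejected.
try:
    conn.execute("insert into vector_sparse_table (?, ?, ?)", row)
    missing_values_rejected = False
except sqlite3.OperationalError as exc:
    missing_values_rejected = True
    print("rejected:", exc)

# With VALUES, the placeholders are bind positions and the insert succeeds.
conn.execute("insert into vector_sparse_table values (?, ?, ?)", row)
print(conn.execute("select * from vector_sparse_table").fetchall())
```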
@@ -298,23 +305,43 @@ columns. For example:
 Fetching Sparse Vectors
 -----------------------
 
-With python-oracledb, sparse vector columns are fetched in the same format
-accepted by Oracle Database by using the str() function. For example:
+With python-oracledb, sparse vector columns are fetched as :ref:`SparseVector
+objects <sparsevectorsobj>`:
 
 .. code-block:: python
 
-    cursor.execute("select * from vec_sparse")
+    cursor.execute("select * from vector_sparse_table")
+    for row in cursor:
+        print(row)
+
+
+This prints::
+
+    (oracledb.SparseVector(25, array('I', [6, 10, 18]), array('f', [26.25, 129.625, 579.875])),
+     oracledb.SparseVector(30, array('I', [9, 16, 24]), array('d', [19.125, 78.5, 977.375])),
+     oracledb.SparseVector(35, array('I', [10, 20, 30]), array('b', [26, 125, -37])))
+
+Depending on context, the SparseVector type will be treated as a string:
+
+.. code-block:: python
+
+    cursor.execute("select * from vector_sparse_table")
     for float32_val, float64_val, int8_val in cursor:
-        print("float32:", str(float32_val))
-        print("float64:", str(float64_val))
-        print("int8:", str(int8_val))
+        print("float32:", float32_val)
+        print("float64:", float64_val)
+        print("int8:", int8_val)
 
-This prints the following output::
+This prints::
 
     float32: [25, [6, 10, 18], [26.25, 129.625, 579.875]]
     float64: [30, [9, 16, 24], [19.125, 78.5, 977.375]]
     int8: [35, [10, 20, 30], [26, 125, -37]]
 
+Values can also be explicitly passed to `str()
+<https://docs.python.org/3/library/stdtypes.html#str>`__, if needed.
+
+**SPARSE Vector Metadata**
+
 The :ref:`FetchInfo <fetchinfoobj>` object that is returned as part of the
 fetched metadata contains attributes :attr:`FetchInfo.vector_dimensions`,
 :attr:`FetchInfo.vector_format`, and :attr:`FetchInfo.vector_is_sparse` which
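
The first printed output shows each fetched SparseVector as a dimension count plus two `array.array` objects: `'I'` (unsigned int) indices, and `'f'` (float32), `'d'` (float64), or `'b'` (int8) values. As a sketch of how those parts line up with the string form in the second output, the hypothetical `sparse_to_text()` helper below rebuilds it from plain arrays. It is not python-oracledb's implementation of `str()`.

```python
import array


def sparse_to_text(num_dimensions, indices, values):
    """Render a sparse vector as '[dims, [indices], [values]]'.

    Hypothetical helper; python-oracledb produces its own str() output and
    may implement it differently.
    """
    return f"[{num_dimensions}, {list(indices)}, {list(values)}]"


# The three fetched vectors from the documentation example, as plain arrays:
# 'I' = unsigned int indices; 'f' = float32, 'd' = float64, 'b' = int8 values.
float32_val = (25, array.array("I", [6, 10, 18]),
               array.array("f", [26.25, 129.625, 579.875]))
float64_val = (30, array.array("I", [9, 16, 24]),
               array.array("d", [19.125, 78.5, 977.375]))
int8_val = (35, array.array("I", [10, 20, 30]),
            array.array("b", [26, 125, -37]))

print("float32:", sparse_to_text(*float32_val))
print("float64:", sparse_to_text(*float64_val))
print("int8:", sparse_to_text(*int8_val))
```

The float32 values in the example are exactly representable in 32 bits, which is why they round-trip through `array("f", ...)` unchanged.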

samples/create_schema.py

Lines changed: 1 addition & 1 deletion
@@ -54,7 +54,7 @@
     sample_env.run_sql_script(
         conn, "create_schema_21", main_user=sample_env.get_main_user()
     )
-if sample_env.get_server_version() >= (23, 5):
+if sample_env.get_server_version() >= (23, 7):
     sample_env.run_sql_script(
         conn, "create_schema_23", main_user=sample_env.get_main_user()
     )
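
Both create_schema.py and vector.py now gate on Oracle Database 23.7, and the comments added to vector.py list when each vector feature appeared (VECTOR in 23.4, BINARY in 23.5, SPARSE in 23.7). A sketch of those checks with plain version tuples follows. `supported_features()` and `requirement_error()` are hypothetical names, and the inputs are assumed to be `(major, minor, ...)` tuples like the ones the samples compare against.

```python
# Minimum Oracle Database version for each vector feature, as listed in the
# comments added to samples/vector.py. Helper names here are hypothetical.
FEATURE_MIN_VERSION = {"VECTOR": (23, 4), "BINARY": (23, 5), "SPARSE": (23, 7)}


def supported_features(server_version):
    """Return the vector features available for a version tuple."""
    major_minor = tuple(server_version[:2])
    return sorted(
        name
        for name, minimum in FEATURE_MIN_VERSION.items()
        if major_minor >= minimum
    )


def requirement_error(server_version, client_version, thin):
    """Mirror the sample's checks; return an error message or None."""
    if tuple(server_version[:2]) < (23, 7):
        return "This example requires Oracle Database 23.7 or later."
    if not thin and tuple(client_version[:2]) < (23, 7):
        return (
            "This example requires python-oracledb thin mode, or Oracle Client"
            " 23.7 or later"
        )
    return None


print(supported_features((23, 5)))  # ['BINARY', 'VECTOR']
print(requirement_error((23, 7), (23, 5, 0, 0, 0), thin=True))  # None
```

Comparing tuples rather than version strings matters once minor numbers reach two digits: as strings, "23.10" sorts before "23.7".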

samples/dataframe_pandas.py

Lines changed: 0 additions & 1 deletion
@@ -51,7 +51,6 @@
 odf = connection.fetch_df_all(statement=SQL, arraysize=100)
 
 # Get a Pandas DataFrame from the data.
-# This is a zero copy call
 df = pandas.api.interchange.from_dataframe(odf)
 
 # Perform various Pandas operations on the DataFrame

samples/dataframe_pandas_async.py

Lines changed: 0 additions & 1 deletion
@@ -55,7 +55,6 @@ async def main():
     odf = await connection.fetch_df_all(statement=SQL, arraysize=100)
 
     # Get a Pandas DataFrame from the data.
-    # This is a zero copy call
     df = pandas.api.interchange.from_dataframe(odf)
 
     # Perform various Pandas operations on the DataFrame

samples/sql/create_schema_23.sql

Lines changed: 6 additions & 5 deletions
@@ -27,15 +27,16 @@
  *
  * Performs the actual work of creating and populating the schemas with the
  * database objects used by the python-oracledb samples that require Oracle
- * Database 23.5 or higher. It is executed by the Python script
+ * Database 23.7 or higher. It is executed by the Python script
  * create_schema.py.
  *---------------------------------------------------------------------------*/
 
 create table &main_user..SampleVectorTab (
-    v32 vector(3, float32),
-    v64 vector(3, float64),
-    v8 vector(3, int8),
-    vbin vector(24, binary)
+    v32 vector(3, float32),
+    v64 vector(3, float64),
+    v8 vector(3, int8),
+    vbin vector(24, binary),
+    v64sparse vector(30, float64, sparse)
 )
 /
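
With the new `v64sparse` column, every INSERT into SampleVectorTab in vector.py needs a fifth column name and a fifth bind position, which the commit updates by hand in two statements. A hypothetical `build_insert()` helper that derives both lists from one column list is one way to keep them in step. It interpolates identifiers directly, so it is only suitable for trusted, hard-coded names, never user input.

```python
def build_insert(table, columns):
    """Return an INSERT with numbered Oracle-style binds (:1, :2, ...).

    Hypothetical helper; the samples write their SQL out by hand. Table and
    column names are interpolated directly, so use trusted identifiers only.
    """
    column_list = ", ".join(columns)
    binds = ", ".join(f":{n}" for n in range(1, len(columns) + 1))
    return f"insert into {table} ({column_list}) values ({binds})"


sql = build_insert("SampleVectorTab", ["v32", "v64", "v8", "vbin", "v64sparse"])
print(sql)
```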

samples/vector.py

Lines changed: 47 additions & 14 deletions
@@ -45,16 +45,21 @@
     params=sample_env.get_connect_params(),
 )
 
-# this script only works with Oracle Database 23.5 or later
-if sample_env.get_server_version() < (23, 5):
-    sys.exit("This example requires Oracle Database 23.5 or later.")
+# this script only works with Oracle Database 23.7 or later
+#
+# The VECTOR datatype was initially introduced in Oracle Database 23.4.
+# The BINARY vector format was introduced in Oracle Database 23.5.
+# The SPARSE vector format was introduced in Oracle Database 23.7.
+
+if sample_env.get_server_version() < (23, 7):
+    sys.exit("This example requires Oracle Database 23.7 or later.")
 
-# this script works with thin mode, or with thick mode using Oracle Client 23.5
+# this script works with thin mode, or with thick mode using Oracle Client 23.7
 # or later
-if not connection.thin and oracledb.clientversion()[:2] < (23, 5):
+if not connection.thin and oracledb.clientversion()[:2] < (23, 7):
     sys.exit(
         "This example requires python-oracledb thin mode, or Oracle Client"
-        " 23.5 or later"
+        " 23.7 or later"
     )
 
 with connection.cursor() as cursor:
@@ -63,38 +68,66 @@
     vector1_data_64 = array.array("d", [11.25, 11.75, 11.5])
     vector1_data_8 = array.array("b", [1, 2, 3])
     vector1_data_bin = array.array("B", [180, 150, 100])
+    vector1_data_sparse64 = oracledb.SparseVector(
+        30, [9, 16, 24], array.array("d", [19.125, 78.5, 977.375])
+    )
 
     cursor.execute(
-        """insert into SampleVectorTab (v32, v64, v8, vbin)
-           values (:1, :2, :3, :4)""",
-        [vector1_data_32, vector1_data_64, vector1_data_8, vector1_data_bin],
+        """insert into SampleVectorTab (v32, v64, v8, vbin, v64sparse)
+           values (:1, :2, :3, :4, :5)""",
+        [
+            vector1_data_32,
+            vector1_data_64,
+            vector1_data_8,
+            vector1_data_bin,
+            vector1_data_sparse64,
+        ],
     )
 
     # Multi-row insert
     vector2_data_32 = array.array("f", [2.625, 2.5, 2.0])
     vector2_data_64 = array.array("d", [22.25, 22.75, 22.5])
     vector2_data_8 = array.array("b", [4, 5, 6])
     vector2_data_bin = array.array("B", [40, 15, 255])
+    vector2_data_sparse64 = oracledb.SparseVector(
+        30, [3, 10, 12], array.array("d", [2.5, 2.5, 1.0])
+    )
 
     vector3_data_32 = array.array("f", [3.625, 3.5, 3.0])
     vector3_data_64 = array.array("d", [33.25, 33.75, 33.5])
     vector3_data_8 = array.array("b", [7, 8, 9])
     vector3_data_bin = array.array("B", [0, 17, 101])
+    vector3_data_sparse64 = oracledb.SparseVector(
+        30, [8, 15, 29], array.array("d", [1.125, 200.5, 100.0])
+    )
 
     rows = [
-        (vector2_data_32, vector2_data_64, vector2_data_8, vector2_data_bin),
-        (vector3_data_32, vector3_data_64, vector3_data_8, vector3_data_bin),
+        (
+            vector2_data_32,
+            vector2_data_64,
+            vector2_data_8,
+            vector2_data_bin,
+            vector2_data_sparse64,
+        ),
+        (
+            vector3_data_32,
+            vector3_data_64,
+            vector3_data_8,
+            vector3_data_bin,
+            vector3_data_sparse64,
+        ),
     ]
 
     cursor.executemany(
-        """insert into SampleVectorTab (v32, v64, v8, vbin)
-           values (:1, :2, :3, :4)""",
+        """insert into SampleVectorTab (v32, v64, v8, vbin, v64sparse)
+           values (:1, :2, :3, :4, :5)""",
         rows,
     )
 
     # Query
     cursor.execute("select * from SampleVectorTab")
 
-    # Each vector is represented as an array.array type
+    # Each non-sparse vector is represented as an array.array type.
+    # Sparse vectors are represented as oracledb.SparseVector() instances
     for row in cursor:
         print(row)
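
The sample builds each SparseVector from a dimension count, an index list, and an `array.array("d", ...)` of values. When the source data is a dense sequence, a hypothetical `dense_to_sparse_args()` helper can produce those three parts. Its `index_base=0` default assumes zero-based sparse indices, which this commit does not state; check the Oracle AI Vector Search documentation before relying on it.

```python
import array


def dense_to_sparse_args(dense, typecode="d", index_base=0):
    """Split a dense sequence into (num_dimensions, indices, values).

    Hypothetical helper. index_base=0 assumes the database numbers sparse
    indices from zero; the source does not state the base.
    """
    indices = array.array("I")  # unsigned ints, like the fetched 'I' arrays
    values = array.array(typecode)
    for position, value in enumerate(dense):
        if value != 0:
            indices.append(position + index_base)
            values.append(value)
    return len(dense), indices, values


dense = [0.0] * 30
dense[9], dense[16], dense[24] = 19.125, 78.5, 977.375
dims, indices, values = dense_to_sparse_args(dense)
print(dims, list(indices), list(values))
# Matching the sample's call shape would then be, e.g.:
# oracledb.SparseVector(dims, list(indices), values)
```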
