Commit 6ba2ff0

Release Spark 3.5-cpu-py312-v1.0 (#161)
* Update package to build with Python 3.12.11 and updated dependencies
* Update documentation for consistency
* Update formatting to make sure `make lint` passes
1 parent db6887e commit 6ba2ff0

17 files changed

+2085
-46
lines changed


.flake8

Lines changed: 8 additions & 8 deletions
@@ -12,13 +12,13 @@ exclude =
 
 max-complexity = 10
 
-ignore =
-    C901, # Function is too complex
-    D104, # Missing docstring in public package
-    D107, # Allow init not to have a docstring.
-    D105, # Allow magic methods (__repr__, __str__, ...) not to have docstrings.
-    E203, # whitespace before ':': Black disagrees with and explicitly violates this.
-    E501, # line too long -- follow black formatting instead.
-    W503 # Line break occurred before a binary operator
+# C901: Function is too complex
+# D104: Missing docstring in public package
+# D107: Allow init not to have a docstring
+# D105: Allow magic methods not to have docstrings
+# E203: whitespace before ':' - Black disagrees with this
+# E501: line too long - follow black formatting instead
+# W503: Line break occurred before a binary operator
+ignore = C901,D104,D107,D105,E203,E501,W503
 
 require-code = True
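The `.flake8` hunk above moves the explanations out of the `ignore` value and onto their own comment lines. The commit message only says this makes `make lint` pass. One plausible reason, stated here as an assumption, is that recent flake8 releases validate every entry of `ignore` as an error code, so inline `# ...` text is rejected. The sketch below imitates that check in Python. The `CODE_PATTERN` regex, the whitespace-and-comma splitting, and the helper name `invalid_ignore_codes` are illustrative assumptions, not flake8's actual implementation.

```python
import configparser
import re
from typing import List

# Assumed shape of a valid error code (1-3 capital letters, up to 3 digits).
# This is an illustrative pattern, not taken from flake8's source.
CODE_PATTERN = re.compile(r"^[A-Z]{1,3}[0-9]{0,3}$")


def invalid_ignore_codes(config_text: str) -> List[str]:
    """Return the entries of [flake8] ignore that do not look like error codes."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    raw = parser.get("flake8", "ignore", fallback="")
    # Assumption: entries are separated by commas and/or whitespace.
    entries = [item for item in re.split(r"[,\s]+", raw) if item]
    return [item for item in entries if not CODE_PATTERN.match(item)]


# Shortened stand-ins for the old and new styles shown in the hunk.
OLD_STYLE = """[flake8]
ignore =
    C901, # Function is too complex
    W503 # Line break occurred before a binary operator
"""

NEW_STYLE = """[flake8]
# C901: Function is too complex
# W503: Line break occurred before a binary operator
ignore = C901,W503
"""

if __name__ == "__main__":
    print("old style rejects:", invalid_ignore_codes(OLD_STYLE))
    print("new style rejects:", invalid_ignore_codes(NEW_STYLE))
```

Under these assumptions the old layout leaks `#` and the comment words into the code list, while the new layout yields only codes.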

DEVELOPMENT.md

Lines changed: 6 additions & 6 deletions
@@ -2,7 +2,7 @@
 This document describes how to set up a development environment for developing, building, and testing the SageMaker Spark Container image.
 
 ## Development Environment Setup
-You’ll need to have python, pytest, docker, and docker-compose installed on your machine
+You’ll need to have python, pytest, and docker installed on your machine
 and on your $PATH.
 
 This repository uses GNU `make` to run build targets specified in `Makefile`. Consult the `Makefile` for the full list of build targets.
@@ -40,7 +40,7 @@ You may want to activate the Python environment in your `.bashrc` or `.zshrc`.
 -- [AmazonS3FullAccess](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AmazonS3FullAccess) <br>
 -- [AmazonEC2ContainerRegistryFullAccess](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess) <br>
 
-5. [Create](https://docs.aws.amazon.com/cli/latest/reference/ecr/create-repository.html) an ECR repository with the name "sagemaker-spark" in the us-west-2 region
+5. [Create](https://docs.aws.amazon.com/cli/latest/reference/ecr/create-repository.html) an ECR repository with the name "sagemaker-spark-processing" in the us-west-2 region
 
 6. Setup required environment variables for the container build:
 ```
@@ -100,16 +100,16 @@ make build
 
 Upon successful build, you will see two tags applied to the image. For example:
 ```
-Successfully tagged sagemaker-spark:2.4-cpu-py37-v0.1
-Successfully tagged sagemaker-spark:latest
+Successfully tagged sagemaker-spark-processing:2.4-cpu-py37-v0.1
+Successfully tagged sagemaker-spark-processing:latest
 ```
 
 2. To verify that the image is available in your local docker repository, run `docker images`. You should see an image with two tags. For example:
 ```
 ✗ docker images
 REPOSITORY TAG IMAGE ID CREATED SIZE
-sagemaker-spark 2.4-cpu-py37-v0.1 a748a6e042d2 5 minutes ago 3.06GB
-sagemaker-spark latest a748a6e042d2 5 minutes ago 3.06GB
+sagemaker-spark-processing 2.4-cpu-py37-v0.1 a748a6e042d2 5 minutes ago 3.06GB
+sagemaker-spark-processing latest a748a6e042d2 5 minutes ago 3.06GB
 ```
 
 ### Running Local Tests

Makefile

Lines changed: 6 additions & 5 deletions
@@ -9,7 +9,7 @@ SHELL := /bin/sh
 ifeq ($(IS_RELEASE_BUILD),)
     SPARK_VERSION := 3.5
     PROCESSOR := cpu
-    FRAMEWORK_VERSION := py39
+    FRAMEWORK_VERSION := py312
     SM_VERSION := 1.0
     USE_CASE := processing
     BUILD_CONTEXT := ./spark/${USE_CASE}/${SPARK_VERSION}/py3
@@ -31,16 +31,17 @@ all: build test
 
 init:
 	python --version
-	pip install --upgrade pip
-	# pipenv > 2022.4.8 fails to build smspark
-	python -m pip install pipenv==2022.4.8
+	python -m ensurepip --upgrade
+	python -m pip install --upgrade setuptools
+	python -m pip install pipenv
 	cp smsparkbuild/${FRAMEWORK_VERSION}/Pipfile .
 	cp smsparkbuild/${FRAMEWORK_VERSION}/pyproject.toml .
 	cp smsparkbuild/${FRAMEWORK_VERSION}/setup.py .
 	pipenv install
 	cp Pipfile ${BUILD_CONTEXT}
 	cp Pipfile.lock ${BUILD_CONTEXT}
 	cp setup.py ${BUILD_CONTEXT}
+	cp VERSION ${BUILD_CONTEXT}
 
 # Builds and moves container python library into the Docker build context
 build-container-library: init
@@ -160,7 +161,7 @@ install-sdk:
 
 # Makes sure docker containers are cleaned
 clean:
-	docker-compose down || true
+	docker compose down || true
 	docker kill $$(docker ps -q) || true
 	docker rm $$(docker ps -a -q) || true
 	docker network rm $$(docker network ls -q) || true
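The Makefile defaults above (`SPARK_VERSION`, `PROCESSOR`, `FRAMEWORK_VERSION`, `SM_VERSION`) line up with the release name in the commit title, `3.5-cpu-py312-v1.0`. The Makefile line that actually assembles the tag is not part of the hunks shown, so the following is only a sketch of the apparent naming scheme. The helper name `image_tag` is hypothetical.

```python
def image_tag(spark_version: str, processor: str, framework_version: str, sm_version: str) -> str:
    """Join the four version components into a tag of the form seen in this commit.

    Assumption: the tag is "<spark>-<processor>-<framework>-v<sm_version>",
    inferred from the commit title and the example tags in DEVELOPMENT.md.
    """
    return f"{spark_version}-{processor}-{framework_version}-v{sm_version}"


if __name__ == "__main__":
    # Values taken from the non-release defaults in the Makefile hunk.
    print(image_tag("3.5", "cpu", "py312", "1.0"))
```

With the previous default of `py39`, the same scheme would give a tag ending in `py39-v1.0`, which is why changing `FRAMEWORK_VERSION` alone is enough to retarget the build.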

available_images.md

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@ name and version tag into the repository URL. For example:
 
 
 
-173754725891.dkr.ecr.<region>.amazonaws.com/sagemaker-spark-processing:2.4-cpu-py37-v1.0
+173754725891.dkr.ecr.<region>.amazonaws.com/sagemaker-spark-processing:3.5-cpu-py310-v1.0
 
 **Important**
 

buildspec.yml

Lines changed: 2 additions & 0 deletions
@@ -2,6 +2,8 @@ version: 0.2
 
 phases:
   install:
+    runtime-versions:
+      python: 3.12
     commands:
       - start-dockerd
   build:
build:

docker-compose.yml

Lines changed: 2 additions & 4 deletions
@@ -1,12 +1,10 @@
-version: "3.7"
-
 networks:
   spark:
     name: spark-network
 
 services:
   algo-1:
-    image: sagemaker-spark:latest
+    image: sagemaker-spark-processing:latest
     # Spark does a reverse DNS lookup (IP address to hostname)
     # By default in docker-compose the hostname is identical to the container name
     container_name: algo-1
@@ -21,7 +19,7 @@ services:
       - ${JARS_MOUNT:-/dev/null:/opt/ml/processing/input/jars}
     command: $CMD
   algo-2:
-    image: sagemaker-spark:latest
+    image: sagemaker-spark-processing:latest
     container_name: algo-2
     hostname: algo-2
     networks:

new_images.yml

Lines changed: 1 addition & 1 deletion
@@ -3,5 +3,5 @@ new_images:
   - spark: "3.5"
     use-case: "processing"
     processors: ["cpu"]
-    python: ["py39"]
+    python: ["py312"]
     sm_version: "1.0"

scripts/build.sh

Lines changed: 33 additions & 6 deletions
@@ -26,11 +26,38 @@ aws ecr get-login-password --region us-west-2 | docker login --username AWS --pa
 
 echo "building image ${version} ... "
 echo "building image under ${build_context}/docker/${framework_version} ... "
-docker build \
-    -f ${build_context}/docker/${framework_version}/Dockerfile.${processor} \
-    -t ${repository}:${version} \
-    --build-arg REGION=${REGION} \
-    -t sagemaker-spark:latest \
-    ${build_context}
 
+# Check if running on Mac with ARM architecture
+if [[ "$(uname)" == "Darwin" ]] && [[ "$(uname -m)" == "arm64" ]]; then
+    echo "Detected Mac with ARM architecture, using buildx for cross-platform build..."
+
+    # Create a new builder instance if it doesn't exist
+    if ! docker buildx inspect multi-platform-builder &>/dev/null; then
+        echo "Creating new buildx builder 'multi-platform-builder'..."
+        docker buildx create --name multi-platform-builder --use
+    else
+        echo "Using existing buildx builder 'multi-platform-builder'..."
+        docker buildx use multi-platform-builder
+    fi
+
+    # Build for amd64 architecture (x86_64)
+    echo "Building for amd64 architecture..."
+    docker buildx build \
+        --platform linux/amd64 \
+        --output type=docker \
+        -f ${build_context}/docker/${framework_version}/Dockerfile.${processor} \
+        -t ${repository}:${version} \
+        --build-arg REGION=${REGION} \
+        -t sagemaker-spark-${use_case}:latest \
+        ${build_context} 2>&1 | tee "build_log.txt"
+else
+    # On Linux x86 or other platforms, use the original build command
+    echo "Using standard docker build..."
+    docker build \
+        -f ${build_context}/docker/${framework_version}/Dockerfile.${processor} \
+        -t ${repository}:${version} \
+        --build-arg REGION=${REGION} \
+        -t sagemaker-spark-${use_case}:latest \
+        ${build_context} 2>&1 | tee "build_log.txt"
+fi
 docker logout https://137112412989.dkr.ecr.us-west-2.amazonaws.com
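The branch added to `scripts/build.sh` picks `docker buildx build --platform linux/amd64` on an Apple-silicon Mac and plain `docker build` everywhere else. The same decision can be sketched in Python, which makes the two argument lists easy to compare side by side. This is not part of the commit: `build_command` and its parameter names are hypothetical, and the sketch only assembles the command without running Docker or creating the `multi-platform-builder` instance.

```python
import platform
from typing import List, Optional


def build_command(
    dockerfile: str,
    version_tag: str,
    latest_tag: str,
    context: str,
    region: str,
    system: Optional[str] = None,
    machine: Optional[str] = None,
) -> List[str]:
    """Return the docker argv for the current host, or for an explicitly given one."""
    # Equivalent of `uname` and `uname -m` in the shell script.
    system = system or platform.system()
    machine = machine or platform.machine()

    # Arguments shared by both branches of the script.
    common = [
        "-f", dockerfile,
        "-t", version_tag,
        "--build-arg", f"REGION={region}",
        "-t", latest_tag,
        context,
    ]

    if system == "Darwin" and machine == "arm64":
        # Cross-build an amd64 image and load it into the local image store.
        return [
            "docker", "buildx", "build",
            "--platform", "linux/amd64",
            "--output", "type=docker",
        ] + common
    return ["docker", "build"] + common


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(" ".join(build_command(
        dockerfile="Dockerfile.cpu",
        version_tag="example-repo:3.5-cpu-py312-v1.0",
        latest_tag="sagemaker-spark-processing:latest",
        context=".",
        region="us-west-2",
    )))
```

Passing `system` and `machine` explicitly lets both branches be exercised from any host, which the shell version cannot do without mocking `uname`.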

smsparkbuild/py312/Pipfile

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
+[[source]]
+name = "pypi"
+url = "https://pypi.org/simple"
+verify_ssl = true
+
+[dev-packages]
+
+[packages]
+black = "==25.1.0"
+boto3 = "==1.38.38"
+click = "==8.1.8"
+cryptography = "==45.0.4"
+docker = "==7.1.0"
+flake8 = "==7.2.0"
+flake8-docstrings = "==1.7.0"
+importlib-metadata = "==6.11.0"
+mypy = "==1.16.1"
+numpy = "==1.26.4"
+psutil = "==6.1.1"
+py = "==1.11.0"
+pyasn1 = "==0.6.1"
+pytest = "==8.4.0"
+pytest-cov = "==6.2.1"
+pytest-parallel = "==0.1.1"
+pytest-rerunfailures = "==15.1"
+pytest-xdist = "==3.7.0"
+pyyaml = "==6.0.2"
+regex = "==2024.11.6"
+requests = "==2.32.4"
+rsa = "==4.9.1"
+safety = "==3.5.2"
+sagemaker = "==2.247.0"
+smspark = {editable = true, path = "."}
+tenacity = "==9.1.2"
+types-requests = "==2.32.4.20250611"
+types-waitress = "==3.0.1.20241117"
+typing-extensions = "==4.14.0"
+waitress = "==3.0.2"
+watchdog = "==6.0.0"
+
+[requires]
+python_version = "3.12"
