Updated readme to use aws CLI and fixed paths. Updated spark_sql version (#233)

cabeaulac · web-flow · commit 524c1a3984a7 · 2023-12-09T08:55:05.000-08:00
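
A minimal sketch for trying the new `localstack` profile without editing `~/.aws`. It writes the two files from the README change into a scratch directory and selects them with the standard `AWS_CONFIG_FILE`, `AWS_SHARED_CREDENTIALS_FILE`, and `AWS_PROFILE` environment variables. The `LS_AWS_DIR` name is hypothetical, and the sketch assumes the installed AWS CLI is recent enough to honor `endpoint_url` in the config file (older releases may ignore it).

```shell
# Hypothetical scratch directory; nothing under ~/.aws is touched.
LS_AWS_DIR="$(mktemp -d)"

# Same profile contents as in the README change.
cat > "$LS_AWS_DIR/config" <<'EOF'
[profile localstack]
region=us-east-1
output=json
endpoint_url = http://localhost:4566
EOF

cat > "$LS_AWS_DIR/credentials" <<'EOF'
[localstack]
aws_access_key_id=test
aws_secret_access_key=test
EOF

# Point the CLI at the scratch files and select the profile, so later
# `aws ...` calls in this shell need no --profile flag.
export AWS_CONFIG_FILE="$LS_AWS_DIR/config"
export AWS_SHARED_CREDENTIALS_FILE="$LS_AWS_DIR/credentials"
export AWS_PROFILE=localstack

echo "profile=$AWS_PROFILE config=$AWS_CONFIG_FILE"
```

With these variables exported, README commands such as `aws s3 mb s3://$S3_BUCKET` should reach LocalStack at `http://localhost:4566`, assuming it is running there.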
diff --git a/emr-serverless-spark/README.md b/emr-serverless-spark/README.md
@@ -5,12 +5,28 @@ We will run a Java Spark job on EMR Serverless using a simple Java "Hello World"
 ## Prerequisites
 
 * LocalStack
-* `aws` CLI & `awslocal` script
+* `aws` CLI
 * Docker
 * Java and Maven
 
 ## Installation
 
+### Configuring a custom profile
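+Configure a custom profile to use with LocalStack and select it (for example, with `export AWS_PROFILE=localstack` or `--profile localstack`) so that the `aws` commands below are sent to LocalStack. Add the following profile to your AWS configuration file (by default, this file is at `~/.aws/config`):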
+```ini
+[profile localstack]
+region = us-east-1
+output = json
+endpoint_url = http://localhost:4566
+```
+
+Add the following profile to your AWS credentials file (by default, this file is at `~/.aws/credentials`):
+```ini
+[localstack]
+aws_access_key_id=test
+aws_secret_access_key=test
+```
+
 Before creating the EMR Serverless job, we need to create a JAR file containing the Java code. We have the `java-demo-1.0.jar` file in the current directory. Alternatively, you can create the JAR file yourself by following the steps below.
 
 ```bash
@@ -21,14 +37,15 @@ mvn package
 Next, we need to create an S3 bucket to store the JAR file. To do this, run the following command:
 
 ```bash
+cd ..
 export S3_BUCKET=test
-awslocal s3 mb s3://$S3_BUCKET
+aws s3 mb s3://$S3_BUCKET
 ```
 
 You can now copy the JAR file from your current directory to the S3 bucket:
 
 ```bash
-awslocal s3 cp java-demo-1.0.jar s3://${S3_BUCKET}/code/java-spark/ 
+aws s3 cp hello-world/target/java-demo-1.0.jar s3://${S3_BUCKET}/code/java-spark/java-demo-1.0.jar
 ```
 
 ## Creating the EMR Serverless Job
@@ -42,7 +59,7 @@ export JOB_ROLE_ARN=arn:aws:iam::000000000000:role/emr-serverless-job-role
 We can now create an EMR Serverless application, which will run Spark 3.3.0. Run the following command:
 
 ```bash
-awslocal emr-serverless create-application \
+aws emr-serverless create-application \
     --type SPARK \
     --name serverless-java-demo \
     --release-label "emr-6.9.0" \
@@ -73,7 +90,7 @@ export APPLICATION_ID='<application-id>'
 Start the EMR Serverless application:
 
 ```shell
-awslocal emr-serverless start-application \
+aws emr-serverless start-application \
     --application-id $APPLICATION_ID
 ```
 
@@ -82,7 +99,7 @@ awslocal emr-serverless start-application \
 You can now run the EMR Serverless job:
 
 ```bash
-awslocal emr-serverless start-job-run \
+aws emr-serverless start-job-run \
     --application-id $APPLICATION_ID \
     --execution-role-arn $JOB_ROLE_ARN \
     --job-driver '{
@@ -103,6 +120,7 @@ awslocal emr-serverless start-job-run \
 The Spark logs will be written to the S3 bucket specified in the `logUri` parameter. You can stop the EMR Serverless application with the following command:
 
 ```bash
-awslocal emr-serverless stop-application \
+aws emr-serverless stop-application \
     --application-id $APPLICATION_ID
+    
 ```
diff --git a/emr-serverless-spark/hello-world/pom.xml b/emr-serverless-spark/hello-world/pom.xml
@@ -8,8 +8,8 @@
     <dependencies>
       <dependency> <!-- Spark dependency -->
         <groupId>org.apache.spark</groupId>
-        <artifactId>spark-sql_2.12</artifactId>
-        <version>3.3.0</version>
+        <artifactId>spark-sql_2.13</artifactId>
+        <version>3.5.0</version>
         <scope>provided</scope>
       </dependency>
     </dependencies>