Boto3 is the AWS SDK for Python. It lets you create and manage AWS services such as EC2 and S3. It provides both an object-oriented API and low-level access to AWS services.
S3 (Simple Storage Service) lets you store files as objects. It is also known as an object-based storage service.
In this tutorial, you’ll learn how to open an S3 object as a string with Boto3 by using the proper file encoding.
UTF-8 is the most commonly used encoding for text files. It supports the special characters of many languages, such as the German umlaut Ä. UTF-8 stores these special characters as multibyte characters, meaning each one takes more than one byte.
When a file is written with a specific encoding, you need to specify the same encoding when reading it in order to decode its contents. Only then will all the special characters be displayed correctly.
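The effect of choosing the right or wrong encoding can be seen without touching S3 at all. The following is a minimal sketch using only built-in string and bytes methods; the sample text is made up for illustration.

```python
# A small, self-contained illustration (no AWS access needed).
text = "Ä is a German umlaut"

# Encoding turns the string into bytes; 'Ä' takes two bytes in UTF-8.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes[:2])  # b'\xc3\x84'

# Decoding with the same encoding restores the original text.
print(utf8_bytes.decode("utf-8"))  # Ä is a German umlaut

# Decoding with a different encoding does not raise an error here,
# but the umlaut comes back as two unrelated characters.
wrong = utf8_bytes.decode("latin-1")
print(wrong == text)  # False
```

The bytes you read from S3 behave the same way: they only turn back into the original text when decoded with the encoding the file was saved in.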
Optional: when you store a file in S3, you can record its encoding using the file's Metadata option.
Edit the metadata of the file in the S3 console.
You’ll be taken to the file metadata screen.
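The system-defined metadata is available by default, with the key content-type and the value text/plain.
You can add the encoding by selecting the Add metadata option. Select System defined as the type, content-encoding as the key, and utf-8 as the value. Note that HTTP normally uses content-encoding for compression schemes such as gzip; the character set is conventionally declared in content-type, for example text/plain; charset=utf-8.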
You’ve set the encoding for your file objects in S3.
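Once the encoding is stored with the object, it can be read back from the response that get() returns, which exposes the system-defined metadata under keys such as 'ContentType' and 'ContentEncoding'. The sketch below is one possible way to pick an encoding from that response. The helper name encoding_from_response is hypothetical, and it assumes that content-encoding, when present, holds a text encoding name as set in the steps above.

```python
from email.message import Message


def encoding_from_response(response, default="utf-8"):
    """Pick a text encoding from the metadata of an S3 get() response.

    `response` is assumed to be the dict returned by obj.get().
    This helper is an illustration, not part of Boto3.
    """
    # 1. Prefer a charset parameter in ContentType,
    #    e.g. "text/plain; charset=utf-8".
    content_type = response.get("ContentType")
    if content_type:
        header = Message()
        header["Content-Type"] = content_type
        charset = header.get_content_charset()
        if charset:
            return charset

    # 2. Otherwise use ContentEncoding, assuming it was set to a
    #    text encoding name as described in this tutorial.
    content_encoding = response.get("ContentEncoding")
    if content_encoding:
        return content_encoding.lower()

    # 3. Fall back to the default when nothing is declared.
    return default
```

With a real response, the result could be passed straight to decode(), for example response['Body'].read().decode(encoding_from_response(response)).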
Next, you’ll see how to read the file from S3.
Reading File as String From S3
In this section, you’ll read a file from S3 as a string, decoding it as UTF-8.
First, you’ll create a session with Boto3 by using your AWS access key ID and secret access key.
Then you’ll create an S3 resource from the Boto3 session, and an S3 Object that represents the file in S3, identified by your bucket name and object key.
Now, with the get() action of this object, you can retrieve the S3 object body through the ['Body'] key of the response.
This gives you a streaming body that wraps the HTTP response.
You can read it with read() and decode the resulting bytes using UTF-8, as shown below.
import boto3

# Creating a session with Boto3.
session = boto3.Session(
    aws_access_key_id='Your Access Key ID',
    aws_secret_access_key='Your Secret Access Key'
)

# Creating an S3 resource from the session.
s3 = session.resource('s3')

# Creating an object from the S3 resource.
obj = s3.Object('Your_bucket_name', 'Your File Object Name/Key')

# Reading the file as a string with UTF-8 decoding.
file_content = obj.get()['Body'].read().decode('utf-8')

# Printing the file contents.
print(file_content)
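If you read several objects, the get, read, and decode steps can be wrapped in a small function. This is a minimal sketch, not the only way to do it. The names read_s3_object_as_string and read_from_bucket are hypothetical, and read_from_bucket assumes your credentials are already configured for Boto3 (for example through environment variables or the shared AWS config files) instead of being written into the script.

```python
def read_s3_object_as_string(obj, encoding="utf-8"):
    """Return the full contents of an S3 object as a str.

    `obj` is expected to behave like a Boto3 s3.Object: get() returns
    a dict whose 'Body' has read() and close() methods.
    """
    body = obj.get()["Body"]
    try:
        # read() returns bytes; decode() turns them into a str.
        return body.read().decode(encoding)
    finally:
        # Release the underlying HTTP connection.
        body.close()


def read_from_bucket(bucket_name, key, encoding="utf-8"):
    """Read one object using Boto3's default credential lookup (assumed)."""
    import boto3  # imported here so the helper above has no AWS dependency

    s3 = boto3.resource("s3")
    return read_s3_object_as_string(s3.Object(bucket_name, key), encoding)
```

A call such as read_from_bucket('Your_bucket_name', 'Your File Object Name/Key') would then return the same string as the script above.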
When you execute the above script, you’ll see the contents of the file printed.
This is a test file to demonstrate file reading functionality from aws S3 bucket.
You’ve read the file as a string. Next, you’ll read the file line by line.
Reading S3 File Line by Line
In this section, you’ll read a file from S3 line by line using the iter_lines() method.
First, you’ll get the S3 object by using the Boto3 session and resource. Next, you’ll iterate over the object body using the iter_lines() method.
import boto3

# Creating a session with Boto3.
session = boto3.Session(
    aws_access_key_id='Your Access Key ID',
    aws_secret_access_key='Your Secret Access Key'
)

# Creating an S3 resource and the object to read.
s3 = session.resource('s3')
obj = s3.Object('Your_bucket_name', 'Your File Object Name/Key')

# Iterating over the body line by line and decoding each line.
for line in obj.get()['Body'].iter_lines():
    print(line.decode('utf-8'))
In the print call, each line is decoded using UTF-8, because iter_lines() yields bytes and the file was stored with UTF-8 encoding earlier in this tutorial. If you do not decode the line, Python prints it as a bytes literal, so every line appears prefixed with the character b.
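The difference between printing raw bytes and printing the decoded string can be reproduced locally. In this small sketch, raw_line is a made-up value standing in for one item yielded by iter_lines().

```python
# Stand-in for one line yielded by iter_lines(): a bytes object.
raw_line = "Größe: 10".encode("utf-8")

# Printing bytes shows the b prefix and escaped multibyte characters.
print(raw_line)                  # b'Gr\xc3\xb6\xc3\x9fe: 10'

# Decoding first gives the readable text.
print(raw_line.decode("utf-8"))  # Größe: 10
```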
When you execute the above script, it prints the contents of the file line by line, as shown below.
This is the first line of the file.
this is the second line of the file.
You’ve read the file line by line with proper encoding and decoding.
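If you want to process the lines instead of printing them, the decoding step can be moved into a generator. This is a minimal sketch under the assumption that the body object offers iter_lines() yielding bytes, as the Boto3 streaming body does; the name iter_decoded_lines is hypothetical.

```python
def iter_decoded_lines(body, encoding="utf-8"):
    """Yield each line of a streaming body as a decoded str.

    `body` is expected to provide iter_lines() yielding bytes,
    like obj.get()['Body'] in the script above.
    """
    for raw_line in body.iter_lines():
        yield raw_line.decode(encoding)
```

With a real object, you could write for line in iter_decoded_lines(obj.get()['Body']): and work with ordinary strings. Because the lines are produced one at a time, the whole file never has to be held in memory as a single string.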
You’ve learnt how to open an S3 object as a string with Boto3, and how to read a file line by line using Boto3.
How do I get rid of the b prefix in a string in Python?
You need to decode the line with the proper encoding name before you print it. For example, print(line.decode('utf-8')) decodes the line using UTF-8.