xarray_ provides data structures inspired by the pandas ``DataFrame`` for working
with multi-dimensional datasets, with a focus on the netCDF file format and
easy conversion to and from pandas.

.. _io.google_colab:

Google Colab
------------

Google Colab is a hosted Jupyter notebook service that runs Python code in the
cloud, with pandas available by default. This section covers several ways to
load data into pandas DataFrames from a Colab notebook.

.. _io.google_colab.drive:

Reading from Google Drive
'''''''''''''''''''''''''

A common approach is to mount your Google Drive, which makes the files stored
in Drive available under a local path so that you can read them like local files.

.. code-block:: python

   from google.colab import drive
   import pandas as pd

   # Mount Google Drive at /content/drive
   drive.mount('/content/drive')

   # Read a CSV file from Google Drive ("MyDrive" is the root of your Drive)
   df = pd.read_csv('/content/drive/MyDrive/path/to/your/file.csv')

After you run ``drive.mount``, Colab prompts you to authorize access to your
Google Drive. Once Drive is mounted, you can browse your files in the file
browser in the Colab sidebar and copy a file's path to pass to a pandas read
function.

The mounted path works with any pandas reader that accepts a file path:

.. code-block:: python

   # Read an Excel file
   df = pd.read_excel('/content/drive/MyDrive/data.xlsx')

   # Read a JSON file
   df = pd.read_json('/content/drive/MyDrive/data.json')

   # Read a Parquet file
   df = pd.read_parquet('/content/drive/MyDrive/data.parquet')

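If the same notebook should also run outside Colab, one option is to mount
Drive only when ``google.colab`` can be imported. This is a minimal sketch,
not part of pandas: the ``IN_COLAB`` flag, the ``data_dir`` helper, the Drive
folder, and the local ``data`` folder are all hypothetical placeholders.

.. code-block:: python

   from pathlib import Path

   try:
       # google.colab can only be imported inside a Colab runtime
       from google.colab import drive

       IN_COLAB = True
   except ImportError:
       IN_COLAB = False


   def data_dir() -> Path:
       """Return the folder that holds the notebook's data files.

       Hypothetical helper: both folders below are placeholders.
       """
       if IN_COLAB:
           drive.mount('/content/drive')
           return Path('/content/drive/MyDrive/path/to')
       return Path('data')


   csv_path = data_dir() / 'file.csv'
   print(csv_path)

``csv_path`` can then be passed to ``pd.read_csv`` or any other reader as usual.
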
.. _io.google_colab.upload:

Uploading files directly
''''''''''''''''''''''''

For small files or one-off analyses, you can upload files from your local
machine with Colab's upload widget. ``files.upload()`` returns a dictionary
that maps each uploaded filename to the file's contents as bytes:

.. code-block:: python

   import io

   from google.colab import files
   import pandas as pd

   # Upload file(s); a file picker appears in the cell output
   uploaded = files.upload()

   # Read the uploaded CSV file
   # Replace 'filename.csv' with your actual filename
   df = pd.read_csv(io.BytesIO(uploaded['filename.csv']))

.. note::

   Uploaded files are stored in the Colab runtime's temporary storage and are
   lost when the runtime disconnects.

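Because ``files.upload()`` returns a ``{filename: bytes}`` dictionary, you do
not have to hard-code the filename. The following is a minimal sketch of a
hypothetical ``read_uploaded`` helper that picks a pandas reader from each
file's extension; the extension table is an illustrative subset, and a plain
dictionary stands in for the widget so that the example also runs outside Colab.

.. code-block:: python

   import io
   from pathlib import Path

   import pandas as pd

   # Hypothetical mapping from file extension to pandas reader (illustrative subset)
   READERS = {
       '.csv': pd.read_csv,
       '.json': pd.read_json,
       '.xlsx': pd.read_excel,
       '.parquet': pd.read_parquet,
   }


   def read_uploaded(uploaded: dict) -> dict:
       """Read every uploaded file into a DataFrame, keyed by filename.

       ``uploaded`` is assumed to be the ``{filename: bytes}`` dict
       returned by ``google.colab.files.upload()``.
       """
       frames = {}
       for name, content in uploaded.items():
           suffix = Path(name).suffix.lower()
           reader = READERS.get(suffix)
           if reader is None:
               raise ValueError(f'No reader configured for {name!r}')
           # Wrap the raw bytes in a file-like object for the reader
           frames[name] = reader(io.BytesIO(content))
       return frames


   # Outside Colab, a plain dict stands in for the widget's return value
   fake_upload = {'sales.csv': b'region,total\nnorth,10\nsouth,20\n'}
   frames = read_uploaded(fake_upload)
   print(frames['sales.csv'])

In Colab, you would call ``read_uploaded(files.upload())`` instead of passing
the stand-in dictionary.
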
.. _io.google_colab.url:

Reading from URLs
'''''''''''''''''

Most pandas readers accept HTTP(S) URLs in place of a local path, which is
useful for data hosted on GitHub, public dataset portals, or other web sources.
The URL must point to the raw file rather than to an HTML page that displays it:

.. code-block:: python

   import pandas as pd

   # Read a CSV file from a raw GitHub URL
   url = 'https://raw.githubusercontent.com/user/repo/main/data.csv'
   df = pd.read_csv(url)

   # Read an Excel file through a GitHub "raw" link
   github_url = 'https://github.com/user/repo/raw/main/data.xlsx'
   df = pd.read_excel(github_url)

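A frequent mistake is passing a GitHub ``blob`` URL, which serves an HTML page
rather than the file itself. The sketch below shows a hypothetical
``github_blob_to_raw`` helper that rewrites such a URL to its
``raw.githubusercontent.com`` equivalent. It assumes the
``github.com/<owner>/<repo>/blob/<ref>/<path>`` layout and a branch or tag name
without slashes; any other URL is returned unchanged.

.. code-block:: python

   from urllib.parse import urlparse


   def github_blob_to_raw(url: str) -> str:
       """Convert a GitHub "blob" page URL into its raw-content URL.

       Hypothetical helper; assumes the ref (branch or tag) has no slashes.
       """
       parts = urlparse(url)
       segments = parts.path.strip('/').split('/')
       # Expect: owner / repo / "blob" / ref / path...
       if parts.netloc != 'github.com' or len(segments) < 5 or segments[2] != 'blob':
           return url
       owner, repo, _, ref, *path = segments
       return 'https://raw.githubusercontent.com/' + '/'.join([owner, repo, ref, *path])


   print(github_blob_to_raw('https://github.com/user/repo/blob/main/data/data.csv'))

The converted URL can be passed straight to ``pd.read_csv``.
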
.. _io.google_colab.gsheets:

Reading from Google Sheets
''''''''''''''''''''''''''

If a Google Sheet is shared so that anyone with the link can view it, you can
read one of its worksheets through the CSV export URL. The sheet ID is the long
string between ``/d/`` and ``/edit`` in the sheet's URL:

.. code-block:: python

   import pandas as pd

   # Build the CSV export URL for one worksheet of the sheet
   sheet_id = 'your-sheet-id'
   sheet_name = 'Sheet1'
   url = (
       f'https://docs.google.com/spreadsheets/d/{sheet_id}'
       f'/gviz/tq?tqx=out:csv&sheet={sheet_name}'
   )
   df = pd.read_csv(url)

Private sheets require authentication; for those, consider using the
``gspread`` library alongside pandas.

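Worksheet names can contain spaces and other characters that must be
percent-encoded in a URL. A minimal sketch of a hypothetical ``gsheet_csv_url``
helper that encodes the name with ``urllib.parse.quote`` before building the
export URL; the sheet ID and the ``Q1 Sales`` worksheet name are placeholders.

.. code-block:: python

   from urllib.parse import quote


   def gsheet_csv_url(sheet_id: str, sheet_name: str) -> str:
       """Build the CSV export URL for one worksheet of a link-shared sheet."""
       # safe='' also encodes '/', so every special character is escaped
       encoded_name = quote(sheet_name, safe='')
       return (
           f'https://docs.google.com/spreadsheets/d/{sheet_id}'
           f'/gviz/tq?tqx=out:csv&sheet={encoded_name}'
       )


   url = gsheet_csv_url('your-sheet-id', 'Q1 Sales')
   print(url)

The resulting URL is then read with ``pd.read_csv(url)``, which requires
network access.
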
.. _io.google_colab.kaggle:

Reading Kaggle datasets
'''''''''''''''''''''''

Downloading Kaggle datasets uses the ``kaggle`` command-line tool
(``pip install kaggle`` if it is not already available), which authenticates
with the API token file ``kaggle.json`` that you can create in your Kaggle
account settings. In Colab, upload the token, copy it to where the tool expects
it, then download and extract the dataset. Lines that start with ``!`` are run
as shell commands by the notebook:

.. code-block:: ipython

   # Upload your kaggle.json file
   from google.colab import files
   files.upload()  # Select kaggle.json when prompted

   # Install the token where the kaggle tool looks for it
   !mkdir -p ~/.kaggle
   !cp kaggle.json ~/.kaggle/
   !chmod 600 ~/.kaggle/kaggle.json

   # Download and extract a dataset
   !kaggle datasets download -d dataset-owner/dataset-name
   !unzip dataset-name.zip

   # Read the data
   import pandas as pd
   df = pd.read_csv('datafile.csv')

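Instead of extracting the archive with ``unzip``, you can read a file straight
out of it. pandas infers ZIP compression from a ``.zip`` extension, so
``pd.read_csv('dataset-name.zip')`` works when the archive contains a single
file. For archives with several files, the following sketch uses the
standard-library ``zipfile`` module; the helper name ``read_csv_from_zip`` and
the ``train.csv`` member are hypothetical, and the archive is built in memory
so the example runs anywhere.

.. code-block:: python

   import io
   import zipfile

   import pandas as pd


   def read_csv_from_zip(zip_path, member):
       """Read one CSV member of a ZIP archive without extracting it to disk."""
       with zipfile.ZipFile(zip_path) as archive:
           with archive.open(member) as fh:
               return pd.read_csv(fh)


   # Build a small in-memory archive; with Kaggle, zip_path would be the
   # downloaded 'dataset-name.zip' and member one of the files inside it
   buffer = io.BytesIO()
   with zipfile.ZipFile(buffer, 'w') as archive:
       archive.writestr('train.csv', 'id,label\n1,cat\n2,dog\n')

   # List the archive's members to find the file you want
   with zipfile.ZipFile(buffer) as archive:
       print(archive.namelist())

   df = read_csv_from_zip(buffer, 'train.csv')
   print(df)
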
.. _io.google_colab.best_practices:

Best practices for Colab
''''''''''''''''''''''''

- **For repeated use**: Mount Google Drive and store your data there.
- **For small files**: Use the upload widget for quick one-time analysis.
- **For public datasets**: Read directly from URLs when possible.
- **For large files**: Consider the Parquet format, which typically loads
  faster and produces smaller files than CSV.
- **Session management**: Uploaded files and variables are lost when the
  runtime disconnects.

.. _xarray: https://xarray.pydata.org/en/stable/

.. _io.perf:
