
Conversation

@gavindeeppahl (Collaborator)

Description

This PR introduces a standalone function that generates basic unit test code from a simple config dict, using CSV files as inputs. Running it produces a new .py file containing the test script.

Improvements available on request:

  • I can add extensive commenting to the file if required.
  • Currently the script outputs a generic test function at the end of the .py file; this can be improved if I have a template of what's preferable (a rough sketch of the current style follows this list).
  • The column_type override is currently limited to 'string' and 'float'; this can be extended to other types.
  • A different output directory can be added for the generated .py file; currently it is written to the same location as the input files folder.
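
To illustrate, here is a rough sketch of the kind of test the script might generate, using the example parameters that appear later in this thread (function name new_function, string columns period and reference, float column 602). The exact layout of the emitted file is an assumption based on the description above, not the script's actual output.

import pandas as pd
from pandas.testing import assert_frame_equal


def test_new_function():
    # Hypothetical sketch: column names and values mirror the example
    # parameters used later in this thread, not real data.
    input_df = pd.DataFrame({"period": ["202001"], "reference": ["A1"], "602": [1.5]})
    expected_df = pd.DataFrame({"period": ["202001"], "reference": ["A1"], "602": [3.0]})

    # `new_function` is the function under test, imported from the user's code.
    result = new_function(input_df)

    assert_frame_equal(result, expected_df)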

Peer review

Any new code includes all the following:

  • Documentation: Docstrings and comments have been added/updated.
  • Style guidelines: New code conforms to the project's contribution guidelines.
  • Functionality: The code works as expected, expected edge cases are handled, and exceptions are handled appropriately.
  • Complexity: The code is not overly complex; logic has been split into appropriately sized functions.
  • Test coverage: Unit tests cover essential functions for a reasonable range of inputs and conditions. Added and existing tests pass on my machine.

Review comments

Suggestions should be tailored to the code that you are reviewing. Provide context.
Be critical and clear, but not mean. Ask questions and set actions.

These might include:
  • bugs that need fixing (does it work as expected, and does it work with other code that it is likely to interact with?)
  • alternative methods (could it be written more efficiently or with more clarity?)
  • documentation improvements (does the documentation reflect how the code actually works?)
  • additional tests that should be implemented
    • Do the tests effectively assure that it works correctly? Are there additional edge cases/negative tests to be considered?
  • code style improvements (could the code be written more clearly?)

Further reading: code review best practices

@gavindeeppahl added the enhancement (New feature or request) label on Jul 30, 2024
@gavindeeppahl requested a review from AnneONS on Jul 30, 2024 at 15:32
@gavindeeppahl self-assigned this on Jul 30, 2024
@dombean (Member) commented on Sep 2, 2024

@gavindeeppahl & @AnneONS:

Here are a few of my thoughts:

  1. Refactor the main() function to accept parameters instead of hard-coding them. This makes the function more flexible and reusable.

  2. Use argparse for command-line arguments: Add argparse to handle command-line arguments, allowing users to specify their own parameters when running the script. This is only necessary if you want to call it from the command line rather than from another script.

  3. Remove the if __name__ == "__main__" block so that users import main() and call it with their own arguments from another script.

I'm assuming you'd want users to call main() with their own arguments from another script, in which case you can ignore the argparse example below.

import argparse
import json

# NOTE: `Config` and `process_dataframe` are assumed to be defined elsewhere
# in this module; they are not shown in this snippet.

def main(csv_path: str, files: list, function_name: str, column_type_override: dict) -> None:
    """Initialise configuration and process CSV files for unit testing.

    This function sets up the configuration with paths, filenames, and function names,
    and then calls `process_dataframe` to handle the CSV files and generate the test
    code.

    Parameters
    ----------
    csv_path : str
        The path to the directory containing the CSV files.
    files : list
        A list of filenames to process.
    function_name : str
        The name of the function to generate tests for.
    column_type_override : dict
        A dictionary to override column types.

    Returns
    -------
    None
    """
    config = Config(
        csv_path=csv_path,
        files=files,
        function_name=function_name,
        column_type_override=column_type_override,
    )

    process_dataframe(config)


def run_from_command_line():
    """Parse command-line arguments and call `main`."""
    parser = argparse.ArgumentParser(description="Process CSV files for unit testing.")
    parser.add_argument("--csv_path", type=str, required=True, help="Path to the CSV files directory.")
    parser.add_argument("--files", nargs='+', required=True, help="List of CSV filenames.")
    parser.add_argument("--function_name", type=str, required=True, help="Name of the function to generate tests for.")
    parser.add_argument("--column_type_override", type=str, required=True, help="Column type overrides in JSON format.")

    args = parser.parse_args()

    # Convert column_type_override from JSON string to dictionary
    column_type_override = json.loads(args.column_type_override)

    main(args.csv_path, args.files, args.function_name, column_type_override)

# Example usage:
# if __name__ == "__main__":
#     run_from_command_line()

Usage Instructions

Option 1: Running from the Command Line

Users can run the script from the command line with their own parameters:

python -m rdsa_utils.helpers.unit_test_writer --csv_path "path/to/csv" --files "input1.csv" "expected_output.csv" "fail_output.csv" --function_name "new_function" --column_type_override '{"string": ["period", "reference"], "float": ["602"]}'

Option 2: Creating a Custom Script

Users can create their own Python script to call the main() function with custom arguments:

from rdsa_utils.helpers.unit_test_writer import main

main(
    csv_path="path/to/csv",
    files=["input1.csv", "expected_output.csv", "fail_output.csv"],
    function_name="new_function",
    column_type_override={"string": ["period", "reference"], "float": ["602"]}
)

This approach provides flexibility and allows users to customise the parameters as needed.

@AnneONS (Collaborator) left a comment

Some initial comments; I'm now going to continue my review working inside VS Code :-)

@AnneONS (Collaborator) left a comment

A few more comments, mostly about how we identify the type from the CSV. Otherwise I'm happy this is ready to go once we've addressed Dom's comments. I spoke to him about how to run it.

