Merge pull request #5 from saisandeepramavath/nithikesh

saisandeepramavath · web-flow · commit 635eba3c25ac · 2025-12-01T17:01:03.000-05:00
Nithikesh
diff --git a/courseProjectDocs/integration-testing/README.md b/courseProjectDocs/integration-testing/README.md
@@ -0,0 +1,37 @@
+# Integration Testing - Instructions to Run Tests
+
+## Test File Location
+
+Integration tests are located in: **`pandas/tests/util/test_integration.py`**
+
+## Prerequisites
+
+```bash
+# Navigate to the project directory (adjust the path to your local checkout)
+cd /Volumes/T7Shield/SWEN777/SWEN_777_Pandas
+
+# Activate virtual environment
+source venv/bin/activate
+```
+
+## How to Run Tests to Reproduce Results
+
+### Run All Integration Tests
+
+```bash
+python -m pytest pandas/tests/util/test_integration.py -v
+```
+
+**Expected Output:**
+```
+collected 6 items
+
+pandas/tests/util/test_integration.py::TestSandeepIntegration::test_series_to_dataframe_dtype_preservation PASSED
+pandas/tests/util/test_integration.py::TestSandeepIntegration::test_dataframe_from_dict_mixed_series_dtypes PASSED
+pandas/tests/util/test_integration.py::TestNithikeshIntegration::test_validate_fillna_with_clean_method PASSED
+pandas/tests/util/test_integration.py::TestNithikeshIntegration::test_series_fillna_integration PASSED
+pandas/tests/util/test_integration.py::TestMallikarjunaIntegration::test_check_dtype_backend_with_lib_sentinel PASSED
+pandas/tests/util/test_integration.py::TestMallikarjunaIntegration::test_percentile_validation_with_numpy_arrays PASSED
+
+=================================== 6 passed in 0.94s
+```
diff --git a/courseProjectDocs/integration-testing/report.md b/courseProjectDocs/integration-testing/report.md
@@ -0,0 +1,126 @@
+# Integration Testing Report
+
+## Test Design Summary
+
+This integration testing effort verifies interactions between multiple pandas modules. The tests are organized into three areas, each covering at least two module interactions:
+
+### Test 1: Series-DataFrame-Dtype Integration (Sandeep Ramavath)
+**Modules Integrated:**
+- `pandas.core.series` (Series class)
+- `pandas.core.frame` (DataFrame class)
+- `pandas.core.internals` (internal data managers)
+- `pandas.core.dtypes` (data type handling)
+
+**Interactions Tested:**
+1. **Series.to_frame()**: Tests dtype preservation when converting Series to DataFrame through internal manager conversion
+2. **DataFrame construction from dict**: Tests how DataFrame handles multiple Series with different dtypes (int64, float32, object) during construction
+
+
+### Test 2: Validation-Missing Data Integration (Nithikesh Bobbili)
+**Modules Integrated:**
+- `pandas.util._validators` (validation utilities)
+- `pandas.core.missing` (missing data handling)
+- `pandas.core.series` (Series operations)
+- `pandas.core.internals` (internal data modification)
+
+**Interactions Tested:**
+1. **validate_fillna_kwargs with clean_fill_method**: Tests delegation from validator to missing data module for method normalization
+2. **Series.fillna/ffill operations**: Tests complete pipeline from user API through validation to missing data handling
+
+### Test 3: Dtype Backend-Libs Integration (Mallikarjuna)
+**Modules Integrated:**
+- `pandas.util._validators` (validation functions)
+- `pandas._libs.lib` (C extension library with sentinel values)
+- `numpy` (array handling and validation)
+
+**Interactions Tested:**
+1. **check_dtype_backend with lib.no_default**: Tests validator interaction with C library sentinel values
+2. **validate_percentile with numpy arrays**: Tests pandas validation with numpy array conversion and bounds checking
+
+## Test Data Preparation
+
+### Input Data Generation
+
+**Test 1 - Series/DataFrame Integration:**
+- **Input**: Created Series with explicit dtype (`int32`) and sample data `[1, 2, 3]`
+- **Input**: Created multiple Series with different dtypes: int64, float32, object
+- **Rationale**: Different dtypes exercise type preservation logic across module boundaries
+
+**Test 2 - Validation/Missing Data:**
+- **Input**: Series with `np.nan` values: `[1.0, np.nan, 3.0, np.nan, 5.0]`
+- **Input**: Method names `"pad"`, `"ffill"` and `None` values
+- **Rationale**: Missing values and various method names test validation and fill method delegation
+
+**Test 3 - Backend/Libs Validation:**
+- **Input**: `lib.no_default` sentinel, valid backends (`"numpy_nullable"`, `"pyarrow"`), invalid backend string
+- **Input**: Valid percentiles (`0.5`, `[0.25, 0.5, 0.75]`) and invalid (`1.5`, `[0.25, 1.5, 0.75]`)
+- **Rationale**: Mix of valid/invalid inputs tests error handling across module boundaries
+
+### Expected Output Data
+
+All tests include explicit expected outputs:
+- Series/DataFrame tests verify dtype preservation and data integrity
+- Validation tests verify normalized method names and that invalid inputs raise `ValueError`
+- Backend tests verify acceptance of valid values and rejection with specific error messages
+
+## Execution and Results
+
+**Test File**: `pandas/tests/util/test_integration.py`
+
+**Execution Command:**
+```bash
+python -m pytest pandas/tests/util/test_integration.py -v
+```
+
+**Test Results:**
+```
+collected 6 items
+
+test_series_to_dataframe_dtype_preservation PASSED
+test_dataframe_from_dict_mixed_series_dtypes PASSED
+test_validate_fillna_with_clean_method PASSED
+test_series_fillna_integration PASSED
+test_check_dtype_backend_with_lib_sentinel PASSED
+test_percentile_validation_with_numpy_arrays PASSED
+
+=================================== 6 passed in 0.94s
+```
+
+**Summary:**
+- **Total Tests**: 6 integration tests
+- **Passed**: 6 (100%)
+- **Failed**: 0
+- **Execution Time**: 0.94 seconds
+
+### Defects Discovered
+
+**No defects were discovered during integration testing.** All module interactions functioned as expected:
+
+- Series-to-DataFrame conversion preserves dtypes correctly
+- DataFrame construction handles mixed-dtype Series properly
+- Validation module correctly delegates to missing data module
+- Series fillna operations integrate validation and missing data modules
+- Backend validation properly handles C library sentinel values
+- Percentile validation correctly integrates with NumPy array handling
+
+All error cases (ValueError for invalid inputs) behaved as designed, raising appropriate exceptions with descriptive messages.
+
+## Bug Reports
+
+**No bugs identified.** All integration points between modules are functioning correctly. The following expected behaviors were verified:
+
+1. **Type preservation across module boundaries**: Dtypes maintained through Series→DataFrame→Internals conversions
+2. **Validation delegation**: Validators correctly call specialized modules (e.g., `clean_fill_method`)
+3. **Error propagation**: Invalid inputs raise appropriate exceptions with clear messages
+4. **Sentinel value handling**: C library sentinels (`lib.no_default`) recognized by validators
+
+## Group Contributions
+
+| Student | Test Cases | Modules Integrated | Coverage |
+|---------|------------|-------------------|----------|
+| **Sandeep Ramavath** | 2 tests | Series, DataFrame, Internals, Dtypes | Series-DataFrame conversion and construction |
+| **Nithikesh Bobbili** | 2 tests | Validators, Missing Data, Series, Internals | Fillna validation and operation pipeline |
+| **Mallikarjuna** | 2 tests | Validators, C Libs, NumPy | Backend validation and percentile checking |
+
+**Total**: 6 integration tests covering seven pandas modules plus NumPy, with both normal and edge-case scenarios.
+
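For reference, the bounds check that the Test 3 percentile inputs in report.md exercise can be sketched in plain Python. This is an illustrative sketch, not pandas' implementation: `check_percentiles` is a hypothetical helper, it returns a list rather than a NumPy array, and its error text is only modeled on the `"percentiles should all be"` pattern that the tests match against.

```python
from typing import Iterable, List, Union

Number = Union[int, float]


def check_percentiles(q: Union[Number, Iterable[Number]]) -> List[float]:
    """Hypothetical stand-in for a percentile validator.

    Accepts a single number or an iterable of numbers and returns them
    as a list of floats, raising ValueError if any value is outside [0, 1].
    """
    # Normalize a scalar into a one-element list so both cases share one check.
    if isinstance(q, (int, float)):
        values = [float(q)]
    else:
        values = [float(x) for x in q]

    # Reject the whole input if any single value is out of bounds.
    if not all(0.0 <= v <= 1.0 for v in values):
        raise ValueError(
            f"percentiles should all be in the interval [0, 1], got {values}"
        )
    return values
```

With the inputs listed in the report, `0.5` and `[0.25, 0.5, 0.75]` pass through unchanged, while `1.5` and `[0.25, 1.5, 0.75]` raise `ValueError`.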
diff --git a/pandas/tests/test_integration.py b/pandas/tests/test_integration.py
@@ -0,0 +1,168 @@
+"""
+Integration tests for pandas modules.
+
+These tests verify interactions between multiple modules/components:
+- pandas.core.series (Series construction)
+- pandas.core.frame (DataFrame construction)
+- pandas.core.dtypes (dtype handling)
+- pandas.core.internals (internal data management)
+- pandas.util._validators (validation utilities)
+- pandas.core.missing (missing data handling)
+"""
+import numpy as np
+import pytest
+
+import pandas as pd
+from pandas import DataFrame, Series
+from pandas.core.missing import clean_fill_method
+from pandas._libs import lib
+from pandas.util._validators import (
+    validate_args_and_kwargs,
+    validate_fillna_kwargs,
+    check_dtype_backend,
+    validate_percentile,
+)
+
+
+class TestSandeepIntegration:
+    """Integration tests by Sandeep Ramavath covering Series-DataFrame-dtype interactions."""
+    
+    def test_series_to_dataframe_dtype_preservation(self):
+        """Test Series.to_frame() preserves dtype through internals conversion.
+        
+        This exercises interaction between:
+        - pandas.core.series.Series.to_frame()
+        - pandas.core.internals (manager conversion)
+        - pandas.core.frame.DataFrame
+        - pandas.core.dtypes (dtype preservation)
+        """
+        # Create Series with specific dtype
+        s = Series([1, 2, 3], name="test_col", dtype="int32")
+        
+        # Convert to DataFrame - should preserve dtype through internal conversion
+        df = s.to_frame()
+        
+        assert isinstance(df, DataFrame)
+        assert df.columns[0] == "test_col"
+        assert df["test_col"].dtype == np.dtype("int32")
+        assert len(df) == 3
+        assert (df["test_col"] == s).all()
+
+    def test_dataframe_from_dict_mixed_series_dtypes(self):
+        """Test DataFrame construction from dict with mixed Series dtypes.
+        
+        This exercises interaction between:
+        - pandas.core.frame.DataFrame.__init__
+        - pandas.core.internals.construction.dict_to_mgr
+        - pandas.core.series.Series (multiple instances with different dtypes)
+        - pandas.core.dtypes (type coercion and preservation)
+        """
+        # Create Series with different dtypes
+        s1 = Series([1, 2, 3], dtype="int64")
+        s2 = Series([1.0, 2.0, 3.0], dtype="float32")
+        s3 = Series(["a", "b", "c"], dtype="object")
+        
+        # Build DataFrame from dict of Series
+        df = DataFrame({"col1": s1, "col2": s2, "col3": s3})
+        
+        # Verify each column maintains its original dtype
+        assert df["col1"].dtype == np.dtype("int64")
+        assert df["col2"].dtype == np.dtype("float32")
+        assert df["col3"].dtype == np.dtype("object")
+        assert len(df) == 3
+
+
+class TestNithikeshIntegration:
+    """Integration tests by Nithikesh Bobbili covering validation-missing data interactions."""
+    
+    def test_validate_fillna_with_clean_method(self):
+        """Test validate_fillna_kwargs delegates to clean_fill_method.
+        
+        This exercises interaction between:
+        - pandas.util._validators.validate_fillna_kwargs
+        - pandas.core.missing.clean_fill_method
+        - method normalization and validation
+        """
+        # Test method normalization through validate_fillna_kwargs
+        value, method = validate_fillna_kwargs(None, "pad")
+        assert value is None
+        assert method == clean_fill_method("pad")
+        
+        # Test alternate method names
+        value, method = validate_fillna_kwargs(None, "ffill")
+        assert method == clean_fill_method("ffill")
+        
+        # Both None should raise
+        with pytest.raises(ValueError, match="Must specify a fill"):
+            validate_fillna_kwargs(None, None)
+    
+    def test_series_fillna_integration(self):
+        """Test Series.fillna() and ffill() use validation and missing data modules.
+        
+        This exercises interaction between:
+        - pandas.core.series.Series.fillna() / ffill()
+        - pandas.util._validators.validate_fillna_kwargs (internally)
+        - pandas.core.missing (fill methods)
+        - pandas.core.internals (data modification)
+        """
+        # Create Series with missing values
+        s = Series([1.0, np.nan, 3.0, np.nan, 5.0])
+        
+        # ffill uses forward fill method - interacts with missing data module
+        result = s.ffill()
+        expected = Series([1.0, 1.0, 3.0, 3.0, 5.0])
+        pd.testing.assert_series_equal(result, expected)
+        
+        # fillna with value - validation ensures value is acceptable
+        result = s.fillna(value=0.0)
+        expected = Series([1.0, 0.0, 3.0, 0.0, 5.0])
+        pd.testing.assert_series_equal(result, expected)
+
+class TestMallikarjunaIntegration:
+    """Integration tests by Mallikarjuna covering dtype_backend-libs interactions."""
+    
+    def test_check_dtype_backend_with_lib_sentinel(self):
+        """Test check_dtype_backend with lib.no_default sentinel.
+        
+        This exercises interaction between:
+        - pandas.util._validators.check_dtype_backend
+        - pandas._libs.lib.no_default (sentinel value)
+        - validation of backend options
+        """
+        # Should accept sentinel without exception
+        check_dtype_backend(lib.no_default)
+        
+        # Should accept valid backends
+        check_dtype_backend("numpy_nullable")
+        check_dtype_backend("pyarrow")
+        
+        # Should reject unknown backend
+        with pytest.raises(ValueError, match="dtype_backend .* is invalid"):
+            check_dtype_backend("not_a_backend")
+    
+    def test_percentile_validation_with_numpy_arrays(self):
+        """Test validate_percentile with numpy array interaction.
+        
+        This exercises interaction between:
+        - pandas.util._validators.validate_percentile
+        - numpy array conversion and validation
+        - pandas statistical methods that use percentiles
+        """
+        # Single percentile as float
+        result = validate_percentile(0.5)
+        assert isinstance(result, np.ndarray)
+        assert result == 0.5
+        
+        # Multiple percentiles as list
+        result = validate_percentile([0.25, 0.5, 0.75])
+        expected = np.array([0.25, 0.5, 0.75])
+        np.testing.assert_array_equal(result, expected)
+        
+        # Invalid percentile should raise
+        with pytest.raises(ValueError, match="percentiles should all be"):
+            validate_percentile(1.5)
+        
+        with pytest.raises(ValueError, match="percentiles should all be"):
+            validate_percentile([0.25, 1.5, 0.75])
+
+
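The fill behavior that `TestNithikeshIntegration` checks can be illustrated without pandas. The sketch below is plain Python, not pandas' implementation: `normalize_fill_method` and `forward_fill` are hypothetical helpers, and the alias table (`"ffill"` mapped to `"pad"`, `"bfill"` mapped to `"backfill"`) is an assumption about how method names are normalized.

```python
import math
from typing import List, Optional

# Assumed alias table; the canonical names are an assumption for illustration.
_ALIASES = {"ffill": "pad", "bfill": "backfill"}
_CANONICAL = ("pad", "backfill")


def normalize_fill_method(method: Optional[str]) -> Optional[str]:
    """Map a user-facing fill method name to its canonical form."""
    if method is None:
        return None
    name = method.lower()
    name = _ALIASES.get(name, name)
    if name not in _CANONICAL:
        raise ValueError(f"Invalid fill method: {method!r}")
    return name


def forward_fill(values: List[float]) -> List[float]:
    """Replace each NaN with the most recent non-NaN value before it.

    Leading NaNs stay NaN because there is nothing to carry forward.
    """
    filled = []
    last = math.nan
    for v in values:
        if not math.isnan(v):
            last = v
        filled.append(last)
    return filled


print(normalize_fill_method("ffill"))                           # pad
print(forward_fill([1.0, math.nan, 3.0, math.nan, 5.0]))        # [1.0, 1.0, 3.0, 3.0, 5.0]
```

The second call uses the same input as `test_series_fillna_integration` and produces the values that test expects from `Series.ffill()`.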