analysis_domain_models Module Documentation
Introduction
The analysis_domain_models module is a fundamental component of the CodeWiki dependency analysis system, providing the core data structures that encapsulate the results of repository analysis and node selection operations. This module exists to standardize the representation of analysis outputs, ensuring consistency across different parts of the application—from the dependency analysis engine to the frontend visualization and export mechanisms.
At its core, this module defines two key data models: AnalysisResult and NodeSelection. These models capture the complete state of a repository analysis, including the repository metadata, identified functions, their relationships, and the configuration for partial exports. By using Pydantic models, the module ensures data validation, clear type definitions, and easy serialization/deserialization, making it a reliable foundation for data exchange between components.
This module is part of the dependency_analysis_engine system, working in conjunction with other modules like analysis_orchestration (which performs the actual analysis) and dependency_graph_construction (which builds the relationship graphs). It also integrates with the backend_documentation_orchestration and web_application_frontend modules to deliver analysis results to end users.
Core Components
AnalysisResult
The AnalysisResult class is the primary data structure for storing the complete output of a repository dependency analysis. It aggregates all relevant information about the analyzed repository, including its structure, identified functions, call relationships, and visualization data.
Key Attributes:
- `repository`: A `Repository` object containing metadata about the analyzed repository (such as name, path, and version control information).
- `functions`: A list of `Node` objects representing the functions, classes, or other code elements identified during analysis.
- `relationships`: A list of `CallRelationship` objects defining the dependencies and interactions between the identified nodes.
- `file_tree`: A nested dictionary representing the hierarchical structure of the repository's files and directories.
- `summary`: A dictionary containing high-level summary statistics and insights about the analysis (e.g., total number of functions, most connected nodes).
- `visualization`: An optional dictionary storing data for visualizing the dependency graph (default: empty dictionary).
- `readme_content`: An optional string containing the content of the repository's README file, if available (default: None).
Usage Example:
```python
from codewiki.src.be.dependency_analyzer.models.analysis import AnalysisResult
from codewiki.src.be.dependency_analyzer.models.core import Repository, Node, CallRelationship

# Create sample core objects
repo = Repository(name="my-repo", path="/path/to/repo")
func1 = Node(id="func1", name="calculate_sum", type="function")
func2 = Node(id="func2", name="main", type="function")
rel = CallRelationship(source=func1.id, target=func2.id, type="calls")

# Create AnalysisResult
result = AnalysisResult(
    repository=repo,
    functions=[func1, func2],
    relationships=[rel],
    file_tree={"my-repo": {"src": {"main.py": {}}}},
    summary={"total_functions": 2, "total_relationships": 1},
    visualization={"layout": "force-directed"},
    readme_content="# My Repository\nThis is a sample repo."
)

# Serialize to JSON for API response
json_result = result.model_dump_json()
```
NodeSelection
The NodeSelection class defines the configuration for selecting a subset of nodes from a complete analysis result, typically used for partial exports or focused visualizations. It allows users to specify which nodes to include, whether to include their relationships, and to assign custom display names.
Key Attributes:
- `selected_nodes`: A list of node IDs representing the subset of nodes to include in the export/visualization.
- `include_relationships`: A boolean indicating whether to include the call relationships between the selected nodes (default: True).
- `custom_names`: A dictionary mapping node IDs to custom display names, allowing for more readable visualizations or exports.
Usage Example:
```python
from codewiki.src.be.dependency_analyzer.models.analysis import NodeSelection

# Create a node selection for focused export
selection = NodeSelection(
    selected_nodes=["func1", "func2"],
    include_relationships=True,
    custom_names={"func1": "Calculate Sum", "func2": "Main Function"}
)

# Validate and access attributes
if selection.include_relationships:
    print(f"Including relationships for nodes: {selection.selected_nodes}")
```
Architecture and Relationships
The analysis_domain_models module is a central component in the CodeWiki architecture, acting as a data hub that connects the analysis engine with the rest of the system. Its position and relationships with other modules are described below:
Component Relationships Explained:
- **Dependency on core_domain_models**: The `AnalysisResult` class depends on the `Node`, `CallRelationship`, and `Repository` classes from the `core_domain_models` module. These core models define the basic building blocks of the dependency analysis, and `AnalysisResult` aggregates them into a complete result set.
- **Interaction with analysis_orchestration**: The `analysis_orchestration` module (containing `AnalysisService`, `CallGraphAnalyzer`, and `RepoAnalyzer`) is responsible for performing the actual repository analysis. It produces an `AnalysisResult` object as its output, which is then passed to other components.
- **Integration with dependency_graph_construction**: The `dependency_graph_construction` module (containing `DependencyGraphBuilder`) contributes to the `relationships` and `visualization` attributes of `AnalysisResult` by building the call relationship graphs and preparing visualization data.
- **Consumption by backend_documentation_orchestration**: The `backend_documentation_orchestration` module uses `AnalysisResult` to generate documentation from the analysis data, leveraging the repository structure, functions, and relationships to create comprehensive documentation.
- **Consumption by web_application_frontend**: The `web_application_frontend` module consumes both `AnalysisResult` (to display analysis results to users) and `NodeSelection` (to handle user requests for partial exports or focused visualizations).
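Putting these relationships together, the shape of the two models can be sketched as follows. This is a hedged reconstruction from the attributes described above, assuming Pydantic v2 and using simplified stand-ins for the core domain classes (the real `Repository`, `Node`, and `CallRelationship` live in `core_domain_models` and may carry additional fields):

```python
from typing import Any, Dict, List, Optional
from pydantic import BaseModel


class Repository(BaseModel):
    # Simplified stand-in for core_domain_models.Repository
    name: str
    path: str


class Node(BaseModel):
    # Simplified stand-in for core_domain_models.Node
    id: str
    name: str
    type: str


class CallRelationship(BaseModel):
    # Simplified stand-in for core_domain_models.CallRelationship
    source: str
    target: str
    type: str


class AnalysisResult(BaseModel):
    repository: Repository
    functions: List[Node]
    relationships: List[CallRelationship]
    file_tree: Dict[str, Any]
    summary: Dict[str, Any]
    visualization: Optional[Dict[str, Any]] = {}
    readme_content: Optional[str] = None


class NodeSelection(BaseModel):
    selected_nodes: List[str]
    include_relationships: bool = True
    custom_names: Dict[str, str] = {}
```

Because these are Pydantic models, constructing one with a wrong field type raises `ValidationError`, and mutable defaults like `{}` are safely copied per instance.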
Data Flow
The typical data flow involving analysis_domain_models components begins with a repository analysis request and ends with the delivery of results to the end user. It proceeds as follows:
Data Flow Steps:
1. **Analysis Initiation**: A user submits a repository for analysis through the web frontend. The frontend forwards this request to the backend orchestrator.
2. **Analysis Execution**: The backend orchestrator initiates the analysis process through the `analysis_orchestration` module. This module analyzes the repository and creates core domain objects (`Node`, `CallRelationship`, `Repository`).
3. **Result Aggregation**: The `analysis_orchestration` module creates an `AnalysisResult` object, populating it with the core domain objects, file tree, summary, and visualization data. The `AnalysisResult` is validated by Pydantic before being returned.
4. **Result Delivery**: The `AnalysisResult` is passed back through the backend orchestrator to the web frontend, which displays the results to the user.
5. **Partial Export Request**: If the user requests a partial export, the frontend creates a `NodeSelection` object based on the user's input, validates it, and sends it to the backend orchestrator.
6. **Result Filtering**: The backend orchestrator uses the `NodeSelection` to filter the original `AnalysisResult`, including only the selected nodes and their relationships (if requested), and applying any custom names.
7. **Partial Export Delivery**: The filtered result is returned to the frontend, which provides it to the user as a download.
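Step 6 (result filtering) can be sketched as follows. This is an illustrative implementation over plain dictionaries (as produced by `model_dump()`), not the orchestrator's actual code, and `filter_result` is a hypothetical helper:

```python
def filter_result(result: dict, selection: dict) -> dict:
    """Apply a NodeSelection-style dict to an AnalysisResult-style dict."""
    selected = set(selection["selected_nodes"])
    names = selection.get("custom_names", {})

    # Keep only the selected nodes, applying custom display names where given.
    functions = [
        {**f, "name": names.get(f["id"], f["name"])}
        for f in result["functions"]
        if f["id"] in selected
    ]

    # Keep a relationship only when both endpoints were selected.
    relationships = []
    if selection.get("include_relationships", True):
        relationships = [
            r for r in result["relationships"]
            if r["source"] in selected and r["target"] in selected
        ]

    return {**result, "functions": functions, "relationships": relationships}
```

Note the design choice: a relationship survives filtering only when both its `source` and `target` were selected, which keeps the exported graph self-contained.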
Usage and Configuration
The analysis_domain_models module is primarily used as a data transfer object (DTO) layer, so it doesn't require extensive configuration. However, there are several best practices and usage patterns to follow:
Best Practices:
- **Always Validate**: When creating instances of these models programmatically, use Pydantic's validation to ensure data integrity. Pydantic will automatically validate field types and raise `ValidationError` if any constraints are violated.
- **Use model_dump() and model_dump_json()**: For serializing models to dictionaries or JSON, use Pydantic's built-in `model_dump()` and `model_dump_json()` methods instead of manual serialization. These methods handle nested objects and optional fields correctly.
- **Handle Optional Fields**: Be aware that `visualization` and `readme_content` are optional fields in `AnalysisResult`; they may be empty or `None`. Always check whether they are set before using their contents.
- **Consistent Node IDs**: When working with `NodeSelection`, ensure that the node IDs in `selected_nodes` exactly match the IDs of nodes in the corresponding `AnalysisResult`'s `functions` list. Mismatched IDs will result in empty or incomplete filtered results.
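The optional-field practice can be illustrated with a small helper. `summarize` is a hypothetical function for this sketch; the `result` argument is any object carrying the `AnalysisResult` attributes:

```python
def summarize(result) -> str:
    """Build a short text summary of an analysis result, guarding
    the optional fields before touching their contents."""
    lines = [f"Functions: {len(result.functions)}"]
    if result.readme_content:            # optional: may be None
        lines.append(f"README first line: {result.readme_content.splitlines()[0]}")
    if result.visualization:             # optional: may be empty
        lines.append(f"Layout: {result.visualization.get('layout', 'unknown')}")
    return "\n".join(lines)
```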
Error Handling:
When using these models, you may encounter the following error conditions:
- **ValidationError**: Raised by Pydantic when trying to create a model instance with invalid data (e.g., wrong type for a field, missing required field). Always wrap model creation in a try-except block when dealing with untrusted input.

  ```python
  from pydantic import ValidationError

  try:
      selection = NodeSelection(selected_nodes="not a list")
  except ValidationError as e:
      print(f"Invalid node selection: {e}")
  ```

- **Missing Dependencies**: If the `core_domain_models` module is not available, or the required classes (`Node`, `CallRelationship`, `Repository`) are not imported correctly, you will get an `ImportError`. Ensure that the module path is correct and that all dependencies are installed.
- **Inconsistent Data**: When using `NodeSelection` to filter an `AnalysisResult`, if `selected_nodes` contains IDs that are not present in the `functions` list, the filtered result will be missing those nodes. Always validate the node IDs against the `AnalysisResult` before creating the `NodeSelection`.
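The inconsistent-data check can be performed up front. `missing_node_ids` is a hypothetical helper, not part of the module:

```python
def missing_node_ids(result_functions: list, selected_nodes: list) -> set:
    """Return the IDs in a prospective NodeSelection that the
    AnalysisResult's functions list does not contain."""
    known = {node["id"] for node in result_functions}
    return set(selected_nodes) - known


# Usage sketch: refuse to build the NodeSelection when IDs are unknown.
# missing = missing_node_ids(result.model_dump()["functions"], requested_ids)
# if missing:
#     raise ValueError(f"Unknown node IDs: {sorted(missing)}")
```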
Extending the Module
While the current models are designed to be general-purpose, there may be cases where you need to extend them to support additional features. Here are some guidelines for extension:
Extending AnalysisResult:
To add new fields to AnalysisResult, simply add them to the class definition with appropriate type annotations. For example, if you want to add a field for code quality metrics:
```python
class AnalysisResult(BaseModel):
    # ... existing fields ...
    code_quality_metrics: Optional[Dict[str, Any]] = None
```
Make sure to update any components that create or consume AnalysisResult to handle the new field appropriately.
Extending NodeSelection:
Similarly, you can add new configuration options to NodeSelection. For example, if you want to add a depth limit for relationship inclusion:
```python
class NodeSelection(BaseModel):
    # ... existing fields ...
    relationship_depth: Optional[int] = None
```
You would then need to update the filtering logic in the backend_documentation_orchestration module to respect this new parameter.
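One possible way the filtering logic could honor `relationship_depth` is a breadth-first expansion from the selected nodes along call edges. This is a hedged sketch under that assumption, not existing code, and `expand_selection` is a hypothetical helper:

```python
from collections import deque


def expand_selection(relationships, selected, depth):
    """Expand a set of selected node IDs by following call edges
    up to `depth` hops. `relationships` is a list of dicts with
    "source" and "target" keys (as produced by model_dump())."""
    adjacency = {}
    for rel in relationships:
        adjacency.setdefault(rel["source"], []).append(rel["target"])

    seen = set(selected)
    frontier = deque((node, 0) for node in selected)
    while frontier:
        node, dist = frontier.popleft()
        if dist >= depth:
            continue  # depth limit reached along this path
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return seen
```

For example, with edges a→b→c, a depth of 1 starting from `{"a"}` yields `{"a", "b"}`, while a depth of 2 also pulls in `"c"`.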
Creating Custom Models:
If you need more specialized data structures, you can create new Pydantic models in the same module. For example, a model for analysis configuration:
```python
class AnalysisConfig(BaseModel):
    include_private_functions: bool = False
    max_depth: int = 5
    excluded_directories: List[str] = ["tests", "docs"]
```
When extending the module, always ensure backward compatibility. Make new fields optional with default values, and avoid removing or changing existing fields without proper deprecation.
Related Modules
For more information about related modules, refer to their documentation:
- **core_domain_models**: Defines the core data structures (`Node`, `CallRelationship`, `Repository`) used by `AnalysisResult`.
- **analysis_orchestration**: Explains how the analysis is performed and how `AnalysisResult` is produced.
- **dependency_graph_construction**: Details how the call relationship graphs are built and integrated into `AnalysisResult`.
- **backend_documentation_orchestration**: Describes how `AnalysisResult` is used to generate documentation.
- **web_application_frontend**: Shows how `AnalysisResult` and `NodeSelection` are used in the user interface.