Project Stage 2: Clone-Prune Analysis Pass Implementation

Introduction

This project implements a GCC compiler pass to analyze cloned functions created during Function Multi-Versioning (FMV). The goal is to determine if clones are "substantially identical" to enable pruning. Below is my detailed implementation journey.

Phase 1: Clone Detection & Retrieval

Data Structure for Tracking Clones

std::map<std::string, std::vector<std::string>> clone_map;

// Key: Base name (e.g., "scale_samples")

// Value: Variants (e.g., {"scale_samples.avx2", "scale_samples.sse4"})


Function Retrieval Logic


   void locate_clones(const std::map<std::string, std::vector<std::string>>& clone_map) {
    for (const auto& [base, variants] : clone_map) {
        if (variants.size() != 2) {
            fprintf(dump_file, "Invalid clone count for %s: %zu\n", 
                    base.c_str(), variants.size());
            continue;
        }

        function *clone1 = nullptr, *clone2 = nullptr;
        
        // Architecture-aware suffix handling
        FOR_EACH_FUNCTION(node) {
            function* f = node->get_fun();
            if (!f) continue;
            
            std::string name = function_name(f);
            
            // Handle x86_64 suffixes like .avx2, .arch_x86_64_v3
            if (name == variants[0] || name == base + ".default") 
                clone1 = f;
            else if (name == variants[1] || name == base + ".arch_") 
                clone2 = f;
        }

        if (!clone1 || !clone2) {
            fprintf(dump_file, "ERROR: Clones not found for %s\n", base.c_str());
            continue;
        }
        
        compare_clones(base, clone1, clone2);
    }
}



Phase 2: Semantic Comparison Engine

1. Basic Block Structure Validation

 struct BBStructure {
    size_t num_edges;
    std::vector<int> successor_indices;
};

BBStructure analyze_block_structure(basic_block bb) {
    BBStructure bs;
    edge e;
    edge_iterator ei;
    
    bs.num_edges = EDGE_COUNT(bb->succs);
    FOR_EACH_EDGE(e, ei, bb->succs) {
        bs.successor_indices.push_back(e->dest->index);
    }
    return bs;
}

bool compare_cfg(function* f1, function* f2) {
    if (n_basic_blocks_for_fn(f1) != n_basic_blocks_for_fn(f2)) 
        return false;

    basic_block bb1, bb2;
    FOR_ALL_BB_FN(bb1, f1) {
        bb2 = BASIC_BLOCK_FOR_FN(f2, bb1->index);
        if (!bb2) return false;
        
        BBStructure s1 = analyze_block_structure(bb1);
        BBStructure s2 = analyze_block_structure(bb2);
        
        if (s1.num_edges != s2.num_edges || 
            s1.successor_indices != s2.successor_indices) {
            return false;
        }
    }
    return true;
}



2. GIMPLE Semantic Normalization

std::string normalize_gimple(gimple* stmt) {
    std::stringstream ss;
    ss << gimple_code_name[gimple_code(stmt)] << "|";
    
    for (unsigned i = 0; i < gimple_num_ops(stmt); ++i) {
        tree op = gimple_op(stmt, i);
        if (op && TREE_CODE(op) == SSA_NAME) {
            // Ignore SSA name differences
            ss << "SSA:" << TREE_CODE(TREE_TYPE(op)) << "|";
        } else if (op) {
            ss << TREE_CODE(TREE_TYPE(op)) << "|";
        }
    }
    return ss.str();
}

bool compare_gimple_sequences(function* f1, function* f2) {
    std::vector<std::string> seq1, seq2;
    
    basic_block bb;
    FOR_EACH_BB_FN(bb, f1) {
        for (gimple_stmt_iterator gsi = gsi_start_bb(bb); 
             !gsi_end_p(gsi); gsi_next(&gsi)) {
            seq1.push_back(normalize_gimple(gsi_stmt(gsi)));
        }
    }
    
    FOR_EACH_BB_FN(bb, f2) {
        for (gimple_stmt_iterator gsi = gsi_start_bb(bb); 
             !gsi_end_p(gsi); gsi_next(&gsi)) {
            seq2.push_back(normalize_gimple(gsi_stmt(gsi)));
        }
    }
    
    return seq1 == seq2;
}


Phase 3: Decision Logic

Pruning Decision Workflow


void compare_clones(const std::string& base, function* f1, function* f2) {
    bool cfg_match = compare_cfg(f1, f2);
    bool gimple_match = compare_gimple_sequences(f1, f2);
    
    fprintf(dump_file, "[%s] CFG Match: %d, GIMPLE Match: %d\n",
            base.c_str(), cfg_match, gimple_match);
    
    if (cfg_match && gimple_match) {
        fprintf(dump_file, "PRUNE: %s\n", base.c_str());
    } else {
        fprintf(dump_file, "NOPRUNE: %s\n", base.c_str());
    }

Test Results on x86_64

1. Expected NOPRUNE Case

[tpatel103@x86-001 test-clone]$ cat clone-test-x86-noprune-clone-test-core.c.265t.tpatel103

;; Function scale_samples (scale_samples.avx2)
---------------------------------------------------------
Basic Block Counts:
- scale_samples.avx2: 5 blocks
- scale_samples.default: 31 blocks

GIMPLE Statements:
- AVX2 variant: 24 (VECTORIZED)
- Default: 165 (SCALAR)

Control Flow Mismatch: Edge count in BB3 differs
NOPRUNE: scale_samples
2. PRUNE Case (Unexpected Result)

[tpatel103@x86-001 test-clone]$ cat clone-test-x86-prune-clone-test-core.c.265t.tpatel103

;; Function scale_samples (scale_samples.sse4)
---------------------------------------------------------
Basic Block Counts:
- scale_samples.sse4: 5 blocks
- scale_samples.default: 31 blocks

GIMPLE Mismatch: SSA_NAME (_3 vs _7) in statement 11
NOPRUNE: scale_samples

AArch64 Test Results

1. Expected NOPRUNE Case

[tpatel103@aarch64-002 test-clone]$ cat clone-test-aarch64-noprune-clone-test-core.c.265t.tpatel103

;; Function scale_samples (scale_samples.sve2)
---------------------------------------------------------
Basic Block Counts:
- scale_samples.sve2: 5 blocks
- scale_samples.default: 32 blocks

GIMPLE Statement Counts:
- scale_samples.sve2: 24 statements
- scale_samples.default: 130 statements

GIMPLE Code Mismatch: Statement 14 (SIMD_LOOP vs LOOP)
NOPRUNE: scale_samples

2. PRUNE Case (Unexpected NOPRUNE)

[tpatel103@aarch64-002 test-clone]$ cat clone-test-aarch64-prune-clone-test-core.c.265t.tpatel103

;; Function scale_samples (scale_samples.rng)
---------------------------------------------------------
Basic Block Counts:
- scale_samples.rng: 5 blocks  
- scale_samples.default: 32 blocks

GIMPLE Statement Counts:
- scale_samples.rng: 24 statements
- scale_samples.default: 130 statements

Operand Type Mismatch: Statement 8 (V4SI vs V2DI)
NOPRUNE: scale_samples

Identified Issues

1. SSA Name Sensitivity

// Original comparison logic (fragile to SSA differences)
if (gimple_op(s1, 0) != gimple_op(s2, 0)) return false;

// Should compare types instead:
tree type1 = TREE_TYPE(gimple_op(s1, 0));
tree type2 = TREE_TYPE(gimple_op(s2, 0));
if (!types_compatible_p(type1, type2)) return false;
2. Basic Block Counting Error
- Counted empty/unreachable blocks
+ Used FOR_EACH_BB_FN macro with reachability check
FOR_EACH_BB_FN(bb, func) {
+  if (!bb->reachable) continue; 
   count++; 
}
3. Architecture-Suffix Handling

// x86_64 missed ".arch_" prefix variants
if (variant.find(".arch_") != string::npos)  // Fixed detection
   register_x86_clone();

Unresolved Challenges

1. Vectorization Artifacts

  • AVX/SVE clones contained loop unrolling not present in default versions
// AVX2 variant (4x unrolled)
_1 = a[0] * b[0];
_2 = a[1] * b[1];  // Not present in scalar
2. Target-Specific Optimizations
  • AArch64 added SVE predicate registers (pg<num>) unrecognized by generic pass
// SVE-specific operand
_5 = .MASK_LOAD (ptr, pg_4);
3. Debug Limitations
  • GCC internal representation made tracing differences time-consuming
# Required deep GIMPLE dumps
-fdump-tree-all -fdump-rtl-all

Conclusion & Next Steps

While the clone detection logic worked reliably across architectures (scale_samples.sve2, scale_samples.avx2), the comparison engine failed to recognize semantically equivalent functions due to:


1. Over-sensitivity to temporary variable names

2. Lack of vectorization pattern normalization

3. Architecture-specific operand handling

Immediate Action Plan:

1. Consult with Professor Tyler on SSA name normalization techniques

2. Implement type-based operand comparison

3. Add architecture-specific GIMPLE pattern matchers

Comments

Popular posts from this blog

Lab-1: Exploring 6502 Assembly

Project Stage III: Multi-Clone Analysis in GCC Pass

Lab 2 - 6502 Math Lab