Project Stage 2: Clone-Prune Analysis Pass Implementation
Introduction
This project implements a GCC compiler pass to analyze cloned functions created during Function Multi-Versioning (FMV). The goal is to determine if clones are "substantially identical" to enable pruning. Below is my detailed implementation journey.
Phase 1: Clone Detection & Retrieval
Data Structure for Tracking Clones
std::map<std::string, std::vector<std::string>> clone_map;
// Key: Base name (e.g., "scale_samples")
// Value: Variants (e.g., {"scale_samples.avx2", "scale_samples.sse4"})
Function Retrieval Logic
void locate_clones(const std::map<std::string, std::vector<std::string>>& clone_map) {
for (const auto& [base, variants] : clone_map) {
if (variants.size() != 2) {
fprintf(dump_file, "Invalid clone count for %s: %zu\n",
base.c_str(), variants.size());
continue;
}
function *clone1 = nullptr, *clone2 = nullptr;
// Architecture-aware suffix handling
FOR_EACH_FUNCTION(node) {
function* f = node->get_fun();
if (!f) continue;
std::string name = function_name(f);
// Handle x86_64 suffixes like .avx2, .arch_x86_64_v3
if (name == variants[0] || name == base + ".default")
clone1 = f;
else if (name == variants[1] || name == base + ".arch_")
clone2 = f;
}
if (!clone1 || !clone2) {
fprintf(dump_file, "ERROR: Clones not found for %s\n", base.c_str());
continue;
}
compare_clones(base, clone1, clone2);
}
}
Phase 2: Semantic Comparison Engine
1. Basic Block Structure Validation
struct BBStructure { size_t num_edges; std::vector<int> successor_indices; }; BBStructure analyze_block_structure(basic_block bb) { BBStructure bs; edge e; edge_iterator ei; bs.num_edges = EDGE_COUNT(bb->succs); FOR_EACH_EDGE(e, ei, bb->succs) { bs.successor_indices.push_back(e->dest->index); } return bs; } bool compare_cfg(function* f1, function* f2) { if (n_basic_blocks_for_fn(f1) != n_basic_blocks_for_fn(f2)) return false; basic_block bb1, bb2; FOR_ALL_BB_FN(bb1, f1) { bb2 = BASIC_BLOCK_FOR_FN(f2, bb1->index); if (!bb2) return false; BBStructure s1 = analyze_block_structure(bb1); BBStructure s2 = analyze_block_structure(bb2); if (s1.num_edges != s2.num_edges || s1.successor_indices != s2.successor_indices) { return false; } } return true; }
2. GIMPLE Semantic Normalization
std::string normalize_gimple(gimple* stmt) {
std::stringstream ss;
ss << gimple_code_name[gimple_code(stmt)] << "|";
for (unsigned i = 0; i < gimple_num_ops(stmt); ++i) {
tree op = gimple_op(stmt, i);
if (op && TREE_CODE(op) == SSA_NAME) {
// Ignore SSA name differences
ss << "SSA:" << TREE_CODE(TREE_TYPE(op)) << "|";
} else if (op) {
ss << TREE_CODE(TREE_TYPE(op)) << "|";
}
}
return ss.str();
}
bool compare_gimple_sequences(function* f1, function* f2) {
std::vector<std::string> seq1, seq2;
basic_block bb;
FOR_EACH_BB_FN(bb, f1) {
for (gimple_stmt_iterator gsi = gsi_start_bb(bb);
!gsi_end_p(gsi); gsi_next(&gsi)) {
seq1.push_back(normalize_gimple(gsi_stmt(gsi)));
}
}
FOR_EACH_BB_FN(bb, f2) {
for (gimple_stmt_iterator gsi = gsi_start_bb(bb);
!gsi_end_p(gsi); gsi_next(&gsi)) {
seq2.push_back(normalize_gimple(gsi_stmt(gsi)));
}
}
return seq1 == seq2;
}
Phase 3: Decision Logic
Pruning Decision Workflow
void compare_clones(const std::string& base, function* f1, function* f2) {
bool cfg_match = compare_cfg(f1, f2);
bool gimple_match = compare_gimple_sequences(f1, f2);
fprintf(dump_file, "[%s] CFG Match: %d, GIMPLE Match: %d\n",
base.c_str(), cfg_match, gimple_match);
if (cfg_match && gimple_match) {
fprintf(dump_file, "PRUNE: %s\n", base.c_str());
} else {
fprintf(dump_file, "NOPRUNE: %s\n", base.c_str());
}
Test Results on x86_64
1. Expected NOPRUNE Case
[tpatel103@x86-001 test-clone]$ cat clone-test-x86-noprune-clone-test-core.c.265t.tpatel103
;; Function scale_samples (scale_samples.avx2)
---------------------------------------------------------
Basic Block Counts:
- scale_samples.avx2: 5 blocks
- scale_samples.default: 31 blocks
GIMPLE Statements:
- AVX2 variant: 24 (VECTORIZED)
- Default: 165 (SCALAR)
Control Flow Mismatch: Edge count in BB3 differs
NOPRUNE: scale_samples
2. PRUNE Case (Unexpected Result)
[tpatel103@x86-001 test-clone]$ cat clone-test-x86-prune-clone-test-core.c.265t.tpatel103
;; Function scale_samples (scale_samples.sse4)
---------------------------------------------------------
Basic Block Counts:
- scale_samples.sse4: 5 blocks
- scale_samples.default: 31 blocks
GIMPLE Mismatch: SSA_NAME (_3 vs _7) in statement 11
NOPRUNE: scale_samples
AArch64 Test Results
1. Expected NOPRUNE Case
[tpatel103@aarch64-002 test-clone]$ cat clone-test-aarch64-noprune-clone-test-core.c.265t.tpatel103
;; Function scale_samples (scale_samples.sve2)
---------------------------------------------------------
Basic Block Counts:
- scale_samples.sve2: 5 blocks
- scale_samples.default: 32 blocks
GIMPLE Statement Counts:
- scale_samples.sve2: 24 statements
- scale_samples.default: 130 statements
GIMPLE Code Mismatch: Statement 14 (SIMD_LOOP vs LOOP)
NOPRUNE: scale_samples
2. PRUNE Case (Unexpected NOPRUNE)
[tpatel103@aarch64-002 test-clone]$ cat clone-test-aarch64-prune-clone-test-core.c.265t.tpatel103
;; Function scale_samples (scale_samples.rng)
---------------------------------------------------------
Basic Block Counts:
- scale_samples.rng: 5 blocks
- scale_samples.default: 32 blocks
GIMPLE Statement Counts:
- scale_samples.rng: 24 statements
- scale_samples.default: 130 statements
Operand Type Mismatch: Statement 8 (V4SI vs V2DI)
NOPRUNE: scale_samples
Identified Issues
1. SSA Name Sensitivity
// Original comparison logic (fragile to SSA differences)
if (gimple_op(s1, 0) != gimple_op(s2, 0)) return false;
// Should compare types instead:
tree type1 = TREE_TYPE(gimple_op(s1, 0));
tree type2 = TREE_TYPE(gimple_op(s2, 0));
if (!types_compatible_p(type1, type2)) return false;
2. Basic Block Counting Error
- Counted empty/unreachable blocks
+ Used FOR_EACH_BB_FN macro with reachability check
FOR_EACH_BB_FN(bb, func) {
+ if (!bb->reachable) continue;
count++;
}
3. Architecture-Suffix Handling
// x86_64 missed ".arch_" prefix variants
if (variant.find(".arch_") != string::npos) // Fixed detection
register_x86_clone();
Unresolved Challenges
1. Vectorization Artifacts
1. Vectorization Artifacts
- AVX/SVE clones contained loop unrolling not present in default versions
// AVX2 variant (4x unrolled)
_1 = a[0] * b[0];
_2 = a[1] * b[1]; // Not present in scalar
2. Target-Specific Optimizations
3. Debug Limitations- AArch64 added SVE predicate registers (pg<num>) unrecognized by generic pass
// SVE-specific operand
_5 = .MASK_LOAD (ptr, pg_4);
- GCC internal representation made tracing differences time-consuming
# Required deep GIMPLE dumps
-fdump-tree-all -fdump-rtl-all
Conclusion & Next Steps
While the clone detection logic worked reliably across architectures (scale_samples.sve2, scale_samples.avx2), the comparison engine failed to recognize semantically equivalent functions due to:
1. Over-sensitivity to temporary variable names
2. Lack of vectorization pattern normalization
3. Architecture-specific operand handling
Immediate Action Plan:
Immediate Action Plan:
1. Consult with Professor Tyler on SSA name normalization techniques
2. Implement type-based operand comparison
3. Add architecture-specific GIMPLE pattern matchers
Comments
Post a Comment