Seminaire on Optimization

Dagstuhl Sept. 18-23

1  Is Code Optimization Relevant? (B. Pugh)

Bill Pugh was ill. Instead of that discussion to warm up about this topic.

Moores law:

Proebstings law of optimization:

2  Optimization Challenges in Object Code Translation (Stadel, Siemens)

In OCT almost everything is dynamic: indirect addressing always.

3  Generic Postpass Optimization (R. Wilhelm)

nice framework CRL for generic analyses

Uses ILP. worst case execution times. estimation of bounds.

TDL target description language for target description.

Postpass techniques seem to be a la mode since they do not touch the compiler

4  Retargetable Code Optimization by ILP (D. Kästner, Saarbrücken)

AVIV Hanano, ICG Bashford/leupers 99 Search techniques

PROPAN uses ILP to integrate several techniques (RA, IS, unit binding)

time or order-based ILP.

constraints that model data dependencies or flow modelling, register allocations.

Number of constraints and variables is quadratic.

Is there a techique besides ILP to solve constraints?

Wolf meint, im Scheduling wären bessere Abschätzungen möglich, wie gut man an das Optimium herankommt.

5  Optimizations based on Probabilistic DFA systems (Eduard Mehofer)

first work on PLDI 96 by Ramalingan

quite different; finding out with which probability a DFA result is valuable

6  Symbolic Analysis in the Vienna Fortran Compiler

Rita, an elimination framework for solving dfa problems

7  Aßmann: Stratification of Rewrite Systems for Program Optimizations

A technique to group and order rule sets of rewrite systems, to select one of the many normal forms of an indeterministic system.

8  JVM on Jalapeno (Quicksilver compiler)

Jalapeno is a set of compilers (fast one, good one).

IBM gave up its static Java compiler since it was too since it was too complicated to introduce dynamism into it.

Quicksilver generates parametrizable binary code for Java VMs.

Uses Relocation techniques. to bind variable and function addresses

very interesting.

9  Latte - an Open Source Java JIT Compiler (Byung-sun Yang)

Good register allocation. Must be fast. Enhanced left-edge algorithm from Tucker 75. very interesting. SSA-based control flow graph. PACT99 paper. Unitof allocation and optimizuation: tree region ofblocks.

latte.snu.ac.kr

10  Tuesday morning. Bidirectionality in DFA (Udy Khedkar Pune university India)

Data flow may not follow all paths, therefor bidirectionality is advantageous.

Critique on Muchnicks book because it bans bidirectional flows...

Distinguish node and edge flow functions. That give a general framework, from which unidirectional analyses are special cases.

Concept of graph width helps to improve the complexity estimate for bidirectional flow. Toplas 94. pldi 93.

PRE is not true bidirectional; true bidirectional problems cannot be made unidirectional.

Rüthing has also a paper in SAS 98.

It is the same problems as in AGs, replacing unificationby passes.

Assignment motion introduces bidirectional flows.

11  Dynamci Currency Determination in Optimized Programs (DanJ Dhamdere, Bombay)

(Debugging optimized programs). One needs to build execution histories, trajectories.

12  Mobile Code (M. Franz)

Huffman-encoding of the program. The most probabble program gets the least bits.

13  W. Amme Typed SSA

idea: Annotate type info to ssa-language to be more sure about effects.

14  David Gregg, University of Vienna Global Instruction Scheduling

Experiments in global scheduling, trace scheduling, superblock scheduling

in the Cameleon compiler of IBM YH, many optimizations

15  C. Kessler: Local Instruction Scheduling for Minimum Register Need

Overview on optimal scheduling algorithms for basic blocks. Search-based, topsort based (wavefront), ILP based algorithms. Important for Boris.

has an equivalence class construction, folding common subtrees of schedules.

then saving only the minimal cost paths for wavefronts.

Induction over wavefronting.

Using wavefronts for optimal problems! for ``being ready'' problems

Whats the difference between dynamci programming and wavefront? the former is cost-based.

Wolf: scheduling of parallel machines is similar.

16  E. Zehendner: Compiler-controlled dual-path branch execution

multithreaded processors.

Use threads for speculative execution of branches. Kill other branch as soon as possible. Make a common project from firm/klirflow.

17  O. Rüthing. Code-size sensitive code motion

lazy code motion plus code-size awareness.

Computing down-safety closure. For those nodes thatt do not introduce code replications in insertion. then compute a bipartite graph and from that a tight set from which the insertion points can be guessed.

18  R. Amir Decreasing the Cost of ArrayBound Check in Java

Extend JVM and raise VM Exceptions instead of explicit checks..

Array of array bounds (array protectioin register), 2 new instructions protected-load and protected-store use this array to raise an exception. Validity check is performed in parallel to the memory operation.

Special optimization for APregisters, to optimize their allocation

insertion of set_bound instructions, just before ploads, then moving them with code motion

AP registers are virtual, i.e. must be allocated, and spilling means software checking instead of hardwarechecking.

19  A. Liu, SUNY Stony Brook: From From Recursion to Iteration: What are the Optimizations?

Incrementalization:

  1. identify an increment opertion of f
  2. find an algorithm of f that computes the value of f in terms of the increment and the previous value

Incrementalization is the opposite of finding a closed form for recurrences

The trick is to use inverses or associativity. I dont like her classification, it is without names and not precise enough:

  1. pointer reversal for elimination of stacks
  2. backtracking on data construction
  3. update in-place without heap allocation.

In case of multiple recursions, use for incrementalization that recursion with the minimal put change

Was bedeutet das in Termalgebra??

20  E. Düsterwald HP Cambridge: Adaptive Dynamic Comilation Systems

Project dynamo at HP

Similar: Redstone/Wiggins at Compaq

Goal: adapting the binary to its execution environment

principle heuristic: keep the dynamic optimizations simple (hard testing,debugging, long runtime of complicated algorithms)

looking for hot program paths.

  1. path Profile based prediction

    1. bit tracing: memorize control flow conditions with bitstrings. space explosion for recording the paths.

  2. NET prediction scheme (next execution tail prediction): instrumentation counters only for targets of backwards edges. Saves time and code space.

Comparison.  

Path profiling metric. Compute hit rate and noise. Optimize noize and reduce hit rate.

Noise is worse with NEt, but the rest is the same or better.

Profiling for longer intervals does not improve prediction wuality due to rising missed opportunity cost

roughly identical hit rates.

Dynamo is part of the runtime system. dynamo looks lke a software interpreter that executes the same instruction set as the underlying cpu.

  1. Start in interpretation mode.
  2. get trace selector to select a trace.
  3. emit trace to code cache, continue. all code in the cache is comiled, and the more code is found in the cache, the more it is executed. (on-demand jit compiler).

Related work: boa binary translator IBM

How can that be used for dynamic reconfiguration of architectural code?

21  Anton Ertl: Optimizations during code selecting Tree parsing

some classes of optimizations in backend specifications. Disappointing.

22  Thursday morning: M. Hailpern: Coalesing as an Aid to Register allocation interference-graph coloring

Coalescing can increase the chromatic number if an inference graph, if dependency graphs are rebuilt.

23  Santosh Pande: Dynamic Register Optimizations.

Improving the linear Scan register allocation algorithm with splitting live ranges. Use the usage density. Looks very interesting as fast nice allocator.

24  C. Ferdinand Postpass code size compression for Infineon 80C166

almost 30% of reduction of code size.

25  F. Martin: Opt. for segmented memory architectures

26  J. Larus Cache-Concious Compilation: Can Compilers Hack it?

Cache-conscious data structures (Larus/chilimbi)

Locality enhanced servers.

Techniques:

  1. Clustering
  2. Coloring
  3. Compression

Aber: das alles reicht nicht, man sollte neue Sprachen entwickeln, die cachesensitiv sind. Wär das nicht was für AOP?

27  F. Mueller Playdoh, EPIC and what next?

new algorithm for node splitting in irreducible graphs

works with DJ-graphs to recognize regions in VPO compiler

28  ATAIR Software Wien: DSP Optimizations

interesting method to save def-use-analysis for dsps

first for basic bllocks calculate ``delta values''

for procedures be sure by a dfa framework that guarantees that delta values are the same for neighbored basic blocks.

29  A. Mycroft: A statically allocated parallel functional programming language

FLaSH project. first-order language with tail recursion. Conflicts if a function is called twice - then the first result must be memorized in order to reuse it again.

30  Jim Dehnert: Induction pointer prefetching

Good compilers should only prefetch that data which will actually be used.

Finding pointer increment variables

Does a profitability analysis, based on a recurrence measure and estimated memory bandwidth.

Builds a Miss Interference Graph

31  Jim Dehnert: The SGI 64 infrastructure

VERY GOOD: 10 years of development. Gnu Public license for Linux IA 64.

Phases:

  1. interprocedural analysis
  2. loop nest optimization and parallelization
  3. global optimization

IRs: very high (incl. fortran 90), high, mid, low WHIRL intermediate representation.

gcc, g++ are the base.

IPA:

  1. call tree, alias, array sections
  2. inlining, cloning, dead elimination, constants, code and data layout

ipa optimizations are re-called by the linker before linking.

loop nest optimizer mit cost model

Parallelization

global optimizations: SSA (version of CC96), all traditional optimizations

design for testability and debugability.

backend goes back to quadrupels.

target information table, several processors modeled IA-63, IA-32, MIPS

Hyperblock optimizer, based on Mahlke

predicate query subsystem make queries about predicate values

loop preparation for sw pipelining

get it:

  1. oss.sgi.com/projects/Pro64
  2. reality.sgi.com/dehnert_engr/papers/index.html
  3. www.capsl.udel.edu/...pro64

32  Discussion sessions

32.1  Unifying theories

  1. abstr. interpretations
  2. data-flow frameworks
  3. logic-based analysis
  4. Tarjans graph expressions, graph algorithms
  5. model checking: computing validity of predicates at graph nodes
  6. ILP and constraint solving
  7. polytope model
  8. collection of transformation techniques without analysis
  9. rewriting
  10. unimodular transformations
  11. systematic program transformation techniques
  12. memoization
  13. finite differenzing, incrementalization

32.2  Problems in Optimization

  1. Interaction of algorithms how do they harmonize?
  2. Lack of metrics
  3. sound basis for evaluation of unkown optimization work - benchmarks
  4. Compiler constructors have lost their links tto other fields
  5. common infrastructure? (suif: buggy from grad students, high learning curve)
  6. Are simple optimizations worthwhile?
  7. simulators are missing
  8. why does Linux work? and not optimization frameworks?
  9. test suites (ADA) nullstone.com als open source project? Test frameworks for C
  10. a bug data base in compoilers, to remember them

32.3  Teaching - How

  1. transformations only ala bacon
  2. finite differencing and optimization
  3. memoization

32.4  Challenges (friday morning)

Anton Ertl.

  1. Architectures and compilers should come togehter.
  2. one instance: value prediction., speculation.

Sam Mifkin, IBM

  1. low levew targeted to architectures

    1. embedded, smarter than macros
    2. performance predictable from source
    3. like P-DSP (with parallel constructs)

  2. efficient, high level languages

    1. extensions of K&P

Wolf: DSP language is a language to connect domain-specific libraries.

Jan-Erik /Swedes AIR:

  1. time to market is important
  2. optimize for shorter develompent time (abstraction, ..)

Max Hailperin:

  1. explain why heuristics come so near to the optimum, although it is NP complete.
  2. Wolf: instance complexity theory is working on special instances

Jim Dehnert:

  1. Irregular data structures.

S. Pande:

Dynamic optimizations, trends inarchitecture research. different measures: power, code size, incremental (scaling) optimizations.

D. Dhamdere: Execution histories. also for dynamic slicing.


File translated from TEX by TTH, version 2.53.
On 28 Sep 2000, 15:58.