Language systems
================

* How to run a program?
* Which programming languages are usually compiled, virtualized or interpreted?
* Why is a language compiled or interpreted (virtualized)?

1) What is the life cycle of a program?
  - Write the program using a text editor.
  - Compile the program to machine code:
    - Compile the program to assembly.
    - Translate assembly to machine code.
  - Link with libraries.
  - Load the program in memory, and replace names with addresses.
  - Debug the program.

The classical sequence:

  [editor] --> source file --> [preprocessor] --> preprocessed source file -->
  [compiler] --> assembly language file --> [assembler] --> object file -->
  [linker] --> executable file --> [loader] --> running program in memory

1.1) What is assembly code?

1.2) What is machine code?

1.3) What does the gcc command really do?
In the example below, let's consider the following C file:

  /* cube.c */
  #include <stdio.h>
  #define CUBE(x) (x)*(x)*(x)
  int main() {
    int i = 0;
    int x = 2;
    int sum = 0;
    while (i++ < 100) {
      sum += CUBE(x);
    }
    printf("The sum is %d\n", sum);
  }

1.3) How to produce the preprocessed code from a C file?

  gcc -E cube.c > cube.p.c
  - or -
  cpp cube.c -o cube.p.c

1.3.1) Why does it have so many lines?
1.3.2) Which declarations will be in these lines?
1.3.3) What if I remove the #include from the program?

1.4) How to produce an assembly program from a C file?

  gcc -S cube.p.c
  - or in cpp -
  /usr/libexec/gcc/i686-pc-linux-gnu/4.1.2/cc1 cube.p.c -o cube.p.s
  - or in rhyme or doce.grad -
  /usr/lib/gcc/i486-linux-gnu/4.4.3/cc1 cube.p.c -o cube.p.s
  - or in opencl -
  /usr/lib/gcc/x86_64-linux-gnu/9/cc1 cube.p.c -o cube.p.s

1.4.1) Where is "The sum is %d"?
1.4.2) Where is the loop?
1.4.3) Is this program efficient?

1.5) How to produce an object file?

  as cube.p.s -o cube.o

1.5.1) Where is "The sum is %d"?
1.5.2) Where is printf?

1.6) How to link with the external libraries?
  /* In opencl */
  /usr/lib/gcc/x86_64-linux-gnu/9/collect2 \
    /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/Scrt1.o \
    /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/crti.o \
    -L/usr/lib/gcc/x86_64-linux-gnu/9 cube.o -lc -o cube.exe

1.6.1) Why is the executable so much larger than the object?
1.6.2) What are the differences between the object and the executable?

1.7) Could you use clang instead?

  $> cpp cube.c -o cube.tu  # Here, the compiler uses the extension.
  $> clang -cc1 -std=c89 -o cube.o cube.tu

2) Compilers do some optimizations. How to optimize the code below?

  int i = 0;
  while (i < 100) {
    a[i++] = x*x*x;
  }

  /* In cpp */
  $> /usr/libexec/gcc/i686-pc-linux-gnu/4.1.2/cc1 cube.p.c -o cube.p.s -O1
  /* In rhyme */
  $> /usr/lib/gcc/i486-linux-gnu/4.4.3/cc1 cube.p.c -o cube.p.s -O1

2.1) What is this new program doing?

2.2) Is there any disadvantage in doing some optimization?
  - It makes it harder to see what assembly is produced for each statement.

2.3) There are many levels of optimization. What are the differences between
these levels?

Example:

2.2) What will be produced by gcc for the program below?

  #include <stdio.h>
  int main() {
    int i = 7;
    int* p = &i;
    *p = 13;
    printf("The value of i = %d\n", i);
  }

Compare it with what gcc does for:

  #include <stdio.h>
  int main() {
    printf("The value of i = %d\n", 13);
  }

E.g.: compile both with -S, and then with -S -O1.

2.3) Which data structures does the compiler use to optimize a program?

3) Are all the assembly programs the same?

3.1) Does a program written in C compile to the same assembly as a program
written in SML?

3.2) When are assembly programs different?
  - When we compile to different computer architectures.

3.3) What is a computer architecture?
  - Hardware specification
  - Instruction set.

3.4) Give examples of computer architectures.
  /* In some OSX */
  $> clang -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -Xclang -disable-O0-optnone -c -emit-llvm cube.c -o cube.bc
  $> llc cube.bc -march=arm -o cube.arm.s
  $> llc cube.bc -march=ppc32 -o cube.ppc32.s
  $> llc cube.bc -march=mips -o cube.mips.s

4) What is the way to execute a program other than compiling?
  - With an interpreter.

4.1) Can you write a bash script to list all the files in a directory?

  #!/bin/bash
  # Iterate over the entries of the current directory.
  for f in *; do
    echo "File -> $f"
  done

5) Any other way?
  - With a virtual machine.

5.1) What is the difference between an interpreter and a virtual machine?

5.2) What are the advantages of virtual machines?
  - portability.
  - security.
  - profiling.

5.3) What is a famous virtual machine that you know?
  - The Java virtual machine (JVM), which browsers used to embed to run applets.

5.3.1) How to compile a simple Java program?

  public class Cube {
    public static void main(String args[]) {
      int x = 2;
      int sum = 0;
      for (int i = 0; i < 100; i++) {
        sum += x * x * x;
      }
      System.out.println("The sum is " + sum);
    }
  }

  $> javac Cube.java
  $> javap -c Cube.class

5.3.2) What is a '.class'?

To sum it up:

  Interpreters      <-- high-level language
  Virtual machines  <-- intermediate language
  Hardware          <-- machine code produced by a compiler

6) After linking, the size of a program grows considerably. How to avoid this
problem?
  - Use dynamic linking.

6.1) How does dynamic linking work?

6.2) What is the name of dynamic link libraries in Windows?
  .dll

6.3) What about in Unix?
  .so

6.4) How does dynamic linking work in Java?

6.5) What are the advantages of dynamic libraries?
  - multiple programs can share code in memory.
  - library code can be updated separately.
  - avoids loading code that is never used.

7) What is a just-in-time compiler?
  - It compiles code while interpreting it.

8) Where is the implementation of printf in the executable of cube.c?
  - Dynamic linking:
    $> gcc cube.c -o dyn.exe
    $> objdump -x dyn.exe | grep printf  # show all the headers
    $> objdump -d dyn.exe | grep printf  # disassemble the program

  - Static linking:
    $> gcc -static cube.c -o static.exe
    $> objdump -x static.exe
    $> objdump -d static.exe
    $> ls *.exe
    7140 dyn.exe
    577945 static.exe

BINDING

8) In our example program:

  int i;
  void main() {
    for (i=1; i<=100; i++)
      fred(i);
  }

When is each of the following bound?

  What set of values is associated with int?
    - language implementation time as in C, or language specification time
      as in Java.
  What is the type of fred?
    - compile time.
  What is the address of the object code for main?
    - load time.
  What is the implementation of fred?
    - link time.
  What is the value of i?
    - runtime.

The binding times are:

  Language definition time
  Language implementation time
  Compile time
  Link time
  Load time
  Runtime

8.1) What is defined during language definition time?
  - meaning of keywords.

8.2) What is defined during language implementation?
  - range of values of int in C (but not in Java)

8.3) What is defined during compilation time?
  - type of variables.

8.4) What is defined during link time?
  - code of external functions.

8.5) What is defined during load time?
  - Memory location of code, data, etc.

8.6) What is defined during run time?
  - Value of variables.
  - Type of variables in Perl, JavaScript, Lisp, etc.

9) What is a debugger?

9.1) What is good debugging information?
  - where the program is executing,
  - the trace of execution,
  - the value of variables.

Example: Gdb Tutorial

  g++ -g buggy.cc
  gdb a.out
  $> run
  $> where
  #0 0x08048a32 in Node::next (this=0x0) at buggy.cc:28
  #1 0x08048d27 in LinkedList::remove (this=0x804c008, item_to_remove=@0xbfc51450) at buggy.cc:77
  #2 0x08048969 in main () at buggy.cc:120
  $> x 0xbfc51450  // shows what is stored at item_to_remove

10.1) Why can't I simply do $> p item_to_remove ?
  $> bt
  $> break LinkedList::remove
  $> condition 1 item_to_remove==1

The role of the operating system:
  - giving memory to the linker,
  - taking care of memory management,
  - interface with the world: I/O

Example: LLDB Tutorial

  // Find the problem with the program below:
  //
  #include <assert.h>
  #include <stdio.h>
  #define ARRAY_SIZE 4
  int main() {
    int x[5] = {2, 3, 5, 7};
    int sum = 0;
    for (int i = ARRAY_SIZE - 1; i > 0; --i) {
      sum += x[i];
    }
    assert(sum == 17);
    return 0;
  }

  $> clang -g ch0.c
  $> lldb a.out
  (lldb) b main
  (lldb) run
  (lldb) next
  (lldb) display sum
  (lldb) next
  (lldb) display i
  (lldb) next
  ...

To sum it up:

  Interpreter: swipl
  Compiler: gcc
  Assembler: as
  Linker: ld
  Loader: operating system.
  Virtual Machine: JVM
  JIT compiler: browser compiling applets.

===============================================================================
Examples of execution of different programming languages
===============================================================================

* Compiling C with LLVM:
========================

  $> setllvm
  $> echo "int main() {return 42;}" > test.c
  $> $LLVM/clang -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk test.c
  $> ./a.out
  $> echo $?

  $> vim identity.c
  void identity(int** a, int N) {
    int i, j;
    for (i = 0; i < N; i++) {
      for (j = 0; j < N; j++) {
        a[i][j] = 0;
      }
    }
    for (i = 0; i < N; i++) {
      a[i][i] = 1;
    }
  }

  $> $LLVM/clang -cc1 -ast-view file.c
  $> $LLVM/opt -dot-cfg file.ll
  $> cat .identity.dot
  $> dot -Tpdf .identity.dot -o identity.pdf

* Using rust:
=============

  $> vim fact.rs
  fn fact(n: u32) -> u32 {
    if n < 2 {
      1
    } else {
      n * fact(n - 1)
    }
  }

  fn main() {
    println!("Fact(10) = {}", fact(10));
  }

  // Which programming language is this one?
  $> rustc fact.rs
  $> ./fact
  $> rustc --emit llvm-ir fact.rs
  $> $LLVM/opt -dot-cfg fact.ll
  $> ls -la .*.dot | grep fact
  $> dot -Tpdf .*fact*fact*.dot -o fact.pdf
  $> open fact.pdf

  - or -

  $> echo 'fn main() {println!("Hello, World");}' > hello.rs
  $> rustc hello.rs
  $> ./hello
  $> rustc hello.rs --emit=llvm-ir
  $> dot -Tpdf .main.dot -o main.pdf

* Using Julia
=============

  $> julia
  julia> function fact(x::Int)
           if x < 2
             return 1
           else
             return x * fact(x-1)
           end
         end
  julia> fact(10)
  3628800
  julia> code_llvm(fact, (Int,))
  define i64 @julia_fact_195(i64 signext %0) #0 {
  ...
  }
  julia> code_native(fact, (Int,))

* Using Java and the JVM
========================

  $> vim T.java
  public class T {
    public static void main(String args[]) {
      System.out.println("Hello, World!");
    }
  }
  $> javac T.java
  $> java T

* Using Kotlin and the JVM
==========================

  $> mkdir kt
  $> cd kt
  $> vim hello.kt
  fun main(args: Array<String>) {
    println("Hello, World!")
  }
  $> kotlinc hello.kt -include-runtime -d hello.jar
  $> java -jar hello.jar

* Using Scala and the JVM
=========================

  $> vim HelloWorld.scala
  object HelloWorld {
    def main(args: Array[String]): Unit = {
      println("Hello, world!")
    }
  }
  $> scalac HelloWorld.scala
  $> scala HelloWorld