Community Bonding Period with Compiler Research Group at Princeton University

27 May 2024
tech,
cern

May 04, 2024 - May 27, 2024

Submitted a PR to the Compiler Research Group GitHub repo to add myself to the Compiler Research team.
Got added to the Google and CERN-HSF contributors mailing lists.
Interacted virtually with other CERN contributors and mentors; shared ideas and thoughts on my project and progress.
Successfully set up my Payoneer account for the GSoC stipend.
Learnt, with some struggle, to set up a legacy codebase: faced build errors and deprecation warnings, raised build issues #232 and #223, and opened doc-fix PRs.
While setting up my project, I faced issues with the Linux and Nvidia driver support ecosystem, as well as conflicts caused by different versions of C++ on Ubuntu.
Attended the Google Summer of Code Contributor's Summit (it lasted an hour, then I jumped into discussing the project with fellow contributors).

A glimpse of GSoC Contributor's Summit


Attended our first introductory CERN-HSF meeting, where contributors and mentors introduced themselves and shared their computing history and interests. We also reported our progress to our mentors.
Tried building the project on 3-4 Linux distributions, only to hit build errors at the end :0. Finally set up the whole ecosystem on Ubuntu 24.04 (I was on the verge of installing Windows, lol).
Read about the project's history, ecosystem and future by watching conference talks and going through previous interns' work.
Created my first presentation for the CERN-HSF org admins' introductory meeting for Google Summer of Code.

Some low-level fun:

Learnt about compiler construction concepts that were new to me, like name mangling:

Name mangling is a technique used by C++ compilers to encode additional information into the names of functions, classes, and other entities in order to support language features like function overloading, namespaces, and templates.

Root of all evil: The C++ standard does not specify a mangling scheme, so the exact scheme can differ between compilers. For example, GCC and Clang follow the Itanium C++ ABI, while MSVC uses its own scheme. This is one reason why object files and libraries generated by different C++ compilers cannot always be linked together easily: when the schemes differ, their mangled names are incompatible.
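To make the idea concrete, here is a minimal Python sketch of how an Itanium-style mangler (the scheme GCC and Clang follow) could encode a function name together with its parameter types. The `mangle` helper and the `BUILTIN_CODES` table are hypothetical names written for illustration only; the sketch handles plain functions, namespaces and a handful of builtin types, and ignores templates, substitutions and everything else a real compiler deals with.

```python
# Toy illustration of Itanium-style C++ name mangling.
# Assumption: only free functions, optional namespaces and a few builtin
# parameter types are supported. This is not a real compiler's mangler.

# Single-letter codes the Itanium C++ ABI uses for some builtin types.
BUILTIN_CODES = {
    "void": "v",
    "bool": "b",
    "char": "c",
    "int": "i",
    "long": "l",
    "float": "f",
    "double": "d",
}


def mangle(qualified_name: str, param_types: list[str]) -> str:
    """Return a simplified Itanium-style mangled name (hypothetical helper)."""
    parts = qualified_name.split("::")
    # Each identifier is encoded as <length><identifier>.
    encoded = "".join(f"{len(part)}{part}" for part in parts)
    if len(parts) > 1:
        # Nested (namespaced) names are wrapped in N ... E.
        encoded = f"N{encoded}E"
    # An empty parameter list is encoded as a single 'v' (void).
    params = "".join(BUILTIN_CODES[t] for t in param_types) or "v"
    return f"_Z{encoded}{params}"


# Two overloads of foo get two different symbols, which is what lets
# the linker tell them apart.
print(mangle("foo", ["int"]))                # _Z3fooi
print(mangle("foo", ["double"]))             # _Z3food
print(mangle("ns::foo", ["int", "double"]))  # _ZN2ns3fooEid
```

On a Linux system, the real mangled names in an object file can be compared against this toy output with tools such as `nm`, and turned back into readable names with `c++filt`.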

Learnt about binders, helper functions, shared libraries (.so or .dll), header files, and convenience functions or proxies.
Set up a debugger. I had to choose between GDB and LLDB and went with GDB, which is the default on my Linux system. I then had some adventures playing around with the code and the debugger, and learnt a thing or two about how low-level details are captured and shown by compiler internals. In the end, I found some CUDA code and tried running it under GDB to understand how proxies and other internals work.
Finally, the gcc-clang issue got solved by a small PR fix #234 (adding a header file to advancedcpp.h), but we got entertained with another error coming from Python 3.12.
Tested my project with different versions of Python and found some horrific errors while building it with the latest Python 3.12.2; reported them and got it updated. Voila, it still shows some errors (but luckily a lot fewer than before). Reported those to the team too. Let's see what more challenges lie ahead with future versions of the dependencies :0
Met with my project team at CERN on Meet again and reported my findings.
Finally, the issue that had been irritating me since day 1 is now fixed!! (PR fix related to it).
Read about CUDA compilation and research papers on how to add CUDA support to existing applications.
Set up the GDB Python extension to debug Python files using GDB, because GDB does not support Python out of the box. Conclusion: it is better not to use GDB for testing large Python files; it is only good for smaller tests.

Some low-level fun:

Terminologies that I learnt while understanding the codebase:
- Lazy loading: Functions/methods are loaded only when they are required at runtime (like lazy loading of lookup table information).
- Lookup table: Compilers use lookup tables/symbol tables during multiple phases of compilation, for example during type checking and scope resolution. A lookup table holds information about symbols, such as type, address, size and where they are defined, which helps report errors at compile time or at runtime.
- Type promotion: Promotion of smaller types to bigger types, for example char to int or float to double, for more precision and efficiency.
- Type casting: Conversion of a value from one type to another, either implicitly or explicitly.
- Difference between type promotion and type casting? (Need to check; if you know, do discuss with me!)
- Type checking: Checking type information using the lookup/symbol table during semantic analysis.
- Type mapping: Mapping of one type to another during the IR phase (the middle phase of a compiler), while performing transpilation or compilation.
- Type resolution: The process by which a programming language compiler or interpreter determines the specific types of variables, expressions, and functions during compilation or execution. This process is crucial for enforcing type safety, optimizing performance, and ensuring that operations are performed on compatible data types.
- Object return by value in C++: Returning an object by value means that when a function returns an object, a copy of that object is (conceptually) made and returned to the caller; in practice the compiler may move the object or elide the copy. This is in contrast to returning a reference or a pointer to the object, where no copy is made and the caller receives a reference or pointer to the original object.
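Two of the terms above, lookup tables and lazy loading, fit together nicely, so here is a small Python sketch of a scoped symbol table whose entries are only loaded on first use. This is my own toy model, not code from the project: the `SymbolTable` class, its methods and the `load_x` helper are hypothetical names chosen for illustration.

```python
class SymbolTable:
    """Toy scoped lookup table whose entries are resolved lazily."""

    def __init__(self, parent=None):
        self.parent = parent   # enclosing scope, if any
        self._loaders = {}     # name -> zero-argument function producing info
        self._cache = {}       # name -> info that has already been loaded

    def declare(self, name, loader):
        # Register how to obtain the symbol's info without computing it yet.
        self._loaders[name] = loader

    def lookup(self, name):
        # Scope resolution: search this scope first, then the enclosing ones.
        scope = self
        while scope is not None:
            if name in scope._cache:
                return scope._cache[name]
            if name in scope._loaders:
                # Lazy loading: run the loader on first use, then cache it.
                info = scope._loaders[name]()
                scope._cache[name] = info
                return info
            scope = scope.parent
        raise NameError(f"undeclared symbol: {name}")


loads = []  # records which symbols were actually loaded


def load_x():
    loads.append("x")
    return {"type": "int", "size": 4}


global_scope = SymbolTable()
global_scope.declare("x", load_x)
local_scope = SymbolTable(parent=global_scope)
local_scope.declare("y", lambda: {"type": "double", "size": 8})

print(loads)                            # [] (nothing loaded yet)
print(local_scope.lookup("x")["type"])  # int (found in the enclosing scope)
local_scope.lookup("x")                 # second lookup hits the cache
print(loads)                            # ['x'] (loaded exactly once)
```

Looking up a name that was never declared raises `NameError`, which mirrors the kind of "undeclared identifier" error that a type checker reports during semantic analysis.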
Finally, after struggling with Python-GDB debugging, I shifted to a new debugging environment: the PyCharm debugger, for tracing Python tests and debugging Python code. This avoids the situation of debugging a debugger in order to debug :)
Learnt about the Python coding practices/conventions (PEP 8, the style guide for Python code) used in the project. This forced me to go through advanced Python concepts, including list comprehensions, docstrings, decorators, generators, functional programming in Python, magic methods, testing with pytest, and type hinting as a documentation practice (better code readability and predictability; tools like mypy check that types are used consistently).
Faced issues with CUDA 12.0 support (with respect to Cppyy) on Ubuntu 24.04, so I am downgrading to Ubuntu 20/22 for CUDA 10.2 (the CUDA version Cppyy supports) in order to move forward. Conclusion: Cppyy doesn't support the latest CUDA runtime APIs because, apparently, the runtime APIs to launch kernels changed between CUDA 10.2 and 11.
Discussed extending my GSoC project from the standard 12 weeks to 22 weeks with the mentors and org admins. My request has been accepted!!
Latest: Tried running CUDA sample code in Cppyy using CUDA 10.2, but I am still facing errors. Most likely, the errors are due to incompatible GCC/Clang versions on my system. Let's see....

Conclusions

During this phase, I met interesting people with whom I shared my ideas and perspectives, and I got introduced to new ideas and domains like High Energy Physics Computing, Scientific and High Performance Computing, etc. Moreover, after going through painful setup issues, I was again reminded of the importance of well-maintained documentation, dependency management and a well-kept codebase. These are the steps that should be taken to make the setup smooth for newcomers:

Improve the documentation: add troubleshooting notes and solutions for the issues you faced during setup, and list the tools to install (mentioning the versions and what might cause compatibility breaks later).
Dependency management, with references to each dependency's official website.
Code reviewing and code coverage, something like this: https://app.codecov.io/gh/compiler-research.
Testing guide

A glimpse of CERN GSoC org admin meeting <3

