Abdelrahman Elogeel's Blog

Documenting Experience

How to get the current path/directory in PowerShell from C#

Posted by Zikas on October 8, 2011

This is a quick tutorial that explains how you can get the current directory of the PowerShell navigator (the location shown in the prompt). For example:

Navigator                  Current Directory
PS C:\Windows\System32>    C:\Windows\System32
PS C:\Program Files\>      C:\Program Files
PS E:\MyFile>              E:\MyFile

This can be done in C# using this piece of code:

using System.Management.Automation;

// Declare the class as a cmdlet and specify an
// appropriate verb and noun for the cmdlet name.
// The class derives from PSCmdlet (not Cmdlet) because
// SessionState is only available on PSCmdlet.
[Cmdlet("Get", "CurrentDirectory")]
public class GetCurrentDirectoryCommand : PSCmdlet
{
  protected override void BeginProcessing()
  {
    // Write the current path to the pipeline
    WriteObject(SessionState.Path.CurrentLocation.Path);
  }
}

Posted in Windows Power Shell

Shared Assemblies and Strongly Named Assemblies (CLR via C#)

Posted by Zikas on August 6, 2011

The CLR supports two kinds of assemblies: weakly named assemblies and strongly named assemblies.

The real difference between weakly named and strongly named assemblies is that a strongly named assembly is signed with a publisher’s public/private key pair that uniquely identifies the assembly’s publisher.

An assembly can be deployed in two ways: privately or globally. A privately deployed assembly is an assembly that is deployed in the application’s base directory or one of its subdirectories. A weakly named assembly can be deployed only privately.

A strongly named assembly has four attributes that uniquely identify it: a file name (without an extension), a version number, a culture identity, and a public key. Since public keys are very large numbers, we frequently use a small hash value derived from the public key. This hash value is called a public key token.

The following figure shows how a PE file is signed:

Because public keys are such large numbers, and a single assembly might reference many assemblies, a large percentage of the resulting file’s total size would be occupied with public key information. To conserve storage space, Microsoft hashes the public key and takes the last 8 bytes of the hashed value. These reduced public key values, known as public key tokens, are what are actually stored in an AssemblyRef table. In general, developers and end users will see public key token values much more frequently than full public key values. Note, however, that the CLR never uses public key tokens when making security or trust decisions because it is possible that several public keys could hash to a single public key token.
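As a rough illustration of the hashing step described above, here is a minimal sketch that derives a token from a public key and prints it next to the value reported by reflection. It assumes the commonly documented scheme (SHA-1 over the public key bytes, last 8 bytes of the hash, shown in reverse order); the class name PublicKeyTokenDemo and the helper ComputeToken are made up for this example.

```csharp
using System;
using System.Reflection;
using System.Security.Cryptography;

public static class PublicKeyTokenDemo
{
    // Assumption: token = last 8 bytes of SHA-1(publicKey), in reverse order.
    public static byte[] ComputeToken(byte[] publicKey)
    {
        using (SHA1 sha1 = SHA1.Create())
        {
            byte[] hash = sha1.ComputeHash(publicKey);
            byte[] token = new byte[8];
            for (int i = 0; i < 8; i++)
            {
                token[i] = hash[hash.Length - 1 - i];
            }
            return token;
        }
    }

    public static void Main()
    {
        // Use the core library as a strongly named sample assembly.
        AssemblyName name = typeof(object).Assembly.GetName();
        byte[] publicKey = name.GetPublicKey();
        if (publicKey == null || publicKey.Length == 0)
        {
            Console.WriteLine("This assembly has no public key (weakly named).");
            return;
        }

        Console.WriteLine("Public key length: {0} bytes", publicKey.Length);
        Console.WriteLine("Computed token:    {0}", BitConverter.ToString(ComputeToken(publicKey)));
        Console.WriteLine("Reported token:    {0}", BitConverter.ToString(name.GetPublicKeyToken()));
    }
}
```

The point of the comparison is only to show how much shorter the token is than the key it stands for.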

The Global Assembly Cache (GAC):

If an assembly is to be accessed by multiple applications, the assembly must be placed into a well-known directory, and the CLR must know to look in this directory automatically when a reference to the assembly is detected. This well-known location is called the global assembly cache (GAC), which can usually be found in the following directory (assuming that Windows is installed in the C:\Windows directory):

C:\Windows\Assembly

The GAC directory is structured: It contains many subdirectories, and an algorithm is used to generate the names of these subdirectories. You should never manually copy assembly files into the GAC; instead, you should use tools to accomplish this task. These tools know the GAC’s internal structure and how to generate the proper subdirectory names.

The most common tool for installing strongly named assemblies into the GAC is GACUtil.exe.

What is the purpose of “registering” an assembly in the GAC? Well, say two companies each produce an OurLibrary assembly consisting of one file: OurLibrary.dll. Obviously, both of these files can’t go in the same directory because the last one installed would overwrite the first one, surely breaking some application. When you install an assembly into the GAC, dedicated subdirectories are created under the C:\Windows\Assembly directory, and the assembly files are copied into one of these subdirectories.

Consider using delayed signing if you want to install your assemblies into the GAC in a development environment.

The figure below illustrates how the CLR resolves a referenced type:

The figure above does not apply when the referenced type is in one of the .NET Framework assemblies. In that case, the CLR loads the file that matches the CLR version.

Type Forwarding:

The CLR supports the ability to move a type (class, structure, enum, interface, or delegate) from one assembly to another. For example, in .NET 3.5, the System.TimeZoneInfo class is defined in the System.Core.dll assembly. But in .NET 4.0, Microsoft moved this class to the MSCorLib.dll assembly. Normally, moving a type from one assembly to another would break applications. However, the CLR offers a System.Runtime.CompilerServices.TypeForwardedToAttribute attribute, which can be applied to the original assembly (such as System.Core.dll). The parameter that you pass to this attribute’s constructor is of type System.Type and it indicates the new type (that is now defined in MSCorLib.dll) that applications should now use. The CLR’s binder uses this information. Since the TypeForwardedToAttribute’s constructor takes a Type, the assembly containing this attribute will be dependent on the new assembly defining the type.

If you take advantage of this feature, then you should also apply the System.Runtime.CompilerServices.TypeForwardedFromAttribute attribute to the type in the new assembly and pass to this attribute’s constructor a string with the full name of the assembly that used to define the type. This attribute typically is used for tools, utilities, and serialization. Since the TypeForwardedFromAttribute’s constructor takes a String, the assembly containing this attribute is not dependent on the assembly that used to define the type.
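The two attributes described above can be sketched as follows. In a real migration they live in two different assemblies; they are shown in one file here only to keep the example short. The namespace ForwardingSketch, the Widget class, and the assembly name "OldLibrary" are hypothetical, and System.TimeZoneInfo merely stands in for "a type that is now defined in another assembly".

```csharp
using System;
using System.Runtime.CompilerServices;

// Goes in the ORIGINAL assembly: tells the CLR's binder that the type
// now lives in whichever assembly defines System.TimeZoneInfo.
[assembly: TypeForwardedTo(typeof(System.TimeZoneInfo))]

namespace ForwardingSketch
{
    // Goes on the type in the NEW assembly: records, as a plain string,
    // the full name of the assembly that used to define the type.
    // "OldLibrary" is a hypothetical assembly name.
    [TypeForwardedFrom("OldLibrary, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null")]
    public class Widget
    {
    }
}
```

Note the asymmetry the text points out: the first attribute takes a Type and therefore creates a dependency on the new assembly, while the second takes only a string.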

Publisher Control Policy:

Microsoft offers an XML configuration file that eases the versioning of an assembly. As the publisher of an assembly, you can ship the new version together with a configuration file that tells the CLR to load the new assembly (say, version 2.0) instead of the previous version (1.0). This happens automatically, without any end-user interaction.

Furthermore, if end users want to keep using the previous version for some reason and ignore the publisher control policy, they can edit the application configuration file to disable the publisher control policy. Doing this for every application is not practical, so the alternative is to make the change in the Machine.config file.

Posted in C#, CLR via C#

Building, Packaging, Deploying, and Administering Applications and Types (CLR via C#)

Posted by Zikas on August 4, 2011

A managed PE file has four main parts:

  1. PE32(+) header
    The PE32(+) header is the standard information that Windows expects.
  2. CLR header
    The CLR header is a small block of information that is specific to modules that require the CLR (managed modules). The header includes the major and minor version number of the CLR that the module was built for, some flags, a MethodDef token indicating the module’s entry point method if this module is a CUI or GUI executable, and an optional strong-name digital signature. Finally, the header contains the size and offsets of certain metadata tables contained within the module. You can see the exact format of the CLR header by examining the IMAGE_COR20_HEADER defined in the CorHdr.h header file.
  3. Metadata
    The metadata is a block of binary data that consists of several tables. There are three categories of tables: definition tables, reference tables, and manifest tables. The table below describes some of the more common definition tables that exist in a module’s metadata block.

    Common Reference Metadata Tables

    An assembly is a unit of reuse, versioning, and security. It allows you to partition your types and resources into separate files so that you, and consumers of your assembly, get to determine which files to package together and deploy. Once the CLR loads the file containing the manifest, it can determine which of the assembly’s other files contain the types and resources the application is referencing. Anyone consuming the assembly is required to know only the name of the file containing the manifest; the file partitioning is then abstracted away from the consumer and can change in the future without breaking the application’s behavior.
    The manifest metadata tables are shown below.

    To make your own assemblies appear in the .NET tab’s list (in Visual Studio’s Add Reference dialog), add the following subkey to the registry:
    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework\AssemblyFolders\MyLibName
    MyLibName is a unique name that you create; Visual Studio doesn’t display this name. After creating the subkey, change its default string value so that it refers to a directory path (such as C:\Program Files\MyLibPath) containing your assembly’s files. Using HKEY_LOCAL_MACHINE adds the assemblies for all users on a machine; use HKEY_CURRENT_USER instead to add the assemblies for a specific user.
  4. IL

Culture

If you’re designing an application that has some culture-specific resources to it, Microsoft highly recommends that you create one assembly that contains your code and your application’s default (or fallback) resources. When building this assembly, don’t specify a culture. This is the assembly that other assemblies will reference when they create and manipulate types it publicly exposes.

Now you can create one or more separate assemblies that contain only culture-specific resources—no code at all. Assemblies that are marked with a culture are called satellite assemblies. For these satellite assemblies, assign a culture that accurately reflects the culture of the resources placed in the assembly. You should create one satellite assembly for each culture you intend to support.
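To see why a culture-neutral main assembly plus one satellite assembly per culture works, it helps to look at the fallback chain of a culture. The sketch below walks CultureInfo.Parent from a specific culture down to the invariant culture, which is the order in which resource lookup falls back; the class name CultureFallbackDemo and the culture "fr-CA" are just illustrative choices, and the code assumes culture data is available on the machine.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class CultureFallbackDemo
{
    // Returns the culture names visited when falling back from a specific
    // culture to the invariant culture (the default/fallback resources).
    public static List<string> FallbackChain(string cultureName)
    {
        List<string> chain = new List<string>();
        CultureInfo culture = CultureInfo.GetCultureInfo(cultureName);

        // The invariant culture has an empty name and ends the chain.
        while (!string.IsNullOrEmpty(culture.Name))
        {
            chain.Add(culture.Name);
            culture = culture.Parent;
        }

        chain.Add("(invariant: main assembly resources)");
        return chain;
    }

    public static void Main()
    {
        foreach (string step in FallbackChain("fr-CA"))
        {
            Console.WriteLine(step);
        }
    }
}
```

For "fr-CA" the chain is fr-CA, then fr, then the invariant culture, so a satellite assembly marked "fr" also serves Canadian French users when no "fr-CA" satellite exists.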

Posted in CLR via C#

Learning and Disciplines in Computer Science

Posted by Zikas on July 24, 2011

No one disputes that education and learning play an important role in becoming outstanding. In this post I’ll try to explain, from my point of view, the different learning areas in Computer Science.

Basic Computer Science

This area builds the basic thinking and problem-solving skills of a computer science student. Examples include Data Structures, Algorithms, File Structures, Operating Systems, Compiler Theory, Computer Security, Networking and others. Most topics in this area can be measured scientifically; for example, complexity analysis tells you whether an algorithm is efficient or not.

It is very important to learn this area while you are an undergraduate. For example, you can’t go to your manager at work and say, “Hey, I want to learn what data structures are about.” It would be a shame not to have learned anything about multithreading concepts or basic security topics; what were you doing in your undergraduate years?

Software Engineering

Simply put, you can’t be a software engineer without knowledge in this area. Software engineering is the art of software construction and design. This includes Object-Oriented Theory, Design Patterns, Code Refactoring, Software Architecting, UML, Writing Software Requirements, Framework Design, Software Testing, Software Development Processes and others. Some topics here are important to cover in your undergraduate studies (such as Object-Oriented Theory), while others can wait until after graduation (such as Framework Design).

Because it is an art, there is no specific or fixed metric that takes a software design and gives you a guaranteed judgment. You need talent and experience in designing software to judge your own designs.

Technology

This area is further divided into three parts.

Internals

Internals are about knowing Assembly Language, Windows APIs, POSIX APIs, Windows Sockets, Windows Memory Architecture, Windows Threading, Windows Thread Synchronization, Compiler Internals (Virtual Table, Virtual Pointer, Automatically Generated Code), Inter-Process Communication and others. This is about learning what’s underneath so that you have a solid understanding of what’s happening in your development environment.

Programming Languages

Self-explanatory: C++, C#, Java, Lua, Objective-C and others.

Others

Android SDK, SQL Server, iOS, Code Revision (VSS, SVN, TFS), ASP.NET, ADO.NET, LINQ, WCF, WF, XNA, COM, COM+, Visual Studio and others.

Advanced Computer Science

Being involved in the theory of a specific computer science field, such as Algorithms, Artificial Intelligence, Cloud Computing, Programming Language Design, or Compiler Theory. This area requires a lot of dedication and a grounding in basic sciences such as probability and mathematics.

Domain Knowledge

When you are working on a specific piece of software, you’ll need to learn the domain knowledge of your area. For example, if you are working in games, you’ll need to learn about Game Programming techniques.

Miscellaneous

Debugging, Writing Bug-Free Code, Writing Clean Code, Dealing with Legacy Code, Understanding Existing Code and others are good skills to have as a Software Engineer.

Engineering Experience

The most important part of the learning process is the actual engineering experience you have. You are expected to come from the university with a base level of knowledge, but it takes years of actual development work to become a great software engineer. You have to start small, by fixing lots of bugs and doing small features, and work your way up over several releases. You can attend classes and read books all day long, but you will never become a great engineer without years of experience.*

Disciplines in Computer Science

One common question from undergraduates is “What is the difference between a Programmer and a Developer?” I see that you can define Computer Science titles as combinations of the areas above. For example (this is just for explanation purposes, not an ISO standard):

Programmer = Technology (Others) + Domain Knowledge

Software Engineer = Basic Computer Science + Software Engineering + Technology (Others) + Domain Knowledge

Academic Researcher = Basic Computer Science + Advanced Computer Science

Computer Scientist = Basic Computer Science + Advanced Computer Science + Software Engineering

* This section was a suggestion by Jason Allor and I see it’s very critical. Thanks Jason!

Posted in Orientation

The CLR’s Execution Model

Posted by Zikas on June 27, 2011

Compiling Source Code into Managed Modules:

The common language runtime (CLR) is just what its name says it is: a runtime that is usable by different and varied programming languages. The core features of the CLR (such as memory management, assembly loading, security, exception handling, and thread synchronization) are available to any and all programming languages that target it.

In fact, at runtime, the CLR has no idea which programming language the developer used for the source code! By the way, managed assemblies always take advantage of Data Execution Prevention (DEP) and Address Space Layout Randomization (ASLR) in Windows; these two features improve the security of your whole system.

The figure below describes the CLR compilation process:

The table below describes the parts of a managed module:

Combining Managed Modules into Assemblies:

The CLR doesn’t actually work with modules, it works with assemblies. An assembly is an abstract concept that can be difficult to grasp initially. First, an assembly is a logical grouping of one or more modules or resource files. Second, an assembly is the smallest unit of reuse, security, and versioning. Depending on the choices you make with your compilers or tools, you can produce a single-file or a multi-file assembly. In the CLR world, an assembly is what we would call a component.

Figure below should help explain what assemblies are about. In this figure, some managed modules and resource (or data) files are being processed by a tool. This tool produces a single PE32(+) file that represents the logical grouping of files. What happens is that this PE32(+) file contains a block of data called the manifest. The manifest is simply another set of metadata tables. These tables describe the files that make up the assembly, the publicly exported types implemented by the files in the assembly, and the resource or data files that are associated with the assembly.

An assembly does the following for you:

  • It decouples the logical and physical notions of a reusable, securable, versionable component.
  • It is self-describing, so its deployment is very easy.

Loading the Common Language Runtime:

Some developers want to write code that works only on a specific version of Windows, for example when using unsafe code or when interoperating with unmanaged code that is targeted to a specific CPU architecture. To aid these developers, the C# compiler offers a /platform command-line switch. This switch allows you to specify whether the resulting assembly can run on x86 machines running 32-bit Windows versions only, x64 machines running 64-bit Windows only, or Intel Itanium machines running 64-bit Windows only. If you don’t specify a platform, the default is anycpu, which indicates that the resulting assembly can run on any version of Windows.
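A quick way to observe the effect of the /platform switch is to ask the running process what it ended up as. This is a minimal sketch using standard Environment members; the class name PlatformDemo is made up for the example, and the output depends on how the assembly was compiled and on the machine it runs on.

```csharp
using System;

public static class PlatformDemo
{
    // Describes the bitness of the current process and operating system.
    public static string Describe()
    {
        return string.Format(
            "{0}-bit process on a {1}-bit OS (pointer size: {2} bytes)",
            Environment.Is64BitProcess ? 64 : 32,
            Environment.Is64BitOperatingSystem ? 64 : 32,
            IntPtr.Size);
    }

    public static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```

Compiling the same file with different platform settings (for example anycpu versus x86) and running it on 64-bit Windows should show the process bitness change while the OS bitness stays the same.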

64-bit versions of Windows offer a technology that allows 32-bit Windows applications to run. This technology is called WoW64 (for Windows on Windows64). It even allows 32-bit applications with x86 native code in them to run on an Itanium machine, because WoW64 can emulate the x86 instruction set, albeit with a significant performance cost.

The loading process is as follows:

  1. Windows examines the EXE file’s header to determine whether the application requires a 32-bit or 64-bit address space.
  2. Windows also checks the CPU architecture information embedded inside the header to ensure that it matches the CPU type in the computer.
  3. After Windows has examined the EXE file’s header to determine whether to create a 32-bit process, a 64-bit process, or a WoW64 process, Windows loads the x86, x64, or IA64 version of MSCorEE.dll into the process’s address space.
  4. Then, the process’s primary thread calls a method defined inside MSCorEE.dll. This method initializes the CLR, loads the EXE assembly, and then calls its entry point method (Main).

Executing your Assembly Code:

To execute a method, its IL must first be converted to native CPU instructions. This is the job of the CLR’s JIT (just-in-time) compiler. The figure below shows what happens when WriteLine is called for the first time.

The figure below shows what the process looks like when WriteLine is called the second time.

A performance hit is incurred only the first time a method is called. All subsequent calls to the method execute at the full speed of the native code because verification and compilation to native code don’t need to be performed again.
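The first-call cost can be observed informally by timing two consecutive calls to the same method. The sketch below is only a rough experiment, not a benchmark: the class JitTimingDemo and the method SumTo are invented for the example, and on runtimes that use precompiled images or tiered compilation the difference may be small or hard to see.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

public static class JitTimingDemo
{
    // NoInlining keeps this a real call, so it has to be compiled
    // as its own method the first time it is invoked.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static int SumTo(int n)
    {
        int total = 0;
        for (int i = 1; i <= n; i++)
        {
            total += i;
        }
        return total;
    }

    // Returns the elapsed Stopwatch ticks of the first and second call.
    public static long[] MeasureTwoCalls()
    {
        Stopwatch watch = Stopwatch.StartNew();
        SumTo(10);
        long first = watch.ElapsedTicks;

        watch.Restart();
        SumTo(10);
        long second = watch.ElapsedTicks;

        return new long[] { first, second };
    }

    public static void Main()
    {
        long[] ticks = MeasureTwoCalls();
        Console.WriteLine("First call:  {0} ticks", ticks[0]);
        Console.WriteLine("Second call: {0} ticks", ticks[1]);
    }
}
```

On a typical run the first call is expected to take noticeably longer than the second, since only the first one includes compilation.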

Benefits of CLR and managed code over unmanaged code:

  • The JIT compiler can produce code that is optimal for the current machine architecture.
  • A JIT compiler can determine when a certain test is always false on the machine that it is running on. For example, consider a method that contains the following code:
    if (numberOfCPUs > 1) { ... }
    This code could cause the JIT compiler to not generate any CPU instructions if the host machine has only one CPU. In this case, the native code would be fine-tuned for the host machine; the resulting code is smaller and executes faster.
  • The CLR could profile the code’s execution and recompile the IL into native code while the application runs. The recompiled code could be reorganized to reduce incorrect branch predictions depending on the observed execution patterns. Current versions of the CLR do not do this, but future versions might.
  • Verification:
    • Verifies the code and makes sure that there are no security problems.
    • Makes it possible to run multiple managed applications in a single Windows virtual address space, which saves a lot of OS resources.

The Native Code Generator Tool: NGen.exe:

The NGen.exe tool that ships with the .NET Framework can be used to compile IL code to native code when an application is installed on a user’s machine. Since the code is compiled at install time, the CLR’s JIT compiler does not have to compile the IL code at runtime, and this can improve the application’s performance. The NGen.exe tool is interesting in two scenarios:

  1. Improving an application’s startup time.
  2. Reducing an application’s working set.

The files compiled by NGen.exe can be found under the directory

C:\Windows\Assembly\NativeImages_v4.0.#####_64

The directory name includes the version of the CLR and information denoting whether the native code is compiled for x86 (32-bit version of Windows), x64, or Itanium (the latter two for 64-bit versions of Windows).

Now, whenever the CLR loads an assembly file, the CLR looks to see if a corresponding NGen’d native file exists. If a native file cannot be found, the CLR JIT compiles the IL code as usual.

There are several potential problems with respect to NGen’d files:

  1. No intellectual property protection (especially in client-side applications).
  2. NGen’d files can get out of sync:
    When the CLR loads an NGen’d file, it compares a number of characteristics about the previously compiled code and the current execution environment. If any of the characteristics don’t match, the NGen’d file cannot be used, and the normal JIT compiler process is used instead. Here is a partial list of characteristics that must match:
    1. CLR version: this changes with patches or service packs.
    2. CPU type: this changes if you upgrade your processor hardware.
    3. Windows OS version: this changes with a new service pack update.
    4. Assembly’s identity module version ID (MVID): this changes when recompiling.
    5. Referenced assembly’s version IDs: this changes when you recompile a referenced assembly.
    6. Security: this changes when you revoke permissions (such as declarative inheritance, declarative link-time, SkipVerification, or UnmanagedCode permissions) that were once granted.
  3. Inferior execution-time performance.

NGen.exe makes little sense for server-side applications, because only the first client request experiences the JIT performance hit.

Interoperability with Unmanaged Code:

The CLR supports these interoperability scenarios:

  1. Managed code can call an unmanaged function in a DLL, using the P/Invoke mechanism.
  2. Managed code can use an existing COM component (server).
  3. Unmanaged code can use a managed type (server).
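The first scenario, P/Invoke, can be sketched with a single declaration. The example below calls the Win32 function GetCurrentProcessId from kernel32.dll and compares the result with the managed Process API; the class PInvokeDemo and the helper TryGetNativeProcessId are names invented for this example, and the native call is skipped when the program is not running on Windows.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

public static class PInvokeDemo
{
    // Declares the unmanaged function; the CLR locates and binds it
    // the first time the method is called.
    [DllImport("kernel32.dll")]
    private static extern uint GetCurrentProcessId();

    // Returns false (and leaves id at 0) when not running on Windows.
    public static bool TryGetNativeProcessId(out uint id)
    {
        id = 0;
        if (Environment.OSVersion.Platform != PlatformID.Win32NT)
        {
            return false;
        }

        id = GetCurrentProcessId();
        return true;
    }

    public static void Main()
    {
        uint nativeId;
        if (TryGetNativeProcessId(out nativeId))
        {
            Console.WriteLine("Native call:  {0}", nativeId);
            Console.WriteLine("Managed call: {0}", Process.GetCurrentProcess().Id);
        }
        else
        {
            Console.WriteLine("kernel32.dll is only available on Windows.");
        }
    }
}
```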

Posted in CLR via C#

Explore your Environment: CLR Fundamentals

Posted by Zikas on June 15, 2011

As with any form of troubleshooting, the more you understand the underlying system being debugged the greater success you will have at identifying the root cause. In the .NET world, this translates to understanding how the runtime itself functions. Knowing how the garbage collector works will enable you to more efficiently debug memory “leak” issues. Knowing how the interoperability layer works will enable you to more efficiently debug COM problems. Knowing how synchronization works will enable you to more efficiently debug hangs. And the list goes on and on. Venturing outside of the comfort zone of your own application and digging deep into the run time will greatly enhance your debugging success. Problems that may have otherwise taken weeks to debug through traditional means can now be solved in a relatively short time span.

In this article, we will take a guided tour of the .NET runtime, especially the core runtime components and concepts that are useful when debugging.

High-Level Overview:

At a high level, .NET is a virtual runtime environment that consists of a virtual execution engine, the Common Language Runtime (CLR), and a set of associated framework libraries. Applications written for .NET, at compile time, do not translate into machine code but instead use an intermediary representation that the execution engine translates at runtime (depending on architecture). Although this may seem as if the CLR acts as an interpreter (interpreting the intermediate language), the primary difference between the CLR and an interpreter is that the CLR does not retranslate the intermediate code each and every time. Rather, it takes a one-time hit of translating a chunk of intermediate code into machine code and then reuses the translated machine code in all subsequent invocations.

To better understand what components .NET consists of, the figure below gives a 50,000-foot overview of the different entities involved in the .NET world. At the core of .NET, there is an ECMA standard that states what implementations of the .NET runtime need to adhere to in order to be compliant. This standards document is commonly referred to as the Common Language Infrastructure (CLI). The CLI doesn’t just dictate rules for the runtime itself but also includes a set of library classes that are considered crucial and common enough to warrant inclusion. This set of class libraries is called the Base Class Libraries (BCL).

The next layer in the figure is the Common Language Runtime (CLR). This is an actual component and represents Microsoft’s implementation of the CLI. When a .NET redistributable package is installed on a machine, it includes the CLR. On top of the CLR sits the .NET framework. These are all the libraries that are available to developers when creating .NET applications. The .NET framework can be considered a superset of the BCL and includes frameworks such as the Windows Communication Foundation (WCF), Windows Presentation Foundation (WPF), and much more. The libraries that are part of the .NET framework but not the BCL are considered outside of the standards realm, and any applications that make use of them may or may not work on other CLI implementations besides the CLR. At the top level, we have the .NET applications, which run within the confines of the CLR.

Are there other CLI-compliant implementations?

Is Microsoft’s CLR the only implementation of the CLI out there? Not quite. Because the CLI has become increasingly popular, there are a number of companies/organizations that have produced their own CLI-compliant runtimes. A great example of such an implementation is the Mono project (sponsored by Novell). In addition to being an open source project, the Mono CLI implementation can run on Windows, Linux, Solaris, and Mac OS X.

Additionally, Microsoft has released the Shared Source Common Language Infrastructure (2.0), also known as the Rotor project, which includes a CLI-compliant implementation of the standard. Because the source code is shared source, this project provides great insights into how a functional implementation works.

Because the CLR is responsible for all aspects of .NET application execution, what does the general execution flow look like? The figure below illustrates a high-level overview of the execution model, starting with the application’s source code.

In .NET, the net outcome of a compilation is known as an assembly. The notion of an assembly is at the heart of .NET and will be discussed in more detail later in the chapter. For now, you can view the assembly as a self-contained entity that encapsulates everything that needs to be known about the application (including the code, or MSIL for the application). When the .NET assembly is run, the CLR is automatically loaded and begins executing the MSIL. The way that MSIL is executed is by first translating it to instructions native to the platform that the code is executing on. This translation is done at runtime by a component in the CLR known as the Just-In-Time (JIT) compiler.

CLR and Windows Loader:

The Windows loader normally executes native code, but the code in a .NET program is MSIL, not native code, so how does Windows execute such programs? The answer lies in the portable executable (PE) file format. The figure below illustrates, at a high level, the general structure of a PE image file.

To support execution of PE images, the PE header includes a field called AddressOfEntryPoint. This field indicates the location of the entry point for the PE file. In the case of a .NET assembly, it points to a small piece of stub code located in the .text section. The next field of importance is in the data directories. When any given .NET compiler produces an assembly, it adds a data directory entry to the PE file. More specifically, this is the 15th data directory entry (zero-based index 14, IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR), and it contains the location and size of the CLR header. The CLR header itself is located in the next part of interest in the PE file, namely the .text section. The CLR header consists of a structure named IMAGE_COR20_HEADER. This structure contains information such as the managed code application entry point, the major and minor version of the target CLR, and the strong name signature of the assembly. You can view this data structure as containing the information needed to know which CLR to load and the most basic data about the assembly itself. Other parts of the .text section include the assembly metadata tables, the MSIL, and the unmanaged startup stub. The unmanaged startup stub simply contains the code that will be executed by the Windows loader to bootstrap the execution of the PE file.
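The fields mentioned above can be inspected from managed code. The following sketch uses the PEReader class from the System.Reflection.PortableExecutable namespace (part of the System.Reflection.Metadata library, which may need to be added as a package on older .NET Framework versions) to print AddressOfEntryPoint, the CLR header data directory, and a few CLR header fields. The class name ClrHeaderDemo is made up, and the default input file (the core library of the running runtime) is just a convenient managed PE image.

```csharp
using System;
using System.IO;
using System.Reflection.PortableExecutable;

public static class ClrHeaderDemo
{
    // Returns true when the PE file contains a CLR header,
    // that is, when it is a managed module.
    public static bool HasClrHeader(string path)
    {
        using (FileStream stream = File.OpenRead(path))
        using (PEReader reader = new PEReader(stream))
        {
            return reader.PEHeaders.CorHeader != null;
        }
    }

    public static void Main(string[] args)
    {
        // Assumption: with no argument, inspect the runtime's core library.
        string path = args.Length > 0 ? args[0] : typeof(object).Assembly.Location;

        using (FileStream stream = File.OpenRead(path))
        using (PEReader reader = new PEReader(stream))
        {
            PEHeaders headers = reader.PEHeaders;
            if (headers.PEHeader == null)
            {
                Console.WriteLine("Not a PE image with an optional header.");
                return;
            }

            Console.WriteLine("AddressOfEntryPoint: 0x{0:X}", headers.PEHeader.AddressOfEntryPoint);

            // Data directory entry that locates the CLR header.
            DirectoryEntry clrDirectory = headers.PEHeader.CorHeaderTableDirectory;
            Console.WriteLine("CLR header RVA: 0x{0:X}, size: {1} bytes",
                clrDirectory.RelativeVirtualAddress, clrDirectory.Size);

            CorHeader clrHeader = headers.CorHeader;
            if (clrHeader == null)
            {
                Console.WriteLine("No CLR header: this is a native image.");
                return;
            }

            Console.WriteLine("Runtime version fields: {0}.{1}",
                clrHeader.MajorRuntimeVersion, clrHeader.MinorRuntimeVersion);
            Console.WriteLine("Flags: {0}", clrHeader.Flags);
            Console.WriteLine("Entry point token or RVA: 0x{0:X8}",
                clrHeader.EntryPointTokenOrRelativeVirtualAddress);
        }
    }
}
```

Passing the path of a native executable such as notepad.exe as the first argument should take the "native image" branch, since such a file has no CLR header.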

In the next few sections, we will take a look at how the Windows loader loads both native images as well as .NET assemblies.

Loading Native Images:

To better understand the loading of .NET assemblies, we’ll start by looking at how the Windows loader loads native PE images. Let’s use good old notepad.exe as the example executable (running on Windows Vista Enterprise). Please note that when dealing with PE files there are two important terms used:

File offset: This is the offset within the PE file of any given location.

Relative Virtual Address (RVA): This value is applicable only when the PE image has been loaded and is the relative address within the virtual address space of the process. For example, an RVA of 0x200 means 0x200 bytes from the image base address once loaded into memory.

Loading .NET Assemblies:

  1. The user executes a .NET assembly.
  2. The Windows loader looks at the AddressOfEntryPoint field and references the .text section of the PE image file.
  3. The bytes located at the AddressOfEntryPoint location are simply a JMP instruction to an imported function in mscoree.dll.
  4. Control is transferred to the _CorExeMain function in mscoree.dll to bootstrap the CLR and transfer execution to the assembly’s entry point.

Assembly Overview:

At a high level, an assembly is the primary building block and deployment unit of .NET applications and can be viewed as a self-describing logical container for other components. When I say self-describing I mean that the assembly contains all the necessary information to uniquely identify and describe the assembly.

There are two different categories of assemblies:

  1. Shared assemblies: assemblies that are intended to be used across different .NET applications. Framework assemblies are good examples of shared assemblies.
  2. Private assemblies: assemblies that are used as part of an application/component but are not suitable to be used by other applications/components.

Assembly Manifest:

Because an assembly is the fundamental building block of .NET applications and is entirely self-describing, where is the descriptive content stored? The answer lies in the metadata section of an assembly, also known as the assembly manifest. An assembly manifest is typically embedded in the assembly PE file but is not required to be.

Below is an example of single-file and multi-file assemblies:

An assembly manifest typically contains the following pieces of information:

  1. List of dependent native code modules
  2. List of dependent assemblies
  3. Version of the assembly
  4. Public key token of the assembly (if assigned)
  5. Assembly resources
  6. Assembly flags such as stack reserve, sub system and so on

The best way to view the manifest for a given assembly is to use a tool called ILDasm. It is installed as part of the .NET 2.0 SDK and can display very rich assembly information. To view the manifest of an assembly, launch ildasm.exe with the name of the assembly from the command line.

Type Metadata:

Each object instance located on the managed heap consists of the following pieces of auxiliary information (check the figure below):

  • The sync block is a bit mask of auxiliary information or an index into a table maintained by the CLR and contains auxiliary information about the object itself.
  • The type handle is the fundamental unit of the type system in the CLR. It serves as the starting point for fully describing the type located on the managed heap.
  • The object instance comes after the sync block index and the type handle and is the actual object data.

The method table contains metadata that fully describes the particular type. The figure below illustrates the overall layout of the method table.

The very first category of data that the type handle points to contains some miscellaneous information about the type itself. The table below illustrates the fields in this category.

Method Descriptor:

A method descriptor contains detailed information about a method such as the textual representation of the method, the module it is contained within, the token, and the code address of the code behind the method.

Modules:

Previously, we explained that an assembly can be viewed as a logical container for one or more code modules. A module then can be viewed as containing the actual code and/or resources for a given component. When traversing various kinds of CLR data structures (such as method tables, method descriptors, etc.), they all typically contain a pointer to the module where they are defined.

Metadata Tokens:

At a high level, a metadata token is represented by 4 bytes, as illustrated in Figure below.

The high-order byte represents the table that the token is referencing. Table below outlines the different tables available
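The token layout can be sketched as a small decoding helper. The names TokenParts and split_token are hypothetical; the split itself (table identifier in the high-order byte, row index in the remaining three bytes) follows the description above.

```cpp
#include <cassert>  // assert, for the usage check after the block
#include <cstdint>

// Hypothetical view of a 4-byte metadata token.
struct TokenParts {
    std::uint8_t table;  // high-order byte: which metadata table is referenced
    std::uint32_t rid;   // low-order three bytes: row index within that table
};

constexpr TokenParts split_token(std::uint32_t token) {
    return TokenParts{static_cast<std::uint8_t>(token >> 24),
                      token & 0x00FFFFFFu};
}
```

For example, the token 0x06000001 splits into table 0x06 and row 1, so assert(split_token(0x06000001).table == 0x06) holds.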

EEClass:

The EEClass data structure is best viewed as the logical equivalent of the method table, and as such can be described as a mechanism to enable the self descriptive nature of the CLR type system. Internally, the EEClass and method table are two distinct constructs, but logically they represent the same concept, thus begging the question of why the separation was introduced to begin with. The separation occurred based on how frequently type fields were used by the CLR. Fields that are used quite frequently are stored in the method table, whereas fields that are used less frequently are stored in the EEClass data structure.

    Figure below provides an overview of the most key elements of the EEClass data structure

The hierarchical nature of object-oriented languages such as C# is replicated in the EEClass structure. When the CLR loads types, it creates a similar hierarchy of EEClass nodes with parent and sibling pointers, enabling it to traverse the hierarchy in an efficient manner. For the most part, the fields in the EEClass data structure are straightforward. One field of importance is the MethodDesc Chunk field that contains a pointer to the first chunk of method descriptors in the type. This enables you to traverse the method descriptors that are part of any given type. Each chunk also contains a pointer to the next chunk in the chain.
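The chunk chain described above is essentially a singly linked list. The sketch below is only a conceptual model: every type and field name in it is hypothetical, and the real CLR structures are laid out differently.

```cpp
#include <cassert>  // assert, for the usage check after the block
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the CLR structures.
struct MethodDescSketch {
    std::string name;
};

struct MethodDescChunkSketch {
    std::vector<MethodDescSketch> methods;        // descriptors in this chunk
    const MethodDescChunkSketch* next = nullptr;  // next chunk in the chain
};

struct EEClassSketch {
    const MethodDescChunkSketch* first_chunk = nullptr;  // "MethodDesc Chunk" field
};

// Walks every chunk of a type and collects the method names in order.
std::vector<std::string> method_names(const EEClassSketch& cls) {
    std::vector<std::string> names;
    for (const MethodDescChunkSketch* c = cls.first_chunk; c != nullptr; c = c->next) {
        for (const MethodDescSketch& m : c->methods) {
            names.push_back(m.name);
        }
    }
    return names;
}
```

Starting from the first chunk and following next until it is null visits every method descriptor of the type, which is the traversal the paragraph above describes.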

Posted in .NET Debugging | Leave a Comment »

Introduction to the Debugging Tools

Posted by Zikas on June 9, 2011

Ad-hoc debugging is about guessing the general area of the code base where the source of the problem might be and then tracing the code line by line. Using debugging tools can speed this up and save developers time and effort.

Here we describe common tools for debugging.

Debugging Tools for Windows:

Usage scenarios: Collection of debuggers and tools.

Download: www.microsoft.com/whdc/devtools/debugging/default.mspx

There are three user mode debuggers available in the Debugging Tools for Windows package (NTSD, CDB, and WinDbg) and one kernel mode debugger (kd). Although these debuggers are separate tools, it is important to understand that they all rely on the same core debugger engine. The most significant difference between the debuggers is that WinDbg has a graphical user interface (GUI) component, making it easier to work with when doing source level debugging. In contrast, NTSD and CDB are purely console-based debuggers.

SOS:

Usage scenarios: General debugging extension for .NET applications

Download: it’s already part of .NET SDK

SOS is a debugger extension that can be used to debug .NET applications using the native debugger. It provides a truly amazing set of commands that enables developers to delve deep into the CLR and help troubleshoot pesky application bugs. Among other things, there are commands that let you inspect the finalization queues, managed heaps, and managed threads, set managed code breakpoints, view exceptions, and much more.

Because SOS provides an abstracted view into the internals of the CLR, care must be taken to use the correct version of SOS when debugging with it. Each .NET version ships with its corresponding version of SOS, which can be found in the following location:

%windir%\microsoft.net\<architecture>\<version>\sos.dll

Architecture can be either Framework (for 32-bit) or Framework64 (for 64-bit), and the version represents the version of the .NET framework you are targeting. Before the SOS debugger extension can be used, it must be loaded into the debugger by using the .load command. The following listing illustrates the loading process when running notepad.exe under the debugger.

Contrary to what you might believe, the SOS debugger extension is not named after the distress signal. When the .NET framework was in its 1.0 stage, the Microsoft development team used a debugger extension called STRIKE to figure out complex problems in .NET code. As the .NET framework matured, so did the debugger extension, and it became known as Son of Strike (SOS).

SOSEX:

Usage scenarios: General debugging extension for .NET applications

Download: www.stevestechspot.com/downloads/sosex_32.zip or www.stevestechspot.com/downloads/sosex_64.zip

SOSEX is another debugger extension targeted at the native debuggers and managed code debugging. It was developed by Steve Johnson and is available as a free download. SOSEX, not surprisingly, stands for SOS Extended. SOSEX adds a set of powerful debugging commands to your arsenal. Examples of such commands include deadlock detection, generational garbage collection commands, and more powerful breakpoint commands.

CLR Profiler:

Usage scenarios: Memory Allocation Profiler

Download: http://www.microsoft.com/downloads/en/details.aspx?FamilyID=be2d842b-fdce-4600-8d32-a3cf74fda5e1

The CLR Profiler is an invaluable tool when it comes to troubleshooting memory related issues in .NET applications. It provides features such as:

  • Heap statistics (including allocation graphs)
  • Garbage collection statistics
  • Garbage Collector Handle Statistics (including allocation graphs)
  • Garbage Collection Generation Sizes
  • Profiling Statistics

Clicking the Start Application button brings up a dialog where you can choose the application you want to profile. After an application and profiling action has been chosen, the CLR Profiler launches the application and starts collecting data. The CLR Profiler offers a number of different statistical views of the data collected. Below is a screen shot from the application:

The data collected is output to a log file that is by default located in %windir%\Temp. The log filename takes the form:

Pipe_<pid>.log

Where <pid> is the process identifier of the process being profiled. The CLR Profiler can also be started and controlled via the command line.

Performance Counters:

Performance counters are an important part of the troubleshooting process. During the .NET framework installation process, a collection of performance counters is installed. These performance counters represent a goldmine of information when analyzing .NET application behavior. To view the performance counters, the Windows Performance Monitor can be used. Table below lists all the performance counter categories that are part of the .NET runtime.

Reflector for .NET:

Usage scenarios: .NET assembly analyzer and disassembler.

Download: http://www.reflector.net/

Reflector for .NET is a .NET assembly explorer tool that includes a powerful disassembler that can reproduce the code from the MSIL (Microsoft Intermediate Language) to a higher level language of choice. The language choices are C#, Visual Basic, Delphi, Managed C++, and Chrome. Additionally, it includes an extensibility model in the form of an add-in API. There are many add-ins available, ranging from a code review add-in to a code metrics add-in. Figure 1-4 shows an example of analyzing the Reflector.exe binary itself using Reflector for .NET.

PowerDbg:

Usage scenarios: Debugger tool.

Download: www.codeplex.com/powerdbg

PowerDbg is a library developed by Roberto Farah that allows you to control the native debuggers via PowerShell (requires 1.0). It is a super useful tool when you want to control the execution of the debuggers using the command line. The PowerDbg script returns information to the user in a neat and easily digestible fashion. The great thing about PowerDbg is that it is easily extensible and enables calling and formatting your favorite commands (or a set of commands in common debug scenarios).

Managed Debugging Assistants:

Usage scenarios: General CLR Debugging

Download: Part of CLR

Managed Debugging Assistants (MDAs) are not a standalone tool per se; rather, they are a component of the CLR that provides invaluable information when running and debugging .NET applications. If you are familiar with Application Verifier for native code, MDAs serve a very similar purpose. Through elaborate instrumentation of the runtime, common programming mistakes can be identified at runtime and subsequently fixed prior to shipping the application. See the linked list of available MDAs and the problems each one helps troubleshoot.

To utilize MDAs, they must first be enabled (prior to starting the process being debugged). The way to enable the MDAs is via the registry. More specifically, you need to add the following value under the registry key (the value is string type):

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework\MDA="1"

By setting the preceding registry value, you have notified the CLR that MDAs should be utilized. Before you can actually make use of them though, you need to enable specific MDAs on a per application basis. The process of enabling MDAs is done via a configuration file that must be named according to the rule

<appname>.exe.mda.config

where appname is the name of the application that you want to enable MDAs for. The configuration file itself contains all the MDAs that you want enabled.

Posted in .NET Debugging | Leave a Comment »

CLR via C#

Posted by Zikas on June 1, 2011

All posts tagged with CLR via C# are summaries of Jeffrey Richter's great book CLR via C#.

Posted in CLR via C# | Leave a Comment »

Starting at Microsoft

Posted by Zikas on May 10, 2011

Starting in C#:

The key with C# is not knowing its syntax; rather, it's mastering the Framework.

CLR via C# is strongly recommended reading.

Working as a developer:

One of your important roles as a developer is to know how to develop a framework. You can read Framework Design Guidelines as a reference for this point.

Besides that, you need to read Refactoring: Improving the Design of Existing Code, which will increase your knowledge of developing within an existing framework.

As a developer you need to do the following:

  1. Write clean code.
  2. Understand the .NET Framework.
  3. Master design patterns.
  4. Master unit testing.
  5. Plan and communicate well.

The fewer bugs your code introduces, the more your manager loves you!

Unit Testing:

One of the common tools for unit testing in .NET is NUnit.

Why Unit Test:

  • Lets you know whether your code works with the existing code. (Some developers fix a bug but introduce two new bugs in another location.)
  • Reduces the number of bugs in your code.
  • Gives you more confidence when changing your code.
  • Helps you refactor your code.

Advanced .Net Debugging is useful to understand debugging and how to fix bugs.

Writing Solid Code helps you to avoid bugs.

Posted in Orientation | 3 Comments »

The Semantics of Constructors

Posted by Zikas on March 24, 2011

Default Constructor Construction:

There are four characteristics of a class under which the compiler needs to synthesize a default constructor for classes that declare no constructor at all. The Standard refers to these as implicit, nontrivial default constructors. The synthesized constructor fulfills only an implementation need. It does this by

  1. Invoking member object default constructors,
  2. Invoking base class default constructors,
  3. Initializing the virtual function mechanism, or
  4. Initializing the virtual base class mechanism for each object.

Classes that do not exhibit these characteristics and that declare no constructor at all are said to have implicit, trivial default constructors. In practice, these trivial default constructors are not synthesized.

Within the synthesized default constructor, only the base class subobjects and member class objects are initialized. All other nonstatic data members, such as integers, pointers to integers, arrays of integers, and so on, are not initialized. These initializations are needs of the program, not of the implementation. If there is a program need for a default constructor, such as initializing a pointer to 0, it is the programmer’s responsibility to provide it in the course of the class implementation.
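The split between implementation needs and program needs can be sketched with a small class. Widget and its members are hypothetical names chosen for illustration. The std::string member is default-constructed automatically, while the integer and the pointer get defined values only because the programmer supplies a constructor that sets them.

```cpp
#include <cassert>  // assert, for the usage check after the block
#include <string>

class Widget {
public:
    // Programmer-supplied default constructor (a program need):
    // without it, count and next would be left uninitialized.
    // name is default-constructed either way (an implementation need).
    Widget() : count(0), next(nullptr) {}

    std::string name;  // member class object: its constructor is invoked for us
    int count;         // built-in type: initialized only because we say so
    Widget* next;      // pointer: same
};
```

After Widget w;, assert(w.name.empty() && w.count == 0 && w.next == nullptr) holds. If the constructor were removed, only the first of those three conditions would still be guaranteed.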

Programmers new to C++ often have two common misunderstandings:

  1. That a default constructor is synthesized for every class that does not define one
  2. That the compiler-synthesized default constructor provides explicit default initializers for each data member declared within the class

As you have seen, neither of these is true.

When a class declares virtual functions, the following two class "augmentations" occur during compilation:

  1. A virtual function table (referred to as the class vtbl in the original cfront implementation) is generated and populated with the addresses of the active virtual functions for that class.
  2. Within each class object, an additional pointer member (the vptr) is synthesized to hold the address of the associated class vtbl.
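One observable effect of the synthesized vptr is object size. The sketch below compares two hypothetical structs that differ only in having a virtual function. Exact sizes are implementation-defined, so treat the comparison as typical behavior on mainstream compilers rather than a guarantee of the language.

```cpp
#include <cassert>  // assert, for the usage check after the block

// Same data member; only the second struct declares a virtual function.
struct PlainPoint {
    int x;
};

struct VirtualPoint {
    int x;
    virtual int get() const { return x; }
};

// On typical implementations the extra vptr makes VirtualPoint larger.
constexpr bool has_hidden_member = sizeof(VirtualPoint) > sizeof(PlainPoint);
```

On common 64-bit compilers, assert(has_hidden_member) holds because each VirtualPoint object carries the additional pointer to its class vtbl.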

Copy Constructor Construction:

When are bitwise copy semantics not exhibited by a class? There are four instances:

  1. When the class contains a member object of a class for which a copy constructor exists (either explicitly declared by the class designer, or synthesized by the compiler)
  2. When the class is derived from a base class for which a copy constructor exists (again, either explicitly declared or synthesized)
  3. When the class declares one or more virtual functions
  4. When the class is derived from an inheritance chain in which one or more base classes are virtual
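The four cases can be checked with the standard type trait std::is_trivially_copy_constructible from <type_traits>. This is only an approximation: the trait's definition and the book's notion of bitwise copy semantics are close but not formally identical. All struct names below are hypothetical.

```cpp
#include <cassert>  // assert, for the usage check after the block
#include <string>
#include <type_traits>

struct Plain { int x; int y; };                   // bitwise copy is fine

struct HasMemberWithCopyCtor { std::string s; };  // case 1: member has a copy constructor

struct BaseWithCopyCtor {
    BaseWithCopyCtor() {}
    BaseWithCopyCtor(const BaseWithCopyCtor&) {}
};
struct DerivedFromCopyCtorBase : BaseWithCopyCtor {};  // case 2: base has a copy constructor

struct HasVirtualFunction { virtual void f() {} };     // case 3: virtual function

struct VBase { int v; };
struct HasVirtualBase : virtual VBase {};              // case 4: virtual base class

// True when the compiler can copy T without running any copy-constructor code.
template <typename T>
constexpr bool bitwise_copy_ok() {
    return std::is_trivially_copy_constructible<T>::value;
}
```

Here assert(bitwise_copy_ok<Plain>()) holds, while the same call reports false for each of the other four types.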

Program Transformation Semantics:

Requiring that every such initialization actually invoke the copy constructor would levy a possibly severe performance penalty on a great many programs. For example, the following three initializations are semantically equivalent:

X xx0( 1024 );
X xx1 = X( 1024 );
X xx2 = ( X ) 1024;

In the second and third instances, the syntax explicitly provides for a two-step initialization:

  1. Initialize a temporary object with 1024.
  2. Copy construct the explicit object with the temporary object.

That is, whereas xx0 is initialized by a single constructor invocation

// Pseudo C++ Code
xx0.X::X( 1024 );

a strict implementation of either xx1 or xx2 results in two constructor invocations, a temporary object, and a call to the destructor of class X on that temporary object:

// Pseudo C++ Code
X __temp0;
__temp0.X::X( 1024 );
xx1.X::X( __temp0 );
__temp0.X::~X();
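Whether the temporary and the extra copy really occur can be observed by counting constructor calls. The class below is a hypothetical instrumented X. Since C++17 the language guarantees that no copy is made in these three forms; earlier standards merely permit the compiler to elide it, so the copy count is reported rather than assumed.

```cpp
#include <cassert>  // assert, for the usage check after the block

// Hypothetical instrumented class: counts how it gets constructed.
struct X {
    static int value_ctor_calls;
    static int copy_ctor_calls;
    int v;
    X(int n) : v(n) { ++value_ctor_calls; }
    X(const X& other) : v(other.v) { ++copy_ctor_calls; }
};
int X::value_ctor_calls = 0;
int X::copy_ctor_calls = 0;

// Runs the three initializations from the text and returns how many
// copy-constructor calls were actually made.
int copies_made_by_three_forms() {
    X::value_ctor_calls = 0;
    X::copy_ctor_calls = 0;
    X xx0(1024);
    X xx1 = X(1024);
    X xx2 = (X)1024;
    (void)xx0; (void)xx1; (void)xx2;  // silence unused-variable warnings
    return X::copy_ctor_calls;
}
```

After calling copies_made_by_three_forms(), X::value_ctor_calls is 3 in every case. The returned copy count is 0 whenever the copies are elided and at most 2 under a strict two-step implementation.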

The simplest method of implementing the copy constructor is as follows:

Point3d::Point3d( const Point3d &rhs )
{
   _x = rhs._x;
   _y = rhs._y;
   _z = rhs._z;
}

This is okay, but use of the C library memcpy() function would be more efficient:

// memcpy() is declared in <cstring>
Point3d::Point3d( const Point3d &rhs )
{
   memcpy( this, &rhs, sizeof( Point3d ) );
}

Use of both memcpy() and memset(), however, works only if the classes do not contain any compiler-generated internal members. If the Point3d class declares one or more virtual functions or contains a virtual base class, use of either of these functions will result in overwriting the values the compiler set for these members.

As you can see, correct use of the memset() and memcpy() functions requires some knowledge of the C++ Object Model semantics!
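One way to make that knowledge explicit is to let the compiler refuse the memcpy() when the type carries compiler-generated members. The helper copy_bytes and this plain Point3d are hypothetical, shown as a minimal sketch; std::is_trivially_copyable is the standard trait from <type_traits>.

```cpp
#include <cassert>  // assert, for the usage check after the block
#include <cstring>
#include <type_traits>

// Hypothetical guard: compiles only for types that are safe to copy bytewise,
// so a class with a vptr or a virtual base is rejected at compile time.
template <typename T>
void copy_bytes(T& dst, const T& src) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy would overwrite compiler-generated members");
    std::memcpy(&dst, &src, sizeof(T));
}

// A plain aggregate with no virtual functions or virtual bases.
struct Point3d {
    float x, y, z;
};
```

Given Point3d a{1.0f, 2.0f, 3.0f}; and Point3d b{0.0f, 0.0f, 0.0f};, calling copy_bytes(b, a) makes assert(b.z == 3.0f) hold. Adding a virtual function to Point3d would turn the same call into a compile-time error instead of silent corruption.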

Member Initialization List:

You must use the member initialization list in the following cases in order for your program to compile:

  1. When initializing a reference member.
  2. When initializing a const member.
  3. When invoking a base or member class constructor with a set of arguments.
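The three cases can be seen together in one small hierarchy. Base, Derived, and their members are hypothetical names used only for illustration. Moving any of the three initializers into the constructor body would stop the class from compiling.

```cpp
#include <cassert>  // assert, for the usage check after the block

class Base {
public:
    explicit Base(int id) : id_(id) {}  // no default constructor available
    int id() const { return id_; }
private:
    int id_;
};

class Derived : public Base {
public:
    Derived(int id, int& counter, int limit)
        : Base(id),           // case 3: base constructor with arguments
          counter_(counter),  // case 1: reference member
          limit_(limit) {}    // case 2: const member

    void bump() { ++counter_; }
    int limit() const { return limit_; }

private:
    int& counter_;
    const int limit_;
};
```

With int c = 0; Derived d(7, c, 10); d.bump();, the checks assert(c == 1), assert(d.id() == 7), and assert(d.limit() == 10) all hold.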

The order in which the list entries are set down is determined by the declaration order of the members within the class declaration, not the order within the initialization list.

In summary, the compiler iterates over and possibly reorders the initialization list to reflect the declaration order of the members. It inserts the code within the body of the constructor prior to any explicit user code.
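The reordering can be made visible by logging each member's construction. Tracer, Pair, and init_log are hypothetical helpers. The initialization list is deliberately written out of order, so some compilers emit a reorder warning for it.

```cpp
#include <cassert>  // assert, for the usage check after the block
#include <string>
#include <vector>

// Shared log that records the order in which members are constructed.
std::vector<std::string>& init_log() {
    static std::vector<std::string> log;
    return log;
}

struct Tracer {
    explicit Tracer(const char* name) { init_log().push_back(name); }
};

struct Pair {
    Tracer first;   // declared first
    Tracer second;  // declared second

    // Listed in the opposite order on purpose: the declaration order wins.
    Pair() : second("second"), first("first") {}
};
```

After init_log().clear(); Pair p;, the log reads first and then second, so assert(init_log()[0] == "first") holds even though second appears first in the initialization list.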

Posted in C++ Object Model | Leave a Comment »

 