Wikipedia:Reference desk/Archives/Computing/2019 April 19

From Wikipedia, the free encyclopedia
Welcome to the Wikipedia Computing Reference Desk Archives
The page you are currently viewing is a transcluded archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.


April 19

Are different types of machine code actually different “formats”?

If I open a .exe file of a computer game with some text editor, I see what I once grasped as "Gibberish" which today I consider machine code in some non-binary format (likely hexadecimal), compiled from the original source code (likely "human-readable" (C++) code).

Are different types of machine code actually different "formats"? — Preceding unsigned comment added by 182.232.24.167 (talk) 07:29, 19 April 2019 (UTC)

Machine code in a file is usually binary, but a given text editor or other software may display it in hexadecimal (base 16), octal (base 8), as ASCII characters, or whatever. There are many different CPUs with different machine languages, sometimes backwards compatible with earlier CPUs so they can run old programs. What is called a "format" can vary. The file format will often just be called a binary executable regardless of which CPU and operating system it works on, but it's also possible to specify this. PrimeHunter (talk) 10:24, 19 April 2019 (UTC)
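To illustrate the point about display, here is a small sketch that renders the same few bytes as hexadecimal, octal and ASCII, roughly the way a hex viewer would. The sample bytes are an illustrative assumption (the "MZ" signature and a few bytes that commonly follow it at the start of a Windows .exe), and showing non-printable bytes as dots is a common viewer convention, not a rule.

```python
def render(data: bytes) -> dict:
    """Return the same bytes rendered three ways, as a simple viewer might."""
    return {
        "hex": " ".join(f"{b:02X}" for b in data),
        "octal": " ".join(f"{b:03o}" for b in data),
        # Printable ASCII (0x20..0x7E) shown as-is; anything else shown as '.'
        "ascii": "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in data),
    }


# Illustrative sample: "MZ" plus a few bytes that often follow it in a .exe
sample = bytes([0x4D, 0x5A, 0x90, 0x00, 0x03])
for name, text in render(sample).items():
    print(f"{name:>5}: {text}")
```

The byte values never change between the three lines; only the rendering does.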
Note that numbers, by themselves, are not inherently binary or hexadecimal or decimal or in any other base. They're just numbers. The base only comes into account when displaying the number. For example, binary 01101010 and hexadecimal 6A are the exact same number. It's just displayed in different ways. JIP | Talk 10:38, 19 April 2019 (UTC)
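A quick check of that example: parsing the binary, hexadecimal and decimal spellings yields one and the same integer, and the base only reappears when the number is formatted for display.

```python
# Three written forms of one value, taken from the example above
binary_text, hex_text, decimal_text = "01101010", "6A", "106"

# Parsing each with its base yields one and the same integer.
values = {int(binary_text, 2), int(hex_text, 16), int(decimal_text, 10)}
print(values)  # {106}

# The base matters again only when the number is formatted for display.
n = 106
print(f"{n:08b}", f"{n:02X}", f"{n:d}")  # 01101010 6A 106
```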


A better way to understand the concept is to recognize that an executable program file, like a modern ".exe" file, almost surely requires an operating system to unpack, decode, and run it. The file contains program code, but it also contains much more; and the file format is so very complicated that it actually requires other program-software (in this case, Windows) to interpret the program's binary before hardware can execute it.
There are several common formats for executable program files, and they are all intricately linked to the operating system(s) that know how to load and run them.
On Windows, program files, especially ones ending in ".exe", are commonly stored in the "Portable Executable" format.
On Linux and Unix-style systems, program files are almost always stored in ELF (Executable and Linkable Format). By convention, there is no special file-name extension on Unix or Linux. The operating system determines if a file is executable by checking the file system's executable permissions.
On Apple platforms and a few related systems built on the Mach kernel, such as NeXTSTEP and the open-source Darwin, program files are almost always stored in the Mach-O file format.
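Each of these container formats announces itself with a few "magic" bytes at a known position, which is how tools can tell them apart without relying on file-name extensions. The following is a heuristic sketch, assuming the well-known signatures: ELF files begin with 7F 'E' 'L' 'F'; PE files begin with a DOS "MZ" header whose 4-byte field at offset 0x3C points to a "PE\0\0" signature; Mach-O files begin with the 32- or 64-bit magic number written in the file's own byte order. The function name is hypothetical, universal ("fat") Mach-O files are deliberately not handled, and a real loader validates far more than this.

```python
import struct


def sniff_executable_format(data: bytes) -> str:
    """Guess the container format from leading magic bytes (heuristic only)."""
    if data[:4] == b"\x7fELF":
        return "ELF"
    if data[:2] == b"MZ":
        # DOS header field e_lfanew, at offset 0x3C, gives the PE header offset.
        if len(data) >= 0x40:
            (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
            if data[pe_offset:pe_offset + 4] == b"PE\0\0":
                return "PE"
        return "MZ (DOS-style executable)"
    # Mach-O magic is written in the file's own byte order, so check both orders.
    if data[:4] in (b"\xfe\xed\xfa\xce", b"\xce\xfa\xed\xfe"):
        return "Mach-O (32-bit)"
    if data[:4] in (b"\xfe\xed\xfa\xcf", b"\xcf\xfa\xed\xfe"):
        return "Mach-O (64-bit)"
    return "unknown"


if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, sniff_executable_format(f.read(4096)))
```

Reading only the first 4096 bytes is an arbitrary choice for the sketch; a PE header placed further into the file would be reported as a plain MZ executable.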
Each of these platforms has its own convention, and the details differ; but conceptually, every one of these program files contains a lot more than just the executable code - which is what most people are thinking when they say "machine code." Additionally, each file format also stores important details about how to run the program: how much memory it needs, and where it should be placed; what system requirements and system tool libraries are needed, and where they must be found; and so on. These details are encoded, in binary, in the executable file format. It would be a stretch to call that part of the binary file "machine code," because that part is almost always interpreted by system software like the loader. The files may also contain data (like the pictures and sounds used in the game); and in many cases, the files contain references to data stored in other files. Those other files may also contain executable code, and additional data, and they also must be loaded and decoded and interpreted.
An operating system, like Windows or Linux, is responsible for opening these program files, interpreting them, and then loading all the resources, including the executable "machine code," before actually running it. On a modern operating system, this "machine code" is frequently written for a high-level abstraction of the "machine," and is not always written for a "hardware" instruction-set. The operating system itself may act as the interpreter, or it may act as a hypervisor.
All of these details relate to the file format that stores the machine code. These details are distinct from the actual format of the "machine code", which would typically be some well-defined encoding of the instruction set architecture for a particular CPU type (e.g., the Intel x86_64 set, or the arm64 or powerpc or some other hardware type). A lot of times, when one casually refers to "machine code," that's what they meant to say: the code segment of the loaded program, and/or the file-system representation of the unloaded program; but in my opinion, this is actually the least complicated part of the execution of program code. What's worse is that the word "machine" gets abused pretty severely by poorly-informed users: as a general guideline, if the person using the phrase "machine code" can't tell you exactly what "machine" they're "coding," you shouldn't trust the rest of their technobabble. Any programmable system can be abstractly thought of as a "machine," and that includes any software-system that interprets binary numbers.
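To make the instruction-set point concrete, the sketch below decodes bytes against two instruction sets using a tiny hand-written table. It knows only a handful of encodings (x86-64 `nop` = 90, `ret` = C3, and `mov eax, imm32` = B8 followed by a little-endian 32-bit immediate; AArch64 `ret` = the 32-bit word D65F03C0, stored little-endian). The function names are hypothetical and this is nowhere near a real disassembler; the point is only that the same bytes mean different things on different CPUs.

```python
def decode_x86_64(code: bytes) -> list[str]:
    """Decode a few x86-64 opcodes; anything unknown is emitted as a raw byte."""
    out, i = [], 0
    while i < len(code):
        op = code[i]
        if op == 0x90:
            out.append("nop")
            i += 1
        elif op == 0xC3:
            out.append("ret")
            i += 1
        elif op == 0xB8:  # mov eax, imm32 (immediate is little-endian)
            imm = int.from_bytes(code[i + 1:i + 5], "little")
            out.append(f"mov eax, {imm:#x}")
            i += 5
        else:
            out.append(f"db {op:#04x}")  # not in this tiny table
            i += 1
    return out


def decode_aarch64(code: bytes) -> list[str]:
    """AArch64 instructions are fixed 4-byte words; only RET is recognised."""
    out = []
    for i in range(0, len(code) - 3, 4):  # any trailing partial word is ignored
        word = int.from_bytes(code[i:i + 4], "little")
        out.append("ret" if word == 0xD65F03C0 else f".inst {word:#010x}")
    return out


code = bytes([0xB8, 0x01, 0x00, 0x00, 0x00, 0xC3])
print(decode_x86_64(code))   # ['mov eax, 0x1', 'ret']
print(decode_aarch64(code))  # ['.inst 0x000001b8']
```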
Only the most elementary microcomputers allow the skilled programmer to directly write nontrivial executable code in the form of "machine"-encoded binary. Such microcomputers may have hardware or nonvolatile software to decode ELF or "a.out"-style program code; or they may have carefully-documented hardware that allows a programmer to directly load main memory with instructions written in binary. The programmer may use special compilers, linkers, or other tools to place the instructions in the exact form that the hardware expects. This type of "machine code" programming is becoming less and less common, even among specialized embedded-system platforms; in 2019, even a small "embedded system" microcontroller may incorporate a 64-bit computer with virtual memory, a file system, and a full-blown Linux instance. This generally means that writing program-code alone - that is, raw binary numerals that encode the executable instructions for things like math and memory access - would not be sufficient to make a real program run correctly. A huge amount of additional work must be done, usually by the operating system, to prepare the CPU (and its peripheral resources, including main memory) to execute those instructions. Obviously, the bootstrapping conundrum implies that somebody must have booted the operating system using machine-code; but this isn't something that actually concerns most programmers unless they are professionally involved in solving that exact problem.
On older systems, and certain hobbyist platforms, you can dramatically simplify program loading; for example, ancient versions of Microsoft DOS would honor the COM file format, which was in many respects a simple file containing "raw" machine instructions. One could use this principle to argue the case that although early "DOS" provided a command-shell and a set of utility programs, it was not a true operating-system.
Nimur (talk) 12:02, 19 April 2019 (UTC)
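As an illustration of how little structure the COM format mentioned above has, here is a sketch that hand-assembles one. It assumes the classic DOS conventions: the image is loaded at offset 100h of its segment, INT 21h function 09h prints a "$"-terminated string at DS:DX, and a near `ret` at start-up returns to an INT 20h exit instruction at the start of the PSP. The function name is hypothetical; treat the result as a sketch rather than a tested program.

```python
COM_LOAD_OFFSET = 0x100  # DOS loads a .COM image at offset 100h of its segment
CODE_SIZE = 8            # length of the instructions below, in bytes


def build_hello_com(message: str) -> bytes:
    """Hand-assemble a tiny 16-bit DOS .COM program that prints `message`."""
    if "$" in message:
        raise ValueError("DOS function 09h stops at the first '$'")
    msg_addr = COM_LOAD_OFFSET + CODE_SIZE         # the text follows the code
    code = (
        b"\xB4\x09"                                # mov ah, 09h  ; "print string"
        + b"\xBA" + msg_addr.to_bytes(2, "little")  # mov dx, msg_addr
        + b"\xCD\x21"                              # int 21h      ; call DOS
        + b"\xC3"                                  # ret          ; to INT 20h in the PSP
    )
    assert len(code) == CODE_SIZE
    return code + message.encode("ascii") + b"$"


image = build_hello_com("Hello, World!")
print(image.hex(" "))
```

There is no header at all: the file is just eight bytes of instructions followed by the message. Saving `image` to a file with a .COM extension should give a program that runs under DOS or a DOS emulator.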
A file is essentially a series of byte values. Without knowing anything about the file format, we can always display the bytes the same way as this IP address 172.16.254.1 (4 bytes, each treated as a decimal number in the range 0 to 255) or ACh 10h FEh 01h (the same numbers expressed in hex). The file is useful to a machine or person only when its format is known. The format explains how bytes at particular positions in the file are grouped. It typically states that there is readable text encoded as ASCII bytes at one location, there are 4-byte floating-point number representations at another location, or that there is compiled executable code for a CPU starting at a third location. A text editor assumes that a file contains only text (ASCII, an 8-bit extension of it, or multi-byte Unicode) and will show blanks or gibberish if the format is otherwise. A hex editor makes no assumption about a file format and can show its raw and exact content without interpretation. A file of media data (image, audio or video) can be used only with a display or editing program that implements the correct encoding and decoding process that associates bytes in the media file with image pixels, sound samples or video pixels respectively. See the partial List of file formats. DroneB (talk) 12:27, 19 April 2019 (UTC)
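The same point can be shown with those four bytes. Nothing in the bytes says what they are, so each reading below is just a different assumed format. The particular choice of readings is illustrative.

```python
import struct

data = bytes([172, 16, 254, 1])  # four bytes; nothing here says what they mean

as_ip = ".".join(str(b) for b in data)       # dotted decimal, like an IPv4 address
as_hex = " ".join(f"{b:02X}h" for b in data)  # hex, as in "ACh 10h FEh 01h"
(as_be_u32,) = struct.unpack(">I", data)      # one 32-bit unsigned int, big-endian
(as_le_u32,) = struct.unpack("<I", data)      # ...or the same bytes little-endian
(as_be_f32,) = struct.unpack(">f", data)      # ...or an IEEE 754 single-precision float
as_latin1 = data.decode("latin-1")            # ...or 8-bit text, as a text editor might

for label, value in [("IP", as_ip), ("hex", as_hex), ("u32 BE", as_be_u32),
                     ("u32 LE", as_le_u32), ("f32 BE", as_be_f32),
                     ("latin-1", repr(as_latin1))]:
    print(f"{label:>8}: {value}")
```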
All of the following applies to 64-bit Linux – other systems will differ in detail if not in outline.
If you can, get hold of a "HelloWorld.c" from somewhere and run gcc -v -oHelloWorld HelloWorld.c. If that went OK, run objdump -x HelloWorld in a virtual terminal or else send it to the printer. Now go to Executable and Linkable Format.
The file starts with a file header. This is not part of the machine code; it simply enables the kernel's program loader and the run-time linker (on x86-64, ld-linux-x86-64.so.2) to identify the file. Next comes a block of Program Headers. These do not contain run-time information but do define the memory layout. Taking one example:
   LOAD off    0x0000000000000000 vaddr 0x0000000000400000 paddr 0x0000000000400000 align 2**21
        filesz 0x000000000000070c memsz 0x000000000000070c flags r-x
which tells the loader that 0x70C bytes starting at virtual address 0000000000400000 are required to map executable (r-x) code. The code will be found at offset 0000000000000000 from the start of the file. The loader's job is to set up the page tables so that this part of the file appears at that address with the appropriate permissions; on Linux the kernel maps the program's own LOAD segments, and the pages are typically read in on demand rather than copied up front. The loader doesn't "know" what the bytes are; it just sets the "r-x" flags.
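The header fields described above can be read with a few lines of Python's `struct` module. This is a minimal sketch, assuming a little-endian 64-bit ELF file (such as gcc output on x86-64 Linux) and the standard ELF64 file-header and program-header layouts; it reports only the entry point and the LOAD segments, and the function name is hypothetical. For a real binary its LOAD lines should correspond to the objdump output shown above.

```python
import struct
import sys

PT_LOAD = 1                 # program header type for a loadable segment
PF_X, PF_W, PF_R = 1, 2, 4  # segment permission bits


def parse_elf64_segments(data: bytes):
    """Return (entry_point, LOAD segments) for a little-endian ELF64 image."""
    # e_ident: magic, EI_CLASS == 2 (64-bit), EI_DATA == 1 (little-endian)
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a little-endian 64-bit ELF file")
    # The rest of the 64-byte ELF64 file header follows the 16-byte e_ident.
    (e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, e_flags,
     e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum,
     e_shstrndx) = struct.unpack_from("<HHIQQQIHHHHHH", data, 16)
    segments = []
    for i in range(e_phnum):
        (p_type, p_flags, p_offset, p_vaddr, p_paddr,
         p_filesz, p_memsz, p_align) = struct.unpack_from(
            "<IIQQQQQQ", data, e_phoff + i * e_phentsize)
        if p_type != PT_LOAD:
            continue
        perms = "".join(ch if p_flags & bit else "-"
                        for ch, bit in (("r", PF_R), ("w", PF_W), ("x", PF_X)))
        segments.append({"offset": p_offset, "vaddr": p_vaddr,
                         "filesz": p_filesz, "memsz": p_memsz,
                         "align": p_align, "flags": perms})
    return e_entry, segments


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        entry, segs = parse_elf64_segments(f.read())
    print(f"entry point: {entry:#x}")
    for s in segs:
        print(f"LOAD off {s['offset']:#x} vaddr {s['vaddr']:#x} "
              f"filesz {s['filesz']:#x} memsz {s['memsz']:#x} "
              f"align {s['align']:#x} flags {s['flags']}")
```

Run it with the path of the compiled program as its only argument, for example the HelloWorld binary built above.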
After the program headers, objdump lists a number of section headers. They describe various chunks of binary. Some of these chunks will need access to external files (shared object files, for example); others simply point down into the file. For example:
13 .text         00000182  0000000000400430  0000000000400430  00000430  2**4
                 CONTENTS, ALLOC, LOAD, READONLY, CODE
Which shows that section number 13 (objdump counts from 0) is 0x182 (386) bytes long and starts at virtual address 400430 (so it lies inside the LOAD segment shown above). It is stored in the file at offset 430, is aligned on a 16-byte (2**4) boundary, has contents in the file, is allocated memory, is loaded, is read-only, and holds code.
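A quick arithmetic check of the numbers in that objdump line, treating them all as hexadecimal: the .text section should fit inside the r-x LOAD segment in the file, and its address and file offset should be shifted by the same amount as the segment's. The helper below is hypothetical, written only for this check.

```python
def section_in_segment(sec_vaddr, sec_offset, sec_size,
                       seg_vaddr, seg_offset, seg_filesz):
    """True if a section sits inside a LOAD segment in the file, and its
    address and file offset are shifted by the same amount as the segment's."""
    fits_in_file = (seg_offset <= sec_offset
                    and sec_offset + sec_size <= seg_offset + seg_filesz)
    same_shift = (sec_vaddr - seg_vaddr) == (sec_offset - seg_offset)
    return fits_in_file and same_shift


# .text: size 182, VMA 400430, file offset 430, alignment 2**4 (hex throughout)
# LOAD:  offset 0, vaddr 400000, filesz 70c
print(section_in_segment(0x400430, 0x430, 0x182, 0x400000, 0x0, 0x70C))  # True
print(0x182, 2**4)      # 386 16 -> 386 bytes, 16-byte alignment
print(0x400430 % 2**4)  # 0 -> the start address honours that alignment
```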
The linker (ld) will have combined compatible sections into program segments, so a lot of this is for analysis rather than run-time use. One crucially important section is the symbol table. When code is compiled, many symbols are not yet known. Some of these are resolved at link time; the remainder have to be found by the run-time linker by examining .so files and similar. The absolute last step of the run-time linker is to transfer control to the program's start address (often called the entry point) by loading it into the processor's program counter.
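As a rough mental model of that run-time lookup, here is a toy sketch, not how ld.so is actually implemented (it ignores symbol versioning, weak symbols, relocation types and lazy binding). Each loaded object exports a table of names and addresses, and each undefined reference is bound to the first object in the search order that defines it. The addresses and the `frobnicate` symbol are invented for illustration.

```python
def resolve(undefined, search_order):
    """Bind each undefined name to the first object that exports it."""
    bindings, missing = {}, []
    for name in undefined:
        for obj_name, exports in search_order:
            if name in exports:
                bindings[name] = (obj_name, exports[name])
                break
        else:  # no object in the search order defines this name
            missing.append(name)
    return bindings, missing


# Hypothetical objects and addresses, for illustration only
search_order = [
    ("HelloWorld", {"main": 0x401126}),
    ("libc.so.6", {"puts": 0x7F0000080E50, "printf": 0x7F0000060C90}),
]
bindings, missing = resolve(["puts", "main", "frobnicate"], search_order)
print(bindings)
print(missing)  # ['frobnicate']
```

A real run-time linker would typically report an "undefined symbol" error for a name like `frobnicate` rather than carrying on.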
Surely you mean gcc -v -o HelloWorld HelloWorld.c? You were missing a space. JIP | Talk 21:42, 19 April 2019 (UTC)
 $ gcc -v -o HelloWorld   HelloWorld.c 2>sp
 $ ./HelloWorld 
 Hello World!
 $ gcc -v -oHelloWorld   HelloWorld.c 2>nsp
 $ ./HelloWorld 
 Hello World!
 $ diff sp nsp
The two output files only differ in the temporary file names reported. I'll accept that the extra space makes it slightly clearer though. Martin of Sheffield (talk) 21:54, 19 April 2019 (UTC)

Why don't all mobile broadband adaptors (dongles etc.) support cellular calls?

I have one mobile broadband dongle, and one "wireless modem" in my PC. At least with the software I have used/sought, there seems to be no scope for using either for making or receiving cellular calls, though they do connect to cellular networks and CAN send/receive SMS messages.

My question: why? Given that people use VoIP software on PCs (e.g. Skype), I'd guess that there was some utility in providing the facility (might be less costly than VoIP over mobile broadband for a start). I don't see why it would be expensive to implement, but am I missing something? Or is there a business pressure to not provide it so as to sell more phones?--Leon (talk) 19:49, 19 April 2019 (UTC)

This is a decision of the mobile operator not to provide voice calls over dongles. Ruslik_Zero 20:58, 19 April 2019 (UTC)
But how does the mobile operator know that it's communicating with a dongle and not a phone?--Leon (talk) 10:52, 20 April 2019 (UTC)