This regenerates the source code from an EXE file created by a Basic compiler.
TO RUN THE DECOMPILER
Enter the following command from DOS or a DOS Box:
[Path]DEBASIC [Path]XXXXXX [Switches]
Where XXXXXX.EXE is the name of the file to be decompiled.
If DEBASIC.EXE or XXXXXX.EXE is not on the default drive then the path should be included.
The output is written to a file called XXXXXX.DEB in the same directory as XXXXXX.EXE.
The switches are explained below. Each one is a / character followed by a letter. The letter must be in the correct case. Sometimes the letter is immediately followed by hexadecimal digits, which must be in upper case. There may be any number of switches. They must be separated by a blank.
Example - The command:
DEBASIC OPTIMAQB /Z /N78C0 /L3CE0
will decompile OPTIMAQB.EXE using the 3 switches and place the results in OPTIMAQB.DEB.
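The switch rules above can be checked before typing the command. The following is only a sketch in Python, not part of DEBASIC. The function name build_command and the pattern are assumptions made for illustration. The pattern also accepts a digit after the slash, because switches such as /7, /8 and /9 are mentioned later.

```python
import re

# Assumed shape of a switch: "/" + one letter or digit + optional
# UPPER-CASE hexadecimal digits. This is inferred from the manual text,
# not taken from the decompiler itself.
SWITCH_RE = re.compile(r"/[A-Za-z0-9][0-9A-F]*")


def build_command(target, switches, decompiler="DEBASIC"):
    """Return the command line, rejecting switches that break the rules."""
    for sw in switches:
        if not SWITCH_RE.fullmatch(sw):
            raise ValueError("badly formed switch: " + sw)
    # Switches are separated from each other by a single blank.
    return " ".join([decompiler, target, *switches])


if __name__ == "__main__":
    print(build_command("OPTIMAQB", ["/Z", "/N78C0", "/L3CE0"]))
```

A switch such as /N78c0 is rejected here because the hexadecimal digits are not all upper case.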
For a first attempt at decompiling it is best not to use switches. Indeed it is hoped that a later version will not require switches to locate the code and constants. When no switches are present the decompiler will assume that the EXE file was compiled using version 4.0 of the Microsoft QuickBasic compiler. If this is the case it should find the code, the constants and the "Logical Offset" automatically. It will also do this for some older versions of the Microsoft QuickBasic compiler.
If the constant data strings appear to be rubbish then it is recommended that you start by investigating /L and /N.
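When investigating /N by hand it can help to list where readable text sits in the EXE file. The sketch below (Python) prints the file offset of each run of printable characters. It is not the search the decompiler itself performs. The function name printable_runs and the minimum length of 6 are assumptions chosen for illustration.

```python
def printable_runs(data, min_len=6):
    """Return (offset, text) for each run of printable ASCII in data."""
    runs = []
    start = None
    for i, b in enumerate(data):
        if 0x20 <= b <= 0x7E:
            if start is None:
                start = i          # a new run begins here
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, data[start:i].decode("ascii")))
            start = None
    # A run may extend to the very end of the data.
    if start is not None and len(data) - start >= min_len:
        runs.append((start, data[start:].decode("ascii")))
    return runs


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as f:
        for offset, text in printable_runs(f.read()):
            # Offsets are shown in upper-case hexadecimal, as used by the switches.
            print(format(offset, "X"), text)
```

The first offsets at which the program's own messages appear are candidates for the /N value.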
If the initial data strings are correct but they become gradually more and more offset as you go through the program, then suspect that the data has been compressed and try using /8. This will not work unless /N is exactly correct. A clue here is that the byte immediately preceding the start of the constant data area usually contains hexadecimal B0.
If there are a lot of Unrecognised Library Functions then try /7 and /9, one at a time.
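The B0 clue can be tested on a candidate /N value. This is a sketch only (Python). The function name looks_compressed is an assumption. B1 is included because the description of /8 further down says the byte is sometimes B1.

```python
def looks_compressed(exe_bytes, n_offset, markers=(0xB0, 0xB1)):
    """True if the byte just before n_offset is one of the marker values.

    n_offset is the candidate /N value: the location within the EXE file
    at which the constant values start.
    """
    if n_offset <= 0 or n_offset > len(exe_bytes):
        return False               # there is no preceding byte to inspect
    return exe_bytes[n_offset - 1] in markers


if __name__ == "__main__":
    sample = bytes([0x00, 0xB0, 0x41, 0x42])   # made-up data for illustration
    print(looks_compressed(sample, 2))
```

A True result is only a hint that /8 is worth trying. It does not prove that the data is compressed.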
Here are the 5 switches for locating code and constants in more detail:
The xxxxx value is the hexadecimal value of the location within the EXE file at which the constant values start.
The xxxxx value is the logical address of the start of the constants, i.e. the hexadecimal address by which the program addresses the start of the constant area pointed to by /Nxxxxx. The decompiler searches for strings in an attempt to determine /L and /N when the /Z switch is present.
The xxxxx value is the location of the start of the instructions.
The xxxxx value is the logical address of the start of the variables.
This will make the decompiler assume that the old IBM Basic compiler has been used to compile the .EXE file. It will set the various constants and methods (for example, the coding of float constants) accordingly. If any of the 4 switches above is used in conjunction with this one, that switch will override the corresponding value. /X32 and /I22B will be set and /N and /L will be determined by a search for strings.
Here are the remaining switches:
This causes some constant array references to be regarded as constants. You should experiment with this switch when constant array references appear where you think they should not be.
Use this switch to suppress the display of the line number being decompiled. This is displayed on the screen. The decompiler runs faster if it is suppressed.
This causes the "Hex" version of the output to be produced. In this, the original code in hexadecimal is displayed after each decompiled source instruction. This also dumps the first 12 bytes of any subroutine called and the calling code. For diagnosing really desperate problems.
NOT YET AVAILABLE If debugging information is included it will sometimes be possible to regenerate the original line numbers using this switch. This facility is not very polished in version 1 as it doesn't cope very well with multiple statements on the same line. It is included (arguably prematurely) because you might need to know the original line numbers in order to determine the meaning of error messages, produced by the original program, that refer to line numbers.
Fiddle factor. This is only used when the file being decompiled is not a proper .EXE file but has been produced using my techniques for decompressing or decrypting compressed or encrypted files. I'll issue more details about these techniques later. Basically this is the amount that has to be subtracted from each long call destination to get the correct address.
Highest segment address for subroutines. Anything higher than this is treated as a library function or an add on.
Not much use and about to be withdrawn. It specified a variable in the dynamic storage area that was to be treated as purely a work area.
Decompile as far as this location. Otherwise the decompilation ends at the first END statement.
Another fiddle factor - see /F . This is the correction for segment values.
Assume version 1 of the Microsoft basic compiler created the EXE file.
Old IBM method used for coding float decimals.
Display pass 1 output. The decompiler scans the data twice. During the first scan addresses and symbols are not properly resolved. For diagnosing really desperate problems.
Display DIM statement for arrays that don't have dynamic dimension statements at start of decompilation.
Assume long string descriptors.
MUST BE LOWER CASE s . Set the interval between line numbers in the decompilation listing. Otherwise the line numbers go up in increments of 10.
Force .EXE file produced by Microsoft compiler to be regarded as Stand Alone version.
Force .EXE file produced by Microsoft compiler to be regarded as Non Stand Alone version.
NOT AVAILABLE YET. Limit of number of lines in output.
/Vxxxxx This is a VBDOS Program. The decompiler only works on these if they are Stand-Alone. Even then there are a few more unrecognised functions than usual. The xxxxx value is the hex address of the area in which the constant data strings are stored. These are not stored with the rest of the data constants, as they are with other Basic Compilers.
/gxxxx Any sectors higher than or equal to the xxxx value will be regarded as containing add-ons.
NOTE it must be a lowercase g, as an uppercase G indicates the lower limit of Library routines -- see main manual.
/Wxxxx The amount xxxx will be subtracted from the address of an add on routine to calculate the address within the .EXE file at which the add-on code begins. If a "Fiddle Factor" /F is used, this is added to the address as well.
In case more than 1 add-on library is used it is permissible to have multiple (up to 10) /W and /g switches. They must be in order, i.e. the first /W and /g must refer to the first add-on library in the .EXE file and so on. In some programmes the ordinary library routines are not before the add-ons. To indicate this to the decompiler a lower case /w is used instead of /W for the ordinary library.
It can be difficult to determine the /W value. The best way of doing this is:
1 - Start to run the program under Debug so that the addresses in the calls to the add-ons are relocated.
2 - Identify the code at the start of the add-on.
3 - Locate that code in the .EXE file.
4 - Work out the difference between this location of the code in the .EXE file and the (unrelocated) call address.
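Step 4 is a subtraction, which can be sketched as follows (Python). The function name w_switch and the sample numbers are made up for illustration. The sketch assumes that both values are expressed in the same units and ignores any /F fiddle factor.

```python
def w_switch(call_address, exe_location):
    """Return the /W switch text for an add-on.

    call_address : the (unrelocated) address used in calls to the add-on
    exe_location : where the start of the add-on code was found in the .EXE file

    /W is the amount subtracted from the add-on address to reach the code
    in the .EXE file, so it is call_address minus exe_location.
    """
    difference = call_address - exe_location
    if difference < 0:
        raise ValueError("call address is below the location in the file")
    # Hexadecimal digits after a switch letter must be upper case.
    return "/W" + format(difference, "X")


if __name__ == "__main__":
    # Hypothetical values: call address 2A40, code found at 1C40 in the file.
    print(w_switch(0x2A40, 0x1C40))
```

With these made-up values the difference is E00 hexadecimal.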
Suppress the table of symbol offsets.
Assume Version 7 of Microsoft Basic compiler created EXE file.
Assume constant data is compressed. NOTE if this is the case then /N must be determined exactly for the decompression to work. As a clue here, the byte immediately preceding the start of the constant data is usually hexadecimal B0 or sometimes B1.
Assume Version 4.5 of Microsoft Basic compiler created EXE file. For stand alone programs you may experiment with omitting this!
To help identify when add-ons are in a program being decompiled, a table of code segment addresses appears at or near the start of the decompilation. This contains the number of times each segment is referenced and how many interrupt 3Fs in the segment are called. For Non-Stand Alone programmes (those requiring a runtime module) interrupt 3Fs are calls to ordinary Library routines. Add-on code is usually stored beyond library routines but not always.
For Non-Stand Alone programmes the identification of add-ons is usually automatic. Otherwise they are identified using the /gxxxx (must be lower case) switch.
Each global (non dynamic) symbolic label in the decompiled code consists of:
1 - A letter:
I for an integer, D for a double precision decimal, L (for label) for anything else. A later version will use C for a currency variable and X for a long integer.
2 - Followed by a number. Any number of digits. Referred to as the Label Number.
3 - Followed by $ if it represents a character string.
Each local (dynamic) symbolic label in the decompiled code consists of:
1 - A letter:
R for a relocatable variable in the dynamic storage area. P for a parameter.
2 - Followed by a number. Any number of digits. Equal to the offset in the dynamic storage area.
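The two label formats can be recognised mechanically, for example when post-processing a .DEB listing. This is a sketch in Python. The function name parse_label is an assumption, as is the choice to treat the number as decimal digits and to allow the $ suffix on global labels only, since the text describes it only for those.

```python
import re

# Letter meanings as given in the text above.
GLOBAL_KINDS = {"I": "integer", "D": "double precision", "L": "other"}
LOCAL_KINDS = {"R": "relocatable", "P": "parameter"}

GLOBAL_RE = re.compile(r"([IDL])(\d+)(\$?)")
LOCAL_RE = re.compile(r"([RP])(\d+)")


def parse_label(label):
    """Return (scope, kind, number, is_string), or None if not a label."""
    m = GLOBAL_RE.fullmatch(label)
    if m:
        return ("global", GLOBAL_KINDS[m.group(1)], m.group(2), m.group(3) == "$")
    m = LOCAL_RE.fullmatch(label)
    if m:
        return ("local", LOCAL_KINDS[m.group(1)], m.group(2), False)
    return None


if __name__ == "__main__":
    for name in ("I12", "L7$", "P4", "X1"):
        print(name, parse_label(name))
```

X1 is rejected here because X (long integer) and C (currency) are described as belonging to a later version.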
At the end of the decompilation listing there are two symbol table listings for the global variables. One in Label Number order followed by one in address order.
Each entry in this table consists of the label, followed by its logical address, followed by a list of one letter attributes as follows:
S=String, A=Array, I=Integer, D=Double precision, F=Forced to be a number rather than a label, E=Appears left of an equal sign.
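The attribute letters can be expanded with a simple lookup. The sketch below (Python) decodes only the attribute letters themselves. How the label, address and attributes are separated within a listing line is not stated here, so no attempt is made to parse a whole entry. The function name decode_attributes is an assumption.

```python
# One-letter attributes exactly as listed above.
ATTRIBUTES = {
    "S": "String",
    "A": "Array",
    "I": "Integer",
    "D": "Double precision",
    "F": "Forced to be a number rather than a label",
    "E": "Appears left of an equal sign",
}


def decode_attributes(letters):
    """Expand a run of attribute letters, e.g. 'SA', into their meanings."""
    meanings = []
    for letter in letters:
        if letter not in ATTRIBUTES:
            raise ValueError("unknown attribute letter: " + letter)
        meanings.append(ATTRIBUTES[letter])
    return meanings


if __name__ == "__main__":
    print(decode_attributes("SA"))
```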
There are 2 types of error messages:
Try using the /z switch.
Unexpected values on stack. It pays to be suspicious when this happens.
This is usually recognisable from the context. Otherwise it has to be deduced by running the decompiled program under a Basic interpreter in parallel with the original EXE file running under Debug.