Debug your own Linux software like a pro

From time to time, a piece of software fails. Whether the defect lies in its own code or in a buggy external library it depends on, the result is much the same: the program usually crashes. In this scenario, an ICT expert is expected to solve the issue or find a way around it. More often than not, however, the ICT expert has no previous knowledge of the software, so a thorough analysis without the right tools and techniques can prove an insurmountable task. To make matters worse, sometimes the ICT expert will be dealing with proprietary software for which the sources are not even available (bear in mind that non-open source programs run on GNU/Linux boxes, too). This is where a debugger’s point of view can save the day. How? This tutorial examines two real cases where you, playing the role of the ICT expert, will dissect, analyse and eventually fix or bypass the problem by thinking like a debugger, using the GNU Debugger (GDB) along with the objdump utility. The process can work even when you are not acquainted with the defective software at all.

Bypassing a segmentation fault

Our first case illustrates that, even when you have the sources but lack a good understanding of the code, it may be feasible to bypass – to an extent – the flaws in the implementation that would otherwise lead to a crash. For this first real case, you will face a bug affecting LibreOffice 3.5.4.2: the software crashes with a segmentation fault when opening a certain Microsoft Excel XML spreadsheet. You can get a bit more information about the crash by running dmesg:

# dmesg|grep soffice

[936028.103160] soffice.bin[3495]: segfault at 200030000 ip 0000000200030000 sp 00007fffff42baa8 error 14 in libunoxmllo.so[7f4e6bfb9000+a2000]

… but that’s about it. You need to understand what is going on behind the scenes, and the only effective way to do so is to take a debugger’s approach. A memory dump tells you something, but the debug symbols can tell you much more. Because this issue happens on a Debian GNU/Linux box, you can install the LibreOffice debug symbols this way:

# apt-get install libreoffice-dbg
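As a small aside, the dmesg line shown earlier can be squeezed for a little more detail before reaching for the debugger. The kernel prints the page-fault error code in hexadecimal, so "error 14" means 0x14, and each bit of that value is one of the x86 page-fault flags. The following Python sketch decodes the line; the function name and dictionary keys are made up for illustration and are not part of any tool.

```python
import re

def decode_segfault(line):
    """Decode a kernel 'segfault at ...' log line (x86 page-fault flags)."""
    m = re.search(
        r"segfault at ([0-9a-f]+) ip ([0-9a-f]+) sp ([0-9a-f]+) error ([0-9a-f]+)",
        line,
    )
    if m is None:
        raise ValueError("not a segfault line")
    # Every captured field is hexadecimal, including the error code.
    addr, ip, sp, err = (int(group, 16) for group in m.groups())
    return {
        "address": addr,
        "ip": ip,
        "sp": sp,
        "present": bool(err & 0x01),            # 0 means the page was not mapped
        "write": bool(err & 0x02),              # 0 means a read or fetch
        "user": bool(err & 0x04),               # fault raised in user mode
        "instruction_fetch": bool(err & 0x10),  # CPU was fetching code
        "jumped_to_bad_address": addr == ip,    # faulting address is the ip itself
    }

LINE = ("soffice.bin[3495]: segfault at 200030000 ip 0000000200030000 "
        "sp 00007fffff42baa8 error 14 in libunoxmllo.so[7f4e6bfb9000+a2000]")
info = decode_segfault(LINE)
for key, value in info.items():
    print(key, hex(value) if isinstance(value, int) and not isinstance(value, bool) else value)
```

Read this way, the line says that a user-mode process tried to fetch an instruction from an unmapped page, and that the faulting address is the instruction pointer itself. In other words, the program jumped to a garbage address, which fits the memory corruption found later in the gdb session.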

Once you have the debug symbols, you run the program inside a gdb session in order to figure out where exactly the segmentation fault happens. This is much easier than dealing with assembly code or memory addresses:

~ gdb /usr/lib/libreoffice/program/soffice.bin

(gdb) set args -o file_that_segfaults_libreoffice.xlsx

(gdb) r

Program received signal SIGSEGV, Segmentation fault.

0x0000000200030000 in ?? ()

(gdb) bt

#0 0x0000000200030000 in ?? ()

#1 0x00007fffd6014225 in ScFormulaCell::Compile (this=0x7fffe000a770, rFormula=…, bNoListening=false, eGrammar=formula::FormulaGrammar::GRAM_NATIVE) at /home/rene/Debian/Pakete/LibreOffice/libreoffice-3.5.4+dfsg2/sc/source/core/data/cell.cxx:1076

As shown in the previous listing, you first set the program’s arguments (in this case, the document to be opened) and then run it. Soon after, the program crashes and you get the segmentation fault message inside your gdb session. You quickly realise that the address where the segmentation fault is triggered, 0x0000000200030000, is not a valid code address, so some sort of memory corruption must be at play. You therefore need to retrace your steps and look at the program’s back-trace to find out which function sits in the previous frame. Using the bt command, you find that it is the ScFormulaCell::Compile() method. You also know that the last executed statement is at line 1076 of the sc/source/core/data/cell.cxx C++ source file. Since this is an open source project, its sources are publicly available and you can fetch them easily:

# apt-get source libreoffice

Now that you have the sources, you can read the whole implementation of the ScFormulaCell::Compile() method and look closely at lines 1075 and 1076:

if(pCodeOld)

delete pCodeOld;

Apparently, the segmentation fault happens while the program is deleting the object pointed to by pCodeOld (a pointer to a structure of type ScTokenArray). It is common practice to inspect the program’s data after a crash, so you use the frame command to select the ScFormulaCell::Compile() frame and print what pCodeOld points to right after the crash, like so:

(gdb) frame 1

(gdb) p *pCodeOld

$38 = {<formula::FormulaTokenArray> = {_vptr.FormulaTokenArray = 0x7fffc40d63c0, pCode = 0x22b7960, pRPN = 0x22b86d0, nLen = 30032, nRPN = 50189, nIndex = 32767, nError = 0, nRefs = 30048, nMode = 13 '\r', bHyperLink = 196}, <No data fields>}

After setting a breakpoint at line 1075, you find that the program doesn’t crash every time it executes the delete statement. So you inspect the structure’s values each time the program stops at line 1075 and compare them with the ones that fire the segmentation fault. On the hits that do not crash, these values are 0. So, your reasoning should be something along these lines: ‘what if I can detect each time this segmentation fault is about to happen and avoid it?’ Of course, you are not acquainted at all with the code in charge of parsing a Microsoft Excel XML spreadsheet, so you are just trying to find a way to circumvent this bug.

GDB enables you to alter the program’s data and to attach a list of gdb commands that is executed as soon as a breakpoint is hit. Aided by these facilities, you will use the previous breakpoint to alter the program flow only when some values held by the ScTokenArray structure are greater than zero. After giving the problem some thought, you come up with this:

(gdb) set args -o file-that-fires-the-segfault.xls

(gdb) set pagination off

(gdb) b cell.cxx:1075

(gdb) set $hits = 0

(gdb) commands 1

> set $check = pCodeOld->nRPN

> printf "Check is: %d\n", $check

> if $check > 0

> printf "Patching pCodeOld to avoid the crash ...\n"

> set $hits++

> set var pCodeOld = 0x0

> end

> c

> end

As shown above, you set a gdb convenience variable called $hits that counts how many times the segmentation fault would have happened. You add some commands to be executed by gdb itself as soon as the breakpoint at line 1075 is hit (commands 1). You choose the nRPN field as the indicator of whether the segmentation fault is about to happen ($check > 0), updating the value held by $hits accordingly and setting the program’s pCodeOld pointer to null (set var pCodeOld = 0x0). Now, recall that at line 1075 in the cell.cxx C++ source file, a check of this sort is made: if(pCodeOld). So, it comes as no surprise that once pCodeOld itself is 0x0, the branch is not taken and the delete statement is not executed. If the nRPN field’s value is 0, the program flow just continues normally.
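The same breakpoint logic can also be written with GDB’s Python scripting interface. What follows is only a sketch under a few assumptions: it must be loaded inside a gdb session built with Python support (for example with the source command), the class name SkipCorruptDelete and the helper should_patch are invented for this illustration, and the breakpoint location is the one used above.

```python
try:
    import gdb  # only importable from inside a GDB session
except ImportError:
    gdb = None  # lets the pure-Python helper be exercised outside GDB


def should_patch(nrpn):
    """Same test as the command list above: a non-zero nRPN marks a bad pointer."""
    return nrpn > 0


if gdb is not None:
    class SkipCorruptDelete(gdb.Breakpoint):
        """Breakpoint that nulls pCodeOld when the object looks corrupted."""

        def __init__(self, spec):
            super().__init__(spec)
            self.patched = 0  # plays the role of $hits

        def stop(self):
            nrpn = int(gdb.parse_and_eval("pCodeOld->nRPN"))
            if should_patch(nrpn):
                gdb.execute("set var pCodeOld = 0")
                self.patched += 1
            return False  # never hand control to the user, just keep running

    SkipCorruptDelete("cell.cxx:1075")
```

Returning False from stop() has the same effect as the c command at the end of the command list: the program resumes on its own after every hit.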

So, you run the program from the beginning with this breakpoint in place, and this time you are able to open the document. The $hits variable reports two hits. Right after opening the document, you save it in the native ODS format and close LibreOffice Calc. Finally, you open the ODS file in a new LibreOffice Calc instance, this time with no problems at all. You, and you alone, have successfully circumvented this issue!


Fixing a double-free error

It is time to move on; your next assignment concerns a proprietary piece of software: the ATI graphics card software installer. While installing the graphics card drivers on a GNU/Linux box, and right before the installer finishes, a double-free or memory heap corruption error is triggered and the process is killed, which unfortunately prevents the installer from actually setting up the drivers.

The GNU C Library (glibc) implements some basic protections against corruption of the heap. In this particular case, the corruption happens because the same pointer is freed twice. You know that using the open source ATI driver is out of the question, because this is brand-new hardware and the only drivers capable of detecting the device properly are those of ATI itself. Therefore, you have to find a way to fix it.

The defective software is an ELF-64 binary included with the ATI installer package: http://ift.tt/1nj2A3y. The double-free memory corruption error is reported by glibc itself, for address 0x013cebf0. Apart from giving you the exact address that is being double-freed, glibc also prints the entire back-trace of the installer’s execution. This way, it is feasible to determine the offset of the buggy instruction inside the ELF-64 binary:

*** glibc detected *** …/http://ift.tt/1nj2A3y

double free or corruption (fasttop): 0x013cebf0.

=== Backtrace: ===

lib/libc.so.6(cfree+0x76)[…]

http://ift.tt/1LAMmhJ]


According to the previous back-trace, the call triggering the double-free corruption error sits at offset 0x40a6b0 in the setup binary file (last line). This time, though, the software is not open source, so you do not have its source code. But you have some familiarity with disassemblers and mnemonics, so you resolve to disassemble the binary using the objdump utility and look for the offset 0x40a6b0:

~ objdump -d http://ift.tt/1nj2A3y > setup.S

~ cat setup.S|grep 40a6b0

40a6b0: e8 53 90 ff ff callq 403708 <free@plt>

As clearly shown above, there is a call to the free() function at address 0x40a6b0. And, as the previous back-trace has shown you, the instruction located at this address is the one triggering the double-free or memory corruption error message. It seems pretty obvious to you that this instruction is freeing the already-freed address 0x013cebf0, a bug that belongs to a well-known class of software vulnerabilities.
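If you want to double-check what objdump printed, the five bytes can be decoded by hand. On x86-64, opcode e8 is a near call whose four remaining bytes hold a signed, little-endian displacement relative to the address of the next instruction. The short Python sketch below does that arithmetic; the function name call_rel32_target is made up for this illustration.

```python
def call_rel32_target(address, raw):
    """Return the destination of an x86 'call rel32' (opcode e8) instruction."""
    if len(raw) != 5 or raw[0] != 0xE8:
        raise ValueError("expected a 5-byte e8 call instruction")
    # The displacement is a signed 32-bit little-endian value...
    displacement = int.from_bytes(raw[1:], "little", signed=True)
    # ...added to the address of the instruction that follows the call.
    return address + len(raw) + displacement


target = call_rel32_target(0x40A6B0, bytes.fromhex("e8 53 90 ff ff"))
print(hex(target))
```

The result is 0x403708, the address objdump labels free@plt, so the raw bytes and the disassembly agree.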

As a good ICT expert, you want to corroborate this assumption by executing the program inside a gdb session. Because this program is based on the ncurses library, you have to redirect its output to another terminal, otherwise it would prove difficult to debug it properly:

~ gdb

(gdb) tty /dev/pts/12

(gdb) file http://ift.tt/1nj2A3y

(gdb) b *0x40a6b0

Breakpoint 1 at 0x40a6b0

(gdb) r

#1 0x40a6b0 in …

(gdb) stepi

0x0000000000403708 in free@plt ()

(gdb) x/8w $rsp

0x7fff1787bb00: 0x00000000 0x00000000 0x7c0ba9e0 0x00007f3b

0x7fff1787bb10: 0x013cebf0 0x00000000 0x01318140 0x00000000
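A note on reading this dump: x/8w prints 32-bit words, whereas pointers on this machine are 64 bits wide and stored little-endian, so every pointer is spread over two consecutive words with the low half first. The following Python sketch (the helper name words_to_quads is invented here) pairs the words up again.

```python
def words_to_quads(words):
    """Combine little-endian 32-bit words, in memory order, into 64-bit values."""
    if len(words) % 2:
        raise ValueError("need an even number of 32-bit words")
    # The lower-addressed word is the low half of each 64-bit value.
    return [words[i] | (words[i + 1] << 32) for i in range(0, len(words), 2)]


dump = [
    0x00000000, 0x00000000, 0x7C0BA9E0, 0x00007F3B,  # row at 0x7fff1787bb00
    0x013CEBF0, 0x00000000, 0x01318140, 0x00000000,  # row at 0x7fff1787bb10
]
quads = words_to_quads(dump)
print([hex(q) for q in quads])
```

The third 64-bit value, at address 0x7fff1787bb10, is 0x13cebf0. Asking gdb for giant words directly, with x/4gx $rsp, shows the same pairing without any manual work.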

After redirecting the program’s output to /dev/pts/12, you set a breakpoint at address 0x40a6b0 (lines 2-4), where the offending call to free() is located. Then you run the program (line 6). Once the program flow reaches that instruction, it stops. At that point, you execute just one machine instruction with the stepi command, which takes you into the free@plt stub, and examine the stack before free() itself runs and inevitably triggers the double-free error (lines 10-12). Strictly speaking, on x86-64 GNU/Linux the first argument of a function is passed in the rdi register rather than on the stack, so p/x $rdi is the most direct way to see what is about to be freed; the stack is still worth a look because the caller often keeps its own copy of the pointer there. As you have previously seen, glibc reported address 0x13cebf0 as the one being freed twice. According to the previous listing, this very same address is indeed present in the stack (last line). So far so good: your assumption has been corroborated and an obvious conclusion is at hand. There is a double-free memory issue because the call to free() at 0x40a6b0 is trying to free a pointer to address 0x13cebf0 that has already been freed. Now an obvious question arises: how can you fix it?

Well, you do not have the ATI installer’s source code, but a program can still be patched without it. You know that the buggy instruction lives inside the ELF-64 binary, so all you need to do is replace the instruction bytes e8 53 90 ff ff with something else. Your reasoning goes like this: if you do not want free() to be called at that address, which machine instruction can take its place? The first one that comes to mind, of course, is NOP (0x90). Since the call instruction is five bytes long, you have to replace it with five one-byte NOP instructions. Aided by a hexadecimal editor, you overwrite those five bytes with 0x90. Right after that, you run the program once again. This time, as expected, the installer does not crash and the drivers can be installed, at last!
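The hex-editor step can also be scripted, which makes it repeatable and lets you verify the bytes before touching them. The Python sketch below is illustrative only. The function name nop_out is made up, and the file offset 0xa6b0 rests on the assumption that the binary’s first loadable segment maps file offset 0 at virtual address 0x400000, which is common for non-relocatable x86-64 executables but should be checked with readelf -l on the real file. The demonstration runs against a scratch file, not the installer.

```python
import os
import tempfile

CALL_FREE = bytes.fromhex("e8 53 90 ff ff")  # the call found by objdump
# ASSUMPTION: virtual address 0x40a6b0 minus a load base of 0x400000.
FILE_OFFSET = 0x40A6B0 - 0x400000


def nop_out(path, file_offset, expected):
    """Overwrite `expected` at `file_offset` with NOPs, refusing on a mismatch."""
    with open(path, "r+b") as binary:
        binary.seek(file_offset)
        found = binary.read(len(expected))
        if found != expected:
            raise ValueError(f"unexpected bytes at {file_offset:#x}: {found.hex()}")
        binary.seek(file_offset)
        binary.write(b"\x90" * len(expected))  # one-byte NOP per original byte


# Demonstration on a scratch file that mimics the layout around the call.
with tempfile.NamedTemporaryFile(delete=False) as scratch:
    scratch.write(b"\x00" * FILE_OFFSET + CALL_FREE + b"\xc3")
    scratch_path = scratch.name

nop_out(scratch_path, FILE_OFFSET, CALL_FREE)
with open(scratch_path, "rb") as patched_file:
    patched = patched_file.read()
os.remove(scratch_path)

print(patched[FILE_OFFSET:FILE_OFFSET + 5].hex())
```

Checking the original bytes first is the important design choice: if the offset is wrong, the script stops instead of silently corrupting some other part of the binary. Working on a copy of the installer is equally advisable.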

Conclusions

It is commonly believed that a debugger is of no use to an ICT expert. Of course, not every ICT expert shares this opinion. We truly believe that most software-related issues can be fixed or bypassed by debugging them, and we hope this article has been engaging enough to win over the sceptics.

from Linux User & Developer – the Linux and FOSS mag for a GNU generation http://ift.tt/1nj2CZg

Raspberry Pi - My Hell & Fixes

Thursday, 3 March 2016