Programming C++, C#: August 2016

Wednesday, August 31, 2016

Bugs found in GCC with the help of PVS-Studio

I regularly check various open-source projects to demonstrate the abilities of the PVS-Studio static code analyzer (C, C++, C#). Now it is time for the GCC compiler to get checked. Unquestionably, GCC is a very qualitative and well-tested project, that's why it's already a great achievement for a tool to find any errors in it. Fortunately, PVS-Studio coped with this task. No one is immune to typos or carelessness. This is why the PVS-Studio can become an additional line of defense for you, on the front of the endless war against bugs.

GCC

GNU Compiler Collection (usually shortened to GCC) - is a set of compilers for different programming languages developed in the scope of the GNU project. GCC is free software, distributed by the free software foundation under the terms of the GNU GPL and GNU LGPL and is a key component of the GNU toolchain. The project is written in C and C++.

The GCC compiler has great built-in diagnostics, which help in the detection of many issues at the compilation stage. Of course, GCC is built with GCC and, thus, is able to find errors in its own code. Additionally, the GCC source code is checked by the Coverity analyzer. In general, I think that a lot of enthusiasts have also checked it with other analyzers and other tools. This makes it a hard task for PVS-Studio to find errors in GCC.

We used the trunk version from the git-repository: (git) commit 00a7fcca6a4657b6cf203824beda1e89f751354b svn+ssh://gcc.gnu.org/svn/gcc/trunk@238976

Note. The article publication is slightly late, and perhaps some bugs are already fixed. However, it's not a big deal: new errors constantly get in the code and the old ones disappear. The main thing is that the article shows that static analysis can help programmers to detect errors after they get into the code.

Foreseeing a discussion

As I said in the beginning, I consider GCC project to be of high quality. I am sure a lot of people will want to argue with that. As an example, I'll give a quote from Wikipedia in Russian (translated):

Some OpenBSD developers, Theo de Raadt and Otto Moerbeek criticiz GCC, saying that "gcc gets about 5-6% slower every release, has new bugs, generates crappy code, and drives us nuts".

To my mind, these statements are unjustified. Yes, perhaps, GCC code has too many macros which make reading it a bit difficult. But I can't agree with the statement about it being buggy. If GCC were buggy, nothing would work at all. Just think about the amount of programs that get successfully compiled by it and thus, work well. The creators of GCC do a great, complicated job, with professionalism. We should really thank them. I'm glad I can test the work of PVS-Studio on such a high-quality project.

For those who say that the code of Clang is still much better, I can remind you: PVS-Studio also found bugs in it: 1, 2.

PVS-Studio

I have checked the GCC code with the help of the alpha-version of PVS-Studio for Linux. We are planning to give the beta-version of the analyzer in the middle of September 2016 to those programmers who will find it useful. You may find the instruction of how to become the first person to try the Beta-version of PVS-Studio for Linux on your project in the article "PVS-Studio confesses its love for Linux"

If you are reading this article much later than September 2016, and want to try PVS-Studio for Linux, then I suggest visiting the product page: http://www.viva64.com/en/pvs-studio/

Analysis results

We have got to the most interesting part of the article, which our regular readers are looking forward to. Let's have a look at those code fragments where the analyzer found bugs or really suspicious code.

Unfortunately, I can't give the developers the full analysis report. There is too much garbage (false alarms) at this point, because of the fact that the analyzer isn't ready to face the Linux world yet. We still have a lot of work to do regarding the reduction of the number of false positives for typical constructions. I'll try to explain using a simple example. Many diagnostics should not issue warnings for the expressions related to the assert macros. These macros are sometimes written very creatively, so we should teach the analyzer not to pay attention to them. The thing is that the assert macro can be defined in many various ways, thus we should teach PVS-Studio all the typical variants.

That's why I am asking GCC developers to wait until the Beta-version is released. I wouldn't like to spoil the impression by a report, generated by a half-finished version.

Classics (Copy-Paste)

We'll start with the most common and classical error that is detected by V501 diagnostic. Typically, these errors appear because of carelessness when Copy-Pasting the code, or are just typos during the creation of new code.

static bool
dw_val_equal_p (dw_val_node *a, dw_val_node *b)
{
  ....
  case dw_val_class_vms_delta:
    return (!strcmp (a->v.val_vms_delta.lbl1,
                     b->v.val_vms_delta.lbl1)
            && !strcmp (a->v.val_vms_delta.lbl1,
                        b->v.val_vms_delta.lbl1));
  ....
}

PVS-Studio warning: V501 There are identical sub-expressions '!strcmp(a->v.val_vms_delta.lbl1, b->v.val_vms_delta.lbl1)' to the left and to the right of the '&&' operator. dwarf2out.c 1428

It's hard to see the errors right away, we should take a closer look here. This is why the error was not detected during code reviews and refactoring.

The function strcmp compares the same strings twice. It seems to me that we should have compared not the members of lbl1 class, but of lbl2. Then the correct code could look like this:

return (!strcmp (a->v.val_vms_delta.lbl1,
                 b->v.val_vms_delta.lbl1)
        && !strcmp (a->v.val_vms_delta.lbl2,
                    b->v.val_vms_delta.lbl2));

It should be noted that the code, provided in the article, is slightly aligned, so that it doesn't take too much space on the x-axis. In fact, the code looks like this:

This error could be avoided by using "table" code alignment. For example, an error would be easier to notice if you format the code like this:

I have spoken about this approach in details in the e-book "The Ultimate Question of Programming, Refactoring, and Everything" (see chapter N13: Table-style formatting"). I recommend that everyone who cares about code quality have a look at this book.

Let's look at one more mistake, which I'm sure, appeared because of Copy-Paste:

const char *host_detect_local_cpu (int argc, const char **argv)
{
  unsigned int has_avx512vl = 0;
  unsigned int has_avx512ifma = 0;
  ....
  has_avx512dq = ebx & bit_AVX512DQ;
  has_avx512bw = ebx & bit_AVX512BW;
  has_avx512vl = ebx & bit_AVX512VL;       // <=
  has_avx512vl = ebx & bit_AVX512IFMA;     // <=
  ....
}

PVS-Studio warning: V519 The 'has_avx512vl' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 500, 501. driver-i386.c 501

Different values are written to the variable has_avx512vl twice in a row. It does not make sense. I have reviewed the code and found a variable has_avx512ifma. Most likely, it should be initialized by the expression ebx & bit_AVX512IFMA. Then the correct code should be as follows:

has_avx512vl   = ebx & bit_AVX512VL;    
has_avx512ifma = ebx & bit_AVX512IFMA;

A typo

I'll continue testing your attentiveness. Look at the code and try to find the error, without looking at the warning below.

static bool
ubsan_use_new_style_p (location_t loc)
{
  if (loc == UNKNOWN_LOCATION)
    return false;

  expanded_location xloc = expand_location (loc);
  if (xloc.file == NULL || strncmp (xloc.file, "\1", 2) == 0
      || xloc.file == '\0' || xloc.file[0] == '\xff'
      || xloc.file[1] == '\xff')
    return false;

  return true;
}

PVS-Studio warning: V528 It is odd that pointer to 'char' type is compared with the '\0' value. Probably meant: *xloc.file == '\0'. ubsan.c 1472

The programmer accidentally forgot to dereference the pointer in the expression xloc.file == '\0'. As a result the pointer is just compared with 0, i.e. with NULL. It doesn't have any effect, because such a check was already done earlier: xloc.file == NULL.

The good thing is that the programmer wrote the terminal null as '\0'. This helps us to understand quicker, that there is a bug here and how it should be fixed. I have also written about this in the book(see chapter N9: Use the '\0' literal for the terminal null character).

Correct variant of the code:

if (xloc.file == NULL || strncmp (xloc.file, "\1", 2) == 0
    || xloc.file[0] == '\0' || xloc.file[0] == '\xff'
    || xloc.file[1] == '\xff')
  return false;

Although, let's improve the code even more. I recommend formatting the expression like this:

if (   xloc.file == NULL
    || strncmp (xloc.file, "\1", 2) == 0
    || xloc.file[0] == '\0'
    || xloc.file[0] == '\xff'
    || xloc.file[1] == '\xff')
  return false;

Pay attention: now if you make the same mistake, the chance of noticing it is slightly higher:

if (   xloc.file == NULL
    || strncmp (xloc.file, "\1", 2) == 0
    || xloc.file == '\0'
    || xloc.file[0] == '\xff'
    || xloc.file[1] == '\xff')
  return false;

Potential null pointer dereference

This part could also be called "Example number one thousand, why macros are bad". I really don't like macros and always urge people to avoid using them if possible. Macros make it difficult to read the code, provoke errors, and make the work of static analyzers harder. As best I can tell, from a brief interaction with the GCC code, the authors are big fans of macros. I was really tired looking at what the macros are expanded to, and perhaps missed quite a number of interesting errors. I should confess that I was lazy at times. But still, I will demonstrate a couple of errors, connected with macros.

odr_type
get_odr_type (tree type, bool insert)
{
  ....
  odr_types[val->id] = 0;
  gcc_assert (val->derived_types.length() == 0);
  if (odr_types_ptr)
    val->id = odr_types.length ();
  ....
}

PVS-Studio warning: V595 The 'odr_types_ptr' pointer was utilized before it was verified against nullptr. Check lines: 2135, 2139. ipa-devirt.c 2135

Do you see an error here? I guess not, and the analyzer warning isn't of much help. The matter of the fact is that odr_types isn't a name of a variable, but a macro that is declared in the following way:

#define odr_types (*odr_types_ptr)

If we expand the macro and remove everything that isn't really related to the code, we'll get the following:

(*odr_types_ptr)[val->id] = 0;
if (odr_types_ptr)

First, the pointer is dereferenced and then checked. It's hard to say whether this will lead to trouble or not. It all depends on the situation, whether the pointer will really be equal to nullptr. If this situation is impossible, then this redundant check should be removed, otherwise it will mislead people supporting the code, and the code analyzer as well. If a pointer can be null, then it is a serious mistake that requires even more attention and should be fixed.

Let's consider one more similar case:

static inline bool
sd_iterator_cond (sd_iterator_def *it_ptr, dep_t *dep_ptr)
{
  ....
  it_ptr->linkp = &DEPS_LIST_FIRST (list);
  if (list)
    continue;
  ....
}

PVS-Studio warning: V595 The 'list' pointer was utilized before it was verified against nullptr. Check lines: 1627, 1629. sched-int.h 1627

We should show the macro again to see the error:

#define DEPS_LIST_FIRST(L) ((L)->first)

Let's expand the macro and get:

it_ptr->linkp = &((list)->first);
if (list)
  continue;

Some of you may say: "Hey, wait! There is no error here. We just get a pointer to the class member. There is no null pointer dereference. Yes, perhaps the code isn't really accurate, but there is no error!"

Yet, it's not as simple as it may seem. We have undefined behavior here. It's just pure luck that such code works. Actually, we cannot write like this. For example, the optimizing compiler can remove the check if (list), if it sees list->first. If we execute the -> operator, then it is assumed that the pointer isn't equal to nullptr. If it so, then we shouldn't check the pointer.

I have written a whole article on this topic: "Null Pointer Dereferencing Causes Undefined Behavior" I am examining a similar case there. Before starting any arguments, please carefully read this article.

However, this situation is really complicated, and is not really obvious. I can assume that I may be wrong and there is no error here. However, until now no one could prove it to me. It will be interesting to see the comments of GCC developers, if they read this article. They should know how the compiler works, and if this code should be interpreted as erroneous or not.

Using a destroyed array

static void
dump_hsa_symbol (FILE *f, hsa_symbol *symbol)
{
  const char *name;
  if (symbol->m_name)
    name = symbol->m_name;
  else
  {
    char buf[64];
    sprintf (buf, "__%s_%i", hsa_seg_name (symbol->m_segment),
       symbol->m_name_number);
     name = buf;
  }
  fprintf (f, "align(%u) %s_%s %s",
           hsa_byte_alignment (symbol->m_align),
           hsa_seg_name(symbol->m_segment),
           hsa_type_name(symbol->m_type & ~BRIG_TYPE_ARRAY_MASK),
           name);
  ....
}

PVS-Studio warning: V507 Pointer to local array 'buf' is stored outside the scope of this array. Such a pointer will become invalid. hsa-dump.c 704

The string is formed in the temporary buffer buf. The address of this temporary buffer is stored in the variable name, and is used further on in the body of the function. The error is that after the buffer is written to the variable name, the buffer itself will be destroyed.

We cannot use a pointer to a destroyed buffer. Formally, we are dealing with undefined behavior. In practice, this code may work quite successfully. Correct working of the program is one of the ways in which undefined behavior shows itself.

In any case, this code has an error and it must be fixed. The code can work due to the fact that the compiler may think it is unnecessary to use a temporary buffer for storing other variables and arrays further on. Then, although the array, created on the stack is considered to be destroyed, it will not be used, and the function will work correctly. But this luck can end any time, and the code that used to work for 10 years may suddenly start acting weirdly when upgrading to a new version of the compiler.

To fix this error, we should declare the buf array in the same scope as the name pointer:

static void
dump_hsa_symbol (FILE *f, hsa_symbol *symbol)
{
  const char *name;
  char buf[64];
  ....
}

Execution of similar actions regardless of the condition

The analyzer detected a code fragment that I cannot call erroneous with 100% certainty. However, it is really suspicious to do the check and then, regardless of the result, perform the same actions. Of course, it may work correctly, but this code fragment is worth revising for sure.

bool
thread_through_all_blocks (bool may_peel_loop_headers)
{
  ....
  /* Case 1, threading from outside to inside the loop
     after we'd already threaded through the header.  */
  if ((*path)[0]->e->dest->loop_father
      != path->last ()->e->src->loop_father)
  {
    delete_jump_thread_path (path);
    e->aux = NULL;
    ei_next (&ei);
  }
  else
  {
    delete_jump_thread_path (path);
    e->aux = NULL;
    ei_next (&ei);
  }
  ....
}

PVS-Studio warning: V523 The 'then' statement is equivalent to the 'else' statement. tree-ssa-threadupdate.c 2596

If this code has a bug, it's hard to say how to fix it. This is a case where it is necessary to be familiar with the project, in order to fix it.

Redundant expression of the (A == 1 || A != 2) kind

static const char *
alter_output_for_subst_insn (rtx insn, int alt)
{
  const char *insn_out, *sp ;
  char *old_out, *new_out, *cp;
  int i, j, new_len;

  insn_out = XTMPL (insn, 3);

  if (alt < 2 || *insn_out == '*' || *insn_out != '@')
    return insn_out;
  ....
}

PVS-Studio warning: V590 Consider inspecting this expression. The expression is excessive or contains a misprint. gensupport.c 1640

We are interested in the condition: (alt < 2 || *insn_out == '*' || *insn_out != '@').

It can be shortened to: (alt < 2 || *insn_out != '@').

I would venture to guess that the operator != should be replaced with ==. Then the code will make more sense:

if (alt < 2 || *insn_out == '*' || *insn_out == '@')

Zeroing a wrong pointer

Let us consider a function that frees the resources:

void
free_original_copy_tables (void)
{
  gcc_assert (original_copy_bb_pool);
  delete bb_copy;
  bb_copy = NULL;
  delete bb_original;
  bb_copy = NULL;
  delete loop_copy;
  loop_copy = NULL;
  delete original_copy_bb_pool;
  original_copy_bb_pool = NULL;
}

PVS-Studio warning: V519 The 'bb_copy' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 1076, 1078. cfg.c 1078

Take a look at these 4 lines of code:

delete bb_copy;
bb_copy = NULL;
delete bb_original;
bb_copy = NULL;

Accidentally, the pointer bb_copy is zeroed twice. Here is the correct version:

delete bb_copy;
bb_copy = NULL;
delete bb_original;
bb_original = NULL;

Assert that does not check anything

Invalid condition, being an argument of the macro gcc_assert, won't affect how correctly the program works, but will make the bug search more complicated, if there is such. Let us consider the code:

static void
output_loc_operands (dw_loc_descr_ref loc, int for_eh_or_skip)
{
  unsigned long die_offset
    = get_ref_die_offset (val1->v.val_die_ref.die);
  ....
  gcc_assert (die_offset > 0
        && die_offset <= (loc->dw_loc_opc == DW_OP_call2)
             ? 0xffff
             : 0xffffffff);
  ....
}

PVS-Studio warning: V502 Perhaps the '?:' operator works in a different way than it was expected. The '?:' operator has a lower priority than the '<=' operator. dwarf2out.c 2053

The priority of the ternary operator ?: is lower than of the comparison operator <=. This means that we are dealing with a condition like this:

(die_offset > 0 &&
  (die_offset <= (loc->dw_loc_opc == DW_OP_call2)) ?
    0xffff : 0xffffffff);

Thus, the second operand of the && operator can take the value 0xffff or 0xffffffff. Both of these values are true, so this expression can be simplified to:

(die_offset > 0)

This is clearly not what the programmer intended to get. To fix this, you should add a pair of parentheses:

gcc_assert (die_offset > 0
      && die_offset <= ((loc->dw_loc_opc == DW_OP_call2)
           ? 0xffff
           : 0xffffffff));

The ?: operator is very treacherous, and it's better not to use it in complex expressions. It is very easy to make a mistake. We have collected a large number of examples of such errors, which were found by PVS-Studio in various open source projects. I have also written in details about the ?: operator in thebook that I've mentioned earlier (see chapter N4: Beware of the ?: operator and enclose it in parentheses).

Forgotten "cost"

The structure alg_hash_entry is declared in the following way:

struct alg_hash_entry {
  unsigned HOST_WIDE_INT t;
  machine_mode mode;
  enum alg_code alg;
  struct mult_cost cost;
  bool speed;
};

The programmer decided to check if in the synth_mult function there is an object that is needed. To do this he needed to compare the structure fields. However, it seems like there is an error here:

static void synth_mult (....)
{
  ....
  struct alg_hash_entry *entry_ptr;
  ....
  if (entry_ptr->t == t
      && entry_ptr->mode == mode
      && entry_ptr->mode == mode
      && entry_ptr->speed == speed
      && entry_ptr->alg != alg_unknown)
  {
  ....
}

PVS-Studio warning: V501 There are identical sub-expressions 'entry_ptr->mode == mode' to the left and to the right of the '&&' operator. expmed.c 2573

Mode is checked twice, but cost isn't checked in any way. Perhaps, one of these comparisons should be removed, but there is a chance, that we should compare cost. It's difficult for me to say, but the code should be fixed.

Duplicated assignments

To my mind, the following code fragments don't pose any hazard to the life of the program, and it seems that the duplicated assignment can just be removed.

Fragment N1

type_p
find_structure (const char *name, enum typekind kind)
{
  ....
  structures = s;                   // <=
  s->kind = kind;
  s->u.s.tag = name;
  structures = s;                   // <=
  return s;
}

PVS-Studio warning: V519 The 'structures' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 842, 845. gengtype.c 845

Fragment N2

static rtx
ix86_expand_sse_pcmpistr (....)
{
  unsigned int i, nargs;
  ....
    case V8DI_FTYPE_V8DI_V8DI_V8DI_INT_UQI:
    case V16SI_FTYPE_V16SI_V16SI_V16SI_INT_UHI:
    case V2DF_FTYPE_V2DF_V2DF_V2DI_INT_UQI:
    case V4SF_FTYPE_V4SF_V4SF_V4SI_INT_UQI:
    case V8SF_FTYPE_V8SF_V8SF_V8SI_INT_UQI:
    case V8SI_FTYPE_V8SI_V8SI_V8SI_INT_UQI:
    case V4DF_FTYPE_V4DF_V4DF_V4DI_INT_UQI:
    case V4DI_FTYPE_V4DI_V4DI_V4DI_INT_UQI:
    case V4SI_FTYPE_V4SI_V4SI_V4SI_INT_UQI:
    case V2DI_FTYPE_V2DI_V2DI_V2DI_INT_UQI:
      nargs = 5;         // <=
      nargs = 5;         // <=
      mask_pos = 1;
      nargs_constant = 1;
      break;
  ....
}

PVS-Studio warning: V519 The 'nargs' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 39951, 39952. i386.c 39952

Fragment N3

The latter fragment seems to be stranger than the other ones. Perhaps there is some mistake here. The variable steptype is assigned with a value 2 or 3 times. It is very suspicious.

static void
cand_value_at (....)
{
  aff_tree step, delta, nit;
  struct iv *iv = cand->iv;
  tree type = TREE_TYPE (iv->base);
  tree steptype = type;                 // <=
  if (POINTER_TYPE_P (type))
    steptype = sizetype;                // <=
  steptype = unsigned_type_for (type);  // <=
  ....
}

PVS-Studio warning: V519 The 'steptype' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 5173, 5174. tree-ssa-loop-ivopts.c 5174

Conclusion

I'm glad I could write this article. Now I will have something to say in response to the comments like "PVS-Studio isn't necessary, because GCC issues the same warnings". As you can see, PVS-Studio is a very powerful tool and excels GCC in the diagnostic capabilities. I do not deny that the GCC has excellent diagnostics. This compiler, if properly set up, really brings out a lot of problems in the code. But PVS-Studio is a specialized and rapidly developing tool, which means that it will also be better at detecting errors in the code than the compilers.

I suggest taking a look at the analysis of other open-source projects, and visiting this section of our web-site. Also, those who use Twitter can follow me @Code_Analysis. I regularly post links to interesting articles about programming on C and C++, and also talk about the achievements of our analyzer.

An Ideal Way to Integrate a Static Code Analyzer into a Project

One of the most difficult things about using static analysis tools is managing false positives. There are a number of ways to eliminate them using the analyzer's settings or changing the code itself. I took a small project Apple II emulator for Windows as an example to show you how you can handle PVS-Studio's analysis report, and demonstrate by a number of examples how to fix errors and suppress false positives.

Introduction

I will be describing an ideal process of integrating the static analysis methodology into one's software project. The aim of this process is to eliminate all the false positives and genuine errors so that the analyzer finally generates 0 warnings. It is exactly this approach we were sticking to when working on the Unreal Engine 4 project.

In practice, you can seldom achieve an ideal workflow, though. That's why, when working on a large-scale project, it would be more sensible to use an alternative approach: you can hide all the current warnings and set the analyzer to display only those triggered by freshly written or modified code. For this purpose, the PVS-Studio analyzer provides a special mechanism storing the information related to warnings in a special base. To learn more, see the article Integrating Static Analysis into a Project with over 10 Mbytes of Source Code.

Now that you have all the diagnostic messages hidden, you can focus on the quality of code being written. While catching and fixing bugs in new code, you will quickly get to value the might and usefulness of the static analysis methodology. And when you have spare time for that, you can get back to fixing old, hidden, warnings, thus gradually implementing all the necessary edits in the project.

But let's get back to our ideal happy world. Imagine we can afford taking our time to comfortably work with the warnings output by the PVS-Studio analyzer.

In this article, I'm going to show you how to manage the analyzer's warnings, leading you through the entire process - from the first analysis to the moment when we get 0 messages in the message output window.

This is the reason why I picked a small project. I could choose a larger one, but then it'd be too tiresome for me to write the article and for you to read it. Well, it's going to be tiresome anyway. Even with a small project, the article would inevitably turn out large, but please read it carefully. It may help you use our code analyzer with more efficiency.

Our today's laboratory rat is the Apple II emulator for Windows project. The choice was absolutely random, so we won't discuss it. I didn't really care what project we'd take; the only requirement was that it should be small but at the same time buggy enough for us to find some interesting examples.

The project characteristics are the following:

Source code size: 3 Mbytes.
Number of code lines: 85700.
Analysis time (on 8 processor cores): 30 seconds.

The first launch

After the first launch of the analyzer, we've got the following diagnostic messages:

Figure 1. Diagnostic messages output at the first launch of the PVS-Studio analyzer on the Apple II emulator for Windows project.

In this article, I will only discuss warnings of the 1-st and 2-nd severity levels from the general analysis (GA) rule set. We could manage the 3-rd level as well, but the article would be just too huge then. So I'll only give a brief overview of Level 3 diagnostics but won't fix anything there.

Microoptimizations (OP) are of no interest for us at this point.

As for 64-bit diagnostics, there is no 64-bit configuration of this project, so they are not relevant either.

Having checked the project, I sorted all the warnings by their codes. You can do it by clicking on the "Code" column (see Figure 2).

Figure 2. PVS-Studio message window. The messages are sorted by the diagnostic number.

Message sorting by code makes it easier to work with the warnings: you have similar messages arranged in groups, so having figured out the reasons behind one message, you will find it easier to deal with the others in the same group.

Note. Some readers may wonder why we haven't enabled this type of message sorting by default. You see, we want to let our users see messages as they appear in the list while the analysis is still running. If we got them sorted right away, new messages would appear in different random places of the list instead of its end. That would result in the messages "jumping" all around and you not being able to work comfortably with such a "jerking" list.

Managing analyzer messages

The solution consists of three projects (you can see them in the Solution Explorer window in Figure 2). Two of these - zlib and zip_lib - are of no interest to us, so we need to exclude them from analysis. You can actually exclude only zip_lib as zlib is by default added into the exceptions list. Excluding certain files from analysis is done in PVS-Studio's settings window (section Don't Check Files):

Figure 3. The zip_lib project excluded from analysis.

I excluded the irrelevant project in advance, but you can easily do the same after the analysis. Moreover, you don't need to open the settings window to do this. Just call the drop-down menu and click on the corresponding command to quickly hide all the messages related to a certain file or folder. That's very convenient indeed. I recommend studying the article "PVS-Studio for Visual C++": it describes this and many other features that will allow you to efficiently and comfortably use the tool.

Now we have everything set up for working on the messages. We'll start with the instances of the V501 diagnostic and go on down the list. In total we'll discuss 32+49 = 81 messages. It's quite a lot, so we'll discuss some of them in detail and only briefly touch upon others.

A false positive in xxxxxREG macros

The first 6 messages are triggered by complex macros ADDXXREG, ADCHLREG, SBCHLREG, SBCHLREG. When they are expanded, excessive constructs appear which make the analyzer generate messages like this:

V501 There are identical sub-expressions to the left and to the right of the '^' operator: (tmp >> 8) ^ reg_ixh ^ reg_ixh z80.cpp 3444

The ADDXXREG macro is pretty large and consists of other macros, so I won't cite it here.

What matters to us is the fact that the XOR operation is executed over the reg_ixh variable twice. Therefore, the expression can be reduced to (tmp >> 8). However, there is actually no bug here; it's just an excessive expression when substituting certain macro arguments:

ADDXXREG(reg_ixh, reg_ixl, reg_ixh, reg_ixl, 15, 2);

These are false positives and we need to eliminate them. I suggest suppressing all the warnings associated with them. To do this, I added the following comments in the header file containing the definitions of these macros:

//-V:ADDXXREG:501
//-V:ADCHLREG:501
//-V:SBCHLREG:501
//-V:SBCHLREG:501

To learn more about this message suppression mechanism, see the corresponding documentation section.

We could actually do with just one comment. Since all the macros' names contain the letter sequence "REG", we can add only one comment //-V:REG:501 to suppress all the V501 warnings in any lines containing the "REG" sequence. But it's not a very good idea because you risk accidentally hiding a useful message that has nothing to do with those macros. A little better way is to add a parenthesis for the search mechanism: //-V:REG(:501. But as for this particular case, I believe we should overcome our laziness and insert the 4 comments as suggested at first.

An error in the sprint() function's parameters

sprintf( sText, "%s %s = %s\n"
  , g_aTokens[ TOKEN_COMMENT_EOL  ].sToken
  , g_aParameters[ PARAM_CATEGORY ].m_sName
  , g_aParameters[ eCategory ]
  );

The analyzer's diagnostic message: V510 The 'sprintf' function is not expected to receive class-type variable as fifth actual argument. debug.cpp 2300

Indeed, the fifth actual argument of the function is represented by a structure of the Command_t type. I suspect that what should be used instead is the following: g_aParameters[eCategory].m_sName. I've fixed the code accordingly.

Smelling ZeroMemory()

The next message tells us about an incompletely filled array: V512 A call of the 'memset' function will lead to underflow of the buffer 'pHDD->hd_buf'. harddisk.cpp 491

BYTE  hd_buf[HD_BLOCK_SIZE+1]; // Why +1?
ZeroMemory(pHDD->hd_buf, HD_BLOCK_SIZE);

The last byte can't be cleared. I'm not sure if this is an error or not. Note the comment: it seems like even the developers themselves don't know for sure what size the array should be and if it should be zeroed entirely.

Code like that is called "smelling". It doesn't necessarily contain a bug but it does look strange and suspicious and may cause some troubles later.

I will simply suppress this warning by a comment. You can fix the code yourself or use the drop-down menu command "Mark selected messages as False Alarms":

Figure 3. Inserting comments in the code to suppress diagnostic messages.

Selecting this command will make the analyzer automatically insert the comment:

ZeroMemory(pHDD->hd_buf, HD_BLOCK_SIZE); //-V512

A false positive when calling the memcpy() function

unsigned char random[ 256 + 4 ];
memcpy( &memmain[ iByte ], random, 256 );

The memcpy() function copies only part of the 'random' buffer. The analyzer doesn't like it and honestly warns us about it. In this particular case, the analyzer is wrong - there is no error. I have suppressed the warning by a comment like in the previous case. It doesn't look neat but I'm not sure if I can do a better thing in code that isn't mine.

Unnecessary operations

nAddress_ = 0;
nAddress_ = (unsigned)*(LPBYTE)(mem + nStack);
nStack++;
nAddress_ += ((unsigned)*(LPBYTE)(mem + nStack)) << 8;

The analyzer's diagnostic message: V519 The 'nAddress_' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 568, 569. debugger_assembler.cpp 569

The analyzer has noticed that the nAddress_ variable is assigned different values several times on end. There is no error here, it's just excessive code. I've removed the first line where the variable is assigned 0. Another way to get rid of the warning is to replace the second assignment with "+=".

A similar issue can be found in two other files:

File video.cpp (see lines 3310 and 3315). I've removed the unnecessary operation "pSrc += nLen;".

File Debug.cpp (see lines 5867 and 5868). I've replaced the following code:

char *p = sLine;
p = strstr( sLine, ":" );

with

char *p = strstr( sLine, ":" );

There's no need to speak in more detail about these fragments.

Error in the switch operator

The next diagnostic, V519, points at a really serious error. Although it is a classic one and everyone knows of it, programmers still tend to make it in every kind of program.

switch( c )
{
  case '\\':
    eThis = PS_ESCAPE;
  case '%':
    eThis = PS_TYPE;
    break;
  default:
    sText[ nLen++ ] = c;
    break;
}

The analyzer's diagnostic message: V519 The 'p' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 5867, 5868. debug.cpp 5868

The 'break' operator is missing after "eThis = PS_ESCAPE;". Because of that, the value of the 'eThis' variable will immediately change to PS_STYPE. And this is definitely an error. To fix it, I've added the 'break' operator.

Always false condition

inline static ULONG ConvertZ80TStatesTo6502Cycles(UINT uTStates)
{
  return (uTStates < 0) ?
      0 : (ULONG) ((double)uTStates / uZ80ClockMultiplier);
}

The analyzer's diagnostic message: V547 Expression 'uTStates < 0' is always false. Unsigned type value is never < 0. z80.cpp 5507

The programmer wanted to protect the code from the issue when a negative value is passed into the function. The protection won't work, however, because the 'uTStates' variable is unsigned.

I've added an explicit conversion to the 'INT' type:

return ((INT)uTStates < 0) ?
    0 : (ULONG) ((double)uTStates / uZ80ClockMultiplier);

The analyzer being too suspicious

In the next function, the analyzer worries about a possible array overrun.

void SetCurrentImageDir(const char* pszImageDir)
{
  strcpy(g_sCurrentDir, pszImageDir);
  int nLen = strlen( g_sCurrentDir );
  if( g_sCurrentDir[ nLen - 1 ] != '\\' )
  ....
}

The analyzer's diagnostic message: V557 Array underrun is possible. The value of 'nLen - 1' index could reach -1. applewin.cpp 553

If you pass an empty string into the function, its length will become zero and an array overrun will occur: g_sCurrentDir[ 0 - 1 ].

The analyzer doesn't know if this situation is possible or not, so it warns you just in case.

I don't know that either. If it is possible, then the analyzer has found a real bug; if not, then it's just a false positive.

I decided to treat it as the latter. But instead of adding a comment to suppress the warning, it will be better to fix the code itself. So I made an additional check in the function:

if (nLen == 0)
  return;

There is another fragment with a potential array overrun, but I must take care not to turn this article into a reference book. So I won't discuss this second fragment which I have simply suppressed by a comment. See the same file, line 556.

Assignment instead of comparison

if ((bytenum == 3) && (byteval[1] = 0xAA))
{

The analyzer's diagnostic message: V560 A part of conditional expression is always true: (byteval[1] = 0xAA). diskimagehelper.cpp 439

I'm sure the programmer actually wanted the '==' operation, not '='. If it were an assignment, they'd make it in a much more natural and sensible way:

if (bytenum == 3)
{
  byteval[1] = 0xAA;

So this is an error and it must be fixed:

if ((bytenum == 3) && (byteval[1] == 0xAA))

False positives caused by macros

if ((TRACKS_MAX>TRACKS_STANDARD) && ....)

The analyzer's diagnostic message: V560 A part of conditional expression is always true: ((35 + 5) > 35). diskimagehelper.cpp 548

If we expand the macros, we'll get the expression ((35 + 5) > 35). It is always true, but that's not an error.

This is the case when I'm not sure at all what I'd better do about the code. OK, I won't bother too much and will simply suppress the false positive by a comment: //-V560.

An unnecessary variable

During code refactoring, some variables may get "lost". They are used in the code somehow but you don't need them actually. This is, I guess, what happened to the bForeground variable:

BOOL    bForeground;
....
bForeground = FALSE;
....
if( bForeground )
  dwCoopFlags |= DISCL_FOREGROUND;
else
  dwCoopFlags |= DISCL_BACKGROUND;
....
if( hr == DIERR_UNSUPPORTED && !bForeground && bExclusive )

The 'bForeground' variable is not changed or used anywhere, anymore. And it makes the analyzer generate the warning: V560 A part of conditional expression is always true: !bForeground. mouseinterface.cpp 690

This example is interesting from the philosophical viewpoint. Is this message false or not? Even a human can't answer for sure. The analyzer is right as it has detected an anomaly; but from the human viewpoint, this fragment may as well be just unfinished code and then everything is OK.

As for us, let's treat it as another example of "smelling code". I've deleted the 'bForeground' variable.

Undefined behavior

*(mem+addr++) =
  (opcode >= BENCHOPCODES) ? 0x00 : ((addr >> 4)+1) << 4;

The analyzer's diagnostic message: V567 Undefined behavior. The 'addr' variable is modified while being used twice between sequence points. cpu.cpp 564

You don't know how exactly the expression will be calculated:

Perhaps the 'addr' variable will be incremented first and then used in the right part of the expression.
Or maybe just the other way round.

The correct code should look as follows:

*(mem+addr) =
  (opcode >= BENCHOPCODES) ? 0x00 : ((addr >> 4)+1) << 4;
addr++;

Incorrect arguments when calling the wsprintf() and similar functions

There are a few errors related to the issue when an incorrect number of actual arguments is passed into formatted output functions. In total there were 10 errors of this kind but we'll discuss only one of them:

wsprintf( sText, TEXT("%s full speed Break on Opcode: None")
  , sAction
  , g_iDebugBreakOnOpcode
  , g_aOpcodes65C02[ g_iDebugBreakOnOpcode ].sMnemonic
);

The analyzer's diagnostic message: V576 Incorrect format. A different number of actual arguments is expected while calling 'wsprintfA' function. Expected: 3. Present: 5. debug.cpp 939

When forming the string, the two last parameters are not taken into account. As an outside observer, I can't say for sure if these parameters are excess or the error is in the format string.

I accepted the first version and removed the parameters.

Similar issues can be found in the following code fragments:

Expected: 8. Present: 9. debug.cpp 7377
Expected: 3. Present: 4. debugger_help.cpp 1263
Expected: 3. Present: 4. debugger_help.cpp 1265
Expected: 3. Present: 4. debugger_help.cpp 1267
Expected: 3. Present: 4. debugger_help.cpp 1282
Expected: 3. Present: 4. debugger_help.cpp 1286
Expected: 3. Present: 4. debugger_help.cpp 1288
Expected: 5. Present: 4. debugger_help.cpp 1332
Expected: 3. Present: 4. frame.cpp 691
Expected: 3. Present: 4. frame.cpp 695

There are a couple of other fragments where "%08X" is used to print the pointer values. On the 32-bit system, it works well; but on the 64-bit one, the pointer will be printed only partially. The correct way is to use "%p". The following are the code fragments where other similar defects were found:

To print the value of pointer the '%p' should be used. tfe.cpp 507
To print the value of pointer the '%p' should be used. tfe.cpp 507

False positives in double comparisons

Though it's not its fault, the analyzer generated two false messages for repeating conditions. Let's discuss one of them:

if (nAddress <= _6502_STACK_END)
{
  sprintf( sText,"%04X: ", nAddress );
  PrintTextCursorX( sText, rect );
}

if (nAddress <= _6502_STACK_END)
{
  DebuggerSetColorFG( DebuggerGetColor( FG_INFO_OPCODE ));
  sprintf(sText, "  %02X",(unsigned)*(LPBYTE)(mem+nAddress));
  PrintTextCursorX( sText, rect );
}

The analyzer's diagnostic message: V581 The conditional expressions of the 'if' operators situated alongside each other are identical. Check lines: 2929, 2935. debugger_display.cpp 2935

There is no error; the programmer just divided the actions into two separate groups. From the analyzer's viewpoint, this code is strange: what if the conditions should be different? Anyway, we need to do something about the false positive. I decided to unite two conditional operators into one:

if (nAddress <= _6502_STACK_END)
{
  sprintf( sText,"%04X: ", nAddress );
  PrintTextCursorX( sText, rect );

  DebuggerSetColorFG( DebuggerGetColor( FG_INFO_OPCODE ));
  sprintf(sText, "  %02X",(unsigned)*(LPBYTE)(mem+nAddress));
  PrintTextCursorX( sText, rect );
}

I don't think the code has become less comprehensible because of this, but we have certainly got rid of the false positive.

The second message deals with a similar issue: V581 The conditional expressions of the 'if' operators situated alongside each other are identical. Check lines: 2237, 2245. debugger_display.cpp 2245

Figure 5. They advise showing some picture in the middle of a long article so that the readers could rest a bit. I'm not sure about what picture to add that would suit the subject of this article, so here you are this nice cat.

Dereferencing a pointer before checking it

In total the analyzer output 3 warnings related to this type of issues. Unfortunately, the code is pretty complicated in those fragments, so instead of the real code I will cite shorter and simpler pseudocode. For the first 2 warnings, it looks something like this:

int ZEXPORT unzGetGlobalComment(char *szComment)
{
  ....
  if (A)
  {
    *szComment='\0';
     return UNZ_ERRNO;
  }
  ....
  if ((szComment != NULL) && X)
  ....
}

The analyzer's diagnostic message: V595 The 'szComment' pointer was utilized before it was verified against nullptr. Check lines: 1553, 1558. unzip.c 1553

As you can see, the passed pointer 'szComment' can equal NULL - it is indicated by the (szComment != NULL) check.

However, there is a code fragment where the programmer bravely dereferences the pointer without checking it. That's dangerous. Perhaps 'szComment' can never become equal to 0 in practice, but the code is still dangerous and must be fixed.

Another similar issue: V595 The 'pToken_' pointer was utilized before it was verified against nullptr. Check lines: 811, 823. debugger_parser.cpp 811

And as for the last, third, case, it's a bit more complicated. I'm sick and tired of explaining to everyone that code like that is incorrect and must be fixed. The function is short, so here it is in full:

bool ArgsGetValue ( Arg_t *pArg,
                    WORD * pAddressValue_, const int nBase )
{
  TCHAR *pSrc = & (pArg->sArg[ 0 ]);
  TCHAR *pEnd = NULL;

  if (pArg && pAddressValue_)
  {
    *pAddressValue_ =
      (WORD)(_tcstoul( pSrc, &pEnd, nBase) & _6502_MEM_END);
    return true;
  }
  return false;
}

The analyzer's diagnostic message: V595 The 'pArg' pointer was utilized before it was verified against nullptr. Check lines: 204, 207. debugger_parser.cpp 204

The 'pArg' pointer can equal zero, which is indicated by the presence of the "if (pArg && pAddressValue_)" condition. But before being checked, it is used in the following expression:

TCHAR *pSrc = & (pArg->sArg[ 0 ]);

This expression leads to undefined behavior. You just can't dereference null pointers.

Many will argue that code like that doesn't access any memory but just calculates some address - therefore, there is no problem with it. Well, this interpretation of undefined behavior is just too narrow. Don't make guesses about how the compiler can or cannot behave and how the code will or won't work. Just keep in mind that you can't write it that way and there's no sense arguing why exactly.

Undefined behavior in code like that doesn't only have to do with accessing a zero address (which may never happen, indeed). It's that the compiler, for instance, is allowed to reduce the check condition to "if (pAddressValue_)". Since there is the "pArg->xxx" expression in the code, then the pointer is certainly not null and doesn't need to be checked.

It's pointless discussing this matter in more detail. If you want to learn more, see a special article on the subject: Null Pointer Dereferencing Causes Undefined Behavior.

The code is easy to fix - you just need to move the variable declaration inside the 'if' block.

A scary expression

The analyzer was confused by the following expression:

if ((cx > 4) & (cx <= 13))

The analyzer's diagnostic message: V602 Consider inspecting the '(cx > 4)' expression. '>' possibly should be replaced with '>>'. debug.cpp 8933

The analyzer sees that the operands of the '&' operator are variables of the 'bool' type. This is strange. In cases like this, a special logical operator '&&' is usually used.

It is a common practice to use the '&' operator for bitwise operations. That's why the analyzer has assumed that the programmer, too, intended to work with bits in this code:

if ((cx >> 4) & (cx <= 13))

It has been too precautious and turned out to be wrong, though. But there is some fault of the programmer as well. This code smells. A much more sensible way to write it is as follows:

if (cx > 4 && cx <= 13)

Unspecified behavior and horrible macros

It's unknown what exactly shifting negative values to the right will result in. You'd better never do that because the code's behavior may vary depending on the compiler.

const short SPKR_DATA_INIT = (short)0x8000;
if (g_nSpeakerData == (SPKR_DATA_INIT >> 2))

The analyzer's diagnostic message: V610 Unspecified behavior. Check the shift operator '>>'. The left operand 'SPKR_DATA_INIT' is negative. speaker.cpp 450

A way out is to declare the SPKR_DATA_INIT constant as unsigned. However, you will need to make a few more additional subtle fixes to prevent compiler and analyzer warnings regarding signed/unsigned number comparison.

The analyzer has detected 3 more similar dangerous fragments:

The left operand 'SPKR_DATA_INIT' is negative. speaker.cpp 453
The left operand '~0x180' is negative. tfe.cpp 869
The left operand '~0x100' is negative. tfe.cpp 987

By the way, when fixing the last two warnings, I stumbled upon 2 more errors. That is, the analyzer can also help you with catching bugs in an indirect way sometimes.

This is how the macro is used:

SET_PP_16(TFE_PP_ADDR_SE_BUSST, busst & ~0x180);

It is expanded into a long string, so I will only show you a part of it:

..... = (busst & ~0x180 >> 8) & 0xFF; .....

The >> shift operator's precedence is higher than that of the & operation. See the table: operation precedence.

The programmer expected the code to execute in the following order:

..... = ((busst & ~0x180) >> 8) & 0xFF; .....

While actually it will be like this:

..... = (busst & (~0x180 >> 8)) & 0xFF; .....

That is why the PVS-Studio analyzer warns us: "the left operand '~0x180' is negative".

See how dangerous macros can be?

Security holes

The functions sprintf(), wsprintf(), etc. are used in a very insecure way in this project. To put it brief, they are used in the following way:

sprintf(buf, STR);

If the STR string contains control characters such as "%s", there will be consequences no one can predict.

Code like this is usually treated as a vulnerability (see the details).

However, I don't think it's that much critical for an emulator; no one is going to attack it. But this code is dangerous by itself - it can easily crash the program or cause its incorrect execution.

The correct way of implementing this function is as follows: sprintf(buf, "%s", STR);

The analyzer has found quite a lot of other dangerous function calls - 21 messages in total.

Opposite conditions

// TO DO: Need way of determining if DirectX init failed
if (soundtype != SOUND_WAVE)
{
  if (soundtype == SOUND_WAVE)
    soundtype = SOUND_SMART;

The analyzer's diagnostic message: V637 Two opposite conditions were encountered. The second condition is always false. Check lines: 270, 272. speaker.cpp 270

As the comment suggests, the code is incomplete. I'm not sure what we should do in cases like that. I decided to comment out the second meaningless 'if':

if (soundtype != SOUND_WAVE)
{
  //if (soundtype == SOUND_WAVE)
  //  soundtype = SOUND_SMART;

Bad code alignment

The code looks as if both statements were related to the 'if' operator:

{
  if ((Slot4 == CT_MockingboardC) || (Slot4 == CT_Phasor))
    m_PropertySheetHelper.GetConfigNew().m_Slot[4] = CT_Empty;
    m_PropertySheetHelper.GetConfigNew().m_Slot[5] = CT_SAM;
}

The analyzer's diagnostic message: V640 The code's operational logic does not correspond with its formatting. The statement is indented to the right, but it is always executed. It is possible that curly brackets are missing. pagesound.cpp 229

So far as I get it, there's no bug in the code. But it is not a false positive either. The analyzer is definitely right to warn the user about it. We must fix the alignment:

{
  if ((Slot4 == CT_MockingboardC) || (Slot4 == CT_Phasor))
    m_PropertySheetHelper.GetConfigNew().m_Slot[4] = CT_Empty;
  m_PropertySheetHelper.GetConfigNew().m_Slot[5] = CT_SAM;
}

Incorrect handling of the strncat() function

strncat( sText, CHC_DEFAULT, CONSOLE_WIDTH );
strncat( sText, pHelp      , CONSOLE_WIDTH );

The analyzer's diagnostic message: V645 The 'strncat' function call could lead to the 'sText' buffer overflow. The bounds should not contain the size of the buffer, but a number of characters it can hold. debugger_help.cpp 753

The function's second argument represents the number of characters that can be added to the string. And a better and safer way to write this code is as follows:

strncat( sText, CHC_DEFAULT, sizeof(sText) - strlen(sText) - 1);
strncat( sText, pHelp      , sizeof(sText) - strlen(sText) - 1);

To learn more, see the description of the V645 diagnostic.

Unnecessary checks

For quite a long time now, the 'new' operator has been set to throw the std::bad_alloc exception when it fails to allocate memory. Nevertheless, you can still encounter unnecessary checks like the following one in various programs:

BYTE* pNewImageBuffer = new BYTE [uNewImageSize];
_ASSERT(pNewImageBuffer);
if (!pNewImageBuffer)
  return false;

The analyzer's diagnostic message: V668 There is no sense in testing the 'pNewImageBuffer' pointer against null, as the memory was allocated using the 'new' operator. The exception will be generated in the case of memory allocation error. diskimagehelper.cpp 197

_ASSERT and the check can and should be removed - they just make no sense here.

A few other similar issues:

mouseinterface.cpp 175
serialcomms.cpp 839
savestate.cpp 108
savestate.cpp 218
speech.cpp 40

User-defined system types

A few data types in this project are user-defined:

typedef unsigned long ULONG;
typedef void *LPVOID;
typedef unsigned int UINT;

There is no apparent error here. So let's treat this code as "smelling" and suppress the warnings with the help of the //-V677 comment.

The "Law of the Big Two" violated

There is the CConfigNeedingRestart class where the = operator is declared but a copy constructor is missing, which violates the "Law of the Big Two".

The class is pretty lengthy, so I won't cite it here. Just take my word for it.

All the fields of this class are primary types, so it doesn't need a = operator of its own at all. The class will be successfully copied automatically.

It's the same with the Disk_t class - in both cases we can remove the = operator.

The analyzer's diagnostic messages:

V690 The 'CConfigNeedingRestart' class implements the '=' operator, but lacks a copy constructor. It is dangerous to use such a class. config.h 7
V690 The 'Disk_t' class implements the '=' operator, but lacks a copy constructor. It is dangerous to use such a class. disk.cpp 74

A typo

int nHeight=nHeight=g_aFontConfig[ FONT_CONSOLE ]._nFontHeight;

The analyzer's diagnostic message: V700 Consider inspecting the 'T foo = foo = ...' expression. It is odd that variable is initialized through itself. debugger_display.cpp 1226

It's just a typo. I've changed it to:

int nHeight = g_aFontConfig[ FONT_CONSOLE ]._nFontHeight;

The analyzer being too worried about enumerations

The 'AppMode_e' enumeration includes the following named constants: MODE_LOGO, MODE_PAUSED, MODE_RUNNING, MODE_DEBUG, MODE_STEPPING.

The analyzer is worried about not all them being used in this switch():

switch (g_nAppMode)
{
  case MODE_PAUSED  : _tcscat(.....); break;
  case MODE_STEPPING: _tcscat(.....); break;
}

The analyzer's diagnostic message: V719 The switch statement does not cover all values of the 'AppMode_e' enum: MODE_DEBUG, MODE_LOGO, MODE_RUNNING. frame.cpp 217

As for this code, I'm feeling somewhat ashamed of the analyzer, frankly. It's just that its empirical algorithms failed us. This message is a false positive, and there are a number of ways to eliminate it. For example, we can add the "default" branch in the code.

switch (g_nAppMode)
{
  case MODE_PAUSED  : _tcscat(.....); break;
  case MODE_STEPPING: _tcscat(.....); break;
  default: break;
}

Another similar false positive: V719 The switch statement does not cover all values of the 'AppMode_e' enum: MODE_DEBUG, MODE_LOGO. frame.cpp 1210

I promised you to briefly discuss Level 3 warnings

We do not recommend (at least at start) checking the 3-rd level at all. There are too many false or uninteresting or specific messages there. And that's just the case with this project.

For instance, there are pretty many V601 warnings in this code.

inline int IsDebugBreakpointHit()
{
  if ( !g_bDebugNormalSpeedBreakpoints )
    return false;
  return _IsDebugBreakpointHit();
}

The analyzer's diagnostic message: V601 The 'false' value is implicitly cast to the integer type. debug.h 210

The function returns the 'int' type, while there is a line "return false".

The analyzer is right to pick on this code but in practice there are hardly any bugs to find in fragments like that. That's why we put this warning in the Level 3 group.

And here's an example of a specific diagnostic:

double g_fClksPerSpkrSample;
....
if ((double)g_nRemainderBufferSize != g_fClksPerSpkrSample)

The analyzer's diagnostic message: V550 An odd precise comparison. It's probably better to use a comparison with defined precision: fabs(A - B) > Epsilon. speaker.cpp 197

Whether or not this code is correct depends on the application and the values stored in the variables of the 'double' type.

Some users enjoy this diagnostic very much; others argue that they use double to store integer values and are very much aware of what they are doing when comparing them. Well, you just can't please everybody.

Running the analyzer after fixing all the bugs

Now that we have fixed all the messages (of Levels 1 and 2), we can relaunch the analyzer. The result is an expected one - all the warnings are gone (see Figure 6).

Figure 6. There are no more warnings of the 1-st and 2-nd levels.

This is an ideal approach that can only be applied to small projects. Nevertheless, I hope that I have managed to convince you that there's nothing extremely difficult about managing analyzer diagnostic messages. Although some of them proved to be false positives, we still haven't faced any troubles with them and have fixed them all.

Summing up

People often ask us how many false positives our analyzer usually generates. We don't have an exact answer because gathering such statistics is very difficult and they won't make much sense anyway. The number of false positives varies greatly across different projects.

There is also a problem with data interpretation. For example, a badly written macro which is intensively used throughout an entire project may affect the statistics so that they show 20 times more false positives than genuine errors. It's not a problem, though. You just need to suppress the warnings in this macro, and the number of false positives will drop by 90% or so at once.

Another trouble about it has to do with the fact that programmers don't usually take into account that some warnings are difficult to put in a certain category. What such diagnostics reveal is not bugs but "smelling code". Such code should be fixed because even if it works well for now, it may fail in the future. In this article, I've shown you a few examples of these diagnostics.

Programmers, however, are inclined to binary logic and insist on getting a precise answer to the question: "Is this a false positive? Yes or No?" If you have read this article carefully, then I hope you won't pose the question in such a categorical way.

As you can see, it's hard to speak about the number of false positives in general. But if we take a particular small project, then we can answer this question in relation to this project.

The statistics on diagnostic messages output by the PVS-Studio analyzer for the Apple II emulator for Windows project are the following:

Total number of messages generated (General Analysis rule set, Levels 1 and 2): 81
Real errors: 57
"Smelling code" fragments that need fixing: 9
False positives: 15

The same in the percent form:

Real errors: 70 %
"Smelling" code: 11 %
False positives: 19 %

Conclusion

Welcome to try the PVS-Studio analyzer on your project. You can download the demo version here:http://www.viva64.com/en/pvs-studio-download/

And please tell your colleagues and friends about our static analyzer. I'll appreciate if you post your messages in twitter or any other news feed. Thank you!

P.S. To keep up with our new articles and news from the C/C++ world, follow me on twitter:https://twitter.com/Code_Analysis

Thank you all for reading!