Wednesday, September 28, 2016

PVS-Studio for Linux

C/C++/C# programmers! We are happy to announce that a new product from the PVS-Studio team will be released in October: a static code analyzer for Linux! Very soon our Unicorn will meet the Penguin! Stay updated by following the news in our

Facebook group: www.facebook.com/StaticCodeAnalyzer/?ref=settings

Instagram: pvsstudio


Thursday, September 22, 2016

Linux

Tux is a penguin character and the official mascot of the Linux kernel. Originally created as an entry to a Linux logo competition, Tux is the most commonly used icon for Linux, although different Linux distributions depict Tux in various styles. The character is used in many other Linux programs and as a general symbol of Linux.

The concept of the Linux mascot being a penguin came from Linus Torvalds, the creator of Linux. Tux was created by Larry Ewing in 1996 after an initial suggestion made by Alan Cox and further refined by Linus Torvalds on the Linux kernel mailing list. Torvalds took his inspiration from an image he found on an FTP site, showing a penguin figurine looking strangely like the Creature Comforts characters made by Nick Park. The first person to call the penguin "Tux" was James Hughes, who said that it stood for "(T)orvalds (U)ni(X)". However, tux is also an abbreviation of tuxedo, the outfit which often springs to mind when one sees a penguin.

Tux was originally designed as a submission for a Linux logo contest. Three such competitions took place; Tux won none of them. This is why Tux is formally known as the Linux mascot and not the logo.

Thursday, September 15, 2016

Null Pointer Dereferencing Causes Undefined Behavior

I have unintentionally sparked a large debate recently about whether it is legal in C/C++ to use the &P->m_foo expression when P is a null pointer. The programmers' community split into two camps. The first claimed with confidence that it isn't legal, while the others were just as sure that it is. Both parties gave various arguments and links, and it occurred to me that at some point I had to make things clear. For that purpose, I contacted Microsoft MVP experts and the Visual C++ development team at Microsoft through a closed mailing list. They helped me prepare this article, and now everyone interested is welcome to read it. For those who can't wait to learn the answer: that code is NOT correct.

Debate history

It all started with an article about a check of the Linux kernel with the PVS-Studio analyzer. But the issue doesn't have anything to do with the check itself. The point is that in that article I cited the following fragment from the Linux kernel's code:
static int podhd_try_init(struct usb_interface *interface,
        struct usb_line6_podhd *podhd)
{
  int err;
  struct usb_line6 *line6 = &podhd->line6;

  if ((interface == NULL) || (podhd == NULL))
    return -ENODEV;
  ....
}
I called this code dangerous because I believed it caused undefined behavior.
After that, I got a pile of emails and comments, readers objecting to that idea of mine, and was even close to giving in to their convincing arguments. For instance, as proof of that code being correct they pointed out the implementation of the offsetof macro, typically looking like this:
#define offsetof(st, m) ((size_t)(&((st *)0)->m))
We deal with null pointer dereferencing here, but the code still works well. There were also some other emails reasoning that since there had been no access by null pointer, there was no problem.
Although I tend to be gullible, I still try to double-check any information I may doubt. I started investigating the subject, and eventually wrote a small article: "Reflections on the Null Pointer Dereferencing Issue".
Everything suggested that I had been right: One cannot write code like that. But I didn't manage to provide convincing proof for my conclusions, and cite the relevant excerpts from the standard.
After publishing that article, I was again bombarded by protesting emails, so I thought I should figure it all out once and for all. I addressed language experts with a question, to find out their opinions. This article is a summary of their answers.

About C

The '&podhd->line6' expression is undefined behavior in the C language when 'podhd' is a null pointer.
The C99 standard says the following about the '&' address-of operator (6.5.3.2 "Address and indirection operators"):
The operand of the unary & operator shall be either a function designator, the result of a [] or unary * operator, or an lvalue that designates an object that is not a bit-field and is not declared with the register storage-class specifier.
The expression 'podhd->line6' is clearly neither a function designator nor the result of a [] or unary * operator; it is an lvalue expression. However, when the 'podhd' pointer is NULL, the expression does not designate an object, since 6.3.2.3 "Pointers" says:
If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.
When "an lvalue does not designate an object when it is evaluated, the behavior is undefined" (C99 6.3.2.1 "Lvalues, arrays, and function designators"):
An lvalue is an expression with an object type or an incomplete type other than void; if an lvalue does not designate an object when it is evaluated, the behavior is undefined.
So, the same idea in brief:
When '->' is applied to a null pointer, it evaluates to an lvalue where no object exists, and as a result the behavior is undefined.

About C++

In the C++ language, things are absolutely the same. The '&podhd->line6' expression is undefined behavior here when 'podhd' is a null pointer.
The discussion at WG21 (232. Is indirection through a null pointer undefined behavior?), to which I referred in the previous article, brings in some confusion. The programmers participating in it insist that this expression is not undefined behavior. However, no one has found any clause in the C++ standard permitting the use of "podhd->line6" with "podhd" being a null pointer.
The "podhd" pointer fails the basic constraint (5.2.5/4, second bullet) that it must designate an object. No C++ object has nullptr as address.

Summing it all up

struct usb_line6 *line6 = &podhd->line6;
This code is incorrect in both C and C++, when the podhd pointer equals 0. If the pointer equals 0, undefined behavior occurs.
The program running well is pure luck. Undefined behavior may take different forms, including program execution in just the way the programmer expected. It's just one of the special cases of undefined behavior, and that's all.
You cannot write code like that. The pointer must be checked before being dereferenced.

Additional ideas and links

  • When considering the idiomatic implementation of the 'offsetof()' operator, one must take into account that a compiler implementation is permitted to use what would be non-portable techniques to implement its functionality. The fact that a compiler's library implementation uses the null pointer constant in its implementation of 'offsetof()' doesn't make it OK for user code to use '&podhd->line6' when 'podhd' is a null pointer.
  • GCC can and does optimize on the assumption that no undefined behavior ever occurs, and it would remove the null checks here; the kernel is compiled with a set of switches (e.g. -fno-delete-null-pointer-checks) telling the compiler not to do this. As an example, the experts refer to the article "What Every C Programmer Should Know About Undefined Behavior #2/3".
  • You may also find it interesting that a similar use of a null pointer was involved in a kernel exploit with the TUN/TAP driver. See "Fun with NULL pointers". The major difference that might cause some people to think the similarity doesn't apply is that in the TUN/TAP driver bug, the structure field that the null pointer accessed was explicitly taken as a value to initialize a variable, instead of simply having the address of the field taken. However, as far as standard C goes, taking the address of the field through a null pointer is still undefined behavior.
  • Is there any case when writing &P->m_foo where P == nullptr is OK? Yes, for example when it is an argument of the sizeof operator: sizeof(&P->m_foo).

Acknowledgements

This article was made possible thanks to the experts whose competence I can see no reason to doubt. I want to thank the following people for helping me in writing it:
  • Michael Burr is a C/C++ enthusiast who specializes in systems level and embedded software including Windows services, networking, and device drivers. He can often be found on the StackOverflow community answering questions about C and C++ (and occasionally fielding the easier C# questions). He has 6 Microsoft MVP awards for Visual C++.
  • Billy O'Neal is a (mostly) C++ developer, and contributor to StackOverflow. He is a Microsoft Software Development Engineer on the Trustworthy Computing Team. He has worked at several security related places previously, including Malware Bytes and PreEmptive Solutions.
  • Giovanni Dicanio is a computer programmer, specializing in Windows operating system development. Giovanni wrote computer programming articles on C++, OpenGL, and other programming subjects on Italian computer magazines. He contributed code to some open-source projects as well. Giovanni likes helping people solving C and C++ programming problems on Microsoft MSDN forums, and recently on StackOverflow. He has 8 Microsoft MVP awards for Visual C++.
  • Gabriel Dos Reis is a Principal Software Development Engineer at Microsoft. He is also a researcher and a longtime member of the C++ community. His research interests include programming tools for dependable software. Prior to joining Microsoft, he was Assistant Professor at Texas A&M University. Dr. Dos Reis was a recipient of the 2012 National Science Foundation CAREER award for his research in compilers for dependable computational mathematics and educational activities. He is a member of the C++ standardization committee.

Tuesday, September 13, 2016

Accord.Net: Looking for a Bug that Could Help Machines Conquer Humankind

Articles discussing the results of analysis of open-source projects are a good thing, as they benefit everyone: some, including the project authors themselves, can find out what bugs lurk in a project; others discover the static analysis technology for themselves and start using it to improve their code's quality. For us, it is a wonderful means to promote the PVS-Studio analyzer, as well as to put it through some additional testing. This time I have analyzed the Accord.Net framework and found lots of interesting issues in its code.

About the project and the analyzer

Accord.Net is a .NET machine learning framework written in C#. It comprises several libraries covering a wide range of tasks, such as statistical data processing, machine learning, pattern recognition, and so forth. The source code can be downloaded from the GitHub repository.
The project was scanned with PVS-Studio static code analyzer, which can be downloaded here. I also encourage you to take a look at other articles on analysis of open-source projects, and the "bug database", where we collect bugs found by our tool.

A few words about warnings

The analyzer issued 91 first-level and 141 second-level warnings. In this article, I discuss or mention 109 of them. Looking through the other warnings, I found 23 more issues that looked like errors, but I don't mention them here because they are of little interest or look very similar to those already discussed. The remaining warnings are a bit harder to classify and would require more thorough investigation. So, out of 232 warnings, at least 132 report real errors. This figure tells us that the ratio of false positives for this project is about 46%. Oh, wait, sorry... It actually tells us that more than half the warnings point to real bugs! That looks like a rather weighty argument for using static analysis tools. At the end of the article, I will talk about how to and how not to use static analysis, but for now let's see what interesting issues were found in Accord.Net.

Errors found

Identical subexpressions
It's quite easy to let in errors detected by diagnostic V3001, especially when you use the copy-paste technique or when variables used in an expression have similar names. This type of error is one of the most common, and it can be found in this project, too. Try to spot the error in the following fragment without reading the warning description.
public Blob[] GetObjects(UnmanagedImage image, 
                         bool extractInOriginalSize)
{
  ....
  if ((image.PixelFormat != PixelFormat.Format24bppRgb)    &&
      (image.PixelFormat != PixelFormat.Format8bppIndexed) &&
      (image.PixelFormat != PixelFormat.Format32bppRgb)    &&
      (image.PixelFormat != PixelFormat.Format32bppArgb)   &&
      (image.PixelFormat != PixelFormat.Format32bppRgb)    &&
      (image.PixelFormat != PixelFormat.Format32bppPArgb)
      )
  ....
}
PVS-Studio warning: V3001 There are identical sub-expressions 'image.PixelFormat != PixelFormat.Format32bppRgb' to the left and to the right of the '&&' operator. Accord.Imaging BlobCounterBase.cs 670
The image.PixelFormat != PixelFormat.Format32bppRgb subexpression is repeated twice. When you have names like that for enumeration elements, making a mistake becomes very easy - and that's exactly what happened in this example. An extra subexpression like that is very difficult to notice when just skimming the code. A trickier question is whether one of the comparisons is really redundant or it was meant to work with some other enumeration value instead. The first case simply deals with redundant code, while the second implies a logical error.
The same code snippet was found one more time in the same file, line 833. Copy-paste... Copy-paste never changes.
Wrong loop termination condition
We are all used to giving loop counters names like i, j, k, etc. It's a convenient technique, and quite a common one, but sometimes it may backfire on you, as shown in the following example.
public static void Convert(float[][] from, short[][] to)
{
  for (int i = 0; i < from.Length; i++)
    for (int j = 0; i < from[0].Length; j++)
      to[i][j] = (short)(from[i][j] * (32767f));
}
PVS-Studio warning: V3015 It is likely that a wrong variable is being compared inside the 'for' operator. Consider reviewing 'i' Accord.Audio SampleConverter.cs 611
Variable i is used in the termination condition of the second loop, while variable j serves as its counter. Since i does not change within the nested loop, the condition is always true, so j will be incremented until it goes beyond the array bounds, causing an exception to be thrown.
Different logic blocks for identical conditions
The following code fragment contains two identical if statements with different logic blocks.
public void Fit(double[][] observations, 
                double[] weights, 
                MultivariateEmpiricalOptions options)
{
  if (weights != null)
    throw new ArgumentException("This distribution does not support  
                                 weighted  samples.", "weights");
  ....
  if (weights != null)
      weights = inPlace ? weights : (double[])weights.Clone();
  ....
}
PVS-Studio warning: V3021 There are two 'if' statements with identical conditional expressions. The first 'if' statement contains method return. This means that the second 'if' statement is senseless Accord.Statistics MultivariateEmpiricalDistribution.cs 653
Strange code, isn't it? Especially considering that the weights variable is a method parameter and is not used at all between the conditions. Therefore, the second if statement will never execute, because if the weights != null expression is true, ArgumentException will be thrown.
The same code was found one more time in the same file, line 687.
Conditions that are always false
Diagnostic V3022 has grown much nicer since the analyzer's first release and keeps surprising me. Let's see if it can surprise you as well. First try to find the error in the code below without reading the diagnostic message. Mind that this is an abridged version of the code, with some lines left out.
private static void dscal(int n, double da, double[] dx, 
                          int _dx_offset, int incx)
{
  ....
  if (((n <= 0) || (incx <= 0)))
  {
    return;
  }
  ....
  int _i_inc = incx;
  for (i = 1; (_i_inc < 0) ? i >= nincx : i <= nincx; i += _i_inc)
  ....
}
PVS-Studio warning: V3022 Expression '(_i_inc < 0)' is always false. Accord.Math BoundedBroydenFletcherGoldfarbShanno.FORTRAN.cs 5222
Finding the error now that irrelevant lines have been removed is very easy, of course. However, you still can't say right away where exactly the error is hiding. The point is (as you might have guessed after reading the warning) that the (_i_inc < 0) expression is always false. Note that the _i_inc variable is initialized to the value of variable incx, which is a positive number at the moment _i_inc is initialized, because the method would have returned earlier if it were otherwise. Therefore, the _i_inc variable can have only a positive value, so the _i_inc < 0 comparison will always evaluate to false, and the loop termination condition will always be i <= nincx.
Such profound analysis has become possible thanks to the mechanism of virtual values, which has significantly improved some diagnostics of the analyzer.
private void hqr2()
{
  ....
  int low = 0;
  ....
  for (int i = 0; i < nn; i++)
  {
      if (i < low | i > high)
        ....
  }
  ....
}
PVS-Studio warning: V3063 A part of conditional expression is always false: i < low. Accord.Math JaggedEigenvalueDecompositionF.cs 571
The i < low subexpression will always be false, as the least value the variable i can take is 0, while low, too, will always be referring to 0 when this comparison is evaluated. That is, the i < low subexpression will be "running idle" all the time.
There were a lot of defects like that. Here's just a few:
  • V3063 A part of conditional expression is always false: i < low. Accord.Math JaggedEigenvalueDecompositionF.cs 972
  • V3063 A part of conditional expression is always false: i < low. Accord.Math JaggedEigenvalueDecomposition.cs 571
  • V3063 A part of conditional expression is always false: i < low. Accord.Math JaggedEigenvalueDecomposition.cs 972
  • V3063 A part of conditional expression is always false: i < low. Accord.Math EigenvalueDecomposition.cs 567
Integer division with casting to real type
The analyzer detected suspicious calculations. Programmers often forget that division of integer values is performed as integer division by default. If it was meant to be real division instead, you might get a nasty and elusive error. It's hard sometimes for a programmer who is not involved in a project to tell when such expressions are incorrect, but they must be checked anyway. Let's examine a few more similar cases.
public static double GetSpectralResolution(int samplingRate, 
                                           int samples)
{
  return samplingRate / samples;
}
PVS-Studio warning: V3041 The expression was implicitly cast from 'int' type to 'double' type. Consider utilizing an explicit type cast to avoid the loss of a fractional part. An example: double A = (double)(X) / Y;. Accord.Audio Tools.cs 158
The method above performs division of two integers, but the result of this operation is implicitly cast to type double, which looks strange.
The next example is even stranger:
public static int GreatestCommonDivisor(int a, int b)
{
  int x = a - b * (int)Math.Floor((double)(a / b));
  while (x != 0)
  {
    a = b;
    b = x;
    x = a - b * (int)Math.Floor((double)(a / b));
  }
  return b;    
}
PVS-Studio warnings:
  • V3041 The expression was implicitly cast from 'int' type to 'double' type. Consider utilizing an explicit type cast to avoid the loss of a fractional part. An example: double A = (double)(X) / Y;. Accord.Math Tools.cs 137
  • V3041 The expression was implicitly cast from 'int' type to 'double' type. Consider utilizing an explicit type cast to avoid the loss of a fractional part. An example: double A = (double)(X) / Y;. Accord.Math Tools.cs 142
The analyzer didn't like the (double)(a / b) expression. The Floor method returns the largest integer less than or equal to the specified double-precision floating-point number (MSDN. Math.Floor). The variables a and b, however, are of type int, so integer division will be performed, producing an integer. It turns out that explicitly casting this value to type double and calling the Floor method make no sense.
To execute this operation correctly, the programmer should have cast one of the operands to type double. In that case, it would execute as real division, and calling the Floor method would make sense:
Math.Floor((double)a / b)
The value of a method parameter is constantly overwritten
Let's go on. Errors of this type are rather rare, but they still show up every now and then.
private static double WeightedMode(double[] observations, 
                                   double[] weights, 
                                   double mode, 
                                   int imax, 
                                   int imin)
{
  ....
  var bestValue = currentValue;
  ....
  mode = bestValue;
  return mode;
}
PVS-Studio warning: V3061 Parameter 'mode' is always rewritten in method body before being used. Accord.Statistics TriangularDistribution.cs 646
One of the method's parameters, mode, is overwritten and returned, although it's not used at all within the method (except when it is overwritten). I can't say for sure if this is an error (some of the similar issues found in other projects obviously were bugs), but this code does look strange.
There is, by the way, one interesting thing about this project: almost every diagnostic that is triggered fires more than once. The same code as in the example above was found in several other parts of the project. Indeed, copy-paste never changes...
  • V3061 Parameter 'mode' is always rewritten in method body before being used. Accord.Statistics TriangularDistribution.cs 678
  • V3061 Parameter 'mode' is always rewritten in method body before being used. Accord.Statistics TriangularDistribution.cs 706
  • V3061 Parameter 'mode' is always rewritten in method body before being used. Accord.Statistics TriangularDistribution.cs 735
Null dereference
public override string ToString(string format, 
                                IFormatProvider formatProvider)
{
  ....
  var fmt = components[i] as IFormattable;
  if (fmt != null)
    sb.AppendFormat(fmt.ToString(format, formatProvider));
  else
    sb.AppendFormat(fmt.ToString());
  ....
}
PVS-Studio warning: V3080 Possible null dereference. Consider inspecting 'fmt'. Accord.Statistics MultivariateMixture'1.cs 697
Bugs that result in throwing exceptions are usually caught during the development process if the code is tested thoroughly enough. But sometimes they slip by, as proved by the example above. If the fmt != null condition is false, the instance method ToString of the fmt object is called in the else branch. What's the result? A NullReferenceException.
As you have probably guessed already, this diagnostic was triggered one more time: MultivariateMixture'1.cs 697
Mutual assignment of references
public class MetropolisHasting<T> : IRandomNumberGenerator<T[]>
{
  ....        
  T[] current;
  T[] next;
  ....
  public bool TryGenerate()
  {
    ....
    var aux = current;
    current = next;
    next = current;
   ....
  }
  ....
}
PVS-Studio warning: V3037 An odd sequence of assignments of this kind: A = B; B = A;. Check lines: 290, 289. Accord.Statistics MetropolisHasting.cs 290
In the fragment of method TryGenerate above, the programmer obviously wanted to swap the references to arrays next and current (the aux variable is not used anywhere else) but made a mistake assigning a reference to the same array to both variables current and next - the array that was previously referred to by the reference stored in next.
This is what the fixed code should look like:
var aux = current;
current = next;
next = aux;
Potential division by zero
There were a few potential division-by-zero errors. Let's check them briefly:
public BlackmanWindow(double alpha, int length) 
    : base(length)
{
    double a0 = (1.0 - alpha) / 2.0;
    double a1 = 0.5;
    double a2 = alpha / 2.0;

    for (int i = 0; i < length; i++)
        this[i] = (float)(a0 - 
          a1 * Math.Cos((2.0 * System.Math.PI * i) / (length - 1)) +
          a2 * Math.Cos((4.0 * System.Math.PI * i) / (length - 1)));
}
PVS-Studio warnings:
  • V3064 Potential division by zero. Consider inspecting denominator '(length - 1)'. Accord.Audio BlackmanWindow.cs 64
  • V3064 Potential division by zero. Consider inspecting denominator '(length - 1)'. Accord.Audio BlackmanWindow.cs 65
The warning was triggered by the following code:
(2.0 * System.Math.PI * i) / (length - 1)
In reality, this potential error might never show up (length is the length of some window and must be 1 for the error to occur), but who knows? We'd better play safe; otherwise we risk getting a nasty error, which could also be hard to track down.
There was another interesting code fragment with potential division by zero.
public static double[,] Centering(int size)
{
  if (size < 0)
  {
      throw new ArgumentOutOfRangeException("size", size,
          "The size of the centering matrix must 
           be a positive integer.");
  }

  double[,] C = Matrix.Square(size, -1.0 / size);

  ....
}
PVS-Studio warning: V3064 Potential division by zero. Consider inspecting denominator 'size'. Accord.Math Matrix.Construction.cs 794
Personally, I find this error very interesting and funny. The analyzer warned us about potential division by zero in the -1.0 / size expression. Now take a look at the check a bit earlier. If size < 0, an exception will be thrown, but if size == 0, there will be no exception, and we will get division by 0. At the same time, the literal passed to the exception constructor says the matrix size must be a positive integer, while the check only rejects negative values; and positive and non-negative are different things, after all. It seems we could fix the error by simply adjusting the check:
if (size <= 0)
Using a bitwise operator instead of a logical one
At times you have to deal with the problem of some programmers not knowing the difference between bitwise and logical operators ('|' and '||', '&' and '&&'). Possible implications range from extra computations to crashes. In this project, the analyzer found a few strange fragments with bitwise operations:
public JaggedSingularValueDecompositionF(
         Single[][] value,
         bool computeLeftSingularVectors, 
         bool computeRightSingularVectors, 
         bool autoTranspose, 
         bool inPlace)
{
  ....
  if ((k < nct) & (s[k] != 0.0))
  ....
}
PVS-Studio warning: V3093 The '&' operator evaluates both operands. Perhaps a short-circuit '&&' operator should be used instead. Accord.Math JaggedSingularValueDecompositionF.cs 461
The body of the if statement will execute if both subexpressions (k < nct and s[k] != 0.0) are true. However, even if the first subexpression (k < nct) is false, the second will be evaluated anyway, which wouldn't happen if the programmer had used the && operator. So, if the intent was to check the value of k to avoid going beyond the array bounds, it failed.
Other similar issues:
  • V3093 The '&' operator evaluates both operands. Perhaps a short-circuit '&&' operator should be used instead. Accord.Math JaggedSingularValueDecompositionF.cs 510
  • V3093 The '&' operator evaluates both operands. Perhaps a short-circuit '&&' operator should be used instead. Accord.Math JaggedSingularValueDecompositionF.cs 595
  • V3093 The '&' operator evaluates both operands. Perhaps a short-circuit '&&' operator should be used instead. Accord.Math JaggedSingularValueDecomposition.cs 461
  • V3093 The '&' operator evaluates both operands. Perhaps a short-circuit '&&' operator should be used instead. Accord.Math JaggedSingularValueDecomposition.cs 510
  • V3093 The '&' operator evaluates both operands. Perhaps a short-circuit '&&' operator should be used instead. Accord.Math JaggedSingularValueDecomposition.cs 595
  • V3093 The '&' operator evaluates both operands. Perhaps a short-circuit '&&' operator should be used instead. Accord.Math Gamma.cs 296
Checking one and the same element in a loop
One bug was found through a diagnostic added in the recent analyzer version.
public override int[] Compute(double[][] data, double[] weights)
{
  ....
  int cols = data[0].Length;
  for (int i = 0; i < data.Length; i++)
    if (data[0].Length != cols)
      throw new DimensionMismatchException("data", 
                  "The points matrix should be rectangular. 
                   The vector at position {} has a different
                   length than previous ones.");
  ....
}
PVS-Studio warning: V3102 Suspicious access to element of 'data' object by a constant index inside a loop. Accord.MachineLearning BinarySplit.cs 121
It's quite an interesting bug. The programmer wanted to make sure that the jagged array data is two-dimensional (i.e. is a matrix) but made a mistake in the data[0].Length != cols expression, indexing with the integer literal 0 instead of the loop counter i. As a result, the data[0].Length != cols expression is always false, as it is equivalent to the expression data[0].Length != data[0].Length. Had the data parameter been a two-dimensional array (Double[,]), this error, as well as the entire check, could have been avoided. However, the use of the jagged array might be determined by some specifics of the application's architecture.
Passing a calling object as an argument to a method
The following code fragment looks strange, too.
public static double WeightedMean(this double[] values, 
                                       double[] weights)
{
  ....
}

public override void Fit(double[] observations, 
                         double[] weights, 
                         IFittingOptions options)
{
  ....
  mean = observations.WeightedMean(observations);
  ....
}
PVS-Studio warning: V3062 An object 'observations' is used as an argument to its own method. Consider checking the first actual argument of the 'WeightedMean' method. Accord.Statistics InverseGaussianDistribution.cs 325
The analyzer didn't like that the WeightedMean method receives as an argument the same object it is called on. It's even stranger considering that WeightedMean is an extension method. I did some additional investigation to see how this method was used in other parts of the application. Everywhere else it is used, the second argument is the weights array (note that this array is also present in the Fit method, which we are discussing), so it does look like an error, and then the fixed code should look like this:
mean = observations.WeightedMean(weights);
Potential serialization error
The analyzer detected a potential issue related to serialization of one of the classes.
public class DenavitHartenbergNodeCollection :  
  Collection<DenavitHartenbergNode>
{ .... }

[Serializable]
public class DenavitHartenbergNode
{
  ....
  public DenavitHartenbergNodeCollection Children 
  { 
    get; 
    private set; 
  }
  ....
}
PVS-Studio warning: V3097 Possible exception: the 'DenavitHartenbergNode' type marked by [Serializable] contains non-serializable members not marked by [NonSerialized]. Accord.Math DenavitHartenbergNode.cs 77
When serializing an instance of the DenavitHartenbergNode class, a SerializationException may be thrown - it depends on what serializer type is selected. If it is, for example, an instance of type BinaryFormatter, the exception will be thrown, because every serializable member (and this property is such a member) must be of a type marked with the [Serializable] attribute.
Here are some ways to fix this error:
  • implement this property through a field annotated with the [NonSerialized] attribute. In that case, the field (and therefore the associated property) won't be serialized;
  • implement the ISerializable interface and make the GetObjectData method skip serialization of this property;
  • annotate the DenavitHartenbergNodeCollection type with the attribute [Serializable].
Judging by the surrounding code (all the other properties are serializable), it is the third scenario that must be implemented.
However, if instances of this type are serialized by serializers that do not require all serializable members to be annotated with the [Serializable] attribute, there is nothing to worry about.
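The third fix can be sketched like this (with simplified types standing in for the Accord.NET originals):

```csharp
using System;
using System.Collections.ObjectModel;

// Marking the collection type [Serializable] lets serializers such as
// BinaryFormatter serialize the containing node without throwing
// SerializationException.
[Serializable]
public class NodeCollection : Collection<Node> { }

[Serializable]
public class Node
{
    // Now both the property type and the declaring type are serializable.
    public NodeCollection Children { get; private set; } = new NodeCollection();
}
```

Without the attribute on NodeCollection, the Node type still compiles and works fine until the first attempt to serialize it with a formatter that requires the attribute, which is what makes such errors easy to miss in testing.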
The analyzer also found lots of unsafe event calls. How many? 75 V3083 warnings! Let's examine just one example, because they all look almost the same.
private void timeUp_Elapsed(object sender, ElapsedEventArgs e)
{
  ....
  if (TempoDetected != null)
    TempoDetected(this, EventArgs.Empty);
}
PVS-Studio warning: V3083 Unsafe invocation of event 'TempoDetected', NullReferenceException is possible. Consider assigning event to a local variable before invoking it. Accord.Audio Metronome.cs 223
This code checks whether there are any subscribers to the TempoDetected event and invokes it if the check passes. The programmer assumed the check would prevent an exception when there are no subscribers. However, between the null check and the invocation, the last subscriber may unsubscribe (for example, from another thread), in which case a NullReferenceException will be thrown. To avoid problems like that, you can use, for example, the null-conditional operator '?.', which was added in C# 6.0. To learn more about this issue and other ways to resolve it, see the documentation on the diagnostic rule.
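Both common fixes can be sketched on a simplified version of the class (the event name matches the warning; the class body is a hypothetical reduction):

```csharp
using System;

class Metronome
{
    public event EventHandler TempoDetected;

    void RaiseTempoDetected()
    {
        // Fix 1: copy the delegate to a local variable first. The local
        // cannot be changed by another thread between the null check and
        // the invocation, so the race from the original code disappears.
        EventHandler handler = TempoDetected;
        if (handler != null)
            handler(this, EventArgs.Empty);

        // Fix 2 (C# 6.0): the null-conditional operator reads the field
        // exactly once and invokes the result only if it is non-null -
        // a one-line equivalent of the local-copy pattern.
        TempoDetected?.Invoke(this, EventArgs.Empty);
    }
}
```

In real code you would use one of the two, not both; the `?.` form is usually preferred where C# 6.0 is available.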

How to and how not to use static analyzers

Before finishing the article, I'd like to say a few words about how one should use static analysis tools. The following approach is very common: "We've tested our project before a release and haven't found anything of much interest". No, no, no! It's the worst way of using static analysis. To make it clearer, here's an analogy: stop using the IDE when developing applications and write all your code in Notepad instead; then, before the very release, switch back to the IDE. Sounds crazy, doesn't it? Of course it does! The IDE wouldn't be of much use if you left it sitting idle on your SSD/HDD for most of the development time when it could have really helped. It's just the same thing with static analyzers - they must be applied regularly, not occasionally.
When you run an analyzer on your code just before the release, most of the bugs have obviously been fixed already. But at what cost? At the cost of the developers' time and nerves, and of the numerous tests designed to catch those very bugs. Taking all that into account, the cost of fixing these errors is, to put it mildly, quite high.
All these troubles, however, could be avoided if you integrated a static analyzer into the development process in the right way. With the analyzer installed on every developer's machine, you can set things up so that most of the bugs it can detect are found and fixed before they ever reach the repository. Moreover, finding and fixing an error before it becomes entangled in various dependencies is much cheaper. The incremental analysis mode, which catches errors right after they appear, makes analysis even more efficient.
Another good technique is to integrate static analysis into nightly builds. It helps catch errors sooner and also identifies who let them slip into the repository, which is a good way to motivate developers to be more careful when writing code.
To sum up, it is the regular use of static analysis tools that allows developers to benefit from them in the best way possible.

Conclusion

It was one more opportunity for me to scan an interesting project and find equally interesting errors to share with you, so that you could take note of something we have discussed, learn something new, or simply resolve to be more careful when writing code. Nevertheless, we are all humans, and to err is human. PVS-Studio can help fix errors in your code, and remember that regular use of static analyzers reduces the number of troubles you face when hunting for bugs and fixing them.