Wednesday, April 26, 2017

Videos about static code analysis

IT conferences and meetings on programming languages see a growing number of speakers talking about static code analysis. Although this field is quite specific, there are still plenty of interesting talks to be found here that help programmers understand the methods, uses, and specifics of static code analysis. In this article, we have collected a number of videos on static analysis whose accessible style of presentation makes them useful and interesting to a wide audience of both skilled and novice programmers.

What is Static Analysis?

Author: Matt Might
Static analyzers allow programmers to bound and predict the behavior of software without running it. Once used exclusively for program optimization, they have rapidly risen in prominence for areas like software security and automatic parallelization. The author takes you on a tour of the landscape of static analysis through the lens of abstract interpretation.

Static code analysis used for code clean up

Author: PVS-Studio team
The talk covers ways to detect bugs, the methodology of static analysis, and the correct and incorrect use of analysis tools. The author also addresses myths about static analysis that may give developers a mistaken picture of it. The presentation shows errors in open-source projects detected by tools such as ReSharper, PVS-Studio, and Visual Studio SCA.

Static Code Analysis: Scan All Your Code For Bugs

Author: Jared DeMott
The author discusses static code analysis and how it is used in bug elimination. The talk covers pattern matching, procedural, data-flow, and statistical analysis, and includes examples of common software vulnerabilities such as memory corruption, buffer overflows and over-reads, script injection, XSS and CSRF, command injection, and misconfigurations.

Static Code Analysis: Preventing Bugs and Lag Before They Happen

Author: Vinny DaSilva
A Unite 16 Los Angeles conference session. The author discusses how to use static code analysis tools to improve code quality throughout the development process, how to customize static code analysis to fit a team's specific needs and workflows, and how to integrate it with continuous integration systems to give developers continuous feedback.

Make more Secure Code! - Overview of Security Development Lifecycle and Static Code Analysis

Author: Jason Cohen
Despite the exponential growth in security products, security services, security companies, security certifications, and general interest in the security topic, we still see security vulnerability disclosures happening on a regular basis. Implementing Security Development Lifecycle best practices and principles can go a long way toward reducing the potential for common security flaws. Input sanitization issues, Cross-Site Scripting, buffer overflows, and many other known issues still make up the bulk of the security problems we see. Static code analysis can help catch many of these issues before code makes it out of the developer's hands. In this video, the author discusses some common best practices of the Security Development Lifecycle and how they can be integrated into a modern development process.

Bug Hunting with Static Code Analysis

Author: Nick Jones
A BSidesLondon 2016 conference video. This talk covers a number of automated analysis techniques for spotting bugs and security flaws in applications at the source code level, ranging from quick-and-dirty bash scripts through open-source and commercial analyzers to custom implementations. The video also discusses how these techniques can be used in continuous integration systems to catch bugs as early in the development cycle as possible.

The Current State of (free) Static Analysis

Author: Jason Turner
A CppCon 2015 conference video. The author surveys the free static analysis tools currently available for C++ and explains what kinds of errors these tools can catch, what kinds they miss, and why static analysis should be a part of a normal build process.

Static Analysis and C++: More Than Lint

Author: Neil MacIntosh
A CppCon 2015 conference video. Static analysis can find not only trivial bugs but also subtle, complex ones early; it can identify opportunities to improve performance and encourage a consistent style and appropriate usage of libraries and APIs. This talk looks at how static analysis tools can be used to meet all these different goals. Specific examples are presented from the author's experience working with sophisticated analysis tools on large commercial codebases.

Make Friends with the Clang Static Analysis Tools

Author: Gabor Horvath
A CppCon 2016 conference video. This talk is an overview of open-source static analysis tools for C++, with an emphasis on Clang-based tools. Understanding the underlying methods is genuinely useful: it helps you write more analysis-friendly code, understand the causes of false positives, and appreciate the limitations of the currently available tools. The author gives a short tutorial on how to use these tools and how to integrate them into a workflow.

Finding Bugs with Clang at Compile and Run Time

Author: Bernhard Merkle
An ACCU 2016 conference video. Code analysis and verification are gaining more and more importance in the programming and quality assurance of software projects. Especially in languages like C and C++, undefined behavior and memory leaks can cause great problems. Static analysis tools help a lot, but hard-to-detect problems often surface only at runtime. This session shows how to use Clang's features to find bugs both at compile time (via static analysis) and at runtime (via sanitizers). Combining the two approaches can improve software quality considerably.

Static Source Code Analysis, The Next Generation

Author: James Croall
A Devoxx 2016 conference video. Gone are the days of "linters" and glorified spell checkers. Today's static source code analysis is accurate and trustworthy and can find complex inter-procedural coding defects that human eyes would never see. The video discusses how open-source developers have used Coverity's Software Testing Platform to find and fix critical, crash-causing bugs and security defects in Java code.

Static Analysis Saved My Code Tonight

Author: Damien Seguy
A PHP UK Conference 2017 video. Static analysis tools check PHP code without running it. Fully automated, they bring expertise to code review, enforce good practices, and keep code ready for the next PHP version. PHP 7 has tremendously expanded our capacity to audit code: thanks to the AST and return types, it is possible to go deeper and prevent more bugs. In this video, the author reviews the current state of static analysis tools and shows what they can find and how to integrate them into the development cycle.

Static Code Analysis with Python

Author: Andrew Wolfe
Auditing a code base for formatting mistakes, potential security vulnerabilities, or defects can be time-consuming. Static code analysis lets the computer do that for you. The video discusses how to use static analysis to catch errors early and improve code quality in Python codebases.

Augmenting Static Analysis Using Pintool: Ablation

Author: Paul Mehta
A BH USA 2016 conference video. Ablation is a tool supplementing static analysis, built to extract information from a process as it executes. This information is then imported into the disassembly environment, where it is used to resolve virtual calls, highlight regions of executed code, or visually diff samples. The goal of Ablation is to augment static analysis with minimal overhead or user interaction. Ablation makes it simple to diff samples and highlight where they diverge. This is achieved by comparing the code executed rather than just comparing data. The video also compares a heavily mutated crash sample against the original sample.

Conclusion

You may find that some of the videos cover the same aspects, but each codebase is unique, and one developer's experience may differ from another's. The authors share their knowledge of static analysis methodology and their experience using static analysis tools, so that the audience can avoid making the same mistakes and then straining their nerves and wasting time finding and fixing them. The static analysis field is developing intensively: some diagnostic rules inevitably become obsolete, while new diagnostics and standards appear. That's why attempts to compare analyzers based on which defects they can detect, or by running them on synthetic tests, make no sense. The only way to compare tools is to run them on your own code and see which of them best meets your needs and expectations.

Thursday, April 20, 2017

PVS-Studio team: code audit and other services

Our team is known as the developer of PVS-Studio. In addition, drawing on our experience, we help companies tackle some of their specific tasks. Not many people know about this side of our activity, so I decided to write this post to fill the information gap. Please take 5 minutes to read about the additional services we provide.
Our main activity is the development and sale of the static code analyzer PVS-Studio. But a parallel activity, related to selling our competences, has grown up alongside it. You could call it outsourcing, but we don't like that word: it doesn't convey that we take on work only in areas where we are experts. One way or another, such work intersects with static code analysis or with our deep knowledge of C++ and C#.

So far, our team has mostly performed analysis (audits) of third-party code using our own or third-party tools. However, we are not limited to audits. For example, we once did a large job of migrating an application to a 64-bit platform. More details can be found in the article "How to Port a 9 Million Code Line Project to 64 bits?"
Almost all of this work was done under NDAs, so, unfortunately, I can't talk about it. Speaking of public projects, I can mention Epic Games, for which we improved the Unreal Engine code. Here is an article about it: "How the PVS-Studio Team Improved Unreal Engine's Code".
Since we already have a practice of taking on third-party orders, we decided to continue and expand this activity. Possible directions:
  • One-time or regular audit of your code (C, C++, C#, Java).
  • Porting large projects to 64-bit platforms.
  • Development of OEM solutions based on PVS-Studio static code analyzer.
  • Development of a coding standard.
  • Tasks not related to the topics above. Since we are cooperating anyway, why not do this too...
Cooperation is not limited to the listed areas; they are just examples. We are ready to discuss other options and types of work. All in all, anything you are willing to pay for.
I suggest we continue by discussing the tasks we could help you with. Contact information.
If you are not sure whether your issue is worth writing about, just write. Communication is useful in any case. It's hard to predict where cooperation may lead.

By Andrey Karpov 

ClueCon Weekly - April 19th 2017 - PVS analyzer

The guys from the PVS analyzer crew stop by to talk about their analyzer.


Wednesday, April 19, 2017

If the coding bug is banal, it doesn't mean it's not crucial

Spreading the word about the PVS-Studio static analyzer, we usually write articles for programmers. However, programmers see some things rather one-sidedly. That is why there are project managers, who can help manage the development process and guide it in the right direction. I decided to write a series of articles whose target audience is project managers. These articles will help managers better understand the methodology of static code analysis. Today we are going to consider a false postulate: "coding errors are insignificant".
Recently I wrote the article "A post about static analysis for project managers, not recommended for the programmers". Quite predictably, people started commenting that there is no use in a tool that finds only simple errors. Here is one such comment:

The reason is simple: the main bugs are in algorithms. In the work of analysts, mathematicians, there are not that many bugs in the coding.
Nothing new, I should say. Again we see the myth that "expert developers do not make silly mistakes". And even if they do, it's nothing serious: such bugs are easy to find, and, as a rule, they aren't crucial.
I don't see the point in discussing the idea that professionals don't make banal errors; this topic has already been covered in our articles several times. If everything is that simple, why have these professionals made so many errors in well-known projects? By now, we have found more than 11,000 errors, even though we have never aimed to find as many errors as possible: they were just a byproduct of writing articles.
A much more interesting topic to discuss is this: many programmers think that really serious errors can be made only when writing algorithms. So I want to warn managers that this is not so: any bug can be critical. I do not deny that errors in algorithms are extremely important, but we should not underestimate the importance of typos and common blunders.
Some programmers claim that an analyzer that cannot find bugs in complex algorithms is not needed. Yes, analyzers are not capable of finding complicated algorithmic errors; that would require artificial intelligence, which hasn't been created yet. Nevertheless, looking for simple errors is just as important and necessary as looking for algorithmic ones.
To back up my words, I suggest having a look at three examples.
For a start, I ask you to recall a critical vulnerability in iOS that appeared due to a double goto.
if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
  goto fail;
if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
  goto fail;
  goto fail;
if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
  goto fail;
Details can be found in the article Apple's SSL/TLS bug. It doesn't matter whether this error appeared because of a typo or an unsuccessful merge; it is obviously a "mechanical" error that has nothing to do with mathematics or algorithms. Still, it can be detected by the PVS-Studio analyzer.
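By the way, a common defensive convention that mitigates this class of bugs is to always brace conditional bodies; a duplicated goto then either lands inside the braces, where it is harmless dead code, or stands out as an obviously unconditional statement. A sketch of the fragment above in that style:
if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0) {
  goto fail;
  goto fail;  // the duplicate is now unreachable dead code inside the branch
}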
Now, here is a vulnerability in MySQL: Security vulnerability in MySQL/MariaDB sql/password.c.
typedef char my_bool;
....
my_bool check(...) {
  return memcmp(...);
}
The error appears because of an implicit type conversion (int -> char), during which the values of the higher bits are discarded. Again, this error has no relation to complex algorithms and was easily detected by PVS-Studio. Despite its simplicity, the error means that, in one case out of 256 on some platforms, the procedure of comparing a hash with an expected value will return 'true' regardless of the hash.
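To make the mechanics concrete, here is a small self-contained sketch (with a hypothetical stand-in for the memcmp result) showing how the int -> char truncation loses the outcome of the comparison:
#include <cstdio>

typedef char my_bool;

// Hypothetical stand-in: memcmp may legally return any nonzero int for
// differing buffers, e.g. 0x100 on some platforms.
int fake_memcmp_result() { return 0x100; }  // the buffers differ

int main() {
  my_bool cmp = fake_memcmp_result();  // int -> char: high bits dropped
  // 0x100 truncated to 8 bits is 0 on typical platforms, so a mismatch
  // is reported as 0, i.e. "hashes are equal".
  printf("cmp = %d\n", (int)cmp);      // prints 0
  return 0;
}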
The third example. Once I took part in developing a package for numerical simulation of gas-dynamic processes. There was a lot of mathematics, algorithms, and so on. Of course, there were math issues. But I remember that there were far more problems related to migrating the code to a 64-bit system. By the way, that was the moment when the idea arose to create the Viva64 analyzer, which later evolved into PVS-Studio (the story: "PVS-Studio project - 10 years of failures and successes").
One of the errors was caused by incorrect positioning within a file using the _fseeki64 function. Once the modeling package became 64-bit, it could handle large amounts of data and, as a result, write large volumes of data to disk. But then it couldn't read them back correctly. I cannot say the code was badly written; it had something like this:
unsigned long W, H, D, DensityPos;
....
unsigned long offset = W * H * D * DensityPos;
res = _fseeki64(f, offset * sizeof(float), SEEK_SET);
We get an overflow when the variables are multiplied. Of course, when the programmer wrote this code, he didn't expect that the long type would remain 32-bit on Win64 (the LLP64 data model). We spent a lot of time looking for this bug. When you see such pseudocode, everything seems clear and simple; in practice, it was very hard to understand why strange bugs appeared once the size of the processed data exceeded a certain threshold. A week of debugging could easily have been avoided if the code had been checked by PVS-Studio, which finds the described bug easily. The algorithms and mathematics didn't cause any trouble when porting to the 64-bit system.
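Here is a compilable sketch of the bug pattern and its fix. The dimensions are hypothetical, picked so that the product exceeds 32 bits; note that the wrap reproduces only where unsigned long is 32-bit (Win64/LLP64 or a 32-bit build), since on typical Linux/LP64 systems unsigned long is already 64-bit:
#include <cstdint>
#include <cstdio>

int main() {
  // Hypothetical dimensions whose product (4.8 billion) exceeds 2^32.
  unsigned long W = 2000, H = 2000, D = 600, DensityPos = 2;

  // Bug pattern: the whole chain is multiplied in unsigned long, which is
  // 32-bit under LLP64, so the product wraps around before it is stored.
  unsigned long offsetBad = W * H * D * DensityPos;

  // Fix: widen the first operand, so every multiplication is done in 64 bits.
  uint64_t offsetGood = static_cast<uint64_t>(W) * H * D * DensityPos;

  printf("bad  = %lu\n", offsetBad);
  printf("good = %llu\n", (unsigned long long)offsetGood);
  return 0;
}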
As you can see, simple mistakes can lead to serious consequences. It's better to find as many of them as possible with a static analyzer, without spending hours and days debugging. And even more so, it is better to find the error yourself; the worst-case scenario is discovering that your application has a vulnerability when it is already installed on tens of thousands of computers.
It is also useful to find as many simple errors as possible using several tools, so that you can spend more time looking for defects in algorithms and creating new functionality.
By the way, I suggest that managers reading this article take advantage of our project-audit services. We can sign a small contract, under which we examine the project and fix all the errors we can find. First, it can be useful in any case; second, if you are pleased with the result, it opens the way for further cooperation. If necessary, we are ready to sign an NDA. I suggest discussing the details by mail.

Monday, April 17, 2017

A post about static code analysis for project managers, not recommended for the programmers

If you consider yourself a good programmer, or at least think your level is above average, I do not recommend reading this article. It is meant for managers of software projects. I would like to discuss here some quite important but, for programmers, very boring issues related to the methodology of static code analysis.
I hope this introduction has stopped some programmers from reading further. Some may continue anyway; that is their free choice. We are going to speak about things that aren't very pleasant for programmers.

Our company develops the static code analyzer PVS-Studio, which looks for errors in programs written in C, C++, and C#. The idea is simple: we run the analyzer and examine the code fragments it finds suspicious. In essence, it is a kind of automated code review.
Any manager, and programmers themselves, understand that high-quality code is better than low-quality code. It's also clear to everyone that the fewer glitches a program has, the better.
But here comes the most interesting part. Although everybody understands the importance of high-quality code, programmers often get furious when offered static analysis for their code. It is as if they were offended: their professionalism has been questioned, and they are eager to respond with their whole arsenal of negative criticism. Over the years we have collected a large number of brickbats like these:
  • "this is a tool for students; in our team we have only experts, so if we have errors, they are only related to thread synchronization";
  • "we do not use static code analysis, as we hire only professionals who are above average".
I think that if a project manager were present during such discussions, quite a few programmers would get a dressing-down for their arrogance. Every manager can recall a situation when they spent ages looking for a bug that turned out to be a stupid typo. Or cases when the manager grew skeptical: if everything is fine, why does the application keep crashing from time to time? Managers don't view the development process and the issues that crop up as rosily as developers do.
Figure 1. Programmers are often too rosy, being convinced that everything is fine.
So I would like to unravel a mystery, which of course is not a mystery at all: programmers have trouble estimating their own level of proficiency. The article "The Problem With 'Above Average' Programmers" gives a nice explanation of this effect. I quote an excerpt:
How would you rate your programming skills? (Below Average, Average, or Above Average)?
Based on psychological studies across many different groups, about 90% of all programmers will answer "Above Average".
Of course, that can't possibly be true. In a group of 100 people, 50 are above average and 50 are below average. This effect is known as illusory superiority. It has been described in many spheres, and even if you haven't heard of it, you will most likely still answer "above average".
This is a problem that prevents programmers from learning new technologies and methodologies. In our case, it hinders a positive attitude toward static code analysis. It is very unpleasant for a programmer to realize that some program is going to teach them how to code. It is even more unpleasant when the analyzer detects stupid errors in their "ideal" code and makes them publicly visible. It takes some will and wisdom to notice and accept faults and defects in one's code. It's much easier to write negative feedback about the tool, or about the technology in general, and keep living in a comfortable, closed world.
The PVS-Studio analyzer finds bugs in the code of such projects as Qt, MSBuild, LLVM, GCC, GDB, and Chromium. The developers of these projects can hardly be rated as below average. However, this doesn't prevent programmers from answering that their code is of high quality and that static analysis has nothing to offer them.
In such cases, I like to ask: who, then, made those 11,000 errors, if only "above average" professionals work everywhere? The question is rhetorical, of course, and I'm not expecting an answer.
I think managers can see what I am getting at. Static code analyzers are essential in the development of medium-sized and large projects. They help fight a great number of errors and keep the development process under control in general. Regular checks help detect an abnormal growth in the number of bugs, caused, say, by a programmer who is about to quit and writes sloppy code because he doesn't care much anymore but has to pretend he is working. This situation is made up, of course, but you really do need a tool that can assess the quality of a project, and analyzer warnings are one of the best metrics.
By the way, here is an interesting story on this topic. Quite recently, a colleague of mine checked the CryEngine V project and found a lot of bugs. There were so many High-level warnings that he didn't have the patience to look through all of them. Then, all of a sudden, we learned that Crytek had been experiencing problems and that several programmers hadn't been paid for more than 3 months. The unhealthy situation in the company affected the quality of the code. It was interesting to see such a clear relationship.
In general, you should not hesitate to make programmers use static code analysis. Even if it's not PVS-Studio but, say, Cppcheck (which performs simpler checks, but is free), that is already great. Programmers may start objecting, so the manager has to remain firm.
Programmers often point out that working with the analyzer takes time, since it forces them to review false positives.
I cannot help being sarcastic here. Oh, well... Spending a day setting up the analyzer and suppressing false positives is too much, but sitting and looking for a bug for a week is totally fine.
Proof: a bug that had withstood about 50 hours of unsuccessful searching was detected with an analyzer in less than an hour.
By the way, it's very simple to integrate the analyzer into the development process if there is no need to look for errors in old code: PVS-Studio can show only the warnings related to new or modified code (see Mass Suppression of Analyzer Messages).
Managers should be very skeptical about programmers' objections that they do not need a static analyzer. They personally may really not need it; the project does. Static analysis is one of the methodologies, like unit tests, that increase the reliability of code and reduce the cost of maintaining it. The sooner errors are found, the better, and static analyzers help detect errors at the earliest stage: as soon as they appear in the code.
Figure 2. Please note, some programmers use static analysis in a wrong way.
Another important point. Some programmers may claim that few errors were found during testing, so adopting an analyzer is not worthwhile. Don't let them confuse you. Of course, there won't be many bugs in working code; otherwise the program wouldn't work. However, finding and fixing bugs can be very expensive. Often, a lot of user complaints and hours of programmers' meditation in the debugger could be avoided by a single run of the code analyzer. The whole point and benefit of static analysis lies in its regular use, not in occasional runs.
I have heard several times that some programmers run static analyzers only before a release. If there are people who do this and claim it is normal, they are not suited for this profession. This is the most incorrect way of using static analyzers. It is the same as turning on compiler warnings just before a release and keeping them disabled the rest of the time.
Thank you for your attention. I invite everybody interested in the methodology of static analysis in general, and in the PVS-Studio analyzer in particular, to contact us. We will help you check your project, set up the analyzer, and show you how to deal with the issued warnings.

Tuesday, April 11, 2017

War of the Machines: PVS-Studio vs. TensorFlow

"I'll be back" (c). I think everybody knows this phrase. Although, today we aren't going to talk about the return of the terminator, the topic of the article is similar in some way. We'll discuss the analysis of the the machine learning library TensorFlow and will try to find out, if we can sleep peacefully or Skynet is already coming...

TensorFlow

TensorFlow is a machine learning library developed by Google and available as an open-source project since November 9, 2015. At the moment, it is actively used in research work and in dozens of Google's commercial products, including Google Search, Gmail, YouTube, Photos, Translate, Assistant, etc. The source code is available in the repository on GitHub and on the Google Open Source platform.

Why was this project chosen?
  • Google. If a project is developed by Google, Microsoft, or other famous developers, its analysis is a kind of challenge for us. Besides, a lot of people would be interested to see the mistakes made by developers from big-name corporations.
  • Machine learning. Nowadays, this subject is gaining more and more popularity, and for good reason: some of the results are truly impressive! I won't cite examples here; you can easily find them yourselves.
  • Statistics on GitHub. This is also an important criterion, because the more popular the project, the better. TensorFlow is breaking all possible and impossible records! It takes one of the top places among C++ projects, with more than 50,000 stars and over 20,000 forks! It's amazing!
Of course, we couldn't miss the chance to check such a project. I don't even know why my colleagues hadn't checked it yet. Well, it's time to do so.

What tool was used for the analysis?

If you know what PVS-Studio is, then you know the answer. If you don't know yet, please don't rush on. For example, it might be interesting to know that we have had a C# analyzer for more than a year and a Linux version for roughly half a year.
Here is the general information about the tool. The analysis was done using the static code analyzer PVS-Studio, which finds bugs in programs written in C, C++, and C#. PVS-Studio works under Linux and Windows; it currently has more than 400 diagnostics, whose descriptions you may find on this page.
Besides developing the static analyzer, we also check open-source projects and write reports about the results. By now we have checked more than 280 projects and found more than 10,800 errors. And these aren't small, insignificant projects, but quite well-known ones: Chromium, Clang, GCC, Roslyn, FreeBSD, Unreal Engine, Mono, and others.
PVS-Studio is available for download, so I suggest trying it on your project and checking out what it finds in your code.
By the way, PVS-Studio has its own tag on StackOverflow (link). I recommend asking questions there, so that other developers can quickly find the necessary information without waiting for our reply by e-mail. We, in turn, are always glad to help our users.

The article format

This time I want to diverge from the traditional flow of an analysis article: download the project, check it, write about the bugs found. I also want to describe some analyzer settings and the ways they can be useful. In particular, I will show how to fight false positives, how to benefit from disabling certain diagnostics, and how to exclude particular files from the analysis. Of course, we will also have a look at the errors PVS-Studio managed to find in the source code of TensorFlow.

Preparation for the analysis

Now that PVS-Studio is also available under Linux, we could perform the analysis either under Linux or under Windows. Quite recently I checked a project under openSUSE, which was simple and convenient, but I still decided to check TensorFlow under Windows, which was more familiar to me. What's more, the project can be built using CMake, which implies further work in the Visual Studio IDE, for which we have a special plugin (the latest version added code highlighting of erroneous fragments).
Officially, building TensorFlow under Windows is not supported (according to the website). Nevertheless, there is a link describing how to build the project using CMake, and I should say that it builds easily by these instructions.
As a result, we get a set of .vcxproj files combined into one .sln, so further on it will be convenient to work with the project from Visual Studio, which is great. I worked in the Visual Studio 2017 IDE, support for which was added in the PVS-Studio 6.14 release.
Note. It's a good idea to build the project before the analysis and make sure it compiles without errors. This ensures that the analysis is done efficiently and that the analyzer has all the syntactic and semantic information. There is a note on the TensorFlow site: "By default, building TensorFlow from sources consumes a lot of RAM." Well, that should be okay, because I have 16 GB of RAM on my machine. And what do you think? During the build I got a Fatal Error C1060 (compiler is out of heap space)! My machine ran out of memory! It was quite unexpected. No, I didn't have five virtual machines running concurrently with the build. In all fairness, it should be added that when building with bazel, you can limit the amount of RAM used (the option is described in the TensorFlow build instructions).
I couldn't wait to press the sacred button "Analyze solution with PVS-Studio" and see what interesting bugs we would find, but first it would be good to exclude from the analysis those files that aren't of much interest: third-party libraries, for example. This can easily be done in the PVS-Studio settings: on the 'Don't Check Files' tab, we set masks for the files and paths whose analysis is of no interest. The settings already contain a certain set of paths (/boost/, for example). I added two masks: /third_party/ and /external/. This not only removes the warnings from the output window but also excludes the files in these directories from the analysis altogether, which decreases the analysis time.
Figure 1 - Setting analysis exclusions in the PVS-Studio preferences
Finally, we can run the analysis and see what was found.
Note. The 'Don't Check Files' settings can be configured both before and after the analysis. I have just described the first case; the second scenario allows filtering the obtained log, which is also useful and can save you from viewing unnecessary warnings. This will be described below.

False positives: arithmetic and fun

Why false positives are important (and frustrating)

False positives are a headache for everybody: for us, the developers of a static code analyzer, and for the users, because they clutter up useful output. A large number of false positives may put people off using the tool. In addition, people usually judge an analyzer by its percentage of false positives. This is not as easy as it may seem, and the topic deserves a separate article and discussion. A colleague of mine has recently written such an article; I recommend having a look at it.

How to fight against false positives?

Our task is to try to get rid of false positives at the analysis stage, so that users never see them. To do this, we add exceptions to the diagnostic rules, i.e., special cases in which the analyzer shouldn't issue warnings at all. The number of these exceptions varies greatly from diagnostic to diagnostic: for some diagnostics we don't need to write exceptions at all, while for others we may have dozens of them implemented.
Nevertheless, we aren't able to cover all cases (sometimes they are too specific), which is why our second task is to let users exclude false positives from the analysis themselves. PVS-Studio provides several mechanisms for this: suppression by comments, configuration files, and suppression bases. There is a separate article devoted to this, so I won't go deep into details here.
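For instance, the simplest of these mechanisms is a marker comment at the end of the offending line, which suppresses the given diagnostic on that line only. A minimal illustrative sketch (the code and the diagnostic number here are arbitrary):
if (a.x == a.x) { .... }  //-V501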

False positives and TensorFlow

Why did I start speaking about false positives in general? Firstly, because it is very important to fight false positives; secondly, because of what I saw when I checked TensorFlow and filtered the output by the V654 diagnostic rule (the image is clickable).
Figure 2 - All the found warnings of V654 have the same pattern
64 warnings and all of them have the same pattern - the following code:
false && expr
In the code itself, these fragments look like this:
DCHECK(v);
DCHECK(stream != nullptr);
DCHECK(result != nullptr);
Here is how the macro DCHECK is declared:
#ifndef NDEBUG
....
#define DCHECK(condition) CHECK(condition)
....
#else
....
#define DCHECK(condition) \
  while (false && (condition)) LOG(FATAL)
....
#endif
What follows from this code? DCHECK is a debugging macro. In the debug version, it expands to a check of the condition (CHECK(condition)); in the release version, to a loop that will never be executed: while (false && ....). Since I was building a release version of the code, the macro expanded correspondingly (to the while loop). As a result, the analyzer seems to complain correctly: the result of the expression is always false. But what's the point of these warnings if they are issued for code that was meant to be this way? So the percentage of false positives for this diagnostic will be as shown in the diagram below.
Figure 3 - The ratio of good warnings to false positives for diagnostic V654
You may think this is a joke, but no, we are not kidding: 100% false positives. This is exactly what I was talking about. I also said that there are various ways to fight them. By pressing 'Add selected messages to suppression base' we can shift this in the opposite direction (the image is clickable).
Figure 4 - Fighting against false positives
This is a way to suppress all the current warnings by removing them from the output window. But it is not quite correct, because if you start using the DCHECK macro again when writing new code, you will get warnings again. There is a better solution: we need to suppress the warning in the macro itself by leaving a special comment. Then the suppression code will be as follows:
//-V:DCHECK:654 
#define DCHECK(condition) \
  while (false && (condition)) LOG(FATAL)
The comment should be written in the same header file where the macro is declared.
That's it: we may forget about the DCHECK macro, because the V654 warning will no longer be issued for it. As a result, we have successfully dealt with the false positives. After these simple actions, the false-positive diagram for V654 will look as follows.
Figure 5 - We successfully eliminated false positives
We see a completely different picture: the percentage of false positives is 0. Quite amusing arithmetic. Why did I start talking about false positives in general? I just wanted to explain that false positives are inevitable. The general aim of the analyzer is to decrease their number at the analysis stage, but you'll probably have to deal with some of them due to peculiarities of your project. I hope I have managed to convey that false alarms can be handled (and should be handled), and that it is quite simple.

A couple more settings

Perhaps you can't wait to take a look at the bugs we found, but please be patient and read about a couple more settings that will make working with the analysis results easier.

Warnings in automatically generated files

During the analysis, we checked not only the code written manually by programmers but also automatically generated code. Warnings for such code aren't interesting to us, so we exclude those files from the analysis. The 'Don't Check Files' settings come to our aid here. Specifically for this project, I specified the following file names:
pywrap_*
*.pb.cc
This allowed hiding more than 100 general analysis (GA) warnings of the medium certainty level.

Disabling specific diagnostics

One more analyzer setting that turned out to be very useful is disabling groups of diagnostic rules. Why might this be relevant? For example, there were about 70 V730 warnings (not all class members are initialized in the constructor). These warnings really do need reviewing, because they might signal hard-to-detect bugs. Nevertheless, it may not be clear to a person unfamiliar with the code whether an uninitialized member will lead to problems or whether there is some other tricky way it gets initialized later. For an article, these warnings aren't very interesting either; the developers should review them, so we won't focus on them here. Hence the goal: to disable a whole group of diagnostic rules. It can easily be done: in the settings of the PVS-Studio plugin, you just uncheck the corresponding diagnostic.
Figure 6 - Disabling irrelevant diagnostics
By disabling diagnostic rules that are not relevant to your project, you simplify further work with the analyzer's output.

The analyzer warnings issued for the project

Well, now let's move on to the most intriguing part - those code fragments that the analyzer found suspicious.
Usually, I like to start with a classic error that is made in both C# and C++ projects: the a == a error, detected by the V501 and V3001 diagnostic rules. But there are no such errors here! In general, the bugs detected in this project are quite peculiar... So, here we go.
void ToGraphDef(const Graph* g, GraphDef* gdef, bool pretty) {
  ....
  gtl::InlinedVector<const Edge*, 4> inputs;
  ....
  for (const Edge* e : inputs) {
    const string srcname = NewName(e->src(), pretty);
    if (e == nullptr) {
      ndef->add_input("unknown");
    } else if (!e->src()->IsOp()) {
    } else if (e->IsControlEdge()) {
      ndef->add_input(strings::StrCat("^", srcname));
    } else if (e->src_output() == 0) {
      ndef->add_input(srcname);
    } else {
      ndef->add_input(strings::StrCat(srcname, ":", e->src_output()));
    }
  }
  ....
}
PVS-Studio warning: V595 The 'e' pointer was utilized before it was verified against nullptr. Check lines: 1044, 1045. function.cc 1044
In the loop, the elements of the vector are iterated over, and, depending on their values, certain actions are performed. The check e == nullptr presupposes that the pointer can be null. The problem is that we see a dereference of this pointer in the call of the NewName function: e->src(). The result of such an operation is undefined behavior, which may lead, among other things, to a program crash.
But the code of TensorFlow is not that simple. The filling of this vector (inputs) happens earlier and looks as follows:
for (const Edge* e : n->in_edges()) {
  if (e->IsControlEdge()) {
    inputs.push_back(e);
  } else {
    if (inputs[e->dst_input()] == nullptr) {
      inputs[e->dst_input()] = e;
    } else {
      LOG(WARNING) << "Malformed graph node. multiple input edges: "
                   << n->DebugString();
    }
  }
}
Looking carefully at the code, you can see that null pointers will never actually be written to the inputs vector: if e were null, a null pointer dereference would occur (at e->IsControlEdge()) before the element was added, since there is no nullptr check before that dereference. And since the inputs vector won't contain null pointers, it turns out that the condition e == nullptr, which we spoke about earlier, will always be false.
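Be that as it may, the dereference and the check are in the wrong order. A minimal sketch of the straightforward fix, keeping the names from the snippet above, is to compute srcname only after e has been verified:
for (const Edge* e : inputs) {
  if (e == nullptr) {
    ndef->add_input("unknown");
    continue;
  }
  const string srcname = NewName(e->src(), pretty);  // dereference is now safe
  ....
}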
Anyway, this code is really tricky and PVS-Studio found it really successfully. Let's move on.
Status MasterSession::StartStep(const BuildGraphOptions& opts, 
                                int64* count,
                                ReffedClientGraph** rcg, 
                                bool is_partial) {
  ....
  ReffedClientGraph* to_unref = nullptr;
  ....
  if (to_unref) to_unref->Unref();
  ....
}
PVS-Studio warning: V547 Expression 'to_unref' is always false. master_session.cc 1114
In the body of the method, we see that a local variable to_unref is declared and initialized with nullptr. Up to the if statement, this pointer is not used in any way and its value doesn't change. Thus, the body of the if statement won't be executed, because the pointer remains null. Perhaps this code was left over after refactoring. There is a chance the pointer was supposed to be used somewhere between the initialization and the check, but another one was used instead (mixed up); however, I didn't find similar names. Looks suspicious.
Let's go on.
struct LSTMBlockCellBprop ....
{
  ....
  void operator()(...., bool use_peephole, ....) {
  ....
  if (use_peephole) {
    cs_prev_grad.device(d) =
        cs_prev_grad +
        di * wci.reshape(p_shape).broadcast(p_broadcast_shape) +
        df * wcf.reshape(p_shape).broadcast(p_broadcast_shape);
  }

  if (use_peephole) {
    wci_grad.device(d) = 
      (di * cs_prev).sum(Eigen::array<int, 1>({0}));
    wcf_grad.device(d) = 
      (df * cs_prev).sum(Eigen::array<int, 1>({0}));
    wco_grad.device(d) = 
      (do_ * cs).sum(Eigen::array<int, 1>({0}));
  }
  ....
  }
};
PVS-Studio warning: V581 The conditional expressions of the 'if' operators situated alongside each other are identical. Check lines: 277, 284. lstm_ops.h 284
There are two if statements with identical conditions, and between them, the condition expression (here, the use_peephole parameter) doesn't change. Sometimes this can indicate a serious error, where the wrong condition was used in one of the cases, but here, judging by the context, the conditions were simply duplicated. I don't think it's a bug, but all the operations could be put under a single if.
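A sketch of that cleanup, merging the two branches under a single condition:
if (use_peephole) {
  cs_prev_grad.device(d) =
      cs_prev_grad +
      di * wci.reshape(p_shape).broadcast(p_broadcast_shape) +
      df * wcf.reshape(p_shape).broadcast(p_broadcast_shape);
  wci_grad.device(d) = (di * cs_prev).sum(Eigen::array<int, 1>({0}));
  wcf_grad.device(d) = (df * cs_prev).sum(Eigen::array<int, 1>({0}));
  wco_grad.device(d) = (do_ * cs).sum(Eigen::array<int, 1>({0}));
}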
One does not simply write code without making copy-paste errors.
struct CompressFlags {
  ....
  Format format;

  ....
  int quality = 95;

  ....
  bool progressive = false;

  ....
  bool optimize_jpeg_size = false;

  ....
  bool chroma_downsampling = true;

  ....
  int density_unit = 1;  
  int x_density = 300;
  int y_density = 300;

  ....
  StringPiece xmp_metadata;

  ....
  int stride = 0;
};

class EncodeJpegOp : public OpKernel {
  ....
  explicit EncodeJpegOp(OpKernelConstruction* context) :  
    OpKernel(context) { 
    ....
    OP_REQUIRES_OK(context, 
      context->GetAttr("quality", &flags_.quality));
    OP_REQUIRES(context, 
      0 <= flags_.quality && flags_.quality <= 100,
      errors::InvalidArgument("quality must be in [0,100], got ",
      flags_.quality));
    OP_REQUIRES_OK(context,
      context->GetAttr("progressive", 
                       &flags_.progressive));
    OP_REQUIRES_OK(context, 
      context->GetAttr("optimize_size", 
                       &flags_.optimize_jpeg_size));
    OP_REQUIRES_OK(context, 
      context->GetAttr("chroma_downsampling",         // <=
                       &flags_.chroma_downsampling));
    OP_REQUIRES_OK(context, 
      context->GetAttr("chroma_downsampling",         // <=
                       &flags_.chroma_downsampling));
    ....
  }
  ....  
  jpeg::CompressFlags flags_;
}
PVS-Studio warning: V760 Two identical blocks of text were found. The second block begins from line 58. encode_jpeg_op.cc 56
As you can see in the code, the programmer checks the values of the flags read into the flags_ field in the constructor of the EncodeJpegOp class via the macros OP_REQUIRES_OK and OP_REQUIRES. However, in the last lines of the given fragment, the value of the same flag ("chroma_downsampling") is checked a second time. It looks very much like copy-paste: the code was copied but not edited.
The most interesting (and the hardest) part is to understand whether the copy-paste is merely redundant or whether something else was meant here. If the code is redundant, there is nothing horrible, but the situation is completely different if another code fragment was meant, because then we have a logical error.
Having reviewed the body of the constructor, I haven't found a check of the stride field. Perhaps, in one of the cases, this very check was intended. On the other hand, the order of the fields in the constructor is similar to the order of the field declarations in the CompressFlags structure. Thus, it's hard to say how this code should be fixed; we can only make assumptions. In any case, this code is worth paying attention to.
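If the duplicated block really was meant to check another attribute, it might, purely hypothetically, have read the missing field, something like this (the "stride" attribute name is my guess based on the structure's fields, not anything confirmed by the project):
OP_REQUIRES_OK(context,
  context->GetAttr("stride",
                   &flags_.stride));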
The analyzer also found several suspicious fragments related to bit shifting. Let's have a look at them. I want to remind you that incorrect use of shift operations leads to undefined behavior.
class InferenceContext {
  ....
  inline int64 Value(DimensionOrConstant d) const {
    return d.dim.IsSet() ? d.dim->value_ : d.val;
  }
  ....
}
REGISTER_OP("UnpackPath")
    .Input("path: int32")
    .Input("path_values: float")
    .Output("unpacked_path: float")
    .SetShapeFn([](InferenceContext* c) {
      ....
      int64 num_nodes = InferenceContext::kUnknownDim;
      if (c->ValueKnown(tree_depth)) {
        num_nodes = (1 << c->Value(tree_depth)) - 1;    // <=
      }
      ....
    })
....;
PVS-Studio warning: V629 Consider inspecting the '1 << c->Value(tree_depth)' expression. Bit shifting of the 32-bit value with a subsequent expansion to the 64-bit type. unpack_path_op.cc 55
The strangeness of this code lies in the fact that 32-bit and 64-bit values are mixed in the shift and assignment operations. The literal 1 is a 32-bit value, and a left shift is performed on it. The result of the shift still has a 32-bit type but is written to a 64-bit variable. This is suspicious, because we may get undefined behavior if the value returned by the Value method is 31 or more (for a 32-bit signed int, 1 << 31 overflows the type, and a shift by 32 or more positions is undefined outright).
Here is a quote from the standard: The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned type, the value of the result is E1 * 2^E2, reduced modulo one more than the maximum value representable in the result type. Otherwise, if E1 has a signed type and non-negative value, and E1*2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
This code can be fixed by writing 1 as a 64-bit literal or by extending the type via a cast. More details on shift operations can be found in the article "Wade not in unknown waters. Part three".
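A sketch of both variants of the fix, keeping the names of the fragment above:
// Variant 1: an explicit cast, so the shift itself is performed in 64 bits.
num_nodes = (static_cast<int64>(1) << c->Value(tree_depth)) - 1;
// Variant 2: equivalently, a 64-bit integer literal.
num_nodes = (1LL << c->Value(tree_depth)) - 1;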
The extension through a cast was used in another fragment as well. Here is the code:
AlphaNum::AlphaNum(Hex hex) {
  ....
  uint64 value = hex.value;
  uint64 width = hex.spec;
  // We accomplish minimum width by OR'ing in 0x10000 to the user's value,
  // where 0x10000 is the smallest hex number that is as wide as the user
  // asked for.
  uint64 mask = ((static_cast<uint64>(1) << (width - 1) * 4)) | value;
  ....
}
PVS-Studio warning: V592 The expression was enclosed by parentheses twice: ((expression)). One pair of parentheses is unnecessary or misprint is present. strcat.cc 43
This code is actually correct, but the analyzer found it suspicious, having detected duplicated parentheses. The analyzer reasons as follows: the double brackets do not affect the evaluation result, so perhaps one pair of brackets is not where it should be.
It's quite possible the parentheses were meant to explicitly underline the precedence of the evaluations, to save readers from having to remember the priorities of the '<<' and '*' operations. But they aren't much help, placed as they are in the wrong spot. I reckon the evaluation order itself is correct (first we compute the shift amount, then perform the shift), so we just have to put the parentheses where they belong, so that they do not confuse people:
 uint64 mask = (static_cast<uint64>(1) << ((width - 1) * 4)) | value;
Let's go on.
void Compute(OpKernelContext* context) override {
  ....
  int64 v = floor(in_x);
  ....
  v = ceil(in_x1);
  x_interp.end = ceil(in_x1);
  v = x_interp.end - 1;
  ....
}
PVS-Studio warning: V519 The 'v' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 172, 174. resize_area_op.cc 174
The v variable is assigned twice, but between these assignments its value is not used in any way. Moreover, x_interp.end is assigned the same value that was just written to v. Even if we set aside the fact that the ceil call here is redundant (it's not critical, although...), the code looks strange: either it is oddly written or it contains a tricky error.
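If the code is merely oddly written rather than wrong, one possible cleanup, assuming the intermediate assignment to v was simply redundant, could look like this:
int64 v = floor(in_x);
....
x_interp.end = ceil(in_x1);
v = x_interp.end - 1;  // the duplicate assignment and the second ceil call are gone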
What's next?
void Compute(OpKernelContext* context) override {
  ....
  int64 sparse_input_start;                                     // <=
  ....
  if (sparse_input) {
    num_total_features += GetNumSparseFeatures(
      sparse_input_indices, *it, &sparse_input_start);          // <=
  }
  if (num_total_features == 0) {
    LOG(WARNING) << "num total features is zero.";
    break;
  }
  if (rand_feature < input_spec_.dense_features_size()) {
    ....
  } else {
    ....
    const int32 sparse_index = sparse_input_start +             // <=
      rand_feature - input_spec_.dense_features_size();
    ....
  }
  ....
}
PVS-Studio warning: V614 Potentially uninitialized variable 'sparse_input_start' used. sample_inputs_op.cc 351
What is suspicious about this code is that a potentially uninitialized variable, sparse_input_start, may be used to initialize the sparse_index constant. At the point of declaration, this variable is not initialized with any value, i.e., it contains junk. Later, if the condition sparse_input is true, the address of sparse_input_start is passed to the GetNumSparseFeatures function, where, presumably, the variable gets initialized. Otherwise, if the body of that conditional isn't executed, sparse_input_start remains uninitialized.
Of course, we can suppose that if sparse_input_start remains uninitialized it will not be used, but that assumption is too bold and unobvious, so it is better to give the variable a default value.
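The defensive fix is a one-liner; zero here is a hypothetical default, since the appropriate fallback value depends on the surrounding logic:
int64 sparse_input_start = 0;  // defined even when GetNumSparseFeatures() is not called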

Is that all?

Well, yes and no. To be honest, I was hoping to find more defects and write an article in the style of the Qt, Mono, and Unreal Engine 4 articles, but it didn't work out. The project authors did a great job; not that many errors were found. I was also hoping the project would be bigger, but only 700 files were checked in the chosen configuration, including the auto-generated ones.
Besides that, a lot of things were left outside the scope of this article, for example:
  • we reviewed only the warnings of the GA group;
  • we did not review the warnings of the third (Low) certainty level;
  • the analyzer issued several dozen V730 warnings, but it's hard to judge their criticality, so it's up to the developers to decide;
  • and many more.
Nevertheless, quite a number of interesting fragments were found, and we have reviewed them in this article.

Summing up

TensorFlow turned out to be quite an interesting and high-quality project in terms of code, but, as we saw, not without flaws. At the same time, PVS-Studio proved once more that it can find errors even in the code of well-known developers.
In conclusion, I cannot but compliment the developers of TensorFlow on the high-quality code and wish them the best of luck in the future.

Thanks for your attention to everybody who made it to the end of the article, and don't forget to use PVS-Studio!