Author: Haseeb Ahmad Basil
Categories: PHP Tutorials, PHP community
Those benefits include testing important code, following code styles that make the code readable to others, avoiding needless repetition, and organizing the code so that the parts used for different purposes are easy to find.
Read this article to learn more about these good practices and use PHP tools to help you audit your PHP code so you know what you need to do to improve your PHP code quality.
In this article you will learn about:
1. Introduction
2. How More Code Analysis Tools Can Help
3. PHPUnit
4. PHPLOC
5. PHP_CodeSniffer
6. PHP Copy and Paste Detector
7. PHP Mess Detector
8. Recommendations for Usage
9. Conclusion
1. Introduction
It has long been the goal of many people to make programming a thing of the past and allow non-developers to tell the computer what they want. At this point, the computer would create all the necessary code to solve their problems.
This has not yet materialized. This is fortunate because it means we can keep our coding jobs for the foreseeable future. Still, it is also unfortunate because software capable of performing such tasks could bring about many advances we have not considered.
However, I do not believe software like that is, in fact, a real possibility. For it to work for anything non-trivial, humans would still need to provide requirements to the computer in a clear, unambiguous manner.
The requirements must be syntactically correct so the computer can understand and interpret them correctly. We all know how difficult it is to get specific requirements for building software.
As humans, we are pretty good at parsing requirements, but understanding them is tough. The requirements we receive are often open to interpretation and assumptions.
We usually do not even realize we have made assumptions until a bug or defect is opened, or a new requirement clarifies or explains what the client wanted.
Often, clients do not know what they want until they see the code running, and then decide that what they see does not match what they wanted, even if it does precisely what they asked.
2. How More Code Analysis Tools Can Help
So, for now, and for the foreseeable future, software is pretty bad at writing software. Even so, that does not mean we cannot use it to gain insight into our code.
As I write this article, I see a running count of words at the bottom of the screen. Right now, it reads 370 words. While I could go through what I have written and count the words myself, the software I use can do it significantly faster than I could ever hope to.
Software can also find syntax errors more quickly than I can. But we can do even more than syntax checking and word counts with software.
We can use software to gain insight into our code base: to find patterns, to help determine ways to improve our code and our tests, to identify areas that need better testing or refactoring, and more.
This month, we will look at some of these tools and see what insights they can provide, and we will learn how to integrate them so that we are kept informed and keep working on improving our code.
3. PHPUnit
I will briefly discuss PHPUnit here. It is a testing framework that helps with ensuring that our code is doing what we expect and not doing what we do not expect.
Of course, in order to use this tool, we have a lot of work to do. We must write tests, whether they are unit, integration, functional, or system tests. For now, I will state that PHPUnit is a tool that you should be using and leave it to future articles to elaborate on that.
4. PHPLOC
PHPLOC (PHP Lines of Code) is a tool that does not require any work on your part other than running it.
It will traverse your code base and give you insight into the makeup of your code. It does a lot more than count lines of code.
Since it is written in PHP and knows about PHP, it will also tell you how many classes of which types you have, how comments compare to lines of code, and more.
Look at a sample output from PHPLOC from the project site:
    $ php phploc.phar src
    phploc 7.0.0 by Sebastian Bergmann.

    Directories                                          3
    Files                                               10

    Size
      Lines of Code (LOC)                             1882
      Comment Lines of Code (CLOC)                     255 (13.55%)
      Non-Comment Lines of Code (NCLOC)               1627 (86.45%)
      Logical Lines of Code (LLOC)                     377 (20.03%)
        Classes                                        351 (93.10%)
          Average Class Length                          35
            Minimum Class Length                         0
            Maximum Class Length                       172
          Average Method Length                          2
            Minimum Method Length                        1
            Maximum Method Length                      117
        Functions                                        0 (0.00%)
          Average Function Length                        0
        Not in classes or functions                     26 (6.90%)

    Cyclomatic Complexity
      Average Complexity per LLOC                     0.49
      Average Complexity per Class                   19.60
        Minimum Class Complexity                      1.00
        Maximum Class Complexity                    139.00
      Average Complexity per Method                   2.43
        Minimum Method Complexity                     1.00
        Maximum Method Complexity                    96.00

    Dependencies
      Global Accesses                                    0
        Global Constants                                 0 (0.00%)
        Global Variables                                 0 (0.00%)
        Super-Global Variables                           0 (0.00%)
      Attribute Accesses                                85
        Non-Static                                      85 (100.00%)
        Static                                           0 (0.00%)
      Method Calls                                     280
        Non-Static                                     276 (98.57%)
        Static                                           4 (1.43%)

    Structure
      Namespaces                                         3
      Interfaces                                         1
      Traits                                             0
      Classes                                            9
        Abstract Classes                                 0 (0.00%)
        Concrete Classes                                 9 (100.00%)
      Methods                                          130
        Scope
          Non-Static Methods                           130 (100.00%)
          Static Methods                                 0 (0.00%)
        Visibility
          Public Methods                               103 (79.23%)
          Non-Public Methods                            27 (20.77%)
      Functions                                          0
        Named Functions                                  0 (0.00%)
        Anonymous Functions                              0 (0.00%)
      Constants                                          0
        Global Constants                                 0 (0.00%)
        Class Constants                                  0 (0.00%)
As you can see, about 13.5% of the overall codebase is comments. The average length of a class is 35 lines of code, but the maximum length is 172.
It may be a good idea to look at that largest class and consider refactoring it. The average method length is just 2 lines of code, but the maximum length is 117. Again, this potentially points to a place that might warrant further investigation and simplification.
The next section is Cyclomatic Complexity. Cyclomatic complexity measures how many linearly independent paths exist through the code.
For example, a method with no loops and no branches has a cyclomatic complexity of 1. There are no branches, so there is only one way through that code.
Each branch point in the code, including loops and case statements, increases the cyclomatic complexity by one. For example, a method with two if statements would score 3 for cyclomatic complexity: one for the entrance to the method and one more for each if statement.
Methods and functions with high cyclomatic complexity are harder to understand and test. They are spots where bugs find places to hide.
This means that I should probably look at my code and find ways to reduce the number of conditionals and loops in my class methods, since the report shows an average complexity per class of 19.6 and a maximum method complexity of 96. We will talk about other tools that also use this metric.
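To make the counting rule concrete, here is a minimal sketch that estimates cyclomatic complexity by counting branch keywords with PHP's built-in tokenizer. The function name estimateComplexity() and the choice of tokens are my own assumptions for illustration; this is not how PHPLOC itself computes the metric, and it ignores things like ternaries and boolean operators.

```php
<?php
declare(strict_types=1);

/**
 * Rough cyclomatic complexity estimate for a snippet of PHP source:
 * 1 for the entry point plus 1 for each branching keyword found.
 * Hypothetical helper for illustration only.
 */
function estimateComplexity(string $source): int
{
    // Token types treated as decision points in this sketch.
    $branchTokens = [T_IF, T_ELSEIF, T_FOR, T_FOREACH, T_WHILE, T_CASE];

    $complexity = 1; // there is always at least one path
    foreach (token_get_all($source) as $token) {
        // Tokens are either [id, text, line] arrays or single-character strings.
        if (is_array($token) && in_array($token[0], $branchTokens, true)) {
            $complexity++;
        }
    }

    return $complexity;
}

// No branches, no loops: one way through the code.
$straightLine = '<?php function total(int $a, int $b): int { return $a + $b; }';

// Two if statements: 1 for the entry plus 1 for each if.
$twoIfs = <<<'PHP'
<?php
function describe(int $n): string
{
    $label = 'number';
    if ($n < 0) {
        $label = 'negative ' . $label;
    }
    if ($n % 2 === 0) {
        $label = 'even ' . $label;
    }
    return $label;
}
PHP;

echo estimateComplexity($straightLine), "\n"; // 1
echo estimateComplexity($twoIfs), "\n";       // 3
```

The two results match the examples in the text: a straight-line method scores 1, and a method with two if statements scores 3.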
Next up is the section for dependencies. Currently, there are no global constants or global variables, and there are no places where my code is accessing superglobals.
If you did use superglobals, you could refactor the code to inject the values instead of having those methods reach out into the superglobals.
Removing these accesses will make the code easier to test and eliminate a potential source of bugs. Also in this section are the numbers and types of attribute accesses and method calls.
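As a sketch of what that refactoring might look like, the example below shows a class that reads $_GET directly and a version that receives the query data through its constructor. The class names PageReaderBefore and PageReader and the page parameter are hypothetical, and the constructor promotion syntax assumes PHP 8.0 or later.

```php
<?php
declare(strict_types=1);

// Before: the method reaches out into a superglobal, so a test has to
// manipulate global state ($_GET) to exercise it.
final class PageReaderBefore
{
    public function currentPage(): int
    {
        return max(1, (int) ($_GET['page'] ?? 1));
    }
}

// After: the values are injected, so the class no longer touches $_GET.
final class PageReader
{
    /** @param array<string, mixed> $query request data supplied by the caller */
    public function __construct(private array $query)
    {
    }

    public function currentPage(): int
    {
        return max(1, (int) ($this->query['page'] ?? 1));
    }
}

// Production code would pass $_GET once, at the edge of the application.
// A test can pass any array it likes.
$reader = new PageReader(['page' => '3']);
echo $reader->currentPage(), "\n"; // 3
```

The behaviour is the same, but the second version can be tested with plain arrays and gives PHPLOC no superglobal access to count inside the class.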
The structure section indicates how many namespaces, interfaces, and traits were found. It splits classes between concrete and abstract. Methods are broken down by scope (static and non-static) and by visibility (public and non-public). Additionally, functions and constants are shown.
PHPLOC provides interesting statistics relatively quickly on your codebase, but it is up to you to interpret and act on them.
PHPLOC: http://phpa.me/github-phploc
5. PHP_CodeSniffer
If you are working on a code base with a team, being able to read and understand each class and section of code easily is essential.
A few years ago, convincing team members to follow a coding standard was a bigger challenge than it is today.
While there are still developers who rail against coding standards, most have embraced the reality that even if you do not entirely agree with all parts of any given coding standard, having a coding standard is better than not having one.
Many, if not most, Open Source PHP projects define and follow a coding standard. Using one defined by the PHP-FIG (Framework Interoperability Group) is common now.
PSR-1 and PSR-2 are the two. Several other coding standards are included with PHP_CodeSniffer, such as the PEAR and Zend standards.
PHP_CodeSniffer allows you to use any of the defined standards as they are, or to use them while adding or removing certain sniffs.
You can also define your own standard using sniffs from any of them. The software will run across your codebase and report errors or warnings for code that does not conform to your chosen or defined coding standard.
This allows impartial software to let developers know when they have submitted code that is not compliant with the coding standards, even going so far as to reject pull requests or commits that are not up to standards (if you so choose).
Additionally, several tools will automatically reformat code to follow your chosen coding standard. I will not be covering these here, however.
Choosing a defined coding standard (recommended) or creating your own (not recommended) means that no matter what section of code a developer is working on, the resulting code should look the same. That lowers the barrier to making changes by allowing developers to focus on the code, not on how it is formatted.
6. PHP Copy and Paste Detector
The PHP Copy and Paste Detector (PHPCPD) finds copied and pasted code in your software, and it is smarter about it than you might think.
First of all, to count, duplicated sections of code need to be longer than a certain threshold, so a bunch of small methods that are duplicated will not be reported as duplicates.
Secondly, even if the code has changed variable names, comments added, or reformatted, PHPCPD can find and report those duplicate sections.
It uses PHP to parse tokens to determine blocks of duplicated code, so formatting and variable names do not necessarily come into play.
The report for PHPCPD will indicate the files and line ranges for found duplicated code and a summary showing how much of the codebase is made up of copied and pasted code.
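The idea of comparing tokens rather than raw text can be sketched in a few lines. The helper tokenSignature() below is hypothetical and far simpler than what PHPCPD does: it drops whitespace and comments and replaces every variable name with a placeholder, so that two snippets differing only in formatting, comments, and variable names produce the same signature.

```php
<?php
declare(strict_types=1);

/**
 * Reduce PHP source to a normalized token signature. Whitespace, comments,
 * and the open tag are dropped, and every variable becomes "$v".
 * Hypothetical helper for illustration; not PHPCPD's actual algorithm.
 */
function tokenSignature(string $source): string
{
    $ignored = [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT, T_OPEN_TAG];
    $parts = [];

    foreach (token_get_all($source) as $token) {
        if (!is_array($token)) {
            $parts[] = $token; // single-character tokens such as ; ( ) { }
            continue;
        }

        [$id, $text] = $token;
        if (in_array($id, $ignored, true)) {
            continue;
        }

        $parts[] = $id === T_VARIABLE ? '$v' : $text;
    }

    return implode(' ', $parts);
}

$a = '<?php $total = 0; foreach ($items as $item) { $total += $item; }';

// Same logic, but reformatted, commented, and with renamed variables.
$b = <<<'PHP'
<?php
// sum up the prices
$sum = 0;
foreach ($prices as $price) {
    $sum += $price;
}
PHP;

// Different logic: multiplies instead of adding.
$c = '<?php $total = 1; foreach ($items as $item) { $total *= $item; }';

var_dump(tokenSignature($a) === tokenSignature($b)); // bool(true)
var_dump(tokenSignature($a) === tokenSignature($c)); // bool(false)
```

A real detector would then look for long runs of matching tokens across files, which is where the minimum-length threshold mentioned above comes in.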
While it might be tempting to keep this tool reporting 0% all the time, that may not be a good idea in all cases. As in many situations, the answer to whether you should reduce copied and pasted code is "it depends".
Copied and pasted code can often be removed by refactoring it into a common method or class and calling that from the places that need it.
However, it is essential to ensure that the duplicated code does the same thing for the same reason. Living with duplicated code is often better than introducing an incorrect abstraction.
7. PHP Mess Detector
The PHP Mess Detector is another code analysis tool that can tell you about potential problems and issues.
This includes reporting on dead code (unreachable code), overcomplicated expressions (cyclomatic complexity), and other possible bugs.
The PHPMD tool has several different rulesets that can be turned on or off, each containing several other rules that it uses to detect these (potential) problem areas.
The first set of rules is about writing clean code. It will detect and report on functions and methods that use Boolean arguments. This is because a Boolean argument often represents a violation of the single responsibility principle (SRP), the "S" in "SOLID", which says that each of our methods should do one thing and do it well.
A Boolean argument may indicate two different responsibilities provided by one method. To fix this, the logic around the Boolean flag could be extracted into another method or class.
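Here is a small sketch of that extraction. The function names and the price-formatting scenario are made up for illustration; the point is only that the flag disappears once each behaviour gets its own function.

```php
<?php
declare(strict_types=1);

// Flagged style: the Boolean argument selects between two behaviours,
// and a call like formatPrice(1234.5, true) does not explain itself.
function formatPrice(float $amount, bool $withCurrency): string
{
    $formatted = number_format($amount, 2, '.', ',');
    if ($withCurrency) {
        return '$' . $formatted;
    }

    return $formatted;
}

// One possible fix: one function per responsibility, no flag.
function formatAmount(float $amount): string
{
    return number_format($amount, 2, '.', ',');
}

function formatAmountWithCurrency(float $amount): string
{
    return '$' . formatAmount($amount);
}

echo formatAmount(1234.5), "\n";             // 1,234.50
echo formatAmountWithCurrency(1234.5), "\n"; // $1,234.50
```

The call sites now say what they want by name, and each function can be tested without enumerating flag combinations.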
The "clean code" ruleset also looks for static access to other resources and uses of "else". Static access can be a problem when testing because it makes it challenging to introduce a test double.
According to the PHPMD docs, the "else" keyword is never needed; it recommends using a ternary when doing simple assignments.
I do not always follow all of those recommendations, but having the PHPMD report list these issues allows me to revisit code and potentially develop a better or more straightforward solution.
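For reference, this is roughly what removing an "else" can look like. The functions and the shipping and stock scenarios are hypothetical; the first pair shows a simple assignment turned into a ternary, and the last function shows an early return used in place of an else branch.

```php
<?php
declare(strict_types=1);

// With else: the kind of code the "else" rule reports.
function shippingCostWithElse(float $orderTotal): float
{
    if ($orderTotal >= 50.0) {
        $cost = 0.0;
    } else {
        $cost = 4.99;
    }

    return $cost;
}

// The same simple assignment written as a ternary.
function shippingCost(float $orderTotal): float
{
    return $orderTotal >= 50.0 ? 0.0 : 4.99;
}

// For longer branches, an early return avoids the else block.
function describeStock(int $quantity): string
{
    if ($quantity <= 0) {
        return 'out of stock';
    }

    return 'in stock';
}

var_dump(shippingCost(60.0) === shippingCostWithElse(60.0)); // bool(true)
var_dump(shippingCost(10.0) === shippingCostWithElse(10.0)); // bool(true)
```

Whether the rewritten version reads better is a judgment call, which is why I treat this rule as a prompt to revisit the code, not as a hard requirement.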
Next up are the unused code rules. These detect unused private fields, unused local variables, unused private methods, and unused formal parameters.
I have found that the first three of these are always fixable and often indicate either a change in design or a refactor that left a little mess in the code base. Removing them is nearly always the right thing to do and should never result in broken code.
The only rule I have had any issue with is unused formal parameters. If the parameter is part of an interface you are implementing, then removing it requires a change to the interface, which would need the modification of other classes that implement the interface. This may not be possible.
If the method in question does not implement a method from an interface, then removing the unused parameter would require finding all callers of the method and changing them.
Tools like PhpStorm make this easy, but if the unused parameter is not the last one in the list, it is imperative to update all callers to remove the argument. If this is not done, it will introduce an error.
The third set of PHPMD rules concerns naming. These detect fields, parameters, and local variables that are too long or too short.
For instance, variable names like $id will be flagged as too short. Longer variable names, over 20 characters, will also be flagged.
Additionally, the naming rules will find constants that are not defined in upper case. They will also detect and flag PHP 4 style constructors. These are not the __construct method but methods whose name matches the name of the class. PHP 4 style constructors were deprecated in PHP 7.
Finally, it will flag "getters" for Boolean fields that use the word "get" instead of "is" or "has".
For example, a getter named "getValid" on a Boolean field will be flagged with the recommendation that the getter be called isValid() or hasValid().
Additionally, PHPMD includes a set of rules around design. It will flag code that uses exit(), eval(), or goto. It will flag classes with more than 15 child classes and classes that descend from more than six parent classes. Both of these indicate that there is likely an unbalanced inheritance hierarchy.
Finally, it will look at the coupling between objects. This means it will count up dependencies, method parameters that are objects, return types, and thrown exceptions.
If there are more than 13 total, the class will be flagged. This detection not only works on formally type-hinted parameters, but it also examines the doc block comments for @returns, @throws, @param, and others. Each of these coupled classes indicates some other class that the class in question must know something about or that consumers of this class need to be aware of. Keeping this number low means it is easier to work with the code.
The next-to-last set of rules is the so-called Controversial Rules. They are primarily about coding style, and a PHP_CodeSniffer ruleset would cover many of them: they ensure that class names, property names, method names, parameters, and variables are defined in camel case.
The final rule in this set is about accessing superglobal variables. Where PHPLOC will report a count of how many times superglobal variables are accessed, PHPMD will give you a report of where all those places in the code are.
The final ruleset for PHPMD contains the "Code Size Rules". These are the rules that, for me at least, lead to the best improvements in the code, but they also typically take the most effort to resolve.
The first is Cyclomatic Complexity. As mentioned in the PHPLOC section, this is the number of linearly independent paths through the code. Methods that score more than ten will be flagged, and PHPMD will count each if, else if, for, while, and case, along with 1 for entering the method.
The following rule is for NPathComplexity. This is similar to cyclomatic complexity, but it counts the number of unique paths through the code, not just the linearly independent paths.
For this metric, each added conditional or loop can have a multiplier effect on the number of paths through the code. A score of 200 or higher will result in PHPMD flagging the code.
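The multiplier effect is easiest to see with a few independent conditionals in a row. In the sketch below, the hypothetical function tracePath() has three if statements, so its cyclomatic complexity is 4 (1 + 3), but each if doubles the number of ways through the method, giving 2 * 2 * 2 = 8 unique paths. The loop at the end simply tries every input combination and counts the distinct traces.

```php
<?php
declare(strict_types=1);

// Records which branches were taken so that each unique path through
// the function yields a different trace string.
function tracePath(bool $a, bool $b, bool $c): string
{
    $trace = 'start';
    if ($a) {
        $trace .= '>A';
    }
    if ($b) {
        $trace .= '>B';
    }
    if ($c) {
        $trace .= '>C';
    }

    return $trace;
}

// Try all input combinations and collect the distinct traces.
$paths = [];
foreach ([false, true] as $a) {
    foreach ([false, true] as $b) {
        foreach ([false, true] as $c) {
            $paths[tracePath($a, $b, $c)] = true;
        }
    }
}

echo count($paths), "\n"; // 8
```

Adding a fourth independent if would raise the cyclomatic complexity by one, to 5, but double the path count to 16, which is why NPath scores grow so quickly.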
The excessive method length rule looks at the lines of code in a method as an indicator that the method may be trying to do too much, and it recommends refactoring into other helper methods and classes or removing copied and pasted code.
Similarly, the excessive class length rule looks at the lines of code in the entire class, again indicating that the class might be doing too much.
The excessive parameter count rule flags methods that have more than ten parameters. It may indicate that a new object should be created to group like parameters.
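A minimal sketch of grouping like parameters into a new object follows. The Address class and the label-formatting functions are hypothetical, and the constructor promotion syntax assumes PHP 8.0 or later.

```php
<?php
declare(strict_types=1);

// Before: related values travel as separate parameters, and every new
// address detail makes the signature longer.
function formatLabelFromParts(
    string $name,
    string $street,
    string $city,
    string $postalCode,
    string $country
): string {
    return sprintf("%s\n%s\n%s %s\n%s", $name, $street, $postalCode, $city, $country);
}

// After: the address fields are grouped into one small value object.
final class Address
{
    public function __construct(
        public string $street,
        public string $city,
        public string $postalCode,
        public string $country
    ) {
    }
}

function formatLabel(string $name, Address $address): string
{
    return sprintf(
        "%s\n%s\n%s %s\n%s",
        $name,
        $address->street,
        $address->postalCode,
        $address->city,
        $address->country
    );
}

$address = new Address('1 Main St', 'Springfield', '12345', 'USA');
echo formatLabel('Jane Doe', $address), "\n";
```

Besides shortening the signature, the new object gives validation and formatting logic for those values a natural home.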
The excessive public methods rule flags classes that define more than 45 public methods. It indicates that much effort may be needed to test the class thoroughly.
The recommendation is to break the class into smaller classes that each do less. Related rules will also flag classes containing too many methods (public or otherwise) or properties.
More than 15 properties or 25 methods may indicate a class that could be reduced in complexity. The public and total method rules will ignore getters and setters by default.
Finally, the excessive class complexity rule totals up the complexity metrics of the various methods in a class. It indicates the relative amount of time and effort needed to maintain or modify the class.
Configuring PHPMD to use some or all of the rules across any of the rulesets is simple. If a configured rule does not make sense for your code base or you disagree with it, you can disable it.
PHPMD provides a great way to detect potential problem areas in your code automatically.
8. Recommendations for Usage
While running all of these tools on your development machine is possible, chances are that doing so will quickly become tedious, and you will decide to stop running them.
Instead, I recommend setting up a "build" in Jenkins or another continuous integration server (think TravisCI, Bamboo, or others).
These tools can be configured to run whenever new source code is checked in, whether merged into the mainline or when a pull request is submitted.
The CI tools can keep track of the various statistics and reports between one run and the next and produce charts and graphs that allow for easy viewing of trends and changes.
The build job can be configured so that a rise in the statistics that indicate problems causes the build to fail, signaling that the code should not be merged or accepted until the newly introduced issues are resolved.
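How you wire this up depends on your CI server and on how each tool's report is stored, so the following is only a sketch of the comparison step. The function findRegressions() and the metric names are hypothetical; it assumes you have already extracted numeric values from the previous and current reports, and that a higher number is worse for every metric.

```php
<?php
declare(strict_types=1);

/**
 * Compare metric values from the previous build with the current one and
 * return the names of the metrics that got worse (higher is worse here).
 * Hypothetical helper; the metric names are made up for this example.
 *
 * @param array<string, int|float> $previous
 * @param array<string, int|float> $current
 * @return list<string>
 */
function findRegressions(array $previous, array $current): array
{
    $regressions = [];
    foreach ($current as $metric => $value) {
        // Metrics with no earlier value are skipped, not treated as failures.
        if (isset($previous[$metric]) && $value > $previous[$metric]) {
            $regressions[] = $metric;
        }
    }

    return $regressions;
}

$previous = ['phpmd_violations' => 12, 'duplicated_lines_percent' => 1.5, 'phpcs_errors' => 0];
$current  = ['phpmd_violations' => 14, 'duplicated_lines_percent' => 1.5, 'phpcs_errors' => 0];

$regressions = findRegressions($previous, $current);
if ($regressions === []) {
    echo "OK\n";
} else {
    echo 'Worse: ', implode(', ', $regressions), "\n"; // Worse: phpmd_violations
}

// A real build script would finish with a non-zero exit code when
// $regressions is not empty, which is what makes the CI build fail.
```

Failing only when a number rises, instead of demanding zero violations from day one, lets an existing code base adopt these tools without blocking every build.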
The CI server can report back build status via email, Slack, IRC, and other ways to inform you or other developers of the results of running these tools.
It is then up to you to decide when, and how much effort, you want to spend reducing the errors and warnings reported by these code analysis tools.
9. Conclusion
I would recommend looking at these tools, trying them out, and looking at the other tools mentioned on the PHP QA Tools page.
In addition to providing installation and configuration for the tools I have discussed, this page includes information about getting these tools (and more) to work in Jenkins.
If you are already running a CI server and do not have these tools, consider installing them and integrating the reports into your build. Use the reports to increase the quality and maintainability of your code.
Finally, one tool we did not get into much is PHPUnit. With PHPUnit, it is essential, but not always easy, to write practical tests that help ensure the code is doing the right thing.