In yesterday’s column, I expressed my deep concerns about elements of Consumer Reports’ testing process. It was based on an article from AppleInsider. I eagerly awaited part two, hoping that there would be at least some commentary about the clear shortcomings in the way the magazine evaluates tech gear.
I also mentioned two apparent editorial glitches I noticed, in which product descriptions and recommendations contained incorrect information. These mistakes were obvious with just casual reading, not careful review. Clearly CR needs to beef up its editorial review process. A publication with its pretensions needs to demonstrate a higher level of accuracy.
Unfortunately, AppleInsider clearly didn’t catch the poor methodology used to evaluate speaker systems. As you may recall, CR uses a small room and crowds the tested units together, with no consideration of placement or of the impact of vibrations and reflections. The speakers should be separated, perhaps by a few feet, and the tests should be blind, so that listeners aren’t prejudiced by the look of, or expectations for, a particular model.
CR’s editors claim not to be influenced by appearance, but they are not immune to the effects of human psychology, and the factors that might cause them to give one product a better review than another. Consider, for example, a companion requirement of any proper blind test: level matching. All things being equal, a system that’s a tiny bit louder (even a fraction of a dB) might seem to sound better.
I don’t need to explain why.
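The size of that loudness bias is easy to quantify. A minimal sketch of the dB-to-amplitude arithmetic (the 0.5 dB figure is my own illustrative number, not anything CR has published):

```python
def db_to_ratio(db):
    """Convert a level difference in decibels to a linear amplitude ratio."""
    return 10 ** (db / 20)

# Even a 0.5 dB mismatch is a ~6% amplitude difference -- too small to
# register consciously as "louder," but enough to bias a preference test.
print(round(db_to_ratio(0.5), 3))  # → 1.059
```

That is exactly why serious comparisons match levels to within a small fraction of a dB before any listening begins.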
Also, I was shocked that CR’s speaker test panel usually consists of just two people with some sort of unspecified training so they “know” what loudspeakers should sound like. A third person is brought in only if there’s a tie. Indeed, calling this a test panel, rather than a couple of testers, or a test duo or trio, is downright misleading.
Besides, such a small sample doesn’t account for the subjective nature of evaluating loudspeakers. People hear things differently; they have different expectations and preferences. All things being equal, even with blind tests and level matching, a sample of two or three is still not large enough to establish a consensus. A listening panel with enough participants to reveal a trend might be, but the lack of scientific controls from a magazine that touts accuracy and reliability is very troubling.
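A quick calculation shows why panel size matters. A sketch, assuming each listener independently prefers the genuinely better speaker with some modest probability (the 0.6 figure is my own illustrative assumption, not measured data):

```python
from math import comb

def majority_prob(n, p):
    """Probability that a strict majority of n independent listeners
    prefer speaker A, when each prefers it with probability p."""
    k_needed = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_needed, n + 1))

# With only a mild true preference (p = 0.6), a trio gets it wrong
# about a third of the time; a larger panel converges on the trend.
for n in (3, 25):
    print(n, round(majority_prob(n, 0.6), 2))
```

With three listeners the majority verdict matches the true preference only about 65% of the time; with 25 listeners it climbs toward 85%.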
I realize AppleInsider’s reporters, though clearly concerned about the notebook tests, were probably untutored about the way the loudspeakers were evaluated, and the serious flaws that make the results essentially useless.
Sure, it’s very possible that the smart speakers from Google and Sonos are, in the end, superior to the HomePod. Maybe a proper test with a large enough listener panel and proper setup would reveal such a result. So far as I’m concerned, however, CR’s test process is essentially useless on any system other than those with extreme audio defects, such as excessive bass or treble.
I also wonder just how large and well equipped the other testing departments are. Remember that magazine editorial departments are usually quite small. The consumer publications I wrote for had a handful of people on staff, and mostly relied on freelancers. Having a full-time staff is expensive. Remember that CR carries no ads. Income is mostly from magazine sales, plus the sale of extra publications and services, such as a car pricing service, and reader donations. In addition, CR requires a multimillion-dollar budget to buy thousands of products at retail every year.
Sure, cars will be sold off after use, but even then there is a huge loss due to depreciation. Do they sell their used tech gear and appliances via eBay? Or donate to Goodwill?
Past the pathetic loudspeaker test process, we have their lame notebook battery tests. The excuse for why they turn off browser caching doesn’t wash. To provide an accurate picture of what sort of battery life consumers should expect under normal use, they should perform tests that don’t require activating obscure menus and/or features that only web developers might use.
After all, people who buy personal computers will very likely wonder why they aren’t getting the battery life CR achieved. They can’t! At the end of the day, Apple’s tests of MacBook and MacBook Pro battery life, as explained in the fine print at its site, are more representative of what you might achieve. No, not for everyone, but certainly if you follow the steps listed, which do represent reasonable, if not complete, use cases.
It’s unfortunate that CR has no competition. It’s the only consumer testing magazine in the U.S. that carries no ads, is run by a non-profit corporation, and buys all of the products it tests anonymously via regular retail channels. Its setup conveys the veneer of being incorruptible, and thus more accurate than the tests from other publications.
It does seem, from the AppleInsider story, that the magazine is sincere about its work, though perhaps somewhat full of itself. If it is truly honest about perfecting its testing processes, however, perhaps it should reach out to professionals in the industries that it covers and refine its methodology. How CR evaluates notebooks and speaker systems raises plenty of cause for concern.
AppleInsider got the motherlode. After several years of back-and-forth debates about its testing procedures, Consumer Reports magazine invited the online publication to tour its facilities in New York. On the surface, you’d think the editorial staff would be putting on its best face to get favorable coverage.
And maybe they will. AppleInsider has only published the first part of the story, and there are apt to be far more revelations about CR’s test facilities and the potential shortcomings in the next part.
Now we all know about the concerns: CR finds problems, or potential problems, with Apple gear. Sometimes the story never changes, sometimes it does. But the entire test process may be a matter of concern.
Let’s take the recent review that pits Apple’s HomePod against a high-end Google Home Max, which sells for $400, and the Sonos One. In this comparison, CR concluded: “Overall the sound of the HomePod was a bit muddy compared with what the Sonos One and Google Home Max delivered.”
All right, CR is entitled to its preferences and its test procedures, but let's take a brief look at what AppleInsider reveals about them.
So we all know CR claims to have a test panel that listens to speakers set up in a special room that, from the front at least, comes across as a crowded audio dealer with loads of gear stacked up one against another. Is that the ideal setup for a speaker system that’s designed to adapt itself to a listening room?
Well, it appears that the vaunted CR tests are little better than what an ordinary subjective high-end audio magazine does, despite the pretensions. The listening room, for example, is small with a couch, and no indication of any special setup in terms of carpeting or wall treatment. Or is it meant to represent a typical listening room? Unfortunately, the article isn’t specific enough about such matters.
What is clear is that the speakers, the ones being tested and those used for reference, are placed in the open adjacent to one another. There’s no attempt to isolate the speakers to prevent unwanted reflections or vibrations.
Worse, no attempt is made to perform a blind test, so that a speaker’s brand name, appearance, or other factors don’t influence a listener’s subjective opinion. For example, a large speaker may seem to sound better than a small one, but not necessarily because of its sonic character. The possibility of prejudice, even unconscious, against one speaker or another, is not considered.
But what about the listening panel? Are there dozens of people taking turns to give the speakers thorough tests? Not quite. The setup involves a chief speaker tester, one Elias Arias, and one other tester. In other words, the panel consists of just two people, a testing duo, supposedly specially trained as skilled listeners in an unspecified manner, with a third brought in in the event of a tie. But no amount of training can compensate for the lack of blind testing.
Wouldn’t it be illuminating if the winning speaker still won if you couldn’t identify it? More likely, the results might be very different. But CR often appears to live in a bubble.
Speakers are measured in a soundproof room (an anechoic chamber). The results reveal a speaker’s raw potential, but they don’t provide data on how it behaves in a normal listening room, where reflections will impact the sound that you hear. Experienced audio testers may also perform the same measurements in the actual listening location, so you can see how a real-world set of numbers compares to what the listener actually hears.
A comparison with the numbers from the anechoic chamber might also provide an indication of how the listening area impacts those measurements.
Now none of this means that the HomePod would have seemed less “muddy” if the tests were done blind, or if the systems were isolated from one another to avoid sympathetic vibrations and other side effects. It might have sounded worse, the same, or the results might have been reversed. I also wonder if CR ever bothered to consult with actual loudspeaker designers, such as my old friend Bob Carver, to determine the most accurate testing methods.
It sure seems that CR comes up with peculiar ways to evaluate products. Consider tests of notebook computers, where they run web sites from a server in the default browser with cache off to test battery life. How does that approach possibly represent how people will use these notebooks in the real world?
At least CR claims to stay in touch with manufacturers during the test process, so they can be consulted in the event of a problem. That approach succeeded when a preliminary review of the 2016 MacBook Pro revealed inconsistent battery results. It was strictly the result of that outrageous test process.
So turning off caching in Safari’s usually hidden Develop menu revealed a subtle bug that Apple fixed with a software update. Suddenly a bad review became a very positive review.
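For context, the menu in question is hidden by default. A sketch of how a tester would expose it on macOS, using the long-documented preference key (newer Safari versions also offer a checkbox under Settings > Advanced):

```shell
# Expose Safari's hidden Develop menu (macOS only).
defaults write com.apple.Safari IncludeDevelopMenu -bool true

# After relaunching Safari, Develop > Disable Caches toggles the
# cache-off state CR reportedly uses for its battery rundown test.
```

The point stands: this is a web-developer switch, not something an ordinary notebook buyer would ever touch.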
Now I am not going to turn this article into a blanket condemnation of Consumer Reports. I hope there will be more details about testing schemes in the next part, so the flaws — and the potential benefits — will be revealed.
In passing, I do hope CR’s lapses are mostly in the tech arena. But I also know that their review of my low-end VW claimed the front bucket seats had poor side bolstering. That turned out to be totally untrue.
CR’s review of the VIZIO M55-E0 “home theater display” mislabeled the names of the setup menu’s features in its recommendations for optimal picture settings. It also claimed that no printed manual was supplied with the set; this is half true. You do receive two Quick Start Guides in multiple languages. In its favor, most of the picture settings actually deliver decent results.
When reviewing smartphones, Consumer Reports magazine appears to have a blind spot toward Samsung; maybe a few blind spots. How so? Well, I’ll get to that shortly.
Now on the surface, CR ought to be the perfect review source. Unlike most other publications, online or print, it actually buys tested products from retail stores. That includes luxury cars costing over $100,000 if need be. So, in that area at least, it should be incorruptible. Compare that to regular publications that contain reviews, most of which receive free samples from the manufacturers.
Indeed, when I announced recently that Vizio sent me a 4K TV for review — with no preconditions as to how I rate the product — I got a comment from a reader suggesting that my article would somehow be tainted. But I’ve been reviewing tech gear received on that basis for over two decades, and it’s definitely not a factor. Never has been.
But even if there’s a tiny bit of suspicion on the part of some people that product reviews might be slanted if those products are sent free of charge, I am not surprised that CR gets high credibility. So there’s a story from Seoul, South Korea touting the fact that, “Samsung’s Galaxy S8 tops U.S. consumer review.”
South Korea? But isn’t CR an American magazine? Yes, so this story no doubt originated with Samsung, even though a manufacturer is theoretically prohibited from quoting a CR review. But the article mentions only the conclusion, not the contents, so even if it originated with Samsung, the company is off the hook.
According to the latest CR report on smartphones, the Samsung Galaxy S8 and the Galaxy S8 Plus earned top ratings. Number three, peculiarly, was last year’s Galaxy S7. Really. So where did the iPhone 8 and iPhone 8 Plus end up? According to CR, fourth and fifth. Number six was the Galaxy Note 8.
I decided to take a look at the factors that put the iPhones below three Samsungs, including one of last year’s models. Let’s just say it didn’t make a whole lot of sense in the scheme of things, but I’ve had these issues before with CR.
Take, for example, the Galaxy S8 versus the iPhone 8. The former is rated 81, the latter 80. So despite the implications of the article from that South Korean publication, the scores are extraordinarily close. A minor issue here, another minor issue there, and the results might have been reversed.
But what is it that makes the Samsung ever-so-slightly superior to the iPhone? Unfortunately, the two reviews aren’t altogether clear on that score. Across the 11 performance categories in which the two phones are rated, the iPhone 8 has six Excellent ratings, four Very Goods, and one Good, while the Samsung has four Excellents and seven Very Goods. So in theory the iPhone should have scored better in these categories.
From my point of view, the Apple ought to rate better. More Excellent ratings, right? But there is a Good rating for battery life, whereas the Samsung rates as Excellent. Evidently that one factor must outweigh all other considerations and award the Samsung a higher total. Curiously, the longer battery life of the iPhone 8 Plus evidently didn’t merit a rating higher than Good either.
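A toy calculation shows how a single heavily weighted category can flip such a close race. A sketch with made-up weights (CR does not publish its weighting formula, so every weight here is an illustrative assumption; only the category counts come from the reviews discussed above):

```python
# Map CR's qualitative marks to a simple scale: Excellent=5, Very Good=4, Good=3.
iphone_8  = {"excellent": 6, "very_good": 4, "good": 1}  # the lone Good is battery life
galaxy_s8 = {"excellent": 4, "very_good": 7, "good": 0}  # battery life is an Excellent

def flat_sum(counts):
    """Unweighted total across the 11 rating categories."""
    return 5 * counts["excellent"] + 4 * counts["very_good"] + 3 * counts["good"]

def battery_weighted(counts, battery_mark, extra_weight=2):
    """Same total, but battery life counts (1 + extra_weight) times.
    The extra_weight value is a made-up illustration, not CR's formula."""
    return flat_sum(counts) + extra_weight * battery_mark

print(flat_sum(iphone_8), flat_sum(galaxy_s8))  # 49 48 -- iPhone ahead on a flat sum
print(battery_weighted(iphone_8, 3), battery_weighted(galaxy_s8, 5))  # 55 58 -- flipped
```

On a flat sum, the iPhone’s extra Excellents win; give battery life a few times the weight of any other category and the Samsung pulls ahead. Without published weights, readers can’t tell which version of this arithmetic CR actually performed.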
But there’s more. It turns out that the iPhone is far more resilient to damage than the Galaxy S8. According to CR, the iPhone “survived the water dunk test and our tough 100 drops in the tumbler with just some minor scratches.”
Evidently, being a rugged mobile handset doesn’t count for very much, because the qualitative ratings don’t include that factor. So the Galaxy S8, according to CR, doesn’t fare nearly as well. The report states, “The screen is rather fragile. After 50 rotations in the tumbler, our experts rated it only fair. The display was badly broken and not working. For this phone, a protective case is a must have.”
What does that say to you? It says to me that the Galaxy S8 should have been seriously downgraded because it’s very fragile; users are forced to buy extra protection for normal use and service. Smartphones are routinely dropped or knocked against things.
To me, it’s barely acceptable. To CR, ruggedness doesn’t matter.
Nor does the reliability of a smartphone’s biometrics count, evidently. As most of you know, the Galaxy S8 and its big brother, the S8 Plus, have three biometric systems. The fingerprint sensor, located at the rear, is an awkward reach; you risk smudging the camera lenses instead. Neither the facial recognition nor the iris sensor is terribly secure: both can be defeated by digital photographs.
In short, you have a breakable smartphone with two biometric features of questionable quality being judged superior to another smartphone that’s rugged and has a reliable fingerprint sensor. But maybe it has somewhat shorter battery life than the competition. In other words, CR seems to regard battery life above other important factors, but how ratings are weighted, and why potential breakability is not considered, is just not mentioned.
But since CR buys the products it reviews, the serious flaws in its review methods apparently aren’t important. The media outlets that continue to quote the magazine’s ratings without critical comment aren’t helping to encourage CR to change its ways.
And please don’t get me started about the curious way in which it rates the battery life of notebook computers.
Gene Steinberg is a guest contributor to GCN news. His views and opinions, if expressed, are his own. Gene hosts The Tech Night Owl LIVE - broadcast on Saturday from 9:00pm - Midnight (CST), and The Paracast - broadcast on Sunday from 3:00am - 6:00am (CST). Both shows nationally syndicated through GCNlive. Gene’s Tech Night Owl Newsletter is a weekly information service of Making The Impossible, Inc. -- Copyright © 1999-2017. Click here to subscribe to Tech Night Owl Newsletter. This article was originally published at Technightowl.com -- reprinted with permission.