Software engineering (continued) as promised. I hope this helps everyone. If not, feel free to ignore it.
So, software engineering is a process of matching desired outcomes to expected outcomes and then to actual outcomes. As I mentioned earlier, in order to do that software engineers live in a world of logic, trust and problem determination. The logic is that they break the desired outcome down into manageable chunks and then code the expected results (we call them acceptance criteria, which are basically expected behaviours) as a series of highly logical instructions with very predictable behaviour, e.g. assignments of variables, if/then statements based on conditions and so on.
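To make that concrete, here's a deliberately trivial sketch (my own made-up example, nothing from a real system) of an acceptance criterion coded as an if statement plus a little test that checks the actual outcome against the expected one:

```python
# Toy example: an expected behaviour ("acceptance criterion") coded as
# simple, predictable logic, plus a check of actual vs expected outcome.

def volume_label(level):
    # Acceptance criterion: levels of 80 or above are reported as "loud",
    # anything else as "normal".
    if level >= 80:
        return "loud"
    return "normal"

# The tests encode the expected outcomes; running them compares expected to actual.
assert volume_label(85) == "loud"
assert volume_label(40) == "normal"
print("acceptance criteria met")
```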
To do this, software engineers have to trust that the other components in the system do their job. So for example, if I code something in Python, I have to trust that the Python interpreter and libraries behave exactly as they are expected to behave. That in turn requires that the C libraries they use behave predictably, and that the operating system's file management code works too. Which in turn requires us to believe that the C compiler correctly creates machine code for a particular processor. And so on and so on. Crucially, this spills over into the code that runs the infrastructure, and then into the physical operation of hardware devices and networks.
Software engineers test that the code they write meets the acceptance criteria, and when it doesn't they start working out what's gone wrong based on the evidence they can see and, crucially, the balance of probabilities. Their testing process can miss things, especially outside the domain of their experience. Sometimes it takes user input to find things; mostly they find defects with their own tests. Either way, they act on probabilities.
So if the code doesn't do what they expect, they will investigate their own code first, and 99% of the time fix it there. Then they will begin to suspect other components based on the balance of probabilities: how likely is it that component X is not working as expected? They will try to reproduce the potential defect or unexpected behaviour in the suspect component and take it to the team that develops or maintains that component. And so on…
I explain all this because it means that some things in a system have a reasonably high probability of being the problem - in my case, especially my own code!
Other components have a rapidly diminishing probability of being the problem, based on how frequently they are used and sometimes on how long and how well the team behind them has built and tested them. That probability approaches but never reaches zero for things that are so well tested and used by thousands of systems. It's never really zero, because defects due to poor processes or design decisions can always emerge. The recent CrowdStrike outage proves that: Windows allows third-party software to run in the kernel under certain conditions, and hey presto, that went wrong with an inadequately tested update.
I say all this because you mention a result you heard and attribute it to 'identical accurate rips being the same but sounding different'. You were probably asked to check that the files were in actual fact identical, as the first step in problem determination. If you didn't do this, the problem determination process ends there, and a software engineer will walk away and find something better to do.
This is because a software engineer implicitly trusts the lower-level code, such that when two files are identical at the bit level they will behave in an identical way in a closed system of logic. If they didn't believe this, their world would come crashing down, as in fact would every computer system in the world. It would be Armageddon mixed with Apocalypse Now and we would all be bashing stones together to light fires!
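If anyone fancies doing that first check themselves, something as simple as this settles whether two rips really are bit-identical (the file names are just placeholders; any checksum tool does the same job):

```python
# Minimal sketch: hash two ripped files and compare the results.
# Matching hashes means, for all practical purposes, identical bits.
import hashlib

def file_sha256(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MB chunks
            h.update(chunk)
    return h.hexdigest()

rip_a = file_sha256("rip_from_drive_a.wav")  # placeholder file names
rip_b = file_sha256("rip_from_drive_b.wav")
print("bit-identical" if rip_a == rip_b else "the files differ")
```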
So let's examine the ripping process again. The music was captured and converted into a digital file using the PCM format at 16-bit/44.1kHz. The software engineers who built the software to do this relied on software libraries that handle PCM files with predictable behaviour, and therefore it was trusted. They also relied on and trusted the operating system libraries that copied the file from one hard drive to another, and crucially the libraries that read it back into memory. So far so good.
Then it goes wrong. The digital file is encoded onto a storage medium called a CD, which is nothing like as reliable as the low-level file system code they trust when they read the file from a hard drive.
The process of recovering that data from the CD is not a trusted process; it is inherently unreliable. So when the CD is ripped, it is quite unpredictable whether the data is actually ripped accurately. By accurately I mean that if you were to compare the PCM file you have created to the original PCM file, it would be an identical match. There are factors that improve the chances of the rip being accurate and an identical match to the original PCM file. A better optical drive or transport is one. Another is error detection at the optical drive level, and software that uses those features to spot sectors that might be bad, re-read them multiple times and take the most commonly recovered result. So different rips at different times and with different methods have the potential to be different and to sound different.
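The re-read trick is conceptually simple. This is a sketch of the idea only, not how any real ripper is implemented (read_sector here is a made-up placeholder; real software talks to the drive at a much lower level):

```python
# Sketch only: read a suspect sector several times and keep the data that
# comes back most often. read_sector is a hypothetical callable that
# returns the raw bytes of one sector.
from collections import Counter

def rip_sector_with_rereads(read_sector, sector_number, attempts=5):
    reads = [read_sector(sector_number) for _ in range(attempts)]
    best_data, votes = Counter(reads).most_common(1)[0]
    # If no single result wins a clear majority, the sector stays suspect.
    confident = votes > attempts // 2
    return best_data, confident
```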
But because we do not have access to the original PCM file, we can't actually compare the ripped file to it. So we have no way of validating the file and calling it an accurate rip. If the CD is ripped by two different processes and drives and the result is different files at the bit level, it is definitely the case that one of the two (or both!) is inaccurate. But which is it? There is no way of knowing.
OK, enter AccurateRip. This is a software engineering challenge: how do I, without access to the original PCM file, validate that my rip is correct and a match? Their solution was simple and brilliant: you crowdsource the results from multiple rips of a CD by different people and store them in a database. If two people with different optical drives and physically different discs rip the same version of a CD and get exactly the same checksums, the probability that both are accurate rips just went up dramatically. If the results from three users in different parts of the world agree, it goes up even higher. And so on. At, say, 20 identical results we are as close to being certain as it is possible to be, in systems engineering terms, that we have a rip identical to the original PCM file. Unless there is a bug in the ripping software itself, of course.
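Just to show the shape of the idea (this is not the real AccurateRip protocol or database format; the checksum choice and the numbers are invented for illustration):

```python
# Toy illustration of crowdsourced validation: the more independent rips
# that report the same checksum, the higher our confidence.
import zlib

def track_checksum(pcm_bytes):
    # Any stable checksum works for the illustration; CRC32 is used here.
    return zlib.crc32(pcm_bytes)

# Imaginary database: checksum -> number of independent submissions that match.
submissions = {0x1A2B3C4D: 100, 0x99887766: 1}

def confidence(my_checksum):
    matches = submissions.get(my_checksum, 0)
    if matches >= 20:
        return "as close to certain as it gets"
    if matches >= 2:
        return "probably accurate"
    return "unverified - could be right, could be wrong"

print(confidence(0x1A2B3C4D))  # agrees with 100 other rips
```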
That is why I called it the gold standard software ripping solution. It means that if 100 people have ripped the remastered version of Dire Straits' 'Brothers in Arms' and they agree on the checksums, we can be close to certain that the file is identical to the original PCM file used to create the CD. That degree of probability is what I and some others who have used this software have been calling an accurate rip.
The byproduct of this simple and genius engineering solution is that the quality of the optical drive no longer matters much. If it gets a rip that matches what 100 different people got when they ripped the CD, my trust has gone from low to very, very high. There are other good reasons, as Dunc rightly mentions, to get a good optical CD transport if you intend to continue playing the CDs into a DAC. In my case the CDs are gathering dust in the loft.
So my question is simple. Does the software bundled in the various ripping solutions you have been using utilise the AccurateRip approach above, or something similar, to handle the inherent unreliability of CDs? I don't know the answer to that question, I'm afraid. But I am beginning to suspect that, like the ones I used early on, at least the earlier ones you used did not, especially if you are hearing differences between the rips.
So I hope that helps you and others when it comes to ripping. A rip becomes an 'accurate rip' when the result produced matches the result produced by lots of other people. It is the best proxy we have for trusting that it matches the original PCM file.
It's why I think some of us tried to say earlier that this ripping game is actually really, really simple and really, really cheap. You can get an accurate rip with a cheap optical drive, your existing PC and a few quid spent on a bit of software owned by dBpoweramp.
If the ripping solutions you have been using don't use this technique to validate that the result matches the original PCM, what makes a software engineer trust that the rip is actually a match to the original PCM file? Maybe they have a comparably ingenious software engineering solution? I don't know. It's a question you'll have to ask them.
Anyway, enough from me. For ripping, my recommendation is always going to be the AccurateRip-based software mentioned above. It's what I did years ago. I have since moved on well beyond this and currently use Roon to seamlessly integrate my old CD collection, my recordings of vinyl done over the years and Qobuz streaming. And I love vinyl. I keep an open mind and always listen first, but the software engineer in me goes immediately into problem determination mode: if the result is not as expected, what is the probable cause? I will tend not to go deeper down lines of investigation that have low probability until the evidence becomes incontrovertible.
Above all, keep enjoying your music and the hifi journey.
PS: today's demo was educational, as they usually are. To my ears, the upgrade in sound quality from the old Klimax casing and layout (designed in, I think, about 2008) to this one is truly remarkable. I for one will be getting a home demo into my 5 series amps ASAP.