cpt_paranoia

I started the process of ripping my CDs in 2010, before I got an internet connection. To get track metadata, I used a file of track titles and durations I'd created on an ancient Sinclair QL, going back to the start of my CD buying. I wrote some Linux scripts to take a file of track titles and generate a script that renamed the raw 'Track x' files produced by Exact Audio Copy to the proper titles. I then imported these into MediaMonkey and used 'auto-tag from filename' with a suitable parsing template. When a new CD did not appear in the old metadata file, I simply typed the info in while the disc was ripping; quicker than entering it into EAC before starting the rip (parallel processing, see...). It took about as long to type the data as it did to rip the disc.
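For the curious, the renaming logic was along these lines (a minimal sketch, not the actual script; the titles-file format and the 'nn - Title' output naming are my assumptions):

    #!/bin/sh
    # Sketch: rename EAC's raw 'Track 1.wav', 'Track 2.wav', ... using a
    # titles file containing one track title per line, in disc order.
    n=1
    while IFS= read -r title; do
        mv "Track $n.wav" "$(printf '%02d' "$n") - $title.wav"
        n=$((n+1))
    done < titles.txt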

That worked nicely.

Then I bought a cheap, secondhand scanner, and embarked on the process of scanning artwork at 300 ppi (about 1440 pixels across), then using Picasa to straighten, clean, crop, contrast-adjust, and save the artwork. I resized all the artwork to 600 pixels. I kept all the artwork (raw scan, Picasa-processed, and resized).
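The resize step is a one-liner these days; an ImageMagick equivalent of what Picasa did for me would be (ImageMagick is my substitution here, and the filenames are illustrative):

    # Resize a processed scan to fit within 600x600 pixels, preserving
    # aspect ratio and leaving the original untouched.
    convert front_processed.jpg -resize 600x600 front_600.jpg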

I wrote more scripts to automate the tidying away of rip logs and scans, and to copy the 600-pixel images to 'folder.jpg' and 'back_cover.jpg' for media managers to find. I'd rip a batch, scan and tidy the artwork, drop the artwork into folders, and run a batch processing script to do all the gruntwork.
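That gruntwork amounted to something like this (a sketch; the directory layout and filenames are assumptions, not my actual script):

    #!/bin/sh
    # Sketch: for each album directory, publish the 600-pixel images
    # under the names media managers look for.
    for dir in */; do
        [ -f "${dir}front_600.jpg" ] && cp "${dir}front_600.jpg" "${dir}folder.jpg"
        [ -f "${dir}back_600.jpg" ] && cp "${dir}back_600.jpg" "${dir}back_cover.jpg"
    done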

That worked nicely.

Then I got an internet connection, and started using metadata sources, for both track details and artwork. Those sources have improved over the years, but I still check the tag information, and 'correct' it to my long-standing format. A new PC with a faster CD drive meant that I couldn't type the metadata before the disc had ripped; my parallel processing found a bottleneck: me. So the downloaded metadata took over.

But those 600-pixel images have started to look a bit low-resolution lately, in comparison with those I've downloaded.

So I decided it was time to upgrade the 600-pixel artwork on my CD collection to the full-size scans that I made originally (and then resized to 600 pixels). Since I had kept all the scanned, processed and resized artwork, this just needed me to write a script to do the copy.

But there was a complication: for some albums I had downloaded better artwork, where my original covers were poor or the CD cover had been mangled by the record company. So my script had to check whether this had been done, and leave the replacement artwork in place.

After some considerable hacking of an Awk script, and of the syntax of the shell script it generated, I got it working. For each album, the Awk script happens to generate 100 lines of shell script. So, with 2273 CDs to process, that's 227302 lines of shell script (the extra two lines being the shell-type header line and the final newline).
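The per-album logic the Awk script generated boils down to something like this (a sketch of the idea only; the cmp-based check and the file names are my assumptions):

    #!/bin/sh
    # Sketch: if folder.jpg is still my 600-pixel resize, replace it with
    # the full-size processed scan; if it differs, it must be downloaded
    # replacement artwork, so leave it alone.
    if cmp -s folder.jpg artwork/front_600.jpg; then
        cp artwork/front_processed.jpg folder.jpg
    else
        echo "replacement artwork in place, skipping: $PWD"
    fi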

After 22 minutes beavering away on the NAS, the 227k-line script finished, chucking out only six unexpected boundary cases, which I sorted out by hand.

Oh, and I first had to write a script to rename all the scan files that I'd foolishly put spaces in (unix, huh?), to allow the image replacement script to work... But that was a simple matter of using find, edit and paste. The scan processing script has now been modified so it doesn't put spaces in the filenames.
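The scripted equivalent of that find/edit/paste job is something like this (a sketch; bash is assumed for the ${var// /_} expansion, and the 'scans' directory name is illustrative):

    #!/bin/bash
    # Sketch: replace spaces with underscores in scan filenames.
    # Handles spaces in file names only, not in directory names.
    find scans -depth -type f -name '* *' | while IFS= read -r f; do
        dir=$(dirname "$f")
        base=$(basename "$f")
        mv "$f" "$dir/${base// /_}"
    done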

Computers are wonderful things sometimes; I hate to think how long it would have taken to move all those files by hand...

On the other hand, without computers, I wouldn't be obsessing about the ripping, scanning, cleaning and massaging of metadata...
 