Alias Rhythmbox Rescan Collection / Remove Dupes?
From an old post: https://lists.ubuntu.com/archives/ub...ry/063446.html
The problem was that there were duplicates in the RhythmboxDB.
Possible reasons:
- The same files were added from different locations
- The song files have been duplicated in the same folder but with different filenames
Here is a simple XSLT script to process the RhythmboxDB and remove the duplicates.
Warning: it is not intelligent, i.e.
- it will not try to keep the song record that has the most *hits* or that has extra information such as *ratings*
- it decides that two song entries are the same from a single matching field, either the file size or the file name (location)
- which means there is still a risk that two different song files have the same file size and one of them gets dropped
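To get a feel for that last risk before running anything, here is a rough shell sketch that lists the file sizes appearing more than once in the database. The function name list_repeated_sizes is made up for this post. It assumes the file-size values are plain numbers and that your grep supports -o (GNU and BSD grep do).

```shell
# list_repeated_sizes DB: print every file-size value that occurs in more
# than one place in the Rhythmbox database file DB.
# (hypothetical helper, not part of Rhythmbox)
list_repeated_sizes() {
    # Pull out each <file-size>N</file-size> element, strip the tags,
    # then keep only the values that show up at least twice.
    grep -o '<file-size>[0-9]*</file-size>' "$1" |
        sed 's/<[^>]*>//g' |
        sort |
        uniq -d
}
```

Run it as list_repeated_sizes ~/.gnome2/rhythmbox/rhythmdb.xml. Each size it prints is shared by at least two entries, so the stylesheet below would treat those entries as duplicates even if they are different songs.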
Save the following code in a file named norhythmboxduplicates.xsl :
Code:
<!-- norhythmboxduplicates.xsl: remove duplicates in the Rhythmbox database -->
<!-- ~/.gnome2/rhythmbox/rhythmdb.xml -->
<!-- xsltproc norhythmboxduplicates.xsl rhythmdb.xml -o newrhythmboxdb.xml -->
<!-- xmlstarlet tr norhythmboxduplicates.xsl rhythmdb.xml > newrhythmboxdb.xml -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" />
<xsl:template match="rhythmdb/entry">
<xsl:if test="not( preceding-sibling::entry/file-size = file-size ) and not( preceding-sibling::entry/location = location )">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:if>
</xsl:template>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Run an XSLT processor on the Rhythmbox database (~/.gnome2/rhythmbox/rhythmdb.xml).
From the directory that holds the database, typing the following will generate a new database without duplicates:
Code:
xsltproc norhythmboxduplicates.xsl rhythmdb.xml -o newrhythmboxdb.xml
Checking the results
You can use the following code to check how many entries (songs) there are in your database before and after you apply the script:
Code:
<!-- Simple Count -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>
<xsl:template match="rhythmdb">
<xsl:value-of select="count(entry)"/>
<xsl:text>
</xsl:text> <!-- note the newline character enclosed in the xsl:text -->
</xsl:template>
</xsl:stylesheet>
Save it as rhythmcount.xsl.
To compare, run it on both databases:
Code:
xsltproc rhythmcount.xsl ~/.gnome2/rhythmbox/rhythmdb.xml
xsltproc rhythmcount.xsl newrhythmboxdb.xml
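If you just want the difference, a rough cross-check without XSLT could look like the sketch below. The function names count_entries and removed_entries are made up for this post, and it assumes every entry starts on its own line with an opening tag that carries an attribute (as in entry type="song"), so treat the number as an estimate and not as a replacement for the count above.

```shell
# count_entries FILE: count the lines that open an <entry ...> element.
# (hypothetical helper; assumes one opening <entry ...> tag per line)
count_entries() {
    grep -c '<entry ' "$1"
}

# removed_entries OLD NEW: print how many entries OLD has that NEW lacks.
removed_entries() {
    echo $(( $(count_entries "$1") - $(count_entries "$2") ))
}
```

For example, removed_entries ~/.gnome2/rhythmbox/rhythmdb.xml newrhythmboxdb.xml prints the number of entries the script dropped.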
Using the new database
To use the new database without duplicates, simply rename the old one to a backup name and rename the new one to the default database name:
Code:
mv ~/.gnome2/rhythmbox/rhythmdb.xml ~/.gnome2/rhythmbox/rhythmdb.bak.xml
mv newrhythmboxdb.xml ~/.gnome2/rhythmbox/rhythmdb.xml
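A slightly more careful variant of the same swap is sketched below. The function name swap_db is made up for this post, and it names the backup by appending .bak instead of using rhythmdb.bak.xml. It refuses to do anything if the new file is missing or empty, which is what you would get if the XSLT run failed. It is probably wise to close Rhythmbox first, since it may write the database back out while running or on exit.

```shell
# swap_db NEW OLD: keep a copy of OLD as OLD.bak, then move NEW over OLD.
# (hypothetical helper, not part of Rhythmbox)
swap_db() {
    swap_new=$1
    swap_old=$2
    # Stop if the new database is missing or has zero size.
    if [ ! -s "$swap_new" ]; then
        echo "refusing: $swap_new is missing or empty" >&2
        return 1
    fi
    # Copy instead of rename, so the original stays in place until the
    # final mv succeeds.
    cp -p "$swap_old" "$swap_old.bak" || return 1
    mv "$swap_new" "$swap_old"
}
```

Usage would be swap_db newrhythmboxdb.xml ~/.gnome2/rhythmbox/rhythmdb.xml. To undo, copy the .bak file back over rhythmdb.xml.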
Future work:
To make the script better, we could:
- Make use of the mountpoint and location records in the RhythmboxDB (in my case, most of the time the file was the same and only the mount point was different, because of symbolic linking etc.)
- Make use of *play-count* and *rating* to keep your favorite songs' stats
- Make use of an external program to build *check-sums* instead of relying only on *file size*
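As a starting point for the check-sum idea, here is a rough shell sketch that works on the music folder itself, not on the database. The function name find_dupes_by_checksum is made up for this post, and it assumes md5sum from GNU coreutils is installed. It only reports files with identical content; it does not touch rhythmdb.xml.

```shell
# find_dupes_by_checksum DIR: print every file under DIR whose MD5 sum is
# shared with at least one other file, grouped by checksum.
# (hypothetical helper, not part of Rhythmbox)
find_dupes_by_checksum() {
    find "$1" -type f -exec md5sum {} + |
        sort |
        awk '{
            # $1 is the checksum; after sorting, identical sums are adjacent.
            if ($1 == prev) {
                # Print the first file of a group once, then each repeat.
                if (!shown) { print prevline; shown = 1 }
                print $0
            } else {
                shown = 0
            }
            prev = $1
            prevline = $0
        }'
}
```

Something like find_dupes_by_checksum ~/Music would then list the duplicate files. Two files of the same size but with different content get different checksums, so they are not reported, which is exactly the case the file-size test gets wrong.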