
Thread: Removing duplicated songs in Rhythmbox database.

  1. #1
    Join Date
    Dec 2006
    Beans
    Hidden!
    Distro
    Ubuntu

    Removing duplicated songs in Rhythmbox database.

    Also known as: Rhythmbox Rescan Collection / Remove Dupes?
    (from an old post: https://lists.ubuntu.com/archives/ub...ry/063446.html)

    The problem is that there were duplicates in the Rhythmbox database.
    Possible reasons:
    • The same files were added from different locations
    • The song files have been duplicated in the same folder but with different filenames


    Here is a simple XSLT script to process the Rhythmbox database and remove the duplicates.

    Warning: it is not intelligent, i.e.
    • it will not try to keep the song record that has the most *hits* or that has extra information such as *ratings*
    • it decides that two entries are the same song using only a simple criterion: the same file size, or the same file name (location)
    • which means there is still a risk that two different song files happen to have the same file size, in which case one of them is dropped
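
To gauge that last risk before running the script, one option is to list the file sizes that occur in more than one entry and review them by hand. This is only a rough sketch, not part of the original script: the function name shared_file_sizes is made up, and it assumes each <file-size> element sits on a single line of rhythmdb.xml.

```shell
#!/bin/sh
# Hypothetical helper: print "COUNT SIZE" for every file size that appears
# in more than one <file-size> element of the given database.
# Assumes each <file-size> element is written on a single line.
shared_file_sizes() {
    grep -o '<file-size>[0-9]*</file-size>' "$1" |
        sed 's/<[^>]*>//g' |
        sort -n |
        uniq -c |
        awk '$1 > 1 { print $1, $2 }'
}

# Example: shared_file_sizes ~/.gnome2/rhythmbox/rhythmdb.xml
```

Every size it prints belongs to entries the script would treat as duplicates, whether or not they really are the same song.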


    Save the following code in a file named norhythmboxduplicates.xsl:
    Code:
    <!-- norhythmboxduplicates.xsl: remove duplicates in the Rhythmbox database -->
    <!-- ~/.gnome2/rhythmbox/rhythmdb.xml  -->
    <!-- xsltproc norhythmboxduplicates.xsl rhythmdb.xml -o newrhythmboxdb.xml  -->
    <!-- xmlstarlet tr norhythmboxduplicates.xsl rhythmdb.xml > newrhythmboxdb.xml  -->
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      <xsl:output method="xml" />
    
      <!-- Copy an entry only if no earlier entry has the same file size or the same location. -->
      <xsl:template match="rhythmdb/entry">
        <xsl:if test="not( preceding-sibling::entry/file-size = file-size ) and not( preceding-sibling::entry/location = location )">
          <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
          </xsl:copy>
        </xsl:if>
      </xsl:template>
    
      <!-- Identity template: copy everything else unchanged. -->
      <xsl:template match="@*|node()">
        <xsl:copy>
          <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
      </xsl:template>
    
    </xsl:stylesheet>
    Use an XSLT processor on the Rhythmbox database (~/.gnome2/rhythmbox/rhythmdb.xml).

    Typing the following will generate a new database without duplicates:
    Code:
    xsltproc norhythmboxduplicates.xsl rhythmdb.xml -o newrhythmboxdb.xml
    Checking the results

    You can use the following code to check how many entries (songs) there are in your database before and after you apply the script:
    Code:
    <!-- Simple Count -->
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      <xsl:output method="text"/>
    
    <xsl:template match="rhythmdb">
      <xsl:value-of select="count(entry)"/>
      <xsl:text>&#10;</xsl:text> <!-- &#10; is a line feed, so the count is followed by a newline -->
    </xsl:template>
    </xsl:stylesheet>
    Save it as rhythmcount.xsl.

    To compare, run both:

    Code:
    xsltproc rhythmcount.xsl ~/.gnome2/rhythmbox/rhythmdb.xml
    xsltproc rhythmcount.xsl newrhythmboxdb.xml
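
If you just want a quick cross-check of those two numbers without a second stylesheet, something like the following might do. It is a sketch under an assumption: every opening <entry ...> tag is on its own line. The function names count_entries and removed_entries are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: count the lines that contain an opening <entry ...> tag.
# Assumes at most one opening tag per line.
count_entries() {
    grep -c '<entry[ >]' "$1"
}

# Hypothetical helper: report how many entries were removed between two files.
removed_entries() {
    before=$(count_entries "$1")
    after=$(count_entries "$2")
    echo "before=$before after=$after removed=$((before - after))"
}

# Example: removed_entries ~/.gnome2/rhythmbox/rhythmdb.xml newrhythmboxdb.xml
```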

    Using the new database

    To use the new database without duplicates, simply rename the old one to a backup name and rename the new one to the default database name:

    Code:
    mv ~/.gnome2/rhythmbox/rhythmdb.xml ~/.gnome2/rhythmbox/rhythmdb.bak.xml
    mv newrhythmboxdb.xml ~/.gnome2/rhythmbox/rhythmdb.xml
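
A slightly more careful variant of the swap is sketched below. It keeps a dated backup and refuses to continue if the new file is missing or empty (for example after a failed xsltproc run). The function name install_new_db is made up, and the default path is simply the one used in this post. It is probably wise to close Rhythmbox first, since it may rewrite the database when it exits.

```shell
#!/bin/sh
# Hypothetical helper: back up the current database under a dated name,
# then move the new database into place.
# Usage: install_new_db NEW_DB [CURRENT_DB]
install_new_db() {
    new_db=$1
    db=${2:-"$HOME/.gnome2/rhythmbox/rhythmdb.xml"}
    # Refuse to replace the database with a missing or empty file.
    if [ ! -s "$new_db" ]; then
        echo "refusing: $new_db is missing or empty" >&2
        return 1
    fi
    backup="$db.$(date +%Y%m%d-%H%M%S).bak"
    # Only move the new file into place if the backup copy succeeded.
    cp -p "$db" "$backup" && mv "$new_db" "$db"
}

# Example: install_new_db newrhythmboxdb.xml
```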


    Future work:


    To make the script better, we could:
    • Make use of the mountpoint and location records in rhythmdb.xml (in my case, most of the time the file was the same and only the mount point differed, because of symbolic links etc.)
    • Make use of *play-count* and *rating* to keep your favorite songs' stats
    • Make use of an external program to build *check-sums* instead of relying only on *file size*
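
As a starting point for the check-sum idea, the files themselves can be compared on disk. The sketch below is not wired into the XSLT script and says nothing about which rhythmdb.xml entries point at the files; it only prints groups of files with identical content. It assumes GNU coreutils (sha256sum, and uniq with -w and --all-repeated), and the function name find_duplicate_files is made up.

```shell
#!/bin/sh
# Hypothetical helper: print groups of files under a directory whose
# SHA-256 checksums are identical. Groups are separated by a blank line.
# Assumes GNU coreutils (sha256sum, uniq -w / --all-repeated).
find_duplicate_files() {
    find "$1" -type f -exec sha256sum {} + |
        sort |
        uniq -w 64 --all-repeated=separate
}

# Example: find_duplicate_files ~/Music
```

Comparing only the first 64 characters of each line means only the checksum is compared, not the file name.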

  2. #2
    Join Date
    Aug 2006
    Beans
    69

    Re: Removing duplicated songs in Rhythmbox database.

    thanks very much, excellent stuff!

  3. #3
    Join Date
    Mar 2008
    Beans
    41

    Re: Removing duplicated songs in Rhythmbox database.

    If someone still needs this, you can also try my rhythmbox plugin for deleting duplicates: http://ubuntuforums.org/showthread.php?t=1078839&page=4

  4. #4
    Join Date
    Jul 2009
    Beans
    5
    Distro
    Ubuntu 9.10 Karmic Koala

    Re: Removing duplicated songs in Rhythmbox database.

    Thank you! Great work
