CommaVee: Converting a CVS repository to SVN

  • Michael Haggerty · 2 years ago
    I am currently the main developer/maintainer of cvs2svn. I read your article with interest, and would like to discuss some of the points that you made.

    I do not know of any problems in cvs2svn that are caused by deep repositories or repositories with lots of branches. If you have found some, please let us know.

    You claim that cvs2svn cannot be restarted. This is partly correct. The individual passes of cvs2svn are self-contained and can be re-run. Thus if there is a problem in pass4, you don't have to restart the conversion at pass1. This can be a big benefit, particularly when dealing with tag/branch names that were used inconsistently.

    However, you are correct that pass1, in which the data are parsed out of CVS, cannot be restarted if there is an error.

    We try to give reasonable error messages (and sometimes workarounds) for the kinds of repository corruption that have been reported to us frequently. If you have found other common failure modes, please report them to our mailing list and we will look at them.

    Regarding symbol names with appended carriage returns:

    I've never seen this problem; thanks for pointing it out. If you would send an email to the users mailing list including a snippet of a CVS repository that shows this problem, I'd be happy to look at it. If you have any suggestions for how you think cvs2svn should work around this problem, please let us know.

    Regarding the --retain-conflicting-attic-files option:

    This option has been added to the trunk version of cvs2svn (and works, as far as I know), and that is why it is included in the online documentation. But the latest official release, 1.5.1, does not yet include this feature. I understand that this discrepancy can lead to confusion.
  • John Minnihan · 2 years ago
    Thanks for the comments, Michael.

    I am very interested in being able to restart Pass1. Is this feasible? For very large repositories (i.e. 15+ GB), a fatal error deep into the conversion wastes a great deal of time if the process must be completely restarted.

    WRT issues converting large & complex repositories, my statement is based on anecdotal evidence & experience converting 15 different CVS repositories with cvs2svn.py. Of those, 13 have converted properly and error-free, but were smallish and non-complex. The other two have exhibited both the failures I have noted in my posting above, as well as what I am referring to as ...trouble with deep branch structures.

    That particular issue presented itself on a very large repository in a module that had numerous branch and release tags, both across the entire module and on single files. The error presented was:
    ERROR: $CVSROOT/foo/Attic/bar.java,v is not a valid ,v file
    Exited due to fatal error(s).

    This was duplicated twice (two subsequent executions of the conversion) against a few files whose only noticeable difference from other ,v files was the depth of the symbol list. Once I moved those files out of the CVS repository, the conversion moved past them. This makes me conclude that this issue is related to the depth of the symbol list and is reproducible.

    This error case and the one involving the ^M at the end of a symbol name are what have prompted me to characterize the error flagging & reporting as poor with respect to use on large, complex repositories. It took multiple executions of very large conversions to finally determine the root cause of these issues; had the error reporting been more meaningful, the issues with the ,v files could have been resolved after the first report.
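
    One way to catch the ^M case before starting a long conversion is to scan the symbols list of each ,v file for carriage returns. The following is only a rough sketch, not part of cvs2svn: the function names are made up for illustration, and it assumes the first "symbols" keyword in the file is the one in the RCS admin header and that the list ends at the first semicolon.

```python
import re
from pathlib import Path


def symbols_with_carriage_returns(rcs_bytes):
    """Return symbol names from an RCS ,v header that contain a carriage return."""
    # Assumption: the first "symbols ... ;" phrase is the admin-header one.
    match = re.search(rb"\bsymbols\b([^;]*);", rcs_bytes)
    if match is None:
        return []
    suspicious = []
    # Entries look like NAME:REVISION. Split on spaces, tabs and newlines only,
    # so a stray '\r' stays attached to the entry it belongs to.
    for entry in re.split(rb"[ \t\n]+", match.group(1)):
        if not entry:
            continue
        name = entry.split(b":", 1)[0]
        if b"\r" in name:
            suspicious.append(name)
    return suspicious


def scan_repository(root):
    """Map each ,v file under root to its suspicious symbol names (hypothetical helper)."""
    report = {}
    for path in Path(root).rglob("*,v"):
        bad = symbols_with_carriage_returns(path.read_bytes())
        if bad:
            report[str(path)] = bad
    return report
```

    Running scan_repository on a copy of the repository before the conversion would list the affected files in one go, rather than one file per failed run.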

    Thanks for the update about the --retain-conflicting-attic-files option; I will grab the version from SVN next time and try it.
  • Michael Haggerty · 2 years ago
    Your comments got me thinking about a resumable pass1. It wouldn't be terribly hard to implement. (No promises, though :-) )

    Regarding "depth" problems: I think a far more likely explanation is that a "deep" repository is likely to be an old one that has accumulated repository corruption over the years. But consider submitting your unprocessable *,v files to the user mailing list if you think they are not corrupt.

    Granted, it would be nice if we could provide a more specific error message in the case of a corrupt *,v file. I don't think that the parser that we use provides more information, but I'll double-check.

    For further discussion, the users@cvs2svn mailing list would be more appropriate so that other members of the cvs2svn community can participate.
  • Anjana Sen · 1 year ago
    I was running a cvs2svn conversion and came across this error:
    Error summary:
    ERROR: '/Development/Inc/Attic/.keepme,v' is not a valid ,v file
    Exited due to fatal error(s).
    Here are the contents of the file:
    head ;
    access ;
    symbols ;
    locks ; strict;
    comment @# @;


    desc
    @@

    Please let me know what could be wrong in this file. Is the file corrupted? I should note that the file is not very deep in the repository. Please help.
  • John Minnihan · 1 year ago
    Hey Anjana,

    If that's the entire contents of the .keepme,v file that is generating the error, then it is indeed corrupt.

    Immediately following the lines


    desc
    @@


    should be the contents of at least one revision to the file; it is entirely legal to have only one revision. For example, the entire contents of an unmodified file named verifymsg,v, which is present in the CVSROOT directory of most repositories, look like this:


    head 1.1;
    access ;
    symbols ;
    locks ; strict;
    comment @# @;


    1.1
    date 2001.10.11.14.14.32; author somebody; state Exp;
    branches;
    next ;

    desc
    @@



    1.1
    log
    @initial checkin@
    text
    @# The "verifymsg" file is used to allow verification of logging
    # information. It works best when a template (as specified in the
    # rcsinfo file) is provided for the logging procedure. Given a
    # template with locations for, a bug-id number, a list of people who
    # reviewed the code before it can be checked in, and an external
    # process to catalog the differences that were code reviewed, the
    # following test can be applied to the code:
    #
    # Making sure that the entered bug-id number is correct.
    # Validating that the code that was reviewed is indeed the code being
    # checked in (using the bug-id number or a seperate review
    # number to identify this particular code set.).
    #
    # If any of the above test failed, then the commit would be aborted.
    #
    # Actions such as mailing a copy of the report to each reviewer are
    # better handled by an entry in the loginfo file.
    #
    # One thing that should be noted is the the ALL keyword is not
    # supported. There can be only one entry that matches a given
    # repository.
    @


    Note how this differs from your file.
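
    If you want to find other files like this before the conversion trips over them, the comparison above can be turned into a quick check. This is a rough heuristic sketch, not what cvs2svn itself does: the function name is made up, and it assumes the revision entries appear before the first line starting with "desc", as they do in the verifymsg,v example.

```python
import re


def has_no_revisions(rcs_text):
    """Heuristically report whether a ,v file lacks a head revision or any revision entry."""
    # Assumption: everything before the first "desc" line is header plus revision entries.
    header = rcs_text.split("\ndesc", 1)[0]

    # "head 1.1;" names the newest revision; "head ;" leaves it empty.
    head = re.search(r"^head\s*([0-9.]*)\s*;", header, re.MULTILINE)
    head_rev = head.group(1) if head else ""

    # A revision entry starts with its number on one line, followed by "date ...".
    has_delta = re.search(
        r"^[0-9]+(?:\.[0-9]+)+\s+date\s", header, re.MULTILINE
    ) is not None

    return head_rev == "" or not has_delta
```

    For the .keepme,v contents posted above this returns True, and for the verifymsg,v example it returns False.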