#35 new
Reinier de Lange

regular expression too big

Reported by Reinier de Lange | December 21st, 2010 @ 10:17 AM

This problem has come up several times in the Google group, but I had expected a ticket for this.

See also: http://groups.google.com/group/pickle-cucumber/browse_thread/thread...#

I encountered this problem today and I have replaced all occurrences of '#{capture_predicate}' with '(\w[\w ]+\w)' in pickle_steps.rb (line 64, line 73). This works fine for now.

regular expression too big: /^(\w[\w ]+\w) should (?:be|have) (?:an? )?((?:[_ ]accessible[_ ]attributes|table[_ ]name[_ ]suffix|invalid|respond[_ ]to[_ ]without[_ ]attributes|nil|marked[_ ]for[_ ]destruction|current|[_ ]protected[_ ]attributes|equal|changed[_ ]for[_ ]autosave|store[_ ]full[_ ]sti[_ ]class|eql|persisted|outdated|[_ ]active[_ ]authorizer|new[_ ]record|connection[_ ]handler|attribute[_ ]present|destroyed|include[_ ]root[_ ]in[_ ]json|duplicable|locking[_ ]enabled|[_ ]validators|readonly|respond[_ ]to|blank|acts[_ ]like|instance[_ ]variable[_ ]defined|present|tainted|friendly[_ ]id|unserializable[_ ]attribute|valid|instance[_ ]of|html[_ ]safe|attribute|unfriendly[_ ]id|kind[_ ]of|is[_ ]haml|is[_ ]a|frozen|[_ ]one[_ ]time[_ ]conditions[_ ]valid[_ ]4|changed|permitted[_ ]to|[_ ]one[_ ]time[_ ]conditions[_ ]valid[_ ]6|table[_ ]name[_ ]prefix|partial[_ ]updates|[_ ]one[_ ]time[_ ]conditions[_ ]valid[_ ]8|id|name|sluggable[_ ]id|sequence|sluggable[_ ]type|scope|created[_ ]at|[_ ]one[_ ]time[_ ]condit (RegexpError)
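
A minimal sketch of why the workaround above avoids the error: the generic capture has a fixed size no matter how many predicates the app defines. `PREDICATE_STEP` and `parse_predicate_step` are hypothetical names that only mirror the shape of the regexp in the error message; this is not the actual code in pickle_steps.rb.

```ruby
# The generic capture quoted in the report, used in place of the generated alternation.
GENERIC_CAPTURE = '(\w[\w ]+\w)'

# Hypothetical step pattern mirroring the shape of the regexp in the error message.
PREDICATE_STEP = /^#{GENERIC_CAPTURE} should (?:be|have) (?:an? )?#{GENERIC_CAPTURE}$/

# Returns [model_text, predicate_text], or nil when the step text does not match.
def parse_predicate_step(text)
  match = PREDICATE_STEP.match(text)
  match && [match[1], match[2]]
end

p parse_predicate_step('the user should be marked for destruction')
p parse_predicate_step('the user should have id')
```

One trade-off of this particular capture: it needs at least three characters, so a two-letter predicate such as `id` does not match on its own (the second call returns nil).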

Comments and changes to this ticket

  • Peter

    Peter June 8th, 2011 @ 09:38 PM

    • Assigned user cleared.


    I was wondering whether there has been any progress, or any ideas on how to solve this in a more general way. It seems the current method of expanding out all models and factories|blueprints is always going to have trouble scaling to larger apps. I'm hitting that now, so I wanted to ask whether anyone had ideas on how to approach reworking the steps before I dive in and try some solutions.


  • Peter

    Peter June 8th, 2011 @ 09:38 PM

    • Assigned user set to “Ian White”
  • Ian White

    Ian White June 15th, 2011 @ 07:48 PM

    Hi Peter,

    I have some ideas, but haven't had much time to work on OSS stuff lately. I hope that changes soon. Hoping in fact to get some stuff done this weekend.

    The approach that is mostly done in the 0.5 beta (currently unstable, but on github) is to make pickle use a regexp that is general (not a massive disjunction) that you can add stuff to via config, and have it default to the current behaviour (introspect the app to figure out the massive regexps). The main problem I'm hitting is that a general regexp that captures a normal looking english expression is waaaay too general and will conflict with loads of other steps.

    I shall raise this on the mailing list, but I'm leaning towards making the next pickle use some sort of delimiters, such as '[' and ']', by default. That would solve the problem. For example:

    Given [a user] exists

    The stuff in the '[' ']' can just be passed straight to pickle without having to build up massive regexps etc.
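
As a rough sketch of the delimiter idea, assuming '[' and ']' as the delimiters: the step pattern only has to find the bracketed text, so it stays the same size however many models the app has. `DELIMITED_MODEL`, `EXISTS_STEP` and `delimited_model` are hypothetical names for illustration, not part of pickle.

```ruby
# Hypothetical pattern for a delimited model reference such as "[a user]".
DELIMITED_MODEL = /\[([^\[\]]+)\]/

# Hypothetical step pattern built on it (step text without the Given/When/Then keyword).
EXISTS_STEP = /^#{DELIMITED_MODEL} exists$/

# Returns the raw text between the delimiters, or nil if the step does not match.
# In this sketch the raw text is what would be handed on to pickle for parsing.
def delimited_model(step_text)
  match = EXISTS_STEP.match(step_text)
  match && match[1]
end

p delimited_model('[a user] exists')
p delimited_model('a user exists')
```

An undelimited step such as `a user exists` is not picked up by this pattern at all, which is what keeps it from conflicting with other steps.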

    The problem is that it looks horrible, but I'm leaning toward being ok with that, because it will encourage users to instead write steps that use the pickle DSL.

    I'm still thinking about it though, and would love to hear any thoughts you have.


  • Peter

    Peter June 16th, 2011 @ 05:13 PM

    Hi Ian,

    I had started down that more generalized regexp approach, but quickly saw that it was ambiguous with way too many other steps. The only way I can see that working is with delimiters like you suggest. I'm relatively new to cucumber, so I'm likely the wrong one to ask how people would feel about it. I definitely don't mind it, but it does take some of the natural language feel out of the steps and makes them harder for non-programmers to write. I believe the BDD purists will be saddened by that. However, some of them are already saddened by Pickle being a shortcut around things like signing a user up, so perhaps if you've committed to using Pickle to simplify and speed up the tests, then you're okay with non-natural language in those cases too?

    I'd be a little leery of the part where you encourage users to write their own steps that use the Pickle DSL. What's likely to happen is that people will come up with their own shorthand DSL to map to steps that use the Pickle DSL, i.e. they'll decide on their own way to do 'Given a user: registered' or something similar to make the mapping to the Pickle steps consistent. At that point you haven't avoided changing the Pickle DSL at all; you've just forced each user to make up their own new Pickle DSL. That could have power, but I'd worry that you'd lose people, as they'd have no convention to follow. Better to just pick something, like using '[' and ']', so that all Pickle users are still on the same DSL.

    I thought I'd also share what I did short term, to see if it sparks any bright ideas. Keep in mind that I currently have a small number of features that I'm working on expanding (hence my interest in using Pickle). I didn't want to rewrite the DSL and I couldn't use the generalized RegExp, but I did notice that I was using Pickle for only a small subset of my models, the ones needed for overall tracking. Most of the time I wanted to go through the user-facing UI so as to test the creation of objects; it was only for things like User and Program that I already had tests for their creation through the UI and now wanted to test scenarios farther down the flow. So what I ended up doing was adjusting the predicates (to /(\w[\w ]+\w)/ per some other thread) and restricting the factories to only the ones I listed, i.e. (in features/support/pickle.rb):

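    # Factory names to keep. The list is inferred from the sentence below, and
    # the names are assumed to be strings matching the keys of Pickle.config.factories.
    valid_factories = %w[program reward user]
    Pickle.config.factories = Pickle.config.factories.inject({}) do |h, (n, f)|
      valid_factories.include?(n) ? h.merge(n => f) : h
    end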

    And then I can create a program, reward and user. Now obviously the maintenance sucks, as you have to keep adding to the list, and if you truly need all your factories then you're right back where you started. But those factories are mainly for my model tests; in the features I generally just want a basic object. This isn't a long-term solution, but it got me going while we're having this discussion.

    Along those lines, the only other long-term solution I can think of is to find some way to take the large RegExp that would exist and pare it down into something smaller. There is probably a least common sort of RegExp that would serve the same purpose. For example, if you were only interested in User and you had a factory :activated_user, you would get a regexp something like /(activated(\s+|_)user|user)/ (rough approximation), but that could become something like /(\w+)user/ with the prefix passed through to the Pickle step. I haven't thought it all the way through, but the basic idea is that the steps might be:

    1. Generate RegExp as is now
    2. Compress RegExp
    3. On match, reconstruct what original match would be
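
The three steps above could look roughly like this. It is only a sketch under stated assumptions: the factory and model names are made up, the prefix is limited to a single word, and none of the constants or methods below exist in pickle.

```ruby
# Hypothetical factory and model names, used only for illustration.
FACTORY_NAMES = %w[user activated_user program].freeze
MODEL_NAMES = %w[user program].freeze

# Step 1: the expanded pattern, one alternative per factory (roughly "as is now").
EXPANDED_SOURCE = FACTORY_NAMES.map { |name| name.gsub('_', '[_ ]') }.join('|')
EXPANDED = /\A(?:#{EXPANDED_SOURCE})\z/

# Step 2: a compressed pattern, an optional one-word prefix followed by a model name.
COMPRESSED = /\A(?:(\w+)[_ ])?(#{MODEL_NAMES.join('|')})\z/

# Step 3: on match, rebuild the factory name and check that it really exists.
def resolve_factory(text)
  match = COMPRESSED.match(text)
  return nil unless match

  candidate = [match[1], match[2]].compact.join('_')
  FACTORY_NAMES.include?(candidate) ? candidate : nil
end

['activated user', 'activated_user', 'user', 'deleted user'].each do |text|
  puts "#{text.inspect} -> #{resolve_factory(text).inspect}"
end
```

The compressed pattern grows with the number of models rather than the number of factories, and the lookup in step 3 rejects prefixes that do not correspond to a real factory, so the accepted inputs stay the same as with the expanded pattern for these names.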

    In addition, perhaps it's okay to put some natural rules around the syntax but make it stricter? For instance, it has to be /(\w+)({list of models})/, and then all factories must end with their model name. Or something like "Given a user who is activated" or "Given a program which is activated", where the rule is something like /({list of models})(?:(?:which|who) is (\w+))?/ and that last match is the factory name or part of it. Then you wouldn't need to compress; you'd just need to reconstruct the expected factory name on match.
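
A sketch of the "who is" / "which is" rule, assuming the convention that the factory name is the qualifier followed by the model name. `STRICT_MODELS`, `STRICT_RULE` and `factory_for` are hypothetical names, and the optional leading article is an addition for readability rather than part of the rule as stated.

```ruby
# Hypothetical list of model names.
STRICT_MODELS = %w[user program].freeze

# A model name, optionally followed by "who is <qualifier>" or "which is <qualifier>".
STRICT_RULE = /\A(?:an? )?(#{STRICT_MODELS.join('|')})(?: (?:which|who) is (\w+))?\z/

# Reconstructs the expected factory name, assuming the "<qualifier>_<model>" convention.
def factory_for(text)
  match = STRICT_RULE.match(text)
  return nil unless match

  model = match[1]
  qualifier = match[2]
  qualifier ? "#{qualifier}_#{model}" : model
end

p factory_for('a user who is activated')
p factory_for('a program which is activated')
p factory_for('a user')
```

Whether the reconstructed name corresponds to a real factory would still need to be checked at match time; the sketch only shows the mapping.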

    At any rate, just wanted to throw that feedback and those ideas out there in case they helped spark any good ideas.

