  1. #1
    Just Joined!
    Join Date
    Jan 2008
    Posts
    4

    Mathematics - Regression software for linux?


    Hi all,

    I've recently started working on a project that requires me to analyze large amounts of numerical data. I need to be able to fit my datasets to some fairly intricate (nonlinear) equations (polynomials, transcendentals, etc.). I've been playing around with R (The R Project for Statistical Computing), but I'm not sure that it's exactly what I'm looking for. Regression seems to be a pretty small component of it, and it feels awkward to fit anything much more complicated than a polynomial (of course, it could just be that I don't fully know how to use it yet).

    Does anyone know of any software that's made to do data analysis like this? On Windows there are things like SigmaPlot or Igor Pro (which are not ideal either, but they seem closer than R...).

    Thanks in advance,

    -Ben

  2. #2
    Linux Newbie sdimhoff
    Join Date
    Jan 2007
    Posts
    191
    Hi Ben!

    1.) First of all, welcome to the forums.

    2.) How large are the datasets you are talking about? Size does matter for this type of thing. Since R loads data into memory, you either have to be clever with your scripting or look to another option when single datasets get above 100,000 or so. (This is not a firm number, just an example of when things start to slow down.) If your datasets are very large, you may have to switch languages. Building regression functions in something like C or Fortran becomes necessary when this happens. (Don't worry, someone else has probably already made them.)

    3.) I use R daily, and regression is actually a major part of the R picture. However, I fully realize that since R comes off as a strange jumble of S+, MATLAB-like utilities, Perl, etc., it can be difficult to find things. Here is a link to what I think is some of the best R help around:

    Regression
    Other regressions

    If you go to Index of /UNIX/48_R you can see all the topics. (You may want to start with Other regressions and skip to the non-linear fitting portion.)

    Another big advantage of R is the easy analysis available once you've got yourself a fitted model.

    4.) If you post an example of the regression you need to accomplish, I can probably help you out (or at least try, then point you in a new direction...).

    5.) Finally, if you do decide to use R and get confused about anything, the developers are amazingly helpful and you can just use their R-help mailing list. The other avid users/devs on that mailing list would be able to answer any R questions you could throw at them.
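On point 2, one way of being "clever with your scripting" is to avoid holding the whole dataset at once. The following is only a minimal sketch, assuming a model that is linear in its coefficients: it accumulates the normal-equation sums X'X and X'y chunk by chunk and solves them at the end. The simulated data, the chunk size, and all variable names are illustrative assumptions, not something from this thread; in real use each chunk would be read from disk rather than sliced from a vector already in memory.

```r
# Sketch: least squares by accumulating X'X and X'y one chunk at a time.
# Data, chunk size and names are made up for illustration.
set.seed(1)
n <- 10000
x <- runif(n, 1, 10)
y <- 4 * x + 10 * x^2 + rnorm(n, sd = 20)

chunk_size <- 1000
XtX <- matrix(0, 3, 3)   # running sum of t(X) %*% X
Xty <- matrix(0, 3, 1)   # running sum of t(X) %*% y

for (start in seq(1, n, by = chunk_size)) {
  idx <- start:min(start + chunk_size - 1, n)
  # Design matrix for this chunk: intercept, x, x^2
  X <- cbind(1, x[idx], x[idx]^2)
  XtX <- XtX + crossprod(X)
  Xty <- Xty + crossprod(X, y[idx])
}

# Solve the normal equations for the coefficients
beta_chunked <- drop(solve(XtX, Xty))

# Sanity check against lm() on the full data
beta_lm <- unname(coef(lm(y ~ x + I(x^2))))
print(all.equal(beta_chunked, beta_lm, tolerance = 1e-6))
```

Solving the normal equations directly is less numerically stable than the QR decomposition that lm uses, so treat this as a way to see the idea rather than a drop-in replacement.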
    Linux since: 2001
    Gentoo since: 2004
    - - - - - - - -
    Translation:
    I fix things until they break.

  3. #3
    Just Joined!
    Join Date
    Jan 2008
    Posts
    4
    Hey, thanks for all the info. I'll have to read over those references carefully.

    It's good to know that R is capable of regression. I mean, you would think that it would be, but it just seems awkward to get, for example, a quadratic regression by:

    quad_reg <- lm(data ~ X + I(X^2))

    Seems "unnatural," like I'm using lm for something it wasn't intended to do...
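For what it's worth, the "linear" in lm refers to the coefficients, not to X, so a quadratic is still a linear model in that sense. As a small sketch with simulated data (the numbers and the names y, fit_I and fit_poly are assumptions for illustration), the same fit can also be written with poly(), which some people find reads more naturally than I(X^2):

```r
# Sketch: two equivalent ways to write a quadratic fit with lm().
# The data are simulated; substitute your own y and X.
set.seed(42)
X <- seq(1, 10, by = 0.1)
y <- 4 * X + 10 * X^2 + rnorm(length(X), sd = 20)

# Explicit squared term, protected with I() so ^ means arithmetic power
fit_I <- lm(y ~ X + I(X^2))

# Same model via raw (non-orthogonal) polynomial terms of degree 2
fit_poly <- lm(y ~ poly(X, 2, raw = TRUE))

print(coef(fit_I))
# The coefficient names differ, the values do not
print(all.equal(unname(coef(fit_I)), unname(coef(fit_poly))))
```

With raw = TRUE the coefficients are directly comparable to the I(X^2) version; without it, poly() uses orthogonal polynomials and the individual coefficients mean something different, although the fitted values are the same.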

  4. #4
    Linux Newbie sdimhoff
    Join Date
    Jan 2007
    Posts
    191
    You may be having trouble for a couple of simple reasons. First of all, it seems unnatural because lm is not the function designed for general nonlinear model fitting.

    lm = linear model
    nls = nonlinear least squares

    There are actually several ways to skin this cat, but I think it would be best to just look to those two functions. If you can transform your model so that it is linear in its parameters, then you can use lm and transform the fitted model back. If you can't do that, or just can't do that easily, then nls works great. Here is an example with that quadratic:

    Code:
    > x = seq(from=1, to=10, by=.1)
    > somedata = 4*x + rnorm(91,mean=0,sd=20) + 10*(x)^2
    > plot(somedata ~ x)
    > fm1=nls(somedata ~ (A*x + B*x^2), start=c(A=1,B=1))
    > lines(x,predict(fm1))
    > summary(fm1)
    
    Formula: somedata ~ (A * x + B * x^2)
    
    Parameters:
      Estimate Std. Error t value Pr(>|t|)    
    A   5.9368     1.3015   4.561 1.62e-05 ***
    B   9.7332     0.1671  58.238  < 2e-16 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
    
    Residual standard error: 18.82 on 89 degrees of freedom
    
    Number of iterations to convergence: 1 
    Achieved convergence tolerance: 3.124e-08

    As a side note, you can see that I've simulated data with some significant noise in it, but you can substitute whatever data you like for somedata and x.
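The "transform it to be linear, use lm, then transform back" route mentioned above can be sketched as follows. This is an illustration under assumptions, not part of the session above: the exponential model y = a * exp(b * x), the simulated data, and every variable name are made up for the example. Taking logs gives log(y) = log(a) + b * x, which lm can fit directly; the same model is then fitted with nls, using the linearized estimates as starting values.

```r
# Sketch: fit y = a * exp(b * x) two ways. Data and names are illustrative.
set.seed(7)
x <- seq(1, 10, by = 0.1)
y <- 3 * exp(0.4 * x) * exp(rnorm(length(x), sd = 0.1))  # multiplicative noise

# Route 1: linearize with log(), fit with lm, transform the intercept back
fit_lin <- lm(log(y) ~ x)
a_lin <- exp(coef(fit_lin)[["(Intercept)"]])
b_lin <- coef(fit_lin)[["x"]]

# Route 2: fit the original model with nls, starting from the lm estimates
fit_nls <- nls(y ~ a * exp(b * x), start = list(a = a_lin, b = b_lin))

print(c(a = a_lin, b = b_lin))
print(coef(fit_nls))
```

The two routes will not give identical numbers: lm on log(y) minimizes squared error on the log scale, while nls minimizes it on the original scale, so they weight the points differently. Which one is appropriate depends on how the noise enters your data.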
