 01-25-2008 #1
 Join Date
 Jan 2008
 Posts
 4
Mathematics  Regression software for linux?
I've recently started working on a project that requires me to analyze large amounts of numerical data. I need to be able to fit my datasets to some fairly intricate (nonlinear) equations (polynomials, transcendentals, etc.). I've been playing around with R (The R Project for Statistical Computing), but I'm not sure that it's exactly what I'm looking for. Regression seems to be a pretty small component of it, and it feels awkward to do anything much beyond a polynomial in it (of course, it could just be that I don't fully know how to use it yet).
Does anyone know of any software that's made to do data analysis like this? On Windows there are things like SigmaPlot or Igor Pro (which really are not ideal either, but they seem closer than R...).
Thanks in advance,
Ben
 01-25-2008 #2
Hi Ben!
First of all, welcome to the forums.
2.) How large are the datasets you are talking about? Size does matter for this type of thing. Since R loads data into memory, you either have to be clever with your scripting or look to another option when single datasets get above 100,000 or so. (This is not a firm number, just an example of when things start to slow down.) If your datasets are very large then you may have to switch languages. Building regression functions in something like C or Fortran becomes necessary when this happens. (Don't worry, someone else has probably already made them.)
3.) I use R daily, and regression is actually a major part of the R picture. However, I fully realize that since R comes off as a strange jumble of S+, MATLAB-like utilities, Perl, etc., it can be difficult to find things. Here is a link to what I think is some of the best R help around:
Regression
Other regressions
If you just go to Index of /UNIX/48_R you can see all the topics. (you may want to start with Other regressions and skip to the nonlinear fitting portion)
Another big advantage of R is the easy analysis available once you've got yourself a fitted model.
4.) If you post an example of the regression you need to accomplish, I can probably help you out (or at least try, then point you in a new direction...).
5.) Finally, if you do decide to use R and get confused about anything, the developers are amazingly helpful and you can just use their R-help mailing list. The other avid users/devs on that mailing list would be able to answer any R questions you could throw at them.
Linux since: 2001
Gentoo since: 2004
Translation:
I fix things until they break.
 01-29-2008 #3
 Join Date
 Jan 2008
 Posts
 4
Hey, thanks for all the info. I'll have to read over those references carefully.
It's good to know that R is capable of regression. I mean, you would think that it would be, but it just seems awkward to get, for example, a quadratic regression by:
quad_reg <- lm(data ~ X + I(X^2))
Seems "unnatural," like I'm using lm for something it wasn't intended to do...
 01-29-2008 #4
You may be having trouble for a couple of simple reasons. First of all, it seems unnatural because you are using the incorrect function to do the model fitting.
lm = linear model
nls = nonlinear least squares
There are actually several ways to skin this cat, but I think it would be best to just look to those two functions. If you can transform your function to be linear then you can use lm, and transform the fitted model back. If you can't do that, or just can't do that easily, then nls works great. Here would be an example of that quadratic:
Code:
> x = seq(from=1, to=10, by=.1)
> somedata = 4*x + rnorm(91,mean=0,sd=20) + 10*(x)^2
> plot(somedata ~ x)
> fm1=nls(somedata ~ (A*x + B*x^2), start=c(A=1,B=1))
> lines(x,predict(fm1))
> summary(fm1)

Formula: somedata ~ (A * x + B * x^2)

Parameters:
  Estimate Std. Error t value Pr(>|t|)
A   5.9368     1.3015   4.561 1.62e-05 ***
B   9.7332     0.1671  58.238  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 18.82 on 89 degrees of freedom

Number of iterations to convergence: 1
Achieved convergence tolerance: 3.124e-08
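To illustrate the other route mentioned above (transform to linear, use lm, then transform back), here is a rough sketch next to a direct nls fit. The model y = a*exp(b*x), the simulated data, and the names a_true, b_true, fit_lm and fit_nls are all made up for the example; they are not from the project being discussed.

```r
# Simulated data, assumed purely for illustration:
# y = a * exp(b * x) with multiplicative noise.
set.seed(42)
x <- seq(from = 1, to = 10, by = 0.1)
a_true <- 2.5
b_true <- 0.3
y <- a_true * exp(b_true * x) * exp(rnorm(length(x), mean = 0, sd = 0.1))

# Route 1: linearize. log(y) = log(a) + b * x is linear in log(a) and b,
# so lm can fit it.
fit_lm <- lm(log(y) ~ x)
a_lm <- exp(coef(fit_lm)[["(Intercept)"]])  # transform the intercept back to a
b_lm <- coef(fit_lm)[["x"]]

# Route 2: fit the original equation directly with nls,
# using the lm estimates as starting values.
fit_nls <- nls(y ~ a * exp(b * x), start = list(a = a_lm, b = b_lm))

print(c(a = a_lm, b = b_lm))
print(coef(fit_nls))
```

The two routes will generally give slightly different estimates, because fitting on the log scale treats the noise as multiplicative while nls treats it as additive on the original scale. Using the lm result as the start values for nls is just a convenient way to get reasonable starting guesses.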