Redundant Result Aborts by Project


STE\/E
Joined: Oct 11 11
Posts: 3
Credit: 507,649
RAC: 0
Message 1 - Posted 12 Oct 2011 8:22:00 UTC

    Stop aborting redundant results that a host has already started. I have many that the project has already aborted, some with 2-3 hours of run time when the project aborted them. This is a total waste of host time if the project is going to do this ...

    Pete Broad
    Joined: Oct 12 11
    Posts: 1
    Credit: 183,999
    RAC: 0
    Message 4 - Posted 12 Oct 2011 12:09:13 UTC - in response to Message 1.

      Stop aborting redundant results that a host has already started. I have many that the project has already aborted, some with 2-3 hours of run time when the project aborted them. This is a total waste of host time if the project is going to do this ...



      Yes, same here, very annoying.

      Pete

      Oleg Zaikin [SAT@home]
      Forum moderator
      Project administrator
      Project developer
      Project scientist
      Joined: Sep 15 11
      Posts: 133
      Credit: 4,826,453
      RAC: 0
      Message 5 - Posted 12 Oct 2011 12:31:51 UTC - in response to Message 1.

        Last modified: 12 Oct 2011 13:05:04 UTC

        Stop aborting redundant results that a host has already started. I have many that the project has already aborted, some with 2-3 hours of run time when the project aborted them. This is a total waste of host time if the project is going to do this ...


        In fact, the aborted WUs are not redundant. The SAT problems being solved in the project have the following features:
        1. Every original CNF is SAT (a satisfying assignment exists; we need to find it).
        2. Every WU is a modified CNF (based on the original one); the SAT problem for every such WU is weaker than the original one.
        3. Only one CNF among the WUs is SAT; the others are all UNSAT. If a SAT answer is found on any host, we do not need the results of the other WUs, and because of that the server aborts them.
        I see that there is a problem with credits for aborted WUs. But if the server does not abort them, we will need more time to solve every problem. First of all, I can make WUs with a smaller estimated run time (about 30 minutes; now it is about 2 hours), so the wasted time would not be so significant. Or the server can delete only unsent WUs from the DB, so every user will get their credits (I think this is the better choice). Maybe you can suggest another solution.
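
A toy illustration of features 1-3, not the project's solver: the Python sketch below takes a small hypothetical CNF with exactly one satisfying assignment, fixes two of its variables in every possible way to produce four "WUs", and brute-forces each one. The formula, the function names and the choice of split variables are all assumptions made for the example.

```python
from itertools import product

# Toy CNF over variables 1..4; a literal is +v (true) or -v (false).
# Hypothetical formula chosen so that it has exactly one model.
CNF = [[1], [-1, 2], [-2, -3], [3, 4]]
N_VARS = 4


def satisfiable(cnf, n_vars, fixed):
    """Brute-force check: is cnf satisfiable with the variables in `fixed` pinned?"""
    free = [v for v in range(1, n_vars + 1) if v not in fixed]
    for values in product([False, True], repeat=len(free)):
        assign = dict(fixed)
        assign.update(zip(free, values))
        # A clause holds if at least one literal matches the assignment.
        if all(any(assign[abs(lit)] == (lit > 0) for lit in clause) for clause in cnf):
            return True
    return False


def decompose(split_vars):
    """Yield one partial assignment (one 'WU') per combination of split-variable values."""
    for values in product([False, True], repeat=len(split_vars)):
        yield dict(zip(split_vars, values))


results = [(wu, satisfiable(CNF, N_VARS, wu)) for wu in decompose([1, 3])]
sat_wus = [wu for wu, is_sat in results if is_sat]

for wu, is_sat in results:
    print(wu, "SAT" if is_sat else "UNSAT")
```

In this toy case only the WU with variable 1 true and variable 3 false is SAT. Once a host reports it, the answers of the other three WUs are no longer needed, which is the situation in which the server was aborting them.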

        Cruncher Pete TSBT
        Joined: Oct 11 11
        Posts: 4
        Credit: 1,528,564
        RAC: 0
        Message 6 - Posted 12 Oct 2011 13:09:59 UTC - in response to Message 4.

          I have over 40 of those, all dated today. If we are not going to get credit for WUs that have been worked on for hours, something needs to be done to fix this, or there will be a lot of unhappy, angry volunteers out there.

          Oleg Zaikin [SAT@home]
          Forum moderator
          Project administrator
          Project developer
          Project scientist
          Joined: Sep 15 11
          Posts: 133
          Credit: 4,826,453
          RAC: 0
          Message 7 - Posted 12 Oct 2011 13:20:40 UTC - in response to Message 6.

            Last modified: 12 Oct 2011 13:25:50 UTC

            I have over 40 of those, all dated today. If we are not going to get credit for WUs that have been worked on for hours, something needs to be done to fix this, or there will be a lot of unhappy, angry volunteers out there.


            I understand that aborting WUs is a bad decision. It will be fixed in a few days. Sorry for the inconvenience.

            STE\/E
            Joined: Oct 11 11
            Posts: 3
            Credit: 507,649
            RAC: 0
            Message 8 - Posted 12 Oct 2011 14:38:10 UTC

              Thanks for taking the time to address this problem ...
              ____________
              Steve*

              reklov
              Joined: Oct 11 11
              Posts: 1
              Credit: 12,713
              RAC: 0
              Message 11 - Posted 12 Oct 2011 17:46:19 UTC - in response to Message 5.

                An improved solution (a variant of your proposal 3):
                4) Or the server can delete only unsent WUs from the DB and abort WUs which haven't started yet, so every user will get their credits (I think this is the better choice).
                This option is, IMHO, available in BOINC; at least it's used by other projects.

                Oleg Zaikin [SAT@home]
                Forum moderator
                Project administrator
                Project developer
                Project scientist
                Joined: Sep 15 11
                Posts: 133
                Credit: 4,826,453
                RAC: 0
                Message 15 - Posted 13 Oct 2011 0:29:15 UTC - in response to Message 11.

                  An improved solution (a variant of your proposal 3):
                  4) Or the server can delete only unsent WUs from the DB and abort WUs which haven't started yet, so every user will get their credits (I think this is the better choice).
                  This option is, IMHO, available in BOINC; at least it's used by other projects.


                  Yes, this solution is better. Can you tell me which projects have this feature? I use DC-API to build the server and client applications.

                  zombie67 [MM]
                  Joined: Oct 10 11
                  Posts: 7
                  Credit: 1,099,438
                  RAC: 0
                  Message 19 - Posted 13 Oct 2011 3:41:59 UTC - in response to Message 15.

                    Yes, this solution is better. Can you tell me which projects have this feature? I use DC-API to build the server and client applications.


                    Thanks for changing your method of ending a SAT run!

                    The method used to end a run is very important to the volunteers.

                    1) If the answer has been found, and the remaining tasks are not necessary, then yes, ending the run is the right thing to do. There is no point in issuing new work that is not needed.

                    2) For tasks already issued to volunteer machines, there is a way to cancel *only* those that have not yet started crunching. This is the best way to do it. I know the project gains nothing by letting the "in-process" tasks complete. But the volunteers lose credits rightfully earned for processing time. And those credits cost the project nothing. So this is the right method to use.

                    3) You asked which projects use the method described in #2. All of them. Very few projects have cause to regularly end a set of tasks prematurely. But any time they do so, they use that method. If you need help to learn how to do so, just ask, either here on the forum or on the BOINC mail list.
                    ____________
                    Dublin, California
                    Team: SETI.USA

                    Oleg Zaikin [SAT@home]
                    Forum moderator
                    Project administrator
                    Project developer
                    Project scientist
                    Joined: Sep 15 11
                    Posts: 133
                    Credit: 4,826,453
                    RAC: 0
                    Message 20 - Posted 13 Oct 2011 4:55:00 UTC - in response to Message 19.


                      1) If the answer has been found, and the remaining tasks are not necessary, then yes, ending the run is the right thing to do. There is no point in issuing new work that is not needed.

                      2) For tasks already issued to volunteer machines, there is a way to cancel *only* those that have not yet started crunching. This is the best way to do it. I know the project gains nothing by letting the "in-process" tasks complete. But the volunteers lose credits rightfully earned for processing time. And those credits cost the project nothing. So this is the right method to use.

                      3) You asked which projects use the method described in #2. All of them. Very few projects have cause to regularly end a set of tasks prematurely. But any time they do so, they use that method. If you need help to learn how to do so, just ask, either here on the forum or on the BOINC mail list.


                      I have a problem. There are several possible states of WUs in DC-API:
                      DC_WU_READY
                      DC_WU_RUNNING
                      DC_WU_FINISHED
                      DC_WU_SUSPENDED
                      DC_WU_ABORTED
                      DC_WU_UNKNOWN
                      and I don't even know how I can recognize unsent WUs using DC-API. I also don't know how to recognize WUs that are on hosts but still waiting their turn. I have asked the DC-API developers and am now waiting for an answer. Maybe somebody can help me with this?

                      STE\/E
                      Joined: Oct 11 11
                      Posts: 3
                      Credit: 507,649
                      RAC: 0
                      Message 21 - Posted 13 Oct 2011 8:21:09 UTC

                        You're still aborting running WUs. From looking at my account, more CPU time gets aborted than gets through to finish. Many WUs in the 10-30 minute range have already been aborted by the project this morning ...
                        ____________
                        Steve*

                        Oleg Zaikin [SAT@home]
                        Forum moderator
                        Project administrator
                        Project developer
                        Project scientist
                        Joined: Sep 15 11
                        Posts: 133
                        Credit: 4,826,453
                        RAC: 0
                        Message 22 - Posted 13 Oct 2011 8:49:40 UTC - in response to Message 21.

                          Last modified: 13 Oct 2011 8:51:13 UTC

                          You're still aborting running WUs. From looking at my account, more CPU time gets aborted than gets through to finish. Many WUs in the 10-30 minute range have already been aborted by the project this morning ...


                          After the current experiment finishes (in about 12 hours), I will start a new server application that does not abort WUs. Sorry again for the trouble. Your team is on top, congratulations!

                          Oleg Zaikin [SAT@home]
                          Forum moderator
                          Project administrator
                          Project developer
                          Project scientist
                          Joined: Sep 15 11
                          Posts: 133
                          Credit: 4,826,453
                          RAC: 0
                          Message 25 - Posted 13 Oct 2011 15:59:37 UTC - in response to Message 21.

                            In the new version of the server application, aborting of WUs has been turned off.

                            zombie67 [MM]
                            Joined: Oct 10 11
                            Posts: 7
                            Credit: 1,099,438
                            RAC: 0
                            Message 26 - Posted 13 Oct 2011 18:39:07 UTC

                              Thanks!

                              ____________
                              Dublin, California
                              Team: SETI.USA

                              AL ADIM
                              Joined: Oct 10 11
                              Posts: 1
                              Credit: 10,620
                              RAC: 0
                              Message 27 - Posted 13 Oct 2011 20:55:17 UTC

                                But what about all the points for the aborted redundant results?
                                ____________

                                Oleg Zaikin [SAT@home]
                                Forum moderator
                                Project administrator
                                Project developer
                                Project scientist
                                Joined: Sep 15 11
                                Posts: 133
                                Credit: 4,826,453
                                RAC: 0
                                Message 29 - Posted 14 Oct 2011 0:57:03 UTC - in response to Message 27.

                                  But what about all the points for the aborted redundant results?


                                  The points were not added automatically. Is there any way to add them manually?

                                  Dagorath
                                  Joined: Oct 15 11
                                  Posts: 11
                                  Credit: 63,227
                                  RAC: 0
                                  Message 34 - Posted 15 Oct 2011 6:24:34 UTC - in response to Message 29.

                                    Last modified: 15 Oct 2011 6:36:02 UTC

                                    Oleg,

                                    The server setting to abort tasks on clients is the <send_result_abort>0|1</send_result_abort> setting. Here is what the BOINC Trac wiki says about that option.

                                    If set, and the client is processing a result for a WU that has been canceled or is not in the DB (i.e. there's no chance of getting credit), tell the client to abort the result regardless of state. If client is processing a result for a WU that has been assimilated or is overdue (i.e. there's a chance of not getting credit) tell the client to abort the result if it hasn't started yet. Note: this will increase the load on your DB server.


                                    I think the "not" in "there's a chance of not getting credit" is a typo. I think the phrase should be (i.e. there's a chance of getting credit).

                                    What makes the difference between aborting *all* tasks on a client and aborting only tasks that haven't started is whether the WU has been canceled or whether the WU is assimilated. So, it sounds like you are canceling the unnecessary WUs when you should be assimilating the WUs. If you assimilate, then the server will not cancel a WU that a client has already started; only WUs that haven't started will be canceled.

                                    Edit added:

                                    The BOINC Trac wiki page with the above information is here.
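
The rule in the wiki text quoted in this post can be written out as a small decision function. This is only a reading of that paragraph in Python, not BOINC's scheduler code; the `WuState` enum and `should_abort` function are hypothetical names invented for the sketch, and it assumes `<send_result_abort>` is set to 1.

```python
from enum import Enum


class WuState(Enum):
    """Hypothetical labels for the workunit situations named in the wiki text."""
    ACTIVE = "active"            # still needed by the project
    CANCELED = "canceled"
    NOT_IN_DB = "not_in_db"
    ASSIMILATED = "assimilated"
    OVERDUE = "overdue"


def should_abort(wu_state: WuState, task_started: bool) -> bool:
    """Return True if the client would be told to abort the task."""
    if wu_state in (WuState.CANCELED, WuState.NOT_IN_DB):
        return True                  # aborted regardless of progress
    if wu_state in (WuState.ASSIMILATED, WuState.OVERDUE):
        return not task_started      # only tasks that have not started yet
    return False                     # WU still needed: leave the task alone


for state in WuState:
    print(state.value, should_abort(state, task_started=True), should_abort(state, task_started=False))
```

Read this way, a running task survives only when its WU is in the assimilated (or overdue) case, which is why assimilating rather than canceling is the suggestion here.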

                                    Oleg Zaikin [SAT@home]
                                    Forum moderator
                                    Project administrator
                                    Project developer
                                    Project scientist
                                    Joined: Sep 15 11
                                    Posts: 133
                                    Credit: 4,826,453
                                    RAC: 0
                                    Message 35 - Posted 15 Oct 2011 15:09:46 UTC - in response to Message 34.

                                      Oleg,

                                      The server setting to abort tasks on clients is the <send_result_abort>0|1</send_result_abort> setting. Here is what the BOINC Trac wiki says about that option.

                                      If set, and the client is processing a result for a WU that has been canceled or is not in the DB (i.e. there's no chance of getting credit), tell the client to abort the result regardless of state. If client is processing a result for a WU that has been assimilated or is overdue (i.e. there's a chance of not getting credit) tell the client to abort the result if it hasn't started yet. Note: this will increase the load on your DB server.


                                      I think the "not" in "there's a chance of not getting credit" is a typo. I think the phrase should be (i.e. there's a chance of getting credit).

                                      What makes the difference between aborting *all* tasks on a client and aborting only tasks that haven't started is whether the WU has been canceled or whether the WU is assimilated. So, it sounds like you are canceling the unnecessary WUs when you should be assimilating the WUs. If you assimilate, then the server will not cancel a WU that a client has already started; only WUs that haven't started will be canceled.

                                      Edit added:

                                      The BOINC Trac wiki page with the above information is here.


                                      Thank you for the interesting solution! I will try it.

                                      Nflight
                                      Joined: Oct 10 11
                                      Posts: 1
                                      Credit: 121,352
                                      RAC: 0
                                      Message 40 - Posted 25 Oct 2011 9:47:44 UTC - in response to Message 5.

                                        First of all, I can make WUs with a smaller estimated run time (about 30 minutes; now it is about 2 hours), so the wasted time would not be so significant. Maybe you can suggest another solution.


                                        Speaking of wasting time, I am suddenly seeing time to completion nearing 30 hours on two WUs. I currently have each WU at 3.9% with 30 hours to go on each. Is this a new project, or should I abort these two work units?

                                        Nflight
                                        Team AMDusers

                                        ____________

                                        Oleg Zaikin [SAT@home]
                                        Forum moderator
                                        Project administrator
                                        Project developer
                                        Project scientist
                                        Joined: Sep 15 11
                                        Posts: 133
                                        Credit: 4,826,453
                                        RAC: 0
                                        Message 41 - Posted 25 Oct 2011 17:16:00 UTC - in response to Message 40.

                                          First of all, I can make WUs with a smaller estimated run time (about 30 minutes; now it is about 2 hours), so the wasted time would not be so significant. Maybe you can suggest another solution.


                                          Speaking of wasting time, I am suddenly seeing time to completion nearing 30 hours on two WUs. I currently have each WU at 3.9% with 30 hours to go on each. Is this a new project, or should I abort these two work units?

                                          Nflight
                                          Team AMDusers


                                          There are some atypical WUs that need a huge amount of time. We will try to improve our solver to take these atypical problems into account. If you wait, these WUs will be processed (their deadline is not soon) and you will get many credits.

                                          quel
                                          Joined: Nov 21 11
                                          Posts: 6
                                          Credit: 10,109,026
                                          RAC: 0
                                          Message 92 - Posted 27 Nov 2011 7:37:25 UTC - in response to Message 41.


                                            There are some atypical WUs that need a huge amount of time. We will try to improve our solver to take these atypical problems into account. If you wait, these WUs will be processed (their deadline is not soon) and you will get many credits.


                                            I'm not concerned about the run times or the credits but am curious about the science behind this. One of my older machines had just finished some 14-hour work units and I was getting ready to have it cease crunching this project, *but* it then turned out a 30-minute WU.

                                            Is this a direct result of the computer-generated decompositions?

                                            In my reading of this paper I got the impression that the run time itself was a known quantity prior to execution of the decomposed SAT problem for the parallel distribution. I assume that I greatly oversimplified many things and that my understanding requires much more reading, so pointing me towards papers is much appreciated. (I have an IACR membership and need to renew my ACM membership, but much of the Springer LNCS is only available to me at $25 USD/article+.)

                                            Oleg Zaikin [SAT@home]
                                            Forum moderator
                                            Project administrator
                                            Project developer
                                            Project scientist
                                            Joined: Sep 15 11
                                            Posts: 133
                                            Credit: 4,826,453
                                            RAC: 0
                                            Message 96 - Posted 28 Nov 2011 13:58:24 UTC - in response to Message 92.

                                              Last modified: 29 Nov 2011 0:58:47 UTC


                                              There are some atypical WUs that need a huge amount of time. We will try to improve our solver to take these atypical problems into account. If you wait, these WUs will be processed (their deadline is not soon) and you will get many credits.


                                              I'm not concerned about the run times or the credits but am curious about the science behind this. One of my older machines had just finished some 14-hour work units and I was getting ready to have it cease crunching this project, *but* it then turned out a 30-minute WU.

                                              Is this a direct result of the computer-generated decompositions?

                                              In my reading of this paper I got the impression that the run time itself was a known quantity prior to execution of the decomposed SAT problem for the parallel distribution. I assume that I greatly oversimplified many things and that my understanding requires much more reading, so pointing me towards papers is much appreciated. (I have an IACR membership and need to renew my ACM membership, but much of the Springer LNCS is only available to me at $25 USD/article+.)


                                              In each experiment for A5/1, we choose a set of 31 variables (this set is described on page 12 of the paper you mentioned above). The next step is to make 2^31 vectors (all combinations of values of the 31 variables) and solve 2^31 (more than 2 billion) SAT problems. The estimate for solving each such problem is about 0.2 seconds, but in practice the real times can differ a lot, because every problem has its own additional data (31 known variables out of the 64 "core" variables). We plan to investigate this and improve our SAT solver to solve such "atypical" problems faster.
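
A back-of-the-envelope check of these figures, in Python. The 0.2 s value is the estimate from the post, treated here as an average; the 2-hour WU length is the estimate mentioned earlier in the thread and is used only as an assumption to get a rough WU count.

```python
SUBPROBLEMS = 2 ** 31      # one SAT problem per assignment of the 31 chosen variables
SECONDS_EACH = 0.2         # estimated solve time per problem, from the post
WU_HOURS = 2               # assumed WU length (earlier estimate in the thread)

total_seconds = SUBPROBLEMS * SECONDS_EACH
cpu_years = total_seconds / 86_400 / 365.25

# How many subproblems fit in one WU, and how many WUs that implies.
problems_per_wu = round(WU_HOURS * 3600 / SECONDS_EACH)
wu_count = -(-SUBPROBLEMS // problems_per_wu)   # ceiling division

print(f"{SUBPROBLEMS:,} subproblems")
print(f"about {cpu_years:.1f} CPU-years in total")
print(f"{problems_per_wu:,} subproblems per WU, about {wu_count:,} WUs")
```

Under these assumptions one experiment comes to roughly 13.6 CPU-years. Because 0.2 s is an estimate rather than a fixed cost per problem, individual WUs can land far from the 2-hour mark, which fits the 30-minute and 14-hour WUs reported above.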

                                              quel
                                              Joined: Nov 21 11
                                              Posts: 6
                                              Credit: 10,109,026
                                              RAC: 0
                                              Message 97 - Posted 28 Nov 2011 18:12:52 UTC - in response to Message 96.

                                                Thanks for your response. I think the error I made was that I assumed the 0.2 second time was fixed instead of an average. It will be interesting to read your future findings including the atypical run times.



                                                Copyright © 2019 Institute for System Dynamics and Control Theory of SB RAS and Institute for Information Transmission Problems of RAS