Khác biệt giữa bản sửa đổi của “Suy giảm độ dốc”

Nội dung được xóa Nội dung được thêm vào

Nội tuyến

Phiên bản lúc 12:18, ngày 4 tháng 2 năm 2021

Thuật toán suy giảm độ dốc (còn gọi là giảm độ dốc, tiếng Anh: Gradient descent) là một thuật toán tối ưu hóa lặp bậc nhất để tìm một cực trị của một hàm khả vi. Để tìm cực tiểu cục bộ của một hàm sử dụng suy giảm độ dốc, người ta có thể thực hiện các bước tỷ lệ thuận với âm của gradien (hoặc độ dốc xấp xỉ) của hàm tại điểm hiện tại. Nhưng nếu thực hiện các bước tương ứng với dương của gradien thì tiếp cận được một cực đại cục bộ của hàm số đó; phương pháp này được gọi là tăng độ dốc (gradient ascent). Nói chung, Augustin-Louis Cauchy được ghi công là người gợi ý về vấn đề suy giảm độ dốc vào năm 1847,^[1] nhưng các tính chất hội tụ của nó cho các bài toán tối ưu hóa phi tuyến tính lần đầu tiên được nghiên cứu bởi Haskell Curry năm 1944.^[2]

Mô tả

Độ dốc dựa trên quan sát nếu hàm đa biến $F(\mathbf {x} )$ là chưa xác định và khả vi trong một vùng lân cận của một điểm $\mathbf {a}$ , sau đó $F(\mathbf {x} )$ giảm nhanh nhất nếu đi từ $\mathbf {a}$ theo hướng của độ dốc âm của $F$ tại $\mathbf {a} ,-\nabla F(\mathbf {a} )$ . Nó theo sau nếu

\mathbf {a} _{n+1}=\mathbf {a} _{n}-\gamma \nabla F(\mathbf {a} _{n})

đối với $\gamma \in \mathbb {R} _{+}$ đủ nhỏ, sau đó $F(\mathbf {a_{n}} )\geq F(\mathbf {a_{n+1}} )$ . Theo cách nói khác, thuật ngữ $\gamma \nabla F(\mathbf {a} )$ được trừ khỏi $\mathbf {a}$ ởi vì chúng ta muốn di chuyển ngược lại độ dốc, hướng đến cực tiểu cục bộ. Với quan sát này, người ta bắt đầu với một phỏng đoán $\mathbf {x} _{0}$ đối với một cực tiểu cục bộ của $F$ , và xem chuỗi $\mathbf {x} _{0},\mathbf {x} _{1},\mathbf {x} _{2},\ldots$ là

\mathbf {x} _{n+1}=\mathbf {x} _{n}-\gamma _{n}\nabla F(\mathbf {x} _{n}),\ n\geq 0.

Khi đó, chúng ta có một chuỗi hàm số đơn điệu

F(\mathbf {x} _{0})\geq F(\mathbf {x} _{1})\geq F(\mathbf {x} _{2})\geq \cdots ,

so hopefully the sequence $(\mathbf {x} _{n})$ converges to the desired local minimum. Note that the value of the step size $\gamma$ is allowed to change at every iteration. With certain assumptions on the function $F$ (for example, $F$ Convex function and $\nabla F$ Lipschitz continuity) and particular choices of $\gamma$ (e.g., chosen either via a Line search that satisfies the Wolfe conditions, or the Barzilai–Borwein method^[3]^[4] shown as following),

\gamma _{n}={\frac {\left|\left(\mathbf {x} _{n}-\mathbf {x} _{n-1}\right)^{T}\left[\nabla F(\mathbf {x} _{n})-\nabla F(\mathbf {x} _{n-1})\right]\right|}{\left\|\nabla F(\mathbf {x} _{n})-\nabla F(\mathbf {x} _{n-1})\right\|^{2}}}

Chuỗi hội tụ to a local minimum can be guaranteed. When the function $F$ is Convex function, all local minima are also global minima, so in this case gradient descent can converge to the global solution.

This process is illustrated in the adjacent picture. Here $F$ is assumed to be defined on the plane, and that its graph has a Bát ăn shape. The blue curves are the Đường đồng mứcs, that is, the regions on which the value of $F$ is constant. A red arrow originating at a point shows the direction of the negative gradient at that point. Note that the (negative) gradient at a point is Orthogonality to the contour line going through that point. We see that gradient descent leads us to the bottom of the bowl, that is, to the point where the value of the function $F$ is minimal.

Xem thêm

Tham khảo

^ Lemaréchal, C. (2012). “Cauchy and the Gradient Method” (PDF). Doc Math Extra: 251–254.
^ Curry, Haskell B. (1944). “The Method of Steepest Descent for Non-linear Minimization Problems”. Quart. Appl. Math. 2 (3): 258–261. doi:10.1090/qam/10667.
^ Barzilai, Jonathan; Borwein, Jonathan M. (1988). “Two-Point Step Size Gradient Methods”. IMA Journal of Numerical Analysis. 8 (1): 141–148. doi:10.1093/imanum/8.1.141.
^ Fletcher, R. (2005). “On the Barzilai–Borwein Method”. Trong Qi, L.; Teo, K.; Yang, X. (biên tập). Optimization and Control with Applications. Applied Optimization. 96. Boston: Springer. tr. 235–256. ISBN 0-387-24254-6.

Đọc thêm

Boyd, Stephen; Vandenberghe, Lieven (2004). “Unconstrained Minimization”. Convex Optimization. New York: Cambridge University Press. tr. 457–520. ISBN 0-521-83378-7. Đã bỏ qua tham số không rõ |chapterurl= (trợ giúp)

Liên kết ngoài

Using gradient descent in C++, Boost, Ublas for linear regression
Series of Khan Academy videos discusses gradient ascent
Online book teaching gradient descent in deep neural network context
“Gradient Descent, How Neural Networks Learn”. 3Blue1Brown. ngày 16 tháng 10 năm 2017 – qua YouTube.

Bài viết này vẫn còn sơ khai. Bạn có thể giúp Wikipedia mở rộng nội dung để bài được hoàn chỉnh hơn.

[1] Lemaréchal, C. (2012). “Cauchy and the Gradient Method” (PDF). Doc Math Extra: 251–254.

[2] Curry, Haskell B. (1944). “The Method of Steepest Descent for Non-linear Minimization Problems”. Quart. Appl. Math. 2 (3): 258–261. doi:10.1090/qam/10667.

[3] Barzilai, Jonathan; Borwein, Jonathan M. (1988). “Two-Point Step Size Gradient Methods”. IMA Journal of Numerical Analysis. 8 (1): 141–148. doi:10.1093/imanum/8.1.141.

[4] Fletcher, R. (2005). “On the Barzilai–Borwein Method”. Trong Qi, L.; Teo, K.; Yang, X. (biên tập). Optimization and Control with Applications. Applied Optimization. 96. Boston: Springer. tr. 235–256. ISBN 0-387-24254-6.

[1]

[2]

[3]

[4]

@@ Dòng 3: / Dòng 3: @@
 '''Thuật toán suy giảm độ dốc''' (còn gọi là '''giảm độ dốc''', [[tiếng Anh]]: '''Gradient descent''') là một thuật toán [[Tối ưu hóa (toán học)|tối ưu hóa]] [[thuật toán lặp|lặp]] [[:Category:phương pháp bậc nhất|bậc nhất]] để tìm một [[Cực trị của hàm số|cực trị]] của một hàm khả vi. Để tìm cực tiểu cục bộ của một hàm sử dụng suy giảm độ dốc, người ta có thể thực hiện các bước tỷ lệ thuận với '''âm''' của [[gradien]] (hoặc độ dốc xấp xỉ) của hàm tại điểm hiện tại. Nhưng nếu thực hiện các bước tương ứng với '''dương''' của gradien thì tiếp cận được một [[Cực trị của hàm số|cực đại cục bộ]] của hàm số đó; phương pháp này được gọi là '''tăng độ dốc''' (gradient ascent). Nói chung, [[Augustin-Louis Cauchy]] được ghi công là người gợi ý về vấn đề suy giảm độ dốc vào năm 1847,<ref>{{cite journal |first=C. |last=Lemaréchal |authorlink=Claude Lemaréchal |title=Cauchy and the Gradient Method |journal=Doc Math Extra |pages=251–254 |year=2012 |url=https://www.math.uni-bielefeld.de/documenta/vol-ismp/40_lemarechal-claude.pdf }}</ref> nhưng các tính chất hội tụ của nó cho các bài toán tối ưu hóa phi tuyến tính lần đầu tiên được nghiên cứu bởi [[Haskell Curry]] năm 1944.<ref>{{cite journal |first=Haskell B. |last=Curry |title=The Method of Steepest Descent for Non-linear Minimization Problems |journal=Quart. Appl. Math. |volume=2 |year=1944 |issue=3 |pages=258–261 |doi=10.1090/qam/10667 |doi-access=free }}</ref>
+== Mô tả ==
+{{đang viết}}
+[[File:Gradient descent.svg|thumb|350px|Hình minh họa độ dốc trên một loạt [[tập mức]]]]
+Độ dốc dựa trên quan sát nếu  [[hàm đa biến]] <math>F(\mathbf{x})</math> là [[Chưa xác định (toán học)|chưa xác định]] và [[hàm số khả vi|khả vi]] trong một vùng lân cận của một điểm <math>\mathbf{a}</math>, sau đó <math>F(\mathbf{x})</math> giảm ''nhanh nhất'' nếu đi từ <math>\mathbf{a}</math> theo hướng của độ dốc âm của <math>F</math> tại <math>\mathbf{a}, -\nabla F(\mathbf{a})</math>. Nó theo sau nếu
+:<math> \mathbf{a}_{n+1} = \mathbf{a}_n-\gamma\nabla F(\mathbf{a}_n)</math>
+đối với <math>\gamma \in \R_{+}</math> đủ nhỏ, sau đó <math>F(\mathbf{a_n})\geq F(\mathbf{a_{n+1}})</math>. Theo cách nói khác, thuật ngữ <math>\gamma\nabla F(\mathbf{a})</math> được trừ khỏi <math>\mathbf{a}</math>ởi vì chúng ta muốn di chuyển ngược lại độ dốc, hướng đến cực tiểu cục bộ. Với quan sát này, người ta bắt đầu với một phỏng đoán <math>\mathbf{x}_0</math> đối với một cực tiểu cục bộ của <math>F</math>, và xem chuỗi <math>\mathbf{x}_0, \mathbf{x}_1, \mathbf{x}_2, \ldots</math> là
+:<math>\mathbf{x}_{n+1}=\mathbf{x}_n-\gamma_n \nabla F(\mathbf{x}_n),\ n \ge 0.</math>
+Khi đó, chúng ta có một chuỗi [[hàm số đơn điệu]]
+:<math>F(\mathbf{x}_0)\ge F(\mathbf{x}_1)\ge F(\mathbf{x}_2)\ge \cdots,</math>
+so hopefully the sequence <math>(\mathbf{x}_n)</math> converges to the desired local minimum. Note that the value of the ''step size'' <math>\gamma</math> is allowed to change at every iteration. With certain assumptions on the function <math>F</math> (for example, <math>F</math> [[Convex function]] and <math>\nabla F</math> [[Lipschitz continuity]]) and particular choices of <math>\gamma</math> (e.g., chosen either via a [[Line search]] that satisfies the [[Wolfe conditions]], or the Barzilai–Borwein method<ref>{{cite journal|last1=Barzilai|first1=Jonathan|last2=Borwein|first2=Jonathan M.|year=1988|title=Two-Point Step Size Gradient Methods|journal=IMA Journal of Numerical Analysis|volume=8|issue=1|pages=141–148|doi=10.1093/imanum/8.1.141}}</ref><ref>{{cite book|last=Fletcher|first=R.|title=Optimization and Control with Applications|publisher=Springer|year=2005|isbn=0-387-24254-6|editor-last=Qi|editor-first=L.|series=Applied Optimization|volume=96|location=Boston|pages=235–256|chapter=On the Barzilai–Borwein Method|editor2-last=Teo|editor2-first=K.|editor3-last=Yang|editor3-first=X.}}</ref> shown as following),
+: <math>\gamma_{n} = \frac{ \left | \left (\mathbf x_{n} - \mathbf x_{n-1} \right )^T \left [\nabla F (\mathbf x_{n}) - \nabla F (\mathbf x_{n-1}) \right ] \right |}{\left \|\nabla F(\mathbf{x}_{n}) - \nabla F(\mathbf{x}_{n-1}) \right \|^2}</math>
+[[Chuỗi hội tụ]] to a local minimum can be guaranteed. When the function <math>F</math> is [[Convex function]], all local minima are also global minima, so in this case gradient descent can converge to the global solution.
+This process is illustrated in the adjacent picture. Here <math>F</math> is assumed to be defined on the plane, and that its graph has a [[Bát ăn]] shape.  The blue curves are the [[Đường đồng mức]]s, that is, the regions on which the value of <math>F</math> is constant. A red arrow originating at a point shows the direction of the negative gradient at that point. Note that the (negative) gradient at a point is [[Orthogonality]] to the contour line going through that point. We see that gradient ''descent'' leads us to the bottom of the bowl, that is, to the point where the value of the function <math>F</math> is minimal.
 == Xem thêm ==